Designing Large SharePoint 2010 Lists
There are several things to consider before implementing a large list in SharePoint 2010.
The most important are the business requirements. These include the service level agreement, the time needed to back up and restore content, the total size of the content, and the number of items. Both the size of the application and the demand placed on it require decisions about hardware, content storage, and SharePoint information architecture.
A large SharePoint application can hold millions of items and serve hundreds of users at the same time. It can require dedicated hardware for the project. A large document repository, for example, may have many users working across a very large number of documents.
The end result of planning should be a list of column types, indexes, the folder structure, the pages and links used for navigation, the planned permission structure, the estimated number of items, and the total data size. The details should also cover the types of queries that will be performed and how data in the list will be accessed, created, and updated.
It is a good idea to allow about 20% on top of the files for other content database data. The search index will be roughly 75% of the content data size. When you plan the design and implementation of a large list, build a prototype of the application. The prototype serves as a proof of concept and lets you validate that the design will work in practice.
It helps to populate the test environment with a large amount of content for validation. The proof of concept should cover the system as a whole: content types, folder structure, views, indexing, and the columns used for metadata navigation and retrieval. It should also include the Web Parts and features that will be used to organize the content.
For large list solutions, estimates are the basis for planning decisions. The key numbers are the total content database size, the average and maximum file size, the number of versions, and the number of items in the list.
The total content size determines how much disk space and hardware you need, and what can be supported for backup, restore, and the service level agreement. The overall content database size is vital for working out how much downtime backup and restore will require. To estimate the total content size, take the average document size, multiply it by the average number of versions per document, and multiply the result by the expected number of documents. Then add 20% to 22% for content database data in addition to the files, depending on how conservative you want to be.
This estimate will tend to be high. Documents usually grow from one version to the next, so the average size across all versions of a document is typically smaller than the average file size. Even so, allow enough of a buffer that the list can grow beyond your estimate.
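The estimate described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names, the parameter names, and the sample inputs are assumptions made for the example, and the 20% overhead and 75% search index ratio are the approximate planning figures quoted in this article, not measured values.

```python
def estimate_content_size_gb(avg_doc_size_mb, avg_versions, doc_count,
                             db_overhead=0.20):
    """Estimate total content size in GB.

    average document size x average versions x expected documents,
    plus a fractional allowance for non-file content database data.
    """
    if min(avg_doc_size_mb, avg_versions, doc_count, db_overhead) < 0:
        raise ValueError("inputs must be non-negative")
    files_mb = avg_doc_size_mb * avg_versions * doc_count
    return files_mb * (1 + db_overhead) / 1024


def estimate_search_index_gb(content_gb, index_ratio=0.75):
    """Rough search index size, assuming it is a fixed share of content size."""
    return content_gb * index_ratio


if __name__ == "__main__":
    # Hypothetical inputs: 1,000,000 documents, 0.5 MB average, 3 versions each.
    content_gb = estimate_content_size_gb(0.5, 3, 1_000_000)
    index_gb = estimate_search_index_gb(content_gb)
    print(f"Content: {content_gb:,.0f} GB, search index: {index_gb:,.0f} GB")
```

Raising db_overhead toward 0.22 gives the more conservative end of the range. Any extra growth buffer would be applied on top of the returned figure.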
The maximum file size is needed to set the Web application's maximum upload size correctly. The upload limit defaults to 50 MB and can be raised to a maximum of 2 GB. The average file size is used to understand how quickly content will grow and to estimate the total content size. You can estimate it by evaluating files in the systems that currently fill the role of the intended system.
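One way to evaluate files in an existing system is to scan a file share and summarize the sizes. The following is a minimal sketch that assumes the existing content is reachable as an ordinary directory tree. The function names and the dictionary keys are invented for the example, and the 50 MB and 2 GB values are the default and maximum upload limits mentioned above.

```python
import os

DEFAULT_UPLOAD_LIMIT_MB = 50      # default upload limit mentioned in the text
MAX_UPLOAD_LIMIT_MB = 2 * 1024    # 2 GB ceiling mentioned in the text
BYTES_PER_MB = 1024 * 1024


def file_sizes_under(root):
    """Yield the size in bytes of every readable file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                yield os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that vanish or cannot be read


def summarize_file_sizes(sizes_bytes, upload_limit_mb=DEFAULT_UPLOAD_LIMIT_MB):
    """Return count, average, maximum, and how many files exceed the limit."""
    if not 0 < upload_limit_mb <= MAX_UPLOAD_LIMIT_MB:
        raise ValueError("upload limit must be between 0 and 2048 MB")
    sizes = list(sizes_bytes)
    if not sizes:
        return {"count": 0, "average_mb": 0.0, "max_mb": 0.0, "over_limit": 0}
    limit_bytes = upload_limit_mb * BYTES_PER_MB
    return {
        "count": len(sizes),
        "average_mb": sum(sizes) / len(sizes) / BYTES_PER_MB,
        "max_mb": max(sizes) / BYTES_PER_MB,
        "over_limit": sum(1 for s in sizes if s > limit_bytes),
    }


if __name__ == "__main__":
    # Scans the current directory as a stand-in for an existing file share.
    print(summarize_file_sizes(file_sizes_under(".")))
```

A nonzero over_limit count suggests that the planned upload limit is too low for the content that will be migrated.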
Versions can greatly increase the size of the content, and there are several ways to limit them. You can use information management retention policies to delete previous versions after a given period of time. You can also limit the number of versions that are saved. In some cases you may not need versions at all.
The SharePoint Content Organizer copies the latest checked-in version. If the documents in the repository are actively being edited by users, you may also need to account for co-authoring, because co-authoring sessions create versions. Evaluate the solutions that the repository will replace to estimate the average number of versions that will be created for a document.
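To see how the two limits described above (a retention period and a cap on the number of versions) change the version count used in a size estimate, a simplified model can help. The sketch below is not SharePoint's policy engine or API. The function name, its parameters, and the rule that the current version is always kept are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def versions_to_keep(version_dates, now, max_versions=None, max_age_days=None):
    """Return the version timestamps that survive both limits, newest first.

    Assumption: the newest version is the current one and is always kept.
    max_versions counts the current version; max_age_days applies only to
    previous versions.
    """
    if max_versions is not None and max_versions < 1:
        raise ValueError("max_versions must be at least 1")
    ordered = sorted(version_dates, reverse=True)
    if not ordered:
        return []
    current, previous = ordered[0], ordered[1:]
    if max_age_days is not None:
        cutoff = now - timedelta(days=max_age_days)
        previous = [d for d in previous if d >= cutoff]
    if max_versions is not None:
        previous = previous[:max_versions - 1]
    return [current] + previous


if __name__ == "__main__":
    now = datetime(2011, 1, 1)
    # Hypothetical history: versions saved 400, 100, 10, and 1 days ago.
    history = [now - timedelta(days=d) for d in (400, 100, 10, 1)]
    kept = versions_to_keep(history, now, max_versions=3, max_age_days=365)
    print(f"{len(kept)} of {len(history)} versions kept")
```

Running a model like this over a sample of real version histories gives a more realistic average number of versions to feed into the content size estimate.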
The number of items is the total number of items in the list. To estimate it, evaluate your existing sources of content. What is going to be moved to the new system? How many users will be on the system? What is the purpose of the system? Answering these questions also helps you derive related numbers, such as items per container and items per metadata pivot. These are very important when you plan views and metadata navigation.
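Items per container and items per metadata pivot can be estimated from a sample of the existing content. The sketch below assumes each item is available as a dictionary with a folder path and some metadata columns. The field names, the sample data, and the planning limit are hypothetical and chosen only for the example.

```python
from collections import Counter


def items_per_group(items, key):
    """Count items for each distinct value of key (a folder or a column)."""
    return Counter(item[key] for item in items)


def groups_over_limit(counts, limit):
    """Return the group names whose item count exceeds a planning limit."""
    return sorted(name for name, n in counts.items() if n > limit)


if __name__ == "__main__":
    # Hypothetical sample of items exported from an existing system.
    sample = [
        {"folder": "/contracts/2009", "department": "Legal"},
        {"folder": "/contracts/2009", "department": "Legal"},
        {"folder": "/contracts/2010", "department": "Legal"},
        {"folder": "/invoices/2010", "department": "Finance"},
    ]
    per_folder = items_per_group(sample, "folder")          # per container
    per_department = items_per_group(sample, "department")  # per pivot
    print(per_folder.most_common(1))
    print(groups_over_limit(per_department, limit=2))
```

Groups that exceed the chosen limit are candidates for a deeper folder structure or an additional indexed column.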
Lists with large storage requirements force a fundamental decision about how to store the documents. By default, SharePoint Server 2010 stores all documents as BLOBs in the SQL Server database. Together with SQL Server 2008, it also provides a remote BLOB storage API, which allows documents to be stored outside the SQL Server database and results in a smaller database. You can choose remote BLOB storage to reduce storage costs. Microsoft testing has shown that remote BLOB storage can cause a 5% to 10% decrease in throughput, with no perceptible difference in latency for large files. Performance can vary with the specific remote BLOB storage provider that is used.
Remote BLOB storage reduces the size of the content database. However, it does not let you store more items in the content database. Performance is still affected by the number of items in the lists in the SQL Server database, because the BLOBs are removed but the list size does not change. The cost savings are most likely to be worthwhile in a couple of scenarios: data that is archived rather than used for collaboration, and storage of large BLOBs such as videos and images that are not updated often.
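The trade-off described above can be made concrete with a small model: moving BLOBs out of the database shrinks the database, leaves the item count untouched, and costs some throughput. The class, its fields, and the sample figures below are assumptions for illustration only. The throughput loss defaults to the upper end of the 5% to 10% range cited in this article, and real results depend on the provider.

```python
from dataclasses import dataclass


@dataclass
class ListFootprint:
    item_count: int        # number of list items (rows stay in SQL Server)
    db_blob_gb: float      # file content stored inside the database
    metadata_gb: float     # list rows and other non-BLOB database data
    throughput_rps: float  # requests per second under a test load

    @property
    def database_gb(self):
        return self.db_blob_gb + self.metadata_gb


def with_remote_blob_storage(fp, throughput_loss=0.10):
    """Model the footprint after all BLOBs move to remote BLOB storage."""
    if not 0 <= throughput_loss < 1:
        raise ValueError("throughput_loss must be in [0, 1)")
    return ListFootprint(
        item_count=fp.item_count,      # unchanged: list size is the same
        db_blob_gb=0.0,                # BLOBs no longer live in the database
        metadata_gb=fp.metadata_gb,
        throughput_rps=fp.throughput_rps * (1 - throughput_loss),
    )


if __name__ == "__main__":
    # Hypothetical list: 5 million items, 2 TB of files, 400 GB of other data.
    before = ListFootprint(5_000_000, 2000.0, 400.0, 100.0)
    after = with_remote_blob_storage(before)
    print(before.database_gb, after.database_gb, after.item_count)
```

The model makes the key point visible: database size drops sharply, but the item count that drives list and query performance does not.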
Using remote BLOB storage adds servers and technology to your farm, and it requires a remote BLOB storage provider. The provider can support storing BLOBs on less expensive storage outside the SQL Server database. SQL Server Enterprise is required to use the remote BLOB storage API.
There is a crossover point at which remote BLOB storage becomes cost effective, and it is likely to be in the range of terabytes of data. You do not have to use remote BLOB storage just because you have terabyte-sized content databases. Carefully consider your backup and restore process and your service level agreements. Remote BLOB storage makes disaster recovery more challenging because it requires two technologies to be kept in sync. The main concern is the amount of time it takes for a restore to complete.
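Because restore time is the main concern, it is worth checking an estimated content size against the restore window in the service level agreement. The sketch below is a rough calculation that assumes a single sustained restore throughput. The function names and the sample throughput and window are hypothetical, and real restores include overhead that this ignores.

```python
def restore_hours(content_gb, throughput_mb_per_s):
    """Hours needed to restore content_gb at a sustained throughput."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = content_gb * 1024 / throughput_mb_per_s
    return seconds / 3600


def meets_restore_window(content_gb, throughput_mb_per_s, window_hours):
    """True if the estimated restore fits inside the agreed window."""
    return restore_hours(content_gb, throughput_mb_per_s) <= window_hours


if __name__ == "__main__":
    # Hypothetical figures: 2 TB of content, 100 MB/s restore, 8 hour window.
    hours = restore_hours(2048, 100)
    ok = meets_restore_window(2048, 100, 8)
    print(f"Estimated restore time: {hours:.1f} h, within window: {ok}")
```

With remote BLOB storage, both the database and the external BLOB store have to be restored and brought back in sync, so each would need its own estimate. How the two times combine depends on the provider and the restore procedure.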