The SQL Server and .NET Hub: Executing Heavy Set-Based Operations Against VLDBs in SQL Server

Recently I had to design and execute some heavy set-based T-SQL operations against a large number of very large databases (VLDBs) within a SQL Server 2005 instance.

I won't enter the debate of Row-Based vs. Set-Based processing right now, as I plan to write an article on this in the near future, but I will rather discuss some concerns that need to be taken into account when performing such operations.

First of all, in terms of hardware resources you will need enough RAM and of course, more than one processor (remember to check the Processor affinity setting in your SQL Server instance).

Now, the most important aspect when executing such operations (set-based) against large databases is the log space.

As the execution of the operation continues, a significant amount of transaction log space is being reserved thus requiring more storage. The reason this is happening is that the operation is not committed until it completes its execution against the entire data set. So, if your target database's recovery model is set to "FULL" it means that any changes performed by the set-based operation will be logged up to the latest detail thus requiring more storage.

So, how can you ensure that the set-based operation will be completed without any problems such as full disk spaces, etc. and thus avoid the possibility of having the operation abnormally terminated?

Well, there are two approaches (if not more); the easy one and the complex one.

The complex approach is to break-up the set-based operation thus targeting smaller data sets within the database. Not so good right? I don't like it either! In my opinion, this should be the very last option as “segmenting” the target data set is not a trivial task, mainly in terms of data consistency.

And here’s the simple approach:

1. Find a date and time where there are not operations targeting the database (this is actually a downtime :)

2. Set the database's recovery model to SIMPLE as in the following example:

ALTER DATABASE [Database_Name] SET RECOVERY SIMPLE
GO

3. Execute the set-based operation.

4. Set the database's recovery model to FULL as in the following example:

ALTER DATABASE [Database_Name] SET RECOVERY FULL
GO

When your database uses the Full Recovery Model, SQL Server preserves the transaction log until you back it up. By doing this, it allows you to recover the database to a specific point of time based on the entries in the transaction log. That’s why when this recovery model is selected SQL Server logs everything in the database’s log file. Of course this requires more disk space and slightly affects performance but it enables you to fully recover your database in the case of a disaster.

When you use the Simple Recovery Model, SQL Server keeps only a minimal amount of information in the log file. Also the file is automatically truncated by SQL Server whenever the database reaches a transaction checkpoint. Even though this is not the best choice for disaster recovery purposes, it allows the processing on the database to be faster.

So, as I had to execute some really heavy set-based T-SQL operations I changed the recovery model to Simple in order for the processing to be faster and to ensure (up to the possible limit) that I wouldn’t experience any disk space issues due to the size increase of the transaction log files.

Hope this helps!