SQL Backup Pro

Whitepaper

Handling a full SQL Server transaction log

The transaction log for a SQL Server database is critical to the operation of the database and the ability to minimize data loss in the event of a disaster. As such, proper management of log backups and sizing of the transaction log is crucial to managing a database in SQL Server.

In severe cases, transaction log mismanagement, or sudden and rapid log growth, can cause the transaction log to expand until it eventually eats up all available space on its drive. At this point it can grow no more, you'll encounter Error 9002, the transaction log full error, and the database will become read-only.

This walkthrough describes the correct way to respond to this error, reviews the dangers of some of the courses of action suggested on various community sites and forums, and offers advice on how best to avoid the problem in the first place.

Diagnosing a Full Log

Despite the urgency of this problem, it's important to react calmly, and avoid the sort of "spontaneous" solutions covered in the forthcoming section, Mismanagement or What Not To Do. Obviously the pressing concern is to allow SQL Server to continue to write to the log, by making more space available. The first port of call is to establish whether the cause is a lack of log backups. The value of the log_reuse_wait_desc column in the sys.databases catalog view will show the current reason why log space cannot be reused, as shown in Listing 1:

DECLARE @DatabaseName sysname ; -- sysname accommodates the full 128-character name limit
SET @DatabaseName = 'VeryImportant' ;
SELECT	name ,
	recovery_model_desc ,
	log_reuse_wait_desc
FROM	sys.databases
WHERE	name = @DatabaseName ;

Listing 1: Examining the value of the log_reuse_wait_desc column

If the value of the log_reuse_wait_desc column is LOG_BACKUP then a lack of log backups is the likely cause of the issue. Note that more than one issue may be preventing log reuse; the sys.databases view shows only one of them, so it is possible to resolve one problem, query sys.databases again, and see a different log_reuse_wait reason. We don't have room in this whitepaper to cover all possible reasons, but they are listed in Books Online and described in some detail by Gail Shaw, in her article Why is my transaction log full?. One other value commonly seen for log_reuse_wait is ACTIVE_TRANSACTION, indicating that a long-running or uncommitted transaction is preventing truncation of the log, and so reuse of the space.

Assuming the value is LOG_BACKUP, a query against the backupset table in the msdb database, as shown in Listing 2, will confirm whether or not log backups are being taken for the database, and when the last one was taken.

USE msdb ;
SELECT 	backup_set_id ,
	backup_start_date ,
	backup_finish_date ,
	backup_size ,
	recovery_model ,
	[type]
FROM 	dbo.backupset
WHERE 	database_name = 'DatabaseName'
ORDER BY backup_finish_date DESC ;

Listing 2: Determine when the last log backup was taken

In the type column, D represents a full database backup, L a log backup, and I a differential database backup. If log backups aren't being taken, or are being taken very infrequently, then your best course of action is to take a log backup (assuming the database is operating in the FULL or BULK LOGGED recovery model). Hopefully, this will free up substantial space within the log and you can then implement an appropriate log backup scheme, and log file growth management strategy.
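Taking the log backup itself is a single statement; a minimal sketch, assuming a database named VeryImportant and a backup path of D:\SQLBackups\ (both placeholders for your own names):

```sql
-- Back up the transaction log; once the backup completes, SQL Server
-- can truncate the inactive VLFs and reuse that space in the log file
BACKUP LOG VeryImportant
TO DISK = 'D:\SQLBackups\VeryImportant_log.trn';
```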

If, for some reason, it is not possible to perform a log backup, due to a lack of disk space, or because the time it would take exceeds the acceptable time to resolve the problem, then it might, depending on the disaster recovery policy for the database in question, be acceptable to force a truncation of the log by temporarily switching the database to the SIMPLE recovery model, so that inactive Virtual Log Files (VLFs) in the transaction log can be truncated on CHECKPOINT. You can then switch the recovery model back to FULL and perform a new full database backup (or a differential backup, assuming a full backup was taken at some previous time) to restart the log chain for point-in-time recovery.

Of course, you'll still need to investigate the problem fully, in order to make sure that the space isn't simply eaten up again. One common cause of excessive logging and explosive log growth, for example, is performing bulk operations, such as index rebuilds, using the FULL recovery model. In such cases, it may be beneficial, assuming your Service Level Agreement allows for it, to switch to BULK LOGGED recovery at the start of the index rebuild, to minimize the amount of log space required to rebuild the index.

If that is not possible, you might consider adapting your index maintenance so that indexes are reorganized rather than rebuilt, wherever possible. This is not meant to imply that index reorganization is a panacea for logging issues; it isn't. However, an index rebuild is accomplished in a single transaction which, if the index is big, can be very long-running and prevent large swathes of log space from being reused while the operation proceeds. By contrast, index reorganization proceeds via a series of multiple, shorter transactions and so, with frequent log backups, it becomes easier to curtail explosive log growth during these operations.
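The steps above can be sketched as follows, again assuming the placeholder database name VeryImportant and a hypothetical backup path:

```sql
-- WARNING: switching to SIMPLE discards inactive log records and
-- breaks the log backup chain; only do this if your disaster
-- recovery policy allows it
ALTER DATABASE VeryImportant SET RECOVERY SIMPLE;
CHECKPOINT; -- inactive VLFs can now be truncated and their space reused

ALTER DATABASE VeryImportant SET RECOVERY FULL;

-- restart the log chain so point-in-time recovery is possible again
BACKUP DATABASE VeryImportant
TO DISK = 'D:\SQLBackups\VeryImportant_full.bak';
```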

Bear in mind that if the problem preventing space reuse is anything other than LOG_BACKUP, then the technique of temporarily switching to the SIMPLE recovery model won't work, since the log records will simply remain part of the active log, preventing truncation. If a lack of log backups isn't the problem, or taking a log backup doesn't solve the problem, then investigating the cause will require a little more time. If you can easily and quickly make extra space on the log drive, by shifting off other files, adding capacity to the current log drive, or adding an additional log file on a different disk array, then this will buy you the breathing space you need to get the database out of read-only mode and perform a log backup. Next, you need to find out what else might be preventing space reuse in the log (Listing 1) and take appropriate action.

Ultimately, having resolved any space reuse issue, you may still be left with a log file that is consuming the vast majority of the space on the drive. As a one-off measure, i.e. assuming steps will be taken to ensure proper management of log growth in the future (see the Proper Log Management section, shortly), it is acceptable to use DBCC SHRINKFILE to reclaim the space used by a bloated transaction log file.

You can either specify a target_size to which to shrink the log file, or you can specify 0 (zero) as the target size and shrink the log to its smallest possible size, and then immediately resize it to a sensible size using ALTER DATABASE. The latter is the recommended way, as it minimizes fragmentation of the log file. This fragmentation issue is the main reason why you should never schedule regular DBCC SHRINKFILE tasks as a means of controlling the size of the log; this is discussed in more detail in the next section.
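As a sketch of this one-off shrink-and-resize, assuming a database named VeryImportant whose log's logical file name is VeryImportant_log (both placeholders; query sys.database_files for the real logical name, and choose a target size appropriate to your workload):

```sql
USE VeryImportant;

-- shrink the log to its smallest possible size...
DBCC SHRINKFILE (VeryImportant_log, 0);

-- ...then immediately resize it in a single step,
-- to minimize fragmentation of the log file
ALTER DATABASE VeryImportant
MODIFY FILE ( NAME = VeryImportant_log, SIZE = 4096MB );
```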

Mismanagement or What Not To Do

Unfortunately, a quick search of the Internet for "Transaction Log Full" will return a number of forum threads, blog posts, and even articles published on seemingly reputable SQL Server sites, which recommend remedial action that is, frankly, dangerous. We'll cover a few of the more popular suggestions here.

Detach database, delete log file

The idea here is that you clear all users off the database, detach the database (or shut it down), delete the log file (or rename it) and then re-attach the database, causing a new log file to be created at whatever size is dictated by the model database. This is arguably the most appalling of all the terrible ways to handle a full transaction log. It can result in the database failing to start, leaving it in the RECOVERY_PENDING state.

Depending on whether or not the database had been cleanly shut down at the time of the log deletion, the database may not be able to perform the UNDO and REDO operations that are a normal part of the database recovery process, because the transaction log is missing, and so can't return the database to a consistent state. When the log file is missing, and the database requires the transaction log to perform crash recovery, the database will fail to start up properly and the only recourse will be to restore the database from the most recent backup available, which will most likely result in data loss.

Creating, detaching, re-attaching, and fixing a suspect database

Under specific circumstances, it may be possible to hack the existing database into a configuration that allows the transaction log to be rebuilt, although it may compromise the integrity of the data contained in the database. This type of operation is, at best, a last-ditch effort that may be used when there is absolutely no other way of recovering the database data, and it is not a recommended practice of the author, or anyone else involved in the authoring of this whitepaper. For an explanation of how to attempt hacking a database back into SQL Server where the transaction log file has been deleted, see Paul Randal's blog post, Creating, detaching, re-attaching, and fixing a suspect database.

Forcing log file truncation

In SQL Server 2000, BACKUP LOG WITH TRUNCATE_ONLY was a supported way of forcing SQL Server to truncate the transaction log, while the database was operating in the FULL or BULK LOGGED recovery model, without actually making a backup copy of the contents of the log; the records in the truncated VLFs are simply discarded. So, unlike with a normal log backup, you're destroying the LSN chain and will only be able to restore to a point in time covered by a previous log backup file. Also, even though the database is set to FULL (or BULK LOGGED) recovery, it will actually, from that point on, operate in auto-truncate mode, continuing to truncate inactive VLFs on checkpoint. In order to get the database operating in FULL recovery again, and restart the LSN chain, you'd need to perform a full (or differential) backup.

This command was often used without people realizing the implications it had for disaster recovery, and it was deprecated in SQL Server 2005 and removed from SQL Server 2008. Unfortunately, an even more insidious variation of this technique, which continues to be supported, has crept up to take its place, and that is BACKUP LOG TO DISK='NUL', where NUL is a "virtual file" that discards any data that is written to it. The really nasty twist to this technique is that, unlike with BACKUP LOG WITH TRUNCATE_ONLY, SQL Server is unaware that the log records have simply been discarded. As far as SQL Server is concerned, a log backup has been performed, the log records are safely stored in a backup file so the LSN chain is intact, and any inactive VLFs in the live log can safely be truncated. Any subsequent, conventional log backups will succeed but will be entirely useless from the point of view of disaster recovery since a log backup file is "missing" and so the database can only be restored to some point in time covered by the last standard log backup that was taken before BACKUP LOG TO DISK='NUL' was issued.

Do not use either of these techniques. The right way to "force" log truncation is to temporarily switch the database into the SIMPLE recovery model, as discussed earlier.

Scheduled shrinking of the transaction log

As discussed earlier, there are rare circumstances in which using DBCC SHRINKFILE to reclaim the space used by the transaction log file is an acceptable operation: where log growth occurred due to a lack of management, and where the log growth is now being actively managed.

However, the transaction log should never be shrunk using DBCC SHRINKFILE, or a database maintenance plan step to shrink the database, as part of normal, scheduled maintenance operations. The reason is that every time you shrink the log, it will need to grow again almost immediately to store log records for subsequent transactions, and every log growth has a cost. If auto-growth alone is being relied upon for transaction log growth (see the next section for a fuller discussion), excessive VLFs can accumulate in the log file, and this log fragmentation will impact the performance of any process that needs to read the log file and, if fragmentation gets really bad, possibly even the performance of data modifications. Also, since the transaction log cannot take advantage of instant file initialization, every log growth incurs the cost of zero-initializing the newly allocated space.

The best practice for the transaction log file continues to be to size it appropriately up front, so that it does not have to grow under normal operations, and then to monitor its usage periodically to determine when it needs to be grown manually. This allows you to choose an appropriate growth increment, and so control the number and size of VLFs that are added to the log file.
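To gauge how fragmented a log already is, you can count its VLFs. DBCC LOGINFO (undocumented, but long used for this purpose; on recent versions the sys.dm_db_log_info DMV is the documented alternative) returns one row per VLF. The database name below is a placeholder:

```sql
USE VeryImportant;

-- one row per Virtual Log File; a count running into the
-- thousands usually indicates a badly fragmented log
DBCC LOGINFO;
```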

Proper Log Management

In the absence of any unexpected operations or problems that have resulted in unusual log growth (uncommitted transactions, replication problems and so on), if the transaction log associated with a FULL recovery model database fills up, and is forced to grow, there are really only two causes:

  • the size of the log file was too small to support the volume of data changes that were occurring in the database.
  • the frequency of log backups was insufficient to allow rapid reuse of space within the log file.

The best thing to do, if you can't increase the frequency of the log backups, is to manually grow the log file to a size that prevents it from having to auto-grow under load, and then leave the log file at that size. Having a large transaction log file that has been grown manually, to minimize the number of VLFs, is not a bad thing, even if the log file has free space the majority of the time.
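A sketch of such a one-time manual grow, assuming the placeholder database name VeryImportant and log logical file name VeryImportant_log (the target size shown is purely illustrative):

```sql
-- grow the log in one deliberate operation, rather than via many
-- small auto-growths, so that only a few well-sized VLFs are added
ALTER DATABASE VeryImportant
MODIFY FILE ( NAME = VeryImportant_log, SIZE = 8192MB );
```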

When configuring the initial size of the transaction log for a database, it is important to take into account the size of the largest table in the database, and whether or not index maintenance operations will be performed. As a rule of thumb, the transaction log should be sized to 1.5 times the size of the largest index or table in the database, to allow for logging requirements to rebuild the index under FULL recovery.
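As an input to that rule of thumb, a sketch query to find the largest tables by reserved space (a common pattern joining sys.partitions to sys.allocation_units; it approximates size in 8 KB pages and ignores per-index breakdowns):

```sql
-- approximate reserved size, in MB, of the largest user tables;
-- the top figure, multiplied by roughly 1.5, gives a starting log size
SELECT TOP (5)
	OBJECT_NAME(p.object_id) AS table_name ,
	SUM(a.total_pages) * 8 / 1024 AS reserved_mb
FROM	sys.partitions AS p
JOIN	sys.allocation_units AS a
	ON a.container_id = p.partition_id
WHERE	OBJECTPROPERTY(p.object_id, 'IsUserTable') = 1
GROUP BY p.object_id
ORDER BY reserved_mb DESC;
```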

In addition to the initial sizing requirements, it is important to monitor and adjust the size of the transaction log periodically to fit the size of the database as it grows. There are a couple of problems with the auto-growth settings that a database will inherit from model, which is currently to grow in steps of 10% of the current transaction log size:

  • initially, when the log file is small, the incremental growth will be small, resulting in the creation of a large number of small VLFs in the log, causing the fragmentation issues discussed earlier.
  • when the log file is very large, the growth increments will be correspondingly large; since the transaction log has to be zeroed out during initialization, large growth events can take time, and if the log can't be grown fast enough this can result in 9002 (transaction log full) errors and even in the auto-growth timing out and being rolled back.

The auto-growth settings should be configured explicitly to a fixed size that allows the log file to grow quickly, if necessary, while also minimizing the number of VLFs being added to the log file for each growth event.
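Setting an explicit, fixed auto-growth increment is a one-line change; a sketch, using the same placeholder database and logical file names as before:

```sql
-- replace the inherited percentage-based growth with a fixed increment
ALTER DATABASE VeryImportant
MODIFY FILE ( NAME = VeryImportant_log, FILEGROWTH = 512MB );
```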

To minimize the chances of a timeout occurring during normal transaction log growth, you should measure how long it takes to grow the transaction log by a variety of set sizes while the database is operating under normal workload, and based on the current IO subsystem configuration. In cases where the necessary zero-initialization performance characteristics are not known for a database, I recommend, as a general rule, a fixed auto-growth size of 512 MB.

Ultimately, though, remember that auto-growth should be configured as a security net only, allowing the system to automatically grow the log file when you are unable to respond manually. Auto-growth is not a replacement for appropriate monitoring and management of the transaction log file size.