Saturday, January 29, 2011

What happens during a live SQL Server backup?

Some of my coworkers were surprised when I told them that I can back up an SQL Server database while it's still running and wondered how that's possible. I know that SQL Server is capable of backing up a database while it is still online but I'm not sure how to explain why it's possible. My question is what effect does this have on the database?

If data is modified (by an insert, update, or delete) while the backup is running, will the backup contain those changes or will it be added to the database afterwards?

I'm assuming that the log file plays an important role here but I'm not quite sure how.

edit: Just as a note, my case involves backing up the databases using SQL Server Agent and the effects of database modifications during this process.

  • You can't just copy it over since there can be alterations to the database mid-copy as you alluded to in the question.

    It has to be done with agents that are aware of the database functionality and then take a "snapshot" via OS functions or can use a utility to dump the database in a safe state (like mysqldump, if using mysql).

    Otherwise you get a backup that can be corrupted and you won't know it until you restore it. I think Joel and Jeff recently talked about it a little on a recent StackOverflow podcast.

    And you're right in that the log file is important. If the journal/log file is out of sync with the actual data, restoring the files will result in corruption.

    It boils down to a backup taken using a safe state of the database, either through a database-aware agent or snapshot application or application that is aware of how to properly hook the database into dropping data without interfering with updates during the data dump then backing up the resulting file.

    Dynamo : This is a more generalized answer but I forgot to include that I was working through SQL Server Agent at first. It still provided some good details as to what might be going on during this process despite the method so it helped anyways! Thanks.
  • There are so many ways to do this (generally speaking, no idea how MSSQL normally does it) like simply dumping the database to file while appending any changes to a log file which is committed after the dump is completed - to utilizing file system specific snapshot features like VSS on Windows.

    David Spillett : Also databases that are MVCC based from the ground up (like PostGres) can make use of that set of behaviours to maintain a snapshot that the backup copies from while still allowing updates to the datafiles by transactions that start during the backup run. As you say there are several methods, the one used depends upon the core design of the data storage engine and transaction processing features.
  • Full backup contains both the data and log. For data, it simply copies each page of the database into the backup, as is at the moment it reads the page. It then appends into the backup media all the 'relevant' log. This includes, at the very least, all the log between the LSN at the start of the backup operation and the LSN at the end of the backup operation. In reality there is more log usually, as it has to include all active transactions at the start of backup and log needed by replication. See Debunking a couple of myths around full database backups.

    When the database is restored, all the data pages are copied out into the database files, then all the log pages are copied out into the log file(s). The database is inconsistent at this moment, since it contains data page images that may be out of sync with one another. But now a normal recovery is run. Since the log contains all the log during the backup, at the end of the recovery the database is consistent.

    Dynamo : Great post and the link helped provide a good example. Thanks.

0 comments:

Post a Comment