What impact do SQL Server data and log file sizes have on backup size?

Many SQL Server DBAs seem to be very keen to shrink their database files at every opportunity. There seems to be a worry that any spare space inside a data file or transaction log is a problem. Is it?

One of the common misconceptions is that a large data file translates to a large backup file. So let's try to lay that myth to rest. We’ll run through a couple of scenarios and compare the backup sizes of different-sized databases.

These examples were all done on SQL Server 2012 Developer Edition, but will apply to any other version as well.

Also, for all these examples we have no transactions occurring during the backup. If the data in the data files changes during a backup, SQL Server will include the relevant portion of the transaction log in the backup so that you get a consistent backup.
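As an aside, if you want to see how much of the transaction log a particular full backup had to capture, the backup header records the first and last LSN it covers. Something along these lines (run against any of the backup files created later in this post) will show it:

-- FirstLSN and LastLSN in the output show the range of transaction log
-- that was included to keep the backup consistent
restore headeronly from disk = N'c:\backup\dbsmall_normal.bak';
go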

First we’ll create 2 databases of different sizes, and populate them with the same data:

use master;
go

-- SIZE, MAXSIZE and FILEGROWTH without a unit suffix default to MB,
-- so this is a 10MB data file that can grow to 50MB
CREATE DATABASE dbsmall
ON 
( NAME = small_dat,
    FILENAME = 'C:\DATA\smalldat.mdf',
    SIZE = 10,
    MAXSIZE = 50,
    FILEGROWTH = 5 )
LOG ON
( NAME = small_log,
    FILENAME = 'C:\DATA\smalllog.ldf',
    SIZE = 5MB,
    MAXSIZE = 25MB,
    FILEGROWTH = 5MB ) ;
GO


-- the same definition, but with a 100MB data file
CREATE DATABASE dbbig
ON 
( NAME = big_dat,
    FILENAME = 'C:\DATA\bigdat.mdf',
    SIZE = 100,
    MAXSIZE = 150,
    FILEGROWTH = 5 )
LOG ON
( NAME = big_log,
    FILENAME = 'C:\DATA\biglog.ldf',
    SIZE = 5MB,
    MAXSIZE = 25MB,
    FILEGROWTH = 5MB ) ;
GO

use dbsmall;
go
create table tbl_junk
(junk varchar(max)
);
go
-- 50 rows of a 4,000 character string, roughly 200KB of data in total
insert into tbl_junk (junk) select REPLICATE('Junk',1000);
go 50

use dbbig;
go
create table tbl_junk
(junk varchar(max)
);
go
insert into tbl_junk (junk) select REPLICATE('Junk',1000);
go 50

So dbsmall has a primary data file of 10MB and dbbig has a primary data file of 100MB; both contain a single table of roughly 200KB.
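If you want to confirm that before backing anything up, sp_spaceused gives a quick view of the database size and the space actually used by the table (just a sanity check, nothing more):

use dbsmall;
go
exec sp_spaceused;            -- database size and unallocated space
exec sp_spaceused 'tbl_junk'; -- space used by the table itself
go
use dbbig;
go
exec sp_spaceused;
exec sp_spaceused 'tbl_junk';
go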

So now let’s back them up to see if there’s any difference. And just to make sure, we’ll try it with compression as well:

use master;
go

backup database dbsmall to disk=N'c:\backup\dbsmall_normal.bak';
go
backup database dbbig to disk=N'c:\backup\dbbig_normal.bak';
go
backup database dbsmall to disk=N'c:\backup\dbsmall_compressed.bak' with compression;
go
backup database dbbig to disk=N'c:\backup\dbbig_compressed.bak' with compression;
go

-- xp_cmdshell needs to be enabled via sp_configure for this to work
exec xp_cmdshell 'dir c:\backup\';

And the results from xp_cmdshell are:

01/16/2013  04:49 PM           371,200 dbbig_compressed.bak
01/16/2013  04:49 PM         3,232,256 dbbig_normal.bak
01/16/2013  04:49 PM           371,200 dbsmall_compressed.bak
01/16/2013  04:49 PM         3,232,256 dbsmall_normal.bak
               4 File(s)      7,206,912 bytes
               2 Dir(s)  16,019,734,528 bytes free

So that shows that all the backups are the same size, which is to be expected: it’s well documented that SQL Server only backs up the pages in use within the data files, and as both databases contain exactly the same amount of data, the backups are the same size.
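As a side note, if xp_cmdshell isn’t an option on your instance, the same sizes are recorded in msdb, so a query along these lines (compressed_backup_size is there from SQL Server 2008 onwards) will list them without touching the file system:

select database_name,
    backup_size / 1024 / 1024 as backup_size_mb,
    compressed_backup_size / 1024 / 1024 as compressed_size_mb,
    backup_finish_date
from msdb.dbo.backupset
where database_name in ('dbsmall', 'dbbig')
order by backup_finish_date;
go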

So that’s the data files checked. But what about transaction logs, I hear you ask? Well, let’s have a look. In these examples we’re only looking at full database backups; transaction log backups are a different thing, and I’ll be looking at those in another post.
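Before we start resizing anything, it’s handy to be able to see how big each log actually is and how much of it is in use; DBCC SQLPERF(LOGSPACE) gives a quick overview for every database on the instance:

-- reports log size in MB and the percentage of the log in use per database
dbcc sqlperf(logspace);
go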

So let’s increase the size of the transaction log for dbbig and see what that does for the backup size:

alter database dbbig modify file (Name=big_log, size=200MB);
go

backup database dbbig to disk=N'c:\backup\dbbig_log_normal.bak';
go

backup database dbbig to disk=N'c:\backup\dbbig_log_compressed.bak' with compression;
go

exec xp_cmdshell 'dir c:\backup';

And this time xp_cmdshell tells us:

01/16/2013  05:10 PM           375,296 dbbig_log_compressed.bak
01/16/2013  05:10 PM         3,240,448 dbbig_log_normal.bak
01/16/2013  04:49 PM           371,200 dbbig_compressed.bak
01/16/2013  04:49 PM         3,232,256 dbbig_normal.bak

Hang on, that’s 8KB more. Are we missing something here? No, not really. The 8KB is an artefact of the ALTER DATABASE statement; we get that whatever size we increase the transaction log to:

-- reset dbbig and try again with a 100MB log
drop database dbbig;
go

restore database dbbig from disk=N'c:\backup\dbbig_normal.bak' with recovery;
go

alter database dbbig modify file (Name=big_log, size=100MB);
go

backup database dbbig to disk=N'c:\backup\dbbig_1log_normal.bak';
go

backup database dbbig to disk=N'c:\backup\dbbig_1log_compressed.bak' with compression;
go

-- reset again, this time with a 500MB log
drop database dbbig;
go

restore database dbbig from disk=N'c:\backup\dbbig_normal.bak' with recovery;
go

alter database dbbig modify file (Name=big_log, size=500MB);
go

backup database dbbig to disk=N'c:\backup\dbbig_5log_normal.bak';
go

backup database dbbig to disk=N'c:\backup\dbbig_5log_compressed.bak' with compression;
go

-- and once more with a 1000MB log
drop database dbbig;
go

restore database dbbig from disk=N'c:\backup\dbbig_normal.bak' with recovery;
go

alter database dbbig modify file (Name=big_log, size=1000MB);
go

backup database dbbig to disk=N'c:\backup\dbbig_10log_normal.bak';
go

backup database dbbig to disk=N'c:\backup\dbbig_10log_compressed.bak' with compression;
go

exec xp_cmdshell 'dir c:\backup';
go

Which gives:

01/16/2013  05:30 PM         3,240,448 dbbig_10log_normal.bak
01/16/2013  05:29 PM         3,240,448 dbbig_1log_normal.bak
01/16/2013  05:29 PM         3,240,448 dbbig_5log_normal.bak

So we take that small 8KB hit whatever size the transaction log is. Now the eagle-eyed will have spotted that those are empty transaction logs, so what happens when we have a full transaction log?

We’ll reset dbbig, increase the transaction log’s growth limits and then run something to fill it up. Then we’ll drop the new table so the data file goes back to holding the same data as before:

use master;
go
drop database dbbig;
go

restore database dbbig from disk=N'c:\backup\dbbig_normal.bak' with recovery;
go

alter database dbbig modify file ( name = N'big_log', maxsize = 150MB , FILEGROWTH = 25MB );
go

create table dbbig.dbo.tbl_junk2
(junk varchar(max)
);
go

insert into dbbig.dbo.tbl_junk2 (junk) select REPLICATE('Junk',200) from sys.all_columns;
go 4
drop table dbbig.dbo.tbl_junk2;
go

backup database dbbig to disk=N'c:\backup\dbbig_full_log.bak';
go

backup database dbbig to disk=N'c:\backup\dbbig_compressed_full_log.bak' with compression;
go

exec xp_cmdshell 'dir c:\backup';

Which gives us:

01/16/2013  06:14 PM           375,808 dbbig_compressed_full_log.bak
01/16/2013  06:14 PM        29,585,920 dbbig_full_log.bak

Hang on a sec, that’s a much larger backup there. What’s going on?

Well, this is an example of the Ghost Cleanup process in action. The short version is that SQL Server doesn’t instantly deallocate the pages in a data file when you drop/truncate a table. It just marks the pages as deletable, and the Ghost Cleanup process actually clears them the next time it runs. (There’s lots of good detail from Paul Randal (who wrote the code) in this post: Inside the Storage Engine: Ghost cleanup in depth.)

So if we rerun the above SQL with this appended:

-- give the cleanup process plenty of time to run before backing up again
waitfor delay '00:02:00';
go

backup database dbbig to disk=N'c:\backup\dbbig_full_logaq.bak';
go

exec xp_cmdshell 'dir c:\backup';

Then we get:

01/16/2013  06:14 PM           375,808 dbbig_compressed_full_log.bak
01/16/2013  06:14 PM        29,585,920 dbbig_full_log.bak
01/16/2013  06:16 PM         3,240,448 dbbig_full_logaq.bak

Which is the same size as we’ve been seeing in all the other examples. BTW, I give a very generous 2 minutes for Ghost Cleanup to kick in just to make sure; it’s usually much quicker than that, though the actual time will depend on how busy your system is.
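If you’d rather not guess at a delay, another option is to keep an eye on the allocated page count in the data file and only take the backup once it has dropped back down. A rough sketch of that check (the allocated_extent_page_count column of sys.dm_db_file_space_usage is there from SQL Server 2012):

use dbbig;
go
-- once the cleanup has run, allocated_extent_page_count drops back towards
-- its pre-insert level, and the backup size drops with it
select file_id,
    allocated_extent_page_count,
    unallocated_extent_page_count
from sys.dm_db_file_space_usage;
go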

Stuart Moore

Nottingham based IT professional with over 15 years in the industry. Specialising in SQL Server, Infrastructure design, Disaster Recovery and Service Continuity management. Now happily probing the options the cloud provides

When not in front of a computer, he’s most likely to be found on the saddle of his bike.
