Friday, April 14, 2017

SQL Server Temporal Tables

One of the newest and most useful features of SQL Server is Temporal Tables, which are available in SQL Server 2016 and SQL Azure.  In a nutshell, temporal tables give you a simple way to capture all of the changes that are made to rows in a table.  Of course you could do this in prior versions of SQL Server by defining your own history table and trigger, but now this functionality is built directly into SQL Server.

Why is this useful?  Have you ever had to go back and audit when a piece of data changed in a table?  Maybe you have needed to see the history of all of the changes to a certain record in the table?  Or needed to recreate the table as it was at a certain point of time.  If any of these apply, then temporal tables can be a big help.

Defining a Temporal Table
The syntax for defining a temporal table is straightforward and shown below.


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
 CREATE TABLE Contacts
 (
     ContactID        INT IDENTITY(1,1)      NOT NULL,
     FirstName        VARCHAR(30)            NOT NULL,
     LastName         VARCHAR(30)            NOT NULL,
     CompanyName      VARCHAR(30)            NULL,
     PhoneNumber      VARCHAR(20)            NULL,
     Email            VARCHAR(50)            NULL,
     ValidFrom        DATETIME2(3) GENERATED ALWAYS AS ROW START,
     ValidTo          DATETIME2(3) GENERATED ALWAYS AS ROW END,
     PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo), 
     CONSTRAINT PK_Contacts PRIMARY KEY (ContactId)
 )
 WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ContactsHistory));

Lines 2 through 8 contain our normal column definitions for the table, in this case, a simple table to manage contact information.

We then need to define two columns that represent when the column is start and end times that the row is considered active, and this is done on lines 9 and 10 with the ValidFrom and ValidTo columns.  You can use any names for these columns you want, its just a good idea to make sure the names convey what these columns represent, the time the row is active.  These columns will have a DATETIME data type, and in this case I am using a DATETIME2(3) data type so we will store these values down to millisecond precision.  Finally, you need to include the text GENERATED ALWAYS AS ROW START and GENERATED ALWAYS AS ROW END for your respective start and stop columns.  This signals to SQL Server to automatically populate these columns with the UTC value of the system time when a row is inserted, updated or deleted.

On line 11, we have some syntax that indicates that our ValidFrom and ValidTo columns represent a period in time.  Finally, at the end of our table definition on line 14, we tell SQL Server that we need to have System Versioning of this table on and give SQL Server the name of the history table that we want to use.

If you don't want users to see the ValidFrom and ValidTo columns, you can hide these columns like so:

1
2
3
4
5
ALTER TABLE Contacts   
   ALTER COLUMN ValidFrom ADD HIDDEN;   

ALTER TABLE Contacts   
   ALTER COLUMN ValidTo ADD HIDDEN; 

If you change your mind and want to have them show up again, then simply do this.

1
2
3
4
5
ALTER TABLE Contacts   
   ALTER COLUMN ValidFrom DROP HIDDEN;   

ALTER TABLE Contacts   
   ALTER COLUMN ValidTo DROP HIDDEN;  
That just drops the hidden attribute, not the column itself.

When you execute this statement, SQL Server will create both this table and the specified history table.  Where things get interesting is when you start inserting and updating data in your table.

Working With Temporal Tables
You insert, update and delete data in your temporal table just like you would any other table.  The only thing you need to know is that you do not specify values for the ValidFrom or ValidTo columns.  SQL Server will take care of populating these columns for you.

So what happens when we perform DML operations against out temporal table?


Operation Description
INSERT The row will be inserted into the primary table (in this case contacts).

The ValidFrom column is populated with the current system time in UTC

The ValidTo field is populated with '9999-12-31 23:59:59.999' (the system max time)

No entry is made in the history table.

UPDATE The existing row will be moved from the primary table to the history table.  When this is done, its ValidTo time is populated with the current system time in UTC.

The existing row is then replaced by the new row in the primary table.  The new row will have a ValidFrom of the current system time in UTC and the ValidTo field will be the system max time.

DELETE The existing row will be moved from the primary table to the history table with its ValidTo time populated as the current system time in UTC

No row will be present in the primary table, since the row has been deleted.


Querying Data
SQL Server also gives us some new constructs to query data out of our temporal tables and see what has changed in time or what the state of the data was at any point in time.  What is nice about these constructs is that we don't have to query the primary table and the history table individually and then union them together in order to get a complete view of our data over time.  SQL Server does this for us behind the scenes, leaving us with a much simpler query syntax.  So lets take a look.

First of all, we can query the table normally which will show us all of the rows that are currently active.

1
2
3
-- Plain old SQL
SELECT * 
    FROM Contacts;

No surprises here in terms of the data that comes back, just what rows are currently in the primary table


We can see these rows have been inserted at different times over a few days, and we see that all of the ValidTo dates are set to the system max time.  But more has been going on with this table, so lets check that out.

Show Complete History of a Table
If you want to look at every instance of every row that has been in a table, you can use the FOR SYSTEM TIME ALL clause immediately after your table name in your FROM clause like this:

1
2
3
4
5
-- Gets all records
SELECT *
    FROM Contacts
        FOR SYSTEM_TIME    
            ALL;

Now lets take a look at these results.

So now we see that there have indeed been some changes to the data in our table over time.  We see the 5 active rows that we saw before, but we also see some additional rows.  This view shows that we had two contacts that were deleted, contact number 4 for Garfield Arbuckle and contact number 6, Odie Arbuckle.  In addition, we see that the record for contact number 1 Charlie Brown has changed, and specifically it looks like his phone number changed in the row.  And by looking at the ValidTo date, we can tell when each of these events occurred.


What Did the Data Look Like at a Point In Time?
We can use the FOR SYSTEM TIME AS OF clause to view the data in a table as it appeared in a point in time as follows.

1
2
3
4
SELECT *
    FROM Contacts
        FOR SYSTEM_TIME    
            AS OF '2017-04-12 14:00:00.0000000';

This query will view the table as it was as of April 12, 2017 at 14:00 UTC.  Remember when specifying this time that you need to use UTC time as that is what is contained in the ValidFrom and ValidTo tables and not your system local time.

It should also be pointed out that we could very well attach a WHERE clause to this query to get just a subset of rows or even a single row.  In our case though, here are the results that we get.

This shows us that at this time, records for Sally Brown, Franklin Armstrong and Odie Arbuckle had not even been added yet and that the record for Garfield Arbuckle was still active.  This can be extremely useful when you are trying to recreate a report that was run at a certain time, because you can see exactly what the data was in the table at that instant in time.

Querying All Records Active for a Period of Time
Finally, we have some new syntax which will show us any record that was active during a given time period.

  • FROM <start date> TO <end date> - Shows all rows that were active during this time period, but excludes rows that became active on the boundary
  • FOR SYSTEM TIME BETWEEN <start date> AND <<end date> - Shows all rows active in this time period, including rows that became active on the upper boundary of the time period
  • FOR SYSTEM TIME CONTAINED IN (<start date>, <end data>) - Gets all the rows that opened and closed during the specified time period

Summary
I've found temporal tables to be one of the best additions to SQL Server.  They are super easy to use and being able to go back and look at all of the changes are to your table is extremely helpful.  They are so easy to use that I find almost no reason not to use them.  So next time you need to create a table, think about making the table a temporal table.  The first time you need to look up what has changed about a record, you will be glad that you did.


Further Reading
Official Microsoft Documentation
https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables


1 comment: