Friday, March 20, 2015

Populating Created Date and Last Modified Date Columns in SQL Server Using Triggers

I wrote a blog post a while back on updating what I call "administrative columns" like created date and last modified date in Oracle.  However, I don't just work in environments with Oracle as the backend RDBMS, but also work with SQL Server quite a bit.  Recently, I needed to write the same functionality for SQL Server, and as it turns out, you can achieve the same result in SQL Server, but the approach is quite a bit different.  So I think it is good to blog about how to do this, so if anyone else has this need, they can see my approach and either use it straight away or adapt it to their needs.

Lets briefly summarize the goals before we get started so it is clear what we are trying to accomplish.

  • The CreatedDate column should reflect the date and time when the row was first inserted into the table.
  • The CreatedDate column should never be able to change.  That is, we want to prevent anyone from updating the value in this column, either accidentally or for some malicious reason.
  • The LastModifiedDate column reflects the last time the row was changed.  For a new row, the LastModifiedDate is the CreatedDate.  Once a row is updated, the LastModifiedDate value is the time that update occured.
  • We want to populate both of these columns automatically from the current database time.  We don't want to rely on user supplied values for these columns.  This is for two reasons:
    • It is easier if the user doesn't have to worry about supplying values for these administrative type columns so they can just focus on the business data
    • We want to make sure the values are correct.  We don't want to use a user supplied date/time that is in the past or future or a value that might be in a different time zone.  Using the database value, we know the value is correct and consistent.


So first of all, lets define our target table.  For this example, I am going to use a table that captures restaurant information.  This table is simplified for our purposes, but will work just fine to demonstrate what we need to.  So here is the table.

CREATE TABLE Restaurants
(
    RestaurantId       INT           NOT NULL,
    RestaurantName     VARCHAR(40)   NOT NULL,
    Cuisine            VARCHAR(25)   NOT NULL,
    City               VARCHAR(30)   NOT NULL,
    State              VARCHAR(2)    NOT NULL,
    CreatedDate        DATETIME      NULL,
    LastModifiedDate   DATETIME      NULL,
    CONSTRAINT PK_Restaurants_RestaurantsId
        PRIMARY KEY (RestaurantId),         
);

Note that I am defining both the CreatedDate and LastModifiedDate columns as nullable columns.  This is important because in SQL Server, a trigger always runs after the DML statement that it fires for.  Unlike in Oracle where you can have a BEFORE trigger to intercept the statement and modify or replace values, in SQL Server, the trigger always fires after the DML statement.  So even though these columns accept NULL values, we'll take care of that in the triggers that follow.

You could define these columns as NOT NULL, but if you do so, then you need to provide a default value for the column (say GetUtcDate() ).  Again, the reason for this is that in SQL Server, the trigger fires after the DML statement has completed.  If the column is NOT NULL and you don't have any value (either one in the statement or a default value), then the statement will fail due to the NOT NULL constraints.  So if you choose to go the NOT NULL route, then provide a default value.

OK, lets take a look at the triggers that can accomplish this.  We'll need two triggers, and the first one to look at is the trigger that will fire after an INSERT statement.

CREATE TRIGGER TRG_Restaurants_After_Insert
    ON dbo.Restaurants
    FOR INSERT
    AS
    BEGIN
        SET NOCOUNT ON
        
        UPDATE dbo.Restaurants
            SET CreatedDate = GETUTCDATE(),
                LastModifiedDate = GETUTCDATE()
        FROM dbo.Restaurants INNER JOIN inserted 
            ON dbo.Restaurants.RestaurantID = inserted.RestaurantID;
    END
GO

In SQL Server, the pseudo table INSERTED lists all of the 'after' values of the rows that were affected by the DML statement, so in this case, like its name implies, the inserted rows.  So what we need to do in order to update these columns is write an UPDATE statement like above where we join our target table to the INSERTED pseudo table in order to get the correct set of rows, and then set the values of CreatedDate and LastModifiedDate to the correct time, which in this case we are using UTC time.  So basically what this does is get the set of affected rows, and go back to the table and update the times to the times that we want.  In this way, even if someone supplies a value for CreatedDate or LastModifiedDate, we don't care.  We'll make sure the correct date/time value gets put into the column via this trigger.

Now for the trigger that will fire after an UPDATE statement:

CREATE TRIGGER TRG_Restaurants_After_Update
    ON dbo.Restaurants
    FOR UPDATE
    AS
    BEGIN
        SET NOCOUNT ON
        
        IF ( (SELECT trigger_nestlevel() ) > 1 )
            RETURN
            
        UPDATE dbo.Restaurants
            SET CreatedDate = deleted.CreatedDate,
                LastModifiedDate = GETUTCDATE()
        FROM dbo.Restaurants 
        INNER JOIN deleted
            ON dbo.Restaurants.RestaurantId = deleted.RestaurantID;
        
    END
GO

The first thing you might notice about this trigger is the IF block wrapped around the SELECT trigger_nestlevel() function.  This is necessary to prevent this trigger firing as a nested trigger.  For example, during an INSERT statement, the TRG_Restaurants_After_Insert trigger defined above will fire, which in turn does an UPDATE statement against the table.  Without this code block, the TRG_Restaurants_After_Update statement would also complete on the update statement fired by the first trigger, and that is not what we want.  We only want this trigger to complete for user/application executed UPDATE statements against the table.

Next, we again need to update the CreatedDate and LastModifiedDate columns in our table.  This time, we'll use the DELETED pseudo table to join to, which includes the old values of all of the rows affected by the UPDATE statement.  Joining to the DELETED pseudo table gives us the proper set of rows, and we can pull the existing (old) CreatedDate value out of these rows and use it in our trigger's UPDATE statement as shown above.  What this does is keeps someone from updating the value of CreatedDate.  As we'll show below, even if someone tries to update the value of CreatedDate, it doesn't matter, because our trigger will insure we retain the existing value in our table.

Testing It All Out

Lets test everything out and make sure it is working.  At this point, I have created the table above and both of my triggers.  So lets insert a few rows into the table.

INSERT INTO Restaurants (RestaurantId, RestaurantName, Cuisine, City, State)
    VALUES (1, 'Crepes de Lux', 'French', 'Iowa City', 'IA');
 
INSERT INTO Restaurants (RestaurantId, RestaurantName, Cuisine, City, State)
    VALUES (2, 'Ginger Rootz', 'Chinese', 'Appleton', 'WI');
  
INSERT INTO Restaurants (RestaurantId, RestaurantName, Cuisine, City, State, 
        CreatedDate, LastModifiedDate)
    VALUES (3, 'House of Pho', 'Asian', 'Schaumburg', 'IL', 
        '01/01/2020', '01/01/2010');    

We see that in the third row, I am supplying a value for CreatedDate and LastModifiedDate.  However, as I am executing this statement, it is most certainly not January 1st, 2020 nor January 1st, 2010.  This is exactly the type of situation we are designing for, to make sure we always have correct values in these columns and not allow them to be overridden by user supplied values like this that may potentially be incorrect.

So lets see what is on our table at this point.



This is exactly what we want.  Even though we did not supply values in the first two INSERT statements, our trigger has automatically populated these columns for use with the correct value.  And for the third statement when incorrect values were supplied, these values were not used but again, we have the correct values in these two columns.

Lets run an UPDATE statement now.  And to start with, we'll just perform the most basic UPDATE statement, changing the Cuisine for House of Pho from Asian to Vietnamese.

UPDATE Restaurants 
    SET Cuisine = 'Vietnamese'
    WHERE RestaurantId = 3;

And now we will check our table data once again.


We see here that for "House of Pho", the LastModifiedDate has been automatically updated by our trigger to the appropriate time for when the row was changed.  The CreatedDate record was left unchanged, as were the records for the other rows in the table.  So this is exactly what we were looking for.

Finally, lets test the scenario where someone tries to run an UPDATE statement that will modify the CreatedDate column, which as we have discussed, we do not want to allow.  So to test this out, we'll run the following statement:

UPDATE Restaurants 
    SET Cuisine = 'Pan-Asian',
        CreatedDate = '01/01/2010'
    WHERE RestaurantId = 2;

So we can see here, we are trying to backdate the CreatedDate of the row to a different value.  However, with our trigger, we don't allow this to happen and we just retain the existing value.  Here is the data:


We see here that row 2 for "Ginger Rootz" has a new value for LastModifiedDate (and this value is correct).  But even though a value for CreatedDate was supplied in the UPDATE statement, the supplied value was ignored and the original CreatedDate value retained, which is the correct behavior.

Why All of This Matters

What we are essentially trying to do is establish an audit trail for our data.  We want to know when a row came into existence and when it was last changed.  For this audit trail to be effective, we need to make sure that these columns always contain the correct values.  We don't want to burden a developer or someone writing a SQL statement to worry about how to populate these columns, because we can do that automatically.  At the same time though, we want to prevent these columns from being changed, whether maliciously or accidentally so we are confident the values in these columns are always correct.  Using these two triggers, we accomplish both of those goals.  

When researching this problem, I did see a number of other blog posts and postings on StackOverflow that said just to use a DEFAULT value to populate the CreatedDate column.  This solves half of the problem, in that a developer no longer has to worry about providing a value.  But it does not address the problem of preventing the column to be modified.  And I feel this is an equally important aspect to solve.

Lately, I've been involved in researching a number of security related items.  I am not able to give away many details beyond saying I was involved in researching some issues, but what I can say is that two of the questions that always come up are "when did this row come into existence?" and "when was it last modified?".  Having audit columns that you know are correct and you know cannot be tampered with is critical when researching these sorts of issues.  

I have to admit, I myself am sometimes lazy about including columns like CreatedDate and LastModifiedDate in my tables.  I need to be better about that, and about making sure these columns are always properly updated.  With these triggers, now I don't have an excuse any more.

4 comments:

  1. This is awesome! thank you so much.

    ReplyDelete
  2. I feel there is a need to make up for more SQL servers and look up to the best services anyhow.One always needs to care about and stress on more of its uses.

    SQL Server Load Soap API

    ReplyDelete