Monday, March 30, 2015

Archiving Updated and Deleted Rows on a Table in SQL Server

I previously wrote about how to archive rows in Oracle using a trigger.  Today, I want to discuss how to do this in SQL Server.  Again, the syntax varies between the two databases, but you can still accomplish the same functionality.

So to recap on what we want to accomplish:

  • Any time a row in a our table is UPDATED or DELETED, we want to store a copy of the old row in the archive table
  • We also want to store with the archived record date and times of when that version of the row was the active version of the row in the table
  • We would like to store some information about what user ran the DML operation and the client machine it came from
  • We want all of this to happen transparently and automatically.  That is, we don't want a user or application to have to do anything except run their SQL statement, and the archival of the old version of the row happens automatically.
The last requirement is what drives this solution to use a trigger.  While it often causes a lot of trouble to start putting business logic in a trigger, a requirement like this is the perfect job for a trigger.  Having a trigger archive the data makes sure that it always is done, regardless if the DML comes from an application, stored procedure or a user running an ad-hoc SQL statement.  And a trigger can perform this archiving transparently in the background, the the application or user running the DML statement does not even have to think about it happening.

A Sample Table

So lets take a look at a sample table that we will archive data for:

CREATE TABLE dbo.Contacts
(
    ContactId          INT           NOT NULL,
    FirstName          VARCHAR(20)   NOT NULL,
    LastName           VARCHAR(20)   NOT NULL,
    Email              VARCHAR(40)   NULL,
    Telephone          VARCHAR(20)   NULL,
    CreatedDate        DATETIME      NULL,
    LastModifiedDate   DATETIME      NULL,
    CONSTRAINT pk_contacts PRIMARY KEY (ContactId)
);


And here will be our archive table

CREATE TABLE dbo.ContactsArchive
(
    ArchiveId          INT IDENTITY  NOT NULL,
    ContactId          INT           NOT NULL,
    FirstName          VARCHAR(20)   NOT NULL,
    LastName           VARCHAR(20)   NOT NULL,
    Email              VARCHAR(40)   NULL,
    Telephone          VARCHAR(20)   NULL,
    OperationType      VARCHAR(1)    NOT NULL,
    VersionStartTime   DATETIME      NOT NULL,
    VersionEndTime     DATETIME      NOT NULL,
    ChangeUser         VARCHAR(30)   NULL,
    ChangeIp           VARCHAR(20)   NULL,
    CONSTRAINT pk_contacts_archive PRIMARY KEY (ArchiveId)
);

We of course see our data columns from the original table in our archive table, but lets quickly explain the purpose of the other columns in this table.

  • ArchiveId - The primary key of our archive table.  Just a sequential integer
  • OperationType - A column containing a code of the DML operation that resulted in the row being archived.  For updates, the code will be 'U' and for deletes 'D' will be used
  • VersionStartTime - The date/time of when this version of the row became active in the table.
  • VersionEndTime - The date/time when this version of the row was removed/replaced from the table.  Together with VersionStartTime, this is when this version of the row was in the main table.
  • ChangeUser - The SQL Server identity of the user that ran the DML operation
  • ChangeIp - The IP Address the SQL client was connected from when this change was performed.
The last two columns will help us understand who performed the DML statement that changed the data in the row.  This is not perfect.  Say we have a web application, we get the name of the SQL Server user used to log into the database, not the username of who was logged into the web application.  But this is still useful information to have because we can differentiate changes done by different applications (assuming the app use different SQL logins) and changes done by an app versus a user with ad-hoc SQL.

I also want to point out that in this case, I am using the two triggers from this previous blog post to automatically update the CreatedDate and LastModifiedDate columns of the contacts table.  Again, we want these dates to be updated automatically, transparently and to always be correct, so using these triggers makes sense.  For completeness, here are the triggers I am using for this post:

CREATE TRIGGER dbo.TRG_Contacts_On_Insert_Row_Dates
    ON dbo.Contacts
    FOR INSERT
    AS
    BEGIN
        SET NOCOUNT ON
        
        UPDATE dbo.Contacts
            SET CreatedDate = GETUTCDATE(),
                LastModifiedDate = GETUTCDATE()
        FROM dbo.Contacts INNER JOIN inserted 
            ON dbo.Contacts.ContactId = inserted.ContactId;
    END
GO


CREATE TRIGGER TRG_Contacts_On_Update_Row_Dates
    ON dbo.Contacts
    FOR UPDATE
    AS
    BEGIN
        SET NOCOUNT ON
        
        IF ( (SELECT trigger_nestlevel() ) > 1 )
            RETURN
            
        UPDATE dbo.Contacts
            SET CreatedDate = deleted.CreatedDate,
                LastModifiedDate = GETUTCDATE()
        FROM dbo.Contacts 
        INNER JOIN deleted
            ON dbo.Contacts.ContactId = deleted.ContactId;
        
    END
GO

Triggers to Archive Changes

The first case to deal with is when data in a row is changed, so for this scenario, we'll use the following trigger.


CREATE TRIGGER TRG_Contacts_Archive_Updates
    ON dbo.Contacts
    FOR UPDATE
    AS
    BEGIN
        SET NOCOUNT ON
        
        IF ( (SELECT trigger_nestlevel() ) > 1 )
            RETURN
            
        INSERT INTO dbo.ContactsArchive
            (ContactId, FirstName, LastName, Email, Telephone, 
             OperationType, VersionStartTime, VersionEndTime, 
             ChangeUser, ChangeIp)
        SELECT
            deleted.ContactId, deleted.FirstName, deleted.LastName, deleted.Email, deleted.Telephone,
            'U', deleted.LastModifiedDate, GETUTCDATE(),
             CURRENT_USER, CONVERT(varchar(20), CONNECTIONPROPERTY('client_net_address'))
        FROM deleted
        
    END
GO

We see that the first thing we do is look at the nesting level of the trigger and only execute the code for this trigger for the original DML statement.  If we didn't do this, we would actually get two rows in the archive table, because this trigger would also fire when the other trigger on the table (the one that updates the LastModifiedDate) executes.  But we just want one record, one that captures the original state of the row before the triggering UPDATE statement ran.

Then, we are dealing with a straightforward insert statement into our ContactsArchive table.  We can use the DELETED pseudo-table to get access to the original values in each row and to determine what rows were updated by the statement.  The original LastModified date of the row becomes the VersionStartTime of the row.  That is when this particular version of the row became active in the database.  For the VersionEndTime, we use the current UTC date.

Finally, we use a couple of SQL Server functions to get the current user logged into SQL Server and the IP Address that user is logged in from.

Now lets take a look at the FOR DELETE trigger

CREATE TRIGGER TRG_Contacts_Archive_Deletes
    ON dbo.Contacts
    FOR DELETE
    AS
    BEGIN
        SET NOCOUNT ON
        
        IF ( (SELECT trigger_nestlevel() ) > 1 )
            RETURN
            
        INSERT INTO dbo.ContactsArchive
            (ContactId, FirstName, LastName, Email, Telephone, 
             OperationType, VersionStartTime, VersionEndTime, 
             ChangeUser, ChangeIp)
        SELECT
            deleted.ContactId, deleted.FirstName, deleted.LastName, deleted.Email, deleted.Telephone,
            'D', deleted.LastModifiedDate, GETUTCDATE(),
             CURRENT_USER, CONVERT(varchar(20), CONNECTIONPROPERTY('client_net_address'))
        FROM deleted
        
    END
GO

This trigger is exactly the same as the trigger above, except it inserts an operation type code of 'D' into the OperationType column.  There are some ways within a trigger in SQL Server to determine if the triggering DML statement is an INSERT, UPDATE or DELETE, but these aren't nearly as clear as the syntax that Oracle gives you to do this.  So I decided to make this a completely separate trigger.

Trying Things Out

Lets put some data into our sample table and try things out.  Here are some sample rows that we will insert:

INSERT INTO Contacts (ContactId, FirstName, LastName, Email, Telephone)
    VALUES (1, 'Christine', 'Jankowski', 'ChristineJJankowski@dayrep.com', '985-555-1212');

INSERT INTO Contacts (ContactId, FirstName, LastName, Email, Telephone)
    VALUES (2, 'Dwight', 'Robertson', 'DwightSRobertson@jourrapide.com', '734-555-1212');

INSERT INTO Contacts (ContactId, FirstName, LastName, Email, Telephone)
    VALUES (3, 'Maria', 'Beck', 'MariaBBeck@teleworm.us', '859-555-1212');

INSERT INTO Contacts (ContactId, FirstName, LastName, Email, Telephone)
    VALUES (4, 'Michael', 'Harrington', 'MichaelCHarrington@dayrep.com', '718-555-1212');

INSERT INTO Contacts (ContactId, FirstName, LastName, Email, Telephone)
    VALUES (5, 'Andrea', 'Alexander', 'AndreaMAlexander@rhyta.com', '972-555-1212');

And now, we'll change one of those rows with this UPDATE statement:

UPDATE Contacts
    SET Telephone = '213-555-1234'
    WHERE ContactId = 2;

So we should be able to take a look at our archive table and find one record in there of the old version of the row.  And indeed we do.




Now let's try out deleting a row from our sample table with this DELETE statement.

DELETE FROM Contacts 
    WHERE ContactId = 5;

And again, we'll look in our archive table to find a copy of the deleted row.


Conclusion

Many times, it is important to keep a copy of the old version of data from a table whenever data in that table is updated or deleted.  This can be easily done with the triggers found above, and it makes sure that this archiving of data is done automatically each time a change occurs to the table.  Further, this process is completely transparent to the user or application executing the SQL statement.  They don't have to take any additional steps outside of their normal DML statement.

Feel free to use these triggers or variants of them in your own database as the need arises.






2 comments:

  1. Your blog is in a convincing manner, thanks for sharing such an information with lots of your effort and time
    sql server dba online training
    SQL dba online course

    ReplyDelete
  2. I feel there is a need to provide more and more aspects of SQL server and look for whether one can make it useful.

    SQL Server Load Soap API

    ReplyDelete