Monday, January 26, 2015

Archiving Rows in Oracle Using a Trigger

Oftentimes, we are faced with the situation where, any time a row is updated or deleted in a table, we want to store the previous version of the row in an archive table.  In some cases this is a requirement for audit purposes.  Even when it is not required for audit, it is still quite nice to be able to go back, see what a row looked like on a certain date, and know when the value last changed.

For example, several places I have worked have kept application settings in a table in the database.  And I've run into situations where a value was updated and all of a sudden things started to go wrong.  It is at those times I wished we had some sort of archiving mechanism, so we could easily tell what changed, when, and what the previous values were.

This is something that can be easily accomplished with a trigger.  And the nice thing about the trigger solution is that it happens transparently in the background.  It doesn't matter if a row is updated by application code or by an ad-hoc query in a SQL editor; the prior version of the row will get archived.  So let's see how this works.

First, consider that we have the following data table.

CREATE TABLE students
(
    student_id      NUMBER(10)      GENERATED ALWAYS AS IDENTITY NOT NULL,
    first_name      VARCHAR2(30)    NOT NULL,
    last_name       VARCHAR2(30)    NOT NULL,
    email           VARCHAR2(50)    NULL,
    phone           VARCHAR2(20)    NULL,
    create_date     DATE            NOT NULL,
    modified_date   DATE            NOT NULL,
    CONSTRAINT pk_students PRIMARY KEY (student_id)
);

Then, we have this as an archive table.  Note that I am not enforcing a foreign key from the STUDENT_ID column in STUDENTS_ARCHIVE back to the STUDENTS table.  If we deleted a row from the STUDENTS table, there would be no parent key left to reference, so such a constraint would not work here.

CREATE TABLE students_archive
(
    archive_id          NUMBER(10)      GENERATED ALWAYS AS IDENTITY NOT NULL,
    student_id          NUMBER(10)      NOT NULL,
    first_name          VARCHAR2(30)    NOT NULL,
    last_name           VARCHAR2(30)    NOT NULL,
    email               VARCHAR2(50)    NULL,
    phone               VARCHAR2(20)    NULL,
    create_date         DATE            NOT NULL,
    active_start_time   DATE            NOT NULL,
    active_end_time     DATE            NOT NULL,
    operation_type      VARCHAR2(1)     NOT NULL,
    CONSTRAINT pk_students_archive PRIMARY KEY (archive_id)
);

And now for the trigger.  It's quite simple, really.

CREATE OR REPLACE TRIGGER tr_students_archive_row
    AFTER UPDATE OR DELETE
    ON students
    FOR EACH ROW
DECLARE
    op_type  VARCHAR2(1);
BEGIN
    -- Record whether this row change was an update or a delete
    CASE
        WHEN UPDATING THEN
            op_type := 'U';
        WHEN DELETING THEN
            op_type := 'D';
    END CASE;

    -- Archive the prior version of the row
    INSERT INTO students_archive (student_id, first_name, last_name,
        email, phone, create_date,
        active_start_time, active_end_time, operation_type)
    VALUES (:old.student_id, :old.first_name, :old.last_name,
        :old.email, :old.phone, :old.create_date,
        :old.modified_date, sysdate, op_type);
END;

The trigger fires FOR EACH ROW, AFTER an UPDATE or DELETE statement.  I fire AFTER the operation (rather than BEFORE) because of course I want to archive every row affected, and also because I want to make sure the UPDATE or DELETE actually succeeded before writing to the archive.

Then, I use a CASE...WHEN statement to put a code into a variable indicating if an UPDATE or DELETE operation is being performed.

Finally, I insert the old data (via the :OLD pseudorecord) into the archive table.  The time this version of the record started being active is the MODIFIED_DATE on the old version of the record, and the time it stopped being active is right now.  In this case I am using DATE values, but you could just as well use TIMESTAMP values for more precision.

And that is it.  Now, whenever a record is changed or deleted in our data table, we'll have a history of it in our archive table.  It is a very clean solution to the problem.
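To illustrate, here is a quick sketch of the trigger in action.  The names and values are made up for the example:

-- Insert a row, then change it
INSERT INTO students (first_name, last_name, email, phone, create_date, modified_date)
VALUES ('Jane', 'Smith', 'jane.smith@example.com', '555-0100', sysdate, sysdate);

UPDATE students
   SET phone = '555-0199', modified_date = sysdate
 WHERE first_name = 'Jane' AND last_name = 'Smith';

-- The prior version of the row (with phone '555-0100') is now in the archive,
-- with operation_type 'U' and an active window ending at the time of the update
SELECT student_id, phone, active_start_time, active_end_time, operation_type
  FROM students_archive
 ORDER BY archive_id;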

If the updates and deletes against your table are mostly of the single row variety, then this trigger will work great for you.  If you are doing a lot of bulk updates against the data in your table, then a compound trigger will likely offer better performance, which I'll cover in a future blog post.

SQL Server DMV to get Stored Procedure Usage Stats

If you are working in a SQL Server environment that uses stored procedures for data access, then it is very useful to gather some execution stats for those stored procedures.  Which procedures execute most often?  How much CPU and logical IO has a procedure used?  What is the average time it takes for a procedure to execute?

The following query uses the sys.dm_exec_procedure_stats DMV in SQL Server to do just that.


SELECT sc.name AS schema_name, p.name AS procedure_name, st.cached_time,
    st.execution_count, st.total_elapsed_time, st.total_worker_time, st.total_logical_reads,
    st.total_elapsed_time / st.execution_count AS avg_elapsed_time,
    st.total_worker_time / st.execution_count AS avg_worker_time,
    st.total_logical_reads / st.execution_count AS avg_logical_reads
FROM sys.procedures AS p
INNER JOIN sys.schemas AS sc
    ON p.[schema_id] = sc.schema_id
LEFT OUTER JOIN sys.dm_exec_procedure_stats AS st
    ON p.[object_id] = st.object_id
ORDER BY st.execution_count DESC;


Now it is possible to go down to the query level by looking at sys.dm_exec_query_stats, and that is also a useful view.  But sometimes it's nice to look at things from the procedure level before diving down to the individual statement level, and that is what this query offers.
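As a sketch of what that statement-level view looks like, here is one common way to query it (the column choices here are mine, not part of the procedure-level query above):

-- Top statements by execution count, with the statement text extracted
SELECT TOP (20)
    qs.execution_count,
    qs.total_worker_time / qs.execution_count AS avg_worker_time,
    qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
    SUBSTRING(qt.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(qt.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
ORDER BY qs.execution_count DESC;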

It is important to note that this view only holds items currently in the SQL plan cache, so it is not a cumulative count since SQL Server was last restarted or anything like that.  It is just what is in the cache, which is usually quite a bit.  If you are trying to verify that a procedure is no longer used, though, you might want to run this query multiple times at regular intervals over the course of a few days, to make sure the procedure's plan is not simply being aged out of the cache between your queries.

Updating Administrative Columns Using a Trigger in Oracle

Consider a table like the one below:

CREATE TABLE students
(
    student_id      NUMBER(10)      GENERATED ALWAYS AS IDENTITY NOT NULL,
    first_name      VARCHAR2(30)    NOT NULL,
    last_name       VARCHAR2(30)    NOT NULL,
    email           VARCHAR2(50)    NULL,
    phone           VARCHAR2(20)    NULL,
    create_date     DATE            NOT NULL,
    modified_date   DATE            NOT NULL,
    CONSTRAINT pk_students PRIMARY KEY (student_id)
);

Specifically, we have two columns in this table, CREATE_DATE and MODIFIED_DATE, that I call administrative columns.  These are here not because they have anything to do with the business domain data for the record (in this case a student), but purely for tracking purposes.  It's useful to know when a row was created and last modified.  In fact, if you are performing optimistic locking, it is imperative to know when the row was last modified, but that is a topic for a different day.

The problem we face is that we want to keep these columns accurate and in sync.  If rows are only created and updated through our application, no problem.  We can make sure that our data access code or stored procedures set these columns appropriately for each use case and we are gold.

But here is the catch.  We can't always guarantee that someone will insert or update rows from our application.

Maybe someone writes another application that needs to modify data in our database.  Yes, we would like for it to use the same data access libraries, but in the real world, we know this is not always the case.

And what about the use case of someone modifying data via their SQL editor?  Sure, in production, things should be locked down such that this doesn't happen often, but inevitably it will need to happen at some point.  Some rows will get messed up and someone will need to go in and fix them.

And so our challenge is: how do we make sure these columns are always correct and always updated, no matter how a row gets inserted or updated?

This is a place where a trigger can really help.

Consider the following:

CREATE OR REPLACE TRIGGER tr_students_create_mod_date
    BEFORE INSERT OR UPDATE
    ON students
    FOR EACH ROW
BEGIN
    CASE
        WHEN INSERTING THEN
            :new.create_date := sysdate;
            :new.modified_date := sysdate;
        WHEN UPDATING THEN
            -- Preserve the original create date, even if someone tries to change it
            :new.create_date := :old.create_date;
            :new.modified_date := sysdate;
    END CASE;
END;

This is a very simple trigger.  When a row is being inserted, it sets the value of CREATE_DATE and MODIFIED_DATE to the current server date (sysdate).  When a row is being updated, the MODIFIED_DATE is set to sysdate, but the CREATE_DATE is set to the original create date.  The effect this has is that even if someone tries to change the CREATE_DATE for a row after the row is inserted, it won't matter.  We'll always preserve that original CREATE_DATE.

And so this is one of the little niceties that make our life easier.  We don't have to include these values in our statements and we can be assured that they will always be correct, no matter how a row is inserted or updated in our database.
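A quick hypothetical example of the trigger at work (the names are made up).  Because the trigger fires BEFORE the row is written, the NOT NULL constraints on the administrative columns are satisfied even when we leave them out of the statement:

-- The administrative columns can be omitted; the trigger fills them in
INSERT INTO students (first_name, last_name, email)
VALUES ('Jane', 'Smith', 'jane.smith@example.com');

-- Even an attempt to overwrite CREATE_DATE on update is ignored
UPDATE students
   SET email = 'jsmith@example.com',
       create_date = DATE '1900-01-01'
 WHERE first_name = 'Jane' AND last_name = 'Smith';

-- CREATE_DATE still holds the original insert time;
-- MODIFIED_DATE reflects the update
SELECT create_date, modified_date
  FROM students
 WHERE first_name = 'Jane' AND last_name = 'Smith';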

An Oracle Trigger for an Auto-Increment Column

In Oracle 12c, you can now define columns as auto increment columns, similar to what you can do in SQL Server and other databases.  Ultimately, what Oracle is doing is creating a sequence in the background and wiring this up behind the scenes to the column in your table.

Here is an example of an auto-incrementing column in Oracle 12c:

CREATE TABLE students
(
    student_id     NUMBER(10)      GENERATED ALWAYS AS IDENTITY NOT NULL,
    first_name     VARCHAR2(30)    NOT NULL,
    last_name      VARCHAR2(30)    NOT NULL,
    email          VARCHAR2(50)    NULL,
    phone          VARCHAR2(20)    NULL,
    create_date    DATE            NOT NULL,
    modified_date  DATE            NOT NULL,
    CONSTRAINT pk_students PRIMARY KEY (student_id)
);

However, most databases around today were designed before 12c was available.  No worries though: you can use the combination of a trigger and a sequence to synthesize this behavior in Oracle.  In fact, people have been doing this for years, at least as long as I have been working with Oracle, which goes all the way back to 1996.  So how do we do this?

First, let's consider a table of the form:

CREATE TABLE students
(
    student_id     NUMBER(10)      NOT NULL,
    first_name     VARCHAR2(30)    NOT NULL,
    last_name      VARCHAR2(30)    NOT NULL,
    email          VARCHAR2(50)    NULL,
    phone          VARCHAR2(20)    NULL,
    create_date    DATE            NOT NULL,
    modified_date  DATE            NOT NULL,
    CONSTRAINT pk_students PRIMARY KEY (student_id)
);

And now, we need to define a sequence and a trigger:

-- Start at any number you want to - here I'll start at 100    
CREATE SEQUENCE seq_student_id_sequence START WITH 100;    
    
    
CREATE OR REPLACE TRIGGER tr_students_assign_id
    BEFORE INSERT
    ON students
    FOR EACH ROW
BEGIN
    :new.student_id := seq_student_id_sequence.nextval;
END;

The trigger is pretty simple.  Before a row is inserted, it grabs the next value from the sequence and uses the :NEW pseudorecord to put that value in the appropriate column, in this case STUDENT_ID.
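A quick sketch of usage, with made-up values.  Since the sequence above starts at 100, the first row inserted gets STUDENT_ID 100, the next 101, and so on:

-- No student_id supplied; the trigger assigns it from the sequence
INSERT INTO students (first_name, last_name, create_date, modified_date)
VALUES ('Jane', 'Smith', sysdate, sysdate);

SELECT student_id, first_name, last_name FROM students;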

As I said in my Pluralsight course, if you work with Oracle for any length of time, you are going to see triggers very similar to this in the databases you work with.  So even though we might not need to create triggers like this in a post-12c world, it is still good to understand how they work, because databases using these constructs will be around for years.