Monday, March 30, 2015

Archiving Updated and Deleted Rows on a Table in SQL Server

I previously wrote about how to archive rows in Oracle using a trigger.  Today, I want to discuss how to do this in SQL Server.  Again, the syntax varies between the two databases, but you can still accomplish the same functionality.

So to recap on what we want to accomplish:

  • Any time a row in our table is UPDATED or DELETED, we want to store a copy of the old row in the archive table
  • We also want to store with the archived record the dates and times of when that version of the row was the active version of the row in the table
  • We would like to store some information about what user ran the DML operation and the client machine it came from
  • We want all of this to happen transparently and automatically.  That is, we don't want a user or application to have to do anything except run their SQL statement, and the archival of the old version of the row happens automatically.
The last requirement is what drives this solution to use a trigger.  While it often causes a lot of trouble to start putting business logic in a trigger, a requirement like this is the perfect job for a trigger.  Having a trigger archive the data makes sure that it is always done, regardless of whether the DML comes from an application, a stored procedure or a user running an ad-hoc SQL statement.  And a trigger can perform this archiving transparently in the background, so the application or user running the DML statement does not even have to think about it happening.

A Sample Table

So let's take a look at a sample table that we will archive data for:

CREATE TABLE dbo.Contacts
(
    ContactId          INT           NOT NULL,
    FirstName          VARCHAR(20)   NOT NULL,
    LastName           VARCHAR(20)   NOT NULL,
    Email              VARCHAR(40)   NULL,
    Telephone          VARCHAR(20)   NULL,
    CreatedDate        DATETIME      NULL,
    LastModifiedDate   DATETIME      NULL,
    CONSTRAINT pk_contacts PRIMARY KEY (ContactId)
);


And here is our archive table:

CREATE TABLE dbo.ContactsArchive
(
    ArchiveId          INT IDENTITY  NOT NULL,
    ContactId          INT           NOT NULL,
    FirstName          VARCHAR(20)   NOT NULL,
    LastName           VARCHAR(20)   NOT NULL,
    Email              VARCHAR(40)   NULL,
    Telephone          VARCHAR(20)   NULL,
    OperationType      VARCHAR(1)    NOT NULL,
    VersionStartTime   DATETIME      NOT NULL,
    VersionEndTime     DATETIME      NOT NULL,
    ChangeUser         VARCHAR(30)   NULL,
    ChangeIp           VARCHAR(20)   NULL,
    CONSTRAINT pk_contacts_archive PRIMARY KEY (ArchiveId)
);

We of course see our data columns from the original table in our archive table, but let's quickly explain the purpose of the other columns in this table.

  • ArchiveId - The primary key of our archive table.  Just a sequential integer
  • OperationType - A column containing a code of the DML operation that resulted in the row being archived.  For updates, the code will be 'U' and for deletes 'D' will be used
  • VersionStartTime - The date/time of when this version of the row became active in the table.
  • VersionEndTime - The date/time when this version of the row was removed/replaced from the table.  Together with VersionStartTime, this is when this version of the row was in the main table.
  • ChangeUser - The SQL Server identity of the user that ran the DML operation
  • ChangeIp - The IP Address the SQL client was connected from when this change was performed.
The last two columns will help us understand who performed the DML statement that changed the data in the row.  This is not perfect: with a web application, for example, we get the name of the SQL Server login used to connect to the database, not the username of whoever was logged into the web application.  But this is still useful information to have, because we can differentiate changes made by different applications (assuming the apps use different SQL logins) and changes made by an application versus a user running ad-hoc SQL.

I also want to point out that in this case, I am using the two triggers from this previous blog post to automatically update the CreatedDate and LastModifiedDate columns of the contacts table.  Again, we want these dates to be updated automatically, transparently and to always be correct, so using these triggers makes sense.  For completeness, here are the triggers I am using for this post:

CREATE TRIGGER dbo.TRG_Contacts_On_Insert_Row_Dates
    ON dbo.Contacts
    FOR INSERT
    AS
    BEGIN
        SET NOCOUNT ON
        
        UPDATE dbo.Contacts
            SET CreatedDate = GETUTCDATE(),
                LastModifiedDate = GETUTCDATE()
        FROM dbo.Contacts INNER JOIN inserted 
            ON dbo.Contacts.ContactId = inserted.ContactId;
    END
GO


CREATE TRIGGER TRG_Contacts_On_Update_Row_Dates
    ON dbo.Contacts
    FOR UPDATE
    AS
    BEGIN
        SET NOCOUNT ON
        
        IF ( (SELECT trigger_nestlevel() ) > 1 )
            RETURN
            
        UPDATE dbo.Contacts
            SET CreatedDate = deleted.CreatedDate,
                LastModifiedDate = GETUTCDATE()
        FROM dbo.Contacts 
        INNER JOIN deleted
            ON dbo.Contacts.ContactId = deleted.ContactId;
        
    END
GO

Triggers to Archive Changes

The first case to deal with is when data in a row is changed, so for this scenario, we'll use the following trigger.


CREATE TRIGGER TRG_Contacts_Archive_Updates
    ON dbo.Contacts
    FOR UPDATE
    AS
    BEGIN
        SET NOCOUNT ON
        
        IF ( (SELECT trigger_nestlevel() ) > 1 )
            RETURN
            
        INSERT INTO dbo.ContactsArchive
            (ContactId, FirstName, LastName, Email, Telephone, 
             OperationType, VersionStartTime, VersionEndTime, 
             ChangeUser, ChangeIp)
        SELECT
            deleted.ContactId, deleted.FirstName, deleted.LastName, deleted.Email, deleted.Telephone,
            'U', deleted.LastModifiedDate, GETUTCDATE(),
             CURRENT_USER, CONVERT(varchar(20), CONNECTIONPROPERTY('client_net_address'))
        FROM deleted
        
    END
GO

We see that the first thing we do is look at the nesting level of the trigger and only execute the code for this trigger for the original DML statement.  If we didn't do this, we would actually get two rows in the archive table, because this trigger would also fire when the other trigger on the table (the one that updates the LastModifiedDate) executes.  But we just want one record, one that captures the original state of the row before the triggering UPDATE statement ran.

Then, we are dealing with a straightforward INSERT statement into our ContactsArchive table.  We can use the DELETED pseudo-table to get access to the original values in each row and to determine which rows were updated by the statement.  The original LastModifiedDate of the row becomes the VersionStartTime of the archived row; that is when this particular version of the row became active in the database.  For the VersionEndTime, we use the current UTC date.

Finally, we use a couple of SQL Server functions to get the current user logged into SQL Server and the IP Address that user is logged in from.
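As a quick aside, if you are curious what these two functions return for your current session, you can run a simple ad-hoc query like this:

SELECT CURRENT_USER AS ChangeUser,
       CONVERT(VARCHAR(20), CONNECTIONPROPERTY('client_net_address')) AS ChangeIp;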

Now let's take a look at the FOR DELETE trigger:

CREATE TRIGGER TRG_Contacts_Archive_Deletes
    ON dbo.Contacts
    FOR DELETE
    AS
    BEGIN
        SET NOCOUNT ON
        
        IF ( (SELECT trigger_nestlevel() ) > 1 )
            RETURN
            
        INSERT INTO dbo.ContactsArchive
            (ContactId, FirstName, LastName, Email, Telephone, 
             OperationType, VersionStartTime, VersionEndTime, 
             ChangeUser, ChangeIp)
        SELECT
            deleted.ContactId, deleted.FirstName, deleted.LastName, deleted.Email, deleted.Telephone,
            'D', deleted.LastModifiedDate, GETUTCDATE(),
             CURRENT_USER, CONVERT(varchar(20), CONNECTIONPROPERTY('client_net_address'))
        FROM deleted
        
    END
GO

This trigger is exactly the same as the trigger above, except it inserts an operation type code of 'D' into the OperationType column.  There are some ways within a trigger in SQL Server to determine if the triggering DML statement is an INSERT, UPDATE or DELETE, but these aren't nearly as clear as the syntax that Oracle gives you to do this.  So I decided to make this a completely separate trigger.
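For the curious, one common way to combine the two into a single trigger is to check whether the inserted pseudo-table contains any rows: an UPDATE populates both inserted and deleted, while a DELETE leaves inserted empty.  A rough sketch of that combined trigger (not the approach I am using in this post, just an illustration) would look something like this:

CREATE TRIGGER TRG_Contacts_Archive_Changes
    ON dbo.Contacts
    FOR UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON
        
        IF ( (SELECT trigger_nestlevel() ) > 1 )
            RETURN
            
        -- An UPDATE has rows in both inserted and deleted; a DELETE only has rows in deleted
        DECLARE @OperationType VARCHAR(1) =
            CASE WHEN EXISTS (SELECT 1 FROM inserted) THEN 'U' ELSE 'D' END;
            
        INSERT INTO dbo.ContactsArchive
            (ContactId, FirstName, LastName, Email, Telephone, 
             OperationType, VersionStartTime, VersionEndTime, 
             ChangeUser, ChangeIp)
        SELECT
            deleted.ContactId, deleted.FirstName, deleted.LastName, deleted.Email, deleted.Telephone,
            @OperationType, deleted.LastModifiedDate, GETUTCDATE(),
             CURRENT_USER, CONVERT(varchar(20), CONNECTIONPROPERTY('client_net_address'))
        FROM deleted
        
    END
GO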

Trying Things Out

Let's put some data into our sample table and try things out.  Here are some sample rows that we will insert:

INSERT INTO Contacts (ContactId, FirstName, LastName, Email, Telephone)
    VALUES (1, 'Christine', 'Jankowski', 'ChristineJJankowski@dayrep.com', '985-555-1212');

INSERT INTO Contacts (ContactId, FirstName, LastName, Email, Telephone)
    VALUES (2, 'Dwight', 'Robertson', 'DwightSRobertson@jourrapide.com', '734-555-1212');

INSERT INTO Contacts (ContactId, FirstName, LastName, Email, Telephone)
    VALUES (3, 'Maria', 'Beck', 'MariaBBeck@teleworm.us', '859-555-1212');

INSERT INTO Contacts (ContactId, FirstName, LastName, Email, Telephone)
    VALUES (4, 'Michael', 'Harrington', 'MichaelCHarrington@dayrep.com', '718-555-1212');

INSERT INTO Contacts (ContactId, FirstName, LastName, Email, Telephone)
    VALUES (5, 'Andrea', 'Alexander', 'AndreaMAlexander@rhyta.com', '972-555-1212');

And now, we'll change one of those rows with this UPDATE statement:

UPDATE Contacts
    SET Telephone = '213-555-1234'
    WHERE ContactId = 2;

So we should be able to take a look at our archive table and find one record in there containing the old version of the row.  And indeed we do.
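If you want to check for yourself, a query along these lines against the archive table will show it (the same query works for the DELETE test below as well):

SELECT ContactId, FirstName, LastName, Email, Telephone,
       OperationType, VersionStartTime, VersionEndTime, ChangeUser, ChangeIp
    FROM dbo.ContactsArchive
    ORDER BY ArchiveId;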




Now let's try out deleting a row from our sample table with this DELETE statement.

DELETE FROM Contacts 
    WHERE ContactId = 5;

And again, we'll look in our archive table to find a copy of the deleted row.


Conclusion

Many times, it is important to keep a copy of the old version of data from a table whenever data in that table is updated or deleted.  This can be easily done with the triggers found above, and it makes sure that this archiving of data is done automatically each time a change occurs to the table.  Further, this process is completely transparent to the user or application executing the SQL statement.  They don't have to take any additional steps outside of their normal DML statement.

Feel free to use these triggers or variants of them in your own database as the need arises.






Friday, March 20, 2015

Populating Created Date and Last Modified Date Columns in SQL Server Using Triggers

I wrote a blog post a while back on updating what I call "administrative columns" like created date and last modified date in Oracle.  However, I don't just work in environments with Oracle as the backend RDBMS, but also work with SQL Server quite a bit.  Recently, I needed to write the same functionality for SQL Server, and as it turns out, you can achieve the same result in SQL Server, but the approach is quite a bit different.  So I think it is good to blog about how to do this, so if anyone else has this need, they can see my approach and either use it straight away or adapt it to their needs.

Let's briefly summarize the goals before we get started so it is clear what we are trying to accomplish.

  • The CreatedDate column should reflect the date and time when the row was first inserted into the table.
  • The CreatedDate column should never be able to change.  That is, we want to prevent anyone from updating the value in this column, either accidentally or for some malicious reason.
  • The LastModifiedDate column reflects the last time the row was changed.  For a new row, the LastModifiedDate is the CreatedDate.  Once a row is updated, the LastModifiedDate value is the time that update occurred.
  • We want to populate both of these columns automatically from the current database time.  We don't want to rely on user supplied values for these columns.  This is for two reasons:
    • It is easier if the user doesn't have to worry about supplying values for these administrative type columns so they can just focus on the business data
    • We want to make sure the values are correct.  We don't want to use a user supplied date/time that is in the past or future or a value that might be in a different time zone.  Using the database value, we know the value is correct and consistent.


So first of all, let's define our target table.  For this example, I am going to use a table that captures restaurant information.  This table is simplified for our purposes, but will work just fine to demonstrate what we need.  So here is the table.

CREATE TABLE Restaurants
(
    RestaurantId       INT           NOT NULL,
    RestaurantName     VARCHAR(40)   NOT NULL,
    Cuisine            VARCHAR(25)   NOT NULL,
    City               VARCHAR(30)   NOT NULL,
    State              VARCHAR(2)    NOT NULL,
    CreatedDate        DATETIME      NULL,
    LastModifiedDate   DATETIME      NULL,
    CONSTRAINT PK_Restaurants_RestaurantId
        PRIMARY KEY (RestaurantId)
);

Note that I am defining both the CreatedDate and LastModifiedDate columns as nullable columns.  This is important because in SQL Server, an AFTER trigger (the kind we are using here) always runs after the DML statement that it fires for.  Unlike in Oracle, where you can have a BEFORE trigger intercept the statement and modify or replace values, the trigger here fires only once the DML statement has completed.  So even though these columns accept NULL values, we'll take care of that in the triggers that follow.

You could define these columns as NOT NULL, but if you do so, then you need to provide a default value for the column (say GetUtcDate() ).  Again, the reason for this is that in SQL Server, the trigger fires after the DML statement has completed.  If the column is NOT NULL and you don't have any value (either one in the statement or a default value), then the statement will fail due to the NOT NULL constraints.  So if you choose to go the NOT NULL route, then provide a default value.
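For illustration, a sketch of that NOT NULL variant might look like the following (the table name here is just a placeholder so it doesn't collide with the table above):

CREATE TABLE RestaurantsWithDefaults
(
    RestaurantId       INT           NOT NULL,
    RestaurantName     VARCHAR(40)   NOT NULL,
    Cuisine            VARCHAR(25)   NOT NULL,
    City               VARCHAR(30)   NOT NULL,
    State              VARCHAR(2)    NOT NULL,
    CreatedDate        DATETIME      NOT NULL  DEFAULT GETUTCDATE(),
    LastModifiedDate   DATETIME      NOT NULL  DEFAULT GETUTCDATE(),
    CONSTRAINT PK_RestaurantsWithDefaults_RestaurantId
        PRIMARY KEY (RestaurantId)
);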

OK, let's take a look at the triggers that can accomplish this.  We'll need two triggers, and the first one to look at is the trigger that will fire after an INSERT statement.

CREATE TRIGGER TRG_Restaurants_After_Insert
    ON dbo.Restaurants
    FOR INSERT
    AS
    BEGIN
        SET NOCOUNT ON
        
        UPDATE dbo.Restaurants
            SET CreatedDate = GETUTCDATE(),
                LastModifiedDate = GETUTCDATE()
        FROM dbo.Restaurants INNER JOIN inserted 
            ON dbo.Restaurants.RestaurantID = inserted.RestaurantID;
    END
GO

In SQL Server, the INSERTED pseudo table contains the 'after' values of all of the rows affected by the DML statement, so in this case, as its name implies, the newly inserted rows.  To update these columns, we write an UPDATE statement like the one above that joins our target table to the INSERTED pseudo table to get the correct set of rows, and then sets CreatedDate and LastModifiedDate to the correct time, which in this case is UTC time.  In other words, the trigger takes the set of affected rows and goes back to the table to set the times we want.  This way, even if someone supplies a value for CreatedDate or LastModifiedDate, we don't care.  We'll make sure the correct date/time value gets put into the column via this trigger.

Now for the trigger that will fire after an UPDATE statement:

CREATE TRIGGER TRG_Restaurants_After_Update
    ON dbo.Restaurants
    FOR UPDATE
    AS
    BEGIN
        SET NOCOUNT ON
        
        IF ( (SELECT trigger_nestlevel() ) > 1 )
            RETURN
            
        UPDATE dbo.Restaurants
            SET CreatedDate = deleted.CreatedDate,
                LastModifiedDate = GETUTCDATE()
        FROM dbo.Restaurants 
        INNER JOIN deleted
            ON dbo.Restaurants.RestaurantId = deleted.RestaurantID;
        
    END
GO

The first thing you might notice about this trigger is the IF block wrapped around the SELECT trigger_nestlevel() call.  This is necessary to prevent this trigger from firing as a nested trigger.  For example, during an INSERT statement, the TRG_Restaurants_After_Insert trigger defined above will fire, which in turn runs an UPDATE statement against the table.  Without this code block, the TRG_Restaurants_After_Update trigger would also run for the UPDATE statement issued by the first trigger, and that is not what we want.  We only want this trigger to run for user- or application-executed UPDATE statements against the table.

Next, we again need to update the CreatedDate and LastModifiedDate columns in our table.  This time, we join to the DELETED pseudo table, which contains the old values of all of the rows affected by the UPDATE statement.  Joining to the DELETED pseudo table gives us the proper set of rows, and we can pull the existing (old) CreatedDate value out of these rows and use it in our trigger's UPDATE statement as shown above.  This keeps someone from updating the value of CreatedDate.  As we'll show below, even if someone tries to update CreatedDate, it doesn't matter, because our trigger will ensure we retain the existing value in the table.

Testing It All Out

Let's test everything out and make sure it is working.  At this point, I have created the table above and both of my triggers.  So let's insert a few rows into the table.

INSERT INTO Restaurants (RestaurantId, RestaurantName, Cuisine, City, State)
    VALUES (1, 'Crepes de Lux', 'French', 'Iowa City', 'IA');
 
INSERT INTO Restaurants (RestaurantId, RestaurantName, Cuisine, City, State)
    VALUES (2, 'Ginger Rootz', 'Chinese', 'Appleton', 'WI');
  
INSERT INTO Restaurants (RestaurantId, RestaurantName, Cuisine, City, State, 
        CreatedDate, LastModifiedDate)
    VALUES (3, 'House of Pho', 'Asian', 'Schaumburg', 'IL', 
        '01/01/2020', '01/01/2010');    

We see that in the third row, I am supplying a value for CreatedDate and LastModifiedDate.  However, as I am executing this statement, it is most certainly not January 1st, 2020 nor January 1st, 2010.  This is exactly the type of situation we are designing for, to make sure we always have correct values in these columns and not allow them to be overridden by user supplied values like this that may potentially be incorrect.

So let's see what is in our table at this point.
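A simple SELECT (any query against the table will do) shows the current contents:

SELECT RestaurantId, RestaurantName, Cuisine, City, State,
       CreatedDate, LastModifiedDate
    FROM Restaurants
    ORDER BY RestaurantId;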



This is exactly what we want.  Even though we did not supply values in the first two INSERT statements, our trigger has automatically populated these columns for us with the correct values.  And for the third statement, where incorrect values were supplied, those values were not used; again, we have the correct values in these two columns.

Let's run an UPDATE statement now.  And to start with, we'll just perform the most basic UPDATE statement, changing the Cuisine for House of Pho from Asian to Vietnamese.

UPDATE Restaurants 
    SET Cuisine = 'Vietnamese'
    WHERE RestaurantId = 3;

And now we will check our table data once again.


We see here that for "House of Pho", the LastModifiedDate has been automatically updated by our trigger to the appropriate time for when the row was changed.  The CreatedDate value was left unchanged, as were the other rows in the table.  So this is exactly what we were looking for.

Finally, let's test the scenario where someone tries to run an UPDATE statement that will modify the CreatedDate column, which as we have discussed, we do not want to allow.  So to test this out, we'll run the following statement:

UPDATE Restaurants 
    SET Cuisine = 'Pan-Asian',
        CreatedDate = '01/01/2010'
    WHERE RestaurantId = 2;

So we can see here, we are trying to backdate the CreatedDate of the row to a different value.  However, with our trigger, we don't allow this to happen and we just retain the existing value.  Here is the data:


We see here that row 2 for "Ginger Rootz" has a new value for LastModifiedDate (and this value is correct).  But even though a value for CreatedDate was supplied in the UPDATE statement, the supplied value was ignored and the original CreatedDate value retained, which is the correct behavior.

Why All of This Matters

What we are essentially trying to do is establish an audit trail for our data.  We want to know when a row came into existence and when it was last changed.  For this audit trail to be effective, we need to make sure that these columns always contain the correct values.  We don't want to burden a developer or someone writing a SQL statement with worrying about how to populate these columns, because we can do that automatically.  At the same time, we want to prevent these columns from being changed, whether maliciously or accidentally, so we are confident the values in these columns are always correct.  Using these two triggers, we accomplish both of those goals.

When researching this problem, I did see a number of other blog posts and postings on StackOverflow that said just to use a DEFAULT value to populate the CreatedDate column.  This solves half of the problem, in that a developer no longer has to worry about providing a value.  But it does not address the problem of preventing the column from being modified.  And I feel this is an equally important aspect to solve.

Lately, I've been involved in researching a number of security related items.  I am not able to give away many details beyond saying I was involved in researching some issues, but what I can say is that two of the questions that always come up are "when did this row come into existence?" and "when was it last modified?".  Having audit columns that you know are correct and you know cannot be tampered with is critical when researching these sorts of issues.  

I have to admit, I myself am sometimes lazy about including columns like CreatedDate and LastModifiedDate in my tables.  I need to be better about that, and about making sure these columns are always properly updated.  With these triggers, now I don't have an excuse any more.

Tuesday, March 10, 2015

Using Cassette to Bundle and Minify Files in MVC2, MVC3 and Older WebForms Apps

Over the last few years, we have seen the emergence of highly interactive web applications.  In order to achieve this level of interactivity, most web pages now contain multiple JavaScript and CSS files that power this behavior.

As the number of files required by a web page goes up though, the performance of the page can be degraded, because the browser has to load each one of these dependent files from the web server.  This impacts performance in three ways:

  • Number of Downloads - Each separate file is a round trip from the browser to the server, and each round trip incurs network latency, the amount of time it takes for a packet to travel from the browser to the server and back.  Downloading many small files is generally slower than downloading a single large file due to the impact of network latency.
  • Download Size - Multiple files to download add up in terms of the amount of data that must be downloaded by the browser from the web server.  Simply put, larger pages take longer to download and ultimately display to the user, so we want to trim download size wherever possible.
  • Number of Concurrent Downloads - This is something that is not realized by most developers, but your browser will only download so many files concurrently from a given host at a time.  For most browsers, the limit is 6 concurrent downloads.  For IE 11, the number is 13 (check out all browsers at http://www.browserscope.org/).  What this means is that if you have 20 files that need to be downloaded from your website, only N will be downloaded at a time, and the remaining files will queue behind these N files.  When a file from that group finishes, then the next download can start.  So if we can minimize the number of items that need to be queued, we can download and render our page faster.

Minification and Bundling

So how does minification and bundling help these problems?

Minification - Looking at any JavaScript file, you will notice that there are lots of spaces, line feeds and comments in the file.  As a developer, these are good because this is what makes the file readable when we have to work with it.  But from the point of view of the JavaScript interpreter in the browser, this information is extraneous.  It doesn't care if there is one space or eight, so long as it can parse the file.  So by removing these bytes from the file, we can shrink the size of the file, which means fewer bytes have to be sent over the wire from the server to the browser.  And these savings can be dramatic.  For example, the minified version of jQuery 2.1.3 is only 34 KB, whereas the non-minified version is over 87 KB (click on the links to see the differences between a minified and non-minified file).  And minification doesn't just apply to JavaScript files.  You can see dramatic savings in CSS as well.

Note that minification is not compression.  Compression is the process of encoding information so that statistical redundancies in the data are reduced and the data can be represented in a shorter, more concise format.  Minification is the process of removing extraneous information from the file.  So what we ultimately want to do is first minify our JavaScript and CSS files and then compress them using HTTP compression so that we minimize the number of bytes that need to be sent over the wire.

Bundling - Bundling is the process of concatenating multiple files together so they can be downloaded as a single file.  Let's say that your site makes use of three CSS files to define styles.  And we want to keep these as three separate files because this makes the editing and maintenance of these files easier.  But, this means that the browser now has to perform three separate downloads in order to get each of these files.  As we said above, we are going to incur additional network latency by having these as three separate downloads, and these three files will all count against our concurrent download limit, which may block the browser from starting to download other resources required by the page.

The answer here is bundling.  These three files can be concatenated together such that now only one file needs to be downloaded by the browser.  The total size of the file will be just the sum of the size of each individual file, but we save in terms of not having separate downloads for each one and by freeing up some of the browser's concurrent network connections to perform other work, like downloading other resources.

How Do I Accomplish This?

You could manually minify your JavaScript by using a tool like UglifyJS.  And many people do just that in their build process.  Bundling presents a bigger challenge though, because we really want separate files when we are developing, and then only to combine them at deployment or run time.   It is possible to build this into your deployment process with a number of scripts, but this is a hassle to maintain.

In ASP.NET 4.5, Microsoft introduced the Bundling and Minification API, which is well covered in this excellent article by Rick Anderson.  Bundling and minification allows you to define JavaScript and CSS files that should be bundled together in your code, and then at run time, it will automatically minify, concatenate and serve these files for your site.  The beauty of this is that it all happens transparently to you.  You work with files as you normally would in development, and then with some simple configuration, they are automatically optimized at run time.

However, many web projects exist today that are on older versions of Microsoft frameworks.  Yes, there are many MVC2, MVC3 and older WebForms apps out there.  And due to constrained IT budgets, higher priority projects and most of all time, it is not always possible to simply lift these projects up to the latest version of the framework.  So these new features are out of reach for your apps using older frameworks.  Or are they?

Enter Cassette

Cassette is a package created by Andrew Davey that brings the same bundling and minification functionality available in ASP.NET 4.5 to prior versions of ASP.NET.  While the syntax varies somewhat, the concept is the same.  We can define 'bundles' of either CSS or JavaScript files in our ASP.NET application, and at runtime, these files will be bundled together and minified, thereby increasing the performance of our web site.  Let's take a look at how this happens.

First, we want to add Cassette to our application.  This is most easily done with NuGet.  The package you want is called Cassette.AspNet.  You can install this package using the GUI:


Or you can use the Package Manager Console where you will see something like this when you install:

Once Cassette is installed, you will see a new file in the root directory of your ASP.NET project called CassetteConfiguration.cs.  You want to edit this file to configure the various bundles needed by your project.

Defining Bundles

The next step is to define how you want to bundle various files together for your site.  A bundle can be either a Stylesheet bundle or a Script bundle, but not both.  What you want to do is think about how your stylesheet and JavaScript files logically map to pages.  If you have multiple stylesheets that are included in your master page or master layout view, then a bundle that contains all of these files makes sense.  A similar bundle for all of your JavaScript files that are on your master page makes sense as well.  Then you may have additional bundles that represent scripts that are only present on a subset of your pages.

Bundles are defined in the CassetteConfiguration.cs file that was shown above.  There are a number of different ways to define a bundle, but I prefer to create a List of the files to be included in the bundle and then add that list to the BundleCollection object as shown below.
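The configuration follows this general shape.  This is a sketch rather than my exact file: the namespace, the individual file paths and the Google Maps bundle name are placeholders, while the MasterCss and MasterScripts bundle names match the ones used later in this post.

using System.Collections.Generic;
using Cassette;
using Cassette.Scripts;
using Cassette.Stylesheets;

namespace MyWebApp   // placeholder namespace
{
    public class CassetteBundleConfiguration : IConfiguration<BundleCollection>
    {
        public void Configure(BundleCollection bundles)
        {
            // All of the CSS files referenced by the master layout view
            var masterCssFiles = new List<string>
            {
                "Content/Site.css",
                "Content/themes/base/jquery.ui.all.css"
            };

            // All of the JavaScript files referenced by the master layout view
            var masterScriptFiles = new List<string>
            {
                "Scripts/jquery-1.7.1.js",
                "Scripts/jquery-ui-1.8.20.js"
            };

            // The first argument is the applicationRelativePath, which does not
            // have to be a real directory - here it is simply a meaningful name
            bundles.Add<StylesheetBundle>("MasterCss", masterCssFiles);
            bundles.Add<ScriptBundle>("MasterScripts", masterScriptFiles);

            // A bundle with a single file still gets minified and cached
            bundles.Add<ScriptBundle>("GoogleMaps", new[] { "Scripts/google-maps.js" });
        }
    }
}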


Here, I am defining three bundles.  The first contains all of the CSS files that are in my master layout view, the second all of the JavaScript I include in my master layout view and the third a single script used for working with Google Maps on one of the pages in my site.

You might ask, why would I create a bundle for a single file, because after all, the word bundle implies there should be multiple files.  What I am after here is minification of this JavaScript file.  By including this file in a bundle, Cassette will automatically minify the file for me at runtime.  So now, I can work with a readable file like normal in Visual Studio, but be assured that Cassette will take care of minification when the time comes.  Further, we will see in a bit that Cassette also adds a cache header for each bundle, which further improves performance.

When you are calling the bundles.Add() method, you will see intellisense in Visual Studio as follows:


The first argument is what Cassette calls applicationRelativePath.  Indeed, you can use this to point to a path in your application in order to include files in the bundle.  However, also note that it says this does not need to be a real directory path.  And that is what I am doing.  I am using this parameter to give a meaningful name to the bundle, a meaningful name we will use in a moment when we go to include the bundle in one of our web pages.

There are a number of other ways to create your bundles, like providing a subdirectory name and allowing Cassette to create a bundle of all of the files in that subdirectory.  I like this method though, where I explicitly define the files in a collection and then add them to the bundles, giving each bundle a meaningful name.  I think this makes it very easy for someone else to follow what I am doing.  Do know though, there are other options available, and these are covered in the Cassette Documentation.

Adding Bundles To Our Web Pages

In this article, I am working with an MVC3 project.  You can however use Cassette on MVC2 and WebForms projects.  The syntax will be slightly different, but the concepts are the same.

In my View, first you need to add a code block at the top that defines the stylesheet and script bundles you are going to use in the view.  Note, you can do this in both a master layout view and an individual page view (the example below is actually in my _Layout.cshtml file).
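That code block looks something like this (a sketch, not my exact layout code; the Bundles helper comes from Cassette's Cassette.Views namespace, and the bundle names are the ones defined in the configuration above):

@{
    // Tell Cassette which bundles this layout (and the views that use it) needs
    Bundles.Reference("MasterCss");
    Bundles.Reference("MasterScripts");
}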




Then, you call Bundles.RenderStylesheets() and Bundles.RenderScripts() at the points in your page where you want the stylesheet and script bundles to appear respectively.

For stylesheets, these go in the <head> tag.



And it is best for performance if your scripts are placed just before the closing body tag.
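Putting those calls in place, the relevant parts of a trimmed-down _Layout.cshtml look roughly like this (again a sketch, not my full layout file):

<!DOCTYPE html>
<html>
<head>
    <title>@ViewBag.Title</title>
    @* Emits the <link> tags for the referenced stylesheet bundles *@
    @Bundles.RenderStylesheets()
</head>
<body>
    @RenderBody()

    @* Emits the <script> tags for the referenced script bundles, just before the closing body tag *@
    @Bundles.RenderScripts()
</body>
</html>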

There is a way that you can have scripts render in different places in your HTML page, but I will save how to accomplish that for a later blog post.

Turning on Cassette

There is one last step, and that is to turn on Cassette.  For Cassette to bundle and minify your files, you need to set the debug flag to false (the debug attribute on the compilation element in your web.config).



If the debug flag is set to true (as it might be for building on your local machine), then Cassette will just output the links to the regular version of each CSS and JavaScript file.  This is useful for when you are debugging and might need to step through some JavaScript code.  But in your production environments, you would have debug set to false to get the full benefit of bundling and minification (among other things).

So how does this look when things are working?  Here is the view from the Network tab in the Chrome Developer Tools.


So we can see what Cassette is doing.  It is using an HTTP Handler to service the request, and this handler bundles and minifies all of the associated files for this bundle.  Again, the nice thing is this all happens transparently for us as developers.  Cassette puts the correct links in each of our pages.  All we have to do is set the bundles up correctly.

But Wait, There is More...

One of the things Cassette also does is add the appropriate caching headers to your bundles that are sent down to the browser.  This means once a browser has downloaded the bundle the first time, it will not have to download it again for another year.



So in the case above where we have bundles named MasterCss and MasterScripts containing the CSS and JavaScript files we use on every page in our site, these bundles will only be loaded once, not on every page a user navigates to while visiting our site.  And if they come back and visit our site again tomorrow or next week, again, these bundles will be cached locally in their browser and will not need to be reloaded.  By caching these assets on the browser after the first page load, we'll save significant bandwidth on every subsequent page load.

What happens though if we need to change one of these files?  Perhaps we find a bug in our JavaScript or need to change our stylesheets to support a new color scheme?  How does the browser know that a particular bundle has changed so that it should be downloaded again?

What Cassette does is calculate a SHA1 hash over the contents of the bundle, base 64 encode this hash and then embed this base 64 encoded hash value in the name of the bundle.  That is the long string in the name of the bundle in the screenshot in the previous section.

In this way, if the contents of any of the files in the bundle changes, the computed SHA1 hash will change and hence the name of the bundle will change.  When the browser sees this new bundle name, it will realize it does not have this version of the bundle cached and download the new bundle from the web server.  This makes sure that for any changes that you need to make to your files, the browser will always download the correct version, and saves us from manually having to version our files and manage this process within our web application.

Summary

Using Cassette, we get all of the benefits of bundling and minification in our ASP.NET projects on earlier versions of the framework.  We can continue to work with the unminified, separate versions of each file while developing the project, and Cassette will automatically optimize these at runtime.  Further, each bundle will include the appropriate cache headers such that the bundles contents will be cached in the browser.

From a performance standpoint, we achieve better performance through three major elements:
  • CSS and JavaScript files sent to the browser will be minified, thereby removing unnecessary information like spaces and comments and reducing the number of overall bytes that need to be sent down to the client.
  • We create bundles of CSS and JavaScript files that can contain multiple files concatenated together.  This reduces the total number of files the browser must download, which reduces the penalty associated with network latency that you have to pay for each individual file download.  Also, all browsers have a limit on the number of files they will concurrently download from a site, so bundling files together helps reduce queuing of files that must be downloaded by the browser.
  • By including an aggressive cache header, a bundle will only have to be downloaded once by a browser and then can be served from cache on all subsequent page views that need that bundle.  This again reduces the number of bytes that need to be sent down to the browser.  Further, in the case where a file does change in the bundle, it is already built into Cassette to calculate a new hash value and embed the value in the name such that the browser will automatically know that it needs to download a new version of the bundle.
The example I showed here was from the MVC Music Store application.  However, I have used Cassette in production on a large consumer website that received hundreds of thousands of hits every month, and it has worked flawlessly.  Of course, you want to test any new component like you normally would, but I can say from experience that I have had success with the package in some large scale environments.

We all know that many of these ASP.NET applications on older versions will still be around for a few more years.  So if you have one of these applications that you are responsible for, I urge you to use Cassette to get the advantages of bundling and minification to improve the performance of that application.