Monday, April 24, 2017

A Scary Incident at Twin Cities Code Camp

We had a scary incident at Twin Cities Code Camp this last weekend.

About 45 minutes into Scott Addie's talk on Reusable UI Components in ASP.NET Core, an attendee collapsed, fell out of his chair and hit the floor completely unconscious.  I was looking down at the time, and by the time I looked up, several other attendees were rushing to aid the gentleman who had passed out.

Fortunately, he came to in about 10 seconds, though he complained of being lightheaded.  Though he was able to talk, he clearly was not right, so another attendee called 911.  Within a few minutes, Campus Safety had arrived, followed by paramedics who took the gentleman to a local hospital to run further tests and keep him under observation.  During the 10-15 minutes from the time he collapsed until the time he was wheeled out on a stretcher, the man was conscious, coherent and able to talk.  But like I said above, he clearly was not right.  I do not have any more details on his present condition, but all of us at the event, and in the technology community in general, hope he makes a speedy recovery and is able to get whatever underlying issues may be going on addressed.

This was scary, terribly scary.  When I looked up and first realized what was happening, my first thought was that he may have had a heart attack.  I was scared for this gentleman and I felt very helpless at the time.  It was just a few minutes until Campus Safety arrived, but those few minutes ticked by very slowly.  In a moment all of us went from trying to learn something new to wondering what kind of medical help this individual needed and if that help would arrive in time.

It was scary too in that you think of your own health, and in my case, I realize it isn't nearly as good as it should be.  I'm in my mid-40s now, and health isn't something that can be taken for granted.  The last few years, I have been much less active than I used to be, and as a result, I've gained a lot of weight.  When your clothes don't quite fit right, or don't fit at all any more, that is telling you something.  And not just that I have become too sedentary.  It also speaks to my eating habits and how, in general, I am just not taking good care of my most important possession: my health.

I don't think I am alone in the technology community.  Like many of us, I sit in a chair all day.  I consume way too much soda while doing so.  And then I go home, eat, and sit in a different chair to code some more.  I know where too little activity and an unhealthy diet lead.  We all do.  Sometimes we need a wake-up call, though, in order to take some action.

What I am Going to Do About It

I had a lot of time to think about this driving back from the Twin Cities, and I decided that I have to change some things.  Now.  These are things that I am changing right away:

  1. I will make sure to get at least 30 minutes of activity a day.  Ideally, with summer coming, a lot of this time will be on my road bike, and then it can be a lot longer in duration.  But no matter what, every day, I need to get out and get some activity.  Tonight, that was a 30 minute walk around the neighborhood after dinner.  That is all it takes.  I'm not looking to be an Olympian, but we all need to be active every day, no exceptions.
  2. I will cut my soda intake in half.  Hate to say it, but soda is an addiction for me like cigarettes are for others.  Yes, I've tried to quit before, and that soda habit comes right back.  I wish I could quit cold turkey today, but I have to be realistic, and I want these goals to be achievable.  Cutting my intake in half will be major progress for me, and my weight and health will improve because of it.
  3. I will sign up for a Red Cross First Aid/CPR class and complete it by the end of summer. The last time I had first aid and CPR training was way back in college.  I hope this is training I never have to use.  But I didn't like that feeling of being helpless.   If a similar situation ever happens in the future, I want to be confident I know what to do and be able to do it.  
On June 1st, a little over a month from now, I'll post again on this topic with my progress, and I'll post again at the beginning of September.  Mostly that is to help me hold myself accountable.

What I Am Encouraging Everyone Else To Do


I would encourage you to set up your own health goals like I have above.  If you do, leave a comment here.  Or blog about them or tweet about them.  And then do so again in a month.  In the meantime, hold yourself accountable for them.  We can all do a 30-minute walk each and every day.  Yes, it means a little less time glued to our devices or a little less time hacking on our personal projects.  It's OK.  Our teams, our friends, and most importantly our families need us, all of us, to be healthy.  Besides, some time away from all of the screens we stare at all day might actually help us improve our creative processes.

On Saturday, everything appears to have turned out OK.  But it was a scary moment.  What we can do is learn from it and take action about it.  We spend a lot of time keeping our codebases healthy.  We need to pay just as much attention to keeping ourselves healthy.  I got that message loud and clear.



Sunday, April 23, 2017

JSON Functionality in SQL Server - Part 2

This is the second part of a series on JSON functionality in SQL Server.  This post will focus on accessing data in arrays that may be part of your JSON data.

You can find the other parts of the series here (I'll update as I go):

  1. Accessing scalar properties on JSON objects stored in SQL Server
  2. Accessing array data in JSON objects stored in SQL Server (this post)


Also, if you want to recreate the database I have here and try the samples out for yourself, you can find everything you need on GitHub at:



In the first part of this series, we had a simple table that contained weather data, and each row of the table contained a single JSON object with the weather observations.

The JSON object in the ObservationData column contains all scalar properties.  That is, if we want a value, we can just write a path expression and use the JSON_VALUE function to get the data element we want back.  But life is not always so simple.  What if, instead of a single object, we had an object that contained an array of values or an array of objects?  How would we interrogate the JSON data in our column to get access to those values?

Let's consider the following table.

CREATE TABLE DailyWeatherDataJson
(
    ObservationId     INT IDENTITY(1,1) NOT NULL,
    StationCode       VARCHAR(10)      NOT NULL,
    City              VARCHAR(30)      NOT NULL,
    State             VARCHAR(2)       NOT NULL,
    ObservationDate   DATE             NOT NULL,
    ObservationData   VARCHAR(max)     NOT NULL,
    CONSTRAINT PK_DailyWeatherDataJson
        PRIMARY KEY (ObservationId)
);

This is very similar to the table we had before with the following differences:

  • There will be one row in the table for each weather station per day
  • The ObservationData column will now hold a JSON object that contains an array of all of the observations for that day
  • Since our JSON will be larger, we have changed to a VARCHAR(MAX) data type to store our JSON
The JSON in the ObservationData column will look like this:


{
  "stationCode": "14819",
  "observationDate": "2016-07-15T00:00:00",
  "weatherObservations": [
    {
      "stationCode": "14819",
      "location": {
        "city": "Chicago",
        "state": "IL",
        "stationName": "CHICAGO MIDWAY INTL ARPT"
      },
      "observationDate": "20160715",
      "observationTime": "0053",
      "observationDateTime": "2016-07-15T00:53:00",
      "skyCondition": "FEW070",
      "visibility": 10,
      "dryBulbFarenheit": 72,
      "dryBulbCelsius": 22.2,
      "wetBulbFarenheit": 64,
      "wetBulbCelsius": 17.5,
      "dewpointFarenheit": 58,
      "dewpointCelsius": 14.4,
      "relativeHumidity": 62,
      "windSpeed": 7,
      "windDirection": "280",
      "stationPressure": 29.26,
      "seaLevelPressure": null,
      "recordType": " ",
      "hourlyPrecip": null,
      "altimeter": 29.92
    },
    {
      "stationCode": "14819",
      "location": {
        "city": "Chicago",
        "state": "IL",
        "stationName": "CHICAGO MIDWAY INTL ARPT"
      },
      "observationDate": "20160715",
      "observationTime": "0153",
      "observationDateTime": "2016-07-15T01:53:00",
      "skyCondition": "FEW070",
      "visibility": 10,
      "dryBulbFarenheit": 71,
      "dryBulbCelsius": 21.7,
      "wetBulbFarenheit": 63,
      "wetBulbCelsius": 17,
      "dewpointFarenheit": 57,
      "dewpointCelsius": 13.9,
      "relativeHumidity": 61,
      "windSpeed": 8,
      "windDirection": "280",
      "stationPressure": 29.26,
      "seaLevelPressure": null,
      "recordType": " ",
      "hourlyPrecip": null,
      "altimeter": 29.92
    },
    ...
  ]
}

The '...' is meant to represent that in almost all cases, we'll have more than two elements in the weatherObservations array, since most National Weather Service stations (where I downloaded this data from) take observations at least every hour, if not more often.

Accessing Individual Elements with the JSON_VALUE Function
If we know we want to access a value on an individual element of the array, we can do that using the JSON_VALUE function we saw in the last post.  The following SQL statement will get the temperature reading for the first element of the array that represents readings for Chicago on July 15, 2016.

SELECT 
        ObservationId,
        StationCode,
        City,
        State,
        ObservationDate,
        JSON_VALUE(ObservationData, '$.weatherObservations[0].dryBulbFarenheit') As Temperature,
        JSON_VALUE(ObservationData, '$.weatherObservations[0].observationDateTime') As ObservationTime,
        ObservationData
    FROM DailyWeatherDataJson
    WHERE City = 'Chicago'
    AND State = 'IL'
    AND ObservationDate = '2016-07-15';

And here are the results

This is useful, but we are still pretty limited.  We don't really want to go element by element through the array like this, and besides, we aren't even sure how many elements the array will have.  We need to be able to work with the array as an array, so let's look at a way to do that.

Accessing The Array With the JSON_QUERY Function
What I'm going to do here is jump ahead to the solution and then explain what is going on.  Our goal is to get back each element of the JSON array and basically turn it into a row in our result set, such that we can operate on the data like it was in a normal relational format.  So let's see the answer:


SELECT 
        ObservationId,
        StationCode,
        City,
        State,
        ObservationDate,        
        CONVERT(datetime, JSON_VALUE(Observations.[Value], '$.observationDateTime'), 126) As ObservationTime,
        JSON_VALUE(Observations.[Value], '$.dryBulbFarenheit') As Temperature,
        JSON_VALUE(Observations.[Value], '$.relativeHumidity') As Humidity,
        JSON_VALUE(Observations.[Value], '$.windDirection') As WindDirection,
        JSON_VALUE(Observations.[Value], '$.windSpeed') As WindSpeed
    FROM DailyWeatherDataJson d
    CROSS APPLY OPENJSON(JSON_QUERY(ObservationData, '$.weatherObservations')) Observations
    WHERE City = 'Chicago'
    AND State = 'IL'
    AND ObservationDate = '2016-07-15';

Let's start with the CROSS APPLY line near the bottom of the query, because this is where the magic starts happening.

The JSON_QUERY function lets us pull back a JSON array or object inside of another JSON object as a string, so we use this with the path expression '$.weatherObservations' to pull just the array of observations out of our larger JSON object.
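To see the distinction in isolation: JSON_VALUE returns a scalar, while JSON_QUERY returns an embedded array or object as JSON text.  Here is a minimal sketch against a T-SQL variable (the sample JSON is invented for illustration):

```sql
DECLARE @json NVARCHAR(MAX) = N'{"name": "midway", "readings": [72, 71, 70]}';

SELECT
    JSON_VALUE(@json, '$.name')     AS ScalarValue,   -- the string 'midway'
    JSON_QUERY(@json, '$.readings') AS ArrayAsText;   -- the text '[72, 71, 70]'

-- JSON_VALUE pointed at an array returns NULL (in the default lax mode),
-- which is why JSON_QUERY is the right tool for our weatherObservations array
SELECT JSON_VALUE(@json, '$.readings') AS ReturnsNull;
```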

Then, we use the OPENJSON table-valued function, which will parse the JSON result we have into a rowset.  And then the CROSS APPLY operator is used so we can join these results back to our main data set.
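Worth noting: when OPENJSON is used without a WITH clause, as it is here, the rowset it produces has three columns named key, value, and type, which is where the Observations.[Value] reference comes from.  A standalone sketch (again with made-up JSON):

```sql
DECLARE @json NVARCHAR(MAX) = N'[{"temp": 72}, {"temp": 71}]';

-- [key]   = the array index ('0', '1', ...)
-- [value] = the JSON text of that array element
-- [type]  = a type code (5 indicates a JSON object)
SELECT [key], [value], [type]
    FROM OPENJSON(@json);
```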

So now we basically have a row for each individual observation in the JSON array that was in our original JSON object.  And in each row, we have an individual JSON object that represents that individual observation.  That object is in the Value property of the Observations alias, and now we can access properties on that object using JSON_VALUE like we did before, which is exactly what the JSON_VALUE calls in the SELECT list are doing.  So what we have done is take each element out of the array, turn it into a row containing that element's JSON object, and then extract data out of that individual object like we saw in the last post.

Yes, there is a lot going on here.  But this is just a pattern.  When you have a JSON object that has an array in it, you just look up this pattern, plug in the appropriate pieces and apply it.  And what is really cool is that data down in a JSON array can now be interacted with just like it was stored as normal rows and columns in our database.  So we can apply filters in the form of a WHERE clause, join to values, use aggregate functions, whatever we need to do and are familiar with using SQL.

Putting a View Over the Top
I think most people would probably agree that they don't want to memorize that syntax or write it every time.  So the answer is just to put a view over the top.  The data is still stored as JSON, but you can write the traditional SQL we are all familiar with in order to interact with the data.
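As a sketch of what such a view might look like (the view name and column list here are my own choices, built from the CROSS APPLY query earlier in this post):

```sql
-- Flattens the weatherObservations array into one row per observation
CREATE VIEW WeatherObservations AS
SELECT
        d.ObservationId,
        d.StationCode,
        d.City,
        d.State,
        d.ObservationDate,
        CONVERT(datetime, JSON_VALUE(Observations.[Value], '$.observationDateTime'), 126) AS ObservationTime,
        JSON_VALUE(Observations.[Value], '$.dryBulbFarenheit') AS Temperature,
        JSON_VALUE(Observations.[Value], '$.relativeHumidity') AS Humidity,
        JSON_VALUE(Observations.[Value], '$.windDirection')    AS WindDirection,
        JSON_VALUE(Observations.[Value], '$.windSpeed')        AS WindSpeed
    FROM DailyWeatherDataJson d
    CROSS APPLY OPENJSON(JSON_QUERY(d.ObservationData, '$.weatherObservations')) Observations;
```

Now anyone querying the data just writes SELECT * FROM WeatherObservations WHERE City = 'Chicago' and never has to see the JSON plumbing.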

I want to come back to one of the themes I started the last post with.  A lot of times today, the choice seems to be presented to us that you are either in a relational world or a NoSQL world.  The two camps strongly disagree with each other, and if you do it the other way, someone thinks you are dead wrong.

What I take from this functionality in SQL Server is that really within one database, I can have a foot in both worlds.  If I want to act in a NoSQL way and just store JSON, sure, I can do that.  But if I need to interact with that data in traditional SQL ways, I can do that too.  So the question isn't either/or.  In fact, I don't even think there is a question.  The answer is I can do both, in the same place at the same time.  So whatever works best for my application, I can do, and I find that very exciting.







Saturday, April 15, 2017

JSON Functionality in SQL Server

SQL Server 2016 gives us the ability to work with JSON data directly in SQL Server, which is very useful.  Traditionally, we've thought of relational databases and NoSQL databases as distinct entities, but with databases like SQL Server implementing JSON functionality directly in the database engine, we can start to think about using relational and NoSQL concepts side by side, which opens up a lot of exciting possibilities.

There is enough to cover that I will split this topic into multiple blog posts.  This is post number one, where I will cover how to store JSON in your tables and how to interact with scalar values.  You can find the other parts of the series here (I'll update as I go):

  1. Accessing scalar properties on JSON objects stored in SQL Server (this post)
  2. Accessing array data in JSON objects stored in SQL Server


Also, if you want to recreate the database I have here and try the samples out for yourself, you can find everything you need on GitHub at:



So let's take a look at what we can do.

Storing JSON Data in a Table
SQL Server does not contain a dedicated data type for JSON data; you simply store your data in a VARCHAR (or NVARCHAR) column of the appropriate size.

Below is a table that I have defined that is going to hold some weather data.

CREATE TABLE WeatherDataJson
(
    ObservationId     INT IDENTITY(1,1)  NOT NULL,
    StationCode       VARCHAR(10)        NOT NULL,
    City              VARCHAR(30)        NOT NULL,
    State             VARCHAR(2)         NOT NULL,
    ObservationDate   DATETIME           NOT NULL,
    ObservationData   VARCHAR(4000)      NOT NULL,
    CONSTRAINT PK_WeatherDataJson
        PRIMARY KEY (ObservationId)
)

Our JSON will be stored in the column named ObservationData.  The data type for this field is simply VARCHAR(4000), which is sufficient to hold the JSON objects we'll be storing in it.  If you have larger objects, you can use a VARCHAR(MAX) field, and indeed we'll see an example of this later.
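Since the column is just a VARCHAR, nothing stops malformed data from landing in it.  If you want the database to enforce that the column always holds valid JSON, one option (an addition of my own here, not required) is a CHECK constraint using the ISJSON function:

```sql
-- ISJSON returns 1 when the string is valid JSON, so this constraint
-- rejects any row whose ObservationData does not parse
ALTER TABLE WeatherDataJson
    ADD CONSTRAINT CK_WeatherDataJson_ObservationData_IsJson
        CHECK (ISJSON(ObservationData) > 0);
```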

This is a sample of the JSON we'll be storing in this column.

{
  "stationCode": "04825",
  "location": {
    "city": "Appleton",
    "state": "WI",
    "stationName": "OUTAGAMIE CO RGNL AIRPORT"
  },
  "observationDate": "20160701",
  "observationTime": "1245",
  "observationDateTime": "2016-07-01T12:45:00",
  "skyCondition": "SCT045",
  "visibility": 10,
  "dryBulbFarenheit": 66,
  "dryBulbCelsius": 19,
  "wetBulbFarenheit": 55,
  "wetBulbCelsius": 12.6,
  "dewpointFarenheit": 45,
  "dewpointCelsius": 7,
  "relativeHumidity": 47,
  "windSpeed": 7,
  "windDirection": "360",
  "stationPressure": 29.13,
  "seaLevelPressure": null,
  "recordType": " ",
  "hourlyPrecip": null,
  "altimeter": 30.11
}

So we see we have a number of weather-related properties off of the main object and a nested object for the location.  Our data in this column may be for different cities, but it all has this same format, which sets us up for the next step, querying the data.

Querying Scalar Values in JSON Data
First, let's look at what we get back if we run a plain old SQL query.   Here is our initial query:

SELECT * 
    FROM WeatherDataJson
    WHERE City = 'Appleton'
        AND State = 'WI'
        AND ObservationDate > '2016-07-01'
        AND ObservationDate < '2016-07-02'

And our results:

We can clearly see there is JSON stored in the ObservationData column.  But now we want to do something with it.  Let's pull out the temperature and humidity readings from each observation and make those columns in our result set.

To do this, we use a new function in SQL Server called JSON_VALUE.  JSON_VALUE takes two arguments:

  • A JSON expression, which is typically the name of a column that contains JSON text, but could also be a T-SQL variable containing JSON.
  • A path expression, which describes how to navigate to the scalar value you want to extract out of the JSON.
Path expressions start with the $ sign and then use dot notation to navigate to the property that you want the value of.  So if I wanted to get the station name of where the measurement was made in our JSON above, I would use the expression '$.location.stationName'.  To get the temperature and humidity, we'll use the expressions '$.dryBulbFarenheit' and '$.relativeHumidity' respectively.

So now, we are going to rewrite our query to extract these fields out as columns like this.

SELECT 
        ObservationId,
        StationCode,
        City,
        State,
        ObservationDate,
        JSON_VALUE(ObservationData, '$.dryBulbFarenheit') As Temperature,
        JSON_VALUE(ObservationData, '$.relativeHumidity') As Humidity,
        ObservationData
    FROM WeatherDataJson
    WHERE City = 'Appleton'
        AND State = 'WI'
        AND ObservationDate > '2016-07-01'
        AND ObservationDate < '2016-07-02';

And here are our results.

We see that SQL Server used the JSON_VALUE function to dynamically pull those values out of our JSON so we can see those as columns in our result set.  And now we can interact with them like we would any other column in a table.  So we can keep this data stored as JSON, but interact with it in a relational manner when that makes sense, which is pretty cool.  Traditionally, we would have needed an ETL process that would have parsed this data out and placed each element in a column, but now we can do that on the fly.
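One caveat worth calling out: JSON_VALUE always returns text (nvarchar(4000)), even when the underlying JSON property is a number.  So if you want to compare or sort Temperature numerically, you should cast it.  A sketch:

```sql
-- Cast the extracted value so the >= comparison is numeric,
-- not a string comparison (where '8' would sort after '70')
SELECT
        ObservationId,
        CAST(JSON_VALUE(ObservationData, '$.dryBulbFarenheit') AS INT) AS Temperature
    FROM WeatherDataJson
    WHERE CAST(JSON_VALUE(ObservationData, '$.dryBulbFarenheit') AS INT) >= 70;
```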

The Power of Virtual (Computed) Columns
With our weather data, it would be pretty cool if I didn't have to write that syntax each time I queried the table but could just get a couple of these key fields like temperature and humidity.  By combining the JSON_VALUE function with SQL Server's computed columns feature, we can create a set of virtual columns on our table which will do just that, pulling the data out of the JSON each time so it appears as just another column on the table.  Here is what the syntax looks like.

ALTER TABLE WeatherDataJson
    ADD Temperature AS JSON_VALUE(ObservationData, '$.dryBulbFarenheit');

ALTER TABLE WeatherDataJson
    ADD Humidity AS JSON_VALUE(ObservationData, '$.relativeHumidity');

Now, if we enter our original SELECT * query against the table, we will see that these two columns just show up in our result set like any other columns.  And of course, we could also refer to them by name, either in a SELECT clause, WHERE clause, or JOIN condition.  They function just like any other columns, even though behind the scenes, SQL Server is parsing the JSON and retrieving these values for us in real time.

That is an important point if we want to use these columns in a WHERE clause or a JOIN condition.  We know that for a table of any size, if we query the table by a column that is not indexed, our performance is going to be very slow, because SQL Server has to read all of the rows of the table from disk to find the rows we are looking for.  Here, our results would probably be even worse if we tried to filter by one of these JSON virtual columns, because not only will SQL Server have to scan the whole table, it will also have to parse the JSON for each row.  Not good!

Indexing Computed Columns
However, SQL Server allows you to create an index over a computed (virtual) column.  When you do this, the values of the computed column are computed at index creation time and then persisted in the index, so they can be searched quickly like values of any other column.  In this way, by using the JSON_VALUE function in conjunction with a computed column and an index over that computed column, we can quickly search values in our JSON data like we would any other column in our table.  Let's look at an example.

Imagine we redefined our table from above to be about as simple as it could be: a surrogate key ObservationId for the primary key and a column to hold our JSON data, so that our CREATE TABLE statement now looks like this.

CREATE TABLE BasicWeatherDataJson
(
    ObservationId     INT IDENTITY(1,1)  NOT NULL,
    ObservationData   VARCHAR(4000)      NOT NULL,
    CONSTRAINT PK_BasicWeatherDataJson
        PRIMARY KEY (ObservationId)
)

This is about as simple as storing data can get.  All of our data is inside the ObservationData column.  This includes fields that we will probably want to look the data up by later, things like the observation time, the city, the state, and the station code.  We could write a query like this...


SELECT 
        ObservationId,
        JSON_VALUE(ObservationData, '$.stationCode') As StationCode,
        JSON_VALUE(ObservationData, '$.location.city') As City,  
        JSON_VALUE(ObservationData, '$.location.state') As State,
        CONVERT(datetime2(3), JSON_VALUE(ObservationData, '$.observationDateTime'), 126) As ObservationDate,  
        JSON_VALUE(ObservationData, '$.dryBulbFarenheit') As Temperature,
        JSON_VALUE(ObservationData, '$.relativeHumidity') As Humidity
    FROM BasicWeatherDataJson
    WHERE
        JSON_VALUE(ObservationData, '$.location.city') = 'Appleton'
        AND JSON_VALUE(ObservationData, '$.location.state') = 'WI'
        AND CONVERT(datetime2(3), JSON_VALUE(ObservationData, '$.observationDateTime'), 126) > '2016-07-01'
        AND CONVERT(datetime2(3), JSON_VALUE(ObservationData, '$.observationDateTime'), 126) < '2016-07-02'

Not exactly the paragon of simplicity.  Further, the execution plan confirms that this query is doing a full table scan to go and find the data, so it is both slow and resource intensive.


So first we are going to create some virtual columns: one, to make the table easier to work with, and two, so that we can create an index in the next step.

ALTER TABLE BasicWeatherDataJson
    ADD StationCode AS JSON_VALUE(ObservationData, '$.stationCode');

ALTER TABLE BasicWeatherDataJson
    ADD City AS JSON_VALUE(ObservationData, '$.location.city');
     
ALTER TABLE BasicWeatherDataJson
    ADD State AS JSON_VALUE(ObservationData, '$.location.state');

ALTER TABLE BasicWeatherDataJson
    ADD ObservationDate AS CONVERT(datetime2(3), JSON_VALUE(ObservationData, '$.observationDateTime'), 126);

ALTER TABLE BasicWeatherDataJson
    ADD Temperature AS JSON_VALUE(ObservationData, '$.dryBulbFarenheit');

ALTER TABLE BasicWeatherDataJson
    ADD Humidity AS JSON_VALUE(ObservationData, '$.relativeHumidity');

And now for the indexes.

CREATE INDEX IX_BasicWeatherJson_ObservationDate_StationCode
    ON BasicWeatherDataJson (ObservationDate, StationCode);

CREATE INDEX IX_BasicWeatherJson_ObservationDate_State_City
    ON BasicWeatherDataJson (ObservationDate, State, City);

When you create these indexes, you will get a warning that you may be exceeding the maximum non-clustered index key size, because theoretically you could be pulling a very large string out of the JSON.  If you have larger strings, you want to make sure that these will fit under the 1700 byte limit for the key size of an index, and in our case, we are well below that, so we are fine to continue.
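If that warning concerns you, one way to bound the key size is to wrap JSON_VALUE in an explicit CAST when defining the computed column, so the index key has a known maximum width instead of the nvarchar(4000) that JSON_VALUE returns.  For example, we could have defined the city column this way (CityBounded is a hypothetical name of mine, chosen to avoid clashing with the City column we already added):

```sql
-- The CAST caps the computed value at 30 bytes, giving the index
-- a known, small key width
ALTER TABLE BasicWeatherDataJson
    ADD CityBounded AS CAST(JSON_VALUE(ObservationData, '$.location.city') AS VARCHAR(30));
```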

So now, we can rewrite our query so it looks like this:

SELECT 
        ObservationId,
        StationCode,
        City,  
        State,
        ObservationDate,  
        Temperature,
        Humidity
    FROM BasicWeatherDataJson
    WHERE
        City = 'Appleton'
        AND State = 'WI'
        AND ObservationDate > '2016-07-01'
        AND ObservationDate < '2016-07-02';

This is much simpler and performs much better, as now the query is using one of our indexes.  So again, pretty cool that we can not just reach into the JSON to grab the values out, but we can also index any key properties we have so we can quickly search through the data that we have.

Summary
So there are the basics of storing JSON in SQL Server and interacting with scalar values.  In part 2, we'll take a look at interacting with arrays that may be stored in your JSON data.

Friday, April 14, 2017

SQL Server Temporal Tables

One of the newest and most useful features of SQL Server is Temporal Tables, which are available in SQL Server 2016 and SQL Azure.  In a nutshell, temporal tables give you a simple way to capture all of the changes that are made to rows in a table.  Of course you could do this in prior versions of SQL Server by defining your own history table and trigger, but now this functionality is built directly into SQL Server.

Why is this useful?  Have you ever had to go back and audit when a piece of data changed in a table?  Maybe you have needed to see the history of all of the changes to a certain record in the table?  Or needed to recreate the table as it was at a certain point in time?  If any of these apply, then temporal tables can be a big help.

Defining a Temporal Table
The syntax for defining a temporal table is straightforward and shown below.


 CREATE TABLE Contacts
 (
     ContactID        INT IDENTITY(1,1)      NOT NULL,
     FirstName        VARCHAR(30)            NOT NULL,
     LastName         VARCHAR(30)            NOT NULL,
     CompanyName      VARCHAR(30)            NULL,
     PhoneNumber      VARCHAR(20)            NULL,
     Email            VARCHAR(50)            NULL,
     ValidFrom        DATETIME2(3) GENERATED ALWAYS AS ROW START,
     ValidTo          DATETIME2(3) GENERATED ALWAYS AS ROW END,
     PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo), 
     CONSTRAINT PK_Contacts PRIMARY KEY (ContactId)
 )
 WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ContactsHistory));

The ContactID through Email columns at the top are our normal column definitions for the table, in this case, a simple table to manage contact information.

We then need to define two columns that represent the start and end times during which the row is considered active; these are the ValidFrom and ValidTo columns.  You can use any names for these columns you want; it's just a good idea to make sure the names convey what these columns represent, the time period the row is active.  These columns must have a DATETIME2 data type, and in this case I am using DATETIME2(3) so we will store these values down to millisecond precision.  Finally, you need to include the text GENERATED ALWAYS AS ROW START and GENERATED ALWAYS AS ROW END for your respective start and stop columns.  This signals to SQL Server to automatically populate these columns with the UTC value of the system time when a row is inserted, updated, or deleted.

The PERIOD FOR SYSTEM_TIME clause indicates that our ValidFrom and ValidTo columns represent a period in time.  Finally, in the WITH clause at the end of our table definition, we tell SQL Server to turn system versioning of this table on and give it the name of the history table that we want to use.

If you don't want users to see the ValidFrom and ValidTo columns, you can hide these columns like so:

ALTER TABLE Contacts   
   ALTER COLUMN ValidFrom ADD HIDDEN;   

ALTER TABLE Contacts   
   ALTER COLUMN ValidTo ADD HIDDEN; 

If you change your mind and want to have them show up again, then simply do this.

ALTER TABLE Contacts   
   ALTER COLUMN ValidFrom DROP HIDDEN;   

ALTER TABLE Contacts   
   ALTER COLUMN ValidTo DROP HIDDEN;  
That just drops the hidden attribute, not the column itself.

When you execute this statement, SQL Server will create both this table and the specified history table.  Where things get interesting is when you start inserting and updating data in your table.

Working With Temporal Tables
You insert, update and delete data in your temporal table just like you would any other table.  The only thing you need to know is that you do not specify values for the ValidFrom or ValidTo columns.  SQL Server will take care of populating these columns for you.

So what happens when we perform DML operations against our temporal table?


  • INSERT: The row is inserted into the primary table (in this case, Contacts).  The ValidFrom column is populated with the current system time in UTC, and the ValidTo column is populated with '9999-12-31 23:59:59.999' (the system max time).  No entry is made in the history table.

  • UPDATE: The existing row is moved from the primary table to the history table, and its ValidTo time is populated with the current system time in UTC.  The existing row is then replaced by the new row in the primary table.  The new row will have a ValidFrom of the current system time in UTC, and its ValidTo will be the system max time.

  • DELETE: The existing row is moved from the primary table to the history table with its ValidTo time populated as the current system time in UTC.  No row is present in the primary table, since the row has been deleted.
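To make that concrete, here is a short sequence against our Contacts table (the sample values are illustrative).  Note that we never mention ValidFrom or ValidTo; SQL Server maintains them for us:

```sql
-- INSERT: the row lands in Contacts with ValidFrom = now (UTC)
-- and ValidTo = the system max time; nothing goes to ContactsHistory
INSERT INTO Contacts (FirstName, LastName, CompanyName, PhoneNumber, Email)
    VALUES ('Charlie', 'Brown', NULL, '555-0100', 'charlie@example.com');

-- UPDATE: the old version of the row moves to ContactsHistory with its
-- ValidTo stamped as now; the new version replaces it in Contacts
UPDATE Contacts
    SET PhoneNumber = '555-0199'
    WHERE FirstName = 'Charlie' AND LastName = 'Brown';

-- DELETE: the last version moves to ContactsHistory (ValidTo = now),
-- and the row disappears from Contacts
DELETE FROM Contacts
    WHERE FirstName = 'Charlie' AND LastName = 'Brown';
```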


Querying Data
SQL Server also gives us some new constructs to query data out of our temporal tables and see what has changed in time or what the state of the data was at any point in time.  What is nice about these constructs is that we don't have to query the primary table and the history table individually and then union them together in order to get a complete view of our data over time.  SQL Server does this for us behind the scenes, leaving us with a much simpler query syntax.  So lets take a look.

First of all, we can query the table normally which will show us all of the rows that are currently active.

-- Plain old SQL
SELECT *
    FROM Contacts;

No surprises here in terms of the data that comes back, just the rows that are currently in the primary table.


We can see these rows were inserted at different times over a few days, and we see that all of the ValidTo dates are set to the system max time.  But more has been going on with this table, so let's check that out.

Show Complete History of a Table
If you want to look at every instance of every row that has ever been in a table, you can use the FOR SYSTEM_TIME ALL clause immediately after your table name in your FROM clause like this:

-- Gets all records
SELECT *
    FROM Contacts
        FOR SYSTEM_TIME
            ALL;

Now let's take a look at these results.

So now we see that there have indeed been some changes to the data in our table over time.  We see the 5 active rows that we saw before, plus some additional rows.  This view shows that two contacts were deleted: contact number 4, Garfield Arbuckle, and contact number 6, Odie Arbuckle.  In addition, we see that the record for contact number 1, Charlie Brown, has changed; specifically, it looks like his phone number was updated.  And by looking at the ValidTo date, we can tell when each of these events occurred.


What Did the Data Look Like at a Point In Time?
We can use the FOR SYSTEM_TIME AS OF clause to view the data in a table as it appeared at a point in time, as follows.

SELECT *
    FROM Contacts
        FOR SYSTEM_TIME
            AS OF '2017-04-12 14:00:00.0000000';

This query will show the table as it was on April 12, 2017 at 14:00 UTC.  Remember when specifying this time that you need to use UTC time, as that is what is stored in the ValidFrom and ValidTo columns, not your system local time.

It should also be pointed out that we could very well attach a WHERE clause to this query to get just a subset of rows or even a single row.  In our case though, here are the results that we get.

This shows us that at this time, records for Sally Brown, Franklin Armstrong and Odie Arbuckle had not even been added yet and that the record for Garfield Arbuckle was still active.  This can be extremely useful when you are trying to recreate a report that was run at a certain time, because you can see exactly what the data was in the table at that instant in time.
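As mentioned above, a WHERE clause combines naturally with AS OF; a point-in-time lookup of a single contact might look like this (the column name is an assumption):

```sql
-- State of one contact as of a specific UTC instant
SELECT *
    FROM Contacts
        FOR SYSTEM_TIME
            AS OF '2017-04-12 14:00:00.0000000'
    WHERE ContactId = 1;
```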

Querying All Records Active for a Period of Time
Finally, we have some new syntax which will show us any record that was active during a given time period.

  • FOR SYSTEM_TIME FROM <start date> TO <end date> - Shows all rows that were active during this time period, but excludes rows that became active on the boundary
  • FOR SYSTEM_TIME BETWEEN <start date> AND <end date> - Shows all rows active in this time period, including rows that became active on the upper boundary of the time period
  • FOR SYSTEM_TIME CONTAINED IN (<start date>, <end date>) - Shows only the rows that both opened and closed during the specified time period
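To illustrate, here are the three forms side by side against our Contacts table (the date range is arbitrary):

```sql
-- Rows active at any point in the window, excluding rows opened exactly at the upper boundary
SELECT * FROM Contacts
    FOR SYSTEM_TIME FROM '2017-04-01' TO '2017-04-12';

-- Same window, but includes rows that became active exactly at the upper boundary
SELECT * FROM Contacts
    FOR SYSTEM_TIME BETWEEN '2017-04-01' AND '2017-04-12';

-- Only rows that both opened and closed within the window
SELECT * FROM Contacts
    FOR SYSTEM_TIME CONTAINED IN ('2017-04-01', '2017-04-12');
```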

Summary
I've found temporal tables to be one of the best additions to SQL Server.  They are super easy to use, and being able to go back and look at all of the changes to your table is extremely helpful.  They are so easy to use that I find almost no reason not to use them.  So next time you need to create a table, think about making it a temporal table.  The first time you need to look up what has changed about a record, you will be glad that you did.


Further Reading
Official Microsoft Documentation
https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables


Wednesday, November 9, 2016

Why I Avoid Using Hints in my Database Queries

For those who have watched my Pluralsight courses on either Oracle or SQL Server performance tuning, you will notice that I don't talk about database hints in either course.  This is intentional on my part.  I tend not to use hints in any of my production SQL statements and overall I only use hints in very limited cases.

I made the decision not to cover hints in my courses because too often, I have seen hints used incorrectly, often by someone who didn't understand what the hint was doing.  I've seen cases where someone read about hints "on the Internet" and thought that by dropping a hint into their SQL statement, it would act as some sort of magical performance booster.  Unfortunately, there is no magic going on, and like any technology or technique, the result can actually be a worse outcome if you don't understand what is happening.

So let's talk a little bit about hints.

The Query Optimization Process
When you submit a SQL statement to any database, a piece of software called the Query Optimizer parses the SQL statement and determines the fastest way to process that statement.  It is the query optimizer that determines if an index can be used or a table scan should be performed.  It also determines how to perform joins between tables and in what order the tables should be joined if you have multiple join conditions.

To determine the most efficient way to process your statement, the optimizer looks at the statistics of the tables involved in the statement, including the total number of rows and the distribution of those rows in the table.  It also looks at the indexes on the table and the number of unique keys in the index and matches all of this data up with the where clauses and join criteria you have specified in your statement.  Using these statistics, the optimizer can estimate the cost of all of the different ways to perform the SQL statement, and it will pick the lowest cost combination of these operations.

All of this usually happens in 100 milliseconds or less.  And optimizers today are really, really good at picking the right execution plan that will result in the fastest way to execute your statement.

What Does a Hint Do
When you supply a database hint, you are taking control of how the statement will execute and taking this control away from the query optimizer.  The problem is that the best way to execute a statement may change over time.  Maybe a table gets more data or the distribution of the data in a table changes.  Under normal circumstances, the query optimizer can adjust to these changes and come up with a new plan which is most efficient for the current data set.

When you provide a hint, you are taking this flexibility away from the optimizer.  As things change in your database, the optimizer cannot adjust, so you end up with a sub-optimal plan, many times much less efficient than the plan the query optimizer would have come up with on its own.  The problem is that while you can figure out the right hint for the way the data looks today, you have no way of knowing what the data will look like tomorrow.  Yet by providing a hint, you are committing to a specific execution plan, even if that plan is wildly inefficient for tomorrow's problem.
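As a concrete illustration, here is what an index hint looks like in T-SQL (the table, index and column names are hypothetical).  Once this hint is in place, the optimizer must use the named index, even when a different access path would be cheaper for tomorrow's data:

```sql
-- Forces the optimizer to use IX_Contacts_LastName, regardless of current statistics
SELECT *
    FROM Contacts WITH (INDEX (IX_Contacts_LastName))
    WHERE LastName = 'Brown';
```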

Is There Ever a Time to Use Hints
Sometimes I am surprised when I get a certain execution plan.  For example, I may have expected the optimizer to use a different index than it did.  So in my SQL editor, I'll use a hint to look at that different version of the plan.  In almost every case, the plan produced by the optimizer is less expensive than the plan with the hint.  It's not that I don't trust the optimizer, but sometimes being able to contrast the two execution plans helps me understand why the optimizer is making the decisions it is.

The other time to use a hint would be if you were instructed to do so by Oracle or Microsoft technical support.  I've never had to do this, and I'm guessing these instances are very few and very far between.

Summary
I really feel like it is best to avoid using hints in any production SQL statements your application might run.  At first, they may seem magical.  But the optimizers included in today's database products are really, really good.  When you start including hints in your statement, you are taking away the optimizer's ability to find the best plan for the current state of the table.  This almost always results in a less efficient plan and therefore a slower running SQL statement.  So I encourage you to avoid using hints in any production SQL you might run.

Monday, November 7, 2016

MKE DOT NET 2016

On Saturday, October 29th, I attended MKE DOT NET in Pewaukee, WI. This is a one day conference focused around .NET organized by Centare. I was fortunate enough to be accepted to speak at the event, which gave me a chance to share my knowledge as well as attend and learn from others.

One of the big draws of this event is the speaker lineup. In the two years the event has existed, Centare has used their connections in the .NET community to get some high profile speakers for the event. In 2015 (which I did not attend) they had Uncle Bob Martin give the keynote. This year, the keynote was delivered by Scott Hanselman of Microsoft. In addition, Stephen Cleary and Ben Day were also speaking at this year's event. However, there was also a strong contingent of speakers from the Milwaukee and Madison areas, so there was a nice balance of well known names and local practitioners. This balance of different perspectives was one of the first things that I really liked about the event, and I hope it continues in future years. Below, I'll give a brief review of the talks I attended, my impressions and my takeaways.

 Keynote - Scott Hanselman
The morning started with an hour-long talk by Scott Hanselman discussing how and why Microsoft is evolving .NET to the .NET Core platform.  We read a lot about .NET Core these days, but very little is written about the reasons why the change is being made.  Mainly, Microsoft wants a much smaller package so that someone can get started coding .NET without having to download a 2 GB package containing Visual Studio and all the other tooling that is necessary to code in .NET today.  He also talked about how, while the .NET Core runtime has been released, a lot of the tooling is still in beta.  This is useful for knowing where the state of everything is as you decide what version to use for new projects.

The one piece of advice I really liked is when he said "know what level of abstraction you are working at and make sure to know how things work one level deeper".  This is great advice, and something I have always believed in.  I think this is a great way to put it, and it sums things up nicely.  We always want to know what our tools and frameworks are doing behind the scenes, and this statement encapsulates that idea well.

Session #2 - Ben Day - Claims Based Security
I was speaking in the first slot, so the first session I attended was Ben Day's session on claims-based security.  I think he did a really good job of summarizing why role-based authentication is really too simplistic a model in today's world.  He described claims-based security, and how most systems will first check if a user is in a role and then check individual claims.  I've started to see more systems move to a claims-based model, and this talk provided a good overview of how this is implemented in .NET using the identity provider.


Session #3 - Josh Schultz - Working With Humans
This was a talk about how to work with people, and I found a lot of the advice very practical and useful.  I loved the suggestion of having lunch with a coworker you may not get along with very well and talking about anything outside of work.  A lot of his advice was very practical, like considering other perspectives and not making assumptions about what others are thinking.  All things we know, but that we should constantly be reminded of.  I also liked that he stressed that there is a really important aspect to development which is understanding the problem you are trying to solve, and asking lots of questions to make sure you do.  This was a really good talk.  While Josh claimed he didn't have all the answers, he had a lot of good advice in his presentation.

Session #4 - Andrew Bloechl - Building SaaS Products in the Cloud
This talk was really more like a retrospective of an effort to move one of their applications to the cloud.  He went through the decisions that they made, why they made those decisions and some of the decisions that they later reversed.  I really liked this talk because it wasn't the typical overview of a cloud service.  It was real world experience of the ups and downs.  

For example, they encountered problems where the system was very slow for the first user because Azure services would all shut down after 20 minutes of non-use, meaning the whole stack had to be fired up again when the next user hit the site.  There is now a feature where you can keep everything in an 'always up' state, but that isn't something you typically think about from the start.  They also had issues where people created virtual machines, forgot about them, left them running and got a big bill at the end of the month.  Overall, they did really like the cloud solution and the flexibility it brought, but it was good to hear about some of the challenges they had to overcome.  So often these solutions are presented in such a blue sky manner that it's great to see talks like this, which remind us there is a learning curve and can help the rest of us get up to speed on that curve.

Session #5 - Samrat Saha - Azure Functions
This was an interesting talk on Azure Functions, which let you write a function to run in Azure without worrying at all about how the function is hosted or set up from an infrastructure perspective.  The example the speaker used was a series of functions that could respond to text messages.  He talked about how you could chain these functions together using message queues so that you could keep each function small and concise, which is the point.  He also talked about some of the challenges they had in sharing code between different functions and how they solved that by coming up with a set of shared libraries.  Azure Functions is something I want to look at more, so this provided a good introduction to the service.


Other Details
The event was extremely well organized.  When checking in, everyone received a nicely printed program listing all of the talks and what room they were in.  In addition, all of this information was posted on the outside of each room.  Each room had plenty of seating, and there was a water station set up in the back.

When I went to my room to set up my presentation, they had multiple video adapters at the podium so you could connect your laptop to the projector.  I carry my own adapters, but this is really thoughtful just in case someone forgot to bring theirs.  There was a staff member in each room to help with any setup issues as well.  Finally, each room was equipped with a wireless mic.  From a speaker's perspective, the room setup was one of the best that I have seen.

Both breakfast and lunch were provided at the event.  In addition, they had a social event afterwards where everyone received two drink tickets plus a nice commemorative glass from the event.  Way better than a t-shirt, which all of us already have zillions of.  Now I have a nice glass that brings back good memories any time I use it.

The cost to attend was $99.  While this might be a deterrent to some, I think the event was well worth it: first-rate speakers, a nice facility, meals and excellent organization.  I don't have any connection to Centare other than speaking at this conference, but they are really to be commended for how well the event was put together.  If you are in the upper Midwest next fall, be sure to watch for when the event is announced, because I really think this event is worth attending.