Sunday, April 23, 2017

JSON Functionality in SQL Server - Part 2

This is the second part of a series on JSON functionality in SQL Server.  This post will focus on accessing data in arrays that may be part of your JSON data.

You can find the other parts of the series here (I'll update as I go)

  1. Accessing scalar properties on JSON objects stored in SQL Server
  2. Accessing array data in JSON objects stored in SQL Server (this post)


Also, if you want to recreate the database I have here and try the samples out for yourself, you can find everything you need on Github ar:



In the first part of this series, we had a simple table that contained weather data, and each row of the table contained a single JSON object with the weather observations.

The JSON object in the ObservationData column contains all scalar properties.  That is, if we want a value, we can just write a path expression and use the JSON_VALUE function to get the data element we want back.  But life is not always so simple.  What if instead of a single object, we had an object that contained an array of values or an array of objects.  How would we interrogate the JSON data in our column to get access to those values?

Lets consider the following table.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
CREATE TABLE DailyWeatherDataJson
(
    ObservationId int IDENTITY(1,1)    NOT NULL,
    StationCode       VARCHAR(10)      NOT NULL,
    City              VARCHAR(30)      NOT NULL,
    State             VARCHAR(2)       NOT NULL,
    ObservationDate   DATE             NOT NULL,
    ObservationData   VARCHAR(max)     NOT NULL,
    CONSTRAINT PK_DailyWeatherDataJson
        PRIMARY KEY (ObservationId)
);

This is very similar to the table we had before with the following differences:

  • There will be one row in the table for each weather station per day
  • The ObservationData column will now hold a JSON object that contains an array of all of the observations for that day
  • Since our JSON will be larger, we have changed to a VARCHAR(MAX) data type to store our JSON
The JSON in the ObservationData column will look like this:


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
{
  "stationCode": "14819",
  "observationDate": "2016-07-15T00:00:00",
  "weatherObservations": [
    {
      "stationCode": "14819",
      "location": {
        "city": "Chicago",
        "state": "IL",
        "stationName": "CHICAGO MIDWAY INTL ARPT"
      },
      "observationDate": "20160715",
      "observationTime": "0053",
      "observationDateTime": "2016-07-15T00:53:00",
      "skyCondition": "FEW070",
      "visibility": 10,
      "dryBulbFarenheit": 72,
      "dryBulbCelsius": 22.2,
      "wetBulbFarenheit": 64,
      "wetBulbCelsius": 17.5,
      "dewpointFarenheit": 58,
      "dewpointCelsius": 14.4,
      "relativeHumidity": 62,
      "windSpeed": 7,
      "windDirection": "280",
      "stationPressure": 29.26,
      "seaLevelPressure": null,
      "recordType": " ",
      "hourlyPrecip": null,
      "altimeter": 29.92
    },
    {
      "stationCode": "14819",
      "location": {
        "city": "Chicago",
        "state": "IL",
        "stationName": "CHICAGO MIDWAY INTL ARPT"
      },
      "observationDate": "20160715",
      "observationTime": "0153",
      "observationDateTime": "2016-07-15T01:53:00",
      "skyCondition": "FEW070",
      "visibility": 10,
      "dryBulbFarenheit": 71,
      "dryBulbCelsius": 21.7,
      "wetBulbFarenheit": 63,
      "wetBulbCelsius": 17,
      "dewpointFarenheit": 57,
      "dewpointCelsius": 13.9,
      "relativeHumidity": 61,
      "windSpeed": 8,
      "windDirection": "280",
      "stationPressure": 29.26,
      "seaLevelPressure": null,
      "recordType": " ",
      "hourlyPrecip": null,
      "altimeter": 29.92
    },
    ...
  ]
}

The '...' is meant to represent that in almost all cases, we'll have more than two elements in the weatherObservations array, since most National Weather Service stations (where I downloaded this data from) take observations at least every hour, if not more often.

Accessing Individual Elements with the JSON_VALUE Function
If we know we want to access a value on an individual element of the array, we can do that using the JSON_VALUE function we saw in the last post.  The following SQL statement will get the temperature reading for the first element of the array that represents readings for Chicago on July 15, 2016.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
SELECT 
        ObservationId,
        StationCode,
        City,
        State,
        ObservationDate,
        JSON_VALUE(ObservationData, '$.weatherObservations[0].dryBulbFarenheit') As Temperature,
        JSON_VALUE(ObservationData, '$.weatherObservations[0].observationDateTime') As ObservationTime,
        ObservationData
    FROM DailyWeatherDataJson
    WHERE City = 'Chicago'
    AND State = 'IL'
    AND ObservationDate = '2016-07-15';

And here are the results

This is useful, but we are still pretty limited.  We don't really want to go element by element through the array like this, and besides, we aren't even sure how many elements the array will have.  We need to be able to work with the array as an array, so lets look at a way to do that.

Accessing The Array With the JSON_QUERY Function
What I'm going to do here is jump ahead to the solution and then explain what is going on.  Our goal is to get back each element of the JSON array and basically turn it into a row in our result set such that we can operate on the data like it was in a normal relational format.  So lets see the answer:


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
SELECT 
        ObservationId,
        StationCode,
        City,
        State,
        ObservationDate,        
        CONVERT(datetime, JSON_VALUE(Observations.[Value], '$.observationDateTime'), 126) As ObservationTime,
        JSON_VALUE(Observations.[Value], '$.dryBulbFarenheit') As Temperature,
        JSON_VALUE(Observations.[Value], '$.relativeHumidity') As Humidity,
        JSON_VALUE(Observations.[Value], '$.windDirection') As WindDIrection,
        JSON_VALUE(Observations.[Value], '$.windSpeed') As WindSpeed
    FROM DailyWeatherDataJson d
    CROSS APPLY OPENJSON(JSON_QUERY(ObservationData, '$.weatherObservations')) Observations
    WHERE City = 'Chicago'
    AND State = 'IL'
    AND ObservationDate = ('2016-07-15');

Lets start on line 13, because this is where the magic starts happening.

The JSON_QUERY function lets us pull back a JSON array or object inside of another JSON object as a String, so we use this with the path expression '$.weatherObservations' to pull just the array of observations out of our larger JSON object.

Then, we use the OPENJSON table valued function which will parse the JSON result we have into a rowset.  And then the CROSS APPLY operator is used so we can join these results back to our main data set.

So now we basically have a row for each individual observation in the JSON array that was in our original JSON object.  And in each row, we have an individual JSON object that represents that individual observation.  That object is in the Value property of the Observations alias, and now we can access properties on that object using JSON_VALUE like we did before, and we see this on lines 7 through 11.  So basically what we have done is get all of the array elements out of the array and turn them into a JSON object that we have a row for, and now, we can extract data out of that individual JSON object like we saw in the last post.

Yes, there is a lot going on here.  But this is just a pattern.  When you have a JSON object that has an array in it, you just look up this pattern, plug in the appropriate pieces and apply it.  And what is really cool is that now data that is down in a JSON array, we can interact with just like it was stored as normal rows and columns in our database.  So we can apply filters in the form of a WHERE clause, join to values, use aggregate functions, whatever we need to do and are familiar with using SQL.

Putting a View Over the Top
I think most people would probably agree, they don't want to have to memorize that syntax or write it every time.  So the answer is just to put a view over the top.  The data is still stored as JSON, but you can write traditional SQL that we are all familiar with in order to interact with the data.

I want to come back to one of the themes that I started the last post off with.  A lot of times today, the choice sees to be presented to us that you are either in a relational world or a NoSQL world.  The two camps strongly disagree with each other and if you do it the other way, someone thinks you are dead wrong.

What I take from this functionality in SQL Server is that really within one database, I can have a foot in both worlds.  If I want to act in a NoSQL way and just store JSON, sure, I can do that.  But if I need to interact with that data in traditional SQL ways, I can do that too.  So the question isn't either/or.  In fact, I don't even think there is a question.  The answer is I can do both, in the same place at the same time.  So whatever works best for my application, I can do, and I find that very exciting.







No comments:

Post a Comment