being silly with spatial data

Okay, a fun post with no educational value (sort of). My Adopted Twin, also a Database geek, has been under the weather recently so to cheer her up I have been playing with spatial data, under her orders to create this game. After reading Purple Frogs spatial data blog (http://www.purplefrogsystems.com/blog/2011/05/sql-server-art-using-spatial-data/) I’ve created a simple game for her.

It is basically Rock, Paper, Scissors however I added the “famous” Big Bang Theory version ‘Lizzard’, ‘Spock’ to it as well ( and just when you thought it couldn’t get geekier).

So the break down of it is a random number generator that creates a number between 0 and 2 (or 4 with RPSLS) inclusive and then a vector line art pattern based on the selected number is created in to the Table variable and finally the selected picture is shown in the spatial tab.

RockPaperScissors

(created as a doc file but should be opened up in SQL Server)

Enjoy

Posted in spatial data, SQL | 2 Comments

Schema validation using Reg Ex

One of the most powerful tools that XML in SQL Server brings to the table is Regular expresions. In XML Schemas Regular Expressions are an easy method to validate your data without having to bolt on complex CLR functions to you database.

Consider a UK Postcode they are constructed in a known pattern.

MK13 3XZ

The first part of a post code is either 1 or 2 letters in the range of a to z or (to be pedantic, as with databases you have to) A to Z.  This is followed by a number in the range of 1  to 99 or for use of regex’s 2 numbers in the range of 1 to 9.  The two parts of thepostcode should be seperated with a space.

The second part of the postcode is constructed of a single number in the range 0 to 9 and then two letters a to z or A to Z.

Admittedly there are other ways to use sql server to confirm that the data is a valid post code my favourite being running a query against all valid post codes but for a first parse a regEx could be the way forward.

A RegEx for a postcode could be “[a-zA-Z]{1,2}[0-9]{2} [0-9][a-zA-Z]{2}”

There are lots of articles on the web for the construction of RegEx’s and this is a small blog article so i won’t go into the deconstruction of the regEx here.

The regular expresion can be put into the XML schema in the following fashion

CREATE XML SCHEMA COLLECTION Postcode AS
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Address">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Postcode" minOccurs="0">
        <xsd:simpleType>
         <xsd:restriction base="xsd:string">
          <xsd:pattern value="[a-zA-Z]{1,2}[0-9]{2} [0-9][a-zA-Z]{2}"/>
         </xsd:restriction>
       </xsd:simpleType>
     </xsd:element>
   </xsd:sequence>
  </xsd:complexType>
 </xsd:element>
</xsd:schema>'
GO
DECLARE @xml XML(DOCUMENT Postcode)
SET @XML = N'<Address><Postcode>NP32 4Df</Postcode></Address>'

The above XML can be loaded into the Schema’d variable. and be validated. however the xml fragment of

DECLARE @xml XML(DOCUMENT Postcode)
SET @XML = N'<Address><Postcode>3NP2 4Df</Postcode></Address>'

will automatically be rejected.

In summary you can can use the xml regex to automatically validate data with a known sting format before the data is processed (i.e. in the loading stage of SQL Server).  This can be useful if you are trying to load large volumes of data from xml or you want to keep all you validation conditions in one area.

Posted in SQL, XML | Leave a comment

XML .value simple syntax

There are several ways to get .value to extract an item of information from xml.  in the following article I will show some of the syntax and reasons why some methods are better than others.

All the calls in this post will all reference the following blob.  This is a simple blob for demonstration purposes only.

<xmlblob>

<personlist>

<person>Phil</person>

<person>Stina</person>

</personlist>

</xmlblob>

or

The Following 3 methods of .value calls all work and will all bring back the same value (which is the first person element in the blob)

DECLARE @xml xml = N'<xmlblob><personlist><person>Phil</person><person>Stina</person></personlist></xmlblob>’

SELECT @xml.value(‘(xmlblob/personlist/person)[1]’,‘varchar(50)’)

 SELECT @xml.value(‘(xmlblob/personlist/person/text())[1]’,‘varchar(50)’)

 SELECT @xml.value(‘(xmlblob[1]/personlist[1]/person[1])’,‘varchar(50)’)

 The first query is potentially the kind you would want to have if you were using typed xml via a schema,  however in this case without a schema the second query is far better especially with reference to the Query plans generated. this is due to the /text() on the end of the element name that directs the xml to be processed as text.

100% to untyped xml, 0% for typed xml

The third option is interesting as it is the slowest to run under the above conditions and would normally be avoided however the interesting thing with this syntax is that when other elements are accessed the query is seemingly scaled.

The following queries look at the second element in the person list.

SELECT @xml.value(‘(xmlblob/personlist/person)[2]’,‘varchar(50)’)

 SELECT @xml.value(‘(xmlblob/personlist/person/text())[2]’,‘varchar(50)’)

 SELECT @xml.value(‘xmlblob[1]/personlist[1]/person[2]’,‘varchar(50)’)

  Still the second method remains the best with the third following closely behind. 

 Improvements can be made to the third query by using the text() method as in the second query, but the QP cost is still above that of the second query.

Posted in SQL, XML | Leave a comment

Reducing size of a nonclustered index with Filtering

SQL Server 2008 has a very nice new feature that enables a partial index to be created based on a specific where clause.

I have recently used this as an solution checking a table that contained an isSent flag. by creating a non-clustered index on the condition of ‘isSent’ = 0 (where 0 means not sent) I was able to reduce the scans on the clustered index. previously this index was scanned every time an email delivery system was polled.  This lead to some considerable savings.

The three examples below are taken from the same query that was run with different mechanisms. Each Table is identical with data having 96% of rows having the ‘isSent’ flag set to 1.  there were approx. 200000 rows in the table. In each example only the non clustered indexes changed.

the first table emailqueuev1 had no nonclustered index index so the table was scanned on each search, the second had a non-clustered index created on the ‘isSent’ field and contained a reference to each row in the clustered index. The third contains a filtered non-clustered index this only contained rows from the clustered index that fitted the condition ‘isSent’ = 0

The following queries were then run on the seperate tables.

Select * from emailqueuev1 where isSent = 0
Select * from emailqueuev2 where isSent = 0
Select * from emailqueuev3 where isSent = 0

 

As you can see the first index scans the entire table due to the size of this table it means there is a query cost 20 times the other 2 methods. The second seeks the non-clustered index and the third also seeks the nonclustered index both of these methods are fairly light weight.  There is a small improvement between the second and third queries in processing.  This improvement will only get better in time as the size of the clustered index increases

The major saving is from storage space.  The second index is comprised of 203 pages of data.  This could increase dramatically in size if you created included columns within the index

.

The third Index is comprised though of only 2 pages (16kb) of data.

The beauty of the filtered index is once the isSent flag is changed the reference is removed from this small index.  So it never gets too large.

Posted in database performance, filtered indexes, non-clustered Indexes, SQL, Uncategorized | Leave a comment