My Data Day at Build – Azure Elastic Databases, Azure Search, IoT, Azure Stream Analytics, and more

[This post is by Jeff Mlakar, a member of the Business Intelligence Team at Bennett Adelson.  Follow us @BIatBA and @JeffMlakar]

Today was Day 2 for me at Microsoft Build, and it was all about DATA. From the new Azure SQL Database Elastic Databases, to Azure Search, to Big Data, to Azure Stream Analytics, and Azure Machine Learning. All data. And, more amazingly, all data in the cloud. I remember when Azure’s data offerings were limited to blob and table storage and the beginnings of SQL Server Data Services. To see how much the data offerings in Azure have exploded is surprising and exciting. Even today’s keynote was heavy on Azure Machine Learning. Analyzing mapped human genomes in R and exposing that algorithm as an API in the cloud, so that anyone with a (now surprisingly accessible) mapped human genome can create a heat map of their health risks, shows how data analytics can make a real personal difference in our lives. And, I gotta say, when thinking about IoT and ML, I didn’t see cow pedometer artificial insemination coming. “AI meets AI”…

My first session:

Modern Data in Azure

Presented by Lance Olson, et al. In many ways, it was perfect that this session kicked off my day of data. The session was presented as a tour of Azure data offerings, primarily from a developer’s point of view. As in, it wasn’t just an explanation of different data storage types in Azure, as I was expecting: they built a web app and brought in each form of data technology as it was needed by the application. A nice approach. The app was called the WingTip Ticketing application, and it would be expanded in the next session into a SaaS offering. The first data offering added to the ticketing application was:

Azure SQL Database Elastic Database Pool

The ticket ordering was to be handled by a relational database, Azure SQL Database. The argument for using the brand new Elastic Database Pool (announced today and in preview) was that it would make sense to logically and physically partition the database by artist, as some artists might naturally have far more load than others, depending on demand. They demonstrated ticket sales loads against a standard database and against a newly sharded Elastic Database Pool, sharded by artist. The load was measured in DTUs, or Database Throughput Units. I’ll explain this more in a bit as it was heavily covered in the next session. I was impressed by how performance could be increased, but I was more impressed with how easy it all was to configure. This includes setting up the sharding strategy, and integrating the Elastic Database Client Library into the web application code. Elastic Databases were covered thoroughly in the next session, so I’ll come back to them then.

Azure DocumentDB

They used Azure’s NoSQL document database service, Azure DocumentDB, to handle ratings and reviews, with the thought that this data would be largely unstructured. For those who don’t work with document databases, you can basically think of a document as a record in a table whose schema is fluid. This way, when new data is added, no schema changes are needed, just updates to the Model, View, and Controller in the web app. Documents could be created with code along these lines:

Document doc = await client.CreateDocumentAsync(collection.SelfLink, entity);

Even SQL-like queries could be written against the collection, using DocumentDB’s SQL grammar.


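To make the fluid-schema idea concrete, here’s a small Python sketch. Plain dicts stand in for DocumentDB documents, and the field names are made up for illustration; the real service would run a SQL-like query such as `SELECT * FROM r WHERE r.rating >= 4` over the collection:

```python
# Two "documents" in the same collection. The second adds a field
# without any schema change; nothing else has to be migrated.
reviews = [
    {"id": "1", "artist": "WingTip", "rating": 4},
    {"id": "2", "artist": "WingTip", "rating": 5, "comment": "Great show!"},
]

def find_reviews(docs, min_rating):
    # Tolerate documents that lack optional fields like "comment":
    # each document is checked individually, no shared schema needed.
    return [d for d in docs if d.get("rating", 0) >= min_rating]

top_reviews = find_reviews(reviews, 5)  # just the second document
```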
Azure Search

Azure Search was utilized to provide users with an intuitive search box in the ticketing application. Azure Search is a fully managed search-as-a-service in the cloud. It reminds me of working with ElasticSearch in the way you set up indexes, analyzers, and suggestions, though since we are in the Azure cloud it is FAR easier to provision and set up. Once an index and an indexer were set up and the data populated, wiring up search from the web app was straightforward: you pull in the Azure Search client namespace and use a SearchIndexClient for operations like querying and suggestions.

They showed the use of better scoring in Search, as well as suggestions. They didn’t demo any hit-highlighting capability, but I asked them afterwards and they said this is available.

Apache Storm for HDInsight

Some Big Data work was then done for the interesting example of upping the search-results score based on the number of recent tweets. They used Apache Storm on HDInsight with a spout connected to Twitter, filtering on a hashtag for a fictional music star. They bolted this to our Azure Search index and then had us in the audience tweet to the hashtag. When the hard-coded threshold of 10 tweets for the hashtag was met, that artist’s score would increase in the Search results. A compelling example.
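The demo’s boosting logic is easy to sketch. This toy Python version is mine, not theirs; the threshold of 10 comes from their hard-coded demo, while the boost factor and the names are assumptions:

```python
def boosted_score(base_score, recent_tweet_count, threshold=10, boost=2.0):
    # Once tweet volume for the artist's hashtag crosses the threshold,
    # amplify the artist's relevance score in the search results.
    if recent_tweet_count >= threshold:
        return base_score * boost
    return base_score
```

In the session, the Storm bolt presumably pushed the tweet counts into the Search index, where the scoring took them into account.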

All in all, a great tour of many of the newer Azure data offerings. It was like four sessions in one.

My next session:

Building Highly Scalable and Available SaaS Applications with Azure SQL Database

Presented by Bill Gibson and Torsten Grabs. This session was more of a deep dive into the new elastic capabilities of Azure SQL Database, like I mentioned before. I kept trying to get my head around how this is different from Federations. I’m starting to get the idea now, though, that this is not just a logical separation but a strong data-sharding strategy that can handle predicted and unpredicted loads while saving you from having to write all the routing code that you had to with Federations.

We’re back in the WingTip Ticketing application (a tongue-twister of a name that all the presenters were having trouble with). This time we’re making the application a SaaS offering, with different customers using the service for their own ticketing pages. We’re shown how to set up our elastic databases in PowerShell scripts: we create a database per customer, register these databases with the shard map, and then add our customers to the traffic manager rules. What we end up with is a collection of customer databases and a common customer catalog database. Not that different from a Federation, but without the usual bottlenecks.
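The moving parts can be sketched with a toy shard map in Python. This is only an illustration of the routing idea, not the actual Elastic Database Client Library, and all the names are mine:

```python
class ToyShardMap:
    """Records which database holds each customer and routes lookups there."""

    def __init__(self):
        self._shards = {}

    def register(self, customer_key, database):
        # The PowerShell scripts did this step in the demo: registering
        # each new customer database with the shard map.
        self._shards[customer_key] = database

    def database_for(self, customer_key):
        # The real client library opens a SqlConnection to the right
        # shard here; we just return the database name.
        return self._shards[customer_key]

shards = ToyShardMap()
shards.register("contoso", "wingtip-contoso-db")
shards.register("fabrikam", "wingtip-fabrikam-db")
```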

Establishing the connection is achieved as follows:

SqlConnection conn = SaasSharding.GetCustomerShardmap().OpenConnectionForKey(customerKey, connectionString);

passing in the customer’s key.

We’re shown how we can scale our databases’ min and max DTUs via the portal (see pic below), PowerShell, Rest APIs, or T-SQL.


I mentioned DTUs (Database Throughput Units) before, so let me elaborate on at least my current understanding of what they mean. Take four dimensions of performance: reads, writes, compute (CPU), and memory. The DTU measure is the maximum of these four after each has (somehow) been normalized. I’ll have to read their benchmark whitepaper sometime to see exactly how this is done.
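That understanding can be sketched in Python. The limits below are placeholder numbers, not real service-tier figures; the point is only the max-of-normalized-dimensions shape:

```python
def dtu(reads, writes, cpu, memory, limits):
    # Toy DTU model: express each dimension as a fraction of its
    # benchmark limit, then take the max. The bottleneck dimension
    # defines the overall throughput consumed.
    utilizations = [
        reads / limits["reads"],
        writes / limits["writes"],
        cpu / limits["cpu"],
        memory / limits["memory"],
    ]
    return max(utilizations)

# Hypothetical limits for a tier; the CPU-bound workload below
# consumes 0.8 of its budget even though I/O is light.
limits = {"reads": 1000, "writes": 500, "cpu": 100, "memory": 4096}
usage = dtu(reads=100, writes=50, cpu=80, memory=1024, limits=limits)
```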

In the end, it all looks good. Easy to manage and with the possibility of handling unpredictable usage.

Building Data Analytics Pipelines using Azure Data Factory, HDInsight, Azure ML, and More

Presented by Mike Flasko. Boy, if ever there was a session that made me feel like I didn’t know anything about data flows, this was it. Azure Data Factories, so named because they resemble a Henry Ford assembly line for data, are a shift from my SSIS-centric ideas of data flows. They are a new preview service for modeling and executing the data analytics pipeline. In a visual designer in the Azure portal, we create data sets (be they tables or files), activities (like Hadoop jobs, custom code, ML models, etc.), and pipelines (a series of activities) to complete a data analytics load process.

The data set source is defined in a JSON document. Activities to partition the data are done via a Hive script on an HDInsight cluster. We combine and aggregate data in an activity defined by a JSON object. A final activity calls an Azure ML scoring model; we don’t need to know its inner workings, only the schemas of its input and output and how to call the algorithm.
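For flavor, here is roughly the shape of such a JSON data set definition. This is a simplified, hypothetical example, not the demo’s actual document, and the real schema has more required fields:

```json
{
  "name": "CallLogs",
  "properties": {
    "type": "AzureBlob",
    "linkedServiceName": "StorageLinkedService",
    "availability": {
      "frequency": "Hour",
      "interval": 1
    }
  }
}
```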

The end result is a process that takes cell phone log data, combines it with our existing customer data, aggregates it, and spits out a data set giving the probability of each customer cancelling their service. This end result is then easily (also using the Factory) sent to Power BI for a lovely dashboard.

This is all still really new to me and I really need to study up on it. I have leftover questions, like how you would handle workflows for bad data, and what the best way is to promote a factory from staging to prod (Mike answered the latter for me after the session: leave the factory as-is and swap the linked-services definitions to make the factory run against production). One way or another, the data game is changing, and this session was an excellent introduction to the brave new world.

Gaining Real-Time IoT Insights using Azure Stream Analytics, Azure ML, and Power BI

Presented by Bryan Hollar et al. Azure Stream Analytics was just released to GA two weeks ago, and this is the first I’ve gotten to see it in action. After seeing case studies from Fujitsu and the Kinect team, an ASA implementation was mapped out for us. The shift is from thinking about reporting on data at rest to reporting on data in motion. For example, we could analyze how many Twitter users switched sentiment on a topic within a minute in the last ten minutes. SAQL (Stream Analytics Query Language) makes this easy by being a flavor of SQL mixed with temporal extensions. You’re analyzing within a time window, and these windows can be tumbling, hopping, or sliding depending on how you’ve set up your queries. For example:

SELECT Topic, COUNT(*) AS TotalTweets
FROM TwitterStream TIMESTAMP BY CreatedAt
GROUP BY Topic, HoppingWindow(second,10,5)
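The window semantics are the interesting part. Here’s a toy Python version of HoppingWindow(second, 10, 5): windows are 10 seconds wide and a new one starts every 5 seconds, so each event lands in two overlapping windows. This is my sketch of the semantics, not how ASA implements it:

```python
def hopping_window_counts(timestamps, size=10, hop=5):
    # Count events per hopping window. Windows start at multiples of
    # `hop` and span [start, start + size), so with size=10 and hop=5
    # every event is counted in two overlapping windows.
    counts = {}
    for t in timestamps:
        # Walk backwards over every window start whose interval holds t.
        start = (int(t) // hop) * hop
        while start > t - size:
            counts[start] = counts.get(start, 0) + 1
            start -= hop
    return counts
```

A tumbling window is the special case where the hop equals the window size, so the windows never overlap.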

With Azure Stream Analytics, your data flow pipeline is set to pull from existing event hubs, analyze, and persist (or display) its results. The processed data doesn’t even need to be persisted to be reported on. ASA basically occupies the same place in the data flow pipeline as ML, but where ML takes the “cold path” of analyzing large sets of data at rest, ASA analyzes the data as it streams. Though now, in a limited preview, ML is integrated into ASA. I’m told you’ll be able to sign up for this preview at the ASA team blog.

As far as integrating the Internet of Things goes, it’s basically a matter of configuring your event hub in your data pipeline, so there’s not much difference between pulling from Twitter and pulling from the Internet of Things. You can then configure your output to be Power BI (also in preview) for a real-time dashboard. The most striking IoT and ASA example was from Fujitsu, who showed an impressive app that geo-mapped energy consumption data and could zero in on spikes right down to an area of a building in real time. Though, now that I think about it, they may be tied with the pedometer analysis that tells you when your cows are in heat. And now I finally know why they call it a “Heat Map”.

Building Big Data Applications Using Azure HDInsight Service

Presented by Asad Khan. The final session of my day was a tour de force of Big Data in Azure. It was four one-hour sessions compressed into one, and it was a doozy. It started with 30 seconds on the basics of big data: caring about the volume, velocity, variety, and variability of data. They mentioned how Apache Hadoop is an open-source platform for large amounts of unstructured data, but how the managed infrastructure of Azure makes HDInsight an enticing implementation. They covered HBase for NoSQL, and taught us how to use Storm for streams of data by showing us how to build spouts for Twitter and bolts for SignalR to display data on the web in real time. It was an ambitious session, but pulled off very well.

Final thought:

I’m just now realizing that of everything I’ve mentioned, Big Data with HDInsight is the old man at the ripe old age of a year and a half. That speaks volumes about how much Microsoft is investing in growing its cloud data offerings. Can’t wait to see what’s next!

Leveraging SQL Server Profiler to troubleshoot 18456 Events

Many times I am brought in to assist in troubleshooting strange things that the client can’t easily identify on their own. On this particular occasion I was assisting in supporting a SharePoint solution, and SQL Server kept logging the following 18456 event every minute: “Login failed for user ‘NT AUTHORITY\NETWORK SERVICE’. Reason: Token-based server access validation failed with an infrastructure error. Check for previous errors. [CLIENT: <local machine>]” The client was not sure why this was occurring and thought it may have come from an outage they had recently.

Event 18456 - Login failed for user

A quick web search of the event showed people who had problems with applications accessing a database, but none with this specific account. That is because this is a generic message indicating that some account is accessing some database from some computer without the appropriate permissions. Some of that information is provided; however, it doesn’t tell us why it is happening. So how do we get more information so that we can suggest the correct path to resolve it?

On the surface, my first impression was that a service was trying to access a database within SQL Server running as the Network Service, and was not permitted to access it. I gathered this from the fact the login was listed as ‘NT AUTHORITY\NETWORK SERVICE’ and the client was defined as local machine, CLIENT: <local machine>. Going with my first thoughts, I opened the Services console and sorted by login to determine the services running as Network Service.


This pointed me to what I was pretty sure was the problem. If you look, there are two SQL-related services configured to run as Network Service. In addition, the client had all of the other SQL services configured to run with a defined service account, so these two were anomalies in not being configured the same way. While confident this was most likely the source of the events, I needed to be sure.

SQL Server Profiler to the rescue!

This is where SQL Server Profiler comes in handy. It is a great tool to give you insight into your SQL environment and what is happening on a transactional basis. You can use it to trace events occurring in SQL to find stored procedures that are having problems, or long-running queries, or any number of other issues that you just aren’t sure about and need more visibility into. In this case, we are looking for failed login attempts.

For this troubleshooting session, I knew that the logged event was only once every minute. This meant that if I configured the trace correctly, I would not be scrolling through a lot of event instances looking for my event. As well, I would not need to capture a lot of data, so outputting the profiler to a database or file wasn’t necessary.

Getting Started: Setting up the Trace.

To get started, open the Start Menu and navigate to Microsoft SQL Server 2008 R2 > Performance Tools > SQL Server Profiler (also available from SQL Server Management Studio under the Tools menu). When you first launch SQL Server Profiler, it will prompt you for the trace properties. The first tab (General) defines the initial properties of the trace. The section ‘Use the Template’ is of most interest to us in this troubleshooting: it defines the most probable list of events and columns that we want to start with for capturing information in the trace. This matters because the actual amount of information we can choose from is vast and can be overwhelming if this is your first look into tracing or if you are not a seasoned SQL admin. The additional fields for saving the output to a file/database and for the trace stop time are not relevant to our isolated troubleshooting, though they can be handy when you are trying to find an intermittent problem and want to run a trace for a long time or are capturing a lot of events. Again, not relevant in this particular instance.

SQL Server Profiler trace properties

For this troubleshooting, let’s start with the Standard (default) template. Once it is selected, go to the Events Selection tab. This will show you all the events and columns that are selected to be captured and displayed in the trace.


As you can see, we are capturing a lot of additional data that is probably not relevant to what we are looking for. Namely, we were looking for something associated with logins (remember: “Login failed for user ‘NT AUTHORITY\NETWORK SERVICE’…). With that, I removed the events that I didn’t think would be required. I also unchecked columns of data that I didn’t think would help me once I found the appropriate event (I don’t care about which CPU is being used, or the duration, etc…)


Now I could run this trace as-is, and you can even do so just to see the amount of data being captured and the information in the trace session. However, it would not give me the event I was looking for: my specific event is a failed login, and this trace will only show me successful logins and logoffs. So how do we get the data I really want?

Finding Audit Login Failed

First, I select Show all events to show all the possible events that I can trace. From the selection above, you will see that Security Audit has some events already selected.


I wanted to be more specific, however. I unchecked the Audit Login and Audit Logoff events and instead chose Audit Login Failed. This selected all the standard columns, but it won’t give us all the information we need. For that, I selected Show all Columns.


To troubleshoot I then chose NTUserName, SPID (can’t uncheck that one), ApplicationName, DatabaseName, and Error.


I then clicked Run to start tracing the events. Because this event only triggers once a minute, I only had to wait a short time to see the error captured. As you can see, it was the Report Server (Reporting Services Service) accessing the master database. You can also see that we have the matching 18456 event number.

SQL Server Profiler trace output

With that I had the information needed to take back to the client and ask why this service might have had access removed (it was not defined in SQL security), whether it was misconfigured (changed from a specific login to Network Service, perhaps when recently added as a feature), or whether there was some other explanation.

In this case, it turns out that an engineer troubleshooting an earlier problem wasn’t aware of the state of the services, and had set SQL Reporting Services and SQL Integration Services from Disabled to Automatic and started them in an attempt to resolve a SQL problem they were having. It didn’t solve their problem, but because they didn’t document their troubleshooting (or perform a proper analysis like the one above), they left those services running in a state that caused additional work to troubleshoot and resolve.

While this is a very specific incident and resolution, I hope that this quick view into SQL Server Profiler gives you an additional tool to properly research errors and resolve your problems. For additional information on the tool, please explore the SQL Server Profiler documentation on MSDN.

Jason Condo, MCITP
Principal Consultant, Systems Management and Operations