Azure Service Fabric: a Platform for Mission Critical Apps – Part 1

Microsoft’s unveiling of HoloLens at Build 2015 caused a lot of excitement. But for me the biggest excitement was something Microsoft released that we can download now. It’s this:

[Image: Service Fabric Explorer showing the Onebox local cluster and its system applications]

This is Preview 1 of Azure Service Fabric. If you aren’t sure what that is, look on the left. Onebox is my laptop, and it is running a local cluster made up of four applications and five nodes. The applications are named ClusterManagerService, FailoverManagerService, ImageStoreService and NamingService. Sound familiar?

Those four applications are what we commonly refer to as “Azure.” That’s the Azure Microsoft runs in the cloud to host Azure SQL Database, Power BI, Cortana, DocumentDb, Event Hubs, and “many other core Azure services”. That Azure is now running on my laptop.

Which means I can build Azure apps and run them locally on real Azure, not an emulator. I’ll be able to deploy them to Azure on Microsoft’s public cloud, and they’ll run the same there as they do locally. And I’ll be able to deploy them to on-premises or ISV data centers once they run Microsoft Azure Stack on Windows Server 2016.

I haven’t gotten a chance to try a HoloLens yet. But this is plenty of excitement for me. Let’s consider how Azure Service Fabric apps are different from other apps.

Azure Service Fabric applications run in clusters. A cluster is a group of virtual or physical machines, each hosting a collection of isolated processes called nodes. On my laptop a Windows Service called FabricHost.exe is managing the cluster. Each of the five nodes is implemented by a trio of Windows processes running Fabric.exe, FabricGateway.exe and FileStoreService.exe.

[Image: the cluster’s Windows processes: FabricHost.exe plus Fabric.exe, FabricGateway.exe and FileStoreService.exe for each node]

An Azure Service Fabric application consists of one or more microservices. Each microservice will be deployed in one or more containers on one or more nodes. Microservices run in isolation from each other, and can be either stateless or stateful.

Here is what my Service Fabric Explorer looks like after I have deployed four Service Fabric applications (one from each project template in Visual Studio). Each application contains one microservice, but most of them are deployed on multiple nodes.

[Image: Service Fabric Explorer after deploying four applications across the cluster’s nodes]

What can we gain by dividing applications into microservices and running them on clusters of nodes?

First we gain High Availability. When a microservice crashes, Service Fabric intervenes immediately to redirect traffic to a backup copy of the microservice on a different node. Thanks to the Naming Service, microservices hide their physical locations, so redirection to a new instance happens transparently. Service Fabric then instantiates a new microservice instance to replace the old one. If the failed microservice is stateful, the backup instance has its own copy of the state, and the replacement instance will get a copy of the current state as well. By the same means, Service Fabric can quickly recover from the loss of an entire node or an entire machine in the cluster.

We also gain High Scalability. We can have as many instances of our microservices running on as many nodes as we need, and we can constantly right-size our deployment to optimize performance while minimizing costs.

The support for stateful microservices brings important advantages, and I think this is one of the big things that sets Service Fabric applications apart from other kinds of Azure applications. By putting state side-by-side with code – by not splitting “service tiers” from “data tiers” – we can dramatically Reduce Latency. Think of Microsoft’s Cortana, an Azure service that finds restaurants and looks up movie times in split seconds. By packing data and compute together, we can also reduce our apps’ hardware footprint and thus further Reduce Costs. And the Programming Experience for developers can become much simpler.
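To make that last point concrete, here is a rough sketch of what a stateful microservice can look like in code. It is modeled on the stateful service template in the preview SDK’s Reliable Services API; the service name, dictionary name, and key below are invented for illustration, and exact namespaces and signatures may differ between SDK versions.

using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Data;
using Microsoft.ServiceFabric.Data.Collections;
using Microsoft.ServiceFabric.Services;

public class TicketCounterService : StatefulService
{
    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        // State lives in a replicated dictionary right next to the code,
        // so reads and writes never have to cross over to a separate data tier.
        var counters = await this.StateManager
            .GetOrAddAsync<IReliableDictionary<string, long>>("counters");

        using (ITransaction tx = this.StateManager.CreateTransaction())
        {
            // Increment a counter; the write is replicated to backup copies
            // on other nodes before the commit returns.
            await counters.AddOrUpdateAsync(tx, "TicketsSold", 1, (key, value) => value + 1);
            await tx.CommitAsync();
        }
    }
}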

The last big gain I’ll mention now (I’ll talk about many more in coming posts) is support for Rolling Live Upgrades. To deploy a new version of a microservice, you don’t need to stop what’s already in production. Service Fabric will create instances of the new version and silently substitute them for old instances as they become idle, taking care that work underway is never handled by inconsistent versions. This is the same way Microsoft rolls out updates to Azure SQL Database and its other cloud services.

I think these make Azure Service Fabric an excellent platform for building all sorts of apps, but especially mission-critical apps, such as:

  • Apps that need to run all the time, never going down for planned or unplanned reasons
  • Apps that handle heavy workloads
  • Apps requiring split-second response times
  • Apps with heavy resource demands that need to be frugal with costs

In this series I will explore Azure Service Fabric in some detail. Watch for more posts exploring the architecture, the tooling, and the application lifecycle on this new but proven platform.

In the meantime you can start your own exploration by downloading the Service Fabric Preview 1 here.

My Data Day at Build – Azure Elastic Databases, Azure Search, IoT, Azure Stream Analytics, and more

[This post is by Jeff Mlakar, a member of the Business Intelligence Team at Bennett Adelson.  Follow us @BIatBA and @JeffMlakar]

Today was Day 2 for me at Microsoft Build. And it was all about DATA. From the new Azure SQL Database Elastic Databases, to Azure Search, to Big Data, to Azure Stream Analytics, and Azure Machine Learning. All data. And, more amazingly, all data in the cloud. I remember when Azure’s data offerings were limited to blob and table storage and the beginnings of SQL Server Data Services. To see how much the data offerings in Azure have exploded is surprising and exciting. Even today’s keynote was heavy on Azure Machine Learning. Analyzing mapped human genomes in R and exposing that algorithm as an API in the cloud, so that anyone with a (now surprisingly accessible) mapped human genome can create a heat map of their health risks, shows how data analytics can really make a personal difference in our lives. And, I gotta say, when thinking about IoT and ML, I didn’t see cow pedometer artificial insemination coming. “AI meets AI”…

My first session:

Modern Data in Azure

Presented by Lance Olson, et al. In many ways, it was perfect that this session kicked off my day of data. The session was presented as a tour of Azure data offerings, primarily from a developer point of view. As in, it wasn’t just an explanation of different data storage types in Azure, as I was expecting. They built a web app and brought in each form of data technology as it was needed for the application. A nice approach. The app was called the WingTip Ticketing application, and it would be expanded in the next session into a SaaS offering. The first data offering added to the ticketing application was:

Azure SQL Database Elastic Database Pool

The ticket ordering was to be handled by a relational database, Azure SQL Database. The argument for using the brand-new Elastic Database Pool (announced today and in preview) was that it would make sense to logically and physically partition the database by artist, as some artists might naturally have far more load than others, depending on demand. They demonstrated ticket sales loads against a standard database and against a newly created Elastic Database Pool sharded by artist. The load was measured in DTUs, or Database Throughput Units; I’ll explain this more in a bit, as it was heavily covered in the next session. I was impressed by how much performance could be increased, but I was more impressed with how easy it all was to configure. That included setting up the sharding strategy and integrating the Elastic Database Client Library into the web application code. Elastic Databases were covered thoroughly in the next session, so I’ll come back to them then.

Azure Document DB

They used Azure’s NoSQL document database service, Azure Document DB, to handle ratings and reviews, with the thought that this data would be largely unstructured. For those who don’t work with document databases, you can basically think of a document as a record in a table whose schema is fluid. This way, when new data is added, no schema changes are needed, just updates to the Model, View, and Controller in the web app. Documents could be created with code like:

Document doc = (await Client.CreateDocumentAsync(Collection.SelfLink, entity)).Resource;

Even SQL-like queries could be created using

Client.CreateDocumentQuery
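For example, a query over the reviews might look something like the sketch below. The Rating class, the SQL filter, and the Collection reference are hypothetical stand-ins for the demo’s own types; only CreateDocumentQuery itself comes from the DocumentDB .NET SDK.

// Runs a SQL-like query against the collection and materializes the
// results as strongly typed objects (requires a using for System.Linq).
var goodReviews = Client.CreateDocumentQuery<Rating>(
        Collection.SelfLink,
        "SELECT * FROM reviews r WHERE r.Stars >= 4")
    .ToList();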

Azure Search

Azure Search was utilized to provide users with an intuitive search box in the ticketing application. Azure Search is a fully managed search-as-a-service in the cloud. It reminds me of working with Elasticsearch in the way you set up indexes, analyzers, and suggestions, though since we are in the Azure cloud it is FAR easier to provision and set up. Once an index and indexer were set up and the data populated, wiring up search was easy using the namespace:

Microsoft.Azure.Search

And using the SearchIndexClient for operations like:

SearchIndexClient.Documents.SearchAsync
and
SearchIndexClient.Documents.SuggestAsync

They showed the use of better scoring in Search, as well as suggestions. They didn’t demo any hit-highlighting capability, but I asked them afterwards and they said it was available.
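To give a feel for those SearchIndexClient calls, here is a minimal sketch of a query, assuming a search service and index I have made up (“wingtip-tickets” and “events”), a hypothetical query key, and the Microsoft.Azure.Search preview SDK of the time; exact method overloads varied between preview releases.

// The client targets one index on one search service; queryApiKey is a
// hypothetical query key for that service.
var searchClient = new SearchIndexClient(
    "wingtip-tickets", "events", new SearchCredentials(queryApiKey));

// Full-text search over the index; results come back relevance-scored,
// and SuggestAsync can be wired to the search box in a similar way.
var results = await searchClient.Documents.SearchAsync("madonna tickets");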

Apache Storm for HDInsight

Some Big Data work was then done for the interesting example of upping a search result’s score based on the number of recent tweets. They used Apache Storm on HDInsight with a spout reading from Twitter, filtered to a hashtag for a fictional music star. They bolted this to our Azure Search index and then had us in the audience tweet to the hashtag. When the hard-coded threshold of 10 tweets for the hashtag was met, that artist’s score would increase in the search results. A compelling example.

All in all, a great tour of many newer Azure data offerings. It was like four sessions in one.

My next session:

Building Highly Scalable and Available SaaS Applications with Azure SQL Database

Presented by Bill Gibson and Torsten Grabs. This session was more of a deep dive into the new elastic capabilities of Azure SQL Database that I mentioned before. I kept trying to get my head around how this is different from Federations. I’m starting to get the idea now, though, that this is not just a logical separation but a strong data sharding strategy that can handle predictable and unpredictable loads, while saving you from having to write all the routing code that Federations required.

We’re back in the WingTip Ticketing application (a tongue-twister name all presenters were having trouble with). This time we’re turning the application into a SaaS offering, with different customers using the service for their own ticketing pages. We’re shown how to set up our elastic databases in PowerShell scripts: we create a database per customer, register these databases with the shard map, and then add our customers to traffic manager rules. What we end up with is a collection of customer databases and a common customer catalog database. Not that different from a Federation, but without the usual bottlenecks.

Establishing the connection is achieved as follows:

SqlConnection conn = SaasSharding.GetCustomerShardmap.OpenConnectionForKey(customerKey);

passing in the customer’s key.
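That GetCustomerShardmap helper is the demo’s own wrapper. Under the covers it is presumably built on the Elastic Database client library’s shard map types, which look roughly like the sketch below; the connection strings, customer key, and shard map name here are hypothetical.

using System.Data.SqlClient;
using Microsoft.Azure.SqlDatabase.ElasticScale.ShardManagement;

// Connect to the shard map manager stored in the customer catalog database.
ShardMapManager manager = ShardMapManagerFactory.GetSqlShardMapManager(
    catalogConnectionString, ShardMapManagerLoadPolicy.Lazy);

// The list shard map maps each customer key to that customer's database.
ListShardMap<int> customerShardMap = manager.GetListShardMap<int>("CustomerShardMap");

// Open a connection that is routed to the right customer database for this key.
using (SqlConnection conn = customerShardMap.OpenConnectionForKey(
    customerId, userConnectionString, ConnectionOptions.Validate))
{
    // ... query this customer's shard ...
}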

We’re shown how we can scale our databases’ min and max DTUs via the portal (see pic below), PowerShell, REST APIs, or T-SQL.

[Image: setting per-database min and max DTUs in the Azure portal]

I mentioned DTUs (Database Throughput Units) before, so let me elaborate on at least my current understanding of what the term means. Take four dimensions of performance: reads, writes, compute (CPU), and memory. A DTU is the maximum of these four values after they have (somehow) been normalized. I’ll have to read their whitepaper sometime to see exactly how that is done.
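In code terms, my mental model of the metric is nothing more than the deliberately naive sketch below; the normalization step is the part the whitepaper would have to explain, and the names are mine, not Azure’s.

using System;

// Naive illustration of "a DTU is the max of four normalized dimensions".
static double EstimateDtu(double normalizedReads, double normalizedWrites,
                          double normalizedCpu, double normalizedMemory)
{
    return Math.Max(Math.Max(normalizedReads, normalizedWrites),
                    Math.Max(normalizedCpu, normalizedMemory));
}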

In the end, it all looks good. Easy to manage and with the possibility of handling unpredictable usage.

Building Data Analytics Pipelines using Azure Data Factory, HDInsight, Azure ML, and More

Presented by Mike Flasko. Boy, if ever there was a session that made me feel like I didn’t know anything about data flows, this was it. Azure Data Factories, so named because they resemble a Henry Ford assembly line for data, are a shift from my SSIS-centric ideas of data flows. They are a new preview service for modeling and executing the data analytics pipeline. In a visual designer in the Azure portal, we create data sets (be they tables or files), activities (like Hadoop jobs, custom code, ML models, etc.), and pipelines (a series of activities) to complete a data analytics load process.

The data set source is defined in a JSON document. Activities to partition the data are done via a Hive script on an HDInsight cluster. We combine and aggregate data in an activity defined by a JSON object. A final activity calls an Azure ML scoring model. We don’t need to know its inner workings, only the schema of the input and output and how to call the algorithm.

The end result is a process that takes cell phone log data, combines it with our existing customer data, aggregates it, and spits out a data set that says what the probability is of each customer cancelling their service. This end result is then easily (also using the Factory) sent to PowerBI for a lovely dashboard.

This is all still really new to me and I really need to study up on it. I have leftover questions, like how you would handle workflows for bad data and what the best way is to promote a factory from staging to prod (Mike answered the latter for me after the session: leave the factory as is and swap the linked-service definitions so the factory runs against production). One way or another, the data game is changing, and this session was an excellent introduction to the brave new world.

Gaining Real-Time IoT Insights using Azure Stream Analytics, Azure ML, and Power BI

Presented by Bryan Hollar et al. Azure Stream Analytics was just released to GA two weeks ago, and this is the first time I’ve gotten to see it in action. After case studies from Fujitsu and the Kinect team, an ASA implementation was mapped out for us. The shift is from thinking about reporting on data at rest to reporting on data in motion. For example, we could analyze how many Twitter users switched sentiment on a topic within a minute over the last ten minutes. SAQL (Stream Analytics Query Language) makes this easy; it’s a flavor of SQL with temporal extensions. You’re analyzing within a time window, and those windows can be tumbling, hopping, or sliding depending on how you’ve set up your queries. For example:

SELECT Topic, COUNT(*) AS TotalTweets
FROM TwitterStream TIMESTAMP BY CreatedAt
GROUP BY Topic, HoppingWindow(second,10,5)

(That query counts tweets per topic over ten-second windows that hop forward every five seconds.) With Azure Stream Analytics, your data flow pipeline is set to pull from existing event hubs, analyze, and persist (or display) its results; the processed data doesn’t even need to be persisted to be reported on. ASA basically sits in the same place in the data flow pipeline as ML, but where ML takes the “cold path” of analyzing large sets of data at rest, ASA analyzes the data as it streams. Though now, for a brief preview, ML is integrated into ASA. I’m told you’ll be able to sign up for this preview at the ASA team blog:

http://blogs.msdn.com/b/streamanalytics/

As far as integrating the Internet of Things goes, it’s basically a matter of configuring your event hub in your data pipeline, so there’s not much difference between pulling from Twitter and pulling from the Internet of Things. You can then configure your output to be PowerBI (also in preview) for a real-time dashboard. The most impressive IoT-plus-ASA example came from Fujitsu, who showed an app that geo-mapped energy consumption data and could zero in on spikes right down to an area of a building in real time. Though, now that I think about it, they may be tied with the pedometer analysis that tells when your cows are in heat. And now I finally know why they call it a “Heat Map”.

Building Big Data Applications Using Azure HDInsight Service

Presented by Asad Khan. The final session of my day was a tour de force of Big Data in Azure. It was four one-hour sessions compressed into one, and it was a doozy. It started with 30 seconds on the basics of big data: caring about the volume, velocity, variety, and variability of data. They mentioned how Apache Hadoop is an open-source platform for large amounts of unstructured data, and how the managed infrastructure of Azure makes HDInsight an enticing implementation. They covered HBase for NoSQL, and taught us how to use Storm for streams of data by showing us how to build spouts for Twitter and bolts for SignalR to display data on the web in real time. It was an ambitious session, but pulled off very well.

Final thought:

I’m just now realizing that of everything I’ve mentioned, Big Data with HDInsight is the old man at the ripe old age of a year and a half. That speaks volumes about how much Microsoft is investing in growing its cloud data offerings. Can’t wait to see what’s next!

Simple Augmented Browsing for Website Development and Troubleshooting

Developers often face the challenge of quickly making a few trivial changes to an existing website just to see how a different image or CSS style would look. We can make these changes in a development environment, no problem there. But what if you have to do it on a live website, and the changes must not impact any user except yourself?

Augmented browsing techniques can come to our rescue. You might have used Greasemonkey, a popular add-on that lets you change the look and feel of any website. In short, it installs scripts that read the DOM of the loaded page and alter its HTML, CSS, and so on. But creating and running those scripts might be overkill or cumbersome, especially if you need to test with many different browsers.

Let’s take an alternative approach. How about intercepting a resource file requested by the webpage and loading a different resource file stored on your local drive instead? Aha!

For this I use my favorite tool, Fiddler. It is a debugging proxy that sits between your browser and the server and intercepts the calls between them. The tool has many features that make a developer’s life easier; we are going to use its AutoResponder feature.

[Image: Fiddler’s AutoResponder tab]

Here are the steps to follow to intercept an image file and point to your own image.

a. Download, install & run Fiddler

b. Select the AutoResponder tab, check ‘Enable automatic responses’, and check ‘Unmatched requests pass-through’. This means that if no rule matches an incoming request, Fiddler does not intercept it and the file is served from the web server as usual.

c. Get the URL of the image on the page you want to change. You can probably find it by viewing the page’s source code.

d. Have your replacement image ready on your local drive.

e. Click the Add Rule button (or import a rule you previously exported).

f. At the bottom of the window, type the URL (or a distinctive part of it) of the source image into the first dropdown.

[Image: entering the URL match for the new AutoResponder rule]

g. For the second dropdown, type in the local file path of the image to be used in place of the original.

h. Save

i. Refresh the webpage and voila! The new image appears in place of the original!

[Image: the page refreshed, showing the replacement image]

j. You can turn the interception on or off by checking or unchecking the checkbox in front of each rule you specify.

[Image: enabling and disabling individual AutoResponder rules]

To alter a .css or .js file, first download the file from the web server and save it to your local drive, add the interception rule, make your modifications to that local copy, and refresh the page to see the change.
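For instance, a rule that swaps a live stylesheet for a local copy is just a match string paired with a response file; the URL and local path below are hypothetical:

If the request matches: EXACT:https://www.example.com/styles/site.css
Then respond with: C:\Temp\site.css

The same pattern works for the image example above, and you can keep several such rules around and toggle them individually.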

Happy coding!