Business Intelligence and Silverlight

Looking forward to an upcoming book on two of my favorite areas of interest: Business Intelligence and Silverlight.



SQL Server 2005 vs 2008 Query Performance

I was recently working on a transactional web application for a client. This web application had some "operational" type reports included within it that required aggregations of, what will eventually be, millions of rows of data. This meant I'd be writing queries directly against the transactional tables doing the aggregations (SUMs and COUNTs mostly) "on demand".

Note: In more heavily used applications, aggregating as much data as we were on demand might not be acceptable, but in this case we were measuring in transactions "per minute" and "per hour" rather then "per second", so we did not anticipate any blocking of our queries by database writes.

These queries were going to be used on the landing page for web application seen by every employee within the company. The landing page receives 20,000 hits per day over roughly 17 hours of operation a day. This is about 19 hits per minute or 1 hit every 3 seconds. Not all that much compared to many applications I've seen, but definitely enough to warrant some sort of load/performance testing under a larger data load.

I set up some SQL scripts to, somewhat randomly, load a data sample that was representative of about 2 years worth of data. My initial query times were awful, and as I suspected, there was some room for improvement. I did some refactoring (with some help from a friend) and had things running at about 1-2 seconds per query. I thought there was still some room for improvement, but this was within an acceptable range.

Thinking things were fine, I promoted my code from my local development environment to our integration environment. I then ran my data load script to test in that environment, and suprisingly, was seeing query times in the 20-25 second range. WHAT?!? I just tuned things. After some tinkering, and reloading of data, I was seeing the inconsistencies in my query times. Sometimes in the 20-25 second range, other times in the 3-5 second range. What was going on?

My initial thought was that there was something different in the query processing between my local development environment (SQL Server 2008 Developer Edition) and the integration environment (SQL Server 2005 Standard edition). Some quick searches found that I might be using the Star Join Query Optimization available in 2008 Developer/Enterprise editions. There are ways that I could have confirmed this assumption, but I did not spend the time to do so. Regardless if I was or wasn’t using this optimaztion, I still didn’t want to take a chance that there might have been something else in the query processor that was making things more “efficient” on 2008 Developer vs 2005 Standard. I thought it’d be more reliable if I just set up a more representative environment and did my testing/tuning there.

I built a 2005 Standard virtual machine, built my application, loaded sample data, and ran the queries. Started seeing query times similar to our integration environment. OK, so it was reproduceable. Now let's tune it. I went straight to an indexed view I was using for the query, made a minor tweak. As I ran the script to rebuild the indexed view, it hit me. I had just loaded 2 years worth of data, and had done no database maintenance that would "normally" be done gradually, such as rebuilding indexes and defragging the hard drive. I rebuilt my indexes, defragged the drive, and sure enough my query times were back down in the 1-2 second range.

After tweaking just a bit more by including my numeric columns used in my query aggregations in an index, I redeployed to our integration environment and finally started getting consistent and performant results.

A few important take-aways here...

  1. In order to truly gauge/test performance, perform maintenance on your development/test environment just as you would in a “normal” environment, especially after large data loads. That is rebuild your indexes, defrag the hard drive, etc.
  2. Always (or to the extent possible) develop and/or test on machines that are representative of the environment you are actually going to deploy to, especially with regards to versions/editions of applications being used. There are many features within SQL Server that are only available in the Enterprise Edition of the product, including some query optimizations. I usually do this, but for a long period of time on my current project, the client couldn’t give a definitive answer on what our environments would look like, so we continued development with what we originally started with (which was no longer valid). Once I built a more representative environment of what I was actually deploying to, then deployed my application and test data there, I was able to troubleshoot and fix the issue more quickly.

Excel Services Part Deux - Excel Web Services

In an earlier blog entry, I discussed in very broad terms Microsoft Excel Services – a server side calculation engine that can be leveraged for SharePoint Business Intelligence solutions. Excel Services, you may recall, is divided into three components; Excel Calculation Services (ECS), Excel Web Access (EWA), and Excel Web Services (EWS). Excel Web Services allows access to the Excel Calculation Engine via Web Service calls, and is today’s focus.

Excel Web Services utilizes SOAP over http and acts as an interface between custom client programs and Excel Services. In order to call the service, it is necessary to establish a reference to the Excel Web Services WSDL, accessed through ExcelService.asmx?wsdl. However, if you are writing code within SharePoint Services (for example, developing custom web parts), Microsoft recommends linking directly to Microsoft.Office.Excel.Server.WebServices.dll and making local calls rather than using loop-back SOAP calls.

EWS - What is it good for?
Absolutely nothing (say it again…)? No, there are a number of practical examples where utilizing Excel Web Services comes in handy. One common scenario occurs when a company’s business logic is embedded in Excel spreadsheets. These spreadsheets can be maintained on a corporate server, and custom applications developed to utilize the logic in the spreadsheet without having to recode it in a “conventional” programming language. This also means that the spreadsheet logic can be maintained by business Subject Matter Experts (SMEs) who are comfortable working in Excel. Custom code can be developed to set values to cells and ranges, process the workbook, and retrieve calculated values (or even the entire workbook).

Coding Steps 101
Dealing with EWS is relatively straightforward, and numerous code examples can be found in the literature and on the web, such as here: http://blogs.msdn.com/excel/archive/2005/11/21/495454.aspx


Typical steps involved in coding against EWS are as follows:


1. Instantiate the web service via creation of ExcelService object
2. Set the URL via ExcelWebService property
3. Set the credentials via Credentials property. Default credentials (using applications own credentials) can be used by setting this property to System.Net.CredentialCache.DefaultCredentials.
4. Start the Session by calling OpenWorkbook() method
5. Set Cell Parameters via SetCell() method
6. Calculate Spreadsheet (if workbook not set to auto-recalculate) via Calculate() or CalculateWorkbook() method
7. Retrieve the Results via GetCell() method
8. Close Session via call to CloseWorkbook() method


Other methods are available as well, including the ability to set a range of cells and read an entire workbook into memory.

Business Example
An RDA client manages a customer support system that utilizes a custom authorization engine that controls which support agents have access to specific functionality within the application. For example, agents that support a specific product line do not have access to workflows supporting other product lines, and only agents of a certain tier level have the rights to issue refunds to customers. Although this authorization data is stored in a relational database, an Excel Spreadsheet on a SharePoint site is used by both the business and application support personnel to maintain and modify this information. When changes are made, the spreadsheet is loaded into the relational database through custom software.

This approach has a number of benefits:


1. SharePoint is used as a means of version control, to maintain a historical record of changes made.
2. Anyone with access to the SharePoint site can view the details surrounding the current authorization hierarchy, without the need for special software or reports.
3. The spreadsheet itself contains business logic to help organize the information and maintain data integrity.
4. The spreadsheet is easy to use and modify for subject matter experts who may not have a programming background.

However, as initially conceived there are some difficulties associated with this approach:


1. Making changes and deploying to the relational database involves a number of time-intensive manual steps, including downloading the spreadsheet, exporting to XML, applying an XSLT transform, and executing a custom application.
2. In addition to the labor involved, the manual steps are prone to error.
3. In the event that the changes made are not what were desired, either through human error or a change in business strategy, rollback of the changes can be difficult.

Excel Web Services can provide an ideal platform to address these shortcomings. A SharePoint web part or other custom application can be developed that, utilizing Excel Services, can recalculate and read the latest spreadsheet data, generate the necessary output, and load the relational database in (from an end users point of view) a single step. No modifications to the existing spreadsheet logic would be necessary, and the business users can use and maintain the spreadsheet as before. SharePoint security can be leveraged to restrict access to the Web Part. In addition, custom rollback functionality can be added via code.

Finally…

As indicated in my last installment, relying on business logic residing in an Excel spreadsheet may not be considered “ideal” from an architectural standpoint – but is often a necessity when practical considerations such as time, budget, staff experience, and maintainability come into play. Excel Web Services provides an IT organization a valuable tool to leverage that platform for maximum benefit.