Retrieve Web Page Results Using SSIS

Scenario

A client is wanting to use the web to retrieve coordinate information from Google Maps to store the information for a Marketing campaign.

Solution

The SSIS Script task can connect to the web page and return the results. The following script provides sample code on how to get the Script Task to perform the action. The script task was developed in SQL SSIS 2008 using VB, the 2005 version is similar but the guts of the HTTP code remains the same (thanks to my colleague Andrew Crowder for putting together the main syntax):

First, a package is created with two variables of type String. The variables are passed into the script task in the following manner:

  • ReadOnly: User::URL
  • ReadWrite: User::HTML

Now, the script itself:

Imports System
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.IO
Imports System.Net

<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
    Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

    Enum ScriptResults
        Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
        Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
    End Enum
    Public Sub Main()
        Try

            'setup the variables, URL is set in the package as a var and read
            Dim URL As String = Dts.Variables("URL").Value
            Dim request As HttpWebRequest = WebRequest.Create(URL)
            Dim response As HttpWebResponse = request.GetResponse()
            Dim reader As StreamReader = New StreamReader(response.GetResponseStream())

            'read the http data into a string to be used later in the package
            Dim str As String = reader.ReadToEnd()

            Dts.Variables("HTML").Value = str.ToString()

            Dts.TaskResult = ScriptResults.Success
        Catch ex As Exception
            'fire the info event and return as error from the task
            Dts.Events.FireInformation(0, "Web Script Task", ex.Message.ToString(), "", 0, 0)
            Dts.TaskResult = ScriptResults.Failure
        End Try
    End Sub

Execution

The following URL value is set and returns the coordinate information into the HTML variable:

URL: http://maps.google.com/maps/geo?q=1600+Amphitheatre+Parkway,+Mountain+View,+CA&output=csv&oe=utf8&sensor=false&key=ABQIAAAAP6yK9BJ44wmvT1Dd7V5cKRQz9eeYvRAhuPGO19pGZUf7Pj2DYxRZ0Zbtz0o1UAkk17kmD6xk5qDnzg

HTML: 200,8,37.4219720,-122.0841430

The script task can also return the HTML from a page, for example http://www.rdacorp.com, and allow for other SSIS tasks to manipulate the HTML variable as per the requirements and design of the package as it would like any other variable within the package.

Introduction to Excel Services

Microsoft and other software vendors offer a dizzying array of BI, business productivity, and data management tools from powerful relational databases on up. Yet research (and personal experience) has shown that a large number of corporations still maintain a significant quantity of business logic in plain-old Excel spreadsheets . Logic stored in spreadsheets can lead to a proliferation of cut-and-paste in an attempt at reuse that can become a maintenance nightmare. Responding to the needs of its customers, Microsoft introduced Excel Services for SharePoint Server 2007.

Basically, Excel Services is a server-side version of the Excel calculation engine. Some common usage scenarios to Excel Services are:

  1. A set of SharePoint web parts. This allows anyone with access to a SharePoint site to interact with Excel Data stored on a server without necessarily having full control to modify the data or change how it is presented. A Business Intelligence Dashboard showing KPIs (Key Performance Indicators) would be a common application.
  2. An interactive spreadsheet viewer. Even if a customer has Excel installed on their workstation, this allows the user to view up-to-date Excel data without having to leave their browser when accessing their corporate SharePoint site
  3. Maintaining “one version of the truth” of data (such as quarterly sales, etc.) on a server without having to worry about changes to multiple copies floating around individual users desktops.
  4. A set of APIs for developers to leverage the calculation abilities of Excel from within their own applications.


Architecturally, Excel Services is also divided into three components:

  1. Excel Calculation Services (ECS) – handles the spreadsheet refresh and recalculation
  2. Excel Web Access (EWA) – handles the rendering of the spreadsheet to the client, e.g. SharePoint site
  3. Excel Web Services (EWS) – allows access to the Excel calculation engine via web service calls.


Alas, Excel Services does not provide you with 100% of the full power of Excel to your SharePoint site; limitations include lack of support for VBA Macros, references to other Excel workbooks, ActiveX controls, and most graphical elements.


Excel Services may not be the most glamorous technology, and on the surface might seem like a “hack” – using a lowly spreadsheet to manage your important business data, and to provide the calculation engine for custom applications that might be better served by interfacing with a commercial or homegrown software library. However, in the real world of legacy systems and shrinking budgets, Excel Services has its place. In future blog entries, I will be diving in and exploring the different components of Excel Services in greater detail.

Default Cube Action in Excel

Scenario

When using Excel 2007 to connect to a SQL 2005\2008 SSAS cube, double-clicking a cell, or right clicking the cell and selecting Show Details, within a Pivot table brings up details about the specific sliced cell within the Pivot table. However, the default nature of the cube is to bring up the raw Fact table data for that slice, which displays all fields within the Fact table for the records that make up the cell. The fields within a Fact table are often surrogate keys that point to the Dimensions that join to the Fact table as well as all Measurements stored within the Fact table. This default nature is not the most helpful information to business analysts since the keys have no business meanings to the users.

The question then is how to change this default behavior to bring back information to supply meaningful details to a business analyst when using the cube. The answer is to create a Drillthrough Action with the SSAS cube and set the Default property to True. The Drillthrough Action will return the information that the business defines as meaningful when wanting to see the details within the cube.

The following lists out the steps on how to create the Drillthrough Action within the SSAS project for the cube and then demonstrates how to use the Drillthrough Action in an Excel Pivot Table.

Setup the Drillthrough Action

In the SSAS cube, navigate to the Action table and select the icon for the New Drillthrough Action:

SSAS_Drillthrough_Action

After selecting the New Drillthrough Action, setting up the what the Action should do is the next step. For this example, the AdventureWorks cube is being used to build the action and the example will return the following fields for the sliced cell as its default:

  • Reseller Sales Amount, Reseller Order Quantity
  • Date
  • City, State, Country

The following screen prints lays out these fields and sets up the Default property to True:

SSAS_Drillthrough_Action_Setup

Using the SSAS Action

After creating the action and processing the cube, an analyst can now connect to the cube via Excel 2007 and create a Pivot table as its starting report. For the example, the report is going to select Sales Territory by Calendar Year for Reseller Sales Amount.

With the Pivot setup, the following methods will display the the SSAS action, note these actions can only be performed on Non Calculated Members within the cube:

  • Double click the cell within the Pivot Table
  • Right click on the cell to drill into and select Show Details
  • Right click on the cell and select Additional Actions-->AW Default Action (this is the name of our Drillthrough Action in this example).

Right Click-->Show Details Menu

Excel_Show_Details

Right Click-->Additional Actions Menu:

Excel_Additional_Actions

Excel Report

Regardless of the method chosen, Excel will create a new Tab within the Excel workbook and display the data as defined within the Drillthrough Action:

Excel_Show_Details_Tab

Conclusion

As demonstrated, a Drillthrough Action within a SSAS cube is another tool for a business analyst to dive into their data and answer questions that is pertinent for the question at hand. Setting up a Drillthrough Action with the Default behavior set to True allows for the users to not see data that is meaningless to the users, since the users will not be able to translate surrogate keys to actual dimension text, but instead provides useful information to the analyst to help make business decisions. The tool of providing of this useful information is the point of developing the cube in the first place.

A Toolkit book for your BI Toolbox

As part of a ramp-up plan to get more familiar with the latest Microsoft stack of BI tools and products, a colleague suggested I take a look at The Data Warehouse Toolkit by Ralph Kimball and Margy Ross (ISBN 978-0471200246, on Amazon). I am not going to provide a detailed synopsis of each chapter, you can check the key concepts listed in the text for that, but I want share some thoughts on how I found it useful and an interesting read as well. The book follows a case study format, which helps provide real-world context to the concepts being explained.

If you are new to data warehousing, the first chapter provides all the background and vocabulary you need to get started. This chapter contains explanations and descriptions of facts, dimensions, measures, etc and draws parallels to relational concepts as well.

A four-step design process is laid out early in Chapter 2 and referred to throughout the book. Key here is the notion to select a business process to model. Similar to other database or software development efforts, you need to start with the business requirements first. This is no different in a data warehouse design. The concepts of a data warehouse bus architecture and bus matrix are introduced in Chapter 3. While it’s not exactly the same as an Enterprise Service Bus, occasionally seen in a software solution, there are definitely some parallel ideas.

The case studies in the early chapters start out simple and somewhat generic, but they help illustrate the basic design principals. The more complex case studies in later chapters still provide a good reference for more complex models, techniques, and industry specific scenarios.

A chapter near the end of the book pulls it all together with a project lifecycle description to design and build a data warehouse. Here again, not too different from any other software development effort: gather requirements, design, implement, test, communicate with stakeholders, etc.

Overall, I found the book very useful and plan to keep it handy as a companion reference. I was able to apply some of the modeling techniques directly in a solution leveraging Microsoft SQL Server 2008 Analysis Services, Integration Services and Reporting Services.

Using a View for a Fact Table in SSAS Project

I recently was involved in building a custom data warehouse and 2005 SSAS cubes which contained many financial calculations off of the original data. Many of the calculations needed to be at the grain level of the warehouse before aggregations could be applied and therefore it made sense to perform these calculations as Named Calculations in the Data Source View (DSV). After the fields were added to the DSV, the new fields were added to the cube and then MDX calculated members were created in the cube for the final calculations to be consumed by the business users.

However, after many Named Calculations were added to the DSV, the performance of processing the SSAS partitions was becoming very poor and from looking at the SQL that the SSAS project was generating, it was clearly not optimized.

The solution was to create a database view that had the optimized SQL code containing the granular calculations and then within the DSV, replace the Fact table with the view. The following screen print displays the menu that appears by right clicking on the table to be replaced within the DSV and then navigating to the Replace Table\With Other Table option:

DS_Replace_Table

The view also provided a benefit for the SSAS partitions that were created by being able to setup each partition as a SELECT * from VIEW WHERE condition. The SELECT * allowed for more fields to be added to the view without having to go back into each SSAS partition and recreate the SQL that made up the partition for the additional field. Originally, all of the SSAS cube partitions had the non-optimized SQL code as the source and any additions to the FACT table would require going back to each partition and adding the new field to each and every partition created. Considering were had 150 partitions, this was not an enviable position to be going forward.

However, one property within the DSV needed to be changed to ensure that the DSV did not lose all of the relationships that were previously defined when originally using the Fact table itself. The warehouse had FK relationships amongst the Fact table and the many dimensions that joined to the Fact table. When the DSV was originally created, the relationships were read from the data source and automatically created. When replacing the table with a view, the relationships are not pointing to the view within the database and when you hit OK, you see all of your relationships disappear!

The fix for this problem is to change the DSV RetrieveRelationships property to False. The plus, you do not lose your relationships already defined in the DSV. The minus, you must manually create any new relationships that exist going forward, which is an acceptable task considering the many relationships that would have been lost and any future refreshing of the DSV would eliminate the relationships that were recreated.

The following screen print displays the location of the RetrieveRelationships property to update:

DSV_Properties

The final result of changing the Fact table from the warehouse with Named Calculations to a SQL Database View was being able to update the processing time for each partition from the SSAS generated query that was taking minutes to return to retrieving the same data within seconds by using the SQL Database View. In addition, we were able to improve the maintenance of the partitions for field updates from a painstaking manual update procedure and then a cube process to simply reprocessing the cube when field changes were made.