Using FOR XML PATH to Combine Multiple Record Values into One Column

Scenario

I have been asked on numerous occasions to provide a data feed that provides data for a customer, or plant or some other entity with a catch, can you take a one to many relationship and make one column of data out of it. That answer, of course, is “Sure, I can do anything!”  but then I get into the how and I start to build a stored procedure that uses temp tables, involves a WHILE loop of some kind and I have to manually code the correct answer. The process is nothing complicated but it definitely involved some extra steps before providing back the results.

My thinking though is that it would be great if I could do that in a SQL statement and let SQL Server do all of the work. Low and behold, I ran across a method that turned my original process into a single query statement, which obviously is much simpler!

Example Data Set

To illustrate, I have the following scenario:

  • Please provide a Customer list by State and include all of the Customer Regions in one column on each state’s row.

Simple request but the issue is that within a state, I have multiple regions depending on a postal code of the address of the customer office.

In my sample database, my data is broken out into three tables: (for the sake of being able to create this locally as an example, I have done everything in tempdb)

create table #tmpCustomer (CustomerId int identity, CustomerName varchar(max))

create table #tmpCustomerAddress (AddressId int identity, CustomerId int, StateAbbreviation varchar(5), PostalCode varchar(10))

create table #tmpPostalCode (PostalCodeId int identity, PostalCode varchar(20), PostalRegion varchar(30))

The relationships are:

  • A customer can have multiple addresses
  • For a state, you can have multiple regions based upon the postal code
  • The postal code belongs to a postal region

These relationships imply that a customer can have multiple regions per state as well, where the postal code of the customer address will map to the region.

For sample data, the following inserts are executed:

insert into #tmpCustomer (CustomerName)

select 'Customer A'

 

 

insert into #tmpCustomerAddress(CustomerId, StateAbbreviation, PostalCode)

select 1, 'PA', '19111'

union

select 1, 'PA', '19103'

union

select 1, 'NJ', '08034'

 

insert into #tmpPostalCode (PostalCode, PostalRegion)

select '19111', 'NE Philly'

union

select '19103', 'Center City'

union

select '08034', 'South Jersey'

Query Solution

Now, if we queried the data using straight joins we would get the following:

select

            c.CustomerName,

            ca.StateAbbreviation,

            pc.PostalRegion

from

            #tmpCustomer c

            inner join #tmpCustomerAddress ca

            on c.CustomerId = ca.CustomerId

           

            inner join #tmpPostalCode pc

            on ca.PostalCode = pc.PostalCode

CustomerName StateAbbreviation PostalRegion
Customer A NJ South Jersey
Customer A PA Center City
Customer A PA NE Philly

However, the above dataset does not provide the solution, the request is to roll up the State Regions for the customer into one row per state.

The following query gives us the answer:

select distinct

            c.CustomerName,

            ca.StateAbbreviation,

            substring(

            (select '; ' + ipc.PostalRegion

            from #tmpPostalCode ipc

                        inner join #tmpCustomerAddress ica

                        on ipc.PostalCode = ica.PostalCode

            where

                        ica.CustomerId = ca.CustomerId

                        and ica.StateAbbreviation = ca.StateAbbreviation

            for xml path('')), 3, 1000) as Region

from

            #tmpCustomer c

            inner join #tmpCustomerAddress ca

            on c.CustomerId = ca.CustomerId

           

            inner join #tmpPostalCode pc

            on ca.PostalCode = pc.PostalCode

Here are the results:

CustomerName StateAbbreviation Region
Customer A NJ South Jersey
Customer A PA Center City; NE Philly


The key of the query is using the for xml path('') syntax as a subquery in the select portion of the field list, as well as using the distinct keyword to remove duplicate entries.

To note, the for xml path('') statement would put in xml tags if the column was named in the subquery but since we left the column name empty when we concatenated the semicolon and the postal region, the xml nodes are empty. The extra substring is just to parse off the preceding delimiter that we start all records with.

The above example could then be used as an inner query tied to other tables within the database for more fields that are also at a single Customer State level.

Conclusion

As the above example showed, the ability to roll up a one to many relationship into a single row set can be be very useful and it would have saved me time in the past if I had come across this method earlier. In addition, having the ability to do this using one T-SQL statement as opposed to having to generate a loop and a procedure to build your dataset is definitely a simpler and in my opinion a better answer.

Database Partitioning with the Entity Framework

 

Problem

Requirements on a recent project called for a cloud-based solution and support for a lot of data. Microsoft Windows Azure was selected as the platform and Microsoft SQL Azure was selected for the database. How much is a lot of data? It was expected to be much greater than 50GB. Now we had a problem, since the maximum size of a SQL Azure database was 50GB and SQL Azure Federation has not yet been released.

 

Solution

An additional technical requirement for the solution was to use the latest Microsoft Entity Framework. A ready-made solution for database partitioning on SQL Azure with the Entity Framework was not to be found, so we developed our own implementation. The key features of the solution included:

1. Minimal impact to the development team, who were already familiar with EF

2. Ability to add additional databases on-the-fly as one or more database approached capacity

3. Self-leveling of the databases: they should effectively grow at the same rate.

 

Partitioning Strategy

Regardless of the implementation, selecting the appropriate partitioning strategy is critical. We chose a horizontal partition. The Atomic Unit for the partition (or shard) was a Customer. All transactions (Orders) and other data related to a Customer are in the same shard. The benefits of horizontal partitioning in this manner include:

1. Performance: Once a customer was selected, all database activity for that customer was in the same database. There was no spanning of databases to store or retrieve data related to the current customer. Only Customer searches needed to span databases.

2. Self-leveling: The logic for which database to insert the next new customer was simple – always add to the smallest database.

3. Positioned for Growth: Additional databases could be added without having to re-partition or divide existing data.

 

Implementation Details

The two main areas of the solution that required enhancements to an otherwise ‘standard’ EF implementation were searches and management of connection strings. Searches had to span databases and return which database a customer was found in as part of the result. Likewise, knowing which database a specific customer belonged to had to be easily passed into EF methods and translated to full connection strings when necessary.

Searches

We chose parallel LINQ queries for searches. See below for an example. An array of connections (how many shard databases we currently have) is maintained in a static ConnectionFactory. So, c.Item1 in the final select below contains the shard where the result was found. c.Item2 contains the full connection string passed into the query.Invoke().

public static IEnumerable<TResult> ExecuteAgainstShards<TArg0, TArg1, TResult, TPreShape>(

this Func<TArg0, TArg1, IEnumerable<TPreShape>> query, TArg1 arg1, Func<string, TPreShape, TResult> shapeResult) where TArg0 : ObjectContext, new()

{

var result = from c in ConnectionFactory.Instance.CreateConnections<TArg0>().AsParallel()

from x in ExecuteWithRetry(() => query.Invoke(c.Item2, arg1))

select shapeResult(c.Item1, x);

return result.ToArray();

}

The web.config contains only the connection string for the first shard and there are some helper methods to parse and create connection strings as needed to access the remaining shards. The web.config (or ServiceConfiguration.cscfg in Windows Azure) also contains the number of active shards used by the ConnectionFactory. This provides the mechanism for adding databases without changing code or adding additional connection strings.

Below is an example compiled query and its calling method that leverage the ExecuteAgainstShards parallel execution.

private readonly Func<MyEntities, CustomerSearchTerms, IQueryable<Data.Person>> customerSearch = CompiledQuery.Compile<MyEntities, CustomerSearchTerms, IQueryable<Data.Person>>

(

(es, search) =>

from p in es.People

where

p.Customers.Count() >= 1 && (search.CustomersId == null ||

p.Customers.Count(e => e. CustomerID == search. CustomerId) > 0) &&

(string.IsNullOrEmpty(search.FirstName) || p.FirstName.StartsWith(search.FirstName)) &&

(string.IsNullOrEmpty(search.LastName) || p.LastName.StartsWith(search.LastName)) &&

select p

);

public IEnumerable<CustomerPersonReference> FindCustomer(CustomerSearchTerms terms)

{

var items = customerSearch.ExecuteAgainstShards

(

terms,

(shardName, person) =>

{

var patient = person.Customers.First();

return new CustomerPersonReference

( CommonRoutines.MapHeader(person, shardName),

});

return items.OrderBy(p => p.Item1.LastName).ThenBy(p => p.Item1.FirstName). ToArray();

}

Connection Strings

As mentioned above, once a Customer has been found in a specific shard, all data related to that Customer will be and must be in that shard. In addition to the shard helper methods for generating and parsing connection string, the development team needed to be disciplined to always use the EF constructor that included a connection string. For example:

using (var entities = new MyEntities(ShardUtilities.GetConnectionString(customerShard))

{

// real work goes here

}

where customerShard is likely the shard returned from the search query above.

 

Summary

Implementation of a database partitioning solution with the Microsoft Entity Framework was necessary to overcome the current SQL Azure database size limitation. But, the solution is certainly not limited to SQL Azure. If you are using a version of Microsoft SQL Server that does not support partitioning or if you want to maintain a database solution that will work with either SQL Azure or on-premise SQL Server, this fairly straightforward implementation might come in handy.

Where is my Dashboard designer?

Finding SharePoint performance point Dashboard designer tool was a little tricky. It’s a click once download that is not available anywhere in MSDN downloads or any other place. However you can download it from your Business Intelligence Center page in SharePoint.

To make your life easier, if you ever need it and you already have PerformancePoint installed on your SharePoint sever then here is how you can get it:
  1. Open your browser as an admin user
  2. Go to this URL http://<server-name>/sites/<site-name>/pages/ppssample.aspx
  3. Click the “Run Dashboard Designer” and let it download the tool for you.
In highly customized SharePoint 2010 sites this page can be hard to find.



Scale Breaks in SSRS 2008 R2

Ever came across an issue when in your chart some axis values are so high that rest of the chart items almost look like they do not have any values? I recently came across the same issue and thanks to SQL Reporting Services 2008 R2's new Scale Break feature which made my charts look more meaningful.

To enable scale breaks on the chart
1. Right-click the vertical axis and then open "Axis Properties".
2. Select the "Enable scale breaks" check box.

To change the style of the scale break
1. On the design surface, right-click on the y-axis of the chart. The properties for the y-axis object are displayed in the Properties pane.
2. In the Scale section, expand the ScaleBreakStyle property.
3. Change the values for ScaleBreakStyle properties, such as BreakLineType and Spacing.

Note: You cannot specify where to place a scale break on your chart. The chart uses its own calculations based on the values in your dataset to determine whether there is sufficient separation between data ranges to draw a scale break on the value axis (y-axis) at run time.

File system error: The following error occurred during a file operation: There is not enough space on the disk.

Since last two days my package which processes 30+ dimensions and a cube was failing with the File system error. Package was complaining that we were running out of space on file system. I checked both SSIS server and SQL Server and space was not an issue at all. After spending some time on troubleshooting I realized that when we were processing all dimensions in one ssas processing task (ssis task), package was throwing "out of space" error but after breaking the dimension processing task into multiple tasks by dividing dimensions equally into three tasks, package started running successfully for me. Now if you come across the same issue then you know what you need to do. break your dimension processing task into multiple tasks by dividing dimensions equally into multiple tasks. I am processing no more than 15 dimensions in one task.

Microsoft BI Labs

Microsoft recently launched a new site called, "Microsoft BI Labs".
http://www.microsoft.com/bi/en-us/Community/BILabs/Pages/Home.aspx


As shared on their site, "BI Labs is a collection of experimental business intelligence projects and useful applications made available from internal sources across Microsoft. These projects are prototypes and concepts, and there are no current plans to include them in Microsoft products."


Those of you who are interested in learning more about upcoming techonologies in BI world, this might be a great resource for them. They already have some interesting technologies shared on their site e.g.


1- PivotViewer Extension for Microsoft SQL Server Reporting Services


2- Microsoft SQL Server Data Mining for the Cloud


3- Fuzzy Lookup Add-in for Microsoft Excel 2010


4- MDX and DAX Formatter


5- Microsoft SQL Server Reporting Services Log Viewer


6- BI vision in multiple industries (Video Series)


Enjoy!!!

Microsoft SharePoint 2010 Business Intelligence Unleashed Book

BI SharePoint Book Cover 

Several members from the RDA BI team have put together an Unleashed Series book covering Business Intelligence with SharePoint 2010 and can be found online at the following link: http://www.informit.com/store/product.aspx?isbn=0132660113.

The release date for the book is May 9th, 2011 and was written by: Chuck Rivel, Ray Barley, Jim Pletscher, Aneel Ismaily and Steve Mann (who was the original editor of the RDA Blog site and former team lead for the RDA BI Team)

The book covers working with the Microsoft BI stack for SQL Server 2008 R2 and SharePoint 2010, which involves the following services and applications: PerformancePoint Services, Reporting Services, PowerPivot, Visio Services and the SharePoint 2010 Business Intelligence Center.

The following is the the official book description as published on the informIT website:

“SharePoint 2010 is rapidly emerging as the preferred presentation and deployment system for Business Intelligence (BI) solutions that use the Microsoft technology stack. This is a complete guide to both SharePoint 2010 and BI. This book reflects the unsurpassed real-world experience of five expert consultants at RDA, a Microsoft Gold Partner specializing in delivering advanced BI solutions with SharePoint. The authors offer practical, how-to guidance for every key task involved in building, configuring, and deploying BI solutions within a SharePoint environment, delivering crucial information that can't be found in any other book. They bridge the gap between SharePoint 2010 and SQL Server 2008, carefully guiding technical professionals on utilizing BI data stored in SQL Server databases. Both a hands-on tutorial and a thorough reference, SharePoint 2010 BI Unleashed will be an indispensable resource for every BI developer, implementer, manager, or integrator working with Microsoft technologies. “

If looking for help on how to integrate the BI stack into your SharePoint 2010 environment, this book provides a thorough reference to the applications listed above as well as provides real world step by step examples on how to integrate several of the BI components to build a complete BI solution.

Setup up a SQL Server 2008 R2 cluster using Virtual Machines

Sometimes, SQL Server Professionals need to practice setting up a SQL Server 2008 R2 Cluster on their local machines to gain some experience before setting it up for real in Staging or Production environments. In the past, this used to be virtually impossible without specific hardware such as a Storage Area Network (SAN) which is typically used for hosting the Shared Disks. However, with the release of Windows Storage Server 2008 and Windows Storage Server 2008 R2, it’s become much easier to do so using Virtual Machines. Please note that the following configuration is NOT recommended for a real-world Production environment. This configuration should only be used for a Development environment.





The following are the steps on how to go about it:
  1. Setup 3 Virtual Machines running Windows Server 2008 R2 with 2 NICs on each of them. You can use either VMWare Player or Virtual Box. Configure the 1st NIC with a dynamic IP Address that will be used to connect to the host machine. Configure the 2nd NIC with a static IP Address that will be used by the 3 Servers to talk to each other.


  2. Configure the 1st server as your Active Directory Domain Controller. You can call it MainServer.contoso.net.


  3. Add the 2nd and 3rd servers to the Active Directory created on the 1st server. You can call them SQLServer1.contoso.net and SQLServer2.contoso.net respectively.


  4. Configure MainServer.contoso.net as the Storage Server that will host the Shared Disks. You will need to install the iSCSI Target 3.3 software that can be found in the ISO image of Windows Storage Server 2008 R2. For more information on how to install it, please refer to: http://blogs.technet.com/b/josebda/archive/2010/09/27/windows-storage-server-2008-r2-and-the-microsoft-iscsi-software-target-3-3-are-available-on-msdn-technet-here-s-how-to-install-them.aspx. NOTE: You ONLY need the iSCSI Target software. You do NOT need to install all the other bits for Windows Storage Server 2008 R2.


  5. Configure the iSCSI Target and 2 Shared Disks on MainServer.contoso.net i.e. 1 for the quorum and the other for SQL Server data. This blog gives excellent step-by-step instructions on how to do so: http://mohsartawi.wordpress.com/2010/05/25/implementing-a-virtual-machine-failover-cluster-with-windows-storage-server-2008/. Even though it talks about Windows Storage Server 2008 (not R2), the steps are very similar.


  6. Create a Service Account in Active Directory such as SQLService which will be used to run the SQL Server instances on both the servers.


  7. Run the SQL Server setup on SQLServer1.contoso.net and choose the option “New SQL Server failover cluster installation”. Use a generic name such as SQLServer.contoso.net for the SQL Cluster.


  8. Run the SQL Server setup on SQLServer2.contoso.net and choose the option “Add node to a SQL Server failover cluster”.


  9. Create a sample database on SQLServer1 but connect to it from your host machine using SQLServer.contoso.net. Shut down SQLServer1. You should still be able to access the database because SQLServer2 will take over.

SSRS report parameter to show or hide columns

Sometimes reports are used mostly for data analysis, rather than presentation. In such cases users request many filtering options and flexibility in choosing the data included in a report. So frequently they ask to select which report columns to show.

When resulting data set is large it may be better to filter the data in the query or stored procedure and return only the columns user requests. However, there is a simple way to let a user show or hide many columns. All the column selections can be included as one additional report parameter.

Suppose a tabular report lists a number of projects with several details. There are about 20 columns, such as project name, employee contact information, project address, etc. Let's say users want ability to hide most of the columns.

To give users such ability add a multi-valued parameter to the report that lists all the columns that users can hide. The parameter values are "non-queried" and correspond to the column names. By default the report shows all columns, so all possible parameter values are also included in the default list.
In report design select a column that users can hide and find and expand Visibility property of that column. If the Hidden attribute of Visibility property is true, the column is not shown in the resulting report. Set the value of the Hidden attribute to the following expression:

=NOT Join(Parameters!p_columns.Value).Contains("COLUMNVALUE")

The COLUMNVALUE must match the value of the parameter that identifies the selected column, such as "projectname", "address", etc. The Join function combines all possible parameter values into a single string. Then Contains function tests whether the user has checked that particular value in the list. The negation of expression tells Reporting Services to hide the column if the user has not checked its value in the parameter list. Set the Hidden property to such expression for every column you want to be able to hide.

One note of caution about using this approach with SSRS 2005. Due to the way SSRS 2005 text renderer works, the exported file does not include any column that can be hidden. That affects CSV and XML export. If you export directly to Excel, all selected columns show and have values. SSRS 2008 R2 does not have such a problem: text formats include all columns regardless of visibility settings, and other formats include all columns selected by the parameter values.

Processing an OLAP cube with a T-SQL Stored Procedure


When processing cubes, you have a couple of different choices about how to process your cube. One method is use an SSIS package with Cube\Dimension Processing tasks.

Another method to process your cube is to use a stored procedure.

Below is a simple SQL Server stored procedure that you can call to process an OLAP cube using T-SQL.



Create Procedure ProcessCube

@Database varchar(100),

@Cube varchar(100),

@Partition varchar(100) = null, -- If NULL, process the entire Cube

@Server varchar(100) = 'localhost'

as



/* variables used to store object handles */

declare @o_svr int, @o_db int, @o_cube int, @o_part int, @o_mds int

declare @hr int



/* different cube processing options. This SP uses "default" */

declare @PROCESS_DEFAULT int

declare @PROCESS_FULL int

declare @PROCESS_REFRESH_DATA int



set @PROCESS_DEFAULT = 0

set @PROCESS_FULL = 1

set @PROCESS_REFRESH_DATA = 2



-- create a DSO.Server object:

exec @hr = sp_OACreate 'DSO.Server', @o_svr out

if @hr <> 0

begin

print 'Error at create server:'

exec sp_OAGetErrorInfo @o_svr

goto cleanup

end



-- Connect to the server:

exec @hr = sp_OAMethod @o_svr, 'Connect', null, @Server

if @hr <> 0

begin

print 'Error at connect to server:'

exec sp_OAGetErrorInfo @o_svr

goto cleanup

end



-- Get the MDStores property from the Server:

exec @hr = sp_OAGetProperty @o_svr,'MDStores', @o_mds OUT

if @hr <> 0

begin

print 'Error at get getting Server MDStores:'

exec sp_OAGetErrorInfo @o_svr

goto cleanup

end



-- Get the database from the MDStores:

exec @hr = sp_OAGetProperty @o_mds,'Item',@o_db OUT,@Database

if @hr <> 0

begin

print 'Error at get database:'

exec sp_OAGetErrorInfo @o_mds

goto cleanup

end



-- get the MDStores property from the database:

exec sp_OADestroy @o_mds

exec @hr = sp_OAGetProperty @o_db,'MDStores', @o_mds OUT

if @hr <> 0

begin

print 'Error at get database MDStores:'

exec sp_OAGetErrorInfo @o_db

goto cleanup

end



-- get the Cube from the MDStores

exec @hr = sp_OAGetProperty @o_mds,'Item',@o_cube OUT, @Cube

if @hr <> 0

begin

print 'Error at get Cube:'

exec sp_OAGetErrorInfo @o_mds

goto cleanup

end



if @Partition is null -- Process the entire Cube, not just a single partition

begin

exec @hr = sp_OAMethod @o_cube, 'Process', null, @PROCESS_DEFAULT

if @hr <> 0

begin

print 'Error at process Cube:'

exec sp_OAGetErrorInfo @o_cube

goto cleanup

end

end



else -- just process the specified Partition



begin

-- Get the MDStores property of the Cube:

exec sp_OADestroy @o_mds

exec @hr = sp_OAGetProperty @o_cube,'MDStores', @o_mds OUT

if @hr <> 0

begin

print 'Error at get Cube MDStores:'

exec sp_OAGetErrorInfo @o_cube

goto cleanup

end



-- Get the partition to process:

exec @hr = sp_OAGetProperty @o_mds,'Item',@o_part OUT, @Partition

if @hr <> 0

begin

print 'Error at get Parition:'

exec sp_OAGetErrorInfo @o_mds

goto cleanup

end



-- Process the partition:

exec @hr = sp_OAMethod @o_part, 'Process', null, @PROCESS_DEFAULT

if @hr <> 0

begin

print 'Error at process Partition:'

exec sp_OAGetErrorInfo @o_part

goto cleanup

end

end



-- And unlock all objects on the server:

exec @hr = sp_OAMethod @o_svr, 'UnlockAllObjects'

if @hr <> 0

begin

print 'Error at unlock all server objects:'

exec sp_OAGetErrorInfo @o_svr

goto cleanup

end



cleanup:



if @o_mds is not null exec sp_OADestroy @o_mds

if @o_Part is not null exec sp_OADestroy @o_Part

if @o_cube is not null exec sp_OADestroy @o_cube

if @o_db is not null exec sp_OADestroy @o_db

if @o_svr is not null exec sp_OADestroy @o_svr