Articles

HomePower BIHow to Profile SQL Server Data Using Power BI

How to Profile SQL Server Data Using Power BI

With the announcement of General Availability for Power BI 2.0, Microsoft made BI to the masses possible. There were some steps back (chart/map drill down, tile slicers, etc.) but many more steps forward. They removed the reliance on Excel, added formatting options, new visualizations, and more.

With the new release, we wanted to create a dashboard so we started with SQL Server data profiling.  This helps in analyzing your data source for optimizing the data model.  What columns have high or low cardinality?  What is the uncompressed data size?  What are the data types of high cardinality columns?  This helps in analyzing the data to see if data types need to be modified, if columns need to be split, if tables are at a lower granularity than needed, etc.

Here is the sample dashboard created on top of the data model using the AdventureWorks Data Warehouse database AdventureWorksDW2012:

Power BI SQL Metadata

We included the SQL Server logo (this same method with altered SQL could be used for Oracle, Teradata, etc.), the number of schemas for the database, the number of tables, the number of columns, the size in MB for each table, the number of distinct elements and corresponding size by column, the number of distinct elements by table and column in a heat map, and a data grid with corresponding information.

We created a stored procedure to extract this data but it could also be passed in as a query to the SQL Server database.

Download SQL Here

declare @results table
(
ID varchar(36),
SchemaName varchar(250),
TableName varchar(250),
ColumnName varchar(250),
DataType varchar(250),
MaxLength varchar(250),
Shortest int,
Longest int,
Minimum varchar(1000),
Maximum varchar(1000),
NullCount int,
NotNullCount int,
DistinctCount int,
SizeMB decimal(12,2),
FullSQL varchar(1500),
FullSQLBit varchar(1500)
)
INSERT INTO @results(ID,SchemaName,TableName,ColumnName,DataType,MaxLength,Shortest,Longest,Minimum,Maximum,NullCount,NotNullCount,DistinctCount,SizeMB,FullSQL,FullSQLBit)
SELECT
NEWID() as [ID],
OBJECT_SCHEMA_NAME(TB.[object_id],DB_ID()) AS [SchemaName],
Object_Name(c.object_id) as [TableName],
c.name as [ColumnName],
t.Name as [DataType],
case
when t.Name not in (‘varchar’,’nvarchar’,’nchar’) Then ‘NA’
when c.max_length = -1 then ‘Max’
else CAST(c.max_length as varchar)
end as [MaxLength],
0 as [Shortest],
0 as [Longest],
0 as [Minimum],
0 as [Maximum],
0 as [NullCount],
0 as [NotNullCount],
0 as [DistinctCount],
0 as [SizeMB],
CASE WHEN t.Name = ‘bit’ THEN NULL ELSE
‘SELECT ”’+t.Name+”’ as [DataType],
Min(Len(‘ + c.name + ‘)) as [Shortest],
Max(Len(‘ + c.name + ‘)) as [Longest],
case when ”’+t.Name+”’ = ”varbinary” then ”NA” else Min(‘ + c.name + ‘) end as [Minimum],
case when ”’+t.Name+”’ = ”varbinary” then ”NA” else Max(‘ + c.name + ‘) end as [Maximum],
SUM( CASE WHEN ‘ + c.name +’ IS NULL THEN 1 ELSE 0 END) as [NullCount],
SUM( CASE WHEN ‘ + c.name + ‘ IS NOT NULL THEN 1 ELSE 0 END) as [NotNullCount],
COUNT(DISTINCT ‘ + c.name + ‘) as [DistinctCount],
SUM(DATALENGTH(‘ + c.name + ‘)) / 1048576.0 as [SizeMB] FROM ‘ + OBJECT_SCHEMA_NAME(c.object_id) + ‘.’ + Object_Name(c.object_id) END as [FullSQL],
CASE WHEN t.Name = ‘bit’ THEN
‘SELECT ”’+t.Name+”’ as [DataType],
Min(Len(‘ + c.name + ‘)) as [Shortest],
Max(Len(‘ + c.name + ‘)) as [Longest],
Min(‘ + c.name + ‘+0) as [Minimum],
Max(‘ + c.name + ‘+0) as [Maximum],
SUM( CASE WHEN ‘ + c.name +’ IS NULL THEN 1 ELSE 0 END) as [NullCount],
SUM( CASE WHEN ‘ + c.name + ‘ IS NOT NULL THEN 1 ELSE 0 END) as [NotNullCount],
COUNT(DISTINCT ‘ + c.name + ‘) as [DistinctCount],
SUM(DATALENGTH(‘ + c.name + ‘)) / 1048576.0 as [SizeMB] FROM ‘ + OBJECT_SCHEMA_NAME(c.object_id) + ‘.’ + Object_Name(c.object_id) ELSE NULL END as [FullSQLBit]
FROM
sys.[tables] AS TB
INNER JOIN sys.[all_columns] C ON TB.[object_id] = C.[object_id] INNER JOIN sys.[types] T ON C.[system_type_id] = T.[system_type_id] AND C.[user_type_id] = T.[user_type_id] WHERE
tb.name not in (‘DatabaseLog’)
–select * from @results order by MaxLength desc
—————————————-COMBINE ALL (MAX,MIN,COUNT) INTO SINGLE SQL
–LONGEST
DECLARE @id varchar(36)
DECLARE @fullsql varchar(1500)
DECLARE @fullsqlbit varchar(1500)
DECLARE @receiver table(
DataType varchar(250),
Short varchar(250),
Long varchar(250),
Minimum varchar(1000),
Maximum varchar(1000),
NullCount varchar(250),
NotNullCount varchar(250),
DistinctCount varchar(250),
SizeMB decimal(12,2)
)
declare @receiverbit table(
DataType varchar(250),
Short varchar(250),
Long varchar(250),
Minimum varchar(1000),
Maximum varchar(1000),
NullCount varchar(250),
NotNullCount varchar(250),
DistinctCount varchar(250),
SizeMB decimal(12,2)
)
DECLARE length_cursor CURSOR
FOR SELECT ID, FullSQL, FullSQLBit FROM @results –WHERE MaxLength != ‘NA’
OPEN length_cursor
FETCH NEXT FROM length_cursor
INTO @id, @fullsql, @fullsqlbit
WHILE @@FETCH_STATUS = 0
BEGIN
INSERT INTO @receiver (DataType,Short,Long,Minimum,Maximum,NullCount,NotNullCount,DistinctCount,SizeMB)
exec(@fullsql)
UPDATE @results
SET Shortest = (SELECT Short FROM @receiver),
Longest = (SELECT Long FROM @receiver),
Minimum = (SELECT Minimum FROM @receiver),
Maximum = (SELECT Maximum FROM @receiver),
NullCount = (SELECT NullCount FROM @receiver),
NotNullCount = (SELECT NotNullCount FROM @receiver),
DistinctCount = (SELECT DistinctCount FROM @receiver),
SizeMB = (SELECT SizeMB FROM @receiver)
WHERE ID = @id
AND DataType <> ‘bit’
DELETE FROM @receiver
INSERT INTO @receiverbit (DataType,Short,Long,Minimum,Maximum,NullCount,NotNullCount,DistinctCount,SizeMB)
exec(@fullsqlbit)
UPDATE @results
SET Shortest = (SELECT Short FROM @receiverbit),
Longest = (SELECT Long FROM @receiverbit),
Minimum = (SELECT Minimum FROM @receiverbit),
Maximum = (SELECT Maximum FROM @receiverbit),
NullCount = (SELECT NullCount FROM @receiverbit),
NotNullCount = (SELECT NotNullCount FROM @receiverbit),
DistinctCount = (SELECT DistinctCount FROM @receiverbit),
SizeMB = (SELECT SizeMB FROM @receiverbit)
WHERE ID = @id
AND DataType = ‘bit’
DELETE FROM @receiverbit
FETCH NEXT FROM length_cursor
INTO @id, @fullsql, @fullsqlbit
END
CLOSE length_cursor
DEALLOCATE length_cursor
SELECT
SchemaName,
TableName,
ColumnName,
DataType,
MaxLength,
Shortest,
Longest,
Minimum,
Maximum,
NullCount,
NotNullCount,
DistinctCount,
SizeMB
FROM
@results

We then renamed columns to make Q&A more user friendly, added measures to their own blank query to delineate dimensions in the main query and measures in the blank query, and hid columns.

At this point, you can create your reports, add text boxes if needed, any images, slicers if needed, and create your visualizations on top of the data.

Once created, we uploaded to our PowerBI.com service online and pin visualizations or images to our newly created ‘SQL Metadata’ dashboard. If you need to keep it up to date because data is changing, download and install the Power BI Personal Gateway application.

Download the .pbix here!

(Visited 2,059 times)

3 Responses to “How to Profile SQL Server Data Using Power BI”

By PowerDAX - 13 September 2015 Reply

So, I just copied this SQL onto another pc and realized that it is reformatted when inside WordPress. I am so sorry if anyone had trouble utilizing it. I added a link to download the actual SQL file. My apologies!

By Pär - 22 September 2017 Reply

I tried this, but the SQL query does not work for me. I get this error message:

Msg 257, Level 16, State 3, Line 1
Implicit conversion from data type varchar to timestamp is not allowed. Use the CONVERT function to run this query.

By PowerDAX - 22 September 2017 Reply

Hi Pär – the SQL has to be altered a bit if you have columns that require brackets, etc. – depending on the naming convention used for the SQL tables.

The query was built to be fairly inclusive but may need to be altered for your specific environment.

Leave a Comment