Automatically analyzing and documenting SAS-code

In SAS Enterprise Guide and SAS-DI you have the possibility to analyze the code for a SAS-program.

The picture below shows the Analyze Program option in SAS Enterprise Guide.

Analyzing the code should result in a conversion of the SAS-code to a SAS Enterprise Guide flow or SAS-DI flow. But none of these code analyzers are very good. Depending on the complexity of the SAS-code you put into the analyzer, they will leave you with a more or less successful conversion. And more times than not they will fail at doing the job.

But SAS comes with a procedure PROC SCAPROC that does a really good job at analyzing and documenting SAS-code.

Below is an example. The ‘attr‘ option writes additional information about the variables in the input data sets and views. The ‘expandmacros‘ option expands macro invocations into separate tasks.

proc scaproc;
record '<PATH>' attr expandmacros;
run;
 
proc scaproc;
write;
run;

Be aware, that PROC SCAPROC is not able to get information from the execution of your SAS-program flow, if you are rsubmitting to other servers.

The links below gives you further descriptions of PROC SCAPROC and its options. There’s also a guide on how to do a graphical presentation of the result fra PROC SCAPROC. http://support.sas.com/resources/papers/proceedings17/1104-2017.pdf

Overview of the SCAPROC Procedure
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a003199745.htm

Program for parsing the output from PROC SCAPROC to create a data set with inputs and outputs.
http://support.sas.com/kb/58/047.html

Innovative Performance Improvements Through Automated Flowcharts In SAS
http://support.sas.com/resources/papers/proceedings16/11580-2016.pdf

Uniqueness in data

The SAS-macro below will tell you if a variable in a dataset is unique.

/********************************************************************************
Author        : 
Creation date : ddmmmyyy
Description   : Gets info about uniqness in a SAS-dataset.
Example       : %uniq(sashelp.class, name, print)
*********************************************************************************
Input
-----
&datset   : The dataset to test.
&variable : The variable to test for uniqueness.
&print    : If the output/result should be shown in a PROC PRINT.
*********************************************************************************
Output
------
freq_result          : Dataset sorted with doublets as first rows.
freq_result_doublets : Data containing only the doublets.
********************************************************************************/
%macro uniq(dataset, variable, print);
	proc freq data=&dataset.;
		tables &variable / noprint out=freq_result;
	run;

	proc sort data=freq_result;
		by descending count;
	run;

	data freq_result_doublets;
		set freq_result;
		where count gt 1;
	run;

	proc sql noprint;
		select count(*) into :doublets
		from freq_result_doublets
		where count gt 1
		;
	quit;

	%put --------------------------------------------------------------------------------------------;
	%put NUMBER OF DOUBLETS IN [%upcase(&dataset.)] FOR VARIABLE [%upcase(&variable.)]: &doublets.;
	%put --------------------------------------------------------------------------------------------;

	%if &doublets. eq 0 %then
	%do;
		%put !!!! NO DOUBLETS !!!;
	%put --------------------------------------------------------------------------------------------;
	%end;

        %if &print. ne %then
        %do;
             proc print data=freq_result;
             run;
        %end;
%mend;

 

SSIS and IBM DB2 – Don’t do it!

SSIS is a product that should be indifferent to whatever RDBMS you use. That’s also true. It will work with every RDBMS through drivers ODBC, OLE DB, ADO.NET etc. BUT it works best with Microsofts SQL-server.

This blogpost will take a look at using IBM DB2 with SSIS. Below is a test using ODBC-driver for IBM DB2 from IBM, OLE DB for IBM DB2 from Microsoft and an ADO.NET-driver.

Transfer rate of data

Transferring 100.000 rows from one IBM DB2-server to another IBM DB2-server.

OLE DB-driver for IBM DB2 from Microsoft 6 minuts
ADO.NET-driver 30 seconds
ODBC-driver for IBM DB2 from IBM 5 seconds

As seen above the ODBC-driver for IBM DB2 from IBM is the best solution when it comes to transfer rate. The OLE DB-driver for IBM DB2 from Microsoft is a really poor choice. But using the ODBC-driver for DB2 from IBM is not the solution to everything!

Lookup-task in SSIS

As seen in the picture below it’s only possible to use an OLE DB-connection in the Lookup-task. The Lookup-task is really slow and it might be better doing it in SQL. Regarding a lot of the tasks in SSIS, they are really slow (see links below).

OLE DB Command

SSIS also contains a ‘OLE DB Command’-task and no ‘ODBC Command’-task.

Fast load/Bulk load

OLE DB
It’s NOT possible to use the fast load option with the OLE DB-driver when it comes to IBM DB2. It’s possible to choose the fastload option, but you will get the error below.

The reason why the OLE DB-driver is that slow is probably that it transfers one row at a time.

ODBC
It’s possible to use a batch/fast load load option in ODBC.

ADO.NET
It’s also possible to use a bulk/fast load option in ADO.NET.

Links

The links below is to different sources that can help you with SSIS.

SQL Server Integration Services Design Patterns
A book that describes different design pattern for SSIS.

Task Factory for SSIS from Pragmatic Works
A collection of optimized SSIS-tasks.

Cozyroc
A collection of optimized SSIS-tasks.

PragmaticWorks free online training videos
Free online training videos about SSIS (search for SSIS).

Pluralsight
Online training videos in SSIS (requires a paid subscription to Pluralsight).

SSIS and versioning

This post will describe the big challenges that SSIS has when it comes to versioning solutions in SSIS when working as a team on the same SSIS-solution.

These experiences are based on using SubVersion (SVN) and Microsoft Team Foundation Server (TFS) for versioning solutions in SSIS. I have no experience with Git. But I will assume that the same problem occurs.

An example could be that John and Allan is working together on a SSIS-solution. Let’s call it ‘Staging DW’. They start out with local (their own) identical working copy of the SSIS-solution.

John creates a new SSIS-package in the SSIS-solution ‘Staging DW’. Let us call the SSIS-package ‘Stage Company’.
Allan creates a new SSIS-package in the SSIS-solution ‘Staging DW’. Let us call the SSIS-package ‘Stage Sales’.

John updates his working copy of the SSIS-solution ‘Staging DW’ and commits his changes to ‘Staging DW’ containing the SSIS-package ‘Stage Company’. Everything goes well.

Now Allan wants to commit his changes to the SSIS-solution ‘Staging DW’. He updates his working copy of the SSIS-solution ‘Staging DW’ – but he gets a conflict in the file ‘Staging DW.dtproj’ (an XML-file containing information about the SSIS-solution ‘Staging DW’ and its SSIS-packages).

The problem is that John’s new SSIS-package ‘Stage Company’ is a new package in the SSIS-solution ‘Stage DW’ and Allans SSIS-package ‘Stage Sales’ is a new package n the SSIS-solution ’Stage DW’. In the XML-file ‘Staging DW.dtproj’ information about the SSIS-package ‘Stage Company’ created by John occupies the same line as information about Allan new SSIS-package ‘Stage Sales’ in the XML-file ‘Staging DW.dtproj’.

The tools for solving conflicts in SVN and TFS isn’t capable of adding to the file. They can only overwrite.

One solution could be that Allan overwrites his ‘Staging DW.dtproj’ file. This means that his SSIS-solution ‘Staging DW’ looses it’s knowledge of his new SSIS-package ‘Stage Sales’. But he doesn’t lose his SSIS-package ‘Staging Sales’ it will still in the directory of the SSIS-solution ‘Stage DW’. He can now add this SSIS-package ‘Staging Sales’ to the SSIS-solution ‘Staing DW’ now containing John’s new SSIS-package ‘Staging Company’ and commit the SSIS-solution ‘Staging DW’ containing both John’s new SSIS-package ‘Stage Company’ and Allan’s new SSIS-package ‘Stage Sales’ without any problems.

Another solution is good old communication between team members. John could have told Allan that he was going to create the SSIS-package ‘Stage Company’ and that Allan should wait a bit with creating his SSIS-package ‘Stage Sales’ until John had committet the SSIS-solution ‘Stage DW’ containing his SSIS-package ‘Stage Company’. Then Allan could have updated his local copy and got the changes John made. Then Allan could start making his SSIS-package ‘Stage Sales’.

Or they could upfront have made all the SSIS-packages (without content) in the SSIS-solution. They could also choose to make SSIS-solution that was so specific or small that they did have to work together on them.

As of medio 2017 there is no real solution to this challenge.

Partitions in SSAS with no data

If you get the error below when processing a cube in SSAS. You might have partitions in the cube, that doesn’t have corresponding data in the underlying relational tables.

Errors in the OLAP storage engine: The restrictions imposed on partition slice where violated.‏

A solution could be to just delete the affected partitions from the cube.

Using special national characters in unittests with NBI

First of all, if you have a .csv file containing the results that you want to compare with your test and the result contains special national characters, then you need to encode the file in Unicode before saving the file. You change the encoding in Notepad for Windows as shown below.

SaveUnicode

If your .mdx file with the test to perform contains national characters in Danish that could be the characters æ (ae), ø (oe) and å (aa). Then you have to translate them to these characters to make them work.

æ æ
Æ Ã†
ø ø
Ø Ã˜
å Ã¥
Å Ã…

More characters can be found here.

An example of and MDX-query with national characters for in .mdx file for unittesting in NBI would look like the query below.

-- MEASURES IN A CUBE – SPECIFY CUBE_NAME BELOW
SELECT CUBE_NAME AS [CUBE],[MEASUREGROUP_NAME] AS [FOLDER],
[MEASURE_UNIQUE_NAME] as UNIQUIE_NAME, [MEASURE_CAPTION] AS [MEASURE],
[MEASURE_IS_VISIBLE]
FROM $SYSTEM.MDSCHEMA_MEASURES
WHERE CUBE_NAME ='MÆJ' (that is MÆJ)
ORDER BY [MEASUREGROUP_NAME]