Uniqueness in data

The SAS macro below reports whether the values of a variable in a dataset are unique, and lists the values that occur more than once.

/********************************************************************************
Author        : 
Creation date : ddmmmyyy
Description   : Gets info about uniqueness in a SAS-dataset.
Example       : %uniq(sashelp.class, name, print)
*********************************************************************************
Input
-----
&dataset  : The dataset to test.
&variable : The variable to test for uniqueness.
&print    : If the output/result should be shown in a PROC PRINT.
*********************************************************************************
Output
------
freq_result          : Dataset sorted with doublets as first rows.
freq_result_doublets : Data containing only the doublets.
********************************************************************************/
%macro uniq(dataset, variable, print);
	proc freq data=&dataset.;
		tables &variable / noprint out=freq_result;
	run;

	proc sort data=freq_result;
		by descending count;
	run;

	data freq_result_doublets;
		set freq_result;
		where count gt 1;
	run;

	proc sql noprint;
		select count(*) into :doublets
		from freq_result_doublets
		where count gt 1
		;
	quit;

	%put --------------------------------------------------------------------------------------------;
	%put NUMBER OF DOUBLETS IN [%upcase(&dataset.)] FOR VARIABLE [%upcase(&variable.)]: &doublets.;
	%put --------------------------------------------------------------------------------------------;

	%if &doublets. eq 0 %then
	%do;
		%put !!!! NO DOUBLETS !!!;
		%put --------------------------------------------------------------------------------------------;
	%end;

	%if &print. ne %then
	%do;
		proc print data=freq_result;
		run;
	%end;
%mend;
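
As a usage sketch, the two calls below run the macro against sashelp.class, assuming the standard SASHELP library is available. Name has no repeated values in that dataset, so the first call should report no doublets in the log; Age repeats, so the second call should fill freq_result_doublets and print freq_result.

```sas
/* Name is unique in sashelp.class: expect the NO DOUBLETS message in the log. */
/* The third parameter is left out, so nothing is printed. */
%uniq(sashelp.class, name)

/* Age has repeated values: freq_result_doublets gets one row per repeated age. */
/* Any non-empty third argument switches on the PROC PRINT. */
%uniq(sashelp.class, age, print)
```

The macro only tests whether &print is empty, so the word "print" itself has no special meaning; any non-blank value has the same effect.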

 

Finding unique and duplicate rows in SAS

The code below shows how to find unique and duplicate rows in a dataset and separate them into two different datasets.
The variables you want to examine for uniqueness have to be listed in the by-statement, and each one gets its own not(first.<variable> and last.<variable>) test. The input dataset must be sorted by the same variables, otherwise the data step stops with an error. Be aware that from SAS 9.3 there is an easier solution using proc sort.

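/* BY-group processing needs sorted input; sashelp.class is not sorted by these variables. */
proc sort data=sashelp.class out=class_sorted;
 by Age Height Name Weight;
run;

data unique dups;
 set class_sorted;
 by Age Height Name Weight;
 /* A row goes to dups when its BY group holds more than one row. */
 if not(first.Age and last.Age)
 and not(first.Height and last.Height)
 and not(first.Name and last.Name)
 and not(first.Weight and last.Weight) then output dups;
 else output unique;
run;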
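
A shorter variant is possible, because first. and last. for a BY variable are also set whenever any BY variable to its left changes. Testing only the last variable in the by-statement should therefore give the same split. The sketch below assumes this and uses class_sorted as a hypothetical name for the sorted copy.

```sas
/* Sorted copy of the input; class_sorted is a hypothetical dataset name. */
proc sort data=sashelp.class out=class_sorted;
 by Age Height Name Weight;
run;

data unique dups;
 set class_sorted;
 by Age Height Name Weight;
 /* first.Weight and last.Weight are both 1 only when the full key
    (Age, Height, Name, Weight) occurs in exactly one row. */
 if first.Weight and last.Weight then output unique;
 else output dups;
run;
```

If the order of the by-statement changes, the test has to follow the variable that is listed last.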

This is different from proc sort with nodupkey, which was the option available prior to SAS 9.3:

proc sort data=sashelp.class nodupkey out=unique dupout=dups;
 by Age Height Name Weight;
run;

The code above keeps the first row of each group of duplicates in the unique-dataset and sends only the remaining rows to dups. It does not completely separate unique and duplicate rows from each other.
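
To make the nodupkey behaviour concrete, here is a small sketch on made-up data. The dataset toy and its variable key are hypothetical and exist only for this illustration.

```sas
/* Hypothetical toy data: key 1 occurs twice, key 2 once, key 3 three times. */
data toy;
 input key;
 datalines;
1
1
2
3
3
3
;
run;

proc sort data=toy nodupkey out=unique dupout=dups;
 by key;
run;

/* unique now holds one row per key (1, 2, 3), including the repeated keys 1 and 3. */
/* dups holds only the surplus rows (1, 3, 3), not every row of a repeated key. */
```

Key 2 is the only truly unique value, yet it ends up in the same dataset as the first rows of keys 1 and 3.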

In SAS 9.3 proc sort has a new option, uniqueout=. Combined with nouniquekey, it can be used to do the trick of the datastep much more easily. I haven’t tried it, but I imagine that this is how it works:

proc sort data=sashelp.class nouniquekey uniqueout=singles out=dublet;
 by Age Height Name Weight;
run;