Most SAS procedures, such as PROC MEANS and PROC FREQ, work best with normalized data, which tends to be tall and narrow.

Since we often have no control over the form of the data we receive, we need to be able to convert between the normalized (long) form and the non-normalized (wide) form in either direction.

This process is known as transposing the data, and it is commonly performed with PROC TRANSPOSE.

Variables become observations and observations become variables.

The general syntax of the TRANSPOSE procedure

    PROC TRANSPOSE DATA=Dataset-name OUT=New-dataset-name;
        BY variable(s);
        COPY variable(s);
        ID variable(s);
        VAR variable(s);
    RUN;

The BY statement defines the groups within which the VAR variables are transposed. The BY variables themselves are not transposed; they determine the row structure of the transposed dataset. The input data must be sorted by the BY variables before running PROC TRANSPOSE. The NOTSORTED option on the BY statement relaxes this only partly: the observations must still be grouped by the BY values, but the groups need not be in ascending order. For long-to-wide transposes, the BY variables should uniquely identify each row of the output. For wide-to-long transposes, the BY variables determine the row structure of the long data; that is, each BY group is repeated once for every transposed variable.
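
The sort requirement can be illustrated with a minimal sketch. The dataset work.grades and its variables student, term, and score are hypothetical names made up for this illustration; they do not come from any SAS sample library.

```sas
/* Hypothetical long data: one row per student per term, not sorted */
data work.grades;
    input student $ term $ score;
    datalines;
Bob fall 71
Ann fall 85
Ann spring 90
Bob spring 78
;
run;

/* BY-group processing expects the input ordered by the BY variable */
proc sort data=work.grades out=work.grades_sorted;
    by student;
run;

/* One output row per student; the values of term name the new columns */
proc transpose data=work.grades_sorted out=work.grades_wide(drop=_name_);
    by student;
    id term;
    var score;
run;
```

Under these assumptions the result should have one row each for Ann and Bob, with the columns fall and spring holding the scores.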

The ID statement names the new columns. Each distinct value of the ID variable becomes the name of a transposed column, so the ID statement also ties the value in a given input row to a specific new column. In long-to-wide transposes, the column structure is determined by the ID variable: there is one column for each unique value of the ID variable (or, if multiple ID variables are present, one column for each unique combination of values). For wide-to-long transposes, you typically do not need an ID variable; if you do supply one, it determines the column structure. Together, the BY and ID variables must uniquely identify each input row; a repeated ID value within a BY group is an error unless the LET option is specified.
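
What happens when the BY and ID variables do not identify a single row can be sketched as follows. The dataset work.readings and its variables are hypothetical. The LET option on the PROC TRANSPOSE statement is one way to let the step run in that situation; whether keeping only one of the duplicates is acceptable depends on the data.

```sas
/* Hypothetical data: visit="base" occurs twice for subject 1 */
data work.readings;
    input subject visit $ result;
    datalines;
1 base 10
1 base 12
1 week4 15
;
run;

/* Without LET this step stops with an error about the duplicate ID value.
   With LET, the last occurrence within the BY group (12) is kept. */
proc transpose data=work.readings out=work.readings_wide let;
    by subject;
    id visit;
    var result;
run;
```

A safer alternative is usually to add another BY variable, or to deduplicate the data first, so that no value is silently dropped.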

The variables in the VAR statement are transposed. If the VAR statement is omitted, PROC TRANSPOSE transposes all numeric variables that are not listed in another statement, such as BY or ID. Character variables are transposed only if they are listed in a VAR statement. Usually one variable is specified for a long-to-wide transpose, whereas multiple variables are specified for a wide-to-long transpose. The output dataset contains one row for each variable in the VAR statement, per BY group.
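
The default behaviour of the VAR statement can be seen in a small sketch. The dataset work.visits and its variables patient, site, height, and weight are hypothetical.

```sas
/* Hypothetical data with one character variable (site) */
data work.visits;
    input patient site $ height weight;
    datalines;
1 north 170 65
2 south 182 80
;
run;

/* No VAR statement: only the numeric variables are transposed.
   Note that the numeric variable patient is transposed as well. */
proc transpose data=work.visits out=work.numeric_only;
run;

/* Listing site in VAR transposes the character variable too.
   Mixing character and numeric variables makes the COLn columns character. */
proc transpose data=work.visits out=work.with_char;
    var site height weight;
run;
```

In both steps there is no BY or ID statement, so each transposed variable becomes one row and the new columns are named COL1 and COL2.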

Transposing Long to Wide Datasets

PROC TRANSPOSE provides the ability to go from a long dataset to a wide dataset.

Below is an example of a long dataset (SASHELP.ORSALES).

    proc transpose data=sashelp.orsales out=sales;
        var quantity profit total_retail_price;
    run;

Transposing Wide to Long Datasets

The syntax for transposing wide to long datasets is essentially identical, but the objective is reversed: reduce the number of columns and create a structure in which the attributes that were spread across columns are stored in multiple rows.

    proc transpose data=sashelp.library out=column1;
        id libref;
        var _all_;
    run;
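
A smaller wide-to-long case may make the resulting structure easier to see. The dataset work.scores_wide and its variables subject, pre, and post are hypothetical.

```sas
/* Hypothetical wide data: one row per subject, one column per test */
data work.scores_wide;
    input subject pre post;
    datalines;
1 90 92
2 77 88
;
run;

/* Each subject is repeated once per VAR variable.
   _NAME_ holds the source column name (pre or post), COL1 holds the value. */
proc transpose data=work.scores_wide out=work.scores_long;
    by subject;
    var pre post;
run;
```

With two subjects and two VAR variables, the output should contain four rows and the columns subject, _NAME_, and COL1.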

Applying the options

NAME= specifies the name of the output variable that holds the name of each transposed variable; by default this variable is called _NAME_. When no ID statement is used, the transposed variables are named COL1 through COLn.

DELIMITER= specifies a delimiter to use when building the names of the transposed variables in the output dataset. The delimiter is inserted between the ID values when more than one variable is given in the ID statement.

You can use the PREFIX= or SUFFIX= option to specify a prefix or suffix for each new variable name.

    data exa;
        input subject test $ score;
        datalines;
    1 post 92
    1 pre 90
    2 post 88
    2 pre 77
    3 post 50
    3 pre 51
    4 post 77
    4 pre 72
    5 post 69
    5 pre 60
    ;
    run;

    proc transpose data=exa out=exa1 prefix=score;
        by subject;
        id test;
        var score;
    run;
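
The NAME= and DELIMITER= options can be sketched in the same way. The dataset work.labs, its variables, and the name source chosen for NAME= are all hypothetical.

```sas
/* Hypothetical long data: one row per subject, month, and lab test */
data work.labs;
    input subject month $ test $ value;
    datalines;
1 JAN K 5.0
1 JAN NA 14.0
1 FEB K 3.0
1 FEB NA 11.0
;
run;

/* NAME= renames the _NAME_ column to source.
   DELIMITER= puts an underscore between the two ID values,
   giving column names such as JAN_K and FEB_NA. */
proc transpose data=work.labs out=work.labs_wide name=source delimiter=_;
    by subject;
    id month test;
    var value;
run;
```

Without DELIMITER=, the two ID values would be joined directly (for example JANK), which is harder to read.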

Transposing multiple variables – Double Transpose

Double Transpose helps us to transpose multiple variables and reshape long data to a wide format.

Below is the original format of the data which we want to convert to a wide format.

    data subj;
        input subject Month $ potassium sodium;
        datalines;
    210 JAN 5.0 14.0
    210 FEB 3.0 11.0
    210 MAR 2.0 12.0
    211 JAN 1.0 11.0
    211 FEB 5.0 10.0
    211 MAR 3.0 19.0
    212 JUN 3.0 12.0
    ;
    run;

We want one row per subject, with a separate column for each combination of month and lab test. The following two steps produce it.

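    /* Step 1: stack sodium and potassium into one value column (COL1) */
    proc transpose data=subj out=labtran;
        by subject Month notsorted;
        var sodium potassium;
    run;

    /* Step 2: spread COL1 back out, one column per Month and lab test */
    proc transpose data=labtran out=sparsed(drop=_name_);
        by subject;
        var col1;
        id Month _name_;
    run;

    proc print data=sparsed;
    run;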

The first PROC TRANSPOSE step creates one row for each combination of subject, month, and lab variable. The variable name (sodium or potassium) is stored in _NAME_ and all the values are stored in a single variable, COL1.

The second PROC TRANSPOSE step converts these rows back into columns. The data now has one row per subject, with every month represented as a column for each of the Potassium and Sodium values.