SAS for beginners - Day 4 || Input methods & infile statement & options in SAS || datasets in SAS
- Published 19 Sep 2024
- How to create datasets in SAS?
Different Input methods & infile statement & options in SAS.
How to bring data from external text files into the SAS environment?
Creating SAS datasets:
This can be done in two ways,
(1) Entering raw data directly using the cards/datalines statement
(2) Importing data into SAS from other PC files
Entering raw data directly with the cards/datalines statement and reading it with an input statement is called the
Input method
There are four different input methods,
1. List input,
2. Column input,
3. Named input
4. Formatted input methods
• List input method:
Example:
Data Emp_data;
input EmpID EName $ Dept $;
cards;
101 Rick IT
102 Dan OPS
103 Tusar IT
104 Pranav OPS
105 Rasmi FIN
;
Run;
Proc print data=Emp_data;
Run;
List input method is the easiest input method to use because, as shown in the example you
simply list the variable names in the same order as the corresponding data values.
The main restrictions are,
• Data values must be separated by at least one blank space
• Data values must be entered in the same order as the variable names in the input statement
• Missing values must be represented by a period. (A blank field causes the matching of
variable names and values to get out of sync.)
• Character values can't contain embedded blanks.
• The default length of character values is 8 bytes. A longer value is truncated when it is
written to the data set.
• Data must be in standard character or numeric format.
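The missing-value rule above can be sketched with a small variation of the same example. This is only an illustration: the data set name Emp_missing and the gaps in the data are made up for the demonstration.

```sas
/* List input: a period stands in for each missing value */
data Emp_missing;
  input EmpID EName $ Dept $;
  datalines;
101 Rick IT
102 . OPS
103 Tusar .
;
run;

proc print data=Emp_missing;
run;
```

Each period keeps the remaining values lined up with their variables, so EName is missing for EmpID 102 and Dept is missing for EmpID 103.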
Advanced list input method:
We can overcome some of the restrictions of the list input method by using the advanced (modified) list
input method,
• The length restriction can be overcome by using a length statement before the input statement
• The embedded blank restriction can be overcome by putting ‘&’ after the variable name
and keeping at least two blank spaces between that value and the next data value.
Data Emp_data;
length EName $ 15;
input EmpID EName $ & Dept $;
cards;
101 Rick  IT
102 Dan  OPS
103 Tusar  IT
104 Pranav goswami  OPS
105 Rasmi  FIN
;
Run;
Proc print data=Emp_data;
Run;
• Column Input method:
Example
Data Emp_data;
input EmpID 1-3 EName $ 4-17 Dept $ 20-22;
cards;
101Rick            IT
102Dan             OPS
103Tusar           IT
104Pranab goswami  OPS
105Rasmi           FIN
;
Run;
Proc print data=Emp_data;
Run;
In this method, we give the column range of each value, i.e., the position where the value starts and
the position where it ends (for example, EmpID is in columns 1-3).
The main advantages of this method are,
• Data values need not be separated by blank space
• Missing values need not be represented by a period/placeholder
• Character values can contain embedded blanks.
• The length of character values can be more than 8 bytes
The restrictions are that data values must be,
• In the same columns on all the data input lines (every value of a variable should start at the
same position)
• In standard numeric or character form
• Formatted input method:
Data Emp_data;
input @1 EmpID 3. @4 EName $14. @20 Dept $3.;
cards;
101Rick            IT
102Dan             OPS
103Tusar           IT
104Pranab goswami  OPS
105Rasmi           FIN
;
Run;
Proc print data=Emp_data;
Run;
In this method, we use column pointers and informats to locate each value: the pointer (@n) gives the position
where the value starts, and the informat gives how many characters/positions it occupies.
The main advantages of this method are,
• Data values need not be separated by blank space (if the data values are separated by blank
spaces, the width of the column need not be specified)
• Missing values need not be represented by a period/placeholder
• Character values can contain embedded blanks.
• The length of character values can be more than 8 bytes/characters
• Data values need not be in standard numeric or character form
The restrictions,
• Data values must be in the same columns on all the data input lines (every value of a variable should start at the
same position)
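The point about non-standard data can be sketched as follows. The Salary and JoinDate variables, the data values, and the data set name Emp_pay are assumptions added only for illustration. The comma7. informat reads a number written with a comma, and date9. reads a date such as 01JAN2020.

```sas
/* Formatted input with non-standard values (illustrative data) */
data Emp_pay;
  /* column ruler:  ----+----1----+----2- */
  input @1 EmpID 3. @5 Salary comma7. @13 JoinDate date9.;
  format Salary comma8. JoinDate date9.;
  datalines;
101 45,000  01JAN2020
102 120,500 15MAR2019
;
run;

proc print data=Emp_pay;
run;
```

Salary occupies columns 5-11 and JoinDate starts in column 13 on every line, so the spacing in the data lines matters. The format statement only controls how the stored values are displayed.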
• Named input:
Data Emp_data;
length EName $ 15;
input EmpID= EName= $ Dept= $;
cards;
EmpID=101 EName=Rick Dept=IT
EName=Dan EmpID=102 Dept=OPS
Dept=IT EmpID=103 EName=Tusar
EName=  Pranab goswami  EmpID=104 Dept=OPS
Dept=FIN EmpID=105 EName=Rasmi
;
Run;
Proc print data=Emp_data;
Run;
In the named input method, we have to specify the variable name followed by an equals sign in front of each data value, so the values can appear in any order on the line.
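Importing data from external files:
In this method, we import the data into SAS from external files/sources using the infile statement (which names
the file) together with the input statement (which describes the variables)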
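A minimal sketch of reading an external file is shown here. The file paths and the data set name Emp_csv are hypothetical placeholders, and the files are assumed to hold the same three values per line as the earlier examples.

```sas
/* Hypothetical path: replace it with the location of your own text file */
data Emp_data;
  infile 'C:\sasdata\emp_data.txt';
  input EmpID EName $ Dept $;
run;

/* Same idea for a comma-separated file whose first line holds column headings */
data Emp_csv;
  infile 'C:\sasdata\emp_data.csv' dlm=',' firstobs=2;
  input EmpID EName $ Dept $;
run;
```

Here dlm= tells SAS which character separates the values, and firstobs=2 makes it start reading from the second line of the file.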
Infile Options:
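Flowover:
This is the default behaviour of SAS. If the input statement reaches the end of a line without finding values for all the variables, SAS moves to the next line and reads the remaining values from there. This concept is called Flowover. To keep the values in sync, missing
values should be represented with a period
Missover option:
When values are missing at the end of a line, you can use the missover option instead of entering a period for them. SAS does not move to the next line, and the variables without values are set to missing.
Truncover option:
Like missover, the truncover option stops SAS from moving to the next line. For list input it returns the same results as missover. The difference shows with column or formatted input when a line is shorter than expected: truncover keeps the part of the value that is present, while missover sets the value to missing.
Stopover:
Tells SAS to stop processing the data step when the input statement reaches the end of a line without finding values for all the variables. SAS sets _ERROR_ to 1, prints the incomplete line in the log, and keeps only the observations already read in the
data set.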
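The difference between the default behaviour and missover can be sketched with inline data. The data set names flow_demo and miss_demo and the short second line are made up for the demonstration.

```sas
/* Default FLOWOVER: the short second line takes "103" from the third line as Dept */
data flow_demo;
  infile datalines;
  input EmpID EName $ Dept $;
  datalines;
101 Rick IT
102 Dan
103 Tusar IT
;
run;

/* MISSOVER: Dept is set to missing for EmpID 102 and every line stays one observation */
data miss_demo;
  infile datalines missover;
  input EmpID EName $ Dept $;
  datalines;
101 Rick IT
102 Dan
103 Tusar IT
;
run;

proc print data=flow_demo;
run;
proc print data=miss_demo;
run;
```

With the default, flow_demo ends up with two observations because the third line is used up while filling the second one. With missover, miss_demo has three observations.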