SAS for beginners - Day 4 || Input methods & infile statement & options in SAS || datasets in SAS

  • Published 19 Sep 2024
  • How to create datasets in SAS?
    Different Input methods & infile statement & options in SAS.
    How to bring data from external text files into the SAS environment?
    Creating SAS datasets:
    This can be done in two ways,
    (1) Entering raw data directly using the cards/datalines statement
    (2) Importing data into SAS from other PC files
    Entering raw data directly with the cards/datalines statement and reading it with an
    input statement is called the input method.
    There are four different input methods,
    1. List input,
    2. Column input,
    3. Named input
    4. Formatted input methods
    • List input method:
    Example:
    Data Emp_data;
    input EmpID EName $ Dept $;
    cards;
    101 Rick IT
    102 Dan OPS
    103 Tusar IT
    104 Pranav OPS
    105 Rasmi FIN
    ;
    Run;
    Proc print data=Emp_data;
    Run;
    List input is the easiest input method to use because, as shown in the example, you
    simply list the variable names in the same order as the corresponding data values.
    The main restrictions are,
    • Data values must be separated by at least one blank space
    • Data values must appear in the same order as the variable names
    • Missing values must be represented by a period. (A blank field causes the matching of
    variable names and values to get out of sync.)
    • Character values can't contain embedded blanks.
    • The default length of character values is 8 bytes. A longer value is truncated when it is
    written to the data set.
    • Data must be in standard character or numeric format.
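    The missing-value rule above can be sketched as follows. The data set name Emp_missing and the
    gap in the second record are made up for illustration; the period on that line stands in for
    the missing employee name so that Dept still lines up with its value.

```sas
/* Hypothetical example: employee 102 has no name on record. */
data Emp_missing;
  input EmpID EName $ Dept $;
  datalines;
101 Rick IT
102 . OPS
103 Tusar IT
;
run;

/* EName should print as blank (missing) for employee 102. */
proc print data=Emp_missing;
run;
```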
    Advanced list input method:
    We can overcome some of these restrictions by using the advanced (modified) list
    input method,
    • The length restriction can be overcome by using a length statement
    • The embedded-blanks restriction can be overcome by putting ‘&’ after the variable name and before
    the $ symbol, and keeping at least two blank spaces between that value and the next one.
    Data Emp_data;
    length EName $ 15;
    input EmpID EName & $15. Dept $;
    cards;
    101 Rick  IT
    102 Dan  OPS
    103 Tusar  IT
    104 Pranav goswami  OPS
    105 Rasmi  FIN
    ;
    Run;
    Proc print data=Emp_data;
    Run;
    • Column Input method:
    Example
    Data Emp_data;
    input EmpID 1-3 EName $ 4-17 Dept $ 20-22;
    /* when running this, the data lines must begin in column 1 */
    cards;
    101Rick            IT
    102Dan             OPS
    103Tusar           IT
    104Pranab goswami  OPS
    105Rasmi           FIN
    ;
    Run;
    Proc print data=Emp_data;
    Run;
    In this method, we give the column range of each value, i.e., the position at which the
    value starts and the position at which it ends.
    The main advantages of this method are,
    • Data values need not be separated by blank space
    • Missing values need not be represented by a period/placeholder; a blank field is read as missing
    • Character values can contain embedded blanks.
    • The length of character values can be more than 8 bytes
    The restrictions: data values must be,
    • In the same columns on all the data input lines (every variable's value should start at the
    same position)
    • In standard numeric or character form
    • Formatted input method:
    Data Emp_data;
    input @1 EmpID 3. @4 EName $14. @20 Dept $3.;
    /* when running this, the data lines must begin in column 1 */
    cards;
    101Rick            IT
    102Dan             OPS
    103Tusar           IT
    104Pranab goswami  OPS
    105Rasmi           FIN
    ;
    Run;
    Proc print data=Emp_data;
    Run;
    In this method, we use column pointers (@n) and informats to locate each value, i.e., the position
    at which the value starts and how many characters/positions it occupies.
    The main advantages of this method are,
    • Data values need not be separated by blank space (if the data values are separated by blank
    space, then the length of the column need not be given)
    • Missing values need not be represented by a period/placeholder
    • Character values can contain embedded blanks.
    • The length of character values can be more than 8 bytes/characters
    • Values need not be in standard numeric or character form; an informat can read nonstandard
    values such as dates or numbers with commas
    The restrictions,
    • Values must be in the same columns on all the data input lines (every variable's value should start at the
    same position)
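    The point about nonstandard data can be sketched like this. The variables Salary and JoinDate
    and all of the values are hypothetical; comma7. and date9. are standard SAS informats for
    numbers with commas and for dates such as 01JAN2020.

```sas
/* Hypothetical data: EmpID in columns 1-3, Salary in 5-11, JoinDate in 13-21. */
data Emp_pay;
  input @1 EmpID 3. @5 Salary comma7. @13 JoinDate date9.;
  /* Formats only control how the stored numbers are displayed. */
  format Salary comma7. JoinDate date9.;
  datalines;
101  45,000 01JAN2020
102 120,500 15MAR2019
;
run;

proc print data=Emp_pay;
run;
```

    Column input could not read these lines, because 45,000 and 01JAN2020 are not standard numeric values.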
    • Named input:
    Data Emp_data;
    length EName $ 15;
    input EmpID= EName= $ Dept= $;
    cards;
    EmpID=101 EName= Rick Dept= IT
    EName= Dan EmpID=102 Dept= OPS
    Dept= IT EmpID=103 EName= Tusar
    EName= Pranab goswami  EmpID=104 Dept= OPS
    Dept= FIN EmpID=105 EName= Rasmi
    ;
    Run;
    Proc print data=Emp_data;
    Run;
    In the named input method we have to specify the variable name in front of each data value, so the
    values can appear in any order on the line.
    Importing data from external files:
    In this method, we import the data into SAS from external files/sources. The infile statement
    names the file, and the input statement describes how to read its values.
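    A minimal sketch of reading an external text file, assuming a blank-separated file with the same
    three fields as the examples above. The path and the data set name Emp_ext are placeholders, not
    names from the video.

```sas
/* Hypothetical path: replace it with the location of your own text file. */
data Emp_ext;
  infile '/path/to/emp_data.txt';
  /* Same list input as before; only the source of the data lines changes. */
  input EmpID EName $ Dept $;
run;

proc print data=Emp_ext;
run;
```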
    Infile Options:
    Flowover:
    This is the default behaviour of SAS. It normally reads one line for one observation, but if a line
    does not contain values for all the variables, SAS moves to the next line and takes the remaining
    values from there. This is called flowover. To avoid it, missing values should be represented with a period.
    Missover option:
    When values are missing at the end of a line, you can use the missover option instead of writing a
    period for them. SAS stays on the current line and sets the remaining variables to missing.
    Truncover option:
    Truncover also sets the remaining variables to missing instead of moving to the next line. It differs
    from missover when a line ends in the middle of a field: truncover keeps the part of the value that is
    there, while missover sets the variable to missing.
    Stopover:
    Tells SAS to stop the data step with an error when a line does not contain values for all the
    variables. The data set keeps only the observations that were read before that line.
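    The difference between the default flowover behaviour and missover can be sketched with in-stream
    data. The data set names and the shortened second record are made up for illustration.

```sas
/* Default FLOWOVER: the second line has no Dept value, so SAS goes to */
/* the next line to find one.                                          */
data Emp_flowover;
  input EmpID EName $ Dept $;
  datalines;
101 Rick IT
102 Dan
103 Tusar IT
;
run;

/* MISSOVER: SAS stays on the short line and sets Dept to missing. */
data Emp_missover;
  infile datalines missover;
  input EmpID EName $ Dept $;
  datalines;
101 Rick IT
102 Dan
103 Tusar IT
;
run;

proc print data=Emp_flowover;
run;
proc print data=Emp_missover;
run;
```

    The first data set should end up with two observations, with 103 read as the Dept of employee 102,
    while the second should have three observations with a blank Dept for employee 102.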
