STATISTICAL FEATURES: SIMPLE MODE
Two parsing modes may be used: Simple Mode or Recursive (R-) Mode. This page gives details on Simple Mode.
1. Input
The questionnaire is read from the input:
-i (path/to/questionnaire)
The application computes the number of database characters for each line, variable and question, adds up these counts and parses variable names, labels and other additional information specified on comment lines.
1.1 Parsing comment lines
Explicit specification of variable names, labels and formats is made possible on a single line introduced by double slashes (//) as follows:
// variable label format modification
The modification field corresponds to extra database characters requested by the keyboarding operator, which were not initially included in the questionnaire's variable formats. The sum of the format field and the modification field is the 'real' format of the variable, used when importing the 'flat' database into the SAS format. This field is only parsed if it is inserted at the end of a comment line, before a closing paragraph tag (</p>).
Option -M must be specified on the command line for comment parsing to yield the expected results.
Formats can be coded following C language specifications:
%Ns field of N characters (%s for one character)
%Nd field of N integer digits (%d for one digit)
or SAS software:
$N. field of N characters ($1. for one character)
N. field of N digits (1. for one digit)
Decimal formats are not taken into account; thus:
3.3 or 3,3
will be encoded with format %3d, or 3. with SAS (R), the decimal separator being counted as one character or digit.
Click here to see example 1
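The two format notations listed above can be recognized with a pair of regular expressions. The following is only a minimal sketch: the function name parse_format and its return shape are assumptions made for illustration, not part of the application.

```python
import re

# Hypothetical helper: the patterns mirror the C-style and SAS-style
# notations described in section 1.1; names are illustrative only.
C_FORMAT = re.compile(r"^%(\d*)([sd])$")
SAS_FORMAT = re.compile(r"^(\$?)(\d+)\.$")


def parse_format(spec):
    """Return (kind, width), where kind is 'char' or 'num'."""
    m = C_FORMAT.match(spec)
    if m:
        # %s and %d without a number stand for a width of one
        width = int(m.group(1)) if m.group(1) else 1
        return ("char" if m.group(2) == "s" else "num", width)
    m = SAS_FORMAT.match(spec)
    if m:
        # a leading dollar sign marks a character field
        return ("char" if m.group(1) else "num", int(m.group(2)))
    raise ValueError(f"unrecognized format: {spec!r}")
```

In this sketch a decimal notation such as 3.3 is rejected rather than interpreted, in line with the remark that decimal formats are not taken into account.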
1.2 Automatic generation of variable names and labels
Variable names, labels and formats can be read from the questionnaire. Just insert double slashes (//) before each questionnaire variable:
Click here to see example 2
If there is just one variable for each question, it is possible to delimit variables without using double slashes. Variable assignment is ensured by switch -O in this case. In example 2, option -O cannot be used as there are two variables in one question.
For each variable introduced by double slashes, an automatic variable name is created in the absence of an explicit variable name given on a comment line.
Automatic generation of variable names yields VAR1, VAR2, ..., VARN (option -A).
Formats are automatically computed: table cells are numeric by default, unless an alphabetic character is read.
In this case formats are automatically converted into character strings.
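The naming and format rules of this section can be sketched as follows. This is an illustration under stated assumptions: each table cell is modelled as a string holding at most one keyed character, and the helper names auto_names and infer_format are hypothetical.

```python
def auto_names(count):
    """VAR1, VAR2, ..., VARN for variables without an explicit name."""
    return [f"VAR{i}" for i in range(1, count + 1)]


def infer_format(cells):
    """C-style format for a variable spanning len(cells) table cells.

    The list-of-strings representation of cells is an assumption;
    the format is numeric unless an alphabetic character is found.
    """
    width = len(cells)
    is_text = any(ch.isalpha() for cell in cells for ch in cell)
    number = "" if width == 1 else str(width)  # %s / %d for one cell
    return f"%{number}{'s' if is_text else 'd'}"
```

For instance, three empty cells give %3d, while two cells containing a letter give %2s.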
Labels are retrieved between curly brackets (braces) in the text of the question (option -U).
When this feature is selected, braces should only be used for delimiting labels. Using braces that are not closed or not opened will yield an error message and the process will be terminated.
When the questionnaire's layout is formatted by option -o, the new questionnaire thereby created no longer contains any braces. They are replaced by the ASCII separator with hexadecimal code 0x1F, which is not printable. Auxiliary questionnaires with this improved layout can in turn be processed as input forms with options -i or -ir, the 0x1F separators behaving as braces to identify labels. These separators can also be used right from the start, when designing the survey questionnaire.
Click here to see example 3
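Label retrieval between braces, and the replacement of braces by the 0x1F separator, can be sketched as below. The function names and the exact error conditions are assumptions; the sketch only illustrates the behaviour described for option -U, namely that unbalanced braces stop the process.

```python
SEP = "\x1f"  # ASCII unit separator, the non-printable stand-in for braces


def to_separator(text):
    """Replace both braces by 0x1F, as in a formatted questionnaire."""
    return text.replace("{", SEP).replace("}", SEP)


def extract_labels(text):
    """Return the labels delimited by {...} or by pairs of 0x1F."""
    labels = []
    start = None   # index just after the opening delimiter, when inside a label
    opener = None  # delimiter that opened the current label
    for pos, ch in enumerate(text):
        if start is None:
            if ch in ("{", SEP):
                start, opener = pos + 1, ch
            elif ch == "}":
                raise ValueError(f"brace closed but not opened at offset {pos}")
        else:
            closer = "}" if opener == "{" else SEP
            if ch == closer:
                labels.append(text[start:pos])
                start = opener = None
            elif ch in "{}":
                raise ValueError(f"unexpected brace at offset {pos}")
    if start is not None:
        raise ValueError("label opened but not closed")
    return labels
```

With this sketch, extract_labels gives the same labels on a text with braces and on the same text passed through to_separator.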
When several variables are introduced in the same question, braces should not be used in the text of the question between the question number and the end of the question itself. This possibility remains open when there is just one variable per question:
Click here to see example 4
1.3 Partial Automation
Explicit input on a comment line and partial automatic generation may be combined for variable names, labels and formats.
With option -H, variable names, labels and formats on the comment line are ignored and only the modifications requested by the keyboarding operator are parsed (this may happen when the keyboarding operator finds flaws with the original, hand-made set of formats designed with the questionnaire).
Table 3 summarizes the various possible options for partial automation.
1.4 Automatic generation of formats
Three types of questionnaire fields can be filled in. They are identified by a variable name and a label. Independently of formats (which may optionally be indicated on comment lines), IDSS computes the number of characters/digits for each variable.
Identified fields are:
1. table cells
A variable may correspond to N table cells, on any number of table lines. The cells can be separated by text.
The variable will have format %Nd (or %Ns, or $N., or N.). An example of table format computing is given in the annex.
2. checkboxes
The same option also makes it possible to identify checkboxes used for yes/no questions.
Checkbox format is of type %s (or $1.)
3. diacritic characters
Diacritic characters are used to signal a closed question on the same line. Character '-' (simple hyphen) is reserved by the software. Two other reserved characters can be specified by compiling options (see COMPILE).
Details on diacritic characters are given on the page Specifications.
An example of format computing is given in the annex.
When option -A is selected, automatic computing of formats is used to write the DATA step program that will import the 'flat' file and convert it into a SAS database (see 4).
2. Output: formatted questionnaire
A formatted questionnaire (F-questionnaire henceforth) can be created, either to improve the questionnaire's layout or as an auxiliary tool for statistical analysis. The following items can be specified with command-line options:
- questionnaire layout (color, style, font size) of questions and question numbers,
- incremented question numbers,
- line numbers,
- formatted comment lines (variable name, label, format and possible modifications of format),
- number of characters for each question (usually requested by keyboarding operators), henceforth character count.
Examples of F-questionnaires are given in example 5 and example 6.
F-questionnaire authoring is deactivated by option -a.
The F-questionnaire is a separate file from the original one, unless option -r is selected. In this case, the initial file is transformed and the original version is lost.
The output path for F-questionnaires is the argument to option -o.
This path is a folder path for Recursive Mode, and a file path for Simple Mode.
-o (path to F-questionnaire)
Colors can be parametrized for question numbers, question text (first paragraph), line count, comment lines and character counts.
They can be chosen in the following list:
rgb(x%, y%, z%): RGB specification
x, percentage of red; y, percentage of green; z, percentage of blue
black, navy, green, teal, maroon, purple, olive, silver, gray, lime, aqua, red, orange, white, yellow
Question number colors and question text colors are specified as follows: -q (number color) (question text color)
Line number color is given by -l (color).
The color of character counts is -n (color).
The color of comment lines can be specified by -c (color).
When -l, -n or -c are not used, the corresponding information is not inserted in the F-questionnaire. When -q is not used, question number color and question text color are left in the original black.
When -q is used with just one color, question numbers will be colorized with it. Question text color will be the default question text color.
When -l, -n, or -c are used without any color specification, the default color value for each parameter will be used too. These default values can be different from each other.
Default color values are defined in file constants.h. They are mentioned below:
-q (question number) | navy |
(question text) | lime |
-l | red |
-n | orange |
-c | maroon |
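The color rules above can be summarized in a short sketch. The default values are those of the table; the function names, the dictionary keys and the way option values are passed (None for an absent option, an empty string for an option given without a color) are assumptions made for illustration, and do not reflect the contents of constants.h.

```python
import re

NAMED = {"black", "navy", "green", "teal", "maroon", "purple", "olive",
         "silver", "gray", "lime", "aqua", "red", "orange", "white", "yellow"}
RGB = re.compile(r"^rgb\(\s*(\d+)%\s*,\s*(\d+)%\s*,\s*(\d+)%\s*\)$")
DEFAULTS = {"q_number": "navy", "q_text": "lime",
            "l": "red", "n": "orange", "c": "maroon"}


def is_valid_color(spec):
    """True for a listed color name or an rgb(x%, y%, z%) specification."""
    if spec in NAMED:
        return True
    m = RGB.match(spec)
    return bool(m) and all(0 <= int(v) <= 100 for v in m.groups())


def resolve_q(colors=None):
    """colors: None when -q is absent, else a list of one or two colors."""
    if colors is None:
        return ("black", "black")  # original colors are left untouched
    text = colors[1] if len(colors) > 1 else DEFAULTS["q_text"]
    return (colors[0], text)


def resolve_zone(option, value):
    """option: 'l', 'n' or 'c'. Returns None when the zone is not inserted."""
    if value is None:
        return None
    return value or DEFAULTS[option]
```

For instance, resolve_q(["teal"]) colors question numbers in teal and question text in the default lime.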
In addition, HTML tags for these zones can be partially specified:
- font weight (option -K1) and font style for comment lines (option -K2) ;
- font size of line numbers (option -K3) and line comments (option -K4) ;
- style (span, bold, italic) of questions (option -K5) and question numbers (option -K6).
Option -K1 is followed by a number (typically between 500 and 1000); option -K2 by a style tag (italic, bold).
Numbers typically between 6 and 12 follow -K3 and -K4.
Options -K5 and -K6 use HTML tags (b for bold, i for italic, span for neutral).
If an F-questionnaire is authored, the original questionnaire can thus simply have 1. (or any number below 1,000 followed by '.') to signal the start of a question line. The number must be immediately followed by a dot.
3. Output: flat database and database with separators
Once the questionnaire has been parsed, the flat database sent back by the keyboarding operator can be converted into .csv, .txt or any type of database with separators. This in turn can be easily imported by spreadsheet software like Microsoft Excel or OpenOffice Calc. Any ASCII character can be given as a separator.
To activate flat database conversion, the following options should be used:
-B -e (path to flat base) -s (path to converted base)
and, optionally:
-$ (variable separator character) -d (decimal separator character)
Variable separators and decimal separators should be carefully checked, as they must be different: otherwise a decimal separator inside a value would be read as a field boundary.
Under French language "locale" conventions, the decimal separator is a comma, which implies that the .csv format cannot be comma-separated. In this case, .csv is semicolon-separated. When -$ "," is on the command line, it is coerced into -$ ";", provided that -d "," is there too.
By default, tabs are used as variable separators and dots as decimal separators.
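The conversion of a flat line into a line with separators can be sketched as follows, assuming that the field widths are known from the parsed questionnaire. The function names and parameters are hypothetical, and the blank-to-zero step simply treats every blank as a zero, which is an assumption about the exact behaviour of the application.

```python
def effective_separator(variable_sep="\t", decimal_sep="."):
    """Apply the coercion rule: ',' becomes ';' when decimals also use ','."""
    if variable_sep == "," and decimal_sep == ",":
        return ";"
    if variable_sep == decimal_sep:
        raise ValueError("variable and decimal separators must differ")
    return variable_sep


def convert_flat_line(line, widths, variable_sep="\t", decimal_sep=".",
                      blanks_to_zero=False):
    """Cut a fixed-width line into fields and join them with a separator."""
    sep = effective_separator(variable_sep, decimal_sep)
    if blanks_to_zero:
        # performed before separators are inserted (assumed reading of -z)
        line = line.replace(" ", "0")
    fields, pos = [], 0
    for width in widths:
        fields.append(line[pos:pos + width])
        pos += width
    return sep.join(fields)
```

With widths 2, 1 and 2, the flat line "12A 7" is split into the fields "12", "A" and " 7", joined by tabs by default.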
In some cases the flat database can contain blank spaces instead of null values, or vice versa. To avoid errors, it is then necessary to convert blank spaces into zero values. This transformation is performed automatically, before separators are inserted, when option -z is specified.
It is possible to extract X lines and Y columns from a base with separators. The resulting base will comprise lines L to L+X-1 inclusive and columns C to C+Y-1 inclusive, where L and C are the respective ranks of the first extracted line and the first extracted column (from top to bottom for lines and from left to right for columns).
To perform extraction use the following syntax:
-P L L+X-1 for line extraction and/or
-I C C+Y-1 for column extraction.
Numbering of columns starts at 0, according to C language conventions.
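Extraction with inclusive bounds can be sketched as below. Columns are numbered from 0 as stated above; numbering lines from 0 as well is an assumption of this sketch, as are the function and parameter names.

```python
def extract(rows, first_line=None, last_line=None,
            first_col=None, last_col=None):
    """Keep lines first_line..last_line and columns first_col..last_col.

    rows is a list of lists of fields. Both bounds are inclusive.
    Line numbering from 0 is an assumption made for this sketch.
    """
    if first_line is not None:
        rows = rows[first_line:last_line + 1]
    if first_col is not None:
        rows = [row[first_col:last_col + 1] for row in rows]
    return rows
```

Extracting X = 2 lines from L = 1 and Y = 2 columns from C = 0 thus corresponds to extract(rows, 1, 2, 0, 1).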
4. Output: SAS(R) DATA step program
Statistical Analysis System (SAS) has a DATA step import procedure for converting a flat database to the SAS format.
The structure of the DATA step runs as follows:
DATA library.name ;
ATTRIB
VARIABLE_NAME_1 LABEL_1
VARIABLE_NAME_2 LABEL_2
VARIABLE_NAME_3 LABEL_3
(...)
;
INFILE path_to_flat_file_to_be_imported
sas_options
;
INPUT
VARIABLE_NAME_1 informat_1
VARIABLE_NAME_2 informat_2
VARIABLE_NAME_3 informat_3
(...)
;
RUN;
Variable names and labels are parsed in the questionnaire. They are imported as SAS-type variable names and labels. In the INPUT section, each variable's name is associated with the SAS informat corresponding to the format parsed on the questionnaire's comment lines (or automatically computed if option -A is activated).
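Filling in the DATA step skeleton from a list of parsed variables can be sketched as follows. The function name, the tuple layout, the LABEL="..." spelling of the ATTRIB entries and the quoting of the INFILE path are assumptions of this sketch; the program actually written by the application may differ.

```python
def render_data_step(dataset, infile, variables, sas_options=""):
    """Build the text of a DATA step.

    variables is a list of (name, label, informat) tuples, e.g.
    ("AGE", "Age of respondent", "3."). Labels are assumed not to
    contain double quotes.
    """
    lines = [f"DATA {dataset} ;", "ATTRIB"]
    lines += [f'{name} LABEL="{label}"' for name, label, _ in variables]
    lines += [";", f'INFILE "{infile}"']
    if sas_options:
        lines.append(sas_options)
    lines += [";", "INPUT"]
    lines += [f"{name} {informat}" for name, _, informat in variables]
    lines += [";", "RUN;"]
    return "\n".join(lines)
```

Each variable appears twice, once in the ATTRIB section with its label and once in the INPUT section with its informat, in the same order as in the questionnaire.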
If specified on comment lines, informats are indicated right after the label. In this case, option -A must not be used. If there is a modification field on the comment line right after the informat field, the value of the corresponding DATA step informat is the sum of the informat field and the modification field (see 1.1).
If the informat is specified according to C language specifications, the values of the comment line informat fields can be converted into SAS-type informat fields by using option -X. This option is only activated for generating SAS DATA steps, which are specified as follows:
-Ts library.name
-To sas_options
-Te path_to_flat_file_to_be_imported
-T path_to_.sas_DATA_step_program
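The conversion of a C-style format into a SAS informat, including the addition of the modification field, can be sketched as follows. The function name c_to_sas_informat is hypothetical; the sketch only illustrates the correspondence between the notations of section 1.1.

```python
import re


def c_to_sas_informat(c_format, modification=0):
    """Convert %Ns / %Nd into $W. / W., where W = N + modification."""
    m = re.fullmatch(r"%(\d*)([sd])", c_format)
    if m is None:
        raise ValueError(f"not a C-style format: {c_format!r}")
    # %s and %d without a number stand for a width of one
    width = (int(m.group(1)) if m.group(1) else 1) + modification
    return f"${width}." if m.group(2) == "s" else f"{width}."
```

A comment line carrying format %3d and modification 2 would thus lead to informat 5. in the INPUT section.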
5. Output: Correspondence Table
A correspondence table can be created, which indicates the number of characters/digits for each questionnaire line, variable or question, depending on the user's choice. The sum of these counts up to the current line/variable/question is also indicated on the same line. The table can also mention variable names, labels, (in)formats and the modification fields requested at keyboarding stage.
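The running sum shown on each table line can be sketched with a cumulative sum over the counts. The function name and the tuple layout are assumptions made for illustration.

```python
from itertools import accumulate


def correspondence_rows(counts):
    """counts: list of (rank, count) pairs, one per line, variable or question.

    Returns (rank, count, running_total) triples, where running_total is
    the sum of counts up to and including the current rank.
    """
    totals = accumulate(count for _, count in counts)
    return [(rank, count, total)
            for (rank, count), total in zip(counts, totals)]
```

The running total of the last row is the overall number of characters in a database line.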
Table generation is triggered by option -t and the directory that contains tables is indicated with option -tp (path to directory)
Click here to see example 7
Columns #VAR (variable rank), #QUEST (question rank) and #LINE (line number) are always generated. To place the #VAR column on the left of the table, and thereby obtain values for each variable, use option -Z.
To obtain a table by questionnaire line, use -L instead of -Z. Option -Q displays a more compact table, which will only show values for each question.
Click here to see table options