IDSS Interface for the development of statistical surveys


STATISTICAL FEATURES: SIMPLE MODE

Statistical features depend on the questionnaire parsing mode.
Two parsing modes may be used: Simple Mode or Recursive (R-) Mode. This page gives details on Simple Mode.

1. Input

The questionnaire is read from the input:

-i (path/to/questionnaire)

The application computes the number of database characters for each line, variable and question, adds up these counts and parses variable names, labels and other additional information specified on comment lines.

1.1 Parsing comment lines

Variable names, labels and formats can be specified explicitly on a single line introduced by double slashes (//), as follows:

// variable label format modification

The modification field holds the extra database characters requested by the keyboarding operator, which were not initially included in the questionnaire's variable formats. The sum of the format field and the modification field is the 'real' format of the variable, used when importing the 'flat' database into the SAS format. The modification field is only parsed if it is inserted at the end of the comment line, before a closing paragraph tag (</p>). Option -M must be specified on the command line for comment parsing to yield the expected results.

Formats can be coded following C language specifications:

%Ns    field of N characters (%s for one character)
%Nd    field of N integer digits (%d for one digit)

or SAS software:

$N. field of N characters ($1. for one character)
N. field of N digits (1. for one digit)

Decimal formats are not taken into account; thus:

            3.3 or 3,3

will be encoded with format %3d, or 3. with SAS (R), decimal separators being counted as one character or digit.
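
The two notations above, and the 'real' format obtained by adding the modification field, can be illustrated with a minimal Python sketch. This is not IDSS code: the function names parse_format, to_sas and real_format are hypothetical, and the modification field is assumed to be a plain integer number of extra characters.

```python
import re

# Hypothetical helpers written for illustration; not part of IDSS.
_C_FORMAT = re.compile(r"^%(\d*)([sd])$")       # %Ns, %Nd, %s, %d
_SAS_FORMAT = re.compile(r"^(\$?)(\d+)\.$")     # $N. or N.


def parse_format(spec):
    """Return (kind, width) for a C-style or SAS-style format string.

    kind is "char" or "num"; width is the number of database characters.
    """
    m = _C_FORMAT.match(spec)
    if m:
        width = int(m.group(1)) if m.group(1) else 1  # %s and %d mean width 1
        return ("char" if m.group(2) == "s" else "num", width)
    m = _SAS_FORMAT.match(spec)
    if m:
        return ("char" if m.group(1) else "num", int(m.group(2)))
    raise ValueError(f"unrecognized format: {spec!r}")


def to_sas(kind, width):
    """Write a (kind, width) pair in SAS notation, e.g. $8. or 3."""
    return f"${width}." if kind == "char" else f"{width}."


def real_format(spec, modification=0):
    """Add the modification field (assumed to be an integer) to the width."""
    kind, width = parse_format(spec)
    return kind, width + modification


# A value such as 3,3 occupies three characters, separator included.
print(parse_format("%3d"), len("3,3"))
print(to_sas(*real_format("%3d", 2)))
```

In this sketch the decimal separator is simply one more character of the field, which matches the counting rule stated above.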

    Click here to see example 1

1.2 Automatic generation of variable names and labels

Variable names, labels and formats can be read from the questionnaire itself. Just insert double slashes (//) before each questionnaire variable:

Click here to see example 2

If there is just one variable for each question, it is possible to delimit variables without using double slashes. Variable assignment is ensured by switch -O in this case. In example 2 option -O cannot be used as there are two variables in one question.
For each variable introduced by double slashes, an automatic variable name is created in the absence of an explicit variable name given on a comment line.
Automatic generation of variable names yields VAR1, VAR2, ..., VARN (option -A).
Formats are automatically computed: table cells are numeric by default, unless an alphabetic character is read. In this case formats are automatically converted into character strings.
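
The naming and typing rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the IDSS implementation: the function auto_variables is hypothetical, each table cell is assumed to hold one database character, and the input is assumed to be already grouped by variable.

```python
def auto_variables(cell_groups):
    """Assign VAR1..VARN and a C-style format to each group of table cells.

    cell_groups is a list with one entry per variable; each entry is the
    list of cell contents read for that variable (empty strings for blank
    cells). Assumption: one cell counts as one character or digit.
    """
    result = []
    for rank, cells in enumerate(cell_groups, start=1):
        # Numeric by default; any alphabetic character switches to character.
        is_char = any(ch.isalpha() for cell in cells for ch in cell)
        width = len(cells)
        fmt = f"%{width}s" if is_char else f"%{width}d"
        result.append((f"VAR{rank}", fmt))
    return result


print(auto_variables([["", "", ""], ["A", ""]]))
```

A variable made of three blank cells is typed as numeric of width 3, while a group containing a letter is typed as a character string of the same width as its cell count.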
Labels are retrieved between curly brackets (braces) in the text of the question (option -U).
When this feature is selected, braces should only be used for delimiting labels. Using braces that are not closed or not opened will yield an error message and the process will be terminated.
When the questionnaire's layout is formatted by option -o, the new questionnaire thereby created no longer contains any braces. They are replaced by the ASCII separator with hexadecimal code 0x1F, which is not printable. Auxiliary questionnaires with this improved layout can in turn be processed as input forms with options -i or -ir, separators 0x1F behaving as braces to identify labels. These separators can also be used right from the start, when designing the survey questionnaire.
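
The brace rules above can be pictured with a short Python sketch. It is only an illustration: the function names are hypothetical, and nested braces are treated as an error, which is an assumption since the text only mentions unclosed and unopened braces.

```python
UNIT_SEPARATOR = "\x1f"  # non-printable ASCII separator, code 0x1F


def extract_labels(text):
    """Return the labels found between braces, rejecting unbalanced braces."""
    labels = []
    start = None
    for pos, ch in enumerate(text):
        if ch == "{":
            if start is not None:
                # Assumption: a brace opened inside a label is an error.
                raise ValueError("brace opened twice")
            start = pos + 1
        elif ch == "}":
            if start is None:
                raise ValueError("closing brace without opening brace")
            labels.append(text[start:pos])
            start = None
    if start is not None:
        raise ValueError("brace not closed")
    return labels


def braces_to_separators(text):
    """Replace both kinds of brace by the 0x1F separator after validation."""
    extract_labels(text)
    return text.replace("{", UNIT_SEPARATOR).replace("}", UNIT_SEPARATOR)


def labels_from_separators(text):
    """Read labels back from a text that uses 0x1F in place of braces."""
    return text.split(UNIT_SEPARATOR)[1::2]


question = "1. What is your {age} and your {income}?"
print(extract_labels(question))
print(labels_from_separators(braces_to_separators(question)))
```

Both calls yield the same list of labels, which is the round trip described above for auxiliary questionnaires.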

Click here to see example 3

When several variables are introduced in the same question, braces should not be used in the text of the question between the question number and the end of the question itself. This option remains open when there is just one variable per question:

Click here to see example 4

1.3 Partial Automation

Explicit input on a comment line and partial automatic generation may be combined for variable names, labels and formats on comment lines.
With option -H, variable names, labels and formats on the comment line are ignored to just parse modifications requested by the keyboarding operator (this may happen when the keyboarding operator finds flaws with the original, hand-made set of formats designed with the questionnaire).
Table 3 summarizes the various possible options for partial automation.

1.4 Automatic generation of formats

Three types of questionnaire fields can be filled in. They are identified by a variable name and a label. Independently of formats (which may optionally be indicated on comment lines), IDSS computes the number of characters/digits for each variable.
The identified fields are:

                1. table cells

A variable may correspond to N table cells, on any number of table lines. The cells can be separated by text.
The variable will have format %Nd (or %Ns, or $N., or N.). An example of table format computing is given in the annex.

                2. checkboxes

The same option also makes it possible to identify checkboxes used for yes/no questions.
Checkbox format is of type %s (or $1.)

                3. diacritic characters

Diacritic characters are used to signal a closed question on the same line. Character '-' (simple hyphen) is reserved by the software. Two other reserved characters can be specified by compiling options (see COMPILE).
Details on diacritic characters are given on the page Specifications.
An example of format computing is given in the annex.

When option -A is selected, the automatically computed formats are used to write the DATA step program that will import the 'flat' file and convert it into a SAS database (see section 4).

2. Output: formatted questionnaire

A formatted questionnaire (F-questionnaire henceforth) can be created, either to improve the questionnaire's layout or as an auxiliary tool for statistical analysis. The following items can be specified with command-line options:

    - questionnaire layout (color, style, font size) of questions and question numbers,
    - incremented question numbers,
    - line numbers,
    - formatted comment lines (variable name, label, format and possible modifications of format),
    - number of characters for each question (usually requested by keyboarding operators), henceforth character count.

Examples of F-questionnaires are given in example 5 and example 6.

F-questionnaire authoring is deactivated by option -a.
The F-questionnaire is different from the original one, unless option -r is selected. In this case, the initial file is transformed and the original version is lost.
The output path for F-questionnaires is the argument of option -o.
This path is a folder path for Recursive Mode, and a file path for Simple Mode.

-o (path to F-questionnaire)

Colors can be parametrized for question numbers, question text (first paragraph), line count, comment lines and character counts.

They can be chosen in the following list:

rgb(x%, y%, z%): RGB specification
x, percentage of red; y, percentage of green; z, percentage of blue

black, navy, green, teal, maroon, purple, olive, silver, gray, lime, aqua, red, orange, white, yellow

Question number colors and question text colors are specified as follows: -q (number color) (question text color)
Line number color is given by -l (color).
The color of character counts is -n (color).
The color of comment lines can be specified by -c (color).

When -l, -n or -c are not used, corresponding information is not inserted in the F-questionnaire. When -q is not used, question number text color and question text color are left in original black.
When -q is used with just one color, question numbers will be colorized with it. Question text color will be the default question text color.
When -l, -n, or -c are used without any color specification, the default color value for each parameter will be used too. These default values can be different from each other.
Default color values are defined in file constants.h. They are mentioned below:

-q (question number)    navy
   (question text)      lime
-l                      red
-n                      orange
-c                      maroon
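
The fallback rules for colors can be summarized in a small Python sketch. It is not taken from constants.h: the function resolve_colors and the dictionary layout are hypothetical, and the behaviour of -q given without any color is an assumption made by analogy with -l, -n and -c.

```python
# Default values as listed in the table above.
DEFAULTS = {
    "q_number": "navy",
    "q_text": "lime",
    "l": "red",
    "n": "orange",
    "c": "maroon",
}


def resolve_colors(options):
    """Return the color used for each zone of the F-questionnaire.

    options maps a switch name ("q", "l", "n", "c") to the list of colors
    given after it; a missing key means the switch was not used.
    None means the corresponding information is not inserted.
    """
    colors = {}
    if "q" in options:
        given = options["q"]
        # Assumption: -q without any color falls back to both defaults.
        colors["q_number"] = given[0] if len(given) >= 1 else DEFAULTS["q_number"]
        colors["q_text"] = given[1] if len(given) >= 2 else DEFAULTS["q_text"]
    else:
        colors["q_number"] = colors["q_text"] = "black"  # original black
    for key in ("l", "n", "c"):
        if key in options:
            given = options[key]
            colors[key] = given[0] if given else DEFAULTS[key]
        else:
            colors[key] = None
    return colors


print(resolve_colors({"q": ["teal"], "l": []}))
```

With -q teal and a bare -l, question numbers are teal, question text takes the default lime, line numbers take the default red, and neither character counts nor comment lines are inserted.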

In addition, HTML tags for these zones can be partially specified:
    - font weight (option -K1) and font style for comment lines (option -K2) ;
    - font size of line numbers (option -K3) and line comments (option -K4) ;
    - style (span, bold, italic) of questions (option -K5) and question numbers (option -K6).

Option -K1 is followed by a number (typically between 500 and 1000); option -K2 by a style tag (italic, bold).

Numbers typically between 6 and 12 follow -K3 and -K4. Options -K5 and -K6 use HTML tags (b for bold, i for italic, span for neutral).

If an F-questionnaire is authored, the original questionnaire can thus simply have 1. (or whatever number below 1,000 followed by '.') to signal the start of a question line. The number must be immediately followed by a dot.
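
The rule for recognizing the start of a question line can be approximated by a regular expression. The Python sketch below is an illustration only: the function question_number is hypothetical, and tolerating leading blanks before the number is an assumption.

```python
import re

# One to three digits (a number below 1,000) immediately followed by a dot.
# Assumption: leading whitespace before the number is tolerated.
QUESTION_START = re.compile(r"^\s*(\d{1,3})\.")


def question_number(line):
    """Return the question number that starts the line, or None."""
    m = QUESTION_START.match(line)
    return int(m.group(1)) if m else None


for line in ("12. How old are you?", "12 . How old are you?", "1000. Too large"):
    print(repr(line), question_number(line))
```

A number separated from its dot by a space, or a number of four digits or more, is not recognized as a question start in this sketch.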

3. Output: flat database and database with separators

Once the questionnaire has been parsed, the flat database sent back by the keyboarding operator can be converted into .csv, .txt or any type of database with separators. This in turn can be easily imported by spreadsheet software like Microsoft Excel or OpenOffice Calc. Any ASCII character can be given as a separator.

To activate flat database conversion, the following options should be used:

-B -e (path to flat base) -s (path to converted base)

and, optionally:

-$ (variable separator character) -d (decimal separator character)

Variable separators and decimal separators should be carefully checked: they must be different, otherwise a decimal value would be split into two columns.
Under French language "locale" conventions, the decimal separator will be a comma, which implies that the .csv format cannot be comma-separated. In this case, .csv is semicolon-separated. When -$ "," is on the command line, it is coerced into a -$ ";" instruction, provided that -d "," is there too.
By default, tabulations are used as variable separators and dots as decimal separators.

In some cases the flat database can contain blank spaces instead of null values or vice-versa. To avoid errors, it is necessary in this case to convert blank spaces into zero values. This transformation is automatically performed by specifying option -z, before inserting separators.
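
The separator rules and the effect of option -z can be illustrated with the following Python sketch. It makes several assumptions that are not confirmed by this page: the function names are hypothetical, variable widths are taken as already known from questionnaire parsing, and -z is modelled as replacing every blank in the record by '0'.

```python
def resolve_separators(variable_sep="\t", decimal_sep="."):
    """Apply the defaults and the comma coercion described above."""
    if variable_sep == "," and decimal_sep == ",":
        variable_sep = ";"  # a comma cannot serve both purposes
    if variable_sep == decimal_sep:
        raise ValueError("variable and decimal separators must differ")
    return variable_sep, decimal_sep


def convert_record(record, widths, variable_sep="\t", blanks_to_zero=False):
    """Cut one fixed-width record into fields and join them with a separator.

    widths lists the number of characters of each variable, in order.
    blanks_to_zero mimics option -z and is applied before separators are
    inserted. Assumption: every blank is replaced, whatever the field type.
    """
    if blanks_to_zero:
        record = record.replace(" ", "0")
    fields = []
    pos = 0
    for width in widths:
        fields.append(record[pos:pos + width])
        pos += width
    return variable_sep.join(fields)


sep, dec = resolve_separators(",", ",")
print(repr(sep), repr(dec))
print(convert_record("12 4AB", [2, 2, 2], sep, blanks_to_zero=True))
```

How the decimal separator is written into the converted values is left out of the sketch, since this page does not describe it.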

It is possible to extract X lines and Y columns from a base with separators. The resulting base will comprise lines L to L+X-1 included and columns C to C+Y-1 included, in which L and C are the respective ranks of the first extracted line and the first extracted column (from top to bottom for lines and from left to right for columns).

To perform extraction use the following syntax:

-P L L+X-1 for line extraction and/or

-I C C+Y-1 for column extraction.

Numbering of columns starts at 0, according to C language conventions.
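
The inclusive bounds used by -P and -I can be sketched in Python as below. The function extract is hypothetical. Columns are numbered from 0 as stated above; numbering lines from 0 as well is an assumption of this sketch, since the page only specifies the convention for columns.

```python
def extract(rows, lines=None, columns=None):
    """Keep an inclusive range of lines and/or columns.

    rows is a list of records, each record being a list of fields.
    lines is (L, L+X-1) and columns is (C, C+Y-1); both bounds are included.
    Assumption: lines are numbered from 0, like columns.
    """
    if lines is not None:
        first, last = lines
        rows = rows[first:last + 1]
    if columns is not None:
        first, last = columns
        rows = [row[first:last + 1] for row in rows]
    return rows


table = [[f"r{i}c{j}" for j in range(4)] for i in range(4)]
print(extract(table, lines=(1, 2), columns=(0, 1)))
```

Here X = 2 lines and Y = 2 columns are kept, starting at line 1 and column 0.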


4. Output: SAS(R) DATA step program

Statistical Analysis System (SAS) has a DATA step import procedure for converting a flat database to the SAS format.
The structure of the DATA step runs as follows:

        DATA library.name ;

            ATTRIB
                VARIABLE_NAME_1    LABEL_1
                VARIABLE_NAME_2    LABEL_2
                VARIABLE_NAME_3    LABEL_3
                (...)
            ;

            INFILE path_to_flat_file_to_be_imported
                sas_options
            ;

            INPUT
                VARIABLE_NAME_1    informat_1
                VARIABLE_NAME_2    informat_2
                VARIABLE_NAME_3    informat_3
                (...)
            ;

        RUN;

Variable names and labels are parsed in the questionnaire. They are imported as SAS-type variable names and labels. In the INPUT section, each variable's name is associated with the SAS informat corresponding to the informat parsed on the questionnaire's comment lines (or automatically computed if option -A is activated).
If specified on comment lines, informats are indicated right after the label. In this case, option -A must not be used. If there is a modification field on the comment line right after the informat field, the value of the corresponding DATA step informat is the sum of the informat field and the modification field (see 1.1).
If the informat is specified according to C language specification, it is possible to convert the values of the comment line informat fields into SAS-type informat fields by using option -X. This option is only activated for generating SAS DATA steps, which are specified as follows:

    -Ts   library.name
    -To   sas_options
    -Te   path_to_flat_file_to_be_imported
    -T    path_to_.sas_DATA_step_program
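
To make the structure of the generated program concrete, here is a minimal Python sketch that assembles a DATA step from a list of variables. It is not the IDSS generator: the function data_step is hypothetical, and the exact text written by IDSS (spacing, quoting of labels and paths) may differ. The sketch uses the standard SAS forms LABEL='...' in ATTRIB and formatted input in INPUT.

```python
def data_step(dataset, flat_file, variables, sas_options=""):
    """Build the text of a SAS DATA step importing a flat file.

    dataset is library.name; variables is a list of
    (name, label, informat) tuples, informats being in SAS notation.
    """
    out = [f"DATA {dataset};", "    ATTRIB"]
    for name, label, _informat in variables:
        safe = label.replace("'", "''")  # SAS doubles quotes inside literals
        out.append(f"        {name} LABEL='{safe}'")
    out.append("    ;")
    out.append(f"    INFILE '{flat_file}' {sas_options}".rstrip())
    out.append("    ;")
    out.append("    INPUT")
    for name, _label, informat in variables:
        out.append(f"        {name} {informat}")
    out.append("    ;")
    out.append("RUN;")
    return "\n".join(out)


print(data_step(
    "survey.wave1",
    "flat.txt",
    [("VAR1", "Age", "3."), ("VAR2", "Name", "$8.")],
    "LRECL=11",
))
```

The dataset name, file name and LRECL option in the usage example are made up for the illustration.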

5. Output: Correspondence Table

A correspondence table can be created, which indicates the number of characters/digits for each questionnaire line, variable or question, depending on user's choice. The sum of these counts up to the current line/variable/question is also indicated on the same line. The table can also mention variable names, labels, (in)formats and the modification fields requested at keyboarding stage.
Table generation is triggered by option -t, and the directory that contains the tables is indicated with option -tp (path to directory).
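
The running totals shown in the correspondence table amount to a cumulative sum of character counts. The Python sketch below illustrates this for a table by variable; the function correspondence_rows and its row layout are hypothetical and do not reproduce the actual columns of the IDSS table.

```python
from itertools import accumulate


def correspondence_rows(variables):
    """Return (rank, name, count, running_total) for each variable.

    variables is a list of (name, count) pairs, where count is the number
    of characters/digits of the variable. The running total is the sum of
    the counts up to and including the current variable.
    """
    totals = accumulate(count for _name, count in variables)
    return [
        (rank, name, count, total)
        for rank, ((name, count), total) in enumerate(zip(variables, totals), start=1)
    ]


for row in correspondence_rows([("VAR1", 3), ("VAR2", 1), ("VAR3", 8)]):
    print(row)
```

The last running total equals the length of one record of the flat database, which is a convenient check against the file returned by the keyboarding operator.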

Click here to see example 7

Columns #VAR (variable rank), #QUEST (question rank) and #LINE (line number) are always generated. To place the #VAR column on the left of the table, and thereby obtain values for each variable, use option -Z. To obtain a table by questionnaire line, use -L instead of -Z. Option -Q displays a more compact table, which will only show values for each question.

Click here to see table options