KOSMA File I/O Library

written by Urs Graf

Introduction

The KOSMA file I/O library is intended to help exchange information between programs running on one or several computers. In the style of a database system, every variable is associated with its name and some additional information about it. Programs exchange information by writing variables to files and reading them from files. A variable is found within a file exclusively through its name; its position within the file does not matter. Thus, a file can be edited or restructured without affecting the way information is exchanged. In addition, information exchange can be monitored by tracing the file contents.
This documentation tries to give a short overview of the capabilities and the structure of the library. The correct syntax for calling the library routines is best copied from the example programs.

Using the library

There are three basic actions performed on the I/O variables:

Configure a variable
Read a variable from a file
Write a variable to a file

Configuring a variable is necessary to tell the program how to handle the variable's I/O, in particular to set the input and output files for it. Configuring is usually (but not necessarily) done only once within a program.
Both for reading and for writing variables, it is recommended to use only functions which operate on complete var_sets. For writing this is even compulsory, since the sequential nature of the data files does not allow random write access to a single variable. In other words: whenever a variable is written, the library writes the complete var_set of this variable. Reading individual variables is possible in principle; however, it is usually not very practical and may result in increased overhead.
It is convenient to use the functions ReadAllVariables() and WriteAllVariables() to update all variables from or in all files where they occur. For I/O on an individual var_set, use ReadVariablesVarSet( &variable.name ) and WriteVariablesVarSet( &variable.name ). Note that in order to read or write a variable's var_set, the calling program is not required to know the name of the var_set or file. This is taken care of by the library and the config file.

Library functions

The functions available to the user are listed as function prototypes in the KOSMA_file_io.h file. The most important functions are the following:

Configure functions:

ConfigXxxx( &variable, "namestring", "configfile", "dimensionstring" )
Xxxx denotes the data type of the variable's value (current options are: Int, Double, String). Calling a ConfigXxxx function associates the variable with the namestring, initializes its INFOBLOCK data structure, and searches for an entry in the configfile to read configuration information. If the dimensionstring is not empty, the variable is configured as an array (see Arrays). For instance, if the dimensionstring is "4 6 8", it is configured as a three-dimensional array with 4, 6, and 8 entries in the respective dimensions. Of course, the declaration of the variable has to be consistent with this, i.e. 4 x 6 x 8 = 192 array elements have to be declared in total to avoid memory overlap problems.
Scalar variables need an empty dimensionstring: "".
See the explanation of the config file name setting for easy ways to use standard config file names.
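
The relation between a dimensionstring and the number of elements that must be declared can be checked with a few lines of C. The following is only a sketch: parse_dimensions and total_elements are hypothetical helpers written for this illustration, not library routines, and the library's own parsing may differ in detail.

```c
#include <stdlib.h>

/* Hypothetical helper (not part of the library): parse a dimension string
 * such as "4 6 8" into dims[]. Returns the number of dimensions found
 * (0 for the empty string, i.e. a scalar) or -1 on malformed input. */
int parse_dimensions(const char *dimstr, long *dims, int max_dims)
{
    int n = 0;
    const char *p = dimstr;

    for (;;) {
        char *end;
        long d = strtol(p, &end, 10);   /* skips leading blanks itself */
        if (end == p)                   /* no further number found */
            break;
        if (d <= 0 || n == max_dims)
            return -1;
        dims[n++] = d;
        p = end;
    }
    /* Anything left over must be trailing blanks only. */
    while (*p == ' ' || *p == '\t')
        p++;
    return (*p == '\0') ? n : -1;
}

/* Number of elements the variable declaration has to provide. */
long total_elements(const long *dims, int n)
{
    long total = 1;
    for (int i = 0; i < n; i++)
        total *= dims[i];
    return total;
}
```

For the dimensionstring "4 6 8" this yields three dimensions and 192 elements, so a matching declaration would have the shape [4][6][8].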

Read functions:

ReadAllVariables()
ReadVariablesVarSet( &variable.name )
[ ReadVar( &variable.name ) ]
The first of these functions reads all variables previously configured. The second reads all variables sharing the read file with variable. The third reads a single variable; under the current philosophy it is considered obsolete and may at some stage be discontinued.
ReadVariablesVarSet (and ReadVar) returns a double value which, if it is non-negative, is the time difference in seconds between the time stamp of the currently read data and the previously read version of the same data. If this difference is 0 (i.e. if the input file has not changed between now and the last read access) the functions return without actually reading anything. A negative return value indicates that an error has occurred. Its absolute value is either 1, if the input file could not be opened, or it is the number of variables which could not be read from the file.
ReadAllVariables returns the sum of the time differences of all input files, if no errors occurred. If read errors occurred, it returns the sum of all read errors as above.
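
A calling program may want to turn the return value of ReadVariablesVarSet into a small status code. The sketch below only encodes the rules described above; ReadStatus and classify_read_result are names made up for this illustration and do not exist in the library.

```c
/* Hypothetical status codes for the double returned by a read function. */
typedef enum { READ_UNCHANGED, READ_UPDATED, READ_FAILED } ReadStatus;

/* Interpret the return value of ReadVariablesVarSet (or ReadVar):
 *   result <  0 : error; |result| is 1 (file could not be opened) or the
 *                 number of variables that could not be read
 *   result == 0 : input file unchanged, nothing was read
 *   result >  0 : seconds between the new and the previous time stamp */
ReadStatus classify_read_result(double result, double *seconds, int *errors)
{
    *seconds = 0.0;
    *errors = 0;
    if (result < 0.0) {
        *errors = (int)(-result);
        return READ_FAILED;
    }
    if (result == 0.0)
        return READ_UNCHANGED;
    *seconds = result;
    return READ_UPDATED;
}
```

Note that an error count of 1 is ambiguous: it can mean either that the file could not be opened or that exactly one variable could not be read.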

Write functions:

WriteAllVariables()
WriteVariablesVarSet( &variable.name )
[WriteVariable( &variable.name )]
The first of these functions writes all variables previously configured. The second writes all variables sharing any of the write files with variable. The third is obsolete and will disappear at some stage, because it is not possible to write at an arbitrary position in a file without destroying other information in the file. Therefore, WriteVariable internally calls WriteVariablesVarSet to rewrite the affected files completely. Thus, if one calls WriteVariable successively for a number of variables in the same var_set, this var_set is rewritten with every call.
The write functions return integer values corresponding to the number of write errors that occurred within the function. A successful call returns 0.

The var_set concept

A var_set is a set of variables residing in the same file, as defined in the config file. Each I/O variable belongs to one or several var_sets. Since a variable can only be read from one file, it cannot belong to more than one read var_set. Configuring the read file as NULL configures a write-only variable, which does not belong to any read var_set. This is meant for variables whose values are created internally in the program.
A variable can be written to an arbitrary number of output files and, therefore, may belong to an arbitrary number of write var_sets. If at least one of the write files is configured as NULL, the variable is read-only, i.e. it will not be written to any file.
Depending on the configuration information, the library automatically keeps track of the interdependence of variables and var_sets. For instance, if a given value is required in an additional output file, it is sufficient to configure an additional write file to make it available. Recompiling the software is not required.

Arrays

Array variables are treated as individual variables in the I/O routines. Each array element has its individual name, which consists of the array's basename extended by the element's index within the array. For example, the element (3,5,7) of a 3-dimensional array named a_3d gets the name a_3d[3][5][7]. When configuring an array, a single entry for the array's basename is sufficient. Based on the dimensional information given in the call of the config function, the library automatically creates the names for the array elements. This means that all array elements get the same default value and belong to the same var_sets. However, when an array is read from an input file, each element needs its own entry.
The naming of the array entries employs the C syntax. Array indexing starts with 0, and multidimensional arrays are stored in C order (row-major, i.e. the last index varies fastest). This may lead to conflicts when configuring multidimensional arrays through calls from other programming languages.
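
The naming scheme can be reproduced with a short routine that converts a position in C storage order into an element name. This is a sketch for illustration only; element_name and MAX_DIMS are assumptions of this example, and the library's internal name generation is not shown in this documentation.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DIMS 8   /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: write the name of the element stored at position
 * flat_index (C row-major order) into buf, e.g. "a_3d[3][5][7]".
 * Returns 0 on success, -1 if the index is out of range or buf is too small. */
int element_name(const char *basename, const long *dims, int ndims,
                 long flat_index, char *buf, size_t size)
{
    long idx[MAX_DIMS];

    if (ndims < 1 || ndims > MAX_DIMS || flat_index < 0)
        return -1;
    /* Peel off the indices; the last dimension varies fastest. */
    for (int i = ndims - 1; i >= 0; i--) {
        idx[i] = flat_index % dims[i];
        flat_index /= dims[i];
    }
    if (flat_index != 0)            /* position lies beyond the array */
        return -1;

    size_t used = strlen(basename);
    if (used >= size)
        return -1;
    memcpy(buf, basename, used + 1);
    for (int i = 0; i < ndims; i++) {
        int n = snprintf(buf + used, size - used, "[%ld]", idx[i]);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

With dimensions 4, 6, 8 the last of the 192 elements (position 191) is named a_3d[3][5][7]. A language with column-major storage, such as Fortran, would place the same element at a different position, which is the conflict mentioned above.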

Settings

Directories:

Three environment variables govern the directories of the various files: CONFIG_DIR, READ_DIR, and WRITE_DIR. If one of them is not set, the library uses the current working directory for the corresponding files. If one of them is set, the library appends a slash (/) to its value and prepends the result to the appropriate file names. The directories may be set or reset explicitly by the routines SetConfigDir( "dirname" ), SetReadDir( "dirname" ), and SetWriteDir( "dirname" ).
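
The effect of the directory settings on a file name can be sketched as follows. build_path and config_path are hypothetical helpers, not library routines, and treating an empty environment variable like an unset one is an assumption of this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: join dir and file as "dir/file". If dir is NULL or
 * empty (assumption: treated like "not set"), the bare file name is used,
 * i.e. the current working directory.
 * Returns 0 on success, -1 if the result does not fit into out. */
int build_path(const char *dir, const char *file, char *out, size_t size)
{
    int n;

    if (dir == NULL || strlen(dir) == 0)
        n = snprintf(out, size, "%s", file);
    else
        n = snprintf(out, size, "%s/%s", dir, file);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Resolve a config file name against the CONFIG_DIR environment variable. */
int config_path(const char *file, char *out, size_t size)
{
    return build_path(getenv("CONFIG_DIR"), file, out, size);
}
```

The same pattern would apply to READ_DIR and WRITE_DIR for input and output files.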

Config-File Name:

The name of the config file is most conveniently set by StandardConfigFile( __FILE__ ), which replaces the .c extension of the source file by the extension .cfg. Alternatively, the default config file name can be set by the command DefineConfigFile( "filename" ). If a default config file is set, variables may be configured without explicitly giving the config file name (use "" as configfile in the ConfigXxxx function call).
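
The name substitution performed for __FILE__ can be pictured with the following sketch. config_name_from_source is a hypothetical stand-in, not the library's implementation; in particular, how the library treats directory components in __FILE__ or source names without a .c extension is not stated here, so both behaviours below are assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: derive "name.cfg" from "name.c".
 * Assumptions of this sketch: directory components are kept as they are,
 * and a name without a trailing ".c" simply gets ".cfg" appended.
 * Returns 0 on success, -1 if the result does not fit into out. */
int config_name_from_source(const char *source, char *out, size_t size)
{
    size_t len = strlen(source);

    /* Drop a trailing ".c" if present. */
    if (len >= 2 && strcmp(source + len - 2, ".c") == 0)
        len -= 2;

    int n = snprintf(out, size, "%.*s.cfg", (int)len, source);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

A source file named receiver.c (a made-up name) would thus be paired with the config file receiver.cfg.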

Variable format

Variables are stored together with information about them in a C-structure containing the entries:

.name
INFOBLOCK
.value

The name section is a pointer to a string containing the name of the variable. The INFOBLOCK macro defines a number of entries in the structure containing information such as pointers to the functions required to read and write the variable (that's data type specific), a format string for writing, the names of the files for input and output of the variable etc. The value section finally contains the value of the variable (or, in the case of string variables, a pointer to the value).
Within the calling program, the values of I/O variables can thus be accessed through variable.value (for an array variable in C syntax, e.g. variable[x][y][z].value for a three-dimensional array).
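
The layout described above, with a macro injecting the same bookkeeping entries into every type-specific structure, can be sketched as follows. All names here (INFOBLOCK_SKETCH, IntVarSketch, DoubleVarSketch, sum_grid) and the three fields inside the macro are invented for this illustration; the real INFOBLOCK is defined in the library header and contains more entries, such as the read and write function pointers.

```c
/* Hypothetical stand-in for the library's INFOBLOCK macro: a few common
 * entries that every variable type carries between .name and .value. */
#define INFOBLOCK_SKETCH          \
    const char *write_format;     \
    const char *read_file;        \
    const char *write_files;

typedef struct {
    const char *name;
    INFOBLOCK_SKETCH
    int value;
} IntVarSketch;

typedef struct {
    const char *name;
    INFOBLOCK_SKETCH
    double value;
} DoubleVarSketch;

/* Access the values of a 2 x 3 array variable with ordinary C indexing. */
double sum_grid(DoubleVarSketch grid[2][3])
{
    double sum = 0.0;
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 3; j++)
            sum += grid[i][j].value;
    return sum;
}
```

Because the macro expands to the same fields in each structure, generic code can treat the bookkeeping part uniformly while .value keeps its specific type.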

Config-File format

To configure a variable, a two-line entry in a configuration file is required. The first line contains

the variable value
the variable name
the write format string

The variable name string acts as a delimiter to terminate the string containing the variable value. If a variable name is found in a file, the string between the beginning of the line and the first character of the name is taken as the input string to read the variable default value from. The first non-blank character after the name string is taken as the first character of the format string, which extends to the last non-blank character of the line.
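
The splitting rule for the first line can be written down in a few lines of C. split_config_line is a hypothetical helper for illustration, not the library's parser; the example entry used with it (a variable called counter with default 42 and format %5d) is made up as well.

```c
#include <string.h>

/* Hypothetical helper: split a config line around the known variable name.
 * value  receives everything from the start of the line up to the name,
 * format receives the text from the first non-blank character after the
 *        name up to the last non-blank character of the line.
 * Returns 0 on success, -1 if the name is missing or a buffer is too small. */
int split_config_line(const char *line, const char *name,
                      char *value, size_t value_size,
                      char *format, size_t format_size)
{
    const char *hit = strstr(line, name);   /* first occurrence of the name */
    if (hit == NULL)
        return -1;

    size_t vlen = (size_t)(hit - line);
    if (vlen >= value_size)
        return -1;
    memcpy(value, line, vlen);
    value[vlen] = '\0';

    const char *start = hit + strlen(name);
    while (*start == ' ' || *start == '\t')
        start++;
    const char *end = start + strlen(start);
    while (end > start && (end[-1] == ' ' || end[-1] == '\t' ||
                           end[-1] == '\n' || end[-1] == '\r'))
        end--;

    size_t flen = (size_t)(end - start);
    if (flen >= format_size)
        return -1;
    memcpy(format, start, flen);
    format[flen] = '\0';
    return 0;
}
```

Since the name is the only delimiter, a value string that happens to contain the variable name would be cut short by this simple search; names should therefore be chosen so that they cannot appear inside values.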
The second line in the configuration file contains

the variable's input file name
a comma separated list of output file names
an optional comment string started by the comment character '!'

If a variable should not be read from any input file (i.e. it is created locally), the input file name has to be NULL (case sensitive). Similarly, if it should not be written, at least one of the output file names has to be NULL.
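
How the second line determines whether a variable is read or written can be sketched as below. FileConfig and parse_file_line are hypothetical, and several details are assumptions of this example rather than documented behaviour: that the input file name is separated from the output list by blanks, that blanks around commas are ignored, and that a line without any output file name is rejected.

```c
#include <string.h>

#define MAX_OUTPUTS 8    /* arbitrary limits chosen for this sketch */
#define MAX_NAME    64

typedef struct {
    char input[MAX_NAME];
    char outputs[MAX_OUTPUTS][MAX_NAME];
    int  n_outputs;
    int  readable;   /* 0 if the input file name is NULL */
    int  writable;   /* 0 if any output file name is NULL */
} FileConfig;

/* Hypothetical helper: parse "input out1, out2 ! comment".
 * Returns 0 on success, -1 on malformed or oversized input. */
int parse_file_line(const char *line, FileConfig *cfg)
{
    char buf[256];
    size_t len = strcspn(line, "!\r\n");   /* cut off comment and line end */

    if (len >= sizeof buf)
        return -1;
    memcpy(buf, line, len);
    buf[len] = '\0';

    /* Tokens are separated by blanks, tabs, or commas. */
    char *tok = strtok(buf, " \t,");
    if (tok == NULL || strlen(tok) >= MAX_NAME)
        return -1;
    strcpy(cfg->input, tok);
    cfg->readable = strcmp(tok, "NULL") != 0;   /* case sensitive */

    cfg->n_outputs = 0;
    cfg->writable = 1;
    while ((tok = strtok(NULL, " \t,")) != NULL) {
        if (cfg->n_outputs == MAX_OUTPUTS || strlen(tok) >= MAX_NAME)
            return -1;
        if (strcmp(tok, "NULL") == 0)
            cfg->writable = 0;
        strcpy(cfg->outputs[cfg->n_outputs++], tok);
    }
    return (cfg->n_outputs > 0) ? 0 : -1;
}
```

A line such as "NULL result.dat" (file name made up) then describes a write-only variable, and "status.dat NULL" a read-only one.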

I/O File format

All files written by the library routines contain a time stamp entry in the first line of the file. It is actually a STRING_VAR variable with the variable.name File update time stamp (this is a reserved variable name; never use it for any other variable!). Its value is a string composed of a human-readable version and a machine-readable version of the date and time when the file was written. The read functions use this time stamp entry to determine whether the file contents have changed since the last read.
The I/O file format corresponds to the configuration file format, but it is reduced to one line and contains only three entries:

the variable value
the variable name
an optional comment string started by the comment character

Again, the variable name acts as a delimiter to terminate the string used to read the variable's value from.

Error handling

The library routines issue error messages categorized by severity of the error: DEBUG, INFO, WARNING, ERROR, FATAL... Errors of category FATAL terminate program execution.
Whether error messages are actually displayed is controlled by the debug level (set with the command DebugLevel( int dl ) ). The higher the debug level, the more messages are issued. A convenient way of setting the debug level is to call DebugLevel( FATAL - minlevel ), where minlevel is any of the error categories defined above. This displays all error categories from FATAL down to minlevel.
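
The arithmetic behind DebugLevel( FATAL - minlevel ) can be illustrated with a small filter function. The enum below is an assumption: it uses prefixed stand-in names and consecutive values in ascending severity, whereas the actual constants and their numeric values are defined by the library header. should_display is likewise a hypothetical helper.

```c
/* Stand-ins for the library's categories; the numeric values are assumed
 * to be consecutive, with FATAL the largest. */
enum Severity { SEV_DEBUG, SEV_INFO, SEV_WARNING, SEV_ERROR, SEV_FATAL };

/* A message is shown if its distance from FATAL does not exceed the debug
 * level, so level 0 shows FATAL only and higher levels show more. */
int should_display(enum Severity severity, int debug_level)
{
    return (SEV_FATAL - (int)severity) <= debug_level;
}
```

Under these assumptions, a debug level of SEV_FATAL - SEV_WARNING equals 2 and lets FATAL, ERROR, and WARNING messages through while suppressing INFO and DEBUG.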
In addition, a long error message format can be chosen, to include file name and line number information in the error message.
If an error log file is set ( ErrorLogfile( "logfilename" ) ), errors are also logged in this file. Time stamps are written to the log file whenever ErrorLogfile is called.