Alexandria 2.31.0
SDC-CH common library for the Euclid project
|
The Table module provides the means for reading and writing tables from/to ASCII or FITS files. It provides a set of classes describing a table in the memory, which constitute its data model, and methods for reading and writing these classes from/to ASCII and FITS formats. The following sections describe briefly how to use the module and can be used as a quick-start guide. For more detailed information refer to the documentation of each individual class and method in the Table namespace.
The main class of the Table module is the Table class, which represents a table in memory. It can be seen as a collection of Rows, with the restriction that all share the same columns. Each of the columns is defined by its name (which must be unique, not be empty and not contain whitespace characters), the type of the objects it contains, the unit they are expressed and a short text description. The information of the columns, which can be both retrieved from the Table object itself or from any of its Rows, is represented by the ColumnInfo class. Each cell of a Row is stored in a boost::variant (redefined as Row::cell_type), so the boost::get method should be used to retrieve their content. Note that all the Table data model classes are designed to be immutable (cannot be modified).
The following examples demonstrate how to create new Tables from the scratch and some basic operations on them. To avoid duplication, the following lines are assumed for each example:
Note that the second and third lines are used to make the example code more readable and they will introduce all the symbols of the Table and std namespaces, so they must be sparingly used.
This example demonstrates all the necessary steps for creating a Table object. To be more comprehensible, each step is presented and explained separately.
The first step is to create the ColumnInfo object which describes the columns of the table. For example, the following code creates the ColumnInfo object for a lookup table (which will be used to store values of functions):
Each column is described by a ColumnInfo::info_type object (which is an alias for the ColumnDescription class). It gets as arguments the column name, type, unit and description (in this order). From the above, only the name is required. The type defaults to the std::string and the unit and description to empty strings. As can be seen above, our table will have three columns defined as:
The ColumnInfo object will be shared between the Table and all its Rows, so it is stored in a std::shared_ptr. Note that at the construction of the ColumnInfo the info_list
vector is moved because it is not needed anymore. This is optional (but recommended).
Having the ColumnInfo object from the previous step, we can start creating the Rows of the Table. In this step we store the Rows in a vector, which will be used later to construct the Table. The Row constructor gets as parameters a list with its cell values and the shared pointer of the ColumnInfo object. Note that all the cells are of the type Row::cell_type, which is a boost::variant specialization allowing for the following types:
bool
int32_t
int64_t
float
double
std::string
std::vector<bool>
std::vector<int32_t>
std::vector<int64_t>
std::vector<float>
std::vector<double>
NdArray::NdArray<int32_t>
NdArray::NdArray<int64_t>
NdArray::NdArray<float>
NdArray::NdArray<double>
The following code is an example of how to construct a Row in a verbose and easy way to understand, by first creating a vector with the cell values. Note that this vector is not going to be further used, so it is moved during the Row construction.
Because the Row::cell_type is a boost::variant, extra care must be taken so each cell value is of the correct type (as defined in the ColumnInfo). If a cell has incorrect type, a runtime exception will be thrown during the Row construction. This problem becomes more apparent when literals are used, as their types might be different than what the user expects. The following table gives some examples of types and literals, and can be used to avoid mistakes.
Type | Literal | Comment |
---|---|---|
char* | "text" | String literals are C-strings, do not use! |
std::string | string{"text"} | Explicit string construction is necessary |
int32_t | int32_t{5} | Recommended int32_t literal syntax |
int64_t | int64_t{5} | Recommended int64_t literal syntax |
int | int32_t in most cases | |
long | int32_t or int64_t , OS dependent, do not use! | |
long long | int64_t or int128 , OS dependent, do not use! | |
5 | First of int , long , long long which can represent it | |
5L | First of long , long long which can represent it, do not use! | |
5LL | long long , do not use! | |
float | 5.F or 2E-15F | Recommended float literal syntax |
double | 5. or 2E-15 | Recommended double literal syntax |
long double | 5.L or 2E-15L | do not use! |
For the same reason, when passing vector
types to the Row::cell_type constructor, you should use the full constructor name and the initialization list, for example: vector<double>{1.1, 2.2, 3.3}
.
A more complicated and realistic Row creation example is the following, which creates some Rows with the results of the sin
and cos
functions. Note that in this example the Row and the vector containing the cell values are constructed at the same time.
The final step is to create the Table object itself. The constructor of the Table takes only one argument, which is the vector of its Rows. At the moment, there is no support of heterogeneous Tables, which means that all the Rows of a Table must have the same ColumnInfo (an exception is thrown otherwise).
This example shows how the information of the columns can be retrieved by using the ColumnInfo class. A shared pointer to the ColumnInfo object can be retrieved either from a Table object or from a Row.
The ColumnInfo object provides the method ColumnInfo::getDescription(), which can be used for retrieving the information of a specific column by using its index. The total number of columns can be retrieved by the ColumnInfo::size() method. The following code uses these methods to print the information of all the columns:
Output:
If the index of a column is not known it can be found by using the method ColumnInfo::find(). This method returns a unique_ptr<size_t>
which contains a pointer to the index of the column if it was found, or the nullptr
if no column with the given name exists. This facilitates the checking if a column exists or not, as demonstrated in the following example:
Output:
Because all the cell values are stored as boost::variants, if a Table contains columns of a different type than the one the program expects, the problem will not be detected until a cell value is retrieved during runtime. For this reason, to minimize the debugging effort and to allow better log messages, it is recommended to always check the types of the Table columns in interest, before trying to read the data from the Rows cells. This can easily be done by using the typeid()
method:
If you do not know the exact types of a Table (for example when reading from a file) or you want to allow casting to take place, you should use the CastVisitor as described at the following section.
The easiest way to access the data of a Table is by using the iterators provided by the Table and Row classes. The Table::Table::begin() method provides an iterator which can be used to iterate through all the Rows of the table and the Row::begin() method provides an iterator which can be used to iterate through all the cells of the Row. Note that each cell of the Row is of the type boost::variant
, so if a specific type is not required it can be used directly. The following lines of code demonstrate how to print on the screen all the contents of a table, by using these iterators:
Output:
Note that the Table module defines the stream <<
operator for the vector types which are allowed as cell values, so the above example will work even if a column is of a vector type. The convention used for converting the vector to string is to print all the values separated by the "," character (without any white spaces).
Except of the iterators, the Table and Row classes provide also indexed access to their contents. The syntax is similar to the STL containers, but a range check does take place:
Output:
The Row class, in addition to the indexed access, provides a map-like access based on the name of the column. Note though that this way of accessing the Row cells implies a performance penalty and should be used sparingly.
Output:
Most of the time, to use the data of the Table cells, they must be retrieved with the correct type. The Row::cell_type objects can be converted to the actual type by using the boost::get()
method. Note that trying to convert the cell value to a wrong type will result to a boost::bad_get
exception. For example, the following lines of code calculate the sum of the Y column values (column of double
type):
Output:
If you want to avoid using the boost::get
method (and avoid its limitation of having to use the exact type of the column), for example when you read a table from a file and you do not know the exact types in advance, you can use the boost::apply_visitor
with the Table::CastVisitor class. To use the Table::CastVisitor class, you have to give as its template parameter the type you want to cast the cell (it has to be one of the Row::cell_type types):
Note that the Table::CastVisitor supports to and from string conversions, but it does not allow for down-casting cells with numerical values. Vector types can be converted to other vector types, as long as the type of their contents can be converted without down-casting. Finally, converting a scalar (single value) column to vector, will result to a vector with a single element.
The Table module provides functionality for importing and exporting Tables from/to files and streams. The currently supported formats are ASCII space separated text and FITS tables. The following sections demonstrate how to use this functionality.
To allow abstraction of the I/O format, the table module provides the interfaces TableReader and TableWriter for importing and exporting tables. This way, the user can isolate the format specific code to a single place in his application (where he creates the instance of the specific implementation), and expose this instance to the rest of the code using the interface. For example, the following class encapsulates the format details for a table reader:
The TableReader interface is designed to operate as a stream to a given table. It provides the following methods:
The TableReader::read() method, when called without any arguments it will read all the remaining rows of the table. When it is called with a number as an argument it will read only this specific number of rows. This allows the user to handle big files, which otherwise would not fit in memory as a single Table object. As an example, the following function calculates the summary of a column values, reading in memory only 1000 rows per time:
The TableWriter interface, similarly to the TableReader, is also designed to operate as a stream. It provides only two methods:
As with the TableReader, applications which produce tables too big to fit in memory as a single Table object, can break down their computation to smaller chunks and use the addData() call multiple times. As an extra advantage, when the addData() method returns, the rows are guaranteed to be flushed to the underlying table file. This means that the progress of the application can be monitored by checking the output file, and that in case of an application crash the already produced table stays intact. As an example, the following class can be used to create an output table, which is flushed every time 100 results are computed:
Reading ASCII tables is supported by the AsciiReader class, which can be used to read tables from any kind of std::istream
. The stream reference is given as parameter to the constructor. For example, the following code reads a table from a string stream:
This constructor only gets a reference to the stream, so the user is responsible to manage the stream instance itself and to guarantee that the reference stays valid for all the lifetime of the AsciiReader.
As the most common use is to read a file, there is provided a second constructor which gets as parameter a string with the file name. When using this constructor the AsciiReader will handle any file stream opening internally, so the user does not need to worry about any broken references. This is the recommended way of using the AsciiReader:
The AsciiReader class, by default will interpret the #
as comment character and will try to detect automatically the names and the types of the columns in the file, based on the information in the comments before the data. This is done by comment lines following the format:
# Column: NAME TYPE (UNIT) - DESCRIPTION
The NAME is the only required field and it must not contain whitespaces. The TYPE field is interpreted using the following conventions:
TYPE string | Interpretation |
---|---|
bool, boolean | A boolean value as the following (case is ignored): - true, t, yes, y, 1 - false, f, no, n, 0 |
int, int32 | A 32 bit integer |
long, int64 | A 64 bit integer |
float | Single (32 bit) precision floating point |
double | Double (64 bit) precision floating point |
string | String without whitespaces |
[bool], [boolean] | A vector of booleans |
[int], [int32] | A vector of 32 bit integer |
[long], [int64] | A vector of 64 bit integer |
[float] | A vector of single (32 bit) precision floating point |
[double] | A vector of double (64 bit) precision floating point |
[bool+], [boolean+] | A multidimensional array of booleans |
[int+], [int32+] | A multidimensional array of 32 bit integer |
[long+], [int64+] | A multidimensional array of 64 bit integer |
[float+] | A multidimensional array of single (32 bit) precision floating point |
[double+] | A multidimensional array of double (64 bit) precision floating point |
The vector columns should contain the values of the vector separated by the "," (comma) character, without any white spaces. If the type of a column is missing, it is interpreted as a string column.
The multidimensional arrays start with the shape between <>
, followed by a one dimensional representation of the values (a vector). ASCII tables do support different shapes for different entries. For instance:
Note that the column description comments are optional. If there are less column descriptions than the actual columns in the file (or if they are missing completely) the columns are named as col1, col2, etc (starting from 1) and are considered to be of type string.
Optionally, the last non empty comment line can also contain the names of the columns, separated by whitespaces. to allow for easier reading of the file. In this case, the names of this line define the names of the columns and they are matched with the column description comment lines to find the column info.
Reading a file that follows the above rules is straight forward. For example, if a file contains the following text:
it can be read directly by using the command:
Note that if you use the constructor constructor which uses a stream referehce, the above command will read till the end of the stream, so any further reading attempts will behave like reading an empty file. In this case, for the following examples to work, before calling the read method again, the stream must be reopened, or be rewind at the beginning by using the commands:
Fore more information of how to partially read the file, how to skip rows, etc see the TableReader interface section.
Except of this automatic column name/type detection mode described above, the AsciiReader class can be parameterized so it can read files which do not follow the default conventions. The first parameterization is overriding the types of the columns, by calling the method AsciiReader::fixColumnTypes(). This is very useful for the cases that a program wants to read the column in a different type (for example read a float
column as a double
column), but also for the cases that a file does not contain the column types and the program does not want to read them as strings. As an example, the following code will read the previous file interpreting the X and Y columns as float
values:
Note that the above code will still detect automatically the names of the columns. If the program wants to override the column names, it can use the second AsciiReader parameterization by calling the AsciiReader::fixColumnNames() method, which allows to fix the column names. The following code will read our table file by reading the X column as Angle and the Y column as Value:
The last way the AsciiReader can be parameterized is the comment character. In fact, any string can be used as a comment indicator and not only a single character. The comment indicator is set by calling the AsciiReader::setCommentIndicator() method. The following example can read files which start their comments with the //
sequence (column names and types will be detected automatically):
All the above methods are returning a reference to the reader itself, so they can be chained together to read tables in a single line:
To export a Table as an ASCII character sequence, the Table module provides the AsciiWriter class. Similarly with the AsciiReader, the AsciiWriter provides two constructors, one that gets as parameter a std::ostream
reference and one that gets as parameter a string with the filename to write the table to. Again, the recommended constructor is the one that gets the filename and handles the file stream internally.
By default, the AsciiWriter will use as a comment indicator the #
character, it will add the column description comments, the names of the columns as the last comment line and it will right-align all the columns. The width of the columns will be automatically calculated based on the contents of the table cells at the first addData() call. If sequential addData() call require more space for a specific column its width is expanded, but any already written rows are not modified. For example, the following code will print on the screen our table:
Output:
Printing a table on the screen this way can be very handy, especially during development. Be careful though, because printing a table with hundreds of thousands of rows in the std::cout
will actually print all of them.
Similarly with the AsciiReader, the AsciiWriter can be parameterized by method calls. The first option which can be parameterized is the comment character to use, by calling the method AsciiWriter::setCommentIndicator(). For example, the following code will use the //
comment indicator:
Output:
The second way to parameterize the AsciiWriter is to hide or not the comments with the column info, by calling the method AsciiWriter::showColumnInfo(). In this case only the comment with the column names is generated. Again, like with AsciiReader, these calls can be chained in a single line:
Output:
Fore more information of how to progressively write a table in a file see the TableWriter interface section.
The functionality related with reading a Table from a FITS file is provided by the FitsReader class. Both ASCII and binary tables are supported. Similarly with the AsciiReader, this class provides multiple constructors to either read from an existing CCfits::HDU
reference or to give the filename and extension index or name:
table_hdu
is a reference. This is obligatory because the result of the extension()
method is a sublcass of the CCfits::HDU
.As the FITS format is a standard, the conventions the FitsReader uses to convert a table HDU to a Table object is straigh forward. The column names, types and units will automatically be detected from the standard HDU keywords (TTYPEn, TFORMn and TUNITn keywords accordingly). The column descriptions are read from the (non standard) keyword TDESCn.
Reading the Table is as simple as:
Similarly with the AsciiReader, the FitsReader can be parameterized via its method calls. In particular, the names of the columns can be overridden:
Overriding the column types on the other hand is not allowed. The conventions of the FITS types conversions to the Table cell values (both for ASCII and binary FITS tables) can be found in the documentation of the FitsReader class.
To export a Table in a FITS file, the Table module provides the FitsWriter class. The same as all the rest of the Readers/Writers, this class provides multiple constructors, to either manage the lifetime of the CCfits::FITS object internally or to delegate it to the user:
Similarly with the rest readers/writers, the FitsWriter can be parameterized by method calls. Two parameterizations are possible, to set the table HDU format by calling the method FitsWriter::setFormat() (if not called, binary format is used), and to set the HDU name by calling the method FitsWriter::setHduName() (if not called, the name is set to the empty string). These calls myst be done before any data are written to the FitsWriter. Note that due to file size and performance benefits, it is recommended to use binary format except if you have specific reasons why not to do so. The following code shows how to write a table in a file, overriding it if it already exists, as an ASCII table HDU called HDUNAME
:
Note that in contrast with the AsciiWriter, which always overrides any existing file, the FitsWriter can handle already existing files or CCfits::FITS references with existing HDUs. This allows for both adding new table HDUs and for appending rows to existing ones. The rules for doing so are very simple. If there already exists an extension HDU with the name set by calling the method FitsWriter::setHduName() (or the empty string if this method is not called), the writer will start appending rows to this existing table. If there is no such HDU, the writer will create a new extension HDU at the end of the FITS and write the table there. Note that if the writer appends to an existing HDU, it will ignore any calls to the FitsWriter::setFormat() method (it will use the format of the existing table), as well as any comments added by calling the method FitsWriter::addComment().