UniRec is a data format for storage and transfer of simple unstructured records,
i.e. sets of key-value pairs with a fixed set of keys. A record in UniRec format
is similar to a C structure, but it can be defined at run-time. It thus makes it
possible to create structures dynamically in a statically typed language.

The main advantage of UniRec is extremely fast access to fields of a record.
No parsing is needed; the fields are accessed directly in the record, almost
as in a plain C struct.

Compared with access to a struct member, just one additional memory access is
needed to find the position of the field in the record. This access goes to a
small table which easily fits into a CPU cache.

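
To illustrate the single extra lookup, here is a minimal standalone model in C. It is not the UniRec implementation: the names `demo_template_t`, `demo_get_u32` and `demo_set_u32` are hypothetical, and the "template" is reduced to a plain array that maps field IDs to byte offsets inside a record.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model, NOT the real ur_template_t: a template is reduced to a
 * table mapping each field ID to the byte offset of the field in a record. */
#define DEMO_MAX_IDS 8

typedef struct {
    int16_t offset[DEMO_MAX_IDS];   /* byte offset of each field, -1 = absent */
} demo_template_t;

/* One extra load (t->offset[id]) locates the field; the value is then copied
 * directly out of the record. memcpy avoids unaligned-access problems.
 * Values are stored in host byte order here (UniRec itself uses little endian). */
static uint32_t demo_get_u32(const demo_template_t *t, const void *rec, int id)
{
    uint32_t value;
    memcpy(&value, (const unsigned char *)rec + t->offset[id], sizeof value);
    return value;
}

static void demo_set_u32(const demo_template_t *t, void *rec, int id, uint32_t value)
{
    memcpy((unsigned char *)rec + t->offset[id], &value, sizeof value);
}
```

Reading a field thus costs one load from the small offset table plus the copy of the value itself.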
To create a UniRec record, a user first needs to specify a set of fields and
their types, i.e. a template. Then memory for the record is allocated and field
values can be set using simple macros.

NOTE: The following text describes UniRec as used in the C or C++ language.

### Simplified example:

```c
// Create a template with three fields (their types must be defined earlier)
ur_template_t *tmplt = ur_create_template("FIELD1,FIELD2,FIELD3", NULL);

// Create a record with that template (no variable-length fields, hence 0)
void *record = ur_create_record(tmplt, 0);

// Set values of fields
ur_set(tmplt, record, F_FIELD1, 1);
ur_set(tmplt, record, F_FIELD2, 234);
ur_set(tmplt, record, F_FIELD3, 56);

// Read values of the record and print them to standard output
// (assuming all three fields have integer types)
printf("%d %d %d\n",
       (int) ur_get(tmplt, record, F_FIELD1),
       (int) ur_get(tmplt, record, F_FIELD2),
       (int) ur_get(tmplt, record, F_FIELD3));

// Free the record and the template when they are no longer needed
ur_free_record(record);
ur_free_template(tmplt);
```

The example states that the types of the fields must be defined before a
template can be created. If names and types of the fields are known at
compile-time, they can be defined at the beginning of a *.c file using the
UR_FIELDS macro, which specifies which fields will be used in the code and what
their types are (see "Definition of statically-known UniRec fields" below).

If the set of fields and their types is not known in advance, they may also be
defined at run-time. However, access to such fields is then a little more
complicated due to limitations of statically typed languages (if a compiler
doesn't know the type of a field, it can't generate instructions to read from or
write into it).

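
A minimal sketch of what such run-time access implies. It uses hypothetical type tags (`demo_type_t`) rather than UniRec's own enum and does not reflect how the library works internally. Since the compiler cannot know the field type, the code branches on a type tag stored at run-time and reinterprets the raw bytes accordingly.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical type tags; the real library has its own ur_field_type enum. */
typedef enum { DEMO_UINT8, DEMO_UINT16, DEMO_UINT32, DEMO_DOUBLE } demo_type_t;

/* The type is known only at run-time, so the code must branch on the tag
 * and copy the right number of bytes into a variable of the right type. */
static double demo_read_as_double(demo_type_t type, const void *ptr)
{
    switch (type) {
    case DEMO_UINT8:  { uint8_t v;  memcpy(&v, ptr, sizeof v); return v; }
    case DEMO_UINT16: { uint16_t v; memcpy(&v, ptr, sizeof v); return v; }
    case DEMO_UINT32: { uint32_t v; memcpy(&v, ptr, sizeof v); return v; }
    case DEMO_DOUBLE: { double v;   memcpy(&v, ptr, sizeof v); return v; }
    }
    return 0.0;   /* unknown tag */
}
```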
A UniRec field may have one of the following types:

|name    |size (bytes)| description                                                  |
|--------|------------|--------------------------------------------------------------|
|int8    | 1          |8bit signed integer                                           |
|int16   | 2          |16bit signed integer                                          |
|int32   | 4          |32bit signed integer                                          |
|int64   | 8          |64bit signed integer                                          |
|uint8   | 1          |8bit unsigned integer                                         |
|uint16  | 2          |16bit unsigned integer                                        |
|uint32  | 4          |32bit unsigned integer                                        |
|uint64  | 8          |64bit unsigned integer                                        |
|char    | 1          |A single ASCII character                                      |
|float   | 4          |Single precision floating point number (IEEE 754)             |
|double  | 8          |Double precision floating point number (IEEE 754)             |
|ipaddr  | 16         |Special type for IPv4/IPv6 addresses, see below for details   |
|macaddr | 6          |Special type for MAC address, see below for details           |
|time    | 8          |Special type for precise timestamps, see below for details    |
|string  | variable   |Variable-length array of (mostly) printable characters        |
|bytes   | variable   |Variable-length array of bytes (not expected to be printable characters) |

Types "string" and "bytes" are the same from a machine point of view (both are
of type char[] in C); the only difference is their semantics. When printed as
text, "string" is usually printed directly as an ASCII or UTF-8 string, whereas
"bytes" is interpreted as binary data and printed in hex.

A terminating null character ('\0') SHOULD NOT be included at the end of
"string" values, since it is specific to the C language and data in UniRec
should be independent of a programming language.

ipaddr: Structure to store both IPv4 and IPv6 addresses and associated functions.

macaddr: Structure to store a MAC address and associated functions.

time: Structure to store timestamps and associated types, macros and functions.

A field name may be any string matching the regular expression
`[A-Za-z][A-Za-z0-9_]*`
with the following limitations:
- It SHOULD NOT end with "_T", as this suffix is reserved in the C implementation
  for symbolic constants storing the type of a field.

It is RECOMMENDED that all field names are uppercase.

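
The naming rules can be checked with a small function. This is a hypothetical helper (`demo_is_valid_field_name`), not part of the UniRec API. It treats the "_T" suffix as an error, so it is stricter than the SHOULD NOT above.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Checks [A-Za-z][A-Za-z0-9_]* and rejects names ending with "_T". */
static bool demo_is_valid_field_name(const char *name)
{
    size_t len = strlen(name);

    /* first character must be a letter */
    if (len == 0 || !isalpha((unsigned char)name[0]))
        return false;

    /* remaining characters: letters, digits or underscore */
    for (size_t i = 1; i < len; i++) {
        if (!isalnum((unsigned char)name[i]) && name[i] != '_')
            return false;
    }

    /* the "_T" suffix is reserved for the type constants */
    if (len >= 2 && name[len - 2] == '_' && name[len - 1] == 'T')
        return false;

    return true;
}
```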

Physical record layout
----------------------

A UniRec record consists of field values put one after another in a specific
order. There is no header. Information about the template and the size of the
record must be provided by other means.

The layout of a record is given only by its template (specifying a set of fields
and their types) and the following rules.

A record is divided into three sections:
1. Values of all fixed-length fields
2. Meta-information about variable-length fields
3. Data of variable-length fields

Fixed-length fields in the first section are sorted by their size from largest
to smallest. Fields with the same size are sorted alphabetically by their name.

The second section contains two 16bit numbers for each variable-length field:
the offset of the beginning of the field's data and the length of the data (in
bytes). The offset is counted from the beginning of the record.

The meta-information fields are sorted alphabetically by the field names.

The third section contains data of variable-length fields in an arbitrary order.
The data of variable-length fields SHOULD be placed immediately one after
another. There SHOULD be NO "empty spaces" between them and data of the fields

The first two sections are called the "fixed-length part" of a record, since their
total size is always the same and all data are present at fixed offsets (for a
given template). The last section is called the "variable-length part" because its
total length as well as the position of individual fields may differ in each record.

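
The ordering rules above fully determine the offsets in the fixed-length part. The following standalone sketch applies them. The names `demo_field_t` and `demo_compute_layout` are hypothetical and this is not the library's code. It sorts the fields by those rules and assigns offsets, counting 4 bytes (two 16bit numbers) for each variable-length field.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical field description: size in bytes, or -1 for variable-length. */
typedef struct {
    const char *name;
    int size;
    int offset;   /* filled in by demo_compute_layout */
} demo_field_t;

/* Fixed-length fields first, larger size first; variable-length fields
 * after them; ties are broken alphabetically by name. */
static int demo_cmp(const void *a, const void *b)
{
    const demo_field_t *x = a, *y = b;
    int xs = x->size < 0 ? 0 : x->size;   /* varlen sorts after all fixed */
    int ys = y->size < 0 ? 0 : y->size;
    if (xs != ys)
        return ys - xs;
    return strcmp(x->name, y->name);
}

/* Sorts the fields in place, assigns offsets and returns the size of the
 * fixed-length part (fixed values + offset/length pairs). */
static int demo_compute_layout(demo_field_t *f, size_t n)
{
    int pos = 0;
    qsort(f, n, sizeof *f, demo_cmp);
    for (size_t i = 0; i < n; i++) {
        f[i].offset = pos;   /* for varlen fields: offset of the meta pair */
        pos += f[i].size < 0 ? 4 : f[i].size;
    }
    return pos;
}
```

Applied to the HTTP template from the picture below, it places DST_IP at offset 0, SRC_IP at 16, BYTES at 32, DST_PORT at 40, the meta-information of HTTP_URL at 48 and that of HTTP_USER_AGENT at 52, giving a 56-byte fixed-length part.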
The following picture shows the layout of a record containing information about
an HTTP connection. The template of this record contains the following fields:
ipaddr SRC_IP, ipaddr DST_IP, uint16 SRC_PORT, uint16 DST_PORT,
uint8 PROTOCOL, uint8 TCP_FLAGS, uint32 PACKETS, uint32 BYTES,
uint16 HTTP_RSP_CODE, string HTTP_URL, string HTTP_USER_AGENT

```
    +-------+-------+-------+-------+
 0  |                               |
 4  |            DST_IP             |
 8  |                               |
12  |                               |
    +-------+-------+-------+-------+
16  |                               |
20  |            SRC_IP             |
24  |                               |
28  |                               |
    +-------+-------+-------+-------+
32  |             BYTES             |
    +-------+-------+-------+-------+
36  |            PACKETS            |
    +-------+-------+-------+-------+
40  |   DST_PORT    | HTTP_RSP_CODE |
    +-------+-------+-------+-------+
44  |   SRC_PORT    | PROTO | TCP_F |
    +-------+-------+-------+-------+
48  | HTTP_URL(off) | HTTP_URL(len) |
    +-------+-------+-------+-------+
52  | HTTP_USER(off)| HTTP_USER(len)|  fixed-length part
    +-------+-------+-------+-------+ -------------------
56  |        HTTP_URL (data)        |  variable-length part
60  |                       |       |
    +-------+-------+-------+       +
64  |    HTTP_USER_AGENT (data)     |
    |              ...              |
```

All values, except IP and MAC addresses, are stored in little endian. IP and MAC
addresses are treated as sequences of bytes rather than numbers, so they are
left in network byte order, i.e. big endian (however, they are encapsulated in a
special data type and shouldn't be accessed directly, so knowing their internal
format should not be needed).

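
Combining the meta-information and endianness rules, a variable-length field can be located as in the following sketch. It assumes a raw record buffer and the offset of the field's meta-information pair, which would be derived from the template. `demo_read_le16` and `demo_get_var` are hypothetical helpers, not UniRec functions. The 16bit numbers are decoded byte by byte, so the code behaves the same on little- and big-endian hosts.

```c
#include <stdint.h>
#include <string.h>

/* Reads a 16-bit little-endian number independently of host byte order. */
static uint16_t demo_read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Given the offset of a field's meta-information pair (data offset, then
 * data length), returns a pointer to the field's data and its length. */
static const unsigned char *demo_get_var(const unsigned char *rec,
                                         size_t meta_offset, uint16_t *len)
{
    uint16_t data_offset = demo_read_le16(rec + meta_offset);
    *len = demo_read_le16(rec + meta_offset + 2);
    return rec + data_offset;   /* offset is counted from the record start */
}
```

The returned data is not null-terminated, which matches the rule that "string" values carry no '\0'.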
### Maximal record length

The maximal length of a record is limited to 65534 (2^16 - 2) bytes.

### Template definition

Templates are usually defined by a string enumerating all the fields in the
template, using comma (',') as a separator of field names. The order of field
names in such a string is not important (since the physical order of fields is
given by the layout rules described above).

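
Because only the set of names matters, two template strings can be compared by splitting them at commas and sorting the names. The following is a hypothetical sketch: `demo_same_template` is not a UniRec function, and it assumes at most 64 fields and strings shorter than 256 characters.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_FIELDS 64

static int demo_cmp_str(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/* Splits a comma-separated list in place and sorts the resulting names. */
static size_t demo_split_sorted(char *buf, char **names)
{
    size_t n = 0;
    for (char *tok = strtok(buf, ","); tok != NULL && n < DEMO_MAX_FIELDS;
         tok = strtok(NULL, ","))
        names[n++] = tok;
    qsort(names, n, sizeof *names, demo_cmp_str);
    return n;
}

/* Two field lists describe the same template if they hold the same names. */
static bool demo_same_template(const char *a, const char *b)
{
    char bufa[256], bufb[256];
    char *na[DEMO_MAX_FIELDS], *nb[DEMO_MAX_FIELDS];

    snprintf(bufa, sizeof bufa, "%s", a);   /* strtok needs writable copies */
    snprintf(bufb, sizeof bufb, "%s", b);

    size_t ca = demo_split_sorted(bufa, na);
    size_t cb = demo_split_sorted(bufb, nb);
    if (ca != cb)
        return false;
    for (size_t i = 0; i < ca; i++) {
        if (strcmp(na[i], nb[i]) != 0)
            return false;
    }
    return true;
}
```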
Types, enums and structures defined in unirec.h.

#### ur_field_type
An enum value for each of the UniRec types.

#### ur_field_id_t
Unsigned integer type for holding field IDs.
IMPLEMENTATION NOTE: ur_field_id_t = uint16_t

#### ur_template_t
A structure defining a UniRec template. It contains information about which
fields are present in a record with that template and how to access them.
For the user this is a black box; there is no need to access the structure's
members directly.

Iterator type used by the ur_iter_fields function.

#### UR_ITER_BEGIN, UR_ITER_END
Constants used for iteration over fields in a template, see the ur_iter_fields
function for details.

#### UR_MAX_SIZE = 65535
Maximal size of a UniRec record.

Public functions and macros
---------------------------

### Definition of statically-known UniRec fields.

UR_FIELDS(type name [, type name [, ...] ])

This macro defines the fields used in the program and their types at
compile-time. Fields defined this way can be accessed in UniRec records more
easily than fields defined at run-time.

This macro should be used at the beginning of each translation unit (i.e. a *.c
file) if the fields used (or at least some of them) are known at compile-time.

- "name" may be any string matching the following regular expression:
  `[A-Za-z][A-Za-z0-9_]*`
  with the following exceptions:
  - It must not be the same as a keyword in C/C++ or another identifier used in the source code.
    To avoid collisions with other identifiers in the UniRec library, do not use
    identifiers beginning with "UR\_" or "ur\_".
  - It must not end with _T (as this is reserved for constants specifying types).
  - It is RECOMMENDED that all field names are upper case.
- "type" is one of the types specified in "format specification - data types".

There MAY be a comma after the last field name. Also, there MAY be a semicolon
after the closing parenthesis at the end of the macro.

This macro generates code that allows the defined fields to be used in ur_get,
ur_set and other macros which need symbolic constants to access the fields.

For each field specified by this macro, a CPP macro with the `F_` prefix in its
name is defined, whose value is a unique numeric ID. Also, a constant F_name_T is
defined with a value of the field's type (as defined in the ur_field_type enum).
Other internal variables and structures are defined as well.

If more than one translation unit accesses UniRec fields, the same set of fields
MUST be defined using UR_FIELDS in each of them.

### Cleanup of all internal structures.

This function (ur_finalize) has to be called after all UniRec function and macro
invocations if some fields were defined at run-time. Otherwise it has no effect,
because nothing has been allocated. The function is typically called during a
cleanup phase before the program ends.

No UniRec function or macro can be called after a call to ur_finalize.

### Run-time definition of a field

int ur_define_field(const char *name, ur_field_type type)

This function defines a field at run-time.

"name" - name of the new field, see the description of UR_FIELDS for rules on
field names.
"type" - type of the new field.

If a field with the same name already exists in the internal table of defined
fields and "type" is the same as the one in the table, the function just returns
the ID of the field. If the types do not match, the UR_E_TYPE_MISMATCH error code
is returned.

If no field with "name" is present in the table of fields, a new entry is
created with a new unique ID and the given name and type of the field.
The new ID is returned.

- ID of a field with the given name if no error occurs.
- UR_E_TYPE_MISMATCH if a field with the given name is already defined with a
  different type.
- UR_E_INVALID_NAME if the name is not a valid field name.
- UR_E_INVALID_TYPE if the type is not one of the values of enum ur_field_type.
- UR_E_MEMORY if a memory allocation error occurred.

All error codes returned by this function are negative integers; an ID is always
non-negative.

If this function is used in a program, the function ur_finalize() has to be
called after all UniRec function and macro invocations.

NOTE: It is not necessary to define fields which were defined by UR_FIELDS.
It is recommended to define all fields statically by UR_FIELDS if possible.
This function is present only for cases when field names and/or types are not
known until run-time.

NOTE: Fields defined by this function can be accessed using their numeric IDs
only. Symbolic CPP macros are, of course, not defined for them.

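
The lookup rules of ur_define_field described above can be modelled with a small table. This is a hypothetical sketch: `demo_define_field`, the fixed-size table and the DEMO_E_* values are illustrative only. They do not reflect the library's actual data structures or its UR_E_* values.

```c
#include <string.h>

/* Hypothetical error values for this sketch (negative, like UR_E_*). */
enum { DEMO_E_TYPE_MISMATCH = -1, DEMO_E_MEMORY = -2 };
#define DEMO_MAX 16

static const char *demo_names[DEMO_MAX];   /* caller keeps strings alive */
static int demo_types[DEMO_MAX];
static int demo_count = 0;

/* Returns the existing ID if name and type match, an error if only the name
 * matches, or a newly assigned (non-negative) ID otherwise. */
static int demo_define_field(const char *name, int type)
{
    for (int id = 0; id < demo_count; id++) {
        if (strcmp(demo_names[id], name) == 0)
            return demo_types[id] == type ? id : DEMO_E_TYPE_MISMATCH;
    }
    if (demo_count == DEMO_MAX)
        return DEMO_E_MEMORY;   /* table full */
    demo_names[demo_count] = name;
    demo_types[demo_count] = type;
    return demo_count++;
}
```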
### Run-time definition of a set of fields

int ur_define_set_of_fields(const char *ifc_data_fmt);

This function defines a set of fields at run-time.

It adds the new fields into the existing internal structures. If a field already
exists with an equal type, nothing happens. If the type differs, an error is
returned.

"ifc_data_fmt" - String containing types and names of fields delimited by commas.
Example ifc_data_fmt: "uint32 FOO,uint8 BAR,float FOO2"

- UR_E_MEMORY if there is an allocation problem.
- UR_E_INVALID_NAME if the name value is empty.
- UR_E_INVALID_TYPE if the type does not exist.
- UR_E_TYPE_MISMATCH if the name already exists, but the type is different.

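
The format of ifc_data_fmt can be parsed as in the following sketch. `demo_parse_fmt` and `demo_field_def_t` are hypothetical. The sketch only splits the string into type/name pairs and does not validate types or names the way ur_define_set_of_fields does.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for one "type name" pair. */
typedef struct {
    char type[16];
    char name[64];
} demo_field_def_t;

/* Parses e.g. "uint32 FOO,uint8 BAR,float FOO2" into defs[].
 * Returns the number of parsed pairs, or -1 if an item is malformed. */
static int demo_parse_fmt(const char *fmt, demo_field_def_t *defs, int max)
{
    int n = 0;
    const char *p = fmt;

    while (*p != '\0' && n < max) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        char item[96];

        if (len >= sizeof item)
            return -1;   /* item too long for this sketch */
        memcpy(item, p, len);
        item[len] = '\0';

        /* each item is "<type> <name>" separated by whitespace */
        if (sscanf(item, "%15s %63s", defs[n].type, defs[n].name) != 2)
            return -1;   /* missing type or name */
        n++;

        if (end == NULL)
            break;
        p = end + 1;
    }
    return n;
}
```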
int ur_undefine_field(const char *name)
int ur_undefine_field_by_id(ur_field_id_t id)

Reverts a previous definition of a field by ur_define_field.

Frees the ID of the given field for future re-use. The ID becomes invalid after
a call to this function, so the field with the given name can no longer be
accessed. Note that the same ID may be assigned to another field later.

This function is not necessary in most cases. Its only purpose is to allow
re-use of field IDs, since their total count is limited to 2^16-1.

After this function is used, all the templates using the undefined field have to
be freed and created again.

### Create UniRec template

ur_template_t *ur_create_template(const char* fields, char **errstr)

Creates a structure describing a UniRec template with the given set of fields
and returns a pointer to it.

The template should be freed by ur_free_template when it is no longer needed.

- "fields" - A string containing names of fields separated by commas, e.g.
  "FIELD1,FIELD2,FIELD3".
- "errstr" - (output) In case of an error, a pointer to the error message is
  returned using this parameter, if it is not set to NULL.

The order of field names is not important, i.e. any two strings with the same set
of field names but in a different order are equivalent.

All fields MUST be previously defined, either statically by UR_FIELDS or by
calls to ur_define_field.

If an error occurs and "errstr" is not NULL, it is set to a string with the
corresponding error message.

- Pointer to the newly created template or NULL if an error has occurred.

### Create UniRec template for usage with Libtrap

ur_template_t *ur_create_input_template(int ifc, const char* fields, char **errstr)

Creates a UniRec template and sets it on the specified input interface (ifc).
This template is set as the minimum set of fields required to receive messages.
If the input interface receives a superset of the fields, the template is expanded.

ur_template_t *ur_create_output_template(int ifc, const char* fields, char **errstr)

Creates a UniRec template and sets it on the specified output interface (ifc).
The set of fields of this template is set on the output interface.

ur_template_t *ur_ctx_create_bidirectional_template(trap_ctx_t *ctx, int ifc_in, int ifc_out, const char* fields, char **errstr)

Creates a UniRec template and sets it on the specified input (ifc_in) and output
(ifc_out) interfaces. This template is set as the minimum set of fields required
to receive messages. If the input interface receives a superset of the fields,
the template is expanded and the new set of fields is set on the output interface.

### Free UniRec template

void ur_free_template(ur_template_t *tmplt)

Frees memory allocated for a template.

### Retrieve value from UniRec record.
The following functions are used to retrieve a field value from a UniRec record.

- "tmplt" - Pointer to UniRec template
- "rec" - Pointer to UniRec record, which is created using the given template.
- "field" - Identifier of a field.

ur_get(tmplt, rec, field)

This function returns the value of a specific field with the appropriate type
(int, uint, ...). Because of this, the field must be a symbolic constant (i.e.
"F_name"), not a numerical ID. It can be used only for fixed-size fields (not for
string and bytes).

ur_get_ptr(tmplt, rec, field)

This function returns a pointer to a value of the appropriate type. Because of
this, the field must be a symbolic constant (i.e. "F_name"), not a numerical ID.
It can be used for both fixed-length and variable-length fields.

ur_get_ptr_by_id(tmplt, rec, field)

This function returns a void pointer to a value. The field can be a symbolic
constant or a numerical ID. It can be used for both fixed-length and
variable-length fields. (This function is used for fields defined at run-time.)

char* ur_get_var_as_str(tmplt, rec, field);

This function copies data of a variable-length field from a UniRec record and
appends a '\0' character. The function allocates new memory for the string, which
must be freed using free()!
The field can be a symbolic constant or a numerical ID.

### Set value to UniRec record.
The following functions are used to set a value of a specified field in a record.

- "tmplt" - Pointer to UniRec template
- "rec" - Pointer to UniRec record, which is created using the given template.
- "field" - Identifier of a field.
- "value" - Value which is copied to the record.

ur_set(tmplt, rec, field, value) // field must be a symbolic constant ...

This function expects a value of the appropriate type for the specific field
(int, uint, ...). Because of this, the field must be a symbolic constant (i.e.
"F_name"), not a numerical ID. It can be used only for fixed-size fields (not for
string and bytes).

To set a dynamically defined field, use ur_get_ptr_by_id() and write to that pointer.

ur_set_var(tmplt, rec, field, val_ptr, val_len)

This function is used to set variable-length fields. The field can be a symbolic
constant or a numerical ID.
For better performance, call ur_clear_varlen before setting all variable-length
fields in a record.

- "val_ptr" - Pointer to the value.
- "val_len" - Length of the value (number of bytes which will be copied).

ur_clear_varlen(tmplt, rec);

This function clears all variable-length fields. It can be used to improve the
performance of setting the content of variable-length fields. Use this function
before setting all the variable-length fields.

ur_set_string(tmplt, rec, field, str)

Sets a string in the UniRec record. The value is a C-style string; its length is
determined automatically by strlen() ('\0' is not included in the record).

- "str" - Pointer to a string.

### Size of a fixed-length, static field

ur_get_size(field) // field must be a symbolic constant ...; for static fields only

Returns the size of a field. The field has to be statically defined.
For variable-length fields it returns -1. To get the size of a variable-length
field, use the function ur_get_var_len().

### Size of variable-length field

ur_get_var_len(tmplt, rec, field)

Returns the length of a variable-length field. The field can be a symbolic
constant or a numerical ID.

### Size of a fixed-length part of a record

ur_rec_fixlen_size(tmplt)

Returns the size of the fixed-length part of a record.

### Size of a variable-length part of a record

ur_rec_varlen_size(tmplt, rec)

Returns the size of the variable-length part of a record.

### Size of a whole record

ur_rec_size(tmplt, rec)

Returns the total size of the whole UniRec record.

### Check template's fields

ur_is_present(tmplt, field)

Returns non-zero if the field is present, zero otherwise.

### Check type of a field (variable-length or fixed-length)

Returns non-zero if the field is dynamic, zero otherwise.

### ID of a field (dynamic or static)

ur_field_id_t ur_get_id_by_name(const char *name);

Returns the ID of a field given its name, or UR_E_INVALID_NAME if the name is
not known.

### Create UniRec record

void* ur_create_record(const ur_template_t *tmplt, uint16_t max_var_size);

Allocates memory for a record with the given template. It allocates N+M bytes,
where N is the size of the fixed-length part of the record (inferred from the
template) and M is the size of the variable-length part, which must be provided
by the caller.

- "tmplt" - Pointer to UniRec template.
- "max_var_size" - Size of the variable-length part, i.e. the sum of lengths of
  all variable-length fields. If it is not known at the time of record creation,
  use UR_MAX_SIZE, which allocates enough memory to hold the largest possible
  UniRec record (65535 bytes). Set to 0 if there are no variable-length fields in
  the template.

### Free UniRec record

void ur_free_record(void *record);

Frees memory allocated for a UniRec record. You can also call the system free()
on the record.

ur_clone_record(tmplt, src)

Creates a new UniRec record and fills it with the data given by the parameter.
It returns a pointer to the new UniRec record.

- "tmplt" - Pointer to UniRec template
- "src" - Pointer to source record

void ur_copy_fields(dst_tmplt, dst, src_tmplt, src);

Copies all fields present in both templates from src to dst.

The function compares src_tmplt and dst_tmplt, and for each field present in both
templates it sets the value of the field in dst to the corresponding value in src.

- "dst_tmplt" - Pointer to destination UniRec template.
- "dst" - Pointer to destination record. It must point to memory of sufficient size.
- "src_tmplt" - Pointer to source UniRec template.
- "src" - Pointer to source record.

### Iterate over fields of a template

ur_iter_fields(tmplt, id);

This function can be used to iterate over all fields of a given template.
It returns the ID of the next field present in the template after the given ID.
If ID is set to UR_ITER_BEGIN, it returns the first field. If no more fields are
present, UR_ITER_END is returned. The order of fields is given by the order in
which they are defined.

- "tmplt" - Pointer to a template to iterate over.
- "id" - Field ID returned in the last iteration or UR_ITER_BEGIN to get the first value.

Returns the ID of the next field or UR_ITER_END if no more fields are present.

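
The iteration semantics (return the next present ID after the given one, or an end marker) can be modelled as follows. This is a hypothetical sketch: a template is reduced to a boolean array indexed by field ID, and DEMO_ITER_BEGIN/DEMO_ITER_END stand in for UR_ITER_BEGIN/UR_ITER_END, whose actual values are not assumed here.

```c
#include <stdbool.h>

#define DEMO_ITER_BEGIN (-1)   /* hypothetical sentinels for this sketch only */
#define DEMO_ITER_END   (-2)
#define DEMO_NUM_IDS    8

/* present[id] is true if the field with that ID is in the template.
 * Returns the next present ID after "id", or DEMO_ITER_END. */
static int demo_iter_fields(const bool present[DEMO_NUM_IDS], int id)
{
    if (id < DEMO_ITER_BEGIN)
        return DEMO_ITER_END;   /* iteration already finished */
    for (int next = id + 1; next < DEMO_NUM_IDS; next++) {
        if (present[next])
            return next;
    }
    return DEMO_ITER_END;
}
```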
```c
ur_field_id_t id = UR_ITER_BEGIN;
while ((id = ur_iter_fields(tmplt, id)) != UR_ITER_END) {
    // process the field with the given id, e.g. via ur_get_ptr_by_id(tmplt, rec, id)
}
```

ur_iter_fields_record_order(tmplt, id);

This function can be used to iterate over all fields of a given template.
It returns the ID of the n-th field in the record, where n is given by the index.
If the return value is UR_ITER_END, the index is higher than the number of fields
in the template.

The order of fields is given by the order in the record.

- "tmplt" - Template to iterate over.
- "id" - Field ID returned in the last iteration or UR_ITER_BEGIN to get the
  first value.

Returns the ID of the next field or UR_ITER_END if no more fields are present.

```c
int i = 0;
ur_field_id_t id;
while ((id = ur_iter_fields_record_order(tmplt, i++)) != UR_ITER_END) {
    // process the field with the given id
}
```