Conventions

This section describes the coding conventions adopted by the Code Craftsmen project. Please follow these standards when contributing to our Code Crafting tools.

Python Code

Style

A formal style guide for Code Craftsmen Python code has not been defined yet, but the Google Python Style Guide is worth a look as a starting point. Until a style guide is defined, please look at existing code and try to match the style as closely as possible.

Embedded Documentation

The API documentation for our Python packages is extracted from docstrings embedded in the code using Sphinx. The docstrings are formatted according to the Google docstring style convention with PEP 484 type annotations. The autodoc, napoleon, and sphinx_autodoc_typehints Sphinx extensions are used to extract and format the API documentation. See this Google docstring example to get a feel for this docstring format.
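As a rough illustration of the docstring convention described above, the following sketch shows a Google-style docstring on a function with PEP 484 type annotations. The function name `scale_samples` and its parameters are hypothetical and exist only for this example; they are not part of any Code Craftsmen package.

```python
from typing import List


def scale_samples(samples: List[float], gain: float = 1.0) -> List[float]:
    """Multiply each sample by a constant gain.

    The types come from the annotations in the signature, so they are
    not repeated in the ``Args`` section.

    Args:
        samples: Input sample values.
        gain: Multiplier applied to every sample.

    Returns:
        A new list containing the scaled samples.

    Raises:
        ValueError: If ``gain`` is negative.
    """
    if gain < 0:
        raise ValueError("gain must be non-negative")
    return [sample * gain for sample in samples]
```

Because the types live in the signature, the `Args` section only needs to describe what each parameter means.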

C++ Code

Style

A formal style guide for Code Craftsmen C++ code has not been defined yet, but the Google C++ Style Guide is worth a look as a starting point. Until a style guide is defined, please look at existing code and try to match the style as closely as possible.

Embedded Documentation

The API documentation for our C++ packages is extracted from special comment blocks embedded in the source code using Doxygen and Sphinx.

C Code

Style

In general, C code should follow the Linux kernel coding style.

Embedded Documentation

The API documentation for our C packages is extracted from special comment blocks embedded in the source code using the kernel-doc Sphinx extension.

Text Files

Configuration, input, and source information should be specified in a human-readable text format, if it is practical to do so. A Wumps-based syntax should be employed for these purposes unless there is a very good reason to use something else. This commonality reduces the number of formats that software developers need to deal with and allows parsing code to be shared. More rationale for these recommendations can be found in the file format considerations section.

Binary Data Files

In some cases, such as those where data processing or storage efficiency is a significant concern, it may be more appropriate to store data in binary files instead of text files. In these cases, it is also desirable to use some sort of standardized format, if possible. The best format to use may depend on the particular application.

As discussed in the data recording considerations section, there is often a need to record or play back the high-density run-time message stream traffic generated by an application. Since messaging is such an integral part of our code crafting paradigm, we have developed our own standardized file format for storing binary message stream data.

Binary Message Stream Recording Format

A binary message stream file is simply a sequence of file entries.

File Content
Entry 1
Entry 2
[…]
Entry N

Each entry in the file consists of three sequential fields: an entry code identifying the type of entry, the entry size (in bytes), and the entry content.

Entry
Entry Code
Entry Size
Entry Content

The sizes of these three fields are not fixed; they may vary from entry to entry. This keeps the storage “overhead” of the file format small for short entries without restricting the size of large ones.

Standardizing the format of file entries allows data processing applications to skip over entries that are not of interest without having to understand the details of those specific entries.
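The skipping behavior described above can be sketched as a small generator that walks a file and only reads the content of entries the caller asks for. It relies on the field rules given in the Entry Code and Entry Size sections of this document. Two points are assumptions rather than part of the specification: the entry size field is decoded as a little-endian unsigned integer (the byte order is not stated here), and the name `iter_entries` and its `wanted` parameter are hypothetical.

```python
import io
from typing import BinaryIO, FrozenSet, Iterator, Tuple


def iter_entries(
    stream: BinaryIO, wanted: FrozenSet[int]
) -> Iterator[Tuple[int, bytes]]:
    """Yield (entry_code, content) for wanted entries, skipping the rest.

    ``wanted`` holds entry codes with the size-selector bits (6 and 5)
    cleared, e.g. 0x10 for Message Content.
    """
    while True:
        first = stream.read(1)
        if not first:
            return  # clean end of file between entries
        # Entry code: keep reading while the most significant bit is 1.
        code_bytes = bytearray(first)
        while code_bytes[-1] & 0x80:
            if len(code_bytes) == 4:
                raise ValueError("entry code longer than 4 bytes")
            nxt = stream.read(1)
            if not nxt:
                raise EOFError("file ended inside an entry code")
            code_bytes += nxt
        code = int.from_bytes(code_bytes, "little")
        # Entry size: bits 6 and 5 of byte 0 select 1, 2, 4, or 8 bytes.
        size_len = 1 << ((code >> 5) & 0b11)
        raw_size = stream.read(size_len)
        if len(raw_size) != size_len:
            raise EOFError("file ended inside an entry size")
        # ASSUMPTION: little-endian size field (not stated in the spec).
        size = int.from_bytes(raw_size, "little")
        kind = code & ~0x60  # drop the size-selector bits
        if kind in wanted:
            content = stream.read(size)
            if len(content) != size:
                raise EOFError("file ended inside entry content")
            yield code, content
        else:
            # Skip the content without interpreting it.
            stream.seek(size, io.SEEK_CUR)
```

The reader never needs to know what an unwanted entry contains; the entry size alone is enough to step over it. This sketch requires a seekable stream for the skip step.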

Entry Code

The first field of a file entry is the entry code, which is used to indicate what type of data the following entry content field contains. The minimum size of the entry code field is one byte, and its size is determined by examining the content of this field as it is read in. If the most significant bit (bit 7) of the first byte of the entry code is 0, then the entry code field is only one byte long.

Single-Byte Entry Code
MSb	0XXXXXXX	LSb

If the most significant bit of the first byte is 1, then the entry code field is more than one byte long, and the next byte must be examined. The same process is applied to each consecutive byte until a byte with 0 in its most significant bit is found. In theory, an entry code could be up to 4 bytes long, but in practice one or two bytes should be sufficient.

Two-Byte Entry Code
MSb	1XXXXXXX	LSb
MSb	0XXXXXXX	LSb

Three-Byte Entry Code
MSb	1XXXXXXX	LSb
MSb	1XXXXXXX	LSb
MSb	0XXXXXXX	LSb

Four-Byte Entry Code
MSb	1XXXXXXX	LSb
MSb	1XXXXXXX	LSb
MSb	1XXXXXXX	LSb
MSb	0XXXXXXX	LSb

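Once the number of bytes in the entry code has been determined, the individual bytes are combined in little-endian byte order to construct the final entry code field value: the first byte read becomes Byte 0 (the least significant byte), and each subsequent byte is more significant than the one before it.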

Resulting Entry Code Field Value
Byte 3 (MSB)	Byte 2	Byte 1	Byte 0 (LSB)	32-Bit Result (Hex)
N/A	N/A	N/A	0XXXXXXX	000000XX
N/A	N/A	0XXXXXXX	1XXXXXXX	0000XXXX
N/A	0XXXXXXX	1XXXXXXX	1XXXXXXX	00XXXXXX
0XXXXXXX	1XXXXXXX	1XXXXXXX	1XXXXXXX	XXXXXXXX
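The decoding rule shown in the table above can be sketched as follows. The function name `read_entry_code` is hypothetical. As in the table, the continuation bits are kept as part of the resulting value; they are not stripped out.

```python
import io
from typing import BinaryIO, Tuple

MAX_ENTRY_CODE_BYTES = 4  # upper limit stated in the text above


def read_entry_code(stream: BinaryIO) -> Tuple[int, int]:
    """Read one entry code and return (value, number_of_bytes)."""
    value = 0
    for index in range(MAX_ENTRY_CODE_BYTES):
        raw = stream.read(1)
        if not raw:
            raise EOFError("stream ended inside an entry code")
        byte = raw[0]
        # Little-endian: the first byte read is the least significant.
        value |= byte << (8 * index)
        if (byte & 0x80) == 0:
            # Most significant bit is 0, so this is the last byte.
            return value, index + 1
    raise ValueError("entry code longer than 4 bytes")
```

For example, the two bytes `0x90 0x01` read in that order give the value `0x0190`, while a single byte `0x10` gives `0x10`.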

Entry Size

The second field of a file entry is the entry size, which indicates the number of bytes present in the following entry content field. The entry size field is one, two, four, or eight bytes long, depending on the values of bits 6 and 5 of the first byte (Byte 0) of the preceding entry code field.

	Entry Code Byte 0 (LSB)		Length of Entry Size Field
MSb	X00XXXXX	LSb	1 Byte
MSb	X01XXXXX	LSb	2 Bytes
MSb	X10XXXXX	LSb	4 Bytes
MSb	X11XXXXX	LSb	8 Bytes
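The mapping in the table above amounts to a shift and a mask, sketched below. The function names are hypothetical. Note one assumption: this document does not state the byte order of the entry size field itself, so the sketch decodes it as a little-endian unsigned integer to match the entry code; check this against real data before relying on it.

```python
import io
from typing import BinaryIO


def size_field_length(entry_code: int) -> int:
    """Return the length in bytes (1, 2, 4, or 8) of the entry size field."""
    selector = (entry_code >> 5) & 0b11  # bits 6 and 5 of byte 0
    return 1 << selector


def read_entry_size(stream: BinaryIO, entry_code: int) -> int:
    """Read the entry size field that follows ``entry_code``."""
    length = size_field_length(entry_code)
    raw = stream.read(length)
    if len(raw) != length:
        raise EOFError("stream ended inside an entry size field")
    # ASSUMPTION: little-endian, unsigned (byte order is not specified).
    return int.from_bytes(raw, "little")
```

Since the selector bits sit in Byte 0, the same shift and mask work on a multi-byte entry code value without any extra handling.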

Entry Content

The third (and final) field of a file entry is the entry content, which is the actual data associated with the file entry. The number of bytes present in the entry content field is equal to the value of the preceding entry size field, and the type of data contained in the content field is indicated by the value of the preceding entry code field. The following table lists the types of file entries that are currently defined.

	Entry Code		Entry Content
MSb	0XX00000	LSb	File Format ID
MSb	0XX00001	LSb	Source Application ID
MSb	0XX10000	LSb	Message Content
MSb	0XX10001	LSb	Message Content with 1-Byte Stream ID
MSb	0XX10010	LSb	Message Content with 2-Byte Stream ID
MSb	0XX10011	LSb	Message Content with 4-Byte Stream ID

All other entry codes are reserved for future use.
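One way to map an entry code onto the entry types listed above is to clear the two size-selector bits (marked X in the table) and look up what remains. The `EntryKind` enumeration and `classify_entry` function below are hypothetical names used only for this sketch.

```python
from enum import Enum
from typing import Optional


class EntryKind(Enum):
    """Entry types from the table, with bits 6 and 5 cleared."""

    FILE_FORMAT_ID = 0x00
    SOURCE_APPLICATION_ID = 0x01
    MESSAGE = 0x10
    MESSAGE_STREAM_ID_1 = 0x11
    MESSAGE_STREAM_ID_2 = 0x12
    MESSAGE_STREAM_ID_4 = 0x13


def classify_entry(entry_code: int) -> Optional[EntryKind]:
    """Return the entry kind, or None for a reserved entry code."""
    if entry_code > 0x7F:
        # All currently defined codes fit in one byte (bit 7 is 0).
        return None
    try:
        return EntryKind(entry_code & 0x1F)  # keep bits 4..0 only
    except ValueError:
        return None
```

Returning `None` for reserved codes lets a reader skip such entries by size instead of failing on them.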

File Format ID

The content field for a File Format ID entry contains a null-terminated string that specifies the file format used in the data file being read. Files that conform to the Binary Message Stream Recording Format described in this document shall specify a File Format ID string that begins with the characters CCBMSRF. Other characters may follow these leading characters. In the future, a suffix (e.g. r1.2) may be added to indicate a specific revision of the file format.

The first entry in a CCBMSRF data file must always be a File Format ID entry. This allows data processing applications to quickly determine if a specified data file is encoded in a supported format.
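The format check described above might look like the following sketch, which builds a minimal File Format ID entry and tests whether a buffer starts with one. Several details are assumptions not fixed by this document: the string is treated as ASCII, the entry size field is decoded as little-endian, and the function names are hypothetical.

```python
def build_file_format_entry(format_id: str = "CCBMSRF") -> bytes:
    """Encode a File Format ID entry with a one-byte entry size field."""
    # ASSUMPTION: ASCII text, followed by the required null terminator.
    content = format_id.encode("ascii") + b"\x00"
    if len(content) > 0xFF:
        raise ValueError("content too long for a one-byte size field")
    # Entry code 0x00: File Format ID with a one-byte entry size.
    return bytes([0x00, len(content)]) + content


def has_ccbmsrf_header(data: bytes) -> bool:
    """Return True if ``data`` starts with a CCBMSRF File Format ID entry."""
    if not data:
        return False
    code = data[0]
    # Bit 7 and bits 4..0 must all be 0; bits 6 and 5 select the size length.
    if (code & 0x9F) != 0x00:
        return False
    size_len = 1 << ((code >> 5) & 0b11)
    raw_size = data[1 : 1 + size_len]
    if len(raw_size) != size_len:
        return False
    # ASSUMPTION: little-endian size field (byte order is not specified).
    size = int.from_bytes(raw_size, "little")
    content = data[1 + size_len : 1 + size_len + size]
    if len(content) != size:
        return False
    # Compare only the text before the null terminator.
    text = content.split(b"\x00", 1)[0]
    return text.startswith(b"CCBMSRF")
```

Only the leading characters are compared, so an ID string that carries a revision suffix would still be accepted by this check.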

Source Application ID

The content field for a Source Application ID entry contains a null-terminated string that specifies the application that was used to generate the data file being read. This string is typically used by data processing applications to look up the message stream identifiers and message formats associated with the data in the file.

A source application string is typically of the form /company/division/group/application[revision] in order to reduce name collisions.

Message Content

The content field for a Message Content entry contains an “opaque” block of raw data. This is usually the content of a single message, but could, in theory, be any type of data. Since there is no stream identifier specified for the message, this type of file entry is typically used in data files where only one message stream is recorded.

Message Content with Stream ID

The content field for a Message Content with Stream ID entry contains two subfields: a stream identifier field followed by a message content field.

Message Content with Stream ID Subfields
Stream ID
Message Content

The size of the stream identifier subfield is determined by bits 0 and 1 of the entry code field:

	Entry Code		Size of Stream ID Subfield
MSb	0XX10001	LSb	1 Byte
MSb	0XX10010	LSb	2 Bytes
MSb	0XX10011	LSb	4 Bytes

The message content subfield is an “opaque” raw data block. It is usually the content of a single message, but could, in theory, be any type of data. The preceding stream identifier subfield indicates which data stream the message is associated with. A data processing application typically uses the stream identifier in conjunction with the Source Application ID string to uniquely identify the source and format for the associated message data. The size of the message content subfield is the value of the entry size field minus the size of the stream identifier subfield.
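Splitting the content field into its two subfields can be sketched as below. The byte order of the stream identifier is not stated in this document, so the sketch assumes little-endian; the function names are hypothetical.

```python
from typing import Tuple

# Bits 1 and 0 of the entry code -> stream ID subfield size in bytes.
_STREAM_ID_SIZES = {0b01: 1, 0b10: 2, 0b11: 4}


def stream_id_length(entry_code: int) -> int:
    """Return the stream ID size for a Message Content with Stream ID entry."""
    # Ignore the size-selector bits (6 and 5), then check the entry type.
    if (entry_code & 0x9F) not in (0x11, 0x12, 0x13):
        raise ValueError("not a Message Content with Stream ID entry")
    return _STREAM_ID_SIZES[entry_code & 0b11]


def split_stream_entry(entry_code: int, content: bytes) -> Tuple[int, bytes]:
    """Split an entry content field into (stream_id, message_content)."""
    id_len = stream_id_length(entry_code)
    if len(content) < id_len:
        raise ValueError("entry content shorter than its stream ID subfield")
    # ASSUMPTION: little-endian stream ID (byte order is not specified).
    stream_id = int.from_bytes(content[:id_len], "little")
    # The message is whatever remains: entry size minus stream ID size.
    return stream_id, content[id_len:]
```

The message content is returned untouched, since it is opaque to the file format and only the consuming application knows how to interpret it.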