Difference between revisions of "File format:Sigrok/v3"

Revision as of 20:25, 10 September 2014

This page describes the proposed file/stream format (v3) for sigrok sessions.

NOTE: This is work in progress and has not yet been implemented!

Motivation

The previous sigrok session file format (version 2) is a ZIP file containing multiple files (some metadata files and data files containing the actual samples). This works fine, but it also has some issues:

In order to get to the data you want, you need to decompress the whole file.
Appending to a file is neither easy nor efficient.
...

Goals

The following list highlights some of the goals of the new file format (v3):

It must be able to store
- arbitrary data (logic samples, and/or analog samples, and/or protocol decoder data, and more), as well as
- arbitrary meta-/config-data and other extra information that may be useful to frontends (UI state data, user-configured probe colors, names, positions, and so on).
It must support and facilitate stream-oriented processing (save, load, transmission, compression/decompression, and so on).
It must support compression of the payload data.
It must be usable independent of hardware architecture (x86, ARM, PowerPC, MIPS, and so on), operating system, endianness, float representation, and so on. All data fields must be properly specified (endianness, signedness, size, format).
It must allow for sufficiently good performance for the common operations a frontend needs to perform on the data/file/stream (save, load, compress/uncompress, append, and so on) so that it doesn't become the bottleneck. This is especially important for stream-oriented devices which could otherwise lose samples if the processing on the host side is not sufficiently fast (Saleae Logic, Saleae Logic16, IKALOGIC ScanaPLUS, others).
It should be able to handle run-time changes in the data streams (via meta packets on the session bus), e.g. changing samplerates, changing probes, etc. etc.
It should have better compression properties than ZIP (e.g. using LZO or other algorithms, this is to be evaluated). What we ideally want out of the compression algorithm is:
- Good and relatively fast compression results at only moderate CPU usage.
- Very fast decompression (LZO is probably the best one here, as it's specifically designed for this).
- Ideally, support for appending further data to already compressed data chunks (though this could be also implemented outside of the compression algorithm per se).
- Open-source license and OS portability. There should be an open-source library or code chunk for compression/uncompression and it should be widely available in Linux distros, and portable to Windows, Mac OS X, FreeBSD, Android, and so on.

Specification

UUIDs

The format uses random UUIDs (version 4) as per RFC 4122 in various places. These UUIDs are always 16 bytes long.

A simple way to generate a random (version 4) UUID (ASCII and hex representation):

$ python3 -c 'import uuid; u = uuid.uuid4(); print(u); print(u.hex)'
14c49f22-f08a-4ef2-b3d7-82ee16c3d531
14c49f22f08a4ef2b3d782ee16c3d531
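
The 16-byte binary form can be obtained from the same uuid module. This is a minimal sketch, not a normative definition: it assumes the bytes are written in the same order as the hex string (Python's UUID.bytes), which is the order the example packets on this page use.

```python
import uuid

# Fixed example UUID taken from the output above.
u = uuid.UUID("14c49f22-f08a-4ef2-b3d7-82ee16c3d531")

# Assumption: the on-disk form equals UUID.bytes (same order as the hex string),
# not UUID.bytes_le, which swaps the first three fields.
raw = u.bytes
print(len(raw))
print(raw.hex(" "))

# Round trip: rebuild the UUID object from its 16-byte form.
assert uuid.UUID(bytes=raw) == u
```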

File/stream format

The format consists entirely of a stream of packets of various types.

These packets can be either written to or read from a file, buffer, pipe, socket, or any other source/destination.

Packet format

Every packet consists of three fields:

Field	Length	Description
UUID	16	The UUID (binary representation, 16 bytes, little-endian) which identifies the type of packet (globally unique). The reason for using a UUID here (instead of some simple index number) is to allow for clients to define and use their own special-purpose packet types as they see fit, without having to fear any conflicts with existing packet types (or packet types that someone else might add later).
Length	4	The length of the data in this packet (in number of bytes). The length does not include the length of the UUID field or the Length field itself, only the length of the Data field. The length is given as a uint32_t number (little-endian).
Data	0..n	The actual payload data, max. 2^32 - 1 bytes (just under 4 GiB), since the Length field is a uint32_t. For some packet types the Data field is optional (in that case it is completely omitted and the Length field is set to 0). The contents of the Data field are entirely dependent on (and vary with) the type of packet.

Using the common UUID/Length/Data triplet for each packet allows clients to easily skip over (ignore) any packets they do not know how to handle, and instead continue on to checking/handling the next packet.

Example packet with a 7-byte data field:

UUID	Length	Data
14 c4 9f 22 f0 8a 4e f2 b3 d7 82 ee 16 c3 d5 31	07 00 00 00	11 22 33 44 55 66 77

Example packet without a data field:

UUID	Length
14 c4 9f 22 f0 8a 4e f2 b3 d7 82 ee 16 c3 d5 31	00 00 00 00
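
The UUID/Length/Data framing and the "skip what you don't know" behaviour could be sketched as follows. This is only an illustration under stated assumptions: the helper names (encode_packet, read_exact, iter_packets) are hypothetical, the Length field is packed little-endian as the field description says, and the UUID bytes are written in hex-string order.

```python
import io
import struct
import uuid

HEADER_LEN = 16 + 4  # UUID field + Length field


def encode_packet(packet_type, data=b""):
    """Serialize one packet: UUID (16 bytes), Length (uint32, little-endian), Data."""
    # Assumption: UUID bytes are written in the order of the hex string.
    # struct.pack raises struct.error if the data does not fit into a uint32.
    return packet_type.bytes + struct.pack("<I", len(data)) + data


def read_exact(stream, n):
    """Read exactly n bytes; pipes and sockets may return short reads."""
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream ended inside a packet")
        buf += chunk
    return buf


def iter_packets(stream):
    """Yield (uuid, data) pairs until the stream ends at a packet boundary."""
    while True:
        first = stream.read(1)
        if not first:
            return  # clean end of stream
        header = first + read_exact(stream, HEADER_LEN - 1)
        packet_type = uuid.UUID(bytes=header[:16])
        (length,) = struct.unpack("<I", header[16:])
        yield packet_type, read_exact(stream, length)


# Usage: a reader that only understands one packet type ignores the rest.
KNOWN = uuid.UUID("14c49f22-f08a-4ef2-b3d7-82ee16c3d531")
UNKNOWN = uuid.uuid4()  # stands in for some third-party packet type
stream = io.BytesIO(
    encode_packet(KNOWN, bytes.fromhex("11223344556677"))
    + encode_packet(UNKNOWN, b"ignored")
    + encode_packet(KNOWN)  # packet without a Data field
)
handled = [data for ptype, data in iter_packets(stream) if ptype == KNOWN]
```

Because the reader never has to interpret the Data field of an unknown packet, a client can consume the length and move on without knowing anything else about that packet type.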

Packets

The following packets are currently defined for use in projects hosted on sigrok.org.

The "names" (e.g. "SIGROK_PACKET_MAGIC") are for documentation purposes only; the UUIDs are what actually matter. The names are prefixed with SIGROK_ to make it clear that 3rd-party software may define its own additional packet types with arbitrary contents and for arbitrary purposes.

SIGROK_PACKET_MAGIC

This is a special "magic" packet that serves as a file type marker for actual files (so that the file utility can properly detect sigrok files). It also contains some metadata about the file/stream format itself. This packet must be the very first one in the file/stream and must appear exactly once.

This packet uses the fixed UUID 5a1772eb-2854-48a8-a41c-7397d7e9223d.

The Data field has the following contents:

Field	Length	Description
Version	2	The version of the sigrok file/stream format in binary format (little-endian). Current version: 0x0003 (continuing the count from the last two ZIP-based file format versions).
Magic marker	6	This is a special marker that can be used by the file utility (and other tools) to detect the file format easily. Contents: sIgRoK.

Example packet:

UUID	Length	Data
5a 17 72 eb 28 54 48 a8 a4 1c 73 97 d7 e9 22 3d	08 00 00 00	03 00 73 49 67 52 6f 4b
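
A reader would typically check this packet before touching anything else in the stream. The following is a minimal sketch under the assumptions already used on this page (little-endian integer fields as per the field descriptions, UUID bytes in hex-string order); build_magic_packet and check_magic are hypothetical helper names.

```python
import struct
import uuid

SIGROK_PACKET_MAGIC = uuid.UUID("5a1772eb-2854-48a8-a41c-7397d7e9223d")
FORMAT_VERSION = 0x0003
MAGIC_MARKER = b"sIgRoK"


def build_magic_packet():
    """Return the complete magic packet (UUID + Length + Data)."""
    data = struct.pack("<H", FORMAT_VERSION) + MAGIC_MARKER  # 2 + 6 = 8 bytes
    return SIGROK_PACKET_MAGIC.bytes + struct.pack("<I", len(data)) + data


def check_magic(stream_start):
    """Return the format version if the bytes begin with a valid magic packet."""
    if len(stream_start) < 28:
        raise ValueError("too short to contain a magic packet")
    if stream_start[:16] != SIGROK_PACKET_MAGIC.bytes:
        raise ValueError("first packet is not SIGROK_PACKET_MAGIC")
    (length,) = struct.unpack("<I", stream_start[16:20])
    if length < 8:
        raise ValueError("magic packet data is too short")
    version, marker = struct.unpack("<H6s", stream_start[20:28])
    if marker != MAGIC_MARKER:
        raise ValueError("magic marker mismatch")
    return version


packet = build_magic_packet()
print(packet.hex(" "))
print(check_magic(packet))
```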

SIGROK_PACKET_LOGIC

This is a packet type used to store/transmit (only) digital samples, usually from a logic analyzer.

This packet uses the fixed UUID 2236202e-9ee7-4bc6-81f6-56b4e6e029ba.

The Data field has the following contents:

Field	Length	Description
Version	2	The version of the SIGROK_PACKET_LOGIC format in binary format (little-endian). Current version: 0x0001.
Reserved	2	Reserved field. Reads should ignore this field, writes should keep this field's value unchanged (if it was read before), otherwise set it to 0x0000. Current value: 0x0000.
Payload format UUID	16	A UUID (binary representation, 16 bytes, little-endian) which identifies a certain payload format.
Compression scheme UUID	16	A UUID (binary representation, 16 bytes, little-endian) which identifies a certain compression scheme that is applied to the payload data.
Payload length	4	The length of the actual payload data in this SIGROK_PACKET_LOGIC packet (in number of bytes). The length only includes the Payload field. The length is given as a uint32_t number (little-endian).
Payload	0..n	The actual payload data, i.e. logic analyzer samples in the specified payload format, using the specified compression scheme.

Example packet:

(Packet type SIGROK_PACKET_LOGIC, 0x30 bytes packet data, SIGROK_PACKET_LOGIC version 0x0001, SIGROK_PAYLOAD_FORMAT_LOGIC_V1 payload format, SIGROK_COMPRESSION_NONE compression scheme, 8 bytes of logic analyzer payload (uncompressed))

UUID	Length	Data
22 36 20 2e 9e e7 4b c6 81 f6 56 b4 e6 e0 29 ba	30 00 00 00	01 00 00 00 d2 96 4f 38 8b 13 45 70 9a dd ad d5 67 8a 03 94 ec 6b d7 63 c8 79 4a a7 a9 7a 7e df 0e 68 af c7 08 00 00 00 11 22 33 44 55 66 77 88
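
The layout of the Data field can be expressed as a single struct format string. This sketch covers only the Data field (not the outer UUID/Length header), assumes little-endian integers and hex-string UUID byte order, and uses hypothetical helper names (build_logic_data, parse_logic_data). It does not interpret the payload, since the payload formats are not specified yet.

```python
import struct
import uuid

SIGROK_PAYLOAD_FORMAT_LOGIC_V1 = uuid.UUID("d2964f38-8b13-4570-9add-add5678a0394")
SIGROK_COMPRESSION_NONE = uuid.UUID("ec6bd763-c879-4aa7-a97a-7edf0e68afc7")

# Version (2), Reserved (2), Payload format UUID (16),
# Compression scheme UUID (16), Payload length (4) = 40 bytes.
LOGIC_HEADER = struct.Struct("<HH16s16sI")


def build_logic_data(payload,
                     payload_format=SIGROK_PAYLOAD_FORMAT_LOGIC_V1,
                     compression=SIGROK_COMPRESSION_NONE,
                     version=0x0001, reserved=0x0000):
    """Return the Data field of a SIGROK_PACKET_LOGIC packet."""
    return LOGIC_HEADER.pack(version, reserved, payload_format.bytes,
                             compression.bytes, len(payload)) + payload


def parse_logic_data(data):
    """Split a SIGROK_PACKET_LOGIC Data field into its fields."""
    if len(data) < LOGIC_HEADER.size:
        raise ValueError("data field shorter than the fixed header")
    version, reserved, fmt, comp, plen = LOGIC_HEADER.unpack_from(data)
    payload = data[LOGIC_HEADER.size:LOGIC_HEADER.size + plen]
    if len(payload) != plen:
        raise ValueError("payload is truncated")
    return {
        "version": version,
        "reserved": reserved,  # readers ignore it, writers preserve it
        "payload_format": uuid.UUID(bytes=fmt),
        "compression": uuid.UUID(bytes=comp),
        "payload": payload,
    }


data = build_logic_data(bytes.fromhex("1122334455667788"))
fields = parse_logic_data(data)
print(hex(len(data)), fields["payload"].hex(" "))
```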

SIGROK_PACKET_ANALOG

This is a packet type used to store/transmit (only) analog samples, e.g. from a multimeter, oscilloscope, sound level meter, or any other source for analog data.

This packet uses the fixed UUID 59def330-536a-46b1-8edd-62f2195d1c95.

Details yet to be defined.

SIGROK_PACKET_FRONTEND

This is a packet type used to store/transmit configuration data of (supported) sigrok frontends. This can include an arbitrary number of things, such as user-configured channel names, channel colors, window sizes and other UI state, protocol decoder setups, and so on.

This packet uses the fixed UUID 1325b595-0d5e-40a4-ac4d-36e89224dcb9.

Details yet to be defined.

List of known packet types

This is a short overview of known packet types that are in use. This includes the packet types used in projects hosted at sigrok.org, as well as pointers to packet types that other (3rd-party) software is known to use.

UUID	Packet type	Description
5a1772eb-2854-48a8-a41c-7397d7e9223d	SIGROK_PACKET_MAGIC	See above.
2236202e-9ee7-4bc6-81f6-56b4e6e029ba	SIGROK_PACKET_LOGIC	See above.
59def330-536a-46b1-8edd-62f2195d1c95	SIGROK_PACKET_ANALOG	See above.
1325b595-0d5e-40a4-ac4d-36e89224dcb9	SIGROK_PACKET_FRONTEND	See above.

List of known payload formats

This is a short overview of known payload formats that are in use. This includes the payload formats used in projects hosted at sigrok.org, as well as pointers to payload formats that other (3rd-party) software is known to use.

UUID	Payload format	Description
d2964f38-8b13-4570-9add-add5678a0394	SIGROK_PAYLOAD_FORMAT_LOGIC_V1	This payload format can only store digital samples from a logic analyzer (0/1 values for a certain channel/probe/pin). It is basically identical to the format that was used in the previous ZIP-based file format versions. Details are yet to be defined.
79e7cfd1-0f56-4d5e-968a-b66fdbdff624	SIGROK_PAYLOAD_FORMAT_ANALOG_V1	A certain type of payload format that can store (only) analog samples of a certain number of analog channels. Details are yet to be defined.

List of known compression schemes

This is a short overview of known compression schemes that are in use. This includes the schemes used in projects hosted at sigrok.org, as well as pointers to schemes that other (3rd-party) software is known to use.

UUID	Compression scheme	Description
ec6bd763-c879-4aa7-a97a-7edf0e68afc7	SIGROK_COMPRESSION_NONE	No compression whatsoever is used.
acd2e249-5c4d-426d-96ae-ded5b6020e6f	SIGROK_COMPRESSION_RLE_V1	A certain type of RLE-based compression is used. Details are yet to be defined.
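
For tools that dump or inspect a stream, the three lists above map naturally onto lookup tables. A small sketch follows; the packet type UUIDs are the ones given in the per-packet sections, and the describe helper is a hypothetical name.

```python
import uuid

PACKET_TYPES = {
    uuid.UUID("5a1772eb-2854-48a8-a41c-7397d7e9223d"): "SIGROK_PACKET_MAGIC",
    uuid.UUID("2236202e-9ee7-4bc6-81f6-56b4e6e029ba"): "SIGROK_PACKET_LOGIC",
    uuid.UUID("59def330-536a-46b1-8edd-62f2195d1c95"): "SIGROK_PACKET_ANALOG",
    uuid.UUID("1325b595-0d5e-40a4-ac4d-36e89224dcb9"): "SIGROK_PACKET_FRONTEND",
}

PAYLOAD_FORMATS = {
    uuid.UUID("d2964f38-8b13-4570-9add-add5678a0394"): "SIGROK_PAYLOAD_FORMAT_LOGIC_V1",
    uuid.UUID("79e7cfd1-0f56-4d5e-968a-b66fdbdff624"): "SIGROK_PAYLOAD_FORMAT_ANALOG_V1",
}

COMPRESSION_SCHEMES = {
    uuid.UUID("ec6bd763-c879-4aa7-a97a-7edf0e68afc7"): "SIGROK_COMPRESSION_NONE",
    uuid.UUID("acd2e249-5c4d-426d-96ae-ded5b6020e6f"): "SIGROK_COMPRESSION_RLE_V1",
}


def describe(table, raw):
    """Map a 16-byte UUID field to its documented name, if it has one."""
    # Assumption: raw holds the UUID in hex-string byte order.
    u = uuid.UUID(bytes=raw)
    return table.get(u, f"unknown ({u})")


print(describe(PACKET_TYPES, bytes.fromhex("2236202e9ee74bc681f656b4e6e029ba")))
print(describe(COMPRESSION_SCHEMES, uuid.uuid4().bytes))
```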

Further notes and ideas to consider

Data should be encoded in a data-aware way. This would give greater compression:
- Logic data is most efficiently stored using RLE+Huffman or Golomb coding, e.g. a clock signal may compress to one bit per edge.
- FLAC (libflac) or a FLAC-inspired codec (linear prediction) is probably as good as it gets for lossless analog data encoding.
If data is stored in a format-specific way, it would be best to store it as a series of stream-blocks, similar to how video containers work. Would it be possible to simply leverage a video container such as OGG? IIRC this contains headers to declare metadata about each stream, then a series of timestamped stream blocks interleaved together. The time stamp is a format-specific number (for audio: the sample number, for video: the frame number), so sigrok formats can easily leverage this.
- Similarly RTP is a rather natural protocol for sigrok network streaming.
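
To illustrate why run-length coding suits logic data, here is a plain RLE sketch over one-byte samples. It is a generic example only: it is not SIGROK_COMPRESSION_RLE_V1 (whose details are still undefined), and it omits the Huffman/Golomb stage mentioned above. The function names are hypothetical.

```python
from itertools import groupby


def rle_encode(samples):
    """Collapse consecutive identical sample bytes into (value, count) runs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(samples)]


def rle_decode(runs):
    """Expand (value, count) runs back into the original sample bytes."""
    return b"".join(bytes([value]) * count for value, count in runs)


# A clock on channel 0, oversampled 4x: 24 samples, but only 6 runs (one per edge).
samples = bytes([0x00] * 4 + [0x01] * 4) * 3
runs = rle_encode(samples)
print(len(samples), len(runs))
assert rle_decode(runs) == samples
```

The number of runs depends on the number of edges rather than on the samplerate, which is the property the note about clock signals relies on.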

@@ Line 8: / Line 8: @@
 == Motivation ==
-The previous [[File format:sigrok/v2|sigrok session]] file format (version 2) was a ZIP file containing multiple files (some metadata files and actual sampling data files). This works fine, but it also has some issues:
+The previous [[File format:sigrok/v2|sigrok session]] file format (version 2) is a ZIP file containing multiple files (some metadata files and data files containing the actual samples). This works fine, but it also has some issues:
 * In order to get to the data you want, you need to decompress the whole file.
@@ Line 16: / Line 16: @@
 == Goals ==
-The following list highlights some of the goals of the new file format:
+The following list highlights some of the goals of the new file format (v3):
-* It should be able to store metadata and arbitrary data (logic samples, and/or analog samples, and so on).
+* It must be able to store
-* It must support compression.
+** arbitrary data (logic samples, and/or analog samples, and/or protocol decoder data, and more), as well as
+** arbitrary meta-/config-data and other extra information that may be useful to frontends (UI state data, user-configured probe colors, names, positions, and so on).
+* It must support and facilitate stream-oriented processing (save, load, transmission, compression/decompression, and so on).
+* It must support compression of the payload data.
+* It must be usable independent of hardware architecture (x86, ARM, PowerPC, MIPS, and so on), operating system, endianness, float representation, and so on. All data fields must be properly specified (endianness, signedness, size, format).
+* It must allow for sufficiently good performance for the common operations a frontend needs to perform on the data/file/stream (save, load, compress/uncompress, append, and so on) so that it doesn't become the bottleneck. This is especially important for stream-oriented devices which could otherwise lose samples if the processing on the host side is not sufficiently fast ([[Saleae Logic]], [[Saleae Logic16]], [[IKALOGIC ScanaPLUS]], others).
 * It should be able to handle run-time changes in the data streams (via meta packets on the session bus), e.g. changing samplerates, changing probes, etc. etc.
-* Better compression properties (e.g. using LZO or other algorithms, this is to be evaluated). What we ideally want out of the compression algorithm is:
+* It should have better compression properties than ZIP (e.g. using LZO or other algorithms, this is to be evaluated). What we ideally want out of the compression algorithm is:
 ** Good and relatively fast compression results at only moderate CPU usage.
 ** Very fast decompression (LZO is probably the best one here, as it's specifically designed for this).
 ** Ideally, support for appending further data to already compressed data chunks (though this could be also implemented outside of the compression algorithm per se).
 ** Open-source license and OS portability. There should be an open-source library or code chunk for compression/uncompression and it should be widely available in Linux distros, and portable to Windows, Mac OS X, FreeBSD, Android, and so on.
-* Independent of hardware architecture (x86, ARM, PowerPC, MIPS, and so on), operating system, endianness, float representation, and so on. All data fields must be properly specified (endianness, signedness, size, format).
-* Must (optionally) allow to store arbitrary extra information that may be useful to frontends (UI state data, user-configured probe colors, names, positions, and so on).
 == Specification ==