emCompress-Pro User Guide & Reference Manual
Professional compression system.
Introducing emCompress-Pro
This section presents an overview of emCompress-Pro, its structure,
and its capabilities.
What is emCompress-Pro?
emCompress-Pro is a compression system that is able to reduce
the size of data that must be compressed or decompressed by a small
target microcontroller. Typical uses of emCompress-Pro are:
- Compress and decompress communication data over a limited-bandwidth link.
- Decompress firmware images that must be dynamically expanded on device reprogramming.
- Decompress configuration bitstreams to program FPGA and CPLD devices.
- Compress permanent files that hold static content for an embedded web server.
Of course, emCompress-Pro is not limited to these applications; it
can be used whenever it is beneficial to reduce the size of dynamic or static data.
Features
emCompress-Pro is written in standard ANSI C and can run on virtually
any CPU. Here’s a list summarizing the main features of emCompress-Pro:
- Clean ISO/ANSI C source code.
- Small decompressor ROM footprint.
- Essentially zero-RAM compression and decompression.
- Easy-to-understand and simple-to-use API.
- Simple configuration.
Package content
emCompress-Pro is provided in source code and contains everything required.
The following table shows the content of the emCompress-Pro Package:
| Files | Description |
| Application | Sample application source code. |
| Config | Configuration header files. |
| Doc | emCompress-Pro documentation. |
| COMPRESS | emCompress-Pro source code. |
| SEGGER | SEGGER software component source code used in emCompress-Pro. |
| Tool | Supporting applications in binary form. |
Sample applications
emCompress-Pro ships with a number of sample applications that show how
to integrate compression capability into your application.
The sample applications are:
| Application | Description |
| CX_Compress.c | Compress file using DEFLATE algorithm. |
| CX_Decompress.c | Decompress file compressed with DEFLATE algorithm. |
A note on the samples
Each sample that we present in this section is written in a style
that makes it easy to describe and that fits comfortably within the
margins of printed paper. Therefore, it may well be that you would
rewrite the sample to have a slightly different structure that fits
better, but please keep in mind that these examples are written with
clarity as the prime objective, and to that end we sacrifice some
brevity and efficiency.
Recommended project structure
We recommend keeping emCompress-Pro separate from your application files.
It is good practice to keep all the program files (including the
header files) together in the COMPRESS subdirectory of your project’s
root directory. This practice makes it very easy to
update to newer versions of emCompress-Pro by simply replacing the COMPRESS
and SEGGER directories. Your application files can be stored
anywhere.
Warning
When updating to a newer emCompress-Pro version: as files may have
been added, moved or deleted, the project directories may need to be
updated accordingly.
Include directories
You should make sure that the include path contains the following directories
(the order of inclusion is of no importance):
Warning
Always make sure that you have only one version of each file!
It is frequently a major problem when updating to a new version of emCompress-Pro
if you have old files included and therefore mix different versions. If you
keep emCompress-Pro in the directories as suggested (and only in these), this
type of problem cannot occur. When updating to a newer version, you should
be able to keep your configuration files and leave them unchanged. For
safety reasons, we recommend backing up (or at least renaming) the
COMPRESS directories before updating.
Supported algorithms
emCompress-Pro supports four compression algorithms: SMASH-2, DEFLATE, LZPJ and LZMA. These algorithms differ in complexity, achievable compression ratio, speed, and resource requirements. Each algorithm is composed of several layers; during compression, each layer processes the data that has passed through the layers before it. During decompression, data is fed through the layers in reverse order.
The LZSS base layer
The first layer, on which the four algorithms are based, is the LZSS layer. The LZSS (Lempel–Ziv–Storer–Szymanski) algorithm is an extension of the well-known LZ77 (Lempel–Ziv-1). The LZSS algorithm scans the data for repeated blocks. When it finds a repetition of a block, it inserts a reference to the block, containing the offset and the length of the block. Block references are only inserted if the net length of the reference is smaller than the uncompressed equivalent. For example, to insert a reference to a two character long block at a large distance may require more space than simply inserting the two characters as literals. The LZSS layer consumes the raw data and at each step, reports to the next layer whether literal data or a reference to a previously seen block should be inserted.
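The search for a repeated block can be illustrated with a deliberately simple brute-force sketch. FindLongestMatch() and its parameters are hypothetical names chosen for this illustration and are not part of the emCompress-Pro API; a production encoder would use a faster search structure.

```c
#include <stddef.h>

// Hypothetical helper, not part of the emCompress-Pro API.
// Searches the WindowSize bytes before Pos for the longest block that
// matches the data starting at Pos. Returns the match length, or 0 if
// no match of at least MinLen bytes exists, and stores the distance
// back to the match in *pDist. Assumes Pos <= Len and MinLen >= 1.
static size_t FindLongestMatch(const unsigned char *pData, size_t Len,
                               size_t Pos, size_t WindowSize,
                               size_t MinLen, size_t MaxLen,
                               size_t *pDist) {
  size_t Start;
  size_t Cand;
  size_t Limit;
  size_t BestLen;
  size_t n;

  Start = (Pos > WindowSize) ? Pos - WindowSize : 0;
  Limit = Len - Pos;                    // Bytes left to encode.
  if (Limit > MaxLen) {
    Limit = MaxLen;
  }
  BestLen = 0;
  *pDist  = 0;
  for (Cand = Start; Cand < Pos; ++Cand) {
    n = 0;
    // The match may overlap the bytes that are being encoded.
    while (n < Limit && pData[Cand + n] == pData[Pos + n]) {
      ++n;
    }
    if (n >= MinLen && n >= BestLen) {  // On ties, prefer the nearer match.
      BestLen = n;
      *pDist  = Pos - Cand;
    }
  }
  return BestLen;
}
```

When the function returns 0, the encoder would emit the byte at Pos as a literal; otherwise it would emit a reference consisting of the distance and the length.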
The LZSS algorithm can be controlled by several parameters, whose exact ranges depend on the layers on top of the LZSS algorithm which handle the encoding of the LZSS tokens into a stream.
- Window size: The window size determines the context size which can be used to look for matching blocks which can be replaced by references.
- Optimization level: The LZSS algorithm maintains a hash table with linked lists which it uses to find blocks which it can reference. The optimization parameter determines how far back through the linked lists the algorithm shall search for a suitable match.
- Hash table size: The hash table defines the entry point to the linked list of matches with the same hash. The size of the hash table provides a trade-off between speed and memory requirements: When the hash table is small, collisions are more likely and therefore the linked lists behind each hash table entry will become longer, and the search for a match will take longer. At low optimization levels and large window sizes, the lists will only be partially searched. When the table is larger, it consumes more memory. However, matches can potentially be spread more evenly, resulting in a shorter average linked list which needs to be searched for each hit in the hash map.
- Minimum and maximum match length: The minimum length a matching block needs to have before it is considered to be inserted as a reference is defined by the algorithm running on top of LZSS. The lower limit is usually determined by the expected size of encoding of the reference versus literal data. The upper limit is determined by the maximum match length which the bitstream can encode.
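The interplay of hash table, linked lists and search depth can be sketched as follows. This is a toy model under stated assumptions, not the emCompress-Pro implementation: the type MATCH_FINDER, the functions MF_Init(), MF_Insert() and MF_Find(), the three-byte hash and the fixed capacities are all hypothetical, and the window size limit is not modelled. The parameter MaxChain plays the role of the optimization level.

```c
#include <stddef.h>
#include <string.h>

#define HASH_SIZE 256u                 // Number of hash table entries (assumed).
#define MAX_DATA  1024u                // Capacity of this toy match finder.
#define NIL       0xFFFFu              // Marks "no entry".

typedef struct {
  unsigned short aHead[HASH_SIZE];     // Most recent position for each hash value.
  unsigned short aPrev[MAX_DATA];      // Previous position with the same hash.
} MATCH_FINDER;

static unsigned Hash3(const unsigned char *p) {
  return (p[0] * 33u * 33u + p[1] * 33u + p[2]) % HASH_SIZE;
}

static void MF_Init(MATCH_FINDER *pMF) {
  memset(pMF, 0xFF, sizeof(*pMF));     // Every entry becomes NIL.
}

// Registers position Pos; requires three readable bytes at pData[Pos].
static void MF_Insert(MATCH_FINDER *pMF, const unsigned char *pData, size_t Pos) {
  unsigned h;

  h = Hash3(&pData[Pos]);
  pMF->aPrev[Pos] = pMF->aHead[h];     // Link to the older entry.
  pMF->aHead[h]   = (unsigned short)Pos;
}

// Visits at most MaxChain list entries and returns the best match length
// found for the data at Pos; the distance is stored in *pDist.
static size_t MF_Find(const MATCH_FINDER *pMF, const unsigned char *pData,
                      size_t Len, size_t Pos, unsigned MaxChain, size_t *pDist) {
  size_t   Best;
  size_t   n;
  unsigned Cand;

  Best   = 0;
  *pDist = 0;
  if (Pos + 3 > Len) {
    return 0;                          // Too few bytes left to hash.
  }
  Cand = pMF->aHead[Hash3(&pData[Pos])];
  while (Cand != NIL && MaxChain > 0) {
    --MaxChain;
    n = 0;
    while (Pos + n < Len && pData[Cand + n] == pData[Pos + n]) {
      ++n;
    }
    if (n > Best) {
      Best   = n;
      *pDist = Pos - Cand;
    }
    Cand = pMF->aPrev[Cand];           // Continue with the next older entry.
  }
  return Best;
}
```

With the input "abcXabcYabcX" and positions 0 to 7 inserted, a search at position 8 with MaxChain = 1 only inspects the most recent candidate and finds a 3-byte match, while a deeper search also reaches the older candidate and finds a 4-byte match. This is the trade-off between speed and compression that the optimization level controls.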
| Parameter | LZPJ | DEFLATE | LZMA | SMASH-2 |
| Optimization | 0 - 10 | 0 - 10 | 0 - 10 | 0 - 10 |
| Window size | 16 - 32768 | 16 - 32768 | 1 - 16777216 | 256 - 16384 |
| Minimum match length | 3 - 258 | 3 - 258 | 2 - 273 | 2 - 258 |
| Maximum match length | 18 - 258 | 18 - 258 | 2 - 273 | 2 - 258 |
Additional requirements:
- LZPJ, DEFLATE, SMASH-2: The minimum match length must be smaller than the maximum match length.
- LZMA: The minimum match length must be smaller than or equal to the maximum match length.
- SMASH-2: The window size must be a power of two.
- SMASH-2, LZPJ: The window size must be larger than the minimum match length.
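The limits in the table and the additional requirements above can be collected into a single check. The following is a sketch only: the enumeration ALGO, the structure LIMITS and the function CheckParas() are hypothetical and not part of the emCompress-Pro API, which performs its own parameter validation during initialization.

```c
// Hypothetical algorithm identifiers used for this illustration only.
typedef enum { ALGO_LZPJ, ALGO_DEFLATE, ALGO_LZMA, ALGO_SMASH2 } ALGO;

typedef struct {
  unsigned long WindowMin, WindowMax;
  unsigned      MinLenMin, MinLenMax;
  unsigned      MaxLenMin, MaxLenMax;
} LIMITS;

// Values copied from the parameter table; the order matches ALGO.
static const LIMITS _aLimits[] = {
  {  16ul,    32768ul, 3, 258, 18, 258 },  // LZPJ
  {  16ul,    32768ul, 3, 258, 18, 258 },  // DEFLATE
  {   1ul, 16777216ul, 2, 273,  2, 273 },  // LZMA
  { 256ul,    16384ul, 2, 258,  2, 258 },  // SMASH-2
};

// Returns 1 if the parameter set satisfies the documented limits, else 0.
static int CheckParas(ALGO Algo, unsigned long Window,
                      unsigned MinLen, unsigned MaxLen, unsigned Optimize) {
  const LIMITS *p;

  p = &_aLimits[Algo];
  if (Optimize > 10) {
    return 0;
  }
  if (Window < p->WindowMin || Window > p->WindowMax) {
    return 0;
  }
  if (MinLen < p->MinLenMin || MinLen > p->MinLenMax) {
    return 0;
  }
  if (MaxLen < p->MaxLenMin || MaxLen > p->MaxLenMax) {
    return 0;
  }
  if (Algo == ALGO_LZMA) {
    if (MinLen > MaxLen) {                 // LZMA allows MinLen == MaxLen.
      return 0;
    }
  } else if (MinLen >= MaxLen) {
    return 0;
  }
  if (Algo == ALGO_SMASH2 && (Window & (Window - 1)) != 0) {
    return 0;                              // Window must be a power of two.
  }
  if ((Algo == ALGO_SMASH2 || Algo == ALGO_LZPJ) && Window <= MinLen) {
    return 0;                              // Window must exceed MinLen.
  }
  return 1;
}
```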
The size of the hash map is configured at compile time using the CX_LZSS_HASH_TABLE_SIZE configuration flag. For window sizes larger than 65260 bytes, the CX_LZSS_ENCODE_LARGE_WINDOW flag has to be set to 1.
LZPJ
The LZPJ (Lempel-Ziv “Plain Jane”) algorithm takes the output of the LZSS algorithm and turns it into a simple bitstream which only contains the literals and the references. For this reason, implementations of LZPJ are very simple and fast, at the cost of compression efficiency.
DEFLATE
The DEFLATE algorithm is also based on the LZSS algorithm. It employs entropy encoding using Huffman tables to efficiently represent the tokens delivered by the LZSS algorithm. emCompress-Pro determines the entries of the Huffman tables dynamically during compression, but it can also read data compressed with static Huffman tables by other compression utilities.
In order to be able to adapt the Huffman table to changes in the input data, the DEFLATE algorithm’s output is block based. Each block has its own Huffman table. The maximum block size is a parameter which must be provided to the algorithm. Usual values for the block length are 16384 to 32768.
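A DEFLATE block does not store the Huffman codes themselves. According to the DEFLATE specification (RFC 1951), only the code length of each symbol is transmitted, and compressor and decompressor derive identical codes from these lengths. The sketch below follows the procedure given in RFC 1951; the function name AssignCodes() is an assumption made for this illustration and does not refer to emCompress-Pro source code.

```c
#define MAX_BITS 15   // Longest code length permitted by DEFLATE.

// Assigns canonical Huffman codes to NumSyms symbols from their code
// lengths (a length of 0 marks an unused symbol), as described in
// RFC 1951, section 3.2.2.
static void AssignCodes(const unsigned char *pLen, unsigned NumSyms,
                        unsigned short *pCode) {
  unsigned aCount[MAX_BITS + 1] = { 0 };
  unsigned aNext [MAX_BITS + 1] = { 0 };
  unsigned Code;
  unsigned Bits;
  unsigned i;

  // Step 1: count the number of codes for each code length.
  for (i = 0; i < NumSyms; ++i) {
    aCount[pLen[i]]++;
  }
  aCount[0] = 0;
  // Step 2: find the numerically smallest code for each length.
  Code = 0;
  for (Bits = 1; Bits <= MAX_BITS; ++Bits) {
    Code = (Code + aCount[Bits - 1]) << 1;
    aNext[Bits] = Code;
  }
  // Step 3: hand out consecutive codes to symbols of the same length.
  for (i = 0; i < NumSyms; ++i) {
    if (pLen[i] != 0) {
      pCode[i] = (unsigned short)aNext[pLen[i]]++;
    } else {
      pCode[i] = 0;
    }
  }
}
```

For the code lengths (3, 3, 3, 3, 3, 2, 4, 4), which is the example used in RFC 1951, the function produces the codes 010, 011, 100, 101, 110, 00, 1110 and 1111.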
LZMA
The LZMA (Lempel-Ziv-Markov chain algorithm) also uses LZSS as its foundation. It supports very large dictionary sizes for inserting references and a sophisticated tracker to optimize references to previous blocks. It employs a range encoder instead of a Huffman encoder for even more dense storage of data.
A range encoder is based on the probability distribution of the characters in a stream of data. The range encoder used in LZMA does not treat the data uniformly, but tracks the probabilities of characters based on certain contexts:
- Literal context (LC): Number of most significant bits of the previous byte to use for context determination when encoding a literal.
- Literal position (LP): Number of least significant position bits to use for context determination when encoding literals.
- Position bits (PB): Number of least significant position bits to use for context determination for general encoding.
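How these three parameters select a context can be sketched with the formulas used by the publicly documented LZMA format. The function names below are hypothetical and chosen for this illustration; they are not emCompress-Pro API functions.

```c
// Index of the probability set used to encode a literal: the LP least
// significant bits of the position are combined with the LC most
// significant bits of the previous byte.
static unsigned LiteralContext(unsigned long Pos, unsigned char PrevByte,
                               unsigned LC, unsigned LP) {
  unsigned PosBits;
  unsigned PrevBits;

  PosBits  = (unsigned)(Pos & ((1ul << LP) - 1));
  PrevBits = (unsigned)PrevByte >> (8 - LC);
  return (PosBits << LC) + PrevBits;
}

// Number of distinct literal contexts for a given LC and LP.
static unsigned NumLiteralContexts(unsigned LC, unsigned LP) {
  return 1u << (LC + LP);
}

// Position state used for general encoding decisions: the PB least
// significant bits of the position.
static unsigned PosState(unsigned long Pos, unsigned PB) {
  return (unsigned)(Pos & ((1ul << PB) - 1));
}
```

With the default values LC = 3 and LP = 0, a literal that follows the byte 0xE5 is encoded in context 7, because the three most significant bits of 0xE5 are 111. Larger values of LC and LP increase the number of contexts and with it the memory needed for the probability tables.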
The range of valid values for these parameters depends on whether compression and decompression are performed only with emCompress-Pro or whether other tools, such as the XZ tool, must also be able to work with the data. See section Optimizing LZMA compression for details on how to determine the best values of these parameters. The values of LC, LP and PB contribute to the dynamic memory requirements for compression and decompression; see sub-section “LZMA algorithm” in the section about memory requirements.
| Parameter | Default | emCompress-Pro | XZ |
| LC | 3 | 0 - 8 | 0 - 4 |
| LP | 0 | 0 - 4 | 0 - 4 |
| PB | 2 | 0 - 4 | 0 - 4 |
| | | | LC + LP ≤ 4 |
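The ranges in this table translate directly into two small checks, for example to decide at build time whether a parameter set can also be processed by the XZ tool. The function names are hypothetical and used only for this sketch.

```c
// Returns 1 if LC, LP and PB are within the ranges listed for emCompress-Pro.
static int IsValidForEmCompress(unsigned LC, unsigned LP, unsigned PB) {
  return (LC <= 8) && (LP <= 4) && (PB <= 4);
}

// Returns 1 if LC, LP and PB are within the ranges listed for the XZ tool,
// including the additional restriction LC + LP <= 4.
static int IsValidForXz(unsigned LC, unsigned LP, unsigned PB) {
  return (LC <= 4) && (LP <= 4) && (PB <= 4) && (LC + LP <= 4);
}
```

The default parameter set (LC = 3, LP = 0, PB = 2) passes both checks, whereas LC = 8 is accepted only by emCompress-Pro.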
emCompress-Pro can produce and consume raw LZMA streams which are called “LZMA1” in the terminology of the XZ tool. emCompress-Pro’s LZMA compressor has been designed to provide a reasonable compression ratio within the memory limitations of embedded systems. Therefore, its compression efficiency may be lower than what can be achieved by the XZ tool, which features a highly optimized compressor.
Compression with the XZ tool
To compress data with the XZ tool for consumption by emCompress-Pro, the following parameters need to be specified:
xz -z -k --suffix=.lzma -T 1 --format=raw
--lzma1=preset=<OPT>,lc=<LC>,lp=<LP>,pb=<PB>,dict=<WS>
<input-file>
- -z: Compress data
- -k: Keep the input file
- --suffix=.lzma: Write output to a file with the same name as the input file with the added suffix .lzma. The name of the output file cannot be specified directly, only via the --suffix parameter.
- -T 1: Use a single thread only.
- <OPT>: Specifies the optimization level, which ranges from 0 to 9.
- <LC>, <LP> and <PB>: Specifies the values for LC, LP and PB.
- <WS>: Specifies the window size.
For example, to compress a file Firmware.input with a window size of 32768, maximum optimization (9), LC=1, LP=2 and PB=2, the following command can be used to generate a compressed file Firmware.input.lzma:
xz -z -k --suffix=.lzma -T 1 --format=raw
--lzma1=preset=9,lc=1,lp=2,pb=2,dict=32768
Firmware.input
This file can be decompressed using emCompress-Pro’s CX_Tool (section compression/decompression utility) like this:
CX_Tool -d lzma --lc=1 --lp=2 --pb=2 --ws=32768
Firmware.input.lzma Firmware.output
Decompression with the XZ tool
xz -d -k --suffix=.lzma -T 1 --format=raw
--lzma1=preset=9,lc=<LC>,lp=<LP>,pb=<PB>,dict=<WS>
<input-file>
- -d: Decompress data
- -k: Keep the input file
- --suffix=.lzma: Write output to a file with the same name as the input file, but with the suffix .lzma removed. The name of the output file cannot be specified explicitly; the XZ tool can only use the input filename without the specified suffix.
- -T 1: Use a single thread only.
- <LC>, <LP> and <PB>: Specifies the values for LC, LP and PB.
- <WS>: Specifies the window size.
For example, when data is compressed using emCompress-Pro’s CX_Tool (section compression/decompression utility) like this:
CX_Tool -c lzma --lc=1 --lp=2 --pb=2 --ws=32768 --opt=10
Firmware.input Firmware.output.lzma
It can be decompressed using the XZ tool, resulting in an uncompressed file named Firmware.output:
xz -d -k --suffix=.lzma -T 1 --format=raw
--lzma1=preset=9,lc=1,lp=2,pb=2,dict=32768
Firmware.output.lzma
SMASH-2
SMASH-2 uses LZSS combined with variable length encoding of references to matches. The SMASH-2 implementation in emCompress-Pro features an optimization for compressing firmware with 16-bit or 32-bit instruction width, which allows more efficient encoding of distances to matches.
The optimization for different instruction sets is enabled using the ISA parameter:
| Instruction set | ISA-parameter |
| Variable length instructions (e.g. IA-32), no optimization | 0 |
| 16-bit instructions | 1 |
| 32-bit instructions | 2 |
Using emCompress-Pro
emCompress-Pro divides into two parts:
- A compressor that is responsible for compressing data using
a selected compression algorithm and parameters, and
- A decompressor that is responsible for decompressing data
compressed with a selected compression algorithm and parameters.
Compressing and decompressing data
Compressing a stream of data comprises three steps:
- Initialization
- Repeatedly presenting uncompressed data and extracting
compressed data
- Finalization
Providing dynamic memory
The amount of memory required for compression and decompression depends
on the algorithm and the compression parameters. For this reason,
dynamic memory is used instead of static memory. The dynamic memory
management system can be chosen to match the capabilities of the
system on which emCompress-Pro is used.
emCompress-Pro accesses the memory management system via a context, which is initialized to use the chosen memory management system. The available memory management systems are described below.
System heap
On PCs and on some embedded systems, a heap is available using the standard malloc and free functions. This heap can be used by emCompress-Pro by initializing the context as follows:
#include "CX.h"
static SEGGER_MEM_CONTEXT MemCtx;
SEGGER_MEM_SYSTEM_HEAP_Init(&MemCtx);
Stack-like buffer implementation
On systems which do not provide a heap system, the stack-like buffer implementation is the simplest option for memory management. It needs a static buffer which is large enough to accommodate the maximum dynamic memory requirements of emCompress-Pro. Assuming that the maximum memory requirement has been determined to be 64 KB (using section Resource requirements), the following code can be used:
#include "CX.h"
#define HEAP_SIZE 65536
static SEGGER_MEM_CONTEXT MemCtx;
static SEGGER_MEM_SBUFFER MemSBuffer;
static U8 MemBuffer[HEAP_SIZE];
SEGGER_MEM_SBUFFER_Init(&MemCtx, &MemSBuffer, &MemBuffer, HEAP_SIZE);
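To illustrate what "stack-like" means, the following sketch shows a minimal allocator of this kind. It is not the SEGGER implementation; the type STACK_HEAP and its functions are hypothetical names. Memory is handed out from the front of a static buffer, and releasing a block rewinds the allocator to the start of that block, so blocks are expected to be released in the reverse order of their allocation.

```c
#include <stddef.h>

// Hypothetical stack-like allocator, for illustration only.
typedef struct {
  unsigned char *pBase;   // Start of the static buffer (assumed suitably aligned).
  size_t         Size;    // Total size of the buffer in bytes.
  size_t         Used;    // Number of bytes currently handed out.
} STACK_HEAP;

static void STACK_HEAP_Init(STACK_HEAP *pHeap, void *pBuffer, size_t Size) {
  pHeap->pBase = (unsigned char *)pBuffer;
  pHeap->Size  = Size;
  pHeap->Used  = 0;
}

// Returns a block of NumBytes bytes or NULL if the buffer is exhausted.
static void *STACK_HEAP_Alloc(STACK_HEAP *pHeap, size_t NumBytes) {
  size_t Aligned;
  void  *pBlock;

  Aligned = (NumBytes + 7u) & ~(size_t)7u;   // Round up to a multiple of 8.
  if (Aligned < NumBytes || Aligned > pHeap->Size - pHeap->Used) {
    return NULL;
  }
  pBlock = pHeap->pBase + pHeap->Used;
  pHeap->Used += Aligned;
  return pBlock;
}

// Releases pBlock and everything that was allocated after it.
static void STACK_HEAP_Free(STACK_HEAP *pHeap, void *pBlock) {
  size_t Off;

  Off = (size_t)((unsigned char *)pBlock - pHeap->pBase);
  if (Off <= pHeap->Used) {
    pHeap->Used = Off;
  }
}
```

Because no free list or block headers are needed, such a scheme is very small and fast, but the buffer must be large enough for the peak memory requirement of the chosen algorithm and parameters.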
Preparing for compression
A compression operation is prepared by calling CX_ENCODE_Init(),
passing in:
- An encoding context to initialize
- A compression algorithm to use
- A set of parameters that configure the compression algorithm
- A memory context to allocate working storage for compression
For instance:
static CX_ENCODE_CONTEXT EncCtx;
static CX_PARAS Paras;
static SEGGER_MEM_CONTEXT MemCtx;
int Status;
SEGGER_MEM_SYSTEM_HEAP_Init(&MemCtx);
CX_PARAS_Clear(&Paras);
Paras.WindowSize = 32768;
Paras.MinLen = 3;
Paras.MaxLen = 258;
Paras.Optimize = 10;
Paras.BlockLen = 32768;
Status = CX_ENCODE_Init(&EncCtx,
                        &CX_DEFLATE_Encode,
                        &Paras,
                        &MemCtx);
if (Status == 0) {
  printf("Successfully initialized encoder\n");
} else {
  printf("Initialization of encoder failed: %s\n",
         CX_GetErrorText(Status));
}
Declare encoding context
The encoding context is defined by the type CX_ENCODE_CONTEXT. It
contains the encoding state of data presented for compression. The
fields within this structure are private and must not be modified.
SEGGER makes no guarantee that these fields have the same name,
are present, or have the same interpretation between releases.
Declare encoding parameters
The encoding parameters are defined within the structure CX_PARAS.
SEGGER guarantees the presence and interpretation of these fields
between releases. These fields are initialized by the user and
are described below.
Declare memory context
A memory context is declared and initialized to use the system’s heap implementation to provide dynamic memory to emCompress-Pro.
Initialize compression parameters
Before setting up compression algorithm parameters, they must be
cleared to zero using CX_PARAS_Clear().
Initialize window size
This example initializes the compression parameters used by
the DEFLATE compressor. The window size specifies the number of characters
to look backwards over to find a match and to replace the characters
at the cursor with a reference to the match, thereby compressing
the incoming data. Larger window sizes provide more data and
therefore a larger probability of finding a match and producing
a better-compressed output.
In this case the window is initialized to 32768 characters, the
largest window that the DEFLATE algorithm supports.
Initialize match lengths
This example initializes the match size to between 3 and 258
characters inclusive, the widest possible range for DEFLATE.
Adjusting these parameters can improve compression and
reduce the size of the compressed output (using longer match
lengths), or can improve the speed of compression (by using
shorter match lengths).
Initialize the optimization
The optimization controls how much effort the compressor puts
into finding the best match. In this case, the best possible
compression is requested by setting the parameter to 10.
Initialize block length
DEFLATE is a block-based algorithm rather than a streaming
algorithm. Data are collected into a block for compression,
compressed, and emitted. Blocks are chained together such
that compression can adapt to the changing characteristics
of the input stream. In this case, the block size is set
to 32768 characters, the maximum for DEFLATE.
Initialize encoder
The encoding process is initialized by calling CX_ENCODE_Init(),
which returns a status code indicating success or failure.
Compressing data
Data are compressed using CX_ENCODE_Process(). The general form of compression
uses a CX_STREAM structure that provides the data to compress and the
buffer to place compressed data into. For instance, to read a file,
compress the data, and write the output, where the files are already
open in binary mode:
static int _CompressFile(FILE *pFileIn, FILE *pFileOut) {
  CX_STREAM Stream;
  U8        aIn [1024];
  U8        aOut[1024];
  int       Status;
  int       Eof;
  size_t    NumBytesToWrite;
  size_t    NumBytes;
  //
  memset(&Stream, 0, sizeof(Stream));
  Eof = CX_FLUSH_NONE;
  //
  for (;;) {
    if (Stream.AvailIn == 0) {
      NumBytes = fread(aIn, 1, sizeof(aIn), pFileIn);
      if (feof(pFileIn) != 0) { // end of file
        Eof = CX_FLUSH_END;
      } else if (ferror(pFileIn) != 0) {
        Status = CX_STATUS_GENERAL_ERROR;
        break;
      }
      // Note: Stream.pIn always has to be set to point to the
      //       start of the buffer, since it is moved through
      //       the buffer while data is processed.
      Stream.pIn     = aIn;
      Stream.AvailIn = NumBytes;
    }
    // Note: Stream.pOut always has to be set to point to the
    //       start of the buffer, since it is moved through
    //       the buffer while data is written into it.
    Stream.pOut     = aOut;
    Stream.AvailOut = sizeof(aOut);
    Status = CX_ENCODE_Process(&EncCtx, &Stream, Eof);
    if (Status < 0) {
      break;
    }
    NumBytesToWrite = sizeof(aOut) - Stream.AvailOut;
    NumBytes = fwrite(aOut, 1, NumBytesToWrite, pFileOut);
    if (NumBytes != NumBytesToWrite) {
      Status = CX_STATUS_GENERAL_ERROR;
      break;
    }
    if (Status == CX_STATUS_DONE) {
      // Compressor is done consuming the input data and has
      // flushed its output.
      break;
    }
  }
  //
  return Status;
}
Decompressing data
Decompressing data has the same form as compressing data, except that
wherever “Encode” is used to encode (compress) data, “Decode” is used
to decode (decompress) data. Care has to be taken to call the function
CX_DECODE_Process() until it returns CX_STATUS_DONE.
static int _DecompressFile(FILE *pFileIn, FILE *pFileOut) {
  CX_STREAM Stream;
  U8        aIn [1024];
  U8        aOut[1024];
  int       Status;
  int       Eof;
  size_t    NumBytesToWrite;
  size_t    NumBytes;
  //
  memset(&Stream, 0, sizeof(Stream));
  Eof = CX_FLUSH_NONE;
  //
  for (;;) {
    if (Stream.AvailIn == 0) {
      NumBytes = fread(aIn, 1, sizeof(aIn), pFileIn);
      if (feof(pFileIn) != 0) { // end of file
        Eof = CX_FLUSH_END;
      } else if (ferror(pFileIn) != 0) {
        Status = CX_STATUS_GENERAL_ERROR;
        break;
      }
      // Note: Stream.pIn always has to be set to point to the start
      //       of the buffer, since it is moved through the buffer
      //       while data is processed.
      Stream.pIn     = aIn;
      Stream.AvailIn = NumBytes;
    }
    // Note: Stream.pOut always has to be set to point to the start
    //       of the buffer, since it is moved through the buffer
    //       while data is written into it.
    Stream.pOut     = aOut;
    Stream.AvailOut = sizeof(aOut);
    Status = CX_DECODE_Process(&DecCtx, &Stream, Eof);
    if (Status < 0) {
      break;
    }
    NumBytesToWrite = sizeof(aOut) - Stream.AvailOut;
    NumBytes = fwrite(aOut, 1, NumBytesToWrite, pFileOut);
    if (NumBytes != NumBytesToWrite) {
      return CX_STATUS_GENERAL_ERROR;
    }
    // Did the stream end?
    if (Status == CX_STATUS_DONE) {
      if (Eof == CX_FLUSH_END) {
        // Stream ended and Eof, done with success.
        return CX_STATUS_DONE;
      }
      if (Stream.AvailIn != 0) {
        // Stream ended, but not all input data has been consumed
        return CX_STATUS_GENERAL_ERROR;
      }
      // Eof has not been signalled yet. Read from the file
      // one more time to check whether Eof is returned then
      // or more data is returned (remember: Eof is only returned
      // if one reads beyond the last byte in the file).
    }
  }
  //
  return Status;
}
Compression of program code
If the content of the data to be compressed is known in advance, it
can generally be better compressed by applying a conditioner to the
data before compression.
Conditioners are available for the following instruction sets:
- Arm T32 (Thumb-2)
- Arm A32 (AArch32)
- Arm A64 (AArch64)
- RISC-V RV32
- Intel IA-32
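This manual does not describe how the conditioners work internally. One widely known technique of this kind, used for example by the branch filters of the XZ tool, rewrites the relative target of each call instruction into an absolute address, so that all calls to the same function produce identical byte sequences that the LZSS layer can match. The sketch below applies this idea to Arm A32 BL instructions. It is an illustration of the general principle only, the function name ArmBranchFilter() is hypothetical, and emCompress-Pro's conditioners may work differently.

```c
#include <stddef.h>
#include <stdint.h>

// Converts the targets of unconditional Arm A32 BL instructions in a
// little-endian code image from relative to absolute form (Encode != 0)
// or back (Encode == 0). BaseAddr is the load address of pCode and is
// assumed to be a multiple of 4.
static void ArmBranchFilter(unsigned char *pCode, size_t Len,
                            uint32_t BaseAddr, int Encode) {
  size_t   i;
  uint32_t Target;
  uint32_t Pc;

  for (i = 0; i + 4 <= Len; i += 4) {
    if (pCode[i + 3] != 0xEB) {            // Not an unconditional BL.
      continue;
    }
    Target  = ((uint32_t)pCode[i + 2] << 16)
            | ((uint32_t)pCode[i + 1] <<  8)
            |  (uint32_t)pCode[i];
    Target <<= 2;                          // Word offset to byte offset.
    Pc      = BaseAddr + (uint32_t)i + 8;  // A32 reads PC as address + 8.
    Target  = Encode ? (Target + Pc) : (Target - Pc);
    Target >>= 2;
    pCode[i]     = (unsigned char)(Target);
    pCode[i + 1] = (unsigned char)(Target >> 8);
    pCode[i + 2] = (unsigned char)(Target >> 16);
  }
}
```

For example, two BL instructions at addresses 0x00 and 0x10 that both call a function at address 0x100 are encoded as 3E 00 00 EB and 3A 00 00 EB. After the transformation, both read 40 00 00 EB and can be represented by a single reference. Applying the function again with Encode = 0 restores the original image.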
The following example shows how conditioning improves the
compression of the data file Firmware.input for all compressors. The
CX_Util utility used here is described in Compression algorithm comparison.
Without conditioning
C:> CX_Util.exe Firmware.input
Input size: 218824 bytes
Conditioner: None
SMASH-2 flags: None
Window SMASH2-PRO LZPJ DEFLATE LZMA
------ ---------- ---------- ---------- ----------
256 104648 131504 100288 83938
512 94988 120095 93314 78642
1024 89468 113406 88948 75197
2048 86497 109440 86270 73499
4096 84572 106652 84440 72072
8192 83197 104235 83080 70798
16384 82755 102896 82516 70009
------ ---------- ---------- ---------- ----------
Total 626125 788228 618856 524155
------ ---------- ---------- ---------- ----------
C:> _
With Arm A32 conditioning
C:> CX_Util.exe --a32 Firmware.input
Input size: 218824 bytes
Conditioner: Arm A32
SMASH-2 flags: None
Window SMASH2-PRO LZPJ DEFLATE LZMA
------ ---------- ---------- ---------- ----------
256 99749 123481 94516 79490
512 89202 111026 86614 73482
1024 82980 103486 81566 69366
2048 79484 98985 78402 67153
4096 77008 95648 76114 65223
8192 75215 93025 74478 63588
16384 74437 91731 73816 62483
------ ---------- ---------- ---------- ----------
Total 578075 717382 565506 480785
------ ---------- ---------- ---------- ----------
C:> _
Additional conditioning
In addition, the SMASH-2 compressor can usually better compress
code for any 16-bit or 32-bit instruction set:
C:> CX_Util.exe --a32 --fw32 Firmware.input
Input size: 218824 bytes
Conditioner: Arm A32
SMASH-2 flags: 32-bit instruction set
Window SMASH2-PRO LZPJ DEFLATE LZMA
------ ---------- ---------- ---------- ----------
256 96539 123481 94516 79490
512 86583 111026 86614 73482
1024 80842 103486 81566 69366
2048 77730 98985 78402 67153
4096 75531 95648 76114 65223
8192 73858 93025 74478 63588
16384 73345 91731 73816 62483
------ ---------- ---------- ---------- ----------
Total 564428 717382 565506 480785
------ ---------- ---------- ---------- ----------
C:> _
In this case, the SMASH-2 compressor compresses better than DEFLATE
at window sizes of 512 bytes or more.
emCompress-Pro API
This section describes the public API for emCompress-Pro. Any functions
or data structures that are not described here but are exposed through
inclusion of the CX.h header file must be considered
private and subject to change.
Preprocessor definitions
Version number
Description
Symbol expands to a number that identifies the specific
emCompress-Pro release.
Definition
#define CX_VERSION 22000
Symbols
| Definition | Description |
| CX_VERSION | Internal use. |
CX_LZSS_ENCODE_LARGE_WINDOW
Description
Configures LZSS for large window sizes.
This is a helper configuration flag which can be used
without specifying an exact window size as is required
for CX_LZSS_ENCODE_MAX_WINDOW_SIZE.
Definition
#define CX_LZSS_ENCODE_LARGE_WINDOW 0
CX_LZSS_ENCODE_MAX_WINDOW_SIZE
Description
Controls the maximum window size supported by the LZSS algorithm.
- ≤ 65260 bytes: In this case, U16 is used to store references to previous matches in the hash table.
- > 65260 bytes: In this case, U32 is used in the hash table.
Definition
#define CX_LZSS_ENCODE_MAX_WINDOW_SIZE CX_LZSS_MAX_U16_WINDOW_SIZE
CX_LZSS_HASH_TABLE_SIZE
Description
Controls the size of the hash table used by the LZSS algorithm.
Depending on the CX_LZSS_ENCODE_MAX_WINDOW_SIZE parameter,
each entry in the hash table is either 16 or 32 bits wide.
For usual window sizes on target hardware, a size of 256 entries
has been chosen as a good middle ground between memory consumption
and speed.
Allowed values:
- 0 -- Small, 256 entries.
- 1 -- Medium, 1,048,576 entries.
- 2 -- Large, 16,777,216 entries.
Definition
#define CX_LZSS_HASH_TABLE_SIZE 0
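The memory taken by the hash table itself follows from the number of entries and the entry width described above. The helper below is a hypothetical sketch for estimating this figure; it covers the table only and not the linked lists or other working memory of the LZSS layer.

```c
// Returns the size of the hash table in bytes for a given setting of
// CX_LZSS_HASH_TABLE_SIZE (0, 1 or 2), or 0 for an invalid setting.
// LargeWindow != 0 selects 32-bit entries, otherwise 16-bit entries.
static unsigned long HashTableBytes(int TableSize, int LargeWindow) {
  static const unsigned long aNumEntries[] = { 256ul, 1048576ul, 16777216ul };
  unsigned long EntrySize;

  if (TableSize < 0 || TableSize > 2) {
    return 0;
  }
  EntrySize = LargeWindow ? 4ul : 2ul;
  return aNumEntries[TableSize] * EntrySize;
}
```

The default configuration (setting 0 with 16-bit entries) therefore needs 512 bytes for the table, while the largest configuration (setting 2 with 32-bit entries) needs 64 MB, which is realistic only on a PC.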
Data types
CX_PARAS
Description
Compression parameters.
Type definition
typedef struct {
CX_PARA WindowSize;
CX_PARA MinLen;
CX_PARA MaxLen;
CX_PARA P1;
CX_PARA P2;
CX_PARA P3;
CX_PARA BlockLen;
CX_PARA Optimize;
} CX_PARAS;
Structure members
| Member | Description |
| WindowSize | Number of octets in matching window. |
| MinLen | Minimum length of match. |
| MaxLen | Maximum length of match. |
| P1 | LC for LZMA, ISA width for SMASH-2. |
| P2 | LP for LZMA. |
| P3 | PB for LZMA. |
| BlockLen | Number of bytes for one block (DEFLATE only). |
| Optimize | Optimization level, [0..10], higher values produce better matches but take more time. |
CX_STREAM
Description
Streaming interface.
Type definition
typedef struct {
U32 AvailIn;
const U8 * pIn;
U32 AvailOut;
U8 * pOut;
} CX_STREAM;
Structure members
| Member | Description |
| AvailIn | Number of input octets available at pIn. |
| pIn | Pointer to the next available input octet. |
| AvailOut | Number of octets of free space available at pOut. |
| pOut | Pointer to the next free output octet. |
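The way these four members change during processing can be demonstrated with a stand-in that simply copies its input. TOY_STREAM mirrors the members of CX_STREAM, and ToyProcess() and ToyCopy() are hypothetical functions written for this sketch; they only imitate the buffer handling of the real processing functions and perform no compression.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
  unsigned             AvailIn;
  const unsigned char *pIn;
  unsigned             AvailOut;
  unsigned char       *pOut;
} TOY_STREAM;   // Same members as CX_STREAM.

// Stand-in for a processing function: consumes input and produces output
// while both are available, advancing the pointers and decreasing the
// counters. Returns 1 when the final input has been consumed, else 0.
static int ToyProcess(TOY_STREAM *pStream, int IsLast) {
  while (pStream->AvailIn > 0 && pStream->AvailOut > 0) {
    *pStream->pOut++ = *pStream->pIn++;
    pStream->AvailIn--;
    pStream->AvailOut--;
  }
  return (IsLast && pStream->AvailIn == 0) ? 1 : 0;
}

// Copies SrcLen bytes through deliberately tiny buffers. pDst must have
// room for SrcLen bytes. Returns the number of bytes written.
static size_t ToyCopy(const unsigned char *pSrc, size_t SrcLen, unsigned char *pDst) {
  unsigned char aIn[4];
  unsigned char aOut[3];
  TOY_STREAM    Stream = { 0, NULL, 0, NULL };
  size_t        NumRead    = 0;
  size_t        NumWritten = 0;
  unsigned      n;
  int           IsLast = 0;
  int           Status;

  do {
    if (Stream.AvailIn == 0 && !IsLast) {
      n = (SrcLen - NumRead < sizeof(aIn)) ? (unsigned)(SrcLen - NumRead)
                                           : (unsigned)sizeof(aIn);
      memcpy(aIn, pSrc + NumRead, n);
      NumRead += n;
      IsLast   = (NumRead == SrcLen);
      Stream.pIn     = aIn;               // Reset to the start of the buffer.
      Stream.AvailIn = n;
    }
    Stream.pOut     = aOut;               // Reset to the start of the buffer.
    Stream.AvailOut = (unsigned)sizeof(aOut);
    Status = ToyProcess(&Stream, IsLast);
    n = (unsigned)sizeof(aOut) - Stream.AvailOut;
    memcpy(pDst + NumWritten, aOut, n);   // Consume the produced output.
    NumWritten += n;
  } while (Status == 0);
  return NumWritten;
}
```

Note that the input buffer is refilled only once it has been consumed completely, whereas the output buffer is emptied and reset before every call.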
CX_ENCODE_CONTEXT
Description
Private encoding context.
Type definition
typedef struct {
void * pWork;
const CX_ENCODE_API * pAPI;
SEGGER_MEM_CONTEXT * pMem;
CX_BIT_ENCODE_CONTEXT Bitstream;
CX_BUFFER Block;
CX_ENCODE_STATE State;
} CX_ENCODE_CONTEXT;
Structure members
| Member | Description |
| pWork | Internal use. |
| pAPI | Internal use. |
| pMem | Internal use. |
| Bitstream | Internal use. |
| Block | Internal use. |
| State | Internal use. |
CX_DECODE_CONTEXT
Description
Private decoding context.
Type definition
typedef struct {
void * pWork;
const CX_DECODE_API * pAPI;
SEGGER_MEM_CONTEXT * pMem;
CX_BIT_DECODE_CONTEXT Bitstream;
CX_DECODE_STATE State;
} CX_DECODE_CONTEXT;
Structure members
| Member | Description |
| pWork | Internal use. |
| pAPI | Internal use. |
| pMem | Internal use. |
| Bitstream | Internal use. |
| State | Internal use. |
Data definitions
Compression algorithms
Definition
extern const CX_ENCODE_API CX_SMASH2_Encode;
extern const CX_ENCODE_API CX_LZPJ_Encode;
extern const CX_ENCODE_API CX_LZMA_Encode;
extern const CX_ENCODE_API CX_DEFLATE_Encode;
Description
Each encoding API is passed to CX_ENCODE_Init() in order to
select the appropriate compression algorithm.
Decompression algorithms
Definition
extern const CX_DECODE_API CX_SMASH2_Decode;
extern const CX_DECODE_API CX_LZPJ_Decode;
extern const CX_DECODE_API CX_LZMA_Decode;
extern const CX_DECODE_API CX_DEFLATE_Decode;
Description
Each decoding API is passed to CX_DECODE_Init() in order to
select the appropriate decompression algorithm.
Compression functions
emCompress-Pro defines the following compression functions:
CX_ENCODE_Init()
Description
Initialize encoder. The encoder parameters are specific to the
encoder being selected.
Prototype
int CX_ENCODE_Init( CX_ENCODE_CONTEXT * pSelf,
const CX_ENCODE_API * pAPI,
const CX_PARAS * pParas,
SEGGER_MEM_CONTEXT * pMem);
Parameters
| Parameter | Description |
| pSelf | Pointer to encoder context. |
| pAPI | Pointer to encoder API. |
| pParas | Pointer to encoder parameters. |
| pMem | Pointer to memory allocation context. |
Return value
| ≥ 0 | Success. |
| < 0 | Error (no memory, invalid parameters). |
CX_ENCODE_Process()
Description
Run encoder coroutine.
Reads data from the stream’s input buffer and writes compressed
data into the stream’s output buffer. Pass flags = CX_FLUSH_NONE
when more input data is still available after the data which is
currently in the stream’s input buffer. Pass flags = CX_FLUSH_EOF
when no more data is available after the data which is left in
the buffer.
The function will return CX_STATUS_SUCCESS when the input data was
successfully written into the stream’s output buffer and more data
may be available at the next call to this function.
The function must be called until it returns CX_STATUS_DONE,
even if no more input data is available. When CX_STATUS_DONE
is returned, the compression stream has been successfully
finalized and written to the stream’s output buffer.
The pointers to the input/output buffers of the streams are moved
through those buffers as the function reads from them and writes
to them. This means that when the input buffer is refilled or the
data from the output buffer has been consumed by the caller, care
must be taken to reset those pointers to the start of the respective
buffers.
Prototype
int CX_ENCODE_Process(CX_ENCODE_CONTEXT * pSelf,
CX_STREAM * pStream,
unsigned Flags);
Parameters
| Parameter | Description |
| pSelf | Pointer to encoder context. |
| pStream | Pointer to I/O stream. |
| Flags | Encoding flags, either CX_FLUSH_NONE or CX_FLUSH_EOF when the stream ends. |
Return value
| = 1 | Success encoding data, encode is complete (CX_STATUS_DONE). |
| = 0 | Success encoding data, waiting for more data (CX_STATUS_SUCCESS). |
| < 0 | Error during encoding, current error status (CX_STATUS_…). |
CX_ENCODE_Exit()
Description
Finalize encoder and deallocate memory.
Calling this function multiple times
during an exit sequence is allowed.
Prototype
void CX_ENCODE_Exit(CX_ENCODE_CONTEXT * pSelf);
Parameters
| Parameter | Description |
| pSelf | Pointer to encoder context. |
Decompression functions
emCompress-Pro defines the following decompression functions:
CX_DECODE_Init()
Description
Initialize decoder.
Prototype
int CX_DECODE_Init( CX_DECODE_CONTEXT * pSelf,
const CX_DECODE_API * pAPI,
const CX_PARAS * pParas,
SEGGER_MEM_CONTEXT * pMem);
Parameters
| Parameter | Description |
| pSelf | Pointer to decoder context. |
| pAPI | Pointer to decoder API. |
| pParas | Pointer to decoder parameters. |
| pMem | Pointer to memory allocation context. |
Return value
| ≥ 0 | Success. |
| < 0 | Error (no memory, invalid parameters). |
CX_DECODE_Process()
Description
Run decoder coroutine.
Reads data from the stream’s input buffer and writes decompressed
data into the stream’s output buffer. Pass Flags = CX_FLUSH_NONE
when more input data will follow the data that is
currently in the stream’s input buffer. Pass Flags = CX_FLUSH_EOF
when no more data will follow the data that is left in
the buffer.
The function will return CX_STATUS_SUCCESS when the input data was
successfully written into the stream’s output buffer and more data
may be available at the next call to this function.
The function must be called until it returns CX_STATUS_DONE,
even if no more input data is available. When CX_STATUS_DONE
is returned, the compressed stream has been successfully
consumed and all output has been written to the stream’s output buffer.
If there is data in the input buffer after the end of the compressed
stream, this function will not consume it, but return CX_STATUS_DONE.
It is the responsibility of the caller to decide whether data after
the end of the stream constitutes an error condition or whether
it can be discarded or processed by other means.
The pointers to the input/output buffers of the stream are moved
through those buffers as the function reads from them and writes
to them. This means that when the input buffer is refilled or the
data from the output buffer has been consumed by the caller, the
caller must reset those pointers to the start of the respective
buffers.
Prototype
int CX_DECODE_Process(CX_DECODE_CONTEXT * pSelf,
CX_STREAM * pStream,
unsigned Flags);
Parameters
| Parameter | Description |
| pSelf | Pointer to decoder context. |
| pStream | Pointer to I/O stream. |
| Flags | Decoding flags, either CX_FLUSH_NONE or CX_FLUSH_EOF when the stream ends. |
Return value
| = 1 | Success decoding bitstream, decode is complete (CX_STATUS_DONE). |
| = 0 | Success decoding bitstream, waiting for more data (CX_STATUS_SUCCESS). |
| < 0 | Decoder is in error, current error status (CX_STATUS_…). |
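The calling convention described above (refill the input, call the process function, drain the output, reset both pointers, repeat until done) can be sketched as a driver loop. The following is only an illustration under stated assumptions: MOCK_STREAM, its field names, the MOCK_ constants, and MOCK_Process() are hypothetical stand-ins defined here so that the sketch compiles on its own. They are not the emCompress-Pro definitions; a real application would use CX_STREAM, CX_FLUSH_NONE/CX_FLUSH_EOF, and CX_DECODE_Process() as documented in this manual.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the emCompress-Pro definitions. */
#define MOCK_FLUSH_NONE     0u
#define MOCK_FLUSH_EOF      1u
#define MOCK_STATUS_SUCCESS 0
#define MOCK_STATUS_DONE    1

typedef struct {
  const unsigned char *pIn;     /* Next input byte to read       */
  size_t               InLen;   /* Unread input bytes remaining  */
  unsigned char       *pOut;    /* Next output byte to write     */
  size_t               OutLen;  /* Free output bytes remaining   */
} MOCK_STREAM;

/* Stand-in "decoder": copies input to output and advances the
   stream pointers, the way a real coroutine would advance them. */
static int MOCK_Process(MOCK_STREAM *pStream, unsigned Flags) {
  size_t n = pStream->InLen < pStream->OutLen ? pStream->InLen : pStream->OutLen;

  memcpy(pStream->pOut, pStream->pIn, n);
  pStream->pIn  += n;  pStream->InLen  -= n;
  pStream->pOut += n;  pStream->OutLen -= n;
  if (Flags == MOCK_FLUSH_EOF && pStream->InLen == 0) {
    return MOCK_STATUS_DONE;
  }
  return MOCK_STATUS_SUCCESS;
}

/* Driver loop: pushes pSrc through small I/O buffers into pDst.
   Returns the number of bytes written, or -1 on error. */
static long RunLoop(const unsigned char *pSrc, size_t SrcLen,
                    unsigned char *pDst, size_t DstCap) {
  unsigned char aIn[8];
  unsigned char aOut[5];
  MOCK_STREAM   Stream;
  size_t        SrcOff = 0;
  size_t        DstOff = 0;
  unsigned      Flags  = MOCK_FLUSH_NONE;
  int           Status;

  assert(pSrc != NULL && pDst != NULL);
  Stream.pIn  = aIn;   Stream.InLen  = 0;
  Stream.pOut = aOut;  Stream.OutLen = sizeof(aOut);
  for (;;) {
    if (Stream.InLen == 0 && Flags == MOCK_FLUSH_NONE) {
      size_t n = SrcLen - SrcOff;
      if (n > sizeof(aIn)) {
        n = sizeof(aIn);
      }
      memcpy(aIn, pSrc + SrcOff, n);
      SrcOff      += n;
      Stream.pIn   = aIn;            /* Reset input pointer after refill */
      Stream.InLen = n;
      if (SrcOff == SrcLen) {
        Flags = MOCK_FLUSH_EOF;      /* Nothing follows this chunk */
      }
    }
    Status = MOCK_Process(&Stream, Flags);
    if (Status < 0) {
      return -1;
    }
    {
      size_t n = sizeof(aOut) - Stream.OutLen;   /* Bytes produced */
      if (n > DstCap - DstOff) {
        return -1;                   /* Destination too small */
      }
      memcpy(pDst + DstOff, aOut, n);
      DstOff       += n;
      Stream.pOut   = aOut;          /* Reset output pointer after drain */
      Stream.OutLen = sizeof(aOut);
    }
    if (Status == MOCK_STATUS_DONE) {
      return (long)DstOff;
    }
  }
}
```

RunLoop() deliberately uses very small I/O buffers (8 and 5 bytes) so that both the refill path and the drain path run several times for even a short input. Note that the loop keeps calling the process function after the last chunk has been supplied, and only stops when the done status is returned.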
CX_DECODE_Exit()
Description
Finalize decoder and deallocate memory.
Prototype
void CX_DECODE_Exit(CX_DECODE_CONTEXT * pSelf);
Parameters
| Parameter | Description |
| pSelf | Pointer to decoder context. |
Conditioning functions
emCompress-Pro defines the following conditioning functions:
CX_PRECOND_T32_Run()
Description
Condition for Arm T32 ISA.
Prototype
void CX_PRECOND_T32_Run( U8 * pOutput,
const U8 * pInput,
U32 Len,
int Encode);
Parameters
| Parameter | Description |
| pOutput | Pointer to object that receives the conditioned output. |
| pInput | Pointer to object to condition. |
| Len | Octet length of input and output objects. |
| Encode | Nonzero to encode, zero to decode. |
Additional information
In-place conditioning is supported when pInput is equal to
pOutput. Other instances where the input and output overlap
result in undefined behavior.
CX_PRECOND_A32_Run()
Description
Condition for Arm A32 ISA.
Prototype
void CX_PRECOND_A32_Run( U8 * pOutput,
const U8 * pInput,
U32 Len,
int Encode);
Parameters
| Parameter | Description |
| pOutput | Pointer to object that receives the conditioned output. |
| pInput | Pointer to object to condition. |
| Len | Octet length of input and output objects. |
| Encode | Nonzero to encode, zero to decode. |
Additional information
In-place conditioning is supported when pInput is equal to
pOutput. Other instances where the input and output overlap
result in undefined behavior.
CX_PRECOND_A64_Run()
Description
Condition for Arm A64 ISA.
Prototype
void CX_PRECOND_A64_Run( U8 * pOutput,
const U8 * pInput,
U32 Len,
int Encode);
Parameters
| Parameter | Description |
| pOutput | Pointer to object that receives the conditioned output. |
| pInput | Pointer to object to condition. |
| Len | Octet length of input and output objects. |
| Encode | Nonzero to encode, zero to decode. |
Additional information
In-place conditioning is supported when pInput is equal to
pOutput. Other instances where the input and output overlap
result in undefined behavior.
CX_PRECOND_RV32_Run()
Description
Condition for RISC-V RV32I ISA.
Prototype
void CX_PRECOND_RV32_Run( U8 * pOutput,
const U8 * pInput,
U32 Len,
int Encode);
Parameters
| Parameter | Description |
| pOutput | Pointer to object that receives the conditioned output. |
| pInput | Pointer to object to condition. |
| Len | Octet length of input and output objects. |
| Encode | Nonzero to encode, zero to decode. |
Additional information
In-place conditioning is supported when pInput is equal to
pOutput. Other instances where the input and output overlap
result in undefined behavior.
CX_PRECOND_IA32_Run()
Description
Condition for Intel IA-32 ISA.
Prototype
void CX_PRECOND_IA32_Run( U8 * pOutput,
const U8 * pInput,
U32 Len,
int Encode);
Parameters
| Parameter | Description |
| pOutput | Pointer to object that receives the conditioned output. |
| pInput | Pointer to object to condition. |
| Len | Octet length of input and output objects. |
| Encode | Nonzero to encode, zero to decode. |
Additional information
In-place conditioning is supported when pInput is equal to
pOutput. Other instances where the input and output overlap
result in undefined behavior.
Utility functions
emCompress-Pro defines the following utility functions:
CX_PARAS_Clear()
Description
Clear all parameters to zero.
Prototype
void CX_PARAS_Clear(CX_PARAS * pParas);
Parameters
| Parameter | Description |
| pParas | Pointer to parameters. |
CX_GetErrorText()
Description
Get error status as printable string.
Prototype
char *CX_GetErrorText(int Status);
Return value
Zero-terminated error string.
CX_GetCopyrightText()
Description
Get copyright as printable string.
Prototype
char *CX_GetCopyrightText(void);
Return value
Zero-terminated copyright string.
CX_GetVersionText()
Description
Get version as printable string.
Prototype
char *CX_GetVersionText(void);
Return value
Zero-terminated version string.
Resource requirements
The memory requirements to compress and decompress data depend
entirely on the algorithm chosen and the parameters that the
algorithm is instantiated with. Algorithms that take more memory
and more time compress better than those that use less memory
and less time.
This section details the general memory requirements for compression
and decompression.
Memory requirements can be divided into static memory requirements and dynamic memory requirements.
Static memory requirements
Static memory requirements are known at compile time and are required by the emCompress-Pro library regardless of the selected compression algorithm. The structures listed below require about 150 bytes of static RAM.
| Purpose | Structures |
| Encode/decode contexts | CX_ENCODE_CONTEXT, CX_DECODE_CONTEXT |
| Compression parameters | CX_PARAS |
| Algorithm specific API references | CX_…_ENCODE / CX_…_DECODE |
| Memory context | SEGGER_MEM_CONTEXT, SEGGER_MEM_SBUFFER |
Dynamic memory requirements
Dynamic memory is allocated by the compression/decompression algorithms at run time using a suitable dynamic memory allocator. See section Providing dynamic memory for more information on how to provide dynamic memory.
Since the memory requirements depend on many parameters, formulas are provided to calculate estimates for the memory requirements. An Excel sheet, which can calculate the memory requirements based on the compression parameters, is also provided in the shipping package of emCompress-Pro. The actual memory requirements may differ due to word sizes, alignment and memory management overhead. The Excel sheet assumes an alignment of 4 bytes for each block of dynamic memory, while the formulas below do not include alignment for simplicity.
The dynamic memory requirement of each algorithm is the sum of the memory requirements of the LZSS algorithm and the algorithm-specific memory requirements. The memory requirements of the LZSS algorithm for encoding are presented below. The requirements of the algorithms which use the LZSS algorithm are presented in the following sections, including the accumulated requirements for decoding.
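The effect of the 4-byte alignment mentioned above can be approximated by rounding each dynamically allocated block up before summing. The sketch below only illustrates that rounding; the function names are made up for this example, and it does not model the bookkeeping overhead of any particular memory allocator.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds Size up to the next multiple of Align.
   Align must be a nonzero power of two. */
static size_t AlignUp(size_t Size, size_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0);
  return (Size + Align - 1) & ~(Align - 1);
}

/* Sums a list of block sizes after rounding each block to 4 bytes. */
static size_t SumAligned4(const size_t *paBlocks, size_t NumBlocks) {
  size_t Total = 0;
  size_t i;

  for (i = 0; i < NumBlocks; ++i) {
    Total += AlignUp(paBlocks[i], 4);
  }
  return Total;
}
```

For instance, a 16385-byte block occupies 16388 bytes once rounded up to a 4-byte boundary, so estimates that include alignment come out slightly higher than the plain formulas.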
LZSS requirements for encoding
The LZSS dynamic memory requirements depend on these parameters:
- Window size (WS): Size of the window in which the algorithm looks for matches.
- Hash table size (HTS): Default: 256. Number of entries in the hash matching tables. Configured via the CX_LZSS_HASH_TABLE_SIZE configuration flag.
- Window index size (WIS): Default: 2. Size of the datatype used for indexing the window in bytes. Configured via the CX_LZSS_ENCODE_MAX_WINDOW_SIZE configuration flag.
- Maximum match length (MML): Depends on the algorithm, valid ranges are specified in the sections about the respective algorithms.
| Description | Formula |
| LZSS: Window | WS + MML + 2 |
| LZSS: Matcher chain pointers | (WS + MML + 2) * WIS |
| LZSS: Matcher hash table | HTS * WIS |
| LZSS Total | HTS * WIS + (WS + MML + 2) * (WIS + 1) |
Example: WS = 32768, MML = 258 (for LZPJ), HTS = 256, WIS = 2:
Memory consumption = 256 * 2 + (32768 + 258 + 2) * (2 + 1) = 99596
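The LZSS table above can be evaluated mechanically. The following is a minimal sketch, assuming all sizes are in bytes and ignoring alignment and allocator overhead, as the formulas do. The function name LZSS_EncodeRAM() is made up for this illustration and is not part of the emCompress-Pro API.

```c
#include <assert.h>

/* Estimated dynamic RAM of the LZSS encoder, in bytes.
   WS  = window size, MML = maximum match length,
   HTS = hash table entries, WIS = window index size in bytes. */
static unsigned long LZSS_EncodeRAM(unsigned long WS, unsigned long MML,
                                    unsigned long HTS, unsigned long WIS) {
  unsigned long Window;
  unsigned long Chain;
  unsigned long Hash;

  assert(WIS >= 1);             /* An index occupies at least one byte */
  Window = WS + MML + 2;        /* LZSS: Window                 */
  Chain  = Window * WIS;        /* LZSS: Matcher chain pointers */
  Hash   = HTS * WIS;           /* LZSS: Matcher hash table     */
  return Window + Chain + Hash;
}
```

With the example parameters (WS = 32768, MML = 258, HTS = 256, WIS = 2) the function returns 99596, matching the figure above.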
SMASH-2 algorithm
Encoding sizes
The SMASH-2 encoder requires approximately 3.2 kB of ROM when using Arm Thumb-2 instructions.
The following are the dynamic RAM requirements for the SMASH-2 encoder:
| Parameter | Allowed range |
| Window size (WS) | 256 to 16384 |
| Maximum match length (MML) | 2 to 258 |
| Description | Formula |
| LZSS total | HTS * WIS + (WS + MML + 2) * (WIS + 1) |
| SMASH-2 context | 116 |
| Total | 116 + HTS * WIS + (WS + MML + 2) * (WIS + 1) |
Example: WS = 16384, MML = 258, HTS = 256, WIS = 2:
Memory consumption = 116 + 256 * 2 + (16384 + 258 + 2) * (2 + 1) = 50560
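When parameters are chosen programmatically, it can help to combine the range check and the estimate in one place. The sketch below does this for the SMASH-2 encoder using the ranges and the formula from the tables above; the function name is hypothetical, and returning 0 for out-of-range parameters is a convention chosen for this example only.

```c
#include <assert.h>

/* Estimated dynamic RAM of the SMASH-2 encoder, in bytes.
   Returns 0 when WS or MML is outside the allowed range. */
static unsigned long SMASH2_EncodeRAM(unsigned long WS, unsigned long MML,
                                      unsigned long HTS, unsigned long WIS) {
  assert(WIS >= 1);                      /* An index occupies at least one byte */
  if (WS < 256 || WS > 16384) {
    return 0;                            /* Window size out of range  */
  }
  if (MML < 2 || MML > 258) {
    return 0;                            /* Match length out of range */
  }
  return 116                             /* SMASH-2 context */
       + HTS * WIS                       /* LZSS hash table */
       + (WS + MML + 2) * (WIS + 1);     /* LZSS window and chain pointers */
}
```

For WS = 16384, MML = 258, HTS = 256, WIS = 2 the function returns 50560. A window size of 32768 is rejected because it exceeds the SMASH-2 limit.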
Decoding sizes
The SMASH-2 decoder requires approximately 1.3 kB of ROM when using Arm Thumb-2 instructions.
The following are the dynamic RAM requirements for the SMASH-2 decoder:
| Description | Formula |
| LZSS window | (WS + 1) |
| SMASH-2 context | 48 |
| Total | 48 + (WS + 1) |
Example: WS = 16384
Memory consumption = 48 + (16384 + 1) = 16433
Processing speed
The tables below list typical compression/decompression values reachable with SMASH-2. Tests were done on a Cortex-M7 MCU running at 200 MHz, using internal RAM. The input/output buffer sizes were each 64 KB.
Compression on the target was performed with CX_LZSS_HASH_TABLE_SIZE = 0, CX_LZSS_ENCODE_LARGE_WINDOW = 0. Instruction set optimization was off (0). Ratio denotes the compression ratio, Rate denotes the achieved input data rate, and Memory denotes the amount of dynamic memory required.
| Window size | Optimization = 0 Ratio / Rate [kB/s] | Optimization = 2 Ratio / Rate [kB/s] | Optimization = 10 Ratio / Rate [kB/s] | Memory [kB] |
| 256 | 82.9% / 523.2 | 70.1% / 523.2 | 62.9% / 350.5 | 2.2 |
| 512 | 80.4% / 527.7 | 61.3% / 476.0 | 57.9% / 272.4 | 3.0 |
| 1024 | 79.2% / 531.8 | 56.3% / 410.6 | 54.1% / 190.5 | 4.5 |
| 2048 | 73.1% / 479.0 | 52.6% / 320.3 | 51.0% / 121.0 | 7.6 |
| 4096 | 65.6% / 401.7 | 49.8% / 225.4 | 48.5% / 70.7 | 13.7 |
| 8192 | 61.3% / 306.9 | 47.8% / 147.9 | 46.7% / 38.5 | 26.0 |
| 16384 | 58.7% / 214.4 | 46.6% / 94.2 | 45.8% / 29.4 | 51.0 |
For the decompression tests, compressed data was prepared on the PC with CX_LZSS_HASH_TABLE_SIZE = 1, CX_LZSS_ENCODE_LARGE_WINDOW = 1. Parameters: Instruction set optimization = off, Optimization = 10, default values for minimum and maximum match lengths.
| Window size | Ratio | Input rate [kB/s] | Output rate [kB/s] | Dynamic memory [kB] |
| 256 | 62.9% | 904.9 | 1438.9 | 0.3 |
| 512 | 57.9% | 885.9 | 1530.2 | 0.6 |
| 1024 | 54.1% | 875.9 | 1619.7 | 1.1 |
| 2048 | 51.0% | 870.5 | 1708.2 | 2.1 |
| 4096 | 48.5% | 866.3 | 1787.8 | 4.2 |
| 8192 | 46.7% | 865.2 | 1852.7 | 8.3 |
| 16384 | 45.8% | 874.8 | 1911.3 | 16.5 |
DEFLATE algorithm
Encoding sizes
The DEFLATE encoder requires approximately 6.2 kB of ROM when using Arm Thumb-2 instructions.
The following are the dynamic RAM requirements for the DEFLATE encoder:
| Parameter | Allowed range |
| Window size (WS) | 16 to 32768 |
| Maximum match length (MML) | 18 to 258 |
| Block length (BL) | > 0, usually 16384 - 32768 |
| Description | Formula |
| LZSS total | HTS * WIS + (WS + MML + 2) * (WIS + 1) |
| DEFLATE context | 6124 |
| Block buffer | BL |
| Total | 6124 + BL + HTS * WIS + (WS + MML + 2) * (WIS + 1) |
Example: BL = 16384, WS = 16384, MML = 258, HTS = 256, WIS = 2
Memory consumption = 6124 + 16384 + 256 * 2 + (16384 + 258 + 2) * (2 + 1) = 72952
Decoding sizes
The DEFLATE decoder requires approximately 2.5 kB of ROM when using Arm Thumb-2 instructions.
The following are the dynamic RAM requirements for the DEFLATE decoder:
| Description | Formula |
| LZSS window | WS |
| DEFLATE context | 416 |
| Overflow buffer | 260 |
| Dynamic Huffman table | 2124 |
| Total | 2800 + WS |
Example: WS = 16384
Memory consumption = 2800 + 16384 = 19184
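The fixed part of the DEFLATE decoder estimate is the sum of three table entries, so it is easy to keep the constant 2800 and its components in sync with a compile-time check. The sketch below shows one way to do that; all identifiers are made up for this illustration, and the accepted window range is taken from the encoder table above on the assumption that the decoder handles the same window sizes.

```c
#include <assert.h>

/* Fixed parts of the DEFLATE decoder's dynamic RAM, from the table. */
enum {
  DEFLATE_DEC_CONTEXT  = 416,
  DEFLATE_DEC_OVERFLOW = 260,
  DEFLATE_DEC_HUFFMAN  = 2124,
  DEFLATE_DEC_FIXED    = DEFLATE_DEC_CONTEXT
                       + DEFLATE_DEC_OVERFLOW
                       + DEFLATE_DEC_HUFFMAN
};

/* Compile-time check that also works with C89 compilers: the array
   size becomes negative, and compilation fails, if the parts do not
   add up to 2800. */
typedef char DEFLATE_DEC_FIXED_IS_2800[(DEFLATE_DEC_FIXED == 2800) ? 1 : -1];

/* Estimated dynamic RAM of the DEFLATE decoder, in bytes. */
static unsigned long DEFLATE_DecodeRAM(unsigned long WS) {
  assert(WS >= 16 && WS <= 32768);   /* Assumed range, see lead-in */
  return (unsigned long)DEFLATE_DEC_FIXED + WS;
}
```

DEFLATE_DecodeRAM(16384) evaluates to 19184, the value given in the example above.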
Processing speed
The tables below list typical compression/decompression values reachable with DEFLATE. Tests were done on a Cortex-M7 MCU running at 200 MHz, using internal RAM. The input/output buffer sizes were each 64 KB.
Compression on the target was performed with CX_LZSS_HASH_TABLE_SIZE = 0, CX_LZSS_ENCODE_LARGE_WINDOW = 0. Block size was set to 16384. Ratio denotes the compression ratio, Rate denotes the achieved input data rate, and Memory denotes the amount of dynamic memory required.
| Window size | Optimization = 0 Ratio / Rate [kB/s] | Optimization = 2 Ratio / Rate [kB/s] | Optimization = 10 Ratio / Rate [kB/s] | Memory [kB] |
| 256 | 57.3% / 323.3 | 57.3% / 323.3 | 54.3% / 227.6 | 24.6 |
| 512 | 56.1% / 324.1 | 53.4% / 297.4 | 51.7% / 182.3 | 25.4 |
| 1024 | 55.3% / 325.6 | 50.7% / 257.7 | 49.5% / 129.7 | 26.9 |
| 2048 | 51.3% / 295.6 | 48.5% / 205.2 | 47.6% / 85.3 | 30.0 |
| 4096 | 48.8% / 250.7 | 46.7% / 151.2 | 46.0% / 53.9 | 36.1 |
| 8192 | 47.1% / 198.2 | 45.6% / 108.8 | 45.2% / 38.4 | 48.4 |
| 16384 | 46.1% / 149.2 | 45.2% / 85.0 | 45.0% / 38.2 | 73.0 |
| 32768 | 45.5% / 110.8 | 45.1% / 68.8 | 45.0% / 38.2 | 122.1 |
For the decompression tests, compressed data was prepared on the PC with CX_LZSS_HASH_TABLE_SIZE = 1, CX_LZSS_ENCODE_LARGE_WINDOW = 1. Parameters: Block length = 32768, optimization = 10, default values for minimum and maximum match lengths.
| Window size | Ratio | Input rate [kB/s] | Output rate [kB/s] | Dynamic memory [kB] |
| 256 | 54.3% | 353.9 | 652.3 | 3.0 |
| 512 | 51.5% | 359.6 | 697.6 | 3.3 |
| 1024 | 49.3% | 366.4 | 743.9 | 3.8 |
| 2048 | 47.2% | 373.3 | 791.1 | 4.8 |
| 4096 | 45.4% | 379.7 | 835.6 | 6.9 |
| 8192 | 44.3% | 385.0 | 869.4 | 11.0 |
| 16384 | 43.8% | 390.2 | 891.6 | 19.2 |
| 32768 | 43.6% | 393.4 | 901.5 | 35.6 |
LZPJ algorithm
Encoding sizes
The LZPJ encoder requires approximately 2.5 kB of ROM when using Arm Thumb-2 instructions.
The following are the RAM requirements for the LZPJ encoder:
| Parameter | Allowed range |
| Window size (WS) | 16 to 32768 |
| Maximum match length (MML) | 18 to 258 |
| Description | Formula |
| LZSS total | HTS * WIS + (WS + MML + 2) * (WIS + 1) |
| LZPJ context | 108 |
| Total | 108 + HTS * WIS + (WS + MML + 2) * (WIS + 1) |
Example: WS = 32768, MML = 258, HTS = 256, WIS = 2
Memory consumption = 108 + 256 * 2 + (32768 + 258 + 2) * (2 + 1) = 99704
Decoding sizes
The LZPJ decoder requires approximately 1.0 kB of ROM when using Arm Thumb-2 instructions.
The following are the RAM requirements for the LZPJ decoder:
| Description | Formula |
| LZSS window | (WS + 1) |
| LZPJ context | 40 |
| Total | 40 + (WS + 1) |
Example: WS = 32768
Memory consumption = 40 + (32768 + 1) = 32809
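The decoder formula can also be used in the other direction: given a RAM budget on the target, find the largest window that still fits. The sketch below does this for the LZPJ decoder. It is an illustration only: the function names are hypothetical, the window range 16 to 32768 is taken from the encoder table above, and the search is restricted to power-of-two window sizes (the sizes used in the tables of this section) purely for simplicity.

```c
#include <assert.h>

/* Estimated dynamic RAM of the LZPJ decoder, in bytes. */
static unsigned long LZPJ_DecodeRAM(unsigned long WS) {
  assert(WS >= 16 && WS <= 32768);
  return 40 + (WS + 1);            /* LZPJ context + LZSS window */
}

/* Returns the largest power-of-two window size whose decoder estimate
   fits into Budget bytes, or 0 if even the smallest window does not fit. */
static unsigned long LZPJ_MaxWindowForBudget(unsigned long Budget) {
  unsigned long WS;
  unsigned long Best = 0;

  for (WS = 16; WS <= 32768; WS *= 2) {
    if (LZPJ_DecodeRAM(WS) <= Budget) {
      Best = WS;
    }
  }
  return Best;
}
```

With a budget of 16384 bytes the result is a window of 8192 bytes, because a 16384-byte window needs 16425 bytes in total. Alignment and allocator overhead are not included, so some margin should be left in practice.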
Processing speed
The tables below list typical compression/decompression values reachable with LZPJ. Tests were done on a Cortex-M7 MCU running at 200 MHz, using internal RAM. The input/output buffer sizes were each 64 KB.
Compression on the target was performed with CX_LZSS_HASH_TABLE_SIZE = 0, CX_LZSS_ENCODE_LARGE_WINDOW = 0. Ratio denotes the compression ratio, Rate denotes the achieved input data rate, and Memory denotes the amount of dynamic memory required.
| Window size | Optimization = 0 Ratio / Rate [kB/s] | Optimization = 2 Ratio / Rate [kB/s] | Optimization = 10 Ratio / Rate [kB/s] | Memory [kB] |
| 256 | 82.9% / 594.2 | 82.9% / 594.2 | 74.8% / 434.9 | 2.2 |
| 512 | 80.4% / 603.6 | 75.8% / 586.1 | 69.0% / 353.6 | 3.0 |
| 1024 | 79.2% / 609.8 | 68.2% / 519.8 | 64.4% / 250.2 | 4.5 |
| 2048 | 73.1% / 596.1 | 63.1% / 412.9 | 60.5% / 159.0 | 7.6 |
| 4096 | 65.6% / 512.4 | 59.3% / 291.7 | 57.4% / 90.1 | 13.7 |
| 8192 | 61.3% / 394.2 | 56.6% / 190.3 | 55.1% / 47.7 | 26.0 |
| 16384 | 58.7% / 274.0 | 55.0% / 118.9 | 53.9% / 36.9 | 51.0 |
| 32768 | 56.8% / 178.9 | 53.9% / 75.6 | 53.0% / 24.4 | 99.8 |
For the decompression tests, compressed data was prepared on the PC with CX_LZSS_HASH_TABLE_SIZE = 1, CX_LZSS_ENCODE_LARGE_WINDOW = 1. Parameters: Optimization = 10, default values for minimum and maximum match lengths.
| Window size | Ratio | Input rate [kB/s] | Output rate [kB/s] | Dynamic memory [kB] |
| 256 | 74.8% | 945.9 | 1265.2 | 0.3 |
| 512 | 69.0% | 942.3 | 1366.0 | 0.6 |
| 1024 | 64.4% | 943.8 | 1466.4 | 1.1 |
| 2048 | 60.5% | 948.0 | 1567.3 | 2.1 |
| 4096 | 57.4% | 954.3 | 1663.0 | 4.2 |
| 8192 | 55.1% | 962.0 | 1746.3 | 8.3 |
| 16384 | 53.9% | 973.6 | 1807.3 | 16.5 |
| 32768 | 53.0% | 989.1 | 1867.2 | 32.9 |
LZMA algorithm
Encoding sizes
The LZMA encoder requires approximately 4.2 kB of ROM when using Arm Thumb-2 instructions.
The following are the RAM requirements for the LZMA encoder:
| Parameter | Allowed range |
| Window size (WS) | 1 to 16777216 |
| Maximum match length (MML) | 2 to 273 |
| Literal context (LC) | 0 to 8 |
| Literal position (LP) | 0 to 4 |
| Position bits (PB) | 0 to 4 |
| Description | Formula |
| LZSS total | HTS * WIS + (WS + MML + 2) * (WIS + 1) |
| LZMA context | 2608 |
| Range encoder | 1536 * 2^(LC + LP) + 24 * 2^PB |
| Total | 2608 + 1536 * 2^(LC + LP) + 24 * 2^PB + HTS * WIS + (WS + MML + 2) * (WIS + 1) |
Example: WS = 32768, MML = 273, HTS = 256, WIS = 2, LC = 3, LP = 0, PB = 2
Memory consumption = 2608 + 1536 * 2^(3 + 0) + 24 * 2^2 + 256 * 2 + (32768 + 273 + 2) * (2 + 1) = 114633
Decoding sizes
The LZMA decoder requires approximately 2.5 kB of ROM when using Arm Thumb-2 instructions.
The following are the RAM requirements for the LZMA decoder:
| Description | Formula |
| LZSS window | (WS + 1) |
| LZMA context | 2248 |
| LZMA LC + LP probabilities | 1536 * 2^(LC + LP) |
| LZMA PB probabilities | 160 * 2^PB |
| Total | 2248 + 1536 * 2^(LC + LP) + 160 * 2^PB + (WS + 1) |
Example: WS = 32768, LC = 3, LP = 0, PB = 2
Memory consumption = 2248 + 1536 * 2^(3 + 0) + 160 * 2^2 + (32768 + 1) = 47945
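The LZMA formulas contain power-of-two terms that are easy to get wrong by hand, so a small calculator can be useful. The following sketch evaluates the encoder and decoder formulas from the tables above. The function names are made up for this illustration, sizes are in bytes, and alignment and allocator overhead are ignored, as in the formulas.

```c
#include <assert.h>

/* Estimated dynamic RAM of the LZMA encoder, in bytes. */
static unsigned long LZMA_EncodeRAM(unsigned long WS, unsigned long MML,
                                    unsigned long HTS, unsigned long WIS,
                                    unsigned LC, unsigned LP, unsigned PB) {
  assert(LC <= 8 && LP <= 4 && PB <= 4);     /* Ranges from the table */
  return 2608ul                              /* LZMA context  */
       + 1536ul * (1ul << (LC + LP))         /* Range encoder */
       + 24ul * (1ul << PB)
       + HTS * WIS                           /* LZSS hash table */
       + (WS + MML + 2) * (WIS + 1);         /* LZSS window and chain pointers */
}

/* Estimated dynamic RAM of the LZMA decoder, in bytes. */
static unsigned long LZMA_DecodeRAM(unsigned long WS,
                                    unsigned LC, unsigned LP, unsigned PB) {
  assert(LC <= 8 && LP <= 4 && PB <= 4);     /* Ranges from the table */
  return 2248ul                              /* LZMA context          */
       + 1536ul * (1ul << (LC + LP))         /* LC + LP probabilities */
       + 160ul * (1ul << PB)                 /* PB probabilities      */
       + (WS + 1);                           /* LZSS window           */
}
```

LZMA_DecodeRAM(32768, 3, 0, 2) evaluates to 47945, the decoder figure given above. The term 1536 * 2^(LC + LP) dominates for large LC + LP: raising the sum from 3 to 4 adds 12288 bytes to both the encoder and the decoder estimate.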
Processing speed
The tables below list typical compression/decompression values reachable with LZMA. Tests were done on a Cortex-M7 MCU running at 200 MHz, using internal RAM. The input/output buffer sizes were each 64 KB.
Compression on the target was performed with CX_LZSS_HASH_TABLE_SIZE = 0, CX_LZSS_ENCODE_LARGE_WINDOW = 0. Further parameters: LC = 3, LP = 0, PB = 2. Ratio denotes the compression ratio, Rate denotes the achieved input data rate, and Memory denotes the amount of dynamic memory required.
| Window size | Optimization = 0 Ratio / Rate [kB/s] | Optimization = 2 Ratio / Rate [kB/s] | Optimization = 10 Ratio / Rate [kB/s] | Memory [kB] |
| 256 | 53.9% / 365.6 | 53.9% / 365.6 | 51.4% / 270.8 | 17.1 |
| 512 | 52.9% / 367.3 | 50.7% / 340.9 | 49.1% / 217.9 | 17.9 |
| 1024 | 52.2% / 368.7 | 48.2% / 300.7 | 47.0% / 155.7 | 19.5 |
| 2048 | 48.6% / 340.9 | 45.9% / 245.7 | 45.0% / 101.3 | 22.5 |
| 4096 | 46.0% / 296.6 | 43.9% / 181.4 | 43.2% / 59.0 | 28.7 |
| 8192 | 44.0% / 237.0 | 42.3% / 122.0 | 41.7% / 31.3 | 41.0 |
| 16384 | 42.6% / 171.6 | 41.2% / 77.0 | 40.8% / 21.4 | 65.5 |
| 32768 | 41.4% / 114.9 | 40.3% / 49.0 | 39.9% / 22.5 | 114.7 |
For the decompression tests, compressed data was prepared on the PC with CX_LZSS_HASH_TABLE_SIZE = 1, CX_LZSS_ENCODE_LARGE_WINDOW = 1. Parameters: Optimization = 10, LC = 3, LP = 0, PB = 2, default values for minimum and maximum match lengths.
| Window size | Ratio | Input rate [kB/s] | Output rate [kB/s] | Dynamic memory [kB] |
| 256 | 51.4% | 440.7 | 857.7 | 15.5 |
| 512 | 49.1% | 449.2 | 915.2 | 15.7 |
| 1024 | 47.0% | 457.6 | 973.2 | 16.3 |
| 2048 | 45.0% | 463.3 | 1029.4 | 17.3 |
| 4096 | 43.2% | 468.4 | 1085.0 | 19.3 |
| 8192 | 41.7% | 474.5 | 1137.6 | 23.4 |
| 16384 | 40.8% | 480.5 | 1178.2 | 31.6 |
| 32768 | 39.9% | 485.6 | 1215.5 | 48.0 |
| 65536 | 39.5% | 489.4 | 1240.1 | 80.8 |
Utilities
This section presents the utilities shipped with emCompress-Pro:
- CX_Optimize, a utility to benchmark the compression achieved by algorithms for a particular input.
- CX_Tool, a utility to compress and decompress files.
Compression algorithm comparison
emCompress-Pro ships with an application which compares the compression performance of included algorithms, CX_Optimize.
Command line and options
The emCompress-Pro example compression application accepts the command
line options described in the following sections.
The command line syntax is:
CX_Optimize [options] <inputfile>
Options
| Option | Description |
| --help | Displays the help message. |
| --algo=<algo> | Determine optimum for <algo>. Default: all, other allowed values: deflate, lzma, lzpj, smash2. |
| --ws=<win-size> | Use a window size of <win-size> bytes. |
| --wsmin=<min-size> | Start iteration over window sizes at <min-size>. |
| --wsmax=<max-size> | Stop iteration over window sizes at <max-size>. |
| --fw=auto | Find the optimum option for the instruction set. |
| --fw=16 | Assume program code using 16-bit instruction set. |
| --fw=32 | Assume program code using 32-bit instruction set. |
| --block-size=<block-size> | Set the block size. Default: 32768. |
| --lcmin=<lc-min> | Set the minimum value of LC to try. |
| --lcmax=<lc-max> | Set the maximum value of LC to try. |
| --lpmin=<lp-min> | Set the minimum value of LP to try. |
| --lpmax=<lp-max> | Set the maximum value of LP to try. |
| --pbmin=<pb-min> | Set the minimum value of PB to try. |
| --pbmax=<pb-max> | Set the maximum value of PB to try. |
| --xz-compat | Set XZ-tool compatibility mode (LC + LP ≤ 4). |
| --t32 | Use Arm T32 conditioner. |
| --a32 | Use Arm A32 conditioner. |
| --a64 | Use Arm A64 conditioner. |
| --rv32 | Use RISC-V RV32 conditioner. |
| --ia32 | Use Intel IA-32 conditioner. |
The CX_Optimize tool, when called with only a filename as parameter, lists the compression efficiencies for all algorithms. It iterates over a range of common window sizes. Results are only listed for those algorithms for which the window size is valid.
Optimizing SMASH-2 compression
The CX_Optimize tool can optimize compression using the SMASH-2 algorithm by using an optimization which is specific to the width of the employed instruction set. By specifying --fw=auto, the tool will determine the optimum setting automatically. Below, an invocation with --fw=auto determines that the file is best compressed with an optimization for 32-bit wide instructions:
CX_Optimize --algo=smash2 --fw=auto ./Firmware.input
Optimizing LZMA compression
LZMA compression can be optimized by adjusting the LC, LP and PB parameters. The default value of LC = 3 is well suited for text, since it chooses the probability trackers based on the upper three bits of the character. This means that the probability trackers can adapt to different probabilities of characters following lower- and upper-case characters. For firmware code, setting LP = 2 can lead to better results, since the probability trackers will then be based on the position of the current byte instead.
If compatibility with the XZ tool of the LZMA suite is desired, the --xz-compat parameter should be specified to restrict the sum of LC and LP to the range allowed by the XZ tool.
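How LC and LP select a probability tracker can be made concrete. In the standard LZMA format, the literal coder picks its probability table from the low LP bits of the current position combined with the high LC bits of the preceding byte. The sketch below shows that index computation together with the XZ restriction LC + LP ≤ 4. It illustrates the standard LZMA scheme rather than emCompress-Pro's internal code, and the function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the literal probability table in the standard LZMA scheme:
   low LP bits of the position, followed by the high LC bits of the
   preceding byte. The result is always below 2^(LC + LP). */
static unsigned LiteralContext(uint32_t Pos, uint8_t PrevByte,
                               unsigned LC, unsigned LP) {
  unsigned PosPart;
  unsigned BytePart;

  assert(LC <= 8 && LP <= 4);
  PosPart  = (unsigned)(Pos & ((1u << LP) - 1u));
  BytePart = (unsigned)PrevByte >> (8u - LC);
  return (PosPart << LC) + BytePart;
}

/* Nonzero when the LC/LP pair is within the limit of the XZ tool. */
static int IsXzCompatible(unsigned LC, unsigned LP) {
  return (LC + LP) <= 4;
}
```

With LC = 3 and LP = 0, a preceding 'A' (0x41) selects table 2 while a preceding 'a' (0x61) selects table 3, which is how the coder can track different statistics after upper- and lower-case letters. With LC = 0 and LP = 2, the table depends only on the position modulo 4, which suits data made of 32-bit words. The number of distinct tables, 2^(LC + LP), is consistent with the factor that appears in the LZMA memory formulas.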
Invoking the CX_Optimize tool on the 32-bit instruction set firmware image used above determines the best setting for the LP parameter to be 2:
CX_Optimize --algo=lzma --lcmin=0 --lcmax=4 --lpmin=0 --lpmax=4 --xz-compat --pbmin=0 --pbmax=4 --wsmin=8192 --wsmax=32768 ./Firmware.input
When running CX_Optimize on a 16-bit firmware image, the best setting for LP is determined to be 1:
Applying the same command to an HTML file determines the following optimum settings:
This section describes the compression/decompression utility which ships with emCompress-Pro, CX_Tool.
Command line and options
The command line syntax is:
CX_Tool [options] inputfile outputfile
Options
| Option | Description |
| --help | Displays the help message. |
| -c <algo> | Compress using <algo>: deflate, lzma, lzpj or smash2. |
| -d <algo> | Decompress using <algo>: deflate, lzma, lzpj or smash2. |
| --ws=<win-size> | Use a window size of <win-size> bytes. |
| --opt=<opt> | Set optimization level (0-10), default: 10. |
| --minlen=<min-len> | Set minimum match length. |
| --maxlen=<max-len> | Set maximum match length. |
| --isa=auto | Find the optimum option for the instruction set. |
| --isa=1 | Assume program code using 16-bit instruction set. |
| --isa=2 | Assume program code using 32-bit instruction set. |
| --block-size=<block-size> | Set the block size. Default: 32768. |
| --lc=<lc> | Set the value of LC. |
| --lp=<lp> | Set the value of LP. |
| --pb=<pb> | Set the value of PB. |
This section provides examples of usage of the CX_Tool utility.
The following command compresses the file Firmware.input using the LZMA algorithm and stores it in the file Firmware.lzma:
CX_Tool -c lzma --lc=1 --lp=2 --pb=2 --ws=32768 --opt=10
Firmware.input Firmware.lzma
The following command decompresses the file which was compressed above:
CX_Tool -d lzma --lc=1 --lp=2 --pb=2 --ws=32768
Firmware.lzma Firmware.output
Glossary
Bitstream
A sequence of bits read on a bit-by-bit basis.
Codec
Coder-decoder. A device or algorithm capable of coding or decoding a digital data stream. A lossless compressor and decompressor combination constitutes a codec.
Compressor
An algorithm that attempts to find redundancy in data and remove that redundancy, thereby compressing the data to use fewer bits.
Decompressor
An algorithm that reverses the effect of compression and recovers the original data from the encoded bitstream.
ISA
Instruction Set Architecture. Defines the instruction set, registers and memory access models which an MCU supports.
KB
Kilobyte. Defined as either 1,024 or 1,000 bytes by context. In the microcontroller world and this manual it is understood to be 1,024 bytes and is routinely shortened further to “K” when describing microcontroller RAM or flash sizes.
LZMA
Lempel-Ziv-Markov chain algorithm. A compression algorithm which combines LZSS with a sophisticated range encoder.
LZPJ
Lempel-Ziv “Plain Jane” algorithm. An algorithm which provides a bitstream to encode LZSS compressed data.
LZSS
Lempel-Ziv-Storer-Szymanski. A compression scheme that is based on LZ77.
SMASH
Small Microcontroller Advanced Super-High format. SEGGER’s proprietary format for compressing data.
XZ tool
A tool which is part of the XZ Utils which can handle LZMA compression/decompression. Website: https://tukaani.org/xz/
Indexes
Index of functions
Index of types