undr.decode

Implementation of all supported compression formats.

To add support for a new format, create a subclass of Compression that implements its abstract methods (see, for instance, BrotliCompression), and add it to compression_from_dict().
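As a rough illustration of that recipe, the sketch below adds a hypothetical gzip format. It does not import undr: `Compression` and `Decoder` are minimal stand-ins that only mirror the members documented on this page (`hash`, `size`, `suffix`, `decoder`, `decompress`, `finish`), and `GzipCompression` is not part of the package. Decompression relies on the standard library's `zlib.decompressobj`. Registering the new class in compression_from_dict() is not shown.

```python
import abc
import dataclasses
import zlib


class Decoder(abc.ABC):
    """Stand-in for undr.decode.Decoder (assumed interface)."""

    @abc.abstractmethod
    def decompress(self, buffer: bytes) -> bytes: ...

    @abc.abstractmethod
    def finish(self) -> tuple[bytes, bytes]: ...


@dataclasses.dataclass
class Compression(abc.ABC):
    """Stand-in for undr.decode.Compression (assumed fields)."""

    hash: str
    size: int
    suffix: str

    @abc.abstractmethod
    def decoder(self, word_size: int) -> Decoder: ...


@dataclasses.dataclass
class GzipCompression(Compression):
    """Hypothetical gzip format, not part of undr."""

    class Decoder(Decoder):
        def __init__(self, word_size: int):
            self.word_size = word_size
            self.remaining = b""
            # wbits=MAX_WBITS | 16 makes zlib expect a gzip header and trailer.
            self.inflater = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)

        def _align(self, data: bytes) -> bytes:
            # Prepend the previous partial word, then keep the new partial word.
            data = self.remaining + data
            cut = len(data) - len(data) % self.word_size
            self.remaining = data[cut:]
            return data[:cut]

        def decompress(self, buffer: bytes) -> bytes:
            return self._align(self.inflater.decompress(buffer))

        def finish(self) -> tuple[bytes, bytes]:
            # Flush any output still buffered by zlib, then report leftovers.
            output = self._align(self.inflater.flush())
            return output, self.remaining

    def decoder(self, word_size: int) -> "GzipCompression.Decoder":
        return GzipCompression.Decoder(word_size)
```

The nested `Decoder` class follows the layout shown in this reference (each format exposes its own `Decoder` and a `decoder(word_size)` factory method); the word-alignment helper is one possible way to honour the "multiple of the word size" contract.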

Overview

Classes

BrotliCompression

Implements Brotli decompression (https://github.com/google/brotli).

Compression

Represents a compressed file’s metadata.

Decoder

Abstract class for decoders. A decoder controls a decompression process.

DecompressFile

Decompresses a local file and writes decoded bytes to another local file.

NoneCompression

Placeholder format for uncompressed files.

Progress

Represents decompression progress for a given resource.

Functions

compression_from_dict

Factory for compression formats.

Module Contents

class undr.decode.BrotliCompression

Bases: Compression

Implements Brotli decompression (https://github.com/google/brotli).

Parameters:

word_size (int) – The resource’s word size in bytes.

class Decoder(word_size: int)

Bases: NoneCompression.Decoder

Decoder for Brotli-compressed resources. A decoder controls a decompression process.

decompress(buffer: bytes)

Consumes a buffer and produces decompressed bytes.

Parameters:

buffer (bytes) – Compressed input bytes.

Returns:

Decompressed output bytes. Their length must be a multiple of the word size.

Return type:

bytes

finish()

Tells the decoder that all input bytes have been read.

Returns:

Decompressed output bytes, whose length must be a multiple of the word size, and remaining bytes, whose length must be strictly smaller than the word size. A non-zero number of remaining bytes usually indicates an issue (erroneous configuration or corrupted data).

Return type:

tuple[bytes, bytes]

decoder(word_size: int)

Creates a new decoder for this compression.

Parameters:

word_size (int) – The resource’s word size in bytes.

Returns:

A decompression manager for this compression format.

Return type:

Decoder

class undr.decode.Compression

Represents a compressed file’s metadata.

hash: str

SHA3-224 (FIPS 202) hash of the compressed bytes.

size: int

Size of the compressed file in bytes.

suffix: str

Suffix for files compressed with this format.

The suffix must include a leading dot, for instance ".br".
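The three attributes above can be checked against a compressed file on disk. The following is a minimal sketch using only the standard library; `check_compressed_file` is a hypothetical helper (not part of undr), and storing `hash` as a lowercase hexadecimal digest is an assumption.

```python
import hashlib
import pathlib


def check_compressed_file(
    path: pathlib.Path, expected_hash: str, expected_size: int, suffix: str
) -> bool:
    """Hypothetical helper: checks a file against Compression-like metadata."""
    # The suffix (for instance ".br") must terminate the file name.
    if not path.name.endswith(suffix):
        return False
    hasher = hashlib.sha3_224()  # SHA3-224 as specified in FIPS 202
    size = 0
    with open(path, "rb") as file:
        # Read in fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: file.read(1 << 16), b""):
            hasher.update(chunk)
            size += len(chunk)
    # Assumed: the hash is compared as a lowercase hex string.
    return size == expected_size and hasher.hexdigest() == expected_hash
```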

abstract decoder(word_size: int) → Decoder

Creates a new decoder for this compression.

Parameters:

word_size (int) – The resource’s word size in bytes.

Returns:

A decompression manager for this compression format.

Return type:

Decoder

class undr.decode.Decoder

Abstract class for decoders. A decoder controls a decompression process.

abstract decompress(buffer: bytes) → bytes

Consumes a buffer and produces decompressed bytes.

Parameters:

buffer (bytes) – Compressed input bytes.

Returns:

Decompressed output bytes. Their length must be a multiple of the word size.

Return type:

bytes

abstract finish() → tuple[bytes, bytes]

Tells the decoder that all input bytes have been read.

Returns:

Decompressed output bytes, whose length must be a multiple of the word size, and remaining bytes, whose length must be strictly smaller than the word size. A non-zero number of remaining bytes usually indicates an issue (erroneous configuration or corrupted data).

Return type:

tuple[bytes, bytes]

class undr.decode.DecompressFile(path_root: pathlib.Path, path_id: pathlib.PurePosixPath, compression: Compression, expected_size: int, expected_hash: str, word_size: int, keep: bool)

Bases: undr.task.Task

Decompresses a local file and writes decoded bytes to another local file.

Parameters:
  • path_root (pathlib.Path) – The root path used to generate local file paths.

  • path_id (pathlib.PurePosixPath) – The path ID of the file.

  • compression (Compression) – The format of the compressed file.

  • expected_size (int) – The size of the decompressed file in bytes, according to the index.

  • expected_hash (str) – The hash of the decompressed file, according to the index.

  • word_size (int) – The file’s word size (the number of decoded bytes must be a multiple of this value).

  • keep (bool) – Whether to keep the compressed file after a successful decompression.
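The work described by these parameters can be sketched roughly as follows. This is not the task's actual run method: `decompress_file` is a hypothetical function, a `zlib`-based gzip decoder stands in for a Compression decoder, and both the output-path rule (drop the compression suffix) and the use of SHA3-224 for the decompressed file's hash are assumptions. The checks mirror the parameters above: the decoded length must be a multiple of `word_size`, the size and hash must match the index, and the compressed file is deleted unless `keep` is true.

```python
import hashlib
import pathlib
import zlib


def decompress_file(
    compressed_path: pathlib.Path,
    suffix: str,
    expected_size: int,
    expected_hash: str,
    word_size: int,
    keep: bool,
) -> pathlib.Path:
    """Hypothetical sketch of a DecompressFile-like task (gzip input assumed)."""
    if not suffix or not compressed_path.name.endswith(suffix):
        raise ValueError(f"{compressed_path} does not end with {suffix!r}")
    # Assumption: the decompressed path is the compressed path minus its suffix.
    output_path = compressed_path.with_name(compressed_path.name[: -len(suffix)])
    inflater = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    hasher = hashlib.sha3_224()  # assumed hash function for decompressed bytes
    size = 0
    with open(compressed_path, "rb") as source, open(output_path, "wb") as target:
        chunks = iter(lambda: source.read(1 << 16), b"")
        # Decode every chunk, then flush what zlib still buffers.
        for data in (*map(inflater.decompress, chunks), inflater.flush()):
            target.write(data)
            hasher.update(data)
            size += len(data)
    problem = None
    if size % word_size != 0:
        problem = f"{size % word_size} remaining bytes (word size {word_size})"
    elif size != expected_size or hasher.hexdigest() != expected_hash:
        problem = "size or hash does not match the index"
    if problem is not None:
        output_path.unlink()  # discard the invalid output
        raise RuntimeError(f"{output_path}: {problem}")
    if not keep:
        compressed_path.unlink()
    return output_path
```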

run(session: requests.Session, manager: undr.task.Manager)

class undr.decode.NoneCompression

Bases: Compression

Placeholder format for uncompressed files.

This “compression” format passes the input bytes to the output without transforming them. It may cut and stitch buffers to ensure that each buffer has a length that is a multiple of the word size.

Parameters:

word_size (int) – The resource’s word size in bytes.
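The cut-and-stitch behaviour described above can be sketched with a small buffering class. `PassThroughDecoder` is a hypothetical name; the sketch only relies on the documented contract (every `decompress` output is a multiple of the word size, and `finish` returns the leftover bytes).

```python
class PassThroughDecoder:
    """Hypothetical sketch of a NoneCompression-style decoder."""

    def __init__(self, word_size: int):
        self.word_size = word_size
        self.remaining = b""  # trailing bytes that do not form a full word yet

    def decompress(self, buffer: bytes) -> bytes:
        # Stitch: prepend the bytes left over from the previous call.
        data = self.remaining + buffer
        # Cut: keep the longest prefix whose length is a multiple of word_size.
        cut = len(data) - len(data) % self.word_size
        self.remaining = data[cut:]
        return data[:cut]

    def finish(self) -> tuple[bytes, bytes]:
        # No more input: nothing left to decode, only leftover bytes to report.
        return b"", self.remaining
```

With a word size of 4, feeding `b"abcde"` returns `b"abcd"` and holds back `b"e"`; feeding `b"fgh"` next returns `b"efgh"`, and `finish()` then returns `(b"", b"")`.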

class Decoder(word_size: int)

Bases: Decoder

Pass-through decoder for uncompressed resources. A decoder controls a decompression process.

decompress(buffer: bytes)

Consumes a buffer and produces decompressed bytes.

Parameters:

buffer (bytes) – Compressed input bytes.

Returns:

Decompressed output bytes. Their length must be a multiple of the word size.

Return type:

bytes

finish()

Tells the decoder that all input bytes have been read.

Returns:

Decompressed output bytes, whose length must be a multiple of the word size, and remaining bytes, whose length must be strictly smaller than the word size. A non-zero number of remaining bytes usually indicates an issue (erroneous configuration or corrupted data).

Return type:

tuple[bytes, bytes]

decoder(word_size: int)

Creates a new decoder for this compression.

Parameters:

word_size (int) – The resource’s word size in bytes.

Returns:

A decompression manager for this compression format.

Return type:

Decoder

class undr.decode.Progress

Represents decompression progress for a given resource.

complete: bool

Whether decoding of this resource is complete.

current_bytes: int

Number of bytes of the decompressed resource that have been decompressed so far.

final_bytes: int

Total number of bytes of the decompressed resource.

initial_bytes: int

Number of bytes of the decompressed resource that were already decompressed when the current decoding process began.

path_id: pathlib.PurePosixPath

Identifier of the resource.
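Because `initial_bytes`, `current_bytes`, and `final_bytes` all count bytes of the decompressed resource, a progress display can derive the completed fraction and the bytes decoded during the current run from them. The sketch below uses a stand-in dataclass with the documented fields; `fraction` and `decoded_this_run` are hypothetical helpers, and the example path ID is made up.

```python
import dataclasses
import pathlib


@dataclasses.dataclass
class Progress:
    """Stand-in with the fields documented for undr.decode.Progress."""

    path_id: pathlib.PurePosixPath
    initial_bytes: int
    current_bytes: int
    final_bytes: int
    complete: bool


def fraction(progress: Progress) -> float:
    """Share of the resource decompressed so far (1.0 for an empty resource)."""
    if progress.final_bytes == 0:
        return 1.0
    return progress.current_bytes / progress.final_bytes


def decoded_this_run(progress: Progress) -> int:
    """Bytes decompressed since the current decoding process began."""
    return progress.current_bytes - progress.initial_bytes
```

For instance, with `initial_bytes=100`, `current_bytes=400`, and `final_bytes=800`, `fraction` returns `0.5` and `decoded_this_run` returns `300`.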

exception undr.decode.RemainingBytesError(word_size: int, buffer: bytes)

Bases: Exception

Raised if the number of bytes in the decompressed resource is not a multiple of its word size.

Parameters:
  • word_size (int) – The resource’s word size.

  • buffer (bytes) – The remaining bytes. Their length is larger than zero and smaller than the word size.
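A caller that drives a decoder to completion can use the remaining bytes returned by finish() to decide whether to raise this exception. In the sketch below, `RemainingBytesError` is a local stand-in with the documented constructor arguments (its message text is an assumption), `decode_all` is a hypothetical helper, and any object with `decompress` and `finish` methods can be passed as `decoder`.

```python
from typing import Iterable, Protocol


class DecoderLike(Protocol):
    def decompress(self, buffer: bytes) -> bytes: ...
    def finish(self) -> tuple[bytes, bytes]: ...


class RemainingBytesError(Exception):
    """Stand-in mirroring undr.decode.RemainingBytesError(word_size, buffer)."""

    def __init__(self, word_size: int, buffer: bytes):
        self.word_size = word_size
        self.buffer = buffer
        super().__init__(f"{len(buffer)} remaining bytes (word size {word_size})")


def decode_all(decoder: DecoderLike, chunks: Iterable[bytes], word_size: int) -> bytes:
    """Feeds every chunk to the decoder, then checks the leftover bytes."""
    output = bytearray()
    for chunk in chunks:
        output += decoder.decompress(chunk)
    tail, remaining = decoder.finish()
    output += tail
    if len(remaining) > 0:
        # A non-zero remainder usually means a wrong word size or corrupted data.
        raise RemainingBytesError(word_size, remaining)
    return bytes(output)
```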

undr.decode.compression_from_dict(data: dict[str, Any], base_size: int, base_hash: str) → Compression

Factory for compression formats.

Parameters:
  • data (dict[str, Any]) – Parsed compression object read from an index file.

  • base_size (int) – Size of the uncompressed file in bytes, read from the index.

  • base_hash (str) – Hash of the uncompressed file, read from the index.

Raises:

RuntimeError – if the compression format is unknown or not supported.

Returns:

The compressed file’s metadata, which can be used to create a decoder.

Return type:

Compression
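A factory of this kind typically dispatches on a type field of the parsed index entry. The sketch below is a guess at the shape of such a function, not the actual implementation: the `"type"`, `"size"`, `"hash"`, and `"suffix"` keys, the `"none"` and `"brotli"` type names, and the stand-in dataclasses are all assumptions. In this sketch, uncompressed files reuse `base_size` and `base_hash`, since their stored bytes are the uncompressed bytes.

```python
import dataclasses
from typing import Any


@dataclasses.dataclass
class Compression:
    """Stand-in for undr.decode.Compression."""

    hash: str
    size: int
    suffix: str


class NoneCompression(Compression):
    """Stand-in for the placeholder format."""


class BrotliCompression(Compression):
    """Stand-in for the Brotli format."""


def compression_from_dict(data: dict[str, Any], base_size: int, base_hash: str) -> Compression:
    """Hypothetical factory; the keys and type names are assumptions."""
    compression_type = data.get("type")
    if compression_type == "none":
        # Uncompressed bytes: size and hash are those of the base file.
        return NoneCompression(hash=base_hash, size=base_size, suffix="")
    if compression_type == "brotli":
        return BrotliCompression(
            hash=data["hash"], size=data["size"], suffix=data["suffix"]
        )
    # Matches the documented behaviour for unknown formats.
    raise RuntimeError(f"unsupported compression type {compression_type!r}")
```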