undr.path

Local or remote file.

Overview

Classes

  • Download – Downloads a remote file.

  • File – Represents a local or remote file.

  • Path – A file or directory in a dataset.

Module Contents

class undr.path.Download(path_id: pathlib.PurePosixPath, suffix: str | None, server: undr.remote.Server, stream: bool)

Bases: undr.remote.Download

Downloads a remote file.

This task is never used with a scheduler. Instead, File.chunks() calls its run function directly, which lets File reuse the download logic implemented in undr.remote.Download.

Parameters:
  • path_id (pathlib.PurePosixPath) – The resource’s unique path id.

  • suffix (Optional[str]) – Added to the file name while it is being downloaded.

  • server (Server) – The remote server.

  • stream (bool) – Whether to download the file in chunks (slightly slower for small files, reduces memory usage for large files).
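
The suffix parameter points to a common pattern. Incoming chunks are written to a file whose name carries the suffix, and that file is renamed once the download completes, so a partial download is never mistaken for a complete one. The sketch below assumes this pattern. `download_with_suffix` is a hypothetical helper and is not part of undr. An in-memory list of chunks stands in for the streamed HTTP response body.

```python
from __future__ import annotations

import os
from collections.abc import Iterable
from pathlib import Path


def download_with_suffix(chunks: Iterable[bytes], target: Path, suffix: str | None) -> Path:
    """Hypothetical helper: writes chunks to target + suffix, then renames it to target.

    When suffix is None, chunks are written directly to target.
    """
    partial = target if suffix is None else target.with_name(target.name + suffix)
    with open(partial, "wb") as output:
        for chunk in chunks:  # stand-in for streamed response chunks
            output.write(chunk)
    if partial != target:
        # os.replace overwrites target; it is atomic on POSIX within one filesystem
        os.replace(partial, target)
    return target
```

Writing chunk by chunk keeps memory usage bounded. This is the trade-off the stream parameter describes.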

on_begin(manager: undr.task.Manager) -> int

Called before contacting the server.

This function must return an offset in bytes.

  • 0 indicates that the file is not downloaded yet.

  • Positive values indicate the number of bytes already downloaded.

  • Negative values indicate that the download is already complete and must be skipped.

Parameters:

manager (task.Manager) – The task manager for reporting updates.

Returns:

Number of bytes already downloaded.

Return type:

int
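
A hypothetical helper that inspects the local file system can illustrate the three offset conventions. It assumes three things: the complete file lives at `target`, a partial download lives at `target` plus a suffix, and the expected size is known. None of these names come from undr, and the real on_begin may use different checks, such as verifying a hash.

```python
from __future__ import annotations

from pathlib import Path


def resume_offset(target: Path, suffix: str, expected_size: int) -> int:
    """Hypothetical on_begin-style offset computation.

    Returns -1 if the download is complete (skip it), the number of bytes
    already written if a partial file exists, or 0 otherwise.
    """
    if target.is_file() and target.stat().st_size == expected_size:
        return -1  # negative: already complete, skip the download
    partial = target.with_name(target.name + suffix)
    if partial.is_file():
        return partial.stat().st_size  # bytes already downloaded (0 if empty)
    return 0  # nothing downloaded yet
```

A caller could then send a positive offset as an HTTP `Range: bytes=<offset>-` header, so that only the missing bytes are requested.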

on_response_ready(response: requests.Response, manager: undr.task.Manager)

Called when the HTTP response object is ready.

The response object can be used to download the remote file.

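Parameters:

  • response (requests.Response) – The HTTP response returned by the server.

  • manager (task.Manager) – The task manager for reporting updates.
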
class undr.path.File

Bases: Path

Represents a local or remote file.

compressions: tuple[undr.decode.Compression, ...]

List of compressions available on the server.

hash: str

The decompressed file hash (SHA3-224).

manager: undr.task.Manager

Used to schedule new tasks and report updates.

session: requests.Session | None

An open session that can be used to download resources.

size: int

The decompressed file size in bytes.
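
Because hash and size both describe the decompressed file, they can be checked against the bytes that chunks() yields. The sketch below uses `hashlib.sha3_224` from the standard library. `verify_chunks` is a hypothetical helper. It also assumes that hash is stored as a hexadecimal digest, which the source does not specify.

```python
import hashlib
from collections.abc import Iterable


def verify_chunks(chunks: Iterable[bytes], expected_size: int, expected_hash: str) -> bool:
    """Hypothetical check of decompressed chunks against a size and a SHA3-224 hex digest."""
    hasher = hashlib.sha3_224()
    size = 0
    for chunk in chunks:
        hasher.update(chunk)  # incremental hashing: no need to hold the whole file
        size += len(chunk)
    return size == expected_size and hasher.hexdigest() == expected_hash.lower()
```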

__truediv__(other: str) -> Path

Concatenates this path with a string to create a new path.

Parameters:

other (str) – Suffix to append to this path.

Returns:

The concatenated result.

Return type:

Path
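
If `/` follows pathlib semantics, joining path_id values behaves like `pathlib.PurePosixPath`. This is an assumption: the source only says that the result is a new path built from this one and the string. The identifiers below are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical path_id of a directory inside a dataset
path_id = PurePosixPath("my_dataset") / "recordings"

# Joining a string yields a new path; the original is left unchanged
child = path_id / "file.bin"
print(child)  # my_dataset/recordings/file.bin
```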

_chunks(word_size: int) -> Iterable[bytes]

Returns an iterator over the file’s decompressed bytes.

Users should prefer chunks(), since each file already knows its own word size (see word_size()).

Parameters:

word_size (int) – Size of an entry (events, frames…) in the file, in bytes.

Raises:
Returns:

Iterator over the file’s decompressed bytes.

Return type:

Iterable[bytes]
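
The word_size parameter matters because decompressors emit chunks whose sizes need not be multiples of the entry size. One way to guarantee that no entry is split across chunks is to carry leftover bytes into the next chunk. The sketch below is a hypothetical re-chunking helper, not undr's implementation. Raising `ValueError` for trailing bytes is also an assumption, because the source does not say what _chunks raises.

```python
from collections.abc import Iterable, Iterator


def aligned_chunks(chunks: Iterable[bytes], word_size: int) -> Iterator[bytes]:
    """Hypothetical helper: yields chunks whose lengths are multiples of word_size."""
    if word_size <= 0:
        raise ValueError("word_size must be positive")
    remainder = b""
    for chunk in chunks:
        buffer = remainder + chunk
        cut = len(buffer) - len(buffer) % word_size  # length of the largest aligned prefix
        remainder = buffer[cut:]  # incomplete entry, carried into the next chunk
        if cut > 0:
            yield buffer[:cut]
    if remainder:
        # The decoded size was not a multiple of word_size
        raise ValueError(f"{len(remainder)} trailing bytes do not form a full entry")
```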

attach_manager(manager: undr.task.Manager | None)

Binds a manager to this file.

The file sends all subsequent updates (download and processing) to the manager.

Parameters:

manager (Optional[task.Manager]) – The manager to use to keep track of progress.

attach_session(session: requests.Session | None)

Binds a session to this file.

The session is used for all subsequent downloads.

Parameters:

session (Optional[requests.Session]) – An open session to use for downloads.

static attributes_from_dict(data: dict[str, Any], parent: undr.path_directory.Directory) -> dict[str, Any]

Converts -index.json data to a dict of this class’s arguments.

The returned dict can be used to initialize an instance of this class.

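Parameters:

  • data (dict[str, Any]) – Data parsed from -index.json.

  • parent (path_directory.Directory) – The directory that contains this file.

Returns: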

Data that can be used to initialize this class.

Return type:

dict[str, Any]

best_compression() -> undr.decode.Compression

Returns the best compression supported by the remote server for this file.

Best is defined here as “smallest encoded size”.

Returns:

Compression format that yields the smallest version of this file.

Return type:

decode.Compression
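
Picking the "smallest encoded size" amounts to a min over the available compressions, keyed by size. The Compression dataclass below is a hypothetical stand-in with a name and a size field. The real undr.decode.Compression may store this information differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compression:
    """Hypothetical stand-in for undr.decode.Compression."""

    name: str
    size: int  # encoded size in bytes on the server


def best_compression(compressions: tuple[Compression, ...]) -> Compression:
    """Returns the compression with the smallest encoded size (first one on ties)."""
    if not compressions:
        raise ValueError("no compressions available")
    return min(compressions, key=lambda compression: compression.size)
```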

chunks() -> Iterable[bytes]

Returns an iterator over the file’s decompressed bytes.

Returns:

Iterator over the decompressed file’s bytes. The size of the chunks may vary.

Return type:

Iterable[bytes]

classmethod from_dict(data: dict[str, Any], parent: undr.path_directory.Directory)

Converts -index.json data to an instance of this class.

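Parameters:

  • data (dict[str, Any]) – Data parsed from -index.json.

  • parent (path_directory.Directory) – The directory that contains this file.

Returns: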

The file represented by the given data.

Return type:

File
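
attributes_from_dict and from_dict appear to follow a common two-step pattern. A static method converts raw JSON into constructor keyword arguments, and a class method unpacks them into a new instance. The sketch below shows that pattern with a toy class. The class ToyFile, the keys name, size and hash, and the string parent are all hypothetical and do not reflect the actual -index.json schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class ToyFile:
    """Hypothetical stand-in for undr.path.File."""

    name: str
    size: int
    hash: str
    parent: str

    @staticmethod
    def attributes_from_dict(data: dict[str, Any], parent: str) -> dict[str, Any]:
        # Map (hypothetical) JSON keys to constructor arguments
        return {"name": data["name"], "size": int(data["size"]), "hash": data["hash"], "parent": parent}

    @classmethod
    def from_dict(cls, data: dict[str, Any], parent: str) -> ToyFile:
        # Reuse the static conversion, so a subclass overriding it still works here
        return cls(**cls.attributes_from_dict(data, parent))
```

In this sketch, keeping the conversion separate lets a subclass extend the argument dict with its own keys while still constructing itself through from_dict.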

word_size() -> int

The size of an entry in this file, in bytes.

This can be used to ensure that entries (events, frames…) are not split while reading. A decoded file’s size in bytes must be a multiple of the value returned by this function.

Returns:

Number of bytes used by each entry.

Return type:

int
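
Because the decoded size must be a multiple of word_size(), the number of entries follows from integer division. Aligned chunks can then be decoded entry by entry with `struct.Struct.iter_unpack`. The 13-byte little-endian layout below is a hypothetical example and not necessarily a format used by undr. It packs a 64-bit timestamp, two 16-bit coordinates and a boolean polarity.

```python
import struct

# Hypothetical event layout: timestamp (u64), x (u16), y (u16), polarity (bool)
EVENT = struct.Struct("<QHH?")
WORD_SIZE = EVENT.size  # 13 bytes per entry ("<" disables padding)


def entry_count(decoded_size: int, word_size: int) -> int:
    """Number of entries in a decoded file; the size must be a multiple of word_size."""
    if decoded_size % word_size != 0:
        raise ValueError("decoded size is not a multiple of the word size")
    return decoded_size // word_size


def decode_events(chunk: bytes) -> list[tuple[int, int, int, bool]]:
    """Decodes an aligned chunk (len(chunk) % WORD_SIZE == 0) into event tuples."""
    return list(EVENT.iter_unpack(chunk))
```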

class undr.path.Path

A file or directory in a dataset.

A path can point to a local resource or represent a remote resource.

metadata: dict[str, Any]

Any data not strictly required to decode the file (stored in -index.json).

own_doi: str | None

This resource’s DOI, used by all its children unless they have their own DOI.

path_id: pathlib.PurePosixPath

A POSIX path uniquely identifying the resource (including its dataset).

path_root: pathlib.Path

Path of the root “datasets” directory used to generate local paths.

server: undr.remote.Server

The resource’s remote server, used to download data if it is not available locally.

abstract __truediv__(other: str) -> Path

Concatenates this path with a string to create a new path.

Parameters:

other (str) – Suffix to append to this path.

Returns:

The concatenated result.

Return type:

Path

local_path() -> pathlib.Path

Returns the local file path.

This function always returns a path, even if the local resource does not exist.

Returns:

The path of the local resource.

Return type:

pathlib.Path
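
local_path() can be read as joining path_root with path_id. That reading also explains why it can return a path that does not exist yet. The sketch below assumes exactly that join, which the source does not spell out. It models the abstract __truediv__ with abc. ToyPath and ToyDirectory are hypothetical names.

```python
from __future__ import annotations

import abc
import pathlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyPath(abc.ABC):
    """Hypothetical, reduced version of undr.path.Path."""

    path_root: pathlib.Path
    path_id: pathlib.PurePosixPath

    @abc.abstractmethod
    def __truediv__(self, other: str) -> ToyPath:
        """Subclasses decide how to build a child path."""

    def local_path(self) -> pathlib.Path:
        # Pure path arithmetic with no file-system access, so the result may not exist
        return self.path_root.joinpath(*self.path_id.parts)


@dataclass(frozen=True)
class ToyDirectory(ToyPath):
    def __truediv__(self, other: str) -> ToyDirectory:
        return ToyDirectory(self.path_root, self.path_id / other)
```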