undr.remote#

Low-level implementation of resource download.

Overview#

Classes#

Download

Retrieves data from a remote server.

DownloadFile

Retrieves data from a remote server and saves it to a file.

NullServer

A placeholder server that raises an exception when used.

Progress

Message that reports download progress.

Server

Represents a remote server.

Module Contents#

class undr.remote.Download(path_id: pathlib.PurePosixPath, suffix: str | None, server: Server, stream: bool)#

Bases: undr.task.Task

Retrieves data from a remote server.

This is an abstract task that calls its methods (lifecycle callbacks) as follows:

  • on_begin() is called before contacting the server. This function can be used to create write resources and must return an offset in bytes. Download resumes from that offset if it is non-zero. If the offset is negative, the task assumes that the download is complete and it calls on_end() immediately.

  • on_range_failed() is called if on_begin() returned a positive offset and the server does not honor the range request, that is, it does not answer with HTTP 206 (Partial Content). It can be used to clean up ‘append’ resources and replace them with ‘write’ resources. The actual download starts after on_range_failed() as if on_begin() had returned 0.

  • on_response_ready() is called when the response is ready for iteration. The subclass must call requests.Response.close() after reading the response, and should normally call on_end() as well.

This lifecycle allows users to yield on response chunks (see undr.path.File._chunks() for an example).
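The callback order described above can be sketched with a toy driver that performs no network access. Everything in it is hypothetical: LifecycleSketch, Recorder, payload and supports_range are illustrative stand-ins, and the real callbacks also receive a manager (and a requests.Response) that the sketch leaves out.

```python
import abc


class LifecycleSketch(abc.ABC):
    """Toy stand-in for the callback order; not undr's implementation."""

    def __init__(self, payload: bytes, supports_range: bool):
        self.payload = payload  # bytes the pretend server would send
        self.supports_range = supports_range  # whether resuming is honored

    @abc.abstractmethod
    def on_begin(self) -> int: ...

    @abc.abstractmethod
    def on_range_failed(self) -> None: ...

    @abc.abstractmethod
    def on_response_ready(self, body: bytes) -> None: ...

    @abc.abstractmethod
    def on_end(self) -> None: ...

    def run(self) -> None:
        offset = self.on_begin()
        if offset < 0:
            # Negative offset: already complete, skip the request entirely.
            self.on_end()
            return
        if offset > 0 and not self.supports_range:
            # Resume refused: fall back to a full download from byte 0.
            self.on_range_failed()
            offset = 0
        self.on_response_ready(self.payload[offset:])


class Recorder(LifecycleSketch):
    """Records which callbacks ran, starting from a fixed offset."""

    def __init__(self, payload: bytes, supports_range: bool, offset: int):
        super().__init__(payload, supports_range)
        self.offset = offset
        self.calls = []
        self.received = b""

    def on_begin(self) -> int:
        self.calls.append("on_begin")
        return self.offset

    def on_range_failed(self) -> None:
        self.calls.append("on_range_failed")

    def on_response_ready(self, body: bytes) -> None:
        self.calls.append("on_response_ready")
        self.received = body
        self.on_end()  # the subclass ends the task after reading

    def on_end(self) -> None:
        self.calls.append("on_end")
```

In this sketch, on_end() is reached either directly from run() (negative offset) or from on_response_ready(), which mirrors the two paths described in the list above.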

Parameters:
  • path_id (pathlib.PurePosixPath) – The resource’s unique path ID.

  • suffix (Optional[str]) – Added to the file name while it is being downloaded.

  • server (Server) – The remote server.

  • stream (bool) – Whether to download the file in chunks (slightly slower for small files, reduces memory usage for large files).

abstract on_begin(manager: undr.task.Manager) int#

Called before contacting the server.

This function must return an offset in bytes.

  • 0 indicates that the file is not downloaded yet.

  • Positive values indicate the number of bytes already downloaded.

  • Negative values indicate that the download is already complete and must be skipped.

Parameters:

manager (task.Manager) – The task manager for reporting updates.

Returns:

Number of bytes already downloaded.

Return type:

int

abstract on_end(manager: undr.task.Manager) None#

Called when the download task completes.

This function is called automatically if the byte offset returned by on_begin() is negative. Otherwise, implementations should call it after consuming the response in on_response_ready().

Parameters:

manager (task.Manager) – The task manager for reporting updates.

abstract on_range_failed(manager: undr.task.Manager) None#

Called if the HTTP range call fails.

The HTTP range request asks the server to resume the download at a given byte offset. It is used when on_begin() returns a positive value. Not every server supports range requests. This function should reset counters and prepare the local file system for a standard (full) download.
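As a rough illustration of the range mechanism, the standard HTTP Range header for "resume from this byte" has the form bytes=N-, and a server that honors it answers with status 206. The helper names range_headers and range_honored below are hypothetical and are not part of undr.

```python
def range_headers(offset: int) -> dict:
    """Headers for resuming at `offset`; empty for a full download."""
    if offset <= 0:
        return {}
    # Open-ended range: everything from `offset` to the end of the resource.
    return {"Range": f"bytes={offset}-"}


def range_honored(status_code: int) -> bool:
    """A server that honors a range request answers 206 (Partial Content)."""
    return status_code == 206
```

A caller following the lifecycle above would invoke on_range_failed() whenever range_honored() is false for a request that carried a Range header.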

Parameters:

manager (task.Manager) – The task manager for reporting updates.

abstract on_response_ready(response: requests.Response, manager: undr.task.Manager) None#

Called when the HTTP response object is ready.

The response object can be used to download the remote file.

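Parameters:
  • response (requests.Response) – The HTTP response object.

  • manager (task.Manager) – The task manager for reporting updates.

run(session: requests.Session, manager: undr.task.Manager)#

url() str#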

Returns the file’s remote URL.

Returns:

File URL on the server.

Return type:

str

class undr.remote.DownloadFile(path_root: pathlib.Path, path_id: pathlib.PurePosixPath, suffix: str | None, server: Server, force: bool, expected_size: int | None, expected_hash: str | None)#

Bases: Download

Retrieves data from a remote server and saves it to a file.

on_begin(manager: undr.task.Manager) int#

Opens the local file before starting the download.

If the file exists, this function opens it in append mode and returns its size in bytes.
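A minimal sketch of this "open in append mode and report the size" behavior, assuming a plain local path. The function open_for_resume and the file name recording.part are hypothetical, and the sketch ignores the suffix, force and expected_size arguments that the real class accepts.

```python
import pathlib
import tempfile


def open_for_resume(path: pathlib.Path):
    """Open `path` for appending and return (file, bytes already present)."""
    offset = path.stat().st_size if path.exists() else 0
    file = open(path, "ab")  # append mode also creates a missing file
    return file, offset


with tempfile.TemporaryDirectory() as directory:
    target = pathlib.Path(directory) / "recording.part"
    target.write_bytes(b"abc")  # pretend 3 bytes were downloaded earlier
    handle, offset = open_for_resume(target)
    handle.write(b"de")  # new bytes land after the existing ones
    handle.close()
    resumed_content = target.read_bytes()
```

The returned offset is what a range request would use as its starting byte.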

Parameters:

manager (task.Manager) – The task manager for reporting updates.

Returns:

Number of bytes already downloaded.

Return type:

int

on_end(manager: undr.task.Manager)#

Checks the hash and closes the file.
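A hash check of this kind might look like the sketch below. The page does not name the hash algorithm, so the algorithm parameter (defaulting to SHA-256 here) is an assumption, as is treating a missing expected hash as "nothing to verify". hash_matches is a hypothetical helper.

```python
import hashlib
from typing import Iterable, Optional


def hash_matches(
    chunks: Iterable[bytes],
    expected_hash: Optional[str],
    algorithm: str = "sha256",  # assumption: the real algorithm is not stated here
) -> bool:
    """Hash the chunks incrementally and compare with the expected hex digest."""
    if expected_hash is None:
        return True  # assumption: no expected hash means no verification
    hasher = hashlib.new(algorithm)
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest() == expected_hash.lower()
```

Updating the hasher chunk by chunk keeps memory usage flat, which matters for streamed downloads of large files.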

Parameters:

manager (task.Manager) – The task manager for reporting updates.

Raises:

on_range_failed(manager: undr.task.Manager)#

Re-opens the file in write mode.

Parameters:

manager (task.Manager) – The task manager for reporting updates.

on_response_ready(response: requests.Response, manager: undr.task.Manager) None#

Iterates over the file chunks and writes them to the file.
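The chunk loop can be sketched as follows. requests.Response.iter_content() and close() are real requests methods; write_chunks, FakeResponse and the 64 KiB chunk size are hypothetical and exist only so the sketch runs without a network connection.

```python
import io


def write_chunks(response, file, chunk_size: int = 65536) -> int:
    """Copy the response body to `file` chunk by chunk, then close the response."""
    written = 0
    try:
        for chunk in response.iter_content(chunk_size=chunk_size):
            file.write(chunk)
            written += len(chunk)
    finally:
        response.close()  # the lifecycle requires closing the response
    return written


class FakeResponse:
    """Stand-in that mimics the two requests.Response methods used above."""

    def __init__(self, body: bytes):
        self._stream = io.BytesIO(body)
        self.closed = False

    def iter_content(self, chunk_size: int = 1):
        while True:
            chunk = self._stream.read(chunk_size)
            if not chunk:
                return
            yield chunk

    def close(self) -> None:
        self.closed = True
```

For example, write_chunks(FakeResponse(b"0123456789"), io.BytesIO(), chunk_size=4) copies the body in three chunks and leaves the fake response closed.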

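Parameters:
  • response (requests.Response) – The HTTP response object.

  • manager (task.Manager) – The task manager for reporting updates.

class undr.remote.NullServer#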

Bases: Server

A placeholder server that raises an exception when used.

Some functions and classes require a server to download resources that are not available locally. If the resources are known to be local, this server can be used to detect download attempts.
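The idea of a placeholder that fails loudly can be illustrated as below. PlaceholderServer is a hypothetical class, not undr's NullServer, and the choice of NotImplementedError is an assumption since this page does not say which exception is raised.

```python
import pathlib


class PlaceholderServer:
    """Hypothetical server stand-in: any URL lookup signals a download attempt."""

    def path_id_to_url(self, path_id: pathlib.PurePosixPath) -> str:
        # Assumption: the exception type is illustrative only.
        raise NotImplementedError(f"unexpected download attempt for {path_id}")
```

Code that is expected to work purely from local files can be handed such a server, so that an accidental download surfaces as an exception instead of a silent network request.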

abstract path_id_to_url(path_id: pathlib.PurePosixPath)#

Calculates a resource URL from its path ID.

Parameters:

path_id (pathlib.PurePosixPath) – The resource’s path ID, including the dataset name.

Returns:

The resource’s remote URL.

Return type:

str

class undr.remote.Progress#

Message that reports download progress.

complete: bool#

Whether this resource has been completely downloaded.

current_bytes: int#

Number of bytes of the remote resource that have been downloaded so far.

final_bytes: int#

Total number of bytes of the remote resource.

initial_bytes: int#

Number of bytes of the remote resource that were already downloaded when the current download began.

path_id: pathlib.PurePosixPath#

Path ID of the associated resource.
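One way a consumer might use these fields is sketched below. ProgressSketch only mirrors the attribute names listed above, and fraction_done and session_bytes are hypothetical helpers. The sketch assumes that current_bytes includes the bytes counted by initial_bytes.

```python
import dataclasses
import pathlib


@dataclasses.dataclass
class ProgressSketch:
    """Mirror of the documented fields; not the undr.remote.Progress class."""

    path_id: pathlib.PurePosixPath
    initial_bytes: int
    current_bytes: int
    final_bytes: int
    complete: bool


def fraction_done(progress: ProgressSketch) -> float:
    """Share of the resource downloaded so far, between 0 and 1."""
    if progress.final_bytes <= 0:
        return 1.0 if progress.complete else 0.0
    return progress.current_bytes / progress.final_bytes


def session_bytes(progress: ProgressSketch) -> int:
    """Bytes fetched since the current download began (assumes cumulative counts)."""
    return progress.current_bytes - progress.initial_bytes
```

Keeping initial_bytes separate lets a progress display distinguish bytes resumed from an earlier run from bytes fetched in the current one.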

class undr.remote.Server#

Represents a remote server.

timeout: float#

Timeout in seconds for requests to this server.

url: str#

The server’s base URL.

Resource URLs are calculated by appending the file’s path ID to the server URL. A slash is inserted before the path ID if the server’s URL does not end with one.
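Read literally, the rule above amounts to the following sketch. join_url is a hypothetical free function and example.org is a placeholder host. The real path_id_to_url() method may treat parts of the path ID (such as the dataset name) differently.

```python
import pathlib


def join_url(server_url: str, path_id: pathlib.PurePosixPath) -> str:
    """Append the path ID to the base URL, adding a slash only when needed."""
    separator = "" if server_url.endswith("/") else "/"
    return f"{server_url}{separator}{path_id.as_posix()}"
```

With this rule, the base URLs https://example.org/data and https://example.org/data/ produce the same resource URL for a given path ID.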

path_id_to_url(path_id: pathlib.PurePosixPath) str#

Calculates a resource URL from its path ID.

Parameters:

path_id (pathlib.PurePosixPath) – The resource’s path ID, including the dataset name.

Returns:

The resource’s remote URL.

Return type:

str