undr.simple

High-level functions to install datasets without using a configuration file.

The functions in this module are convenient for one-off downloads and simple projects. For more complex projects, we recommend using a configuration file (undr.toml).

Overview

Functions

  • default_datasets – Generates a list of the default datasets’ names.

  • install – Downloads (and optionally decompresses) a dataset.

  • name_to_url – Generates a dictionary of the default datasets’ names and URLs.

Module Contents

undr.simple.default_datasets() → list[str]

Generates a list of the default datasets’ names.

This function calls name_to_url() and has the same caveats regarding caching.

Returns:

The names of the default datasets.

Return type:

list[str]
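The relationship between default_datasets() and name_to_url() can be sketched as follows. This is a minimal sketch, not UNDR's implementation: it assumes the names are simply the keys of the name_to_url() dictionary, and the stand-in name_to_url() below returns placeholder names and URLs (dataset-a, dataset-b, example.com) instead of parsing the configuration file bundled with UNDR.

```python
from functools import cache


# Hypothetical stand-in for undr.simple.name_to_url(). The real function parses
# the configuration file bundled with UNDR; these names and URLs are placeholders.
@cache
def name_to_url() -> dict[str, str]:
    return {
        "dataset-a": "https://example.com/dataset-a",
        "dataset-b": "https://example.com/dataset-b",
    }


def default_datasets() -> list[str]:
    # Assumption: the dataset names are the keys of the cached name-to-URL dictionary.
    return list(name_to_url().keys())


print(default_datasets())
```

Because this sketch reads from the cached dictionary, any change made to that dictionary would also show up in the list of names, which is the caching caveat mentioned above.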

undr.simple.install(name: str, url: str | None = None, timeout: float = constants.DEFAULT_TIMEOUT, mode: str | undr.install_mode.Mode = install_mode.Mode.LOCAL, directory: str | pathlib.Path = 'datasets', show_display: bool = True, workers: int = multiprocessing.cpu_count() * 2, force: bool = False, log_directory: pathlib.Path | None = None)

Downloads (and optionally decompresses) a dataset.

See undr.install_mode.Mode for details on the different installation strategies.

Parameters:
  • name (str) – Name of the dataset to install. Unless url is provided, it must be one of the keys returned by name_to_url().

  • url (Optional[str], optional) – URL of the dataset. Defaults to None.

  • timeout (float, optional) – Request timeout in seconds. Defaults to undr.constants.DEFAULT_TIMEOUT.

  • mode (Union[str, install_mode.Mode], optional) – Installation strategy. Defaults to undr.install_mode.Mode.LOCAL.

  • directory (Union[str, pathlib.Path], optional) – Path of the local directory to store datasets. Defaults to “datasets”.

  • show_display (bool, optional) – Whether to show a progress bar. Defaults to True.

  • workers (int, optional) – Number of parallel workers (threads). Defaults to twice multiprocessing.cpu_count().

  • force (bool, optional) – Whether to re-download files even if they are already present locally. Defaults to False.

  • log_directory (Optional[pathlib.Path], optional) – Directory to store log files. Logs are not generated if this is None. Defaults to None.
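The rule stated for name and url can be illustrated with a small helper. resolve_url is hypothetical and not part of undr; it is a sketch assuming that an explicit url takes precedence and that, otherwise, name is looked up in the mapping returned by name_to_url(). The exception type (ValueError) is also an assumption, since this page does not document what happens for unknown names. The last part computes the default value of workers exactly as written in the signature.

```python
import multiprocessing
from typing import Optional


def resolve_url(name: str, url: Optional[str], known: dict[str, str]) -> str:
    """Hypothetical helper mirroring the documented contract of install():
    an explicit url wins; otherwise name must be a key of name_to_url()."""
    if url is not None:
        return url
    if name not in known:
        # Assumption: undr's actual error type and message may differ.
        raise ValueError(
            f"unknown dataset {name!r}; pass url= or use one of {sorted(known)}"
        )
    return known[name]


# Default number of worker threads, computed as in the install() signature.
DEFAULT_WORKERS = multiprocessing.cpu_count() * 2

# Placeholder mapping standing in for name_to_url().
known = {"dataset-a": "https://example.com/dataset-a"}
print(resolve_url("dataset-a", None, known))
print(resolve_url("my-dataset", "https://example.com/my-dataset", known))
```

Since the default for workers is a Python default argument, it is evaluated once, when the function is defined (that is, when undr.simple is imported), not on every call.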

undr.simple.name_to_url() → dict[str, str]

Generates a dictionary of the default datasets’ names and URLs.

The first call to this function parses the configuration file bundled with UNDR and caches the result. Subsequent calls return the cached dictionary immediately. Because every call returns the same dictionary object, modifying it also changes the value returned by all subsequent calls, until the Python interpreter restarts. Users who plan to modify the returned value should call name_to_url().copy() and work on the copy instead.

Returns:

Dictionary whose keys are dataset names and whose values are matching dataset URLs.

Return type:

dict[str, str]
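The caching caveat can be reproduced with a stand-in. The sketch below assumes a cache that returns the same dictionary object on every call and uses functools.cache to build one; how UNDR actually caches the parsed configuration is not described on this page, and the names and URLs are placeholders.

```python
from functools import cache


# Hypothetical stand-in for name_to_url(): the first call builds the dictionary,
# later calls return the very same cached object.
@cache
def name_to_url() -> dict[str, str]:
    return {"dataset-a": "https://example.com/dataset-a"}


# Mutating the returned dictionary leaks into every later call.
shared = name_to_url()
shared["my-dataset"] = "https://example.com/my-dataset"
print("my-dataset" in name_to_url())  # True: the cached value was modified

# Working on a copy leaves the cached value untouched.
private = name_to_url().copy()
private["another-dataset"] = "https://example.com/another-dataset"
print("another-dataset" in name_to_url())  # False
```

A shallow copy is enough here because the dictionary's values are immutable strings.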