undr.simple
High-level functions to install datasets without using a configuration file.
The functions in this module can be very convenient for one-off downloads and simple projects. Configuration files (undr.toml) are recommended for more complex projects.
Overview
Functions
| Function | Description |
|---|---|
| default_datasets() | Generates a list of the default datasets’ names. |
| install(name, ...) | Downloads (and optionally decompresses) a dataset. |
| name_to_url() | Generates a dictionary of the default datasets’ names and URLs. |
Module Contents
- undr.simple.default_datasets() -> list[str]
Generates a list of the default datasets’ names.
This function calls name_to_url() and has the same caveats regarding caching.
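A minimal sketch of how the caching caveat carries over, assuming default_datasets() simply lists the keys of the cached name_to_url() dictionary. Both functions below are hypothetical stand-ins written for illustration (not UNDR's code), the cache is emulated with functools.cache, and the dataset names and URLs are placeholders.

```python
import functools


@functools.cache
def name_to_url() -> dict[str, str]:
    # Hypothetical stand-in: UNDR parses its bundled configuration file here.
    # The names and URLs below are placeholders, not real UNDR datasets.
    return {
        "dataset-a": "https://example.com/dataset-a/",
        "dataset-b": "https://example.com/dataset-b/",
    }


def default_datasets() -> list[str]:
    # Assumed behaviour: build a new list from the cached dictionary's keys.
    return list(name_to_url().keys())


names = default_datasets()
names.append("not-a-dataset")  # mutating the list does not touch the cache
print(default_datasets())      # ['dataset-a', 'dataset-b']

name_to_url()["dataset-c"] = "https://example.com/dataset-c/"  # mutates the cache
print(default_datasets())      # ['dataset-a', 'dataset-b', 'dataset-c']
```

Under this assumption, the list returned by default_datasets() is safe to modify, but changes made to the dictionary returned by name_to_url() show up in every later call to either function.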
- undr.simple.install(name: str, url: str | None = None, timeout: float = constants.DEFAULT_TIMEOUT, mode: str | undr.install_mode.Mode = install_mode.Mode.LOCAL, directory: str | pathlib.Path = 'datasets', show_display: bool = True, workers: int = multiprocessing.cpu_count() * 2, force: bool = False, log_directory: pathlib.Path | None = None)
Downloads (and optionally decompresses) a dataset.
See undr.install_mode.Mode for details on the different installation strategies.
Parameters:
  - name (str) – Name of the dataset to install. Unless url is provided, it must be one of the keys returned by name_to_url().
  - url (Optional[str], optional) – URL of the dataset. Defaults to None.
  - timeout (float, optional) – Request timeout in seconds. Defaults to undr.constants.DEFAULT_TIMEOUT.
  - mode (Union[str, install_mode.Mode], optional) – Installation strategy. Defaults to undr.install_mode.Mode.LOCAL.
  - directory (Union[str, pathlib.Path], optional) – Path of the local directory to store datasets. Defaults to "datasets".
  - show_display (bool, optional) – Whether to show a progress bar. Defaults to True.
  - workers (int, optional) – Number of parallel workers (threads). Defaults to twice multiprocessing.cpu_count().
  - force (bool, optional) – Whether to re-download files even if they are already present locally. Defaults to False.
  - log_directory (Optional[pathlib.Path], optional) – Directory to store log files. Logs are not generated if this is None. Defaults to None.
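The rule linking name and url can be sketched as follows. resolve_url is a hypothetical helper written for illustration, not part of UNDR; known stands in for the dictionary returned by name_to_url(), and its entries are placeholders. The ValueError is also an assumption, since this page does not say how install reports an unknown name.

```python
from typing import Optional


def resolve_url(name: str, url: Optional[str], known: dict[str, str]) -> str:
    """Return the URL to download `name` from (hypothetical helper)."""
    if url is not None:
        # An explicit URL takes precedence; `name` can then be any label.
        return url
    try:
        # Without a URL, `name` must be one of the default datasets.
        return known[name]
    except KeyError:
        raise ValueError(
            f"unknown dataset {name!r}; pass url= or use one of {sorted(known)}"
        ) from None


known = {"dataset-a": "https://example.com/dataset-a/"}  # placeholder entries
print(resolve_url("dataset-a", None, known))  # https://example.com/dataset-a/
print(resolve_url("my-data", "https://example.com/my-data/", known))  # https://example.com/my-data/
```

Checking the explicit URL first mirrors the documented behaviour: name is only looked up among the defaults when url is omitted.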
- undr.simple.name_to_url() -> dict[str, str]
Generates a dictionary of the default datasets’ names and URLs.
The first call to this function parses the configuration file bundled with UNDR and caches the result. Subsequent calls return the cached value immediately. Hence, modifying the returned dictionary also modifies the value returned by all subsequent calls, until the Python interpreter restarts. Users who plan to modify the returned value may want to call name_to_url().copy() instead, so that their changes apply to a copy and the cached dictionary stays untouched.
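The caveat can be reproduced with a small stand-in. This sketch assumes a cache that behaves like functools.cache; UNDR's actual caching mechanism is not described on this page, and the dataset names and URLs are placeholders. Because the values are strings, which are immutable, the shallow copy made by dict.copy() is enough to isolate changes.

```python
import functools


@functools.cache
def name_to_url() -> dict[str, str]:
    # Hypothetical stand-in for UNDR's function: the real one parses the
    # configuration file bundled with UNDR on the first call.
    return {"dataset-a": "https://example.com/dataset-a/"}


# Mutating the returned dictionary changes what every later call sees,
# because each call returns the same cached object.
shared = name_to_url()
shared["dataset-b"] = "https://example.com/dataset-b/"
print("dataset-b" in name_to_url())  # True

# Working on a copy leaves the cached value untouched.
private = name_to_url().copy()
private["dataset-c"] = "https://example.com/dataset-c/"
print("dataset-c" in name_to_url())  # False
```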