The database inventory
Analysing the inventory
The structure and files including file stats are saved in an inventory that is returned by most API functions like sftp_download, clean!, convert_inventory!, and convert_inventory. The inventory is a SortedDict that saves relevant data such as the available data range, the file count and the overall size in the metadata. Stats about the data files are kept in the "dates" section and missing dates are listed in "gaps". Function list_inventory gives a simplified view of the folder and file tree together with important statistics about the inventory and the downloaded portion.
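To make the layout described above more concrete, the following toy sketch builds an inventory-like structure with a plain Dict and derives the gaps from the dates. Only the section names metadata, dates, and gaps come from the text. The inner keys (start, stop, files, size) and the helper find_gaps are hypothetical, and the real inventory is a SortedDict that ICARE creates and maintains itself.

```julia
using Dates

# Hypothetical helper: list all dates between the first and last entry
# that have no data (the role of the "gaps" section).
function find_gaps(dates)
    days = sort(collect(keys(dates)))
    isempty(days) && return Date[]
    return [d for d in days[1]:Day(1):days[end] if !haskey(dates, d)]
end

# Toy "dates" section: per-date file stats (the inner layout is an assumption).
dates = Dict(
    Date(2020, 1, 1) => Dict("files" => 2, "size" => 2048),
    Date(2020, 1, 2) => Dict("files" => 1, "size" => 1024),
    Date(2020, 1, 5) => Dict("files" => 3, "size" => 4096),
)

# Toy inventory with the three sections named in the text.
inventory = Dict(
    "metadata" => Dict(
        "start" => minimum(keys(dates)),
        "stop" => maximum(keys(dates)),
        "files" => sum(d["files"] for d in values(dates)),
        "size" => sum(d["size"] for d in values(dates)),
    ),
    "dates" => dates,
    "gaps" => find_gaps(dates),
)
```

In this toy example, the gaps section ends up holding the two dates without data between the first and the last available date.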
Any function relying on the inventory will load it when needed. However, the inventory can also be loaded on its own with load_inventory. Many functions have two kinds of methods: convenience methods that load the inventory from the hidden .inventory.yaml in the product folder, and more performant methods that take a preloaded inventory and skip the load step.
There is no public function to save the inventory. This is done automatically every time the inventory is updated by any ICARE function. Users should not have to save the inventory manually and should not attempt to do so as a wrong format might lead to unintended errors.
ICARE.load_inventory — Function
load_inventory(
path::AbstractString,
logger::logex.AbstractLogger=logex.global_logger();
save_migrations::Bool=true
) -> SortedDict
Load the database inventory from the path to the product folder (and a hidden yaml file) to a SortedDict, which can be processed by other ICARE functions. By default, older versions of the inventory are upgraded and saved to the .inventory.yaml. Updating the inventory file can be prevented by setting the kwarg save_migrations to false.
If a logger is provided, events are logged to logger instead of the global logger (typically the console).
See also: list_inventory
ICARE.list_inventory — Function
list_inventory(
inventory::SortedDict;
list_dates::Bool=true,
list_gaps::Bool=true,
list_ignored::Bool=true,
list_extras::Bool=true
)
List the content of the inventory in a simplified tree structure showing available and already downloaded folders and files and statistics about the inventory content. Additionally, missing dates (gaps), ignored files, and extra files are listed.
For all but the overall stats, printing can be switched off with keyword arguments (list_dates, list_gaps, list_ignored, list_extras).
Manipulating the inventory
Several functions help shape the database to the user's needs.
Batch conversions
Methods convert_inventory/convert_inventory! can be used to convert data files to a new file format. By default, HDF4 files are upgraded to HDF5. The same can be achieved by re-running sftp_download with the convert option set to true. If the original files are already downloaded, sftp_download will not re-download them and will only convert them. If you have already loaded the inventory, conversion with convert_inventory! is the more performant option; convert_inventory is a convenience method that loads the inventory first and then calls convert_inventory!.
ICARE.convert_inventory — Function
convert_inventory(
root::AbstractString=".";
sizecheck::Bool=false,
logfile::AbstractString = "logs/conversions.log",
loglevel::Symbol = :Debug
) -> SortedDict
Convert all files in root that are part of the inventory to a new format as defined in the inventory. If files of the new format already exist, they are skipped unless sizecheck is set to true, which will reconvert any file whose size differs from that listed in the inventory. Logging is written to logfile with the specified loglevel. A timestamp is added to the log file name to avoid overwriting existing logs. The logfile may include a path (either absolute or relative to the product folder). The function returns the updated inventory.
See also: convert_inventory!, sftp_download, ignore!, unignore!
ICARE.convert_inventory! — Function
convert_inventory!(
inventory::SortedDict,
sizecheck::Bool,
logfile::AbstractString = "logs/conversions.log",
loglevel::Symbol = :Debug
) -> SortedDict
Convert all files in the local database that are part of the inventory to a new format as defined in the inventory. If files of the new format already exist, they are skipped unless sizecheck is set to true, which will reconvert any file whose size differs from that listed in the inventory. Logging is written to logfile with the specified loglevel. A timestamp is added to the log file name to avoid overwriting existing logs. The logfile may include a path (either absolute or relative to the product folder). The function returns the updated inventory.
See also: convert_inventory, sftp_download, ignore!, unignore!
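The skip rule for already converted files can be summarised in a small predicate. This is a minimal sketch of the behaviour described for sizecheck, not the package's implementation; the function needs_conversion and its arguments are hypothetical.

```julia
# Hypothetical predicate: does a data file need (re)conversion?
# `target` is the path of the converted file, `expected` the size in bytes
# recorded in the inventory.
function needs_conversion(target::AbstractString, expected::Integer; sizecheck::Bool=false)
    isfile(target) || return true          # nothing converted yet
    sizecheck || return false              # existing files are skipped by default
    return filesize(target) != expected    # reconvert only on a size mismatch
end

# Demonstration with a temporary 5-byte file standing in for a converted granule.
dir = mktempdir()
target = joinpath(dir, "granule.h5")
write(target, "12345")

results = (
    missing_file = needs_conversion(joinpath(dir, "other.h5"), 5),
    default_skip = needs_conversion(target, 99),
    size_match = needs_conversion(target, 5; sizecheck=true),
    size_mismatch = needs_conversion(target, 99; sizecheck=true),
)
```

Only the missing file and the file with a size mismatch under sizecheck=true are flagged for conversion.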
Cleaning up the inventory
Function clean! can be used to remove all files in a product folder that do not belong to the database, i.e. that don't exist on the ICARE server. Files are deleted permanently, but the user is always shown which files will be deleted and asked for confirmation first.
By passing a file extension or a vector of file extensions to the keyword argument keepext, files with those extensions are excluded from the clean-up. This can be useful, for example, to keep log files, which default to the .log extension. Otherwise, log files do not count as part of the database and are removed; only the log file from the current cleaning run is kept.
The database itself can be cleaned up as well. By default, both the original and converted files are considered part of the database, but one or the other file type can be removed during cleaning. This is done by passing the file type to be removed to the erase argument as a value of the Extension enum. The two choices available are original and converted. The enum values are passed directly to erase (without quotes or any other specifier). They are constants exported from the ICARE package and should therefore not be overwritten.
If the original files are removed during clean!, they are permanently lost and have to be re-downloaded if needed. If converted files are deleted, they can be converted again as long as the originals are still present, which is much faster than downloading.
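The selection step of such a clean-up can be pictured as a simple filter over the files found in the product folder. The sketch below is an illustration under stated assumptions, not the logic of clean! itself: the helper files_to_remove is hypothetical, and it assumes that extensions are given with a leading dot.

```julia
# Hypothetical selection step: which files would a clean-up remove?
# `present` lists the files found in the product folder, `known` holds the
# paths listed in the inventory, and `keepext` the extensions to spare.
function files_to_remove(present, known; keepext=String[])
    exts = keepext isa AbstractString ? [keepext] : keepext
    return [f for f in present if !(f in known) && !(splitext(f)[2] in exts)]
end

present = ["2020/a.hdf", "2020/a.h5", "notes.txt", "logs/clean.log"]
known = Set(["2020/a.hdf", "2020/a.h5"])

# Spare log files; everything else that is unknown to the inventory is selected.
removed = files_to_remove(present, known; keepext=".log")
```

Here only notes.txt is selected for removal, because the data files are known to the inventory and the log file is protected by its extension.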
ICARE.clean! — Function
clean!(
root::AbstractString=".",
erase::Extension=none;
logs::Bool=false,
logfile::AbstractString = "logs/clean.log",
loglevel::Symbol = :Debug
) -> SortedDict
clean!(
inventory::SortedDict,
erase::Extension=none;
logs::Bool=false,
logfile::AbstractString = "logs/clean.log",
loglevel::Symbol = :Debug
) -> SortedDict
Clean a product folder recursively from all content not listed in the inventory, i.e. not available on the ICARE server, or not flagged as extra files in the inventory extra section with the attach! function. The function has two methods – you can either provide an AbstractString with the path of the product folder or the inventory as SortedDict of the product. The latter is more performant as the inventory doesn't need to be loaded first.
Both methods allow an optional second argument that specifies whether either the original or converted files should be additionally cleaned from the local database. The optional parameter values are predefined constants from the Extension enum. Both methods return the updated inventory for reference.
See also: attach!, detach!, ignore!, unignore!, convert_inventory!, convert_inventory, sftp_download
Keyword Arguments
- logs::Bool: Whether to clean (true) or keep (false) log files created during previous operations (default: false).
- logfile::AbstractString: The name of the log file (default: "logs/clean.log"; the start timestamp will be appended to the name). The name may include a path (either absolute or relative to the product folder).
- loglevel::Symbol: The log level for the download process (default: :Debug).
ICARE.Extension — Type
Enum Extension
Store general choices for available file extensions in the database.
Ignoring granules in the inventory
Users can choose to ignore specific granules in the inventory. This functionality is meant for corrupted files in the remote database; it removes them from the inventory and suppresses re-downloads. Ignored files are not considered part of the inventory and are moved from the inventory's dates section to an ignore section when function ignore! is invoked. They can be re-joined with the inventory, e.g. when files in the remote database are fixed, with function unignore!. Files that are already downloaded but ignored will be deleted during clean! operations.
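The move between the dates and ignore sections can be sketched with plain dictionaries. This is a simplified illustration that works on whole dates rather than individual granules; the helper move_dates! and the toy inventory layout are hypothetical and do not reflect how ignore! and unignore! are implemented.

```julia
using Dates

# Hypothetical helper: move the entries for the given dates from one
# inventory section to another.
function move_dates!(inventory, from, to, days)
    for day in days
        haskey(inventory[from], day) || continue
        inventory[to][day] = pop!(inventory[from], day)
    end
    return inventory
end

# Toy inventory with a "dates" and an (initially empty) "ignore" section.
inventory = Dict(
    "dates" => Dict(Date(2020, 1, 1) => ["a.hdf"], Date(2020, 1, 2) => ["b.hdf"]),
    "ignore" => Dict{Date,Vector{String}}(),
)

move_dates!(inventory, "dates", "ignore", [Date(2020, 1, 2)])   # analogous to ignore!
ignored = sort(collect(keys(inventory["ignore"])))

move_dates!(inventory, "ignore", "dates", [Date(2020, 1, 2)])   # analogous to unignore!
restored = sort(collect(keys(inventory["dates"])))
```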
ICARE.ignore! — Function
ignore!(
inventory::SortedDict,
dates::AbstractDict{Date, Any};
logfile::AbstractString = "logs/ignore.log",
loglevel::Symbol = :Debug
) -> SortedDict
Flag the dates as ignored in the inventory and ensure they will not get downloaded. Log events with the specified loglevel to the logfile. A timestamp is appended to the log file name automatically. The logfile may include a path (either absolute or relative to the product folder). The function returns the updated inventory.
See also: unignore!, attach!, detach!, sftp_download
ICARE.unignore! — Function
unignore!(
inventory::SortedDict,
dates::AbstractDict{Date, Any}=Dict{Date,Any}();
logfile::AbstractString = "logs/ignore.log",
loglevel::Symbol = :Debug
) -> SortedDict
Unflag the dates from being ignored in the inventory and allow them to be downloaded again. Log events with the specified loglevel to the logfile. A timestamp is appended to the log file name automatically. The logfile may include a path (either absolute or relative to the product folder). The function returns the updated inventory.
See also: ignore!, attach!, detach!, sftp_download
Allowing extra data in the product folder
Normally, the product folder should be a direct copy of the remote database with the exception of the .inventory.yaml and any log files. However, some users might also want to store additional metadata, analyses of the database files, or results from any investigations. For these reasons, additional files and folders can be flagged as extras and are added to an extras section in the inventory. Files in the extras section of the inventory are considered attached to the inventory, but not part of the database. They are kept during clean! operations, but have no influence on downloads or any other operation.
Function attach! can be used to flag files and folders as extras and function detach! will remove them again and allow for them to be cleaned.
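For nested extras, keeping a file implies keeping the folders that contain it. The following sketch illustrates that idea with a hypothetical helper protected_paths; it assumes paths relative to the product folder and is not taken from the package.

```julia
# Hypothetical helper: an attached path also protects all of its parent folders.
function protected_paths(extras)
    keep = Set{String}()
    for extra in extras
        path = String(extra)
        # Walk up the folder tree until the top or an already protected folder is reached.
        while !isempty(path) && !(path in keep)
            push!(keep, path)
            path = dirname(path)
        end
    end
    return keep
end

keep = protected_paths(["analysis/2020/summary.csv", "README.md"])
```

The resulting set contains the two attached files plus the parent folders analysis and analysis/2020.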
ICARE.attach! — Function
attach!(
inventory::SortedDict,
extras::Union{AbstractString,Vector{<:AbstractString}};
logfile::AbstractString = "logs/extras.log",
loglevel::Symbol = :Debug
) -> SortedDict
Attach extra files and folders to the inventory that should be kept during clean! operations. Files nested in foreign folders are recognised as well, and their parent folders are kept during clean! operations. The extras can be provided as a single AbstractString or as a vector of AbstractStrings. The function returns the updated inventory and logs events with the specified loglevel to the logfile. A timestamp is appended to the log file name automatically. The logfile may include a path (either absolute or relative to the product folder).
ICARE.detach! — Function
detach!(
inventory::SortedDict,
extras::Union{AbstractString,Vector{<:AbstractString}}=String[];
logfile::AbstractString = "logs/extras.log",
loglevel::Symbol = :Debug
) -> SortedDict
Detach files and folders from the inventory that were previously marked as extra data to be kept during clean! operations. If no extras are provided, all extra data will be detached. For nested files and folders, the parent will be detached as well, if it contains no other extra data.
The function returns the updated inventory and logs events with the specified loglevel to the logfile. A timestamp is appended to the log file name automatically. The logfile may include a path (either absolute or relative to the product folder).
Inventory Changelog
The inventory has its own version. Like the ICARE package, the inventory follows Semantic Versioning and uses Keep a Changelog style.
However, non-breaking changes in the inventory might lead to breaking changes in the ICARE package and, conversely, breaking changes in the inventory need not necessarily be breaking in ICARE. Therefore, separate versions and changelogs are introduced for the package and its inventory.
[v2.0.0] - 2026-01-22
Added
- Added entry compression-ratio to inventory["metadata"]["database"]["size"] (#38)
Changed
- Reordered entries in inventory["metadata"]["database"] with start and stop date directly after dates and missing data.
- Made size a sub-dictionary with entries total, downloaded, and converted instead of the entries "downloaded size" and "converted size". This avoids quoted multi-word keys in the inventory.
[v1.0.0] - 2025-12-27
Added
- First stable version of the inventory (being tracked from now on) with sections for
  - dates holding file stats of all available granules for all available dates
  - extras marking path objects in the local product folder that are considered attached to the inventory, but not part of the inventory, and will be excluded from cleaning processes
  - ignore marking inventory data (e.g. corrupt granules) that are not considered part of the local database and will not be downloaded or will be removed during cleaning processes
  - gaps holding all dates in the inventory time span with no data available
  - metadata with statistics about the inventory, e.g. available dates, sizes, file counts, etc. and information needed for processing the inventory