Analysing and manipulating the inventory

Analysing the inventory

The folder structure and the files, including file stats, are saved in an inventory that is returned by most API functions such as sftp_download, clean!, convert!, and convert. The inventory is a SortedDict that stores relevant data such as the available data range, the file count, and the overall size in its metadata. Stats about the data files are kept in the "dates" section, and missing dates are listed in "gaps". Function list_inventory gives a simplified view of the folder and file tree together with important statistics about the inventory and the downloaded portion.

Any function relying on the inventory will load it when needed. However, the inventory can also be loaded on its own with load_inventory. Many functions come in two variants: a convenience method that loads the inventory from the hidden .inventory.yaml in the product folder, and a more performant method that takes a preloaded inventory and skips the load step.
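As a minimal sketch of working with a preloaded inventory, the following loads the inventory once and passes it on to list_inventory. The path "data/ICARE/product" is a hypothetical placeholder for an existing product folder that already contains a .inventory.yaml. Function names are module-qualified so that the sketch does not depend on which names the package exports.

```julia
using ICARE

# Hypothetical path to a product folder that already holds a .inventory.yaml
product_dir = "data/ICARE/product"

# Load the inventory once ...
inventory = ICARE.load_inventory(product_dir)

# ... and reuse it, here to print the overview without gaps and ignored files
ICARE.list_inventory(inventory; list_gaps=false, list_ignored=false)
```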

ICARE.load_inventory (Function)
load_inventory(path::AbstractString) -> SortedDict

Load the database inventory from the hidden YAML file in the product folder at path into a SortedDict, which can be processed by other ICARE functions.

See also: list_inventory

ICARE.list_inventory (Function)
list_inventory(
    inventory::SortedDict;
    list_dates::Bool=true,
    list_gaps::Bool=true,
    list_ignored::Bool=true,
    list_extras::Bool=true
)

List the content of the inventory in a simplified tree structure showing available and already downloaded folders and files and statistics about the inventory content. Additionally, missing dates (gaps), ignored files, and extra files are listed.

For all but the overall stats, printing can be switched off with keyword arguments (list_dates, list_gaps, list_ignored, list_extras).


Manipulating the inventory

Several functions exist to help shape the database to the user's needs.

Batch conversions

Methods convert/convert! can be used to convert data files to a new file format. By default, HDF4 files are upgraded to HDF5. The same can be achieved by re-running sftp_download with the convert option set to true: if the original files are already downloaded, sftp_download will not download them again and will only convert them. If you have already loaded the inventory, convert! is the more performant option; convert is a convenience method that loads the inventory first and then calls convert!.
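The two entry points could be used as in the following sketch. The product folder path and the log file name "reconvert.log" are hypothetical; the keyword arguments are those listed in the signatures on this page.

```julia
using ICARE

# Hypothetical product folder that already contains downloaded files
product_dir = "data/ICARE/product"

# Convenience method: loads the inventory from product_dir, then converts
inventory = convert(product_dir)

# Preloaded inventory: skip the load step and reconvert any file whose size
# differs from the size listed in the inventory
inventory = ICARE.convert!(inventory; sizecheck=true, logfile="reconvert.log")
```

Both calls return the updated inventory, so the result of one call can be passed straight to the next function that accepts a SortedDict.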

Base.convert (Method)
convert(
    root::AbstractString=".";
    sizecheck::Bool=false,
    logfile::String = "conversions.log",
    loglevel::Symbol = :Debug
) -> SortedDict

Convert all files in root that are part of the inventory to a new format as defined in the inventory. Files that already exist in the new format are skipped; if sizecheck is set to true, any file whose size differs from that listed in the inventory is reconverted. Logging is written to logfile with the specified loglevel. A timestamp is added to the log file name to avoid overwriting existing logs. The function returns the updated inventory.

See also: convert!, sftp_download, ignore!, unignore!

ICARE.convert! (Function)
convert!(
    inventory::SortedDict;
    sizecheck::Bool=false,
    logfile::String = "conversions.log",
    loglevel::Symbol = :Debug
) -> SortedDict

Convert all files in the local database that are part of the inventory to a new format as defined in the inventory. Files that already exist in the new format are skipped; if sizecheck is set to true, any file whose size differs from that listed in the inventory is reconverted. Logging is written to logfile with the specified loglevel. A timestamp is added to the log file name to avoid overwriting existing logs. The function returns the updated inventory.

See also: convert, sftp_download, ignore!, unignore!


Cleaning up the inventory

Function clean! can be used to remove all files in a product folder that do not belong to the database, i.e. that don't exist on the ICARE server. Files are permanently deleted, but the user is always shown which files will be deleted and is asked for confirmation.

Files with an extension passed to the keyword argument keepext, either as a single file extension or as a vector of file extensions, are excluded from the clean-up. This can be useful, for example, to keep log files, which default to the .log extension. Otherwise, log files are removed because they do not count as part of the database, and only the log file from the current cleaning run is kept.

The database itself can be cleaned up as well. By default, both the original and converted files are considered part of the database, but one or the other file type can be removed during cleaning. To do so, pass a value of the Extension enum as the optional second positional argument erase. The two choices for removal are original and converted. The enum values are passed directly (without quotes or any other specifier). They are constants exported by ICARE and should therefore not be overwritten.

Warning

The Extension enum and its values original and converted are exported by ICARE.jl. Do not unintentionally overwrite them, or you will get error messages when trying to set the erase option in clean!, unless you prepend the module name to the constants: ICARE.original and ICARE.converted.

Danger

If the original files are removed during clean!, they are permanently lost and will have to be downloaded again if needed. If converted files are deleted, they can be converted again as long as the originals are still present, which is much faster than downloading.
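A clean-up run that also erases the converted files while keeping log files could look like the following sketch. The product folder path is a hypothetical placeholder. The enum value is written with the module prefix, which works even if the name converted has been overwritten in the current session.

```julia
using ICARE

# Hypothetical product folder that already contains a .inventory.yaml
inventory = ICARE.load_inventory("data/ICARE/product")

# Remove foreign files and additionally erase the converted files, but keep
# any file with the .log extension; clean! asks for confirmation before deleting
inventory = ICARE.clean!(inventory, ICARE.converted; keepext=".log")
```

Erasing the converted files is the safer of the two choices, because they can be regenerated from the originals with convert!.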

ICARE.clean! (Function)
clean!(
    root::AbstractString=".",
    erase::Extension=none;
    keepext::Union{AbstractString,Vector{<:AbstractString}}="",
    logfile::String = "clean.log",
    loglevel::Symbol = :Debug
) -> SortedDict

clean!(
    inventory::SortedDict,
    erase::Extension=none;
    keepext::Union{AbstractString,Vector{<:AbstractString}}="",
    logfile::String = "clean.log",
    loglevel::Symbol = :Debug
) -> SortedDict

Recursively remove from a product folder all content that is neither listed in the inventory (i.e. not available on the ICARE server) nor flagged as extra files in the inventory's extras section with the attach! function. The function has two methods: you can provide either an AbstractString with the path of the product folder or the inventory of the product as a SortedDict. The latter is more performant, as the inventory doesn't need to be loaded first.

Both methods allow an optional second argument that specifies whether either the original or converted files should be additionally cleaned from the local database. The optional parameter values are predefined constants from the Extension enum. Both methods return the updated inventory for reference.

See also: attach!, detach!, ignore!, unignore!, convert!, convert(::AbstractString), sftp_download

Keyword Arguments

  • keepext::Union{AbstractString,Vector{<:AbstractString}}: One or multiple (as vector) file extensions (e.g. ".log", [".yaml", ".log"]) to keep during clean-up even if not part of the inventory. Can be used to keep log or metadata files.
  • logfile::String: The name of the log file (default: "clean.log"; the current date and time are appended to the name).
  • loglevel::Symbol: The log level for the clean-up process (default: :Debug).

ICARE.Extension (Type)

Enum Extension

Store general choices for available file extensions in the database.


Ignoring granules in the inventory

Users can choose to ignore specific granules in the inventory. This functionality is meant for corrupted files in the remote database: it removes them from the inventory and suppresses re-downloads. Ignored files are not considered part of the inventory and are moved from the inventory's dates section to an ignore section when function ignore! is invoked. They can be re-joined with the inventory, e.g. when files in the remote database are fixed, with function unignore!. Files that are already downloaded but ignored will be deleted during clean! operations.

ICARE.ignore! (Function)
ignore!(
    inventory::SortedDict,
    dates::AbstractDict{Date, Any};
    logfile::String = "ignore.log",
    loglevel::Symbol = :Debug
) -> SortedDict

Flag the dates as ignored in the inventory and ensure they will not get downloaded. Log events with the specified loglevel to the logfile. A timestamp is appended to the log file name automatically. The function returns the updated inventory.

See also: unignore!, attach!, detach!, sftp_download

ICARE.unignore! (Function)
unignore!(
    inventory::SortedDict,
    dates::AbstractDict{Date, Any}=Dict{Date,Any}();
    logfile::String = "ignore.log",
    loglevel::Symbol = :Debug
) -> SortedDict

Unflag the dates from being ignored in the inventory and allow them to be downloaded again. Log events with the specified loglevel to the logfile. A timestamp is appended to the log file name automatically. The function returns the updated inventory.

See also: ignore!, attach!, detach!, sftp_download


Allowing extra data in the product folder

Normally, the product folder should be a direct copy of the remote database with the exception of the .inventory.yaml and any log files. However, some users might also want to store additional metadata, analyses of the database files, or results from any investigations. For this reason, additional files and folders can be flagged as extras and are added to an extras section in the inventory. Files in the extras section of the inventory are considered attached to the inventory, but not part of the database. They are kept during clean! operations, but have no influence on downloads or anything else.

Function attach! can be used to flag files and folders as extras and function detach! will remove them again and allow for them to be cleaned.
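A typical round trip could look like the following sketch. The product folder path and the names "notes/analysis.md" and "plots" are hypothetical, and writing them relative to the product folder is an assumption, since the expected path form is not specified here.

```julia
using ICARE

# Hypothetical product folder that already contains a .inventory.yaml
inventory = ICARE.load_inventory("data/ICARE/product")

# Flag a file and a folder (hypothetical names) as extras so clean! keeps them
inventory = ICARE.attach!(inventory, ["notes/analysis.md", "plots"])

# Release a single entry again so that it can be cleaned ...
inventory = ICARE.detach!(inventory, "plots")

# ... or detach all remaining extras by omitting the second argument
inventory = ICARE.detach!(inventory)
```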

ICARE.attach! (Function)
attach!(
    inventory::SortedDict,
    extras::Union{AbstractString,Vector{<:AbstractString}};
    logfile::String = "extras.log",
    loglevel::Symbol = :Debug
) -> SortedDict

Attach extra files and folders to the inventory that should be kept during clean! operations. Files nested in foreign folders are recognised as well, and their parent folders are kept during clean! operations. The extras can be provided as a single AbstractString or as a vector of AbstractStrings. The function returns the updated inventory and logs events with the specified loglevel to the logfile. A timestamp is appended to the log file name automatically.

See also: detach!, clean!, ignore!, unignore!

ICARE.detach! (Function)
detach!(
    inventory::SortedDict,
    extras::Union{AbstractString,Vector{<:AbstractString}}=String[];
    logfile::String = "extras.log",
    loglevel::Symbol = :Debug
) -> SortedDict

Detach files and folders from the inventory that were previously marked as extra data to be kept during clean! operations. If no extras are provided, all extra data will be detached. For nested files and folders, the parent will be detached as well, if it contains no other extra data.

The function returns the updated inventory and logs events with the specified loglevel to the logfile. A timestamp is appended to the log file name automatically.

See also: attach!, clean!, ignore!, unignore!
