Analysing and manipulating the inventory
Analysing the inventory
The folder structure and files, including file statistics, are saved in an inventory, which is returned by most API functions, such as sftp_download, clean!, convert!, and convert. The inventory is a SortedDict that stores relevant metadata, such as the available date range, the file count, and the overall size. Statistics about the data files are kept in the "dates" section, and missing dates are listed in "gaps". The function list_inventory gives a simplified view of the folder and file tree, together with key statistics about the inventory and the downloaded portion.
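The following minimal sketch makes the layout described above more concrete. It models an inventory-like structure with plain Dicts (ICARE itself uses a SortedDict) and derives the gaps from the date range. Only the section names "dates" and "gaps" come from the description above. The "metadata" keys ("daterange", "files", "size") and the per-date values are hypothetical and do not reflect the actual contents of .inventory.yaml.

```julia
using Dates

# Hypothetical, simplified per-date stats (file count and size per day).
# ICARE stores its inventory in a SortedDict with its own key layout.
dates = Dict(
    Date(2020, 1, 1) => Dict("files" => 14, "size" => 1_200),
    Date(2020, 1, 2) => Dict("files" => 14, "size" => 1_150),
    Date(2020, 1, 4) => Dict("files" => 13, "size" => 1_100),
)

# Build a metadata section from the per-date stats.
first_date, last_date = extrema(keys(dates))
metadata = Dict(
    "daterange" => (first_date, last_date),
    "files" => sum(d["files"] for d in values(dates)),
    "size" => sum(d["size"] for d in values(dates)),
)

# Gaps: every day in the date range without an entry in the dates section.
gaps = sort([d for d in first_date:Day(1):last_date if !haskey(dates, d)])

inventory = Dict("metadata" => metadata, "dates" => dates, "gaps" => gaps)
```

Here the gaps section ends up holding 2020-01-03, the only day in the range without data.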
Any function that relies on the inventory loads it when needed. The inventory can also be loaded on its own with load_inventory. Many functions come in two methods: a convenience method that loads the inventory from the hidden .inventory.yaml file in the product folder, and a more performant method that takes an already loaded inventory and skips the load step.
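Julia's multiple dispatch can illustrate this two-method pattern. In the minimal sketch below, toy_load and toy_count! are hypothetical stand-ins, not ICARE functions. The method that takes a path loads the inventory and forwards to the method that takes a preloaded dictionary, so repeated calls on a loaded inventory skip the load step.

```julia
# Hypothetical loader that counts how often it is called.
# ICARE's load_inventory reads the hidden .inventory.yaml instead.
const LOAD_CALLS = Ref(0)
function toy_load(path::AbstractString)
    LOAD_CALLS[] += 1
    return Dict{String,Any}("root" => path, "visits" => 0)
end

# Performant method: works on an already loaded inventory and returns it.
function toy_count!(inventory::AbstractDict)
    inventory["visits"] += 1
    return inventory
end

# Convenience method: load first, then forward to the in-place method.
toy_count!(path::AbstractString) = toy_count!(toy_load(path))

loaded = toy_load("product")   # load once ...
for _ in 1:3
    toy_count!(loaded)         # ... and reuse it without reloading
end
toy_count!("product")          # the convenience call loads again
```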
ICARE.load_inventory — Function
load_inventory(path::AbstractString) -> SortedDict
Load the database inventory of the product folder at path from its hidden .inventory.yaml file into a SortedDict, which can be processed by other ICARE functions.
See also: list_inventory
ICARE.list_inventory — Function
list_inventory(
inventory::SortedDict;
list_dates::Bool=true,
list_gaps::Bool=true,
list_ignored::Bool=true,
list_extras::Bool=true
)
List the content of the inventory as a simplified tree structure that shows the available and already downloaded folders and files, together with statistics about the inventory content. Additionally, missing dates (gaps), ignored files, and extra files are listed.
For all but the overall stats, printing can be switched off with keyword arguments (list_dates, list_gaps, list_ignored, list_extras).
Manipulating the inventory
Several functions help shape the database to the user's needs.
Batch conversions
The methods convert and convert! convert data files to a new file format. By default, HDF4 files are upgraded to HDF5. The same can be achieved by re-running sftp_download with the convert option set to true. If the original files have already been downloaded, sftp_download does not download them again and only converts them. If you have already loaded the inventory, convert! is the more performant option. convert is a convenience method that loads the inventory first and then calls convert!.
Base.convert — Method
convert(
root::AbstractString=".";
sizecheck::Bool=false,
logfile::String = "conversions.log",
loglevel::Symbol = :Debug
) -> SortedDict
Convert all files in root that are part of the inventory to the new format defined in the inventory. Files that already exist in the new format are skipped, unless sizecheck is set to true. In that case, any file whose size differs from the size listed in the inventory is reconverted. Logging is written to logfile with the specified loglevel, and a timestamp is added to the log file name to avoid overwriting existing logs. The function returns the updated inventory.
See also: convert!, sftp_download, ignore!, unignore!
ICARE.convert! — Function
convert!(
inventory::SortedDict;
sizecheck::Bool=false,
logfile::String = "conversions.log",
loglevel::Symbol = :Debug
) -> SortedDict
Convert all files in the local database that are part of the inventory to the new format defined in the inventory. Files that already exist in the new format are skipped, unless sizecheck is set to true. In that case, any file whose size differs from the size listed in the inventory is reconverted. Logging is written to logfile with the specified loglevel, and a timestamp is added to the log file name to avoid overwriting existing logs. The function returns the updated inventory.
See also: convert, sftp_download, ignore!, unignore!
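The skip rule for existing converted files can be sketched as a small predicate. This illustrates the documented behaviour and is not ICARE's implementation. The function name needs_conversion and the assumption that the expected size from the inventory is available as an integer are both hypothetical.

```julia
# Hypothetical predicate mirroring the documented sizecheck rule:
# - a missing converted file is always converted;
# - an existing one is skipped, unless sizecheck=true and its size
#   differs from the size recorded in the inventory.
function needs_conversion(target::AbstractString, expected_size::Integer; sizecheck::Bool=false)
    isfile(target) || return true
    return sizecheck && filesize(target) != expected_size
end

dir = mktempdir()
target = joinpath(dir, "granule.h5")
write(target, "abc")  # 3-byte stand-in for a converted file

needs_conversion(joinpath(dir, "missing.h5"), 3)  # true: not converted yet
needs_conversion(target, 5)                       # false: exists, no sizecheck
needs_conversion(target, 5; sizecheck=true)       # true: size mismatch
needs_conversion(target, 3; sizecheck=true)       # false: size matches
```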
Cleaning up the inventory
The function clean! removes all files in a product folder that do not belong to the database, i.e. that do not exist on the ICARE server. Files are permanently deleted, but the user is always shown which files will be deleted and is asked for confirmation first.
Files whose extension is passed to the keyword argument keepext (a single extension or a vector of extensions) are excluded from the clean-up. This is useful, for example, to keep log files, which use the .log extension by default. Otherwise, log files are removed because they are not part of the database, and only the log file from the current clean-up run is kept.
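The keepext filter can be sketched as follows. The sketch compares extensions with Base's splitext and turns a single string into a one-element vector, as the argument type Union{AbstractString,Vector{<:AbstractString}} suggests. The helper names normalise_ext and files_to_remove and the file names are hypothetical.

```julia
# Normalise keepext: a single extension becomes a one-element vector,
# and the default empty string protects no extra files.
normalise_ext(ext::AbstractString) = isempty(ext) ? String[] : [ext]
normalise_ext(ext::AbstractVector{<:AbstractString}) = collect(ext)

# Hypothetical: return the files that are neither in the inventory
# nor protected by one of the keepext extensions.
function files_to_remove(files, inventory_files; keepext="")
    keep = normalise_ext(keepext)
    return [f for f in files if !(f in inventory_files) && !(last(splitext(f)) in keep)]
end

local_files = ["a.hdf", "notes.txt", "clean_2024.log", "meta.yaml"]
known = Set(["a.hdf"])

files_to_remove(local_files, known)                             # all three foreign files
files_to_remove(local_files, known; keepext=".log")             # ["notes.txt", "meta.yaml"]
files_to_remove(local_files, known; keepext=[".yaml", ".log"])  # ["notes.txt"]
```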
The database itself can be cleaned up as well. By default, both the original and the converted files are considered part of the database, but either file type can be removed during cleaning. To do so, pass the file type to be removed as the optional erase argument, using the Extension enum. Besides the default none, which keeps both file types, the available choices are original and converted. The enum values are passed directly as the erase argument, without quotes or any other specifier. They are constants exported by ICARE and should therefore not be overwritten.
If the original files are removed during clean!, they are permanently lost and have to be downloaded again if needed. If converted files are deleted, they can be regenerated from the originals as long as those are still present, which is much faster than downloading.
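The erase choice can be sketched with Julia's @enum. The enum below mimics the documented values none, original, and converted under the separate name ToyExtension. Because the value names match ICARE's exported constants, the sketch should be run in a fresh session without ICARE loaded. The .hdf and .h5 extensions used to tell original from converted files are assumptions for illustration.

```julia
# Stand-in for ICARE's Extension enum (run without ICARE loaded,
# since the value names mirror ICARE's exported constants).
@enum ToyExtension none original converted

# Hypothetical extension mapping: HDF4 originals vs. HDF5 conversions.
const EXT = Dict(original => ".hdf", converted => ".h5")

# Files of the erased type are removed; erase=none keeps both types.
function erased_files(files::Vector{String}, erase::ToyExtension=none)
    erase == none && return String[]
    return filter(f -> endswith(f, EXT[erase]), files)
end

files = ["g1.hdf", "g1.h5", "g2.hdf", "g2.h5"]
erased_files(files)             # String[]
erased_files(files, converted)  # ["g1.h5", "g2.h5"]
erased_files(files, original)   # ["g1.hdf", "g2.hdf"]
```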
ICARE.clean! — Function
clean!(
root::AbstractString=".",
erase::Extension=none;
keepext::Union{AbstractString,Vector{<:AbstractString}}="",
logfile::String = "clean.log",
loglevel::Symbol = :Debug
) -> SortedDict
clean!(
inventory::SortedDict,
erase::Extension=none;
keepext::Union{AbstractString,Vector{<:AbstractString}}="",
logfile::String = "clean.log",
loglevel::Symbol = :Debug
) -> SortedDict
Recursively remove all content from a product folder that is neither listed in the inventory, i.e. not available on the ICARE server, nor flagged as extra data in the inventory's extras section with the attach! function. The function has two methods: you can provide either the path to the product folder as an AbstractString or the product's inventory as a SortedDict. The latter is more performant, as the inventory does not need to be loaded first.
Both methods accept an optional second argument that specifies whether the original or the converted files should additionally be removed from the local database. The possible values are predefined constants from the Extension enum. Both methods return the updated inventory for reference.
See also: attach!, detach!, ignore!, unignore!, convert!, convert(::AbstractString), sftp_download
Keyword Arguments
keepext::Union{AbstractString,Vector{<:AbstractString}}: One or multiple (as a vector) file extensions (e.g. ".log" or [".yaml", ".log"]) to keep during clean-up even if they are not part of the inventory. Can be used to keep log or metadata files.
logfile::String: The name of the log file (default: "clean.log"; the current date and time are appended to the name).
loglevel::Symbol: The log level for the clean-up process (default: :Debug).
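The exact timestamp format ICARE appends to log file names is not given here. The following sketch assumes a yyyy-mm-dd_HHMMSS stamp inserted before the extension, and timestamped is a hypothetical helper built on the Dates standard library.

```julia
using Dates

# Insert a timestamp between the base name and the extension
# (the stamp format is an assumption, not ICARE's documented format).
function timestamped(logfile::AbstractString, t::DateTime=now())
    base, ext = splitext(logfile)
    return string(base, "_", Dates.format(t, "yyyy-mm-dd_HHMMSS"), ext)
end

timestamped("clean.log", DateTime(2024, 5, 1, 13, 45, 2))  # "clean_2024-05-01_134502.log"
```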
ICARE.Extension — Type
Enum Extension
Store the general choices of file types in the database: none, original, and converted.
Ignoring granules in the inventory
Users can choose to ignore specific granules in the inventory. This functionality is intended for corrupted files in the remote database: it removes them from the inventory and suppresses re-downloads. When the function ignore! is invoked, the ignored files are no longer considered part of the inventory and are moved from the inventory's dates section to an ignore section. They can be restored to the inventory with the function unignore!, e.g. when the files in the remote database have been fixed. Ignored files that were already downloaded are deleted during clean! operations.
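The following minimal sketch shows what ignoring means for the inventory layout: entries move from the dates section to an ignore section and back. The section key "ignored", the per-date values, and the functions toy_ignore! and toy_unignore! are hypothetical. For simplicity they take a plain vector of dates, whereas ignore! and unignore! take an AbstractDict{Date, Any}.

```julia
using Dates

# Move the given dates from one section of the inventory to another.
function move_dates!(inventory, dates, from, to)
    for d in dates
        haskey(inventory[from], d) || continue  # skip dates not in the source section
        inventory[to][d] = pop!(inventory[from], d)
    end
    return inventory
end

toy_ignore!(inv, dates) = move_dates!(inv, dates, "dates", "ignored")
toy_unignore!(inv, dates) = move_dates!(inv, dates, "ignored", "dates")

inventory = Dict{String,Dict{Date,Any}}(
    "dates" => Dict{Date,Any}(Date(2020, 1, 1) => ["a.hdf"], Date(2020, 1, 2) => ["b.hdf"]),
    "ignored" => Dict{Date,Any}(),
)

toy_ignore!(inventory, [Date(2020, 1, 2)])    # corrupted granule: stop tracking it
toy_unignore!(inventory, [Date(2020, 1, 2)])  # remote file fixed: track it again
```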
ICARE.ignore! — Function
ignore!(
inventory::SortedDict,
dates::AbstractDict{Date, Any};
logfile::String = "ignore.log",
loglevel::Symbol = :Debug
) -> SortedDict
Flag the dates as ignored in the inventory and ensure that they are not downloaded. Log events with the specified loglevel to the logfile. A timestamp is appended to the log file name automatically. The function returns the updated inventory.
See also: unignore!, attach!, detach!, sftp_download
ICARE.unignore! — Function
unignore!(
inventory::SortedDict,
dates::AbstractDict{Date, Any}=Dict{Date,Any}();
logfile::String = "ignore.log",
loglevel::Symbol = :Debug
) -> SortedDict
Remove the ignore flag from the dates in the inventory so that they can be downloaded again. Log events with the specified loglevel to the logfile. A timestamp is appended to the log file name automatically. The function returns the updated inventory.
See also: ignore!, attach!, detach!, sftp_download
Allowing extra data in the product folder
Normally, the product folder should be a direct copy of the remote database, except for the .inventory.yaml file and any log files. However, some users might also want to store additional metadata, analyses of the database files, or results from other investigations. For these purposes, additional files and folders can be flagged as extras, which adds them to an extras section in the inventory. Files in the extras section are considered attached to the inventory, but not part of the database. They are kept during clean! operations but have no influence on downloads or any other operation.
The function attach! flags files and folders as extras, and the function detach! removes the flag again so that they can be cleaned up.
ICARE.attach! — Function
attach!(
inventory::SortedDict,
extras::Union{AbstractString,Vector{<:AbstractString}};
logfile::String = "extras.log",
loglevel::Symbol = :Debug
) -> SortedDict
Attach extra files and folders to the inventory that should be kept during clean! operations. Files nested in folders that are not part of the database are recognised as well, and their parent folders are kept during clean! operations. The extras can be provided as a single AbstractString or as a vector of AbstractStrings. The function returns the updated inventory and logs events with the specified loglevel to the logfile. A timestamp is appended to the log file name automatically.
ICARE.detach! — Function
detach!(
inventory::SortedDict,
extras::Union{AbstractString,Vector{<:AbstractString}}=String[];
logfile::String = "extras.log",
loglevel::Symbol = :Debug
) -> SortedDict
Detach files and folders from the inventory that were previously marked as extra data to be kept during clean! operations. If no extras are provided, all extra data is detached. For nested files and folders, the parent folder is detached as well if it contains no other extra data.
The function returns the updated inventory and logs events with the specified loglevel to the logfile. A timestamp is appended to the log file name automatically.
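The parent-folder rule of attach! and detach! can be sketched with a set of relative paths. This is a hypothetical model, and toy_attach! and toy_detach! are not ICARE functions. Extras are stored as path strings, attaching a file also flags its parent folders, and detaching unflags each parent that no longer contains any other extra.

```julia
# All ancestor folders of a relative path, e.g. "a/b/c.txt" -> ["a/b", "a"].
function parents(path::AbstractString)
    out = String[]
    dir = dirname(path)
    while !isempty(dir)
        push!(out, dir)
        dir = dirname(dir)
    end
    return out
end

# Attach: flag the path and its parent folders so clean-up keeps them.
function toy_attach!(extras::Set{String}, path::AbstractString)
    push!(extras, path)
    union!(extras, parents(path))
    return extras
end

# Detach: unflag the path, then unflag each parent (innermost first)
# that no longer contains any other extra.
function toy_detach!(extras::Set{String}, path::AbstractString)
    delete!(extras, path)
    for dir in parents(path)
        inside = any(e -> startswith(e, dir * "/"), extras)
        inside || delete!(extras, dir)
    end
    return extras
end

extras = Set{String}()
toy_attach!(extras, "analysis/plots/map.png")
toy_attach!(extras, "analysis/summary.csv")
toy_detach!(extras, "analysis/plots/map.png")
# "analysis/plots" is dropped; "analysis" stays because summary.csv remains
```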