Downloading from the ICARE server
Synchronising folder structure
ICARE.jl is meant for data that is arranged by years and dates with the following structure:
<root>/<product folder>/yyyy/yyyy_mm_dd
This folder structure is synchronised with the local system, and data files are downloaded to the date folders at the lowest level. To minimize server communication and speed up downloads, a local .inventory.yaml file (hidden on Linux and macOS) is created in the product folder. The .inventory.yaml contains information about the folder structure and file stats and should not be edited or deleted. The inventory is created before the first download of a given product. This process takes from several minutes up to hours in extreme cases. After the initial setup, only dates outside the known date range are updated, which is much faster, unless a complete resynchronisation is forced.
Don't edit or delete the .inventory.yaml file in each main product folder unless you know what you are doing! The creation or resynchronisation of the inventory takes several minutes or up to hours in extreme cases.
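The folder layout described in this section can be illustrated with a minimal sketch. The helper `datefolder` is hypothetical and not part of ICARE.jl; it only shows how a date maps to a path of the form <root>/<product folder>/yyyy/yyyy_mm_dd, assuming a product folder named 05kmCPro.v4.51.

```julia
using Dates

# Hypothetical helper (not part of ICARE.jl): build the date folder for one day,
# following the layout <root>/<product folder>/yyyy/yyyy_mm_dd.
function datefolder(root::AbstractString, productfolder::AbstractString, date::Date)
    year = Dates.format(date, "yyyy")
    day = Dates.format(date, "yyyy_mm_dd")
    return joinpath(root, productfolder, year, day)
end

# Example: the folder holding the files of 1 March 2020
folder = datefolder("data", "05kmCPro.v4.51", Date(2020, 3, 1))
```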
Downloading data files
Use the sftp_download function to synchronise the AERIS/ICARE server with the local system. Mandatory arguments are the ICARE user credentials, the product, and the date span. Further fine-tuning of the downloads is possible with keyword arguments as described in the sftp_download help. The function returns the current full inventory with all available online dates over the full date range for further exploration.
The inventory has its own version number that follows the rules of semantic versioning. It is, however, considered part of the public API of the ICARE package. So, any breaking change in the inventory for which no transition from older versions is provided is considered a breaking change in ICARE itself. If a transition is provided, it would be a major update of the inventory, but a minor update of ICARE.
ICARE.sftp_download — Function
sftp_download(
    user::String,
    password::String,
    product::String,
    startdate::Int,
    enddate::Int=-1;
    version::Union{Nothing,Real} = 4.51,
    remoteroot::String = "/SPACEBORNE/CALIOP/",
    localroot::String = ".",
    convert::Bool = true,
    resync::Bool = false,
    update::Bool = false,
    logfile::String = "downloads.log",
    loglevel::Symbol = :Debug
) -> SortedDict
Download satellite data from the Aeris/ICARE server. The function returns a dictionary with the inventory of available online data for the given product.
To use sftp_download, an Aeris/ICARE account is needed that is available for free for non-commercial use.
Positional arguments
- user::String/password::String: Aeris/ICARE account credentials
- product::String: The desired product to download (matches the folder name excluding the version number, e.g., 05kmCPro)
- startdate::Int/enddate::Int: The start/end date for the download period as Int (format: yyyy[mm[dd]])
The day and month parts of a date can be omitted. In that case, the start date is set to the earliest and the end date to the latest possible day of the given period, e.g., 202003 gives a start date of 2020-03-01 and an end date of 2020-03-31. The end date is optional. If it is omitted, the period defined by startdate is downloaded: a day, a month (if the day part is omitted), or a year (if both day and month are omitted).
See also: list_inventory, convert!, convert(::AbstractString), ignore!, unignore!, clean!
Keyword arguments
- version::Union{Nothing,Real}: The version number of the product (default: 4.51).
- remoteroot::String: The root path on the remote server (default: "/SPACEBORNE/CALIOP/").
- localroot::String: The root path on the local machine containing the product folder (default: ".").
- convert::Bool: Whether or not to convert the downloaded files to another file format (default: true).
- resync::Bool: Whether to re-synchronize the local inventory with the remote server (default: false).
- update::Bool: Whether to update the local files if newer versions are available on the remote server (default: false). Converted file sizes will be deleted for any updates.
- logfile::String: The name of the log file (default: "downloads.log"; the current date and time will be appended to the name).
- loglevel::Symbol: The log level for the download process (default: :Debug).
The update option automatically resynchronises the inventory as well.
Re-synchronisation of the inventory will take several minutes up to hours!
For custom version formats, the version can be set to nothing and included in the product string. By default, the product folder is constructed as <product>.v<X.XX> with the version as float with two decimal places independent of the input format.
By default, HDF files (version 4) are assumed as the download source; they are converted to .h5 (HDF5) files unless convert is set to false.
Specifying the download product
By default, CALIOP data (/SPACEBORNE/CALIOP) are downloaded; this can be changed with the remoteroot keyword argument. Data are downloaded to the current folder or the folder specified by localroot. Use the third positional argument to specify the product you want to download. Products are assumed to be in the format <name>.v<X.XX>, where X.XX is the version number with a two-digit minor version. By default, version 4.51 is assumed; this can be changed with the version keyword argument (e.g., upgrade to version 5 with version=5).
Currently, the newest version 5 shows a significant performance decrease on the server side. Therefore, version 4.51 was chosen as the current default. Conversions to HDF5 format are currently erroring as well with the conversion tools provided.
For custom formats, version can be set to nothing and the entire name and version string passed to the product argument.
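The naming rule for product folders can be sketched as follows. The helper `productfolder` is hypothetical and only mimics the documented behaviour (version formatted with two decimal places, or the product string used as is when version is nothing); the product name "MyProduct-V1" is made up for illustration.

```julia
using Printf

# Hypothetical helper (not the ICARE.jl implementation): build the folder name
# <product>.v<X.XX> with the version printed as a float with two decimal places.
productfolder(product::AbstractString, version::Real) =
    @sprintf("%s.v%.2f", product, float(version))

# With version = nothing, the product string is assumed to contain the version already.
productfolder(product::AbstractString, ::Nothing) = String(product)

default_folder = productfolder("05kmCPro", 4.51)     # "05kmCPro.v4.51"
v5_folder = productfolder("05kmCPro", 5)             # "05kmCPro.v5.00"
custom_folder = productfolder("MyProduct-V1", nothing)  # "MyProduct-V1"
```

Dispatching on Nothing keeps the two documented cases separate without any branching inside the function.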
Specifying a date range
For convenience, dates are given as integers, so users don't have to import the Dates package. You can either give one date range as the fourth positional argument or a start and end date as the fourth and fifth positional arguments.
The date format is yyyy[mm[dd]], where the day and/or month part can be omitted. Defining a span is possible for:
- a whole year (yyyy)
- a whole month (yyyymm)
- a day (yyyymmdd)
If the day and/or month are omitted in the start or end date, they are filled with the earliest possible day for the startdate and the latest possible day for the enddate.
Some examples are:
- 20220212: Download the whole day of 2022-02-12
- 202004: Download the whole April of 2020
- 2020: Download the whole year 2020
- 2002, 200206: Download the first half of 2002
- 200207, 2002: Download the second half of 2002
- 200003, 20000315: Download the first half of March 2000
- 20000316, 200003: Download the second half of March 2000
Only complete days can be downloaded.
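The expansion of integer dates described in this section can be sketched as below. This is not the ICARE.jl implementation; the helpers `firstday`, `lastday`, and `daterange` are hypothetical and only reproduce the documented rules, with enddate = -1 standing for an omitted end date as in the sftp_download signature.

```julia
using Dates

# Hypothetical helpers: expand an integer date of the form yyyy[mm[dd]]
# to the first or last day of the period it describes.
function firstday(date::Int)
    n = length(string(date))
    n == 4 && return Date(date, 1, 1)
    n == 6 && return Date(div(date, 100), date % 100, 1)
    n == 8 && return Date(div(date, 10_000), div(date, 100) % 100, date % 100)
    throw(ArgumentError("expected yyyy, yyyymm, or yyyymmdd, got $date"))
end

function lastday(date::Int)
    n = length(string(date))
    n == 4 && return Date(date, 12, 31)
    n == 6 && return lastdayofmonth(Date(div(date, 100), date % 100, 1))
    n == 8 && return firstday(date)
    throw(ArgumentError("expected yyyy, yyyymm, or yyyymmdd, got $date"))
end

# If enddate is omitted (-1), the span is defined by startdate alone.
function daterange(startdate::Int, enddate::Int = -1)
    stop = enddate == -1 ? lastday(startdate) : lastday(enddate)
    return firstday(startdate), stop
end

march = daterange(202003)               # whole March 2020
firsthalf = daterange(2002, 200206)     # first half of 2002
```

With sftp_download itself, the same integers are passed as the fourth and fifth positional arguments, e.g., 2002, 200206 for the first half of 2002.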
Updating the inventory
As mentioned before, updating the entire inventory including the parts already synced may take a long time and should not be necessary under normal circumstances. It can be achieved by setting resync to true.
If you want to check for updated data files on the server, you can set the update flag to true. This will also resync the complete inventory and, hence, take a long time to finish. If update is set, any file that is newer on the server than the modification time of its local copy will be downloaded.
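The comparison behind the update flag can be pictured with the following sketch. It is an assumption about the general idea, not the package's code: `needsupdate` is a hypothetical helper, `remote_mtime` stands in for the modification time reported by the server, and treating missing local files as outdated is an additional assumption.

```julia
using Dates

# Hypothetical check: is the server copy newer than the local file?
function needsupdate(localfile::AbstractString, remote_mtime::DateTime)
    isfile(localfile) || return true                # assumption: missing files count as outdated
    local_mtime = unix2datetime(mtime(localfile))   # local modification time (UTC)
    return remote_mtime > local_mtime
end

# Example with a temporary file standing in for a downloaded data file
path, io = mktemp()
close(io)
newer_on_server = needsupdate(path, now(UTC) + Day(1))
older_on_server = needsupdate(path, DateTime(2000, 1, 1))
```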
Logging
Some basic information about the current download session is printed to the screen, with more comprehensive information written to a log file. By default, log files are saved to the product folder as downloads_<timestamp>.log. You can change the file name with the logfile keyword argument. The time stamp of the start of the download session is automatically appended to the file name. If the file name includes a path, the log file is saved to this path. The path can be absolute or relative to your current location (where you started your Julia session or where you changed to during your Julia session), i.e., logfile = "~/icare.log" will create a log file icare_<timestamp>.log in your home directory. Note that the extension can be changed as well to, e.g., .txt or .dat.
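How a time stamp can be placed between file name and extension is sketched below. The helper `stampedlog` is hypothetical, and the time stamp format used here is an assumption; ICARE.jl may format the time stamp differently. Placing the file in the product folder by default is not covered by this sketch.

```julia
using Dates

# Hypothetical helper: insert a time stamp between the file name and its extension.
# The format "yyyy_mm_dd_HH_MM_SS" is an assumption chosen for illustration.
function stampedlog(logfile::AbstractString, start::DateTime)
    base, ext = splitext(expanduser(logfile))   # "~" is expanded to the home directory
    stamp = Dates.format(start, "yyyy_mm_dd_HH_MM_SS")
    return string(base, "_", stamp, ext)
end

session_start = DateTime(2024, 1, 31, 8, 5, 0)
default_log = stampedlog("downloads.log", session_start)
text_log = stampedlog("icare.txt", session_start)
```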
For the log file, the verbosity can be set to four levels given as Symbol (listed from least to most verbose):
- Error
- Warn
- Info
- Debug
By default, all messages are printed allowing you to track the download status on screen and in the log file. The Debug level is used to inform about completed downloads.
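The ordering of the four levels can be illustrated with Julia's Logging standard library. Whether ICARE.jl maps its Symbols to these levels internally is an assumption; the dictionary `LOGLEVELS` and the helper `iswritten` are hypothetical names used only for this sketch.

```julia
using Logging

# Map the documented Symbols to the levels of the Logging standard library.
const LOGLEVELS = Dict(
    :Error => Logging.Error,
    :Warn => Logging.Warn,
    :Info => Logging.Info,
    :Debug => Logging.Debug,
)

# A message is written when it is at least as severe as the chosen loglevel.
iswritten(message::Symbol, loglevel::Symbol) = LOGLEVELS[message] >= LOGLEVELS[loglevel]

debug_at_default = iswritten(:Debug, :Debug)   # completed downloads are logged by default
debug_at_info = iswritten(:Debug, :Info)       # suppressed when loglevel is :Info
```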
Converting data files
During download
During an sftp_download, the convert keyword argument (true by default) controls whether downloads are converted. By default, this will save downloads in the HDF5 (.h5) format instead of the expected HDF4 (.hdf) format. You can overload the ICARE.convert_file function to use another target format; see the section about Converting to other file formats. However, you are allowed to save only one other format.
Data conversion is only available under Linux and macOS, not under Windows. If you want to use ICARE.jl under Windows with data conversion, you need to write your own conversion routine and overload ICARE.convert_file and ICARE.newext (see section about Converting to other file formats).
If you have written a conversion routine for Windows, consider a pull request to this repo with the conversion script (or compiled binary) in the assets folder to share it with others.
If you want both formats saved on your local machine, download the original format with sftp_download by setting convert to false. Use the convert! method to upgrade the downloaded files to the new format.
If you run sftp_download with convert=true the first time, the original downloads are not kept. If you later decide to keep only one format, you can use the clean! function to delete one or the other format.
Separate file conversions
Routines exist to do batch conversions of the existing files and are described in the section about Analysing and manipulating the inventory.