Loading New Data through HSDS
I. Create a List of New Files to Upload
- Create a JSON file {region.json} containing the region of interest.
- The format of the JSON file is:
{ "region": [ {"lon": <lon1>, "lat": <lat1>}, {"lon": <lon2>, "lat": <lat2>}, ... {"lon": <lonN>, "lat": <latN>} ] }
- The points of the polygon must go in counter-clockwise order (a requirement of the NASA CMR system)
- The last point in the array must be identical to the first point in the array
- Run the get_files_in_region.py utility to create the file list. Note the utility uses the icesat2.py package and therefore must execute in an environment where icesat2.py has been installed. See https://github.com/ICESat2-SlideRule/sliderule-python for more details.
$ python get_files_in_region.py {region.json} > {filelist.txt}
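The polygon rules above (counter-clockwise order, last point equal to the first) are easy to get wrong by hand. The following is a minimal sketch, not part of the SlideRule utilities, that builds the region file from a list of (lon, lat) pairs. The function names are hypothetical, and the orientation test uses the planar shoelace formula, which assumes a small region that does not cross the antimeridian or enclose a pole.

```python
import json


def signed_area(points):
    """Planar shoelace area; positive when (lon, lat) points run counter-clockwise."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def build_region(points):
    """Return a region dict holding a closed, counter-clockwise polygon."""
    pts = [tuple(p) for p in points]
    if len(pts) > 1 and pts[0] == pts[-1]:
        pts = pts[:-1]  # drop the closing point while checking orientation
    if len(pts) < 3:
        raise ValueError("a polygon needs at least three distinct points")
    if signed_area(pts) < 0:
        pts.reverse()  # clockwise input: flip to counter-clockwise
    pts.append(pts[0])  # the last point must repeat the first
    return {"region": [{"lon": lon, "lat": lat} for lon, lat in pts]}


def write_region(points, path="region.json"):
    """Write the region file in the format described above."""
    with open(path, "w") as f:
        json.dump(build_region(points), f, indent=2)


if __name__ == "__main__":
    # Example coordinates are illustrative only.
    write_region([(-108.3, 38.8), (-107.7, 38.8), (-107.7, 39.2), (-108.3, 39.2)])
```

The resulting file can then be passed to get_files_in_region.py as shown in the command above.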
II. Download New Data from NSIDC
To download ATL03 and ATL06 data from NSIDC, use the Python scripts provided by the tsutterley/read-ICESat-2 repository. Alternatively, the data can be downloaded manually from the NSIDC website, either by the Download Via HTTPS option (reached through Other Access Options) or the Download Script.
- Check out the read-ICESat-2 repository (https://github.com/tsutterley/read-ICESat-2.git) and set up a Python environment with any necessary dependencies. For example, if using conda, the following packages are needed:
conda install lxml
conda install numpy
- Use the {filelist.txt} created in the section above, or create a file containing a list of the ATL03 and ATL06 files you want to download. Each file name is provided on its own line.
- Set up a netrc file with your Earthdata Login username and password. It should have the following contents, with {username} and {password} replaced by your credentials:
machine urs.earthdata.nasa.gov login {username} password {password}
- Run the nsidc_icesat2_sync.py script to download the files.
$ python nsidc_icesat2_sync.py --netrc=~/.netrc --index={filelist.txt}
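Before starting a long download, it can help to confirm that the netrc file parses and contains an entry for the Earthdata host. The sketch below uses Python's standard netrc module; the function name check_netrc is hypothetical and is not part of read-ICESat-2.

```python
import netrc
import os

EARTHDATA_HOST = "urs.earthdata.nasa.gov"  # host name from the netrc entry above


def check_netrc(path=None, host=EARTHDATA_HOST):
    """Return the login stored for host, or raise if the entry is missing or incomplete."""
    path = path or os.path.expanduser("~/.netrc")
    entry = netrc.netrc(path).authenticators(host)
    if entry is None:
        raise LookupError(f"no entry for {host} in {path}")
    login, _account, password = entry
    if not login or not password:
        raise ValueError(f"entry for {host} in {path} lacks a login or password")
    return login


if __name__ == "__main__":
    print("netrc entry found for user:", check_netrc())
```

This only checks that the file is well-formed; it does not verify the credentials against the Earthdata Login service.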
III. Rechunk Data for Optimized Cloud Access
- Go to sliderule/plugins/icesat2/utils and run the script that performs a parallel rechunk of the files.
If rechunking ATL03 data:
$ ./rechunk_atl03_dir.sh {source directory} {destination directory}
If rechunking ATL06 data:
$ ./rechunk_atl06_dir.sh {source directory} {destination directory}
IV. Upload New Data to S3
- Log into the AWS console.
- Using the S3 service, navigate to the icesat2-sliderule bucket.
- Upload the files using the web interface.
If loading ATL03 data, upload to the /data/ATL03 folder.
If loading ATL06 data, upload to the /data/ATL06 folder.
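For a large number of files, a scripted upload may be more practical than the web interface. The following is a sketch only, not the documented procedure: it assumes the third-party boto3 package is installed with AWS credentials configured, that file names begin with the product short name (ATL03 or ATL06) as NSIDC granule names do, and that the /data/ATL03 and /data/ATL06 folders correspond to S3 key prefixes without the leading slash. The function names are hypothetical.

```python
import os

BUCKET = "icesat2-sliderule"  # bucket named in the steps above


def s3_key_for(path):
    """Map a local file to its S3 key, based on the product prefix of the file name."""
    name = os.path.basename(path)
    for product in ("ATL03", "ATL06"):
        if name.startswith(product):
            # Assumption: the console folder /data/ATL03 is the key prefix data/ATL03/
            return f"data/{product}/{name}"
    raise ValueError(f"not an ATL03 or ATL06 file: {name}")


def upload(paths, bucket=BUCKET):
    """Upload each file to the bucket under its product folder."""
    import boto3  # third-party; imported here so s3_key_for works without it

    s3 = boto3.client("s3")
    for path in paths:
        s3.upload_file(path, bucket, s3_key_for(path))
```

Calling s3_key_for on a few file names first is a cheap way to check the destination keys before any data is transferred.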
V. Index Data using “hsload --link”
- Log in to sliderule-icesat2-beta and verify that HSDS is running (e.g. hsinfo or docker ps).
- Go to sliderule/plugins/icesat2/utils and run the script that performs a parallel linked load of the files.
If loading ATL03 data:
$ ./load_atl03_files.sh ~/{filelist.txt}
If loading ATL06 data:
$ ./load_atl06_files.sh ~/{filelist.txt}