...
Code Block | ||
---|---|---|
| ||
{'gage_id': 6324500, 'latitude': 45.0571972, 'longitude': -105.8783778, 'stations': [{'cat': 'ASOS', 'dist': 83.18954430440758, 'missing_precip': 268566, 'missing_temp': 276833, 'station_id': 'GCC'}, {'cat': 'ASOS', |
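As a rough sketch of how a metadata record like the one above might be used, the snippet below parses it and picks the closest weather station. The sample uses single quotes, so it is treated here as a Python literal rather than strict JSON; the format on disk may differ. The second station entry (`XXX`) and the helper name `nearest_station` are made up for illustration, and the unit of `dist` is not stated in the source.

```python
import ast

# Record shaped like the sample above. The first station is copied from the
# sample; the second ("XXX") is a hypothetical entry added for illustration.
RAW_META = (
    "{'gage_id': 6324500, 'latitude': 45.0571972, 'longitude': -105.8783778, "
    "'stations': ["
    "{'cat': 'ASOS', 'dist': 83.18954430440758, 'missing_precip': 268566, "
    "'missing_temp': 276833, 'station_id': 'GCC'}, "
    "{'cat': 'ASOS', 'dist': 120.5, 'missing_precip': 0, "
    "'missing_temp': 0, 'station_id': 'XXX'}]}"
)


def nearest_station(meta):
    """Return the station dict with the smallest 'dist' value."""
    return min(meta["stations"], key=lambda station: station["dist"])


# literal_eval only evaluates Python literals, so it is safe on untrusted text.
meta = ast.literal_eval(RAW_META)
closest = nearest_station(meta)
print(closest["station_id"], closest["dist"])
```

Ranking by `dist` alone ignores the `missing_precip` and `missing_temp` counts; a station that is slightly farther away but has fewer gaps may be the better choice in practice.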
File naming convention
Files are named as follows: gage_id_station_id.csv. For example, in 01037380KRKD_flow.csv, 01037380 is the USGS gage id and KRKD is the weather station id.
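A minimal sketch of splitting such a file name into its two ids is shown below. The stated pattern and the example differ slightly (underscore separator, `_flow` suffix), so the regular expression tolerates both. It assumes the gage id is the leading run of digits and that the station id starts with a letter; the function name `parse_flow_filename` is hypothetical.

```python
import re

# Assumed layout: leading digits (USGS gage id), an optional underscore,
# the weather station id (starting with a letter), an optional "_flow"
# suffix, then the ".csv" extension.
FILENAME_RE = re.compile(
    r"^(?P<gage_id>\d+)_?(?P<station_id>[A-Za-z][A-Za-z0-9]*)(?:_flow)?\.csv$"
)


def parse_flow_filename(name):
    """Return (gage_id, station_id) as strings, or raise ValueError."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    return match.group("gage_id"), match.group("station_id")


print(parse_flow_filename("01037380KRKD_flow.csv"))
```

The gage id is kept as a string so that a leading zero, as in 01037380, is not lost.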
River data format:
The river data is a comma-separated (.csv) file.
Unnamed: 0_x | hour_updated | p01m | valid | tmpf | Unnamed: 0_y | agency_cd | site_no | datetime | tz_cd | 69512_00060 | 69512_00060_cd | cfs |
---|---|---|---|---|---|---|---|---|---|---|---|---|
5.0 | 2014-01-01 06:00:00+00:00 | 0.0 | 2014-01-01 05:58 | 28.94 | 5.0 | USGS | 1491000 | 2014-01-01 06:00:00+00:00 | EST | 418 | A | 418.0 |
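The sketch below shows one way a file with these columns could be read using only the Python standard library. The CSV text is reconstructed from the sample row above, so the exact on-disk layout (quoting, column order, missing-value markers) is an assumption, and `read_flow_rows` is a hypothetical helper name.

```python
import csv
import io
from datetime import datetime

# CSV text reconstructed from the sample row; the real files may differ.
SAMPLE_CSV = (
    "Unnamed: 0_x,hour_updated,p01m,valid,tmpf,Unnamed: 0_y,agency_cd,"
    "site_no,datetime,tz_cd,69512_00060,69512_00060_cd,cfs\n"
    "5.0,2014-01-01 06:00:00+00:00,0.0,2014-01-01 05:58,28.94,5.0,USGS,"
    "1491000,2014-01-01 06:00:00+00:00,EST,418,A,418.0\n"
)


def read_flow_rows(handle):
    """Parse the timestamp, the two weather features, and the cfs target."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append({
            # Timezone-aware timestamp (the +00:00 offset marks UTC).
            "hour_updated": datetime.fromisoformat(raw["hour_updated"]),
            "p01m": float(raw["p01m"]),
            "tmpf": float(raw["tmpf"]),
            "cfs": float(raw["cfs"]),
        })
    return rows


rows = read_flow_rows(io.StringIO(SAMPLE_CSV))
print(rows[0])
```

For a real file, `open(path, newline="")` would replace the `io.StringIO` wrapper. Empty cells would make `float()` raise, so missing values need separate handling.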
Data dictionary
Column name | Type | Description |
---|---|---|
hour_updated |
...
datetime | This is the weather station time. This datetime was originally in UTC (we left it as is). | |
datetime | datetime | the USGS datetime (which has also been converted to UTC |
...
). | ||
p01m: | float | is the precipitation in millimeters |
...
valid:
...
that occurred during the past hour. | ||
tmpf | float | The temperature in Fahrenheit (average over the past hour) |
cfs | float | The discharge in cubic feet per second (the target variable) |
tz_cd | string | The original time zone code of the USGS data. |
agency_cd |
...
string | The code of the USGS agency in charge (not really helpful) | |
site_no | string | The USGS site number (the gage id) |
Unnamed: 0_x | string |
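Because the table mixes unit systems (millimeters for `p01m`, Fahrenheit for `tmpf`, cubic feet per second for `cfs`), a small conversion helper may be handy when metric units are preferred. This is an illustrative sketch only; the function names are hypothetical and nothing in the dataset requires the conversion.

```python
# 1 ft = 0.3048 m exactly, so one cubic foot is 0.3048 ** 3 cubic meters.
CUBIC_METERS_PER_CUBIC_FOOT = 0.3048 ** 3


def fahrenheit_to_celsius(temp_f):
    """Convert a tmpf value (degrees Fahrenheit) to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def cfs_to_cms(flow_cfs):
    """Convert discharge from cubic feet per second to cubic meters per second."""
    return flow_cfs * CUBIC_METERS_PER_CUBIC_FOOT


# Values taken from the sample row: tmpf = 28.94, cfs = 418.0.
print(round(fahrenheit_to_celsius(28.94), 2))
print(round(cfs_to_cms(418.0), 3))
```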
River flow information
Some rivers are dam-fed, whereas others are based entirely on natural flows. Additionally, for some rivers the way flow is measured has been altered.
...
Data is currently stored in Google Cloud Storage buckets and can be accessed using the gsutil tool or by downloading files manually.
This will download all the temporal data:
gsutil cp -r gs://aistream-datasets/day_addition .
This will download all the meta-data:
gsutil cp -r gs://aistream-datasets/flow/meta_data .
See this colaboratory notebook if you don’t know how to install/use gsutil.
Dataset Versions and Planned Updates
V.1.0: Released 9/2/20: River flow data, precipitation data, and station distance, latitude/lon meta-data. This can be found at gs://aistream-datasets/day_addition
V.1.1: Planned, In Progress 1/20/20: Adding soil depth meta-data, yearly mean, humidity, and slope data. Adding 2019-2020 for scraped river data. Expanding data to include additional years 2000-2020. Including additional USGS gage data where available, such as height, sediment, etc. Finalizing a cleaned-up subset to post to Kaggle.
V.1.2: Planned TBD: Adding aerial imagery of river basins and data from 2000-2014, and finalizing flash flood linkage.
V.2.0: Adding soil moisture data, snowpack data, yearly mean, humidity, and slope data.