Introduction
FlowDB is a dataset of hourly river flow and precipitation data for over 9,000 rivers in the United States. The dataset was created to give insight into flash floods and droughts, and to study how climate change affects watersheds.
Dataset Creation
Part I
Dataset Construction Notebook (I): In this notebook, the initial meta-data files are built for each river.
Dataset Construction Notebook (II): This is the notebook where the bulk of the data is gathered for each river.
USGS Station Meta-Data Part I (includes longitude and latitude): This file was originally retrieved
USGS River flow data retrieved from site
Re-adding the meta-data (a notebook that documents how we later added lat/lon data).
Data Format
Metadata format:
Metadata is in the following format:
gage_id: Refers to the USGS gage ID. The leading zero of this number may have been accidentally omitted, so in the example below the actual ID would be 06324500
which corresponds to the
stations: A list of the closest weather stations, ordered by ascending distance from the gage.
cat: The category of weather station
dist: The distance in miles between the gage and the weather station.
missing_precip: The number of missing precipitation values between 2014 and 2018
missing_temp: The number of missing temperature values
station_id: The identifier of the station (for example 'GCC')
{'gage_id': 6324500, 'latitude': 45.0571972, 'longitude': -105.8783778, 'stations': [{'cat': 'ASOS', 'dist': 83.18954430440758, 'missing_precip': 268566, 'missing_temp': 276833, 'station_id': 'GCC'}, {'cat': 'ASOS',
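The note on gage_id above can be sketched in Python. This is a minimal illustration, not part of the FlowDB notebooks: the helper names `normalize_gage_id` and `closest_station` are hypothetical, the 8-digit width is an assumption based on the 06324500 example, and the record contains only the one station visible in the example.

```python
# Record shaped like the metadata example above (only the first station is shown there).
meta = {
    "gage_id": 6324500,
    "latitude": 45.0571972,
    "longitude": -105.8783778,
    "stations": [
        {
            "cat": "ASOS",
            "dist": 83.18954430440758,
            "missing_precip": 268566,
            "missing_temp": 276833,
            "station_id": "GCC",
        },
    ],
}


def normalize_gage_id(gage_id, width=8):
    """Return the gage ID as a string, left-padded with zeros.

    Assumption: IDs that lost a leading zero should be 8 digits wide.
    Longer IDs are returned unchanged.
    """
    return str(gage_id).zfill(width)


def closest_station(record):
    """Return the station with the smallest `dist`, or None if the list is empty."""
    stations = record.get("stations", [])
    return min(stations, key=lambda s: s["dist"]) if stations else None


print(normalize_gage_id(meta["gage_id"]))
print(closest_station(meta)["station_id"])
```

Taking the minimum over `dist` rather than the first list element means the sketch does not depend on the list ordering being preserved.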
River data format:
The river data is stored as a comma-separated (.csv) file.
Header and a sample row:
Unnamed: 0_x | hour_updated | p01m | valid | tmpf | Unnamed: 0_y | agency_cd | site_no | datetime | tz_cd | 69512_00060 | 69512_00060_cd | cfs
5.0 | 2014-01-01 06:00:00+00:00 | 0.0 | 2014-01-01 05:58 | 28.94 | 5.0 | USGS | 1491000 | 2014-01-01 06:00:00+00:00 | EST | 418 | A | 418.0
hour_updated: The hourly timestamp, converted to UTC
p01m: The precipitation in millimeters
valid:
tmpf: The temperature in degrees Fahrenheit
agency_cd: The code of the agency in charge (e.g. USGS)
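As a rough sketch of reading a river file with these columns, the snippet below parses the sample row using only the Python standard library. The exact comma-separated layout of the real files is an assumption reconstructed from the sample row, and `parse_rows` and `to_float` are hypothetical helper names.

```python
import csv
import io
from datetime import datetime

# Sample built from the header and row shown above; the real files may differ.
SAMPLE = (
    "Unnamed: 0_x,hour_updated,p01m,valid,tmpf,Unnamed: 0_y,agency_cd,"
    "site_no,datetime,tz_cd,69512_00060,69512_00060_cd,cfs\n"
    "5.0,2014-01-01 06:00:00+00:00,0.0,2014-01-01 05:58,28.94,5.0,USGS,"
    "1491000,2014-01-01 06:00:00+00:00,EST,418,A,418.0\n"
)


def to_float(text):
    """Convert a CSV cell to float, returning None for empty cells."""
    return float(text) if text not in ("", None) else None


def parse_rows(handle):
    """Yield one dict per CSV row with typed time, precipitation, temperature and flow."""
    for row in csv.DictReader(handle):
        yield {
            # Timezone-aware timestamp; the sample carries a +00:00 (UTC) offset.
            "hour_updated": datetime.fromisoformat(row["hour_updated"]),
            "p01m": to_float(row["p01m"]),
            "tmpf": to_float(row["tmpf"]),
            "cfs": to_float(row["cfs"]),
        }


rows = list(parse_rows(io.StringIO(SAMPLE)))
print(rows[0])
```

For a file on disk, the same function can be given a handle from `open(path, newline="")` in place of the `io.StringIO` sample.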
River flow information
Some rivers are dam-fed, whereas others are based entirely on natural flows. Additionally, the way flow is measured on some rivers has been altered over time.
Accessing the dataset
Data is currently stored in a Google Cloud Storage bucket. To copy the whole bucket into the current directory:
gsutil cp -r gs://predict_cfs .
Dataset Versions and Planned Updates
V.1.0: Released 9/2/20: River flow data, precipitation data, and station distance and latitude/longitude meta-data.
V.1.1: Planned 1/20/20: Adding soil depth meta-data, yearly mean, humidity, and slope data. Adding 2019-2020 to the scraped river data.
V.1.2: Planned TBD: Adding aerial imagery of river basins and data from 2000-2014, and finalizing flash flood linkage.