Abstract
Open large-sample datasets are important for several reasons: i) they enable large-sample analyses, ii) they democratize access to data, iii) they enable large-sample comparative studies and foster reproducibility, and iv) they are a key driver of recent developments in machine-learning-based modelling approaches.
Recently, various large-sample datasets have been released (e.g. different country-specific CAMELS datasets). However, all of them contain only data for individual catchments distributed across entire countries, not for connected river networks.
Here, we present LamaH, a new dataset covering all of Austria and the foreign upstream areas of the Danube, spanning a total of 170,000 km² in 9 countries, with discharge observations for 882 gauges. The dataset also includes 15 different meteorological time series, derived from ERA5-Land, for two different basin delineations: the first corresponds to the entire upstream area of a particular gauge, and the second corresponds only to the area between a particular gauge and its upstream gauges. The time series data, both meteorological and discharge, are included at hourly and daily resolution and cover a period of over 35 years (with some exceptions in the discharge data for a few gauges).
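The relationship between the two basin delineations can be illustrated with a minimal sketch. It assumes that each gauge has a known total upstream area and a single next-downstream gauge. The gauge names, the areas, and the function `intermediate_areas` are hypothetical and are not taken from LamaH itself.

```python
from collections import defaultdict


def intermediate_areas(total_area_km2, downstream_of):
    """Return, per gauge, the area between it and its directly upstream gauges.

    total_area_km2: gauge id -> area of the entire upstream basin (km²)
    downstream_of:  gauge id -> id of the next downstream gauge, or None
    """
    # Invert the topology: collect the directly upstream gauges of each gauge.
    upstream = defaultdict(list)
    for gauge, down in downstream_of.items():
        if down is not None:
            upstream[down].append(gauge)

    # Intermediate area = total upstream area minus the total upstream
    # areas of the directly upstream gauges.
    return {
        gauge: area - sum(total_area_km2[u] for u in upstream[gauge])
        for gauge, area in total_area_km2.items()
    }


# Hypothetical toy network: A and B drain into C, and C drains into D.
total_area_km2 = {"A": 120.0, "B": 80.0, "C": 350.0, "D": 500.0}
downstream_of = {"A": "C", "B": "C", "C": "D", "D": None}

areas = intermediate_areas(total_area_km2, downstream_of)
print(areas)
```

In this toy network the intermediate areas add up to the total upstream area of the outlet gauge D, which is a useful consistency check when working with both delineations side by side.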
Closely following the CAMELS datasets, LamaH also contains more than 60 catchment attributes, derived for both types of basin delineation. The attributes include climatic, hydrological and vegetation indices, land cover information, as well as soil, geological and topographical properties. Additionally, the runoff gauges are classified by over 20 different attributes, including information about human impact and indicators for data quality and completeness. Lastly, LamaH also contains attributes for the river network itself, such as gauge topology, stream length and the slope between two sequential gauges.
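As an illustration of a river-network attribute, the slope between two sequential gauges could be computed as elevation drop divided by stream length. This is one plausible definition, assumed here for the sketch; the abstract does not state how LamaH derives it. The function `reach_slope` and all numbers are hypothetical.

```python
def reach_slope(elev_up_m, elev_down_m, stream_length_m):
    """Mean slope (dimensionless, m/m) of the reach between two gauges.

    Assumed definition: elevation difference between the upstream and
    downstream gauge divided by the stream length connecting them.
    """
    if stream_length_m <= 0:
        raise ValueError("stream length must be positive")
    return (elev_up_m - elev_down_m) / stream_length_m


# Hypothetical reach: 25 m elevation drop over 12.5 km of stream.
slope = reach_slope(elev_up_m=612.0, elev_down_m=587.0, stream_length_m=12_500.0)
print(f"{slope:.4f}")
```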
Given the scope of LamaH, we hope that this dataset will serve as a solid database for further investigations in various hydrological tasks. Combined with appropriate modelling methods, the extent of the data, the interconnected river network and the high temporal resolution of the time series might yield deeper insights into water transfer and storage.
Original language | English |
---|---|
Title of host publication | Proceedings EGU General Assembly 2021, online, April 2021 |
Number of pages | 1 |
Publication status | Published - 2021 |
Fields of science
- 305907 Medical statistics
- 202017 Embedded systems
- 202036 Sensor systems
- 101004 Biomathematics
- 101014 Numerical mathematics
- 101015 Operations research
- 101016 Optimisation
- 101017 Game theory
- 101018 Statistics
- 101019 Stochastics
- 101024 Probability theory
- 101026 Time series analysis
- 101027 Dynamical systems
- 101028 Mathematical modelling
- 101029 Mathematical statistics
- 101031 Approximation theory
- 102 Computer Sciences
- 102001 Artificial intelligence
- 102003 Image processing
- 102004 Bioinformatics
- 102013 Human-computer interaction
- 102018 Artificial neural networks
- 102019 Machine learning
- 102032 Computational intelligence
- 102033 Data mining
- 305901 Computer-aided diagnosis and therapy
- 305905 Medical informatics
- 202035 Robotics
- 202037 Signal processing
- 103029 Statistical physics
- 106005 Bioinformatics
- 106007 Biostatistics
JKU Focus areas
- Digital Transformation