Saving Italian hydRological mEasuremeNts (SIREN), a new project in Italy, aims to save 70 years of hydro-meteorological measurements by crowd-sourcing the work of volunteers. The project involves digitizing historical hydrological data so that researchers can use it to test new methodologies or train new models.
From its beginnings in the early 1900s, hydro-meteorological data collection in Italy was managed at the national level by the National Hydrological and Mareographic Service (Servizio Idrografico e Mareografico Nazionale, SIMN).
About 30 years ago, SIMN was dismantled and responsibility for data collection was transferred to the regional level, which comprises 19 Regions and 2 Autonomous Provinces.
This shift made it harder to obtain complete and homogeneous records for the whole country. Data acquired in recent years are typically available in digital format, which makes it easier to combine disparate regional datasets. Historical measurements, however, exist only in printed form, in the Hydrological Yearbooks published by SIMN.
A few past initiatives have attempted to recover part of this information, but they were often limited in scope, focusing on specific periods or geographical areas.
Why is the lack of digital data a problem?
One of the major problems that hydrologists and climatologists face is the limited amount of historical data available for testing new methodologies or training models. This shortage is even more critical in a country like Italy, with its complex morphology and a climate that varies substantially across its territory. Recovering this considerable amount of data would not only improve our understanding of the climate of the last century but also help estimate how the climate and the hydrological cycle could change in the future.
Why not use optical character recognition software?
Despite remarkable improvements in Optical Character Recognition (OCR) software and in machine learning and artificial intelligence techniques, the most accurate digitization approach is still manual transcription.
Most of these records appear in old printed documents whose ink may be partially damaged. Under these conditions, an “8” can easily be mistaken for a “3”. Moreover, the tables contain several handwritten corrections made by different people with different handwriting. All these peculiarities limit the applicability of standardized automated approaches.
How you can help
The SIREN project is looking for volunteers to help with the digitization process. You can find out more on the project website.
The goal of the project is to digitize Italian historical hydro-meteorological records and produce a consistent national-scale dataset. Volunteers are asked to transcribe the data and to interpret any handwritten corrections that appear on the printed documents.
All the pages of the Hydrological Yearbooks are available as images on the website, and pages are randomly selected for users to transcribe. No specific knowledge of the topic is needed, and a simple tutorial on the website helps users quickly identify the relevant data and the name of the gauging station. No special equipment is required either: a computer and an internet connection are enough.
Paola Mazzoglio, a research assistant at Politecnico di Torino who is involved in organising the project, said the resulting dataset “could be useful for a wide range of possible applications, from design purposes to water management planning to research. Moreover, this new dataset could allow researchers to better understand the climate of the last century in Italy and could be useful to estimate how river floods, drought periods and, more generally, the climate, might change in the future.”