Methodological approaches for imputing missing data into monthly flows series

  • Michel Trarbach Bleidorn, Departamento de Engenharia Ambiental, Universidade Federal do Espírito Santo (UFES), Avenida Fernando Ferrari, n° 514, CEP: 29075-910, Vitória, ES, Brazil.
  • Wanderson de Paula Pinto, Núcleo Integrado de Pesquisa em Engenharia Ambiental, Faculdade da Região Serrana (FARESE), Rua Jequitibá, n° 121, CEP: 29645-000, Santa Maria de Jetibá, ES, Brazil.
  • Isamara Maria Schmidt, Departamento de Engenharia Ambiental, Universidade Federal do Espírito Santo (UFES), Avenida Fernando Ferrari, n° 514, CEP: 29075-910, Vitória, ES, Brazil.
  • Antonio Sergio Ferreira Mendonça, Departamento de Engenharia Ambiental, Universidade Federal do Espírito Santo (UFES), Avenida Fernando Ferrari, n° 514, CEP: 29075-910, Vitória, ES, Brazil.
  • José Antonio Tosta dos Reis, Departamento de Engenharia Ambiental, Universidade Federal do Espírito Santo (UFES), Avenida Fernando Ferrari, n° 514, CEP: 29075-910, Vitória, ES, Brazil.

Abstract

Missing data are one of the main difficulties in working with fluviometric records. Gaps in a database may result from problems with fluviometric station components, interruptions in monitoring, or a lack of observers. Analysis of incomplete series produces uncertain results, which negatively affects water resources management. Proper treatment of missing data is therefore important for ensuring better information quality. This work compares missing data imputation methodologies for monthly river-flow time series, using the Doce River, located in Southeast Brazil, as a case study. Missing data were simulated in proportions of 5%, 10%, 15%, 25% and 40%, following a random distribution pattern and ignoring the mechanisms that generate missing data. Ten imputation methodologies were used: arithmetic mean, median, simple linear regression, multiple linear regression, regional weighting, spline interpolation, Stineman interpolation, Kalman smoothing, multiple imputation and maximum likelihood. Their performances were compared through bias, root mean square error, mean absolute percentage error, determination coefficient and concordance index. Results indicate that, for 5% missing data, any of the imputation methodologies can be considered, although the arithmetic mean should be applied with caution. As the proportion of missing data increases, however, multiple imputation and maximum likelihood are recommended when support stations are available for imputation, and Stineman interpolation and Kalman smoothing are recommended when only the studied series is available.

Keywords: Doce River, imputation, missing data.
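
The evaluation design described in the abstract (remove values at random in fixed proportions, impute them, then score the imputed values against the withheld observations) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the flow series is synthetic rather than Doce River data, linear interpolation stands in for the spline and Stineman methods, the determination coefficient is taken as the squared Pearson correlation, and the concordance index is taken to be Willmott's index of agreement. All function names are hypothetical.

```python
import numpy as np

def mask_at_random(series, proportion, rng):
    """Return a copy with `proportion` of the values set to NaN, chosen uniformly at random."""
    out = np.asarray(series, dtype=float).copy()
    n_missing = int(round(proportion * out.size))
    idx = rng.choice(out.size, size=n_missing, replace=False)
    out[idx] = np.nan
    return out

def impute_mean(series):
    """Fill every gap with the arithmetic mean of the observed values."""
    filled = series.copy()
    filled[np.isnan(filled)] = np.nanmean(series)
    return filled

def impute_linear(series):
    """Fill gaps by linear interpolation in time (assumed stand-in for spline/Stineman)."""
    filled = series.copy()
    t = np.arange(series.size)
    gaps = np.isnan(series)
    filled[gaps] = np.interp(t[gaps], t[~gaps], series[~gaps])
    return filled

def scores(observed, imputed):
    """Score imputed values against withheld observations (assumed metric definitions)."""
    err = imputed - observed
    obs_mean = observed.mean()
    # Squared Pearson correlation is undefined when either vector is constant.
    if np.std(observed) == 0 or np.std(imputed) == 0:
        r2 = float("nan")
    else:
        r2 = float(np.corrcoef(observed, imputed)[0, 1] ** 2)
    # Willmott's index of agreement, assumed here for the "concordance index".
    denom = ((np.abs(imputed - obs_mean) + np.abs(observed - obs_mean)) ** 2).sum()
    return {
        "bias": float(err.mean()),
        "rmse": float(np.sqrt((err ** 2).mean())),
        "mape": float(100 * np.mean(np.abs(err / observed))),
        "r2": r2,
        "d": float(1 - (err ** 2).sum() / denom),
    }

# Synthetic 20-year monthly series with a seasonal cycle (hypothetical values).
rng = np.random.default_rng(42)
months = np.arange(240)
flow = 800 + 400 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 30, months.size)

results = {}
for p in (0.05, 0.10, 0.15, 0.25, 0.40):
    gappy = mask_at_random(flow, p, rng)
    gaps = np.isnan(gappy)
    for name, method in (("mean", impute_mean), ("linear", impute_linear)):
        filled = method(gappy)
        # Score only the positions that were withheld.
        results[(p, name)] = scores(flow[gaps], filled[gaps])

for (p, name), s in results.items():
    print(f"{int(p * 100):>2}% {name:<6} RMSE={s['rmse']:.1f} d={s['d']:.3f}")
```

Scoring only the withheld positions keeps the metrics from being diluted by values that were never missing. Mean imputation returns a constant, so its squared correlation is reported as undefined rather than zero.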


Published: 05/04/2022
Section: Papers