Journal article

Clustering flood events from water quality time-series using Latent Dirichlet Allocation model

To improve hydro-chemical modeling and forecasting, we need a better understanding of flood-induced variability in water chemistry and of the processes that control it in watersheds. The literature often assumes, for instance, that stream chemistry responds differently to rainfall events depending on the season; however, methods to test such assumptions are not well developed. Studies often examine only a few floods at a time and use chemicals as tracers. Grouping similar events from large multivariate datasets with principal component analysis and clustering methods helps to explain hydrological processes, but these methods currently have limitations, such as the need to define flood descriptors and the assumption of linearity. Most clustering methods have been used for regionalization, focusing more on mapping results than on understanding processes. In this study, we extracted flood patterns using the probabilistic Latent Dirichlet Allocation (LDA) model, which, to our knowledge, is its first application in hydrology. The LDA method allows multivariate temporal datasets to be analyzed without defining explanatory factors beforehand or selecting representative floods. We analyzed a multivariate dataset from a long-term observatory (Kervidy-Naizin, western France) containing four solutes monitored daily for 12 years: nitrate, chloride, dissolved organic carbon, and sulfate. The LDA method extracted four distinct patterns that were distributed by season, and each pattern can be explained by seasonal hydrological processes. Hydro-meteorological parameters help explain the processes that lead to these patterns, which improves understanding of flood-induced variability in water quality. The LDA method therefore appears useful for analyzing long-term datasets.
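LDA is a topic model originally developed for text: each "document" is a mixture of latent "topics," and each topic is a distribution over "words." The abstract does not state how flood events were encoded, so the following is only a minimal sketch under stated assumptions. It treats each flood event as a document made of hypothetical discretized tokens (for example `NO3_up` for a rise in nitrate) and infers patterns with collapsed Gibbs sampling, one standard inference method for LDA that is not necessarily the one used in the study. The function name `lda_gibbs`, the token vocabulary, and the hyperparameters `alpha` and `beta` are all illustrative assumptions.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of token lists, one list per flood event (hypothetical encoding).
    Returns (theta, phi, vocab): per-event pattern proportions,
    per-pattern token distributions, and the sorted token vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), n_topics
    n_dk = [[0] * K for _ in docs]      # tokens of event d assigned to pattern k
    n_kw = [[0] * V for _ in range(K)]  # times token v is assigned to pattern k
    n_k = [0] * K                       # total tokens assigned to pattern k
    z = []                              # current pattern of every token

    # Random initial assignment of each token to a pattern
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w_id[w]] += 1
            n_k[k] += 1
        z.append(zd)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove the token's current assignment from the counts
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = j | all other assignments), unnormalized
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][v] + beta) / (n_k[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Posterior-mean estimates from the final counts
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return theta, phi, vocab


# Toy events with a hypothetical "solute_direction" token encoding
events = [
    ["NO3_up", "Cl_up", "NO3_up", "SO4_up"],
    ["NO3_up", "Cl_up", "SO4_up", "Cl_up"],
    ["DOC_up", "DOC_up", "NO3_down", "SO4_down"],
    ["DOC_up", "NO3_down", "DOC_up", "SO4_down"],
]
theta, phi, vocab = lda_gibbs(events, n_topics=2, n_iter=300, seed=1)
# Assign each event to its dominant pattern, a simple way to cluster events
patterns = [max(range(2), key=lambda k: t[k]) for t in theta]
print(patterns)
```

Assigning each event to its highest-probability pattern turns the mixed-membership output into a hard clustering. Keeping the full `theta` vector preserves the probabilistic character the abstract emphasizes. The sampler is written in plain Python only to stay dependency-free; a real analysis would more likely rely on an established library implementation.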

