Spatial mapping of annual rainfall in the São Francisco River Basin

Precipitation is a key object of study because of its role in the dynamics of rainfall distribution in a region. This study investigated the spatial and temporal variation of precipitation in the São Francisco River Basin (SFRB). A historical data series from 1989 to 2018 was analyzed, and the precipitation random function was decomposed into trend and residual components. Interpolation techniques were used to analyze the spatial behavior of precipitation over time and to produce high-resolution precipitation maps. The exponential variogram model prevailed in four of the six five-year periods. The results also showed high precipitation variability in the SFRB and made it possible to monitor precipitation behavior over the years and across the SFRB sub-regions. Finally, the results provide useful information, for instance, for identifying areas vulnerable to rainfall shortages.


INTRODUCTION
Precipitation is a hydrological variable with high temporal and spatial variability, so its study is central to understanding rainfall distribution dynamics within a region. Such information is essential for managing the water resources on which commercial activities and urban supply depend (Medeiros et al., 2019b). Moreover, precipitation variability is more pronounced in large drainage basins such as the São Francisco River Basin (SFRB) (Silva and Clarke, 2004). To better understand precipitation dynamics, statistical modeling has been used to interpolate precipitation at non-sampled sites. This is not an easy task, because observations are often incomplete due to equipment malfunction, operating errors, or inadequate instrument coverage of the study region (Araújo et al., 2020).
The SFRB covers several states and municipalities and is of paramount importance to urban and rural populations. In this basin, water resources are used for consumption, irrigation, livestock, and energy generation. Many fishermen and boatmen make their living from the river, and underground minerals and vegetation drive the local economy. Furthermore, although hydroelectric power generation is sustainable, the water supply depends on streamflows which, in turn, depend on climatic conditions, especially a regular rainfall distribution (Pereira Filho et al., 2020).
Constructing high-resolution maps of rainfall in a region is fundamental for hydrological planning, helping to manage water resources and reduce the risks of natural disasters such as floods, erosion, and droughts (Parker et al., 2019). Precipitation data for a region are usually obtained from irregularly distributed rain gauge stations. Although increasing the number of rain gauges may seem the most adequate solution, it raises operating costs (Pirani and Modarres, 2020). Interpolation methods are therefore needed to provide a continuous precipitation map, especially in regions with a low density of rain gauges.
The literature offers several interpolation methods, notably deterministic methods such as inverse distance weighting (IDW) and geostatistical methods such as ordinary and universal kriging. Chen and Liu (2012) used IDW to interpolate rainfall data from 46 rainfall stations in central Taiwan and found that the method produces more accurate estimates in the dry season than in the rainy season. Geostatistics is a well-known statistical methodology for estimating values at non-sampled locations, providing high-resolution maps of interpolated values for the entire study region. Wanderley et al. (2013) stated that geostatistical techniques such as kriging are useful for generating maps and produce results consistent with observations. Medeiros et al. (2019a) analyzed data from 269 rain gauge stations in the state of Paraíba and modeled the spatiotemporal dynamics of precipitation using geostatistical techniques to obtain rainfall interpolation maps. Likewise, Barros et al. (2020) used geostatistical tools to investigate the variability of mean annual precipitation in the state of Pernambuco and built a rainfall distribution map.
Some studies have used interpolation techniques to model the spatial distribution of precipitation in the SFRB. Silva and Clarke (2004) used variograms, a geostatistical tool, to study the spatial correlation of heavy rainfall with a 100-year return period in the SFRB. They produced 100-year return-period rainfall maps and concluded that the climatic differences among the regions of the basin and its orography explain the high spatial variability of rainfall. Santos et al. (2017b) performed a detailed evaluation of drought in the upper São Francisco Basin and used IDW interpolation to build monthly and annual rainfall maps from daily TRMM satellite data.
High-resolution precipitation maps have thus become a useful tool for managing water resources. This study therefore investigated the spatial variation of precipitation and sought to understand the rainfall mechanism in the SFRB, identifying priority sectors for a more detailed investigation of areas with rainfall shortages.

MATERIAL AND METHODS
The São Francisco River Basin (SFRB) covers 8% of the Brazilian territory. The river is 2,863 km long and its basin drains more than 639,219 km². The São Francisco River runs from the Canastra Mountains in Minas Gerais state to the Atlantic Ocean on the border between the states of Alagoas and Sergipe. This vast basin integrates the Northeast, Southeast, and Midwest regions of Brazil, covering six states (Minas Gerais, Goiás, Bahia, Pernambuco, Alagoas, and Sergipe) and 508 municipalities, with a population of 20,330,051 inhabitants (CODEVASF, 2021). The São Francisco River is therefore a strategic link between the Southeast and Northeast regions of Brazil.
The main reservoirs in the SFRB for streamflow control and power generation are Três Marias in Minas Gerais, Sobradinho, Paulo Afonso, and Itaparica in Bahia, and Xingó between Alagoas and Sergipe. The basin encompasses several biomes, namely Cerrado, Atlantic Forest, coastal and island biomes, and Caatinga (CBHSF, 2021). Socioeconomic conditions vary along the SFRB, from wealthy, highly populated areas to extremely poor, sparsely populated ones.
The dataset used in this study refers to rainfall in the SFRB, which is divided into four hydrographic regions: Upper, Middle, Sub-Middle, and Lower São Francisco (Figure 1a). Figure 1b shows the spatial distribution of altitude in the SFRB, obtained by ordinary kriging. The Upper SF and Lower SF regions have the highest and lowest altitudes, respectively. Altitude is known to influence the climatic conditions of a region. Cavalcanti and Côrrea (2015) report a linear relationship of 93% between altitude and rainfall in Catimbau National Park, Pernambuco. Petrungaro and Hora (2019) state that orographic configurations significantly influence the spatial distribution of precipitation in the contributing basin of the Juturnaíba Dam, Rio de Janeiro.
The dataset consisted of 333 rain gauge stations with records covering 30 years (1989 to 2018), divided into five-year intervals to investigate the temporal behavior of precipitation. The data were obtained from the HidroWeb information system of the National Water Agency (ANA), available at http://hidroweb.ana.gov.br.
For exploratory analysis, the minimum, maximum, mean, median, standard deviation, and coefficient of variation were computed for each five-year interval.
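The per-interval exploratory summary above can be sketched as follows. The study used R. This is a minimal Python illustration, and the names `describe` and `describe_by_period` are hypothetical. It assumes the sample standard deviation (n − 1 denominator, as R's `sd` uses) and a coefficient of variation expressed as a percentage of the mean.

```python
import statistics


def describe(values):
    """Descriptive measures for one five-year interval (values in mm)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1), assumed
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,  # coefficient of variation (%)
    }


def describe_by_period(station_totals):
    """station_totals maps a period label (e.g. '1989-1993') to per-station
    accumulated annual precipitation values (mm)."""
    return {period: describe(v) for period, v in station_totals.items()}


# Hypothetical per-station totals (mm), for illustration only.
example = {"1989-1993": [820.0, 1150.0, 1460.2], "2014-2018": [310.5, 640.0, 980.3]}
for period, stats in describe_by_period(example).items():
    print(period, round(stats["cv_percent"], 1))
```

A CV above 30%, as reported for every interval in this study, would show up directly in `cv_percent`.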
The geostatistical analysis of mean annual precipitation recorded at the 333 rain gauge stations decomposed the random function $Z(\mathbf{s})$ into a trend component $\mu(\mathbf{s})$ and a stochastic residual component $\varepsilon(\mathbf{s})$, as in Equation 1:

$$Z(\mathbf{s}) = \mu(\mathbf{s}) + \varepsilon(\mathbf{s}) \qquad (1)$$

In Equation 1, $Z(\mathbf{s})$ is the mean annual precipitation and $\mathbf{s}$ is the vector of geographical coordinates. The trend component $\mu(\mathbf{s})$ was modeled by multiple linear regression, with linear effects of latitude, altitude, and longitude and a quadratic effect of longitude. The coefficient of determination (R²) of this fit was then computed to quantify how much of the precipitation variability is explained by the trend.
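The trend model described above (intercept, latitude, longitude, longitude², altitude) can be sketched with ordinary least squares in numpy. This is an illustration, not the authors' R code. The function name `fit_trend` is hypothetical, and the coordinate units are an assumption, because the text does not state the projection used.

```python
import numpy as np


def fit_trend(lat, lon, alt, precip):
    """OLS fit of mu(s) = b0 + b1*lat + b2*lon + b3*lon**2 + b4*alt.

    Returns the coefficients, the fitted trend, the residuals
    eps_hat(s) = Z(s) - mu_hat(s), and the coefficient of determination R^2.
    """
    lat, lon, alt, z = (np.asarray(a, dtype=float) for a in (lat, lon, alt, precip))
    # Design matrix: linear latitude, longitude and altitude, quadratic longitude.
    X = np.column_stack([np.ones_like(lat), lat, lon, lon ** 2, alt])
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    trend = X @ beta
    resid = z - trend
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((z - z.mean()) ** 2)
    return beta, trend, resid, r2
```

The returned `resid` array is what the variogram analysis in the next step works on.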
The residuals of the trend fit, $\hat{\varepsilon}(\mathbf{s}) = Z(\mathbf{s}) - \hat{\mu}(\mathbf{s})$ (the observed mean annual precipitation minus the fitted trend), were used to compute the sample variogram (Webster and Oliver, 2007). Theoretical variogram models (spherical, exponential, and Gaussian) were fitted to the sample variogram estimates. The spatial dependence index (SDI), the ratio between the nugget effect and the sill, was also obtained (Cambardella et al., 1994) and classified as follows: SDI ≤ 25%, strong spatial dependence; 25% < SDI < 75%, moderate spatial dependence; and SDI ≥ 75%, weak spatial dependence. Several studies have used this index to measure the degree of spatial dependence in precipitation data in Brazil (Montebeller et al., 2007; Gamero et al., 2020).
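The variogram step can be illustrated with a numpy-only sketch. It computes the Matheron sample variogram of the trend residuals and fits a nugget plus one of the three candidate shapes by weighted least squares, using the $N_j/h_j^2$ weights described in the methods. It then classifies the SDI with the thresholds above. The study used gstat in R, whose fitting routine differs. The function names, the lag binning, and the grid search over the range parameter `a` are assumptions made for illustration. The model shapes follow the common parameterization in which `a` is a range parameter; for the exponential model this is not the practical range.

```python
import numpy as np


# Unit-sill shapes f(h; a); the full model is gamma(h) = nugget + psill * f(h; a), h > 0.
def exponential(h, a):
    return 1.0 - np.exp(-h / a)


def gaussian(h, a):
    return 1.0 - np.exp(-(h / a) ** 2)


def spherical(h, a):
    r = np.minimum(h / a, 1.0)
    return 1.5 * r - 0.5 * r ** 3


def sample_variogram(coords, resid, n_lags, max_dist):
    """Matheron estimator per distance bin: gamma = sum((e_i - e_j)^2) / (2 * N)."""
    coords = np.asarray(coords, float)
    resid = np.asarray(resid, float)
    i, j = np.triu_indices(len(resid), k=1)  # each station pair counted once
    d = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq = (resid[i] - resid[j]) ** 2
    edges = np.linspace(0.0, max_dist, n_lags + 1)
    h, gamma, npairs = [], [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (d > lo) & (d <= hi)
        if m.any():
            h.append(d[m].mean())
            gamma.append(sq[m].sum() / (2.0 * m.sum()))
            npairs.append(m.sum())
    return np.array(h), np.array(gamma), np.array(npairs, float)


def fit_wls(h, gamma, npairs, shape, ranges):
    """Grid search over the range parameter; for each candidate, nugget and
    partial sill are solved by weighted least squares with weights N_j / h_j^2."""
    w = npairs / h ** 2
    sw = np.sqrt(w)
    best = None
    for a in ranges:
        A = np.column_stack([np.ones_like(h), shape(h, a)])
        coef, *_ = np.linalg.lstsq(A * sw[:, None], gamma * sw, rcond=None)
        nugget, psill = coef
        if nugget < 0 or psill <= 0:
            continue  # discard physically meaningless fits
        sse = float(np.sum(w * (gamma - A @ coef) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(nugget), float(psill), float(a))
    return best  # (weighted SSE, nugget, partial sill, range parameter) or None


def sdi(nugget, psill):
    """Spatial dependence index (%) = nugget / sill, with the Cambardella classes."""
    pct = 100.0 * nugget / (nugget + psill)
    if pct <= 25:
        return pct, "strong"
    if pct < 75:
        return pct, "moderate"
    return pct, "weak"
```

To compare the three models for one period, a user could call `fit_wls` once per shape and keep the fit with the lowest weighted error. This mirrors, but does not reproduce, the model selection reported in the results.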
The variogram model parameters were estimated by weighted least squares, with each lag weighted by the ratio between its number of point pairs and the square of its distance. Among the available geostatistical interpolation methods, this study used ordinary kriging (OK) and universal kriging (UK). After modeling the trend and estimating the theoretical variogram models, OK and UK were applied to interpolate annual rainfall values. The deterministic IDW method was also applied, so that the three methods could be compared. For IDW, the power parameter was set to 2, as recommended in the literature (Goovaerts, 2000; Pirani and Modarres, 2020).
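The two kinds of interpolator compared in this study can be sketched side by side. The sketch covers IDW with power 2 and ordinary kriging with an exponential variogram whose nugget applies only for h > 0. Universal kriging, which adds trend terms to the kriging system, is not shown. This is a minimal numpy illustration, not gstat's implementation. The function names are hypothetical, and the variogram parameters in any call are placeholders. Distances are assumed to be in projected units such as km, because Euclidean distance on raw latitude/longitude would distort them.

```python
import numpy as np


def exp_variogram(h, nugget, psill, a):
    """Exponential model; gamma(0) = 0, so the nugget acts only for h > 0."""
    h = np.asarray(h, float)
    return np.where(h > 0, nugget + psill * (1.0 - np.exp(-h / a)), 0.0)


def ordinary_kriging(xy, z, target, nugget, psill, a):
    """Solve [Gamma 1; 1^T 0][lambda; mu] = [gamma_0; 1] and return (estimate, weights)."""
    xy = np.asarray(xy, float)
    z = np.asarray(z, float)
    n = len(z)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)  # station-station distances
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(d, nugget, psill, a)
    A[n, n] = 0.0  # Lagrange multiplier row/column enforces sum(lambda) = 1
    b = np.ones(n + 1)
    b[:n] = exp_variogram(np.linalg.norm(xy - np.asarray(target, float), axis=1),
                          nugget, psill, a)
    lam = np.linalg.solve(A, b)[:n]
    return float(lam @ z), lam


def idw(xy, z, target, power=2.0):
    """Inverse distance weighting with w_i = 1 / d_i**power (power = 2 here)."""
    xy = np.asarray(xy, float)
    z = np.asarray(z, float)
    d = np.linalg.norm(xy - np.asarray(target, float), axis=1)
    if np.any(d == 0):
        return float(z[np.argmin(d)])  # target coincides with a station
    w = 1.0 / d ** power
    return float(w @ z / w.sum())
```

Unlike IDW, the kriging weights depend on the fitted variogram, which is why the variogram step precedes interpolation in the workflow above.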
Leave-one-out cross-validation was applied to select the interpolation method. It consists of removing the value observed at location $\mathbf{s}_i$ ($i = 1, 2, \ldots, 333$) and interpolating it from the remaining stations. Root-mean-square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²) were used to select the most suitable method, seeking the lowest RMSE and MAE and the highest R² (Moriasi et al., 2007).
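A pure-Python sketch of the leave-one-out procedure and the three selection metrics follows. Any interpolator with the signature `predict(train_xy, train_z, target)` can be plugged in. An IDW with power 2 is redefined here so the block is self-contained. The names are hypothetical. R² is computed as the squared Pearson correlation between observed and predicted values, which is an assumption, since the text does not give the formula.

```python
import math


def idw(xy, z, target, power=2.0):
    """Inverse distance weighting with w_i = 1 / d_i**power."""
    num = den = 0.0
    for (x, y), zi in zip(xy, z):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return zi  # target coincides with a station
        w = 1.0 / d ** power
        num += w * zi
        den += w
    return num / den


def loocv(xy, z, predict):
    """Remove each station in turn and predict it from all the others."""
    xy, z = list(xy), list(z)
    return [predict(xy[:k] + xy[k + 1:], z[:k] + z[k + 1:], xy[k]) for k in range(len(z))]


def metrics(obs, pred):
    """RMSE, MAE and R^2 (squared Pearson correlation, assumed definition)."""
    n = len(obs)
    err = [p - o for o, p in zip(obs, pred)]
    rmse = math.sqrt(sum(e * e for e in err) / n)
    mae = sum(abs(e) for e in err) / n
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    vo = sum((o - mo) ** 2 for o in obs)
    vp = sum((p - mp) ** 2 for p in pred)
    return rmse, mae, cov * cov / (vo * vp)  # fails if predictions are constant
```

Running `metrics(z, loocv(xy, z, idw))` once per candidate interpolator yields a table comparable in spirit to the cross-validation comparison reported in the results.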
After fitting and selecting the model, annual rainfall interpolation maps were built on a regular grid of 50,000 points, with each interpolated point representing an area of 12.82 km². All statistical analyses were conducted in R (R Core Team, 2018) with the ggplot2 (Wickham, 2009) and gstat (Pebesma, 2004) packages.

Spatial mapping of annual rainfall in the … Rev. Ambient. Água vol. 17 n. 1, e2762 - Taubaté 2022

RESULTS AND DISCUSSION

Table 1 shows the descriptive measures of accumulated annual precipitation over the 30 years, by five-year interval. The analysis of the 333 rain gauge stations in the SFRB showed that the highest annual mean (1143.7 mm) occurred from 1989 to 1993. A decrease in rainfall accumulation was also observed in every subsequent period except 1999 to 2003. This decline continued over the periods and is reflected in the calculated means and medians. The coefficient of variation (CV) exceeded 30% in all periods, indicating high variability. Climatic differences within the SFRB reflect the transition from a humid and semi-humid climate in the Upper São Francisco to an arid and semi-arid climate in the Sub-Middle São Francisco. The SFRB climate is characterized by mean annual precipitation between 400 and 1,500 mm, mean annual temperature from 18 to 27°C, and low cloud cover, which leads to a high incidence of solar radiation (Pereira et al., 2007; SINGREH, 2002). Dantas and Oliveira (2021) studied data from 39 rain gauge stations in the Minas Gerais portion of the SFRB between 2014 and 2017. They found that precipitation ranged from 594 to 1,730 mm, with a mean of 1,223 mm and a standard deviation of 310 mm. Figure 2 shows precipitation as a function of longitude, latitude, and altitude for each time interval.
Latitude and altitude show a linear relationship with total annual precipitation, whereas longitude shows a quadratic one, so these effects were included in the trend model. Table 2 shows the estimates of the multiple regression model with geographical coordinates as regressors. The variables were statistically significant (P < 0.01) in all six periods. The negative latitude coefficients suggest that accumulated annual precipitation decreases from south to north in the study region. The high coefficients of determination suggest that the trend component explains part of the spatial variability of accumulated precipitation within the SFRB.

Table 2. Multiple linear regression model estimates of accumulated annual precipitation.
Term          1989-1993   1994-1998   1999-2003   2004-2008   2009-2013   2014-2018
(Intercept)   6641**      6334**      5571**      6632**      6244**      3903**
Latitude      -0.5575**   -0.5742**   -0.4776**   -0.5771**   -0.5313**   -0.2874**
Longitude     -2.7376**   -1.8111**   -1.9654**   -2.0110**   -2.0965**   …
** P < 0.01.

Table 3 shows the MAE, RMSE, and R² values obtained in leave-one-out cross-validation for IDW, ordinary kriging, and universal kriging. In all scenarios, the geostatistical methods outperformed the deterministic IDW method. Universal kriging performed best in four of the six time intervals, and ordinary kriging performed best in the two periods from 2004 to 2013. Pirani and Modarres (2020) compared deterministic and geostatistical interpolation methods for rainfall data in the Zayandeh Rud Basin, Iran. They concluded that geostatistical methods are more accurate than deterministic ones when a larger number of rain gauges is available.
The exponential variogram model prevailed in four periods (1989-1993, 2004-2008, 2009-2013, and 2014-2018), followed by the Gaussian model in 1994-1998 and the spherical model in 1999-2003. Santos et al. (2017a) studied the seasonal behavior of rainfall in the Brazilian semi-arid region using geostatistical tools and observed that the spherical model had the best R² values for the analyzed variables. However, a high R² does not imply that the fitted values are close to the observed ones, only that there is a strong linear relationship between them. Silva Neto et al. (2020) analyzed annual maximum daily precipitation in Tocantins state, fitting spherical, exponential, and Gaussian variogram models and selecting the spherical model by mean absolute percentage error (MAPE).

Table 4 shows that the highest sill estimate (74,007.37 mm²) occurred from 2009 to 2013, with a range of 389 km. The descriptive analysis showed that this period had the greatest variability of mean annual precipitation, with a coefficient of variation of 37.3%. Conversely, the period from 1999 to 2003 had the lowest sill (23,285.90 mm²) and a range of 368.81 km. This is consistent with that period showing one of the smallest variabilities in annual precipitation. According to the spatial dependence index (SDI), spatial dependence was moderate in all periods (25% < SDI < 75%; Cambardella et al., 1994). Similar results were reported in a study of the spatial distribution of annual rainfall in the state of Paraná between 1996 and 2015, which found moderate dependence in 16 of the 20 years analyzed (Gamero et al., 2020). Figure 3 shows the rainfall distribution pattern across the hydrographic regions of the SFRB over the studied periods.
The Upper São Francisco has high accumulated annual precipitation, which decreases mainly in the Sub-Middle region towards northeastern Brazil. The lowest precipitation was observed from 2014 to 2018 in all regions, with no annual totals above 1,500 mm. Annual precipitation in the Sub-Middle decreased over time, falling below 300 mm in almost the entire region from 2014 to 2018. Assis et al. (2015) analyzed precipitation in the Sub-Middle of the SFRB from 1964 to 2014. According to the rainfall anomaly index (RAI), the only dry years were between 1990 and 2000. They also found no positive RAI in 2012, which was classified as a dry and extremely dry year.

Furthermore, the Sobradinho Reservoir, 320 km long, lies in the Sub-Middle of the SFRB, within the municipalities of Sobradinho and Casa Nova, Bahia. It has a water surface of 4,214 km² and a storage capacity of 34.1 billion cubic meters at its nominal elevation of 392.50 m, making it the largest artificial lake in Brazil (CHESF, 2021). Considering 5-year intervals from 1999 to 2018, the reservoir's mean monthly useful volume was 67% from 2004 to 2008. This volume has decreased in the last decade, reaching 22% between 2014 and 2018 (ONS, 2021). Our results reflect this worrying trend, as 2014 to 2018 was the most critical period for precipitation in the analyzed historical series.

Figure 3 makes it clear that high-altitude regions tend to have high precipitation. Other factors also contribute to this variability, such as the climate of the SFRB and the proximity to the ocean near the mouth of the São Francisco River.
The Upper São Francisco, in Minas Gerais, lies at high altitude and has a humid climate. The Middle São Francisco, the largest physiographic region of the basin, has several climate types. Part of it resembles the Upper São Francisco, but precipitation decreases on entering the semi-arid region, where the climate is classified as dry sub-humid and semi-arid. The Sub-Middle (lower-middle) region is recognized as the driest part of the basin, with annual precipitation from 350 to 800 mm and a mean annual temperature of 27ºC, and its climate is classified as semi-arid and arid. In the Lower São Francisco, proximity to the ocean makes the climate somewhat milder. There, the mean annual temperature is 25ºC, mean annual precipitation ranges from 800 to 1,300 mm, and the climate ranges from semi-arid to sub-humid (Hermuche, 2002).

CONCLUSIONS
Geostatistical techniques allowed us to investigate precipitation variability across the sub-regions of the São Francisco River Basin over 30 years. The sub-periods analyzed showed high variability, and accumulated annual precipitation decreased over the years evaluated. The Sub-Middle region stands out as the most vulnerable to rainfall shortages, which were most evident from 2014 to 2018. The geographical coordinates were significant throughout the analyzed period. Latitude had a negative effect, indicating that annual precipitation decreases from south to north in the São Francisco River Basin. In addition, the hydrographic regions that make up the basin have distinct characteristics, which explains the differences in the spatial distribution of precipitation among them.