Defining management zones for a papaya culture considering soil chemical attributes by fuzzy c-means

In precision agriculture, one of the major problems is the need for a dense sampling grid. A solution to this problem has been the generation of management zones and the correlation between them. This work aims at defining management zones for a papaya culture using the fuzzy C-means grouping method, based on the following soil attributes: phosphorus, calcium, magnesium and potassium; and base saturation. The work was conducted in an approximately 1.2 ha area planted with a commercial culture of Carica papaya L. cultivated in typical Argisol found on the coastal terraces, in the municipality of São Mateus, ES, Brazil. The management zones were defined by the proposed method, based on the following analyses: soil chemical parameters P, Ca, Mg, and base saturation (V), analyzed together; soil chemical parameters P, Ca, Mg, and V, analyzed separately; and, the following combinations of soil chemical attributes Ca-Mg, Ca-V, Ca-Mg-V, P-Ca, P-Mg, P-V, P-Ca-Mg, P-Ca-V, Mg-V, and P-Ca-Mg-V. The generated management zones facilitated an understanding of the spatial variability of the attributes in the study area and the observed similarities between management zones generated from different attributes. In addition, the data were influenced especially by the P and V% attributes.


INTRODUCTION
Agriculture has been undergoing major technological transformations, among them the use of precision agriculture, which can benefit the crop since it identifies areas with potential for producing better quality fruits and assists in understanding the factors determining that.It provides careful and detailed soil and crop management, recognizing differences in soils and plants, allowing adequate management practices for each planting area (Saiz-Rubio and Rovira-Más, 2020).Understanding the spatial distribution of soil attributes is the first step to establish adequate management practices to optimize agricultural productivity and minimize environmental damage.Geostatistics enables us to verify spatial dependence and visualize the attributes of the studied areas (Rodrigues et al., 2012;Delalibera et al., 2012).
In order to determine different management practices with variable rates of input, it is necessary to generate management zones that aim to reduce sampling, resulting in maps representative of the area, with a combination of factors limiting productivity and quality, to which a uniform dose of inputs can be applied (Ramos et al., 2017).
Data-grouping techniques that allow using a set of prominent factors for crop development and for identifying crop variabilities have been used to generate these management areas.Among these mathematical methods, fuzzy C-means for the use of a set of attributes correlated with the nutritional variability of the study area (Rodrigues Junior et al., 2011;Alves et al., 2013).
Rodrigues Junior et al. (2011) defined management zones in a 2.1 ha area cultivated with Coffea arabica L. cv.Catuaí, in Paula Cândido, MG, based on foliar analysis performed with a chlorophyll sensor using the C-means and fuzzy C-means methods, and concluded that both grouping methods had similar results regarding management areas.Alves et al. (2013) established management zones in no-tillage corn and soybean areas based on the spatial variability of soil apparent electrical conductivity and organic matter, using fuzzy C-means.The best results for defining the management zones according to soil texture attributes were obtained from organic matter or electrical conductivity maps and, for the chemical attributes, from organic matter or altitude and organic matter maps.Taylor et al. (2003) used soil productivity and electromagnetic induction data for defining management zones by the C-means algorithm and reported on obtaining coinciding zones.They also proposed a more targeted sampling method for reducing the number of soil samples from 27 to 4.
The essence of the fuzzy methodology is the acceptance of uncertainty that must be modeled and treated by mathematical methods and was introduced by Zadeh (1965).It has been used in several study areas due to its ability to detect similarities among individual data from a dataset to aid in the decision-making process.The fuzzy set expresses the degree to which an element belongs to a set using a pertinence function.Thus, considering that management zones guide area sampling, the spatial distribution of soil and plant attributes, and the varying dosage of inputs, the correlation of the management zones with soil chemical attributes can lead to a better understanding of the spatial distribution of soil attributes and sampling optimization.
Therefore, the objective of this work was the preliminary mapping of management zones in an area planted with a papaya crop based on the P, K, Ca, Mg, and base saturation (V%) chemical attributes determined by chemical analysis of soil samples, using the Fuzzy C-means (FCM) method.

MATERIAL AND METHODS
The data used in this work originated from the experimental work for the master thesis conducted by Sturião et al. (2018) at the Vovô Délio farm, in São Mateus, ES, located at the approximate coordinates E = 407241 m and N = 7905948 m, with 40 m average altitude, and spindle 24.According to Embrapa (2013), the soil in the area is typical of the coastal terraces of this Brazilian region, characterized as Argissolo Amarelo Distrófico of low activity, with smooth undulating relief, and approximately 3.0% slope.
The largest extensions of coastal terraces in the country are found in the Brazilian states of Bahia and Espírito Santo (Jacomine, 1996), which are responsible for more than 80% of the papaya produced.However, the presence of cohesive layers in the soils acts as a counterpoint to some of the main potentialities (Cintra, 2005), meaning that it is important to select more suitable areas and adjust management practices to soils with cohesive horizons subsurface to guarantee the longevity and productivity of the papaya (Salgado and Costa, 2003).These soils generally have a very dense cohesive B horizon, due to the presence of a higher content of dispersed microclay (Corrêa et al., 2008;Lima Neto et al., 2010).
The cohesive sandy texture soil was conventionally prepared and cultivated with papaya of the Golden THB variety.After plant sexing, soil sampling was performed.The deformed samples were collected using an auger at 0 -0.20 m depth, in the crown projection, from two points equidistant approximately 0.20 m, in the fertilizer application point in the planting line direction.The two samples were placed in plastic bags, forming a single sample per sampling point, identified and sealed.
Soil samples were collected from a 1.2 ha plot, in which a sampling point of approximately 5.0 m² was defined, totaling 129 sample points (Figure 1).All samples were georeferenced using a GPS receiver (TechGeo®, Model GTR G2 geodesic) and post-processed.
The chemical variables phosphorus (P), calcium (Ca), magnesium (Mg), and potassium (K) from soil samples collected between 0 -0.20 m depth were analyzed and the base saturation (V%) value was calculated.The data analysis first step was to perform a descriptive statistical analysis of the stored data to evaluate the distribution of the variables throughout the experiment by calculating the position and dispersion parameters such as mean, median, maximum and minimum values, upper and lower quartiles, standard deviation, and the coefficients of variation, asymmetry, and kurtosis.
The data underwent geostatistical analysis to check for spatial dependence and, when present, quantify the degree, using the adjustment of Matheron classical semivariogram (Félix et al., 2016).For each variable, the best fit for the experimental model (Santos et al., 2019) among the semivariogram models was determined by geostatistical analysis.The Vesper® software was used and the model was chosen based on the lowest residual sum of the squares (RSS) and the highest coefficient of determination (R²).Additionally, the spatial dependency index (SDI) was calculated by [C1/(C0 + C1)]*100, where C1 is the structural variance, and C0 + C1 is the threshold.The SDI was classified as follows: weak (SDI ≤ 25%), moderate (25% < SDI ≤ 75%) and strong (SDI > 75%) (Silva et al., 2013).
The spatial dependence was verified for all variables, and the spatial variability map was elaborated using kriging interpolation, which according to Ferreira et al. (2017), is a set of regression analysis techniques that minimizes the estimated variance based on a previous model, taking into account the stochastic dependence between data spatial distribution.
The spatial variability maps were generated using interpolations with a 0.5 x 0.5 m pixel size for the coordinates E and N and the same contour polygon, defined in Figure 1.This standardization guarantees that all generated maps have the same number of pixels and can overlap.According to Linden (2009), grouping analysis, or clustering, has the purpose of separating objects into groups, based on the characteristics that the objects have.The basic idea is to place objects that are similar according to some predetermined criteria in the same group.There is the possibility of generating the grouping, considering the relevance of each value associated with each class, with diffuse clustering being recommended for this use (Yang, 1993;Bezdek and Pal, 1992).This approach allows an individual to be associated with all clusters using a membership function (Zadeh, 1965).Thus, fuzzy sets offer the advantage of manifesting each individual partially to all groups.Among fuzzy algorithms, the most popular is Fuzzy C-means (FCM).This approach to FCM classification minimizes the within-class sum of squares.
Rev. Ambient.Água vol.19, e2959 -Taubaté 2024 In the study of management zones, among the analyzed variables, those that presented spatial dependence were analyzed by the FCM algorithm of the FuzMe® software.The FCM algorithm was applied to group the data to minimize intraclass variance and maximize interclass variance, thus minimizing the objective function Jm of a characteristic function μij (μijª[0,1]) and a class center ci.
This method was used to divide the data groups used to determine the management zones into two and three classes (Ricardo et al., 2016).The objective function used by FCM is described as Equation 1: (1) Where: m is a classification index of the fuzzy degree.
The variable m > 1 defines the allowed distance between the considered point and the centroid being calculated.Thus, the greater the value, the more elements are assigned to this group (Rodrigues Junior et al., 2011).
A preliminary test with five different m values (1.1, 1.3, 2, 5, and 10) was carried out.The results showed that the defined management zones remained constant with the changing m values, as already observed by Rodrigues Junior et al. (2011) and, therefore, m = 1.1 was adopted.
The iteration was recalculated until the difference between Jm actual and Jm anterior was equal to or less than 0.0001.Ten random initiations were made with different initial centroids.
The FCM algorithm was implemented following the steps: 1) Initialize the values of the elements μij of the membership matrix with random numbers between 0 and 1, satisfying the conditions described in Equations 1; 2) Calculate the centroids ci (i = 1, 2,3 ... c); 3) Update the pertinence matrix values; 4) If the number of maximum iterations initially defined has been reached, then the algorithm ends, otherwise, return to Step 2.
Further, the Euclidean distance was used for calculating the distance d between the i-th center of the class and the j-th point of the sample.To optimize the number of classes, the value of two indexes, the Fuzzy Performance Index (FPI) and Modified Partition Entropy (MPE), were determined according to Equations 2 and 3 (Metwally et al., 2019), respectively.

]
(3) After delimiting the groups by the FCM method, the ArcGis® 10.2 software was used for spatializing the data and generating thematic maps.A visual analysis of the maps generated allowed us to choose the maps with the most similar management zones for the different soil attributes used.

Descriptive and geostatistical analysis
Table 1 shows the descriptive statistics of the following soil attributes P, Ca, Mg, K and V% and their range (SR) for papaya culture.
The Pearson correlation coefficients (p < 0.05) between soil chemical attributes (Table 2) were null (0.0 ≤ r ≤ 20.0%) for "P and Mg", "P and K", "P and V", "K and Ca", "V and K"; low (20.0 ≤ r ≤ 40.0%) for "Ca and P", "K and Mg", and moderate (40.0 ≤ r ≤ 60.0%) for "Mg and Ca", "V and Ca", "V and Mg".Further, the soil base saturation (V) indicates how many of the negative charges are occupied by Ca2+, Mg2+ and K+, in relation to the exchange points of the acidic cations H+ and Al³+, being directly correlated with the soil bases.According to Sturião (2018), the soil sampling location affects the correlations found in the results, mainly due to the sampling location and application and homogenization of fertilizers and the quantitative relationship between attributes.
All attributes were submitted to geostatistical analysis.The parameters of the adjusted semivariograms and the coefficient of determination of the cross-validation for each attribute are shown in Table 3, indicating that K did not show spatial dependence, since the semivariogram presented pure nugget effect (PNE).The exponential model was the best fit for the P semivariogram based on the highest coefficient of determination (R1 2 ) while the spherical model was the best for the Ca, V and Mg variables, similar to the result reported by Menezes et al. (2004), for the latter variable.

Management areas by the FCM method
Figures 2 and 3 show the management areas generated by the FCM method from the P, Ca, Mg and V attributes and their combinations.Tables 4 and 5 show the mean values of the three data classes determined for all analyzed soil chemical attributes and their combinations.

Descriptive and geostatistical analysis
A general analysis of the data shows that the levels of the attributes in the sampled soil were above the upper level of the ideal range, probably due to the fertilization done in the area, leaving the area with excess nutrients in the soil.
Table 1 shows that the mean and median values, measurements that characterize the center of the frequency distribution, are close for the Ca, Mg, K and V variables, indicating a trend Rev. Ambient.Água vol.19, e2959 -Taubaté 2024 toward data normal distribution.Thus, we assumed normal data distribution for these variables.
Furthermore, the Mg, K and V data tend to symmetry, which is verified by the asymmetry and kurtosis coefficients.The asymmetry coefficient close to zero characterizes a symmetrical distribution, and the kurtosis coefficient close to zero for P, Mg, K, and V, characterizes a mesokurtic (normal) distribution; however, Ca did not have Cs and Ck near zero, characterizing a distribution that does not tend to normality.Montezano et al. (2006) and Montanari et al. (2005) studied the variability of chemical attributes and reported no normal distribution for most attributes studied, which contradicts the results found in this study.
Table 1 shows that 60% of the attributes studied present an asymmetric distribution to the right, with a mean greater than the median while the kurtosis coefficient (Ck) indicates platykurtic distribution for 60% of the attributes.
The data variability analyzed by the coefficient of variation (CV) and classified indicates that (Gomes et al., 2018): • mean CV for Ca, Mg, K and V, totaling 80% of the attributes studied (12 < CV < 62%).Similar results were found by Menezes et al. (2004), Silveira et al. (2000) and Silva et al. (2003), in the 0.0 -0.2 m depth and by Souza et al. (2003) at the 0.6 -0.8 m depth; • high CV for P (CV > 62%), representing 20% of the attributes.Cavalcante et al. (2007) reported that higher CV is due to fertilization residual effects and application method, usually in the sowing line and without complete homogenization, unlike liming, which was distributed throughout the area.Menezes et al. (2004) studied the spatial variability of pH, Ca, Mg and V% of the soil in different reliefs cultivated with sugarcane in Guariba, SP, and found mean CV values for Ca, Mg and V that corroborate the results of this work.Barbieri et al. (2008) investigated the spatial variability of chemical attributes of Argisol regarding variable application rate of inputs in different reliefs and reported a low coefficient of variation only for base saturation in the concave area, at the 0 -0.20 m depth.The other attributes had average or high coefficients of variation in the two depths studied.
The mean soil fertility was analyzed based on the ideal range for the papaya crop in Espírito Santo (Table 1).Thus, based on these classes, we have mean values above the upper value of the ideal/satisfactory range (SR) for P (20.12 mg.dm -3 ) and K (67.44 mg.dm -3 ).The mean value of Mg is below the lower value of the ideal range while the other attributes are within the ideal range.
The greater mobility of P in the soil of the study area is possibly explained by its low clay content in the 0 -0.20 m depth, while the predominant kaolinite and irrigation favored the diffusion and availability to the absorption by the plants (Sturião, 2018).Also, the samples were collected in the crown projection in the fertilization site.
When we study the geostatistics, spherical and exponential models have been the most suitable to describe the behavior of semivariograms of soil and plant attributes (Menezes et al., 2004;Barbieri et al., 2008;Rodrigues Junior et al., 2011;Alves et al., 2013).Analyzing the Semivariogram, the pure nugget effect (PNE) indicates a lack of spatial dependence for the shortest distance sampled (5.7 m), so K behaves randomly in the considered distance.
The range reflects the spatial continuity of the attribute that at distances greater than the range behaves randomly.Therefore, the range value is desired to be as large as possible, since it conveys greater precision to data interpolation, i.e., greater spatial continuity.The Ca, Mg and V attributes had a range of approximately 22 m and adjusted to the spherical model, allowing us to infer the same spatial distribution pattern because they have close ranges and adjusted to the same theoretical model.
Table 2 shows that the elements Ca and V had strong SDI.According to Silva et al. (2013), with moderate and high R1 2 , respectively, but presented low values of R2 2 .All adjustments presented R1 2 higher than 50%, and 75% had R1 2 greater than 75%.
The cross-validation study identifies areas with no spatial dependence, indicating the need for new samplings (Rodrigues Junior et al., 2011), showing the importance of cross-validation to indicate the robustness of the chosen model to estimate the values in places where no information is available.
From the adjusted semivariograms, the attributes with spatial dependence (P, Ca, Mg, V) were interpolated by the ordinary kriging method to determine their values in non-sampled locations.Subsequently, a visual analysis of the area using the calculated attributes allowed the selection of different management zones/areas according to their specific needs.
Based on the results of R2 2 , all chemical variables (P, Ca, Mg and V) that fit the theoretical semivariograms were used to define the management zones except for K, due to the pure nugget effect presented.Therefore, the management zones were defined using the FuzMe® software and FCM algorithm based on the following soil chemical parameters: P, Ca, Mg, and V, analyzed separately, and the following combinations Ca-Mg, Ca -V, Ca -Mg -V, P -Ca, P -Mg, P -V, P -Ca -Mg, P -Ca -V, Mg -V, P -Ca -Mg -V as well.
The ideal number in each proposed problem was analyzed to determine the number of classes into which the dataset was divided, and consideration was given as to whether the defined classes were technically and economically feasible.According to Máquina et al. (2020), the optimal number of classes for each management zone is determined by the minimum values of the Fuzziness Performance Index (FPI) and Modified Partition Entropy (MPE).Thus, the FPI and MPE indexes were calculated for 2 and 3 classes with similar results, but slightly lower FPI and MPE values were obtained for 3 classes.Likewise, Molin and Castro (2008), Yan et al. (2008), Xin-Zhong et al. (2009) and Morari et al. (2009) in similar studies also obtained an optimum number of classes equal to three.Thus, this study recommends using three classes to define management zones.

Management zones obtained by the FCM method
According to Loisel et al. (2018), small management areas are difficult to handle, considering subclasses due to technical and economic limitations.When increasing the number of classes, some pixels remain unchanged, while others are reclassified due to the weak Euclidean proximity with the classes allocated and the classification is independent of the variable coordinates.Pedroso et al. (2010) highlighted the disregard for the variable coordinates during the classification process as a limitation of the fuzzy C-means algorithm.
Tables 2, 3 and 4 show that, using the FCM method for grouping using more than one attribute and between different combinations, the mean values of the attribute centroids were maintained in most cases, including the classes.Another similarity verification criterion consisted of visually inspecting the maps (Figure 1 and 2) to analyze the location and composition of the management zone classes aiming at investigating their relationships in different attribute combinations.Similarity was observed among the following management zones "Ca, Ca -Mg"; "V, Ca -V, Ca -Mg -V"; "P -Ca, P -Mg, P -Ca -Mg, P -Ca -V"; and "P -Ca -Mg -V, P, Ca -V." The results suggest that the P and V% attributes guide, for the most part, the delimitation of the management zones.V% was to be expected since it depends on the sum of bases (Ca, Mg and K).
Different data sources can be used for delimiting management zones that are economically more feasible than the total sampling of the area, and easier for applications at variable rates.
It is noteworthy that although the studied area is relatively small (1.2 ha) for generating Rev. Ambient.Água vol.19, e2959 -Taubaté 2024 management zones compared to other works found in the literature and has low variability due to adequate fertilization, the results showed that it was possible to generate management zones.Several attributes can be used to define management zones in an area; it is suggested that isolated or combined attributes be used that are able to reflect the behavior of the variables which one desires to characterize.The higher the correlation between the attributes used to generate the management zones and the object to be characterized, the greater the reliability of the maps generated with the attributes.The management classes suggest areas that have similar behavior regarding the studied attributes, and are therefore markers of the sampling points for the fertility study of the area.

CONCLUSIONS
The generated management zones facilitate our understanding of the spatial variability of the attributes in the study area and to correlate similarities between management zones generated from different variables.However, it was observed that P and V% attributes particularly influenced the data, indicating a degree of coincidence between the generated zones that suggest a more targeted sampling, reducing the number of samples.
Additionally, comparing the results of this work with the work proposed by Sturião (2018), the sampling of the area becomes more directed, requiring fewer sampling points, besides decreasing the number of attributes studied, confirming the effectiveness of the method.

Figure 1 .
Figure 1.Sample mesh elaborated for soil sampling to determine its attributes.

Figure 2 .
Figure 2. Grouping map according to soil attributes P, Ca, Mg, V%, Ca-Mg, Ca-V%, and Ca-Mg-V%, by the Fuzzy C-means algorithm.

Table 1 .
Analysis of descriptive statistics of P, Ca, Mg, K, and V%.

Table 3 .
Semivariogram parameters adjusted for the studied attributes.

Table 4 .
Means of the three data classes for the P, Ca, Mg and V% attributes, by the FCM algorithm.

Table 5 .
Means of the three data classes for the Ca-Mg, Ca-V, P-Ca, P-Mg, P-V, and Mg-V attributes, by the FCM algorithm.