Estimation of Palm Oil Biomass Carbon from Sentinel-2 Image using the Random Forest Classification Method

Oil palm absorbs carbon and stores it in its biomass. Biomass over large oil palm plantations can be monitored by combining remote sensing data with machine learning algorithms. The aims of this study were to estimate oil palm biomass carbon by age class using non-destructive methods, to analyze the relationship between oil palm reflectance in Sentinel-2 imagery and oil palm biomass carbon, and to estimate the distribution of oil palm biomass carbon using the random forest (RF) machine learning algorithm. Biomass at the study site was measured non-destructively in sample plots selected by stratified purposive sampling. The strength of the relationship between Sentinel-2 reflectance and measured oil palm biomass was assessed from the coefficient of determination of the regression equation. The distribution of biomass carbon across the study area was estimated using the RF method with the Dzetsaka classification tool. The results showed that the highest biomass carbon stock was found in 20-year-old oil palm, with an average of 59.6 tons C/ha, and the lowest in 17-year-old oil palm, with an average of 32.9 tons C/ha. Sentinel-2 reflectance in the blue, green, red, and near-infrared bands was positively correlated with oil palm biomass carbon, with an R² greater than 0.8. RF classification of biomass carbon applied to the Sentinel-2 image gave an adequate accuracy of 76.40% with a training-to-testing data proportion of 60% : 40%.


Introduction
Biomass is a key variable in assessing an ecosystem (Chapin et al., 2002). Biomass information is needed to estimate and predict ecosystem productivity, carbon stock, nutrient distribution, and fuel accumulation (Brown, 2002). Carbon stock is usually estimated from the biomass present in vegetated areas such as forests. Besides forests, plantations are also vegetated areas that absorb carbon, so they help reduce the atmospheric concentration of CO2, one of the greenhouse gases (GHG) whose concentrations are increasing. One type of plantation with the potential to contribute a large carbon stock is oil palm, which covers a total area of 14.62 million hectares in Indonesia (BPS, 2022). During its life cycle, oil palm captures CO2 through photosynthesis and stores the products as biomass carbon reserves ranging from 53.40 to 80.09 tons C/ha (Maulana, 2010; Sukariawan et al., 2019). Because oil palm is a perennial plant with a long life cycle, it accumulates its biomass as a large carbon stock. Thus, oil palm is considered capable of contributing to the reduction of GHG, especially CO2 (Tanaka and Makino, 2007).
Standing tree biomass can be measured either destructively or non-destructively. In the destructive method, all parts of the plant, including the roots, are harvested, dried, and weighed to obtain the biomass (Sutaryo, 2009). The destructive method is also known as the biomass harvesting method and is often used in research on forest ecosystems (Gibs et al., 2007). The non-destructive method does not damage the plants being measured: the height or diameter of the tree is measured and an allometric equation is used to extrapolate the biomass (Sutaryo, 2009).
Pixel classification algorithms play an important role in producing good classification accuracy. Many approaches can be used for mapping and monitoring purposes, and they are broadly divided into two groups, namely supervised and unsupervised. Several machine learning (ML) algorithms, such as support vector machine (SVM) and random forest (RF), have been successfully implemented for remote sensing applications (Gislason et al., 2006). The RF and SVM algorithms are considered relatively easy to implement, can handle learning tasks with a small number of training samples, and produce good accuracy (Sheykhmousa et al., 2020). Biomass mapping and monitoring with a combination of remote sensing data and machine learning was carried out previously by Usmadi and Pribadi (2021) using WorldView-2 imagery and RF classification. Their results show high accuracy for the RF classification, with an R² value of 0.83. The present study used a combination of Sentinel-2 satellite imagery and the RF classification method to estimate oil palm biomass carbon in the oil palm plantations of PTPN VIII Cimulang, Bogor Regency. The aims of this study were to a) estimate oil palm biomass carbon by age class using non-destructive methods, b) analyze the relationship between oil palm reflectance derived from the Sentinel-2 image and oil palm biomass carbon, and c) estimate the distribution of oil palm biomass carbon using the RF machine learning method.

Literature Review
Advances in remote sensing technology can support vegetation monitoring over extensive areas. Sensor system technology and digital signal processing algorithms make it possible to retrieve information on the state of the earth faster and more accurately (Sudiana and Diasmara, 2008). Biomass has frequently been estimated with remote sensing techniques that use multispectral imagery (Dwinta and Murti, 2016; Astriani et al., 2018; Sukariawan et al., 2019; Golindira, 2022). One data source that can be used to estimate biomass through remote sensing is Sentinel-2 satellite imagery. Astriani et al. (2018) used vegetation indices derived from Sentinel-2 and Landsat 8 OLI images to monitor oil palm biomass at PTPN VII Lampung Selatan. In that study, Sentinel-2 estimated biomass better than Landsat 8 OLI because its spatial resolution is higher. However, the accuracy of the oil palm carbon stock estimates obtained from Sentinel-2 vegetation indices at that location was low because the number and distribution of samples were limited.

Research sites
Data on the aboveground biomass of oil palm were obtained using non-destructive methods, with the parameters of the allometric equation measured directly in the field at PT Perkebunan Nusantara VIII Cikasungka, Afdeling Cimulang, Bogor Regency.

Materials
Tools for collecting data include the Global Positioning System (GPS), rangefinder, measuring tape, field knife, and camera. The tools for data analysis include a set of computers containing Quantum GIS 10.18 software with the Dzetsaka classification tool plugin, and Google Earth. The materials used in this study consisted of Sentinel-2A level 1C multispectral images, national digital elevation model (DEMNAS) data, and plantation block maps.

Biomass Measurement
Oil palm biomass was measured in sample plots in 11 plantation blocks with oil palm ages of 17, 18, 19, and 20 years. Each oil palm block has three to four slope categories. The sample plots, measuring 100 m x 20 m, were determined using the stratified purposive sampling method, with the plot locations stratified according to age class and slope category. This yielded 39 sample plots. Each plot was further divided into 10 sub-plots measuring 10 m x 20 m, of which only 3 were sampled, for a total of 117 sub-plots. The coordinates of each sub-plot were determined using a GPS device, and its biomass was measured using a non-destructive method. Information regarding plant age, blocks, slope classes, and the number of plots and sub-plots can be seen in Table 1, while the distribution of sample points is presented in Figure 2.

Figure 2. Sample points distribution
For each oil palm tree in a 10 m x 20 m sample sub-plot, the diameter at breast height (DBH) was measured with a measuring tape on the trunk at a height of about 1.3 m above the ground. After the diameter or circumference of the trunk had been measured, the height of the oil palm to its top was measured using a rangefinder. Oil palm biomass was then estimated using the allometric equation of Lubis (2011).
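The conversion from per-palm measurements to a stock per hectare can be illustrated with a minimal sketch. It assumes that a tape reading of circumference is converted to diameter as C/π and that the biomass of all palms in a 10 m x 20 m sub-plot (0.02 ha) is summed and scaled to one hectare. The function placeholder_allometry and the carbon fraction of 0.5 are hypothetical stand-ins used only to make the example run; they are not the Lubis (2011) equation or a value taken from this study.

```python
import math

SUBPLOT_AREA_HA = (10 * 20) / 10_000  # 10 m x 20 m sub-plot = 0.02 ha


def dbh_from_circumference(circumference_cm):
    """Convert a trunk circumference (cm) to a diameter (cm)."""
    return circumference_cm / math.pi


def subplot_carbon_t_per_ha(palms, allometric_kg, carbon_fraction):
    """Scale the palms measured in one sub-plot to tons C per hectare.

    palms: list of (dbh_cm, height_m) tuples measured in the sub-plot.
    allometric_kg: function (dbh_cm, height_m) -> dry biomass per palm in kg.
    carbon_fraction: share of dry biomass that is carbon (an assumption).
    """
    biomass_kg = sum(allometric_kg(dbh, height) for dbh, height in palms)
    biomass_t_per_ha = biomass_kg / 1000 / SUBPLOT_AREA_HA
    return biomass_t_per_ha * carbon_fraction


def placeholder_allometry(dbh_cm, height_m):
    # Hypothetical stand-in so the sketch runs; NOT the Lubis (2011) equation.
    return 0.1 * dbh_cm * height_m


# Made-up measurements for two palms: (DBH in cm, height in m).
example_palms = [(60.0, 10.0), (55.0, 9.0)]
carbon = subplot_carbon_t_per_ha(example_palms, placeholder_allometry, 0.5)
print(f"{carbon:.4f} tons C/ha")
```

Passing the allometric equation as a function keeps the area scaling separate from the choice of equation, so the published equation can be substituted without changing the rest of the calculation.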

Atmospheric Correction of Sentinel-2 Image
Atmospheric conditions disturb the pixel values recorded by the satellite sensor, so an atmospheric correction is needed to minimize this influence. Atmospheric correction of the Sentinel-2 Level 1C image was carried out in QGIS using the DOS 1 (Dark Object Subtraction 1) approach in the Semi-Automatic Classification Plugin (Congedo, 2021).
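The idea behind dark object subtraction can be sketched in a few lines. The sketch assumes a band already expressed as top-of-atmosphere reflectance and the usual DOS 1 simplification that the darkest objects in the scene reflect about 1%, so the haze (path) contribution is the dark-object value minus 0.01. Taking the dark object as the value at a very small fraction of the sorted pixels, and clamping the haze at zero, are choices of this illustration and may differ from what the plugin does internally.

```python
def dark_object_value(band, fraction=0.0001):
    """Return the reflectance below which roughly `fraction` of pixels fall."""
    values = sorted(value for row in band for value in row)
    index = min(len(values) - 1, int(fraction * len(values)))
    return values[index]


def dos1_correct(band, dark_reflectance=0.01):
    """Subtract the estimated haze reflectance from a TOA reflectance band.

    band: 2-D list of top-of-atmosphere reflectance values (0 to 1).
    dark_reflectance: assumed true reflectance of the darkest objects (1%).
    """
    haze = dark_object_value(band) - dark_reflectance
    haze = max(haze, 0.0)  # sketch choice: never add reflectance
    return [[max(value - haze, 0.0) for value in row] for row in band]


# Made-up 2 x 2 band of TOA reflectance values.
toa_band = [[0.05, 0.10], [0.20, 0.30]]
corrected = dos1_correct(toa_band)
print(corrected)
```

Because the correction is a constant offset per band, it shifts all reflectance values of that band by the same amount and does not change the ranking of pixels within the band.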

Correlations between Palm Biomass and Reflectance
To study the strength of the relationship between the Sentinel-2 image and the oil palm biomass measured in the field, a linear regression equation was fitted with reflectance as the independent variable and biomass as the dependent variable. The reflectance of the blue, green, red, and near-infrared bands was used as the independent variable, and the coefficient of determination (R²) was used to assess how strongly the two variables are related. The linear regression equation and the coefficient of determination are as follows:
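In their standard form, the simple linear regression is y = a + b·x, with x the band reflectance and y the biomass carbon, and the coefficient of determination is R² = 1 − SS_res/SS_tot. A minimal sketch of this calculation follows, assuming ordinary least squares fitted to one band at a time. The reflectance and carbon values are made up for illustration and are not the study's data.

```python
def linear_fit(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up example: near-infrared reflectance and biomass carbon (tons C/ha).
nir_reflectance = [0.30, 0.33, 0.36, 0.40]
carbon_t_per_ha = [33.0, 41.0, 48.0, 59.0]

a, b = linear_fit(nir_reflectance, carbon_t_per_ha)
r2 = r_squared(nir_reflectance, carbon_t_per_ha, a, b)
print(f"carbon = {a:.2f} + {b:.2f} * reflectance, R2 = {r2:.3f}")
```

Repeating the fit for each of the four bands gives one equation and one R² per band.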

Estimation of Oil Palm Biomass Distribution
The distribution of oil palm biomass at the study site was estimated from the Sentinel-2 image using the RF approach in QGIS with the Dzetsaka classification tool plugin (https://github.com/nkarasiak/dzetsaka/#readme). The first step is to run the Train algorithm feature:
• Input the band-stacked Sentinel-2 image (bands: blue, green, red, and near infrared) in Input raster.
• Input the training samples in Input layer. The samples coincide with the locations of the sample sub-plots in the field, each represented by a polygon of 3 pixels × 3 pixels. Then select the attribute field containing the biomass data in Field.
• Select Random-Forest in Select algorithm to train.
• Set the share of validation data (25%, 30% (default), 40%, or 50% of the training sample) in Pixel (%) to keep for validation. RF classification was run with four ratios of training to testing (validation) data, namely 75% : 25%, 70% : 30%, 60% : 40%, and 50% : 50%, because RF results are very sensitive to the size of the training data.
• Save the RF model file in Output model.
• Select Run in the Train algorithm tool dialog.
If the model accuracy is good enough, run the Classification tool feature:
• Input the band-stacked Sentinel-2 image (bands: blue, green, red, and near infrared) in Input raster.
• Input the model file produced by the Train algorithm tool in the model learned.
• Save the output file in Output raster.
• Select Run in the Classification tool dialog.
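The effect of the validation share on the amount of training data can be sketched as a hold-out split repeated for the four ratios. The sketch assumes a random split stratified by class, so that every biomass class keeps the same validation share. Whether the plugin stratifies its split is not stated here, so stratification is an assumption of this illustration, and the labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, validation_share, seed=0):
    """Return (train_idx, valid_idx), holding out a share of every class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, valid = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_valid = round(validation_share * len(indices))
        valid.extend(indices[:n_valid])
        train.extend(indices[n_valid:])
    return sorted(train), sorted(valid)


# Hypothetical labels: 4 biomass classes with 100 training pixels each.
labels = [cls for cls in (1, 2, 3, 4) for _ in range(100)]

for share in (0.25, 0.30, 0.40, 0.50):
    train, valid = stratified_split(labels, share)
    print(f"{1 - share:.0%} : {share:.0%} -> "
          f"{len(train)} training, {len(valid)} validation pixels")
```

A classifier would then be trained on the training indices and scored on the validation indices for each ratio, which is how the four accuracy values can be compared.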

Biomass Carbon in Palm Oil
Biomass is the total weight of a living tree above the ground including tree roots and is expressed in tons of dry weight per unit area (Brown, 1997). Based on the measurements, the highest oil palm biomass carbon stock was found at the age of 20 years, with an average of 59.6 tons C/ha, while the lowest was found at the age of 17 years, with an average of 32.9 tons C/ha (Table 2). These results indicate that biomass increases as the age of the oil palm increases. At more than 10 years of age, oil palms grow optimally, so their biomass can continue to increase (PKT, 2018). According to Yulianto (2015), carbon stock increases with increasing age. Every part of the tree, such as the roots, stem, midribs, leaves, and fruit, is a component that stores biomass. Furthermore, Catur and Sidiyasa (2006) stated that the biomass in each part of the tree increases proportionally with increasing tree dimensions. The relationship between plant age and biomass is presented graphically in Figure 3.

Relationship between Biomass Stock and Reflectance
Biomass is strongly related to plant growth, so the biophysical characteristics of plants, including their spectral response to electromagnetic wavelengths, differ at each growth stage. The results showed that biomass stock has a strong positive relationship with the reflectance values in the blue, green, red, and near-infrared bands of the Sentinel-2 image, with a coefficient of determination greater than 0.9 (Figure 4). In general, the same object reflects, absorbs, and transmits incoming electromagnetic energy differently at different wavelengths. Figure 4 shows that the reflectance of the blue and red bands is slightly higher than the reflectance of the green band, while the reflectance of the near-infrared band is much higher than that of the blue, green, and red bands.
Green plants absorb much of the blue and red wavelengths, while the green wavelength is reflected slightly more than the blue and red wavelengths. According to Amliana et al. (2016), the blue and red spectra have low reflectance values because plants absorb much of this energy for photosynthesis, whereas the green spectrum is reflected more, so the reflectance of the green band tends to be greater than that of the blue and red bands. The reflectance of the near-infrared band, as presented in Figure 4, is much higher than that of the blue, green, and red bands. Plants reflect near-infrared wavelengths very strongly because the internal structure of the leaf scatters them efficiently, which makes the reflectance value in this band high. Therefore, vegetation can be easily identified in this part of the spectrum.

Oil Palm Biomass Distribution
The distribution of biomass carbon was divided into 4 classes according to the field measurements of the sample sub-plots by age class, and was estimated for the entire study area using the RF approach. The intervals of the 4 classes of oil palm biomass carbon in the PTPN VIII area are 29-36, 36-43, 43-50, and 55-63 tons C per ha (Table 3). The accuracy of the RF classification model in grouping pixels into the 4 classes differed only slightly among the combinations of training and testing data. With training and testing proportions of 75% : 25%, 70% : 30%, 60% : 40%, and 50% : 50%, the model accuracies were 76.2%, 74.7%, 76.4%, and 74.4%, respectively. Accuracy thus exceeded 75% for the 75% : 25% and 60% : 40% combinations and fell slightly below 75% for the 70% : 30% and 50% : 50% combinations, with the lowest value obtained for the smallest share of training data. A small amount of training data tends to produce weak estimates because the model is limited in learning the grouping or mapping function from a small training set. A limited amount of training data makes it difficult for the model to fully capture the characteristics and variation associated with each class. According to Danoedoro (2015), accuracy is affected by the number and distribution of samples and by the selection of representative training samples, so a large number of samples is more reliable than a small one. Furthermore, Millard and Richardson (2015) stated that RF results are very sensitive to the size of the training data set. Besides being as large as possible, the training data used in RF classification must also be randomly distributed or proportional to the actual classes in the image, and must have low spatial autocorrelation.
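An accuracy figure of this kind is typically derived from a confusion matrix of validation pixels. A minimal sketch follows, assuming that rows hold the reference (field) classes and columns hold the classes predicted by the model. The matrix values are hypothetical and do not reproduce the accuracies reported in this study.

```python
def overall_accuracy(matrix):
    """Share of validation pixels on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def producers_accuracy(matrix):
    """Per-class share of reference pixels that were classified correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


# Hypothetical confusion matrix for the 4 biomass carbon classes.
# Rows: reference class 1 to 4. Columns: predicted class 1 to 4.
confusion = [
    [40, 5, 3, 2],
    [6, 30, 8, 6],
    [4, 7, 35, 4],
    [1, 3, 4, 42],
]

print(f"overall accuracy: {overall_accuracy(confusion):.1%}")
for cls, acc in enumerate(producers_accuracy(confusion), start=1):
    print(f"class {cls}: {acc:.0%} of reference pixels correct")
```

Reporting the per-class values alongside the overall figure shows whether errors are concentrated in particular biomass classes.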
The distribution of oil palm biomass carbon obtained with the RF approach for the different proportions of training and testing data is presented in Figure 5. In general, the distribution patterns of the 4 classes of biomass carbon in the study area are similar, with slightly different areas (Table 4). In addition, the biomass carbon maps show salt-and-pepper noise, in which isolated single pixels of one class appear within larger groups of pixels of another class. This happens because the RF approach used to estimate the distribution of biomass in this study is a pixel-based classification. According to Blaschke et al. (2014), pixel-based classification does not treat homogeneous pixels as one homogeneous object and does not group these pixels into an image object. In this study, biomass carbon was dominated by class 4 (55-63 tons C/ha), followed by class 1 (29-36 tons C/ha), class 3 (43-50 tons C/ha), and class 2 (36-43 tons C/ha). The largest area of the 55-63 tons C/ha class was obtained with the 60% : 40% combination of training and testing data, followed by 75% : 25%, 70% : 30%, and 50% : 50%. This sequence follows the order of accuracy of the RF models, from the highest (60% : 40% combination) to the lowest (50% : 50% combination).
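The salt-and-pepper effect can be quantified with a simple count of isolated pixels. The sketch assumes that a pixel is isolated when its class differs from all of its neighbours in the surrounding 3 × 3 window. This definition and the small class map are illustrative choices and are not part of the study's workflow.

```python
def count_isolated_pixels(class_map):
    """Count pixels whose class differs from every neighbouring pixel."""
    rows, cols = len(class_map), len(class_map[0])
    isolated = 0
    for r in range(rows):
        for c in range(cols):
            # Collect the up to 8 neighbours inside the map bounds.
            neighbours = [
                class_map[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            if neighbours and all(n != class_map[r][c] for n in neighbours):
                isolated += 1
    return isolated


# Made-up classified map: mostly class 4 with two stray single pixels.
example_map = [
    [4, 4, 4, 4],
    [4, 1, 4, 4],
    [4, 4, 4, 3],
    [4, 4, 4, 4],
]

n_isolated = count_isolated_pixels(example_map)
print(f"{n_isolated} isolated pixels out of {len(example_map) * len(example_map[0])}")
```

Comparing this count across the maps produced with the four training and testing ratios would indicate whether the noise level changes with the amount of training data.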

Conclusion
Non-destructive measurement results showed that the highest biomass carbon stock was found in 20-year-old oil palm with an average of 59.6 tons C/ha, while the lowest biomass carbon stock was found in 17-year-old oil palm with an average of 32.9 tons C/ha. These results indicate that in the age range of 17-20 years as the age of the oil palm increases, its biomass also increases.
The reflectance of the blue, green, red, and near-infrared bands from Sentinel-2 image has a strong positive relationship with measured oil palm biomass carbon with an R² greater than 0.8. This shows that all these bands can be used to estimate and monitor biomass in the oil palm plantation area of PTPN VIII Cimulang, Bogor Regency.
Classification of biomass carbon using the RF method applied to the Sentinel-2 image provides an adequate accuracy of 76.40% with a training-to-testing data proportion of 60% : 40%.