PROGYSAT#2: Run biodivMapR on S2 reflectance data

This tutorial illustrates how to compute spectral diversity indices from Sentinel-2 images acquired over Suriname territory using biodivMapR.

There are several possibilities to run biodivMapR. This tutorial describes the standard application, which processes directly reflectance data following the workflow described in Féret & de Boissieu (2019).

This tutorial assumes that Sentinel-2 data collection download and preparation was performed as described in the previous tutorial.

Set data path and parameters for biodivMapR

# libraries used in this script
library(biodivMapR)

# define data location
Input_Image_File <- '../03_RESULTS/L2A_T21NYF_A018869_20201016T141050/L2A_T21NYF_A018869_20201016T141050_Refl'
Input_Mask_File <- '../03_RESULTS/L2A_T21NYF_A018869_20201016T141050/CloudMask_Binary'
biodivMapR_Dir <- '../03_RESULTS/biodivMapR'
dir.create(path = biodivMapR_Dir,showWarnings = FALSE,recursive = TRUE)

# set parameters
Continuum_Removal <- TRUE # relevant only when using continuous spectral data
NDVI_Thresh <- 0.8        # radiometric filter: all pixels with NDVI < NDVI_Thresh are masked
Blue_Thresh <- 500        # radiometric filter: all pixels with reflectance at B02 (Blue) > Blue_Thresh are masked
NIR_Thresh <- 1500        # radiometric filter: all pixels with reflectance at B08 (NIR) < NIR_Thresh are masked
TypePCA <- 'SPCA'         # choose between PPCA, standardized PCA, or MNF (relevant with imaging spectroscopy, not Sentinel-2)
FilterPCA <- FALSE        # additional filter to clean data from outliers after spectral transformation with PCA or SPCA. !! increases processing time
window_size <- 10         # what is the window size to compute spectral diversity metrics? here, 10 x 10 pixels window corresponds to 1 ha, which is appropriate for forested areas
nbclusters <- 50          # How many spectral species will be defined when applying k-means? ideally this is adjusted and optimized based on field data. 50 clusters is a trade-off
nbCPU <- 4                # adjust depending on your computer
MaxRAM <- 0.1             # adjust depending on your computer

Perform radiometric filtering

biodivMapR intends to compute spectral diversity metrics from vegetated surfaces, and use it as a proxy for biological diversity, following the spectral variation hypothesis. The introduction of pixels unrelated to surfaces of interest (e.g. clouds, shade and more generally non vegetated pixels) may strongly increase the spectral variations, resulting in decreasing contribution of vegetation to these variations, and limiting the capacity to differentiate among vegetation types/species during the next processes.

For this reason, a very basic radiometric filtering based on blue band (detection of clouds and atmospheric scattering), NIR band (detection of shaded vegetation) and NDVI thresholding (detection of non vegetated pixels) is performed with the function perform_radiometric_filtering.


Input_Mask_File <- perform_radiometric_filtering(Image_Path = Input_Image_File, 
                                                 Mask_Path = Input_Mask_File,
                                                 Output_Dir = biodivMapR_Dir, TypePCA = TypePCA,
                                                 NDVI_Thresh = NDVI_Thresh, Blue_Thresh = Blue_Thresh,
                                                 NIR_Thresh = NIR_Thresh)

Once pixels are masked based on the radiometric filter combined with the cloud mask, the resulting image to process corresponds to this:

Perform dimensionality reduction & select components

Once radiometric thresholding is done, a dimensionality reduction is performed, based on the computation of Principal Component Analysis (PCA) or standardized PCA (SPCA), with the function perform_PCA. The components from the resulting spectral transformation are then visually selected in order to discard those containing noise or artifacts, and selecting those showing patterns related to vegetation types.

PCA_Output <- perform_PCA(Input_Image_File = Input_Image_File, 
                          Input_Mask_File = Input_Mask_File,
                          Output_Dir = biodivMapR_Dir, 
                          TypePCA = TypePCA, 
                          FilterPCA=FilterPCA,
                          nbCPU = nbCPU, MaxRAM = MaxRAM, 
                          Continuum_Removal = Continuum_Removal)

# path of the raster resulting from dimensionality reduction
PCA_Files <- PCA_Output$PCA_Files
# number of pixels used for each partition used for k-means clustering
Pix_Per_Partition <- PCA_Output$Pix_Per_Partition
# number of partitions used for k-means clustering
nb_partitions <- PCA_Output$nb_partitions
# path for the updated mask
Input_Mask_File <- PCA_Output$MaskPath

# Select components from the PCA/SPCA/MNF raster
# Sel_PC = path of the file where selected components are stored
Sel_PC <- select_PCA_components(Input_Image_File = Input_Image_File,
                                Output_Dir = biodivMapR_Dir, 
                                PCA_Files = PCA_Files,
                                TypePCA = TypePCA, File_Open = TRUE)

Components are then selected in order to discard those showing artifacts or too much noise. Components 1, 4 and 5 are then selected, and the color composite obtained from these 3 components is displayed in the following figure:

Map Spectral species

The spectral species are then defined based on the selected components. These spectral species result from the application of a K-Means clustering of the selected components. The number of clusters may impact the resulting diversity maps. Therefore, in the perspective of operational application, the optimization of the number of clusters based on ground information is critical.

Kmeans_info <- map_spectral_species(Input_Image_File = Input_Image_File, 
                                    Input_Mask_File = PCA_Output$MaskPath,
                                    Output_Dir = biodivMapR_Dir,
                                    SpectralSpace_Output = PCA_Output,
                                    nbclusters = nbclusters, 
                                    nbCPU = nbCPU, MaxRAM = MaxRAM)

Map alpha diversity

Alpha diversity maps can then be computed from the spectral species distribution.

Several alpha diversity metrics are available (currently Richness, Shannon, Simpson).

print("MAP ALPHA DIVERSITY")
Index_Alpha <- c('Shannon')
map_alpha_div(Input_Image_File = Input_Image_File, 
              Output_Dir = biodivMapR_Dir, 
              TypePCA = TypePCA,
              window_size = window_size, 
              nbCPU = nbCPU, 
              MaxRAM = MaxRAM,
              Index_Alpha = Index_Alpha, 
              nbclusters = nbclusters)

The Shannon index obtained from this process is displayed in the following figure:

Map beta diversity

Beta diversity maps can then be computed from the spectral species distribution.

Beta diversity is based on a 2-steps (or 3-steps depending on image size) process:

The Bray-Curtis dissimilarity is computed between each pair of windows (defined by window_size) in the image
Then an ordination technique (default is Principal Coordinate Analysis, PCoA) is applied on the dissimilarity matrix to transform it into a lower dimension space (3-dimensions is default value), and a 3-bands raster image is produced from the coordinates of each window in this 3D space.
The explicit computation of the matrix becomes unmanageable for large amount of windows: in this case, the dissimilarity matrix is computed on a random subset of windows, followed by PCoA on this subset matrix. Then the dissimilarity between each window and this random subset is computed, and the coordinates of each window in this PCoA space is defined based on a weighted nearest neighbors rule.

print("MAP BETA DIVERSITY")
map_beta_div(Input_Image_File = Input_Image_File, 
             Output_Dir = biodivMapR_Dir, 
             TypePCA = TypePCA,
             window_size = window_size,
             nbCPU = nbCPU, 
             MaxRAM = MaxRAM,
             nbclusters = nbclusters)

The beta diversity map obtained from this process is displayed in the following figure: