GEOG*3420 Remote Sensing of the Environment (W19)

Lab Assignment 4

Introduction

This lab exercise introduces students to unsupervised image classification using the k-Means and modified k-Means classification algorithms. Students will also learn about the use of band ratios to enhance classification accuracy by reducing scene variation in illumination due to topography.

Readings and Resources

The following materials, combined with your textbook, can be used as background materials and to help in answering the assignment questions.

Before you begin

IMPORTANT INFORMATION: You will need to download a fresh copy of the latest version of the WhiteboxTools library before you begin this assignment. Changes have been made to the library since you completed Lab 2, and using an older version will likely produce incorrect results. In addition, you will need to download the data associated with this lab assignment from the GEOG*3420 CourseLink site. These data, as usual, are quite large and you will need to consider data storage solutions (e.g. a dedicated USB memory stick for the course).

What you need to hand in

You will hand in a printed report summarizing your answers to each of the questions in the following exercise, along with the necessary colour images. Note that you will need to have paid your lab fee to have printing privileges in the Hutt building computer labs.

Part 1: Normalized Difference Indices (Band Ratios)

The process of dividing the brightness values in one band of a multispectral dataset by those of a second band is known as band ratioing. Ratioing is one of the most common image transformations in remote sensing because:

  1. It can be used to emphasize certain aspects of the shape of the spectral signatures of different land covers; and

  2. It can be used to de-emphasize the effects of variable illumination within a scene.

The first of these characteristics makes band ratioing particularly useful for creating image products that are derived from the original multispectral dataset and are suited to identifying specific land-cover types. This is why so many remote-sensing-based vegetation indices are essentially ratios of two bands. However, it is the second property of band ratios listed above that we are most interested in for this lab.

The purpose of an image classification (see Part 2) is to identify meaningful land-cover classes within a scene. Differentiating between deciduous and coniferous forest classes is meaningful. It is not, however, particularly meaningful to differentiate between coniferous forest situated on a well-illuminated, sun-facing slope and coniferous forest on the shaded, opposite-facing slope. Nonetheless, because they are illuminated differently, these two areas will have apparently different radiometric properties in a multispectral data set. Band ratio images can be used to lessen the effect of uneven illumination caused by varying topography, based on the assumption that the ratio of two bands for areas of equivalent land-cover type is the same regardless of the direction the slope faces. Ratioing can help to reduce other causes of varying illumination as well, including the shadows cast by clouds.
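A toy calculation shows why this assumption lets ratios suppress topographic shading. It assumes that illumination acts as a single multiplicative factor applied equally to both bands of a pixel (additive effects such as atmospheric path radiance are ignored), and the reflectance values are invented for illustration. Because the factor appears in both the numerator and the denominator of a normalized difference, it cancels.

```python
def observed_nd(first, second, illumination):
    """Normalized difference of two bands after both are scaled by the same illumination factor."""
    a, b = first * illumination, second * illumination
    return (a - b) / (a + b)


# Hypothetical NIR and red reflectances for one coniferous-forest pixel.
NIR, RED = 0.30, 0.04

# Same land cover, two illumination levels (assumed purely multiplicative).
for label, illumination in [("sun-facing slope", 1.0), ("shaded slope", 0.4)]:
    print(f"{label}: NIR = {NIR * illumination:.3f}, "
          f"NDVI = {observed_nd(NIR, RED, illumination):.3f}")
```

The raw NIR brightness drops sharply on the shaded slope, while the NDVI is about 0.765 on both slopes. Real shadows also contain diffuse and additive light, so the cancellation is only approximate in practice.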

After you have downloaded the data associated with this lab assignment from the CourseLink page, decompress (unzip) the data into a working directory that you have created to dedicate to this assignment. Open the contents of this folder and examine the files contained within. These data are the 30 m resolution bands, in GeoTIFF image format, of a subsection of a Landsat 8 scene acquired June 21, 2016. These data should contain six bands (i.e. bands 2 through 7) of image data for an area of Southern Ontario between Kitchener-Waterloo, Cambridge, and Guelph. If you are unfamiliar with Southern Ontario, you may also want to explore the area using Google Maps to familiarize yourself with the type of terrain and land-use/land-covers in the area of the image.

We will calculate a number of common normalized difference indices, a type of standardized band ratio, to serve as inputs for a later image classification performed in Part 2 of the lab assignment. Calculate the following image-derived products:

  1. The normalized difference vegetation index: NDVI = (NIR - RED) / (NIR + RED)

  2. The normalized difference water index version 1: NDWI1 = (NIR - SWIR1) / (NIR + SWIR1)

  3. The normalized difference water index version 2: NDWI2 = (GREEN - NIR) / (GREEN + NIR)

  4. The normalized burn ratio: NBR = (NIR - SWIR2) / (NIR + SWIR2)

  5. The normalized blue-red ratio: NBRR = (BLUE - RED) / (BLUE + RED)

For Landsat 8, the spectral designations in the above equations relate to bands in the following way:

BLUE = Band 2

GREEN = Band 3

RED = Band 4

Near infrared (NIR) = Band 5

Shortwave infrared 1 (SWIR1) = Band 6

Shortwave infrared 2 (SWIR2) = Band 7
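The band assignments above can be combined with the five formulas in a short NumPy sketch, which may be handy for checking an index value by hand. It assumes the bands have already been read into arrays keyed by Landsat 8 band number (reading the GeoTIFFs is omitted), and it does not reproduce the tail-clipping or correction options of the WhiteboxTools tool. Pixels where the two bands sum to zero are set to NaN.

```python
import numpy as np

# Landsat 8 band number for each spectral designation used in the formulas above.
BANDS = {"BLUE": 2, "GREEN": 3, "RED": 4, "NIR": 5, "SWIR1": 6, "SWIR2": 7}

# Each index is (first, second) in ND = (first - second) / (first + second).
INDICES = {
    "NDVI": ("NIR", "RED"),
    "NDWI1": ("NIR", "SWIR1"),
    "NDWI2": ("GREEN", "NIR"),
    "NBR": ("NIR", "SWIR2"),
    "NBRR": ("BLUE", "RED"),
}


def normalized_difference(first, second):
    """Return (first - second) / (first + second), with NaN where the sum is zero."""
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    total = first + second
    with np.errstate(divide="ignore", invalid="ignore"):
        nd = (first - second) / total
    return np.where(total == 0, np.nan, nd)


def compute_indices(band_arrays):
    """band_arrays maps a Landsat 8 band number to a 2-D array of brightness values."""
    return {
        name: normalized_difference(band_arrays[BANDS[a]], band_arrays[BANDS[b]])
        for name, (a, b) in INDICES.items()
    }
```

Every index produced this way falls in the range -1 to 1 for non-negative inputs, which is one reason normalized differences are convenient classification inputs.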

Write a script that uses WhiteboxTools' **NormalizedDifferenceIndex** tool to create each of the above normalized difference indices. Be sure to clip the distribution tails by 0.5% and use a correction value of 0.0.
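One possible shape for such a script is sketched below. It assumes the `whitebox_tools.py` wrapper that ships with WhiteboxTools is importable and that the normalized-difference tool is exposed as `normalized_difference_index(input1, input2, output, clip, correction)`; the band file names are hypothetical and will need to match the files in the lab data. The `run_indices` helper takes the WhiteboxTools object as an argument so that its logic can be checked without running the tool.

```python
import os

# Hypothetical file names for the Landsat 8 band GeoTIFFs; rename to match the lab data.
BAND_FILES = {
    "BLUE": "band2.tif", "GREEN": "band3.tif", "RED": "band4.tif",
    "NIR": "band5.tif", "SWIR1": "band6.tif", "SWIR2": "band7.tif",
}

# Index name -> (input1, input2), i.e. (input1 - input2) / (input1 + input2).
INDICES = {
    "NDVI": ("NIR", "RED"),
    "NDWI1": ("NIR", "SWIR1"),
    "NDWI2": ("GREEN", "NIR"),
    "NBR": ("NIR", "SWIR2"),
    "NBRR": ("BLUE", "RED"),
}


def run_indices(wbt, data_dir, clip=0.5, correction=0.0):
    """Call the normalized-difference tool once per index and return the output paths."""
    outputs = []
    for name, (first, second) in INDICES.items():
        output = os.path.join(data_dir, name.lower() + ".tif")
        wbt.normalized_difference_index(
            input1=os.path.join(data_dir, BAND_FILES[first]),
            input2=os.path.join(data_dir, BAND_FILES[second]),
            output=output,
            clip=clip,
            correction=correction,
        )
        outputs.append(output)
    return outputs


if __name__ == "__main__":
    try:
        from whitebox_tools import WhiteboxTools  # assumed wrapper module name
    except ImportError:
        print("whitebox_tools.py is not on the Python path; nothing was run.")
    else:
        # Assumes the script is run from the folder holding the band GeoTIFFs.
        run_indices(WhiteboxTools(), os.getcwd())
```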

1.1. Include screenshots of each of the five indices, labeling each carefully and indicating its minimum and maximum values. (15 marks)

1.2. Include a copy of your Python script used to create the five normalized difference indices. (2 marks)

1.3. Compare each of the five indices to the natural-colour composite image. To what extent was the use of the band-ratioing technique able to lessen the apparent effects of cloud shadows in the image? (2 marks)

Part 2: Multi-spectral Image Classification

Multi-spectral image classification involves two distinct activities. The first is the recognition of categories of real-world features in the landscape, e.g. 'deciduous forests'. The second is the labeling of pixels within an image data set. With supervised classification methods, the user first identifies real-world land-covers, examines the images to find training areas that typify the 'spectral signatures' of these features, and then uses the signatures to label all of the remaining pixels in the scene. Unsupervised classification techniques rely on statistical clustering methods (e.g. k-Means clustering) to find groups, or clusters, of pixels with similar spectral properties. After this initial clustering phase, the user then has the task of relating the statistically defined spectral classes to real-world land-covers.

Both approaches to image classification require a substantial amount of human effort and judgement to identify land-covers within the image scene. The difference is that with supervised classification techniques this human component occurs early in the process, whereas unsupervised classification methods require effort after the automated classification step to determine the physical meaning of each statistically defined cluster. Generally, supervised classification techniques are preferred because the image analyst has greater control over the classification (e.g. I may know that I want to classify water, urban, forest, and agriculture), whereas the analyst has very little control over the clusters that are created by unsupervised methods. However, unsupervised classification techniques are useful as an initial exploratory tool and when the analyst is unfamiliar with the data or the landscape being analyzed.

k-Means Clustering For Unsupervised Image Classification

Unsupervised classification is described in the course text:

Readings: Mather and Koch (2011), Chapter 8 Classification, Computer Processing of Remotely-Sensed Images, pp 233-240.

It is strongly recommended that you do this reading before continuing.

Use WhiteboxTools' KMeansClustering tool to perform a k-Means unsupervised classification on the five normalized difference indices (NDIs) calculated in Part 1. Mather and Koch (2011) describe in detail how this classification technique works in Chapter 8. Enter the five NDIs as the input images. Call the output image kmeans_class.tif and the output HTML report kmeans_report.html. Set the number of clusters (--classes) to 12 and the maximum number of iterations (--max_iterations) to 20. The percent class change threshold (--class_change), which determines when the operation converges, can be set to 1%. Initialize the cluster locations on the diagonal (--initialize="diagonal"). This will place the initial cluster locations randomly along the multi-dimensional diagonal line. Importantly, this means that each run of this tool will produce a unique output. Set the minimum class size (--min_class_size) to 1000.
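The iterative procedure can be illustrated with a small NumPy sketch. It is a simplified version of standard k-means under the settings described above (random initial centroids along the diagonal between the per-band minimum and maximum, a maximum iteration count, and a percent-change stopping rule); it is not WhiteboxTools' implementation, it ignores the minimum class size, and because it compares every pixel with every centroid at once it is only suitable for small arrays.

```python
import numpy as np


def kmeans_diagonal(pixels, classes=12, max_iterations=20, class_change=1.0, seed=None):
    """Cluster an (n_pixels, n_bands) array; return labels, centroids, % changed per iteration."""
    pixels = np.asarray(pixels, dtype=float)
    rng = np.random.default_rng(seed)
    lo, hi = pixels.min(axis=0), pixels.max(axis=0)
    # Initial centroids: random positions along the diagonal from the band minima to maxima.
    t = np.sort(rng.random(classes))
    centroids = lo + t[:, None] * (hi - lo)
    labels = np.full(len(pixels), -1)
    changed_history = []
    for _ in range(max_iterations):
        # Assign each pixel to its nearest centroid (Euclidean distance in spectral space).
        dists = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        pct_changed = 100.0 * np.mean(new_labels != labels)
        labels = new_labels
        changed_history.append(pct_changed)
        # Move each non-empty cluster's centroid to the mean of its member pixels.
        for k in range(classes):
            members = pixels[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
        # Converged once fewer than class_change percent of pixels switched clusters.
        if pct_changed < class_change:
            break
    return labels, centroids, changed_history
```

A stack of band images with shape (bands, rows, cols) could be flattened to the expected input with `stack.reshape(stack.shape[0], -1).T`. The returned `changed_history` is the kind of series a convergence plot summarizes, and re-running with a different `seed` gives a different result, echoing the point above that each run is unique.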

When the tool is complete, several outputs will be created. In addition to the output class image, there will be a classification report. This report includes tables for: 1) the number of pixels in each spectral class, 2) the cluster centroid vectors (i.e. the center point of each class in spectral space), and 3) the Euclidean distance in spectral space between each cluster centroid and the other clusters. This last table is useful for identifying classes that are similar and are candidates for further clustering, i.e. reclassification. Lastly, the report includes a Convergence Plot, which shows the number of pixels in the scene that were re-assigned to different clusters with each iteration of the k-means clustering operation.
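A centroid distance table of this kind is the pairwise Euclidean distance between centroid vectors. The NumPy sketch below recomputes such a table from a list of centroids (for example, values copied from the report) and lists the most similar pairs, which are the first candidates for merging. Classes are numbered from 0 here, which may differ from the numbering used in the report.

```python
import numpy as np


def centroid_distance_table(centroids):
    """Pairwise Euclidean distances between cluster centroids (n_classes x n_classes)."""
    c = np.asarray(centroids, dtype=float)
    diff = c[:, None, :] - c[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def closest_pairs(centroids, n=3):
    """Return the n most similar (class_i, class_j, distance) pairs, with i < j."""
    d = centroid_distance_table(centroids)
    i, j = np.triu_indices(len(d), k=1)  # each unordered pair once
    order = np.argsort(d[i, j])[:n]
    return [(int(i[k]), int(j[k]), float(d[i[k], j[k]])) for k in order]
```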

2.1. Include the classification report in your final hand-in. (1 mark)

2.2. Based on the Cluster Centroid Distance Analysis table, are there any groups of the 12 clusters that can potentially be joined due to similarity? (2 marks)

2.3. Did the clustering procedure require fewer than the maximum number of iterations to reach convergence, i.e. to reach a point of stability where fewer than the required 1% of pixels were re-assigned to different clusters? How many iterations were required before the clusters became stable? (1 mark)

The classification image that results from running the tool is far from a work of art. In fact, it’s likely downright ugly (it does have a great deal of inner beauty though!). This is because each of the spectral classes that have been identified has been assigned a random colour in the classification image, and because there are quite a few classes in this image, the image appears rather chaotic.

2.4. Relate each of the spectral classes (i.e. the statistically defined clusters) identified by the unsupervised classification technique to information classes and present this information in a table (12 marks). Information classes are categories of ground cover. For example, information classes may include categories such as: shallow water, cloud, cloud shadow, agricultural, urban-commercial, urban-residential, forest, forest in-shadow, etc. These classes can be as specific as needed.

Note that relating spectral classes to information classes can be a challenging and time-consuming task, so be sure to give yourself enough time. Some spectral classes will correspond to obvious information classes, but others will not, and some may be very difficult to figure out. Here are some tips:

  1. Overlay the true-colour composite on the classification image. Then you can zoom into various features and toggle between the two images.

  2. Several classes are likely associated with mixed pixels, i.e. pixels that contain a mixture of ground covers. For example, given how narrow roads are relative to the spatial resolution of the imagery, there are likely many pixels that are a mixture of road and road-side vegetation. As such, there may be spectral classes that are associated with the ‘pure’ spectral pattern of certain land-covers, and other related spectral classes that are associated with mixtures. The Cluster Centroid Distance Analysis table in the classification report can help you to discern this.

  3. Some spectral classes may be associated with more than one information class, and some information classes may overlap with more than one spectral class.

  4. Some information classes are more spectrally variable than others. For example, residential areas contain a wide mixture of land-covers compared to the more homogeneous spectral characteristics that you would expect to find in a forested area. A single 'agricultural' spectral class may be impossible to isolate in areas where there is a great deal of variation in crop type.

  5. Use the Cluster Centroid Distance Analysis table to help you identify similar spectral classes; they are likely to have similar information classes. These are also candidate classes for further merging/reclassing. For example, if there are spectral classes associated with three different crop types, these could be classed into one 'agricultural' class.
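Once spectral classes have been related to information classes, merging them amounts to a lookup-table reclassification. In this NumPy sketch the mapping is a hypothetical placeholder for your own interpretation; spectral classes missing from the mapping receive a no-data value of 0, and class values are assumed to be non-negative integers.

```python
import numpy as np

# Hypothetical mapping: spectral class value -> information class code.
# Several spectral classes (e.g. three crop types) may share one information class.
SPECTRAL_TO_INFO = {1: 1, 2: 1, 3: 2, 4: 3}


def reclassify(class_image, mapping, nodata=0):
    """Replace each spectral class value with its information class via a lookup table."""
    class_image = np.asarray(class_image)
    size = max(int(class_image.max()), max(mapping)) + 1
    lut = np.full(size, nodata, dtype=class_image.dtype)
    for spectral, info in mapping.items():
        lut[spectral] = info
    # Fancy indexing applies the lookup table to every pixel at once.
    return lut[class_image]
```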

Now perform a second k-means clustering operation, using the raw band images and the same input parameters. When it has completed, display the classification image and compare it to the classification that was based on the normalized difference indices.
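Both clustering runs can be scripted with identical parameter settings so that only the inputs differ. This sketch assumes the `whitebox_tools.py` wrapper is importable, that the KMeansClustering tool is exposed as `k_means_clustering` with keyword arguments matching the command-line flags given earlier, and that multiple inputs are passed as one semicolon-separated string; the index and band file names, and the output names for the second run, are hypothetical.

```python
import os

# Hypothetical input file names; rename to match your own outputs and the lab data.
NDI_FILES = ["ndvi.tif", "ndwi1.tif", "ndwi2.tif", "nbr.tif", "nbrr.tif"]
BAND_FILES = ["band2.tif", "band3.tif", "band4.tif", "band5.tif", "band6.tif", "band7.tif"]


def run_kmeans(wbt, inputs, output, out_html):
    """Run k-means clustering with the parameter settings used in this lab."""
    wbt.k_means_clustering(
        inputs=";".join(inputs),
        output=output,
        out_html=out_html,
        classes=12,
        max_iterations=20,
        class_change=1.0,  # percent of pixels changing cluster
        initialize="diagonal",
        min_class_size=1000,
    )


if __name__ == "__main__":
    try:
        from whitebox_tools import WhiteboxTools  # assumed wrapper module name
    except ImportError:
        print("whitebox_tools.py is not on the Python path; nothing was run.")
    else:
        wbt = WhiteboxTools()
        wbt.set_working_dir(os.getcwd())  # relative file names resolve here
        run_kmeans(wbt, NDI_FILES, "kmeans_class.tif", "kmeans_report.html")
        run_kmeans(wbt, BAND_FILES, "kmeans_bands_class.tif", "kmeans_bands_report.html")
```

Keeping the parameters inside one helper makes it harder to accidentally change a setting between the two runs, so differences between the outputs can be attributed to the inputs (and to the random initialization).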

2.5. Examining the areas of cloud shadow in the two classification outputs, to what extent did the use of band ratios as inputs to the original clustering operation reduce the impact of variable illumination in these cloud-shadowed sites? (2 marks)