Part 2: Multi-spectral Image Classification

Multi-spectral image classification involves two distinct activities. The first is the recognition of categories of real-world features in the landscape, e.g. 'deciduous forest'. The second, common to all multi-spectral classifications, is the labelling of pixels within an image data set. With supervised classification methods, the user first identifies real-world land-covers, examines the images to find training areas that typify the 'spectral signatures' of these features, and then uses the signatures to label all of the remaining pixels in the scene. Unsupervised classification techniques instead rely on statistical clustering methods (e.g. k-Means clustering) to find groups, or clusters, of pixels that are similar with respect to their spectral properties. After this initial clustering phase, the user then has the task of relating the statistically defined spectral classes to real-world land-covers.

Both approaches to image classification require a substantial amount of human effort and judgement to identify land-covers within the image scene. The difference is that with supervised classification techniques this human component occurs early in the process, while unsupervised classification methods require effort after the automated clustering step, in determining the physical meaning of each statistically defined cluster. Generally, supervised classification techniques are preferred because the image analyst has greater control over the classification (e.g. the analyst may know in advance that the classes of interest are water, urban, forest, and agriculture), whereas the analyst has very little control over the clusters created by unsupervised methods. However, unsupervised classification techniques are useful as an initial exploratory tool, and when the analyst is unfamiliar with the data or the landscape being analyzed.

k-Means Clustering For Unsupervised Image Classification

Unsupervised classification is described in the course text:

Readings: Mather and Koch (2011), Chapter 8: Classification, Computer Processing of Remotely-Sensed Images, pp. 233-240.

It is strongly recommended that you do this reading before continuing.

Use WhiteboxTools' KMeansClustering tool to perform a k-Means unsupervised classification on the five normalized difference indices (NDIs) calculated in Part 1. Mather and Koch (2011) describe how this classification technique works in detail in Chapter 8. Enter the five NDIs as the input images. Call the output image kmeans_class.tif and the output HTML report kmeans_report.html. Set the number of clusters (--classes) to 12 and the maximum number of iterations (--max_iterations) to 20. The percent class change threshold (--class_change), which determines when the operation converges, can be set to 1%. Initialize the cluster locations on the diagonal (--initialize="diagonal"); this will place the initial cluster locations randomly along the multi-dimensional diagonal line. Importantly, this means that each run of the tool will produce a unique output. The minimum class size (--min_class_size) should be 1000.
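To make the procedure concrete, the k-Means algorithm described by Mather and Koch (2011) can be sketched in plain Python. This is a hypothetical mini-example on made-up two-band data, not the tool's actual implementation; note also that the tool places its initial centroids randomly along the diagonal, whereas this sketch spaces them evenly for reproducibility.

```python
# Minimal sketch of k-Means clustering with "diagonal" initialization and a
# percent-class-change convergence test. All data values here are invented.
import math

def kmeans(pixels, k, max_iterations, class_change_pct):
    n_bands = len(pixels[0])
    # "Diagonal" initialization: centroids placed along the line running from
    # the per-band minimum to the per-band maximum (evenly spaced here).
    lo = [min(p[b] for p in pixels) for b in range(n_bands)]
    hi = [max(p[b] for p in pixels) for b in range(n_bands)]
    centroids = [
        [lo[b] + (hi[b] - lo[b]) * (i + 0.5) / k for b in range(n_bands)]
        for i in range(k)
    ]
    labels = [-1] * len(pixels)
    for _ in range(max_iterations):
        changed = 0
        # Assignment step: each pixel joins its nearest centroid
        # (Euclidean distance in spectral space).
        for i, p in enumerate(pixels):
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            if nearest != labels[i]:
                changed += 1
                labels[i] = nearest
        # Update step: move each centroid to the mean of its member pixels.
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Converged once fewer than class_change_pct % of pixels switch class.
        if 100.0 * changed / len(pixels) < class_change_pct:
            break
    return labels, centroids

# Four fake "pixels" with two fake band values each (e.g. two NDIs).
pixels = [(0.10, 0.20), (0.12, 0.19), (0.80, 0.90), (0.82, 0.88)]
labels, centroids = kmeans(pixels, k=2, max_iterations=20, class_change_pct=1.0)
print(labels)  # the two low-value pixels and two high-value pixels group apart
```

In the real tool the same loop runs over millions of pixels in five-dimensional NDI space, which is why the class-change threshold matters: iterating to perfect stability would be wasteful when under 1% of pixels are still switching.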

When the tool is complete, several outputs will be created. In addition to the output class image, there will be a classification report. This report includes tables for: 1) the number of pixels in each spectral class, 2) the cluster centroid vectors (i.e. the centre point of each class in spectral space), and 3) the Euclidean distance in spectral space between each cluster centroid and every other cluster centroid. This last table provides useful information for identifying classes that are similar and are therefore candidates for further clustering, i.e. reclassification. Lastly, the report includes a Convergence Plot, which shows the number of pixels in the scene that were re-assigned to different clusters at each iteration of the k-Means clustering operation.
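The third table is simply pairwise Euclidean distances between centroid vectors. A small sketch, using invented centroid vectors for three clusters in a five-band (NDI) space, shows how such a table is derived and read:

```python
# Hypothetical centroid distance table: the vectors below are made up for
# illustration and are not output from any real classification.
import math

centroids = {
    1: [0.10, -0.25, 0.40, 0.05, -0.10],
    2: [0.12, -0.22, 0.38, 0.06, -0.09],
    3: [-0.60, 0.55, -0.30, 0.20, 0.45],
}

# Euclidean distance between every pair of cluster centroids.
dist_table = {}
for a in centroids:
    for b in centroids:
        if a < b:
            dist_table[(a, b)] = math.dist(centroids[a], centroids[b])
            print(f"cluster {a} - cluster {b}: {dist_table[(a, b)]:.3f}")
```

Here clusters 1 and 2 sit close together in spectral space while cluster 3 is distant, so clusters 1 and 2 would be the candidates for merging into one class.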

2.1. Include the classification report in your final hand-in. (1 mark)

2.2. Based on the Cluster Centroid Distance Analysis table, are there any groups of the 12 clusters that can potentially be joined due to similarity? (2 marks)

2.3. Did the clustering procedure require fewer than the maximum number of iterations to reach convergence, i.e. a point of stability where less than the specified 1% of pixels were re-assigned to different clusters? That is, how many iterations were required before the clusters became stable? (1 mark)

The classification image that results from running the tool is far from a work of art. In fact, it’s likely downright ugly (it does have a great deal of inner beauty though!). This is because each of the spectral classes that have been identified has been assigned a random colour in the classification image, and because there are likely quite a few classes in this image, it appears rather chaotic.

2.4. Relate each of the spectral classes (i.e. the statistically defined clusters) identified by the unsupervised classification technique to an information class and present this information in a table (12 marks). Information classes are categories of ground cover. For example, information classes may include categories such as: shallow water, cloud, cloud shadow, agricultural, urban-commercial, urban-residential, forest, forest in-shadow, etc. These classes can be as specific as needed.

Note, relating spectral classes to information classes can be a challenging and time-consuming task, so be sure to give yourself enough time. Some spectral classes will correspond to obvious information classes, but others will not, and some may be quite difficult to figure out. Here are some tips:

  1. Overlay the true-colour composite on the classification image. Then you can zoom into various features and toggle between the two images.

  2. Several classes are likely associated with mixed pixels, i.e. pixels that contain a mixture of ground covers. For example, given how narrow roads are relative to the spatial resolution of the imagery, there are likely a lot of pixels that are a mixture of road and road-side vegetation. As such, there may be spectral classes that are associated with the ‘pure’ spectral pattern of certain land-covers, and other related spectral classes that are associated with mixtures. The dendrogram can help you to discern this.

  3. Some spectral classes may be associated with more than one information class, and some information classes may overlap with more than one spectral class.

  4. Some information classes are more spectrally variable than others. For example, residential areas contain a wide mixture of land-covers compared to the more homogeneous spectral characteristics that you would expect to find in a forested area. A single 'agricultural' class may be impossible to define in areas where there is a great deal of variation in crop type.

  5. Use the Cluster Centroid Distance Analysis table to help you identify similar spectral classes; they are likely to have similar information classes. These are also candidate classes for further merging/reclassing. For example, if there are spectral classes associated with three different crop types, these could be classed into one 'agricultural' class.
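The merging step suggested in tip 5 amounts to a simple lookup-table reclass from spectral class numbers to information class labels. A minimal sketch, in which every class number and label is invented for illustration:

```python
# Hypothetical reclass table: merge 12 statistically defined spectral classes
# into a smaller set of information classes. All assignments here are made up.
spectral_to_info = {
    1: "shallow water", 2: "shallow water",
    3: "forest", 4: "forest in-shadow",
    5: "agricultural", 6: "agricultural", 7: "agricultural",
    8: "urban-residential", 9: "urban-commercial",
    10: "cloud", 11: "cloud shadow", 12: "mixed road/vegetation",
}

# A tiny fake classified image, flattened to a row of pixel class values.
classified = [1, 3, 5, 6, 7, 9, 12]
info = [spectral_to_info[c] for c in classified]
print(info)
```

The same idea applied to the real image (e.g. with a raster reclass tool) collapses the three invented crop-type clusters (5, 6, 7) into one 'agricultural' information class.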

Now perform a second k-Means clustering operation, this time using the raw band images and the same input parameters. When it has completed, display the classification image and compare it to the classification that was based on the normalized difference indices.
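When comparing the two outputs, keep in mind that k-Means class numbers are arbitrary: class 3 in one run has no relation to class 3 in the other, so the runs must be compared by spatial overlap rather than by matching numbers. One way to do this is a pixel-by-pixel cross-tabulation; a sketch on invented label values:

```python
# Hypothetical cross-tabulation of two classified images: count how pixels
# labelled by the NDI-based run line up with the raw-band run. The label
# values below are invented, not real classification output.
from collections import Counter

ndi_classes = [1, 1, 2, 2, 3, 3, 3, 1]   # fake NDI-based labels
raw_classes = [4, 4, 2, 5, 6, 6, 5, 4]   # fake raw-band labels

crosstab = Counter(zip(ndi_classes, raw_classes))
for (a, b), n in sorted(crosstab.items()):
    print(f"NDI class {a} -> raw-band class {b}: {n} pixels")
```

Where an NDI class maps cleanly onto one raw-band class the two runs agree; where it splits across several (as fake NDI class 3 does above), the raw-band run is dividing what the NDI run treated as one class, which is exactly the pattern to look for in the cloud-shadow areas.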

2.5. Examine the areas of cloud shadow and compare the two classification outputs: to what extent did the use of band ratios as inputs in the original clustering operation reduce the impact of variable illumination in these cloud-obscured sites? (2 marks)