## GEOG*3480

### Statistical Analysis of Spatial Data

John Lindsay
Fall 2015

• Jensen and Jensen Chapter 8

### Topics

• Over the next two lectures, we'll discuss:
• Descriptive Statistics
• Descriptive Spatial Statistics
• Spatial Autocorrelation
• Point Pattern Analysis
• Nearest-Neighbour Analysis
• Directional Analysis

### Descriptive Statistics

• Measures of central tendency
• Mode, median, and mean ($$\overset{-}x$$)

• $$\overset{-}x = \frac {\underset{i=1}{\overset{N}{\Sigma}} x_i} {N}$$

### Descriptive Statistics

• Measures of dispersion
• Variance ($$s^2$$)
• Standard deviation ($$s$$)

• $$s^2 = \frac {\underset{i=1}{\overset{N}{\Sigma}} (x_i - \overset{-}x)^2} {N - 1}$$

• $$s = \sqrt {\frac {\underset{i=1}{\overset{N}{\Sigma}} (x_i - \overset{-}x)^2} {N - 1}}$$
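
The mean, variance, and standard deviation formulas can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names and the `elevations` sample are hypothetical and not taken from the textbook.

```python
import math

def sample_mean(values):
    """Arithmetic mean: the sum of all x_i divided by N."""
    return sum(values) / len(values)

def sample_variance(values):
    """Sample variance s^2, using the N - 1 denominator."""
    mean = sample_mean(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)

def sample_std(values):
    """Sample standard deviation s: the square root of the variance."""
    return math.sqrt(sample_variance(values))

# Hypothetical sample, for illustration only
elevations = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_mean(elevations))      # mean
print(sample_variance(elevations))  # s^2
print(sample_std(elevations))       # s
```

The standard library's `statistics.mean`, `statistics.variance`, and `statistics.stdev` compute the same quantities.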

### Descriptive Statistics

• Skewness
• Measure of the asymmetry of a distribution

• Kurtosis
• Measure of the peakedness of a distribution
(From: Jensen & Jensen 2013)

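
The slides give no formula for skewness or kurtosis, so the sketch below assumes the common moment-based definitions: the third and fourth central moments, each standardized by a power of the second. Other texts and software apply sample-size corrections or report "excess" kurtosis (this value minus 3), so treat the function names and definitions here as one possible choice.

```python
def central_moment(values, k):
    """k-th central moment: the mean of (x_i - mean)^k."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    """Moment-based skewness (assumed definition): m3 / m2^1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Moment-based kurtosis (assumed definition): m4 / m2^2.
    A normal distribution has a value of 3 under this definition."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical data
right_skewed = [1.0, 1.0, 1.0, 2.0, 10.0]  # hypothetical data with a long right tail

print(skewness(symmetric))     # zero for a symmetric sample
print(skewness(right_skewed))  # positive for a long right tail
print(kurtosis(symmetric))
```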

### Descriptive Spatial Statistics

• Mean Centre
• Measure of central tendency that can be used to determine the centre of a distribution plotted in geographic coordinates.

• Standard Distance
• Measure of dispersion of geographically distributed data.

(From: Jensen & Jensen 2013)
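
A minimal sketch of both measures follows. The mean centre is simply the mean of the x coordinates paired with the mean of the y coordinates. For the standard distance, the code assumes the form that divides the summed squared distances from the mean centre by N; some texts use a slightly different denominator. The sample coordinates are hypothetical.

```python
import math

def mean_centre(points):
    """Mean centre: (mean of x coordinates, mean of y coordinates)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def standard_distance(points):
    """Standard distance, assuming division by N:
    sqrt( sum of squared distances to the mean centre / N )."""
    cx, cy = mean_centre(points)
    total = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return math.sqrt(total / len(points))

# Hypothetical point locations in projected coordinates
sites = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
print(mean_centre(sites))
print(standard_distance(sites))
```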

### Tobler’s first law

• The first law of geography: “everything is related to everything else, but near things are more related than distant things.” (Tobler, 1970)

• This simple statement forms the basis for a great deal of geographical analysis and is the concept underlying the idea of spatial autocorrelation.

• Synonymous with the concept of spatial dependence in geostatistics

### Spatial autocorrelation

• Correlation of a variable with itself through space.
• Correlation versus spatial autocorrelation

• Actually bad news and good news
• Bad for statistical reasons
• Good because, “if geography is worth studying at all, it must be because phenomena do not vary randomly through space” (O'Sullivan and Unwin, 2003, pg. 28)
• Essential for spatial modelling through Interpolation

### Spatial autocorrelation

• Three possibilities:
• Clustered (positive autocorrelation): nearby locations are likely to be similar to one another.

• Random (autocorrelation near zero): no spatial effect is discernible, and observations seem to vary randomly through space

• Dispersed (negative autocorrelation): observations from nearby locations are likely to be different from one another.

### Moran's $$I$$

• Moran's $$I$$ measures the interdependence in spatial distributions.
• Used with interval/ratio level data
• Used to detect spatial trends
• -1 ≤ $$I$$ ≤ 1
• $$I$$ = -1 = dispersed
• $$I$$ = 0 = random
• $$I$$ = +1 = clustered

### Moran's $$I$$

$$I = \frac {N}{\underset{i=1}{\overset{N} \Sigma} \underset{j=1}{\overset{N} \Sigma} w_{ij}} \frac {\underset{i=1}{\overset{N} \Sigma} \underset{j=1}{\overset{N} \Sigma} w_{ij} (x_i - \overset{-} x) (x_j - \overset{-} x)}{\underset{i=1}{\overset{N} \Sigma} {(x_i - \overset{-} x)^2}}$$

Where $$\overset{-} x$$ is the mean of variable $$x$$; $$x_i$$ is the value at $$i$$; $$j$$ is a neighbour of $$i$$; $$w_{ij}$$ is the weight between neighbours $$i$$ and $$j$$.
(From: Jensen & Jensen 2013)
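
The Moran's $$I$$ formula above translates almost term by term into code. The sketch below assumes the weights are supplied as a full N × N matrix (a list of lists) with zeros on the diagonal; the four-location chain and its binary adjacency weights are hypothetical.

```python
def morans_i(values, weights):
    """Moran's I for values x_i and an N x N spatial weights matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]                # x_i minus the mean
    w_sum = sum(sum(row) for row in weights)        # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]     # weighted cross-products
                for i in range(n) for j in range(n))
    ss = sum(d * d for d in dev)                    # sum of squared deviations
    return (n / w_sum) * (cross / ss)

# Four locations in a row; each is a neighbour of the adjacent ones (assumed weights)
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

print(morans_i([1.0, 2.0, 3.0, 4.0], W))  # smooth trend: positive I
print(morans_i([1.0, 4.0, 1.0, 4.0], W))  # alternating values: negative I
```

The choice of weights (adjacency, inverse distance, and so on) is part of the analysis, and different weights give different values of $$I$$ for the same data.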

### Point Pattern Analysis

• Mapped point data often exhibit distinct patterning.

• Patterns result from the spatial component of a control on the phenomenon.

• Understanding the pattern can help with understanding the controlling forces on the phenomenon.

### Point Pattern Analysis

• The patterns that we're interested in with Point Pattern Analysis (PPA) result from the locations of individual points and not from their attributes, for which spatial autocorrelation is more relevant.

• Quadrat Analysis and Nearest-Neighbour Analysis are the two most common methods for PPA.

• A quadrat is a user-defined geographic area, usually a square or rectangle, used to measure the distribution of a spatial phenomenon.

• Quadrat analysis can be used to test whether the phenomenon is uniformly distributed.

• The Chi Square test is used with quadrats.
(From: Jensen & Jensen 2013)

• The value of Chi Square is compared with a table of critical values to determine whether the point pattern is statistically significantly different from a uniform distribution.

• You should be thinking about the MAUP about now!

• The size, shape, and number of quadrats will impact the results of the quadrat analysis.
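
The quadrat procedure can be sketched as follows: count the points in each cell of a regular grid, then compare the counts with the equal count expected under a uniform distribution. The grid layout, function names, and sample points are assumptions made for illustration.

```python
def quadrat_counts(points, bounds, nx, ny):
    """Count points in each cell of an nx-by-ny grid covering bounds."""
    xmin, ymin, xmax, ymax = bounds
    counts = [0] * (nx * ny)
    for x, y in points:
        # Clamp so points on the upper edges fall in the last row or column
        col = min(int((x - xmin) / (xmax - xmin) * nx), nx - 1)
        row = min(int((y - ymin) / (ymax - ymin) * ny), ny - 1)
        counts[row * nx + col] += 1
    return counts

def chi_square_uniform(counts):
    """Chi-square statistic against an equal expected count per quadrat."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

# Hypothetical points in a 10 x 10 study area, split into 2 x 2 quadrats
pts = [(1, 1), (2, 1), (1, 2), (2, 2), (3, 3), (7, 2), (2, 8), (8, 8)]
counts = quadrat_counts(pts, (0, 0, 10, 10), 2, 2)
chi2 = chi_square_uniform(counts)
df = len(counts) - 1  # degrees of freedom: number of quadrats minus one
print(counts, chi2, df)
```

The statistic would then be compared with the tabulated critical value for `df` degrees of freedom. Re-running with a different `nx` and `ny` changes the counts and the statistic, which is the quadrat-size sensitivity noted above.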

### Nearest-neighbour Analysis

• NNA is used in GIS to determine whether point sets are random or non-random.
• If a point set is found to be non-random then we are left with the task of determining what controls the distribution.
• For each point in the set, find the distance to the closest neighbour.
(From: Jensen & Jensen 2013)

### Nearest-neighbour Analysis

$$d_e = \frac 1 {2 \sqrt{N/A}} = \frac 1 {2 \sqrt{p}}$$

• where $$d_e$$ is the expected mean nearest-neighbour distance (assuming a random distribution); $$N$$ is the number of points; $$A$$ is the area of the study region; $$p$$ is the point density ($$N/A$$).

$$NNR = \frac {Dist_{Obs}} {Dist_{Ran}} = \frac {d_a} {d_e}$$

• where $$NNR$$ is the nearest-neighbour ratio; $$Dist_{Obs}$$ (or $$d_a$$) is the observed mean NN distance; $$Dist_{Ran}$$ (or $$d_e$$) is the expected mean NN distance for a random distribution.
(From: Jensen & Jensen 2013)
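
The two formulas above combine into a short calculation. This sketch uses a brute-force search for each point's nearest neighbour, which is fine for small point sets; the function names, sample points, and study area are hypothetical.

```python
import math

def mean_nn_distance(points):
    """Observed mean nearest-neighbour distance (d_a), by brute force."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

def expected_nn_distance(n_points, area):
    """Expected mean NN distance for a random pattern: d_e = 1 / (2 * sqrt(N / A))."""
    return 1.0 / (2.0 * math.sqrt(n_points / area))

def nearest_neighbour_ratio(points, area):
    """NNR = d_a / d_e."""
    return mean_nn_distance(points) / expected_nn_distance(len(points), area)

# Hypothetical: four points on a unit square inside a study area of 4 square units
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
nnr = nearest_neighbour_ratio(pts, area=4.0)
print(nnr)
```

Values of NNR below 1 suggest clustering, values near 1 suggest a random pattern, and values above 1 suggest dispersion.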

### Nearest-neighbour Analysis

• Warning: Our estimate of the point density depends on our definition of the study area.
• If we change the extent of the study area, we change the results.
(Figure panels: "Not so clustered" and "Very clustered")
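
The effect of the study-area definition is easy to demonstrate: hold the points fixed and change only the area used in the density estimate. The sketch below is self-contained and uses hypothetical points and areas.

```python
import math

def nearest_neighbour_ratio(points, area):
    """NNR = observed mean NN distance / expected distance for a random pattern."""
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / len(points)
    expected = 1.0 / (2.0 * math.sqrt(len(points) / area))
    return observed / expected

# The same four points, evaluated against three different study-area sizes
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
ratios = [nearest_neighbour_ratio(pts, area) for area in (4.0, 16.0, 64.0)]
print(ratios)
```

As the study area grows, the same pattern moves from an NNR above 1 (dispersed) through 1 (random) to below 1 (clustered), even though no point has moved.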

### Nearest-neighbour Analysis

• NNA is also sensitive to the non-uniformity of underlying space.
• NNA assumes that points are free to locate anywhere.
• Consider the gap in stream channel heads below. It's the result of Lake Ontario.

### Circular Data

• Geographers distinguish between directional (0°-360°) and axial (a.k.a. oriented; 0°-180°) data.
• Wind is directional; a road is axial.

• Directional and axial data can be plotted using Rose Diagrams, which are like circular histograms.

### Circular Data

$$\overset{-}\theta = \tan^{-1}\left(\frac{\overset{N}{\underset{i=1}{\Sigma}}{\sin \theta_i}} {\overset{N}{\underset{i=1}{\Sigma}}{\cos \theta_i}}\right)$$

• where $$\overset{-}\theta$$ is the mean direction, derived from the vector resultant.

### Circular Data

$$\overset{-}R = \frac 1 N \sqrt{\left(\overset{N}{\underset{i=1}{\Sigma}}{\cos \theta_i}\right)^2 + \left({\overset{N}{\underset{i=1}{\Sigma}}{\sin \theta_i}}\right)^2}$$

• where $$\overset{-}R$$ is the standardized length of the vector resultant, called the mean resultant length, and is a measure of dispersion.
• 0 ≤ $$\overset{-}R$$ ≤ 1, where values near 1 indicate small angular dispersion and vice versa.
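
A minimal sketch of the mean direction and mean resultant length follows. One implementation detail is an assumption beyond the formula on the slide: the code uses `atan2` rather than a plain inverse tangent so that the result falls in the correct quadrant. The sample bearings are hypothetical.

```python
import math

def mean_direction(angles_deg):
    """Mean direction in degrees (0-360), from the summed sine and cosine components."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    # atan2 keeps track of the quadrant, which a plain arctangent of s / c cannot
    return math.degrees(math.atan2(s, c)) % 360.0

def mean_resultant_length(angles_deg):
    """Mean resultant length R-bar, between 0 and 1."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.hypot(c, s) / len(angles_deg)

# Two hypothetical wind bearings either side of north
bearings = [350.0, 10.0]
print(mean_direction(bearings))         # close to north (0°, equivalently 360°)
print(mean_resultant_length(bearings))  # near 1: small angular dispersion
print(sum(bearings) / len(bearings))    # arithmetic mean is 180°, i.e. due south
```

The last line shows why ordinary averaging fails for circular data: the arithmetic mean of 350° and 10° points in the opposite direction to both observations.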

### Circular Data

• Axial (oriented) data cannot easily be treated as vectors because there is nothing to distinguish one end of the line from the other.

• An angle of 179° is very close to one of 1°.

• To solve this, double all the angles, calculate the statistics (mean direction, mean resultant length, etc.) with the doubled data, and then halve the resulting mean direction.
• 45° × 2 = 90°
• 225° × 2 = 450° = 450° - 360° = 90°
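
The angle-doubling procedure can be sketched as below. The function name and sample orientations are hypothetical, and `atan2` is used as an implementation choice to resolve the quadrant of the doubled mean.

```python
import math

def axial_mean_direction(angles_deg):
    """Mean orientation (0-180 degrees) of axial data via angle doubling."""
    doubled = [(2.0 * a) % 360.0 for a in angles_deg]          # step 1: double
    s = sum(math.sin(math.radians(a)) for a in doubled)
    c = sum(math.cos(math.radians(a)) for a in doubled)
    mean_doubled = math.degrees(math.atan2(s, c)) % 360.0      # step 2: circular mean
    return (mean_doubled / 2.0) % 180.0                        # step 3: halve

# 45° and 225° describe the same axis, so their mean orientation is 45°
print(axial_mean_direction([45.0, 225.0]))

# 179° and 1° are nearly the same axis; the result is close to 0° (equivalently 180°)
print(axial_mean_direction([179.0, 1.0]))
```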