GEOG*3480
GIS and Spatial Analysis
Statistical Analysis of
Spatial Data
John Lindsay
Fall 2015
Readings
- Jensen and Jensen Chapter 8
Topics
- Over the next two lectures, we'll discuss:
- Descriptive Statistics
- Descriptive Spatial Statistics
- Spatial Autocorrelation
- Point Pattern Analysis
- Quadrat Analysis
- Nearest-Neighbour Analysis
- Directional Analysis
Descriptive Statistics
- Measures of central tendency
- Mode, median, and mean (\(\overset{-}x\))
- \(\overset{-}x = \frac {\underset{i=1}{\overset{N}{\Sigma}} x_i} {N}\)
Descriptive Statistics
- Measures of dispersion
- Variance (\(s^2\))
- Standard deviation (\(s\))
- \(s^2 = \frac {\underset{i=1}{\overset{N}{\Sigma}} (x_i - \overset{-}x)^2} {N - 1}\)
- \(s = \sqrt \frac {\underset{i=1}{\overset{N}{\Sigma}} (x_i - \overset{-}x)^2} {N - 1}\)
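The mean, variance, and standard deviation formulas above can be sketched in Python as follows. This is a minimal illustration: the function names and sample values are hypothetical, and the variance divides by \(N - 1\), as in the slide's formula. `mode` and `median` come from Python's standard `statistics` module.

```python
import math
from statistics import median, mode


def sample_mean(xs):
    """Mean: the sum of the values divided by N."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sample variance s^2: squared deviations from the mean, divided by N - 1."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


def sample_sd(xs):
    """Standard deviation s: the square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


# Hypothetical example values (e.g., elevations in metres).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mode(values), median(values), sample_mean(values))
print(sample_variance(values), sample_sd(values))
```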
Descriptive Statistics
- Skewness
- Measure of the asymmetry of a distribution
- Kurtosis
- Measure of the peakedness of a distribution
(From: Jensen & Jensen 2013)
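Skewness and kurtosis are usually computed from central moments. The sketch below assumes one common moment-based definition: \(g_1 = m_3 / m_2^{3/2}\) for skewness and excess kurtosis \(g_2 = m_4 / m_2^2 - 3\), where \(m_k\) is the \(k\)-th central moment dividing by \(N\). Textbooks, including the one cited here, may apply small-sample corrections, so treat the numbers as illustrative. Function names and data are hypothetical.

```python
def central_moment(xs, k):
    """k-th central moment: the mean of (x - mean)^k, dividing by N."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** k for x in xs) / n


def skewness(xs):
    """Moment coefficient of skewness g1 = m3 / m2^(3/2).
    0 for a symmetric distribution, > 0 for a long right tail, < 0 for a long left tail."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Excess kurtosis g2 = m4 / m2^2 - 3 (0 for a normal distribution).
    > 0: more peaked (heavier-tailed) than normal; < 0: flatter than normal."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]
print(skewness(symmetric), skewness(right_skewed))
print(excess_kurtosis(symmetric))
```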
Descriptive Spatial Statistics
- Mean Centre
- Measure of central tendency that can be used to determine the
centre of a distribution plotted in geographic coordinates.
- Standard Distance
- Measure of dispersion of geographically distributed data.
(From: Jensen & Jensen 2013)
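The mean centre and standard distance can be sketched as below. This assumes points in a projected (planar) coordinate system such as UTM; averaging raw latitude and longitude is only an approximation. It also assumes the standard distance divides by \(N\); some texts use \(N - 2\). Function names and coordinates are hypothetical.

```python
import math


def mean_centre(points):
    """Mean centre: (mean of the x coordinates, mean of the y coordinates)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def standard_distance(points):
    """Standard distance: square root of the mean squared x and y deviations
    from the mean centre (divides by N; an assumption, see lead-in)."""
    n = len(points)
    xc, yc = mean_centre(points)
    sx = sum((x - xc) ** 2 for x, _ in points) / n
    sy = sum((y - yc) ** 2 for _, y in points) / n
    return math.sqrt(sx + sy)


# Hypothetical projected coordinates (e.g., metres): corners of a 4 x 3 rectangle.
pts = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
print(mean_centre(pts), standard_distance(pts))
```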
Tobler’s first law
- The first law of geography: “everything is related to everything else,
but near things are more related than distant things.” (Tobler, 1970)
- This simple statement forms the basis for a great deal of
geographical analysis and is the concept underlying
spatial autocorrelation.
- Synonymous with the concept of spatial
dependence in geostatistics
Spatial autocorrelation
- Correlation of a variable with itself through space.
- Correlation versus spatial autocorrelation
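- Spatial autocorrelation is both bad news and good news:
- Bad for statistical reasons: nearby observations are not independent, which violates an assumption of many classical statistical tests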
- Good because, “if geography is worth studying at all, it must be
because phenomena do not vary randomly through space”
(O'Sullivan and Unwin, 2003, pg. 28)
- Essential for spatial modelling through interpolation
Spatial autocorrelation
- Three possibilities:
- Clustered (positive autocorrelation):
nearby locations are likely to be similar to one another.
- Random (autocorrelation near zero):
no spatial effect is discernible, and observations
seem to vary randomly through space
- Dispersed (negative autocorrelation):
observations from nearby locations are
likely to be different from one another.
Spatial autocorrelation
Moran's \(I\)
- Moran's \(I\) measures the interdependence in spatial distributions.
- Used with interval/ratio level data
- Used to detect spatial trends
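- Typically -1 ≤ \(I\) ≤ 1 (the exact bounds depend on the weights)
- \(I\) near -1: dispersed
- \(I\) near 0: random (strictly, the expected value under randomness is \(-1/(N-1)\))
- \(I\) near +1: clustered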
Moran's \(I\)
\(I = \frac {N}{\underset{i=1}{\overset{N} \Sigma} \underset{j=1}{\overset{N} \Sigma} w_{ij}} \frac {\underset{i=1}{\overset{N} \Sigma} \underset{j=1}{\overset{N} \Sigma} w_{ij} (x_i - \overset{-} x) (x_j - \overset{-} x)}{\underset{i=1}{\overset{N} \Sigma} {(x_i - \overset{-} x)^2}}\)
Where \(N\) is the number of locations; \(\overset{-} x\) is the mean of variable \(x\); \( x_i \) and \( x_j \) are the
values at locations \(i\) and \(j\); \( w_{ij} \) is the spatial weight between locations \(i\) and \(j\) (e.g., 1 if they are neighbours and 0 otherwise).
(From: Jensen & Jensen 2013)
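The Moran's \(I\) formula above translates almost term by term into code. This sketch assumes a dense \(N \times N\) weights matrix with zeros on the diagonal; the binary "cells in a row" neighbourhood and the function names are hypothetical choices for illustration only.

```python
def morans_i(x, w):
    """Global Moran's I for values x and an N x N spatial weights matrix w
    (w[i][j] > 0 when locations i and j are neighbours, w[i][i] = 0)."""
    n = len(x)
    xbar = sum(x) / n
    z = [xi - xbar for xi in x]  # deviations from the mean
    w_sum = sum(sum(row) for row in w)  # sum of all weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / w_sum) * (cross / sum(zi * zi for zi in z))


def chain_weights(n):
    """Hypothetical binary weights: n cells in a row, each neighbouring the cells beside it."""
    w = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = w[i + 1][i] = 1
    return w


w = chain_weights(4)
print(morans_i([1, 0, 1, 0], w))  # alternating values: negative I (dispersed)
print(morans_i([1, 1, 0, 0], w))  # like values side by side: positive I (clustered)
```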
Point Pattern Analysis
- Mapped point data often exhibit distinct patterning.
- Patterns result from spatial variation in the factors that control the phenomenon.
- Understanding the pattern can help with understanding the controlling
forces on the phenomenon.
Point Pattern Analysis
- The patterns that we're interested in with Point Pattern
Analysis (PPA) depend on the locations of individual points, not on their
attributes; for attributes, spatial autocorrelation is more relevant.
- Quadrat Analysis and
Nearest-Neighbour Analysis are the two most common methods for PPA.
Quadrat Analysis
- A quadrat is a user-defined geographic area,
usually a square or rectangle, used to measure the distribution of a spatial
phenomenon.
- Quadrat analysis can be used to test
whether the phenomenon is uniformly distributed.
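- The Chi Square test, \(\chi^2 = \overset{k}{\underset{i=1}{\Sigma}} \frac{(O_i - E_i)^2}{E_i}\), compares the observed count \(O_i\) in each of the \(k\) quadrats with the count \(E_i\) expected under a uniform distribution.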
(From: Jensen & Jensen 2013)
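A minimal sketch of the Chi Square statistic for quadrat counts, assuming the expected count is the same in every quadrat (total points divided by the number of quadrats, \(k\)). The function name and counts are hypothetical. The statistic would then be compared with a critical value for \(k - 1\) degrees of freedom (about 7.815 for 3 degrees of freedom at \(\alpha = 0.05\)).

```python
def quadrat_chi_square(counts):
    """Chi Square statistic comparing observed quadrat counts with the equal
    count expected in every quadrat under a uniform distribution."""
    k = len(counts)
    expected = sum(counts) / k  # same expected count in every quadrat
    chi2 = sum((obs - expected) ** 2 / expected for obs in counts)
    return chi2, k - 1  # statistic and degrees of freedom


# Hypothetical counts of points in 4 equal-sized quadrats.
print(quadrat_chi_square([10, 10, 10, 10]))  # → (0.0, 3): perfectly even
print(quadrat_chi_square([20, 0, 0, 0]))  # → (60.0, 3): all points in one quadrat
```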
Quadrat Analysis
- The value of Chi Square is compared with a table of critical values (with \(k - 1\) degrees of freedom for \(k\) quadrats) to determine
whether the points are statistically significantly different from a uniform distribution.
- You should be thinking about the modifiable areal unit problem (MAUP) about now!
- The size, shape, and number of quadrats will impact the results
of the quadrat analysis.
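The sensitivity to quadrat size can be seen with a toy example. The sketch below is hypothetical throughout: 16 evenly spaced points in a unit square, binned into \(k \times k\) square quadrats. With \(k = 2\) or \(k = 4\) every quadrat holds the same number of points (\(\chi^2 = 0\)), but with \(k = 3\) the same points give \(\chi^2 = 4.25\).

```python
def count_in_quadrats(points, k, size=1.0):
    """Count points in a k x k grid of square quadrats covering [0, size) x [0, size)."""
    counts = [0] * (k * k)
    for x, y in points:
        col = min(int(x / size * k), k - 1)  # clamp points on the far edge
        row = min(int(y / size * k), k - 1)
        counts[row * k + col] += 1
    return counts


def chi_square(counts):
    """Chi Square statistic against an equal expected count per quadrat."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Hypothetical pattern: one point at the centre of each cell of a 4 x 4 grid.
pts = [((i + 0.5) / 4, (j + 0.5) / 4) for i in range(4) for j in range(4)]
for k in (2, 3, 4):
    print(k, chi_square(count_in_quadrats(pts, k)))
```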
Nearest-neighbour Analysis
- NNA is used in GIS to determine whether point sets are random or non-random.
- If a point set is found to be non-random then we are left with the task
of determining what controls the distribution.
- For each point in the set, find the distance to the closest neighbour.
(From: Jensen & Jensen 2013)
Nearest-neighbour Analysis
\(d_e = \frac 1 {2 \sqrt{N/A}} = \frac 1 {2 \sqrt{p}} \)
- where \(d_e\) is the expected mean nearest-neighbour distance (assuming a random distribution);
\(N\) is the number of points; \(A\) is the area of the study region; \(p = N/A\) is the point density.
\(NNR = \frac {Dist_{Obs}} {Dist_{Ran}} = \frac {d_a} {d_e} \)
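- where \(NNR\) is the nearest-neighbour ratio;
\(Dist_{Obs}\) (\(d_a\)) is the observed mean nearest-neighbour distance; \(Dist_{Ran}\) (\(d_e\)) is the expected mean distance for a random distribution.
- \(NNR < 1\) indicates clustering, \(NNR \approx 1\) a random pattern, and \(NNR > 1\) dispersion.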
(From: Jensen & Jensen 2013)
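The nearest-neighbour ratio can be sketched directly from the formulas above. This minimal version uses a brute-force \(O(N^2)\) search, which is fine for small point sets; a GIS would typically use a spatial index. Function names and the example lattice are hypothetical.

```python
import math


def mean_nn_distance(points):
    """Observed mean nearest-neighbour distance d_a (brute force, O(N^2))."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    return total / len(points)


def nn_ratio(points, area):
    """NNR = d_a / d_e, with d_e = 1 / (2 * sqrt(N / A))."""
    density = len(points) / area
    d_e = 1.0 / (2.0 * math.sqrt(density))
    return mean_nn_distance(points) / d_e


# Hypothetical 3 x 3 lattice with unit spacing inside a 3 x 3 study area.
grid = [(x + 0.5, y + 0.5) for x in range(3) for y in range(3)]
print(nn_ratio(grid, area=9.0))  # > 1: more dispersed than random
```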
Nearest-neighbour Analysis
- Warning: Our estimate of point density depends on our
definition of the study area.
- If we change the extent of the study area, we change the results.
(Figure panels: Not so clustered | Very clustered)
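The effect of the study-area definition can be checked numerically. In this hypothetical sketch the same nine points are analysed twice: inside a tight 3 × 3 boundary they give \(NNR = 2.0\) (dispersed), but inside a loose 10 × 10 boundary the lower density inflates \(d_e\) and the ratio drops to 0.6 (clustered). `math.dist` assumes Python 3.8 or later.

```python
import math


def nn_ratio(points, area):
    """Nearest-neighbour ratio d_a / d_e for points inside a study area of size `area`."""
    n = len(points)
    d_a = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    d_e = 1.0 / (2.0 * math.sqrt(n / area))  # depends on the chosen area
    return d_a / d_e


# Hypothetical pattern: the same nine points, analysed with two study-area extents.
points = [(x + 0.5, y + 0.5) for x in range(3) for y in range(3)]
for area in (9.0, 100.0):  # a tight 3 x 3 boundary vs. a loose 10 x 10 boundary
    print(area, nn_ratio(points, area))
```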
Nearest-neighbour Analysis
- NNA is also sensitive to the non-uniformity of underlying space.
- NNA assumes that points are free to locate anywhere within the study area.
- Consider the gap in stream channel heads below: it is caused by Lake Ontario, where channel heads cannot form.
Circular Data
- Geographers distinguish between directional
(0°-360°) and axial (a.k.a. oriented; 0°-180°) data.
- Wind is directional; a road is axial.
- Directional and axial data can be plotted using rose
diagrams, which are like circular histograms.
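Since a rose diagram is a circular histogram, the counting step behind it can be sketched as binning angles into equal-width sectors. This is one simple convention, assumed for illustration: directional data are binned over 0°-360°, and axial data are first reduced to 0°-180°. The function name and parameters are hypothetical; the plotting itself is left out.

```python
def rose_counts(angles_deg, n_sectors=12, axial=False):
    """Count angles falling in each equal-width sector of a rose diagram.
    Directional data span 0-360 degrees; axial data are reduced to 0-180 first."""
    span = 180.0 if axial else 360.0
    width = span / n_sectors
    counts = [0] * n_sectors
    for a in angles_deg:
        # Clamp guards against a % span rounding up to span for tiny negative angles.
        counts[min(int((a % span) // width), n_sectors - 1)] += 1
    return counts


print(rose_counts([10, 20, 95, 359], n_sectors=4))  # four 90-degree sectors
print(rose_counts([10, 190], n_sectors=4, axial=True))  # 10 and 190 are the same axis
```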
Circular Data
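\(\overset{-}\theta = \tan^{-1}(\frac{\overset{N}{\underset{i=1}{\Sigma}}{\sin \theta_i}} {\overset{N}{\underset{i=1}{\Sigma}}{\cos \theta_i}}) \)
- where \(\overset{-}\theta\) is the mean direction,
derived from the vector resultant.
- In practice the two-argument arctangent (atan2) is used, so that the mean direction falls in the correct quadrant.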
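The mean direction can be sketched with Python's `math.atan2`, which uses the signs of both sums to place the result in the correct quadrant. The example shows why an ordinary arithmetic mean fails for circular data: bearings of 350° and 20° average to 185° (south), whereas their vector mean is 5° (just east of north). The function name and bearings are hypothetical.

```python
import math


def mean_direction(angles_deg):
    """Mean direction (degrees, 0-360) of the vector resultant of unit vectors."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)  # sum of sines
    c = sum(math.cos(math.radians(a)) for a in angles_deg)  # sum of cosines
    if math.hypot(s, c) < 1e-9:
        # The unit vectors cancel out, so no mean direction exists.
        raise ValueError("mean direction is undefined for a zero resultant")
    # atan2 uses the signs of both sums to place the angle in the right quadrant.
    return math.degrees(math.atan2(s, c)) % 360.0


bearings = [350.0, 20.0]  # hypothetical wind directions either side of north
print(sum(bearings) / len(bearings))  # naive arithmetic mean: 185.0, pointing south
print(mean_direction(bearings))  # about 5.0, just east of north
```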
Circular Data
\(\overset{-}R = \frac 1 N \sqrt{(\overset{N}{\underset{i=1}{\Sigma}}{\cos \theta_i})^2 + ({\overset{N}{\underset{i=1}{\Sigma}}{\sin \theta_i}})^2} \)
- where \(\overset{-}R\) is the standardized length of the vector resultant,
called the mean resultant length; it is
a measure of angular dispersion (strictly, of concentration).
- 0 ≤ \(\overset{-}R\) ≤ 1: values near 1 indicate small angular dispersion (tightly grouped directions),
and values near 0 indicate large dispersion.
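The mean resultant length can be sketched as the length of the summed unit vectors divided by \(N\). Identical directions give \(\overset{-}R = 1\), and directions spread evenly around the circle cancel to \(\overset{-}R \approx 0\). The function name and angles are hypothetical.

```python
import math


def mean_resultant_length(angles_deg):
    """R-bar: length of the summed unit vectors divided by N (0 <= R-bar <= 1)."""
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.sqrt(c ** 2 + s ** 2) / n


print(mean_resultant_length([45, 45, 45]))  # identical directions: 1 (to rounding)
print(mean_resultant_length([0, 90]))  # about 0.707
print(mean_resultant_length([0, 90, 180, 270]))  # evenly spread: about 0
```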
Circular Data
- Axial (oriented) data cannot easily be treated as vectors
because there is nothing to distinguish one end of the line from the other.
- An angle of 179° is very close to one of 1°.
- To solve this, double all the angles, calculate the statistics with the
doubled data, and then halve the resulting mean direction; the mean resultant
length of the doubled data is used as-is.
- For example, 45° and 225° describe the same line:
- 45° × 2 = 90°
- 225° × 2 = 450°, and 450° - 360° = 90°
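The angle-doubling procedure can be sketched as follows: double each orientation, compute the vector mean direction and mean resultant length of the doubled angles, then halve only the direction. For lines at 1° and 179°, a naive average gives 90° (perpendicular to both), while doubling gives an orientation near 0° (equivalently 180°). The function name and data are hypothetical.

```python
import math


def axial_mean(angles_deg):
    """Mean orientation (0-180 degrees) and R-bar for axial data via angle doubling."""
    doubled = [math.radians(2 * a) for a in angles_deg]  # step 1: double the angles
    s = sum(math.sin(t) for t in doubled)
    c = sum(math.cos(t) for t in doubled)
    mean_doubled = math.degrees(math.atan2(s, c)) % 360.0  # step 2: vector mean
    r_bar = math.hypot(s, c) / len(angles_deg)  # R-bar of the doubled data
    return mean_doubled / 2.0, r_bar  # step 3: halve the direction only


print(axial_mean([45, 225]))  # same axis: orientation about 45, R-bar about 1
print(axial_mean([1, 179]))  # orientation near 0 (equivalently 180), not 90
```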