
## GEOG*3480

### Data Quality Part 1

John Lindsay
Fall 2015

• Jensen and Jensen Chapter 4

### Lecture Outline

• Introduction
• Accuracy and Precision
• Types of Error in Geospatial Data
• Error Propagation
• The Ecological Fallacy
• Modifiable Areal Unit Problem

### Introduction

• Error is a natural component of all data and geospatial data are no different.
• The level of error in a particular data set may limit its suitability for certain applications.
• Error propagates through a GIS workflow, i.e. the level of error in the output is greater than that in the inputs.
• This is particularly salient in the era of free geospatial data that are shared so easily over the Internet.
• As GIS has become more ubiquitous, GIS users now have widely varying levels of experience and backgrounds, so the issue of error has become increasingly important.
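
As a rough illustration of how error can grow through a workflow, the sketch below adds two layers and combines their errors in quadrature. It assumes the errors in the two inputs are independent and random, which is an assumption made here, not something stated in the lecture; the layer names and error values are hypothetical.

```python
import math

def propagated_error_of_sum(*input_errors):
    """Standard error of a sum (or difference) of layers.

    Assumes the input errors are independent and random.
    """
    return math.sqrt(sum(e ** 2 for e in input_errors))

# Hypothetical standard errors (in metres) of two input layers.
dem_error = 3.0
correction_error = 4.0

output_error = propagated_error_of_sum(dem_error, correction_error)
print(output_error)  # larger than either input error
```

Under these assumptions the output error exceeds each input error, even though no new mistake was made during the overlay itself.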

### Metadata

• Metadata provide a means by which we can store and communicate information about data quality and errors, and about the accuracy and precision of the instruments or methods used to collect the data.

• Most geospatial data formats allow for metadata, which are often stored as ASCII data in a markup format (e.g. XML). Some data formats (e.g. GeoTIFF) have built-in means of communicating metadata (i.e. 'tags').
• Most GIS software has the ability to create and edit metadata.
• The problem is that GIS software does not enforce metadata requirements: analysis tools create new files without requiring the user to create metadata.
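
To make the idea of metadata stored as XML concrete, here is a minimal sketch that reads data-quality fields from a small metadata record using Python's standard library. The element names (`quality`, `horizontalAccuracy`, and so on) and all values are invented for illustration and do not follow any particular metadata standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record; the schema is made up for this example.
METADATA_XML = """<metadata>
  <title>Hypothetical land cover layer</title>
  <quality>
    <horizontalAccuracy units="m">5.0</horizontalAccuracy>
    <attributeAccuracy>0.85</attributeAccuracy>
    <collectionMethod>Classified satellite imagery</collectionMethod>
  </quality>
</metadata>"""

def read_quality(xml_text):
    """Return the data-quality entries of a metadata record as a dict of strings."""
    root = ET.fromstring(xml_text)
    quality = root.find("quality")
    if quality is None:
        # A file with no quality section tells the user nothing about error.
        return {}
    return {child.tag: child.text for child in quality}

quality_info = read_quality(METADATA_XML)
print(quality_info)
```

A record without a `quality` element yields an empty dictionary, which mirrors the situation where a tool writes a new file but nobody documents its quality.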

### Accuracy and Precision

• Both are important aspects of data quality and are related.
• Accuracy is the extent to which both attribute and positional data correspond to reality.
• Precision is how exact some measurement is.
• In some instances, precision can be expressed as the number of significant digits or decimal places reported.
• Accuracy is how true something is; precision is how exact we are in communicating it.
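
The distinction can be sketched with two reported elevations for the same point. The reference elevation and both reports are hypothetical values chosen for illustration: report A is written to the millimetre but is far from the reference, while report B is written to the nearest metre but is close to it.

```python
def decimal_places(reported: str) -> int:
    """Precision proxy: digits written after the decimal point."""
    _, _, fraction = reported.partition(".")
    return len(fraction)

def absolute_error(reported: str, reference: float) -> float:
    """Accuracy proxy: distance between the reported and reference values."""
    return abs(float(reported) - reference)

reference_elevation = 334.2  # hypothetical surveyed value, in metres

# Values are kept as text so the number of decimals as written is preserved.
reports = {"A": "339.217", "B": "334"}

summary = {
    name: (decimal_places(text), round(absolute_error(text, reference_elevation), 3))
    for name, text in reports.items()
}
print(summary)  # (decimal places, absolute error in metres) per report
```

Report A is the more precise of the two and report B the more accurate, which shows that extra decimal places say nothing about how true a value is.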

### Types of Error in Geospatial Data

• Attribute Error
• Positional Error
• Topological Error
• Temporal Accuracy

### Attribute Error

• Attribute errors are generally caused by blunders in data entry or by misclassification.
• How can we assess the level of attribute error? We have to perform a spot-check on a sample of the data.
• The sample must be representative of the population.
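
One common way to draw a spot-check sample that reflects the population is proportional stratified random sampling, sketched below. This is offered as one possible approach, not necessarily the one intended in the lecture; the function name, the record layout, and the 10% sampling fraction are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_sample(records, class_of, fraction, seed=0):
    """Randomly pick roughly `fraction` of the records from every class."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    by_class = defaultdict(list)
    for record in records:
        by_class[class_of(record)].append(record)

    sample = []
    for label in sorted(by_class):
        members = by_class[label]
        # Round to the nearest count, but check at least one record per class.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 80 forest polygons and 20 water polygons.
records = [(i, "forest") for i in range(80)] + [(i, "water") for i in range(80, 100)]
sample = stratified_sample(records, class_of=lambda r: r[1], fraction=0.1, seed=42)

counts = defaultdict(int)
for _, label in sample:
    counts[label] += 1
print(dict(counts))
```

Sampling within each class keeps rare classes from being missed entirely, which a small simple random sample could easily do.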

### Error Matrix and Associated Stats

• The information gathered from sampling is usually placed in an error matrix, i.e. a table cross-tabulating classified (map) values against reference (observed) values.
• Overall Accuracy: number of correctly classified values expressed as a percentage of the total number of data points
• Producer's Accuracy: probability that a reference sample will be correctly classified
• User's Accuracy: probability that a classified value (map value) actually matches the reference data
• Kappa Index of Agreement: a measure of classification accuracy that accounts for chance agreement
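
The four statistics above can be sketched in a few lines of Python. The example assumes the matrix is laid out with rows as classified (map) classes and columns as reference classes, and that every class occurs at least once in both; the counts are hypothetical.

```python
def accuracy_statistics(matrix):
    """Summarise a square error matrix.

    Assumed layout: rows = classified (map) classes, columns = reference classes.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    correct = sum(matrix[i][i] for i in range(n))

    overall = correct / total
    # Producer's accuracy: correct cells divided by the reference (column) totals.
    producers = [matrix[j][j] / col_totals[j] for j in range(n)]
    # User's accuracy: correct cells divided by the classified (row) totals.
    users = [matrix[i][i] / row_totals[i] for i in range(n)]
    # Agreement expected by chance, from the products of the marginal totals.
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (overall - chance) / (1 - chance)
    return {"overall": overall, "producers": producers, "users": users, "kappa": kappa}

# Hypothetical counts for three classes (e.g. forest, water, urban).
error_matrix = [
    [40, 5, 5],
    [10, 25, 5],
    [0, 0, 10],
]
stats = accuracy_statistics(error_matrix)
print(stats)
```

In this made-up matrix the third class has a perfect user's accuracy but a producer's accuracy of only 0.5, a reminder that the two measures answer different questions.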

### What about interval/ratio data?

• KIA is useful for categorical (classified) data, but what about interval/ratio data?
• Root-Mean-Square Error (RMSE): the square root of the mean squared difference between measured and reference values.
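
As a minimal sketch, RMSE is the square root of the average squared difference between the values in the data set and the corresponding reference values. The elevations below are hypothetical.

```python
import math

def rmse(measured, reference):
    """Root-mean-square error between paired measured and reference values."""
    if len(measured) != len(reference) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(m - r) ** 2 for m, r in zip(measured, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical elevations (m) from a data set and from a field survey.
dataset_elevations = [102.0, 98.0, 105.0, 100.0]
survey_elevations = [100.0, 100.0, 100.0, 100.0]

elevation_rmse = rmse(dataset_elevations, survey_elevations)
print(round(elevation_rmse, 3))
```

Because the errors are squared before averaging, the single 5 m discrepancy contributes more to the result than the two 2 m discrepancies combined.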

### Positional Error

• Positional Accuracy measures how close the geographic coordinates of a mapped feature are to reality.
• Includes horizontal and vertical components.
• Calculated by comparing mapped x,y,z values to those measured using a more accurate measuring device.
• Is scale dependent.
• Problems arise when you combine data of different scales in a GIS.
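
The comparison of mapped and reference coordinates can be sketched as separate horizontal and vertical RMSE values. Summarising the differences with RMSE is an assumption made here, since the lecture does not name a specific statistic for positional accuracy; the check-point coordinates are hypothetical.

```python
import math

def positional_rmse(mapped, reference):
    """Return (horizontal RMSE, vertical RMSE) for paired (x, y, z) check points."""
    if len(mapped) != len(reference) or not mapped:
        raise ValueError("inputs must be non-empty and of equal length")
    horizontal_sq = 0.0
    vertical_sq = 0.0
    for (mx, my, mz), (rx, ry, rz) in zip(mapped, reference):
        # Horizontal error is the planar distance between the two positions.
        horizontal_sq += (mx - rx) ** 2 + (my - ry) ** 2
        vertical_sq += (mz - rz) ** 2
    n = len(mapped)
    return math.sqrt(horizontal_sq / n), math.sqrt(vertical_sq / n)

# Hypothetical check points: coordinates from the map vs. a more accurate survey.
mapped_points = [(503.0, 1004.0, 251.0), (700.0, 1200.0, 299.0)]
survey_points = [(500.0, 1000.0, 250.0), (700.0, 1200.0, 300.0)]

horizontal_rmse, vertical_rmse = positional_rmse(mapped_points, survey_points)
print(round(horizontal_rmse, 3), round(vertical_rmse, 3))
```

Reporting the two components separately matters because a data set can be well positioned in plan yet poor in elevation, or the reverse.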

### Temporal Accuracy

• Refers to how up-to-date a geospatial database is.
• How ephemeral are the phenomena being represented?
• Types of temporal events:
  • Continuous, e.g. temperature varies continuously
  • Majoritive, i.e. going on most of the time, e.g. land use change
  • Sporadic, e.g. storms occur infrequently
  • Unique (one-off), e.g. creation of a flood plain