2. Basic Statistics#
2.1. Topics#
Basic statistics on attributes
Basic statistics on location
Readings: Lloyd Chp. 3, 4
2.2. Statistics on Attributes (Chp. 3)#
Statistics on attributes
Univariate statistics
Multivariate statistics
Inferential statistics
2.3. Descriptive Statistics for Attributes#
Measures of central tendency
Mean, median, mode
Measures of dispersion
Range, minimum, maximum
Variance and standard deviation
The extent to which values vary from the mean
Coefficient of variation (CV): the standard deviation divided by the mean
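The measures listed above can be sketched with Python's standard `statistics` module. The sample values are an assumption chosen for illustration, and the population forms of variance and standard deviation (dividing by n) are used here; a sample estimate would divide by n - 1 instead.

```python
import statistics

# Illustrative attribute values (an assumption, not a real dataset)
values = [12, 34, 32, 12, 11, 14, 56, 75, 43]

# Measures of central tendency
mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Measures of dispersion
value_range = max(values) - min(values)
variance = statistics.pvariance(values)   # population variance (divide by n)
std_dev = statistics.pstdev(values)       # population standard deviation
cv = std_dev / mean                       # coefficient of variation (unitless)

print(f"mean={mean:.2f}, median={median}, mode={mode}")
print(f"range={value_range}, variance={variance:.2f}, sd={std_dev:.2f}, CV={cv:.3f}")
```

Because the CV is a ratio, it allows dispersion to be compared between attributes measured in different units.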
2.4. Multivariate Statistics: Correlation Coefficient#
How two attributes are related to each other
Correlation Coefficient
A (normalized) measurement of the covariance of two variables
Has a value between -1 and 1
Formula: \(r=\frac{\sum_{i=1}^{n}(y_{i}-\overline{y})(z_{i}-\overline{z})}{\sqrt{\sum_{i=1}^{n}(y_{i}-\overline{y})^{2}}\sqrt{\sum_{i=1}^{n}(z_{i}-\overline{z})^{2}}}\)
| Variable 1 (\(y_i\)) | Variable 2 (\(z_i\)) | \(y_i-\overline{y}\) | \(z_i-\overline{z}\) | \((y_i-\overline{y})(z_i-\overline{z})\) | \((y_i-\overline{y})^2\) | \((z_i-\overline{z})^2\) |
|---|---|---|---|---|---|---|
| 12 | 6 | -20.11 | -27.00 | 543.00 | 404.46 | 729.00 |
| 34 | 52 | 1.89 | 19.00 | 35.89 | 3.57 | 361.00 |
| 32 | 41 | -0.11 | 8.00 | -0.89 | 0.01 | 64.00 |
| 12 | 25 | -20.11 | -8.00 | 160.89 | 404.46 | 64.00 |
| 11 | 22 | -21.11 | -11.00 | 232.22 | 445.68 | 121.00 |
| 14 | 9 | -18.11 | -24.00 | 434.67 | 328.01 | 576.00 |
| 56 | 43 | 23.89 | 10.00 | 238.89 | 570.68 | 100.00 |
| 75 | 67 | 42.89 | 34.00 | 1458.22 | 1839.46 | 1156.00 |
| 43 | 32 | 10.89 | -1.00 | -10.89 | 118.57 | 1.00 |
| SUM | | | | 3092 | 4114.89 | 3172.00 |
Calculation:
Numerator: 3092
Denominator: \(\sqrt{4114.89} \times \sqrt{3172.00} = 64.1474 \times 56.3205 = 3612.8136\)
\(r = 3092 / 3612.8136 = 0.8558\)
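The hand calculation above can be reproduced with a few lines of Python. This is a minimal sketch that follows the formula term by term using the nine value pairs from the table; the function name `pearson_r` is an illustrative choice.

```python
import math

# Variable 1 (y) and Variable 2 (z) from the worked table
y = [12, 34, 32, 12, 11, 14, 56, 75, 43]
z = [6, 52, 41, 25, 22, 9, 43, 67, 32]

def pearson_r(y, z):
    """Correlation coefficient: covariance term over the product of root sums of squares."""
    n = len(y)
    y_bar = sum(y) / n
    z_bar = sum(z) / n
    numerator = sum((yi - y_bar) * (zi - z_bar) for yi, zi in zip(y, z))
    ss_y = sum((yi - y_bar) ** 2 for yi in y)   # sum of squared deviations of y
    ss_z = sum((zi - z_bar) ** 2 for zi in z)   # sum of squared deviations of z
    return numerator / (math.sqrt(ss_y) * math.sqrt(ss_z))

r = pearson_r(y, z)
print(round(r, 4))  # 0.8558
```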
2.5. Multivariate Statistics: Linear Regression#
Relationship between dependent and independent variable(s)
Ordinary least squares (OLS) regression
Least squares approach
Minimize the sum of squared errors (residuals)
Coefficients are calculated by solving a set of equations
Equation: \(z = ay + b\)
a: slope
b: intercept
Example: \(z = 0.7514y + 8.8711\)
\(r^2 = 0.7325\)
Goodness of fit
Coefficient of determination
Percent of variation in z explained by y
\(r^2 = (\text{correlation coefficient})^2\)
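For a single independent variable, the least squares equations have a closed-form solution: the slope is the sum of cross-products divided by the sum of squared deviations of y, and the intercept follows from the two means. The sketch below applies this to the nine value pairs from the correlation table; the function name `ols_fit` is an illustrative choice.

```python
# Variable 1 (y, independent) and Variable 2 (z, dependent) from the correlation table
y = [12, 34, 32, 12, 11, 14, 56, 75, 43]
z = [6, 52, 41, 25, 22, 9, 43, 67, 32]

def ols_fit(y, z):
    """Fit z = a*y + b by ordinary least squares; return (a, b, r_squared)."""
    n = len(y)
    y_bar = sum(y) / n
    z_bar = sum(z) / n
    s_yz = sum((yi - y_bar) * (zi - z_bar) for yi, zi in zip(y, z))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    a = s_yz / s_yy            # slope
    b = z_bar - a * y_bar      # intercept
    # Coefficient of determination: 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((zi - (a * yi + b)) ** 2 for yi, zi in zip(y, z))
    ss_tot = sum((zi - z_bar) ** 2 for zi in z)
    return a, b, 1 - ss_res / ss_tot

a, b, r_squared = ols_fit(y, z)
print(round(a, 4), round(b, 4), round(r_squared, 4))  # 0.7514 8.8711 0.7325
```

The resulting \(r^2\) equals the square of the correlation coefficient, \(0.8558^2 \approx 0.7325\).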
2.6. Geospatial Statistics on Attributes#
Conventional correlation and linear regression
Two variables measured on the same subjects
Variables related by subjects
Geospatial correlation and linear regression
Variables related by location (and time)
Two variables at the same location
Example: temperature ~ elevation regression
Geographically weighted regression
Two variables near a location
Spatial autocorrelation
How a variable is related to itself in space (and time)
Same attribute at two different locations
2.7. Descriptive Statistics for Locations (Chp. 7.2)#
Where is the center of a geographic distribution?
How are features distributed around their center?
Why measure geographic distributions?
Know the general location of geographic features
Compare the distributions of different features/phenomena
Track changes of distribution in time
Extends one-dimensional attribute statistics to two dimensions (X and Y coordinates)
2.8. Mean Center#
The mean of X and Y coordinates
Minimizes the sum of squared distances to all points
Center of mass/gravity
Sensitive to extreme points
Formula: \(\overline{X} = \sum X_i / N, \overline{Y} = \sum Y_i / N\)
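A minimal sketch of the mean center formula in Python. The three points are illustrative, and `mean_center` is a hypothetical helper name.

```python
def mean_center(points):
    """Return the mean of the X coordinates and the mean of the Y coordinates."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    return x_bar, y_bar

# Illustrative point locations as (X, Y) pairs
points = [(5, 0), (2, 4), (5, 8)]
print(mean_center(points))  # (4.0, 4.0)
```

Moving a single point far away shifts the result noticeably, which is the sensitivity to extreme points noted above.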
2.9. Central Feature and Median Center#
Central feature
The most centrally located feature
Shortest total distance from all the other features
Median center
Minimizes the sum of distances to all points
Less influenced by outliers
No simple formula (iterative process)
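The two measures can be sketched as follows. The notes do not name a specific iterative procedure, so the median center here uses Weiszfeld's algorithm as an assumed choice: starting from the mean center, each step recomputes the center as an average of the points weighted by the inverse of their distance to the current estimate. Function names, tolerances, and sample points are illustrative.

```python
import math

def total_distance(cx, cy, points):
    """Sum of straight-line distances from (cx, cy) to every point."""
    return sum(math.hypot(x - cx, y - cy) for x, y in points)

def central_feature(points):
    """The existing feature with the shortest total distance to all features."""
    return min(points, key=lambda p: total_distance(p[0], p[1], points))

def median_center(points, tol=1e-9, max_iter=1000):
    """Approximate the median center with Weiszfeld's iteration (assumed method)."""
    # Start from the mean center
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for x, y in points:
            d = math.hypot(x - cx, y - cy)
            if d < 1e-12:
                continue  # simplification: skip a point that coincides with the estimate
            num_x += x / d
            num_y += y / d
            denom += 1.0 / d
        if denom == 0.0:
            break
        new_cx, new_cy = num_x / denom, num_y / denom
        shift = math.hypot(new_cx - cx, new_cy - cy)
        cx, cy = new_cx, new_cy
        if shift < tol:
            break
    return cx, cy

# Four clustered points plus one outlier (illustrative)
points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
print(central_feature(points))
print(median_center(points))
```

With the outlier at (10, 10), the mean center is pulled to (2.4, 2.4), while the median center stays close to the cluster of four points.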
2.10. Weighted Mean Center#
Center influenced by the attribute at a location
Center of gravity
Formula: \(\overline{X}_w = \frac{\sum W_i X_i}{\sum W_i}, \overline{Y}_w = \frac{\sum W_i Y_i}{\sum W_i}\)
Example:
Points: (5, 0), (2, 4), (5, 8) with weights 2, 4, 4
Mean center (4, 4)
Weighted mean center (3.8, 4.8)
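A minimal sketch of the weighted mean center formula, applied to the three example points with the weights taken in the order listed (2, 4, 4). The helper name `weighted_mean_center` is hypothetical.

```python
def weighted_mean_center(points, weights):
    """Weighted mean of X and Y coordinates; weights are attribute values at each point."""
    total_weight = sum(weights)
    x_w = sum(w * x for (x, _), w in zip(points, weights)) / total_weight
    y_w = sum(w * y for (_, y), w in zip(points, weights)) / total_weight
    return x_w, y_w

points = [(5, 0), (2, 4), (5, 8)]
weights = [2, 4, 4]
print(weighted_mean_center(points, weights))  # (3.8, 4.8)
```

Setting all weights equal reduces the result to the unweighted mean center, (4, 4).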
2.11. Standard Distance#
How feature locations deviate from their mean center
Root-mean-square distance of features from the mean center
Two dimensional equivalent of standard deviation
Compactness (dispersed vs. clustered)
Formula: \(S_{xy} = \sqrt{\frac{\sum (X_i - \overline{X})^2 + \sum (Y_i - \overline{Y})^2}{N}}\)
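The formula can be sketched directly in Python. The sample points are illustrative and `standard_distance` is a hypothetical helper name.

```python
import math

def standard_distance(points):
    """Square root of the mean squared distance of points from their mean center."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sum_sq_x = sum((x - x_bar) ** 2 for x, _ in points)
    sum_sq_y = sum((y - y_bar) ** 2 for _, y in points)
    return math.sqrt((sum_sq_x + sum_sq_y) / n)

# Illustrative point locations with mean center (4, 4)
points = [(5, 0), (2, 4), (5, 8)]
print(round(standard_distance(points), 3))  # 3.559
```

A smaller value indicates a more compact (clustered) distribution around the mean center; a larger value indicates a more dispersed one.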
2.12. Standard Deviational Ellipse#
Standard distance does not show directional trend (anisotropy)
The standard deviational ellipse measures both dispersion and directional trend
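The notes give no formula for the ellipse, so the following is only a sketch under one common assumption: the ellipse axes are taken from the variances and covariance of the X and Y coordinates, with the major axis along the direction of greatest variance. Scaling conventions for the axis lengths vary between software packages, and the function name and sample points are illustrative.

```python
import math

def standard_deviational_ellipse(points):
    """Return (center_x, center_y, major_sd, minor_sd, angle_deg).

    angle_deg is the orientation of the major axis, measured counterclockwise
    from the positive X axis. Axis lengths are one standard deviation along
    each principal direction (an assumed convention).
    """
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    s_xx = sum((x - x_bar) ** 2 for x, _ in points) / n
    s_yy = sum((y - y_bar) ** 2 for _, y in points) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points) / n

    # Eigenvalues of the 2x2 covariance matrix in closed form
    half_sum = (s_xx + s_yy) / 2
    half_diff = math.sqrt(((s_xx - s_yy) / 2) ** 2 + s_xy ** 2)
    major_sd = math.sqrt(half_sum + half_diff)
    minor_sd = math.sqrt(max(half_sum - half_diff, 0.0))

    # Orientation of the principal (largest-variance) axis
    angle_deg = math.degrees(0.5 * math.atan2(2 * s_xy, s_xx - s_yy))
    return x_bar, y_bar, major_sd, minor_sd, angle_deg

# Points lying along a diagonal line (illustrative): strong directional trend
points = [(0, 0), (1, 1), (2, 2), (3, 3)]
print(standard_deviational_ellipse(points))
```

For these collinear points the minor axis collapses to zero and the major axis is oriented at 45 degrees, a directional trend that the standard distance alone would not reveal.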
2.13. Standard Deviational Ellipse Example#
Example: Monthly mean centers of tornado touchdown locations from 1950 to 2004
2.14. Lab #1—Descriptive Spatial Statistics#