
# Statistical Foundation

by Trisha Holtzclaw on 1 February 2013

#### Transcript of Statistical Foundation

GIS 3015 Cartographic Skills: Statistical Foundation

All well-trained geographers need to be proficient in applying statistical techniques.

This module’s content is intended to provide an understandable introduction to statistical methods in a practical, problem-solving framework.

http://mediacrushllc.com/2012/internet-statistics-2012/

Analyzing Single Attributes (Univariate Analysis)

In univariate analysis (single-variable analysis), a variable is normally described by three main characteristics:

Distribution (raw tables, frequency distribution tables, histograms, bar graphs)

Central tendency (mean, median, mode)

Dispersion (standard deviation)

Analyzing Two or More Attributes (Bivariate Analysis)

Geostatistics - Analysis of Spatial Location

And to Conclude on a lighter note:

Introduction

In this lecture we will cover:

Main statistics used in map analysis

Different types of graphical representations of data.

Basic concepts of statistical analysis.

Here’s what you should focus on during this session:

Realize the use and importance of statistics in map making.

Learn the basics so when it is time to analyze data in a GIS (with the click of a button) you will know if the result created is true and correct.

Determine which statistic, graph, or data classification method will be best for a given data set.

Why Do We Need to Use Statistics?

Statistics extract meaning from your data.

The data you collect in the field or laboratory (physically or electronically) is not meaningful without statistics to extract meaning, and clear graphs, maps, and tables to communicate the meaning.

Figure: Excel water sample data set taken in Choctawhatchee Bay, FL area

Specifically, statistical analysis allows us to:

Identify and describe patterns in large amounts of data.

Predict based on patterns through space and time.

Explain the causes of relationships.

Exert influence over phenomena that we wish to control.

Statistical analysis is not the only way to examine a phenomenon (consider theoretical, mathematical, or physical modeling), but it is one of the most commonly used approaches.

http://en.wikipedia.org/wiki/Global_climate_model#Output_variables

Population and Sample

Population:

Defined as the total set of elements or things we could study.

Looking to the text’s example of murder rate, the population would be the entire population of people living in the study area.

Sample:

Only the portion of the population that is actually examined or collected.

Murder example: to extract a murder rate, the analyst is only interested in the sample of the entire population that has been murdered.

Descriptive Statistics

Statistical methods can be split into two categories: descriptive and inferential.

Descriptive statistics are what most people think about when they hear the word statistics. Descriptive statistics describe the characteristics of a set of data usually in the form of a numerical value. This category includes the following:

mean, median and mode

range and standard deviation

skewness and correlation

graphical representation of the data set

Inferential Statistics

Inferential statistics are used to make an inference about a population from a representative sample of the population. These statistics are expressed as a range of numbers along with a degree of confidence.

Tests of significance (hypothesis testing) test a claim about the population by analyzing the sample.

Distribution - Frequency Table

The distribution of a dataset can be shown by a frequency table. This is a summary of how often individual values or ranges of values occur for a variable. The simplest distribution would list every value of a variable and the number of persons who had each value. In most cases you would group the raw values into categories based on a range of values:

GPA based on letter grades

Number of students by grade/year

Gender based on number of males and females

Distribution - Histograms

Distribution can also be shown by using a histogram. The histogram shows the following information about a dataset:

Center of the data

Spread of the data

Skewness of the data

Presence of outliers

Presence of multiple modes in the data
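As a minimal sketch of the two ways of showing a distribution, the Python snippet below builds a frequency table that lists every value with its count, then groups the raw values into equal-width ranges, which is the counting step behind a histogram. The scores, the bin width of 10, and the function names are made up for illustration and do not come from the lecture.

```python
from collections import Counter


def frequency_table(values):
    """Count how many times each distinct value occurs."""
    return dict(sorted(Counter(values).items()))


def grouped_frequency(values, bin_width):
    """Count values per half-open range [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return {(start, start + bin_width): n for start, n in sorted(counts.items())}


# Hypothetical exam scores, used only to illustrate the idea.
scores = [55, 62, 67, 71, 71, 74, 78, 83, 85, 85, 85, 92]

print(frequency_table(scores))
print(grouped_frequency(scores, 10))
```

Plotting the grouped counts as adjacent bars gives the histogram; the choice of bin width changes how much of the center, spread, and skewness is visible.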

Histograms are compared with a hypothetical normal distribution (bell-shaped).

Central Tendency

Measures of central tendency are used to indicate a value around which the data are most likely concentrated. Three measures are commonly recognized:

Mode

Median

Mean

Dispersion

Measures of dispersion provide an indication of how data are spread along the number line. The two common measures used are the range and the standard deviation.
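The measures of central tendency and dispersion named so far can be computed with Python's standard `statistics` module, as in this minimal sketch. The data values are hypothetical. One detail worth noticing is that `statistics.stdev` treats the data as a sample and divides by N - 1, while `statistics.pstdev` treats the data as a whole population and divides by N.

```python
import statistics

# Hypothetical measurements, used only to illustrate the calculations.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mode = statistics.mode(values)      # most frequently occurring value
median = statistics.median(values)  # middle value of the ordered data
mean = statistics.mean(values)      # sum of the values / number of values

value_range = max(values) - min(values)    # highest value minus lowest value
sample_sd = statistics.stdev(values)       # sample: divides by N - 1
population_sd = statistics.pstdev(values)  # population: divides by N

print(mode, median, mean, value_range)
print(sample_sd, population_sd)
```

Because the sample version divides by a smaller number, it is always slightly larger than the population version for the same data.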

Range - the highest value minus the lowest value.

Standard deviation - a more reliable measure of dispersion than the range, since a single outlier can greatly affect the range. The standard deviation shows how far the values in the data set typically lie from the mean of the sample.

Univariate Analysis Example

Histograms

Mode - the most frequently occurring value.

Median - the middle value in an ordered set of data.

Mean - the average of the data, calculated by summing all the values and dividing by the number of values.

Normal Distribution

Standard Deviation of a sample

Bivariate analysis involves two or more variables and deals with causes and relationships between the variables.

Primarily we use correlation and regression analysis in social sciences.

Correlation

Correlation Coefficient (r)

The correlation coefficient (r) is a real number between -1 and 1 that measures the strength of a correlation between two variables.

R values

Positive r value = positive relationship

Negative r value = negative relationship

The sign of r is used to indicate the direction of the relationship.

Limitations of Correlation

Outliers can strongly influence the correlation coefficient.

High correlations do not necessarily imply a causal relationship

Geographical data tend to be spatially correlated.

R² values

The strength of a relationship is calculated by r² (R squared) - the coefficient of determination.

The value of r² ranges from 0 to 1.0 and can be multiplied by 100 to obtain a percentage.

The closer the r² value is to 1, the more of the variation in one variable can be explained by the other.

Regression

Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data.

It is important to first determine if there is a relationship between the variables. This is done by using a scatterplot and finding the correlation coefficient.
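The two steps described here, checking the correlation and then fitting a line, can be sketched in plain Python. This is an illustration under stated assumptions: the paired observations are hypothetical, the function names are invented for the example, and the line Y = a + bX is fitted by ordinary least squares, which is the usual method but is not named in the lecture.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line Y = a + bX."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical paired observations, used only to illustrate the steps.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
a, b = fit_line(x, y)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
print(f"Y = {a:.2f} + {b:.2f}X")
```

The sign of `r` gives the direction of the relationship, and `r * r` is the share of the variation in Y that the fitted line accounts for.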

A linear regression line has an equation of Y = a + bX, where a is the intercept and b is the slope.

Geostatistics

Since geographers are concerned with the analysis of locational data, descriptive spatial statistics, referred to as geostatistics, are often used to describe the degree of spatial variability of some phenomenon.

Geostatistics: Mean Center

Calculated by averaging the X and Y coordinates separately.

The mean center is strongly affected by points at extreme coordinate locations (outliers).

Therefore, while the mean center represents an average location, it may not represent a "typical" location.

The mean center may be considered the center of gravity of a point pattern or spatial distribution.

Geostatistics: Standard Distance

Measures the amount of absolute dispersion in a point pattern.

Just as the mean center serves as the locational equivalent to the mean, standard distance is the spatial equivalent of standard deviation.

Standard distance is strongly influenced by extreme locations.
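A minimal sketch of the mean center and standard distance in Python follows. The point coordinates and function names are hypothetical, and the optional `weights` argument is an assumption added to show how a weighted version could work. Standard distance is computed here as the square root of the mean squared distance of the points from the mean center.

```python
import math


def mean_center(points, weights=None):
    """Average the X and Y coordinates separately (optionally weighted)."""
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    cx = sum(w * x for (x, _), w in zip(points, weights)) / total
    cy = sum(w * y for (_, y), w in zip(points, weights)) / total
    return cx, cy


def standard_distance(points, weights=None):
    """Root of the mean squared distance of the points from the mean center."""
    if weights is None:
        weights = [1.0] * len(points)
    cx, cy = mean_center(points, weights)
    total = sum(weights)
    squared = sum(
        w * ((x - cx) ** 2 + (y - cy) ** 2)
        for (x, y), w in zip(points, weights)
    )
    return math.sqrt(squared / total)


# Hypothetical point locations forming a small square.
pts = [(0, 0), (2, 0), (0, 2), (2, 2)]
print(mean_center(pts), standard_distance(pts))

# The same pattern with one point added at an extreme location.
with_outlier = pts + [(20, 20)]
print(mean_center(with_outlier), standard_distance(with_outlier))
```

Adding the single extreme point pulls the mean center away from the square and inflates the standard distance, which is the sensitivity to outliers noted for both measures.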

Weighted standard distance is used for those geographic applications that require a weighted mean center.

Geostatistics: Spatial Autocorrelation

The tendency for like things to occur near one another in geographic space.

Prezi Tutorial

Audio Clips

Most slides in this presentation include audio clips of the lecture material. If you place your mouse at the bottom of the screen you will see the playback controls. Press play and pause to hear/stop the audio.

Slides

To progress to the next slide, click the right arrow found at the bottom-center of your screen.

As the audio plays, you can pan and zoom around the presentation. The audio will only end if:

you press the pause button

you go to the next slide

or it runs to the end

More Controls

To return to the previous slide, click the left arrow, found at the bottom-center of the screen.

Use this button if you accidentally pan or zoom into nothingness (or after clicking the Home button in the next step).

You can also view the presentation as a whole by clicking the Home button, and zoom in and out using the mouse wheel or the zoom buttons.

Hold down the left mouse button to pan around the presentation.

You can click on text, images, or slides to zoom directly to them. Manually zooming to text and images will not interrupt the audio, but clicking on items will pause audio.

Note: Depending on the dimensions of your screen, these arrows might be pointing at nothing!

www.dilbert.com

Divide by N when calculating Standard Deviation of a Population!

Geostatistics

Case 1: Clustered

Case 2: Dispersed

Case 3: Random

Clustered patterns (Case 1) exhibit positive spatial autocorrelation.

Dispersed patterns (Case 2) exhibit negative spatial autocorrelation.

Random patterns (Case 3) have no spatial autocorrelation.
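One way to put a number on clustered versus dispersed patterns is Moran's I, a common measure of spatial autocorrelation. The lecture does not name a specific statistic, so treat the choice of Moran's I as an assumption of this sketch, along with the six-cell row, the neighbor rule (each cell neighbors the cells directly beside it), and the function name.

```python
def morans_i(values, neighbors):
    """Moran's I for values with binary neighbor weights.

    neighbors[i] lists the indices treated as adjacent to location i.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of cross-products over every neighboring pair (weight = 1).
    cross = sum(dev[i] * dev[j] for i in range(n) for j in neighbors[i])
    total_weight = sum(len(js) for js in neighbors)
    return (n / total_weight) * cross / sum(d * d for d in dev)


# Hypothetical row of six cells; each cell neighbors the cells beside it.
adjacent = [[j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)]

clustered = [1, 1, 1, 0, 0, 0]  # like values sit next to each other
dispersed = [1, 0, 1, 0, 1, 0]  # like values avoid each other

print(morans_i(clustered, adjacent))  # positive value
print(morans_i(dispersed, adjacent))  # negative value
```

A random arrangement of the same values would be expected to give a result close to zero.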
