**Data Analysis**

Collecting data

The investigative techniques we will look at for collecting data are census, sampling and observation.

Task 1

Research and record the meanings of, and the differences between, these data collection techniques.

CATEGORICAL VARIABLE

A categorical variable is a variable whose values are categories.

Examples: blood group is a categorical variable; its values are A, B, AB or O. So too is the construction type of a house; its values might be brick, concrete, timber or steel.

Categories may have numerical labels, for example, for the variable postcode the category labels would be numbers like 3787, 5623, 2016, etc, but these labels have no numerical significance. For example, it makes no sense to use these numerical labels to calculate the average postcode in Australia.
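A short Python sketch may help make this point concrete. The survey responses below are made up for illustration, and the variable names are assumptions rather than part of the glossary. Counting categories and finding the most common one are meaningful operations; averaging postcode labels produces a number, but not one that means anything.

```python
from collections import Counter
from statistics import mean, mode

# Hypothetical survey responses (invented data, for illustration only).
blood_groups = ["A", "O", "B", "O", "AB", "O", "A"]
postcodes = [3787, 5623, 2016, 3787]

# Counting how often each category occurs is meaningful.
blood_group_counts = Counter(blood_groups)

# So is asking which category occurs most often (the mode).
most_common_group = mode(blood_groups)

# This runs, but the result is not a real postcode and tells us nothing:
# the numbers are labels, not quantities.
meaningless_average = mean(postcodes)

print(blood_group_counts)
print(most_common_group)
print(meaningless_average)
```

The same check works for any variable: if adding or averaging its values gives a result with no sensible interpretation, it is best treated as categorical.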

CENSUS

A census is an attempt to collect information about the whole population.

CONTINUOUS VARIABLE

A continuous variable is a numerical variable that can take any value that lies within an interval. In practice, the values taken are subject to the accuracy of the measurement instrument used to obtain these values.

Examples include height, reaction time to a stimulus and systolic blood pressure.
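As a rough illustration of how instrument accuracy limits recorded values, the Python sketch below rounds one hypothetical "true" height to the precision of two imagined instruments. The function name `record`, the height value and the instrument precisions are all assumptions made for this example.

```python
def record(true_value_cm: float, decimal_places: int) -> float:
    """Round a measurement to the number of decimal places an instrument shows."""
    return round(true_value_cm, decimal_places)


# Hypothetical exact height; in reality it could be any value in an interval.
true_height_cm = 172.3468

# An instrument that reads to the nearest centimetre.
nearest_cm = record(true_height_cm, 0)

# An instrument that reads to the nearest millimetre (0.1 cm).
nearest_mm = record(true_height_cm, 1)

print(nearest_cm)
print(nearest_mm)
```

The variable is still continuous; only the recorded values are restricted by the instrument.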

DATA

Data is a general term for a set of observations and measurements collected during any type of systematic investigation.

Primary data is data collected by the user. Secondary data is data collected by others. Sources of secondary data include web-based data sets, the media, books, scientific papers, etc.

NUMERICAL VARIABLES

Numerical variables are variables whose values are numbers, and for which arithmetic processes such as adding and subtracting, or calculating an average, make sense.

A discrete numerical variable is a numerical variable, each of whose possible values is separated from the next by a definite 'gap'. The most common discrete numerical variables have the counting numbers 0, 1, 2, 3, … as possible values. Others are prices, measured in dollars and cents.

Examples include the number of children in a family or the number of days in a month.

POPULATION

A population is the complete set of individuals, objects, places, etc, that we want information about. A census is an attempt to collect information about the whole population.

SAMPLE

A sample is part of a population. It is a subset of the population, often randomly selected for the purpose of estimating the value of a characteristic of the population as a whole.

For instance, a randomly selected group of eight-year-old children (the sample) might be selected to estimate the incidence of tooth decay in eight-year-old children in Australia (the population).
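The difference between a census and a sample can be sketched in Python. Everything below is hypothetical: the population size, the 30% decay rate and the sample size of 200 are invented for the example and are not real statistics.

```python
import random

random.seed(1)  # fixed seed so repeated runs give the same sample

# Hypothetical population of 10 000 children:
# 1 = has tooth decay, 0 = does not. The 30% rate is invented.
population = [1] * 3000 + [0] * 7000

# A census examines every member of the population.
census_rate = sum(population) / len(population)

# A sample examines a randomly selected subset and uses it
# to estimate the same characteristic.
sample = random.sample(population, 200)
sample_rate = sum(sample) / len(sample)

print(f"census rate: {census_rate:.3f}")
print(f"sample estimate: {sample_rate:.3f}")
```

The sample estimate will usually be close to, but not exactly equal to, the census value, and it changes from one random sample to the next.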

VARIABLE (STATISTICS)

A variable is something measurable or observable that is expected to change either over time or between individual observations.

Examples of variables in statistics include the age of students, their hair colour, and a playing field's length or shape.

http://www.abs.gov.au/browse?opendocument&ref=topBar

Australian Bureau of Statistics

METALANGUAGE GLOSSARY

Frequency histograms and polygons

Frequency distribution tables

Box plots

Worksheets

Dot plots

Shape of data distribution

Software

Excel to R Studio

http://www.wolframalpha.com/widgets/view.jsp?id=25d70df1dbf954506a4f3015a26d03ea

Widgets

Rolling dice

Formulas

Uniform or rectangular : All bars are equal in height

Normal or Gaussian : Symmetric - Mean = median = mode

Positive skew (skewed right) : Tail points toward positive numbers

Negative skew (skewed left) : Tail points toward negative numbers

Bimodal : 2 modes - The larger is the major mode; the other is the minor mode. The difference between the major and minor mode is called the amplitude. The least frequent value between the modes is called the antimode.

Multimodal : 2 or more modes
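One possible way to check these shapes numerically is sketched below in Python, using only the standard `statistics` module. The function names, the small data sets and the tolerance of 0.1 are assumptions chosen for this illustration. Skew direction is judged here with the moment coefficient of skewness, which is one of several common measures, and `multimode` lists every value tied for most frequent.

```python
from statistics import mean, multimode, pstdev


def skewness(values):
    """Moment coefficient of skewness: 0 for a perfectly symmetric data set."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation; must not be 0
    return sum((x - m) ** 3 for x in values) / (len(values) * s ** 3)


def describe_shape(values, tolerance=0.1):
    """Label the direction of skew; the tolerance is an arbitrary choice."""
    g = skewness(values)
    if g > tolerance:
        return "positive skew"
    if g < -tolerance:
        return "negative skew"
    return "roughly symmetric"


# Small made-up data sets.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]  # long tail toward large values
two_peaks = [1, 1, 1, 2, 3, 4, 4, 4]      # symmetric, with two modes

print(describe_shape(right_tailed))
print(describe_shape(two_peaks))
print(multimode(two_peaks))
```

A data set can be symmetric and still have more than one mode, so it is worth checking skew and the number of modes separately.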

Videos

e.g. A bivariate, multimodal distribution

Prezi Link

http://prezi.com/tus4y5irw1zq/?utm_campaign=share&utm_medium=copy