Untitled Prezi
by Cholo Manalo, 5 June 2014

Transcript of Untitled Prezi

Statistics: The Researcher's Helping Hand
Radiation shielding using Pterygoplichthys disjunctivus' bones and cartilages
Statistical treatment to be used: Analysis of Variance (ANOVA)

Sta-tis-tics (n.)
The science of data. It involves collecting, classifying, summarizing, organizing, analyzing, and interpreting numerical information (McClave et al., 2011).
Basic Concepts
  • Data
  • Measurement (levels: nominal, ordinal, interval, ratio)

Descriptive Statistics
  • Measures of central tendency
  • Measures of variability
  • Measures of shape
  • Measures of association

Statistical Inferences
  • Sampling
  • Hypothesis testing for a single population
  • Hypothesis testing for two populations
  • Analysis of variance
  • Categorical data

Sample vs. population
Parameter vs. statistic
Descriptive vs. inferential
Quantitative vs. qualitative

Non-Parametric Statistics
  • Runs test
  • Mann-Whitney U test
  • Kruskal-Wallis test
  • Friedman test
  • Wilcoxon matched-pairs signed-rank test
  • Spearman's rank correlation

Statistics in a nutshell

Sample
A portion of a whole and, if properly taken, is representative of the whole (Black, 2013).
Population
A set of units (usually people, objects, transactions or events) that we are interested in studying.
Parameter
A descriptive measure of the population.
Statistic
A descriptive measure of the sample.
Descriptive statistics includes statistical procedures that we use to describe the population we are studying.
Inferential statistics is concerned with making predictions or inferences about a population from observations and analyses of a sample.
Quantitative data are measurements that are recorded on a naturally occurring numerical scale.
Qualitative data are measurements that cannot be measured on a natural numerical scale; they can only be classified into one of a group of categories.
Ratio
The highest level of data measurement. Same properties as interval data, but with an absolute zero.
(e.g. height, weight, time, the Kelvin scale)
Interval
The distances between consecutive numbers have meaning and the data are always numerical.
(e.g. the Celsius and Fahrenheit scales)
Ordinal
Refers to quantities that have a natural ordering: the ranking of favorite sports, the order of people's places in a line, the order of runners finishing a race or, more often, a choice on a rating scale from 1 to 5.
Nominal
Refers to categorically discrete data such as the name of your school, the type of car you drive or the name of a book.
Mean
The mean (or the average) is equal to the sum of all the values in the data set divided by the number of values in the data set.

Population mean: μ
Sample mean: x̄
Median
The median is the middle score for a set of data that has been arranged in order of magnitude.
Mode
The mode is the most frequent score in our data set.
Percentile
Divides a group of data into 100 parts.
Widely used in reporting test results.

Quartiles
Divides a group of data into four parts.
When to use (worked example below)
Nominal - Mode
Ordinal - Median
Interval/Ratio (not skewed) - Mean
Interval/Ratio (skewed) - Median
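To make the measures above concrete, here is a brief illustrative Python sketch (added for this transcript; the data values are invented):

    # Descriptive statistics for a small made-up sample
    import numpy as np
    from statistics import mode

    data = [12, 15, 15, 18, 21, 22, 22, 22, 25, 30]

    print("Mean:", np.mean(data))                    # sum of values / number of values
    print("Median:", np.median(data))                # middle value of the ordered data
    print("Mode:", mode(data))                       # most frequent value (22 here)
    print("90th percentile:", np.percentile(data, 90))
    print("Quartiles:", np.percentile(data, [25, 50, 75]))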
Range
The range of a data set is the difference between the largest and smallest data values.
Variance
Determines how close the data in the distribution are to the middle of the distribution. Using the mean as the measure of the middle of the distribution, the variance is defined as the average squared difference of the scores from the mean.
Standard deviation
The standard deviation is simply the square root of the variance. This is an especially useful measure of variability when the distribution is normal or approximately normal.
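A short illustrative sketch (same invented data as above) of range, sample variance, and sample standard deviation with NumPy; ddof=1 gives the n-1 (sample) versions:

    import numpy as np

    data = np.array([12, 15, 15, 18, 21, 22, 22, 22, 25, 30])  # made-up values

    data_range = data.max() - data.min()   # largest minus smallest value
    variance = np.var(data, ddof=1)        # average squared deviation from the mean (sample form)
    std_dev = np.std(data, ddof=1)         # square root of the variance

    print(data_range, variance, std_dev)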
Correlation
A measure of the degree of relatedness of the variables.
Pearson Product-Moment Correlation
Measures the linear correlation of two (sample) variables.
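As a hedged illustration (the two variables and their values are hypothetical, e.g. shielding thickness versus transmitted radiation), the Pearson correlation can be computed with SciPy:

    from scipy import stats

    thickness = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical shielding thickness
    transmitted = [9.8, 8.1, 6.9, 5.2, 4.1]   # hypothetical transmitted radiation readings

    r, p_value = stats.pearsonr(thickness, transmitted)
    print(r, p_value)   # r near -1 here: a strong negative linear relationship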
Skewness: an indicator used in distribution analysis as a sign of asymmetry and deviation from a normal distribution.

Interpretation:
Skewness > 0 - Right-skewed distribution: most values are concentrated on the left of the mean, with extreme values to the right.
Skewness < 0 - Left-skewed distribution: most values are concentrated on the right of the mean, with extreme values to the left.
Skewness = 0 - mean = median; the distribution is symmetrical around the mean.
Kurtosis: an indicator used in distribution analysis as a sign of the flattening or "peakedness" of a distribution.

Interpretation:
Kurtosis > 3 - Leptokurtic distribution: sharper than a normal distribution, with values concentrated around the mean and thicker tails. This means a high probability of extreme values.
Kurtosis < 3 - Platykurtic distribution: flatter than a normal distribution, with a wider peak. The probability of extreme values is less than for a normal distribution, and the values are spread more widely around the mean.
Kurtosis = 3 - Mesokurtic distribution: the normal distribution, for example.
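A minimal SciPy sketch with simulated data; note that scipy.stats.kurtosis returns excess kurtosis (normal = 0) by default, so fisher=False is passed to match the "normal = 3" convention used above:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    sample = rng.normal(loc=0.0, scale=1.0, size=1000)   # simulated, roughly normal data

    print("Skewness:", stats.skew(sample))                     # near 0 for a symmetric distribution
    print("Kurtosis:", stats.kurtosis(sample, fisher=False))   # near 3 for a normal distribution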
Probability methods (a short sampling sketch follows this list)
Simple random sampling - The whole population is available.
Stratified sampling (random within target groups) - There are specific sub-groups to investigate (e.g. demographic groupings).
Systematic sampling (every nth person) - When a stream of representative people is available (e.g. in the street).
Cluster sampling (all in limited groups) - When population groups are separated and access to all is difficult, e.g. in many distant cities.
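This sketch is illustrative only (a hypothetical population of 1,000 numbered units): simple random and systematic sampling are shown directly, and stratified or cluster sampling apply the same random draw within each sub-group or cluster.

    import numpy as np

    rng = np.random.default_rng(42)
    population = np.arange(1000)            # hypothetical numbered units

    # Simple random sampling: every unit has an equal chance of selection
    simple_random = rng.choice(population, size=50, replace=False)

    # Systematic sampling: every 20th unit after a random start
    start = rng.integers(0, 20)
    systematic = population[start::20]      # 1000 / 20 = 50 units

    print(simple_random[:5], systematic[:5])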
Quota methods
Quota sampling (get only as many as you need) - You have access to a wide population, including sub-groups.

Proportionate quota sampling (in proportion to population sub-groups) - You know the population distribution across groups, and normal sampling may not give enough in minority groups.

Non-proportionate quota sampling (minimum number from each sub-group) - There is likely to be a wide variation in the studied characteristic within minority groups.

Selective methods
Purposive sampling (based on intent) - You are studying particular groups.

Expert sampling (seeking 'experts') - You want expert opinion.

Snowball sampling (ask for recommendations) - You seek similar subjects (e.g. young drinkers).

Modal instance sampling (focus on 'typical' people) - When the sought 'typical' opinion may get lost in a wider study, and when you are able to identify the 'typical' group.

Diversity sampling (deliberately seeking variation) - You are specifically seeking differences, e.g. to identify sub-groups or potential conflicts.
Convenience methods
Snowball sampling (ask for recommendations) - You are ethically and socially able to ask for and seek similar subjects.

Convenience sampling (use whoever is available) - You cannot proactively seek out subjects.

Judgment sampling (guess a good-enough sample) - You are an expert and there is no other choice.
Data types that can be analyzed with z-tests
data points should be independent from each other
the z-test is preferable when n is greater than 30
the distributions should be normal if n is low; if, however, n > 30, the distribution of the data does not have to be normal
the variances of the samples should be the same (F-test)
all individuals must be selected at random from the population
all individuals must have an equal chance of being selected
sample sizes should be as equal as possible, but some differences are allowed

Data types that can be analyzed with t-tests
data sets should be independent from each other, except in the case of the paired-sample t-test
where n < 30, the t-test should be used
the distributions should be normal for the equal- and unequal-variance t-tests (K-S test or Shapiro-Wilk)
the variances of the samples should be the same (F-test) for the equal-variance t-test
all individuals must be selected at random from the population
all individuals must have an equal chance of being selected
sample sizes should be as equal as possible, but some differences are allowed
A worked two-sample comparison is sketched below.
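The comparison below uses invented measurements: SciPy's ttest_ind for the t-test, plus a hand-computed large-sample z statistic shown only for form (the groups here are smaller than 30). The p-values are judged against the chosen significance level, commonly 0.05.

    import numpy as np
    from scipy import stats

    group_a = np.array([5.1, 4.8, 5.5, 5.0, 4.9, 5.3, 5.2, 5.4])   # hypothetical measurements
    group_b = np.array([4.5, 4.7, 4.4, 4.8, 4.6, 4.5, 4.9, 4.3])

    # Equal-variance (Student's) t-test; use equal_var=False for the unequal-variance (Welch) form
    t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=True)
    print("t =", t_stat, "p =", p_value)

    # Large-sample z statistic (appropriate when n > 30 per group)
    se = np.sqrt(group_a.var(ddof=1) / len(group_a) + group_b.var(ddof=1) / len(group_b))
    z = (group_a.mean() - group_b.mean()) / se
    print("z =", z, "p =", 2 * stats.norm.sf(abs(z)))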
ANOVA is a statistical technique that performs a similar function to the t-test, but it is capable of dealing with more than two levels or more than one factor.
If the calculated test statistic exceeds the tabulated critical value (e.g. t_calc > t_tab), we reject the null hypothesis.
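Since ANOVA is the statistical treatment chosen for this study, here is a hedged one-way ANOVA sketch with SciPy; the three groups and their values are purely hypothetical (e.g. radiation readings behind three shielding preparations):

    from scipy import stats

    # Hypothetical measurements for three treatment groups
    group_1 = [20.1, 19.8, 20.5, 20.0, 19.7]
    group_2 = [18.9, 19.2, 18.7, 19.0, 18.8]
    group_3 = [21.0, 20.8, 21.3, 20.9, 21.1]

    f_stat, p_value = stats.f_oneway(group_1, group_2, group_3)
    print("F =", f_stat, "p =", p_value)   # if F_calc > F_tab (equivalently p < 0.05), reject the null hypothesis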
Chi-square goodness-of-fit test
Used to compare observed data with the data we would expect to obtain according to a specific hypothesis.

The chi-square test always tests the null hypothesis, which states that there is no significant difference between the expected and observed results.

Chi-square requires that you use raw numerical counts, not percentages or ratios.

For example, if, according to Mendel's laws, you expected 10 of 20 offspring from a cross to be male and the actual observed number was 8 males, then you might want to know about the "goodness of fit" between the observed and expected counts. Is the deviation due to chance or to other factors?
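The Mendel-style example above can be run directly with SciPy's goodness-of-fit test (observed 8 males and 12 females versus an expected 10/10 split):

    from scipy import stats

    observed = [8, 12]    # observed males and females out of 20 offspring
    expected = [10, 10]   # expected counts under the hypothesised 1:1 ratio

    chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
    print(chi2, p_value)  # a large p-value means the deviation is consistent with chance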
Chi-Square Test for Independence
The test is applied when you have two categorical variables from a single population.

It is used to determine whether there is a significant association between the two variables.

For example, in an election survey, voters might be classified by gender (male or female) and voting preference (Democrat, Republican, or Independent). We could use a chi-square test for independence to determine whether gender is related to voting preference.
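A sketch of the gender-by-voting-preference example with a made-up contingency table and SciPy:

    from scipy import stats

    # Rows: male, female; columns: Democrat, Republican, Independent (hypothetical counts)
    observed = [[120, 90, 40],
                [110, 95, 45]]

    chi2, p_value, dof, expected = stats.chi2_contingency(observed)
    print(chi2, p_value, dof)   # a small p-value would suggest gender and preference are associated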
Runs test
The runs test can be used to decide if a data set is from a random process.

Purpose: Detect Non-Randomness

Randomness is one of the key assumptions in determining if a univariate statistical process is in control.

If the randomness assumption is not valid, then a different model needs to be used.
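For illustration, a minimal hand-rolled runs test (runs above and below the median) on simulated data, using the large-sample normal approximation for the number of runs; this is a sketch of the idea rather than a production implementation:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    x = rng.normal(size=60)                      # simulated process data

    above = x > np.median(x)                     # True = above the median, False = below
    runs = 1 + np.sum(above[1:] != above[:-1])   # count the changes of state, plus the first run
    n1, n2 = above.sum(), (~above).sum()

    # Expected number of runs and its variance under randomness
    expected = 2 * n1 * n2 / (n1 + n2) + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    z = (runs - expected) / np.sqrt(variance)
    print(runs, z, 2 * stats.norm.sf(abs(z)))    # a small p-value suggests non-randomness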
Parametric
Typical data: Ratio or Interval
Non-parametric
Typical data: Ordinal or Nominal
The Wilcoxon signed-rank test is the nonparametric test equivalent to the dependent t-test. It is used to compare two sets of scores that come from the same participants.
As an example, ANOVA is a parametric method, while the Kruskal-Wallis test is the corresponding non-parametric method, which is used when the assumption of normality required by ANOVA is rejected.
The Mann-Whitney U test is used to compare differences between two independent groups when the dependent variable is either ordinal or continuous, but not normally distributed.

It is often considered the nonparametric alternative to the independent t-test, although this is not always the case.
Unlike the independent-samples t-test, the Mann-Whitney U test allows you to draw different conclusions about your data depending on the assumptions you make about your data's distribution.
For example, you could use the Mann-Whitney U test to understand whether attitudes towards pay discrimination, where attitudes are measured on an ordinal scale, differ based on gender (i.e., your dependent variable would be "attitudes towards pay discrimination" and your independent variable would be "gender", which has two groups: "male" and "female").
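A hedged sketch of the gender/attitude example with invented ordinal scores and SciPy:

    from scipy import stats

    # Hypothetical ordinal attitude scores (e.g. 1-7) for two independent groups
    male_scores = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3]
    female_scores = [5, 6, 4, 5, 6, 5, 4, 6, 5, 5]

    u_stat, p_value = stats.mannwhitneyu(male_scores, female_scores, alternative="two-sided")
    print(u_stat, p_value)   # a small p-value suggests the two groups differ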
The Wilcoxon signed-rank test applies when we wish to investigate a change in scores from one time point to another, or when individuals are subjected to more than one condition.
For example, you could use a Wilcoxon signed-rank test to understand whether there was a difference in smokers' daily cigarette consumption before and after a 6-week hypnotherapy programme.
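The hypnotherapy example might be sketched as follows (the cigarette counts are invented); SciPy's wilcoxon works on the paired differences:

    from scipy import stats

    # Hypothetical daily cigarette counts for the same 8 smokers, before and after the programme
    before = [20, 25, 18, 30, 22, 27, 19, 24]
    after = [16, 20, 17, 25, 21, 22, 15, 20]

    w_stat, p_value = stats.wilcoxon(before, after)
    print(w_stat, p_value)   # a small p-value suggests consumption changed after the programme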
Non-parametric alternative to the one-way ANOVA.
A one-way ANOVA may yield inaccurate estimates of the P-value when the data are very far from normally distributed. The Kruskal-Wallis test does not make assumptions about normality.
Example: possible differences in graded performance on three separate activities (e.g., final examination score, composite score for all homework problems, final project score) in a high school Logo programming language class.
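A sketch of the three-activity grading example with made-up scores:

    from scipy import stats

    # Hypothetical scores for the three graded activities
    final_exam = [78, 85, 69, 92, 74, 81]
    homework = [88, 90, 85, 95, 87, 91]
    project = [70, 75, 68, 80, 72, 74]

    h_stat, p_value = stats.kruskal(final_exam, homework, project)
    print(h_stat, p_value)   # a small p-value suggests the activities are graded differently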
The Friedman test is the non-parametric alternative to the one-way ANOVA with repeated measures.
Example:
A researcher wants to examine whether music has an effect on the perceived psychological effort required to perform an exercise session. The researcher recruited 12 runners, who each ran three times on a treadmill for 30 minutes.
At the end of each run, subjects were asked to record how hard the running session felt on a scale of 1 to 10, with 1 being easy and 10 extremely hard. A Friedman test was then carried out to see if there were differences in perceived effort based on music type.
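The music-and-effort design (12 runners, three conditions) could be analysed as below; the condition names and effort ratings are invented, and each list holds one rating per runner in the same order:

    from scipy import stats

    # Hypothetical perceived-effort ratings (1-10) for the same 12 runners under three music conditions
    no_music = [7, 8, 6, 7, 9, 8, 7, 6, 8, 7, 9, 8]
    classical = [6, 7, 5, 6, 8, 7, 6, 5, 7, 6, 8, 7]
    dance = [5, 6, 5, 5, 7, 6, 5, 4, 6, 5, 7, 6]

    chi2, p_value = stats.friedmanchisquare(no_music, classical, dance)
    print(chi2, p_value)   # a small p-value suggests perceived effort depends on music type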

A nonparametric version of the Pearson product-moment correlation. Spearman's correlation coefficient (also signified by rs) measures the strength of association between two ranked variables.
Example:
In a competition or contest, the Spearman rank correlation coefficient can indicate whether the judges agree with each other's views as far as the talent of the contestants is concerned (though they might award different numerical scores) - in other words, whether the judges are unanimous.
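A sketch of the two-judges example with invented scores; because Spearman's rs works on ranks, the judges only need to agree on the ordering of contestants, not on the exact numbers:

    from scipy import stats

    # Hypothetical scores given by two judges to the same 8 contestants
    judge_1 = [9.0, 7.5, 8.0, 6.0, 9.5, 5.5, 7.0, 8.5]
    judge_2 = [8.5, 7.0, 7.8, 6.5, 9.0, 5.0, 6.8, 8.0]

    rs, p_value = stats.spearmanr(judge_1, judge_2)
    print(rs, p_value)   # rs near +1: the judges rank the contestants almost identically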