**Statistics: The researcher's helping hand**

Radiation shielding using Pterygoplichthys disjunctivus’ bones and cartilages

Statistical treatment to be used: Analysis of Variance

Sta-tis-tics (n.)

The science of data. It involves collecting, classifying, summarizing, organizing, analyzing, and interpreting numerical information (McClave et al., 2011).

**Statistics in a Nutshell**

**Basic Concepts**

Sample vs. population; parameter vs. statistic; descriptive vs. inferential; quantitative vs. qualitative

**Data Measurement**

Nominal, Ordinal, Interval, Ratio

**Descriptive Statistics**

Measures of central tendency; measures of variability; measures of shape; measures of association

**Statistical Inferences**

Sampling; hypothesis testing for a single population; hypothesis testing for two populations; analysis of variance; categorical data

**Non-parametric Statistics**

Runs test; Mann-Whitney U test; Kruskal-Wallis test; Friedman test; Wilcoxon matched-pairs signed-rank test; Spearman's rank correlation

Sample

A portion of a whole and, if properly taken, representative of the whole (Black, 2013).

Population

A set of units (usually people, objects, transactions, or events) that we are interested in studying.

Parameter

A descriptive measure of the population.

Statistic

A descriptive measure of the sample.

Descriptive statistics includes statistical procedures that we use to describe the population we are studying.

Inferential statistics is concerned with making predictions or inferences about a population from observations and analyses of a sample.

Quantitative data are measurements that are recorded on a naturally occurring numerical scale.

Qualitative data are measurements that cannot be made on a natural numerical scale; they can only be classified into one of a group of categories.

Ratio

The highest level of data measurement. Same properties as interval data, but with an absolute zero.

(e.g. height, weight, time, Kelvin temperature)

Interval

The distances between consecutive numbers have meaning and the data are always numerical.

(e.g. Celsius and Fahrenheit Scales)

Ordinal

Refers to quantities that have a natural ordering: the ranking of favorite sports, the order of people's place in a line, the order of runners finishing a race, or, more often, the choice on a rating scale from 1 to 5.

Nominal

Refers to categorically discrete data, such as the name of your school, the type of car you drive, or the name of a book.

Mean

The mean (or the average) is equal to the sum of all the values in the data set divided by the number of values in the data set.

Population mean: μ

Sample mean: x̄

Median

The median is the middle score for a set of data that has been arranged in order of magnitude.

Mode

The mode is the most frequent score in the data set.
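The three measures above can be computed directly with Python's standard-library statistics module; a minimal sketch with an invented data set:

```python
# Mean, median, and mode with the standard library.
# The data set is invented for illustration.
import statistics

scores = [4, 7, 7, 8, 10, 12, 15]

mean = statistics.mean(scores)      # sum of values / number of values
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value

print(mean, median, mode)  # 9 8 7
```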

Percentile

Divides a group of data into 100 parts.

Widely used in reporting test results.

Quartiles

Divides a group of data into four parts.

When to use:

Nominal - Mode

Ordinal - Median

Interval/Ratio (not skewed) - Mean

Interval/Ratio (skewed) - Median

Range

The range of a data set is the difference between the largest and smallest data values.

Variance

Determines how close the data in the distribution are to the middle of the distribution. Using the mean as the measure of the middle of the distribution, the variance is defined as the average squared difference of the scores from the mean.

Standard deviation

The standard deviation is simply the square root of the variance. This is an especially useful measure of variability when the distribution is normal or approximately normal.
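Both definitions can be sketched with the standard-library statistics module (the data are invented); pvariance and pstdev divide by n, matching the "average squared difference" definition above:

```python
# Population variance (average squared deviation from the mean)
# and its square root, the standard deviation. Data invented.
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]     # mean is 5

var_p = statistics.pvariance(data)  # sum((x - mean)^2) / n
sd_p = statistics.pstdev(data)      # square root of the variance

print(var_p, sd_p)  # 4 2.0
```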

Correlation

Measure of the degree of relatedness of the variables.

Pearson Product-Moment Correlation

Measures the linear correlation of two (sample) variables.

Skewness: indicator used in distribution analysis as a sign of asymmetry and deviation from a normal distribution.

Interpretation:

Skewness > 0 - Right-skewed distribution: most values are concentrated to the left of the mean, with extreme values to the right.

Skewness < 0 - Left-skewed distribution: most values are concentrated to the right of the mean, with extreme values to the left.

Skewness = 0 - Mean = median; the distribution is symmetrical around the mean.

Kurtosis - indicator used in distribution analysis as a sign of flattening or "peakedness" of a distribution.

Interpretation:

Kurtosis > 3 - Leptokurtic distribution: sharper than a normal distribution, with values concentrated around the mean and thicker tails. This means a high probability of extreme values.

Kurtosis < 3 - Platykurtic distribution: flatter than a normal distribution, with a wider peak. The probability of extreme values is less than for a normal distribution, and the values are spread more widely around the mean.

Kurtosis = 3 - Mesokurtic distribution: a normal distribution, for example.
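The two shape measures can be sketched from their moment definitions (skewness = m3 / m2^1.5, kurtosis = m4 / m2^2, where m_k is the k-th central moment; this is the convention under which a normal distribution has kurtosis 3). The data sets are invented:

```python
# Moment-based skewness and kurtosis, matching the interpretation
# rules above (kurtosis of a normal distribution = 3). Data invented.
import statistics

def central_moment(xs, k):
    m = statistics.fmean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def skewness(xs):
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def kurtosis(xs):
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]    # long tail to the right

print(skewness(symmetric))         # 0.0 (symmetrical)
print(skewness(right_skewed) > 0)  # True (right-skewed)
print(kurtosis(symmetric) < 3)     # True (platykurtic)
```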

Probability methods

Simple random sampling - The whole population is available.

Stratified sampling (random within target groups) - There are specific sub-groups to investigate (e.g. demographic groupings).

Systematic sampling (every nth person) - When a stream of representative people is available (e.g. in the street).

Cluster sampling (all in limited groups) - When population groups are separated and access to all is difficult, e.g. in many distant cities.

Quota methods

Quota sampling (get only as many as you need) - You have access to a wide population, including sub-groups.

Proportionate quota sampling (in proportion to population sub-groups) - You know the population distribution across groups, and normal sampling may not give enough in minority groups.

Non-proportionate quota sampling (minimum number from each sub-group) - There is likely to be a wide variation in the studied characteristic within minority groups.

Selective methods

Purposive sampling (based on intent) - You are studying particular groups.

Expert sampling (seeking 'experts') - You want expert opinion.

Snowball sampling (ask for recommendations) - You seek similar subjects (e.g. young drinkers).

Modal instance sampling (focus on 'typical' people) - When sought 'typical' opinion may get lost in a wider study, and when you are able to identify the 'typical' group.

Diversity sampling (deliberately seeking variation) - You are specifically seeking differences, e.g. to identify sub-groups or potential conflicts.

Convenience methods

Snowball sampling (ask for recommendations) - You are ethically and socially able to ask for and seek similar subjects.

Convenience sampling (use who's available) - You cannot proactively seek out subjects.

Judgment sampling (guess a good-enough sample) - You are an expert and there is no other choice.

Assumptions for z-tests:

data points should be independent from each other

the z-test is preferable when n is greater than 30

the distributions should be normal if n is low; if, however, n > 30, the distribution of the data does not have to be normal

the variances of the samples should be the same (F-test)

all individuals must be selected at random from the population

all individuals must have an equal chance of being selected

sample sizes should be as equal as possible, but some differences are allowed

Assumptions for t-tests:

data sets should be independent from each other, except in the case of the paired-sample t-test

where n < 30, t-tests should be used

the distributions should be normal for the equal- and unequal-variance t-tests (K-S test or Shapiro-Wilk)

the variances of the samples should be the same (F-test) for the equal-variance t-test

all individuals must be selected at random from the population

all individuals must have an equal chance of being selected

sample sizes should be as equal as possible, but some differences are allowed

ANOVA is a statistical technique that performs a similar function to the t-test, but is capable of dealing with more than two levels or more than one factor.

If t_calc > t_tab, we reject the null hypothesis.
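The F statistic behind ANOVA can be sketched in a few lines: the between-group mean square divided by the within-group mean square, with the calculated F then compared against the tabulated F for (k - 1, n - k) degrees of freedom. The three groups below are invented:

```python
# One-way ANOVA F statistic: MS(between) / MS(within). Groups invented.
import statistics

groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

values = [x for g in groups for x in g]
grand_mean = statistics.fmean(values)
k = len(groups)   # number of groups
n = len(values)   # total number of observations

# Between-group and within-group sums of squares.
ssb = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
ssw = sum(sum((x - statistics.fmean(g)) ** 2 for x in g) for g in groups)

f_calc = (ssb / (k - 1)) / (ssw / (n - k))
print(f_calc)  # 3.0
```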

Chi-Square Goodness-of-Fit Test

Used to compare observed data with data we would expect to obtain according to a specific hypothesis.

The chi-square test is always testing the null hypothesis, which states that there is no significant difference between the expected and observed result.

Chi-square requires that you use numerical values, not percentages or ratios.

For example, if, according to Mendel's laws, you expected 10 of 20 offspring from a cross to be male and the actual observed number was 8 males, then you might want to know about the "goodness of fit" between the observed and expected. Is the difference due to chance or to other factors?
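For that Mendel-style cross, the goodness-of-fit statistic is just the sum of (observed - expected)² / expected over the categories; a minimal sketch:

```python
# Chi-square goodness-of-fit statistic for 20 offspring:
# expected 10 male / 10 female, observed 8 male / 12 female.
observed = [8, 12]
expected = [10, 10]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)  # 0.8
```

With 1 degree of freedom, 0.8 is well below the tabulated critical value of 3.841 at the 0.05 level, so the deviation from the expected ratio could plausibly be due to chance.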

Chi-Square Test for Independence

The test is applied when you have two categorical variables from a single population.

It is used to determine whether there is a significant association between the two variables.

For example, in an election survey, voters might be classified by gender (male or female) and voting preference (Democrat, Republican, or Independent). We could use a chi-square test for independence to determine whether gender is related to voting preference.

Runs Test

The runs test can be used to decide if a data set is from a random process.

Purpose: Detect Non-Randomness

Randomness is one of the key assumptions in determining if a univariate statistical process is in control.

If the randomness assumption is not valid, then a different model needs to be used.
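The first step of a runs test is simply counting runs, i.e. maximal stretches of values on the same side of a reference point such as the median; too few or too many runs suggests non-randomness. A sketch with invented data (a full test would then compare the count with its expected value under randomness):

```python
# Counting runs above/below the median. Data invented and chosen
# so that no value falls exactly on the median.
import statistics

data = [1, 3, 9, 8, 2, 4, 7, 6]
med = statistics.median(data)  # 5.0 for this data set

signs = ['+' if x > med else '-' for x in data]  # side of the median
runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
print(runs)  # 4 runs: --, ++, --, ++
```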

Parametric

Typical data: Ratio or Interval

Non-parametric

Typical data: Ordinal or Nominal

The Wilcoxon signed-rank test is the nonparametric test equivalent to the dependent t-test. It is used to compare two sets of scores that come from the same participants.

For example, ANOVA is a parametric method, while Kruskal-Wallis is the corresponding non-parametric method, which has to be used if the assumption of normality is rejected before ANOVA is applied.

The Mann-Whitney U test is used to compare differences between two independent groups when the dependent variable is either ordinal or continuous, but not normally distributed.

It is often considered the nonparametric alternative to the independent t-test, although this is not always the case.

Unlike the independent-samples t-test, the Mann-Whitney U test allows you to draw different conclusions about your data depending on the assumptions you make about your data's distribution.

For example, you could use the Mann-Whitney U test to understand whether attitudes towards pay discrimination, where attitudes are measured on an ordinal scale, differ based on gender (i.e., your dependent variable would be "attitudes towards pay discrimination" and your independent variable would be "gender", which has two groups: "male" and "female").
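For small untied samples the U statistic itself is easy to sketch: rank all observations together, sum the ranks per group, and compute U_i = n1·n2 + n_i(n_i + 1)/2 - R_i, reporting the smaller of the two. The two groups below are invented:

```python
# Mann-Whitney U for two small, untied, invented samples.
group1 = [1, 2, 4]
group2 = [3, 5, 6]

combined = sorted(group1 + group2)
rank = {v: i + 1 for i, v in enumerate(combined)}  # valid only without ties

n1, n2 = len(group1), len(group2)
r1 = sum(rank[v] for v in group1)  # rank sum of group 1
r2 = sum(rank[v] for v in group2)  # rank sum of group 2

u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
u = min(u1, u2)
print(u)  # 1.0
```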

This can occur when we wish to investigate any change in scores from one time point to another, or when individuals are subjected to more than one condition.

For example, you could use a Wilcoxon signed-rank test to understand whether there was a difference in smokers' daily cigarette consumption before and after a 6-week hypnotherapy programme.

Non-parametric alternative to one-way ANOVA.

A one-way ANOVA may yield inaccurate estimates of the P-value when the data are very far from normally distributed. The Kruskal–Wallis test does not make assumptions about normality.

Example: Possible differences in graded performance on three separate activities (e.g., final examination score, composite score for all homework problems, final project score) in a high school Logo programming language class.

The Friedman test is the non-parametric alternative to the one-way ANOVA with repeated measures.

Example:

A researcher wants to examine whether music has an effect on the perceived psychological effort required to perform an exercise session.

To test whether music has an effect on the perceived psychological effort required to perform an exercise session, the researcher recruited 12 runners who each ran three times on a treadmill for 30 minutes.

At the end of each run, subjects were asked to record how hard the running session felt on a scale of 1 to 10, with 1 being easy and 10 extremely hard. A Friedman test was then carried out to see if there were differences in perceived effort based on music type.

A nonparametric version of the Pearson product-moment correlation. Spearman's correlation coefficient (also signified by rs) measures the strength of association between two ranked variables.

Example:

In a competition/contest, the Spearman rank correlation coefficient can indicate whether judges agree with each other's views as far as the talent of the contestants is concerned (though they might award different numerical scores) - in other words, whether the judges are unanimous.
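The judging example can be sketched with the rank-difference formula rs = 1 - 6·Σd² / (n(n² - 1)), which holds when there are no tied ranks. The two judges' rankings below are invented:

```python
# Spearman's rank correlation for two judges ranking five contestants.
# Rankings invented; the formula assumes no tied ranks.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]   # similar, but not identical, ordering

n = len(judge_a)
d_sq = sum((a - b) ** 2 for a, b in zip(judge_a, judge_b))
rs = 1 - 6 * d_sq / (n * (n ** 2 - 1))
print(rs)  # 0.8 - strong agreement between the judges
```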