**PROJECT**

**This project is designed to familiarize you with gathering data and then describing and analyzing it using SPSS.**

What is SPSS?

SPSS Statistics is a software package that can be used to perform data entry and analysis and to create tables and graphs. SPSS can handle large amounts of data and can perform all of the analyses covered in this project, and much more.

The SPSS Windows And Files

SPSS Windows

SPSS Statistics has three main windows, plus a pull-down menu at the top. These allow you to see your data, see your statistical output, and see any programming commands you have written. Each window corresponds to a separate type of SPSS file.

1- Variable View

Variable View contains descriptions of the attributes of each variable in the data file.

- Variable Name:

The name of each SPSS variable in a given file must be unique.

- Variable Type:

The kind of data to be recorded (e.g., strings of characters, numeric values, or special numbers like dates). The contents of the Variable Type dialog box depend on the selected data type.

- Width:

The width of a variable is the number of characters SPSS will allow to be entered for the variable.

- Decimals:

The Decimals setting of a variable is the number of decimal places that SPSS will display.

- Label:

The label of a variable is a string of text to identify in more detail what a variable represents.

- Values:

For categorical data we often need to know which numbers represent which categories. To indicate how these numbers are assigned, one can add labels to specific values (e.g., 1 = male and 2 = female).

* Click in the Value field to type a specific numeric value.

* Click in the Label field to type the corresponding label.

* Click on the Add button to add this pair of value and label to the list.

Value labels can be seen in Data View by clicking the Value Labels icon in the toolbar, which switches between the numeric values and their labels.

- Columns:

Determines how wide the variable's column should be in Data View.

- Align:

The alignment property indicates whether the information in the Data View should be left-justified, right-justified, or centered.

- Missing:

Sometimes a numeric code is recorded in place of a value that is actually missing. The Missing property tells SPSS to treat such codes as missing data.

- Measure:

Describes the level of measurement (nominal, ordinal, or scale).

2- Data View

One of the primary ways of looking at a data file is in Data View, where each row is a case and each column is a variable.

Pull-Down Menus

* File Menu:

From the File menu you can open several different kinds of existing files, open a database file or an Excel file, or read in a text file. You can also save any changes to the current file.

* Edit Menu:

From the Edit menu you can cut, copy, paste, insert variables, insert cases, or use Find in the Data Editor window.

* Data Menu:

The Data menu allows you to define variable properties, sort cases, merge files, split files, select cases, and use a variable to weight cases.

* Transform Menu:

The Transform menu is where you will find the options to perform computations on variables, to create new variables from existing ones, or to recode old variables.

* Analyze Menu:

The Analyze menu is where all statistical analysis takes place, from descriptive statistics to regression analysis to nonparametric tests.

* Graphs Menu:

The Graphs menu is where you can create high-resolution plots and graphs to be edited in the Chart Editor window, or create interactive graphs.

* Window Menu:

From the Window menu you can change the active window. The window with a check mark is the active one; in this case it is the Data Editor window.

* Help Menu:

The Help menu allows you to get help on topics in SPSS or to ask the Statistics Coach some basic questions.

3- Output Viewer

The Output Viewer collects your statistical tables and graphs, and gives you the opportunity to edit them before you save or print them.

WE WILL ANALYZE THESE DATA USING SPSS.

Defining Variables And Entering Data

We need to create the variables first, then enter the data by hand.

For each string variable, we change the type of the variable to "String". Then we assign labels to values, as mentioned before. For example, males are coded as "M" and females as "F".

Data Manipulation

Data files are not always organized in a form that meets specific needs, and we may wish to select specific subjects to analyze.

- To Compute The Average "Length" For The Females

- First we need to recode the string variable "Gender" into a numeric variable.

* Transform > Recode into Different Variables… > Select the variable "Gender" > Click the Old and New Values… button > Recode M into 0 and F into 1 (as shown in figure) > Click the Continue button > Type the new variable name "NewGender" (in the Output Variable section) > Click the Change button > OK.

- From the Data menu we can select the cases we are looking for.

* Choose Select Cases…

* Click the If condition is satisfied option.

* Click the If… button. Double-click the variable "NewGender", then write the condition (NewGender = 1).

* Click the Continue button, then click the OK button.

- SPSS will exclude all cases except those that satisfy the condition (NewGender = 1).

- Now we can simply compute the average "Length" for the females.

* Analyze > Descriptive Statistics > Frequencies > Double-click the variable "Length" > Click the Statistics… button > Select the Mean > Continue > OK.
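The recode, select, and average steps can also be mirrored outside SPSS. The following is a minimal Python sketch (not SPSS syntax); the four records are made-up values for illustration, not the project's data.

```python
from statistics import mean

# Hypothetical records (illustrative only, not the project's data set).
babies = [
    {"Gender": "M", "Length": 15.0},
    {"Gender": "F", "Length": 14.0},
    {"Gender": "F", "Length": 16.0},
    {"Gender": "M", "Length": 17.0},
]

# Recode the string variable into a numeric one: M -> 0, F -> 1.
recode = {"M": 0, "F": 1}
for row in babies:
    row["NewGender"] = recode[row["Gender"]]

# Select the cases that satisfy the condition NewGender = 1 (females).
selected = [row for row in babies if row["NewGender"] == 1]

# Average Length over the selected cases only.
female_mean_length = mean(row["Length"] for row in selected)
print(female_mean_length)
```

Keeping the recoded value in a new key, rather than overwriting "Gender", mirrors the choice of Recode into Different Variables: the original string values stay available.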


**Visual Binning**

This facility lets you interactively create groups (bins, categories) from a continuous variable and visually control the process.

* From the Transform menu, select Visual Binning…

* Select the variable "Length" to move it to the Variables to Bin: box.

* Click the Continue button.

* The Visual Binning window opens.

* Select the variable "Length" from the Scanned Variable List:

* Click the Make Cutpoints… button.

* Type (10) in the First Cutpoint Location: box, and (4) in the Width: box.

* The Number of Cutpoints: (3) and the Last Cutpoint Location: (18) will be filled in automatically.

* Click the Apply button.

* Click the Make Labels button, and then type "Binned" in the Binned Variable: box.

* Click the OK button.
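The cutpoints that SPSS fills in follow directly from the first cutpoint and the width. A minimal Python sketch of that arithmetic is shown here; it assumes that each bin includes its upper cutpoint, and `bin_index` is a hypothetical helper applied to made-up values.

```python
from bisect import bisect_left

first_cutpoint = 10
width = 4
number_of_cutpoints = 3

# Cutpoints are evenly spaced: 10, 10 + 4, 10 + 2 * 4.
cutpoints = [first_cutpoint + i * width for i in range(number_of_cutpoints)]


def bin_index(value):
    """Return the 1-based bin number, assuming each bin includes its upper cutpoint."""
    return bisect_left(cutpoints, value) + 1


print(cutpoints)
# Made-up values, one or more per bin.
print([bin_index(v) for v in (9, 10, 12, 18, 21)])
```

Three cutpoints produce four bins: values up to 10, above 10 up to 14, above 14 up to 18, and above 18.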

**Frequency Analysis**

Frequency analysis is a descriptive statistical method that shows the number of occurrences of each response chosen by the respondents.

- To Make a Frequency Table and Compute Measures of Central Tendency and Dispersion For The "Length"

* Click the Analyze menu, point to Descriptive Statistics, and select Frequencies…

* Select the variable "Length" to move it to the Variable(s): list box.

* Select the Display frequency tables check box.

* Click the Statistics… button. Select the Mean, Median, Mode, Variance and Standard deviation.

* Click the Continue button, then click the OK button.
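For comparison, the same summary measures can be computed with Python's standard library. This is only a sketch: the `lengths` list is hypothetical, and `variance` and `stdev` use the sample (n − 1) formulas.

```python
from collections import Counter
from statistics import mean, median, mode, stdev, variance

# Hypothetical lengths (illustrative only, not the project's data set).
lengths = [12, 14, 14, 15, 17, 18]

# Frequency table: how many times each value occurs.
frequency_table = Counter(lengths)

# Central tendency and dispersion.
summary = {
    "mean": mean(lengths),
    "median": median(lengths),
    "mode": mode(lengths),
    "variance": variance(lengths),  # sample variance (divides by n - 1)
    "std_dev": stdev(lengths),      # square root of the sample variance
}

for value, count in sorted(frequency_table.items()):
    print(value, count)
print(summary)
```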

The Simple Bar Graph

A simple bar graph displays the data by using vertical bars of various heights to represent frequencies.

- To Make a Bar Graph For The Variable "Length"

* Click the Graphs menu, select Chart Builder…

* Select Bar, and then select the Simple bar.

* Drag the variable "Length" and drop it on the X-axis.

* Click the OK button.

Crosstabs Analysis

Cross tabulation (or crosstabs for short) is a statistical process that summarizes categorical data to create a contingency table.

- TO PERFORM CROSSTABS ANALYSIS TO FIND HOW MANY FEMALES HAVE A BROWN EYE COLOR

* Click the Analyze menu, point to Descriptive Statistics, and select Crosstabs…

* Select the variable "Gender" to move it to the Row(s): list box, and the variable "EyeColor" to move it to the Column(s): list box.

* Click the OK button.

We conclude that 5 of the 14 females have a brown eye color.
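A contingency table is simply a count of cases for each combination of categories. As a rough illustration in Python, assuming a handful of made-up (Gender, EyeColor) pairs rather than the project's data:

```python
from collections import Counter

# Hypothetical (Gender, EyeColor) pairs, illustrative only.
cases = [
    ("F", "Brown"),
    ("F", "Blue"),
    ("M", "Brown"),
    ("F", "Brown"),
    ("M", "Green"),
]

# Each key is one cell of the contingency table; each value is the cell count.
table = Counter(cases)

females_brown = table[("F", "Brown")]
females_total = sum(count for (gender, _), count in table.items() if gender == "F")

print(f"{females_brown} of {females_total} females have brown eyes")
```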

**Chi-square test**

Chi-square is a quantitative measure used to determine whether a relationship exists between two categorical variables.

H0: No relationship exists between the two variables.

H1: A relationship exists between the two variables.

- In statistical significance testing, the p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.

We will often "reject the null hypothesis" when the p-value turns out to be less than a certain significance level.
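The decision rule is the same for every test in this project, so it can be written once. A small Python sketch follows; `decide` is a hypothetical helper, and the p-values passed to it are ones reported in the results of this project.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value with the significance level alpha and return the decision."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# p-values reported for the chi-square, one-sample t, and ANOVA tests.
for p in (0.648, 0.013, 0.090):
    print(p, decide(p))
```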

- To perform crosstabs analysis with the Chi-square test to determine whether a relationship exists between the variable "Gender" and the variable "EyeColor" at the 5% significance level (95% confidence level):

* Click the Analyze menu, point to Descriptive Statistics, and select Crosstabs…

* Select the variable "Gender" to move it to the Row(s): list box, and the variable "EyeColor" to move it to the Column(s): list box.

* Click the Statistics… button. Select the Chi-square check box.

* Click the Continue button, and then click the OK button.

H0: No relationship exists between "Gender" and "EyeColor".

H1: A relationship exists between "Gender" and "EyeColor".

p-value = 0.648

0.648 > 0.05

The p-value is greater than the significance level, so we fail to reject the null hypothesis. We conclude that there is no statistically significant relationship between "Gender" and "EyeColor" at the 5 percent level of significance.
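The Pearson chi-square statistic compares each observed cell count with the count expected if the two variables were unrelated. The sketch below shows that calculation in Python for a hypothetical 2 × 2 table; the counts are made up, and turning the statistic into a p-value (which SPSS does for you) is left out.

```python
def chi_square_statistic(table):
    """Return the Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under H0 (no relationship between rows and columns).
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Hypothetical counts (rows: two genders, columns: two eye colors).
counts = [[10, 20], [20, 10]]
statistic, degrees_of_freedom = chi_square_statistic(counts)
print(round(statistic, 3), degrees_of_freedom)
```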

Stacked Bar Graph

A stacked bar graph is used to compare the parts to the whole. Each bar represents a total, and the bars are divided into categories.

- To make a stacked bar graph of "Gender" by "Weight"

* Click the Graphs menu, select Chart Builder…

* Select Bar, and then select the Stacked bar.

* Drag the variable "Gender" and drop it on the X-axis.

* Drag the variable "Weight" and drop it on the Stack: set color box (at the upper right corner).

* Click the OK button.

**One-Sample T Test**

The basic idea of the one-sample t test is a comparison between the mean of the sample (observed mean) and a hypothesized population mean (expected mean).

H0: The difference between the observed and expected mean is 0.

H1: The difference between the observed and expected mean is not 0.

We will often "reject the null hypothesis" when the p-value turns out to be less than a certain significance level.

- To perform the one-sample t test to compare the mean "Length" with 15 (expected mean):

* Click the Analyze menu, point to Compare Means, and select One-Sample T Test…

* Select the variable "Length" to move it to the Test Variable(s): list box.

* Enter the expected mean (15) in the Test Value: box.

* Click the OK button.

p-value = 0.013

0.013 < 0.05

The p-value is less than the significance level, so we reject H0. We conclude that the mean "Length" of the sampled population is statistically significantly different from 15 at the 5 percent level of significance.
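The t statistic that SPSS reports for this test is the difference between the sample mean and the test value, divided by the standard error of the mean. A minimal Python sketch, assuming a made-up sample (the p-value lookup is omitted because it needs the t distribution):

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, test_value):
    """Return the t statistic and degrees of freedom for a one-sample t test."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # sample standard deviation / sqrt(n)
    t = (mean(sample) - test_value) / standard_error
    return t, n - 1


# Hypothetical lengths (illustrative only), tested against an expected mean of 15.
lengths = [14, 16, 15, 17, 18]
t_statistic, degrees_of_freedom = one_sample_t(lengths, 15)
print(round(t_statistic, 3), degrees_of_freedom)
```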

Confidence Interval

The confidence interval gives a lower and an upper limit for the mean. For a sample of size n with mean $\bar{x}$ and standard deviation s, the confidence interval for the mean is defined as:

$\bar{x} \pm t_{\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}}$

- To Calculate The Confidence Interval For The Variable "DaysInHospital"

* Click the Analyze menu, point to Compare Means, and select One-Sample T Test…

* Select the variable "DaysInHospital" to move it to the Test Variable(s): list box.

* Leave the Test Value: at 0, so that the reported interval is for the mean itself.

* Click the OK button.

The confidence interval for the mean of the variable "DaysInHospital" is (4.04, 5.22).

Paired-Samples T Test

The paired-samples t test is used to test whether an observed difference between two means is statistically significant, for data that have a normal distribution.

H0: The mean difference between the two samples is equal to 0.

H1: The mean difference between the two samples is not equal to (or is greater than, or is less than) 0.

We will often "reject the null hypothesis" when the p-value turns out to be less than a certain significance level.

- To perform the paired-samples t test to compare the mean "Length" with the mean "DaysInHospital":

* Click the Analyze menu, point to Compare Means, and select Paired-Samples T Test…

* Select the variables "Length" and "DaysInHospital" to move them to the Paired Variables: list box.

* Click the OK button.

H0: The mean difference between "Length" and "DaysInHospital" is equal to 0.

H1: The mean difference between "Length" and "DaysInHospital" is not equal to 0.

p-value = 0.000 (SPSS displays .000 when p < 0.0005)

0.000 < 0.05

The p-value is less than the significance level, so we reject the null hypothesis. We conclude that the mean difference between "Length" and "DaysInHospital" is significantly different from 0 at the 5 percent level of significance.

Independent-Samples T Test

An independent-samples t test is an inferential statistical test that determines whether there is a statistically significant difference between the means of two unrelated groups.

H0: There is no statistically significant difference between the two groups on the dependent variable.

H1: There is a statistically significant difference between the two groups on the dependent variable.

We will often "reject the null hypothesis" when the p-value turns out to be less than a certain significance level.

- To perform the independent-samples t test to compare the mean "Length" between the two groups of "Gender":

* Click the Analyze menu, point to Compare Means, and select Independent-Samples T Test…

* Select the variable "Length" to move it to the Test Variable(s): list box.

* Select the variable "NewGender" (the numeric recode of "Gender") to move it to the Grouping Variable: box.

* Click the Define Groups… button. Enter (0) in the Group 1 box, and (1) in the Group 2 box.

* Click the Continue button, and then click the OK button.

H0: The mean "Length" of females and the mean "Length" of males are equal.

H1: The mean "Length" of females and the mean "Length" of males are different.

p-value = 0.982

0.982 > 0.05

The p-value is greater than the significance level, so we fail to reject the null hypothesis. We conclude that there is no statistically significant difference between the female and male mean "Length" at the 5 percent level of significance.

Simple Linear Regression

Simple linear regression is a method to determine the relationship between a dependent variable (Y) and one independent variable (X).

The linear equation for simple regression is as follows:

$Y = a + bX$

a = intercept

b = slope

**Scatter Plot**

A scatter plot helps determine whether there is a linear relationship between two variables.

** We will insert a new variable "Nweight" which contains the babies' numerical weight (as shown in figure).

- To make a scatter plot for the variable "Length" against the variable "Nweight"

* Click the Graphs menu, select Chart Builder…

* Select Scatter/Dot, and then select the Simple Scatter plot.

* Drag the variable "Length" and drop it on the Y-axis, and drag the variable "Nweight" and drop it on the X-axis.

* Click the OK button.

Adding a straight line to the scatter plot

- TO ADD A LINE TO THE SCATTER PLOT

* Double-click the chart in the Output Viewer window to open the Chart Editor window.

* Right-click the chart, and select Add Fit Line at Total.

* The fit line appears on the chart; then close the Chart Editor window.

Predicting Value Of Dependent Variable

- To run a simple regression analysis to predict the baby's length if his/her weight is 20 kg

* Click the Analyze menu, point to Regression, and select Linear…

* Select the variable "Length" to move it to the Dependent: box, and the variable "Nweight" to move it to the Independent(s): box.

* Click the Statistics… button. Select the Estimates check box.

* Click the Continue button, and then click the OK button.

a = -18.811

b = 1.772

Predicting the baby's length if his/her weight is 20 kg

The values of a and b should be substituted in the linear equation with X = 20.

- TO PREDICT THE BABY'S LENGTH USING THE COMPUTING FUNCTION

* Click the Transform menu, and select Compute Variable…

* In the Target Variable: box, type [Predicted]

* In the Numeric Expression: box, type [-18.811 + 1.772 * 20]

* Click the OK button.

The predicted Length is 16.63
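The substitution can be checked by hand or with a few lines of code. The Python sketch below uses the reported intercept and slope; `predict_length` is a hypothetical helper name.

```python
a = -18.811  # intercept reported by SPSS
b = 1.772    # slope reported by SPSS


def predict_length(weight):
    """Evaluate the fitted line Y = a + bX at the given weight."""
    return a + b * weight


predicted = predict_length(20)
print(round(predicted, 2))
```

The unrounded value is 16.629, which agrees with the reported 16.63.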

Coefficient Of Determination

In statistics, the coefficient of determination, denoted R² and pronounced "R squared", indicates how well data points fit a line or curve, giving a value between 0 and 1.

- TO COMPUTE R SQUARED

* Click the Analyze menu, point to Regression, and select Linear…

* Select the variable "Length" to move it to the Dependent: box, and the variable "Nweight" to move it to the Independent(s): box.

* Click the Statistics… button. Select the R squared change check box.

* Click the Continue button, and then click the OK button.

R squared = .888, so there is a strong relationship between length and weight: about 89% of the variation in "Length" is explained by "Nweight".

Correlation Coefficient

The correlation coefficient is a measure of the linear correlation (dependence) between two variables X and Y, giving a value between −1 and +1.

- TO COMPUTE THE CORRELATION COEFFICIENT

* Click the Analyze menu, point to Descriptive Statistics, and select Crosstabs…

* Select the variable "Length" to move it to the Row(s): list box, and the variable "Nweight" to move it to the Column(s): list box.

* Click the Statistics… button. Select the Correlations check box.

* Click the Continue button, and then click the OK button.

R = .942

R is very close to +1, so X and Y have a strong positive linear correlation: as the values of X increase, the values of Y also increase.
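In simple linear regression with one independent variable, the correlation coefficient equals the square root of R squared, with the sign of the slope. A quick Python check of the two reported values (R squared = .888 and b = 1.772), offered as a sketch:

```python
from math import sqrt

r_squared = 0.888  # coefficient of determination reported by SPSS
slope = 1.772      # slope b reported by SPSS

# The correlation coefficient takes the sign of the slope.
r = sqrt(r_squared) if slope >= 0 else -sqrt(r_squared)
print(round(r, 3))
```

The result agrees with the reported R = .942 to three decimal places.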

The Population Parameters Confidence Interval

In statistics, a confidence interval (CI) is a type of interval estimate of a population parameter and is used to indicate the reliability of an estimate.

- To Calculate The Confidence Interval For The Parameters a and b

* Click the Analyze menu, point to Regression, and select Linear…

* Select the variable "Length" to move it to the Dependent: box, and the variable "Nweight" to move it to the Independent(s): box.

* Click the Statistics… button. Select the Confidence intervals check box.

* Click the Continue button, and then click the OK button.

The confidence interval for a is (-24.051, -13.572) and for b is (1.513, 2.031).

One-Way ANalysis Of VAriance (ANOVA)

Analysis of variance is a statistical method used to test differences between two or more means.

H0: All of the population means are equal.

H1: At least one mean is different.

The ANOVA test procedure produces an F-statistic, which is used to calculate the p-value. As described, if p < .05, we reject the null hypothesis.

ANOVA Table

- To Perform ANOVA Test To Compare The Mean "Length" Among The Three Groups In "Weight"

** We shall first recode the string variable "Weight" into a numeric variable.

* Transform > Recode into Different Variables… > Select the variable "Weight" > Click the Old and New Values… button > Recode Light (L) into 0, Medium (M) into 1 and Heavy (H) into 2 (as shown in figure) > Click the Continue button > Type the new variable name "NewWeight" (in the Output Variable section) > Click the Change button > OK.

- TO RUN THE ANOVA TEST

* Click the Analyze menu, point to Compare Means, and select One-Way ANOVA…

* Select the variable "Length" to move it to the Dependent List: box.

* Select the variable "NewWeight" to move it to the Factor: box.

* Click the OK button.

H0: The mean "Length" is equal for all groups in "Weight".

H1: At least one group in "Weight" has a different mean "Length".

p-value = 0.090

0.090 > 0.05

The p-value is greater than the significance level, so we fail to reject the null hypothesis. We conclude that there is no statistically significant difference in the mean "Length" among the groups at the 5 percent level of significance.
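The F-statistic in the ANOVA table is the ratio of the between-groups mean square to the within-groups mean square. The following Python sketch shows the calculation for three made-up groups; the values are hypothetical, and the p-value lookup is omitted.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F statistic with its between-groups and within-groups degrees of freedom."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(group) * (mean(group) - grand_mean) ** 2 for group in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((value - mean(group)) ** 2 for group in groups for value in group)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


# Hypothetical lengths for the Light, Medium, and Heavy groups (illustrative only).
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_statistic, df_between, df_within = one_way_anova_f(groups)
print(f_statistic, df_between, df_within)
```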

Binomial Test

In statistics, the binomial test is an exact test of the statistical significance of deviations from a theoretically expected distribution of observations into two categories.

H0: The number of observations in each category is equal to that predicted by a biological theory.

H1: The observed data are different from the expected.

We reject the null hypothesis when the p-value is less than .05.

- To run the binomial test to check whether the number of males is equal to the number of females

The variable "Gender" was previously recoded into the variable "NewGender".

* Click the Analyze menu, point to Nonparametric Tests, and select Binomial…

* Select the variable "NewGender" to move it to the Test Variable List: box.

* Type (.50) in the Test Proportion: box.

* Click the OK button.

H0: The numbers of females and males are equal.

H1: The numbers of females and males are different.

p-value = 1.000

1.000 > 0.05

The p-value is greater than the significance level, so we fail to reject the null hypothesis. We conclude that there is no statistically significant difference between the number of females and the number of males at the 5 percent level of significance.
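For a test proportion of .50, an exact two-sided p-value can be obtained by doubling the smaller tail of the binomial distribution. The Python sketch below illustrates this; `binomial_test_half` is a hypothetical helper, and the split of 14 females and 13 males is an assumption made for illustration (14 females appears in the crosstabs result, 13 males is assumed).

```python
from math import comb


def binomial_test_half(successes, n):
    """Exact two-sided p-value for H0: proportion = 0.5, by doubling the smaller tail."""
    smaller = min(successes, n - successes)
    # P(X <= smaller) when X ~ Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(smaller + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Assumed counts for illustration: 14 females out of 27 babies.
p_value = binomial_test_half(14, 27)
print(f"{p_value:.3f}")
```

With counts this close to an even split, the p-value is 1.000, which matches the reported result.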

The Cumulative Distribution Function

The cumulative distribution function describes the probability that a real-valued random variable X with a given probability distribution will be found at a value less than or equal to x.

- To Compute P(X ≤ 7) with binomial distribution, n = 27, p = 0.6

* Click the Transform menu, select Compute Variable…

* Type "prob" in the Target Variable: box.

* Select CDF & Noncentral CDF from the Function group: list box.

* Double-click Cdf.Binom from the Functions and Special Variables: list box; the expression will appear in the Numeric Expression: box.

* Type (7) in place of quant, (27) in place of n, and (0.6) in place of prob.

* Click the OK button.

P(X ≤ 7) = 0.00 (to two decimal places)
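The same cumulative probability can be computed by summing binomial probabilities directly. A minimal Python sketch; `binom_cdf` is a hypothetical helper whose argument order mirrors Cdf.Binom(quant, n, prob).

```python
from math import comb


def binom_cdf(quant, n, prob):
    """Return P(X <= quant) for X ~ Binomial(n, prob)."""
    return sum(
        comb(n, k) * prob ** k * (1 - prob) ** (n - k)
        for k in range(quant + 1)
    )


p_at_most_7 = binom_cdf(7, 27, 0.6)
print(round(p_at_most_7, 2))
```

The probability is small but not exactly zero; it only appears as 0.00 because of rounding to two decimal places.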

The Probability Distribution Function

In statistics, a probability distribution assigns a probability to each measurable subset of the possible outcomes of a random experiment, survey, or procedure of statistical inference.

- To compute $\sum_x x^3 \, P(X = x)$, where x represents the variable "Baby", with binomial distribution, p = 0.6

** First we need to insert a new row.

* Right-click case 1 > Select Insert Cases > Let the variable "Baby" start with 0.

* Click the Transform menu, select Compute Variable…

* Drag the variable "Baby" and drop it in the Numeric Expression: box, click the (**) button and type the power (3), then click the (*) button.

* Select PDF & Noncentral PDF from the Function group: list box.

* Double-click Pdf.Binom from the Functions and Special Variables: list box; the expression will appear in the Numeric Expression: box.

* Drag the variable "Baby" to replace quant, type (27) in place of n, and (0.6) in place of prob.

* Type "PDF" in the Target Variable: box.

* Click the OK button.

** Compute the Sum for the new variable "PDF"

* Analyze > Descriptive Statistics > Frequencies > Double-click the variable "PDF" > Click the Statistics… button > Select the Sum.

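The compute-then-sum procedure amounts to weighting each value of x cubed by its binomial probability and adding the results. The Python sketch below assumes that "Baby" takes the values 0 through 27 (an assumption based on n = 27 and the inserted row starting at 0); `binom_pmf` is a hypothetical helper corresponding to Pdf.Binom(quant, n, prob).

```python
from math import comb


def binom_pmf(quant, n, prob):
    """Return P(X = quant) for X ~ Binomial(n, prob)."""
    return comb(n, quant) * prob ** quant * (1 - prob) ** (n - quant)


n, prob = 27, 0.6
baby = range(0, n + 1)  # assumed values of the variable "Baby": 0, 1, ..., 27

# One term per case: Baby ** 3 * Pdf.Binom(Baby, 27, 0.6)
pdf = [x ** 3 * binom_pmf(x, n, prob) for x in baby]

# The Sum statistic over the new variable.
total = sum(pdf)
print(round(total, 2))
```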

Finally

I wish I had the chance to say a proper thanks to every single person who has ever read this project.

And special thanks to Dr. Lamia Balhadji.

Sondos Husamuddin Sagor