Stata
by Sunny Zhao
on 16 November 2017
Transcript of Stata
Stata
* comments
display math_equation_you_want_to_calculate
summarize var
tabstat var1 var2, by(group) statistics(n mean var semean sd min p25 median p50 p75 max range iqr)
* discrete data
tab var1 var2, row col
* & = and
count if conditional1 & conditional2
* | = or
list if conditional1 | conditional2
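As a rough illustration of the basic commands, the sketch below runs them on the auto dataset that ships with Stata. The dataset and the variables price, mpg, rep78, foreign and make are assumptions picked for the example; they are not part of the original notes.

```stata
* load the example dataset shipped with Stata (an assumption for this sketch)
sysuse auto, clear

* display evaluates an arithmetic expression
display 2 + 3*4

* summary statistics for one variable
summarize price

* selected statistics for two variables, split by group
tabstat price mpg, by(foreign) statistics(n mean sd min median max)

* two-way table with row and column percentages
tab rep78 foreign, row col

* count observations that meet both conditions
count if foreign == 1 & mpg > 25

* list observations that meet either condition
list make price mpg if price > 12000 | mpg > 35
```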
summarize
Tests:
One Sample Test
Two Samples Test
Poisson - Discrete
Normal - Continuous
* probability of observing a value less than x from a Normal(mean, sd) distribution
display normal((x - mean)/sd)
* probability of observing a value greater than x from a Normal(mean, sd) distribution
display 1 - normal((x - mean)/sd)
* probability of observing a value between x and y from a Normal(mean, sd) distribution
display normal((y - mean)/sd) - normal((x - mean)/sd)
* value such that the probability less than that value is prob from a Normal(mean, sd) distribution
display invnormal(prob)*sd + mean
// Here prob is a probability, a number 0 < prob < 1
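To make the normal formulas concrete, here is a minimal sketch with made-up numbers. The mean of 100, the standard deviation of 15, the cut points 85 and 115, and the scalar names mu and sigma are all assumptions for illustration.

```stata
* hypothetical distribution: X ~ Normal(mean = 100, sd = 15)
scalar mu = 100
scalar sigma = 15

* P(X < 115)
display normal((115 - mu)/sigma)

* P(X > 115)
display 1 - normal((115 - mu)/sigma)

* P(85 < X < 115)
display normal((115 - mu)/sigma) - normal((85 - mu)/sigma)

* value x such that P(X < x) = 0.975
display invnormal(0.975)*sigma + mu
```

Because normal() works on the standardized value, the same pattern should apply to any mean and sd once x is converted with (x - mean)/sd.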
Binomial - Discrete
* prob. of observing exactly x successes from n trials with probability of success p
display binomialp(n,x,p)
* prob. of observing x or fewer successes from n trials with probability of success p
display binomial(n,x,p)
* prob. of observing x or more successes from n trials with probability of success p
display binomialtail(n,x,p)
* prob. of observing between x and y successes from n trials with probability of success p
display binomial(n,y,p) - binomial(n,x-1,p)
* prob. of observing less than x or greater than y successes from n trials with probability of success p
display binomial(n,x-1,p) + binomialtail(n,y+1,p)
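A small worked case may help with the off-by-one steps (x-1 and y+1) in the binomial formulas. The values n = 10, p = 0.5, x = 3, y = 6 and the scalar names ntrials and psucc are assumptions for this sketch.

```stata
* hypothetical setup: 10 trials, success probability 0.5
scalar ntrials = 10
scalar psucc = 0.5

* P(X = 3), which equals 120/1024 here
display binomialp(ntrials, 3, psucc)

* P(X <= 3)
display binomial(ntrials, 3, psucc)

* P(X >= 3)
display binomialtail(ntrials, 3, psucc)

* P(3 <= X <= 6): subtract the cumulative probability up to x-1 = 2
display binomial(ntrials, 6, psucc) - binomial(ntrials, 2, psucc)

* P(X < 3 or X > 6): lower part up to x-1 = 2, upper part from y+1 = 7
display binomial(ntrials, 2, psucc) + binomialtail(ntrials, 7, psucc)
```

The last two probabilities describe complementary events, so they should add up to 1, which is a quick way to check the indices.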
* ci for means (normally distributed)
ci means var, level(95)
* ci for means (poisson distributed)
ci means var, poisson level(95)
* ci for proportion
ci proportions var, level(95)
* ci for variance
ci variances var, level(95)
* ci for standard deviation
ci variances var, sd level(95)
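The following sketch applies the ci commands to the bundled auto dataset, assuming a Stata version recent enough to accept the `ci means` / `ci proportions` / `ci variances` subcommand syntax used in these notes. The dataset and the variables mpg and foreign are assumptions for the example.

```stata
* example data shipped with Stata (an assumption for this sketch)
sysuse auto, clear

* 95% CI for the mean of mpg
ci means mpg, level(95)

* 95% CI for the proportion of foreign cars (foreign is coded 0/1)
ci proportions foreign, level(95)

* 95% CI for the variance of mpg
ci variances mpg, level(95)

* 95% CI for the standard deviation of mpg
ci variances mpg, sd level(95)

* the limits of the most recent interval are left behind in r()
quietly ci means mpg
display r(lb) " to " r(ub)
```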
ci
cii
Two Sample
One Sample
Distributions:
Binomial Distribution
Normal Distribution
Poisson Distribution
Student's T Distribution
Chi-squared Distribution
Aims:
have probability, want value
have value, want probability
note: which side? left or right
x = random variable
p = probability
Chi-square - Continuous
* Want Prob.(p) that X <= x
p = chi2(df, x)
* Want Prob.(q) that X >= x
q = chi2tail(df, x)
* Want Value(x) that P(X <= x) = p
x = invchi2(df, p)
* Want Value(x) that P(X >= x) = q
x = invchi2tail(df, q)
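As a quick check of how the left and right tails relate for the chi-square functions, here is a sketch with 4 degrees of freedom and a test value of 9.49. Both numbers and the scalar name df are assumptions for illustration.

```stata
* hypothetical setup: chi-square distribution with 4 degrees of freedom
scalar df = 4

* left tail: P(X <= 9.49)
display chi2(df, 9.49)

* right tail: P(X >= 9.49)
display chi2tail(df, 9.49)

* value with 95% of the probability to its left
display invchi2(df, 0.95)

* value with 5% of the probability to its right (the same point)
display invchi2tail(df, 0.05)
```

The two tail probabilities sum to 1, and invchi2(df, p) matches invchi2tail(df, 1 - p), which answers the "which side?" note in code form.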
T - Continuous
* Want Prob.(p) that X <= t
p = t(df, t)
* Want Prob.(q) that X >= t
q = ttail(df, t)
* Want Value(t) that P(X <= t) = p
t = invt(df, p)
* Want Value(t) that P(X >= t) = q
t = invttail(df, q)
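The same left/right pattern can be tried for Student's t. In this sketch the 20 degrees of freedom, the test value 2 and the scalar name df are assumptions.

```stata
* hypothetical setup: t distribution with 20 degrees of freedom
scalar df = 20

* left tail: P(T <= 2)
display t(df, 2)

* right tail: P(T >= 2)
display ttail(df, 2)

* two-sided p-value for an observed t statistic of 2
display 2*ttail(df, abs(2))

* critical value for a two-sided 95% interval
display invttail(df, 0.025)

* the same critical value through the left-tail inverse
display invt(df, 0.975)
```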
* Prob. that X = k
pk = poissonp(mean, k)
* Want Prob.(p) that X <= k
p = poisson(mean, k)
* Want Prob.(q) that X >= k
q = poissontail(mean, k)
* Want Mean such that P(X <= k) = p
mean = invpoisson(k, p)
* Want Mean such that P(X >= k) = q
mean = invpoissontail(k, q)
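Note that the Poisson inverse functions return the mean rather than a value of X. The sketch below illustrates this with a round trip; the mean of 3, the count k = 2 and the scalar names lambda and pleft are assumptions.

```stata
* hypothetical setup: Poisson distribution with mean 3
scalar lambda = 3

* P(X = 2), which equals exp(-3)*9/2
display poissonp(lambda, 2)

* P(X <= 2)
display poisson(lambda, 2)

* P(X >= 3), the complement of P(X <= 2)
display poissontail(lambda, 3)

* round trip: recover the mean from k = 2 and the left-tail probability
scalar pleft = poisson(lambda, 2)
display invpoisson(2, pleft)
```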
graph & chart
hist var, bin(#)
stem var, lines(#)
* box plot in separate charts, in one chart
graph box var1, by(var2)
graph box var1, over(var2)
graph hbox var
* bar chart of count or percentage
graph bar (count) var, over(var)
graph bar (percent) var, over(var)
* spine plot
ssc install spineplot
spineplot var1 var2, percent
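A possible way to try the built-in graph commands is shown below on the bundled auto dataset. The dataset, the variables mpg, foreign and rep78, and the option values 10 and 2 are assumptions for the example. The community-contributed spineplot is left out because it has to be installed first.

```stata
* example data shipped with Stata (an assumption for this sketch)
sysuse auto, clear

* histogram of mpg with 10 bins
histogram mpg, bin(10)

* stem-and-leaf display, 2 lines per stem
stem mpg, lines(2)

* box plots of mpg: one panel per group, then all groups in one chart
graph box mpg, by(foreign)
graph box mpg, over(foreign)

* horizontal box plot
graph hbox mpg

* bar charts of counts and percentages by repair record
graph bar (count) mpg, over(rep78)
graph bar (percent) mpg, over(rep78)
```

Each graph command replaces the previous graph window, so in practice one would add name() or graph export to keep the results.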
distributions * (value + probability)
sunnyzhaosifang@gmail.com
This file will be updated, please visit: http://prezi.com/ei58xetk9gpi/?utm_campaign=share&rc=ex0share&utm_medium=copy
Any Feedback is welcome! :)
LinkedIn: https://www.linkedin.com/in/sunnyzhaosifang
Facebook: https://www.facebook.com/sunnyzhaosifang
Seeking a position as statistician, data analyst, data scientist, or similar.
Sunny Zhao (Sifang)
* Prob. that X = k
pk = binomialp(n, k, pi)
* Want Prob.(p) that X <= k
p = binomial(n, k, pi)
* Want Prob.(q) that X >= k
q = binomialtail(n, k, pi)
* Want Value(pi) that P(X <= k) = p
pi = invbinomial(n, k, p)
* Want Value(pi) that P(X >= k) = q
pi = invbinomialtail(n, k, q)
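The inverse binomial functions solve for the success probability pi, not for the count k. A round trip makes this visible; in the sketch below the values n = 20, k = 5, pi = 0.3 and all scalar names are assumptions.

```stata
* hypothetical setup: 20 trials, 5 successes, success probability 0.3
scalar ntrials = 20
scalar k = 5
scalar prob = 0.3

* left tail P(X <= 5), then recover the success probability from it
scalar pcum = binomial(ntrials, k, prob)
display invbinomial(ntrials, k, pcum)

* right tail P(X >= 5), then recover the success probability from it
scalar q = binomialtail(ntrials, k, prob)
display invbinomialtail(ntrials, k, q)
```

Both display lines should give back a value very close to the 0.3 that went in.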
* Want Prob.(p) that X <= x
p = normal((x - mean)/sd)
* Want Value(x) that P(X <= x) = p
x = invnormal(p)*sd + mean
More for Normal
More for Binomial
Confidence Interval (CI)
* (mean + proportion + variance + standard deviation)
confidence interval
ci: compute from dataset
cii: compute from summary statistics (immediate form of ci)
For: mean, proportions, variance, standard deviation
* ci for means (normally distributed)
cii means #obs #mean #sd, level(95)
* ci for means (poisson distributed)
cii means #exposure #events, poisson level(95)
* ci for proportion
cii proportions #obs #succ, level(95)
* ci for variance
cii variances #obs #variance, level(95)
* ci for standard deviation
cii variances #obs #sd, sd level(95)
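The immediate form needs no dataset, so it can be tried with invented summary statistics. All numbers in this sketch are assumptions: a sample of 25 with mean 50 and sd 10, 40 successes out of 100, and 30 events over an exposure of 10.

```stata
* mean: n = 25, sample mean = 50, sample sd = 10
cii means 25 50 10, level(95)

* Poisson mean: exposure = 10, events = 30
cii means 10 30, poisson level(95)

* proportion: n = 100, successes = 40
cii proportions 100 40, level(95)

* variance: n = 25, sample variance = 100
cii variances 25 100, level(95)

* standard deviation: n = 25, sample sd = 10
cii variances 25 10, sd level(95)
```

In the first call the standard error is 10/sqrt(25) = 2, which is a convenient number to check against the output.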
by group
sort group
by group: ci means var
by group: ci proportions var
by group: ci variances var
by group: ci variances var, sd
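A short sketch of the by-group pattern on the bundled auto dataset follows. The dataset and the variables foreign and mpg are assumptions; bysort is shown as a one-step alternative to a separate sort.

```stata
* example data shipped with Stata (an assumption for this sketch)
sysuse auto, clear

* by requires the data to be sorted on the grouping variable
sort foreign
by foreign: ci means mpg
by foreign: ci variances mpg, sd

* bysort sorts and runs the command in one step
bysort foreign: ci means mpg
```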
ANOVA, Chi_square
Regression and Correlation
Z procedure for one mean
ztest var==value, sd(sigma)
T procedure for one mean
ttest var==value
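The two one-sample procedures can be compared on the bundled auto dataset, as sketched here. The dataset, the variable mpg, the null value 20 and the "known" sigma of 6 are assumptions for the example. The z procedure takes the population standard deviation through sd(), while the t procedure estimates it from the sample.

```stata
* example data shipped with Stata (an assumption for this sketch)
sysuse auto, clear

* z procedure: H0 mean of mpg = 20, population sd assumed known (sigma = 6)
ztest mpg == 20, sd(6)
display "two-sided p-value (z): " r(p)

* t procedure: H0 mean of mpg = 20, sd estimated from the sample
ttest mpg == 20
display "two-sided p-value (t): " r(p)
display "left-sided p-value (t): " r(p_l)
display "right-sided p-value (t): " r(p_u)
```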
Decision: P-value (0.0979) > alpha (0.05), so we fail to reject the null hypothesis.
Conclusion: We do not have sufficient evidence to conclude that the true mean of (context of the problem) is different from 5 days.
Reject H0 if pvalue < alpha
Do not reject H0 if pvalue > alpha
Decision: p-value = 0.6822 > alpha = 0.05, so we fail to reject H0.
Conclusion: We do not have sufficient evidence to conclude that the true average (data usage) is less than 5 GB.
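The decision rule can also be written out in a do-file, as in this sketch. It assumes the bundled auto dataset, a null value of 20 for mpg, and alpha = 0.05; the scalar names alpha and pval are made up for the example.

```stata
* example data shipped with Stata (an assumption for this sketch)
sysuse auto, clear

* significance level and two-sided p-value from a one-sample t test
scalar alpha = 0.05
quietly ttest mpg == 20
scalar pval = r(p)

* reject H0 if p-value < alpha, otherwise fail to reject
if pval < alpha {
    display "Reject H0: p-value = " %6.4f pval " < alpha = " alpha
}
else {
    display "Fail to reject H0: p-value = " %6.4f pval " >= alpha = " alpha
}
```

For a one-sided alternative, r(p_l) or r(p_u) would replace r(p), depending on the direction of the alternative hypothesis.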
NonParametric
TESTs (one sample + Two Samples) * (mean + proportion + variance)
Mean
Variance
Proportion
For: mean, proportion, variance