Stata
* comments
display math_equation_you_want_to_calculate
summarize var
tabstat var1 var2, by(group) statistics(n mean var semean sd min p25 median p50 p75 max range iqr)
* discrete data
tab var1 var2, row col
* & = and
count if conditional1 & conditional2
* | = or
list if conditional1 | conditional2
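A minimal sketch of the commands above on a concrete dataset. It assumes Stata's bundled `auto` example data (loaded with `sysuse auto`); the variables `mpg`, `weight`, `foreign`, `rep78`, `price`, and `make` come from that dataset, and the cutoffs are arbitrary illustration values.

```stata
* load the example dataset shipped with Stata (assumption: it is installed)
sysuse auto, clear

* quick arithmetic
display 2 + 3*4

* summary statistics for one variable, then by group
summarize mpg
tabstat mpg weight, by(foreign) statistics(n mean sd min max)

* two-way table with row and column percentages
tab foreign rep78, row col

* & = and: count cars that are foreign and get more than 20 mpg
count if mpg > 20 & foreign == 1

* | = or: list cars that are very efficient or very expensive
list make mpg price if mpg > 35 | price > 15000
```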
summarize
Tests:
One Sample Test
Two Samples Test
Poisson  Discrete
Normal  Continuous
* probability of observing a value less than x from a Normal(mean, sd) distribution
display normal((x - mean)/sd)
* probability of observing a value greater than x from a Normal(mean, sd) distribution
display 1 - normal((x - mean)/sd)
* probability of observing a value between x and y from a Normal(mean, sd) distribution
display normal((y - mean)/sd) - normal((x - mean)/sd)
* value such that the probability less than that value is prob from a Normal(mean, sd) distribution
display invnormal(prob)*sd + mean
// Here prob is a probability, a number 0 <= prob <= 1
// normal() and invnormal() are the current names of the older normprob() and invnorm()
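A short worked example of the normal calculations, using `normal()` and `invnormal()`. The distribution Normal(100, 15) and the cutoffs 85 and 115 are hypothetical values chosen so that the z-scores are -1 and 1; the scalar names `m` and `s` are illustrative only.

```stata
* hypothetical distribution: X ~ Normal(mean = 100, sd = 15)
scalar m = 100
scalar s = 15

* P(X < 115): z = 1, about 0.8413
display normal((115 - m)/s)

* P(X > 115): about 0.1587
display 1 - normal((115 - m)/s)

* P(85 < X < 115): within one sd of the mean, about 0.6827
display normal((115 - m)/s) - normal((85 - m)/s)

* value with 97.5% of the distribution below it: about 129.4
display invnormal(0.975)*s + m
```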
Binomial  Discrete
* prob. of observing exactly x successes from n trials with probability of success p
display binomialp(n,x,p)
* prob. of observing x or fewer successes from n trials with probability of success p
display binomial(n,x,p)
* prob. of observing x or more successes from n trials with probability of success p
display binomialtail(n,x,p)
* prob. of observing between x and y successes (inclusive) from n trials with probability of success p
display binomial(n,y,p) - binomial(n,x-1,p)
* prob. of observing less than x or greater than y successes from n trials with probability of success p
display binomial(n,x-1,p) + binomialtail(n,y+1,p)
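A worked example of the binomial calculations, assuming a hypothetical experiment with n = 10 trials and success probability 0.5. With these values every answer is a count of outcomes divided by 1024, so the results are easy to check by hand.

```stata
* hypothetical setup: n = 10 trials, p = 0.5

* P(X = 5) = 252/1024, about 0.2461
display binomialp(10, 5, 0.5)

* P(X <= 3) = 176/1024, about 0.1719
display binomial(10, 3, 0.5)

* P(X >= 8) = 56/1024, about 0.0547
display binomialtail(10, 8, 0.5)

* P(3 <= X <= 6) = 792/1024, about 0.7734
display binomial(10, 6, 0.5) - binomial(10, 2, 0.5)

* P(X < 3 or X > 6): complement of the previous line
display binomial(10, 2, 0.5) + binomialtail(10, 7, 0.5)
```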
* ci for means (normally distributed)
ci means var, level(95)
* ci for means (poisson distributed)
ci means var, poisson level(95)
* ci for proportion
ci proportions var, level(95)
* ci for variance
ci variances var, level(95)
* ci for standard deviation
ci variances var, sd level(95)
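A minimal sketch of `ci` on a dataset, assuming Stata's bundled `auto` data and a Stata version that supports the `ci means` / `ci proportions` / `ci variances` syntax used above. The variables `mpg` and `foreign` come from that dataset.

```stata
* load the example dataset shipped with Stata (assumption: it is installed)
sysuse auto, clear

* 95% CI for the mean of mpg (normal-based)
ci means mpg, level(95)

* 95% CI for the proportion of foreign cars (foreign is coded 0/1)
ci proportions foreign, level(95)

* 95% CI for the variance of mpg, then for its standard deviation
ci variances mpg, level(95)
ci variances mpg, sd level(95)
```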
ci
cii
Two Sample
One Sample
Distributions:
Binomial Distribution
Normal Distribution
Poisson Distribution
Student's T Distribution
Chisquared Distribution
Aims:
have probability, want value
have value, want probability
note: which side? left or right
x = random variable
p = probability
Chisquare  Continuous
* Want Prob.(p) that X <= x
p = chi2(df, x)
* Want Prob.(q) that X >= x
q = chi2tail(df, x)
* Want Value(x) that P(X <= x) = p
x = invchi2(df, p)
* Want Value(x) that P(X >= x) = q
x = invchi2tail(df, q)
T  Continuous
* Want Prob.(p) that X <= t
p = t(df, t)
* Want Prob.(q) that X >= t
q = ttail(df, t)
* Want Value(t) that P(X <= t) = p
t = invt(df, p)
* Want Value(t) that P(X >= t) = q
t = invttail(df, q)
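A short example of the chi-squared and t functions above, going in both directions (probability to value, value to probability). The degrees of freedom and the 0.05 level are hypothetical choices that reproduce familiar table entries.

```stata
* chi-squared, 1 df: value with upper-tail probability 0.05, about 3.84
display invchi2tail(1, 0.05)

* and back: upper-tail probability at 3.84, close to 0.05
display chi2tail(1, 3.84)

* t, 10 df: value with upper-tail probability 0.025, about 2.228
display invttail(10, 0.025)

* the same value from the left side: P(X <= t) = 0.975
display invt(10, 0.975)

* two-sided p-value for an observed t of 2.228 with 10 df, close to 0.05
display 2*ttail(10, 2.228)
```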
* Poisson: Prob. that X = k
pk = poissonp(mean, k)
* Poisson: Want Prob.(p) that X <= k
p = poisson(mean, k)
* Poisson: Want Prob.(q) that X >= k
q = poissontail(mean, k)
* Poisson: Want Mean such that P(X <= k) = p
mean = invpoisson(k, p)
* Poisson: Want Mean such that P(X >= k) = q
mean = invpoissontail(k, q)
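A worked example of the Poisson functions, assuming a hypothetical mean of 3 events. The scalar name `mu` is illustrative only.

```stata
* hypothetical setup: X ~ Poisson(mean = 3)
scalar mu = 3

* P(X = 0) = exp(-3), about 0.0498
display poissonp(mu, 0)

* P(X <= 2) = exp(-3)*(1 + 3 + 4.5), about 0.4232
display poisson(mu, 2)

* P(X >= 1) = 1 - exp(-3), about 0.9502
display poissontail(mu, 1)

* mean for which P(X <= 2) = 0.5
display invpoisson(2, 0.5)

* check: plugging that mean back in returns 0.5
display poisson(invpoisson(2, 0.5), 2)
```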
graph & chart
hist var, bin(#)
stem var, lines(#)
* box plot: by() gives separate charts, over() gives one chart
graph box var1, by(var2)
graph box var1, over(var2)
graph hbox var
* bar chart of count or percentage
graph bar (count) var, over(var)
graph bar (percent) var, over(var)
* spine plot
ssc install spineplot
spineplot var1 var2, percent
distributions * (value + probability)
* Prob. that X = k
pk = binomialp(n, k, pi)
* Want Prob.(p) that X <= k
p = binomial(n, k, pi)
* Want Prob.(q) that X >= k
q = binomialtail(n, k, pi)
* Want Value(pi) such that P(X <= k) = p
pi = invbinomial(n, k, p)
* Want Value(pi) such that P(X >= k) = q
pi = invbinomialtail(n, k, q)
* Want Prob.(p) that X <= x
p = normal((x - mean)/sd)
* Want Value(x) that P(X <= x) = p
x = invnormal(p)*sd + mean
More for Normal
More for Binomial
Confidence Interval (CI)
* (mean + proportion + variance + standard deviation)
confidence interval
ci: compute from dataset
cii: compute from summary statistics (immediate form of ci)
For:
mean
proportions
variance
standard deviation
* ci for means (normally distributed)
cii means #obs #mean #sd, level(95)
* ci for means (poisson distributed)
cii means #exposure #events, poisson level(95)
* ci for proportion
cii proportions #obs #succ, level(95)
* ci for variance
cii variances #obs #variance, level(95)
* ci for standard deviation (with the sd option, the second number is the sd)
cii variances #obs #sd, sd level(95)
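A minimal sketch of the immediate form `cii`, which needs no dataset in memory. All numbers here (sample sizes, means, counts) are hypothetical summary statistics made up for illustration.

```stata
* mean: 50 observations, sample mean 12.3, sample sd 4.1
cii means 50 12.3 4.1, level(95)

* Poisson mean: 30 events observed over an exposure of 12
cii means 12 30, poisson level(95)

* proportion: 58 successes out of 200 observations
cii proportions 200 58, level(95)

* variance: 50 observations, sample variance 16.81
cii variances 50 16.81, level(95)

* standard deviation: 50 observations, sample sd 4.1
cii variances 50 4.1, sd level(95)
```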
by group
sort group
by group: ci means var
by group: ci proportions var
by group: ci variances var
by group: ci variances var, sd
ANOVA, Chi_square
Regression and Correlation
Z procedure for one mean
ztest var==value, sd(sigma)
T procedure for one mean
ttest var==value
Decision: P-value (0.0979) > alpha (0.05), so we fail to reject the null hypothesis.
Conclusion: We do not have sufficient evidence to conclude that the true mean of (context of the problem) is different from 5 days.
Reject H0 if pvalue < alpha
Do not reject H0 if pvalue > alpha
Decision: P-value (0.6822) > alpha (0.05), so we fail to reject H0.
Conclusion: We do not have sufficient evidence to conclude that the true average (data usage) is less than 5 GB.
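A minimal sketch of the one-mean procedures and the decision rule above, assuming Stata's bundled `auto` data. The null value 20, the "known" sigma of 6, and alpha = 0.05 are hypothetical choices. `r(p)` is the two-sided p-value stored by `ttest` and `ztest`, and `r(p_l)` is the lower-tailed one, which is the one to use for a "less than" alternative.

```stata
* load the example dataset shipped with Stata (assumption: it is installed)
sysuse auto, clear

* T procedure: H0: mean mpg = 20 (sigma unknown)
ttest mpg == 20
display "two-sided p-value = " r(p)
display cond(r(p) < 0.05, "reject H0", "fail to reject H0")

* lower-tailed alternative (Ha: mean < 20) uses r(p_l)
display "lower-tailed p-value = " r(p_l)

* Z procedure: same H0, treating sigma = 6 as known (hypothetical value)
ztest mpg == 20, sd(6)
display cond(r(p) < 0.05, "reject H0", "fail to reject H0")
```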
NonParametric
TESTs (one sample + Two Samples)
* (mean + proportion + variance)
Mean
Variance
Proportion
For:
mean
proportion
variance
