Introduction to Epidemiology

Katherine Bauer

on 24 June 2015

Transcript of Introduction to Epidemiology

Epidemiology: Past and Present
Historically, epidemiology was born from the attempt to explain epidemics, which were viewed as punishment by angry gods
As health/disease issues have risen and faded, the focus and methods of epidemiology have evolved to meet the challenges
Measures of Occurrence
Classic Studies
Decades before the cholera bacterium was widely recognized as the cause of the disease, John Snow conducted studies on the cholera outbreaks in London.
Used spot maps to show cases of cholera and believed water was the source of infection.
Removed the handle of the Broad Street pump to control the epidemic.

John Snow
Cross Sectional Design
James Lind
Randomized Clinical Trial Design
Experimental epidemiological study of the etiology and treatment of scurvy.
Concluded that eating citrus treated scurvy and would prevent its occurrence.
Led to the British navy requiring limes or lime juice in seamen’s diets.
Mortality Statistics
# with health indicator
Prevalence favors chronic versus acute diseases
Cannot determine cause and effect
Cumulative Incidence (CI)
Incidence Rate (IR), aka Incidence Density (ID)
Comparing Incidence
Analytic Epidemiology
The study of the determinants of disease

The investigators do not intervene on study participants’ exposure status
Cohort Study
Case Control Study
The investigators assign an exposure to study participants
Randomized Controlled Trial
Individuals are selected based on their outcome status, and their prior exposures are assessed.
Cohort Study
Individuals are selected based on their exposure status, followed over time, and the incidence of outcomes is assessed.

Key Features:
Nature/subject/others assign exposure status
No process of assignment to exposure group
Investigators assemble/select/identify cohort and assess exposure status
Cohort members are followed over time to assess incident disease
Those with prevalent disease at the beginning of the study are excluded
Retrospective Cohort
Faster and less expensive
Existing records available to reconstruct the cohort, assess exposure history, and assess outcomes
Prospective Cohort
More control over cohort selection, exposure measurement, follow-up procedures, and outcome measurement
Greater ability to account for other variables (i.e. confounders)
Randomized Controlled Trial (RCT)
Distribution of Health and Disease
Epidemiologists look at the who, where, and when
AKA Person, Place, Time
Understanding person, place, and time helps identify:
Where and for whom public health efforts are needed
How changes in our environment, our communities, our health care, etc. affect people’s health
The causes of disease, and therefore how to prevent them
Causes of Health and Disease
Exposures and Outcomes
Descriptive Epidemiology
Cross-Sectional Study
A study that examines the relationship between exposure and outcome in a population at the same time.
“A snapshot”
Individuals are assessed in regard to their:
Exposure status
Outcome status
Gives you the prevalence of the outcome for the exposed and unexposed
Ecologic Study
Confounding arises when an observed association between an exposure and disease is to some extent due to a third, unaccounted for factor.
This third factor is a confounder.
Not accounting for the third factor distorts the exposure-outcome relationship – introduces bias.
Third Factor
Selection Bias
Systematic Error
The amount of time each person in the study was followed between when they started to be observed and when they got the outcome or were “lost” (died, moved out of the study area, etc.)
The ongoing, systematic collection, analysis, and interpretation of health data.
Data collected as part of surveillance is often publicly available and free
Surveillance efforts are conducted at the local, national, and international levels.
Public health staff call, go to clinics, review records, or examine patients
Health care provider must report
Representative sample of clinics selected for active surveillance
National Notifiable Diseases Surveillance System
Behavioral Risk Factor Surveillance System (BRFSS)
National Health and Nutrition Examination Survey (NHANES)
It is often very useful to be able to compare health outcomes, such as mortality, across multiple populations
Different geographic area (e.g. countries)
Different time periods within the same geographic area
However, demographic differences between populations can make comparing crude prevalence or incidence difficult
To compare populations, we can,
Example: Only look at 0-5 year olds
Or ask, “What would the mortality rate be if the populations had identical age distributions?”
We want an “age-adjusted rate”
Direct Standardization
Use distribution of population from ‘standard’ or ‘reference’ population and apply the mortality rates from your populations of interest to that standard population.
1. Decide on a “standard” population
2. Calculate the stratum-specific weights from your standard population
3. For each population, multiply its stratum-specific rates by the standard population’s stratum-specific weights
4. Sum the values created by Step 3 to get each country’s standardized summary rate
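The four steps above can be sketched in a few lines of Python. The age strata, the standard population counts, and the stratum-specific mortality rates below are all hypothetical values chosen for illustration; they do not come from the slides.

```python
# Step 1: a hypothetical "standard" population, by age stratum.
standard_population = {"0-39": 60_000, "40-64": 30_000, "65+": 10_000}

# Hypothetical stratum-specific mortality rates (deaths per person per year).
stratum_rates = {
    "Population A": {"0-39": 0.001, "40-64": 0.005, "65+": 0.040},
    "Population B": {"0-39": 0.002, "40-64": 0.006, "65+": 0.030},
}


def direct_standardized_rate(rates, standard):
    """Weight each stratum-specific rate by the standard population's share."""
    total = sum(standard.values())
    # Step 2: stratum-specific weights from the standard population.
    weights = {stratum: count / total for stratum, count in standard.items()}
    # Steps 3 and 4: multiply rates by weights, then sum.
    return sum(rates[stratum] * weights[stratum] for stratum in weights)


adjusted = {
    name: direct_standardized_rate(rates, standard_population)
    for name, rates in stratum_rates.items()
}
for name, rate in adjusted.items():
    print(f"{name}: {rate * 100_000:.0f} per 100,000 (age-adjusted)")
```

Because both populations are weighted by the same standard age distribution, any remaining difference between the adjusted rates is not explained by age structure.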
Indirect Standardization
Rates of one group (larger group) applied to the population distribution of the second group to yield “expected number of deaths”
Calculate the SMR (standardized mortality ratio):
SMR = observed deaths / expected deaths
Use SMR to compare the mortality by groups
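As a minimal sketch of indirect standardization, the snippet below applies a reference group's age-specific rates to a study group's age distribution to get expected deaths, then divides observed by expected. The strata, rates, population counts, and the observed death count are hypothetical.

```python
# Hypothetical age-specific mortality rates from the larger reference group.
reference_rates = {"20-39": 0.001, "40-59": 0.004, "60+": 0.020}

# Hypothetical age distribution of the study group, and its observed deaths.
study_population = {"20-39": 5_000, "40-59": 3_000, "60+": 1_000}
observed_deaths = 45

# Expected deaths: the deaths the study group would have had if it
# experienced the reference group's rates in every stratum.
expected_deaths = sum(
    reference_rates[stratum] * study_population[stratum]
    for stratum in study_population
)

smr = observed_deaths / expected_deaths
print(f"expected deaths: {expected_deaths:.0f}")
print(f"SMR: {smr:.2f}")
```

An SMR above 1 means more deaths were observed than the reference rates would predict; below 1 means fewer.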
Methods to Control for Confounding
Design Stage:
Limit study group to narrow range of confounder
Construct study groups that are comparable on levels of the confounder
Randomly assign subjects to exposure group
Analysis Stage:
Use a single "standard" population so confounding factors are equal across exposed and unexposed
Calculate measure of association for each level of the confounder
Statistical Modeling:
Control mathematically for confounders
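To illustrate the analysis-stage step of calculating a measure of association for each level of the confounder, here is a small sketch with hypothetical counts. The risk ratio is used as the measure of association (an assumption; the slides do not name one), and the numbers are constructed so that the crude comparison suggests an effect that disappears within each stratum.

```python
# Hypothetical data: (cases, total) by exposure, within each confounder level.
strata = {
    "confounder absent": {"exposed": (4, 200), "unexposed": (16, 800)},
    "confounder present": {"exposed": (160, 800), "unexposed": (40, 200)},
}


def risk_ratio(exposed, unexposed):
    """Risk in the exposed divided by risk in the unexposed."""
    (cases_e, total_e), (cases_u, total_u) = exposed, unexposed
    return (cases_e / total_e) / (cases_u / total_u)


def pooled(group):
    """Collapse the strata into one crude (cases, total) pair."""
    cases = sum(table[group][0] for table in strata.values())
    total = sum(table[group][1] for table in strata.values())
    return cases, total


stratum_rr = {
    level: risk_ratio(table["exposed"], table["unexposed"])
    for level, table in strata.items()
}
crude_rr = risk_ratio(pooled("exposed"), pooled("unexposed"))

print(f"crude risk ratio: {crude_rr:.2f}")
for level, rr in stratum_rr.items():
    print(f"risk ratio, {level}: {rr:.2f}")
```

Here the exposed group is mostly made up of people who have the confounder, so the crude risk ratio reflects the confounder's effect rather than the exposure's.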
Consistent errors in the design or conduct of a study that result in an incorrect estimate of the causal effect.
A threat to validity which is present in all studies
Can be minimized by the investigator but rarely eliminated
Little can be done to fix or remove error once it has occurred
Recall Bias
Occurs in case-control studies when cases remember or report their exposure history differently than controls.
Cases are ruminating on their/their family member’s history.
Cases may have pet theories as to what caused their disease
Cases may be more or less truthful about their exposure history, especially for sensitive exposures.
Diagnostic/Detection Bias
Individuals with an exposure may be more likely to seek medical attention or be closely followed by their health care provider.
Exposed are then more likely to be accurately classified as diseased as compared to unexposed unless comparable follow-up was ensured.
Interviewer Bias
A systematic difference in soliciting, recording, or interpreting information that occurs in studies using interviews.
Can occur if interviewer is aware of exposure/disease status of participant, or experimental condition that participant has been randomized to.
Masking or blinding helps, but information relayed in the interview may reveal participants’ status.
Standardized questionnaires with close-ended questions help reduce bias.
Careful protocols and consistent observation of interviewers can reduce bias.
Identification of Outcomes
When determining disease or mortality endpoints in cohort studies, study physicians often evaluate medical records, diagnostic results, etc.
If physicians are not blinded, their decision-making may be impaired.
Errors in the selection of study participants.
Consider a 2x2 table - people in 1 of the quadrants have a different probability of being in or staying in your study.
In Case-Control Studies
Control Selection Bias
Different criteria are used to select controls than were used to select cases
Different exclusion criteria (exclusion bias)
Method of control recruitment excluded members of the source population
Self-Selection Bias
Bias resulting from refusal or agreement to participate that is related to both the exposure and disease.
Cardiovascular disease cases with a known family history
Less healthful cancer patients
Low participation is an indicator of self-selection bias
Can be identified by examining characteristics of participants and non-participants
Differential Diagnosis, Selection, or Referral
Individuals with an exposure are more likely to be followed by clinicians, leading to a higher risk of hospitalization.
This will result in an inflated likelihood of exposure among cases in hospital-based case-control studies.
Example: oral contraceptives, bleeding, and early-stage endometrial cancer
Loss to follow-up
Non-differential: Loss on one axis (exposure or disease), unrelated to the other axis
Does not affect relative measures of association, but absolute measures will change
Differential loss: Loss related to both exposure and disease
Can affect both measures of association in either direction
Rarely possible to know for certain which occurred
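The difference between the two kinds of loss can be sketched with a 2x2 table. This example uses the odds ratio as the relative measure and hypothetical counts and retention fractions; none of these numbers come from the slides. Loss that depends only on disease status leaves the odds ratio unchanged, while loss that depends on both exposure and disease shifts it.

```python
def odds_ratio(table):
    """Odds ratio for a 2x2 table with cells a, b, c, d.

    a = exposed with disease,   b = exposed without disease,
    c = unexposed with disease, d = unexposed without disease.
    """
    return (table["a"] * table["d"]) / (table["b"] * table["c"])


def apply_retention(table, retained):
    """Keep only the given fraction of participants in each cell."""
    return {cell: table[cell] * retained[cell] for cell in table}


# Hypothetical cohort with complete follow-up.
full_cohort = {"a": 200, "b": 800, "c": 100, "d": 900}

# Non-differential: half of all diseased participants are lost,
# regardless of whether they were exposed.
nondifferential = apply_retention(
    full_cohort, {"a": 0.5, "b": 1.0, "c": 0.5, "d": 1.0}
)

# Differential: only exposed participants with disease are lost.
differential = apply_retention(
    full_cohort, {"a": 0.5, "b": 1.0, "c": 1.0, "d": 1.0}
)

print(f"complete follow-up:    OR = {odds_ratio(full_cohort):.3f}")
print(f"non-differential loss: OR = {odds_ratio(nondifferential):.3f}")
print(f"differential loss:     OR = {odds_ratio(differential):.3f}")
```

In a real study the retention fractions are unknown, which is why it is rarely possible to say for certain which kind of loss occurred.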
Healthy Worker Effect
Occurs in occupational exposure studies.
A healthy working population is compared to the general population, which consists of both healthy and ill people.
Healthier individuals may select into jobs if they have to undergo physical exams
Individuals who are exposed and sick leave the occupation
The exposed workers have a lower mortality risk because they are in better health.
Criteria and Prevention
The science of understanding the
of population health so that we may
to prevent disease and promote health.

Person: Listeria Infections in France by Age
Place: HIV Prevalence in Philadelphia by Zip Code
Time: SIDS Rate and Back Sleeping
Key Features:
Focuses on populations rather than individuals
Assumes that disease is not randomly distributed throughout a population
Involves comparison between groups of people

Identifying factors that cause disease is the driving force behind epidemiology.
When we understand what the causes of health or a disease are, we can intervene to promote, prevent, or treat
Much of our work allows us to identify risk or protective factors, which may or may not be causes of health outcomes
Biologic agents (bacteria and viruses)
Chemical agents (pesticides, air pollution)
Behaviors (smoking, drinking)
Social forces (poverty, racism)
Introduced terms epidemiology and endemics
Investigated roles of air, water, place
Studied behavior, diet, and how disease spreads in populations
Hippocrates - The First Epidemiologist (460-377 BC)
First to organize mortality data
Collected data on death from plague
Life tables
Life expectancy (actuarial) methods
John Graunt (1620-1674)
Goldberger was asked by the Surgeon General to investigate pellagra, a disease afflicting many poor in the south
Pellagra is characterized by “the 4 D’s”
Goldberger set out to prove that pellagra was caused by, and could be cured by, diet.
Joseph Goldberger
Total population size
Prevalence is a proportion - all individuals in the numerator must also be included in the denominator.
Prevalence must range between 0 and 1, or 0% to 100%
The population may be geographical units, genders, ages, races, etc.
Includes both new cases and existing cases at a point in time
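A minimal sketch of the prevalence calculation, assuming prevalence is the number of people with the health indicator divided by the total population size. The function name and the example counts are hypothetical. The check on the inputs enforces the rule that everyone in the numerator must also be in the denominator.

```python
def prevalence(with_indicator: int, population_size: int) -> float:
    """Proportion of the population that has the health indicator."""
    if population_size <= 0:
        raise ValueError("population size must be positive")
    if not 0 <= with_indicator <= population_size:
        raise ValueError("numerator must be part of the denominator")
    return with_indicator / population_size


# Hypothetical example: 150 of 2,000 people currently have the condition.
p = prevalence(150, 2_000)
print(f"prevalence: {p:.3f} ({p * 100:.1f}%)")
```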
Why do we need to quantify health and disease?
To determine the extent of disease in a community
To identify the causes of health indicators and factors that increase or decrease a person’s risk for an outcome
To evaluate interventions designed to affect health-related states or events

A Mathematical Review
The value obtained by dividing one quantity by another.
A ÷ B

There is not necessarily a specific relationship between the two numbers

There are 14,364 women enrolled at Temple

There are 13,259 men enrolled at Temple

The female to male ratio is:

14,364/13,259 = 1.08 to 1 = 108 to 100
Proportions and Percentages
A proportion is a type of ratio where the numerator is part of the denominator

Of the 27,623 students at Temple, 14,364 are women. What proportion of Temple students are women?

14,364/27,623 = 0.52

A percentage is a proportion that has been multiplied by 100

0.52 * 100 = 52%

52% of Temple students are women.
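The ratio, proportion, and percentage above can be reproduced from the two enrollment counts given in the slides. This is only an arithmetic sketch; the variable names are illustrative.

```python
# Enrollment counts quoted in the slides.
women = 14_364
men = 13_259
total = women + men  # 27,623 students

ratio = women / men          # numerator is not part of the denominator
proportion = women / total   # numerator is part of the denominator
percentage = proportion * 100

print(f"female-to-male ratio: {ratio:.2f} to 1")
print(f"proportion of students who are women: {proportion:.2f}")
print(f"percentage of students who are women: {percentage:.0f}%")
```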
A rate is a proportion where a unit of time forms part of the denominator.

Speed limit=55 miles per hour
20 pregnancies per 100 women per year
5 cases of depression per 1,000 people per year
In contrast to clinical medicine, epidemiology is the science of understanding population health
The first step of epidemiology is to identify the population of interest.
Populations can be defined by:
Geographic space and time
(e.g. city, country, year of birth)
Characteristic, event, or exposure
(e.g. newborns, menopausal women, experienced natural disaster, smokers, Vietnam veterans)
Considerations of the specific study
(e.g. healthy individuals, study adherers)
Populations can be

Eligibility criteria do not allow movement in or out
Example: Church picnic attendees
Membership is defined by a changeable state or condition
Movement occurs in and/or out of the population
Examples: Philadelphia residents, Temple students
You can think of entry and leaving the population separately - one end may be dynamic and the other stationary

Count: the number of persons with an attribute, without reference to the number examined
How many people wear a bike helmet when riding a bike?
How many people got sick after eating at an on-campus restaurant yesterday?
Counts may be sufficient for some purposes:
To identify the beginning of an outbreak or emergence of a serious health condition
To determine the need for health care resources
To compare the number of cases in the same population from year to year, assuming steady state population
The Role of Time
Point prevalence: the numerator counts the total number of “cases” at a given point in time
“Do you currently have the flu?”

Period prevalence: the numerator counts the total number of “cases” at all points during a given time period or interval
“Have you had the flu over the past year?”

Lifetime prevalence: the numerator counts the total number of “cases” over a lifetime
“Have you ever had the flu?”
Cumulative incidence (CI) = # of new cases / # of persons in the population at risk at the start of the observation period
“At risk” means those who have the potential to get the outcome.
People are not at risk if:
They already have the outcome
They physically cannot get the disease
They are immune to a disease
They cannot meet the definition of a disease
CI assumes:
Stationary population
Entire population is observed for the whole time period
The longer the time period, the more this assumption can be a problem
Remain at risk for the entire time period
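A short sketch of the cumulative incidence calculation, with people who are not at risk removed from the denominator. The function name, its parameters, and the example counts are hypothetical.

```python
def cumulative_incidence(new_cases, population, already_have_outcome=0,
                         cannot_get_outcome=0):
    """New cases divided by the population at risk at the start of follow-up."""
    at_risk = population - already_have_outcome - cannot_get_outcome
    if at_risk <= 0:
        raise ValueError("no one is at risk")
    if not 0 <= new_cases <= at_risk:
        raise ValueError("new cases must come from the at-risk population")
    return new_cases / at_risk


# Hypothetical example: 1,000 people, 50 of whom already have the outcome,
# and 19 new cases during one year of observation.
ci = cumulative_incidence(new_cases=19, population=1_000,
                          already_have_outcome=50)
print(f"1-year cumulative incidence: {ci:.3f}")
```

Using all 1,000 people as the denominator would understate the risk, because 50 of them could never be counted as new cases.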
Incidence rate = # of new cases / total person-time at risk (during a specified time period)
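When people are followed for different lengths of time, the denominator can be built by adding up each person's time at risk. The sketch below assumes that this person-time is what the incidence rate (incidence density) divides by; the follow-up records are hypothetical.

```python
# Hypothetical follow-up records: (years observed while at risk, got outcome?)
follow_up = [
    (5.0, False),
    (2.5, True),   # stops contributing time once the outcome occurs
    (4.0, False),
    (1.5, True),
    (3.0, False),  # lost to follow-up after 3 years
]

person_years = sum(years for years, _ in follow_up)
new_cases = sum(1 for _, got_outcome in follow_up if got_outcome)

incidence_rate = new_cases / person_years
print(f"{new_cases} cases over {person_years:.1f} person-years")
print(f"incidence rate: {incidence_rate * 1_000:.0f} per 1,000 person-years")
```

Unlike cumulative incidence, this does not require that everyone be observed for the whole period.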
Relationship between Incidence and Prevalence
Prevalence is a mixture of two types of individuals:
People with disease before the period of observation
People who became sick during the period of observation

Therefore, prevalence is affected by how often new cases occur (incidence) and how long the disease lasts (duration)
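One common way to make this relationship concrete is the steady-state approximation, in which the prevalence odds equal the incidence rate multiplied by the average disease duration. That assumption is not stated in the slides, and the rates and durations below are hypothetical.

```python
def steady_state_prevalence(incidence_rate, mean_duration):
    """Prevalence implied by incidence and duration in a steady state.

    Assumes prevalence odds = incidence rate x mean duration, with both
    expressed in the same time unit (here, years).
    """
    odds = incidence_rate * mean_duration
    return odds / (1 + odds)


# Same incidence (10 new cases per 1,000 person-years), different durations.
acute = steady_state_prevalence(incidence_rate=0.01, mean_duration=0.1)
chronic = steady_state_prevalence(incidence_rate=0.01, mean_duration=10)

print(f"acute disease (about 5 weeks):  prevalence = {acute:.4f}")
print(f"chronic disease (10 years):     prevalence = {chronic:.4f}")
```

With identical incidence, the long-lasting condition is far more prevalent, which is why prevalence favors chronic over acute diseases.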
Mortality Rate
The incidence of death
Major uses:
To compare the risk of dying from a disease in two populations
To determine whether treatment of a disease has become more successful over time
A measure of disease severity
A surrogate measure for disease incidence for highly lethal diseases
Mortality rate = number of deaths during a specified time period / number living at start or mid-point of time period
Key points:
Must specify the time period over which deaths are counted
Can be a “specific” rate; remember to consider who should be in the denominator
Age-specific (mortality among 60-year-olds)
Cause-specific (lung-cancer mortality)
People in denominator must have the potential to be in the numerator
Non-specific mortality rate = “crude” mortality rate
In 2009, the population of Philadelphia was 1,514,694.
14,133 deaths occurred among Philadelphia residents in 2009.
Mortality rate for Philadelphia residents in 2009:

14,133/1,514,694 = 0.0093
= 933 per 100,000 residents

In 2009, 933 per 100,000 Philadelphia residents died.
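The Philadelphia figures can be checked directly from the two counts given above. The variable names are illustrative.

```python
# Counts quoted in the slides for Philadelphia, 2009.
deaths = 14_133
population = 1_514_694

crude_mortality_rate = deaths / population
per_100k = crude_mortality_rate * 100_000

print(f"crude mortality rate: {crude_mortality_rate:.4f}")
print(f"= {per_100k:.0f} deaths per 100,000 residents in 2009")
```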
Case-Fatality Rate
The proportion of people with a disease who die from that disease
Case-fatality rate = # of deaths caused by a condition / # of persons with the condition
Numerator and denominator should cover same time period (often 1 year)
Must specify time period over which deaths are counted
Serves as a measure of the severity of a disease
Within 2004, 31,860 new cases of pancreatic cancer were diagnosed.
Within the same year, 31,270 patients with pancreatic cancer died

Case-fatality Rate = # of deaths caused by a condition/# of persons with the condition

1-year Case-fatality Rate = 31,270/31,860

1-year Case-fatality Rate = 98.1%

In 2004, 98.1% of individuals with pancreatic cancer died within 1 year.
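As a quick arithmetic check, dividing the two pancreatic cancer counts quoted above gives a ratio of about 0.981. The variable names are illustrative.

```python
# Counts quoted in the slides for pancreatic cancer, 2004.
deaths_from_condition = 31_270
persons_with_condition = 31_860

case_fatality = deaths_from_condition / persons_with_condition
print(f"1-year case-fatality rate: {case_fatality * 100:.1f}%")
```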
Proportionate Mortality Rate
Of all deaths, how many were from a specific condition?
Proportionate mortality = # of deaths (in a year) caused by a condition / # of deaths (in a year) from all causes
Often expressed as a percentage
Proportional mortality ratio for 10 major causes of death in the United States, 2005.
Health and disease are not equally or randomly distributed throughout populations
The prevalence of disease varies by age, geography, time, race/ethnicity, gender, etc.
To understand that variation we look at who, where, and when (aka: person, place, and time)
This is descriptive epidemiology
When we understand person, place, and time, we can move on to asking “Why?” This is analytic epidemiology

Person: Who is affected
Place: Where the condition occurs
Time: When and over what period the condition occurred

Basis for planning, provision, and evaluation of health services
Target populations can be identified
Program priorities set
Emerging problems identified
Identification of questions to be addressed by analytic epidemiology
“Hypothesis generating”
“Person” can refer to any personal characteristics that create “like” groups of people:
Marital Status
Place of Birth
Socio-economic Status
Health, disease, and death vary by geographic areas
Differences by place can reflect differences in access to health care, environmental factors, social factors, etc.
Examples of place:
Across states or cities in US
Urban vs. Suburban vs. Rural
Within cities or states
Understanding the changes in health and disease over time allows us to identify whether changes in our society, for better or worse, matter.
Time is critical to identifying specific diseases, the sources of diseases, potential exposures, etc.
Age is one of the most important factors to consider when describing the occurrence of health or disease.
The causes of illness and death differ dramatically by stage of life
Differences in health and disease by sex may reflect genetic/biological differences, behavioral differences, or societal differences.
Sex vs. Gender
Many health indicators and disease statistics are reported separately by sex (“sex stratified”) because differences by sex are often important.
Race is primarily a social and cultural, not a biological, construct.
The US Census classifies race into five major categories:
American Indian and Alaska Native
Native Hawaiian and other Pacific Islander
Most common ethnic division is Hispanic/non-Hispanic
Individuals most often self-identify their race, although newer methods that can identify genetic lineage are being used.
Overweight among US Children (2-19) by Race and Sex
Childhood Overweight and Obesity by Sex
Prevalence of Overweight and Obesity among Children by Age
Socioeconomic Status (SES)
Socio-economic Status (SES) is “a descriptive term for a person’s position in society”
Ideally a composite measure including:
Income level
Education level
Type of occupation (pink collar, blue collar, etc.)
Other measures of SES may include whether a child receives free or reduced-price lunch, whether a family received food stamps or other benefits, etc.
Prevalence of Obesity in US Adults by Annual Household Income
Cyclic Trends
Cyclic (Seasonal) Trends: Increases and decreases in the frequency of disease over a period of several years or within a year.
Often related to environmental factors (e.g. weather) and the natural course of the disease.
Influenza peaks in February
Allergies and asthma may peak in spring
Heart attacks occur most frequently on weekends and Mondays
Incidence of West Nile Virus by Week, 1999-2008
Natural disaster
Smoking policy
Cholesterol level
IV drug use
Unintended pregnancy
Quality of life
Contingency Table
Instead of exposure and outcome information being collected from individuals, ecologic studies use information about the exposure and outcome at the group level.
Examples: nations, states, census tracts, counties.
May be used when individual-level measurements are not available
May be used when exposures truly are group-level, such as laws, pollution, etc.
Group Level
Instead of asking each individual:
What is your GPA?
Do you drink alcohol?
We obtain group level data from 20 universities:
Average GPA across all students
Percent of student population reporting they drink alcohol
And then we compare across universities
Ecologic Fallacy
“An erroneous inference that may occur because an association observed between variables on an aggregate level does not necessarily represent or reflect the association that exists at an individual level”
Relatively inexpensive
Relatively fast
Can provide a first look at an exposure-disease relationship
Cannot determine the temporal relationship between exposure and outcome
Exposures may truly be population-level
Policies, laws, health care systems
Can be easy and inexpensive
Can generate new hypotheses
Ecologic fallacy: When a relationship observed at a population level is assumed to occur among individuals
Potential and alternative explanations cannot be accounted for or tested
A cause is an event, condition, or characteristic that preceded disease onset and without which the disease would not have occurred at all or would not have occurred until some later time.

Correlation vs. Causation
“There are causal and non-causal relationships, the art of causal inference is a method for distinguishing the two.”
An association may be observed, but that does not mean the relationship is causal
Smokers are more likely to carry lighters - do lighters cause cancer?
Roosters crow before sunrise - did the rooster cause the sun to rise?
Runners are likely to be thin - does running cause weight loss?
We are limited in our ability to identify causes because of the ways we must conduct some studies, the fact that exposures tend to cluster together, and errors in the way we conduct studies.
The Counterfactual
Observing the counterfactual is the only way to identify a cause with certainty:
What would have happened to those exposed, but for the exposure?
Would an individual have asthma if the only thing different in their life was that they did not live in sub-standard housing?
Would a baby have gotten leukemia if the only thing different in their life is that their mother didn’t use a specific chemical when they were in utero?
The majority of epidemiological studies are observational.
We observe different groups of people: those who are exposed and those who are unexposed.
We do not observe the same person in the exposed and unexposed condition.
We cannot separate out the exposure from the variety of other reasons that an individual may have been exposed.
Randomized controlled trials provide stronger evidence for causality because we assign exposure.
Sufficient-Component Cause Model
Many health outcomes have multiple contributing determinants that act together to bring on a change in health state.
An elderly woman breaks her hip from falling down icy steps - what caused her hip fracture?
Causal Pies
Health outcomes are the result of a combination of causes, and often, several different combinations of causes can each be responsible for the health outcome.

If we consider the cause of a disease to be a pie:
Component causes:
Pieces of the pie - a minimal set of conditions that come together to cause the health outcome to occur
Necessary causes:
Without this piece of the pie, the health outcome cannot occur
Sufficient cause:
A complete pie with all the necessary and component causes.
Important notes:
Component causes can act far apart in time.
Blocking the action of any component cause prevents the completion of the sufficient cause and therefore prevents the disease by that pathway.
The prevalence of component causes directly affects the incidence of disease.
Hill's Tenets
In 1965, A. Bradford Hill published “The Environment and Disease: Association or Causation?” describing 9 aspects that should be considered in assessing evidence for causality.
Statistical methods cannot establish proof of a causal relationship in an association. The causal significance of an association is a matter of judgment which goes beyond any statement of statistical probability.

To judge or evaluate the causal significance of the association a number of criteria must be utilized.
Temporal Relationship
The exposure to the factor occurs before the outcome develops
Prospective cohort studies are in general stronger than cross-sectional, retrospective cohort, or case-control studies for establishing temporal relationships
Biologic Plausibility
A biologically plausible mechanism exists to explain why the relationship between exposure and outcome would be expected to occur
Replication of the Findings
Evidence for causality is stronger if the association is replicated in different populations, by different researchers, at different times, using different study designs.
Summary of Studies on the Relationship between Breastfeeding and Later Obesity
Alternative Explanations (Confounding)
Alternative explanations have been explored
Confounding: distortion of the true relationship between exposure and outcome by a “third factor” which is associated with both the exposure and the outcome
Maternal Obesity
Maternal Education
Household Income
Dose-Response Relationship
The level of the outcome varies in response to the level of exposure
Association between Breastfeeding and Obesity at 4 Years-of-Age in Whites and African Americans
Strength of Association
The stronger the relationship, the less likely there are clear alternative explanations that haven't been accounted for.
Barriers to Causality
Information Bias
Systematic flaws in obtaining information about exposures and/or outcomes. The information gathered is incorrect.
Occurs after the study has started
Pertains to how the data are collected
Results in consistent incorrect classification of participants as either exposed/unexposed or diseased/non-diseased
Occurs from information bias
Is the consistent incorrect assignment of subjects to exposure and/or outcome status
Some of your exposed study participants are classified as unexposed (or vice versa)
Some of your participants with your outcome are classified as not having your outcome (or vice versa)
Random Error
Every measure has some degree of error or imprecision.
Error can be consistent (bias) or random
Random error results in ‘noise’, makes it more difficult to see a ‘signal’
Sampling Variability
If we could measure the entire population, any observed differences would be true
But, for nearly every study we select a sample from a population
Statistical inference is the process of making generalizations from a sample to the source population
We must account for the possibility that the result we observe is due to the sample we picked and not a truth of the population
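Sampling variability is easy to see in a small simulation. The sketch below builds a hypothetical population of 1,000 people with a true prevalence of 20%, draws several samples of 50, and then "samples" the entire population. The population, sample sizes, and seed are arbitrary choices for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 1 = has the outcome, 0 = does not.
population = [1] * 200 + [0] * 800
true_prevalence = sum(population) / len(population)


def sample_prevalence(sample_size):
    """Prevalence estimated from a simple random sample (no replacement)."""
    sample = random.sample(population, sample_size)
    return sum(sample) / sample_size


small_samples = [sample_prevalence(50) for _ in range(5)]
census = sample_prevalence(len(population))  # measure everyone

print(f"true prevalence: {true_prevalence:.2f}")
print("estimates from samples of 50:",
      ", ".join(f"{estimate:.2f}" for estimate in small_samples))
print(f"estimate from the whole population: {census:.2f}")
```

The sample estimates scatter around the true value even though nothing about the population changed; only measuring everyone removes that scatter.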
Exposures can be known as “interventions” or “treatments”
“Intervention condition” refers to which exposure has been assigned:
Intervention vs. control
Intervention vs. usual care
Intervention 1 vs. Intervention 2 vs. control
RCTs are considered the “gold standard” – the best choice for testing the effect of an exposure
Experimental Units
Individual people are randomized to an intervention condition.
Useful when an intervention works directly on a person and when assignment of an intervention to the person won’t affect others in the study.
Individuals are randomized to either take an active drug or a placebo.
An individual randomized to receive a new screening test for prostate cancer, or receive the standard screening test.
Groups (Group Randomized Trial)
Communities of people are randomized to an intervention condition
Necessary when intervention can only be applied to a group
Useful when intervention with one person may affect those around them (contamination)
Often we want group interventions to work through diffusion
Worksites are assigned to implement smoke-free workplace policies.
Schools are assigned to implement nutrition education curriculum.
Families are assigned to receive parenting counseling.
Exposures are assigned by chance
All randomized units have the same chance of being assigned to any of the exposure groups
Impossible to know in advance to which group a unit will be assigned
Randomization balances differences across groups (as long as you have enough units)
The exposed and unexposed groups are similar on all attributes that affect the occurrence of disease (confounders)
Confounding is eliminated; the control group serves as the counterfactual to the intervention group on average
Since subjects’ assignment to a treatment group is determined after enrolling in the study, the potential for selection bias is greatly reduced.
Information bias is reduced when:
Exposure status is assigned, not dependent on memory or records.
Participants or investigators are blinded, or unaware of what exposure they’re getting.
Example: In a drug trial, if the person collecting data knows who is on the active drug, they may look for “success” of the drug more than they would among people taking the placebo
Execution of Randomization
Table of random numbers
Computer-generated random numbers
Envelope system

Stratified/Matched Randomization: If there is a concern that there are not enough numbers to successfully balance specific confounders, we can divide our subjects into groups based on the confounder first, and then randomize within those groups.
Blocking: Randomization occurs in groups to ensure a mix of treatment groups with regard to time of enrollment.
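Computer-generated randomization can be sketched with Python's standard `random` module. The two functions below are illustrative, not a trial-grade allocation system: the function names, arm labels, block size, participant IDs, and seed are all hypothetical. Simple randomization gives every unit the same chance of each arm, while blocked randomization also keeps the arms balanced within each consecutive group of enrollees.

```python
import random

ARMS = ("intervention", "control")


def simple_randomization(units, arms=ARMS, seed=None):
    """Assign each unit to an arm independently, with equal probability."""
    rng = random.Random(seed)
    return {unit: rng.choice(arms) for unit in units}


def block_randomization(units, arms=ARMS, block_size=4, seed=None):
    """Assign units in enrollment order, balancing arms within each block."""
    if block_size % len(arms) != 0:
        raise ValueError("block size must be a multiple of the number of arms")
    rng = random.Random(seed)
    assignment = {}
    for start in range(0, len(units), block_size):
        # Each block holds every arm equally often, in shuffled order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        for unit, arm in zip(units[start:start + block_size], block):
            assignment[unit] = arm
    return assignment


# Hypothetical participant IDs, listed in order of enrollment.
units = [f"P{number:02d}" for number in range(1, 13)]
simple = simple_randomization(units, seed=1)
blocked = block_randomization(units, seed=1)

for unit in units:
    print(unit, simple[unit], blocked[unit], sep="\t")
```

With only 12 units, simple randomization can produce unequal arms by chance; the blocked version cannot drift by more than half a block at any point during enrollment.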
Blinding and Compliance
What is it?
Concealing the treatment group status of study subjects
Who is blinded?
Single blinded—subjects
Placebo: a biologically inert intervention which is used to induce any psychological effects of the test intervention
Double blinded—subjects and data collectors
Blinding is often not possible in behavioral or policy-level interventions
Full compliance by subjects is rare
Intervention condition subjects often:
Do not take a drug as prescribed
Do not adhere to the diet that’s assigned
Do not implement policies that are expected of them
Control subjects may also “drop in” to the intervention
Adopt a new diet that is similar to the intervention
Consume more of a vitamin because more foods become fortified
Some non-compliance can be controlled, some cannot.
Methods to improve compliance
Simplify regimen
Enroll motivated and knowledgeable people
Present realistic picture of tasks required in study
Obtain a detailed medical history to exclude those for whom compliance will be difficult
Maintain frequent contact with participants
Conduct run-in period
Intention-to-treat approach:
analyze data according to group to which subjects were randomized, regardless of the actual treatment
Actual treatment approach:
analyze data according to treatment actually received rather than the group to which the subjects were randomized
Breaks random assignment
Introduces bias and confounding
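A small sketch of the two analysis approaches, using made-up records (the eight subjects, the arm labels, and the outcomes are all hypothetical). One subject assigned to the drug never took it, and one placebo subject "dropped in" to the drug.

```python
from collections import defaultdict

# Hypothetical records: (assigned arm, treatment actually received, event 0/1).
RECORDS = [
    ("drug", "drug", 0), ("drug", "drug", 0), ("drug", "drug", 1), ("drug", "none", 1),
    ("placebo", "none", 1), ("placebo", "none", 1), ("placebo", "none", 0), ("placebo", "drug", 0),
]

def risk_by_group(records, field):
    """Proportion with the event, grouped by field 0 (assigned) or field 1 (received)."""
    events = defaultdict(int)
    totals = defaultdict(int)
    for record in records:
        group = record[field]
        totals[group] += 1
        events[group] += record[2]
    return {group: events[group] / totals[group] for group in totals}

itt = risk_by_group(RECORDS, field=0)         # intention-to-treat: as randomized
as_treated = risk_by_group(RECORDS, field=1)  # actual treatment: as received

print("Intention-to-treat:", itt)
print("Actual treatment:  ", as_treated)
```

In these invented data the two approaches disagree. Only the intention-to-treat comparison keeps the groups that randomization created.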
Special Designs
Planned cross-over (vs. parallel):
After being randomized to an intervention condition and observed for a specific period, subjects are switched to the other condition.
Each subject serves as their own control.
Must have a washout period between treatments.
Factorial Design
Can address two or more interventions in a single trial.
Could be used for two different exposures which you believe will affect different outcomes and not interact.
Or, allows for examination of interactions between exposures if you expect different outcomes from the doubly-exposed group.
Efficacy vs. Effectiveness Trials
Efficacy trial:
Determines if the treatment improves outcomes under ideal circumstances
Goal is often to enhance internal validity
Effectiveness trial:
Determines if the treatment improves outcomes under non-ideal circumstances (less controlled, “routine” or “real-world” circumstances)
Goal is often to enhance external validity
Difference between two is largely based on:
Subjects selected for study
How “intensely” the intervention is implemented
Unique Issues, Strengths, and Limitations of RCTs
Unique Issues:
Exposure must be a modifiable factor
Intervention being tested should be sufficiently different from the control condition to justify a trial
Legitimate uncertainty exists about the effect of alternative interventions on the outcome (equipoise)
The benefits of the new intervention should outweigh the potential risks
Protects against confounding
Observed factors
Unobserved factors
Protects against bias
Cannot always randomly assign a treatment (exposure)
Not suitable for rare outcomes
Only as strong as the protocol
Faithful implementation (adherence)
Why choose an observational (vs. experimental) study design?
Assigning exposure is:
Outcomes would take too long to occur
Outcomes are too rare
Not enough prior knowledge to justify
When is a cohort study conducted?
Already some evidence about the role of specific risk or protective factors in the etiology of the outcome
Interval between the exposure and the development of the outcomes is short
Exposure is rare
Outcome is relatively common among exposed
Exposures and Cohorts
Examples of Exposures:
“Assigned” by individual:
Drinking or not drinking alcohol
Amount of regular alcohol consumption
Having had gastric bypass surgery or not
“Assigned” by subject, manufacturer, legislation, etc.
Having a car with an airbag
Living in a walkable neighborhood
“Assigned” by nature
Having a specific genotype or not
Experiencing a natural disaster
Cohort Selection
Cohort studies enroll participants who are free of the outcome(s) of interest and who are exposed and unexposed, or have gradations of exposure, to the factor under study.
Unexposed group = referent group, comparison group

A cohort study can recruit one population group/cohort and identify varying levels of exposure within the one group
Examples: Project EAT, Population exposed to 9/11
Or, two cohorts can be selected – an exposed and an unexposed
Example: National Children’s Study
Cohort Studies and the Counterfactual
When individuals choose their exposure, or are “assigned” them non-randomly, the exposed and unexposed groups differ in ways other than just the exposure.
There are characteristics that differ across the exposed and unexposed groups that confound the associations between exposures and outcomes
Confounding is a "mixing of effects" - when exposures are correlated, it can be difficult to disentangle which, if any, of the exposures is a cause of the outcome.
Example: smoking, drinking alcohol, and esophageal cancer
In the future we will learn specifically how to identify and address confounding
Maternal Smoking
Infant Birth Weight
Neighborhood Air Pollution
Childhood Asthma
Worksite Chemical Exposure
Lung Cancer
Disease-free participants are grouped based on past or current exposure
Participants are followed into the future
Retrospective/Non-concurrent/historical cohort:
Exposures and outcomes have already occurred when study begins
Exposures and outcomes are assessed from records
Past exposures and outcomes assessed
Cohort continues to be followed into the future
For all, participants are selected based on exposure status, not outcome status.
Assembling the Cohort and Assessing Exposures
It is important to assemble a cohort that is:
At risk for the disease
For example, for a study on ovarian cancer you would not include men or women who have had their ovaries removed
Does not have history of the disease (especially if your outcome is a chronic disease)
Can be assessed via surveys, interviews, or existing records
Does not have undetected disease
Can be assessed via lab tests, exams, etc.
Assessing Exposures:
Be specific about exposure of interest
This is informed by your understanding of the disease mechanisms
Latent period – period between exposure and development of disease
Exposures can be assessed dynamically as time progresses
Lots of data, lots of complications
Objective measures of exposure are best
Self-reported data can also be used
Different accuracy of reporting depending on exposure
Follow-up and Measuring Outcomes
Goals of Follow-up:
Complete follow-up of all of your cohort members
Complete measurement of outcomes
Standardized measurement of outcomes
Possible methods of tracking:
Collect extensive contact information at recruitment
Use government records
Use credit information
Internet searches
National Death Index
Population registries
Measuring outcomes:
Assessment of outcomes needs to be the same for exposed and unexposed
Outcomes of cohort studies can be:
Incident disease
First occurrence or recurrence
Change in measure of health risk
Direct methods to assess outcomes:
Questionnaires and interviews
Biological specimens and lab tests
Indirect methods to assess outcomes:
Surveillance systems
Physician/clinic/insurance records
Strengths and Limitations
Temporality of exposure and outcomes is clear
Good for common outcomes that occur within a relatively short time frame
Useful for rare exposures
Can study multiple outcomes
May be able to study multiple exposures
Variations in exposure over time can be taken into account
Often need large samples
Potential for loss to follow-up over time
Statistical power
Potential for confounding when exposure is not randomly assigned
Inefficient for rare or delayed outcomes
Can be a high burden on participants
Need to ensure sufficient variation in exposure status
Why conduct a case-control study?
The outcome of interest is rare
The time lag between exposure and disease is long
Underlying population is hard to track
Cases are commonly selected from:
Relatively easy and inexpensive
Reduces potential that cases in hospital/being treated are different than those in the community
Time consuming and expensive
Disease registries/Death certificates
Cohort studies
Cases are the same as those that would be included in the cohort study
Case definition is based on symptoms, physical examinations, and results of diagnostic tests.
Strict definitions of a case are needed
This ensures that our cases are as homogeneous as possible
Example: Congenital malformations
Incident cases are preferred so we learn about the development of the disease
Using prevalent cases will identify factors related to the duration of a disease/survival
If a disease leads to death quickly, those who survive and are selected as prevalent cases may be different because they survived
Source cohort:
The population that gave rise to the cases included in the study
Imagine you have started with a cohort: your cases are those who developed disease, your controls those who did not.
Your controls are individuals who would have been cases if they got the outcome.
“the would criterion”
Generally from the geographic area that gave rise to the cases
Provides greater assurance that the controls are comparable to the cases with respect to demographic and other important characteristics
Difficult to identify
Are not interested in participating in research
Recall of prior exposure differs from cases
Individuals who are receiving medical attention for things other than the outcome of interest
Easy to identify
Good participation rates
Likely to be from source population if referral pattern is similar
Recall of prior exposures may be similar to cases
Must ensure that controls’ medical condition is not associated with the same exposure as the study outcome
Generally a sicker population, this may bias results
Friends and family of cases
Likely to share socio-demographic characteristics
Bias arises if their exposures are too similar
Individuals who have died
May be appropriate control for dead cases
Exposure ascertainment similar to dead cases
Not necessarily representative of source cohort
More likely to have behavioral risk factors
Cohort study - Nested case-control study
Controls provide a less expensive and faster way of determining the exposure experience in the population that generated the cases
Measured exposure when it happened
Exposure Assessment
Accurately measuring exposures is one of the most difficult aspects of case-control studies.
By design, exposure has happened in the past
People are terrible about remembering their past!
People often remember their past differently if they became ill.
How can we measure exposures?
Surveys of study participants
Prone to recall bias
Surveys of surviving relatives
Medical records
Biological samples if exposure is certain to precede disease
Other records of exposure status
Recall Bias
Cases may be more or less likely to accurately recall an exposure than controls
If you are diagnosed with a serious illness, you’re going to spend a lot of time thinking about what might have caused it.
Controls have not spent this time reflecting on their past, so may less accurately remember exposures.
On the other hand, cases may be incorrectly remembering their exposure history, perhaps due to a “pet theory” about the cause of their disease
Reducing recall bias:
Find an objective measure of exposure history
Medical records, employment records, etc.
Biological samples
Select a control group that has another type of disease
These people are also likely thinking about their past exposures
Change your study design
Try to test your hypothesis within a cohort study where exposure information was already collected.
Advantages and Limitations
Usually faster and less expensive than cohort studies
Optimal for rare diseases
Optimal for diseases with long delay between exposure and disease
Useful when little is known about the disease because multiple exposures can be assessed
Useful when underlying population is dynamic
Limited to one disease at a time
Inefficient for rare exposures
Temporal relationship between exposure and disease may be unclear
Participants’ recall may affect study results
Controls are difficult to identify
Cannot measure disease frequency since you’re in control of number of subjects with disease that you include
Measuring specific characteristics of individuals, usually for the purposes of classifying individuals into two categories: abnormal (“positive”) and normal (“negative”)
Everyday examples:
TSA screening at airports
State trooper “radar” screening on highways
Screening for glass bottles at stadiums

In Public Health and Medicine:
A test performed in the absence of symptoms, in apparently well individuals, to sort out those with a high probability of having or developing disease from those with a low probability of having or developing a disease
Goal: To reduce or prevent the occurrence of disability or death
Pap tests
PKU in newborns
Gestational diabetes test in pregnant women
Diagnostic tests
are performed when symptoms are present and designed to diagnose the condition causing the symptoms

Criteria for screening viability:
Disease is serious with severe consequences, high frequency, or both
Effective treatment which reduces morbidity and/or mortality is available
Alternatively, effective ways of preventing spread to other people should be known and available
Disease has a reasonably long and detectable preclinical phase
Persons who are screened have access to follow-up facilities for diagnosis and treatment
Treatment is known to be more effective if initiated early—in the preclinical stage—rather than in the symptomatic stage
Primary prevention
Maintenance of health by individual or community efforts so that disease process does not begin
Screening goal is to prevent disease from occurring
By identifying risk factors for a disease, its occurrence can be prevented by intervening on modifiable risk factors
Cholesterol and blood pressure screening to prevent cardiovascular disease
Secondary prevention:
Reduction in the expression and severity of a clinical disease
Does not prevent disease, but reduces impact or improves survival of a disease that is already present
Identifying a disease at an early stage so that the course (natural history) of the disease can be altered through intervention
Pap test to detect early cervical cancer
Mammogram to detect early breast cancer
Fasting blood sugar to detect early diabetes
Reliability and Validity
the extent to which a concept, conclusion or measurement is well-founded and corresponds accurately to the real world
the consistency of a set of measurements or of a measuring instrument
Measures can be reliable but not valid
Get the same response over and over, but it is not the true answer
Measures cannot be valid but not reliable
If your measure fluctuates greatly, it is not assessing the true answer well
Every measurement has some imprecision or unreliability
Goal is to isolate between-subject variability (“real” differences) and eliminate less interesting sources of variability that contribute “noise” to our measurements
If variability due to extraneous sources is greater than between-subject variability, it may be impossible to detect true differences
Sources of Variability
Subject related
Normal biological variation by circumstance of testing
Environment related
Variation due to the settings in which the measurements are taken
Instrument related
Variation in how the instrument performs on the same sample
Observer related
Variation between screeners in applying or interpreting results
Intra-observer variability:
differences in repeated measurement by the same screener
Inter-observer variability:
differences in measurement between screeners
Increasing Reliability
Observer training
Improve technique of measurement, ensure consistency across observers
Quality assurance
Remove human judgment if possible
Average multiple measurements over a short period
Consistency and standardization of equipment, labs, etc.
The extent to which a test identifies the truth and distinguishes between persons with and without the disease
Validity is assessed by determining whether the test has high
Sensitivity: The ability of the test to correctly identify who has the disease
Specificity: The ability of the test to correctly identify who does not have the disease
To assess the validity of a screening test there should be some confirmatory diagnostic test (“gold standard”)—an additional test that allows us to know the true disease status
In the absence of a gold standard, longitudinal studies could be used to identify those who develop the disease over time
Methods to Assess Validity
The ability of the test to correctly identify who has the disease
Sensitivity = True positives/All those with disease
The ability of the test to correctly identify who does not have the disease
Specificity = True negatives/All those without disease
Assessing Validity
The probability that the test will be positive when the disease is present (PID “Positive In Disease”)
The proportion of diseased people who are correctly identified as “positive” (abnormal) by the test
Ability of the test to identify those with the disease
The probability that the test will be negative when the disease is absent (NIH “Negative In Health”)
The proportion of non-diseased people who are correctly identified as “negative” (normal) by the test
Ability of the test to identify those without the disease
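Sensitivity and specificity can be computed directly from the four cells of a screening-test-by-gold-standard table. A minimal sketch; the function names and the counts in the usage lines are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """True positives / all those with disease (PID: Positive In Disease)."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """True negatives / all those without disease (NIH: Negative In Health)."""
    return true_neg / (true_neg + false_pos)

# Hypothetical screening results checked against a gold standard:
# 100 people with disease (80 test positive), 900 without (800 test negative).
print(f"Sensitivity = {sensitivity(80, 20):.2f}")
print(f"Specificity = {specificity(800, 100):.2f}")
```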
Screening Cut Points
Optimal tests are both:
Sensitive: maximize true positives and minimize false negatives
Specific: maximize true negatives and minimize false positives
Must weigh the consequences of misclassification errors
By maximizing sensitivity at the expense of specificity, you end up with more false positives
People without disease who are mistakenly classified as “abnormal”
They are labeled and put through unnecessary cost and anxiety of more testing
Greater burden on the health care system
By maximizing specificity at the expense of sensitivity, you end up with more false negatives
People with disease who are not detected
They cannot benefit when disease is severe and/or early treatment helps
Predictive Values
Positive Predictive Value: Proportion of individuals with a positive test who have preclinical disease (top half of table)
Negative Predictive Value: Proportion of individuals with a negative test who do not have preclinical disease (bottom half of table)
Relationship between Predictive Value and Disease Prevalence
Higher disease prevalence in the population will increase the PPV of a test
Therefore, a screening program will be most productive and efficient in a high-risk population
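To see why prevalence matters, the predictive values can be written in terms of sensitivity, specificity, and prevalence. This is a sketch with assumed values (a test with 90% sensitivity and 90% specificity); the function name is illustrative.

```python
def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sens * prevalence
    false_neg = (1 - sens) * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    true_neg = spec * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)   # among positive tests, share with disease
    npv = true_neg / (true_neg + false_neg)   # among negative tests, share without disease
    return ppv, npv

# Same hypothetical test, low-risk vs. high-risk population.
for prevalence in (0.01, 0.20):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"Prevalence {prevalence:.0%}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

The test itself is unchanged, yet far more of its positive results are true positives in the high-prevalence population.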
Evaluating Effectiveness of Screening Programs
Formal evaluation of screening programs is often not undertaken when the intervention is simple and highly effective
But evaluation is important to determine whether people who are screened derive greater benefit from being screened than people who are not screened
Exposure (being screened)
Outcome (length or quality of life)
Observational study designs
Cohort: compare outcomes in screened (exposed) and non-screened (unexposed)
Case-control: compare screening (exposure) history in cases and controls
Experimental study design
RCT: randomly assign to screening or no screening
Lead-Time Bias
Screening programs may detect disease earlier, but the time of death is the same as what would have occurred without screening
Overestimates the benefits of screening
Survival time may seem longer for those screened because diagnosis was made earlier, not because you changed the progression of the disease
If screening is associated with improved survival, the screened group should survive longer than the control plus the lead time
Length Bias
Screening may identify individuals with less aggressive cases of disease
These cases may have a longer pre-clinical stage (and thus often longer clinical phase) and are more likely to be detected by screening
Would appear that screening improved outcomes, when really the diseases detected would have had a “better” natural history anyway
Pre-symptomatic stage
Symptomatic Stage
Poor prognosis (short survival)
Good prognosis (long survival)
Volunteer Bias
When comparing the outcomes of those who participate in screening programs, those who get screenings are different in many ways than those who do not:
Insurance coverage
Comfort with medical system
More health conscious
Higher risk for disease
These differences may also be related to likelihood of getting disease or mortality
Cumulative Incidence Ratio (CIR)
Null value is 1
Values > 1 indicate that the exposure is a risk factor
Values < 1 indicate that the exposure is a protective factor
Incidence Rate Ratio (IRR)
Null value is 1
Values > 1 indicate that the exposure is a risk factor
Values < 1 indicate that the exposure is a protective factor
Incidence Rate among exposed = A/PTe
Incidence Rate among non-exposed = C/PTu
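Both ratios divide the incidence in the exposed by the incidence in the unexposed. A short sketch follows, where A and C are the numbers of cases among the exposed and unexposed, as in the formulas above. The counts and person-time in the usage lines are hypothetical.

```python
def incidence_rate_ratio(a, pt_exposed, c, pt_unexposed):
    """IRR = (A / PTe) / (C / PTu), using person-time denominators."""
    return (a / pt_exposed) / (c / pt_unexposed)

def cumulative_incidence_ratio(a, n_exposed, c, n_unexposed):
    """CIR = (A / total exposed) / (C / total unexposed), using people at risk."""
    return (a / n_exposed) / (c / n_unexposed)

# Hypothetical cohort: 30 cases in 1,000 person-years exposed, 10 in 1,000 unexposed.
print(f"IRR = {incidence_rate_ratio(30, 1000, 10, 1000):.1f}")
# Hypothetical cohort: 15 cases among 500 exposed people, 50 among 500 unexposed.
print(f"CIR = {cumulative_incidence_ratio(15, 500, 50, 500):.1f}")
```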
Communicating Relative Risk
A 10 year study examining associations between red meat intake and cardiovascular disease (CVD) observed an IRR of 3.0.
“Individuals who consumed red meat were 3 times as likely to have incident CVD compared to individuals who did not consume red meat over 10 years.”
“Individuals who consumed red meat were 200% more likely to have incident CVD compared to individuals who did not consume red meat over 10 years.”
A 2 year study examining associations between pertussis vaccination and pertussis incidence observed a CIR of 0.15.
When the RR is < 1, subtract it from 1 to express preventive association
“Over 2 years, individuals who received a pertussis vaccination were 85% ((1-0.15)*100) less likely to develop pertussis than individuals who were not vaccinated.”
** Can also switch exposed group to get a RR > 1
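The wording rules above are simple arithmetic on the relative risk. A sketch is below; the function name and the exact phrasing it returns are illustrative choices.

```python
def describe_relative_risk(rr):
    """Turn a relative risk (IRR or CIR) into a plain-language comparison."""
    if rr > 1:
        return f"{(rr - 1) * 100:.0f}% more likely ({rr:g} times as likely)"
    if rr < 1:
        return f"{(1 - rr) * 100:.0f}% less likely"
    return "equally likely"

print(describe_relative_risk(3.0))       # red meat and CVD example
print(describe_relative_risk(0.15))      # pertussis vaccination example
print(describe_relative_risk(1 / 0.15))  # same data with the exposed group switched
```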
Attributable Risk
How many cases of the outcome are attributable to the exposure?
Attributable risk (AR)/Risk difference (RD)
Incidence in exposed group - Incidence in unexposed group
Null value = 0
Retains same units as the incidence measure used
In words:
Among every 10,000 women who use HRT, there are 42 excess cases of breast cancer (over 5 years) compared to women who do not use HRT.
Causal interpretation:
If women taking HRT stopped taking HRT, 42 cases of breast cancer per 10,000 individuals could be prevented over 5 years.
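The risk difference is a single subtraction. In the sketch below, the two cumulative incidences are hypothetical and were chosen only so that the result matches the 42 excess cases per 10,000 in the wording above. They are not the study's actual figures.

```python
def risk_difference(incidence_exposed, incidence_unexposed):
    """Attributable risk: incidence in exposed minus incidence in unexposed."""
    return incidence_exposed - incidence_unexposed

# Hypothetical 5-year cumulative incidences of breast cancer (assumed values).
ci_hrt = 142 / 10_000
ci_no_hrt = 100 / 10_000

excess_per_10000 = risk_difference(ci_hrt, ci_no_hrt) * 10_000
print(f"{excess_per_10000:.0f} excess cases per 10,000 women over 5 years")
```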
Other Absolute Measures
Attributable Risk Percent (AR%)
We can not only calculate the number of events prevented by removing an exposure, but the proportion of cases that would be prevented.
AR% = [(CIexposed - CIunexposed) / CIexposed] x 100
What proportion of total incidence is attributable to the exposure?
Attributable risk percent (AR%)
Proportion of excess disease among the exposed
Population attributable risk (PAR)
Amount of excess disease among the total population
Population attributable risk percent (PAR%)
Proportion of excess disease among the total population
Population Attributable Risk (PAR)
We can not only calculate the number of cases attributable to the exposure among the exposed, but we can also look among the entire population (those exposed and unexposed).
2 Methods:
If we know the incidence in our entire population:
Incidence in total population - Incidence among unexposed
Otherwise, we need to know the proportion of total population that are exposed (prevalence of exposure in population)

PAR = Proportion of population exposed (P1) x (CIexposed – CIunexposed)
PAR = [(CIexposed x P1) + (CIunexposed x Proportion unexposed (P0))] - CIunexposed
Population Attributable Risk Percent (PAR%)
In the same way that we can expand attributable risk from only those in the exposed group to a whole population (exposed + unexposed), we can do the same with attributable risk percent.
Again, we need to know either:
CI of the outcome in entire population
Proportion exposed in entire population
If you know CI in entire population:
PAR% = [(CIpopulation - CIunexposed) / CIpopulation] x 100
If you know the proportion of the population exposed:
PAR% = [P1 (CIR - 1) / (1 + P1 (CIR - 1))] x 100
Population attributable risk will never be more than the attributable risk among the exposed
Null value = 0
Has same units as incidence measure used
If you believe your association is causal, population attributable risk represents the amount or proportion of disease in your population due to your exposure, or the amount or proportion of disease that could be eliminated by taking away your exposure.
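The absolute measures can be checked against each other numerically. This sketch uses the standard definitions: AR% divides the risk difference by CIexposed, and PAR% divides PAR by CIpopulation. The input values are hypothetical and the function names are illustrative.

```python
def ar_percent(ci_exposed, ci_unexposed):
    """Proportion of disease among the exposed that is attributable to the exposure."""
    return (ci_exposed - ci_unexposed) / ci_exposed * 100

def par_from_population(ci_population, ci_unexposed):
    """PAR when the incidence in the total population is known."""
    return ci_population - ci_unexposed

def par_from_prevalence(ci_exposed, ci_unexposed, p_exposed):
    """PAR when the proportion of the population exposed (P1) is known."""
    return p_exposed * (ci_exposed - ci_unexposed)

def par_percent_from_population(ci_population, ci_unexposed):
    return (ci_population - ci_unexposed) / ci_population * 100

def par_percent_from_prevalence(cir, p_exposed):
    excess = p_exposed * (cir - 1)
    return excess / (1 + excess) * 100

# Hypothetical values: CI = 30% in exposed, 10% in unexposed, 25% of population exposed.
ci_e, ci_u, p1 = 0.30, 0.10, 0.25
ci_pop = ci_e * p1 + ci_u * (1 - p1)  # weighted average of the two groups

print(f"AR%  = {ar_percent(ci_e, ci_u):.1f}")
print(f"PAR  = {par_from_population(ci_pop, ci_u):.3f} or {par_from_prevalence(ci_e, ci_u, p1):.3f}")
print(f"PAR% = {par_percent_from_population(ci_pop, ci_u):.1f} or {par_percent_from_prevalence(ci_e / ci_u, p1):.1f}")
```

The two methods give the same PAR and the same PAR%, and the PAR is smaller than the attributable risk among the exposed because only part of the population is exposed.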
Relative or Absolute?
Relative measures are much more commonly reported, mostly because statistical software calculates them more readily
Relative measures are dependent on the incidence of disease in the unexposed
Absolute measures will provide the same strength of association, regardless of incidence in the unexposed
Odds Ratios
In case-control studies we intentionally select people with and without disease
Therefore, we cannot know the true incidence (probability) of disease among the exposed and unexposed
We cannot calculate a risk ratio if we can’t calculate incidence.
We can go “backwards” from the disease and can calculate the odds that cases were exposed and the odds that controls were exposed
Odds and Odds Ratios (OR)
Calculating odds is an alternative to calculating probability
Probability = Likelihood = Risk = Incidence
Probability = Chance of outcome occurring / Total number of possible outcomes
Odds = Probability of outcome occurring / Probability of outcome not occurring
Odds that a case was exposed = A/C
Odds that a control was exposed = B/D

Our Odds Ratio (OR) is the ratio of the odds that cases were exposed to the odds that controls were exposed

OR = (A/C)/(B/D) = AD/BC
If OR = 1
The odds of exposure are equal in cases and controls (no association)
If OR > 1
The odds of exposure are greater in cases than in controls (positive association, possibly causal)
If OR < 1
The odds of exposure are lower in cases than in controls (negative association, possibly protective)
Odds and Odds Ratios are hard to understand!
What we truly care about is risk – What is the amount of risk associated with a particular exposure?
OR can approximate RRs if the disease is rare (<10% incidence)
This is called the “rare disease assumption”
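The rare disease assumption can be illustrated by computing both the OR and the RR from the same full 2x2 table (A = exposed with disease, B = exposed without, C = unexposed with disease, D = unexposed without). Both tables below are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """OR = (A/C)/(B/D) = AD/BC."""
    return (a * d) / (b * c)

def risk_ratio(a, b, c, d):
    """RR = [A/(A+B)] / [C/(C+D)]; only meaningful when the full cohort is known."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical rare disease: incidence well under 10% in both groups.
rare = (20, 9980, 10, 9990)
# Hypothetical common disease: incidence of 40% and 20%.
common = (400, 600, 200, 800)

for label, table in (("rare", rare), ("common", common)):
    print(f"{label}: RR = {risk_ratio(*table):.2f}, OR = {odds_ratio(*table):.2f}")
```

With the rare outcome the OR is almost identical to the RR. With the common outcome the OR overstates it.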
Prevalence & Incidence
Measures of Association
Criteria for Confounding:

The third factor must be a risk factor for the outcome.
The third factor must be associated with the exposure.
Third factor must not result from the exposure (cannot be on the causal pathway)
The third factor is a “common cause” of the exposure and outcome.
Third Factor
The third factor is associated with the exposure, is a risk factor for the outcome, and is not on the causal pathway.
Confounding is a different kind of bias
Selection bias and information bias are due to errors in the way we conduct studies.
Confounding is not the fault of the investigator; confounding reflects unevenly distributed characteristics in a population
Unlike systematic error, confounding can be addressed after the study has been conducted.
Not taking a confounder into account can:
Overestimate the strength of the true association
Produce an OR or RR that is further from the null than it should be
Underestimate the strength of the true association
Produces an OR or RR that is closer to the null than it should be
Confounding is not an “all or none” phenomenon
Hypothesis Testing
Type I Error
Type I error (false positive): occurs if an investigator rejects a null hypothesis that is actually true in the source population
To avoid Type I Error:
Set the standard level of significance (alpha) at 0.05
Corresponds to a confidence level of 95%
With repeated sampling, 95 of 100 of the 95% confidence intervals will contain the true result
Probability of a Type I error (reporting an association when it does not exist) = 5%; Incorrectly rejecting null hypothesis
Type II Error
Type II error (false negative): occurs if an investigator fails to reject a null hypothesis that is not true in the source population
To avoid Type II Error:
Type II error or beta refers to the probability of missing an association when it really exists
Power = 1 - beta
80% power to detect a difference if it truly exists = beta or Type II error of 20%
Increasing the sample size increases the power of the study and decreases the Type II error
Null hypothesis
states there is no association between predictor and outcome variable (H0)
Tested directly, accept or reject the null
Alternative hypothesis
states there is an association (HA)
Accepted by default
refers to the ability of a test to correctly reject the null hypothesis when the alternative hypothesis is true
When an association is present the power of a study will increase as the study size increases
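The link between study size and power can be sketched with a normal approximation for comparing two proportions. This is a rough illustration, not a substitute for a formal sample-size calculation. The function name, the two incidences (10% vs. 20%), and the group sizes are assumptions.

```python
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    Uses the normal approximation with equal group sizes and ignores the
    negligible chance of rejecting in the wrong direction.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    se = (p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group) ** 0.5
    return z.cdf(abs(p1 - p2) / se - z_crit)

# Hypothetical incidences of 10% (unexposed) and 20% (exposed).
for n in (50, 100, 200, 400):
    power = power_two_proportions(0.10, 0.20, n)
    print(f"n per group = {n}: power = {power:.2f}, beta = {1 - power:.2f}")
```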
Health and disease do not occur at random.
Public Health Programming
Health Policy Development and Evaluation
Disease Outbreak Investigation and Intervention
Clinical Care
Disease Screening
Innovations in Epidemiologic Studies
Modern Epidemiology
*The term rate is often incorrectly used to describe ratios and proportions
The occurrence of new cases of disease that developed over a specified time period in a population at risk of getting the disease
New cases:
People who were disease-free at the start of your time watching them.
Time period:
Individuals are monitored over a specific period of time
At Risk
Streptomycin Tuberculosis Trial (1940s)
Refining methods used to evaluate the effectiveness of disease treatments
Controlled clinical trial to study the use of streptomycin to treat pulmonary tuberculosis (TB)
107 patients with acute progressive bilateral pulmonary TB
Randomly assigned
55 received streptomycin and bed rest
52 received bed rest alone (control)
Results: 7% of streptomycin patients died and 27% of control patients died
Use of randomization
Patient eligibility restrictions
Precise and objective data collection methods
Ethical considerations
Doll and Hill's Studies (1950s)
Case-control studies:
709 subjects who had lung cancer (cases)
709 subjects who had diseases other than lung cancer (controls)
Asked about smoking behaviors
Results: More lung cancer patients than non-cancer patients were smokers
They considered a wide range of problems in the design and analysis of their study
Prospective Study:
Invited 59,600 members of the British Medical Association to complete a survey about smoking habits
Obtained info on causes of death among respondents
Results: Death rates were 2 to 3 times as high among cigarette smokers as among lifelong nonsmokers
Many subjects so it had adequate "power" to examine numerous health effects
Long follow up period (50 years)
Framingham Study (1947-present)
To develop ways of identifying latent cardiovascular disease (CVD) among healthy volunteers
Determine the causes of CVD
Now investigates a wide variety of diseases
Initially enrolled 5,000 healthy adult residents of Framingham, MA
Framingham was selected because:
Stable town population
Investigators could identify a sufficient number of people with and without risk factors of CVD
Local doctors were eager to recruit study subjects
Gathers information on:
Medical history
Smoking habits
Alcohol use
Physical activity
Dietary intake
Emotional stress
Height and Weight
Blood pressure
Vital signs and symptoms
Cholesterol levels
Glucose levels
Bone mineral density
Genetic characteristics
Epidemiology has expanded tremendously in size, scope, and influence
Sub-specialties have been established that are defined either by:
Examples: reproductive, cancer, cardiovascular, infectious disease, psychiatric
Examples: environmental, behavioral, nutritional, pharmacoepidemiology
Examples: pediatric, geriatric
Novel levels at which to examine health determinants:
Genetic and Molecular
Epidemiologic Study Designs
Laws, policies, environmental factors, etc. have to be studied at a group level because exposure does not differ between individuals within a group.
Age of driver’s license and mortality due to car crashes
Smoking bans in bars and heart attack incidence
School district-wide food policies and obesity prevalence
Differences between people that could affect comparison of conditions are eliminated.
Carryover: Interventions could have an effect for longer than they are actively being used.
Order of interventions may affect patients’ reactions.
Cannot be conducted for interventions that cannot be “washed out.”
General Population
Other Controls
Recruit 100 infants with an ear infection (cases)
50 are bottle fed, 50 are breast fed
Recruit 100 infants without an ear infection (controls)
25 are bottle fed, 75 are breast fed
Is there an association between bottle feeding (vs. breast) and ear infections?
What happens if we recruit 1,000 controls vs. 100, and they are bottle vs. breast-fed in the same proportions (25% vs. 75%)?
Incidence of infection among bottle-fed = 50/300 = .17 = 17 per 100
Incidence of infection among breast-fed = 50/800 = .06 = 6 per 100

Disease frequencies are real phenomena and should not be influenced by study design or decisions regarding the number of controls selected for our study!
If we could calculate the incidence of disease,
Incidence of ear infection among bottle-fed = 50/75 = .67 = 67 per 100
Incidence of ear infection among breast-fed = 50/125 = .40 = 40 per 100
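The ear infection example can be reworked in code to show what changes and what does not when the number of controls changes. The counts come from the example above; the function and variable names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """OR = AD/BC with A, C = exposed/unexposed cases and B, D = exposed/unexposed controls."""
    return (a * d) / (b * c)

bottle_cases, breast_cases = 50, 50  # 100 infants with an ear infection

results = {}
for n_controls in (100, 1000):
    bottle_controls = n_controls // 4            # 25% of controls are bottle fed
    breast_controls = n_controls - bottle_controls
    results[n_controls] = {
        # "Incidence" computed from case-control data depends on how many controls we chose.
        "bottle": bottle_cases / (bottle_cases + bottle_controls),
        "breast": breast_cases / (breast_cases + breast_controls),
        "or": odds_ratio(bottle_cases, bottle_controls, breast_cases, breast_controls),
    }

for n_controls, r in results.items():
    print(f"{n_controls} controls: bottle = {r['bottle']:.2f}, breast = {r['breast']:.2f}, OR = {r['or']:.1f}")
```

The apparent "incidence" shifts with the number of controls recruited, while the odds ratio stays at 3.0.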
What would have happened to those exposed,
but for
the exposure?
Factors that affect prevalence:
Disease duration
Treatment effectiveness
Increased incidence (new cases)
Migration of cases into or out of population
Migration of non-cases into or out of population
Changes in detection

What happens when you don’t have a stationary population?
The assumptions of who remains at risk, who gets your disease, etc. are wrong and result in incorrect conclusions.
Need a calculation that takes into account how much time at risk each person contributed.

In 2009, 22 million Americans contracted the swine flu.

4,000 people died from the swine flu.

What is the case-fatality rate?

Case-fatality Rate = # of deaths caused by a condition/# of persons with the condition

= 4,000/22,000,000
= .0002
= .02%

Of those who contracted swine flu, .02% died from swine flu
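As a quick check of the swine flu arithmetic, here is a minimal Python sketch. The function name is an illustrative assumption; the inputs are the two figures given above.

```python
def case_fatality_rate(deaths, cases):
    """Deaths caused by a condition divided by persons with the condition."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases


cfr = case_fatality_rate(deaths=4_000, cases=22_000_000)
print(f"{cfr:.4f}")   # 0.0002
print(f"{cfr:.2%}")   # 0.02%
```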

Confounding occurs because the unexposed group differs from the exposed group in one or more ways that affect the incidence of the outcome.
The unexposed group is not the same as the exposed group “but for” the exposure.
The unexposed group cannot serve as the counterfactual to the exposed group.
Confounding and the Counterfactual:

Problems with having just counts:
Limit comparisons across populations
Provide limited information on the breadth of the problem
An outcome by any other name...

may also be referred to as diseases, illnesses, disease states, or health indicators
may also be referred to as predictors, determinants, causes, risk factors, or protective factors
Types of Variables
Can take on one of a fixed number of possible values.

Exposure/outcome is present or absent
Examples: HIV, cancer
Graded values
Examples: self-rated health, servings of fruit/day

Within limits, any value is possible
Examples: blood pressure, viral load, gestational age

Types of Exposures
Innate exposures:
Factors that individuals are born with:
Biological sex
DNA sequence
Acute exposures:
relatively short duration, do not recur
Prenatal chemical exposure
Chronic exposures:
Tend not to change over time
Time varying/dynamic exposures:
vary across the life course
Physical activity

Third Factor
This is mediation!
Does reading the newspaper cause cancer?
In Cohort Studies
Aim for high participation rates
Be cognizant of how your eligibility criteria or method of recruitment might exclude people from your source population.
Follow study participants well
Use same selection criteria for cases and controls/exposed and unexposed

Reducing Selection Bias
A study design where the investigator manipulates the exposure to determine the effects of the exposure on the outcome.
Changes among the exposed group are compared to changes among a comparison group
Which group a participant is assigned to is determined by random chance.
"Parental Weight Status and Offspring Cardiovascular Disease Risks: a Cross-Sectional Study of Chinese Children."
"Associations between organochlorine pesticides and cognition in U.S. elders: National Health and Nutrition Examination Survey 1999-2002."
"Factors associated with non-use of condoms in an online community of frequent travellers."
The burden of disease at a given time

Currently, 120,659 of the 1,553,000 residents of Philadelphia have diabetes.

Prevalence =
# with health indicator/total population

Prevalence = 120,659/1,553,000

Prevalence = 0.078

Prevalence = 7.8%
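The Philadelphia diabetes calculation can be reproduced with a few lines of Python. This is only a sketch of the formula above; the function name is an illustrative assumption.

```python
def prevalence(with_indicator, total_population):
    """Number with the health indicator divided by the total population."""
    if total_population <= 0:
        raise ValueError("total_population must be positive")
    return with_indicator / total_population


p = prevalence(with_indicator=120_659, total_population=1_553_000)
print(f"{p:.3f}")   # 0.078
print(f"{p:.1%}")   # 7.8%
```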

Among the 5,600 women receiving gynecological care through Temple, 345 received a new diagnosis of syphilis in 2012.

CI = # of new cases/# of persons at risk at the start of the observation period
CI = 345/5,600
CI = 0.062
CI = 6.2% or 62 per 1,000 women
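The cumulative incidence example works the same way in code. The sketch below assumes, as the formula does, that all 5,600 women were at risk at the start of the year; the function name is illustrative.

```python
def cumulative_incidence(new_cases, at_risk_at_start):
    """New cases divided by persons at risk at the start of the period."""
    if at_risk_at_start <= 0:
        raise ValueError("at_risk_at_start must be positive")
    return new_cases / at_risk_at_start


ci = cumulative_incidence(new_cases=345, at_risk_at_start=5_600)
print(f"{ci:.3f}")                    # 0.062
print(f"{ci * 1000:.0f} per 1,000")   # 62 per 1,000
```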
5,600 women receiving gynecological care through Temple were assessed for STDs every 3 months for 1 year. Over that year, the women contributed 60,200 person-months, and 325 received a new diagnosis of syphilis.

IR = # of new cases/person-time at risk over time period
IR = 325/60,200 person-months
IR = 0.0054 per person-month

IR = 0.065 per person-year
IR = 65 per 1,000 persons per year
IR = 65 per 100 persons per 10 years
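The unit conversions in the incidence rate example are easy to get wrong by hand, so here is a minimal Python sketch that carries the unrounded rate through each step. The function and variable names are illustrative assumptions.

```python
def incidence_rate(new_cases, person_time):
    """New cases divided by total person-time at risk."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return new_cases / person_time


ir_per_month = incidence_rate(new_cases=325, person_time=60_200)  # person-months
ir_per_year = ir_per_month * 12   # 12 person-months in one person-year

print(f"{ir_per_month:.4f} per person-month")             # 0.0054
print(f"{ir_per_year:.4f} per person-year")               # 0.0648
print(f"{ir_per_year * 1000:.1f} per 1,000 person-years") # 64.8
```

Because 100 persons followed for 10 years also contribute 1,000 person-years, the last figure can equally be read as a rate per 100 persons per 10 years.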
Time Trends
Age effects: differences in health indicators due to aging.
Cohort effects: differences in health indicators due to the time or generation in which people were born.
Period effects: differences in health indicators due to factors that vary over time.
Does breastfeeding vs. formula feeding occur before onset of higher weight/rapid weight gain?
Many studies are cross-sectional, although they ask about prior feeding practices.
Having a smaller baby at birth may lead to feeding choice AND predict future weight.
Rapid weight gain may lead to women supplementing with formula due to perception of baby’s hunger.

Breastfed babies metabolize less energy and protein
Greater insulin response in formula-fed babies, which may stimulate fat deposition
Infant regulates intake
No cues from bottle or “schedules”
Flavor exposure

Meta analysis findings:
The mean difference in BMI between breastfed and formula-fed babies was non-significant at -.04 BMI units.
Any breastfeeding:
Relative risk of obesity for BF = 0.87 (13% decrease in obesity risk)
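A relative risk below 1 means lower risk in the exposed (here, breastfed) group, and above 1 means higher risk. The Python sketch below turns a relative risk into a percent change; the function name is an illustrative assumption.

```python
def percent_change_in_risk(relative_risk):
    """Percent change in risk versus the comparison group (negative = lower)."""
    return (relative_risk - 1.0) * 100.0


change = percent_change_in_risk(0.87)
direction = "decrease" if change < 0 else "increase"
print(f"{abs(change):.0f}% {direction} in risk")   # 13% decrease in risk
```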

“It appears doubtful whether there will ever be a study conducted that has the appropriate methodology and statistical power to yield substantial and undisputable evidence for or against a protective effect of breastfeeding against childhood overweight.”
Beyerlein and Von Kries, 2011