Data on prescription drug purchases can be obtained from the registers of Kela (the Social Insurance Institution of Finland).

Date and cause of death come from the cause-of-death statistics of Statistics Finland (Tilastokeskus).

Hospital care episodes come from THL (the National Institute for Health and Welfare).

Data from HUSLAB and other HUS units can be used in clinical pharmacoepidemiology research.

Biobanks enabled by the new law bring more data into research use.

19 September 2013

Jari Haukka, Hjelt Institute

jari.haukka@helsinki.fi

I look at examples of studies from around the world.

**Pharmacoepidemiology.**

What, why, and how?

Pharmacoepidemiology is the study of the use of and the effects of drugs in large numbers of people.

Clinical pharmacoepidemiology can be thought of as studying the effects of drugs in a large group of patients.

Plenty of measurement data can be assumed to be available on these patients, because at some point they have been treated in, for example, specialized medical care.

Combining an epidemiological study design with detailed clinical information creates the conditions for good research.

**Pharmacoepidemiology**

What does (clinical) pharmacoepidemiology study?

National and HUS data repositories as data sources for clinical pharmacoepidemiology

An example that sheds light on problems in pharmacoepidemiological research and in the interpretation of its results

**Topics covered**

Only a small proportion of the patients are included in this analysis

More than a 5% probability of receiving the studied treatment

**Example**

**National and HUS data repositories**

Goal: To Obtain Valid and Precise Information on Association Between Exposure and Disease Using a Minimum of Resources

Research question involves a prevention, treatment, or causal factor.

Moderate or large effect expected.

Trial not ethical or feasible.

Trial too expensive.

Research question involves a prevention or treatment.

Small effect expected.

Ethical and feasible.

Money is available.

Little known about disease.

Evaluate many exposures.

Disease is rare.

Disease has long induction and latent period.

Exposure data are expensive.

Underlying population is dynamic.

Little known about exposure.

Evaluate many effects of an exposure.

Exposure is rare.

Underlying population is fixed.

Disease has short induction and latent period.

Current exposure.

Want high-quality data.

Disease has long induction and latent period.

Historical exposure.

Want to save time and money.

**Select study design**

EXPERIMENTAL

OBSERVATIONAL

CASE–CONTROL

COHORT

RETROSPECTIVE

PROSPECTIVE

Examine rates of disease in relation to a population-level factor.

Population-level factors include summaries of individual population members, environmental measures, and global measures.

Study groups are usually identified by place, time, or a combination of the two.

Limitations include the ecological fallacy and lack of information on important variables.

Advantages include low cost, wide range of exposure levels, and the ability to examine contextual effects on health.

Cross-sectional

Examine association at a single point in time, and so measure exposure prevalence in relation to disease prevalence.

Cannot infer temporal sequence between exposure and disease if exposure is a changeable characteristic.

Other limitations may include preponderance of prevalent cases of long duration and healthy worker survivor effect.

Advantages include generalizability and low cost.

Ecologic studies

A classical ecologic study examines the rates of disease in relation to a factor described on a population level. Thus, “the units of analysis are populations or groups of people rather than individuals.”

The lack of individual-level information leads to a limitation of ecologic studies known as the “ecological fallacy” or “ecological bias.” The ecological fallacy means that “an association observed between variables on an aggregate level does not necessarily represent the association that exists at the individual level.”

In other words, one cannot necessarily infer the same relationship from the group level to the individual level.

Ecological fallacy
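A toy illustration of the ecological fallacy may help here. The numbers below are made up, and the names `group`, `exposure`, and `outcome` are hypothetical. The data are built so that group means rise together while the association inside every group runs the other way.

```r
# Hypothetical data: three populations with five individuals each.
d <- data.frame(group = rep(1:3, each = 5), offset = rep(-2:2, times = 3))
d$exposure <- 10 * d$group + d$offset   # higher-numbered groups are more exposed
d$outcome  <- 10 * d$group - d$offset   # within a group, more exposure means less outcome

# Ecologic analysis: the unit of analysis is the group mean.
group_means <- aggregate(cbind(exposure, outcome) ~ group, data = d, FUN = mean)
ecologic_cor <- cor(group_means$exposure, group_means$outcome)

# Individual-level analysis, done separately within each group.
within_cor <- sapply(split(d, d$group), function(g) cor(g$exposure, g$outcome))

print(ecologic_cor)  # correlation between group means
print(within_cor)    # correlation among individuals, per group
```

In this constructed example the group-level correlation is positive while every within-group correlation is negative, so the aggregate association says nothing reliable about individuals.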

We study the following paper:

Ahern, Thomas P., Lars Pedersen, Maja Tarp, Deirdre P. Cronin-Fenton, Jens Peter Garne, Rebecca A. Silliman, Henrik Toft Sørensen, and Timothy L. Lash. 2011. “Statin Prescriptions and Breast Cancer Recurrence Risk: A Danish Nationwide Prospective Cohort Study.” Journal of the National Cancer Institute. doi:10.1093/jnci/djr291. http://jnci.oxfordjournals.org/content/early/2011/08/01/jnci.djr291.abstract.

Please answer the following questions:

How was the study population defined?

How was the statin exposure defined (yes/no, amount)?

Was any dose-response relationship examined?

Which limitations of the study do you think are the most important?

Case Study: Cohort Study

We study the following paper:

Please answer the following questions:

How was the study population defined?

Did choice of model have any effect on results?

If there are differences, how would you interpret them?

Which research question is, in your opinion, most relevant (p. 267, left column)?

1) Estimate the average treatment effect in a population whose distribution of risk factors is equal to that for the t-PA-treated patients only.

2) Estimate the average effect of treatment in the entire study population, that is, for patients who were and were not treated with t-PA.
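The two research questions correspond to two different target populations, and a small weighting sketch can show why they may give different answers when the effect is not uniform. The numbers are hypothetical and are not taken from Kurth et al.; the strata, the propensity scores `ps`, and the risks are invented for illustration. The sketch assumes inverse-probability-of-treatment weights for the whole-population question and weights of 1 for the treated and ps/(1 - ps) for the untreated for the treated-only question.

```r
# Hypothetical strata: one where treatment is likely (ps = 0.8) and helpful,
# one where treatment is unlikely (ps = 0.1) and harmful.
strata <- data.frame(
  ps      = c(0.8, 0.8, 0.1, 0.1),      # probability of being treated
  treated = c(1,   0,   1,   0),
  n       = c(800, 200, 100, 900),      # patients per cell
  risk    = c(0.10, 0.20, 0.30, 0.10)   # outcome risk per cell
)

# Weighted risk difference (treated minus untreated) for a weight vector w.
weighted_rd <- function(w) {
  with(strata, {
    r1 <- sum(w * n * risk * treated) / sum(w * n * treated)
    r0 <- sum(w * n * risk * (1 - treated)) / sum(w * n * (1 - treated))
    r1 - r0
  })
}

# Question 2: effect in the entire study population.
w_ate <- with(strata, ifelse(treated == 1, 1 / ps, 1 / (1 - ps)))
# Question 1: effect in a population resembling the treated patients.
w_att <- with(strata, ifelse(treated == 1, 1, ps / (1 - ps)))

rd_ate <- weighted_rd(w_ate)
rd_att <- weighted_rd(w_att)
print(c(entire_population = rd_ate, treated_only = rd_att))
```

With these invented numbers the two estimates have opposite signs, which is the kind of discrepancy the questions above ask you to interpret.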

Case Study: Cohort Study

Case Study:

Case-control study

Kurth, Tobias, Alexander M. Walker, Robert J. Glynn, K. Arnold Chan, J. Michael Gaziano, Klaus Berger, and James M. Robins. 2006. “Results of Multivariable Logistic Regression, Propensity Matching, Propensity Adjustment, and Propensity-based Weighting under Conditions of Nonuniform Effect.” American Journal of Epidemiology 163 (3) (February 1): 262-270. doi:10.1093/aje/kwj047.

**Epidemiological Study Designs**


**DAG**

Directed Acyclic Graph

overeating

How many DAGs?

Number of nodes: number of possible DAGs

1: 1

2: 3

3: 25

4: 543

5: 29281

6: 3781503

7: ~1 000 000 000
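The counts listed here match the number of labeled DAGs on n nodes, which can be reproduced with Robinson's recurrence. Naming that recurrence is an addition here, as the slide does not say how the numbers were obtained. The function name `count_dags` is hypothetical.

```r
# Robinson's recurrence for the number a(n) of labeled DAGs on n nodes:
#   a(0) = 1
#   a(n) = sum_{k=1..n} (-1)^(k+1) * choose(n, k) * 2^(k * (n - k)) * a(n - k)
count_dags <- function(n_max) {
  a <- numeric(n_max + 1)   # a[i + 1] stores the count for i nodes
  a[1] <- 1                 # the empty graph on zero nodes
  for (n in seq_len(n_max)) {
    k <- seq_len(n)
    a[n + 1] <- sum((-1)^(k + 1) * choose(n, k) * 2^(k * (n - k)) * a[n - k + 1])
  }
  a[-1]                     # drop the n = 0 entry
}

dag_counts <- count_dags(7)
print(format(dag_counts, big.mark = " ", scientific = FALSE))
```

The count grows super-exponentially, which is why exhaustive search over structures is only feasible for a handful of variables.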

Why use a DAG:

DAG terms

d-separation or directed global Markov condition

Observational (Markov) equivalent

A path p is said to be d-separated (=blocked) by a set of nodes Z if and only if:

p contains a chain “i->m->j” or a fork “i<-m->j” such that the middle node “m” is in Z

or

p contains a collider “i->m<-j” such that the middle node m is not in Z and such that no descendant of m is in Z

A set Z is said to d-separate X from Y if Z blocks every path from a node in X to a node in Y

d-separation given the conditioning set E (pairs that become independent):

E=Ø: none

E={S}: (B,L), (B,X)

E={L}: (X,S), (X,B), (X,C)

E={L,B}: (C,S), (C,X), (X,S)

S=smoker, L=lung cancer, B=bronchitis, X=pos. X-ray, C=cough

[Figure: DAG over the nodes S, B, L, C, and X]
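The listed independences can be checked mechanically. The sketch below assumes the edge set S->B, S->L, L->X, L->C, B->C, which is inferred as one graph consistent with the statements above. The original figure is not available, so treat the edges as an assumption. d-separation is tested with the standard moralized ancestral graph method, and the helper names are hypothetical.

```r
nodes <- c("S", "B", "L", "C", "X")
adj <- matrix(0, 5, 5, dimnames = list(nodes, nodes))  # adj[i, j] = 1 means i -> j
edges <- rbind(c("S", "B"), c("S", "L"), c("L", "X"), c("L", "C"), c("B", "C"))  # assumed
for (i in seq_len(nrow(edges))) adj[edges[i, 1], edges[i, 2]] <- 1

# The nodes in v together with all of their ancestors.
ancestral_set <- function(adj, v) {
  repeat {
    parents <- rownames(adj)[rowSums(adj[, v, drop = FALSE]) > 0]
    new_v <- union(v, parents)
    if (length(new_v) == length(v)) return(v)
    v <- new_v
  }
}

# TRUE if z d-separates the single nodes x and y.
d_separated <- function(adj, x, y, z = character(0)) {
  keep <- ancestral_set(adj, c(x, y, z))
  g <- adj[keep, keep, drop = FALSE]
  moral <- (g + t(g)) > 0                 # drop edge directions
  for (child in keep) {                   # connect parents of a common child
    pa <- keep[g[, child] > 0]
    moral[pa, pa] <- TRUE
  }
  diag(moral) <- FALSE
  free <- setdiff(keep, z)                # remove the conditioning set
  moral <- moral[free, free, drop = FALSE]
  reached <- x
  repeat {                                # is y reachable from x?
    nb <- free[colSums(moral[reached, , drop = FALSE]) > 0]
    new_r <- union(reached, nb)
    if (length(new_r) == length(reached)) break
    reached <- new_r
  }
  !(y %in% reached)
}

# All pairs of remaining nodes that are d-separated given z.
independent_pairs <- function(adj, z) {
  pairs <- t(combn(setdiff(rownames(adj), z), 2))
  sep <- apply(pairs, 1, function(p) d_separated(adj, p[1], p[2], z))
  pairs[sep, , drop = FALSE]
}

print(independent_pairs(adj, "S"))
print(independent_pairs(adj, c("L", "B")))
```

Conditioning on the collider C, for example with `d_separated(adj, "B", "L", c("S", "C"))`, reopens the path between B and L under the assumed edges.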

Markov equivalence

Many Bayesian networks may represent the same statements of conditional independence. Such networks are statistically indistinguishable and are called Markov equivalent. All equivalent networks share the same underlying undirected graph (called the skeleton) but may differ in the direction of edges that are not part of a collider (v-structure).

Observational eq. DAGs

A set of variables Z satisfies the back-door criterion relative to an ordered pair of variables (Xi, Xj) in a DAG G if

no node in Z is a descendant of Xi, and

Z blocks every path between Xi and Xj that contains an arrow into Xi.

If Z satisfies the back-door criterion relative to the pair (X, Y), then the causal effect of X on Y is:

$$P(y \mid \operatorname{do}(x)) = \sum_{z} P(y \mid x, z)\, P(z)$$

Conditioning on Z is then sufficient to control for confounding (Greenland et al. 1999)

Back door
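Back-door adjustment can be sketched numerically as standardization over Z, that is, P(y | do(x)) = sum over z of P(y | x, z) P(z). The counts below are hypothetical, and Z is simply assumed to satisfy the back-door criterion for the pair (X, Y). The example contrasts the crude risk difference with the adjusted one.

```r
# Hypothetical counts for a binary confounder z, exposure x, and outcome events.
tab <- data.frame(
  z      = c(0,   0,   1,   1),
  x      = c(0,   1,   0,   1),
  n      = c(400, 100, 100, 400),
  events = c(40,  20,  50,  240)
)

# P(z): marginal distribution of the adjustment variable.
p_z <- tapply(tab$n, tab$z, sum) / sum(tab$n)

# P(y | do(x)) = sum_z P(y | x, z) * P(z)
backdoor_risk <- function(x_value) {
  s <- tab[tab$x == x_value, ]
  sum((s$events / s$n) * p_z[as.character(s$z)])
}

adjusted_rd <- backdoor_risk(1) - backdoor_risk(0)
crude_rd <- with(tab, sum(events[x == 1]) / sum(n[x == 1]) -
                      sum(events[x == 0]) / sum(n[x == 0]))
print(c(crude = crude_rd, adjusted = adjusted_rd))
```

Here the crude difference is inflated because exposed patients are concentrated in the high-risk stratum, and standardizing over Z removes that imbalance.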

[Figure: DAG over the nodes E, D, and G]

"node" or "edge"

"link" or "arc"

Describe conditional independence between variables

Only relevant variables in the DAG

Qualitative description

[Figures: DAGs over the nodes G, E, and D]

Conditional independence in directed graphs. The three archetypal situations in the definition of d-separation. In the chain and the fork, conditioning on the middle node makes the others independent. In a collider, X and Z are marginally independent, but become dependent once Y is known.

Markowetz and Spang BMC Bioinformatics 2007 8(Suppl 6):S5 doi:10.1186/1471-2105-8-S6-S5


diabetes

obesity

DAGs can be used in model selection

Which variables should be taken into account when controlling for confounding?

M- and Z-bias

Three basic structures

Miguel A. Hernan, Sonia Hernandez-Diaz, and James M. Robins, “A Structural Approach to Selection Bias,” Epidemiology 15, no. 5 (2004): 615-625.

Florian Markowetz and Rainer Spang, “Inferring cellular networks - a review,” BMC Bioinformatics 8, no. 6 (2007): S5.

Jenni Ilomäki et al., “Relationship between alcohol consumption and myocardial infarction among ageing men using a marginal structural model,” The European Journal of Public Health (March 11, 2011), http://eurpub.oxfordjournals.org/content/early/2011/03/11/eurpub.ckr013.abstract.

Now, draw a DAG for your own study!

Onyebuchi A Arah, “The role of causal reasoning in understanding Simpson’s paradox, Lord’s paradox, and the suppression effect: covariate selection in the analysis of observational studies,” Emerging Themes in Epidemiology 5 (2008): 5.

Directed acyclic graph (DAG) showing that birth weight (BW) has a direct effect as well as an indirect effect via current weight (CW) on blood pressure (BP).

DAG showing a scenario where birth weight (BW) has a causal effect on and shares a common cause – current weight (CW) – with blood pressure (BP). That is, the relationship between BW and BP is confounded by CW.

#
# Example of a descriptive DAG
#
# WHO data of health measurements in Europe
# (http://www.who.int/whosis/en/)
# GNIncome: Gross national income per capita (PPP international $)
# Phys: Physicians density (per 10 000 population)
# Mort.CHD: Age-standardized mortality rate for cardiovascular diseases (per 100 000 population)
# Healt.GDP: Total expenditure on health as percentage of gross domestic product
# Hosp.Beds: Hospital beds (per 10 000 population)
# HALE.all: Healthy life expectancy (HALE) at birth (years) both sexes

> head(tmp.data,3)
        GNIncome Phys Mort.CHD Healt.GDP Hosp.Beds HALE.all
Albania     6000   12      537       6.2        30       61
Armenia     4950   37      498       4.7        44       61
Austria    36040   37      204       9.9        76       71

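library(bnlearn)
# Two different structure-learning algorithms
# Max-min hill-climbing (newer bnlearn versions may expect maximize.args = list(perturb = 500, restart = 20))
tmp.ex1 <- mmhc(tmp.data, perturb = 500, restart = 20)
# Two-phase restricted maximization with default settings
tmp.ex2 <- rsmax2(tmp.data)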

Two algorithms - two structures

How to connect the data and the DAG?

One option is to use different algorithms for "learning" the structure of Bayesian networks from the data.

Some algorithms are implemented in R (http://www.r-project.org), e.g. in the packages "bnlearn" and "deal".

Marco Scutari, “Learning Bayesian Networks with the bnlearn R Package,” Journal of Statistical Software 35, no. 3 (2010): 1–22.

Reference

DAG from (Ilomäki et al. 2011)

Presented by

Jari Haukka, PhD

Sr. lecturer

Hjelt Institute

University of Helsinki

jari.haukka@helsinki.fi