# Survival Analysis

1st seminar

by

## Yoon Mi Park

on 5 September 2014

#### Transcript of Survival Analysis

**Survival Analysis**

- Life span
- Length of stay in a hospital
- Duration of a strike
- Quantity for recall of a particular automobile component
- Expiry date for a frozen product

What is the similarity of these 5 items? Time.

Survival analysis is a collection of statistical procedures for data analysis in which the outcome variable of interest is the time until an event occurs.

- Time = survival time ($T \ge 0$): years, months, weeks, days, or age
- Timeline: start of follow-up, event occurs, study end
- Event = failure: death, disease incidence, strike, recall (a negative individual experience)

**Characteristics of survival analysis data**

- Individuals do not all enter the study at the same time.
- When the study ends, some individuals still have not had the event.
- Other individuals drop out or get lost in the middle of the study, and all we know about them is the last time they were still "free" of the event.

**Censored data**

Censoring occurs when we have some information about an individual's survival time, but we do not know the survival time exactly.

Right-censoring:

- the study terminates with no event
- the individual is lost to follow-up
- the individual withdraws

Left-censoring: an individual's true survival time is less than or equal to that individual's observed time.

**Survival function and hazard function**

- $T$ = survival time ($T \ge 0$)
- $t$ = a specific value for $T$
- Survival function: $S(t) = P(T > t)$
- Hazard function: $h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$, a conditional probability per unit time, that is, the conditional instantaneous failure rate

Relationship between $S(t)$ and $h(t)$: $S(t) = \exp\left(-\int_0^t h(u)\,du\right)$ and $h(t) = -\frac{dS(t)/dt}{S(t)}$.

**Survival data layout for computation**

Example: remission times for two groups of leukemia patients.

Q. How can we interpret this data set?

**Descriptive measures**

1. Average survival time
2. Average hazard rate

As a result, the treatment group's survival times are longer than the placebo group's, and the placebo group's hazard rate is higher than the treatment group's. However, these measures do not compare the two groups at different points in time during follow-up.

**Kaplan-Meier survival curve**

Type of data:

1. If there is no censored data
2. If there is censored data

General KM formula: $\hat{S}(t_{(j)}) = \prod_{i=1}^{j} \hat{P}(T > t_{(i)} \mid T \ge t_{(i)})$.

Q. How can we evaluate whether KM curves are statistically equivalent?

**Testing methods**

- Mantel-Haenszel log-rank test
- Peto & Peto's version of the log-rank test
- Gehan's generalized Wilcoxon
- Peto & Peto's and Prentice's generalized Wilcoxon
- Tarone-Ware and Fleming-Harrington classes
- Cox's F-test (non-parametric version)

**Log-rank test**

The log-rank statistic is approximately distributed as $\chi^2$ with 1 df under the null hypothesis that the survival functions for the two groups are the same. With $n_{ij}$ individuals at risk and $m_{ij}$ failures in group $i$ at the $j$-th ordered failure time:

- Expected cell counts: $e_{1j} = \frac{n_{1j}}{n_{1j} + n_{2j}} (m_{1j} + m_{2j})$, and likewise $e_{2j}$
- Observed minus expected: $O_i - E_i = \sum_j (m_{ij} - e_{ij})$
- Log-rank statistic: $\frac{(O_i - E_i)^2}{\widehat{\mathrm{Var}}(O_i - E_i)}$, for either group $i$

Pearson's chi-square statistic, $\sum_i (O_i - E_i)^2 / E_i$, is approximately equivalent to the Mantel-Haenszel log-rank statistic.

**Confounding and interaction effects**

Confounding effect: a perceived relationship between an independent variable and a dependent variable that has been misestimated because a confounding variable was not accounted for.

Interaction effect: two independent variables interact if the effect of one of the variables differs depending on the level of the other variable.

Q. How can we evaluate the possible interaction effect?

- $T$ = weeks until going out of remission
- $X_1$ = group status
- $X_2$ = log WBC (white blood cell count)
- $X_3 = X_1 \times X_2$

Evaluate the possible interaction effect of log WBC on group status (R code not captured in the transcript).

Test for significance of the interaction term (LR (likelihood ratio) test, Wald test): P-value = 0.510 > 0.05, so we cannot reject the null hypothesis; there is no significant interaction effect.

Q. How do we assess the effect of treatment status, adjusting for log WBC, using model 2?

Three statistical objectives:

1. Test for significance: Wald test p-value = 0.00127 (< 0.05)
2. Point estimate of the effect: the estimated hazard ratio is 3.648, so the hazard for the placebo group is 3.648 times the hazard for the treatment group
3. Confidence interval for the hazard ratio

Q. How do we assess the confounding effect?

- Model 1's HR is higher than model 2's HR.
- The 95% CI is narrower for model 2 than for model 1: width 8.607 (model 1) > width 6.748 (model 2).

**Cox PH model**

The Cox PH model writes the hazard as $h(t, X) = h_0(t)\,e^{\sum_{i=1}^{p} \beta_i X_i}$, where $h_0(t)$ is the baseline hazard function.
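The point estimate and confidence interval for a hazard ratio can be sketched numerically. Under the Cox PH model, the hazard ratio for a 0/1 covariate is $e^{\hat\beta}$, and a 95% Wald interval is $e^{\hat\beta \pm 1.96\,\mathrm{se}}$. In this minimal sketch the coefficient is back-computed from the hazard ratio 3.648 quoted in the transcript, while the standard error `se_hat` and the helper name `hazard_ratio_ci` are assumptions made for illustration only.

```python
import math


def hazard_ratio_ci(beta, se, z=1.96):
    """Return (HR, lower, upper) for a Cox coefficient and its standard error."""
    hr = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return hr, lower, upper


# Coefficient back-computed from the hazard ratio 3.648 quoted in the transcript.
beta_hat = math.log(3.648)
# ASSUMPTION: hypothetical standard error, not a value from the transcript.
se_hat = 0.4

hr, lower, upper = hazard_ratio_ci(beta_hat, se_hat)
print(f"HR = {hr:.3f}, 95% CI = ({lower:.3f}, {upper:.3f}), width = {upper - lower:.3f}")
```

Because the interval is symmetric on the log scale, a smaller standard error gives a narrower interval around the same point estimate.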

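The two-group log-rank test described earlier in this transcript (observed minus expected failures, compared with a chi-square distribution with 1 df) can be sketched in plain Python. This is a minimal sketch under stated assumptions: the function name `logrank_test` and the toy data are hypothetical, and the variance uses the usual hypergeometric form.

```python
import math


def logrank_test(times1, events1, times2, events2):
    """Two-group log-rank test; events are 1 = failure, 0 = censored."""
    data = [(t, e, 1) for t, e in zip(times1, events1)]
    data += [(t, e, 2) for t, e in zip(times2, events2)]
    failure_times = sorted({t for t, e, _ in data if e == 1})

    obs_minus_exp = 0.0  # running sum of (observed - expected) for group 1
    variance = 0.0
    for tj in failure_times:
        n1 = sum(1 for t, _, g in data if g == 1 and t >= tj)  # at risk, group 1
        n2 = sum(1 for t, _, g in data if g == 2 and t >= tj)  # at risk, group 2
        m1 = sum(1 for t, e, g in data if g == 1 and t == tj and e == 1)
        m2 = sum(1 for t, e, g in data if g == 2 and t == tj and e == 1)
        n, m = n1 + n2, m1 + m2
        obs_minus_exp += m1 - n1 * m / n
        if n > 1:
            variance += n1 * n2 * m * (n - m) / (n * n * (n - 1))

    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# ASSUMPTION: hypothetical toy data, not the leukemia data from the slides.
chi2, p_value = logrank_test(
    [6, 7, 10, 13, 16, 22, 23], [1, 1, 0, 1, 1, 0, 1],
    [1, 2, 3, 4, 5, 8, 11], [1, 1, 1, 1, 1, 1, 1],
)
print(f"log-rank chi-square = {chi2:.3f}, p-value = {p_value:.4f}")
```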
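In the Cox PH model $h(t, X) = h_0(t)\,e^{\sum_{i=1}^{p} \beta_i X_i}$, the baseline hazard function $h_0(t)$ is an unspecified function and $e^{\sum \beta_i X_i}$ is the exponential part:

- the baseline hazard depends on $t$, but not on the $X$'s
- the exponential expression involves the $X$'s, but does not involve $t$ (the $X$'s are time-independent variables)

**Hazard ratio**

The hazard ratio (HR) is defined as the hazard for one individual divided by the hazard for a different individual: $\widehat{HR} = \frac{\hat{h}(t, X^*)}{\hat{h}(t, X)} = \exp\left[\sum_{i=1}^{p} \hat\beta_i (X_i^* - X_i)\right]$.

PH assumption: the HR is constant over time, so the hazard for one individual is proportional to the hazard for any other individual, with a proportionality constant that does not depend on time.

**Adjusted survival curves using the Cox PH model**

- Cox model hazard function: $h(t, X) = h_0(t)\,e^{\sum \beta_i X_i}$
- Cox model survival function: $S(t, X) = [S_0(t)]^{e^{\sum \beta_i X_i}}$
- Estimated survival function: $\hat{S}(t, X) = [\hat{S}_0(t)]^{e^{\sum \hat\beta_i X_i}}$
- General formula for the adjusted survival curve for all covariates in the model: $\hat{S}(t, \bar{X}) = [\hat{S}_0(t)]^{e^{\sum \hat\beta_i \bar{X}_i}}$

Q. What if proportional hazards fails?

- Do a stratified analysis
- Include a time-varying covariate to allow changing hazard ratios over time
- Include an interaction with time

**Stratified analysis**

Suppose:

- $X_1$ satisfies the proportionality assumption
- proportionality simply does not hold between the various levels of a second variable $X_2$

Stratified model: $h_g(t, X) = h_{0g}(t)\,e^{\beta_1 X_1}$, where $g$ indexes the levels of $X_2$. If $X_2$ is discrete (with levels) and there is enough data, fit the stratified model.

**Models with time-dependent interactions**

The exponential part includes the time variable $t$, for example $h(t, X) = h_0(t) \exp(\beta X + \delta X t)$. If the coefficient of the interaction with time is positive, the hazard ratio increases over time; if it is negative, the hazard ratio decreases over time. (R code for the stratified model and the time-dependent model was not captured in the transcript.)

**Modeling of survival data**

Proportional hazards (PH) models: this is a semi-parametric model. Suppose $X = 1$ for treated subjects and $X = 0$ for untreated subjects.

Accelerated failure time (AFT) models: here $w$ is an "error distribution". Typically we place a parametric assumption on $w$:

- exponential, Weibull, gamma
- lognormal

$X$ is a vector of covariates of interest.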
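The difference between a proportional hazards model and a model with a time-dependent interaction can be checked numerically. The following is a minimal sketch, assuming a single 0/1 covariate; the baseline hazard, the coefficients `beta` and `delta`, and the function names are all hypothetical choices for illustration.

```python
import math


def baseline(t):
    """ASSUMPTION: an arbitrary baseline hazard h0(t), chosen only for illustration."""
    return 0.05 + 0.01 * t


def cox_hazard(t, x, beta, h0):
    """Cox PH hazard h(t, x) = h0(t) * exp(beta * x) for one covariate."""
    return h0(t) * math.exp(beta * x)


def extended_hazard(t, x, beta, delta, h0):
    """Hazard with a covariate-by-time interaction: h0(t) * exp(beta*x + delta*x*t)."""
    return h0(t) * math.exp(beta * x + delta * x * t)


# ASSUMPTION: hypothetical coefficients.
beta, delta = 0.7, 0.1

for t in (1.0, 5.0, 10.0):
    hr_ph = cox_hazard(t, 1, beta, baseline) / cox_hazard(t, 0, beta, baseline)
    hr_td = (extended_hazard(t, 1, beta, delta, baseline)
             / extended_hazard(t, 0, beta, delta, baseline))
    print(f"t = {t:4.1f}: PH hazard ratio = {hr_ph:.4f}, "
          f"time-dependent hazard ratio = {hr_td:.4f}")
```

The baseline hazard cancels in both ratios, so the PH ratio stays at $e^{\beta}$ for every $t$, while the ratio with a positive interaction coefficient grows with $t$.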

The covariate vector $X$ may include:

- continuous factors (age, blood pressure)
- discrete factors (gender, marital status)
- possible interactions (age by sex interaction)

The PH model is the most common model used for survival data.

- Even though $h_0(t)$ is unspecified, we can estimate the $\beta$'s.
- It closely approximates the results for the correct parametric model.

Q. Why do we call it proportional hazards?

If we think of $h_1(t)$ as the hazard rate for the treated group and $h_0(t)$ as the hazard for the control group, then $h_1(t) = h_0(t)\,e^{\beta}$, so $\frac{h_1(t)}{h_0(t)} = e^{\beta}$. This implies that the ratio of the two hazards is a constant, which does NOT depend on time $t$. In other words, the hazards of the two groups remain proportional over time. $e^{\beta}$ is referred to as the hazard ratio.

Q. How do we estimate the model parameters?

**Likelihood estimation for the PH model**

Cox (1972) derived the likelihood, and generalized it for censoring, using the idea of a partial likelihood.

- Partial likelihood: $L(\beta) = \prod_{j} \frac{e^{\beta' X_{(j)}}}{\sum_{l \in R(t_{(j)})} e^{\beta' X_l}}$, where $R(t_{(j)})$ is the set of individuals at risk at the $j$-th failure time
- Log-partial likelihood: $\ell(\beta) = \sum_{j} \left[\beta' X_{(j)} - \log \sum_{l \in R(t_{(j)})} e^{\beta' X_l}\right]$
- Partial likelihood score equations, a sum of "observed" minus "expected" values: $U(\beta) = \sum_{j} \left[X_{(j)} - \frac{\sum_{l \in R(t_{(j)})} X_l\, e^{\beta' X_l}}{\sum_{l \in R(t_{(j)})} e^{\beta' X_l}}\right]$

The maximum partial likelihood estimators can be found by solving $U(\beta) = 0$. Based on standard likelihood theory, the variance of $\hat\beta$ can be obtained by inverting the negative second derivative of the log-partial likelihood.

**Modifications to the likelihood to adjust for ties**

- Cox's (1972) modification: discrete method
- Peto-Breslow method
- Efron's (1977) method
- Exact method (Kalbfleisch and Prentice)
- Exact marginal method

Breslow method: suppose individuals 1 and 2 fail from the risk set {1, 2, 3, 4} at the same time, and let $\phi_i$ be the hazard ratio for individual $i$ (compared to baseline). The Breslow contribution to the partial likelihood is $\frac{\phi_1 \phi_2}{(\phi_1 + \phi_2 + \phi_3 + \phi_4)^2}$.

**Evaluating the PH assumption**

1. Graphical approach
2. Goodness-of-fit test
3. Using time-dependent variables

**Graphical approach**

- Comparing estimated $-\ln(-\ln)$ survival curves over different categories of a variable: if the distance between two curves is constant, which means the two curves are approximately parallel, then we can say that the PH assumption is satisfied.
- Comparing observed with predicted survival curves: if the observed and predicted curves are close, then we can conclude that the PH assumption is satisfied.

Log-log survival curve:

- Positive or negative, either of which is acceptable
- Step function
- Range: from $-\infty$ to $+\infty$

Using the Cox PH model:

- Observed plot = KM survival curves
- Expected plot = Cox PH model

**Goodness-of-fit test**

1. Obtain Schoenfeld residuals
2. Rank failure times
3. Test the correlation of the residuals with the ranked failure times

(R code not captured in the transcript.)

- P-value < 0.05: reject $H_0$
- P-value > 0.05: do not reject $H_0$

Not rejecting $H_0$ means the two groups satisfy the PH assumption.

**Assessing the PH assumption using time-dependent variables**

Extended Cox model: $h(t, X) = h_0(t) \exp[\beta X + \delta (X \times g(t))]$, with $H_0: \delta = 0$. Under $H_0$, the Wald statistic or the LR statistic follows a chi-square distribution with 1 df.

- If $H_0$ is rejected, the PH assumption is violated.
- If $H_0$ is not rejected, the PH assumption is satisfied.
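The graphical check described in this transcript, comparing $-\ln(-\ln)$ survival curves across groups, can be sketched with a small Kaplan-Meier implementation. This is a minimal sketch under stated assumptions: the function names `kaplan_meier` and `log_minus_log` and the two toy groups are hypothetical, and a real check would plot the transformed curves and judge whether they look roughly parallel.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct failure time; events: 1 = failure, 0 = censored."""
    survival = 1.0
    curve = []
    for tj in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for t in times if t >= tj)
        failures = sum(1 for t, e in zip(times, events) if t == tj and e == 1)
        survival *= 1 - failures / at_risk  # product-limit update
        curve.append((tj, survival))
    return curve


def log_minus_log(curve):
    """Transform S(t) to -ln(-ln S(t)), skipping points where S(t) is 0 or 1."""
    return [(t, -math.log(-math.log(s))) for t, s in curve if 0 < s < 1]


# ASSUMPTION: hypothetical toy data, not the leukemia data from the slides.
groups = {
    "treatment": ([6, 7, 10, 13, 16, 22, 23], [1, 1, 0, 1, 1, 0, 1]),
    "placebo": ([1, 2, 3, 4, 5, 8, 11], [1, 1, 1, 1, 1, 1, 1]),
}

for name, (times, events) in groups.items():
    print(name)
    for t, value in log_minus_log(kaplan_meier(times, events)):
        print(f"  t = {t:2d}: -ln(-ln S) = {value:.3f}")
```

Censored observations stay in the risk set until their censoring time but never trigger a drop in the curve, which is why the estimate remains a step function that changes only at failure times.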

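The partial likelihood score, a sum of "observed" minus "expected" values, can be sketched for a single covariate and solved by Newton-Raphson. This is a minimal sketch, assuming the Breslow handling of tied failure times; the function names and the toy data are hypothetical, and the iteration has no safeguards such as step halving.

```python
import math


def breslow_score_and_info(beta, times, events, x):
    """Score U(beta) and information I(beta) of the Cox partial likelihood
    for one covariate, with Breslow handling of tied failure times."""
    score = 0.0
    info = 0.0
    for tj in sorted({t for t, e in zip(times, events) if e == 1}):
        risk = [i for i, t in enumerate(times) if t >= tj]  # risk set at tj
        failed = [i for i in risk if times[i] == tj and events[i] == 1]
        s0 = sum(math.exp(beta * x[i]) for i in risk)
        s1 = sum(x[i] * math.exp(beta * x[i]) for i in risk)
        s2 = sum(x[i] ** 2 * math.exp(beta * x[i]) for i in risk)
        d = len(failed)
        observed = sum(x[i] for i in failed)
        expected = d * s1 / s0
        score += observed - expected
        info += d * (s2 / s0 - (s1 / s0) ** 2)
    return score, info


def fit_cox_one_covariate(times, events, x, iterations=25):
    """Newton-Raphson on the score equation; returns (beta_hat, variance)."""
    beta = 0.0
    for _ in range(iterations):
        score, info = breslow_score_and_info(beta, times, events, x)
        beta += score / info
    _, info = breslow_score_and_info(beta, times, events, x)
    return beta, 1.0 / info  # variance = inverse of the information


# ASSUMPTION: hypothetical toy data (all failures observed, x is a 0/1 group label).
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
x = [1, 0, 1, 1, 0, 0, 1, 0]

beta_hat, var_hat = fit_cox_one_covariate(times, events, x)
print(f"beta = {beta_hat:.4f}, se = {math.sqrt(var_hat):.4f}, HR = {math.exp(beta_hat):.4f}")
```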