The default p-value that is calculated by survfit()
is the log rank p-value from the score test, which is one of the most oft-quoted p-values for survival data.
If you want to obtain a p-value for each individual stratum compared to the base / reference stratum, then you can use the Cox proportional hazards model, which will produce the same log rank p-value as Survfit()
when ties are ‘exact’:
load AML data
require(survival)
require(survminer)
aml
time status x
1 9 1 Maintained
2 13 1 Maintained
3 13 0 Maintained
4 18 1 Maintained
5 23 1 Maintained
6 28 0 Maintained
7 31 1 Maintained
8 34 1 Maintained
9 45 0 Maintained
10 48 1 Maintained
11 161 0 Maintained
12 5 1 Nonmaintained
13 5 1 Nonmaintained
14 8 1 Nonmaintained
15 8 1 Nonmaintained
16 12 1 Nonmaintained
17 16 0 Nonmaintained
18 23 1 Nonmaintained
19 27 1 Nonmaintained
20 30 1 Nonmaintained
21 33 1 Nonmaintained
22 43 1 Nonmaintained
23 45 1 Nonmaintained
let’s change the data to add a third stratum, ‘SuperMaintained‘
aml$x <- as.character(aml$x)
aml[10,3] <- 'SuperMaintained'
aml[11,3] <- 'SuperMaintained'
aml[22,3] <- 'SuperMaintained'
aml[23,3] <- 'SuperMaintained'
aml$x <- factor(aml$x, levels = c('Nonmaintained','Maintained','SuperMaintained'))
aml
time status x
1 9 1 Maintained
2 13 1 Maintained
3 13 0 Maintained
4 18 1 Maintained
5 23 1 Maintained
6 28 0 Maintained
7 31 1 Maintained
8 34 1 Maintained
9 45 0 Maintained
10 48 1 SuperMaintained
11 161 0 SuperMaintained
12 5 1 Nonmaintained
13 5 1 Nonmaintained
14 8 1 Nonmaintained
15 8 1 Nonmaintained
16 12 1 Nonmaintained
17 16 0 Nonmaintained
18 23 1 Nonmaintained
19 27 1 Nonmaintained
20 30 1 Nonmaintained
21 33 1 Nonmaintained
22 43 1 SuperMaintained
23 45 1 SuperMaintained
With the factor()
command, I also set the reference level to ‘Nonmaintained‘.
use survfit()
fit <- survfit(
Surv(time, status) ~ x,
data = aml)
surv_pvalue(fit)
variable pval method pval.txt
x 0.005309417 Log-rank p = 0.0053
ggsurvplot(
fit,
conf.int = FALSE,
surv.median.line = c('hv'),
data = aml,
pval = TRUE,
pval.method = TRUE,
risk.table = FALSE)
So, the log rank p-value is 0.005309417; Survival is worse for the ‘Nonmaintained‘ stratum.
use coxph()
coxfit <- coxph(
Surv(time, status) ~ x,
data = aml,
ties="exact")
summary(coxfit)
Call:
coxph(formula = Surv(time, status) ~ x, data = aml, ties = "exact")
n= 23, number of events= 18
coef exp(coef) se(coef) z Pr(>|z|)
xMaintained -1.12456 0.32479 0.58806 -1.912 0.05584 .
xSuperMaintained -2.60439 0.07395 0.92232 -2.824 0.00475 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
xMaintained 0.32479 3.079 0.10258 1.0284
xSuperMaintained 0.07395 13.523 0.01213 0.4508
Concordance= 0.729 (se = 0.054 )
Likelihood ratio test= 10.87 on 2 df, p=0.004
Wald test = 8.58 on 2 df, p=0.01
Score (logrank) test = 10.48 on 2 df, p=0.005
So, the p-values for each stratum are:
- Maintained vs Nonmaintained, 0.05584
- SuperMaintained vs Nonmaintained, 0.00475
The hazard ratios (HRs) are given by:
- exp(coef), HR
- exp(-coef), HR flipped the other way
- lower .95, lower 95% CI of the HR
- upper .95, upper 95% CI of the HR
check that the log rank p-values are the same in both models
The log rank p-value from the Cox model can be accessed by: summary(coxfit)$sctest[3]
round(summary(coxfit)$sctest[3], digits = 9) == round(surv_pvalue(fit)[,2], digits = 9)
pvalue
TRUE
Kevin
Read more here: Source link