survival-analysis-km
v1.0.0
Generates Kaplan-Meier survival curves and calculates survival statistics.
Survival Analysis (Kaplan-Meier)
Kaplan-Meier survival analysis tool for clinical and biological research. Generates publication-ready survival curves with statistical tests.
Features
- Kaplan-Meier Curve Generation: Publication-quality survival plots with confidence intervals
- Statistical Tests: Log-rank test, Wilcoxon test, Peto-Peto test
- Hazard Ratios: Cox proportional hazards regression with 95% CI
- Summary Statistics: Median survival time, restricted mean survival time (RMST)
- Multi-group Analysis: Supports 2+ comparison groups
- Risk Tables: Optional at-risk table below curves
Usage
Python Script
python scripts/main.py --input data.csv --time time_col --event event_col --group group_col --output results/
Arguments
| Argument | Description | Required |
|---|---|---|
| --input | Input CSV file path | Yes |
| --time | Column name for survival time | Yes |
| --event | Column name for event indicator (1=event, 0=censored) | Yes |
| --group | Column name for grouping variable | Optional |
| --output | Output directory for results | Yes |
| --conf-level | Confidence level (default: 0.95) | Optional |
| --risk-table | Include risk table in plot | Optional |
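The argument table can be mirrored with a small `argparse` parser. This is only a sketch of how such a command line might be declared, not the parser in scripts/main.py: the function name `build_parser` is hypothetical, and treating `--risk-table` as a bare on/off flag is an assumption.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the argument table (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Kaplan-Meier survival analysis")
    parser.add_argument("--input", required=True, help="Input CSV file path")
    parser.add_argument("--time", required=True, help="Column name for survival time")
    parser.add_argument("--event", required=True,
                        help="Column name for event indicator (1=event, 0=censored)")
    parser.add_argument("--group", default=None,
                        help="Column name for grouping variable")
    parser.add_argument("--output", required=True, help="Output directory for results")
    parser.add_argument("--conf-level", type=float, default=0.95,
                        help="Confidence level")
    # Assumption: the risk table is switched on by a bare flag.
    parser.add_argument("--risk-table", action="store_true",
                        help="Include risk table in plot")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this layout, `--conf-level` is exposed as `args.conf_level` and `--risk-table` as `args.risk_table`, because `argparse` converts hyphens in option names to underscores.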
Input Format
CSV with columns:
- Time column: Numeric, time to event or censoring
- Event column: Binary (1 = event occurred, 0 = right-censored)
- Group column: Categorical variable for stratification
Example:
patient_id,time_months,death,treatment_group
P001,24.5,1,Drug_A
P002,36.2,0,Drug_A
P003,18.7,1,Placebo
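Input of this shape can be checked before analysis. The sketch below uses only the standard library and assumes the rules stated above (numeric non-negative time, event coded as 0 or 1). The helper name `load_survival_rows` is hypothetical, and the tool itself may validate differently.

```python
import csv
import io


def load_survival_rows(handle, time_col, event_col, group_col=None):
    """Return a list of (time, event, group) tuples, raising ValueError on bad input."""
    reader = csv.DictReader(handle)
    required = [time_col, event_col] + ([group_col] if group_col else [])
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    rows = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, record in enumerate(reader, start=2):
        try:
            time = float(record[time_col])
        except (TypeError, ValueError):
            raise ValueError(f"line {line_no}: time is not numeric")
        if not time >= 0:  # also rejects NaN
            raise ValueError(f"line {line_no}: time must be non-negative")
        event = (record[event_col] or "").strip()
        if event not in ("0", "1"):
            raise ValueError(f"line {line_no}: event must be 0 or 1")
        group = record[group_col] if group_col else None
        rows.append((time, int(event), group))
    return rows


SAMPLE = """patient_id,time_months,death,treatment_group
P001,24.5,1,Drug_A
P002,36.2,0,Drug_A
P003,18.7,1,Placebo
"""

rows = load_survival_rows(io.StringIO(SAMPLE), "time_months", "death", "treatment_group")
print(rows)
```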
Output Files
- `km_curve.png`: Kaplan-Meier survival curve
- `km_curve.pdf`: Vector version for publications
- `survival_stats.csv`: Statistical summary (median survival, confidence intervals)
- `hazard_ratios.csv`: Cox regression results with HR and 95% CI
- `logrank_test.csv`: Pairwise comparison p-values
- `report.txt`: Human-readable summary report
Technical Details
Statistical Methods
- Kaplan-Meier Estimator: Non-parametric maximum likelihood estimate of the survival function
  - Product-limit estimator: Ŝ(t) = Π_{tᵢ ≤ t} (1 − dᵢ/nᵢ), where dᵢ is the number of events and nᵢ the number at risk at event time tᵢ
  - Greenwood's formula for variance estimation
- Log-Rank Test: Most widely used test for comparing survival curves
  - Null hypothesis: No difference in survival between groups
  - Compares observed and expected events at each event time with equal weights (the Wilcoxon variant weights by number at risk)
- Cox Proportional Hazards: Semi-parametric regression model
  - h(t|X) = h₀(t) × exp(β₁X₁ + β₂X₂ + ...)
  - Proportional hazards assumption checked via Schoenfeld residuals
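To make the first two methods concrete, here is a minimal pure-Python sketch of the product-limit estimate with Greenwood standard errors, plus a standard two-group log-rank statistic. It is illustrative only and not the tool's implementation (the tool lists lifelines as its core library). The function names `kaplan_meier` and `logrank_two_groups` are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t), se)] at each distinct event time (events: 1=event, 0=censored)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    greenwood_sum = 0.0  # running sum of d_i / (n_i * (n_i - d_i))
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = leaving = 0  # events at t, and all subjects leaving the risk set at t
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            leaving += 1
            i += 1
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            if n_at_risk > d:
                greenwood_sum += d / (n_at_risk * (n_at_risk - d))
                se = surv * math.sqrt(greenwood_sum)
            else:
                se = float("nan")  # Greenwood term undefined once S(t) reaches 0
            curve.append((t, surv, se))
        n_at_risk -= leaving
    return curve


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Return (chi-square, p-value) for the two-group log-rank test (1 df)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({u for u, e, _ in pooled if e == 1}):
        n = sum(1 for u, _, _ in pooled if u >= t)               # at risk, both groups
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)  # at risk, group A
        d = sum(1 for u, e, _ in pooled if u == t and e == 1)    # events at t
        d_a = sum(1 for u, e, g in pooled if u == t and e == 1 and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group A event count at t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
for t, s, se in curve:
    print(f"t={t} S={s:.3f} se={se:.3f}")
```

The quadratic scan in `logrank_two_groups` favours readability over speed; a production implementation would keep running risk-set counts instead.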
Dependencies
- `lifelines`: Core survival analysis library
- `matplotlib`, `seaborn`: Visualization
- `pandas`, `numpy`: Data handling
- `scipy`: Statistical tests
Technical Difficulty: High ⚠️
This skill involves advanced statistical modeling. Results should be reviewed by a biostatistician, especially for:
- Proportional hazards assumption violations
- Small sample sizes (< 30 per group)
- Heavy censoring (> 50%)
- Time-varying covariates
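Two of these conditions, small groups and heavy censoring, can be screened for mechanically before any curves are drawn. The sketch below assumes rows of `(time, event, group)` tuples and uses the thresholds quoted above. The helper `flag_review_risks` is hypothetical and not part of the tool. It does not address proportional hazards violations or time-varying covariates, which need model diagnostics.

```python
from collections import defaultdict


def flag_review_risks(rows, min_group_size=30, max_censoring=0.5):
    """Return warning strings for groups that are small or heavily censored."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n, number censored]
    for _time, event, group in rows:
        counts[group][0] += 1
        counts[group][1] += 1 if event == 0 else 0

    warnings = []
    for group, (n, censored) in sorted(counts.items(), key=lambda kv: str(kv[0])):
        if n < min_group_size:
            warnings.append(f"{group}: small sample (n={n})")
        if censored / n > max_censoring:
            warnings.append(f"{group}: heavy censoring ({censored / n:.0%})")
    return warnings


# Synthetic rows for illustration only: 40 events in A, 3 censored + 1 event in B.
demo_rows = [(1.0, 1, "A")] * 40 + [(1.0, 0, "B")] * 3 + [(2.0, 1, "B")]
for message in flag_review_risks(demo_rows):
    print(message)
```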
References
See references/ folder for:
- Kaplan EL, Meier P (1958) original paper
- Cox DR (1972) regression models paper
- Sample datasets for testing
- Clinical reporting guidelines (ATN, CONSORT)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --input | str | Required | Input CSV file path |
| --time | str | Required | Column name for survival time |
| --event | str | Required | Column name for event indicator (1=event, 0=censored) |
| --group | str | Required | Column name for grouping variable |
| --output | str | Required | Output directory for results |
| --conf-level | float | 0.95 | Confidence level |
| --risk-table | str | Required | Include risk table in plot |
| --figsize | str | '10 | |
| --dpi | int | 300 | |
Example
# Basic survival curve
python scripts/main.py \
  --input clinical_data.csv \
  --time overall_survival_months \
  --event death \
  --group treatment_arm \
  --output ./results/ \
  --risk-table
Output includes:
- Survival curves with 95% confidence bands
- Median survival: Drug A = 28.4 months (95% CI: 24.1-32.7), Placebo = 18.2 months (95% CI: 15.3-21.1)
- Log-rank test p-value: 0.0023
- Hazard ratio: 0.62 (95% CI: 0.45-0.85), p = 0.003
Risk Assessment
| Risk Indicator | Assessment | Level |
|---|---|---|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
Prerequisites
# Python dependencies
pip install -r requirements.txt
Evaluation Criteria
Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
Test Cases
- Basic Functionality: Standard input → Expected output
- Edge Case: Invalid input → Graceful error handling
- Performance: Large dataset → Acceptable processing time
Lifecycle Status
- Current Stage: Draft
- Next Review Date: 2026-03-06
- Known Issues: None
- Planned Improvements:
  - Performance optimization
  - Additional feature support