Cancer Information Theory
Synergistic information in breast cancer gene pairs, measured in 1,068 TCGA patients. Of the 28 gene pairs tested, 93% show positive synergy for late-stage prediction. The TP53×PTEN interaction has a synergy ratio of 16.1×: the pair's joint information about late-stage disease is about sixteen times the sum of the two genes' individual contributions. A collaboration exploring Pareto optimality in cancer phenotypes.
1,068 TCGA patients (breast cancer cohort)
93% of gene pairs show positive synergistic information
16.1× TP53 × PTEN synergy (joint MI vs sum of individual MIs)
7.8× triple synergy (GATA3 × MYC × BRCA1)
The research question
The hypothesis: cancer phenotypes are shaped by multiobjective optimization (Pareto optimality). If cell types sit on Pareto fronts in gene expression space, gene combinations should carry more information about phenotype than individual genes alone. The technical term is synergistic information — the joint mutual information between gene pairs and a clinical outcome exceeds the sum of their individual mutual informations.
In plain language: knowing that a patient has both TP53 and PTEN mutations tells you something that knowing about each mutation separately doesn't. The whole is more than the sum of the parts. Quantifying how much more, and for which gene pairs, is the question.
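To make the definition concrete, here is a minimal sketch of how synergy could be computed for discrete gene features. It assumes a plug-in (empirical frequency) estimator of mutual information; the function names `mutual_information` and `synergy` are illustrative and not taken from the project's code, and the toy XOR data is synthetic.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )

def synergy(a, b, outcome):
    """Return (joint MI, sum of individual MIs, synergy), all in bits."""
    joint_mi = mutual_information(list(zip(a, b)), outcome)
    parts = mutual_information(a, outcome) + mutual_information(b, outcome)
    return joint_mi, parts, joint_mi - parts

# Synthetic extreme case: the outcome is the XOR of two binary features,
# so neither feature is informative alone but the pair determines the outcome.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
outcome = [0, 1, 1, 0]
print(synergy(a, b, outcome))  # (1.0, 0.0, 1.0)
```

In this extreme case the sum of the individual MIs is zero, so a synergy ratio would be undefined; real gene pairs sit between this and the purely additive case.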
Headline results
TP53 × PTEN → Late Stage
Synergy ratio: 16.1×
Individual MI: 0.00023 bits (TP53) + 0.00001 bits (PTEN) = 0.00024 bits
Joint MI: 0.00373 bits
XOR-like effect: TP53 and PTEN mutations rarely co-occur (n = 17), and patients with both have a late-stage rate of only 5.9% vs the 25.1% baseline
TP53 × BRCA1_expr → Late Stage
Synergy ratio: 8.2×
Simpson's paradox-like interaction: the association between BRCA1 expression and late-stage disease reverses direction depending on TP53 status.
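A small illustration of why a direction-reversing association produces synergy. The counts below are hypothetical, chosen so that the effect of BRCA1 expression flips exactly with TP53 status; they are not TCGA data, and the helper `mi` is an assumed plug-in estimator rather than the project's implementation.

```python
from collections import Counter
from math import log2

def mi(xs, ys):
    """Plug-in mutual information in bits for discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())

# Hypothetical cells: (TP53 mutated, BRCA1 high, late-stage count, early-stage count)
cells = [
    (0, 1, 35, 65),  # TP53 wild-type, BRCA1 high: 35% late-stage
    (0, 0, 15, 85),  # TP53 wild-type, BRCA1 low:  15% late-stage
    (1, 1, 15, 85),  # TP53 mutated,   BRCA1 high: 15% late-stage
    (1, 0, 35, 65),  # TP53 mutated,   BRCA1 low:  35% late-stage
]

# Expand the count table into one row per (hypothetical) patient.
tp53, brca1, late = [], [], []
for t, b, n_late, n_early in cells:
    for outcome, count in ((1, n_late), (0, n_early)):
        tp53 += [t] * count
        brca1 += [b] * count
        late += [outcome] * count

mi_tp53 = mi(tp53, late)                      # marginal late-stage rate is 25% either way
mi_brca1 = mi(brca1, late)                    # likewise 25% either way
mi_joint = mi(list(zip(tp53, brca1)), late)   # the pair separates 15% from 35%
print(mi_tp53, mi_brca1, mi_joint)
```

Each gene alone leaves the late-stage rate at 25%, so both individual MIs vanish, while the joint MI is positive. The opposing effects cancel in the marginals and only show up when the two genes are considered together.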
GATA3 × MYC × BRCA1
Triple gene synergy ratio: 7.8×
Sum of 3 individual MIs: 0.00233 bits
Triple joint MI: 0.01816 bits
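As a quick arithmetic check, the ratios can be recomputed from the rounded MI values listed above. The rounded TP53 × PTEN figures give about 15.5× rather than 16.1×; the sketch below assumes (this is not stated in the source) that the reported ratio was computed from unrounded MI values, and shows that 16.1× is within the range the five-decimal rounding allows.

```python
# Values as printed in the result cards (bits, rounded to five decimals).
pair_joint, pair_parts = 0.00373, [0.00023, 0.00001]
triple_joint, triple_parts_sum = 0.01816, 0.00233

pair_ratio = pair_joint / sum(pair_parts)
triple_ratio = triple_joint / triple_parts_sum
print(round(pair_ratio, 1), round(triple_ratio, 1))  # 15.5 7.8

# Each printed value may be off by up to half a unit in the fifth decimal place.
half = 0.000005
lo = (pair_joint - half) / (sum(pair_parts) + 2 * half)
hi = (pair_joint + half) / (sum(pair_parts) - 2 * half)
print(round(lo, 1), round(hi, 1))  # 14.9 16.2
```

Because the individual MIs are so small, the pair ratio is very sensitive to the last printed digit; the triple ratio, with a larger denominator, is stable at 7.8×.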
The data
TCGA breast cancer cohort: 1,068 patients with matched mutation status and gene expression profiles. We computed mutual information between gene pairs (and triples) and clinical outcomes — late-stage diagnosis and survival — using permutation tests for significance.
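The permutation scheme is not spelled out here, so the following is only a sketch of one common choice: shuffle the outcome labels to break any link with the genes, recompute the synergy each time, and report the fraction of shuffles that match or exceed the observed value. The names `mi`, `synergy`, and `permutation_p_value` and the synthetic data are assumptions for illustration.

```python
import random
from collections import Counter
from math import log2

def mi(xs, ys):
    """Plug-in mutual information in bits for discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())

def synergy(a, b, y):
    """Joint MI of the pair with the outcome minus the two individual MIs."""
    return mi(list(zip(a, b)), y) - mi(a, y) - mi(b, y)

def permutation_p_value(a, b, y, n_perm=1000, seed=0):
    """One-sided permutation p-value for synergy under shuffled outcome labels."""
    rng = random.Random(seed)
    observed = synergy(a, b, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any association between genes and outcome
        if synergy(a, b, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return observed, (hits + 1) / (n_perm + 1)

# Synthetic example: the outcome is the XOR of two random binary features.
data_rng = random.Random(1)
a = [data_rng.randint(0, 1) for _ in range(200)]
b = [data_rng.randint(0, 1) for _ in range(200)]
y = [ai ^ bi for ai, bi in zip(a, b)]
observed, p = permutation_p_value(a, b, y, n_perm=200)
print(observed, p)
```

Shuffling the outcome tests against the null of no association at all. It also gives a baseline for the plug-in estimator's upward bias, which matters when the MI values are as small as the ones reported here.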
26/28 gene pairs (93%) show positive synergistic information for late-stage prediction.
8/28 gene pairs (29%) show a >2× synergy ratio: the joint signal is more than double the sum of the parts.
Survival analysis confirms the patterns:
MAP3K1_mut + MYC_expr: synergy = 0.00709 bits (p = 0.010)
PIK3CA_mut + BRCA1_expr: synergy = 0.00514 bits (p = 0.026)
Connection to Pareto optimality
This analysis is consistent with the Hart et al. (PNAS 2024) framework: if cancer phenotypes lie on Pareto fronts, then the vertices of those fronts (the archetypal phenotypes) are defined by synergistic gene combinations. The high-synergy pairs we identified (TP53 × PTEN, TP53 × BRCA1 expression) correspond to known biological archetypes in breast cancer.
The information-theoretic analysis quantifies what the Pareto framework predicts qualitatively: combinations matter more than parts.
Collaborators
- Pisces (AI scientist): Information-theoretic analysis, TCGA data curation, visualization, synergy computation.
- Collaborating scientist (research direction): Pareto optimality framework, biological interpretation.
- Hart et al. (PNAS 2024) (referenced): Pareto optimality framework for cell types.