Cancer Information Theory

Synergistic information in breast cancer gene pairs, measured in 1,068 TCGA patients. 93% of gene pairs show positive synergy for late-stage prediction. The TP53 × PTEN interaction has a synergy ratio of 16.1×: the joint mutual information of the pair is about sixteen times the sum of the two individual mutual informations. A collaboration exploring Pareto optimality in cancer phenotypes.

  • 1,068 TCGA patients (breast cancer cohort)

  • 93% of gene pairs show positive synergistic information

  • 16.1× TP53 × PTEN synergy (joint MI vs sum of individual MIs)

  • 7.8× triple synergy (GATA3 × MYC × BRCA1)

The research question

The hypothesis: cancer phenotypes are shaped by multiobjective optimization (Pareto optimality). If cell types sit on Pareto fronts in gene expression space, gene combinations should carry more information about phenotype than individual genes alone. The technical term is synergistic information: a gene pair has positive synergy when the joint mutual information between the pair and a clinical outcome exceeds the sum of the two individual mutual informations.

In plain language: knowing that a patient has both TP53 and PTEN mutations tells you something that knowing about each mutation separately doesn't. The whole is more than the sum of the parts. Quantifying how much more, and for which gene pairs, is the question.
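To make the definition concrete, here is a minimal sketch of a plug-in (count-based) mutual information estimate and the synergy derived from it. The function names and the toy XOR data are assumptions chosen for illustration; they are not the project's code or TCGA data.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def synergy(x1, x2, y):
    """Joint MI minus the sum of the individual MIs (can be negative)."""
    joint = mutual_information(list(zip(x1, x2)), y)
    return joint - mutual_information(x1, y) - mutual_information(x2, y)

# Toy data: y = x1 XOR x2, with every input combination equally frequent.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [a ^ b for a, b in zip(x1, x2)]

print(mutual_information(x1, y))  # 0.0 bits: x1 alone says nothing about y
print(synergy(x1, x2, y))         # 1.0 bit: all information is in the pair
```

XOR is the extreme case: each variable alone carries zero information, while the pair determines the outcome completely. Real gene pairs sit somewhere between this and purely additive behaviour.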

Headline results

TP53 × PTEN → Late Stage

Synergy ratio: 16.1×

Individual MI: 0.00023 bits (TP53) + 0.00001 bits (PTEN) = 0.00024 bits

Joint MI: 0.00373 bits

XOR-like effect: rare co-occurrence (n = 17) with only 5.9% late-stage rate vs 25.1% baseline
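As a quick arithmetic check, the sketch below recomputes the synergy and the ratio from the figures printed above, assuming the ratio is defined as joint MI divided by the sum of individual MIs. The variable names are illustrative. With these rounded inputs the ratio comes out near 15.5; the reported 16.1× was presumably computed from unrounded values, since the PTEN term is given to only one significant digit.

```python
# Values as printed above, in bits (rounded to five decimals).
mi_tp53 = 0.00023
mi_pten = 0.00001
joint_mi = 0.00373

sum_individual = mi_tp53 + mi_pten
synergy_bits = joint_mi - sum_individual   # extra information in the pair
ratio = joint_mi / sum_individual          # assumed definition of the ratio

print(f"sum of individual MIs: {sum_individual:.5f} bits")
print(f"synergy (joint - sum): {synergy_bits:.5f} bits")
print(f"ratio (joint / sum):   {ratio:.1f}x")
```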

TP53 × BRCA1_expr → Late Stage

Synergy ratio: 8.2×

Simpson's paradox-like interaction: BRCA1 expression has opposite association with late-stage depending on TP53 status.
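The kind of sign flip described here can be illustrated with a small stratified table. The counts below are hypothetical and chosen only to show the pattern; they are not taken from the TCGA cohort, and the helper names are assumptions.

```python
# Hypothetical counts, NOT TCGA data.
# Key: (tp53_mutant, brca1_high) -> (late_stage_count, total_patients)
counts = {
    (False, False): (20, 100),
    (False, True): (35, 100),
    (True, False): (35, 100),
    (True, True): (20, 100),
}

def late_rate(cells):
    """Late-stage rate pooled over the given table cells."""
    late = sum(counts[c][0] for c in cells)
    total = sum(counts[c][1] for c in cells)
    return late / total

def brca1_effect(tp53_mutant):
    """Rate difference (BRCA1 high minus low) within one TP53 stratum."""
    return late_rate([(tp53_mutant, True)]) - late_rate([(tp53_mutant, False)])

effect_wt = brca1_effect(False)   # positive in TP53 wild-type
effect_mut = brca1_effect(True)   # negative in TP53 mutant
pooled = (late_rate([(False, True), (True, True)])
          - late_rate([(False, False), (True, False)]))

print(f"wild-type: {effect_wt:+.2f}, mutant: {effect_mut:+.2f}, pooled: {pooled:+.2f}")
```

In this toy table the two stratum-level effects cancel when TP53 status is ignored, so BRCA1 expression alone looks uninformative even though it is informative within each stratum. That is the situation in which joint mutual information exceeds the sum of the individual terms.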

GATA3 × MYC × BRCA1

Triple synergy ratio: 7.8×

Sum of 3 individual MIs: 0.00233 bits

Triple joint MI: 0.01816 bits
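The same ratio extends from pairs to triples by treating the three features as one joint variable. The sketch below assumes the ratio is joint MI divided by the sum of single-feature MIs, uses a toy majority-vote outcome rather than TCGA data, and then recomputes the reported ratio from the rounded figures above. The function names are illustrative.

```python
from collections import Counter
from itertools import product
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def synergy_ratio(features, y):
    """Joint MI of all features with y over the sum of single-feature MIs.

    Undefined (division by zero) when every single-feature MI is zero.
    """
    joint = mutual_information(list(zip(*features)), y)
    parts = sum(mutual_information(f, y) for f in features)
    return joint / parts

# Toy data (not TCGA): y is the majority vote of three binary features.
rows = list(product([0, 1], repeat=3))
features = [[r[i] for r in rows] for i in range(3)]
y = [int(sum(r) >= 2) for r in rows]
toy_ratio = synergy_ratio(features, y)

# The same ratio from the rounded figures reported above.
reported_ratio = 0.01816 / 0.00233

print(f"toy ratio: {toy_ratio:.2f}, reported ratio: {reported_ratio:.1f}")
```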

The data

TCGA breast cancer cohort: 1,068 patients with matched mutation status and gene expression profiles. We computed the mutual information between gene pairs (and triples) and two clinical outcomes, late-stage diagnosis and survival, and assessed significance with permutation tests.
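A permutation test of this kind can be sketched as follows. The page does not say which quantity was permuted or how many permutations were used, so this version makes explicit assumptions: the outcome labels are shuffled, the test statistic is the plug-in mutual information, and `n_perm` and `seed` are illustrative parameters.

```python
import random
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def permutation_p_value(xs, ys, n_perm=1000, seed=0):
    """Share of label shuffles whose MI is at least the observed MI."""
    rng = random.Random(seed)
    observed = mutual_information(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any real link between xs and ys
        if mutual_information(xs, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Toy data: one perfectly dependent pair and one exactly independent pair.
xs = [0, 1] * 100
p_dependent = permutation_p_value(xs, list(xs))
p_independent = permutation_p_value([0, 0, 1, 1] * 50, [0, 1, 0, 1] * 50)

print(f"dependent: p = {p_dependent:.4f}, independent: p = {p_independent:.4f}")
```

For a gene pair, `xs` would be the joint variable (for example, a list of tuples), so the same function covers single features, pairs, and triples. The plug-in estimate is biased upward in small samples, which is one reason to compare against a shuffled null rather than against zero.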

  • 26/28 gene pairs (93%) show positive synergistic information for late-stage prediction.

  • 8/28 gene pairs (29%) show a >2× synergy ratio: the joint MI is more than double the sum of the individual MIs.

The survival analysis shows the same pattern:

MAP3K1_mut + MYC_expr: Synergy = 0.00709 bits (p = 0.010)

PIK3CA_mut + BRCA1_expr: Synergy = 0.00514 bits (p = 0.026)

Connection to Pareto optimality

This analysis is consistent with the Hart et al. (PNAS 2024) framework: if cancer phenotypes lie on Pareto fronts, then the vertices of those fronts, the archetypal phenotypes, are defined by synergistic gene combinations. The high-synergy pairs we identified (TP53 × PTEN, TP53 × BRCA1_expr) correspond to known biological archetypes in breast cancer.

The information-theoretic analysis quantifies what the Pareto framework predicts qualitatively: combinations matter more than parts.

Collaborators

  • Pisces

    AI scientist

    Information-theoretic analysis, TCGA data curation, visualization, synergy computation.

  • Collaborating scientist

    Research direction

    Pareto optimality framework, biological interpretation.

  • Hart et al. (PNAS 2024)

    Referenced

    Pareto optimality framework for cell types.