feat: add GO enrichment analysis page for ProteomicsLFQ results #8

hjn0415a · 2026-02-04T05:50:12Z

This PR adds a new GO Enrichment Analysis page for ProteomicsLFQ results.
The page allows users to perform GO term enrichment (BP, CC, MF) based on protein-level differential abundance results.

- Added a new Streamlit results page: results_proteomicslfq.py
- Integrated GO enrichment analysis using MyGene.info for GO annotation
- Foreground proteins are selected based on configurable p-value and |log2FC| thresholds
- Enrichment is computed using Fisher’s exact test
- Results are visualized as bar plots and tables, separated by GO category (BP / CC / MF)
- Added mygene as a new dependency
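The threshold rule above can be sketched as follows. This is a minimal, stdlib-only illustration rather than the PR's code: rows are plain dicts that reuse the PR's DataFrame column names (`UniProt`, `p-value`, `log2FC`), while `select_foreground` and its default cutoffs are hypothetical.

```python
def select_foreground(rows, p_cutoff=0.05, fc_cutoff=1.0):
    """Return (foreground, background) sets of UniProt IDs.

    A protein is in the foreground when p-value < p_cutoff AND
    |log2FC| >= fc_cutoff, mirroring the comparisons in the PR.
    The default cutoffs are illustrative only.
    """
    foreground, background = set(), set()
    for row in rows:
        uid = row.get("UniProt")
        if uid is None:
            continue  # like dropna() on the UniProt column
        background.add(uid)
        if row["p-value"] < p_cutoff and abs(row["log2FC"]) >= fc_cutoff:
            foreground.add(uid)
    return foreground, background


rows = [
    {"UniProt": "P1", "p-value": 0.01, "log2FC": -1.5},  # significant, down-regulated
    {"UniProt": "P2", "p-value": 0.20, "log2FC": 3.0},   # large change, not significant
    {"UniProt": "P3", "p-value": 0.001, "log2FC": 0.5},  # significant, small change
    {"UniProt": None, "p-value": 0.0, "log2FC": 5.0},    # no accession: dropped
]
fg, bg = select_foreground(rows)
```

Because the fold-change filter uses the absolute value, up- and down-regulated proteins end up in the same foreground set.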

Summary by CodeRabbit

New Features
- Added "GO Terms" page to the Results section for protein analysis.
- Implemented Gene Ontology enrichment analysis with adjustable p-value and log2FC thresholds.
- Results displayed across three tabs with visualizations and detailed data tables.

coderabbitai · 2026-02-04T05:55:23Z

📝 Walkthrough


A new GO enrichment workflow is added to the ProteomicsLFQ results interface. The feature allows users to adjust p-value and log2FC thresholds, retrieves GO annotations for UniProt accessions from MyGene.info, runs a one-sided Fisher's exact test per GO term, and displays the results across the three GO categories (BP, CC, MF). A new dependency on mygene is introduced.
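The per-term test can be made explicit without SciPy. A minimal sketch, assuming the 2×2 layout `[[a, b], [c, d]]` passed to `fisher_exact(..., alternative="greater")` in the PR (a: foreground proteins with the term, b: foreground proteins without it, c: background-only proteins with the term, d: the rest). The one-sided Fisher p-value equals the hypergeometric upper tail P(X ≥ a); `fisher_greater` is a hypothetical helper name.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided (enrichment) Fisher's exact test for [[a, b], [c, d]].

    Sums the hypergeometric probabilities of drawing at least `a`
    term-annotated proteins in a foreground of size a + b, taken from
    a + b + c + d proteins of which a + c carry the term.
    """
    total = a + b + c + d
    with_term = a + c
    fg_size = a + b
    tail = sum(
        comb(with_term, k) * comb(total - with_term, fg_size - k)
        for k in range(a, min(with_term, fg_size) + 1)
    )
    return tail / comb(total, fg_size)


p_example = fisher_greater(3, 1, 1, 3)  # 17/70, about 0.243
```

The PR itself relies on `scipy.stats.fisher_exact`; this sketch only spells out what that call computes for the "greater" alternative.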

Changes

| Cohort / File(s) | Summary |
|---|---|
| UI Routing (`app.py`) | Added new "GO Terms" page (🧪 icon) to Results section, referencing the new results_proteomicslfq module. |
| GO Enrichment Workflow (`content/results_proteomicslfq.py`) | New module implementing Streamlit-based ProteomicsLFQ results interface with abundance data display, GO enrichment pipeline including threshold adjustment, MyGene.info API integration, protein set construction, Fisher's exact test analysis, and results visualization across BP/CC/MF tabs. |
| Dependencies (`requirements.txt`) | Added mygene package dependency for UniProt GO term retrieval. |

Sequence Diagram

sequenceDiagram
    actor User
    participant Streamlit as Streamlit UI
    participant DataHandler as Data Handler
    participant MyGeneAPI as MyGene.info API
    participant Stats as Statistical Engine
    participant Viz as Visualization

    User->>Streamlit: Open ProteomicsLFQ Results
    Streamlit->>DataHandler: Retrieve abundance data
    DataHandler-->>Streamlit: Protein abundance table
    Streamlit->>User: Display table & GO Enrichment form

    User->>Streamlit: Adjust p-value/log2FC thresholds
    User->>Streamlit: Run GO Enrichment
    Streamlit->>DataHandler: Filter proteins by thresholds
    DataHandler-->>Streamlit: Filtered foreground & background sets

    Streamlit->>MyGeneAPI: Fetch GO terms for proteins
    MyGeneAPI-->>Streamlit: GO annotations (BP/CC/MF)
    
    Streamlit->>Stats: Perform Fisher's exact test per GO type
    Stats-->>Streamlit: Enrichment p-values & statistics
    
    Streamlit->>Viz: Generate results (bars & tables)
    Viz-->>Streamlit: Three tabs with visualizations
    Streamlit->>User: Display enrichment results

Poem

🐰 Hops of joy through GO's grand garden,
Proteomics bloom with enrichment's pardon,
Fisher's test dances, MyGene API sings,
Three tabs of terms on biology's wings! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and specifically summarizes the main change: adding a GO enrichment analysis page for ProteomicsLFQ results, which aligns with all the modifications in the changeset. |


coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@content/results_proteomicslfq.py`:
- Around lines 68-117: The foreground/background counts are using all proteins
(bg_ids/fg_ids) even if MyGene returned no annotation, so update
run_go_enrichment to first compute annotated_ids = set(res["query"].astype(str))
(or otherwise derive the set of IDs present in the filtered res) and then
replace bg_set and fg_set with their intersections with annotated_ids before
computing N_bg/N_fg and running the Fisher tests; keep building go2bg/go2fg from
res rows as-is so counts and p-values reflect only annotated proteins.

🧹 Nitpick comments (4)

requirements.txt (1)

152-152: Consider pinning mygene for deterministic builds.

requirements.txt is generated by pip-compile, but mygene is unpinned. Align it with the rest of the lockfile by re-running pip-compile or pinning a version to avoid non-reproducible installs.
content/results_proteomicslfq.py (3)
45-50: Wrap the GO enrichment UI in @st.fragment to avoid full reruns.

This keeps slider/button interactions from re-running the entire page. As per coding guidelines (`**/*.py`): use the `@st.fragment` decorator for interactive UI updates without full page reloads.
Suggested refactor (skeleton)
+@st.fragment
+def go_enrichment_panel(pivot_df):
     st.subheader("🧬 GO Enrichment Analysis")
     p_cutoff = st.slider(...)
     fc_cutoff = st.slider(...)
     if st.button("Run GO Enrichment"):
         ...
+
+go_enrichment_panel(pivot_df)
55-65: Avoid blind Exception catches to improve debuggability.

Catching broad exceptions hides unexpected failures. Consider narrowing to the likely exceptions (e.g., AttributeError/IndexError in parsing and request-related exceptions around the API call) or re-raise after logging.

Also applies to: 140-141
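A minimal sketch of narrowing the catch around the annotation lookup. `fetch_annotations` and its `network_errors` parameter are hypothetical. The default `(OSError,)` covers the built-in `ConnectionError` and `TimeoutError`, but which exception types the MyGene.info client actually raises is an assumption to verify against the installed version. Anything else, such as a `KeyError` from parsing, propagates instead of being hidden.

```python
import logging

logger = logging.getLogger(__name__)


def fetch_annotations(query_fn, ids, network_errors=(OSError,)):
    """Run query_fn(ids), catching only the listed network-level errors.

    query_fn stands in for a wrapper around the MyGene.info query.
    On a caught error the failure is logged and None is returned;
    all other exceptions propagate to the caller.
    """
    try:
        return query_fn(ids)
    except network_errors as exc:
        logger.warning("GO annotation lookup failed: %s", exc)
        return None
```

The page can then turn a `None` result into a user-facing message, while programming errors still surface with a full traceback.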

134-137: Use streamlit_plotly_events for interactive Plotly charts.

Right now the chart is displayed but you aren’t capturing interactions. Consider using plotly_events to support click/selection actions. As per coding guidelines (`**/*.py`): use Plotly and streamlit_plotly_events for interactive visualizations. Because `plotly_events` renders the figure itself, it should replace `st.plotly_chart` rather than be added next to it; keeping both draws the chart twice.
Example integration
+from streamlit_plotly_events import plotly_events
 ...
-                                st.plotly_chart(fig, use_container_width=True)
+                                selected = plotly_events(fig, click_event=True, select_event=True)

coderabbitai · 2026-02-04T05:55:26Z

content/results_proteomicslfq.py

+                    bg_ids = analysis_df["UniProt"].dropna().unique().tolist()
+                    fg_ids = analysis_df[
+                        (analysis_df["p-value"] < p_cutoff) &
+                        (analysis_df["log2FC"].abs() >= fc_cutoff)
+                    ]["UniProt"].dropna().unique().tolist()
+
+                    if len(fg_ids) < 3:
+                        st.warning(f"Not enough significant proteins (p < {p_cutoff}, |log2FC| ≥ {fc_cutoff}). Found: {len(fg_ids)}")
+                    else:
+                        res_list = mg.querymany(bg_ids, scopes="uniprot", fields="go", as_dataframe=False)
+                        res = pd.DataFrame(res_list)
+                        if "notfound" in res.columns:
+                            res = res[res["notfound"] != True]
+
+                        def extract_go_terms(go_data, go_type):
+                            if not isinstance(go_data, dict) or go_type not in go_data:
+                                return []
+                            terms = go_data[go_type]
+                            if isinstance(terms, dict):
+                                terms = [terms]
+                            return list({t.get("term") for t in terms if "term" in t})
+
+                        for go_type in ["BP", "CC", "MF"]:
+                            res[f"{go_type}_terms"] = res["go"].apply(lambda x: extract_go_terms(x, go_type))
+
+                        fg_set = set(fg_ids)
+                        bg_set = set(bg_ids)
+
+                        def run_go_enrichment(go_type):
+                            go2fg = defaultdict(set)
+                            go2bg = defaultdict(set)
+                            for _, row in res.iterrows():
+                                uid = str(row["query"])
+                                for term in row[f"{go_type}_terms"]:
+                                    go2bg[term].add(uid)
+                                    if uid in fg_set:
+                                        go2fg[term].add(uid)
+
+                            records = []
+                            N_fg = len(fg_set)
+                            N_bg = len(bg_set)
+                            for term, fg_genes in go2fg.items():
+                                a = len(fg_genes)
+                                if a == 0:
+                                    continue
+                                b = N_fg - a
+                                c = len(go2bg[term]) - a
+                                d = N_bg - (a + b + c)
+                                _, p = fisher_exact([[a, b], [c, d]], alternative="greater")
+                                records.append({"GO_Term": term, "Count": a, "GeneRatio": f"{a}/{N_fg}", "p_value": p})


⚠️ Potential issue | 🟠 Major

Foreground/background counts include unannotated proteins, biasing Fisher p-values.

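N_bg/N_fg are computed from all proteins, including those for which MyGene returned no annotation. In the 2×2 table [[a, b], [c, d]], each unannotated foreground protein adds to b and each unannotated non-foreground protein adds to d, so the Fisher p-values can be biased in either direction. Restrict both sets to the annotated proteins returned by MyGene before computing Fisher’s exact test.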

Proposed fix

-                    bg_ids = analysis_df["UniProt"].dropna().unique().tolist()
+                    bg_ids = analysis_df["UniProt"].dropna().unique().tolist()
                     fg_ids = analysis_df[
                         (analysis_df["p-value"] < p_cutoff) &
                         (analysis_df["log2FC"].abs() >= fc_cutoff)
                     ]["UniProt"].dropna().unique().tolist()
 ...
-                        fg_set = set(fg_ids)
-                        bg_set = set(bg_ids)
+                        annotated_ids = set(res["query"].astype(str))
+                        bg_set = annotated_ids
+                        fg_set = annotated_ids.intersection(map(str, fg_ids))
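A minimal sketch of the proposed fix as a standalone function. It assumes the MyGene.info response has already been reduced to a hypothetical `annotations` dict that maps each returned protein ID to its set of GO terms for one category. Following the review, the background is every protein MyGene returned, and the foreground is intersected with it, so an unannotated foreground ID no longer inflates `b`.

```python
from collections import defaultdict


def go_contingency_tables(annotations, fg_ids):
    """Return {GO term: (a, b, c, d)} restricted to annotated proteins."""
    bg_set = set(annotations)             # proteins MyGene.info returned
    fg_set = bg_set.intersection(fg_ids)  # drop foreground IDs without annotation
    go2bg, go2fg = defaultdict(set), defaultdict(set)
    for uid, terms in annotations.items():
        for term in terms:
            go2bg[term].add(uid)
            if uid in fg_set:
                go2fg[term].add(uid)
    n_fg, n_bg = len(fg_set), len(bg_set)
    tables = {}
    for term, fg_hits in go2fg.items():
        a = len(fg_hits)             # foreground, with term
        b = n_fg - a                 # foreground, without term
        c = len(go2bg[term]) - a     # background only, with term
        d = n_bg - (a + b + c)       # background only, without term
        tables[term] = (a, b, c, d)
    return tables


annotations = {"P1": {"t1"}, "P2": {"t1"}, "P3": {"t2"}, "P4": set()}
tables = go_contingency_tables(annotations, ["P1", "P2", "X9"])  # X9 has no annotation
```

A stricter variant would also drop proteins with an empty term set for the category (like `P4` here) from the background; the review does not settle which definition to use.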

🧰 Tools

🪛 Ruff (0.14.14)

[error] 80-80: Avoid inequality comparisons to `True`; use `not res["notfound"]:` for false checks

Replace with `not res["notfound"]`

(E712)
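Ruff's generic E712 suggestion does not carry over to pandas: `not res["notfound"]` asks for the truth value of a whole Series and raises `ValueError`. A minimal sketch of an element-wise alternative, assuming (as with `querymany` results turned into a DataFrame) that `notfound` is `True` for unmatched IDs and NaN elsewhere. The toy DataFrame is illustrative only.

```python
import pandas as pd

# Toy querymany-style records; only the unmatched ID carries "notfound".
res = pd.DataFrame([
    {"query": "P1", "_id": "1"},
    {"query": "P2", "notfound": True},
    {"query": "P3", "_id": "3"},
])

# Element-wise mask: NaN != True evaluates to True, so matched rows are kept.
found = res[res["notfound"].ne(True)]
```

`Series.ne(True)` expresses the same intent as `!= True` without tripping E712.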

[warning] 91-91: Function definition does not bind loop variable `go_type`

(B023)
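B023 flags a real Python pitfall: closures look up loop variables when they are called, not when they are defined. In the PR the lambda is consumed by `Series.apply` within the same iteration, so the computed columns are correct, but the warning still fires. The stdlib sketch below shows the difference; all names are illustrative.

```python
from functools import partial

go_types = ["BP", "CC", "MF"]

# Late binding: every lambda reads go_type after the loop has finished.
late = [lambda: go_type for go_type in go_types]

# Default-argument binding freezes the value at definition time.
bound = [lambda t=go_type: t for go_type in go_types]


def label(go_type):
    return go_type


# functools.partial freezes the value as well.
frozen = [partial(label, go_type) for go_type in go_types]

late_values = [f() for f in late]      # ['MF', 'MF', 'MF']
bound_values = [f() for f in bound]    # ['BP', 'CC', 'MF']
frozen_values = [f() for f in frozen]  # ['BP', 'CC', 'MF']
```

In the PR, passing the category through `Series.apply`'s `args` parameter, e.g. `res["go"].apply(extract_go_terms, args=(go_type,))`, avoids the closure altogether and should satisfy the linter.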


feat: add GO enrichment analysis page for ProteomicsLFQ results

827367e

coderabbitai bot reviewed Feb 4, 2026
