bugfix: extract numbered topic items within interest section by RonghaiHe · Pull Request #19 · AI45Lab/iDeer

RonghaiHe · 2026-06-05T09:56:03Z

Similar to Pull Request: extract numbered topic items within the interest section as Semantic Scholar queries. The difference is that this version is based on the latest commit: 9aa7efb

Tested via:

python main.py --sources semanticscholar --save --skip_source_emails

…queries_from_description Co-Authored-By: OpenCode <opencode-agent[bot]@users.noreply.github.com>

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates the Semantic Scholar source query derivation to parse “interest” sections and numbered topic lists from a free-form research description.

Changes:

Reworks _derive_queries_from_description to use a section-based state machine (in_interest_section) rather than per-line prefix stripping.
Extracts topics primarily from numbered items inside an “interest/关注/研究” section, with additional cleanup around common separators.
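The section-based approach described above can be sketched as a standalone function. This is a minimal reconstruction assembled from the diff fragments in this review, not the exact code in the PR: the name `derive_queries` and the `fallback` parameter are hypothetical, the "additional cleanup around common separators" is omitted, and the ordering of the checks (negative markers first, since "not interested" also contains "interest") is an assumption.

```python
import re


def derive_queries(desc: str, fallback: str = "artificial intelligence") -> list[str]:
    """Collect numbered items that appear inside an "interest" section.

    Hypothetical standalone version of _derive_queries_from_description.
    """
    queries: list[str] = []
    in_interest_section = False

    for raw in desc.split("\n"):
        line = raw.strip()
        if not line:
            continue
        lower = line.lower()

        # Negative markers close the section. They are checked first
        # (assumed ordering) because "not interested" contains "interest".
        if any(neg in lower for neg in ("not interested", "不感兴趣", "don't", "exclude")):
            in_interest_section = False
            continue

        # A header line opens the section; the header itself is not a topic.
        if re.search(r"\binterest", lower) or "关注" in lower or "研究" in lower:
            in_interest_section = True
            continue

        # Inside the section, keep only numbered items such as "1. foo" or "2) bar".
        if in_interest_section:
            m = re.match(r"^\d+[.)\-:、]\s*(.*)", line)
            if m:
                content = m.group(1).strip()
                if len(content) > 1:
                    queries.append(content[:120])

    return queries or [fallback]


description = (
    "I am interested in:\n"
    "1. Large language models\n"
    "2) Graph neural networks\n"
    "Not interested in:\n"
    "1. Blockchain"
)
print(derive_queries(description))
```

One consequence of this design is that a numbered item which itself contains "interest", "关注", or "研究" is treated as a header and skipped, so topic wording can affect which queries are extracted.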


+        in_interest_section = False
+
+        for line in desc.split("\n"):


+            # Lines containing "interest" signal the start of the interest section
+            if re.search(r'\binterest', lower) or "关注" in lower or "研究" in lower:
+                in_interest_section = True
+                continue  # header line itself is not a topic


+            # Within the interest section, extract from numbered items
+            if in_interest_section:
+                m = re.match(r'^\d+[\.\)\-:、]\s*(.*)', line)
+                if m:
+                    content = m.group(1)


+                    if content and len(content) > 1:
+                        queries.append(content[:120])

        return queries or ["artificial intelligence"]


    def _derive_queries_from_description(self) -> list[str]:
-        """Extract up to 3 search queries from the user description."""
+        """Extract search queries from the user's research description."""
+        import re


            lower = line.lower()
+
+            # "not interested" signals the end of the interest section
            if any(neg in lower for neg in ("not interested", "不感兴趣", "don't", "exclude")):


fix: extract numbered topic items within interest section in _derive_…

dd79417

…queries_from_description Co-Authored-By: OpenCode <opencode-agent[bot]@users.noreply.github.com>

Copilot AI review requested due to automatic review settings June 5, 2026 09:56

Copilot AI reviewed Jun 5, 2026

RonghaiHe wants to merge 1 commit into AI45Lab:main from RonghaiHe:bugfix/queries_extraction
