
fix: compaction pairwise fallback, conflict-aware merge prompt, metadata caps, and litellm pickle compat#273

Open
Piotr1215 wants to merge 10 commits into redis:main from Piotr1215:fix/compaction-pairwise-merge-and-conflict-prompt

Conversation


@Piotr1215 Piotr1215 commented Apr 8, 2026

Problem

Four related issues affecting memory compaction reliability and quality:

1. Compaction deadlock in dense memory clusters (merges 0 memories)

_semantic_merge_group_is_cohesive() rejects all merge groups when memories form dense topical clusters. Each memory has "extra neighbors" outside the candidate group, triggering the ambiguous-group skip. Any user who creates memories about related topics (e.g., multiple commits on the same feature) will see compaction scan hundreds of memories and merge exactly zero.

Logs show dozens of "Skipping ambiguous semantic merge group" entries per run, ending with "Merged 0 memories".

2. Merge prompt does not handle conflicting memories

The current merge prompt ("merge similar or duplicate memories into a single, coherent memory") blends everything together. When memories contain contradictory information — common when iterating on a solution across commits — the merged result preserves both sides of the contradiction instead of resolving it.

Example: Memory A says "use tool X for task Y", Memory B (created later) says "switched from X to Z for task Y because X had issues." The current prompt merges them into a single memory that mentions both tools without clarity on which is current.

3. Unbounded topic/entity growth on successive merges

merge_memories_with_llm unions all topics and entities from source memories without limit. Over successive compaction rounds where merged memories get re-merged, topic counts inflate to 20+ and entity counts to 30+, mostly with LLM-generated verbose synonyms of the same concept.

4. Docket worker crashes on litellm exception serialization

When a litellm BadRequestError is raised and retries are exhausted, the Docket worker tries to serialize the exception via cloudpickle.dumps(). Litellm exception classes have mandatory __init__ args (message, model, llm_provider) and contain unpickleable _thread.RLock objects, causing TypeError: cannot pickle '_thread.RLock' object. This crashes the worker process repeatedly.

Fix

1. Pairwise merge fallback in deduplicate_by_semantic_search

When the full candidate group fails the cohesion check, fall back to merging just the anchor memory with its single closest neighbor (by vector distance). The pair is re-validated through _semantic_merge_group_is_cohesive — a pair has far fewer "extra neighbors" than a large group, so it passes where the group did not.

This breaks the deadlock gradually: each compaction run merges the tightest pairs, reducing the cluster density for the next run.

2. Conflict-aware merge prompt with recency precedence

The merge prompt now:

  • Includes creation timestamps (UTC-normalized) for each memory
  • Instructs the LLM to prefer newer information when facts conflict
  • Drops superseded information rather than blending contradictions
  • Preserves concrete details (commit hashes, file paths, version numbers)
  • Enforces plain text output (no markdown formatting in merged text)
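The prompt construction can be sketched as follows; the exact wording in merge_memories_with_llm may differ, and build_merge_prompt is a hypothetical name:

```python
from datetime import datetime, timezone


def build_merge_prompt(memories: list[dict]) -> str:
    """memories: [{"text": str, "created_at": tz-aware datetime}, ...]"""
    blocks = []
    for i, m in enumerate(memories, 1):
        # Normalize every timestamp to UTC so the LLM compares like with like.
        ts = m["created_at"].astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
        blocks.append(f"Memory {i} (created {ts}):\n{m['text']}")
    return (
        "Merge the memories below into a single coherent memory.\n"
        "Rules:\n"
        "- When facts conflict, prefer the newer memory and drop superseded information.\n"
        "- Preserve concrete details (commit hashes, file paths, version numbers).\n"
        "- Output plain text only, no markdown formatting.\n\n"
        + "\n\n".join(blocks)
    )
```

With the tool X / tool Z example above, the timestamps let the model see that the "switched to Z" memory is newer and drop the superseded recommendation.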

3. Topic and entity caps in merge_memories_with_llm

Module-level constants MAX_MERGED_TOPICS (15) and MAX_MERGED_ENTITIES (20) cap metadata after union. When over the limit, shorter strings are kept preferentially — shorter identifiers tend to be more precise while longer ones are often LLM-generated verbose synonyms.
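A sketch of the shortest-first capping policy, assuming the constants described above (the helper name cap_merged is illustrative):

```python
MAX_MERGED_TOPICS = 15
MAX_MERGED_ENTITIES = 20


def cap_merged(values: set[str], limit: int) -> list[str]:
    # Sort by length, then alphabetically for determinism, and keep the
    # shortest entries: short identifiers tend to be the precise ones,
    # while long ones are often LLM-generated verbose synonyms.
    return sorted(values, key=lambda v: (len(v), v))[:limit]
```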

4. Litellm pickle compatibility patch

New module litellm_pickle_compat.py patches 14 litellm exception classes with custom __reduce__ methods that filter out unpickleable attributes before serialization. Auto-applied via side-effect import in docket_tasks.py. Note: This is the same fix as PR branch bsb/fix-litellm-pickle-231 (cherry-picked commits 2b7febd, 90a8122, 708f91a) — included here for completeness since that branch hasn't been merged yet.
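The patching idea can be demonstrated with a stand-in class; the real module patches 14 litellm exception classes, and FakeBadRequestError here is purely illustrative:

```python
import pickle
import threading


class FakeBadRequestError(Exception):
    """Stand-in for litellm.BadRequestError: mandatory ctor args plus an RLock."""

    def __init__(self, message, model, llm_provider):
        super().__init__(message)
        self.message = message
        self.model = model
        self.llm_provider = llm_provider
        self._lock = threading.RLock()  # unpickleable, like litellm's internals


def _pickle_safe_reduce(self):
    # Rebuild the exception from its mandatory __init__ args only;
    # unpickleable attributes such as _thread.RLock are simply dropped.
    return (self.__class__, (self.message, self.model, self.llm_provider))


# Without this patch, pickling fails with
# TypeError: cannot pickle '_thread.RLock' object.
FakeBadRequestError.__reduce__ = _pickle_safe_reduce
```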

Test plan

  • Run compaction on a memory store with dense topical clusters — verify non-zero merges
  • Verify worker no longer crashes from pickle errors after sustained operation
  • Create two memories with conflicting facts (different timestamps), trigger merge, verify newer fact wins
  • Create memories with 20+ topics each, merge them, verify output has ≤15 topics
  • Verify existing full-group cohesive merges still work (pairwise fallback only fires when full group fails)
  • Verify "Pairwise merged memory X with Y" appears in logs for the fallback path
  • Verify cloudpickle round-trip for litellm exceptions (covered by test_docket_serialization.py)

Note

Medium Risk
Moderate risk because it changes semantic deduplication/merge behavior (including new LLM-driven pairwise fallback and NO_MERGE flow) and adds monkey-patching of third-party exception classes, which can affect production compaction outcomes and worker error handling.

Overview
Semantic compaction now merges in dense clusters instead of stalling. When a candidate group fails _semantic_merge_group_is_cohesive, deduplicate_by_semantic_search falls back to an LLM-driven pairwise merge with the closest neighbor, indexing the merged record before deleting sources.

LLM merge behavior is tightened and bounded. merge_memories_with_llm can now return None via a NO_MERGE escape hatch, includes UTC creation timestamps and recency-wins conflict rules in the prompt, and caps merged topic/entity growth.

Adds guards against embedding-provider input limits and worker crashes. Introduces settings.max_merge_input_chars to decline oversized merges, adds defensive truncation + richer failure logging in LiteLLMEmbeddings, and monkey-patches litellm exception classes to be cloudpickle-serializable for Docket; new tests cover size gates, updated compaction invariants, and exception round-tripping.

Reviewed by Cursor Bugbot for commit dcfe785.

…pickle-231

worker crashes with "TypeError: cannot pickle '_thread.RLock' object" when
docket tries to serialize litellm BadRequestError after exhausted retries.
patches litellm exception classes with custom __reduce__ to bypass
unpickleable attributes. cherry-picked 2b7febd, 90a8122, 708f91a.
compaction was merging 0 memories because _semantic_merge_group_is_cohesive
rejects all groups in dense clusters (every memory has "extra neighbors").
now falls back to pairwise merge with the single closest neighbor when the
full group is non-cohesive, breaking the deadlock.

also caps merged topics at 15 and entities at 20 to prevent unbounded
growth across successive merge rounds. shorter strings kept preferentially
as they tend to be more precise identifiers vs LLM-generated synonyms.
memories created from successive commits often conflict (commit A says
"use approach X", commit B supersedes with "use approach Y"). the old
merge prompt just blended both into one text, preserving contradictions.

new prompt includes creation timestamps and instructs the LLM to prefer
newer memories when facts conflict, drop superseded information, and
preserve concrete details (hashes, paths, versions). also enforces plain
text output to prevent markdown formatting in merged memory text.
Copilot AI review requested due to automatic review settings April 8, 2026 19:30

Copilot AI left a comment


Pull request overview

This PR improves long-term memory semantic compaction behavior to make merges progress in dense clusters, produce conflict-aware merged text, and prevent topic/entity metadata from growing without bound; it also adds a Docket/cloudpickle compatibility patch for LiteLLM exceptions.

Changes:

  • Add pairwise merge fallback in deduplicate_by_semantic_search when a full candidate group fails the cohesion check.
  • Update merge_memories_with_llm to (a) cap merged topics/entities and (b) use a conflict/recency-aware merge prompt that includes timestamps.
  • Introduce litellm_pickle_compat monkey-patch and add tests validating cloudpickle round-tripping for Docket task args/exceptions.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
agent_memory_server/long_term_memory.py Adds topic/entity caps, conflict-aware merge prompt with timestamps, and pairwise merge fallback for dense semantic clusters.
agent_memory_server/litellm_pickle_compat.py Adds a pickle-safe __reduce__ patch for LiteLLM exception classes (auto-applied on import).
agent_memory_server/docket_tasks.py Imports the LiteLLM pickle-compat module for side-effect patching in Docket task workers.
tests/test_docket_serialization.py Adds regression tests for cloudpickle serialization of task arguments and LiteLLM exceptions after patching.


Comment thread agent_memory_server/long_term_memory.py Outdated
Comment on lines 7 to 10
from docket import Docket

import agent_memory_server.litellm_pickle_compat # noqa: F401 — patches litellm exceptions for cloudpickle
from agent_memory_server.config import settings

Copilot AI Apr 8, 2026


This PR’s title/description focuses on semantic compaction changes, but this file also introduces a global litellm exception monkey-patch (side-effect import) to address cloudpickle/Docket serialization, plus adds dedicated tests. Please update the PR description/title to mention this additional behavior change, since it affects runtime behavior for all Docket tasks and is orthogonal to compaction.

- move MAX_MERGED_TOPICS/ENTITIES to module level (ruff N806)
- fix comment indentation inside if-raise block (dead code appearance)
- use explicit UTC normalization for timestamps instead of isoformat[:19]
@Piotr1215 Piotr1215 changed the title fix: pairwise merge fallback, conflict-aware prompt, and metadata caps in compaction fix: compaction pairwise fallback, conflict-aware merge prompt, metadata caps, and litellm pickle compat Apr 8, 2026
the pairwise fallback intentionally changes behavior: dense clusters
that previously blocked all merges now allow closest-pair merges.
tests updated to verify the anti-snowball property (no mega-memories)
rather than asserting zero merges. fake_merge returns known text so
re-indexing after merge doesn't fail on ControlledEmbeddings lookup.
bridge pair (bridge+diet) passes pairwise cohesion because the distant
cluster member (sports) is outside the distance threshold from diet's
perspective. test now accepts either outcome and verifies at-most-one
merge. chain with 35-degree polar spacing allows adjacent-pair merges
so >= 2 surviving memories is the correct anti-snowball bound.
Comment thread agent_memory_server/long_term_memory.py Outdated
Piotr1215 added 2 commits May 5, 2026 15:19
When merge_memories_with_llm succeeded but the subsequent re-index of
the merged memory failed (e.g. embedding provider returns 400 on a
malformed input), the previous order — delete sources first, then index
merged — destroyed both originals and lost the merged copy. This was
observed in production with Ollama returning "invalid input type" mid-
compaction; each crash silently lost two memories.

Reorder both deduplication paths inside deduplicate_by_semantic_search
to index-then-delete with a try/except that aborts the merge and
preserves all sources on indexing failure. Remove the now-duplicate
index call from compact_long_term_memories' caller.

Also wrap the aembedding(**kwargs) call with diagnostic logging that
captures input types, lengths, and a truncated sample of the first item
on failure — so the next provider-specific shape rejection surfaces
enough context to fix without redeploying.
production logs showed 1172 pairwise rejections vs 11 successes (~99%
reject rate, 0 merges per compaction). root cause: the pairwise
fallback re-applied _semantic_merge_group_is_cohesive to the [closest]
pair, which only passes for *bridge* topologies where the closest
neighbor has no other in-threshold neighbors. dense homogeneous
clusters (the actual production distribution) trigger identical
rejections — same lexical gate, same outcome.

resolution: hand the conceptual judgement to the LLM. vector recall
already validated lexical proximity (closest.dist <= threshold); the
merge prompt now has an explicit NO_MERGE escape hatch for
"vector-similar but conceptually distinct" pairs. merge_memories_with_llm
returns MemoryRecord | None — None means the LLM declined.

both call sites in deduplicate_by_semantic_search now handle None by
preserving the originals (return memory, False; no deletion). this
replaces a coarse lexical gate with strong-model judgement — strict
upgrade for the dense-cluster case while preserving the bridge case
via NO_MERGE rather than vector arithmetic.

NO_MERGE detection uses tolerant prefix match
(.upper().startswith("NO_MERGE")) since LLMs occasionally add trailing
punctuation. full LLM response logged when NO_MERGE fires for
diagnostics.
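the detection itself is a one-liner; a sketch (llm_declined is an illustrative name):

```python
def llm_declined(response: str) -> bool:
    # Tolerant prefix match: LLMs occasionally append punctuation
    # or whitespace around the NO_MERGE sentinel.
    return response.strip().upper().startswith("NO_MERGE")
```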

cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).


Reviewed by Cursor Bugbot for commit 94f6c19.

[merged_memory],
redis_client=redis_client,
deduplicate=False,
)


Merged memory double-indexed when called from index path

Medium Severity

deduplicate_by_semantic_search now indexes the merged memory internally via index_long_term_memories, but when called from the index_long_term_memories function itself (line 1078), the returned merged memory is assigned to current_memory, which is then appended to processed_memories and indexed a second time via db.add_memories. The compact_long_term_memories caller correctly removed its redundant indexing call, but the index_long_term_memories call site still re-indexes the result, causing duplicate embedding API calls and duplicate extract_memory_structure background tasks.

Additional Locations (1)


After v6 (94f6c19) the index-before-delete reorder prevented data loss when
Ollama nomic-embed-text rejected oversized merged outputs (HTTP 400 "input
length exceeds context length"), but every aborted merge still wasted an LLM
call and the originals stayed un-merged. Live diagnostic logging from v6
captured the exact failure: a merge of two memories totaling ~9k chars
produced a merged text Ollama refused to embed because nomic-bert is
architecturally capped at 2048 tokens (model-level, not overridable via
options.num_ctx).

Layer 1 (primary, surgical): pre-merge size gate in
deduplicate_by_semantic_search. Both pairwise and cohesive group paths now
check sum(len(m.text) for m in candidates) before calling
merge_memories_with_llm. If combined > settings.max_merge_input_chars
(default 5500), decline the merge upfront, log distinctly, return
(memory, False) — same control flow as NO_MERGE.
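a sketch of the gate, assuming the settings.max_merge_input_chars default of 5500 described above (the helper name is illustrative):

```python
MAX_MERGE_INPUT_CHARS = 5500  # mirrors settings.max_merge_input_chars


def within_merge_size_limit(texts, limit=MAX_MERGE_INPUT_CHARS):
    combined = sum(len(t) for t in texts)
    if combined > limit:
        # Decline upfront: no LLM call, no deletions, originals preserved,
        # same control flow as a NO_MERGE response.
        return False
    return True
```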

Layer 2 (safety net): truncation in LiteLLMEmbeddings before provider call.
Catches any oversized text that bypasses the size gate (e.g. a single memory
ingested directly via index_long_term_memories larger than the embedding
provider's token cap). logger.warning on every truncation so frequent firings
surface in logs.
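the safety net can be sketched like this; the 6000-char limit matches the embedding-side cap described below, and truncate_for_embedding is an illustrative name:

```python
import logging

logger = logging.getLogger(__name__)
EMBED_MAX_CHARS = 6000  # conservative buffer under nomic-bert's 2048-token cap


def truncate_for_embedding(texts, limit=EMBED_MAX_CHARS):
    out = []
    for text in texts:
        if len(text) > limit:
            # Warn on every truncation so frequent firings surface in logs.
            logger.warning(
                "Truncating embedding input from %d to %d chars", len(text), limit
            )
            text = text[:limit]
        out.append(text)
    return out
```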

Threshold rationale: 5500 chars handles the empirical failure boundary
(5881-char source embedded fine alone, 9436-char merge failed). Tunable
via MAX_MERGE_INPUT_CHARS env var. Embedding-side cap (6000 chars) is a
conservative buffer under nomic-bert's 2048-token ceiling for English prose
(~0.25-0.40 tokens/char).

Tests: two new unit tests verify both gate paths decline without invoking
the LLM merge or deletion, with the boom-merge / boom-delete sentinels
asserting the doomed work never runs.
