bug: placeholder text '[Directory abstract is not ready]' indexed and returned in search results with high scores #2434

@Aaron052399

Description

Summary

Leaf directories that lack .abstract.md files return a placeholder string [Directory abstract is not ready] from the content API. This placeholder text is vectorized and returned as high-scoring search results, degrading retrieval quality.

Environment

  • OpenViking: v0.3.23
  • OS: macOS (arm64)
  • Embedding: doubao-embedding-vision-251215 (volcengine)
  • VLM: mimo-v2.5 (openai-compatible)
  • Config: ov.conf with custom VLM and embedding providers

Steps to Reproduce

  1. Write memories via the viking_remember() MCP tool, which calls session.commit_async() → MemoryUpdater → viking_fs.write_file().
  2. After extraction, leaf directories (e.g. viking://user/default/memories/entities/开发工具/) contain memory files but no .abstract.md.
  3. Query the abstract API for these directories:
    GET /api/v1/content/abstract?uri=viking://user/default/memories/entities/开发工具/
    
  4. Response:
    {"status":"ok","result":"# viking://user/default/memories/entities/开发工具 [Directory abstract is not ready]"}
  5. Run ov search "开发工具" -n 5 — the placeholder text is returned as the top result with score ~0.5.

Expected Behavior

Either:

  • Option A: The MemoryUpdater write path triggers SemanticQueue for parent directories, generating proper .abstract.md files for leaf directories (similar to how add_resource() works via content_write._enqueue_semantic_refresh()).
  • Option B: Placeholders like [Directory abstract is not ready] are excluded from vector indexing and/or filtered from search results.
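
As an illustration of Option B, here is a minimal sketch of a post-search filter. It assumes search hits are available as dictionaries with the text under an "abstract" key; that shape, and the function names, are hypothetical and not taken from the OpenViking codebase. Only the marker string comes from the API response shown above.

```python
from typing import Dict, Iterable, List

# Marker text as returned by the abstract API in this issue.
PLACEHOLDER_MARKER = "[Directory abstract is not ready]"


def is_placeholder_abstract(text: str) -> bool:
    """Return True if the text is (or contains) the not-ready placeholder."""
    return PLACEHOLDER_MARKER in text


def drop_placeholders(results: Iterable[Dict], text_key: str = "abstract") -> List[Dict]:
    """Keep only hits whose text is a real abstract.

    `results` and `text_key` are assumed shapes for illustration only.
    """
    return [
        hit for hit in results
        if not is_placeholder_abstract(str(hit.get(text_key, "")))
    ]
```

The same predicate could be applied before embedding, so that placeholder text never reaches the vector store in the first place.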

Actual Behavior

The placeholder text [Directory abstract is not ready] is:

  1. Returned by the abstract API as a 200 response (not an error)
  2. Indexed into the vector store with a generic embedding
  3. Returned in search results with inflated scores (0.4–0.5) because the text is semantically broad
  4. Injected into agent context via the memory prefetch pipeline, wasting tokens on every conversation turn

Analysis

The root cause appears to be a code path gap:

  • viking_remember() → session.commit_async() → MemoryUpdater._apply_upsert() → viking_fs.write_file() → direct AGFS write without enqueuing to SemanticQueue.
  • MemoryUpdater does enqueue to EmbeddingQueue (line ~729 in memory_updater.py), but skips SemanticQueue.
  • In contrast, content_write._enqueue_semantic_refresh() properly enqueues to both queues for resource/skill writes.

This means memory extraction writes never trigger directory abstract generation for their parent directories. The .abstract.md files for intermediate directories (e.g. memories/, memories/patterns/) are generated by other write paths, but leaf directories under entities/, preferences/, agent/*/memories/ are never processed.
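
To make the gap concrete, the sketch below shows the piece a fix along the lines of Option A would need: deriving the set of parent directories from the files a memory write touched, so that each can be enqueued for abstract generation. The helper name and the example file names are hypothetical, and the actual enqueue call is deliberately omitted because the SemanticQueue interface is not shown in this issue.

```python
from typing import Iterable, List


def parent_directory_uris(file_uris: Iterable[str]) -> List[str]:
    """Return the unique parent directory URI of each written file.

    Order of first appearance is preserved. URIs with no parent path
    segment (for example a bare scheme root) are skipped.
    """
    seen = set()
    parents: List[str] = []
    for uri in file_uris:
        parent, sep, name = uri.rstrip("/").rpartition("/")
        # Skip inputs that have no directory component to refresh.
        if not sep or not name or parent.endswith("/"):
            continue
        directory = parent + "/"
        if directory not in seen:
            seen.add(directory)
            parents.append(directory)
    return parents


# Hypothetical file names under directories mentioned in this issue.
written = [
    "viking://user/default/memories/entities/开发工具/vscode.md",
    "viking://user/default/memories/entities/开发工具/git.md",
]
for directory in parent_directory_uris(written):
    # A real fix would enqueue `directory` for a semantic refresh here.
    print(directory)
```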

Impact

With 140 vectors across ~40 sessions, placeholder results dominate search output. The zero-result rate was measured at 36.9% on v0.3.22, partly because genuinely relevant results are ranked below placeholder noise.

Workaround

Manual reindex with semantic_and_vectors mode generates proper abstracts:

# Per-directory
curl -X POST http://127.0.0.1:1933/api/v1/content/reindex \
  -H 'Content-Type: application/json' \
  -d '{"uri":"viking://user/default/memories/entities/开发工具","mode":"semantic_and_vectors","wait":true}'

But this must be repeated for every leaf directory and does not persist across new memory writes.
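
Until the write path is fixed, the per-directory call can be looped over a list of leaf directories. The following is a minimal sketch using only the Python standard library; the endpoint, payload fields, and mode are copied from the curl command above, while the function names and the timeout value are assumptions made for illustration.

```python
import json
import urllib.request
from typing import Iterable, List

# Endpoint taken from the curl workaround in this issue.
REINDEX_URL = "http://127.0.0.1:1933/api/v1/content/reindex"


def build_reindex_request(uri: str) -> urllib.request.Request:
    """Build the same POST request as the curl command, for one directory."""
    payload = {"uri": uri, "mode": "semantic_and_vectors", "wait": True}
    return urllib.request.Request(
        REINDEX_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reindex_directories(uris: Iterable[str], timeout: float = 300.0) -> List[int]:
    """Reindex each directory in turn and return the HTTP status codes.

    The timeout is an arbitrary assumption; "wait": true makes the
    request block until the server reports the reindex as finished.
    """
    statuses: List[int] = []
    for uri in uris:
        request = build_reindex_request(uri)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            statuses.append(response.status)
    return statuses


# Example (requires a running server):
# reindex_directories(["viking://user/default/memories/entities/开发工具"])
```

This only automates the manual step. Directories created by later memory writes still need to be added to the list and reindexed again.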

Additional Context

  • ov status (v0.3.23 verbose): Embedding queue processed 183, Semantic queue processed 4 — consistent with the gap described above.
  • Logs confirm SemanticProcessor ran for viking://user/default/memories and viking://user/default/memories/patterns, but never for leaf directories like entities/开发工具/, preferences/wangnandou/, agent/hermes/memories/tools/.
