[Question]: Repeated add_resource to the same directory/URI still reports full Embedding queue work on v0.3.22

### Your Question

## Summary                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                               
  When calling `add_resource` repeatedly with the same local directory and the same stable `to` URI, OpenViking v0.3.22 still reports the same `Embedding.processed` count even when no files changed.                                                         
                                                                                                                                                                                                                                                               
  This makes `watch_interval` / scheduled refresh expensive for large directories, because a no-op refresh appears to enqueue and process the same amount of embedding work as the initial import.                                                             
                                                                                                                                                                                                                                                               
  I expected the incremental behavior from #659 to skip unchanged files/directories, or at least expose whether `Embedding.processed` represents actual embedding provider calls vs. no-op/skipped queue messages.                                             
                                                                                                                                                                                                                                                               
  ## Environment                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                                               
  - OpenViking version: `0.3.22`                                                                                                                                                                                                                               
  - Install source: PyPI package                                                                                                                                                                                                                               
  - Mode: HTTP server                                                                                                                                                                                                                                          
  - Client: Python `AsyncHTTPClient`                                                                                                                                                                                                                           
  - Embedding provider: Ollama                                                                                                                                                                                                                                 
  - Embedding model: `bge-m3:latest`                                                                                                                                                                                                                           
  - Config highlights:                                                                                                                                                                                                                                         
    - `embedding.text_source = "content_only"`                                                                                                                                                                                                                 
    - `embedding.max_concurrent = 1`                                                                                                                                                                                                                           
    - local filesystem workspace                                                                                                                                                                                                                               
  - Resource type: local directory with two Markdown files                                                                                                                                                                                                     
  - API call:                                                                                                                                                                                                                                                  
    - `add_resource(..., wait=True, strict=False, preserve_structure=True, to=<same_uri>)`                                                                                                                                                                     
                                                                                                                                                                                                                                                               
  ## Reproduction                                                                                                                                                                                                                                              
                                                                                                                                                                                                                                                               
  Create a clean OpenViking workspace and start `openviking-server`.                                                                                                                                                                                           
                                                                                                                                                                                                                                                               
  Create a local directory:

  ```text
  knowledge-base/
    alpha.md
    beta.md
  ```

  `alpha.md`:

  ```markdown
  # Alpha

  OpenViking repeat add alpha v1. Stable text marker ALPHA_REPEAT.
  ```

  `beta.md`:

  ```markdown
  # Beta

  OpenViking repeat add beta v1. Stable text marker BETA_REPEAT.
  ```

  Then run the same import three times:

  ```python
  from openviking import AsyncHTTPClient

  client = AsyncHTTPClient(
      url="http://127.0.0.1:<port>",
      account="codeask",
      user="codeask",
      agent_id="codeask",
      timeout=300,
  )
  await client.initialize()

  to_uri = "viking://resources/codeask/wiki/repeat-add-incremental-smoke"

  # 1. Initial import
  await client.add_resource(
      path="/tmp/.../knowledge-base",
      to=to_uri,
      reason="repeat add smoke first_full",
      instruction="Index this tiny wiki fixture for incremental add_resource testing.",
      wait=True,
      timeout=240,
      strict=False,
      preserve_structure=True,
  )

  # 2. Repeat import with no file changes
  await client.add_resource(
      path="/tmp/.../knowledge-base",
      to=to_uri,
      reason="repeat add smoke second_no_change",
      instruction="Index this tiny wiki fixture for incremental add_resource testing.",
      wait=True,
      timeout=240,
      strict=False,
      preserve_structure=True,
  )

  # 3. Modify only alpha.md, then import again
  # alpha.md content changed, beta.md unchanged
  await client.add_resource(
      path="/tmp/.../knowledge-base",
      to=to_uri,
      reason="repeat add smoke third_one_file_changed",
      instruction="Index this tiny wiki fixture for incremental add_resource testing.",
      wait=True,
      timeout=240,
      strict=False,
      preserve_structure=True,
  )
  ```
                                                                                                                                                                                                                                                               
  ## Actual Result                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                               
  All three calls succeeded with no queue errors, but the returned `queue_status` showed the same Embedding work each time:

  ```text
  first_full:
    Semantic.processed = 1
    Embedding.processed = 8
    error_count = 0
    elapsed = 8.67s

  second_no_change:
    Semantic.processed = 1
    Embedding.processed = 8
    error_count = 0
    elapsed = 3.60s

  third_one_file_changed:
    Semantic.processed = 1
    Embedding.processed = 8
    error_count = 0
    elapsed = 3.56s
  ```

  Full returned queue summaries:

  ```json
  {
    "first": {
      "Semantic": {
        "processed": 1,
        "requeue_count": 0,
        "error_count": 0
      },
      "Embedding": {
        "processed": 8,
        "requeue_count": 0,
        "error_count": 0
      }
    },
    "second_no_change": {
      "Semantic": {
        "processed": 1,
        "requeue_count": 0,
        "error_count": 0
      },
      "Embedding": {
        "processed": 8,
        "requeue_count": 0,
        "error_count": 0
      }
    },
    "third_one_file_changed": {
      "Semantic": {
        "processed": 1,
        "requeue_count": 0,
        "error_count": 0
      },
      "Embedding": {
        "processed": 8,
        "requeue_count": 0,
        "error_count": 0
      }
    }
  }
  ```
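
To make the comparison explicit, here is a minimal sketch of a helper that flags runs whose `Embedding.processed` did not drop relative to the initial import. The function names are hypothetical and not part of OpenViking; the dictionary simply mirrors the queue summaries reported above.

```python
from typing import Dict, List, Mapping

QueueSummary = Mapping[str, Mapping[str, int]]


def embedding_work_by_run(summaries: Mapping[str, QueueSummary]) -> Dict[str, int]:
    """Return the Embedding.processed count for each labelled run."""
    return {label: queues["Embedding"]["processed"] for label, queues in summaries.items()}


def runs_without_savings(summaries: Mapping[str, QueueSummary], baseline: str) -> List[str]:
    """List runs whose Embedding.processed is not lower than the baseline run's."""
    work = embedding_work_by_run(summaries)
    base = work[baseline]
    return [label for label, count in work.items() if label != baseline and count >= base]


# Values copied from the queue summaries reported in this issue.
summaries = {
    "first": {
        "Semantic": {"processed": 1, "requeue_count": 0, "error_count": 0},
        "Embedding": {"processed": 8, "requeue_count": 0, "error_count": 0},
    },
    "second_no_change": {
        "Semantic": {"processed": 1, "requeue_count": 0, "error_count": 0},
        "Embedding": {"processed": 8, "requeue_count": 0, "error_count": 0},
    },
    "third_one_file_changed": {
        "Semantic": {"processed": 1, "requeue_count": 0, "error_count": 0},
        "Embedding": {"processed": 8, "requeue_count": 0, "error_count": 0},
    },
}

print(runs_without_savings(summaries, baseline="first"))
# → ['second_no_change', 'third_one_file_changed']
```

Both follow-up runs are flagged, which is the observation this issue is about. The helper only compares reported counts; it cannot tell whether those counts correspond to real embedding provider calls.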

 ## Expected Result

  For a repeated `add_resource` with the same directory and the same `to` URI:

  1. If no file content changed:
      - Ideally no embedding provider calls should be made.
      - `Embedding.processed` should be 0, or there should be a separate skipped/noop metric.
      - The task should still complete successfully.

  2. If only one file changed:
      - Only the changed file and necessary affected directory metadata should be re-vectorized.
      - Unchanged sibling files should not be re-embedded.
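
The expectation above can be illustrated with a content-hash manifest. This is only a sketch of the behavior I have in mind, assuming per-file SHA-256 digests; `build_manifest` and `diff_manifests` are hypothetical names and say nothing about how OpenViking detects changes internally.

```python
import hashlib
from pathlib import Path
from typing import Dict, Set, Tuple


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root, POSIX style) to the SHA-256 of its bytes."""
    manifest: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            manifest[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def diff_manifests(old: Dict[str, str], new: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Return (files that are new or changed, files that were removed)."""
    changed = {path for path, digest in new.items() if old.get(path) != digest}
    removed = set(old) - set(new)
    return changed, removed
```

With the fixture from the reproduction, an unchanged directory would yield an empty `changed` set (case 1), and editing only `alpha.md` would yield `{"alpha.md"}` (case 2), so `beta.md` would not need to be re-embedded.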

  ## Why This Matters

  We are integrating OpenViking as a wiki knowledge base indexer. A normal integration pattern is:

  - use one stable `to` URI per feature directory;
  - periodically refresh the same directory;
  - optionally use `watch_interval` or an application-side scheduled sweep.

  If a no-op refresh still performs full Embedding queue work, then scheduled refresh/watch becomes expensive for large wiki/code directories, especially with local embedding models.

  This also makes it hard for downstream systems to estimate indexing cost and progress, because `Embedding.processed` does not distinguish actual embedding calls from skipped/no-op work.
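
For the application-side scheduled sweep, one possible stopgap is to skip the refresh call entirely when the directory has not changed. The sketch below assumes a whole-tree SHA-256 fingerprint kept in memory per target URI; `directory_fingerprint`, `refresh_if_changed`, and the `refresh` callback are hypothetical names, and the callback would wrap whatever `add_resource` call the application already makes.

```python
import asyncio
import hashlib
from pathlib import Path
from typing import Awaitable, Callable, Dict


def directory_fingerprint(root: Path) -> str:
    """Hash every file's relative path and content into one digest for the tree."""
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator between the path and the content hash
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()


async def refresh_if_changed(
    root: Path,
    to_uri: str,
    seen: Dict[str, str],
    refresh: Callable[[Path, str], Awaitable[None]],
) -> bool:
    """Await refresh(root, to_uri) only if the tree differs from the last successful refresh."""
    current = directory_fingerprint(root)
    if seen.get(to_uri) == current:
        return False  # nothing changed, so the refresh call is skipped
    await refresh(root, to_uri)
    seen[to_uri] = current  # recorded only after the refresh completed without raising
    return True
```

This only avoids the no-op case. When a single file changes, the whole directory is still handed to `add_resource`, so it does not replace incremental handling on the server side.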

  ## Questions

  1. Is this behavior expected in v0.3.22?
  2. Does `Embedding.processed` count actual embedding provider calls, or can it include skipped/no-op queue messages?
  3. Is there a supported way to tell whether repeated `add_resource` actually reused existing vectors?
  4. Is `watch_interval` expected to avoid re-embedding unchanged files, or does it simply re-run `add_resource` on schedule?
  5. Is there a recommended API for efficient no-op refresh of a stable local directory?

  ## Related Work

  - #659 introduced `add_resource` incremental update behavior.
  - #709 added resource watch scheduling/status tracking.
  - #890 looks related to skipping re-embedding when content hash is unchanged.
  - #1800 looks related to skipping unchanged sibling subtrees during incremental update.

  The current behavior seems to indicate that repeated same-directory imports still perform full Embedding queue work, or at least report full Embedding queue work, even when content is unchanged.



### Context

_No response_

### Code Example (Optional)

```python

```

### Related Area

None

### Before Asking

- [x] I have checked the [documentation](https://www.openviking.ai/docs)

[Question]: Repeated add_resource to the same directory/URI still reports full Embedding queue work on v0.3.22 #2383

Your Question

Summary

Environment

Reproduction

Related Area

Before Asking

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Question]: Repeated add_resource to the same directory/URI still reports full Embedding queue work on v0.3.22 #2383

Description

Your Question

Summary

Environment

Reproduction

Related Area

Before Asking

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions