[Feature]: Refactor VectorDB storage abstraction to support self-hosted external backends such as PostgreSQL + pgvector #2357

## Problem Statement

- OpenViking's current `openviking/storage/vectordb` and `openviking/storage/vectordb_adapters` layers are still tightly coupled to the existing collection runtime and index lifecycle.
- This makes it difficult to integrate with self-hosted external vector database ecosystems. The problem is sharpest where users already operate their own databases and expect OpenViking to connect to them rather than own the database runtime itself.
- The current design is also a poor fit for managed database deployments and horizontally scalable database architectures, because the persistence layer is not cleanly separated from the rest of the storage orchestration logic.
- This matters for broader adoption outside the current default ecosystem, where PostgreSQL-based infrastructure is common and users expect straightforward integration with their existing database stacks. In North America in particular, a large share of agentic applications build their storage layer on managed Postgres.

## Proposed Solution

- Refactor the storage abstraction so that OpenViking acts as an orchestration layer responsible for schema, table, and index setup, instead of owning the full runtime behavior of the underlying vector database.
- Introduce a stable backend interface that stays backward compatible with existing functionality while allowing new providers to be added with minimal impact on the retrieval and business logic layers.
- Use PostgreSQL + pgvector as the first target backend for this refactor, since it is widely used in self-hosted deployments and in North American infrastructure environments.
- Design the refactor so that future backends can be added provider-style, similar to how AGFS already supports multiple storage backends such as S3.

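A minimal sketch of what a stable, provider-style backend interface could look like, assuming a Python `typing.Protocol`. The names `VectorBackend`, `ensure_collection`, `upsert`, `search`, `Hit`, `InMemoryBackend`, and `retrieve` are hypothetical and are not OpenViking's existing API. The point it illustrates is that `retrieve` depends only on the interface, so replacing the in-memory reference provider with a PostgreSQL + pgvector provider should not require changes to retrieval code.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from typing import Protocol, Sequence


@dataclass
class Hit:
    """One search result: a record id plus a similarity score (higher is better)."""
    record_id: str
    score: float


class VectorBackend(Protocol):
    """Hypothetical provider interface; method names are illustrative only."""

    def ensure_collection(self, name: str, dimension: int) -> None: ...
    def upsert(self, name: str, record_id: str, vector: Sequence[float]) -> None: ...
    def search(self, name: str, query: Sequence[float], k: int) -> list[Hit]: ...


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class InMemoryBackend:
    """Reference provider that keeps vectors in a dict (e.g. for tests)."""
    _data: dict[str, dict[str, list[float]]] = field(default_factory=dict)
    _dims: dict[str, int] = field(default_factory=dict)

    def ensure_collection(self, name: str, dimension: int) -> None:
        # Idempotent, mirroring CREATE ... IF NOT EXISTS semantics.
        self._data.setdefault(name, {})
        self._dims.setdefault(name, dimension)

    def upsert(self, name: str, record_id: str, vector: Sequence[float]) -> None:
        expected = self._dims[name]
        if len(vector) != expected:
            raise ValueError(f"expected {expected} dims, got {len(vector)}")
        self._data[name][record_id] = list(vector)

    def search(self, name: str, query: Sequence[float], k: int) -> list[Hit]:
        # Brute-force cosine similarity; a real provider would use an ANN index.
        hits = [Hit(rid, _cosine(query, vec)) for rid, vec in self._data[name].items()]
        return sorted(hits, key=lambda h: h.score, reverse=True)[:k]


def retrieve(backend: VectorBackend, query: Sequence[float], k: int = 3) -> list[str]:
    # Retrieval logic sees only the interface, never a concrete database.
    return [hit.record_id for hit in backend.search("context", query, k)]


if __name__ == "__main__":
    backend = InMemoryBackend()
    backend.ensure_collection("context", 2)
    backend.upsert("context", "a", [1.0, 0.0])
    backend.upsert("context", "b", [0.0, 1.0])
    print(retrieve(backend, [0.9, 0.1], k=1))  # ['a']
```

A pgvector provider would implement the same three methods with SQL, and the in-memory provider could remain useful as a fixture for testing retrieval logic in isolation.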
## Alternatives Considered

- Keep the current architecture and add more adapters directly on top of it. This would likely increase maintenance cost without solving the core problem, because the underlying abstraction would still assume the current runtime model.
- Delay the refactor until more backends are requested. However, starting with a cleaner storage interface now would make future ecosystem expansion much easier and reduce the cost of later integration work.

## Feature Area

- Storage/VectorDB

- Retrieval/Search

- Core (Client/Engine)

## Use Case

- I want to self-host OpenViking and connect it to an existing database stack rather than being forced into the current built-in vector database runtime. In practice, when teams adopt a project like this, the storage layer usually already exists and PostgreSQL is often already running in production, so supporting pgvector would significantly lower the barrier to adoption and make onboarding easier for more users. For individuals and enterprises alike, a cleaner storage interface would also make OpenViking more attractive to users who want to plug it into their own infrastructure instead of adopting a new database runtime, and would let everyone configure their own infrastructure resources more flexibly.

## Sample IaC for Storage Engine Layer
```yaml
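storage:
  vectordb:
    backend: postgres_pgvector   # storage provider to load
    url: postgresql://user:password@host:5432/dbname   # existing, externally managed database
    schema: openviking           # schema that OpenViking creates or reuses
    table: context               # table holding context records and their embeddings
    index_name: default          # logical name of the vector index on that table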
```
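A sketch of how the orchestration layer might turn the configuration above into idempotent setup statements for an existing PostgreSQL database. Assumptions: `PgVectorConfig` and `render_ddl` are hypothetical names; the embedding `dimension` is not in the sample config but must be supplied, because pgvector columns have a fixed size; the `id` and `metadata` columns are illustrative; the index uses pgvector's HNSW access method with `vector_cosine_ops`, which requires pgvector 0.5.0 or later; and `CREATE EXTENSION` needs sufficient privileges in the target database.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from urllib.parse import urlsplit

# Only plain, unquoted SQL identifiers are accepted, since they are interpolated into DDL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


@dataclass(frozen=True)
class PgVectorConfig:
    url: str
    schema: str
    table: str
    index_name: str
    dimension: int  # assumption: not in the sample config; pgvector needs a fixed size

    @classmethod
    def from_mapping(cls, raw: dict, dimension: int) -> PgVectorConfig:
        if raw.get("backend") != "postgres_pgvector":
            raise ValueError(f"unsupported backend: {raw.get('backend')!r}")
        if urlsplit(raw["url"]).scheme not in ("postgres", "postgresql"):
            raise ValueError("url must be a postgresql:// connection string")
        for key in ("schema", "table", "index_name"):
            if not _IDENT.match(raw[key]):
                raise ValueError(f"invalid identifier for {key}: {raw[key]!r}")
        return cls(raw["url"], raw["schema"], raw["table"], raw["index_name"], dimension)


def render_ddl(cfg: PgVectorConfig) -> list[str]:
    """Idempotent setup statements that can be re-run safely against an existing database."""
    qualified = f"{cfg.schema}.{cfg.table}"
    return [
        # pgvector ships as the "vector" extension.
        "CREATE EXTENSION IF NOT EXISTS vector",
        f"CREATE SCHEMA IF NOT EXISTS {cfg.schema}",
        # Illustrative columns; a real schema would follow OpenViking's context model.
        f"CREATE TABLE IF NOT EXISTS {qualified} ("
        "id text PRIMARY KEY, "
        "metadata jsonb NOT NULL DEFAULT '{}'::jsonb, "
        f"embedding vector({cfg.dimension}) NOT NULL)",
        # Prefixed with the table name: "default" alone is a reserved word in PostgreSQL.
        f"CREATE INDEX IF NOT EXISTS {cfg.table}_{cfg.index_name}_idx "
        f"ON {qualified} USING hnsw (embedding vector_cosine_ops)",
    ]


if __name__ == "__main__":
    raw = {
        "backend": "postgres_pgvector",
        "url": "postgresql://user:password@host:5432/dbname",
        "schema": "openviking",
        "table": "context",
        "index_name": "default",
    }
    # dimension=1024 is an arbitrary example; it must match the embedding model in use.
    for stmt in render_ddl(PgVectorConfig.from_mapping(raw, dimension=1024)):
        print(stmt + ";")
```

Because every statement uses `IF NOT EXISTS`, OpenViking could run this setup on each start against a database it does not own, which matches the orchestration-layer role described in the proposal.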

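## Additional Context
- The point of this request is not just to add one more backend, such as Postgres (pgvector), Milvus, or Qdrant. The more important goal is to redefine the storage boundary so that OpenViking can integrate more naturally with the external vector database ecosystem. Supporting PostgreSQL + pgvector first would be a pragmatic way to validate that boundary, and would also help OpenViking's adoption in environments with richer database ecosystems.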

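Because the goal is a storage boundary rather than a single new backend, the `backend` key in the configuration could resolve to a provider through a registry, so adding Milvus or Qdrant later would mean registering one class rather than editing retrieval code. This is a minimal sketch under that assumption; `register`, `create_backend`, `BaseProvider`, and the provider classes are hypothetical, and the Milvus entry is only a placeholder.

```python
from __future__ import annotations

from typing import Callable, Dict, Type

# Hypothetical registry keyed by the `backend` value in the storage config.
_PROVIDERS: Dict[str, Type["BaseProvider"]] = {}


def register(name: str) -> Callable[[Type[BaseProvider]], Type[BaseProvider]]:
    """Class decorator that makes a provider selectable by its config name."""
    def decorator(cls: Type[BaseProvider]) -> Type[BaseProvider]:
        if name in _PROVIDERS:
            raise ValueError(f"backend {name!r} already registered")
        _PROVIDERS[name] = cls
        return cls
    return decorator


class BaseProvider:
    def __init__(self, config: dict) -> None:
        self.config = config


@register("postgres_pgvector")
class PgVectorProvider(BaseProvider):
    """Would own connections and schema/table/index setup for PostgreSQL + pgvector."""


@register("milvus")
class MilvusProvider(BaseProvider):
    """Placeholder for a possible future provider."""


def create_backend(config: dict) -> BaseProvider:
    """Instantiate the provider named by config["backend"]."""
    name = config.get("backend")
    try:
        provider_cls = _PROVIDERS[name]
    except KeyError:
        known = ", ".join(sorted(_PROVIDERS))
        raise ValueError(f"unknown backend {name!r}; known: {known}") from None
    return provider_cls(config)


if __name__ == "__main__":
    backend = create_backend({"backend": "postgres_pgvector", "schema": "openviking"})
    print(type(backend).__name__)  # PgVectorProvider
```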
## Contribution

- [x] I am willing to contribute to implementing this feature
