Skip to content

fix: 修复附件 markdown 读取链路与生产部署挂载#647

Open
safesummer wants to merge 1 commit intoxerrors:mainfrom
safesummer:fix/attachment-sandbox-sync
Open

fix: 修复附件 markdown 读取链路与生产部署挂载#647
safesummer wants to merge 1 commit intoxerrors:mainfrom
safesummer:fix/attachment-sandbox-sync

Conversation

@safesummer
Copy link
Copy Markdown

修复聊天附件 markdown 读取失败的问题,并补充生产部署说明。

问题现象:

  • 上传 docx 后会生成 attachments/*.md
  • 前端文件树能看到该 md
  • 但智能体后续 read_file('/home/gem/user-data/uploads/attachments/*.md') 可能返回 404,导致无法继续处理附件内容

本次修复包含:

  • 将上传附件转换出的 markdown 同步写入 LangGraph files/uploads 状态
  • 补充附件状态同步与读取相关单元测试、集成测试
  • 修复生产环境中 sandbox provisioner 挂载错误 saves 目录导致的附件不可见问题
  • 补充生产部署、升级校准和版本记录文档

已验证:

  • 正式部署环境下 docx 上传后能生成 markdown
  • 智能体可成功读取 /home/gem/user-data/uploads/attachments/*.md
  • 非流式真实聊天请求已返回 finished 并基于附件内容生成回复

@xerrors
Copy link
Copy Markdown
Owner

xerrors commented Apr 18, 2026

感谢 PR,我先让 copilot 帮忙 review 一下,它给的意见供你参考

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

该 PR 旨在修复“聊天附件转换出的 Markdown 在智能体链路中读取不稳定/404”的问题,通过把解析后的 Markdown 内容同步写入 LangGraph state(files/uploads),并补齐相关测试与生产部署文档说明。

Changes:

  • 附件上传解析后,将 uploads 以及解析出的 Markdown 内容(files)写入 LangGraph state,提升 read_file 可用性
  • 更新 agent state 提取逻辑与单元/集成测试,覆盖“上传 office -> 转 Markdown -> state 可读”的链路
  • 补充生产部署校准、验证与升级指引;调整 API 镜像构建阶段 apt 下载重试配置

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
docs/develop-guides/roadmap.md 记录新增的已知 bug/修复项条目
docs/develop-guides/changelog.md 增加 v0.6.1 修复说明(附件 state 同步、挂载、测试)
docs/advanced/deployment.md 增补生产环境变量注意事项、默认智能体模型校准、验证与升级步骤
docker/api.Dockerfile apt 源协议切换到 https 并增加重试参数以降低构建失败率
backend/test/unit/services/test_conversation_service_attachment_state.py 单测覆盖附件 state 同步时写入 files/uploads
backend/test/unit/services/test_chat_service_sync.py 单测更新 extract_agent_state 输出包含 uploads 并验证保留行为
backend/test/integration/api/test_chat_router.py 集成测试覆盖 docx 上传后 state files 可读与内容接口可读
backend/package/yuxi/services/conversation_service.py 新增/写入 state.files,并在同步时同时写入 uploads/files
backend/package/yuxi/services/chat_service.py extract_agent_state 输出结构补齐 uploads
backend/package/yuxi/agents/state.py LangGraph state schema 增加 files/uploads 字段及合并策略

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


def merge_uploads(existing: list[dict] | None, new: list[dict] | None) -> list[dict]:
"""Replace uploads with the latest normalized snapshot."""
return list(new or existing or [])
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge_uploads() uses new or existing, so an explicit empty list from graph.aupdate_state(values={"uploads": []}) will be treated as falsy and the old uploads will be retained. This breaks the “latest snapshot” semantics and means deleting all attachments won’t clear state.uploads. Consider returning new whenever it is not None (even if empty), and only falling back to existing when new is None.

Suggested change
return list(new or existing or [])
if new is not None:
return list(new)
if existing is not None:
return list(existing)
return []

Copilot uses AI. Check for mistakes.
Comment on lines +20 to +25
"""Merge state file dictionaries with latest values winning."""
if existing is None:
return new or {}
if new is None:
return existing
return existing | new
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge_files() merges dicts with existing | new, which cannot remove keys. When attachments are deleted, conversation_service._sync_thread_upload_state() will send an empty {} for files, but the old file entries will remain in state. If files is intended to represent the current attachment snapshot, consider replacing with new when provided (even if empty), or implement a deletion/removal mechanism.

Suggested change
"""Merge state file dictionaries with latest values winning."""
if existing is None:
return new or {}
if new is None:
return existing
return existing | new
"""Replace state file dictionaries with the latest snapshot when provided."""
if new is None:
return existing or {}
return dict(new or {})

Copilot uses AI. Check for mistakes.
Comment on lines +169 to +176
if attachment.get("status") != "parsed":
continue

file_path = attachment.get("file_path")
markdown = attachment.get("markdown")
if not isinstance(file_path, str) or not file_path.strip():
continue
if not isinstance(markdown, str) or not markdown:
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_build_state_files() only uses attachment["file_path"] as the key. Existing attachment records (or future records) may only have path set (which is already validated in _build_state_uploads()), causing parsed attachments to be skipped and not materialized into state.files. Consider falling back to attachment.get("path") when file_path is missing/blank for better backward compatibility.

Copilot uses AI. Check for mistakes.
Comment on lines +167 to +188
files: dict[str, dict] = {}
for attachment in attachments:
if attachment.get("status") != "parsed":
continue

file_path = attachment.get("file_path")
markdown = attachment.get("markdown")
if not isinstance(file_path, str) or not file_path.strip():
continue
if not isinstance(markdown, str) or not markdown:
continue

uploaded_at = attachment.get("uploaded_at") or utc_isoformat()
files[file_path] = {
"content": markdown.splitlines(),
"created_at": uploaded_at,
"modified_at": uploaded_at,
}

return files


Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are now two very similar helpers for building state.files: yuxi.services.chat_service._build_state_files() and this new conversation_service._build_state_files(). Keeping both risks behavioral drift (e.g., splitlines vs split("\n"), timestamp defaults, path validation). Consider consolidating into a single shared helper (or importing one from the other) to ensure consistent state materialization.

Suggested change
files: dict[str, dict] = {}
for attachment in attachments:
if attachment.get("status") != "parsed":
continue
file_path = attachment.get("file_path")
markdown = attachment.get("markdown")
if not isinstance(file_path, str) or not file_path.strip():
continue
if not isinstance(markdown, str) or not markdown:
continue
uploaded_at = attachment.get("uploaded_at") or utc_isoformat()
files[file_path] = {
"content": markdown.splitlines(),
"created_at": uploaded_at,
"modified_at": uploaded_at,
}
return files
from yuxi.services.chat_service import _build_state_files as _shared_build_state_files
return _shared_build_state_files(attachments)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants