05 - File Agent

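1. Positioning

FileReactAgent is a file question-answering agent built on ReAct + RAG. It focuses on conversational Q&A over the content of uploaded files. Its core tool is FileContentService.loadContent, which uses the file's embed field to choose automatically between loading the full text directly and RAG semantic retrieval.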

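2. Core features

Feature	Description
Multiple file types	Supports PDF / DOCX / TXT (text) and PNG / JPG / GIF / BMP (images)
Adaptive RAG	embed=1 → RAG retrieval; embed=0/null → full-text load
Multi-round ReAct	`loadContent` can be called multiple times, so follow-up questions are supported
Persistent memory	By default, loads the 30 most recent messages from MySQL
Image Q&A	Images are recognized with Qwen-VL at upload time and the result is stored in `extractedText`
Citation tracing	Reference-type responses carry metadata such as `fileid`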

3. Execution flow

Client ──HTTP──→ AgentController.fileStream()
                       │
                       ▼
        initFileReactAgent() ─── inject FileContentService
                       │
                       ▼
        createPersistentChatMemory(conversationId, 30)
                       │
                       ▼
        stream(conversationId, query, fileId)
                       │
                       ▼
        streamInternal():
          1. checkRunningTask()
          2. registerTask()
          3. Build messages:
             - SystemMessage (ReactAgentPrompts.getFilePrompt())
             - [optional] SystemMessage (systemPrompt)
             - History (loadChatHistory)
             - UserMessage("<question>...</question>")
             - UserMessage("<fileid>...</fileid>")
          4. saveQuestion(SaveQuestionRequest)
                       │
                       ▼
        ReAct loop (same as the WebSearch agent)
          - AI decision: call loadContent?
            → yes: loadContent(fileId, question)
                   → embed=1: RAG retrieval
                   → embed=0/null: direct load
            → no: emit the final answer
                       │
                       ▼
        doFinally():
          - saveSessionResult() saves the result to ai_session
          - taskManager.stopTask() cleans up the task
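
The checkRunningTask / registerTask / stopTask steps in the flow above amount to a per-conversation guard. The following is a minimal sketch of that idea, assuming an in-memory map keyed by conversationId. The class ConversationTaskGuard and its methods are hypothetical and are not the project's taskManager:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the task manager: at most one running task per conversation.
class ConversationTaskGuard {
    private final ConcurrentMap<String, Boolean> running = new ConcurrentHashMap<>();

    // Combines the "check" and "register" steps into one atomic operation.
    // Throws if the conversation already has a task in flight.
    void register(String conversationId) {
        if (running.putIfAbsent(conversationId, Boolean.TRUE) != null) {
            throw new IllegalStateException("conversation is already running: " + conversationId);
        }
    }

    // Mirrors the stopTask() step: meant to be called from a finally-style hook
    // so the slot is released on success, error, or cancellation.
    void stop(String conversationId) {
        running.remove(conversationId);
    }

    boolean isRunning(String conversationId) {
        return running.containsKey(conversationId);
    }
}
```

Using putIfAbsent keeps the check and the registration atomic, so two concurrent requests for the same conversation cannot both pass the check.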

4. Key code walkthrough

4.1 Message construction

// ===== Load the system prompt (always placed first) =====
messages.add(new SystemMessage(ReactAgentPrompts.getFilePrompt()));
if (StringUtils.hasText(systemPrompt)) {
    messages.add(new SystemMessage(systemPrompt));
}

// ===== Load chat history =====
loadChatHistory(conversationId, messages, true, true);

// ===== Load file content or file info =====
// (Deprecated: loadFileContent is now done through a tool call)
messages.add(new UserMessage("<question>" + question + "</question>"));
messages.add(new UserMessage("<fileid>" + currentFileId + "</fileid>"));

Prompt constraints (getFilePrompt), in translation:

Your answer must be based on the content of the current file; do not fabricate information.
To obtain the actual content of the file, you must call the loadContent tool.
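
The ordering in 4.1 is the important part: system prompts first, then history, then the tagged question and file ID. Below is a dependency-free sketch of that assembly. The Msg and FileMessageAssembler classes are hypothetical stand-ins for the framework's message types, used only so the example runs on its own:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, dependency-free stand-in for SystemMessage / UserMessage.
class Msg {
    final String role;
    final String text;
    Msg(String role, String text) { this.role = role; this.text = text; }
}

class FileMessageAssembler {
    // Builds the list in the order described in 4.1:
    // file prompt, optional extra system prompt, history, <question>, <fileid>.
    static List<Msg> build(String filePrompt, String systemPrompt,
                           List<Msg> history, String question, String fileId) {
        List<Msg> messages = new ArrayList<>();
        messages.add(new Msg("system", filePrompt));
        // Only add the extra system prompt when it contains visible text.
        if (systemPrompt != null && !systemPrompt.trim().isEmpty()) {
            messages.add(new Msg("system", systemPrompt));
        }
        messages.addAll(history);
        messages.add(new Msg("user", "<question>" + question + "</question>"));
        messages.add(new Msg("user", "<fileid>" + fileId + "</fileid>"));
        return messages;
    }
}
```

Because the file ID travels as a message rather than as file content, the model has to call loadContent to see the actual text, which is what the prompt constraints require.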

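4.2 Tool call detection

// For the loadContent tool, parse the arguments and emit a thinking message
if (toolName.contains("loadContent")) {
    JSONObject args = JSON.parseObject(argsJson);
    String question = args.getString("question");  // parsed, but not used in the thinking text
    // Thinking text shown to the user: "Retrieving file content, please wait..."
    String loadThink = "📂 正在检索文件内容，请稍等...";
    sink.tryEmitNext(createThinkingResponse(loadThink));
}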

4.3 Builder pattern

FileReactAgent agent = FileReactAgent.builder()
        .name("file react")
        .chatModel(chatModel)
        .tools(fileContentService)  // wrapped as ToolCallback
        .sessionService(sessionService)
        .taskManager(taskManager)
        .build();

Note: tools(...) accepts ToolCallback... (varargs) or List<ToolCallback>.

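5. The FileContentService tool

tool/FileContentService.java is the core tool of the File Agent:

// description: "Load file content by file ID, or run RAG semantic retrieval..."
@Tool(description = "根据文件ID加载文件内容或进行RAG语义检索...")
public String loadContent(
        // "File ID"
        @ToolParam(description = "文件ID") String fileId,
        // "The user's question, used for semantic retrieval (optional)"
        @ToolParam(description = "用户的问题，用于语义检索（可选）") String question) {

    // 1. Look up the file record
    var fileInfo = fileManageService.getFileInfo(fileId);
    if (fileInfo == null) {
        return "文件不存在，文件ID: " + fileId;  // "File not found, file ID: ..."
    }

    // 2. Check the processing status
    if (fileInfo.getStatus() != FileInfo.FileStatus.SUCCESS) {
        // "File is still processing or processing failed, current status: ..."
        return "文件处理中或处理失败，当前状态: " + fileInfo.getStatus();
    }

    // 3. Choose the loading mode from the embed field
    Integer embed = fileInfo.getEmbed();
    if (embed != null && embed == 1) {
        return retrieveWithRAG(fileId, fileInfo, question);  // RAG retrieval
    } else {
        return loadDirectly(fileId, fileInfo);              // full text
    }
}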

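5.1 Loading strategy

Scenario	Trigger	Behavior
Small file	`embed=0` or `embed=null`	`loadDirectly()` → returns the full `extractedText`
Large file	`embed=1`	`retrieveWithRAG()` → semantic retrieval of the top 5 chunks (TopK=5)
File not ready	`status != SUCCESS`	Returns a status message
File not found	`fileInfo == null`	Returns an error message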
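
The decision table can be read as a single function evaluated in order: existence, then status, then the embed flag. The following is a small sketch of that ordering. The class, enum, and boolean parameters are hypothetical simplifications of the real FileInfo checks:

```java
// Hypothetical sketch of the decision table; only embed and status come from the document.
class LoadStrategySketch {
    enum Strategy { NOT_FOUND, NOT_READY, RAG, DIRECT }

    // fileExists=false models fileInfo == null; statusSuccess models status == SUCCESS.
    static Strategy choose(boolean fileExists, boolean statusSuccess, Integer embed) {
        if (!fileExists) {
            return Strategy.NOT_FOUND;
        }
        if (!statusSuccess) {
            return Strategy.NOT_READY;
        }
        // Only embed == 1 selects RAG; 0 and null both fall back to a direct load.
        return (embed != null && embed == 1) ? Strategy.RAG : Strategy.DIRECT;
    }
}
```

Treating null the same as 0 means files uploaded before the embed field existed, or files whose embedding step never ran, still get a usable full-text answer.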

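5.2 Response format

The tool returns text in the following layout. The section headers mean "File info" and "File content", the labels mean "File name" and "File type", and the bracketed placeholder stands for the full content or the retrieval results: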

=== 文件信息 ===
文件名: xxx.pdf
文件类型: pdf

=== 文件内容 ===
[内容或检索结果]

6. File upload and vectorization

File upload and vectorization are handled jointly by FileManageService and FileParserService (see 10 File Management Service for details):

HTTP POST /file/upload (MultipartFile)
        │
        ▼
FileManageService.uploadFile()
        │
        ├── 1. Generate fileId
        ├── 2. Save to the database (status PROCESSING)
        ├── 3. Upload to MinIO
        ├── 4. Update the database (status SUCCESS + minioPath)
        │
        ├── 5. isTextFile?
        │     └── Parse (PDF/Word/TXT) → extractedText
        │         └── Text length ≥ 5000?
        │             ├── yes: processLargeFileEmbedding()
        │             │      └── split → embed → store in PgVector
        │             │              (embed=1)
        │             └── no: (embed=0)
        │
        └── 6. isImageFile?
              └── Call Qwen-VL for multimodal recognition
                  └── Store the recognition result in extractedText
                      (embed=0)

Large-file threshold: LARGE_FILE_THRESHOLD = 5000 characters

Splitter: OverlapParagraphTextSplitter(500, 50), i.e. 500 characters per chunk with a 50-character overlap between adjacent chunks
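
The threshold and chunking parameters can be illustrated with a plain sliding-window splitter. This is a simplified sketch, not the project's splitter: it cuts at fixed character offsets and ignores paragraph boundaries, which OverlapParagraphTextSplitter presumably takes into account. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class OverlapSplitSketch {
    static final int LARGE_FILE_THRESHOLD = 5000; // characters, as stated in the document

    // Texts at or above the threshold take the embedding path (embed=1).
    static boolean needsEmbedding(String text) {
        return text.length() >= LARGE_FILE_THRESHOLD;
    }

    // Fixed-size windows of chunkSize characters; each window starts
    // (chunkSize - overlap) characters after the previous one.
    static List<String> split(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }
}
```

With (500, 50), a 1,200-character text yields three chunks of 500, 500, and 300 characters, and the last 50 characters of each chunk repeat at the start of the next.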

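7. Comparison with the WebSearch Agent

Dimension	WebSearch Agent	File Agent
Tool	Tavily MCP (search)	`loadContent` tool (files)
Data source	Public internet	MinIO + PgVector
Tool call	`search` / `tavily`	`loadContent`
maxRounds	5	5 (default)
Thinking message	`🔍 正在搜索...` ("Searching...")	`📂 正在检索文件内容...` ("Retrieving file content...")
Reference output	List of Tavily links	Not enabled yet (can be extended)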

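8. Typical request/response

Request (the query asks "What is the core point of this document?"):

GET /agent/file/stream?query=这份文档的核心观点是什么&conversationId=user-456&fileId=abc-123

Streaming response (the thinking messages read "Retrieving file content, please wait..." and "OK, I have retrieved the file content"; the two text chunks read "According to the file content" and ", the core point of this document is..."):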

data: {"type":"thinking","content":"📂 正在检索文件内容，请稍等..."}

data: {"type":"thinking","content":"好的，我已经获取了文件内容"}

data: {"type":"text","content":"根据文件内容"}

data: {"type":"text","content":"，这份文档的核心观点是..."}
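
On the client side, the final answer is the concatenation of the "text" events, while "thinking" events are progress hints. The sketch below shows that reassembly under a simplifying assumption: it matches only the flat {"type":"...","content":"..."} payload shape shown above with a regular expression and does not handle escaped quotes. A real client should use a JSON parser. The class name is hypothetical:

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SseTextCollector {
    // Simplified matcher for lines such as: data: {"type":"text","content":"..."}
    private static final Pattern EVENT =
            Pattern.compile("^data:\\s*\\{\"type\":\"([^\"]*)\",\"content\":\"([^\"]*)\"\\}\\s*$");

    // Concatenates the content of all "text" events; "thinking" events and blank lines are skipped.
    static String collectText(List<String> lines) {
        StringBuilder answer = new StringBuilder();
        for (String line : lines) {
            Matcher m = EVENT.matcher(line);
            if (m.matches() && "text".equals(m.group(1))) {
                answer.append(m.group(2));
            }
        }
        return answer.toString();
    }
}
```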

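9. Error handling

Scenario	Handling
`fileId` is empty	Flux.error(new IllegalArgumentException("文件ID不能为空")), i.e. "File ID must not be empty"
`query` is empty	Flux.error(new IllegalArgumentException("查询参数不能为空")), i.e. "Query parameter must not be empty"
Task already running	Flux.error(new IllegalStateException("该会话正在执行中，请稍后再试")), i.e. "This conversation is already running, please try again later"
File not found	Tool returns "文件不存在，文件ID: xxx" ("File not found, file ID: xxx")
File still processing	Tool returns "文件处理中或处理失败…" ("File is still processing or processing failed…")
LLM call fails	`doOnError` triggers `sink.tryEmitError`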
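
The first three rows are request-level checks that fail before the ReAct loop starts. A dependency-free sketch of such a pre-flight check follows. The class name, the order of the checks, and the English messages are assumptions for illustration; the real endpoint wraps these exceptions in Flux.error(...):

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical pre-flight check; the check order and English messages are assumptions.
class FileStreamValidationSketch {
    // Returns the exception the endpoint would signal, or empty if the request may proceed.
    static Optional<RuntimeException> validate(String fileId, String query,
                                               String conversationId, Set<String> running) {
        if (isBlank(fileId)) {
            return Optional.of(new IllegalArgumentException("file ID must not be empty"));
        }
        if (isBlank(query)) {
            return Optional.of(new IllegalArgumentException("query must not be empty"));
        }
        if (running.contains(conversationId)) {
            return Optional.of(new IllegalStateException("conversation is already running"));
        }
        return Optional.empty();
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```

The file-level rows differ in kind: a missing or unready file is reported as an ordinary tool result string, so the model can explain the problem to the user instead of the stream failing.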