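09 - RAG and Vector Retrieval
1. Role
EmbeddingService is the core implementation of RAG (Retrieval-Augmented Generation). It encapsulates:
- Query compression and rewriting (CompressionQueryTransformer)
- Multi-query expansion (MultiQueryExpander)
- Vector retrieval (PgVector)
- Metadata filtering (fileid)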
2. RAG Pipeline
```text
User question (question)
                     ↓
┌─────────────────────────────────────────┐
│ Step 1: Query compression and rewriting │
│   CompressionQueryTransformer           │
│   - Remove redundant/colloquial wording │
│   - Preserve the core semantics         │
│   Output: compressed query              │
└────────────────────┬────────────────────┘
                     ↓
┌─────────────────────────────────────────┐
│ Step 2: Multi-query expansion           │
│   MultiQueryExpander(numberOfQueries=3) │
│   - Generate 3 related queries          │
│   - Keep original (includeOriginal=true)│
│   Output: 4 queries                     │
└────────────────────┬────────────────────┘
                     ↓
┌─────────────────────────────────────────┐
│ Step 3: Vector retrieval (per query)    │
│   - Build filter: fileid = xxx          │
│   - topK = 5                            │
│   - similarity: COSINE_DISTANCE         │
│   - index: HNSW                         │
└────────────────────┬────────────────────┘
                     ↓
┌─────────────────────────────────────────┐
│ Step 4: Deduplication and aggregation   │
│   - Deduplicate via seenIds             │
│   - Keep Document.getText()             │
│   Output: List<String>                  │
└─────────────────────────────────────────┘
```
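3. Core Code
3.1 EmbeddingService.ragRetrieve
```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import jakarta.annotation.PostConstruct;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.document.Document;
import org.springframework.ai.rag.Query;
import org.springframework.ai.rag.preretrieval.query.expansion.MultiQueryExpander;
import org.springframework.ai.rag.preretrieval.query.expansion.QueryExpander;
import org.springframework.ai.rag.preretrieval.query.transformation.CompressionQueryTransformer;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.filter.Filter;
import org.springframework.ai.vectorstore.filter.FilterExpressionBuilder;
import org.springframework.stereotype.Service;

@Service
public class EmbeddingService {

    // The chatModel, pgVectorStoreFactory and vectorStore fields are not shown in this excerpt.

    @PostConstruct
    public void init() {
        vectorStore = pgVectorStoreFactory.createPgVectorStore("vector_file_info");
    }

    public List<String> ragRetrieve(String fileId, String question) {
        Query query = Query.builder().text(question).build();

        // Step 1: compress/rewrite the question with the chat model.
        ChatClient chatClient = ChatClient.builder(chatModel).build();
        CompressionQueryTransformer queryTransformer = CompressionQueryTransformer.builder()
                .chatClientBuilder(chatClient.mutate())
                .build();
        Query compressed = queryTransformer.transform(query);

        // Step 2: expand into 3 related queries and keep the compressed one (4 in total).
        QueryExpander queryExpander = MultiQueryExpander.builder()
                .chatClientBuilder(chatClient.mutate())
                .numberOfQueries(3)
                .includeOriginal(true)
                .build();
        List<Query> expandedQueries = queryExpander.expand(compressed);

        List<String> results = new ArrayList<>();
        Set<String> seenIds = new HashSet<>();
        // Restrict every search to chunks that belong to the given file.
        FilterExpressionBuilder builder = new FilterExpressionBuilder();
        Filter.Expression filter = builder.eq("fileid", fileId).build();

        // Step 3: run one similarity search per query.
        for (Query eq : expandedQueries) {
            List<Document> docs = vectorStore.similaritySearch(
                    SearchRequest.builder()
                            .query(eq.text())
                            .topK(5)
                            .filterExpression(filter)
                            .build());

            // Step 4: keep each chunk only the first time its id is seen.
            for (Document doc : docs) {
                if (seenIds.add(doc.getId())) {
                    results.add(doc.getText());
                }
            }
        }

        return results;
    }
}
```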
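The de-duplication step of ragRetrieve can be tried without a chat model or a vector store. The following is a minimal sketch in plain Java, not the project's code: the class MultiQueryDedupSketch, the Hit record, and the stubbed search function are all hypothetical stand-ins for Document and vectorStore.similaritySearch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class MultiQueryDedupSketch {

    /** Hypothetical stand-in for a retrieved chunk: an id plus its text. */
    record Hit(String id, String text) {}

    /**
     * Runs every query through the supplied search function and keeps the text
     * of each hit the first time its id is seen, preserving encounter order.
     */
    static List<String> collectUnique(List<String> queries,
                                      Function<String, List<Hit>> search) {
        List<String> results = new ArrayList<>();
        Set<String> seenIds = new HashSet<>();
        for (String query : queries) {
            for (Hit hit : search.apply(query)) {
                if (seenIds.add(hit.id())) { // add() returns false for a duplicate id
                    results.add(hit.text());
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Stubbed search results: both queries return chunk "c2".
        Function<String, List<Hit>> fakeSearch = q -> q.equals("q1")
                ? List.of(new Hit("c1", "alpha"), new Hit("c2", "beta"))
                : List.of(new Hit("c2", "beta"), new Hit("c3", "gamma"));
        System.out.println(collectUnique(List.of("q1", "q2"), fakeSearch));
        // prints [alpha, beta, gamma]
    }
}
```

Because expanded queries tend to hit overlapping chunks, the number of returned texts is usually smaller than the number of queries times topK.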
3.2 Embedding and Storage (embedAndStore)
```java
private static final int EMBEDDING_BATCH_SIZE = 9;

public void embedAndStore(List<Document> documents) {
    // Embed and store the documents in batches of at most EMBEDDING_BATCH_SIZE.
    for (int i = 0; i < documents.size(); i += EMBEDDING_BATCH_SIZE) {
        List<Document> batch = documents.subList(i,
                Math.min(i + EMBEDDING_BATCH_SIZE, documents.size()));
        vectorStore.doAdd(batch);
    }
}
```
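The batching in embedAndStore is a plain fixed-size partition of the document list. As an illustrative sketch (the class BatchPartitionSketch and the 20-element input are assumptions, not part of the project), the same loop applied to 20 items with a batch size of 9 yields batches of 9, 9, and 2:

```java
import java.util.ArrayList;
import java.util.List;

final class BatchPartitionSketch {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(items.subList(i, end)); // a view onto the original list
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            docs.add(i);
        }
        for (List<Integer> batch : partition(docs, 9)) {
            System.out.println(batch.size()); // prints 9, 9, 2 on separate lines
        }
    }
}
```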
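4. Vector Store Configuration
utils/DynamicPgVectorStoreFactory.java creates PgVectorStore instances dynamically:
```java
import org.springframework.ai.vectorstore.pgvector.PgVectorStore;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;

@Component
public class DynamicPgVectorStoreFactory {

    // The dataSource and embeddingModel fields and the tableExists helper are not shown in this excerpt.

    public PgVectorStore createPgVectorStore(String tableName) {
        JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);

        boolean tableExists = tableExists(tableName);

        return PgVectorStore.builder(jdbcTemplate, embeddingModel)
                .dimensions(1024)
                .distanceType(PgVectorStore.PgDistanceType.COSINE_DISTANCE)
                .indexType(PgVectorStore.PgIndexType.HNSW)
                .initializeSchema(true)                 // create the table if it is missing
                .removeExistingVectorStoreTable(false)  // never drop existing data
                .vectorTableName(tableName)
                .maxDocumentBatchSize(100)
                .build();
    }
}
```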
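Key settings:
| Setting | Value | Description |
|---|---|---|
| dimensions | 1024 | Embedding vector dimension (text-embedding-v4) |
| distanceType | COSINE_DISTANCE | Cosine distance (1 minus cosine similarity) |
| indexType | HNSW | Approximate nearest-neighbor index for high-dimensional vectors |
| initializeSchema | true | Create the table automatically |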
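To make the COSINE_DISTANCE setting concrete, here is a small sketch of the quantity it refers to. It is a plain Java illustration and not how PgVector computes it internally; the class name CosineDistanceSketch is an assumption.

```java
final class CosineDistanceSketch {

    /**
     * Cosine distance = 1 - cosine similarity.
     * 0 means the vectors point the same way, 1 means orthogonal, 2 means opposite.
     * The result is NaN if either vector has zero length.
     */
    static double cosineDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        System.out.println(cosineDistance(new double[] {1, 0}, new double[] {2, 0}));  // same direction
        System.out.println(cosineDistance(new double[] {1, 0}, new double[] {0, 1}));  // orthogonal
        System.out.println(cosineDistance(new double[] {1, 0}, new double[] {-1, 0})); // opposite
    }
}
```

The measure ignores vector length, which is why it suits text embeddings where direction carries the meaning.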
application.yml:
```yaml
embeddings:
  store:
    host: localhost
    port: 5433
    database: vector_store
    user: postgres
    password: postgres
```
5. File Splitting
splitter/OverlapParagraphTextSplitter.java:
```java
import java.util.ArrayList;
import java.util.List;

import org.springframework.ai.transformer.splitter.TextSplitter;

public class OverlapParagraphTextSplitter extends TextSplitter {

    private final int chunkSize;
    private final int overlap;

    public OverlapParagraphTextSplitter(int chunkSize, int overlap) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be greater than 0");
        if (overlap < 0) throw new IllegalArgumentException("overlap must not be negative");
        if (overlap >= chunkSize) throw new IllegalArgumentException("overlap must be smaller than chunkSize");
        this.chunkSize = chunkSize;
        this.overlap = overlap;
    }

    @Override
    protected List<String> splitText(String text) {
        // Split on line breaks; the breaks themselves are not kept in the chunks.
        String[] paragraphs = text.split("\\n+");
        List<String> allChunks = new ArrayList<>();
        StringBuilder currentChunk = new StringBuilder();

        for (String paragraph : paragraphs) {
            int start = 0;
            while (start < paragraph.length()) {
                // Append as much of the paragraph as still fits into the current chunk.
                int remainingSpace = chunkSize - currentChunk.length();
                int end = Math.min(start + remainingSpace, paragraph.length());
                currentChunk.append(paragraph, start, end);

                if (currentChunk.length() >= chunkSize) {
                    allChunks.add(currentChunk.toString());

                    // Carry the last `overlap` characters over into the next chunk.
                    String overlapText = "";
                    if (overlap > 0) {
                        int overlapStart = Math.max(0, currentChunk.length() - overlap);
                        overlapText = currentChunk.substring(overlapStart);
                    }
                    currentChunk = new StringBuilder();
                    if (!overlapText.isEmpty()) {
                        currentChunk.append(overlapText);
                    }
                }
                start = end;
            }
        }

        // Flush the remainder. Note: if the text ends exactly on a chunk boundary,
        // this final chunk contains only the carried-over overlap.
        if (currentChunk.length() > 0) {
            allChunks.add(currentChunk.toString());
        }
        return allChunks;
    }
}
```
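Configuration: FileManageService.processLargeFileEmbedding uses chunkSize=500 and overlap=50.
Algorithm characteristics:
- The text is first split into paragraphs on line breaks
- Paragraphs are packed into chunks of at most chunkSize characters; a paragraph that does not fit is cut at the character limit and continues in the next chunk
- Adjacent chunks share overlap characters (to avoid cutting the semantics apart)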
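The behaviour is easiest to see on a small input. The sketch below mirrors the splitText loop as a standalone static method so it runs without Spring AI; the class name ChunkingSketch, the sample text, and the tiny chunkSize=10 / overlap=3 values are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkingSketch {

    /** Packs newline-separated paragraphs into chunks of chunkSize characters with overlap. */
    static List<String> split(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require chunkSize > 0 and 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String paragraph : text.split("\\n+")) {
            int start = 0;
            while (start < paragraph.length()) {
                // Take as many characters as still fit into the current chunk.
                int end = Math.min(start + chunkSize - current.length(), paragraph.length());
                current.append(paragraph, start, end);
                if (current.length() >= chunkSize) {
                    chunks.add(current.toString());
                    // Seed the next chunk with the tail of the chunk just emitted.
                    String tail = current.substring(current.length() - overlap);
                    current = new StringBuilder(tail);
                }
                start = end;
            }
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(split("abcdefghijkl\nmnopqrstuvwxyz", 10, 3));
        // prints [abcdefghij, hijklmnopq, opqrstuvwx, vwxyz]
    }
}
```

Each chunk after the first starts with the last three characters of its predecessor, and the second chunk spans the paragraph boundary because the line break is dropped during splitting.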
6. File Upload and Embedding Integration
Full flow of FileManageService.uploadFile:
```java
@Transactional(rollbackFor = Exception.class)
public FileInfo uploadFile(MultipartFile file) {
    // fileInfo, fileId and fileType are prepared earlier in the method (not shown in this excerpt).
    if (isTextFile(fileType)) {
        var parseResult = fileParserService.parseFile(file);
        fileInfo.setExtractedText(parseResult.truncatedText());

        // Only large text files are chunked and embedded.
        if (isLargeFile(parseResult.fullText())) {
            processLargeFileEmbedding(fileId, parseResult.fullText());
            fileInfo.setEmbed(1);
        }
    } else if (isImageFile(fileType)) {
        fileInfo.setExtractedText(image2Text(file));
    }
    // The rest of the method, including the return statement, is not shown in this excerpt.
}

private void processLargeFileEmbedding(String fileId, String text) {
    Document document = new Document(text);
    List<Document> documents = List.of(document);

    // Split the full text into overlapping chunks.
    OverlapParagraphTextSplitter splitter = new OverlapParagraphTextSplitter(500, 50);
    List<Document> chunks = splitter.apply(documents);

    // Tag every chunk so that retrieval can filter by file.
    for (int i = 0; i < chunks.size(); i++) {
        Document chunk = chunks.get(i);
        chunk.getMetadata().put("fileid", fileId);
        chunk.getMetadata().put("chunkId", i);
    }

    embeddingService.embedAndStore(chunks);
}
```
Large-file threshold: LARGE_FILE_THRESHOLD = 5000 characters
7. Embedding Model
application.yml:
```yaml
spring:
  ai:
    openai:
      embedding:
        base-url: https://dashscope.aliyuncs.com/compatible-mode/
        api-key: @dashscope.api.key@
        options:
          model: text-embedding-v4
          dimensions: 1024
```
Model choice: Tongyi Qianwen (Qwen) text-embedding-v4 (1024 dimensions)
8. Filter Expressions
PgVector supports rich metadata filtering:
```java
Filter.Expression filter = builder.eq("fileid", fileId).build();

// Other operators can be combined in the same way:
// builder.and(...)
// builder.or(...)
// builder.gt(...)
// builder.in(...)
```
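Typical query (shown conceptually; PgVectorStore keeps metadata in a JSON column, so the SQL it actually generates filters on that column rather than on a top-level fileid column):
```sql
SELECT *
FROM vector_file_info
WHERE fileid = 'abc-123'
ORDER BY embedding <=> $1
LIMIT 5;
```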
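The combination of a metadata filter with a nearest-neighbor ordering can be mimicked in memory. The sketch below is a brute-force illustration only, with no index and no database: the class FilteredTopKSketch, the Row record, and the caller-supplied distance function are hypothetical, and the toy data uses one-dimensional "embeddings" with absolute value as the distance.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

final class FilteredTopKSketch {

    /** Hypothetical stand-in for one stored chunk. */
    record Row(String id, Map<String, Object> metadata, double[] embedding) {}

    /** Returns the ids of the k rows of the given file that are closest to the query. */
    static List<String> nearest(List<Row> rows, String fileId,
                                ToDoubleFunction<Row> distanceToQuery, int k) {
        return rows.stream()
                .filter(row -> fileId.equals(row.metadata().get("fileid"))) // metadata filter
                .sorted(Comparator.comparingDouble(distanceToQuery))        // closest first
                .limit(k)                                                   // top k
                .map(Row::id)
                .toList();
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("a", Map.of("fileid", "f1"), new double[] {0.9}),
                new Row("b", Map.of("fileid", "f1"), new double[] {0.1}),
                new Row("c", Map.of("fileid", "f2"), new double[] {0.0}),
                new Row("d", Map.of("fileid", "f1"), new double[] {0.5}));
        // Toy distance: how far the single coordinate is from a query at 0.0.
        System.out.println(nearest(rows, "f1", r -> Math.abs(r.embedding()[0]), 2));
        // prints [b, d]
    }
}
```

Row c is the closest overall but belongs to another file, so the filter removes it before ranking.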
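9. Spring AI Retrieval Components
Under the hood, EmbeddingService uses the following components provided by Spring AI:
- CompressionQueryTransformer - query compression
- MultiQueryExpander - multi-query expansion
- PgVectorStore - PgVector integration
- FilterExpressionBuilder - filter expression construction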
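10. Performance Metrics
| Metric | Value |
|---|---|
| Queries per retrieval | 1 (original) + 3 (expanded) = 4 |
| topK per query | 5 |
| Maximum results per retrieval | 20 (4 × 5, fewer after deduplication) |
| Embedding dimension | 1024 |
| Distance metric | COSINE_DISTANCE |
| Index type | HNSW |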
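11. Future Directions
- Hybrid retrieval: combine vector and keyword (BM25) ranking
- Reranker: rerank the initial candidates with bge-reranker
- HyDE: use Hypothetical Document Embeddings
- Cross-file retrieval: drop the fileid filter to search the whole store
- Multimodal embedding: support image embeddings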
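For the hybrid retrieval direction, one common way to merge a vector ranking with a keyword ranking is Reciprocal Rank Fusion (RRF). The sketch below is one possible approach and not something the project implements: the class name, the chunk ids, and the constant k = 60 (a frequently used default) are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReciprocalRankFusionSketch {

    /**
     * Fuses several ranked id lists. Each id scores 1 / (k + rank) per list,
     * with rank starting at 1, and ids are returned by descending total score.
     */
    static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<String> ids = new ArrayList<>(scores.keySet());
        // List.sort is stable, so ids with equal scores keep their first-seen order.
        ids.sort(Comparator.comparingDouble((String id) -> scores.get(id)).reversed());
        return ids;
    }

    public static void main(String[] args) {
        List<String> vectorRanking = List.of("c1", "c2", "c3");
        List<String> keywordRanking = List.of("c3", "c1", "c4");
        System.out.println(fuse(List.of(vectorRanking, keywordRanking), 60));
        // prints [c1, c3, c2, c4]
    }
}
```

RRF only needs rank positions, so the cosine distances and BM25 scores never have to be put on a common scale.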