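09 - RAG and Vector Retrieval

1. Role

EmbeddingService is the core RAG (Retrieval-Augmented Generation) implementation. It wraps:

  • Query compression and rewriting (CompressionQueryTransformer)
  • Multi-query expansion (MultiQueryExpander)
  • Vector retrieval (PgVector)
  • Metadata filtering (fileid)

2. RAG Pipeline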

User question (question)
                      ▼
┌───────────────────────────────────────────┐
│ Step 1: Query compression and rewrite     │
│ CompressionQueryTransformer               │
│ - strips redundant, colloquial wording    │
│ - keeps the core meaning                  │
│ Output: compressed query                  │
└─────────────────────┬─────────────────────┘
                      ▼
┌───────────────────────────────────────────┐
│ Step 2: Multi-query expansion             │
│ MultiQueryExpander(numberOfQueries=3)     │
│ - generates 3 related queries             │
│ - keeps original (includeOriginal=true)   │
│ Output: 4 queries                         │
└─────────────────────┬─────────────────────┘
                      ▼
┌───────────────────────────────────────────┐
│ Step 3: Vector retrieval (per query)      │
│ - builds filter: fileid = xxx             │
│ - topK = 5                                │
│ - similarity: COSINE_DISTANCE             │
│ - index: HNSW                             │
└─────────────────────┬─────────────────────┘
                      ▼
┌───────────────────────────────────────────┐
│ Step 4: Deduplicate and aggregate         │
│ - deduplicates via seenIds                │
│ - keeps Document.getText()                │
│ Output: List<String>                      │
└───────────────────────────────────────────┘
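
Step 4 can be illustrated without any Spring AI types. The sketch below is a minimal, self-contained approximation: Hit is a hypothetical stand-in for a retrieved Document (only id and text), and mergeUnique is an illustrative name, not a method of EmbeddingService. It merges the per-query result lists and keeps the first occurrence of each id in arrival order.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a retrieved document: only the id and the text. */
record Hit(String id, String text) {}

final class DedupAggregationSketch {

    /** Merges per-query result lists, keeping the first occurrence of each id. */
    static List<String> mergeUnique(List<List<Hit>> perQueryHits) {
        Set<String> seenIds = new HashSet<>();
        List<String> texts = new ArrayList<>();
        for (List<Hit> hits : perQueryHits) {
            for (Hit hit : hits) {
                // Set.add returns false when the id is already present
                if (seenIds.add(hit.id())) {
                    texts.add(hit.text());
                }
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        List<List<Hit>> perQuery = List.of(
                List.of(new Hit("a", "chunk A"), new Hit("b", "chunk B")),
                List.of(new Hit("b", "chunk B"), new Hit("c", "chunk C")));
        System.out.println(mergeUnique(perQuery)); // prints [chunk A, chunk B, chunk C]
    }
}
```

With 4 queries and topK = 5, the merged list holds at most 20 entries, and fewer whenever different queries return the same chunk.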

3. Core Code

3.1 EmbeddingService.ragRetrieve

@Service
public class EmbeddingService {

    // The fields chatModel, pgVectorStoreFactory and vectorStore are declared
    // elsewhere in the class and are omitted from this excerpt.

    @PostConstruct
    public void init() {
        // Create the vectorStore dynamically; table name: vector_file_info
        vectorStore = pgVectorStoreFactory.createPgVectorStore("vector_file_info");
    }

    public List<String> ragRetrieve(String fileId, String question) {
        Query query = Query.builder().text(question).build();

        // Step 1: compress and rewrite the query
        ChatClient chatClient = ChatClient.builder(chatModel).build();
        CompressionQueryTransformer queryTransformer = CompressionQueryTransformer.builder()
                .chatClientBuilder(chatClient.mutate())
                .build();
        Query compressed = queryTransformer.transform(query);

        // Step 2: multi-query expansion
        QueryExpander queryExpander = MultiQueryExpander.builder()
                .chatClientBuilder(chatClient.mutate())
                .numberOfQueries(3)
                .includeOriginal(true)
                .build();
        List<Query> expandedQueries = queryExpander.expand(compressed);

        // Step 3: vector retrieval, restricted to one file by the fileid filter
        List<String> results = new ArrayList<>();
        Set<String> seenIds = new HashSet<>();
        FilterExpressionBuilder builder = new FilterExpressionBuilder();
        Filter.Expression filter = builder.eq("fileid", fileId).build();

        for (Query eq : expandedQueries) {
            List<Document> docs = vectorStore.similaritySearch(
                    SearchRequest.builder()
                            .query(eq.text())
                            .topK(5)
                            .filterExpression(filter)
                            .build());

            // Step 4: deduplicate by document id and collect the text
            for (Document doc : docs) {
                if (seenIds.add(doc.getId())) {
                    results.add(doc.getText());
                }
            }
        }

        return results;
    }
}

3.2 Embedding and Storage (embedAndStore)

private static final int EMBEDDING_BATCH_SIZE = 9;

public void embedAndStore(List<Document> documents) {
    // Write in batches so that no single request grows too large
    for (int i = 0; i < documents.size(); i += EMBEDDING_BATCH_SIZE) {
        List<Document> batch = documents.subList(i,
                Math.min(i + EMBEDDING_BATCH_SIZE, documents.size()));
        // add(...) is the public VectorStore entry point; doAdd(...) is the internal hook it delegates to
        vectorStore.add(batch);
    }
}
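
The batching loop can be tried in isolation. The sketch below is a self-contained approximation in which integers stand in for Document objects; partition is an illustrative helper name, not part of the project. The source does not state why the batch size is 9, so the sketch takes it as a plain parameter.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchPartitionSketch {

    /** Splits items into consecutive views of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            // subList returns a view; the end index is clamped to the list size
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> chunkIds = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            chunkIds.add(i);
        }
        // 20 items with a batch size of 9 give batches of 9, 9 and 2
        for (List<Integer> batch : partition(chunkIds, 9)) {
            System.out.println(batch.size());
        }
    }
}
```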

4. Vector Store Configuration

utils/DynamicPgVectorStoreFactory.java creates the PgVectorStore dynamically:

@Component
public class DynamicPgVectorStoreFactory {

    // The fields dataSource and embeddingModel, and the helper tableExists(...),
    // are declared elsewhere in the class and are omitted from this excerpt.

    public PgVectorStore createPgVectorStore(String tableName) {
        JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);

        // Check whether the table exists (the result is not used in this excerpt)
        boolean tableExists = tableExists(tableName);

        return PgVectorStore.builder(jdbcTemplate, embeddingModel)
                .dimensions(1024)                                            // vector dimension
                .distanceType(PgVectorStore.PgDistanceType.COSINE_DISTANCE)  // cosine distance
                .indexType(PgVectorStore.PgIndexType.HNSW)                   // HNSW index
                .initializeSchema(true)                                      // create the schema automatically
                .removeExistingVectorStoreTable(false)                       // keep an existing table
                .vectorTableName(tableName)
                .maxDocumentBatchSize(100)                                   // batch size
                .build();
    }
}

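Key configuration

Setting           Value            Description
dimensions        1024             Embedding vector dimension (text-embedding-v4)
distanceType      COSINE_DISTANCE  Cosine distance
indexType         HNSW             Approximate nearest-neighbor index for high-dimensional vectors
initializeSchema  true             Creates the table automatically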

application.yml

embeddings:
  store:
    host: localhost
    port: 5433
    database: vector_store
    user: postgres
    password: postgres

5. File Splitting

splitter/OverlapParagraphTextSplitter.java

public class OverlapParagraphTextSplitter extends TextSplitter {

    private final int chunkSize; // maximum number of characters per chunk
    private final int overlap;   // number of characters shared by adjacent chunks

    public OverlapParagraphTextSplitter(int chunkSize, int overlap) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be greater than 0");
        if (overlap < 0) throw new IllegalArgumentException("overlap must not be negative");
        if (overlap >= chunkSize) throw new IllegalArgumentException("overlap must be smaller than chunkSize");
        this.chunkSize = chunkSize;
        this.overlap = overlap;
    }

    @Override
    protected List<String> splitText(String text) {
        // 1. Split into paragraphs on runs of newlines
        String[] paragraphs = text.split("\\n+");
        List<String> allChunks = new ArrayList<>();
        StringBuilder currentChunk = new StringBuilder();
        int carriedOver = 0; // characters in currentChunk copied from the previous chunk

        for (String paragraph : paragraphs) {
            int start = 0;
            while (start < paragraph.length()) {
                // 2. Fill the current chunk with as much of the paragraph as fits
                int remainingSpace = chunkSize - currentChunk.length();
                int end = Math.min(start + remainingSpace, paragraph.length());
                currentChunk.append(paragraph, start, end);

                if (currentChunk.length() >= chunkSize) {
                    allChunks.add(currentChunk.toString());

                    // 3. Seed the next chunk with the last `overlap` characters
                    String overlapText = "";
                    if (overlap > 0) {
                        int overlapStart = Math.max(0, currentChunk.length() - overlap);
                        overlapText = currentChunk.substring(overlapStart);
                    }
                    currentChunk = new StringBuilder(overlapText);
                    carriedOver = overlapText.length();
                }
                start = end;
            }
        }

        // Emit the remainder only if it holds text beyond the carried-over overlap;
        // otherwise the last chunk would merely repeat the tail of the previous one
        if (currentChunk.length() > carriedOver) {
            allChunks.add(currentChunk.toString());
        }
        return allChunks;
    }
}

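Configuration: FileManageService.processLargeFileEmbedding uses chunkSize=500 and overlap=50.

Algorithm characteristics

  • Splits the text into paragraphs on newlines, then packs paragraph text into chunks of at most chunkSize characters
  • A paragraph longer than the remaining space is cut by character count, so a chunk can span paragraph boundaries (the newline separators are not kept)
  • Adjacent chunks share overlap characters, which limits semantic breaks at chunk edges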
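
The behaviour is easiest to see on tiny inputs. The following sketch is a standalone approximation of the splitting logic with no Spring AI dependency; the class and method names are illustrative. One assumption is made explicit in the code: a trailing chunk that would contain nothing but carried-over overlap is skipped.

```java
import java.util.ArrayList;
import java.util.List;

final class OverlapSplitSketch {

    /** Packs newline-separated text into chunks of chunkSize chars that share overlap chars. */
    static List<String> split(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("need chunkSize > 0 and 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int carried = 0; // characters in `current` copied from the previous chunk

        for (String paragraph : text.split("\\n+")) {
            int start = 0;
            while (start < paragraph.length()) {
                int end = Math.min(start + chunkSize - current.length(), paragraph.length());
                current.append(paragraph, start, end);
                if (current.length() >= chunkSize) {
                    chunks.add(current.toString());
                    // Seed the next chunk with the last `overlap` characters
                    String tail = current.substring(current.length() - overlap);
                    current = new StringBuilder(tail);
                    carried = tail.length();
                }
                start = end;
            }
        }
        // Assumption of this sketch: skip a remainder that is only carried-over overlap
        if (current.length() > carried) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(split("abcdefghij", 4, 1)); // prints [abcd, defg, ghij]
        System.out.println(split("abc\ndefgh", 4, 1)); // prints [abcd, defg, gh]
    }
}
```

In the second call the first chunk, abcd, is built from two paragraphs, which shows that chunk boundaries follow the character budget rather than the paragraph breaks.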

6. File Upload and Embedding Integration

Full flow of FileManageService.uploadFile:

@Transactional(rollbackFor = Exception.class)
public FileInfo uploadFile(MultipartFile file) {
    // 1. Generate the fileId
    // 2. Save to the database (status PROCESSING)
    // 3. Upload to MinIO
    // 4. Update the database (status SUCCESS + minioPath)
    // 5. Process according to file type:
    if (isTextFile(fileType)) {
        // Parse the text
        var parseResult = fileParserService.parseFile(file);
        fileInfo.setExtractedText(parseResult.truncatedText());

        // Only large files are embedded
        if (isLargeFile(parseResult.fullText())) {
            processLargeFileEmbedding(fileId, parseResult.fullText());
            fileInfo.setEmbed(1);
        }
    } else if (isImageFile(fileType)) {
        // Recognition with Qwen-VL
        fileInfo.setExtractedText(image2Text(file));
    }
    // 6. Persist
}

private void processLargeFileEmbedding(String fileId, String text) {
    Document document = new Document(text);
    List<Document> documents = List.of(document);

    // Split into chunks
    OverlapParagraphTextSplitter splitter = new OverlapParagraphTextSplitter(500, 50);
    List<Document> chunks = splitter.apply(documents);

    // Attach metadata
    for (int i = 0; i < chunks.size(); i++) {
        Document chunk = chunks.get(i);
        chunk.getMetadata().put("fileid", fileId);
        chunk.getMetadata().put("chunkId", i);
    }

    // Embed and store
    embeddingService.embedAndStore(chunks);
}

Large-file threshold: LARGE_FILE_THRESHOLD = 5000 characters
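
To get a feel for what the threshold means downstream, the sketch below estimates how many chunks and embedding batches a text of a given length produces with chunkSize=500, overlap=50 and a batch size of 9. It is a rough estimate under stated assumptions: newline characters are not counted, a trailing overlap-only chunk is not counted, and the class and method names are illustrative. The source does not show whether isLargeFile compares with > or >=, so the sketch does not model that check.

```java
final class EmbeddingCostEstimate {

    /** Estimated chunk count for n characters, each new chunk advancing by chunkSize - overlap. */
    static int chunkCount(int n, int chunkSize, int overlap) {
        if (n <= 0) {
            return 0;
        }
        if (n <= chunkSize) {
            return 1;
        }
        int stride = chunkSize - overlap;
        // ceil((n - overlap) / stride) using integer arithmetic
        return (n - overlap + stride - 1) / stride;
    }

    /** Number of embedding batches needed for the given chunk count. */
    static int batchCount(int chunks, int batchSize) {
        return (chunks + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        int chunks = chunkCount(5000, 500, 50);
        System.out.println(chunks);                // prints 11
        System.out.println(batchCount(chunks, 9)); // prints 2
    }
}
```

A text right at the threshold therefore maps to about 11 chunks, written in 2 batches.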

7. Embedding Model

application.yml

spring:
  ai:
    openai:
      embedding:
        base-url: https://dashscope.aliyuncs.com/compatible-mode/
        api-key: @dashscope.api.key@
        options:
          model: text-embedding-v4
          dimensions: 1024

Model choice: Tongyi Qianwen (Qwen) text-embedding-v4 (1024 dimensions)

8. Filter Expressions

PgVectorStore supports rich metadata filtering through Spring AI filter expressions:

// Equality
Filter.Expression filter = builder.eq("fileid", fileId).build();

// Also supported
builder.and(...)
builder.or(...)
builder.gt(...)
builder.in(...)

Typical query

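-- Simplified form: PgVectorStore keeps metadata such as fileid in a JSON column named metadata
SELECT * FROM vector_file_info
WHERE metadata->>'fileid' = 'abc-123'
ORDER BY embedding <=> $1 -- order by cosine distance
LIMIT 5;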
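
The query combines three steps: filter by file, order by cosine distance (pgvector's <=> operator, equal to 1 minus cosine similarity), and keep the closest rows. The sketch below reproduces those steps as an exact in-memory scan. It is only an illustration: Row and the method names are hypothetical, and the vectors are two-dimensional for readability.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-memory row: file id, chunk text and embedding vector. */
record Row(String fileId, String text, double[] embedding) {}

final class FilteredCosineSearchSketch {

    /** Cosine distance = 1 - cosine similarity; assumes non-zero vectors of equal length. */
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Keeps rows of one file, orders them by distance to the query, returns the closest k texts. */
    static List<String> search(List<Row> rows, String fileId, double[] query, int k) {
        return rows.stream()
                .filter(row -> row.fileId().equals(fileId))
                .sorted(Comparator.comparingDouble((Row row) -> cosineDistance(row.embedding(), query)))
                .limit(k)
                .map(Row::text)
                .toList();
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("abc-123", "same direction", new double[] {2, 0}),
                new Row("abc-123", "orthogonal", new double[] {0, 1}),
                new Row("other", "filtered out", new double[] {1, 0}));
        System.out.println(search(rows, "abc-123", new double[] {1, 0}, 5));
        // prints [same direction, orthogonal]
    }
}
```

Unlike this exact scan, an HNSW index is approximate, so the database may occasionally return a slightly different top 5.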

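9. Spring AI Retrieval Components

Under the hood, EmbeddingService uses the following Spring AI classes:

  • CompressionQueryTransformer - query compression
  • MultiQueryExpander - multi-query expansion
  • PgVectorStore - PgVector integration
  • FilterExpressionBuilder - filter expression construction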

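10. Performance Metrics

Metric                         Value
Queries per retrieval          1 (original) + 3 (expanded) = 4
topK per query                 5
Maximum results per retrieval  20 (4 × 5, before deduplication)
Embedding dimension            1024
Distance metric                COSINE_DISTANCE
Index type                     HNSW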

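11. Possible Extensions

  • Hybrid retrieval: combined ranking of vector and keyword (BM25) results
  • Reranker: rerank the first-pass results with bge-reranker
  • HyDE: use Hypothetical Document Embeddings
  • Cross-file retrieval: drop the fileid filter to search the whole store
  • Multimodal embedding: support image embeddings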