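Three ways to integrate Elasticsearch with Spring, and how to choose

Why this is more complicated than it looks

Hooking up Elasticsearch looks like "add a dependency, write two lines of code." Once you actually do it, you find that **the ES Java client changed abruptly between 7.x and 8.x**: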

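In the 7.x era the recommended client was RestHighLevelClient, but it was officially deprecated in 7.15.
8.x provides the new ElasticsearchClient (the Java API Client).
Spring Data Elasticsearch switched its underlying client across versions to follow along.
On top of that comes the ES 7 → 8 baggage: mapping compatibility, the removal of mapping types (_doc), and so on.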

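Many blog posts mix 5.x, 6.x, 7.x, and 8.x code in the same example, so newcomers who copy them run into deprecation warnings or code that will not run at all. This article sorts out the current state of affairs: it gives a guide that applies to ES 8.x + Spring Boot 3.x, and also covers migration and coexistence with older clients.

Three ways to reach ES from Java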

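Approach	Abstraction level	When to use
Low Level REST Client	Very low	Special or rarely used APIs; full control over HTTP
Java API Client (recommended for 8.x)	Medium	99% of business scenarios
Spring Data Elasticsearch	High	CRUD with the Repository abstraction; simpler models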

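In short:

Assemble JSON yourself and call REST → Low Level REST Client
The usage ES officially recommends for application code → Java API Client
Work with ES the way you write JPA → Spring Data ES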
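
As a rough illustration of the first option, the sketch below hand-assembles a search request as raw JSON and targets the REST endpoint directly. To stay dependency-free it uses the JDK's `java.net.http` rather than the Low Level REST Client itself; the class name `RawSearchRequest`, the host, and the `articles` index are assumptions for illustration, and the request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hand-built search request: what "assemble JSON yourself and call REST" looks like. */
final class RawSearchRequest {

    /** Builds POST {baseUrl}/{index}/_search with a raw JSON body. Nothing is sent here. */
    static HttpRequest build(String baseUrl, String index, String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        // The query body is a plain string: its structure is not checked at compile time.
        String body = "{\"query\":{\"match_all\":{}},\"size\":10}";
        HttpRequest request = build("http://localhost:9200", "articles", body);
        System.out.println(request.method() + " " + request.uri());
        // To actually send it:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Nothing validates the body string before it reaches the server, which is the trade-off the other two options remove.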

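I. The officially recommended option: Java API Client (8.x)

Add the dependencies (the second entry carries no version and relies on your build's dependency management, for example Spring Boot's, to supply one):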

<dependency>
    <groupId>co.elastic.clients</groupId>
    <artifactId>elasticsearch-java</artifactId>
    <version>8.10.0</version>
</dependency>
<dependency>
    <groupId>jakarta.json</groupId>
    <artifactId>jakarta.json-api</artifactId>
</dependency>

Build the client:

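import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class EsConfig {
    @Bean
    public ElasticsearchClient esClient() {
        // Plain HTTP assumes security is disabled; ES 8.x enables TLS and authentication by default.
        RestClient restClient = RestClient.builder(
                new HttpHost("localhost", 9200, "http")).build();
        ElasticsearchTransport transport =
                new RestClientTransport(restClient, new JacksonJsonpMapper());
        return new ElasticsearchClient(transport);
    }
}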

Indexing a document

public void index(Article article) throws IOException {
    IndexResponse resp = esClient.index(i -> i
            .index("articles")
            .id(article.getId().toString())
            .document(article));
    log.info("indexed: {}", resp.result());
}

Note that the Java API Client leans heavily on lambdas and fluent DSL builders, which spares you the pain of hand-assembling XContentBuilder calls from the 7.x days.

Searching

SearchResponse<Article> resp = esClient.search(s -> s
        .index("articles")
        .query(q -> q.bool(b -> b
                .must(m -> m.match(t -> t.field("title").query(keyword)))
                .filter(f -> f.term(t -> t.field("status").value(1)))
        ))
        .from(0).size(10)
        .sort(so -> so.field(f -> f.field("createdAt").order(SortOrder.Desc))),
    Article.class);

resp.hits().hits().forEach(h -> log.info("{} - {}", h.id(), h.source().getTitle()));

This DSL maps almost one-to-one onto ES's native JSON query body, is far more readable than hand-built JSON, and is type-safe.
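
For comparison, here is roughly the JSON body that the lambda chain above corresponds to, written in the shorthand forms ES accepts. This is a sketch kept as a Java text block; `EquivalentQueryJson` and its naive `escape` helper are hypothetical and exist only to show the shape.

```java
/** The JSON body that the lambda-style search in the text corresponds to. */
final class EquivalentQueryJson {

    /** Minimal escaping for backslashes and quotes only (enough for this sketch, not for production). */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Renders the bool query (must match on title, filter term on status) with paging and sort. */
    static String searchBody(String keyword) {
        return """
                {
                  "query": {
                    "bool": {
                      "must": [ { "match": { "title": "%s" } } ],
                      "filter": [ { "term": { "status": 1 } } ]
                    }
                  },
                  "from": 0,
                  "size": 10,
                  "sort": [ { "createdAt": { "order": "desc" } } ]
                }
                """.formatted(escape(keyword));
    }

    public static void main(String[] args) {
        System.out.println(searchBody("Spring"));
    }
}
```

Each builder call in the Java version has a counterpart key here (bool, must, match, filter, term, sort), which is what makes the DSL easy to read against the ES reference documentation.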

Aggregations

SearchResponse<Void> resp = esClient.search(s -> s
        .index("articles")
        .size(0)
        .aggregations("by_status", a -> a
                .terms(t -> t.field("status"))),
    Void.class);

resp.aggregations()
    .get("by_status")
    .sterms()
    .buckets()
    .array()
    .forEach(b -> log.info("status={}, count={}", b.key().stringValue(), b.docCount()));

II. Spring Data Elasticsearch: JPA style

If you like the Repository style of Spring Data, it can generate about 90% of your CRUD automatically:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

spring:
  elasticsearch:
    uris: http://localhost:9200

The entity class:

@Document(indexName = "articles")
@Setting(shards = 3, replicas = 1)
public class Article {
    @Id
    private String id;

    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String title;

    @Field(type = FieldType.Keyword)
    private String status;

    @Field(type = FieldType.Date, format = DateFormat.date_hour_minute_second_millis)
    private LocalDateTime createdAt;
}

The repository:

public interface ArticleRepository extends ElasticsearchRepository<Article, String> {
    Page<Article> findByTitleContaining(String keyword, Pageable pageable);
    List<Article> findByStatus(String status);
}

In business code, use it just like JPA:

articleRepository.save(article);
Page<Article> page = articleRepository.findByTitleContaining("Spring", PageRequest.of(0, 10));

For complex queries, use ElasticsearchOperations:

NativeQuery query = NativeQuery.builder()
        .withQuery(q -> q.bool(b -> b
                .must(m -> m.match(t -> t.field("title").query("Spring")))
                .filter(f -> f.range(r -> r.field("createdAt")
                        .gte(JsonData.of("2021-01-01"))))
        ))
        .withPageable(PageRequest.of(0, 10))
        .build();

SearchHits<Article> hits = operations.search(query, Article.class);

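Notice that Spring Data ES runs on the Java API Client underneath; it simply wraps one more Repository abstraction around the CRUD layer.

III. What about `RestHighLevelClient`?

Suppose you inherit a project from the ES 7.x era with RestHighLevelClient everywhere: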

// Legacy code, for illustration
RestHighLevelClient client = new RestHighLevelClient(
        RestClient.builder(new HttpHost("localhost", 9200, "http")));

SearchRequest req = new SearchRequest("articles");
req.source(new SearchSourceBuilder().query(...));
SearchResponse resp = client.search(req, RequestOptions.DEFAULT);

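It is deprecated, but the 7.17.x client jar can still talk to an ES 8 cluster once compatibility mode is enabled (`setApiCompatibilityMode(true)` on `RestHighLevelClientBuilder`), so you can upgrade the cluster to ES 8 without rewriting everything immediately: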

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.17.21</version>
</dependency>

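But this is a temporary bridge, not a long-term plan: once your cluster is on 8.x, RestHighLevelClient still runs against it but gets no new features and is not adapted to newer query capabilities. Do not use it for new code, and migrate old code to the Java API Client whenever you have to touch it anyway.

Key engineering decisions

1. When to use Spring Data ES and when to use the Java API Client directly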

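	Spring Data ES	Java API Client
Simple CRUD	✓ Minimal code	△ Write it yourself
Complex queries/aggregations	△ Needs NativeQuery, a detour	✓ Direct DSL
Entity mapping	✓ `@Field` annotations	△ Write your own Jackson configuration
Index management	✓ Creates indices/settings automatically	△ Call the APIs yourself
Performance control	△ One layer removed	✓ Direct
Learning curve	Low (if you know Spring Data)	Medium (need to know the ES query DSL)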

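Recommendation: for workloads that are mostly CRUD with simple queries, use Spring Data ES; for search or analytics workloads with heavy querying, use the Java API Client directly.

Mixing the two is also common: repositories handle basic CRUD, while search endpoints inject ElasticsearchClient directly.

2. Index modeling: dynamic vs. explicit

Define mappings explicitly; do not rely on ES's automatic type inference: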

@Bean
public CommandLineRunner initIndex(ElasticsearchClient esClient) {
    return args -> {
        boolean exists = esClient.indices().exists(e -> e.index("articles")).value();
        if (!exists) {
            esClient.indices().create(c -> c
                .index("articles")
                .mappings(m -> m
                    .properties("title",  p -> p.text(t -> t.analyzer("ik_max_word")))
                    .properties("status", p -> p.keyword(k -> k))
                    .properties("createdAt", p -> p.date(d -> d))
                )
                .settings(s -> s.numberOfShards("3").numberOfReplicas("1"))
            );
        }
    };
}

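Dynamic mapping fixes a field's type from the first document it sees. A JSON number becomes long, so a later non-numeric value for the same field is rejected; a string that happens to look like a date becomes date; and a Chinese text field becomes text, but with the standard analyzer rather than the one you wanted. By the time this surfaces in production it is too late to patch the mapping: ES does not allow changing the type of an existing field, so the only fix is a reindex.

3. Chinese analysis: IK is not installed by default

The default analyzer splits Chinese text into single characters, which is unsuitable for most full-text search scenarios. For Chinese content in production, install the IK analysis plugin: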

# Example for 8.10.0 (note: the plugin version must exactly match the ES version)
elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v8.10.0/elasticsearch-analysis-ik-8.10.0.zip

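Maintainer change: the original medcl/elasticsearch-analysis-ik is no longer actively maintained as of late ES 8.x, and mainline maintenance has moved to infinilabs/analysis-ik. For new installations prefer releases from the infinilabs repository, which keep up with new ES versions more quickly.

Then set analyzer = "ik_max_word" in the field mapping (index with the finest-grained segmentation, search with the coarser ik_smart):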

@Field(type = FieldType.Text, analyzer = "ik_max_word", searchAnalyzer = "ik_smart")
private String title;
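
One way to check that the plugin is active, and to compare the two analyzers on your own text, is the `_analyze` API. The sketch below only builds such a request with the JDK's `java.net.http`; `AnalyzeRequest`, the host, and the sample text are assumptions, and since the returned tokens depend on the IK version and dictionary, none are shown.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Builds an _analyze request to inspect how an analyzer tokenizes a piece of text. */
final class AnalyzeRequest {

    /** Assumes analyzer and text contain no characters that need JSON escaping. */
    static String body(String analyzer, String text) {
        return "{\"analyzer\":\"" + analyzer + "\",\"text\":\"" + text + "\"}";
    }

    /** Builds POST {baseUrl}/_analyze; nothing is sent here. */
    static HttpRequest build(String baseUrl, String analyzer, String text) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(analyzer, text)))
                .build();
    }

    public static void main(String[] args) {
        // A short Chinese phrase meaning "Chinese word segmentation", written with Unicode escapes.
        String sample = "\u4e2d\u6587\u5206\u8bcd";
        for (String analyzer : new String[] {"ik_max_word", "ik_smart"}) {
            HttpRequest request = build("http://localhost:9200", analyzer, sample);
            System.out.println(request.method() + " " + request.uri() + " analyzer=" + analyzer);
        }
    }
}
```

If the plugin is missing, ES rejects the request because the analyzer name is unknown, which makes this a quick smoke test after installation.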

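4. The deep-pagination trap

By default, from + size pagination only reaches the first 10,000 hits (index.max_result_window). If the business may page deep (exporting everything, for example), use search_after or scroll; the ES documentation now recommends search_after, ideally with a point in time, over scroll: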

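// search_after: resume from the sort values of the previous page's last hit (hit.sort())
SearchResponse<Article> resp = esClient.search(s -> s
        .index("articles")
        .size(100)
        .sort(so -> so.field(f -> f.field("createdAt").order(SortOrder.Desc)))
        // Tiebreaker: must be unique per document. ES advises against sorting on _id
        // (it needs fielddata, which 8.x disables by default), so this assumes the id
        // is also indexed as a keyword field named "id" (hypothetical field name).
        .sort(so -> so.field(f -> f.field("id").order(SortOrder.Asc)))
        .searchAfter(lastSortValues),
    Article.class);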
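
The mechanics of search_after can be sketched without a cluster. The in-memory model below is an illustration, not client code: `SearchAfterDemo`, `Doc`, and `Cursor` are hypothetical names. It sorts by createdAt descending with the id as a tiebreaker, and each page resumes strictly after the sort values of the previous page's last hit.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** In-memory model of search_after paging: sort by (createdAt desc, id asc), resume after a cursor. */
final class SearchAfterDemo {

    record Doc(String id, long createdAt) {}

    /** The cursor holds the sort values of the last hit on the previous page. */
    record Cursor(long createdAt, String id) {}

    static final Comparator<Doc> ORDER =
            Comparator.comparingLong(Doc::createdAt).reversed().thenComparing(Doc::id);

    /** Returns up to size docs that sort strictly after the cursor (null cursor = first page). */
    static List<Doc> page(List<Doc> index, Cursor after, int size) {
        return index.stream()
                .sorted(ORDER)
                .filter(d -> after == null
                        || ORDER.compare(d, new Doc(after.id(), after.createdAt())) > 0)
                .limit(size)
                .toList();
    }

    /** Walks page by page until an empty page comes back, collecting ids in visit order. */
    static List<String> scanAll(List<Doc> index, int size) {
        List<String> seen = new ArrayList<>();
        Cursor cursor = null;
        while (true) {
            List<Doc> hits = page(index, cursor, size);
            if (hits.isEmpty()) {
                break;
            }
            hits.forEach(d -> seen.add(d.id()));
            Doc last = hits.get(hits.size() - 1);
            cursor = new Cursor(last.createdAt(), last.id());
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Doc> index = List.of(new Doc("a", 3), new Doc("b", 3), new Doc("c", 2), new Doc("d", 1));
        System.out.println(scanAll(index, 2));
    }
}
```

Without the unique tiebreaker, documents that share a createdAt value at a page boundary would be skipped, which is why the real query needs a second sort key.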

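5. Do not use ES for transactions or strict consistency

ES is near-real-time: a newly written document becomes searchable only after the next refresh, which by default happens about once per second (index.refresh_interval). ES therefore cannot serve as a strongly consistent store. The common pattern is:

Primary storage in MySQL / PostgreSQL (transactional guarantees)
Search and aggregations served by ES (synced asynchronously)
Sync driven by Canal / Debezium / Flink CDC parsing the binlog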
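
Why a search right after a write can miss is easier to see in a toy model. The class below is a deliberately simplified sketch (the name `NearRealTimeIndex` and its two lists are assumptions, not how Lucene segments are implemented): documents land in a buffer and only become visible to search after a refresh.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of near-real-time search: writes become searchable only after a refresh. */
final class NearRealTimeIndex {
    private final List<String> buffer = new ArrayList<>();      // written but not yet refreshed
    private final List<String> searchable = new ArrayList<>();  // visible to search

    /** Accepts a write; it is durable in this model but not yet searchable. */
    void index(String doc) {
        buffer.add(doc);
    }

    /** Stands in for the periodic refresh (refresh_interval) or an explicit refresh call. */
    void refresh() {
        searchable.addAll(buffer);
        buffer.clear();
    }

    /** Search only sees documents that have been through a refresh. */
    boolean search(String doc) {
        return searchable.contains(doc);
    }

    public static void main(String[] args) {
        NearRealTimeIndex index = new NearRealTimeIndex();
        index.index("article-1");
        System.out.println("before refresh: " + index.search("article-1"));
        index.refresh();
        System.out.println("after refresh: " + index.search("article-1"));
    }
}
```

On a real cluster a write can ask to wait for the refresh (the refresh=wait_for request parameter), but that trades latency for visibility and still gives no transactional guarantee across documents.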

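Real pitfalls when troubleshooting

Type mismatch: the mapping says text but the query uses term, so nothing comes back, because the indexed tokens were analyzed while the term value is not. Conversely, a match on a keyword field only hits the exact whole value. Use term queries for keyword fields and match for text fields.
Large fields in responses: use _source includes/excludes to control which fields are returned and avoid wasting bandwidth.
Paging past max_result_window: either raise the setting or use search_after.
Bulk writes: use the Bulk API; do not index documents one at a time.
A wrong mapping cannot be changed in place: reindex is the only way out, so model carefully up front.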
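
The term/match distinction in the list above is easier to see with a toy model of analysis. The sketch below is a simplification under stated assumptions (`TermVsMatch` and its crude lowercase-and-split `analyze` stand in for a real analyzer): a text field stores analyzed tokens, a term query looks its value up verbatim, and a match query analyzes its input first.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Toy model of why a term query misses on an analyzed text field. */
final class TermVsMatch {

    /** Rough stand-in for an analyzer: lowercase, then split on anything that is not a letter or digit. */
    static List<String> analyze(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(token -> !token.isEmpty())
                .toList();
    }

    /** term: the query value is looked up as-is among the indexed tokens. */
    static boolean term(List<String> indexedTokens, String value) {
        return indexedTokens.contains(value);
    }

    /** match: the query text is analyzed first; any resulting token may hit (OR semantics). */
    static boolean match(List<String> indexedTokens, String queryText) {
        return analyze(queryText).stream().anyMatch(indexedTokens::contains);
    }

    public static void main(String[] args) {
        List<String> textField = analyze("Spring Boot Guide");   // text field: analyzed tokens
        List<String> keywordField = List.of("Spring Boot Guide"); // keyword field: one verbatim token
        System.out.println("term on text:    " + term(textField, "Spring"));
        System.out.println("match on text:   " + match(textField, "Spring"));
        System.out.println("term on keyword: " + term(keywordField, "Spring Boot Guide"));
    }
}
```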

// Bulk write example
BulkRequest.Builder br = new BulkRequest.Builder();
for (Article a : articles) {
    br.operations(o -> o.index(i -> i.index("articles").id(a.getId()).document(a)));
}
BulkResponse result = esClient.bulk(br.build());
if (result.errors()) {
    result.items().forEach(item -> {
        if (item.error() != null) log.error("bulk failed: {}", item.error().reason());
    });
}
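
When the input list is large, a single bulk request can grow past what the cluster accepts (http.max_content_length defaults to 100mb), so it is common to send it in batches. A minimal sketch of the batching step, using only the JDK; `BulkBatches` and the batch size used in `main` are assumptions to tune for your document size:

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a large list into fixed-size batches, one batch per bulk request. */
final class BulkBatches {

    /** Returns consecutive sublists of at most batchSize elements, in the original order. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(items.subList(start, end));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3, 4, 5);
        // In real code each batch would be turned into one BulkRequest and sent separately.
        for (List<Integer> batch : partition(ids, 2)) {
            System.out.println(batch);
        }
    }
}
```

Checking `errors()` per batch, as the example above does, also keeps a failure in one batch from hiding among the results of all the others.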

Selection summary

flowchart TD
    Start([New project using ES?])
    Start --> NewProject{Spring Boot project?}
    NewProject -->|Yes| HasComplex{Many complex searches/aggregations?}
    NewProject -->|No| Direct[Java API Client]
    HasComplex -->|Few, mostly CRUD| SDE[Spring Data ES]
    HasComplex -->|Many| Mix[Spring Data ES + Java API Client mixed]

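For legacy projects that use RestHighLevelClient:

ES still on 7.x: stabilize first, then plan the upgrade
Already on 8.x: write new code with the Java API Client and migrate old code whenever you have to touch it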

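Wrap-up

The whole article compressed into one sentence:

In the ES 8.x era, the standard answer for connecting Java business code to ES is: Spring Data ES for CRUD, the Java API Client for complex queries, and gradual replacement of any RestHighLevelClient left in old code.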

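A few more engineering rules worth remembering:

Declare mappings explicitly; do not rely on dynamic inference
Install the IK analyzer for Chinese text
Do not use ES as a transactional store
Use search_after for deep pagination instead of pushing from + size ever deeper
Use Bulk for batch writes

Get these right and the ES layer will rarely let you down.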