JDK 8 Stream and Lambda: A Practical Guide

Before We Start

JDK 8 has been out for more than five years (this article was written in 2019). If you are still writing code like this:

```java
List<String> result = new ArrayList<>();
for (User user : users) {
    if (user.getAge() >= 18) {
        if (user.getStatus() == 1) {
            result.add(user.getName().toUpperCase());
        }
    }
}
Collections.sort(result);
```

instead of this:

```java
List<String> result = users.stream()
        .filter(u -> u.getAge() >= 18)
        .filter(u -> u.getStatus() == 1)
        .map(u -> u.getName().toUpperCase())
        .sorted()
        .collect(Collectors.toList());
```

then it is time to learn Stream and Lambda properly.

Stream is more than a new syntax. It moves Java data processing from "loops + temporary variables" to "declarative pipelines", which makes Java code read a lot like SQL.

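This article walks through Stream + Lambda from the basics to more advanced usage. It covers the idioms you will use every day, the ones that are especially handy, and the traps that are easy to fall into.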

1. Lambda: A Shortcut for Functional Interfaces

What a lambda is

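A lambda expression is an instance of a functional interface. You can think of it as a very compact replacement for an anonymous inner class. Under the hood it is not compiled into a separate anonymous class (the compiler uses invokedynamic), and inside a lambda, this refers to the enclosing instance rather than to the lambda itself.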

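```java
// Before JDK 8: anonymous inner class
Runnable r1 = new Runnable() {
    @Override public void run() {
        System.out.println("hello");
    }
};

// JDK 8: lambda (a different variable name, so both snippets compile in one scope)
Runnable r2 = () -> System.out.println("hello");
```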

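Any interface that has exactly one abstract method can be implemented with a lambda. Such an interface is called a SAM interface (Single Abstract Method). Default and static methods do not count toward that one method, and the optional @FunctionalInterface annotation asks the compiler to enforce the rule.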
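The SAM rule applies to your own interfaces as well as the JDK's. Here is a minimal sketch. StringValidator is a hypothetical interface invented for this example. Its default method and() does not break the single-abstract-method rule, so lambdas can still implement it.

```java
import java.util.Objects;

// Hypothetical interface for illustration only: one abstract method => functional interface
@FunctionalInterface
interface StringValidator {
    boolean validate(String input);

    // Default methods do not count toward the "single abstract method" rule
    default StringValidator and(StringValidator other) {
        Objects.requireNonNull(other);
        return s -> validate(s) && other.validate(s);
    }
}

class SamDemo {
    // Lambdas implementing our own SAM interface
    static final StringValidator NOT_EMPTY = s -> s != null && !s.isEmpty();
    static final StringValidator SHORT = s -> s.length() <= 8;

    static boolean isValidName(String s) {
        // && short-circuits, so SHORT never sees null
        return NOT_EMPTY.and(SHORT).validate(s);
    }

    public static void main(String[] args) {
        System.out.println(isValidName("alice"));        // true
        System.out.println(isValidName(""));             // false
        System.out.println(isValidName("bartholomew"));  // false, 11 characters
    }
}
```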

The functional interface family

The JDK ships a set of general-purpose functional interfaces in java.util.function:

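| Interface | Abstract method | Purpose |
|---|---|---|
| `Function<T, R>` | `R apply(T t)` | transform |
| `Predicate<T>` | `boolean test(T t)` | test a condition |
| `Consumer<T>` | `void accept(T t)` | consume |
| `Supplier<T>` | `T get()` | supply |
| `BiFunction<T,U,R>` | `R apply(T t, U u)` | two-argument transform |
| `BinaryOperator<T>` | `T apply(T t1, T t2)` | combine two values of the same type (reduction) |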

```java
Function<Integer, String> f = i -> "num:" + i;
Predicate<String> isLong = s -> s.length() > 10;
Consumer<String> print = System.out::println;
Supplier<List<String>> newList = ArrayList::new;
```
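These interfaces also provide default methods for composing functions, such as Function.andThen, Predicate.and and Predicate.negate. The functions and predicates in this small sketch are made up for illustration:

```java
import java.util.function.Function;
import java.util.function.Predicate;

class ComposeDemo {
    static final Function<Integer, Integer> TIMES_TWO = i -> i * 2;
    static final Function<Integer, String> LABEL = i -> "num:" + i;

    // andThen: run TIMES_TWO first, then feed its result to LABEL
    static final Function<Integer, String> DOUBLE_THEN_LABEL = TIMES_TWO.andThen(LABEL);

    static final Predicate<String> IS_LONG = s -> s.length() > 10;
    static final Predicate<String> STARTS_WITH_A = s -> s.startsWith("a");

    // and / negate build a new predicate without writing a new lambda body
    static final Predicate<String> SHORT_A_WORD = STARTS_WITH_A.and(IS_LONG.negate());

    public static void main(String[] args) {
        System.out.println(DOUBLE_THEN_LABEL.apply(21));        // num:42
        System.out.println(SHORT_A_WORD.test("apple"));         // true
        System.out.println(SHORT_A_WORD.test("astronautics"));  // false, 12 characters
    }
}
```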

Method references: an even shorter lambda

```java
// Lambda
users.stream().map(u -> u.getName());

// Method reference (shorter)
users.stream().map(User::getName);
```

The four kinds of method reference:

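| Kind | Syntax | Example |
|---|---|---|
| Static method | `Class::staticMethod` | `Integer::parseInt` |
| Instance method of a particular object | `instance::method` | `System.out::println` |
| Instance method of an arbitrary object of a type | `Class::method` | `String::toUpperCase` |
| Constructor | `Class::new` | `ArrayList::new` |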
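Each kind of method reference corresponds to a specific lambda. This sketch shows all four next to their lambda equivalents (in the comments). The prefixer helper is hypothetical and only exists to show a bound receiver:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Supplier;

class MethodRefDemo {
    // 1. Static method: Integer::parseInt  ==  s -> Integer.parseInt(s)
    static final Function<String, Integer> PARSE = Integer::parseInt;

    // 2. Instance method of a particular object: the receiver is fixed when the reference is created
    static Function<String, String> prefixer(String prefix) {
        return prefix::concat;                                       // s -> prefix.concat(s)
    }

    // 3. Instance method of an arbitrary object: the first argument becomes the receiver
    static final Function<String, String> UPPER = String::toUpperCase;            // s -> s.toUpperCase()
    static final BiFunction<String, String, Boolean> STARTS = String::startsWith; // (s, p) -> s.startsWith(p)

    // 4. Constructor reference
    static final Supplier<List<String>> NEW_LIST = ArrayList::new;                // () -> new ArrayList<>()

    static List<String> demo() {
        List<String> out = NEW_LIST.get();
        out.add(String.valueOf(PARSE.apply("42") + 1));
        out.add(UPPER.apply("abc"));
        out.add(prefixer("pre-").apply("x"));
        out.add(String.valueOf(STARTS.apply("lambda", "lam")));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo());   // [43, ABC, pre-x, true]
    }
}
```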

2. Stream: Pipeline-Style Data Processing

The life cycle of a stream

Data source (List / Set / Array) → intermediate operations (filter / map / sorted) → terminal operation (collect / forEach / reduce) → result

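Intermediate operations return a new stream (a Stream<R>, or a primitive stream such as IntStream), so calls can be chained. They are lazy: on their own they process nothing.
A terminal operation triggers the actual computation. A stream can have only one terminal operation, and after it runs the stream is consumed.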
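A quick way to see laziness is to log every call. In this sketch (the three-element list is made up), nothing is logged until collect runs. Because filter and map are stateless, each element goes through filter and then map before the next element starts:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyDemo {
    static List<String> run() {
        List<String> trace = new ArrayList<>();
        // Building the pipeline touches no elements yet
        Stream<String> pipeline = Arrays.asList("a", "bb", "ccc").stream()
                .filter(s -> { trace.add("filter " + s); return s.length() >= 2; })
                .map(s -> { trace.add("map " + s); return s.toUpperCase(); });
        trace.add("pipeline built");

        // The terminal operation pulls elements through filter -> map one at a time
        List<String> result = pipeline.collect(Collectors.toList());
        trace.add("result " + result);
        return trace;
    }

    public static void main(String[] args) {
        // pipeline built, filter a, filter bb, map bb, filter ccc, map ccc, result [BB, CCC]
        run().forEach(System.out::println);
    }
}
```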

Common intermediate operations

```java
List<User> users = ...;
// Each line below only builds a pipeline; nothing runs until a terminal operation is added

// filter: keep matching elements
users.stream().filter(u -> u.getAge() > 18);

// map: transform each element
users.stream().map(User::getName);
users.stream().mapToInt(User::getAge);        // primitive stream, avoids boxing

// flatMap: flatten nested structures
users.stream().flatMap(u -> u.getRoles().stream());

// distinct: remove duplicates (based on equals/hashCode)
users.stream().map(User::getCity).distinct();

// sorted: sort
users.stream().sorted(Comparator.comparing(User::getAge));
users.stream().sorted(Comparator.comparing(User::getAge).reversed());
users.stream().sorted(Comparator.comparing(User::getCity).thenComparing(User::getAge));

// skip / limit: paging (skip the first 20, take the next 10)
users.stream().skip(20).limit(10);

// peek: for debugging
users.stream().peek(u -> log.debug("processing {}", u));
```
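Here is a runnable sketch that combines several of these operations on made-up data. The minimal User class has only the fields these examples need and stands in for the article's User. The sorted() call sorts by city and then by age in descending order, and the paging helper uses a 0-based page index.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class IntermediateOpsDemo {
    // Hypothetical minimal User with just the fields used below
    static class User {
        private final String name;
        private final int age;
        private final String city;
        private final List<String> roles;
        User(String name, int age, String city, List<String> roles) {
            this.name = name; this.age = age; this.city = city; this.roles = roles;
        }
        String getName() { return name; }
        int getAge() { return age; }
        String getCity() { return city; }
        List<String> getRoles() { return roles; }
    }

    static final List<User> USERS = Arrays.asList(
            new User("Ann", 30, "Shanghai", Arrays.asList("admin", "dev")),
            new User("Bob", 17, "Beijing", Arrays.asList("dev")),
            new User("Cid", 25, "Beijing", Arrays.asList("ops", "dev")));

    // flatMap + distinct + sorted: all roles held by anyone, deduplicated, alphabetical
    static List<String> allRoles() {
        return USERS.stream()
                .flatMap(u -> u.getRoles().stream())
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    // Two sort keys: city ascending, then age descending (comparingInt avoids boxing ages)
    static List<String> namesByCityThenAgeDesc() {
        return USERS.stream()
                .sorted(Comparator.comparing(User::getCity)
                        .thenComparing(Comparator.comparingInt(User::getAge).reversed()))
                .map(User::getName)
                .collect(Collectors.toList());
    }

    // skip/limit paging with a 0-based page index
    static List<String> page(int pageIndex, int pageSize) {
        return USERS.stream()
                .skip((long) pageIndex * pageSize)
                .limit(pageSize)
                .map(User::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(allRoles());                 // [admin, dev, ops]
        System.out.println(namesByCityThenAgeDesc());   // [Cid, Bob, Ann]
        System.out.println(page(1, 2));                 // [Cid]
    }
}
```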

Common terminal operations

```java
// Assumes: import static java.util.stream.Collectors.*;

// collect: gather into a collection
List<User> list = users.stream().filter(...).collect(toList());
Set<String> set  = users.stream().map(User::getName).collect(toSet());

// Collect into a Map (throws IllegalStateException if two users share an id)
Map<Long, User> map = users.stream().collect(toMap(User::getId, Function.identity()));

// Grouping
Map<String, List<User>> byCity = users.stream().collect(groupingBy(User::getCity));

// Aggregation: count/sum/average/min/max
long count = users.stream().filter(u -> u.getAge() >= 18).count();
int totalAge = users.stream().mapToInt(User::getAge).sum();

// Reduction: max() returns an OptionalInt, which is empty for an empty stream
int max = users.stream().mapToInt(User::getAge).max().orElse(0);

// Short-circuiting: stops as soon as the answer is known
boolean anyAdult = users.stream().anyMatch(u -> u.getAge() >= 18);
Optional<User> first = users.stream().filter(...).findFirst();

// forEach
users.stream().forEach(System.out::println);
```
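If you need several of count/sum/min/max/average at once, IntStream.summaryStatistics() gathers them all in a single pass. For brevity, this sketch uses a plain list of ages in place of User objects:

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;

class SummaryStatsDemo {
    // One pass over the data yields count, sum, min, max and average together
    static IntSummaryStatistics ageStats(List<Integer> ages) {
        return ages.stream().mapToInt(Integer::intValue).summaryStatistics();
    }

    public static void main(String[] args) {
        IntSummaryStatistics s = ageStats(Arrays.asList(17, 25, 30));
        System.out.println(s.getCount() + " " + s.getSum() + " " + s.getMin()
                + " " + s.getMax() + " " + s.getAverage());   // 3 72 17 30 24.0

        // Empty input: count is 0 and min/max fall back to Integer.MAX_VALUE / MIN_VALUE,
        // so check getCount() before trusting min/max
        System.out.println(ageStats(Arrays.<Integer>asList()).getCount());   // 0
    }
}
```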

3. The Paradigm Payoff: Java That Reads Like SQL

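Many people see Stream as the same old thing in new packaging. In fact it changes how you express data processing. Compare these two pieces of code: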

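Scenario: from a list of users, compute the average age of the users aged 18 or over in each city, sorted by average age in descending order.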

Traditional style:

```java
Map<String, List<Integer>> cityAges = new HashMap<>();
for (User u : users) {
    if (u.getAge() >= 18) {
        cityAges.computeIfAbsent(u.getCity(), k -> new ArrayList<>())
                .add(u.getAge());
    }
}
Map<String, Double> avgByCity = new HashMap<>();
for (Map.Entry<String, List<Integer>> e : cityAges.entrySet()) {
    double avg = e.getValue().stream().mapToInt(Integer::intValue).average().orElse(0);
    avgByCity.put(e.getKey(), avg);
}
List<Map.Entry<String, Double>> sorted = new ArrayList<>(avgByCity.entrySet());
sorted.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
```

Stream style:

```java
List<Map.Entry<String, Double>> result = users.stream()
        .filter(u -> u.getAge() >= 18)
        .collect(groupingBy(User::getCity, averagingInt(User::getAge)))
        .entrySet().stream()
        .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
        .collect(toList());
```

SQL equivalent:

```sql
SELECT city, AVG(age) avg_age
FROM users
WHERE age >= 18
GROUP BY city
ORDER BY avg_age DESC;
```

The Stream version matches the SQL almost clause by clause, and that is the paradigm payoff. You describe "what" you want, not "how" to compute it.
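To check the SQL analogy end to end, here is a runnable version of the pipeline above. It uses a hypothetical minimal User class and made-up sample data, and it turns each entry into a "city=avg" string so the result is easy to print. The sample ages avoid ties in the averages because entries with equal averages would keep HashMap iteration order, which is not guaranteed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import static java.util.stream.Collectors.averagingInt;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toList;

class CityAvgDemo {
    // Hypothetical minimal User: only the fields this query needs
    static class User {
        private final String city;
        private final int age;
        User(String city, int age) { this.city = city; this.age = age; }
        String getCity() { return city; }
        int getAge() { return age; }
    }

    static final List<User> SAMPLE = Arrays.asList(
            new User("Beijing", 20), new User("Beijing", 30),
            new User("Shanghai", 40), new User("Shanghai", 16),
            new User("Shenzhen", 22));

    // WHERE age >= 18  GROUP BY city  AVG(age)  ORDER BY avg DESC
    static List<String> avgAgeByCityDesc(List<User> users) {
        return users.stream()
                .filter(u -> u.getAge() >= 18)
                .collect(groupingBy(User::getCity, averagingInt(User::getAge)))
                .entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(toList());
    }

    public static void main(String[] args) {
        System.out.println(avgAgeByCityDesc(SAMPLE));
        // [Shanghai=40.0, Beijing=25.0, Shenzhen=22.0]
    }
}
```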

4. Advanced Collectors

Collectors is the treasure chest of the Stream API. Once you know it well, you can express very powerful reductions:

```java
// Count per group
Map<String, Long> count = users.stream()
        .collect(groupingBy(User::getCity, counting()));

// Sum per group
Map<String, Integer> sum = users.stream()
        .collect(groupingBy(User::getCity, summingInt(User::getAge)));

// Average per group
Map<String, Double> avg = users.stream()
        .collect(groupingBy(User::getCity, averagingInt(User::getAge)));

// Collect each group into a List, keeping only the names
Map<String, List<String>> names = users.stream()
        .collect(groupingBy(User::getCity, mapping(User::getName, toList())));

// Join into a string
String allNames = users.stream()
        .map(User::getName)
        .collect(joining(", ", "[", "]"));    // → [Zhang San, Li Si, Wang Wu]

// Multi-level grouping: by city, then by age
Map<String, Map<Integer, List<User>>> byCityAndAge = users.stream()
        .collect(groupingBy(User::getCity, groupingBy(User::getAge)));

// partitioningBy: split into two groups by a boolean predicate (keys true and false)
Map<Boolean, List<User>> adultPart = users.stream()
        .collect(partitioningBy(u -> u.getAge() >= 18));
```
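Two more overloads are worth knowing. The three-argument groupingBy takes a map factory, so TreeMap::new gives you sorted keys. collectingAndThen post-processes a downstream result, for example to unwrap the Optional returned by maxBy. This sketch uses a hypothetical minimal User and made-up data:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.maxBy;

class CollectorsDemo {
    // Hypothetical minimal User for this example
    static class User {
        private final String name;
        private final int age;
        private final String city;
        User(String name, int age, String city) { this.name = name; this.age = age; this.city = city; }
        String getName() { return name; }
        int getAge() { return age; }
        String getCity() { return city; }
    }

    static final List<User> USERS = Arrays.asList(
            new User("Ann", 30, "Shanghai"), new User("Bob", 17, "Beijing"),
            new User("Cid", 25, "Beijing"), new User("Dan", 41, "Shanghai"));

    // Three-argument groupingBy: TreeMap::new makes the keys come out sorted
    static Map<String, Long> countByCitySorted() {
        return USERS.stream()
                .collect(groupingBy(User::getCity, TreeMap::new, counting()));
    }

    // maxBy yields Optional<User>; collectingAndThen turns it into a name.
    // groupingBy never creates empty groups, so the "?" fallback is only defensive.
    static Map<String, String> oldestNameByCity() {
        return USERS.stream()
                .collect(groupingBy(User::getCity, TreeMap::new,
                        collectingAndThen(maxBy(Comparator.comparingInt(User::getAge)),
                                (Optional<User> oldest) -> oldest.map(User::getName).orElse("?"))));
    }

    public static void main(String[] args) {
        System.out.println(countByCitySorted());   // {Beijing=2, Shanghai=2}
        System.out.println(oldestNameByCity());    // {Beijing=Cid, Shanghai=Dan}
    }
}
```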

5. Optional: An Elegant Tool Against NPEs

JDK 8 also introduced Optional, and it fits naturally alongside Stream.

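```java
// Old style (here userRepo.findById is assumed to return a User, or null if not found)
User u = userRepo.findById(1L);
if (u != null) {
    String name = u.getName();
    if (name != null) {
        return name.toUpperCase();
    }
}
return "UNKNOWN";

// Optional style: map() turns a null result into an empty Optional
return Optional.ofNullable(userRepo.findById(1L))
        .map(User::getName)
        .map(String::toUpperCase)
        .orElse("UNKNOWN");
```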
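This sketch checks that the two versions really are equivalent by running both against a hypothetical in-memory map that stands in for userRepo. The map and its contents are made up. Its get() returns null for a missing id, just like the nullable findById in the old-style code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class OptionalChainDemo {
    static class User {
        private final String name;
        User(String name) { this.name = name; }
        String getName() { return name; }   // may return null
    }

    // Hypothetical stand-in for userRepo: id 1 has a name, id 2 has a null name, id 3 is missing
    static final Map<Long, User> REPO = new HashMap<>();
    static {
        REPO.put(1L, new User("alice"));
        REPO.put(2L, new User(null));
    }

    static String oldStyle(long id) {
        User u = REPO.get(id);
        if (u != null) {
            String name = u.getName();
            if (name != null) {
                return name.toUpperCase();
            }
        }
        return "UNKNOWN";
    }

    // Each null check becomes a map() step; a null at any step yields Optional.empty()
    static String optionalStyle(long id) {
        return Optional.ofNullable(REPO.get(id))
                .map(User::getName)
                .map(String::toUpperCase)
                .orElse("UNKNOWN");
    }

    public static void main(String[] args) {
        for (long id = 1; id <= 3; id++) {
            System.out.println(oldStyle(id) + " / " + optionalStyle(id));
        }
        // ALICE / ALICE, UNKNOWN / UNKNOWN, UNKNOWN / UNKNOWN
    }
}
```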

The Optional API at a glance

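```java
Optional<User> opt = userRepo.findById(1L);   // here findById is assumed to return Optional<User>

opt.isPresent();                               // whether a value is present
opt.ifPresent(u -> log.info("found {}", u));   // run only if a value is present
opt.orElse(defaultUser);                       // default when empty (the argument is always evaluated)
opt.orElseGet(() -> heavyDefault());           // default computed lazily, only when empty
opt.orElseThrow(() -> new BusinessException("not found"));
opt.filter(u -> u.getAge() > 18);              // becomes empty if the predicate fails
opt.map(User::getName);                        // transform
opt.flatMap(u -> Optional.ofNullable(u.getEmail()));   // chain a function that itself returns Optional
```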
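Developers often confuse orElse and orElseGet. The argument to orElse is evaluated even when a value is present, while orElseGet calls its supplier only when the Optional is empty. In this sketch, heavyDefault is a hypothetical expensive default that counts how often it runs:

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

class OrElseDemo {
    static final AtomicInteger CALLS = new AtomicInteger();

    // Hypothetical "expensive" default that counts its invocations
    static String heavyDefault() {
        CALLS.incrementAndGet();
        return "default";
    }

    // orElse: the argument is evaluated before orElse is even called
    static int callsWithOrElse(Optional<String> opt) {
        CALLS.set(0);
        opt.orElse(heavyDefault());
        return CALLS.get();
    }

    // orElseGet: the supplier runs only when the Optional is empty
    static int callsWithOrElseGet(Optional<String> opt) {
        CALLS.set(0);
        opt.orElseGet(OrElseDemo::heavyDefault);
        return CALLS.get();
    }

    public static void main(String[] args) {
        System.out.println(callsWithOrElse(Optional.of("x")));     // 1
        System.out.println(callsWithOrElseGet(Optional.of("x")));  // 0
        System.out.println(callsWithOrElseGet(Optional.empty()));  // 1
    }
}
```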

Where Optional does not belong

Optional was designed primarily as a return type. Don't use it for fields or parameters:

```java
// ❌ Anti-pattern: field
private Optional<User> user;

// ❌ Anti-pattern: parameter
public void process(Optional<User> u) {}

// ✓ Return value
public Optional<User> findUser(Long id) { /* body omitted */ }
```

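An Optional field adds an extra wrapper object for every value, and Optional does not implement Serializable, which causes trouble in entities and DTOs. An Optional parameter forces every caller to wrap its argument, and a caller can still pass null for the Optional itself, so you gain little. Accept a possibly-null parameter instead and use Optional.ofNullable inside the method.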
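Here is a minimal sketch of that recommendation, built around a hypothetical greet method. Its parameter is an ordinary nullable String, and Optional appears only inside the method body.

```java
import java.util.Optional;

class NullableParamDemo {
    // Hypothetical method: plain, possibly-null parameter; Optional is used only internally
    static String greet(String nickname) {
        return Optional.ofNullable(nickname)
                .map(String::trim)
                .filter(n -> !n.isEmpty())
                .map(n -> "Hi, " + n)
                .orElse("Hi, guest");
    }

    public static void main(String[] args) {
        System.out.println(greet("Tom"));   // Hi, Tom
        System.out.println(greet("  "));    // Hi, guest
        System.out.println(greet(null));    // Hi, guest (callers just pass null, no wrapping)
    }
}
```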

6. Pitfalls

1. A stream cannot be reused

```java
Stream<User> s = users.stream().filter(...);
List<User> a = s.collect(toList());
List<User> b = s.collect(toList());   // ❌ IllegalStateException: stream has already been operated upon or closed
```
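If you need to run the same pipeline more than once, a common workaround is to keep a Supplier<Stream<T>> and call get() for a fresh stream each time. This sketch uses made-up data:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamReuseDemo {
    static final List<Integer> AGES = Arrays.asList(12, 18, 25, 40);

    // Each get() builds a brand-new stream, so no stream object is ever reused
    static final Supplier<Stream<Integer>> ADULTS = () -> AGES.stream().filter(a -> a >= 18);

    // The failure mode: a second terminal operation on the same stream throws
    static boolean secondTerminalOpThrows() {
        Stream<Integer> s = AGES.stream();
        s.count();
        try {
            s.count();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(ADULTS.get().count());                        // 3
        System.out.println(ADULTS.get().collect(Collectors.toList()));   // [18, 25, 40]
        System.out.println(secondTerminalOpThrows());                    // true
    }
}
```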

2. Side effects inside forEach

```java
List<User> result = new ArrayList<>();
users.stream().forEach(result::add);   // ❌ anti-pattern
```

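To gather results, use collect(toList()). forEach is about consuming elements, not collecting them. Adding to a shared ArrayList from forEach also breaks as soon as the stream becomes parallel, because ArrayList is not thread-safe.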
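collect also stays correct when the stream goes parallel. Each worker fills its own container, the partial results are merged, and for an ordered source such as a range the encounter order is preserved. The squares function in this small sketch is made up for illustration:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class CollectVsForEachDemo {
    // Safe in parallel: collect() merges per-thread containers and keeps encounter order.
    // By contrast, forEach(sharedArrayList::add) on a parallel stream can lose elements
    // or throw, because ArrayList is not thread-safe.
    static List<Integer> squares(int n, boolean parallel) {
        IntStream range = IntStream.rangeClosed(1, n);
        if (parallel) {
            range = range.parallel();
        }
        return range.map(i -> i * i).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> seq = squares(10_000, false);
        List<Integer> par = squares(10_000, true);
        System.out.println(seq.equals(par));   // true
    }
}
```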

3. parallelStream is not a free lunch

```java
users.parallelStream().filter(...).collect(toList());
```

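Parallel streams sound great, but in many everyday scenarios they are no faster than sequential streams, and sometimes slower: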

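Small collections (as a rough rule of thumb, fewer than about 10,000 elements): the cost of splitting the work and merging the results outweighs the gain
Cheap per-element work: coordinating threads costs more than the work itself
I/O-bound work (databases, files): blocking calls tie up the shared common ForkJoinPool and can stall unrelated code that relies on it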

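Consider parallelStream only for CPU-bound work on large data with no shared mutable state. Even then, prefer running it in a dedicated ForkJoinPool rather than the common one.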
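The usual way to run a parallel stream in your own pool is to submit the whole pipeline as a task to a dedicated ForkJoinPool. This sketch shows that widely used trick, which is not a documented Stream API. It relies on the current ForkJoin implementation running the stream's subtasks in the pool that the submitting task belongs to. The evenSquares function and the pool size are arbitrary choices for the example.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class CustomPoolDemo {
    static List<Integer> evenSquares(int n, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);   // dedicated pool, not the common one
        try {
            // The parallel stream forks its subtasks inside the pool running this task
            // (current ForkJoin behavior, not a documented Stream guarantee)
            return pool.submit(() -> IntStream.rangeClosed(1, n)
                            .parallel()
                            .filter(i -> i % 2 == 0)
                            .map(i -> i * i)
                            .boxed()
                            .collect(Collectors.toList()))
                    .get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();   // always release the pool's threads
        }
    }

    public static void main(String[] args) {
        System.out.println(evenSquares(10, 4));   // [4, 16, 36, 64, 100]
    }
}
```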

4. Duplicate keys in toMap

```java
users.stream().collect(toMap(User::getCity, Function.identity()));
// → IllegalStateException: Duplicate key ...
```

When keys can repeat, supply a merge function:

```java
// (a, b) -> a keeps the first value seen for each key
users.stream().collect(toMap(User::getCity, Function.identity(), (a, b) -> a));
```
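The merge function can also combine values instead of dropping one. A four-argument overload additionally lets you choose the map type, for example LinkedHashMap::new to keep insertion order. Also note that toMap throws a NullPointerException when a value is null. This sketch uses a hypothetical minimal User and made-up data:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ToMapDemo {
    // Hypothetical minimal User: name and city only
    static class User {
        private final String name;
        private final String city;
        User(String name, String city) { this.name = name; this.city = city; }
        String getName() { return name; }
        String getCity() { return city; }
    }

    static final List<User> USERS = Arrays.asList(
            new User("Ann", "Shanghai"), new User("Bob", "Beijing"), new User("Cid", "Beijing"));

    // Two-argument toMap: "Beijing" appears twice, so this throws
    static boolean duplicateKeyThrows() {
        try {
            USERS.stream().collect(Collectors.toMap(User::getCity, User::getName));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Merge function keeps the first value; LinkedHashMap::new keeps insertion order
    static Map<String, String> firstNameByCity() {
        return USERS.stream().collect(Collectors.toMap(
                User::getCity, User::getName, (first, second) -> first, LinkedHashMap::new));
    }

    // Or combine the values instead of dropping one
    static Map<String, String> allNamesByCity() {
        return USERS.stream().collect(Collectors.toMap(
                User::getCity, User::getName, (a, b) -> a + "," + b, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        System.out.println(duplicateKeyThrows());   // true
        System.out.println(firstNameByCity());      // {Shanghai=Ann, Beijing=Bob}
        System.out.println(allNamesByCity());       // {Shanghai=Ann, Beijing=Bob,Cid}
    }
}
```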

5. mapToInt vs map

```java
users.stream().mapToInt(User::getAge).sum();    // ✓
users.stream().map(User::getAge).reduce(0, Integer::sum);   // ❌ boxes and unboxes along the way, slower
```

For numeric work, prefer mapToInt / mapToLong / mapToDouble to avoid boxing overhead.
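Primitive streams do more than avoid boxing. They also offer operations that Stream<Integer> does not have at all, such as sum(), average() and summaryStatistics(), and boxed() converts back when you need a List<Integer>. This sketch uses made-up ages:

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PrimitiveStreamDemo {
    static final List<Integer> AGES = Arrays.asList(17, 25, 30);

    // Stream<Integer> has no sum(); mapToInt unboxes once and sums primitives
    static int sumPrimitive() {
        return AGES.stream().mapToInt(Integer::intValue).sum();
    }

    // The boxed alternative also works, but boxes/unboxes on every step
    static int sumBoxed() {
        return AGES.stream().reduce(0, Integer::sum);
    }

    // average() returns OptionalDouble, which is empty when there are no elements
    static OptionalDouble average(List<Integer> ages) {
        return ages.stream().mapToInt(Integer::intValue).average();
    }

    // boxed() turns an IntStream back into Stream<Integer>, e.g. to collect a List
    static List<Integer> firstN(int n) {
        return IntStream.rangeClosed(1, n).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sumPrimitive() + " " + sumBoxed());   // 72 72
        System.out.println(average(AGES).getAsDouble());         // 24.0
        System.out.println(firstN(3));                           // [1, 2, 3]
    }
}
```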

6. Don't put side effects in peek

```java
users.stream().peek(u -> u.setStatus(1)).count();   // ❌ peek is meant for debugging
```

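Starting with JDK 9, count() may skip executing the pipeline entirely if it can compute the count directly from the source (for example, a sized source with no filter in between). In that case the peek action above never runs and the side effect is lost. With collect or forEach, peek does run for every element that flows through. With short-circuiting operations such as findFirst, it only sees the elements that are actually pulled. Either way, the Javadoc describes peek as existing mainly to support debugging, so keep business side effects out of it.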
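Here is a small sketch of the difference on a made-up three-element list. With collect, the peek action sees every element. With count, the result depends on the JDK version, so the sketch prints that number instead of claiming a fixed value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PeekDemo {
    static final List<String> SEEN = new ArrayList<>();

    // With collect(), every element passes through the peek "tap"
    static List<String> collectWithPeek() {
        SEEN.clear();
        return Arrays.asList("a", "b", "c").stream()
                .peek(SEEN::add)              // debugging tap only: records what flows past
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // With count() on a sized source, JDK 9+ is allowed to skip the pipeline entirely
    static int peekCallsDuringCount() {
        SEEN.clear();
        Arrays.asList("a", "b", "c").stream().peek(SEEN::add).count();
        return SEEN.size();   // 3 on JDK 8; can be 0 on JDK 9+
    }

    public static void main(String[] args) {
        System.out.println(collectWithPeek() + " " + SEEN);   // [A, B, C] [a, b, c]
        System.out.println(peekCallsDuringCount());
    }
}
```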

7. When to Use Stream, and When Not To

✅ Good fits for Stream

Filtering / transforming / aggregating data
Multi-step data processing
Collection in, collection out
Logic that reads naturally as SQL

❌ Poor fits for Stream

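Simple loops: for (Foo f : list) is often more direct than list.forEach(...)
Code that has to handle checked exceptions: the standard functional interfaces declare none, so every checked exception must be caught or wrapped inside the lambda, which gets painful fast
Code that needs break/continue: Stream has no such statements (short-circuiting operations like findFirst and anyMatch come close to break)
Code that needs to modify outside local variables: lambdas can only capture effectively final locals, which forces a functional style
When readability is the priority: don't force simple logic into a chain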
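A common workaround for the checked-exception problem is a small adapter that wraps a throwing function into a plain java.util.function.Function. ThrowingFunction, unchecked and parseStrict below are hypothetical helpers written for this sketch, not JDK APIs, and many variants of the same idea exist. For IOException specifically, the JDK also provides java.io.UncheckedIOException.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class CheckedLambdaDemo {
    // Hypothetical helper interface: like Function, but allowed to throw checked exceptions
    @FunctionalInterface
    interface ThrowingFunction<T, R> {
        R apply(T t) throws Exception;
    }

    // Hypothetical adapter: unchecked exceptions pass through, checked ones are wrapped
    static <T, R> Function<T, R> unchecked(ThrowingFunction<T, R> f) {
        return t -> {
            try {
                return f.apply(t);
            } catch (RuntimeException e) {
                throw e;
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        };
    }

    // Stand-in for an API that declares a checked exception
    static int parseStrict(String s) throws IOException {
        if (s.isEmpty()) {
            throw new IOException("empty input");
        }
        return Integer.parseInt(s);
    }

    static List<Integer> parseAll(List<String> inputs) {
        return inputs.stream()
                .map(unchecked(CheckedLambdaDemo::parseStrict))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parseAll(Arrays.asList("1", "2", "3")));   // [1, 2, 3]
    }
}
```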

Summary

The whole article in one sentence:

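**Stream + Lambda is not a syntax upgrade but a paradigm shift: it moves Java from "loops + temporary variables" to "declarative pipelines" that you write like SQL and read as a statement of intent.**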

Things to remember:

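Prefer method references, then lambdas, then anonymous inner classes (adjust case by case for readability)
Use Optional only as a return value, never as a field or parameter
Be careful with parallelStream; in many typical scenarios sequential is just as fast or faster
Use mapToInt / mapToLong / mapToDouble for numeric work
For complex reductions, look in Collectors first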

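Once these idioms become second nature, writing data-processing code in Java suddenly feels much lighter. That moment of clarity is the biggest gift JDK 8 gave to Java developers.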