Examples of finding, counting and removing duplicate elements from a Java Stream.
Overview
A Java Stream is a lazily processed sequence of elements that supports sequential and parallel operations through a Stream pipeline. A Stream does not process any elements from its source until the pipeline’s terminal operation runs.
This tutorial provides quick examples of finding, counting, and removing duplicate elements from a Stream of built-in Java types or custom objects.
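The lazy behaviour described in the overview can be observed with a small sketch. The class name LazyStreamDemo and its helper method are illustrative and not part of the tutorial's source; peek() is used only to record which elements the pipeline has pulled so far.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyStreamDemo {

    // Returns how many elements peek() has seen before and after
    // the terminal operation runs (hypothetical helper for illustration).
    public static List<Integer> seenBeforeAndAfter() {
        List<String> seen = new ArrayList<>();

        // Only builds the pipeline; no element is pulled from the source yet.
        Stream<String> pipeline = Stream.of("a", "b", "a")
            .peek(seen::add)
            .distinct();
        int before = seen.size();

        // The terminal operation drives all three elements through peek().
        pipeline.collect(Collectors.toList());
        int after = seen.size();

        return Arrays.asList(before, after);
    }

    public static void main(String[] args) {
        System.out.println(seenBeforeAndAfter());
        // prints: [0, 3]
    }
}
```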
Remove Stream Duplicates using distinct()
The Java Stream interface provides several intermediate operations to process and filter the elements of a Stream. The distinct() method deduplicates the elements and returns a new Stream containing only the unique ones.
Example of using distinct() to remove Stream duplicates
Stream<String> stream = Stream.of("a", "b", "c", "b", "d", "a", "d");
Stream<String> output = stream.distinct();
output.forEach(System.out::print);
//prints:
//abcd
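For an ordered Stream, distinct() keeps the first occurrence of each element and preserves the encounter order. The sketch below illustrates this by collecting the result into a List; the class and method names are made up for the example.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DistinctOrderDemo {

    // Removes duplicates and collects the survivors in encounter order.
    public static List<String> dedupe(Stream<String> input) {
        return input.distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(dedupe(Stream.of("b", "a", "b", "c", "a")));
        // prints: [b, a, c]
    }
}
```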
Remove Stream Duplicates using Set
Alternatively, we can use a Java Set to remove duplicates from a Stream. As Java Sets contain unique elements, we can collect our Stream into a Set and create a new Stream with all duplicates removed.
Example of collecting into a Java Set to remove duplicate elements from a Stream.
Stream<String> stream = Stream.of("a", "b", "c", "b", "d", "a", "d");
Stream<String> output = stream
    .collect(Collectors.toSet())
    .stream();
output.forEach(System.out::print);
//prints (order may vary):
//abcd
Note that Collectors.toSet() makes no guarantee about the type or ordering of the Set it returns (it is typically a HashSet), so the original order of the elements may not be preserved.
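If the original order matters, one option is to collect into a LinkedHashSet, which iterates in insertion order. The following is a minimal sketch using Collectors.toCollection(); the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class OrderedSetDedup {

    // Collects into a LinkedHashSet so the first occurrence of each
    // element keeps its position in the result.
    public static List<String> dedupeKeepingOrder(Stream<String> input) {
        Set<String> unique = input.collect(Collectors.toCollection(LinkedHashSet::new));
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        Stream<String> stream = Stream.of("a", "b", "c", "b", "d", "a", "d");
        System.out.println(dedupeKeepingOrder(stream));
        // prints: [a, b, c, d]
    }
}
```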
Remove Duplicates from a Stream of Custom Objects
The distinct() method compares elements using their equals() method. To remove duplicates from a Stream of custom objects, our custom class must therefore provide the equality logic, together with a hashCode() implementation that is consistent with it.
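// Lombok annotations are assumed here: they generate the constructor,
// the getters, and the toString() format used in the examples.
@Getter
@ToString
@AllArgsConstructor
public class Student {
    private final Long studentId;
    private final String firstName;
    private final String lastName;
    private final Integer age;

    // Two students are equal when their studentId values match.
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Student student2)) {
            return false;
        }
        return student2.studentId.equals(this.studentId);
    }

    // Consistent with equals(): derived from studentId only.
    @Override
    public int hashCode() {
        return studentId.hashCode();
    }
}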
The equals() method in our custom class uses the studentId field to decide if two class instances are equal. Now, we can use the distinct() method on a Stream of the Student objects.
Stream<Student> stream = Stream.of(
    new Student(1L, "Bob", "Jack", 12),
    new Student(2L, "Nick", "Stephen", 14),
    new Student(3L, "Bob", "Holden", 14),
    new Student(2L, "Nick", "Stephen", 14)
);
Stream<Student> output = stream.distinct();
output.forEach(System.out::println);
//prints:
//Student(studentId=1, firstName=Bob, lastName=Jack, age=12)
//Student(studentId=2, firstName=Nick, lastName=Stephen, age=14)
//Student(studentId=3, firstName=Bob, lastName=Holden, age=14)
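To see why the equality logic matters, the sketch below contrasts a class that inherits the identity-based Object.equals() with one that overrides equals() and hashCode(). Both nested classes are hypothetical, simplified stand-ins for Student that carry only an id.

```java
import java.util.stream.Stream;

public class EqualityDemo {

    // No equals()/hashCode(): instances are compared by identity.
    static final class PlainStudent {
        final long id;
        PlainStudent(long id) { this.id = id; }
    }

    // equals()/hashCode() based on id: instances are compared by value.
    static final class KeyedStudent {
        final long id;
        KeyedStudent(long id) { this.id = id; }

        @Override
        public boolean equals(Object other) {
            return other instanceof KeyedStudent && ((KeyedStudent) other).id == id;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(id);
        }
    }

    public static long countDistinctPlain() {
        return Stream.of(new PlainStudent(1), new PlainStudent(2), new PlainStudent(2))
            .distinct()
            .count();
    }

    public static long countDistinctKeyed() {
        return Stream.of(new KeyedStudent(1), new KeyedStudent(2), new KeyedStudent(2))
            .distinct()
            .count();
    }

    public static void main(String[] args) {
        System.out.println(countDistinctPlain()); // prints: 3
        System.out.println(countDistinctKeyed()); // prints: 2
    }
}
```

Without the overrides, the two students with id 2 are different objects and both survive distinct().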
Using Stream distinct() by a Particular Field
Sometimes, we cannot modify the equals() method in our custom class, or we want to use a different comparison logic than the one provided by the equals() method.
We can create a wrapper class around our custom object for such cases. The wrapper class will provide our custom comparison logic in the form of its equals() and hashCode() implementations.
Example of using a wrapper class to remove duplicates from a Java Stream based on a specific field.
@Getter
@RequiredArgsConstructor
class StudentWrapper {
    private final Student student;

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof StudentWrapper wrapper2)) {
            return false;
        }
        return wrapper2.student.getFirstName()
            .equals(this.student.getFirstName());
    }

    @Override
    public int hashCode() {
        return student.getFirstName().hashCode();
    }
}
Now, we can map the Stream of our custom objects into a Stream of wrapper objects, call distinct() on it, and then unwrap the remaining elements.
Stream<Student> stream = Stream.of(
    new Student(1L, "Bob", "Jack", 12),
    new Student(2L, "Nick", "Stephen", 14),
    new Student(3L, "Bob", "Holden", 14),
    new Student(2L, "Nick", "Stephen", 14)
);
Stream<Student> output = stream
    .map(StudentWrapper::new)
    .distinct()
    .map(StudentWrapper::getStudent);
output.forEach(System.out::println);
//prints:
//Student(studentId=1, firstName=Bob, lastName=Jack, age=12)
//Student(studentId=2, firstName=Nick, lastName=Stephen, age=14)
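The wrapper idea can also be generalized so that one wrapper serves any field. The sketch below is one possible variation, not the tutorial's own code: the ByKey class and the distinctBy() helper are hypothetical names, and the wrapper delegates equals() and hashCode() to a key produced by a caller-supplied function. Plain strings stand in for Student objects to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

public class KeyWrapperDemo {

    // Hypothetical generic wrapper: equality is delegated to an extracted key.
    static final class ByKey<T> {
        private final T value;
        private final Object key;

        ByKey(T value, Function<? super T, ?> keyExtractor) {
            this.value = value;
            this.key = keyExtractor.apply(value);
        }

        T value() {
            return value;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof ByKey && Objects.equals(key, ((ByKey<?>) other).key);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(key);
        }
    }

    // Wraps each item, lets distinct() compare the keys, then unwraps.
    public static <T> List<T> distinctBy(List<T> items, Function<? super T, ?> keyExtractor) {
        return items.stream()
            .map(item -> new ByKey<T>(item, keyExtractor))
            .distinct()
            .map(ByKey::value)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Bob Jack", "Nick Stephen", "Bob Holden");
        // Deduplicate by first name (the text before the space).
        System.out.println(distinctBy(names, name -> name.split(" ")[0]));
        // prints: [Bob Jack, Nick Stephen]
    }
}
```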
Count Duplicates in a Stream
We have seen how we can remove duplicates from a Stream using the distinct() method. However, sometimes we may wish to count the duplicates. To do that, we can use the toMap() collector.
Example of counting the duplicates in a Stream
Stream<Integer> stream = Stream.of(22, 31, 22, 34, 25, 31, 34);
Map<Integer, Long> map = stream
    .collect(Collectors.toMap(Function.identity(), x -> 1L, Long::sum));
map.entrySet().forEach(System.out::println);
//prints (the Map's iteration order is not guaranteed):
//34=2
//22=2
//25=1
//31=2
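The same counts can be used to find which elements are duplicated, by keeping only the entries whose count is greater than one. This is a minimal sketch built on the toMap() collector shown above; the class and method names are illustrative, and the result is sorted only to make the output deterministic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class FindDuplicatesDemo {

    // Returns the elements that occur more than once, in ascending order.
    public static List<Integer> duplicatesOf(List<Integer> input) {
        // Count occurrences: each element maps to 1, and repeats are summed.
        Map<Integer, Long> counts = input.stream()
            .collect(Collectors.toMap(Function.identity(), x -> 1L, Long::sum));

        return counts.entrySet().stream()
            .filter(entry -> entry.getValue() > 1)
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(duplicatesOf(Arrays.asList(22, 31, 22, 34, 25, 31, 34)));
        // prints: [22, 31, 34]
    }
}
```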
Summary
We learned how to use Java Stream’s distinct() method in different scenarios to remove duplicate elements from a Stream. The distinct() method performs an object’s equality check and returns a new Stream containing the unique elements.
We also learned that the equals() method should provide the equality logic to deduplicate a Stream of custom objects. If we want to remove duplicates from a Stream using specific fields not covered by the equals() method, we can use the wrapper class workaround. Lastly, we learned how to count duplicate elements in a Stream using the toMap() collector.
You can refer to our GitHub Repository for the complete source code of the examples used in this tutorial.