Java 8 Streams API – Tutorial

The Java Streams API provides functional-style operations on streams of elements. Streams offer methods for filtering, transforming, and aggregating data in a concise and efficient way (the details follow in the sections below). Typically, they are used over collections.

Streams make the Java collections framework richer and easier to work with. By chaining stream operations, we can build a pipeline. Streams are available in the java.util.stream package.

Before we go deeper into streams, let us look at the problem they solve and at a term we are going to use very often: the pipeline.

How many times have we written a for loop to iterate through a collection to find the maximum value, or to filter out some values?

These are common problems, and they call for common implementations. Why reinvent them each time? This is where streams come to the rescue, providing operations over data such as filter, map, max, and sum.

You have probably seen such functions in SQL. For example: select max(salary) from employee; Here max is implemented by the database engine; we only express that we need the maximum value. It is up to the engine to apply optimizations, perform the computation in the best possible manner, and return the result. The Streams API brings this declarative style to Java. If the code runs on a multi-core system, streams can also carry out the work in parallel when asked to. We need not worry about how the iteration is performed; the library takes care of it. These are reasons enough to start using the Streams API.
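To make the SQL comparison concrete, here is a minimal sketch of the same "maximum salary" query written with streams. The class name MaxSalaryExample, the method maxSalary, and the salary figures are assumptions made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical example class; the salary figures are made up for illustration.
class MaxSalaryExample {

    // Declarative counterpart of "select max(salary) from employee":
    // we state what we want (the maximum) and leave the iteration to the stream.
    static OptionalInt maxSalary(List<Integer> salaries) {
        return salaries.stream()
                       .mapToInt(Integer::intValue) // unbox to a primitive IntStream
                       .max();                      // empty if the list has no elements
    }

    public static void main(String[] args) {
        List<Integer> salaries = Arrays.asList(4200, 5100, 3900);
        System.out.println(maxSalary(salaries).getAsInt()); // prints 5100
    }
}
```

The result is an OptionalInt rather than an int because an empty list has no maximum.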

Pipeline

A data pipeline consists of various stages running either in parallel or in sequential mode, performing various actions on the data stream.

For example, cat hello.txt | grep "xml" | wc -l is a very simple pipeline. Streams can be used to transform and aggregate the data flowing through such a pipeline.

A stream pipeline consists of three parts: 1) a source (such as a Collection or an I/O channel); 2) zero or more intermediate operations, such as filter and map; 3) a terminal operation, such as Stream.forEach, Stream.reduce, or collect.
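As a rough sketch, the shell pipeline above can be mirrored with a stream, one stage per part. The class name LineCountExample, the method countMatchingLines, and the in-memory lines are assumptions for illustration; a real program would more likely read the lines from a file.

```java
import java.util.Arrays;
import java.util.List;

class LineCountExample {

    // Counts the lines containing "needle", mirroring: cat file | grep needle | wc -l
    static long countMatchingLines(List<String> lines, String needle) {
        return lines.stream()                              // 1) source (plays the role of cat)
                    .filter(line -> line.contains(needle)) // 2) intermediate operation (grep)
                    .count();                              // 3) terminal operation (wc -l)
    }

    public static void main(String[] args) {
        // Hypothetical file contents, held in memory to keep the example self-contained.
        List<String> lines = Arrays.asList("<xml>", "hello", "some xml here");
        System.out.println(countMatchingLines(lines, "xml")); // prints 2
    }
}
```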

Structure of the Stream API

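// Assumes a static import of Collectors.toList and a Comparator<Transaction> named "sorted" defined elsewhere.
List<Integer> userTransIds =
    transactions.stream()
                .filter(t -> t.getUserId() == User.id)  // keep the current user's transactions
                .sorted(sorted)                         // order them with the comparator
                .map(Transaction::getId)                // Transaction -> id
                .collect(toList());                     // gather the ids into a List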
Figure: data pipeline

The source (a collection of Transaction objects) is processed by the filter, sorted, and map operations, and the result is finally collected into a List.
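The pipeline above depends on types that are not shown, so here is a self-contained sketch of the same shape. The Transaction class, its fields, the choice to sort by amount, and the method name transactionIdsForUser are all assumptions for illustration, not the original definitions.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class TransactionPipelineExample {

    // Hypothetical minimal Transaction type; the fields are assumptions for this sketch.
    static final class Transaction {
        private final int id;
        private final int userId;
        private final double amount;

        Transaction(int id, int userId, double amount) {
            this.id = id;
            this.userId = userId;
            this.amount = amount;
        }

        int getId() { return id; }
        int getUserId() { return userId; }
        double getAmount() { return amount; }
    }

    // Source -> filter -> sorted -> map -> collect.
    static List<Integer> transactionIdsForUser(List<Transaction> transactions, int userId) {
        return transactions.stream()
                // keep only the given user's transactions
                .filter(t -> t.getUserId() == userId)
                // ascending by amount (an assumed ordering)
                .sorted(Comparator.comparingDouble(Transaction::getAmount))
                // Transaction -> id
                .map(Transaction::getId)
                // terminal operation: gather the ids into a List
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Transaction> transactions = Arrays.asList(
                new Transaction(1, 7, 250.0),
                new Transaction(2, 8, 10.0),
                new Transaction(3, 7, 99.5));
        System.out.println(transactionIdsForUser(transactions, 7)); // prints [3, 1]
    }
}
```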

Characteristics of Streams

  • No storage: A stream does not store elements; it is not a data structure. It conveys elements from a source such as a data structure, an array, a generator function, or an I/O channel through a pipeline of computational operations.
  • Immutability: Streams are functional in nature and do not change the state of their source. An operation produces a result, such as a new stream or a new collection, and leaves the source untouched. For example, filtering a stream obtained from a list does not remove anything from that list.
  • Lazy evaluation: Streams are lazy and are evaluated only when required. Stream operations are divided into intermediate (Stream-producing) operations and terminal (value- or side-effect-producing) operations. Intermediate operations are always lazy; nothing runs until a terminal operation is invoked.
  • Possibly unbounded: Unlike a collection, a stream need not have a finite size. Short-circuiting operations such as limit(n) or findFirst() allow computations on an infinite stream to complete in finite time.
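The characteristics listed above can be observed in a few lines. This is a small sketch under assumed names (StreamCharacteristicsExample and its two methods are hypothetical); it uses peek only to record which elements a sequential stream actually visits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamCharacteristicsExample {

    // Unbounded source: Stream.iterate never ends by itself; limit(n) truncates it.
    static List<Integer> firstPowersOfTwo(int n) {
        return Stream.iterate(1, x -> x * 2)
                     .limit(n)
                     .collect(Collectors.toList());
    }

    // Lazy evaluation: records which elements are pulled through the pipeline.
    static List<Integer> elementsVisitedBeforeFirstEven(List<Integer> source) {
        List<Integer> visited = new ArrayList<>();
        source.stream()
              .peek(visited::add)       // runs only when the terminal operation pulls an element
              .filter(x -> x % 2 == 0)
              .findFirst();             // short-circuits at the first match
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(firstPowersOfTwo(5)); // prints [1, 2, 4, 8, 16]

        List<Integer> source = Arrays.asList(1, 3, 4, 5, 6);
        System.out.println(elementsVisitedBeforeFirstEven(source)); // prints [1, 3, 4]
        System.out.println(source); // the source is unchanged: [1, 3, 4, 5, 6]
    }
}
```

Elements 5 and 6 are never visited, because findFirst stops as soon as it sees the first even number.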

Example : Compare Traditional Style vs Streams API

Traditional Way of Processing

package com.stackrules.java;
 
import java.util.ArrayList;
import java.util.List;
 
/**
 * Shows how a filtering operation is performed the traditional way.
 * Even a simple filter needs a lot of code.
 */
 
public class CollectionFilteringTradtional {
 
 
    public static void main(String[] args) {
 
        List<Integer> numbers = new ArrayList<Integer>();
        //add elements to the collection
        //There is a much better way to do this using the streams. Will see that later.
        for (int i = 0; i < 10; i++) {
            numbers.add(i);
        }
        //Traditional way for filtering elements: we want to remove odd numbers
        List<Integer> evenNumbers = new ArrayList<Integer>();
        for(Integer n:numbers) {
            if(n%2==0){
                evenNumbers.add(n);
            }
        }
        System.out.println("Filtered numbers ( the even numbers) "+evenNumbers);
        //Too much code and we have to define another collection.
    }
}

From the above code, it is evident how tedious the process is. We need to write a for loop to iterate over the elements, implement the filtering logic, and add the matching values to a new collection. That is a lot of boilerplate for a simple task. Observe how streams let us write neat, compact, more functional-style code in the example below.

Better Way using Streams API

package com.stackrules.java;
 
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
 
/**
 * Shows how the java.util.stream API makes operations on collections easier to write.
 */
 
public class CollectionFilteringStreamsWay {
 
    public static void main(String[] args) {
 
        List<Integer> numbers = new ArrayList<Integer>();
        //add elements to the collection
        //There is a much better way to do this using the streams. Will see that later.
        for (int i = 0; i < 10; i++) {
            numbers.add(i);
        }
        //returns even numbers
        List<Integer> evenNumbers = numbers.stream().filter(x->x%2==0).collect(Collectors.toList());
        System.out.println("Filtered numbers ( the even numbers)"+evenNumbers);
 
    }
}

Creating Stream Sources

If you already have a collection, you can obtain a stream from it by calling its stream() method, which the Collection interface declares as Stream<E> stream().

List<Integer> numbers = new ArrayList<Integer>();
// Obtain a stream over the collection
Stream<Integer> numberStream = numbers.stream();
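One practical consequence worth sketching: every call to stream() returns a fresh stream, and a given stream can be consumed only once. The class and method names below (CollectionStreamExample, countEven, reuseFails) are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class CollectionStreamExample {

    // Each call to stream() creates a fresh stream over the same collection.
    static long countEven(List<Integer> numbers) {
        return numbers.stream().filter(n -> n % 2 == 0).count();
    }

    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean reuseFails(List<Integer> numbers) {
        Stream<Integer> stream = numbers.stream();
        stream.count();            // first terminal operation consumes the stream
        try {
            stream.count();        // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;           // a stream can be operated upon only once
        }
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(0, 1, 2, 3, 4);
        System.out.println(countEven(numbers));   // prints 3
        System.out.println(reuseFails(numbers));  // prints true
    }
}
```

To run a second computation, call stream() on the collection again rather than keeping the old stream around.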

Using Stream builder

The java.util.stream package provides builders for creating in-memory streams of different types. IntStream, LongStream, and DoubleStream are a few common ones.

IntStream.builder().add(10).add(20).build();
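A built stream is used like any other. The sketch below sums the stream built above and, as one possible replacement for the for loop that filled the list in the earlier examples, generates the numbers 0 to 9 with IntStream.range. The class and method names are assumptions for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class StreamBuilderExample {

    // Builds a small in-memory IntStream element by element and sums it.
    static int sumOfBuiltStream() {
        IntStream stream = IntStream.builder().add(10).add(20).build();
        return stream.sum();
    }

    // Produces the list 0..9 without an explicit for loop.
    static List<Integer> zeroToNine() {
        return IntStream.range(0, 10)   // 0 inclusive, 10 exclusive
                        .boxed()        // IntStream -> Stream<Integer>
                        .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sumOfBuiltStream()); // prints 30
        System.out.println(zeroToNine());       // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```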

Parallelism

None of the examples so far shows how to perform operations on a stream in parallel. How do you parallelize the code?

To execute in parallel, call parallelStream() instead of stream(). The Collection interface provides both Collection.stream() and Collection.parallelStream(); the second returns a possibly parallel stream over the same elements.

The data in the collection is then processed in parallel. Only in the terminal step, that is in an aggregate, collect, or more abstractly any reduce operation, are the partial results brought together. All the steps prior to the reduce operation run in parallel.
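A minimal sketch of the switch from sequential to parallel execution follows. The class name ParallelStreamExample and the method sumOfEvenSquares are assumptions for illustration. The lambdas are stateless, which is what makes them safe to run in parallel; whether parallel execution is actually faster depends on the data size and the work per element.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelStreamExample {

    // Sums the squares of the even numbers, sequentially or in parallel.
    static long sumOfEvenSquares(List<Integer> numbers, boolean parallel) {
        return (parallel ? numbers.parallelStream() : numbers.stream())
                .filter(n -> n % 2 == 0)       // may run on several threads
                .mapToLong(n -> (long) n * n)  // may run on several threads
                .sum();                        // reduce step: partial sums are combined
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.rangeClosed(1, 1000)
                                         .boxed()
                                         .collect(Collectors.toList());
        System.out.println(sumOfEvenSquares(numbers, false)); // prints 167167000
        System.out.println(sumOfEvenSquares(numbers, true));  // prints 167167000
    }
}
```

Both calls give the same result because addition is associative, so the order in which partial sums are combined does not matter.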
