Structured Streaming Guide
Structured Streaming is the Apache Spark API that lets you express computation on streaming data in the same way you express a batch computation on static data. The Spark SQL engine performs the computation incrementally and continuously updates the result as streaming data arrives. For an overview of Structured Streaming, see the Apache Spark Structured Streaming Programming Guide.
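The incremental model described above can be illustrated with a minimal sketch in plain Python. This is not Spark code and does not use any Spark API. The function names (`batch_word_count`, `incremental_word_count`) and the sample input are hypothetical, chosen only to show the idea that the same query logic can run once over static data or be folded repeatedly into a running result as new data arrives.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def batch_word_count(lines: Iterable[str]) -> Counter:
    """Batch computation: count words over a complete, static dataset."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def incremental_word_count(micro_batches: Iterable[List[str]]) -> Iterator[Counter]:
    """Incremental computation: fold each newly arrived micro-batch into the
    running state and yield a snapshot of the updated result."""
    state = Counter()
    for batch in micro_batches:
        # Reuse the batch logic on only the new data, then merge into state.
        state.update(batch_word_count(batch))
        yield Counter(state)  # copy, so earlier snapshots are not mutated


# Hypothetical input, grouped into micro-batches in arrival order.
micro_batches = [
    ["spark streaming", "spark sql"],
    ["streaming data"],
]
all_lines = [line for batch in micro_batches for line in batch]

snapshots = list(incremental_word_count(micro_batches))
final_streaming_result = snapshots[-1]
static_result = batch_word_count(all_lines)
```

After the last micro-batch, `final_streaming_result` equals `static_result`: processing the data incrementally produces the same answer as running the batch computation over all of it at once. In Structured Streaming the engine manages this state and the result updates for you.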
These topics provide introductory notebooks, details on how to use specific types of streaming sources and sinks, how to put streaming into production, and notebooks demonstrating example use cases.
For detailed information on how you can perform complex streaming analytics using Apache Spark, see the posts in this multi-part blog series:
- Real-time Streaming ETL with Structured Streaming
- Working with Complex Data Formats with Structured Streaming
- Processing Data in Apache Kafka with Structured Streaming
- Event-time Aggregation and Watermarking in Apache Spark’s Structured Streaming
- Taking Apache Spark’s Structured Streaming to Production
- Running Streaming Jobs Once a Day For 10x Cost Savings: Part 6 of Scalable Data
- Arbitrary Stateful Processing in Apache Spark’s Structured Streaming
For information about the legacy Spark Streaming feature, see