Apache Flink
Big Data Stream Processing
Tilmann Rabl
Berlin Big Data Center
www.dima.tu-berlin.de | bbdc.berlin | [email protected]
XLDB – 11.10.2017
© 2013 Berlin Big Data Center • All Rights Reserved
© DIMA 2017
Agenda
Disclaimer: I am neither a Flink developer nor affiliated with data Artisans.
Agenda
Flink Primer
• Background & APIs (-> Polystore functionality)
• Execution Engine
• Some key features
Stream Processing with Apache Flink
• Key features
With slides from data Artisans, Volker Markl, Asterios Katsifodimos

Flink Timeline
Stratosphere: General Purpose Programming + Database Execution
Draws on Database Technology
• Relational Algebra
• Declarativity
• Query Optimization
• Robust Out-of-core
Draws on MapReduce Technology
• Scalability
• User-defined Functions
• Complex Data Types
• Schema on Read
Adds
• Iterations
• Advanced Dataflows
• General APIs
• Native Streaming

The APIs
• Analytics: Stream SQL, Table API (dynamic tables)
• Stream- & Batch Processing: DataStream API (streams, windows)
• Stateful Event-Driven Applications: Process Function (events, state, time)

Process Function
    class MyFunction extends ProcessFunction[MyEvent, Result] {

      // declare state to use in the program
      lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext().getState(…)

      override def processElement(event: MyEvent, ctx: Context, out: Collector[Result]): Unit = {
        // work with event and state
        (event, state.value) match { … }

        out.collect(…)   // emit events
        state.update(…)  // modify state

        // schedule a timer callback
        ctx.timerService.registerEventTimeTimer(event.timestamp + 500)
      }

      override def onTimer(timestamp: Long, ctx: OnTimerContext, out: Collector[Result]): Unit = {
        // handle callback when event-/processing- time instant is reached
      }
    }

Data Stream API
    // read raw lines from Kafka
    val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer09[String](…))

    // parse each line into an event
    val events: DataStream[Event] = lines.map((line) => parse(line))

    // aggregate per sensor over 5-second windows
    // (assumes MyAggregationFunction implements AggregateFunction)
    val stats: DataStream[Statistic] = events
      .keyBy("sensor")
      .timeWindow(Time.seconds(5))
      .aggregate(new MyAggregationFunction())

    // write the results to rolling files
    stats.addSink(new RollingSink(path))

Table API & Stream SQL
What can I do with it?
• Batch processing
• Machine Learning at scale
• Stream processing
• Graph Analysis
• Complex event processing
Flink: An engine that can natively support all these workloads.

Flink in the Analytics Ecosystem
• Applications & Languages: Hive, Cascading, Mahout, Pig, Giraph, Crunch
• Data processing engines: MapReduce, Spark, Flink, Tez, Storm
• App and resource management: Yarn, Mesos
• Storage, streams: HDFS, HBase, Kafka, …

Where in my cluster does Flink fit?
Gathering
• Server logs
• Trxn logs
• Sensor logs
• Upstream systems
Integration
• Gather and backup streams
• Offer streams for consumption
• Provide stream recovery
Analysis
• Analyze and correlate streams
• Create derived streams and state
• Provide these to upstream systems

Architecture
• Hybrid MapReduce and MPP database runtime
• Pipelined/Streaming engine
  – Complete DAG deployed
[Figure: Job Manager coordinating Worker 1, Worker 2, Worker 3, Worker 4]

Flink Execution Model
• Flink program = DAG* of operators and intermediate streams
• Operator = computation + state
• Intermediate streams = logical stream of records

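The "operator = computation + state" idea can be sketched in plain Scala, without any Flink dependency. All names here (`ExecutionModelSketch`, `Operator`, `Filter`, `CountPerKey`, `run`) are hypothetical and not part of the Flink API; the sketch chains only two operators rather than a general DAG.

```scala
// Plain-Scala illustration; none of these types belong to Flink.
object ExecutionModelSketch {
  // An operator consumes one record at a time and emits zero or more records.
  trait Operator[In, Out] {
    def process(record: In): Seq[Out]
  }

  // Stateless operator: pure computation.
  final class Filter[A](keep: A => Boolean) extends Operator[A, A] {
    def process(record: A): Seq[A] = if (keep(record)) Seq(record) else Seq.empty
  }

  // Stateful operator: computation plus state that survives across records.
  final class CountPerKey extends Operator[String, (String, Long)] {
    private var counts = Map.empty[String, Long] // operator state
    def process(key: String): Seq[(String, Long)] = {
      val updated = counts.getOrElse(key, 0L) + 1L
      counts = counts.updated(key, updated)
      Seq(key -> updated)
    }
  }

  // The records flowing from `first` to `second` form the intermediate stream.
  def run[A, B, C](source: Seq[A], first: Operator[A, B], second: Operator[B, C]): Seq[C] =
    source.flatMap(first.process).flatMap(second.process)

  def main(args: Array[String]): Unit =
    println(run(Seq("a", "b", "a", "", "a"), new Filter[String](_.nonEmpty), new CountPerKey))
}
```

The filter drops the empty key, and the counter emits an updated count for every remaining record.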
Technology inside Flink
    case class Path (from: Long, to: Long)

    val tc = edges.iterate(10) {
      paths: DataSet[Path] =>
        val next = paths
          .join(edges)
          .where("to")
          .equalTo("from") {
            (path, edge) => Path(path.from, edge.to)
          }
          .union(paths)
          .distinct()
        next
    }
[Figure: Pre-flight (Client): Program, Type extraction stack, Cost-based optimizer, Dataflow Graph. Master: Task scheduling, Recovery metadata; deploy operators, track intermediate results. Workers: Memory manager, Out-of-core algorithms, Batch & streaming, State & checkpoints. Example plan: DataSource lineitem.tbl, DataSource orders.tbl, Filter, Map, hash-part [0], Join Hybrid Hash (build HT, probe), forward, GroupRed sort]

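To see what the bounded iteration above computes, here is a minimal sketch on in-memory Scala sets instead of a Flink `DataSet`. The names `TransitiveClosureSketch`, `step` and `closure` are hypothetical; only `Path` comes from the slide.

```scala
// In-memory analogue of the iterate(10) program; no Flink types involved.
object TransitiveClosureSketch {
  final case class Path(from: Long, to: Long)

  // One step: join known paths with edges on paths.to == edges.from,
  // then union with the known paths. Set semantics provide distinct().
  def step(paths: Set[Path], edges: Set[Path]): Set[Path] = {
    val extended = for {
      p <- paths
      e <- edges
      if p.to == e.from
    } yield Path(p.from, e.to)
    extended union paths
  }

  // Apply the step function a fixed number of times, starting from the edges.
  def closure(edges: Set[Path], maxIterations: Int): Set[Path] =
    (1 to maxIterations).foldLeft(edges)((paths, _) => step(paths, edges))

  def main(args: Array[String]): Unit = {
    val edges = Set(Path(1, 2), Path(2, 3), Path(3, 4))
    println(closure(edges, 10))
  }
}
```

The edge set is the same in every step, which is why an engine can treat it as loop-invariant data.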
Rich set of operators
Map, Reduce, Join, CoGroup, Union, Iterate, Delta Iterate, Filter, FlatMap, GroupReduce, Project, Aggregate, Distinct, Vertex-Update, Accumulators, …

Effect of optimization
• Hash vs. Sort
• Partition vs. Broadcast
• Caching
• Reusing partition/sort
Execution Plan A: Run on a sample on the laptop
Execution Plan B: Run on large files on the cluster
Execution Plan C: Run a month later after the data evolved

Flink Optimizer
• What you write is not what is executed
• No need to hardcode execution strategies
• Flink Optimizer decides:
  – Pipelines and dam/barrier placement
  – Sort- vs. hash- based execution
  – Data exchange (partition vs. broadcast)
  – Data partitioning steps
  – In-memory caching
[Figure: optimized plan for Transitive Closure. Iterate with step function over paths and new Paths (replace), input from HDFS; Hybrid Hash Join, Union, Distinct on [1], Group Reduce (Sorted (on [0])); Hash Partition on [0], Hash Partition on [1], Forward; Co-locate JOIN + UNION, Co-locate DISTINCT + JOIN; loop-invariant data cached in memory]

Scale Out

Stream Processing with Flink

8 Requirements of Big Streaming
• Keep the data moving
  – Streaming architecture
• Declarative access
  – E.g. StreamSQL, CQL
• Handle imperfections
  – Late, missing, unordered items
• Predictable outcomes
  – Consistency, event time
• Integrate stored and streaming data
  – Hybrid stream and batch
• Data safety and availability
  – Fault tolerance, durable state
• Automatic partitioning and scaling
  – Distributed processing
• Instantaneous processing and response
The 8 Requirements of Real-Time Stream Processing – Stonebraker et al. 2005

8 Requirements of Streaming Systems
• Keep the data moving
  – Streaming architecture
• Declarative access
  – E.g. StreamSQL, CQL
• Handle imperfections
  – Late, missing, unordered items
• Predictable outcomes
  – Consistency, event time
• Integrate stored and streaming data
  – Hybrid stream and batch
  – see StreamSQL
• Data safety and availability
  – Fault tolerance, durable state
• Automatic partitioning and scaling
  – Distributed processing
• Instantaneous processing and response
The 8 Requirements of Real-Time Stream Processing – Stonebraker et al. 2005

How to keep data moving?
Discretized Streams (mini-batch)
    while (true) {
      // get next few records
      // issue batch computation
    }
[Figure: stream discretizer feeding a sequence of jobs: Job, Job, Job, Job]
Native streaming
    while (true) {
      // process next record
    }
[Figure: long-standing operators]

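The two loops above can be contrasted with a small plain-Scala sketch. This is an illustration only: `KeepDataMovingSketch`, `miniBatch` and `nativeStreaming` are hypothetical names, and a running sum stands in for an arbitrary computation.

```scala
// Plain-Scala contrast of the two processing styles; no Flink or Spark involved.
object KeepDataMovingSketch {
  // Mini-batch: collect the next few records, then issue one computation per batch.
  // Results become visible only once a batch is complete.
  def miniBatch(records: Iterator[Int], batchSize: Int): List[Int] =
    records.grouped(batchSize).map(_.sum).toList

  // Native streaming: a long-standing operator keeps state and
  // emits an updated result for every single record.
  def nativeStreaming(records: Iterator[Int]): List[Int] = {
    var runningSum = 0 // operator state lives as long as the job
    records.map { r => runningSum += r; runningSum }.toList
  }

  def main(args: Array[String]): Unit = {
    println(miniBatch(Iterator(1, 2, 3, 4, 5), batchSize = 2))
    println(nativeStreaming(Iterator(1, 2, 3, 4, 5)))
  }
}
```

In the mini-batch variant the latency of a result is bounded below by the batch interval, while the per-record variant can respond after each record.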
Declarative Access – Stream SQL
Stream / Table Duality
• Table without Primary Key
• Table with Primary Key

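One way to read the two cases on this slide, sketched in plain Scala: a stream interpreted as a table without a primary key appends every record as a new row, while a stream interpreted as a table with a primary key upserts the row for each key. This reading is an assumption about the missing figure, and `StreamTableDualitySketch`, `Update`, `appendTable` and `upsertTable` are hypothetical names.

```scala
// Plain-Scala illustration of stream/table duality; not the Table API.
object StreamTableDualitySketch {
  final case class Update(key: String, value: Int)

  // Table without primary key: every stream record becomes a new row.
  def appendTable(stream: Seq[Update]): Vector[Update] = stream.toVector

  // Table with primary key: each record inserts or overwrites the row for its key.
  def upsertTable(stream: Seq[Update]): Map[String, Int] =
    stream.foldLeft(Map.empty[String, Int])((table, u) => table.updated(u.key, u.value))

  def main(args: Array[String]): Unit = {
    val stream = Seq(Update("a", 1), Update("b", 2), Update("a", 3))
    println(appendTable(stream)) // three rows
    println(upsertTable(stream)) // two rows, the latest value per key
  }
}
```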
Handle Imperfections - Event Time et al.
• Event time
  – Data item production time
• Ingestion time
  – System time when data item is received
• Processing time
  – System time when data item is processed
• Typically, these do not match!
• In practice, streams are unordered!
Image: Tyler Akidau

Time: Event Time Example
Event Time:       Episode IV   Episode V   Episode VI   Episode I   Episode II   Episode III   Episode VII
Processing Time:  1977         1980        1983         1999        2002         2005          2015

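The example can be replayed in a few lines of plain Scala: the episode number plays the role of event time and the year plays the role of processing time. `EventTimeSketch` and `Release` are hypothetical names used only for this illustration.

```scala
// Episode number = event time, year = processing time (values from the slide).
object EventTimeSketch {
  final case class Release(episode: Int, year: Int)

  val arrivals: List[Release] = List(
    Release(4, 1977), Release(5, 1980), Release(6, 1983),
    Release(1, 1999), Release(2, 2002), Release(3, 2005), Release(7, 2015)
  )

  // Order in which the system sees the items.
  def byProcessingTime(rs: List[Release]): List[Int] = rs.sortBy(_.year).map(_.episode)

  // Order in which the items were produced.
  def byEventTime(rs: List[Release]): List[Int] = rs.sortBy(_.episode).map(_.episode)

  def main(args: Array[String]): Unit = {
    println(byProcessingTime(arrivals)) // out of order with respect to event time
    println(byEventTime(arrivals))
  }
}
```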
Flink’s Windowing
• Windows can be any combination of (multiple) triggers & evictions
  – Arbitrary tumbling, sliding, session, etc. windows can be constructed.
• Common triggers/evictions are part of the API
  – Time (processing vs. event time), Count
• Even more flexibility: define your own UDF trigger/eviction
• Examples:
    dataStream.windowAll(TumblingEventTimeWindows.of(Time.seconds(5)));
    dataStream.keyBy(0).window(TumblingEventTimeWindows.of(Time.seconds(5)));
• Flink will handle event time, ordering, etc.

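What a keyed tumbling event-time window computes can be sketched in plain Scala. This is a simplified model, not Flink's implementation: it ignores triggers, evictions and watermarks, assumes non-negative timestamps, and all names (`TumblingWindowSketch`, `Event`, `windowStart`, `sumPerKeyAndWindow`) are hypothetical.

```scala
// Simplified model of keyBy(...).window(TumblingEventTimeWindows.of(size)).
object TumblingWindowSketch {
  final case class Event(key: String, timestampMs: Long, value: Int)

  // Start of the tumbling window that contains the given event timestamp.
  def windowStart(timestampMs: Long, sizeMs: Long): Long =
    timestampMs - (timestampMs % sizeMs)

  // Sum of values per (key, window start). The result depends only on the
  // event timestamps, not on the order in which the events arrive.
  def sumPerKeyAndWindow(events: Seq[Event], sizeMs: Long): Map[(String, Long), Int] =
    events
      .groupBy(e => (e.key, windowStart(e.timestampMs, sizeMs)))
      .map { case (keyAndWindow, es) => keyAndWindow -> es.map(_.value).sum }

  def main(args: Array[String]): Unit = {
    val events = Seq(Event("a", 5000L, 3), Event("a", 1000L, 1), Event("b", 1200L, 10), Event("a", 4999L, 2))
    println(sumPerKeyAndWindow(events, sizeMs = 5000L))
  }
}
```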
Example Analysis: Windowed Aggregation
    (1) val windowedStream = stockStream.window(Time.of(10, SECONDS)).every(Time.of(5, SECONDS))
    (2) val lowest = windowedStream.minBy("price")
    (3) val maxByStock = windowedStream.groupBy("symbol").maxBy("price")
    (4) val rollingMean = windowedStream.groupBy("symbol").mapWindow(mean _)
Results:
(1) StockPrice(SPX, 2113.9) StockPrice(FTSE, 6931.7) StockPrice(HDP, 23.8) StockPrice(HDP, 26.6)
(2) StockPrice(HDP, 23.8)
(3) StockPrice(SPX, 2113.9) StockPrice(FTSE, 6931.7) StockPrice(HDP, 26.6)
(4) StockPrice(SPX, 2113.9) StockPrice(FTSE, 6931.7) StockPrice(HDP, 25.2)

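The results (2) to (4) can be checked by hand on the contents of one window. The sketch below does this with plain Scala collections rather than the windowed-stream API; `WindowedAggregationSketch` and its method names are hypothetical, and a non-empty window is assumed.

```scala
// Recomputes the slide's results for a single window using plain collections.
object WindowedAggregationSketch {
  final case class StockPrice(symbol: String, price: Double)

  // Contents of one window, as in result (1).
  val window: List[StockPrice] = List(
    StockPrice("SPX", 2113.9), StockPrice("FTSE", 6931.7),
    StockPrice("HDP", 23.8), StockPrice("HDP", 26.6)
  )

  // (2) lowest price in the window (window must be non-empty).
  def lowest(w: Seq[StockPrice]): StockPrice = w.minBy(_.price)

  // (3) maximum price per symbol.
  def maxByStock(w: Seq[StockPrice]): Map[String, Double] =
    w.groupBy(_.symbol).map { case (s, ps) => s -> ps.map(_.price).max }

  // (4) mean price per symbol.
  def meanByStock(w: Seq[StockPrice]): Map[String, Double] =
    w.groupBy(_.symbol).map { case (s, ps) => s -> ps.map(_.price).sum / ps.size }

  def main(args: Array[String]): Unit = {
    println(lowest(window))
    println(maxByStock(window))
    println(meanByStock(window))
  }
}
```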
Data Safety and Availability
• Ensure that operators see all events
  – “At least once”
  – Solved by replaying a stream from a checkpoint
  – Not sufficient for correct results: replayed events can be applied twice
• Ensure that operators do not perform duplicate updates to their state
  – “Exactly once”
  – Several solutions
• Ensure the job can survive failure

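The difference between the two guarantees can be illustrated with a counting operator that fails once and is then replayed from its last checkpoint. This is a deliberately simplified sketch under stated assumptions: the state is a single counter, there is exactly one failure, and `DeliveryGuaranteeSketch`, `process` and `countAfterRecovery` are hypothetical names.

```scala
// Simplified model: one counting operator, one checkpoint, one failure.
object DeliveryGuaranteeSketch {
  // The operator state is a simple event counter.
  def process(count: Int, events: Seq[String]): Int =
    events.foldLeft(count)((c, _) => c + 1)

  // restoreState = true  -> state is rolled back to the checkpoint before replay
  // restoreState = false -> state keeps the updates made after the checkpoint
  def countAfterRecovery(events: Seq[String], checkpointAt: Int, failAt: Int, restoreState: Boolean): Int = {
    require(0 <= checkpointAt && checkpointAt <= failAt && failAt <= events.size)
    val snapshot  = process(0, events.take(checkpointAt))                  // state stored with the checkpoint
    val atFailure = process(snapshot, events.slice(checkpointAt, failAt)) // state when the job fails
    val resumed   = if (restoreState) snapshot else atFailure
    process(resumed, events.drop(checkpointAt))                           // replay from the checkpoint offset
  }

  def main(args: Array[String]): Unit = {
    val events = Seq.fill(10)("e")
    println(countAfterRecovery(events, checkpointAt = 4, failAt = 7, restoreState = false))
    println(countAfterRecovery(events, checkpointAt = 4, failAt = 7, restoreState = true))
  }
}
```

Without restoring the state, the three events between checkpoint and failure are counted twice; with the state restored, every event affects the counter exactly once.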
Lessons Learned from Batch
[Figure: stream divided into batch-1, batch-2]
• If a batch computation fails, simply repeat computation as a transaction
• Transaction rate is constant
• Can we apply these principles to a true streaming execution?

Taking Snapshots – the naïve way
[Figure: execution snapshots at t1, t2]
Initial approach (e.g., Naiad)
• Pause execution on t1,t2,..
• Collect state
• Restore execution

Asynchronous Snapshots in Flink
[Figure: snapshotting at t1 and t2; snap - t1, snap - t2]
• Propagating markers/barriers
• Full or incremental

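The idea of propagating markers can be sketched for a single operator in plain Scala: barriers travel in the stream alongside the records, and when the operator sees a barrier it records a copy of its current state and keeps processing. This sketch leaves out barrier alignment across multiple inputs and the asynchronous write of the state copy; `BarrierSnapshotSketch`, `Element`, `Record`, `Barrier` and `runOperator` are hypothetical names.

```scala
// Single-operator model of barrier-based snapshots; not Flink's implementation.
object BarrierSnapshotSketch {
  sealed trait Element
  final case class Record(value: Int) extends Element
  final case class Barrier(snapshotId: Int) extends Element

  // Runs a summing operator over the input.
  // Returns the final state and the state captured at each barrier.
  def runOperator(input: Seq[Element]): (Int, Map[Int, Int]) =
    input.foldLeft((0, Map.empty[Int, Int])) {
      case ((sum, snapshots), Record(v))   => (sum + v, snapshots)             // normal processing
      case ((sum, snapshots), Barrier(id)) => (sum, snapshots.updated(id, sum)) // capture state, keep going
    }

  def main(args: Array[String]): Unit = {
    val input = Seq(Record(1), Record(2), Barrier(1), Record(3), Barrier(2), Record(4))
    println(runOperator(input))
  }
}
```

Each snapshot reflects exactly the records that preceded its barrier, without pausing the stream.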
Conclusion
Apache Flink! The case for Flink as a stream processor
• Ideal basis for polystore computations
• Full-featured big data streaming engine

Thank You
Contact: Tilmann Rabl
[email protected]