High Performance and Low-Latency Batch Processing
Cascading applications that require high performance or low-latency batch processing can run on Apache Flink, an open source platform for distributed stream and batch data processing. The Cascading on Flink project was contributed by data Artisans and allows existing Cascading-MapReduce users to port their applications to Apache Flink with virtually no code changes.
About Cascading on Flink
Apache Flink is a replacement for MapReduce that supports large-scale batch workloads and streaming data flows. It eliminates the concept of mappers and reducers and leverages in-memory storage, resulting in significant performance gains over MapReduce.
With Cascading on Flink, Cascading programs take advantage of Flink's unique set of runtime features:
- Flexible network stack which supports low-latency pipelined data transfers as well as batch transfers for massive scale-out.
- Active memory management and a custom serialization stack, which enable highly efficient operations on binary data and effectively prevent JVM OutOfMemoryErrors as well as frequent garbage collection pauses.
- In-memory operators that gracefully go to disk in case of scarce memory resources.
- Memory-safe execution, which means very little parameter tuning is necessary to reliably execute Cascading programs on Flink.
Cascading users can port their MapReduce applications to run on Apache Flink with virtually no code changes.
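To make the idea of in-memory operators that go to disk under memory pressure more concrete, the following is a minimal sketch in plain Java. It is not Flink's implementation and uses no Flink or Cascading API: the class name SpillingSorter, the maxInMemory record budget, and the line-based temporary files are all assumptions made for illustration. Records are sorted in memory until the budget is reached, then written out as a sorted run; the final result merges the runs with whatever is still in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only: a sorter that spills sorted runs to disk
// once an in-memory record budget is exceeded. Records must not contain
// line breaks, because each record is stored as one line of a temp file.
public class SpillingSorter {
    private final int maxInMemory;                       // assumed budget, in records
    private final List<String> buffer = new ArrayList<>();
    private final List<Path> spillFiles = new ArrayList<>();

    public SpillingSorter(int maxInMemory) {
        this.maxInMemory = Math.max(1, maxInMemory);
    }

    public void add(String record) throws IOException {
        buffer.add(record);
        if (buffer.size() >= maxInMemory) {
            spill();
        }
    }

    // Sort the in-memory buffer and write it out as one sorted run.
    private void spill() throws IOException {
        Collections.sort(buffer);
        Path run = Files.createTempFile("spill-", ".txt");
        Files.write(run, buffer);
        spillFiles.add(run);
        buffer.clear();
    }

    public int spillCount() {
        return spillFiles.size();
    }

    // One sorted run (in memory or on disk) with its current smallest record.
    private static final class Run {
        String head;
        final Iterator<String> rest;

        Run(Iterator<String> rest) {
            this.rest = rest;
            this.head = rest.next();
        }
    }

    // Merge all spilled runs with the remaining in-memory records.
    // Intended to be called once; the spill files are deleted afterwards.
    public List<String> sorted() throws IOException {
        Collections.sort(buffer);
        PriorityQueue<Run> heads =
                new PriorityQueue<>(Comparator.comparing((Run r) -> r.head));
        List<BufferedReader> readers = new ArrayList<>();
        if (!buffer.isEmpty()) {
            heads.add(new Run(buffer.iterator()));
        }
        for (Path file : spillFiles) {
            BufferedReader reader = Files.newBufferedReader(file);
            readers.add(reader);
            heads.add(new Run(reader.lines().iterator()));
        }
        List<String> out = new ArrayList<>();
        while (!heads.isEmpty()) {
            Run run = heads.poll();
            out.add(run.head);
            if (run.rest.hasNext()) {
                run.head = run.rest.next();
                heads.add(run);
            }
        }
        for (BufferedReader reader : readers) {
            reader.close();
        }
        for (Path file : spillFiles) {
            Files.delete(file);
        }
        spillFiles.clear();
        return out;
    }
}
```

With a budget of two records, adding five records produces two spilled runs and leaves one record in memory; sorted() then returns all five in order. A production engine would manage memory in bytes rather than record counts and operate on serialized binary data, which this sketch does not attempt.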
Source and Documentation