Cascading on Apache Flink™

High Performance and Low-Latency Batch Processing

Cascading applications that require high-performance or low-latency batch processing can leverage Apache Flink™, the open source platform for distributed stream and batch data processing. This project was contributed by data Artisans and allows existing Cascading-MapReduce users to port their applications to Apache Flink™ with virtually no code changes.

About Cascading on Flink

Apache Flink™ is a replacement for MapReduce that supports both large-scale batch workloads and streaming data flows. It eliminates the concept of mappers and reducers and leverages in-memory storage, resulting in significant performance gains over MapReduce.

With Cascading on Flink™, Cascading programs can take advantage of Flink's unique set of runtime features:

  • Flexible network stack which supports low-latency pipelined data transfers as well as batch transfers for massive scale-out.
  • Active memory management and a custom serialization stack, which enable highly efficient operations on binary data and effectively prevent JVM OutOfMemoryErrors as well as frequent garbage collection pauses.
  • In-memory operators that gracefully go to disk in case of scarce memory resources.
  • Memory-safe execution, which means very little parameter tuning is necessary to reliably execute Cascading programs on Flink™.

Cascading users can port their MapReduce applications to run on Apache Flink™ with virtually no code changes.

“Apache®, Apache Flink™, Flink™ are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.”