From the project site:
“Apache Flink is a platform for scalable stream and batch processing. Flink’s execution engine features low-latency pipelined and scalable batched data transfers and high-performance, in-memory operators for sorting and joining that gracefully go out-of-core in case of scarce memory resources.
Apache Flink uses in-memory storage to achieve massive performance gains over MapReduce. Its active memory management and custom serialization stack enable highly efficient operations on binary data and effectively prevent JVM OutOfMemoryErrors as well as frequent Garbage Collection pauses. Memory-safe execution means very little parameter tuning is necessary to reliably execute Cascading programs on Flink.”
According to data Artisans, Cascading 3.0 applications will run on Apache Flink with virtually no code changes. Their contribution furthers Cascading's promise of portability.
We are very excited to see another alternative for high-performance production deployments made available to our community.
Source code: http://cascading.org/cascading-flink/
Data Artisans blog: http://data-artisans.com/announcing-cascading-on-flink/