Cascading

Please note that all new project news and releases have moved to https://cascading.wensel.net

The Cascading Ecosystem is a collection of applications, languages, and APIs for developing data-intensive applications.

At the core of the ecosystem is Cascading, which combines a Java API for defining complex data flows and integrating them with back-end systems, and a query planner that maps logical flows onto a computing platform and executes them there.

Numerous extensions to Cascading provide integrations with popular systems, testing frameworks, and tools that leverage Cascading.

Sitting on top of the Cascading API are languages and tools that simplify the development of data-intensive applications. For Scala developers, see Scalding. For Clojure developers, see Cascalog. For SQL developers, see Lingual. Java developers can use the raw Cascading API directly or Fluid, a fluent interface to it.

Sitting below the Cascading query planner are platform providers and rules for mapping data flows onto a given platform like Apache Hadoop, Apache Tez, Apache Flink, or simply locally in memory (suitable for many streaming applications).
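The split between a logical flow and the planner that maps it onto a platform can be illustrated with a minimal sketch in plain Java. This is not the Cascading API: the names FlowSketch, LogicalFlow, Step, Planner, and LocalPlanner are hypothetical, chosen only to show the idea of describing a flow once and leaving the choice of platform (here, local in-memory execution) to a separate planner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical illustration only; none of these types belong to Cascading.
public class FlowSketch {

    // One logical step: a name plus a transformation from one record to zero or more records.
    static final class Step {
        final String name;
        final Function<String, List<String>> fn;

        Step(String name, Function<String, List<String>> fn) {
            this.name = name;
            this.fn = fn;
        }
    }

    // The logical flow only records what should happen, not where or how it runs.
    static final class LogicalFlow {
        final List<Step> steps = new ArrayList<>();

        LogicalFlow each(String name, Function<String, List<String>> fn) {
            steps.add(new Step(name, fn));
            return this;
        }
    }

    // A planner maps a logical flow onto a platform. Other implementations
    // could, in principle, target a cluster instead of local memory.
    interface Planner {
        Function<List<String>, List<String>> plan(LogicalFlow flow);
    }

    // Local in-memory planner: applies each step, in order, to every record.
    static final class LocalPlanner implements Planner {
        @Override
        public Function<List<String>, List<String>> plan(LogicalFlow flow) {
            return input -> {
                List<String> current = input;
                for (Step step : flow.steps) {
                    List<String> next = new ArrayList<>();
                    for (String record : current) {
                        next.addAll(step.fn.apply(record));
                    }
                    current = next;
                }
                return current;
            };
        }
    }

    // Defines a two-step flow (split lines into words, lowercase them) and runs it locally.
    static List<String> runLocally(List<String> lines) {
        LogicalFlow flow = new LogicalFlow()
            .each("split", line -> Arrays.asList(line.trim().split("\\s+")))
            .each("lowercase", word -> Collections.singletonList(word.toLowerCase(Locale.ROOT)));
        return new LocalPlanner().plan(flow).apply(lines);
    }

    public static void main(String[] args) {
        // Prints [hello, cascading, hello, flows]
        System.out.println(runLocally(Arrays.asList("Hello Cascading", "hello flows")));
    }
}
```

Because the flow definition never refers to the planner, the same LogicalFlow could be handed to a different Planner implementation without changing the steps themselves.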

Learn more from the User Guide, the most recent Cascading and Scalding books, or the tutorials and example applications. To learn about Cascading internals, see this post on the 3.x query planner.

Welcome to the Cascading Ecosystem

Recent News

Cascading 3.3.0 Released

Cascading 3.3 WIP

Cascading 4 Adds Native JSON Support

Cascading 4 for Streaming with Amazon S3 and Apache Kafka

Concurrent/Driven acquired by Xplenty. What does this mean for Cascading?

Cascading 3.2.0 Released

Cascading 3.1 Release