Big Data Application Development
Cascading is a Java application framework that enables typical developers to quickly and easily develop rich data analytics and data management applications that can be deployed and managed across a variety of computing environments. Cascading works seamlessly with Apache Hadoop and API-compatible distributions.
Data Processing API
At its core, Cascading is a rich Java API for defining complex data flows and creating sophisticated data-oriented frameworks. These frameworks can be Maven-compatible libraries or Domain Specific Languages (DSLs) for scripting.
Data Integration API
Cascading allows developers to create and test rich functionality before tackling complex integration problems. Thus integration points can be developed and tested before plugging them into a production data flow.
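The separation described above can be sketched in plain Java. This is a minimal illustration, not the Cascading API: the class `FlowSketch` and the methods `countWords` and `runFlow` are hypothetical names, and the word count is only a stand-in for real processing logic. The point is that the processing step can be written and tested against in-memory data first, with the source and sink supplied later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Plain-Java sketch; none of these types are part of the Cascading API.
class FlowSketch {

    // Data processing: pure logic that does not know where the lines come from.
    static Map<String, Long> countWords(Iterable<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    // Data integration: the flow only sees an abstract source and sink,
    // so in-memory stand-ins can later be swapped for production endpoints.
    static void runFlow(Supplier<Iterable<String>> source,
                        Consumer<Map<String, Long>> sink) {
        sink.accept(countWords(source.get()));
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "The lazy dog");
        List<Map<String, Long>> captured = new ArrayList<>();

        // In-memory source and sink stand in for real integration points.
        runFlow(() -> input, captured::add);

        System.out.println(captured.get(0));
    }
}
```

Because `countWords` takes any `Iterable<String>`, it can be exercised in a unit test without any external system, and only `runFlow` needs to change when real endpoints are wired in.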
Process Scheduler API
The Process Scheduler, coupled with the Riffle lifecycle annotations, allows Cascading to schedule units of work from any third-party application.
Cascading was designed to fit into any Enterprise Java development environment. With its clear distinction between “data processing” and “data integration”, its clean Java API, and its JUnit testing framework, Cascading can be easily tested at any scale. Even the core Cascading development team runs 1,500 tests daily on a continuous integration server and deploys all the tested Java libraries into our own public Maven repository, conjars.org.
Because Cascading is Java-based, it fits naturally into all of the available JVM-based languages, notably Scala, Clojure, JRuby, Jython, and Groovy. Within many of these languages, the Cascading community has created scripting and query languages to simplify ad hoc and production-ready analytics and machine learning applications. See the extensions page for more information.