Data Processing API

Cascading, at its core, is a data processing definition language, implemented as a simple Java API.

This API is based on a pipes-and-filters metaphor, so it provides features for splitting and joining data streams, for applying filters and functions to a stream, and for grouping on keys and aggregating each key's multiple values.
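
To make the metaphor concrete, here is a minimal sketch of the same four ideas (filter, function, grouping on a key, aggregation of each key's values) written with plain java.util.stream. It deliberately does not use Cascading's own classes; the class name PipeAndFilterSketch and the helpers isNotBlank, splitIntoWords, and countWords are hypothetical names chosen only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustration only: plain Java streams, not the Cascading API.
public class PipeAndFilterSketch {

    // Filter: decides whether a record stays in the stream.
    static boolean isNotBlank(String line) {
        return !line.trim().isEmpty();
    }

    // Function: turns one record into zero or more new records.
    static List<String> splitIntoWords(String line) {
        return Arrays.asList(line.trim().toLowerCase().split("\\s+"));
    }

    // Filter, apply a function, group on a key, then aggregate each key's values.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .filter(PipeAndFilterSketch::isNotBlank)          // filter
                .flatMap(line -> splitIntoWords(line).stream())   // function
                .collect(Collectors.groupingBy(
                        word -> word,                             // grouping key
                        TreeMap::new,                             // sorted for stable output
                        Collectors.counting()));                  // aggregation per key
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not", "", "to be");
        System.out.println(countWords(lines)); // prints {be=2, not=1, or=1, to=2}
    }
}
```

The blank line is dropped by the filter, each remaining line is expanded into words by the function, and the grouping step collects the occurrences of each word so that the aggregation can count them.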

When a collection of these functions, filters, groupings, and so on are plugged into one another, the result is what we call a 'pipe assembly', in keeping with the underlying metaphor. A pipe assembly is a definition of the work to be applied to some as yet unspecified data set. In practice, it is just a directed acyclic graph (DAG) of reusable code fragments.
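
The idea that an assembly is only a definition, separate from any data, can be sketched in a few lines of plain Java. This is not how Cascading implements it; the Pipe class, the keep and each helpers, and the run method below are hypothetical stand-ins invented for this illustration. The graph has one head, a split into two branches, and a join, and it is built before any data set is named.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Illustration only: a toy DAG of reusable fragments, not the Cascading API.
public class PipeAssemblySketch {

    // One node of the graph: a named operation plus the nodes that feed it.
    static final class Pipe {
        final String name;
        final Function<List<String>, List<String>> operation;
        final List<Pipe> upstream;

        Pipe(String name, Function<List<String>, List<String>> operation, Pipe... upstream) {
            this.name = name;
            this.operation = operation;
            this.upstream = Arrays.asList(upstream);
        }
    }

    // Reusable fragment: keep only the records matching a predicate.
    static Function<List<String>, List<String>> keep(Predicate<String> test) {
        return records -> records.stream().filter(test).collect(Collectors.toList());
    }

    // Reusable fragment: transform every record.
    static Function<List<String>, List<String>> each(Function<String, String> fn) {
        return records -> records.stream().map(fn).collect(Collectors.toList());
    }

    // Wire the nodes together. No data is involved at this point.
    static Pipe buildAlertAssembly() {
        Pipe head = new Pipe("lines", Function.identity());
        Pipe errors = new Pipe("errors", keep(l -> l.startsWith("ERROR")), head);    // branch 1
        Pipe warnings = new Pipe("warnings", keep(l -> l.startsWith("WARN")), head); // branch 2
        Pipe shouted = new Pipe("shouted", each(String::toUpperCase), warnings);
        return new Pipe("alerts", Function.identity(), errors, shouted);             // join
    }

    // Bind a data set to the head and evaluate the graph up to the given node.
    // Shared upstream nodes are recomputed per branch, which is fine for a sketch.
    static List<String> run(Pipe pipe, List<String> source) {
        List<String> input = new ArrayList<>();
        if (pipe.upstream.isEmpty()) {
            input.addAll(source);
        } else {
            for (Pipe parent : pipe.upstream) {
                input.addAll(run(parent, source));
            }
        }
        return pipe.operation.apply(input);
    }

    public static void main(String[] args) {
        Pipe assembly = buildAlertAssembly(); // defined once
        List<String> log = Arrays.asList("ERROR disk full", "INFO ok", "WARN low memory");
        System.out.println(run(assembly, log)); // prints [ERROR disk full, WARN LOW MEMORY]
    }
}
```

The same assembly object can be run against any other list of lines without being rebuilt, which is the sense in which the definition of the work is independent of the data it is eventually applied to.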