Cascading is a proven application development platform for building data applications on Apache Hadoop. Whether you are solving simple or complex data problems, Cascading balances a high level of abstraction with the degrees of freedom you need, combining a computation engine, a systems integration framework, and data processing and scheduling capabilities.
Cascading Benefits
- Quickly build robust, reliable, data-oriented applications
- Develop testable and reusable integrations, data processing code and algorithms
- Leverage existing best practices, skill sets and tools
KEY FEATURES
THE SECRET SAUCE
WHAT MAKES CASCADING SO EFFECTIVE
Division of Logic
Cascading allows you to develop your business logic separately from your integration logic. Develop complete applications and write unit tests without touching a single Hadoop API. This separation gives you the freedom to move through the application development life cycle at your own pace and to deal with integrating existing systems as a separate concern.
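As a rough illustration of this separation, the sketch below keeps a piece of business logic in a class with no Hadoop or Cascading imports, so it can be exercised entirely in memory. It is plain Java rather than Cascading's own API, and the class and method names (WordCountLogic, countWords) are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical business-logic class: note there are no Hadoop or Cascading imports.
public class WordCountLogic {

    // Pure function: takes lines of text and returns a word -> count map.
    public static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Stand-in for a unit test: runs in memory, with no cluster involved.
    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(List.of("big data", "Big wins"));
        System.out.println(counts); // prints {big=2, data=1, wins=1}
    }
}
```

Because countWords depends only on its input, the same method could later be called from whatever integration code reads and writes the real data.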
Think in Business Terms
Cascading provides a rich API that lets you think in terms of data and business problems, with operations such as sort, average, filter, and merge. The computation engine and process planner convert your business logic into efficient parallel jobs and deliver the optimal plan at run time to your Hadoop installation.
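To give a feel for what thinking in these terms looks like, here is a minimal sketch that merges two inputs, filters out invalid records, averages by key, and returns the result sorted by key. It uses plain Java streams as a stand-in and is not Cascading's API. The SalesReport and Sale names, and the rule that non-positive amounts are invalid, are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SalesReport {

    // Hypothetical record of a single sale.
    public static final class Sale {
        final String region;
        final double amount;

        public Sale(String region, double amount) {
            this.region = region;
            this.amount = amount;
        }
    }

    // Merge two inputs, filter out non-positive amounts, and average per region.
    // The TreeMap keeps the result sorted by region name.
    public static Map<String, Double> averageByRegion(List<Sale> first, List<Sale> second) {
        return Stream.concat(first.stream(), second.stream())   // merge
                .filter(sale -> sale.amount > 0)                 // filter
                .collect(Collectors.groupingBy(
                        sale -> sale.region,
                        TreeMap::new,                             // sort by key
                        Collectors.averagingDouble(sale -> sale.amount))); // average
    }

    public static void main(String[] args) {
        List<Sale> store = List.of(new Sale("east", 10.0), new Sale("west", -5.0));
        List<Sale> online = List.of(new Sale("east", 30.0), new Sale("north", 8.0));
        System.out.println(averageByRegion(store, online)); // prints {east=20.0, north=8.0}
    }
}
```

The code states what should happen to the data and says nothing about how the work is split across machines, which is the part the text above assigns to the computation engine and process planner.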
Systems Integration
Hadoop is never used alone. Cascading makes it easy to read data from a variety of external systems into Hadoop and to write the results out to other systems. The Cascading SDK comes with many pre-built and supported integrations, and many more are provided by the community.
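The idea of reading from one system and writing to another can be sketched by treating the source and the sink as interchangeable parameters. The example below is plain Java and does not use the Cascading SDK. The IntegrationSketch class and its run method are hypothetical, and the in-memory lists stand in for real external systems.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

public class IntegrationSketch {

    // Moves records from any source to any sink, applying the same logic in between.
    public static void run(Supplier<List<String>> source,
                           UnaryOperator<String> logic,
                           Consumer<String> sink) {
        for (String record : source.get()) {
            sink.accept(logic.apply(record));
        }
    }

    public static void main(String[] args) {
        // In-memory stand-ins; a real source or sink might be a database or a file system.
        List<String> results = new ArrayList<>();
        run(() -> List.of("a", "b"), String::toUpperCase, results::add);
        System.out.println(results); // prints [A, B]
    }
}
```

Swapping the source or the sink changes only the arguments passed to run, while the logic in the middle stays the same.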