Extensions

Cascading Community Projects


The Cascading ecosystem includes support for a variety of programming languages, data sources, serializers, and tools that extend the functionality of Cascading applications. These extensions are contributed by both Concurrent and the Cascading community, and many of them are available through the Cascading GitHub organization and the Conjars Maven repository.

Note: Most projects are hosted on GitHub and may have multiple branches and forks as users enrich the original projects. Many are also under active development.

Supported Languages

These projects let you build Cascading applications in a variety of languages, bringing each language's domain-specific features and idioms to Cascading.

Language | Description | License
Clojure | Clojure for Cascading | Apache 2.0
Java | From Concurrent, the proven framework for building enterprise data applications | Apache 2.0
JRuby | From Etsy, JRuby for Cascading | LGPL 3
PMML | From Concurrent, PMML for Cascading | Apache 2.0
PMML | From Openscoring, PMML for Cascading | AGPL 3.0
Python | From Twitter, Python for Cascading | Apache 2.0
Scala | From Twitter, Scala for Cascading | Apache 2.0
SQL | From Concurrent, an ANSI SQL shell and JDBC driver for migrating workloads and moving data onto and off of Hadoop | Apache 2.0

Data Source Connectivity (Taps)

In Cascading, a tap represents a physical data resource, such as a file or a database table. Taps act as the sources (inputs) and sinks (outputs) of a Cascading flow.

Data Source | Description | License
Accumulo | Accumulo data source for Cascading | Apache 2.0
Cassandra | Cassandra data source for Cascading | Apache 2.0, Eclipse
Derby | Derby data source for Cascading via JDBC | Apache 2.0
Elasticsearch | Elasticsearch data source for Cascading | Apache 2.0
ElephantDB | ElephantDB data source for Cascading | Custom
H2 | H2 data source for Cascading via JDBC | Apache 2.0
HBase | HBase data source for Cascading | Apache 2.0
Hive | Integrate and run Hive in Cascading | Apache 2.0
Hive | Hive data source for Cascading | Apache 2.0
JDBC | From Concurrent, reads and writes data in an RDBMS via JDBC drivers | Apache 2.0
Memcached | Memcached data source for Cascading | Apache 2.0
MongoDB | MongoDB data source for Cascading | Apache 2.0
MySQL | MySQL data source for Cascading via JDBC | Apache 2.0
Neo4j | Neo4j data source for Cascading | Apache 2.0
Oracle | Oracle data source for Cascading via JDBC | Apache 2.0
Parquet | Parquet data source for Cascading | Apache 2.0
PostgreSQL | PostgreSQL data source for Cascading via JDBC | Apache 2.0
Redshift | Amazon Redshift data source for Cascading via JDBC | Apache 2.0
SimpleDB | From Scale Unlimited, SimpleDB data source for Cascading | Apache 2.0
Solr | From Scale Unlimited, Solr data source for Cascading | Custom
Splunk | Splunk data source for Cascading | Apache 2.0
Teradata | Teradata data source for Cascading via JDBC | Apache 2.0

Serializers

Serializers integrate with Cascading by translating data objects into formats that can be stored or transmitted and later reconstructed.

Serializer | Description | License
Avro | From Scale Unlimited, data serialization for Apache Avro | Apache 2.0
JSON | JavaScript Object Notation (JSON) utility classes for Cascading | GNU
Kryo | Provides a drop-in Kryo serialization for your Cascading (or Hadoop) workflow | Eclipse
Protocol Buffers | From Square, library for working with Protocol Buffers | MIT
Thrift | Serializer and raw comparator for using TBase and TEnum objects in Hadoop | Custom
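The serializers above differ in wire format, but each performs the same round trip: write an object's fields out as bytes, then read them back in the same order to rebuild the object. As a minimal sketch of that idea, using only the JDK, the hypothetical `WordCount` class below is encoded with `DataOutputStream` and decoded with `DataInputStream`. This is not the API of any project listed here, nor Hadoop's own serialization interface; real serializers such as Avro or Kryo add features like schemas or class registration on top of this basic pattern.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

public class WordCountSerializer {

    // Hypothetical record type, used only to illustrate the round trip.
    public static final class WordCount {
        public final String word;
        public final long count;

        public WordCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof WordCount)) {
                return false;
            }
            WordCount other = (WordCount) o;
            return word.equals(other.word) && count == other.count;
        }

        @Override
        public int hashCode() {
            return Objects.hash(word, count);
        }

        @Override
        public String toString() {
            return word + "=" + count;
        }
    }

    // Translate the object into bytes: a length-prefixed UTF string, then an 8-byte long.
    public static byte[] serialize(WordCount wc) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(wc.word);
            out.writeLong(wc.count);
        } catch (IOException e) {
            // In-memory streams do not fail in practice; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reconstruct the object by reading the fields back in exactly the order they were written.
    public static WordCount deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String word = in.readUTF();
            long count = in.readLong();
            return new WordCount(word, count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        WordCount original = new WordCount("cascading", 42L);
        byte[] encoded = serialize(original);
        WordCount restored = deserialize(encoded);
        // Prints: cascading=42 (19 bytes), equal to original: true
        System.out.println(restored + " (" + encoded.length + " bytes), equal to original: "
                + restored.equals(original));
    }
}
```

Running `main` prints `cascading=42 (19 bytes), equal to original: true`: `writeUTF` stores a 2-byte length prefix plus 9 bytes for the word, and `writeLong` adds 8 bytes. The key design constraint is that the writer and reader must agree on field order and types, which is exactly the contract a schema-based format such as Avro makes explicit.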

Tools

These tools help you create, debug, test, and maintain Cascading applications.

Description | License
From Typesafe, an integration between Scalding and Typesafe Activator | Apache 2.0
Web mining toolkit that runs as a series of Cascading pipes | Apache 2.0
From Square, functions, filters, and other tools for Cascading | Apache 2.0
Tool to migrate relational databases into Hadoop | Apache 2.0
From LiveRamp, a collection of tools to build, debug, and run data workflows | Apache 2.0
Simhashing, an algorithm that calculates a “group id” (minimum hash) for content | GPL 3
Tiny wrapper around Hadoop for chaining operations | Apache 2.0
A set of Cascading workflow utilities shared across various projects | Apache 2.0
From Etsy, a framework for building machine learning models in Hadoop using Scalding | MIT
From Concurrent, a fluent API for Cascading | Apache 2.0
From Concurrent, a plugin for the IntelliJ IDEA IDE that improves the Cascading application development experience | Apache 2.0
From Etsy, a build and execution tool for Cascading.JRuby that handles packaging for execution on Hadoop | Custom
Leiningen, for automating Clojure projects | Apache 2.0
From Concurrent, an ANSI SQL shell and JDBC driver for migrating workloads and moving data onto and off of Hadoop | Apache 2.0
From Concurrent, a command-line interface for load testing and benchmarking | Apache 2.0
From Concurrent, a command-line interface for building data processing jobs | Apache 2.0
Library for executing collections of dependent processes as a single process | Apache 2.0
From Hotels.com, a unit testing framework for Cascading applications that simplifies automated tests for cascades, flows, assemblies, and operations | Apache 2.0
Scalding unit testing library for test-driven development | Apache 2.0

Have a related Cascading project that’s not listed? Let us know!