Extensions

Cascading Community Projects


The Cascading ecosystem includes support for a variety of programming languages, data sources, serializers, and tools that extend the functionality of Cascading applications. These extensions are contributed by both Concurrent and the Cascading community. Many projects are available through the Cascading GitHub organization and the Conjars Maven repository.

Note: Most projects are hosted on GitHub and may have multiple branches and forks as users enrich the original projects. Many are also under active development.

Supported Languages

These projects let developers write Cascading applications in a range of languages, bringing each language's domain-specific features and functionality to Cascading.

Language | Project | Description | Resources | License
Clojure | Cascalog | Clojure for Cascading | GitHub, Groups, Issue Tracking, Stack Overflow, Docs, Tutorials | Apache 2.0
Java | Cascading | From Concurrent, the proven framework for building enterprise data applications | GitHub, Groups, Docs, Tutorials | Apache 2.0
JRuby | Cascading.JRuby | From Etsy, JRuby for Cascading | GitHub, Issue Tracking | LGPL 3
PMML | Pattern | From Concurrent, PMML for Cascading | GitHub, Groups, Issue Tracking, Docs, Tutorials | Apache 2.0
PMML | JPMML-Cascading | From Openscoring, PMML for Cascading | GitHub, Groups, Issue Tracking | AGPL 3.0
Python | PyCascading | From Twitter, Python for Cascading | GitHub, Issue Tracking, Tutorials | Apache 2.0
Scala | Scalding | From Twitter, Scala for Cascading | GitHub, Groups, Issue Tracking, Stack Overflow, Docs, Tutorials | Apache 2.0
SQL | Lingual | From Concurrent, an ANSI SQL shell and JDBC driver to migrate workloads and export data on/off Hadoop | GitHub, Groups, Issue Tracking, Docs, Tutorials, Binary | Apache 2.0

Data Source Connectivity (Taps)

In Cascading, a tap represents a physical data resource, such as a file, a database table, or a key-value store. A tap can serve as a source (input) or a sink (output) of a Cascading flow.
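To make the source/sink distinction concrete, here is a minimal, self-contained Java sketch. It deliberately does not use the Cascading library: `SimpleTap`, `MemoryTap`, and `run` are hypothetical names invented for illustration. In Cascading itself, a tap is an instance of a class such as `Hfs`, which pairs a scheme describing the record format with a resource path, and a flow connects source and sink taps through pipes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical, simplified model of the tap concept; NOT Cascading's cascading.tap.Tap API.
public class TapSketch {

    /** A physical data resource that can act as a source (read) or a sink (write). */
    interface SimpleTap {
        Iterator<String> openForRead(); // used when the tap is a source
        void write(String record);      // used when the tap is a sink
    }

    /** In-memory stand-in for a file, table, or key-value store. */
    static class MemoryTap implements SimpleTap {
        private final List<String> records = new ArrayList<>();

        MemoryTap(List<String> initial) {
            records.addAll(initial);
        }

        @Override
        public Iterator<String> openForRead() {
            return records.iterator();
        }

        @Override
        public void write(String record) {
            records.add(record);
        }

        List<String> contents() {
            return records;
        }
    }

    /** A toy "flow": read each record from the source, transform it, write it to the sink. */
    static void run(SimpleTap source, SimpleTap sink, UnaryOperator<String> op) {
        Iterator<String> it = source.openForRead();
        while (it.hasNext()) {
            sink.write(op.apply(it.next()));
        }
    }

    public static void main(String[] args) {
        MemoryTap source = new MemoryTap(List.of("alpha", "beta"));
        MemoryTap sink = new MemoryTap(List.of());
        run(source, sink, String::toUpperCase);
        System.out.println(sink.contents()); // prints [ALPHA, BETA]
    }
}
```

Because one interface covers both reading and writing, the same kind of object can sit at either end of a flow, which mirrors how a Cascading tap can be the source of one flow and the sink of another.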

Data Source | Project | Description | Resources | License
Accumulo | Cascading.Accumulo | Accumulo data source for Cascading | GitHub, Issue Tracking | Apache 2.0
Cassandra | Cascading-Cassandra | Cassandra data source for Cascading | GitHub, Issue Tracking | Apache 2.0, Eclipse
Elasticsearch | elasticsearch-hadoop | Elasticsearch data source for Cascading | GitHub, Issue Tracking, Tutorials | Apache 2.0
ElephantDB | ElephantDB | ElephantDB data source for Cascading | GitHub, Issue Tracking | Custom
HBase | Cascading.HBase | HBase data source for Cascading | GitHub | Apache 2.0
Hive | Cascading-Hive | Integrate and run Hive in Cascading | GitHub, Issue Tracking | Apache 2.0
Hive | Cascading.Hive | Hive data source for Cascading | GitHub, Issue Tracking | Apache 2.0
JDBC | Cascading-JDBC | From Concurrent, provides support for reading/writing data to/from an RDBMS via JDBC drivers | GitHub, Issue Tracking | Apache 2.0
Memcached | Cascading.Memcached | Memcached data source for Cascading | GitHub | Apache 2.0
MongoDB | Cascading-Mongomigrate | MongoDB data source for Cascading | GitHub | Apache 2.0
Neo4j | Cascading.Neo4j | Neo4j data source for Cascading | GitHub, Issue Tracking | Apache 2.0
Parquet | Parquet-mr | Parquet data source for Cascading | GitHub, Groups, Issue Tracking | Apache 2.0
SimpleDB | Cascading.SimpleDB | From Scale Unlimited, SimpleDB data source for Cascading | GitHub, Issue Tracking | Apache 2.0
Solr | Cascading.Solr | From Scale Unlimited, Solr data source for Cascading | GitHub, Issue Tracking | Custom
Splunk | Tbana | Splunk data source for Cascading | GitHub, Issue Tracking | Apache 2.0

Serializers

Serializers integrate with Cascading by translating data objects into other formats that can be stored and later reconstructed.
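As a minimal illustration of the round trip a serializer performs, the following Java sketch hand-encodes a small record with the JDK's `DataOutputStream` and decodes it with `DataInputStream`. `PageView`, `serialize`, and `deserialize` are hypothetical names; this is not the API of Cascading.Avro, Cascading.Kryo, or any other project listed here. Those projects add schema handling and Hadoop integration that this sketch omits. The record syntax assumes Java 16 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical hand-written serializer; not the API of any project listed on this page.
public class SerializerSketch {

    /** A simple data object to be stored and later reconstructed. */
    record PageView(String url, long timestamp, int durationMs) {}

    /** Translate the object into bytes: a length-prefixed string, then a long, then an int. */
    static byte[] serialize(PageView view) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(view.url());
            out.writeLong(view.timestamp());
            out.writeInt(view.durationMs());
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory streams
        }
        return bytes.toByteArray();
    }

    /** Reconstruct the object by reading the fields back in the same order. */
    static PageView deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            // Java evaluates constructor arguments left to right, matching the write order.
            return new PageView(in.readUTF(), in.readLong(), in.readInt());
        } catch (IOException e) {
            throw new UncheckedIOException(e); // e.g. truncated input
        }
    }

    public static void main(String[] args) {
        PageView original = new PageView("/index.html", 1_400_000_000_000L, 250);
        byte[] encoded = serialize(original);
        PageView restored = deserialize(encoded);
        // prints "25 bytes, equal=true"
        System.out.println(encoded.length + " bytes, equal=" + original.equals(restored));
    }
}
```

The fixed field order is the entire contract here: adding or reordering a field breaks previously written data, which is one reason formats such as Avro, Protocol Buffers, and Thrift describe records with an explicit schema.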

Serializer | Project | Description | Resources | License
Avro | Cascading.Avro | From Scale Unlimited, data serialization for Apache Avro | GitHub, Issue Tracking | Apache 2.0
JSON | Cascading.JSON | JavaScript Object Notation (JSON) utility classes for Cascading | GitHub, Issue Tracking | GNU
Kryo | Cascading.Kryo | Provides a drop-in Kryo serialization for your Cascading (or Hadoop) workflow | GitHub, Issue Tracking | Eclipse
Protocol Buffers | Cascading2-protobufs | From Square, library for working with Protocol Buffers | GitHub, Issues | MIT
Thrift | Cascading-Thrift | Serializer and raw comparator for using TBase and TEnum objects in Hadoop | GitHub, Issue Tracking | Custom
Typesafe Activator | Activator Scalding | From Typesafe, an integration between Scalding and Typesafe Activator | GitHub, Issue Tracking | Apache 2.0

Tools

Cascading tools help create, debug, maintain, and otherwise support Cascading apps and functionality.

Project | Description | Resources | License
Bixo | Web mining toolkit that runs as a series of Cascading pipes | GitHub, Issue Tracking, Tutorials | Apache 2.0
Cascading-helpers | From Square, functions, filters, and other tools for Cascading | GitHub, Issue Tracking | Apache 2.0
Cascading-dbmigrate | Tool to migrate relational databases into Hadoop | GitHub, Issue Tracking | Apache 2.0
Cascading_ext | From LiveRamp, a collection of tools to build, debug, and run data workflows | GitHub, Issue Tracking | Apache 2.0
Cascading-simhash | Simhashing, an algorithm that calculates a “group id” (minimum hash) for content | GitHub, Issue Tracking | GPL 3
Cascading-tube | Tiny wrapper around Hadoop for chaining operations | GitHub, Issue Tracking | Apache 2.0
Cascading.utils | Set of utilities for Cascading workflows for various projects | GitHub, Issue Tracking | Apache 2.0
Conjecture | From Etsy, a framework for building machine learning models in Hadoop using Scalding | GitHub, Issue Tracking | MIT
Fluid | From Concurrent, a fluent API for Cascading | GitHub, Issue Tracking | Apache 2.0
IntelliJ IDEA Plugin | From Concurrent, a plugin for the IntelliJ IDEA IDE that improves the experience of developing Cascading apps | GitHub, Issue Tracking | Apache 2.0
Jading | From Etsy, a build and execution tool for Cascading.JRuby that handles packaging for execution on Hadoop | GitHub, Issue Tracking, Tutorials | Custom
Lein-Cascading | Cascading integration for Leiningen, the tool for automating Clojure projects | GitHub, Issue Tracking, Mail List, IRC, Tutorials | Apache 2.0
Lingual | From Concurrent, an ANSI SQL shell and JDBC driver to migrate workloads and export data on/off Hadoop | GitHub, Groups, Issue Tracking, Docs, Binary | Apache 2.0
Load | From Concurrent, a command line interface for load testing and benchmarking | GitHub, Issue Tracking, Binary | Apache 2.0
Multitool | From Concurrent, a command line interface for building data processing jobs | GitHub, Binary | Apache 2.0
Riffle | Library for executing collections of dependent processes as a single process | GitHub, Issue Tracking | Apache 2.0

Have a related Cascading project that’s not listed?

Let us know!