Cascading Community Projects
The Cascading ecosystem supports a variety of programming languages, data sources, serializers, and tools that extend the functionality of Cascading applications. These extensions are contributed by both Concurrent and the Cascading community. Many new projects are available through Cascading GitHub and the Conjars Maven jar repository.
Note: Most projects are hosted on GitHub and may have multiple branches and forks as users enrich the original projects. Many are also under active development.
Supported languages extend Cascading with the domain-specific features and capabilities of another language.
| Language | Project | Description | Resources | License |
|---|---|---|---|---|
| Clojure | Cascalog | Clojure for Cascading | GitHub, Groups, Issue Tracking, Stack Overflow, Docs, Tutorials | Apache 2.0 |
| Java | Cascading | From Concurrent, the proven framework for building enterprise data applications | GitHub, Groups, Docs, Tutorials | Apache 2.0 |
| JRuby | Cascading.JRuby | From Etsy, JRuby for Cascading | GitHub, Issue Tracking | LGPL 3 |
| PMML | Pattern | From Concurrent, PMML for Cascading | GitHub, Groups, Issue Tracking, Docs, Tutorials | Apache 2.0 |
| Python | PyCascading | From Twitter, Python for Cascading | GitHub, Issue Tracking, Tutorials | Apache 2.0 |
| Scala | Scalding | From Twitter, Scala for Cascading | GitHub, Groups, Issue Tracking, Stack Overflow, Docs, Tutorials | Apache 2.0 |
| SQL | Lingual | From Concurrent, an ANSI SQL shell and JDBC driver to migrate workloads and export data on/off Hadoop | GitHub, Groups, Issue Tracking, Docs, Tutorials, Binary | Apache 2.0 |
Data Source Connectivity (Taps)
In Cascading, a tap represents a physical data source. Taps can be used as inputs (sources) and outputs (sinks) in Cascading.
| Data Source | Project | Description | Resources | License |
|---|---|---|---|---|
| Accumulo | Cascading.Accumulo | Accumulo data source for Cascading | GitHub, Issue Tracking | Apache 2.0 |
| Cassandra | Cascading-Cassandra | Cassandra data source for Cascading | GitHub, Issue Tracking | Apache 2.0, Eclipse |
| Elasticsearch | elasticsearch-hadoop | Elasticsearch data source for Cascading | GitHub, Issue Tracking, Tutorials | Apache 2.0 |
| ElephantDB | ElephantDB | ElephantDB data source for Cascading | GitHub, Issue Tracking | Custom |
| HBase | Cascading.HBase | HBase data source for Cascading | GitHub | Apache 2.0 |
| Hive | Cascading-Hive | Integrate and run Hive in Cascading | GitHub, Issue Tracking | Apache 2.0 |
| Hive | Cascading.Hive | Hive data source for Cascading | GitHub, Issue Tracking | Apache 2.0 |
| JDBC | Cascading-JDBC | From Concurrent, provides support for reading/writing data to/from an RDBMS via JDBC drivers | GitHub, Issue Tracking | Apache 2.0 |
| Memcached | Cascading.Memcached | Memcached data source for Cascading | GitHub | Apache 2.0 |
| MongoDB | Cascading-Mongomigrate | MongoDB data source for Cascading | GitHub | Apache 2.0 |
| Neo4j | Cascading.Neo4j | Neo4j data source for Cascading | GitHub, Issue Tracking | Apache 2.0 |
| Parquet | Parquet-mr | Parquet data source for Cascading | GitHub, Groups, Issue Tracking | Apache 2.0 |
| SimpleDB | Cascading.SimpleDB | From Scale Unlimited, SimpleDB data source for Cascading | GitHub, Issue Tracking | Apache 2.0 |
| Solr | Cascading.Solr | From Scale Unlimited, Solr data source for Cascading | GitHub, Issue Tracking | Custom |
| Splunk | Tbana | Splunk data source for Cascading | GitHub, Issue Tracking | Apache 2.0 |
Serializers provide integration with Cascading by translating data objects into other formats that can be stored and reconstructed.
| Format | Project | Description | Resources | License |
|---|---|---|---|---|
| Avro | Cascading.Avro | From Scale Unlimited, data serialization for Apache Avro | GitHub, Issue Tracking | Apache 2.0 |
| Kryo | Cascading.Kryo | Provides a drop-in Kryo serialization for your Cascading (or Hadoop) workflow | GitHub, Issue Tracking | Eclipse |
| Protocol Buffers | Cascading2-protobufs | From Square, library for working with Protocol Buffers | GitHub, Issues | MIT |
| Thrift | Cascading-Thrift | Serializer and raw comparator for using TBase and TEnum objects in Hadoop | GitHub, Issue Tracking | Custom |
Cascading tools are applications that create, debug, maintain, and otherwise support Cascading apps and functionality.
| Project | Description | Resources | License |
|---|---|---|---|
| Bixo | Web mining toolkit that runs as a series of Cascading pipes | GitHub, Issue Tracking, Tutorials | Apache 2.0 |
| Cascading-helpers | From Square, functions, filters, and other tools for Cascading | GitHub, Issue Tracking | Apache 2.0 |
| Cascading-dbmigrate | Tool to migrate relational databases into Hadoop | GitHub, Issue Tracking | Apache 2.0 |
| Cascading_ext | From LiveRamp, a collection of tools to build, debug, and run data workflows | GitHub, Issue Tracking | Apache 2.0 |
| Cascading-simhash | Simhashing is an algorithm that calculates “group id” (minimum hash) content | GitHub, Issue Tracking | GPL 3 |
| Cascading-tube | Tiny wrapper around Hadoop for chaining operations | GitHub, Issue Tracking | Apache 2.0 |
| Cascading.utils | Set of utilities for Cascading workflows for various projects | GitHub, Issue Tracking | Apache 2.0 |
| Jading | Build and execution tool for Cascading.JRuby that handles packaging for execution on Hadoop | GitHub, Issue Tracking, Tutorials | Custom |
| Lein-Cascading | Leiningen is for automating Clojure projects | GitHub, Issue Tracking, Mail List, IRC, Tutorials | Apache 2.0 |
| Lingual | From Concurrent, an ANSI SQL shell and JDBC driver to migrate workloads and export data on/off Hadoop | GitHub, Groups, Issue Tracking, Docs, Binary | Apache 2.0 |
| Load | From Concurrent, a command line interface for load testing and benchmarking | GitHub, Issue Tracking, Binary | Apache 2.0 |
| Multitool | From Concurrent, a command line interface for building data processing jobs | GitHub, Binary | Apache 2.0 |
| Riffle | Library for executing collections of dependent processes as a single process | GitHub, Issue Tracking | Apache 2.0 |
Have a related Cascading project that’s not listed?
Let us know here!