Cascading Extensions are user-contributed code for use with Cascading.
Many projects are hosted on GitHub and may have multiple branches and forks as users enrich the original projects.
Many new projects are also becoming available through the Cascading GitHub page and the Conjars Maven jar repository. Search there for jar availability and versions.
Tools
- Lingual – A SQL command shell and JDBC Driver for executing ANSI SQL queries as Cascading applications on Apache Hadoop clusters.
- Bixo – A Cascading-based web crawling and data mining toolkit, maintained by Scale Unlimited. A more robust replacement for Apache Nutch.
- Cascading-DBMigrate – An alternative to Cascading.JDBC for relational data access and integration.
- Load – A command-line tool for creating high-load jobs on a Hadoop cluster.
- Multitool – A command-line tool for processing large files, similar to sed and grep on Unix.
Programming Languages (DSLs)
- Cascalog – A Clojure-based DSL maintained by Twitter.
- Cascading.JRuby – A JRuby-based DSL maintained by Etsy. Also see Jading for app packaging.
- PyCascading – A Jython-based DSL maintained by Twitter.
- Scalding – A Scala-based DSL maintained by Twitter.
Integration
- Apache Avro Schemes – Cascading.Avro and Cascading-Avro. [conjars]
- Cascading.HBase (cwensel and trendmicro repositories) – Integration with HBase. Note the available branches and forks.
- Cascading.JDBC – Integration with the JDBC API for read/write access to relational databases.
- Cascading.JSON – A set of operations for creating and manipulating JSON data.
- Cascading.Kryo – Serialization for the Kryo data serialization format. [conjars]
- Cascading.Memcached – Integration with Memcached, Membase, and ElasticSearch.
- Cascading-MongoMigrate – Integration with MongoDB.
- cascading2-protobufs – Schemes and Serializers for Google Protocol Buffers.
- Cascading.SimpleDB – Integration with Amazon SimpleDB. [conjars]
- Cascading.Solr – Integration with Apache Solr.
- Cascading-Thrift – Integration with Apache Thrift.
Libraries
- cascading-ext – A collection of tools from LiveRamp.
- cascading-helpers – A library of utilities for easing Java Cascading development. Includes the fluent API Pump. Maintained by Square.
- cascading-simhash – A library for creating similarity hashes of documents.
- cascading-utils – A few utilities from Scale Unlimited.
Related
- Riffle – Annotations and Classes for managing and executing dependent processes from within a Cascading application. Intended to allow third-party projects to integrate with Cascading without adding Cascading dependencies. May be used standalone.
Many modules are still at an early stage and are published so that more users can test and improve them, so fork freely.
Modules are externally hosted and many are maintained by users. To have your module listed, send an email to support@cascading.org.