Cascading Modules and Extensions
Cascading Modules are user-contributed code and extensions to Cascading.
Many projects are hosted on GitHub and may have multiple branches and forks as users enrich the original projects.
Many projects are also available through the Conjars Maven jar repository. Search there for available jars and versions.
Tools
- Bixo - A Cascading-based web crawling and data mining toolkit, maintained by Bixo Labs. A more robust replacement for Apache Nutch.
- Cascading-DBMigrate - An alternative to Cascading.JDBC for relational data access and integration.
- Cascading.Load - A command line tool for creating high-load jobs on a Hadoop cluster. Useful for tuning Hadoop configurations. Source code.
- Cascading.Multitool - A command line tool for processing large text files and datasets, much like sed and grep on Unix. Source code.
Programming Languages (DSLs)
- Cascalog - A Clojure-based DSL maintained by Twitter.
- Cascading.Groovy - An example Groovy-based DSL. Source code.
- Cascading.JRuby - A JRuby-based DSL maintained by Etsy.
- PyCascading - A Jython-based DSL maintained by Twitter.
- Scalding - A Scala-based DSL maintained by Twitter.
Integration
- Cascading.Avro - Cascading Scheme for the Apache Avro data serialization format. [conjars]
- Cascading.HBase (cwensel and trendmicro versions) - Integration with HBase. Make note of the available branches and forks.
- Cascading.JDBC - Integration with the JDBC API for read/write access to relational databases.
- Cascading.JSON - A set of operations for creating and manipulating JSON data.
- Cascading.Kryo - Serialization support for the Kryo data serialization format. [conjars]
- Cascading.Memcached - Integration with Memcached, Membase, and ElasticSearch.
- Cascading.SimpleDB - Integration with Amazon SimpleDB. [conjars]
- Cascading.Solr - Integration with Apache Solr.
- Cascading-Thrift - Integration with Apache Thrift.
Libraries
- cascading-simhash - A library for creating similarity hashes of documents.
Related
- Riffle - Annotations and classes for managing and executing dependent processes. A lightweight alternative to Cascading's Cascade class, for use by non-GPL licensed projects.
Many modules are still at an early stage and are published so that more users can test and improve them, so fork freely.
Modules are externally hosted and many are maintained by users. To have your module listed, send an email to support@cascading.org.