Cascading 1.2 Now Available

We are happy to announce that Cascading 1.2 is now publicly available for download.

This release features many performance and usability enhancements while remaining backwards compatible with 1.0 and 1.1.

Specifically:

  • Performance optimizations during grouping (StreamComparator)
  • Composable map-side partial aggregations (AggregateBy)
  • Native Riffle support for non-Cascading (or nested iterative Cascading) processes (ProcessFlow and Riffle)
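
To give a feel for what map-side partial aggregation means, here is a minimal sketch in plain Java. It does not use the Cascading API and is not how AggregateBy is implemented; the class name `PartialCountSketch`, its methods, and the least-recently-used eviction policy are all assumptions made for illustration. The idea shown is only that values are pre-aggregated per key in a bounded cache before the grouping step, so fewer records need to be shuffled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: not part of Cascading.
public class PartialCountSketch {

    /** One (key, partial count) pair emitted by the map side. */
    public static final class Partial {
        public final String key;
        public final long count;

        Partial(String key, long count) {
            this.key = key;
            this.count = count;
        }
    }

    /**
     * Map side: count words in a bounded cache. When the cache grows past
     * 'capacity', the least recently used key is emitted as a partial result.
     */
    public static List<Partial> mapSide(List<String> words, int capacity) {
        List<Partial> emitted = new ArrayList<>();
        // accessOrder = true keeps the least recently used entry first.
        LinkedHashMap<String, Long> cache = new LinkedHashMap<>(16, 0.75f, true);

        for (String word : words) {
            cache.merge(word, 1L, Long::sum);
            if (cache.size() > capacity) {
                Map.Entry<String, Long> eldest = cache.entrySet().iterator().next();
                emitted.add(new Partial(eldest.getKey(), eldest.getValue()));
                cache.remove(eldest.getKey());
            }
        }
        // Flush whatever is still cached at the end of the input.
        for (Map.Entry<String, Long> entry : cache.entrySet()) {
            emitted.add(new Partial(entry.getKey(), entry.getValue()));
        }
        return emitted;
    }

    /** Reduce side: sum the partial counts for each key into final totals. */
    public static Map<String, Long> reduceSide(List<Partial> partials) {
        Map<String, Long> totals = new TreeMap<>();
        for (Partial partial : partials) {
            totals.merge(partial.key, partial.count, Long::sum);
        }
        return totals;
    }
}
```

Because a key can be evicted and then seen again, the map side may emit more than one partial per key; the final totals stay correct as long as the reduce side combines partials with the same associative operation.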

For a detailed list of changes see:
CHANGES.txt

We are also happy to announce that Cascading and its extensions have their own Maven/Ivy Jar repository, Conjars. Conjars is a public repository: any developer wishing to publish Cascading libraries and extensions can register their public key and push artifacts. Conjars is a simple fork of the Clojars repo code.

Along with this release are a number of extensions created by the Cascading user community.

Among these extensions are:

  • Cascading.Avro – Cascading Scheme for the Apache Avro data serialization format.
  • Cascading.Memcached – Integration with Memcached, Membase, and ElasticSearch.
  • Bixo – a web mining toolkit
  • DBMigrate – a tool for migrating data between RDBMSs and Hadoop
  • Apache HBase, Amazon SimpleDB, and JDBC integration
  • JRuby and Clojure based scripting languages for Cascading
  • Cascalog – a robust, interactive, extensible query language

This release runs against Hadoop 0.19.x and 0.20.x, including Amazon Elastic MapReduce.