Cascading 0.8.0 Released

Version 0.8.0 of Cascading is now available for download. For details on new features and bug fixes, see the CHANGES.txt file. This is a major release with many new features and some incompatible API changes, so please read on.

This release includes a large number of changes, and we won’t list them all here, but a few deserve additional explanation.

First off, c.p.PipeAssembly has been renamed to c.p.SubAssembly. The PipeAssembly class still exists, but it is marked as deprecated, so please update your code to use SubAssembly.

Cascading now internally uses a custom Hadoop InputFormat that lets a single MapReduce job read multiple input files, each parsed by a different InputFormat instance. Previously, Cascading forced such inputs to be normalized into a SequenceFile first, which increased processing time. That limitation is gone, so Flows that use CoGroup should need a few fewer MapReduce jobs.

For those wondering, allowing only a single InputFormat per job is a limitation imposed by Hadoop. Cascading now works around it transparently.
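To make the idea concrete, here is a minimal plain-Java sketch of per-input parsing: each input path is paired with its own parser, so inputs in different formats can be read in one pass without first being normalized. It is only an illustration. The class, method, and file names (PerInputParsing, readAll, users.tsv, orders.csv) are hypothetical, and nothing here is the Cascading or Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only: none of these names are Cascading or Hadoop classes.
final class PerInputParsing {

    // Splits a tab-delimited line into fields.
    static final Function<String, List<String>> TAB_DELIMITED =
        line -> Arrays.asList(line.split("\t"));

    // Splits a comma-delimited line into fields.
    static final Function<String, List<String>> COMMA_DELIMITED =
        line -> Arrays.asList(line.split(","));

    // Reads every input with the parser registered for its path, so inputs in
    // different formats feed one result without a separate normalization pass.
    static List<List<String>> readAll(
            Map<String, Function<String, List<String>>> parserByPath,
            Map<String, List<String>> linesByPath) {
        List<List<String>> records = new ArrayList<>();
        for (Map.Entry<String, List<String>> input : linesByPath.entrySet()) {
            Function<String, List<String>> parser = parserByPath.get(input.getKey());
            if (parser == null) {
                throw new IllegalArgumentException("no parser registered for " + input.getKey());
            }
            for (String line : input.getValue()) {
                records.add(parser.apply(line));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        Map<String, Function<String, List<String>>> parsers = new LinkedHashMap<>();
        parsers.put("users.tsv", TAB_DELIMITED);
        parsers.put("orders.csv", COMMA_DELIMITED);

        Map<String, List<String>> inputs = new LinkedHashMap<>();
        inputs.put("users.tsv", Arrays.asList("1\talice", "2\tbob"));
        inputs.put("orders.csv", Arrays.asList("1,book"));

        System.out.println(readAll(parsers, inputs));
    }
}
```

In the real system the dispatch happens inside the InputFormat that Hadoop calls, but the principle is the same: the choice of parser follows the input, not the job.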

c.p.CoGroup can now group on the same Tap multiple times, provided the Tap's Tuple stream branches and reaches the CoGroup through distinct paths. Previously, Cascading would fail if a join was attempted on the same input file, regardless of the intermediate processing.
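The kind of job this enables is a self-join: one source read once, split into two branches that are processed differently, then joined back together. The plain-Java sketch below shows the shape of such a join on a made-up employee table. It is not Cascading code, and the names (SelfJoinSketch, employeeToManager, the row layout) are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: plain Java, not the Cascading API.
final class SelfJoinSketch {

    // Joins one source with itself. Each row is {id, managerId, name}.
    static List<String> employeeToManager(List<String[]> source) {
        // Branch 1: project the source to (id -> name), the "manager" side.
        Map<String, String> nameById = new HashMap<>();
        for (String[] row : source) {
            nameById.put(row[0], row[2]);
        }

        // Branch 2: walk the same source again as the "employee" side and
        // match each row's managerId against the ids from branch 1.
        List<String> joined = new ArrayList<>();
        for (String[] row : source) {
            String manager = nameById.get(row[1]);
            if (manager != null) { // inner join: rows without a matching manager are dropped
                joined.add(row[2] + " reports to " + manager);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String[]> source = new ArrayList<>();
        source.add(new String[] {"1", "0", "ada"});
        source.add(new String[] {"2", "1", "bob"});
        source.add(new String[] {"3", "1", "cy"});
        System.out.println(employeeToManager(source));
    }
}
```

Both branches start from the same input, which is the case that previously failed in Cascading when the branches met again at a join.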

This version of Cascading will execute on Hadoop 0.18, but changes internal to Hadoop cause some features to fail, so running it on 0.18 is not recommended for production. We hope to have a compatible version soon.