Cascading 0.6.0 Released
Version 0.6.0 of Cascading is now available for download. For details on new features and bug fixes, see the CHANGES.txt file. For a quick summary, read on.
This release provides two major features: Stream Assertions and Trap Taps.
Stream Assertions are used in much the same way as the Java language assert statement.
As a developer builds more complex assemblies, it makes sense to place assertions inline on the data expected in the stream. Assertions can test that a given source is clean, or verify that certain functions, filters, or aggregators are working as expected.
Assertions can be applied at one of two scopes: Strict or Validating. Strict assertions make sense as regression or unit-style tests, while Validating assertions can be used as sanity checks during staging or production.
When a given assembly is planned into a Flow using the FlowConnector, unwanted assertions can be planned out completely, so they incur no performance penalty. Reusable assemblies can therefore carry many assertions internally without adding any overhead at runtime when those assertions are unwanted.
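The idea of planning out unwanted assertions can be sketched in plain Java. This is a conceptual model only and does not use the Cascading API: the names AssertionPlanningSketch, Level, StreamAssertion, plan and run are all hypothetical, and the rule that a Strict plan also keeps Validating assertions is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Conceptual model only; none of these types are Cascading classes.
public class AssertionPlanningSketch {

    // Hypothetical levels, declared from least to most strict.
    enum Level { NONE, VALIDATING, STRICT }

    // A hypothetical inline assertion: a level, a message, and a check on one record.
    static final class StreamAssertion {
        final Level level;
        final String message;
        final Predicate<String[]> check;

        StreamAssertion(Level level, String message, Predicate<String[]> check) {
            this.level = level;
            this.message = message;
            this.check = check;
        }
    }

    // "Planning": assertions above the requested level are dropped up front,
    // so they are never evaluated while records are processed.
    // Assumption: requesting STRICT also keeps VALIDATING assertions.
    static List<StreamAssertion> plan(List<StreamAssertion> declared, Level requested) {
        List<StreamAssertion> kept = new ArrayList<>();
        for (StreamAssertion a : declared) {
            if (a.level != Level.NONE && a.level.compareTo(requested) <= 0) {
                kept.add(a);
            }
        }
        return kept;
    }

    // Runs only the assertions that survived planning; fails on the first violation.
    static void run(List<StreamAssertion> planned, List<String[]> records) {
        for (String[] record : records) {
            for (StreamAssertion a : planned) {
                if (!a.check.test(record)) {
                    throw new IllegalStateException(a.message);
                }
            }
        }
    }

    // Two sample assertions an assembly might declare inline.
    static List<StreamAssertion> sample() {
        return Arrays.asList(
            new StreamAssertion(Level.VALIDATING, "record is empty", r -> r.length > 0),
            new StreamAssertion(Level.STRICT, "expected exactly 2 fields", r -> r.length == 2));
    }

    public static void main(String[] args) {
        for (Level level : Level.values()) {
            System.out.println(level + " keeps " + plan(sample(), level).size() + " assertion(s)");
        }
    }
}
```

In this model a plan requested at NONE keeps no assertions, so the per-record loop in run does no assertion work at all, which is the property the paragraph above describes.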
The next feature is Trap Taps. They are similar to sinks and sources, except that instead of being bound to the head or tail of a given assembly, they are bound to pipes within an assembly. If an operation invoked by a given Pipe instance (Each or Every) fails, the incoming Tuple is saved to the named trap Tap.
This allows systems to continue running with no data loss if bad data leaks into the stream and causes an operation to fail. This is extremely useful for low-fidelity processes like web crawling and indexing. If a page just can’t be parsed, it can be saved for later and the job continues its work without it.
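The trap behaviour can be illustrated with a small plain-Java sketch. It does not use the Cascading API: TrapSketch and applyWithTrap are hypothetical names, an in-memory list stands in for the trap Tap, and number parsing stands in for page parsing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Conceptual model only; none of these names are Cascading classes.
public class TrapSketch {

    // Applies op to every input. An input whose operation throws is saved
    // to the trap instead of stopping the run, so no data is lost.
    static <I, O> List<O> applyWithTrap(List<I> inputs, Function<I, O> op, List<I> trap) {
        List<O> results = new ArrayList<>();
        for (I input : inputs) {
            try {
                results.add(op.apply(input));
            } catch (RuntimeException e) {
                trap.add(input); // keep the failing input for later inspection
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList("42", "not-a-number", "7");
        List<String> trap = new ArrayList<>();
        List<Integer> parsed = applyWithTrap(pages, Integer::parseInt, trap);
        // prints: [42, 7] trapped=[not-a-number]
        System.out.println(parsed + " trapped=" + trap);
    }
}
```

The bad input ends up in the trap list, while the remaining inputs are processed normally.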
Please note this release has only been tested with Hadoop 0.16.x.