Cascading 3.3.0 Released

Cascading 3.3.0 has been released.

Cascading 3.3 WIP

Just a heads up: we are finishing up work on the 3.3 WIP, so please give it a spin.

Cascading 4 Adds Native JSON Support

Work on Cascading 4 continues with new support for native JSON data types while opening the door for uniform support of other nested data types.

Cascading 4 for Streaming with Amazon S3 and Apache Kafka

Work on Cascading 4 continues with a new sub-project that provides new Cascading local mode Tap implementations for Amazon S3 and Apache Kafka.

Concurrent/Driven acquired by Xplenty. What does this mean for Cascading?

As part of our recent acquisition of Concurrent/Driven, Xplenty is also taking charge of Cascading, the popular open source project that is used to create and execute complex data processing workflows on Hadoop clusters.

Cascading 3.2.0 Released

Cascading 3.2.0 has been released.

Cascading 3.1 Release

We are happy to announce Cascading 3.1 is now publicly available for download.

Announcing Cascading 3.0 on Apache Flink

Thanks to our partners, data Artisans, Cascading users now have an additional compute fabric on which to execute Cascading 3.0 applications: Apache Flink. From the project site: “Apache Flink is a platform for scalable stream and batch processing. Flink’s execution engine features low-latency pipelined and scalable batched data transfers and high-performance, in-memory operators for sorting and joining that gracefully go out-of-core in case of scarce memory resources. Apache Flink uses in-memory storage to achieve massive performance gains over MapReduce.

Cascading 3.0 Maintenance Release

We have just published Cascading 3.0.2, a minor maintenance release. Upgrading is recommended for all users. This release resolves the following issues: Updated Apache Tez to 0.6.2 to prevent deadlocks in complex DAGs. Note that this release is incompatible with Tez 0.6.1. Fixed issues concerning the robustness of detailed stats retrieval on both the MapReduce and Tez platforms. Updated the build to exclude jgrapht-ext and further isolated the jgrapht APIs to support reliable shading.

Cascading 2.7 Maintenance Release

We have just published Cascading 2.7.1, a minor maintenance release. This release resolves the following issues: Fixed issue where c.p.GroupBy or c.p.CoGroup would fail if attempting to group or join incoming Fields.UNKNOWN tuple streams using relative positions in the grouping fields selectors. Fixed issue where c.u.ShutdownUtil could log an NPE if a hook is removed during JVM shutdown. https://github.

Cascading 3.0 Maintenance Release

We have just published Cascading 3.0.1, a new maintenance release. This release resolves the following issue: Fixed issue in c.f.t.p.Hadoop2TezFlowStepJob where the LocalResources were not passed to the AppMaster correctly, causing a ClassNotFoundException during split calculation for custom InputFormats. https://github.com/Cascading/cascading/blob/3.0.1/CHANGES.txt It can be downloaded from these locations: /downloads/ https://github.

Cascading-Hive 2.0 Release

We are happy to announce the release of Cascading-Hive 2.0. This release adds compatibility with Cascading 3.0. It also contains a major contribution from a member of the Cascading community, hotels.com: it is now possible to read and write ACID ORC tables with Cascading-Hive. This feature relies on corc, an ORC integration for Cascading, also created by hotels.com. The demo directory contains a new application demonstrating the feature. The jars are deployed on Conjars and the code is available on GitHub.

Cascading 3.0 release

We are happy to announce Cascading 3.0 is now publicly available for download. The biggest change in this version, compared to previous releases, is that Cascading has added native support for Apache Tez alongside Apache Hadoop MapReduce and Cascading’s native local in-memory mode. It is now trivial (a matter of changing a few lines of code) to move your application to run on Tez instead of MapReduce. Others who have run performance tests with Scalding and Tez are reporting significant performance improvements.

Cascading 2.7 Release

We are happy to announce that Cascading 2.7 is now publicly available for download. This is the last planned minor release of Cascading in the 2.x line before we make Cascading 3.0 final. This release contains new features and bug fixes. In summary, two features of particular interest are PartitionTap support for small files and the ability of Traps to capture diagnostic information about a failure. Changes of note are: Added support for o.

Fluid 1.0

We are happy to announce that Cascading Fluid 1.0 is now publicly available. Fluid is an API library that exposes the Cascading library as a Java fluent interface and mirrors all of the Cascading concepts without introducing new ones. Because it is a fluent API, Java IDEs such as IntelliJ IDEA and Eclipse will auto-suggest the next API call based on the prior method call. Only methods that would logically be next in the chain will be suggested.

Cascading SDK 2.6

We are happy to announce that Cascading SDK 2.6 is now publicly available. The Cascading SDK is a collection of tools, documentation, libraries, tutorials and example projects for the greater Cascading community. What’s New: Cascading 2.6 support for all tools (Lingual 1.2, Multitool 2.6, Load 2.6, Scalding, Cascalog); Teradata Tap is now included in the Cascading-JDBC project; new and updated tutorials: COBOL copybook, ETL, Teradata, Redshift. For more details:

Lingual 1.2

We are happy to announce that Lingual 1.2 is now publicly available. The purpose of Lingual is to ease migration of SQL-based workloads onto Hadoop, and to simplify integration with Hadoop through standards-based APIs. Lingual provides an ANSI SQL interface on Cascading for Apache Hadoop. Lingual 1.2 includes critical bug fixes, updated support for Cascading 2.6, and improved support for Driven. It is highly recommended that users upgrade to this release.

Cascading 2.6

We are happy to announce that Cascading 2.6 is now publicly available for download. This release contains new features and bug fixes. Of note are the new DecoratorTap and DistCacheTap (itself a DecoratorTap sub-class) classes. Working together, they let Flows cache data directly into the Hadoop distributed cache when accumulating data for a HashJoin. And for Driven users, new Java annotations allow additional metadata to be sent to the Driven UI when visualizing assemblies.

News and Announcements