News

Latest News & Updates

Cascading 3.0 Maintenance Release

We have just published a new maintenance release 3.0.1 of Cascading.
This release resolves the following issue:
– Fixed an issue in c.f.t.p.Hadoop2TezFlowStepJob where the LocalResources were not passed to the AppMaster correctly, causing a ClassNotFoundException during split calculation for custom InputFormats.
It can be downloaded from these locations:

Cascading Newsletter - June 2015

There has been a lot going on in the last month. The Cascading 3.0 release is now available. This release helps future-proof your data infrastructure investments by supporting newer compute fabrics as they become available. Also, a new EAP version of Driven is freely available for real-time performance testing of your Cascading/Scalding apps. A Scalding blog reports dramatic improvements in the performance of Cascading 3.0 on Apache Tez. Netflix / PigPen published a Getting Started Guide for Cascading users. New Concurrent blogs include a tutorial on how to run Cascading on AWS EMR and how to boost Hadoop performance through better Dev and Ops collaboration.

Cascading-Hive 2.0 Release

We are happy to announce the release of Cascading-Hive 2.0. This release adds compatibility with Cascading 3.0. It also contains a major contribution from the Cascading community, namely hotels.com: it is now possible to read and write ACID ORC tables with Cascading-Hive. This feature relies on corc, an ORC integration for Cascading, also created by hotels.com. The demo directory contains a new application demonstrating this feature.

The jars are deployed on Conjars and the code is available on GitHub.

Cascading-Hive allows you to read and write Hive tables from within Cascading Flows as well as running any HiveQL query as part of a Cascade.

– Andre

Cascading 3.0 release

We are happy to announce Cascading 3.0 is now publicly available for download.

The biggest change in this version, compared to previous releases, is that Cascading has added native support for Apache Tez alongside Apache Hadoop MapReduce and Cascading’s native local in-memory mode. It is now trivial (a matter of changing a few lines of code) to move your application to run on Tez instead of MapReduce. Others have run performance tests with Scalding and Tez and are reporting significant performance improvements.

This milestone release of Cascading with Apache Tez support means we’ve completed the work on the query planner that makes it faster for us and the community to integrate Cascading with other compute fabrics as they become available. We hope to announce additional platform support in the near future.

Along with the ease of adding new platforms, the new query planner should also show some improvements over Cascading 2.x execution times on MapReduce. Additionally, we’ve given developers direct control over how their MapReduce and Tez jobs are optimized, so you can tune performance to your specific needs.

Please note that this is a major release: all deprecated methods have been removed, and there are some incompatible changes to the Cascading public API. You will need to edit and recompile your code in order to upgrade to 3.0.

As we continue to advance the code base, a number of other enhancements and bug fixes are included in the release. For the complete list of changes in Cascading 3.0, please see the change log.

Enjoy!

Cascading 2.7 Release

We are happy to announce that Cascading 2.7 is now publicly available for download. This is the last planned minor release of Cascading in the 2.x line before we make Cascading 3.0 final.

This release contains new features and bug fixes. Two features of particular interest are PartitionTap support for small files and the ability of Traps to capture diagnostic information about a failure. Changes of note are:

  • Added support for o.a.h.m.l.CombineFileInputFormat in the Hadoop specific c.t.h.PartitionTap implementation.
  • Added c.t.Tap#prepareResourceForRead() and c.t.Tap#prepareResourceForWrite() methods to allow for client side tap resource initialization.
  • Updated trap handling to capture diagnostic information within a trap when configured via a c.t.TrapProps instance.
  • Updated c.t.u.TupleHasher to use MurmurHash3 32bit for hashCode calculation.
  • Added ability to provide a custom cache to be used in c.p.a.AggregateBy and c.p.a.Unique.
  • Updated c.f.h.MapReduceFlow to support both the org.apache.hadoop.mapred.* and org.apache.hadoop.mapreduce.* APIs.
  • Updated Cascading SDK
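
For readers unfamiliar with the hash named in the TupleHasher item above, here is a minimal standalone sketch of the public MurmurHash3 x86 32-bit algorithm. It is not Cascading's source: the class name `Murmur3Sketch` and the method `hash32` are made up for illustration, and how TupleHasher feeds tuple elements into the hash is not described in this announcement and is not modeled here.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative MurmurHash3 x86_32; hypothetical names, not Cascading's TupleHasher. */
final class Murmur3Sketch {
    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    static int hash32(byte[] data, int seed) {
        int h1 = seed;
        int blocks = data.length / 4;

        // Body: mix each 4-byte block, read as a little-endian int.
        for (int i = 0; i < blocks; i++) {
            int o = i * 4;
            int k1 = (data[o] & 0xff)
                    | (data[o + 1] & 0xff) << 8
                    | (data[o + 2] & 0xff) << 16
                    | (data[o + 3] & 0xff) << 24;
            h1 ^= mixK1(k1);
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: fold in the one to three bytes left after the last full block.
        int tail = blocks * 4;
        int k1 = 0;
        switch (data.length & 3) {
            case 3: k1 ^= (data[tail + 2] & 0xff) << 16; // fall through
            case 2: k1 ^= (data[tail + 1] & 0xff) << 8;  // fall through
            case 1: k1 ^= (data[tail] & 0xff);
                    h1 ^= mixK1(k1);
            default: break;
        }

        // Finalization: mix in the length, then avalanche the bits.
        h1 ^= data.length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Scramble one 32-bit block before it is combined into the running hash.
    private static int mixK1(int k1) {
        k1 *= C1;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= C2;
        return k1;
    }

    public static void main(String[] args) {
        byte[] key = "cascading".getBytes(StandardCharsets.UTF_8);
        System.out.println(Integer.toHexString(hash32(key, 0)));
    }
}
```

The seed value and the byte encoding of the key are choices made for this example only.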

For more details on new features and resolved issues see the change log.

Enjoy!

Cascading Newsletter - April 2015

Check out the latest Cascading & Driven news, updates, events, and useful resources such as tutorials, extensions, and more. In this issue: get the latest update on Cascading 3.0, which is coming soon; learn more about Maestro, which is built on Scalding; see how to write to HBase using Scalding; read how LiveRamp performs Transitive Elimination for large-scale problems; get an introduction to Apache Tez via the Warsaw HUG SlideShare presentation; and pre-order “Learning Cascading”, which covers basics to advanced topics. Also included are links to tutorials, extensions and more.

Fluid 1.0

We are happy to announce that Cascading Fluid 1.0 is now publicly available.

http://www.cascading.org/fluid

Fluid is an API library exposing the Cascading library as a Java fluent interface and mirrors all of the Cascading concepts without introducing new ones.

As a fluent API, Java IDEs, like IntelliJ IDEA and Eclipse, will auto-suggest the next API call based on the prior method call. Only methods that would logically be next in the chain will be suggested. This lowers the burden on new Cascading developers who wish to rapidly create data-processing applications on Apache Hadoop.
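
To illustrate why an IDE can offer only the logically valid next call, here is a minimal sketch of a staged fluent interface in plain Java. Every name in it (`FluentSketch`, `NameStage`, `pipe`, `each`, `groupBy`, `build`) is hypothetical and chosen for this example; it is not Fluid's actual API, which is documented in the Javadoc linked below.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative staged builder; hypothetical names, not the Fluid API. */
final class FluentSketch {
    // Each stage interface exposes only the calls that are valid at that point,
    // so auto-completion on the returned type lists just those methods.
    interface NameStage {
        OperationStage pipe(String name);
    }

    interface OperationStage {
        OperationStage each(String operation);
        DoneStage groupBy(String field);
    }

    interface DoneStage {
        List<String> build();
    }

    static NameStage start() {
        return new Builder();
    }

    // One class implements every stage; callers only ever see the stage types.
    private static final class Builder implements NameStage, OperationStage, DoneStage {
        private final List<String> steps = new ArrayList<>();

        public OperationStage pipe(String name) {
            steps.add("pipe:" + name);
            return this;
        }

        public OperationStage each(String operation) {
            steps.add("each:" + operation);
            return this;
        }

        public DoneStage groupBy(String field) {
            steps.add("groupBy:" + field);
            return this;
        }

        public List<String> build() {
            return new ArrayList<>(steps);
        }
    }

    public static void main(String[] args) {
        List<String> plan = start().pipe("wordcount").each("tokenize").groupBy("token").build();
        System.out.println(plan); // prints [pipe:wordcount, each:tokenize, groupBy:token]
    }
}
```

In this sketch, calling `build()` directly on the result of `start()` does not compile, because `NameStage` does not declare it. That compile-time restriction is the property a fluent interface relies on to guide the developer.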

The Fluid API is generated directly from Cascading compiled libraries and supports all currently supported Cascading final and WIP releases, including Cascading 3.0 WIP which provides support for Apache Tez.

Current release Java docs can be found here:
http://docs.cascading.org/fluid/1.0/javadoc/fluid-api

To see Fluid in action, check out part 6 of the Cascading for the Impatient series, which has been ported to Fluid:
https://github.com/Cascading/Impatient/blob/fluid/part6/src/main/java/impatient/Main.java

The source to the complete ported Impatient series:
https://github.com/Cascading/Impatient/tree/fluid

Cascading Newsletter - December 2014

Check out the latest Cascading & Driven news, updates, events, and useful resources such as tutorials, extensions, and more. This issue features new multi-part tutorials on Data Processing on AWS, Building Data Apps with Scalding, and Scalable Cobol Copybook Data Processing, plus updates on the Scalding REPL, the Scandal project, a developer's guide, top resources from 2014, and what’s coming with Cascading.

Cascading SDK 2.6

We are happy to announce that Cascading SDK 2.6 is now publicly available.

The Cascading SDK is a collection of tools, documentation, libraries, tutorials and example projects for the greater Cascading community.

http://cascading.org/sdk

What’s New:

  • Cascading 2.6 support for all tools (Lingual 1.2, Multitool 2.6, Load 2.6, Scalding, Cascalog)
  • Teradata Tap is now included in the Cascading-JDBC project
  • New and updated tutorials: Cobol copybook, ETL, Teradata, Redshift

For more details:

https://github.com/Cascading/CascadingSDK#cascading-26-sdk

Lingual 1.2

We are happy to announce that Lingual 1.2 is now publicly available.

The purpose of Lingual is to ease migration of SQL-based workloads onto Hadoop, and to simplify integration with Hadoop through standards-based APIs. Lingual provides an ANSI SQL interface on Cascading for Apache Hadoop.

Lingual 1.2 includes critical bug fixes, updated support for Cascading 2.6, and improved support for Driven. It is highly recommended that users upgrade to this release.

http://www.cascading.org/downloads

What’s New:

  • Fixed issue where c.l.f.SQLPlanner could execute a flow instead of just planning it
  • Fixed issue in the shell wrapper where a previously set LINGUAL_CLASSPATH would be lost, so the cascading-service.properties file could not be loaded by setting LINGUAL_CLASSPATH before running the lingual shell
  • Several changes to reduce memory footprint and help GC in long running processes with many flows

For more details:

https://github.com/Cascading/lingual