Recently in News Category

Cascading News of Note

Just wanted to point out a few recent blog posts and upcoming events.

First, if you are in the Atlanta area, check out the July 21, 2009 event, Cloud Computing with Hadoop, Map/Reduce and Cascading.

Also, the post "A new Cascading pipe - MultiGroupBy" outlines a way to defer the joining of multiple streams during co-grouping to a subsequent operation (a Buffer).

This is definitely something we would like to adopt in some fashion for Cascading 1.1.
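The deferred join idea can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use or mirror the Cascading API, and the names `DeferredJoinSketch`, `Rec`, `coGroup`, and `countPerStream` are hypothetical. The grouping step collects each stream's values per key without combining them, and a later buffer-like step decides what to do with the whole group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; it does not use or mirror the Cascading API.
class DeferredJoinSketch {

    // A minimal key/value record (hypothetical, for illustration).
    static final class Rec {
        final String key;
        final String value;

        Rec(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Co-group: for each key, keep one list of values per input stream.
    // No values are combined across streams at this point.
    static Map<String, List<List<String>>> coGroup(List<List<Rec>> streams) {
        Map<String, List<List<String>>> groups = new TreeMap<>();
        for (int i = 0; i < streams.size(); i++) {
            for (Rec rec : streams.get(i)) {
                List<List<String>> perStream = groups.get(rec.key);
                if (perStream == null) {
                    perStream = new ArrayList<>();
                    for (int j = 0; j < streams.size(); j++) {
                        perStream.add(new ArrayList<>());
                    }
                    groups.put(rec.key, perStream);
                }
                perStream.get(i).add(rec.value);
            }
        }
        return groups;
    }

    // Buffer-like step: sees the whole group and chooses how to combine it.
    // Here it emits one row per key with per-stream counts instead of a cross product.
    static List<String> countPerStream(Map<String, List<List<String>>> groups) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, List<List<String>>> group : groups.entrySet()) {
            StringBuilder row = new StringBuilder(group.getKey());
            for (List<String> values : group.getValue()) {
                row.append('\t').append(values.size());
            }
            rows.add(row.toString());
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Rec> clicks = Arrays.asList(new Rec("u1", "a"), new Rec("u1", "b"), new Rec("u2", "c"));
        List<Rec> purchases = Arrays.asList(new Rec("u1", "x"));
        for (String row : countPerStream(coGroup(Arrays.asList(clicks, purchases)))) {
            System.out.println(row);
        }
    }
}
```

Because the join is deferred, the buffer-like step can avoid building the full cross product of values when only an aggregate per key is needed.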

Next, Cascading's Logparser example in Clojure drives home the current meme of wrapping Cascading with Clojure. For those not in the loop, "clojure is a dialect of Lisp, and shares with Lisp the code-as-data philosophy and a powerful macro system".

I think this reinforces the idea of exposing a MapReduce query planner as an API and not as a syntax. I am very interested to see how this evolves.

Finally, the folks at Cloudscaling call out a recent presentation by Chris, Hadoop 101, that covers Hadoop, Cascading, and some best practices.

You can reach the presentation directly here: Building Scale Free Applications with Hadoop and Cascading.

Cascading WIP 1.1

Cascading WIP 1.1 is now available as source on GitHub and as a regression-tested distribution at Concurrent, Inc.

Please consider this WIP (and any other Work In Progress branch) as unstable and unsuitable for production use. That said, the more users who test it, the more quickly it will become stable.

Also note that the distribution downloads from Concurrent, Inc. are fully regression tested, so they should be a drop-in replacement for Cascading 1.0.

Please see CHANGES.txt for a comprehensive list of new features and bug fixes.

For highlights, please read on.

Hadoop: The Definitive Guide

For those who missed the announcement, Hadoop: The Definitive Guide was made available early. Grab a copy and check out the case study on Cascading in the back, written by the Cascading project lead developer.

ScaleCamp

Don't forget to sign up for ScaleCamp, the night before the Hadoop Summit 2009. Should be a nice collection of Cascading users milling about.

Chris will be presenting on Hadoop and Cascading twice this month (May) and twice next month (June). See below for a comprehensive list.

Today we are excited to announce official support for Amazon Elastic MapReduce.

With the Cascading 1.0 (Hadoop 0.18.3+) build, available from the downloads page, users can write an application and push it to a dynamically provisioned Elastic MapReduce cluster via the AWS Console or the Ruby Command Line Client.

We also created Cascading.Multitool, an application that allows users to create and run Hadoop data processing jobs using simple program arguments, very much like Unix pipes and filters.

You can read more about Multitool and how to use it with Elastic MapReduce on the AWS Developer site.

You can also find a link to the Multitool source repository on the modules page for use against your own local cluster.
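To give a feel for the "pipes and filters from program arguments" idea, here is a minimal plain-Java sketch. It is a hypothetical illustration only: it runs in memory rather than on Hadoop, and the argument names `keep` and `field`, along with the class `ArgPipelineSketch`, are assumptions made for this example, not Multitool's actual parameters. See the Multitool documentation for the real argument list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration; the argument names below are NOT Multitool's actual parameters.
class ArgPipelineSketch {

    // Turn one "name=value" argument into a stage that transforms a list of lines.
    static Function<List<String>, List<String>> stage(String arg) {
        int eq = arg.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("expected name=value, got: " + arg);
        }
        String name = arg.substring(0, eq);
        String value = arg.substring(eq + 1);
        switch (name) {
            case "keep": {
                // Keep only lines containing the value, roughly like grep.
                return lines -> lines.stream()
                        .filter(line -> line.contains(value))
                        .collect(Collectors.toList());
            }
            case "field": {
                // Keep one whitespace-delimited field, roughly like cut.
                // Lines with too few fields yield an empty string.
                int index = Integer.parseInt(value);
                return lines -> lines.stream()
                        .map(line -> {
                            String[] fields = line.split("\\s+");
                            return index < fields.length ? fields[index] : "";
                        })
                        .collect(Collectors.toList());
            }
            default:
                throw new IllegalArgumentException("unknown stage: " + name);
        }
    }

    // Apply the stages in argument order, feeding each stage's output to the next.
    static List<String> run(List<String> input, String... args) {
        List<String> lines = input;
        for (String arg : args) {
            lines = stage(arg).apply(lines);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "GET /index.html 200",
                "GET /missing 404",
                "POST /form 200");
        // Comparable in spirit to: grep 200 | cut -d' ' -f2
        System.out.println(run(log, "keep=200", "field=1"));
    }
}
```

The order of the arguments defines the order of the stages, which is what makes the command line read like a shell pipeline.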

We are very excited about offering support for this new service, and hope our users watch this space for some additional announcements in the near future.

With the push of the Cascading 1.0.6 maintenance release today, we also added official support for both Hadoop 0.19.0+ and Hadoop 0.18.3+ releases.

That is, you can run any maintenance version of Hadoop 0.19, or any maintenance version of Hadoop 0.18.3 and above without any code changes to your Cascading application. Both libraries are API compatible and can be used interchangeably.

Do note that if your application sets any properties specific to a given Hadoop version, they may not be recognized by the other version, though this is unlikely to affect most applications.

Visit our downloads page for all available releases.

Cascading.JRuby DSL Module

Grégoire Marabout just pushed up his Cascading.JRuby DSL to GitHub. Great job, Greg!

You can also find this and other extensions on our modules page.

Cascading.JDBC Module

We just pushed up experimental support for reading from and writing to JDBC sources; you can find it on the Cascading Modules page. Feel free to clone, test, patch, and notify us of any fixes/features on your branch.

Cascading.HBase Module

Two interesting bits of news here. First, we released support for Apache HBase as a third-party module. Second, we have a new page listing user contributed extensions to Cascading. Both can be found on the Cascading Modules page.