Recently in News Category

Scalding Released

If you are a Scala fan, check out the Scalding announcement from Twitter. Or just grab the Scalding code from GitHub.

Of course, don't forget the other language bindings: Cascalog, PyCascading, and Cascading.JRuby.

PyCascading Released

If you are interested in running Python on Apache Hadoop, check out PyCascading from Twitter.

Here is the official announcement on our mailing list.

If Clojure is more your thing, there is always Cascalog, another project from the Twitter data teams (formerly BackType).

Intro to Cascading

Scale Unlimited will be offering their online course, Introduction to Cascading, this November 18th.

Cascading 2.0 Early Access

After months of work, we are very happy to announce the availability of Cascading 2.0 WIP (Work in Progress).

2.0 is still under development, but it has become stable enough for us to make the work public so we can get early feedback on the APIs and other related changes, without causing unnecessary headaches to early adopters.

Currently, nearly all changes are internal, except for these:

  • Decoupled the internal planner from Hadoop and provided a "local" mode planner for fast in-memory processing.
  • Changed the Tap APIs to improve development of custom taps.
  • Changed Cascading license from GPL v3 to Apache 2.0.

Note that a number of additional improvements commonly requested by users are also in the works. More on that soon.

To download WIP builds, please visit the Concurrent downloads page. Or grab the source from the public Git repository on GitHub.

For a comprehensive list of changes, see the CHANGES.txt file.

Apache Solr Integration

An Apache Solr integration Tap has just been added to the Cascading extensions page and is available for download from GitHub.

The No Fluff, Just Stuff conference tour is running a series of presentations on Cascading and Cascalog. Check out the video below for a great introduction to Cascading.

Cascading Load and Multitool

After a bit of work, we have repackaged both Cascading Load and Multitool, giving them helper bash wrappers for installing, running, and updating. The new packages are on the download page.

After unpacking a package (Multitool, for example), just run ./bin/multitool install, or run ./bin/multitool help for more information.

Multitool is a command line interface for running sed- and grep-like applications on Apache Hadoop. It even supports joins across multiple files. It's perfect for finding files or creating large test datasets from larger ones.

Cascading.Load is a command line tool for creating complex loads on an Apache Hadoop cluster for performance tuning.

Both tools are based on Cascading, of course.

JAX San Jose 2011

Chris will be speaking at JAX in San Jose on Tuesday, June 21st, on Apache Hadoop and "Big Data".

Buzzwords 2011

Chris will be speaking at Berlin Buzzwords this June on Common Patterns in MapReduce.

Interested in getting started with Hadoop, Cascading, and Cascalog?

If so, sign up here for the Cascalog Workshop in sunny San Francisco on Saturday, February 19th, before space runs out.

Nathan Marz of BackType, the author of Cascalog, will be leading the workshop. Chris K Wensel, the author of Cascading, will be lurking about, lending a hand where possible.