Scalding

Scalding is an extension to Cascading that enables application development in Scala, a language that combines functional and object-oriented programming. As a Scala API for Cascading, Scalding provides functionality ranging from custom join algorithms to multiple APIs (Fields-based, Type-safe, Matrix) for building robust data applications. Scalding is built and maintained by Twitter.

  • Build Data Applications with Scala
    A Scala API for Cascading, Scalding is a library that makes MapReduce computations look very similar to Scala's collection API. It also wraps Cascading to simplify defining jobs, tests and data sources on HDFS or local disk.
  • Built with the Cascading framework
    Because Scalding is built on top of the Cascading framework, it inherits the value Cascading brings to application development, including extensibility through the Cascading ecosystem, application portability and test-driven development best practices.
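
To make the comparison with Scala's collection API concrete, here is a minimal word-count sketch using the Type-safe API. The job name, the tokenize helper and the "input" and "output" argument names are illustrative assumptions, not part of Scalding itself. The sketch assumes the Scalding core library is on the classpath.

```scala
import com.twitter.scalding._

// Pure helper (hypothetical), kept outside the job so it can be checked
// without running any MapReduce flow.
object WordCountJob {
  def tokenize(text: String): List[String] =
    text.toLowerCase.split("\\s+").filter(_.nonEmpty).toList
}

// Hypothetical job: "input" and "output" are assumed command-line arguments.
class WordCountJob(args: Args) extends Job(args) {
  TypedPipe.from(TextLine(args("input")))            // one String per input line
    .flatMap { line => WordCountJob.tokenize(line) } // split lines into words
    .groupBy { word => word }                        // key each word by itself
    .size                                            // count occurrences per key
    .write(TypedTsv[(String, Long)](args("output"))) // write (word, count) rows
}
```

The flatMap, groupBy and size calls read much like the same operations on a Scala List, while Scalding plans them as a Cascading flow. A job like this is typically launched through com.twitter.scalding.Tool in either local or HDFS mode.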

Scalding Benefits

  • Build your Data Applications with Scala
  • Simple and concise syntax
  • Leverage the benefits of the Cascading application framework

Get Started with Scalding

To get started with Scalding, you can either download the Cascading SDK or clone the Scalding repository from GitHub.

To download the Cascading SDK, visit the downloads page.

To clone the Scalding repository from GitHub, run:

git clone https://github.com/twitter/scalding.git

Next, build the code using sbt (a standard Scala build tool). Make sure Scala is installed (see scalaVersion in project/Build.scala for the correct version to download), then run the following commands:

./sbt update
./sbt test # runs the tests
./sbt assembly # creates a fat jar with all dependencies
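
Once the build succeeds, a job of your own can be tested in memory in a similar spirit. The following is a minimal sketch, assuming a recent Scalding version and its JobTest helper. The job, the object name and the "inputFile" and "outputFile" labels are hypothetical placeholders, and no real files are read or written.

```scala
import com.twitter.scalding._

// Hypothetical job under test: counts how often each distinct line occurs.
class LineCountJob(args: Args) extends Job(args) {
  TypedPipe.from(TextLine(args("input")))
    .groupBy { line => line }
    .size
    .write(TypedTsv[(String, Long)](args("output")))
}

// Hypothetical driver; in a real project this would live in a test suite.
object LineCountJobCheck {
  def main(argv: Array[String]): Unit = {
    JobTest(new LineCountJob(_))
      .arg("input", "inputFile")
      .arg("output", "outputFile")
      // TextLine sources are fed as (offset, line) pairs in tests.
      .source(TextLine("inputFile"), List((0, "a"), (1, "b"), (2, "a")))
      .sink[(String, Long)](TypedTsv[(String, Long)]("outputFile")) { out =>
        // The buffer holds whatever the job wrote to the sink.
        assert(out.toMap == Map("a" -> 2L, "b" -> 1L))
      }
      .run
      .finish()
  }
}
```

Because sources and sinks are replaced by in-memory buffers, a check like this needs no Hadoop cluster.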

Compatibility

Cascading

Scalding is a DSL that integrates Cascading with the Scala programming language, so Cascading applications can be written in Scala. Because Scala interoperates with Java, developers can combine Scalding-based code with Cascading flows written in Java.

Driven

Scalding applications will work with Driven. You can build your applications with Scalding and visualize them in Driven, just like any other Cascading application.