Scalding is an extension to Cascading that enables application development in Scala, a language well suited to functional programming. As a Scala API for Cascading, Scalding offers features ranging from custom join algorithms to multiple APIs (Fields-based, Type-safe, Matrix) that developers can use to build robust data applications. Scalding is built and maintained by Twitter.
- Build your Data Applications with Scala
- Simple and concise syntax
- Leverage the benefits of the Cascading application framework
Get Started with Scalding
To get started with Scalding, you can either download the Cascading SDK or clone the Scalding repository from GitHub.
To download the Cascading SDK, visit the downloads page.
To clone the Scalding repository from GitHub, run:
git clone https://github.com/twitter/scalding.git
Next, build the code using sbt (a standard Scala build tool). Make sure you have Scala installed (see scalaVersion in project/Build.scala for the correct version to download), then run the following commands:
./sbt update
./sbt test      # runs the tests
./sbt assembly  # creates a fat jar with all dependencies
Scalding is a DSL that integrates Cascading with the Scala programming language. Because Scalding is built on top of Cascading, you can write Cascading applications in Scala. Scala's Java interoperability also lets developers combine Scalding-based code with Cascading flows written in Java.
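To give a feel for the DSL, here is a minimal word-count sketch using the Type-safe API. It assumes Scalding is on the classpath (for example via the fat jar built above). The names WordCountJob and Tokenizer and the "input" and "output" argument names are illustrative choices, not something Scalding prescribes, and the exact source and sink classes available may vary by Scalding version.

```scala
import com.twitter.scalding._

// Illustrative helper (hypothetical name): lower-cases a line,
// strips punctuation, and splits it into words.
object Tokenizer {
  def tokenize(text: String): Seq[String] =
    text.toLowerCase
      .replaceAll("[^a-z0-9\\s]", "")
      .split("\\s+")
      .filter(_.nonEmpty)
      .toSeq
}

// A Scalding job is a class that extends Job and takes the parsed
// command-line arguments. "input" and "output" are assumed argument names.
class WordCountJob(args: Args) extends Job(args) {
  TypedPipe.from(TextLine(args("input")))          // one String per input line
    .flatMap { line => Tokenizer.tokenize(line) }  // split each line into words
    .groupBy { word => word }                      // use each word as the key
    .size                                          // count occurrences per key
    .write(TypedTsv[(String, Long)](args("output")))
}
```

Under the hood, a job like this is planned as a Cascading flow, which is why it can run wherever Cascading runs.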
Scalding applications will work with Driven. You can build your applications with Scalding and visualize them in Driven, just like any other Cascading application.