Cascading, Scalding, and Related Tutorials
Quickly start building Cascading apps with these tutorials and code samples contributed by Concurrent and members of the Cascading community.
Application development
Quick Start for Cascading
Learn how to use Cascading through Java to implement common ETL tasks on Apache Hadoop.
Cascading on Amazon Web Services
Use Cascading to create data processing workflows in AWS environments.
The Java Developer’s Guide to Apache Hive
Implement Hive workflows within Cascading Flows and Cascades
Integrations
Accessing Redshift with Cascading
Learn how to optimize migration of data between Amazon Redshift and your Cascading applications
Scalable COBOL Copybook Processing
Create new Cascading schemes for COBOL copybooks and other EBCDIC and packed-decimal formats to process mainframe data in your Hadoop applications
Integrating Cascading with Teradata
Augment your EDW systems by learning how to develop Cascading applications on Hadoop that integrate (bulk import, export) with Teradata
Application development
Quick Start for Scalding
Learn how to implement common data processing operations in Scala with Scalding. This tutorial ports lab examples from the popular Cascading training class to Scalding
Scalding Tutorial
Accelerate application development by reusing code from common patterns
Scalding Workshop
Developed by Think Big Analytics, this tutorial takes you through the principles of writing data analysis applications with Scalding
Data Science
Recommendations with Scalding
Popular tutorial by Twitter for building a recommendation application in Scalding
Movie Recommendations with Scalding
Another tutorial for creating movie recommendations
Poker Collusion Detection with Mahout and Scalding
Uses a k-means clustering algorithm to find similar players on stackoverflow.com, where the similarity criterion is the set of authors whose questions and answers the users upvoted or downvoted.
Portfolio Management in Scalding
A fun tutorial that uses a portfolio selection algorithm to divide $1000 among four stocks
More tutorials and code samples
Project | Tutorial / code sample | Description |
---|---|---|
Cascading (Java) | Integrating Cascading with Map Reduce APIs | Leverage existing MapReduce in a Cascading Application |
Cascading (Java) | Cascading Impatient Series | This six-part tutorial series gets you started with Cascading |
Cascading (Java) | Cascading.Learn | Test driven learning tutorial for Cascading |
Cascading (Java) | Accessing Redshift with Cascading | Make Amazon’s Redshift a Cascading tap |
Cascading (Java) | Example: City of Palo Alto Open Data | An example Cascading app based on City of Palo Alto open data |
Cascading (Java) | Example: Recommender Algorithm | An example Cascading app with a simple social recommender |
Cascading (Java) | Example: Word Count | A simple example of a Cascading word count application |
Cascading (Java) | Example: Log Parser | A simple example of a Cascading log parser application |
Cascading (Java) | Example: Log Analysis | A simple example of a Cascading log analysis application |
Scalding | Typesafe’s Activator for Scalding | Performs analytics on data sets with a Scala-based API |
Lingual | Accessing HBase with Cascading Lingual | Data processing with Apache HBase via Cascading Lingual |
Lingual | Accessing Oracle with Cascading Lingual | Create a workflow exporting data from Oracle into Hadoop |
Pattern | Quickly Migrate Predictive Models to Hadoop | Migrate models from SAS, R, etc. onto Hadoop and deploy at scale |
Pattern | Example: Simple PMML | A simple PMML example of Cascading Pattern |
Pattern | Example: Robust PMML | A robust PMML example of Cascading Pattern |
Pattern | Example: Model API Usage Test | An example of model testing with Cascading Pattern |
Cascalog | Cascalog Tutorial | A tutorial for running Cascalog on Hadoop |
Cascalog | Introducing Cascalog (pt1, pt2) | An introduction to Cascalog |
Cascalog | Cascalog Impatient Series | Similar to Cascading for the Impatient, but for Cascalog |
Cascalog | JCascalog | A tutorial on JCascalog, the Java interface to Cascalog |
Cascalog | Developing and Deploying a Cascalog Query on Hadoop | A tutorial for creating a Facebook-like news feed |
Cascalog | Methods for Handling Wide Sources | A tutorial for handling excess tuple fields |
Cascalog | Predicate Macros | A tutorial for defining predicate macros |
Cascalog | Cascalog and Hadoop Security | How to deal with Hadoop security exceptions |
Cascalog | Testing Cascalog with Midje (pt1, pt2) | A helpful Cascalog testing suite |
Related Tutorials
Project | Tutorial / code sample | Description |
---|---|---|
Bixo | Getting Started with Bixo | Bixo is a web mining toolkit |
ElasticSearch | Using ElasticSearch as a Cascading tap | Elasticsearch can be used in Cascading as a SourceTap or SinkTap |
Jading | Getting Started with Jading | Jading is a build and execution tool for Cascading.JRuby |
Lein-Cascading | Leiningen integration for Cascading | Leiningen is for automating Clojure projects |
PyCascading | A collection of PyCascading Examples | PyCascading enables Python code for MapReduce-like execution |