Cascading - User Guide

Concurrent, Inc.

V 1.0

August 2009


Table of Contents

1. Cascading
1.1. What is Cascading?
1.2. Who should use Cascading?
1.3. What is Apache Hadoop?
2. Diving In
3. Data Processing
3.1. Introduction
3.2. Pipe Assemblies
Assembling Pipe Assemblies
Each and Every Pipes
GroupBy and CoGroup Pipes
3.3. Source and Sink Taps
3.4. Flows
Creating Flows from Pipe Assemblies
Configuring Flows
Skipping Flows
3.5. Creating Flows from a JobConf
3.6. Cascades
4. Executing Processes
4.1. Introduction
4.2. Building
4.3. Configuring
4.4. Executing
5. Using and Developing Operations
5.1. Introduction
5.2. Functions
5.3. Filter
5.4. Aggregator
5.5. Buffer
5.6. Operation and BaseOperation
6. Advanced Processing
6.1. SubAssemblies
6.2. Stream Assertions
6.3. Failure Traps
6.4. Event Handling
6.5. Template Taps
6.6. Scripting
6.7. Custom Taps and Schemes
7. Built-In Operations
7.1. Identity Function
7.2. Debug Function
7.3. Sample and Limit Functions
7.4. Insert Function
7.5. Text Functions
7.6. Regular Expression Operations
7.7. Java Expression Operations
7.8. XML Operations
7.9. Assertions
7.10. Logical Filter Operators
8. Best Practices
8.1. Unit Testing
8.2. SubAssemblies, not Factories
8.3. Give SubAssemblies Logical Responsibilities
8.4. Java Operators in Field Names
8.5. Debugging Planner Failures
9. How It Works
9.1. MapReduce Job Planner
9.2. The Cascade Topological Scheduler

Copyright © 2007-2008 Concurrent, Inc. All Rights Reserved.