Package cascading.groovy

Provides for Groovy language scripting support.

See:
          Description

Class Summary
CascadingBuilder CascadingBuilder is a Groovy 'builder' extension.
 

Package cascading.groovy Description

Provides for Groovy language scripting support.

Related Documentation

For overviews, tutorials, examples, guides, and tool documentation, please see: For the Groovy language tool and documentation, please see:

Using

To create a new Cascading 'builder', you must first create a new cascading.groovy.Cascading object.
def cascading = new Cascading()
def builder = cascading.builder();
To create a new Flow:
Flow flow = builder.flow("flow name")
  {
  // map and assembly
  }
or a new Cascade:
Cascade cascade = builder("cascade name")
  {
  // flows
  }
To pass properties to Hadoop and/or the internal planners:
  def props = ["mapred.jar": "some-custom.jar", "mapred.map.tasks": 20, "mapred.reduce.tasks": 20]
  def cascading = new Cascading(props)
  def builder = cascading.builder();

Examples

Nested assembly of 'Tap maps' and Pipe assemblies.

Here is an example using the condensed format:
def cascading = new Cascading()
def builder = cascading.builder();

Cascade cascade = builder("cut cascade")
  {
    flow("cut")
      {
        source(inputFileApache)

        cut(/\s+/, results: [1])
        group([0])

        sink(outputPath + "cut-sort", delete: true)
//        trap(outputPath + "cut-sort-trap", delete: true) // optional sink to capture bad data
      }
  }

cascade.complete()
Here is the same function in its full form:
def builder = new CascadingBuilder();

Cascade cascade = builder("cut cascade")
  {
   flow("cut flow")
     {
       map
       {
         source(name: "cut")
           {
             lfs(inputFileApache)
               {
                 text(["line"])
               }
           }

         sink(name: "cut")
           {
             hfs(outputPath + "cut-sort-full", delete: true)
               {
                 text()
               }
           }

//        trap(name: "cut")  // optional trap to capture bad data
//          {
//            hfs(outputPath + "cut-sort-full-trap", delete: true)
//              {
//                text()
//              }
//          }
       }

       assembly(name: "cut")
         {
           eachTuple(args: ["line"], results: [1])
             {
               regexSplitter(/\s+/)
             }

           group([0])

           everyGroup(args: [0], results: ALL)
             {
               count()
             }
         }
     }
  }

cascade.complete()
This last form is necessary in order to support complex paths within and between flows.

Additionally, within the eachTuple and everyGroup closure, user custom classes can be specified.

 eachTuple(args: ["f1"], results: ["f1", "g1"])
   {
     operation(new RegexParser(new Fields("g1"), ".*", [0, 1] as Integer[]));
   }

Language Reference

Core components:

Pipe assembly support:

Tap and Scheme support:

Group and Join support:

Functions and Filters (formal/alias):

All Functions may take the argument 'declared' to override their default declaredFields value.

Aggregators:

Assertions:

All Assertions require the argument 'assertionLevel' (or abbreviated as 'level').



Copyright © 2007-2008 Concurrent, Inc. All Rights Reserved.