cascading.flow
Class MultiMapReducePlanner

java.lang.Object
  extended by cascading.flow.FlowPlanner
      extended by cascading.flow.MultiMapReducePlanner

public class MultiMapReducePlanner
extends FlowPlanner

Class MultiMapReducePlanner is the core Hadoop MapReduce planner.

Notes:

Custom JobConf properties
A custo JobConf instance can be passed to this planner by calling #setJobConf(java.util.Map, org.apache.hadoop.mapred.JobConf) on a map properties object before constructing a new FlowConnector.

A better practice would be to set Hadoop properties directly on the map properties object handed to the FlowConnector. All values in the map will be passed to a new default JobConf instance to be used as defaults for all resulting Flow instances.

For example, properties.set("mapred.child.java.opts","-Xmx512m"); would convince Hadoop to spawn all child jvms with a heap of 512MB.

Heterogeneous source Tap instances
Currently Hadoop cannot have but one InputFormat per Mapper, but Cascading allows for any types of Taps to be used as sinks in a given Flow.

To overcome this issue, this planner will insert temporary Tap instances immediately before a merge or join Group (GroupBy or CoGroup) if the source Taps do not share the same Scheme class. By default temp Taps use the SequenceFile Scheme. So if the source Taps are custom or use TextLine, a few extra jobs can leak into a given Flow.

To overcome this, in turn, an intermediateSchemeClass must be passed from the FlowConnctor to the planner. This class will be instantiated for every temp Tap instance. The intention is that the given intermedeiateSchemeClass match all the source Tap schemes.

Properties


Field Summary
 
Fields inherited from class cascading.flow.FlowPlanner
assertionLevel, properties
 
Constructor Summary
protected MultiMapReducePlanner(Map<Object,Object> properties)
          Constructor MultiMapReducePlanner creates a new MultiMapReducePlanner instance.
 
Method Summary
 Flow buildFlow(String name, Pipe[] pipes, Map<String,Tap> sources, Map<String,Tap> sinks, Map<String,Tap> traps)
          Method buildFlow renders the actual Flow instance.
static JobConf getJobConf(Map<Object,Object> properties)
          Method getJobConf returns a stored JobConf instance, if any.
static boolean getNormalizeHeterogeneousSources(Map<Object,Object> properties)
          Method getNormalizeHeterogeneousSources returns if this planner will normalize heterogeneous input sources.
static void setJobConf(Map<Object,Object> properties, JobConf jobConf)
          Method setJobConf adds the given JobConf object to the given properties object.
static void setNormalizeHeterogeneousSources(Map<Object,Object> properties, boolean doNormalize)
          Method setNormalizeHeterogeneousSources adds the given doNormalize boolean to the given properites object.
 
Methods inherited from class cascading.flow.FlowPlanner
createElementGraph, failOnLoneGroupAssertion, failOnMissingGroup, failOnMisusedBuffer, verifyAssembly, verifyPipeAssemblyEndPoints, verifyTaps, verifyTraps
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MultiMapReducePlanner

protected MultiMapReducePlanner(Map<Object,Object> properties)
Constructor MultiMapReducePlanner creates a new MultiMapReducePlanner instance.

Parameters:
properties - of type Map
Method Detail

setJobConf

public static void setJobConf(Map<Object,Object> properties,
                              JobConf jobConf)
Method setJobConf adds the given JobConf object to the given properties object. Use this method to pass custom default Hadoop JobConf properties to Hadoop.

Parameters:
properties - of type Map
jobConf - of type JobConf

getJobConf

public static JobConf getJobConf(Map<Object,Object> properties)
Method getJobConf returns a stored JobConf instance, if any.

Parameters:
properties - of type Map
Returns:
a JobConf instance

setNormalizeHeterogeneousSources

public static void setNormalizeHeterogeneousSources(Map<Object,Object> properties,
                                                    boolean doNormalize)
Method setNormalizeHeterogeneousSources adds the given doNormalize boolean to the given properites object. Use this method if additional jobs should be planned in to handle incompatible InputFormat classes.

Normalization is off by default.

Parameters:
properties - of type Map
doNormalize - of type boolean

getNormalizeHeterogeneousSources

public static boolean getNormalizeHeterogeneousSources(Map<Object,Object> properties)
Method getNormalizeHeterogeneousSources returns if this planner will normalize heterogeneous input sources.

Parameters:
properties - of type Map
Returns:
a boolean

buildFlow

public Flow buildFlow(String name,
                      Pipe[] pipes,
                      Map<String,Tap> sources,
                      Map<String,Tap> sinks,
                      Map<String,Tap> traps)
Method buildFlow renders the actual Flow instance.

Parameters:
name - of type String
pipes - of type Pipe[]
sources - of type Map
sinks - of type Map
traps - of type Map
Returns:
Flow


Copyright © 2007-2008 Concurrent, Inc. All Rights Reserved.