cascading.flow
Class Flow

java.lang.Object
  extended by cascading.flow.Flow
All Implemented Interfaces:
Runnable
Direct Known Subclasses:
MapReduceFlow

public class Flow
extends Object
implements Runnable

A Pipe assembly is connected to the necessary number of Tap sinks and sources into a Flow. A Flow is then executed to push the incoming source data through the assembly into one or more sinks.

Note that Pipe assemblies can be reused in multiple Flow instances. They maintain no state regarding the Flow execution. Subsequently, Pipe assemblies can be given parameters through its calling Flow so they can be built in a generic fashion.

When a Flow is created, an optimized internal representation is created that is then executed within the cluster. Thus any overhead inherent to a give Pipe assembly will be removed once it's placed in context with the actual execution environment.

Properties

See Also:
FlowConnector

Nested Class Summary
static class Flow.FlowHolder
          Class FlowHolder is a helper class for wrapping Flow instances.
 
Field Summary
protected  boolean stopJobsOnExit
          Field stopJobsOnExit
 
Constructor Summary
protected Flow()
          Used for testing.
protected Flow(Map<Object,Object> properties, JobConf jobConf, String name,  pipeGraph,  stepGraph, Map<String,Tap> sources, Map<String,Tap> sinks, Map<String,Tap> traps)
           
protected Flow(Map<Object,Object> properties, JobConf jobConf, String name,  stepGraph, Map<String,Tap> sources, Map<String,Tap> sinks, Map<String,Tap> traps)
           
 
Method Summary
 void addListener(FlowListener flowListener)
          Method addListener registers the given flowListener with this instance.
 boolean areSinksStale()
          Method areSinksStale returns true if any of the sinks referenced are out of date in relation to the sources.
 boolean areSourcesNewer(long sinkModified)
          Method areSourcesNewer returns true if any source is newer than the given sinkModified date value.
 void complete()
          Method complete starts the current Flow instance if it has not be previously started, then block until completion.
 void deleteSinks()
          Method deleteSinks deletes all sinks.
 FlowStats getFlowStats()
          Method getFlowStats returns the flowStats of this Flow object.
 Flow.FlowHolder getHolder()
          Used to return a simple wrapper for use as an edge in a graph where there can only be one instance of every edge.
 JobConf getJobConf()
          Method getJobConf returns the jobConf of this Flow object.
 String getName()
          Method getName returns the name of this Flow object.
static boolean getPreserveTemporaryFiles(Map<Object,Object> properties)
          Returns property preserveTemporaryFiles.
 Tap getSink()
          Method getSink returns the first sink of this Flow object.
 long getSinkModified()
          Method getSinkModified returns the youngest modified date of any sink Tap managed by this Flow instance.
 Map<String,Tap> getSinks()
          Method getSinks returns the sinks of this Flow object.
 Map<String,Tap> getSources()
          Method getSources returns the sources of this Flow object.
 List<FlowStep> getSteps()
          Method getSteps returns the steps of this Flow object.
static boolean getStopJobsOnExit(Map<Object,Object> properties)
          Returns property stopJobsOnExit.
 Map<String,Tap> getTraps()
          Method getTraps returns the traps of this Flow object.
 boolean hasListeners()
          Method hasListeners returns true if FlowListener instances have been registered.
 boolean isPreserveTemporaryFiles()
          Method isPreserveTemporaryFiles returns true if temporary files will be cleaned when this Flow completes.
 boolean isSkipFlow()
          Method isSkipFlow returns true if the parent Cascade should skip this Flow instance.
 boolean isSkipIfSinkExists()
          Method isSkipIfSinkExists returns the skipIfSinkExists of this Flow object.
 boolean isStopJobsOnExit()
          Method isStopJobsOnExit returns the stopJobsOnExit of this Flow object.
 boolean jobsAreLocal()
          Method jobsAreLocal returns true if all jobs are executed in-process as a single map and reduce task.
 TapIterator openSink()
          Method openSink opens the first sink Tap.
 TapIterator openSink(String name)
          Method openSink opens the named sink Tap.
 TapIterator openSource()
          Method openSource opens the first source Tap.
 TapIterator openSource(String name)
          Method openSource opens the named source Tap.
 TapIterator openTapForRead(Tap tap)
          Method openTapForRead return a TapIterator for the given Tap instance.
 TapCollector openTapForWrite(Tap tap)
          Method openTapForWrite returns a (@link TapCollector} for the given Tap instance.
 TapIterator openTrap()
          Method openTrap opens the first trap Tap.
 TapIterator openTrap(String name)
          Method openTrap opens the named trap Tap.
protected static void printElementGraph(String filename,  graph)
           
protected static void printElementGraph(Writer writer,  graph)
           
 boolean removeListener(FlowListener flowListener)
          Method removeListener removes the given flowListener from this instance.
 void run()
          Method run implements the Runnable run method.
protected  void setName(String name)
           
static void setPreserveTemporaryFiles(Map<Object,Object> properties, boolean preserveTemporaryFiles)
          Property preserveTemporaryFiles forces the Flow instance to keep any temporary intermediate data sets.
protected  void setSinks(Map<String,Tap> sinks)
           
 void setSkipIfSinkExists(boolean skipIfSinkExists)
          Method setSkipIfSinkExists sets the skipIfSinkExists of this Flow object.
protected  void setSources(Map<String,Tap> sources)
           
protected  void setStepGraph( stepGraph)
           
static void setStopJobsOnExit(Map<Object,Object> properties, boolean stopJobsOnExit)
          Propety stopJobsOnExit will tell the Flow to add a JVM shutdown hook that will kill all running processes if the underlying computing system supports it.
protected  void setTraps(Map<String,Tap> traps)
           
 void start()
          Method start begins the execution of this Flow instance.
 void start(JobConf jobConf)
          Method start begins the execution of this Flow instance.
 void stop()
          Method stop stops all running jobs, killing any currently executing.
 boolean tapPathExists(Tap tap)
          Method tapExists returns true if the resource represented by the given Tap instance exists.
 String toString()
           
 void writeDOT(String filename)
          Method writeDOT writes this Flow instance to the given filename as a DOT file for import into a graphics package.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

stopJobsOnExit

protected boolean stopJobsOnExit
Field stopJobsOnExit

Constructor Detail

Flow

protected Flow()
Used for testing.


Flow

protected Flow(Map<Object,Object> properties,
               JobConf jobConf,
               String name,
                pipeGraph,
                stepGraph,
               Map<String,Tap> sources,
               Map<String,Tap> sinks,
               Map<String,Tap> traps)

Flow

protected Flow(Map<Object,Object> properties,
               JobConf jobConf,
               String name,
                stepGraph,
               Map<String,Tap> sources,
               Map<String,Tap> sinks,
               Map<String,Tap> traps)
Method Detail

setPreserveTemporaryFiles

public static void setPreserveTemporaryFiles(Map<Object,Object> properties,
                                             boolean preserveTemporaryFiles)
Property preserveTemporaryFiles forces the Flow instance to keep any temporary intermediate data sets. Useful for debugging. Defaults to false.

Parameters:
properties - of type Map
preserveTemporaryFiles - of type boolean

getPreserveTemporaryFiles

public static boolean getPreserveTemporaryFiles(Map<Object,Object> properties)
Returns property preserveTemporaryFiles.

Parameters:
properties - of type Map
Returns:
a boolean

setStopJobsOnExit

public static void setStopJobsOnExit(Map<Object,Object> properties,
                                     boolean stopJobsOnExit)
Propety stopJobsOnExit will tell the Flow to add a JVM shutdown hook that will kill all running processes if the underlying computing system supports it. Defaults to true.

Parameters:
properties - of type Map
stopJobsOnExit - of type boolean

getStopJobsOnExit

public static boolean getStopJobsOnExit(Map<Object,Object> properties)
Returns property stopJobsOnExit.

Parameters:
properties - of type Map
Returns:
a boolean

getName

public String getName()
Method getName returns the name of this Flow object.

Returns:
the name (type String) of this Flow object.

setName

protected void setName(String name)

setSources

protected void setSources(Map<String,Tap> sources)

setSinks

protected void setSinks(Map<String,Tap> sinks)

setTraps

protected void setTraps(Map<String,Tap> traps)

setStepGraph

protected void setStepGraph( stepGraph)

getJobConf

public JobConf getJobConf()
Method getJobConf returns the jobConf of this Flow object.

Returns:
the jobConf (type JobConf) of this Flow object.

getFlowStats

public FlowStats getFlowStats()
Method getFlowStats returns the flowStats of this Flow object.

Returns:
the flowStats (type FlowStats) of this Flow object.

hasListeners

public boolean hasListeners()
Method hasListeners returns true if FlowListener instances have been registered.

Returns:
boolean

addListener

public void addListener(FlowListener flowListener)
Method addListener registers the given flowListener with this instance.

Parameters:
flowListener - of type FlowListener

removeListener

public boolean removeListener(FlowListener flowListener)
Method removeListener removes the given flowListener from this instance.

Parameters:
flowListener - of type FlowListener
Returns:
true if the listener was removed

getSources

public Map<String,Tap> getSources()
Method getSources returns the sources of this Flow object.

Returns:
the sources (type Map) of this Flow object.

getSinks

public Map<String,Tap> getSinks()
Method getSinks returns the sinks of this Flow object.

Returns:
the sinks (type Map) of this Flow object.

getTraps

public Map<String,Tap> getTraps()
Method getTraps returns the traps of this Flow object.

Returns:
the traps (type Map) of this Flow object.

getSink

public Tap getSink()
Method getSink returns the first sink of this Flow object.

Returns:
the sink (type Tap) of this Flow object.

isPreserveTemporaryFiles

public boolean isPreserveTemporaryFiles()
Method isPreserveTemporaryFiles returns true if temporary files will be cleaned when this Flow completes.

Returns:
the preserveTemporaryFiles (type boolean) of this Flow object.

isStopJobsOnExit

public boolean isStopJobsOnExit()
Method isStopJobsOnExit returns the stopJobsOnExit of this Flow object. Defaults to true.

Returns:
the stopJobsOnExit (type boolean) of this Flow object.

isSkipIfSinkExists

public boolean isSkipIfSinkExists()
Method isSkipIfSinkExists returns the skipIfSinkExists of this Flow object.

Returns:
the skipIfSinkExists (type boolean) of this Flow object.

setSkipIfSinkExists

public void setSkipIfSinkExists(boolean skipIfSinkExists)
Method setSkipIfSinkExists sets the skipIfSinkExists of this Flow object. Defaults to false. Set to true if this Flow instance should complete immediately if the Tap.pathExists(JobConf) returns true.

If Tap.isDeleteOnSinkInit() returns true, this Flow instance will execute after deleting the Tap resource.

Parameters:
skipIfSinkExists - the skipIfSinkExists of this Flow object.

isSkipFlow

public boolean isSkipFlow()
                   throws IOException
Method isSkipFlow returns true if the parent Cascade should skip this Flow instance. True is returned if isSkipIfSinkExists returns true and any of the sinks exist and are not Tap.isDeleteOnSinkInit(). Or is the sinks are newer than the sources.

Returns:
the skipFlow (type boolean) of this Flow object.
Throws:
IOException - when

areSinksStale

public boolean areSinksStale()
                      throws IOException
Method areSinksStale returns true if any of the sinks referenced are out of date in relation to the sources. Or if any sink method Tap.isDeleteOnSinkInit() returns true.

Returns:
boolean
Throws:
IOException - when

areSourcesNewer

public boolean areSourcesNewer(long sinkModified)
                        throws IOException
Method areSourcesNewer returns true if any source is newer than the given sinkModified date value.

Parameters:
sinkModified - of type long
Returns:
boolean
Throws:
IOException - when

getSinkModified

public long getSinkModified()
                     throws IOException
Method getSinkModified returns the youngest modified date of any sink Tap managed by this Flow instance.

If zero (0) is returned, atleast one of the sink resources does not exist. If minus one (-1) is returned, atleast one of the sinks are marked for delete (returns true).

Returns:
the sinkModified (type long) of this Flow object.
Throws:
IOException - when

getSteps

public List<FlowStep> getSteps()
Method getSteps returns the steps of this Flow object. They will be in topological order.

Returns:
the steps (type List) of this Flow object.

start

public void start(JobConf jobConf)
Method start begins the execution of this Flow instance. It will return immediately. Use the method complete() to block until this Flow completes. The given JobConf instance will replace any previous value.

Parameters:
jobConf - of type JobConf

start

public void start()
Method start begins the execution of this Flow instance. It will return immediately. Use the method complete() to block until this Flow completes.


stop

public void stop()
Method stop stops all running jobs, killing any currently executing.


complete

public void complete()
Method complete starts the current Flow instance if it has not be previously started, then block until completion.


openSource

public TapIterator openSource()
                       throws IOException
Method openSource opens the first source Tap.

Returns:
TapIterator
Throws:
IOException - when

openSource

public TapIterator openSource(String name)
                       throws IOException
Method openSource opens the named source Tap.

Parameters:
name - of type String
Returns:
TapIterator
Throws:
IOException - when

openSink

public TapIterator openSink()
                     throws IOException
Method openSink opens the first sink Tap.

Returns:
TapIterator
Throws:
IOException - when

openSink

public TapIterator openSink(String name)
                     throws IOException
Method openSink opens the named sink Tap.

Parameters:
name - of type String
Returns:
TapIterator
Throws:
IOException - when

openTrap

public TapIterator openTrap()
                     throws IOException
Method openTrap opens the first trap Tap.

Returns:
TapIterator
Throws:
IOException - when

openTrap

public TapIterator openTrap(String name)
                     throws IOException
Method openTrap opens the named trap Tap.

Parameters:
name - of type String
Returns:
TapIterator
Throws:
IOException - when

deleteSinks

public void deleteSinks()
                 throws IOException
Method deleteSinks deletes all sinks. Typically used by a Cascade before executing the flow if the sinks are stale. Use with caution.

Throws:
IOException - when

tapPathExists

public boolean tapPathExists(Tap tap)
                      throws IOException
Method tapExists returns true if the resource represented by the given Tap instance exists.

Parameters:
tap - of type Tap
Returns:
boolean
Throws:
IOException - when

openTapForRead

public TapIterator openTapForRead(Tap tap)
                           throws IOException
Method openTapForRead return a TapIterator for the given Tap instance.

Parameters:
tap - of type Tap
Returns:
TapIterator
Throws:
IOException - when

openTapForWrite

public TapCollector openTapForWrite(Tap tap)
                             throws IOException
Method openTapForWrite returns a (@link TapCollector} for the given Tap instance.

Parameters:
tap - of type Tap
Returns:
TapCollector
Throws:
IOException - when

jobsAreLocal

public boolean jobsAreLocal()
Method jobsAreLocal returns true if all jobs are executed in-process as a single map and reduce task.

Returns:
boolean

run

public void run()
Method run implements the Runnable run method.

Specified by:
run in interface Runnable

toString

public String toString()
Overrides:
toString in class Object

writeDOT

public void writeDOT(String filename)
Method writeDOT writes this Flow instance to the given filename as a DOT file for import into a graphics package.

Parameters:
filename - of type String

getHolder

public Flow.FlowHolder getHolder()
Used to return a simple wrapper for use as an edge in a graph where there can only be one instance of every edge.

Returns:
FlowHolder

printElementGraph

protected static void printElementGraph(String filename,
                                         graph)

printElementGraph

protected static void printElementGraph(Writer writer,
                                         graph)


Copyright © 2007-2008 Concurrent, Inc. All Rights Reserved.