cascading.tap
Class Tap

java.lang.Object
  extended by cascading.tap.Tap
All Implemented Interfaces:
FlowElement, Serializable
Direct Known Subclasses:
Hfs, SourceTap

public abstract class Tap
extends Object
implements FlowElement, Serializable

A Tap represents the physical data source or sink in a connected Flow.

That is, a source Tap is the head end of a connected Pipe and Tuple stream, and a sink Tap is the tail end. Kinds of Tap types are used to manage files from a local disk, distributed disk, remote storage like Amazon S3, or via FTP. It simply abstracts out the complexity of connecting to these types of data sources.

A Tap takes a Scheme instance, which is used to identify the type of resource (text file, binary file, etc). A Tap is responsible for how the resource is reached.

A Tap is not given an explicit name by design. This is so a given Tap instance can be re-used in different Flows that may expect a source or sink by a different logical name, but are the same physical resource. If a tap had a name other than its path, which would be used for the tap identity? If the name, then two Tap instances with different names but the same path could interfere with one another.

See Also:
Serialized Form

Constructor Summary
protected Tap()
           
protected Tap(Scheme scheme)
           
 
Method Summary
abstract  boolean containsFile(JobConf conf, String currentFile)
          Method containsFile indicates whether the tap contains a given file.
abstract  boolean deletePath(JobConf conf)
          Method deletePath deletes the resource represented by this instance.
 boolean equals(Object object)
           
abstract  Path getPath()
          Method getPath returns the Hadoop path to the resource represented by this Tap instance.
abstract  long getPathModified(JobConf conf)
          Method getPathModified returns the date this resource was last modified.
 Path getQualifiedPath(JobConf conf)
          Method getQualifiedPath returns a FileSystem fully qualified Hadoop Path.
 Scheme getScheme()
          Method getScheme returns the scheme of this Tap object.
 Fields getSinkFields()
          Method getSinkFields returns the sinkFields of this Tap object.
 Fields getSourceFields()
          Method getSourceFields returns the sourceFields of this Tap object.
 int hashCode()
           
 boolean isDeleteOnSinkInit()
          Method isDeleteOnSinkInit indicates whether the resource represented by this instance should be deleted if it already exists when the tap is initialized.
 boolean isSink()
          Method isSink returns true if this Tap instance can be used as a sink.
 boolean isSource()
          Method isSource returns true if this Tap instance can be used as a source.
 boolean isUseTapCollector()
          Method isUseTapCollector returns true if this instances TapCollector should be used to sink values.
abstract  boolean makeDirs(JobConf conf)
          Method makeDirs makes all the directories this Tap instance represents.
 TapIterator openForRead(JobConf conf)
          Method openForRead opens the resource represented by this Tap instance.
 TapCollector openForWrite(JobConf conf)
          Method openForWrite opens the resource represented by this Tap instance.
 Scope outgoingScopeFor(Set<Scope> incomingScopes)
          Method outgoingScopeFor returns the Scope this FlowElement hands off to the next FlowElement.
abstract  boolean pathExists(JobConf conf)
          Method pathExists return true if the path represented by this instance exists.
 Fields resolveFields(Scope scope)
          Method resolveFields returns the actual field names represented by the given Scope.
 Fields resolveIncomingOperationFields(Scope incomingScope)
          Method resolveIncomingOperationFields resolves the incoming scopes to the actual incoming operation field names.
protected  void setScheme(Scheme scheme)
           
 void setUseTapCollector(boolean useTapCollector)
          Method setUseTapCollector should be set to true if this instances TapCollector should be used to sink values.
 void sink(Fields fields, Tuple tuple, OutputCollector outputCollector)
          Method sink emits the sink value(s) to the OutputCollector
 void sinkInit(JobConf conf)
          Method sinkInit initializes this instance as a sink.
 Tuple source(Object key, Object value)
          Method source returns the source value as an instance of Tuple
 void sourceInit(JobConf conf)
          Method sourceInit initializes this instance as a source.
static Tap[] taps(Tap... taps)
          Convenience function to make an array of Tap instances.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Tap

protected Tap()

Tap

protected Tap(Scheme scheme)
Method Detail

taps

public static Tap[] taps(Tap... taps)
Convenience function to make an array of Tap instances.

Parameters:
taps - of type Tap
Returns:
Tap array

setScheme

protected void setScheme(Scheme scheme)

getScheme

public Scheme getScheme()
Method getScheme returns the scheme of this Tap object.

Returns:
the scheme (type Scheme) of this Tap object.

isUseTapCollector

public boolean isUseTapCollector()
Method isUseTapCollector returns true if this instances TapCollector should be used to sink values.

Returns:
the writeDirect (type boolean) of this Tap object.

setUseTapCollector

public void setUseTapCollector(boolean useTapCollector)
Method setUseTapCollector should be set to true if this instances TapCollector should be used to sink values.

Parameters:
useTapCollector - the writeDirect of this Tap object.

sourceInit

public void sourceInit(JobConf conf)
                throws IOException
Method sourceInit initializes this instance as a source.

Parameters:
conf - of type JobConf
Throws:
IOException - on resource initialization failure.

sinkInit

public void sinkInit(JobConf conf)
              throws IOException
Method sinkInit initializes this instance as a sink.

Parameters:
conf - of type JobConf
Throws:
IOException - on resource initialization failure.

getPath

public abstract Path getPath()
Method getPath returns the Hadoop path to the resource represented by this Tap instance.

Returns:
Path

containsFile

public abstract boolean containsFile(JobConf conf,
                                     String currentFile)
Method containsFile indicates whether the tap contains a given file.

Parameters:
conf - of type JobConf
currentFile - of type String
Returns:
boolean

getSourceFields

public Fields getSourceFields()
Method getSourceFields returns the sourceFields of this Tap object.

Returns:
the sourceFields (type Fields) of this Tap object.

getSinkFields

public Fields getSinkFields()
Method getSinkFields returns the sinkFields of this Tap object.

Returns:
the sinkFields (type Fields) of this Tap object.

openForRead

public TapIterator openForRead(JobConf conf)
                        throws IOException
Method openForRead opens the resource represented by this Tap instance.

Parameters:
conf - of type JobConf
Returns:
TapIterator
Throws:
IOException - when the resource cannot be opened

openForWrite

public TapCollector openForWrite(JobConf conf)
                          throws IOException
Method openForWrite opens the resource represented by this Tap instance.

Parameters:
conf - of type JobConf
Returns:
TapCollector
Throws:
IOException - when

source

public Tuple source(Object key,
                    Object value)
Method source returns the source value as an instance of Tuple

Parameters:
key - of type WritableComparable
value - of type Writable
Returns:
Tuple

sink

public void sink(Fields fields,
                 Tuple tuple,
                 OutputCollector outputCollector)
          throws IOException
Method sink emits the sink value(s) to the OutputCollector

Parameters:
fields - of type Fields
tuple - of type Tuple
outputCollector - of type OutputCollector
Throws:
IOException - when the resource cannot be written to

outgoingScopeFor

public Scope outgoingScopeFor(Set<Scope> incomingScopes)
Description copied from interface: FlowElement
Method outgoingScopeFor returns the Scope this FlowElement hands off to the next FlowElement.

Specified by:
outgoingScopeFor in interface FlowElement
Parameters:
incomingScopes - of type Set
Returns:
Scope
See Also:
FlowElement#outgoingScopeFor(Set)

resolveIncomingOperationFields

public Fields resolveIncomingOperationFields(Scope incomingScope)
Description copied from interface: FlowElement
Method resolveIncomingOperationFields resolves the incoming scopes to the actual incoming operation field names.

Specified by:
resolveIncomingOperationFields in interface FlowElement
Parameters:
incomingScope - of type Scope
Returns:
Fields
See Also:
FlowElement.resolveIncomingOperationFields(Scope)

resolveFields

public Fields resolveFields(Scope scope)
Description copied from interface: FlowElement
Method resolveFields returns the actual field names represented by the given Scope. The scope may be incoming or outgoing in relation to this FlowElement instance.

Specified by:
resolveFields in interface FlowElement
Parameters:
scope - of type Scope
Returns:
Fields
See Also:
FlowElement.resolveFields(Scope)

getQualifiedPath

public Path getQualifiedPath(JobConf conf)
                      throws IOException
Method getQualifiedPath returns a FileSystem fully qualified Hadoop Path.

Parameters:
conf - of type JobConf
Returns:
Path
Throws:
IOException - when

makeDirs

public abstract boolean makeDirs(JobConf conf)
                          throws IOException
Method makeDirs makes all the directories this Tap instance represents.

Parameters:
conf - of type JobConf
Returns:
boolean
Throws:
IOException - when there is an error making directories

deletePath

public abstract boolean deletePath(JobConf conf)
                            throws IOException
Method deletePath deletes the resource represented by this instance.

Parameters:
conf - of type JobConf
Returns:
boolean
Throws:
IOException - when the resource cannot be deleted

pathExists

public abstract boolean pathExists(JobConf conf)
                            throws IOException
Method pathExists return true if the path represented by this instance exists.

Parameters:
conf - of type JobConf
Returns:
boolean
Throws:
IOException - when the status cannot be determined

getPathModified

public abstract long getPathModified(JobConf conf)
                              throws IOException
Method getPathModified returns the date this resource was last modified.

Parameters:
conf - of type JobConf
Returns:
long
Throws:
IOException - when the modified date cannot be determined

isDeleteOnSinkInit

public boolean isDeleteOnSinkInit()
Method isDeleteOnSinkInit indicates whether the resource represented by this instance should be deleted if it already exists when the tap is initialized.

Returns:
boolean

isSink

public boolean isSink()
Method isSink returns true if this Tap instance can be used as a sink.

Returns:
the sink (type boolean) of this Tap object.

isSource

public boolean isSource()
Method isSource returns true if this Tap instance can be used as a source.

Returns:
the source (type boolean) of this Tap object.

equals

public boolean equals(Object object)
Overrides:
equals in class Object

hashCode

public int hashCode()
Overrides:
hashCode in class Object


Copyright © 2007-2008 Concurrent, Inc. All Rights Reserved.