cascading.scheme
Class Scheme

java.lang.Object
  extended by cascading.scheme.Scheme
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
SequenceFile, TextLine

public abstract class Scheme
extends Object
implements Serializable

A Scheme defines what is stored in a Tap instance by declaring the Tuple field names, and alternately parsing or rendering the incoming or outgoing Tuple stream, respectively.

A Scheme also defines the type of resource that data will be sourced from or sunk to.

The given fieldNames only label the values in the Tuples as they are sourced. They do not necessarily filter the output, since a given implementation may choose to collapse values and ignore keys depending on the format.
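To illustrate the labeling described above, here is a minimal sketch that pairs declared field names with the values of one tuple by position. It does not use the Cascading classes: FieldLabelSketch, label, and the example field names "offset" and "line" are hypothetical stand-ins chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not part of the Cascading API.
final class FieldLabelSketch {
    // Labels each value with the field name at the same position.
    // The names attach meaning to positions; they do not filter or reorder values.
    static Map<String, Object> label(List<String> fieldNames, List<Object> values) {
        if (fieldNames.size() != values.size()) {
            throw new IllegalArgumentException(
                "expected " + fieldNames.size() + " values, got " + values.size());
        }
        Map<String, Object> labeled = new LinkedHashMap<>(); // keeps declaration order
        for (int i = 0; i < fieldNames.size(); i++) {
            labeled.put(fieldNames.get(i), values.get(i));
        }
        return labeled;
    }

    public static void main(String[] args) {
        System.out.println(label(List.of("offset", "line"), List.of(0L, "first line")));
    }
}
```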

Setting the numSinkParts value to 1 ensures the output resource has only one part. In the case of MapReduce, it does this by setting the number of reducers to the given value. This may affect performance, so use it with caution.
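The relationship between numSinkParts and the reducer count can be sketched as follows. This is an assumption-laden illustration, not the Cascading implementation: SinkPartsSketch and applyNumSinkParts are hypothetical, java.util.Properties stands in for the Hadoop JobConf, and treating a value of zero or less as "leave the configuration alone" is an assumption. The key "mapred.reduce.tasks" is the property the classic Hadoop API uses for the number of reduce tasks.

```java
import java.util.Properties;

// Hypothetical stand-in, not part of the Cascading API.
final class SinkPartsSketch {
    // Property holding the number of reduce tasks in the classic Hadoop API.
    static final String NUM_REDUCERS_KEY = "mapred.reduce.tasks";

    // Copies numSinkParts into the job configuration as the reducer count.
    // Assumption: a value of zero or less means "not set", so nothing is changed.
    static void applyNumSinkParts(Properties conf, int numSinkParts) {
        if (numSinkParts > 0) {
            conf.setProperty(NUM_REDUCERS_KEY, Integer.toString(numSinkParts));
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        applyNumSinkParts(conf, 1); // one reducer, hence one output part
        System.out.println(conf.getProperty(NUM_REDUCERS_KEY));
    }
}
```

With a single reducer all sink output passes through one task, which is why a low value can slow a large job down.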

See Also:
Serialized Form

Constructor Summary
protected Scheme()
          Constructor Scheme creates a new Scheme instance.
protected Scheme(Fields sourceFields)
          Constructor Scheme creates a new Scheme instance.
protected Scheme(Fields sourceFields, Fields sinkFields)
          Constructor Scheme creates a new Scheme instance.
protected Scheme(Fields sourceFields, Fields sinkFields, int numSinkParts)
          Constructor Scheme creates a new Scheme instance.
protected Scheme(Fields sourceFields, int numSinkParts)
          Constructor Scheme creates a new Scheme instance.
 
Method Summary
 boolean equals(Object object)
           
 int getNumSinkParts()
          Method getNumSinkParts returns the numSinkParts of this Scheme object.
 Fields getSinkFields()
          Method getSinkFields returns the sinkFields of this Scheme object.
 Fields getSourceFields()
          Method getSourceFields returns the sourceFields of this Scheme object.
 int hashCode()
           
 void setNumSinkParts(int numSinkParts)
          Method setNumSinkParts sets the numSinkParts of this Scheme object.
 void setSinkFields(Fields sinkFields)
          Method setSinkFields sets the sinkFields of this Scheme object.
 void setSourceFields(Fields sourceFields)
          Method setSourceFields sets the sourceFields of this Scheme object.
abstract  void sink(TupleEntry tupleEntry, OutputCollector outputCollector)
          Method sink writes out the given TupleEntry instance to the outputCollector.
abstract  void sinkInit(Tap tap, JobConf conf)
          Method sinkInit initializes this instance as a sink.
abstract  Tuple source(Object key, Object value)
          Method source takes the given Hadoop key and value and returns a new Tuple instance.
abstract  void sourceInit(Tap tap, JobConf conf)
          Method sourceInit initializes this instance as a source.
 String toString()
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Scheme

protected Scheme()
Constructor Scheme creates a new Scheme instance.


Scheme

protected Scheme(Fields sourceFields)
Constructor Scheme creates a new Scheme instance.

Parameters:
sourceFields - of type Fields

Scheme

protected Scheme(Fields sourceFields,
                 int numSinkParts)
Constructor Scheme creates a new Scheme instance.

Parameters:
sourceFields - of type Fields
numSinkParts - of type int

Scheme

protected Scheme(Fields sourceFields,
                 Fields sinkFields)
Constructor Scheme creates a new Scheme instance.

Parameters:
sourceFields - of type Fields
sinkFields - of type Fields

Scheme

protected Scheme(Fields sourceFields,
                 Fields sinkFields,
                 int numSinkParts)
Constructor Scheme creates a new Scheme instance.

Parameters:
sourceFields - of type Fields
sinkFields - of type Fields
numSinkParts - of type int
Method Detail

getSinkFields

public Fields getSinkFields()
Method getSinkFields returns the sinkFields of this Scheme object.

Returns:
the sinkFields (type Fields) of this Scheme object.

setSinkFields

public void setSinkFields(Fields sinkFields)
Method setSinkFields sets the sinkFields of this Scheme object.

Parameters:
sinkFields - the sinkFields of this Scheme object.

getSourceFields

public Fields getSourceFields()
Method getSourceFields returns the sourceFields of this Scheme object.

Returns:
the sourceFields (type Fields) of this Scheme object.

setSourceFields

public void setSourceFields(Fields sourceFields)
Method setSourceFields sets the sourceFields of this Scheme object.

Parameters:
sourceFields - the sourceFields of this Scheme object.

getNumSinkParts

public int getNumSinkParts()
Method getNumSinkParts returns the numSinkParts of this Scheme object.

Returns:
the numSinkParts (type int) of this Scheme object.

setNumSinkParts

public void setNumSinkParts(int numSinkParts)
Method setNumSinkParts sets the numSinkParts of this Scheme object.

Parameters:
numSinkParts - the numSinkParts of this Scheme object.

sourceInit

public abstract void sourceInit(Tap tap,
                                JobConf conf)
                         throws IOException
Method sourceInit initializes this instance as a source.

Parameters:
tap - of type Tap
conf - of type JobConf
Throws:
IOException - on initialization failure

sinkInit

public abstract void sinkInit(Tap tap,
                              JobConf conf)
                       throws IOException
Method sinkInit initializes this instance as a sink.

Parameters:
tap - of type Tap
conf - of type JobConf
Throws:
IOException - on initialization failure

source

public abstract Tuple source(Object key,
                             Object value)
Method source takes the given Hadoop key and value and returns a new Tuple instance.

Parameters:
key - the Hadoop key, declared as Object (typically a WritableComparable)
value - the Hadoop value, declared as Object (typically a Writable)
Returns:
Tuple
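
As a rough sketch of what a source implementation might do, the following turns one key/value pair into a two-value tuple. It assumes a line-oriented format in which the key is a position in the input and the value is the text of the line. SourceSketch is hypothetical and a List<Object> stands in for the Cascading Tuple, so this is not the library's own code.

```java
import java.util.List;

// Hypothetical stand-in, not part of the Cascading API.
final class SourceSketch {
    // Mirrors the shape of source(Object key, Object value): one record in, one tuple out.
    // The key is kept as-is and the value is rendered as text.
    // Both arguments must be non-null, because List.of rejects null elements.
    static List<Object> source(Object key, Object value) {
        return List.of(key, value.toString());
    }

    public static void main(String[] args) {
        System.out.println(source(0L, "first line"));
    }
}
```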

sink

public abstract void sink(TupleEntry tupleEntry,
                          OutputCollector outputCollector)
                   throws IOException
Method sink writes out the given TupleEntry instance to the outputCollector.

Parameters:
tupleEntry - of type TupleEntry
outputCollector - of type OutputCollector
Throws:
IOException
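
A sink implementation could, for example, collapse the tuple values into one delimited line and ignore the field names, as the class description allows. The sketch below assumes a tab delimiter and a null key. SinkSketch and its Collector interface are hypothetical stand-ins that only mimic the collect(key, value) shape of the Hadoop OutputCollector, and a List<Object> stands in for the values held by the TupleEntry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical stand-in, not part of the Cascading API.
final class SinkSketch {
    // Minimal collector with the same collect(key, value) shape as Hadoop's OutputCollector.
    interface Collector {
        void collect(Object key, Object value);
    }

    // Joins all tuple values with tabs and emits them as one value with a null key.
    static void sink(List<Object> tupleValues, Collector collector) {
        StringJoiner line = new StringJoiner("\t");
        for (Object value : tupleValues) {
            line.add(String.valueOf(value));
        }
        collector.collect(null, line.toString());
    }

    public static void main(String[] args) {
        List<Object> written = new ArrayList<>();
        sink(List.of(0L, "first line"), (key, value) -> written.add(value));
        System.out.println(written);
    }
}
```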

equals

public boolean equals(Object object)
Overrides:
equals in class Object

toString

public String toString()
Overrides:
toString in class Object

hashCode

public int hashCode()
Overrides:
hashCode in class Object


Copyright © 2007-2008 Concurrent, Inc. All Rights Reserved.