cascading.tap.hadoop
Class ZipInputFormat

java.lang.Object
  extended by 
      extended by cascading.tap.hadoop.ZipInputFormat

public class ZipInputFormat
extends

Class ZipInputFormat is an InputFormat for zip files. Each file within a zip file is broken into lines. Either a line-feed or a carriage-return signals the end of a line. Keys are the position in the file, and values are the line of text.

If the underlying FileSystem is HDFS or FILE, each ZipEntry is returned as a unique split. Otherwise this input format returns false for isSplitable, and will subsequently iterate over each ZipEntry and treat all internal files as the 'same' file.
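The key/value contract described above can be illustrated without Hadoop, using only java.util.zip. This is a minimal sketch, not the Cascading implementation: the class ZipLinesSketch, its nested Line type, and the helpers readLines and zipOf are hypothetical names introduced here. It also assumes that a carriage-return followed directly by a line-feed counts as a single terminator, that offsets are byte offsets within each entry, and that text is UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only; not part of cascading.tap.hadoop.
final class ZipLinesSketch {
    /** One record: the entry name, the byte offset of the line (the key), and its text (the value). */
    static final class Line {
        final String entry;
        final long offset;
        final String text;

        Line(String entry, long offset, String text) {
            this.entry = entry;
            this.offset = offset;
            this.text = text;
        }
    }

    /** Reads every non-directory entry and breaks it into lines on LF, CR, or CRLF. */
    static List<Line> readLines(byte[] zipBytes) throws IOException {
        List<Line> out = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                long pos = 0;        // bytes consumed so far in this entry
                long lineStart = 0;  // offset at which the current line began
                boolean lastWasCr = false;
                int b;
                while ((b = zip.read()) != -1) {
                    pos++;
                    if (b == '\n' && lastWasCr) {
                        // Second half of CRLF: the line was already emitted on CR.
                        lastWasCr = false;
                        lineStart = pos;
                        continue;
                    }
                    lastWasCr = (b == '\r');
                    if (b == '\n' || b == '\r') {
                        out.add(new Line(entry.getName(), lineStart, text(buf)));
                        buf.reset();
                        lineStart = pos;
                    } else {
                        buf.write(b);
                    }
                }
                if (buf.size() > 0) {
                    // Final line without a trailing terminator.
                    out.add(new Line(entry.getName(), lineStart, text(buf)));
                }
            }
        }
        return out;
    }

    private static String text(ByteArrayOutputStream buf) {
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Builds an in-memory zip from alternating (entryName, content) arguments. */
    static byte[] zipOf(String... nameThenContent) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (int i = 0; i + 1 < nameThenContent.length; i += 2) {
                zip.putNextEntry(new ZipEntry(nameThenContent[i]));
                zip.write(nameThenContent[i + 1].getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = zipOf("a.txt", "x\ny\n", "b.txt", "p\r\nq");
        for (Line line : readLines(zip)) {
            System.out.println(line.entry + "\t" + line.offset + "\t" + line.text);
        }
    }
}
```

In this sketch the offset restarts at zero for every entry, which mirrors the case where each ZipEntry is its own split. How the actual record reader numbers positions when all entries are treated as the 'same' file is not stated in this page.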


Constructor Summary
ZipInputFormat()
           
 
Method Summary
 void configure(JobConf conf)
           
  getRecordReader(InputSplit genericSplit, JobConf job, Reporter reporter)
           
 InputSplit[] getSplits(JobConf job, int numSplits)
          Splits files returned by listPathsInternal(JobConf).
protected  boolean isAllowSplits(FileSystem fs)
           
protected  boolean isSplitable(FileSystem fs, Path file)
          Return true only if the file is in ZIP format.
protected  Path[] listPathsInternal(JobConf jobConf)
           
protected  FileStatus[] listStatus(JobConf jobConf)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ZipInputFormat

public ZipInputFormat()
Method Detail

configure

public void configure(JobConf conf)

isSplitable

protected boolean isSplitable(FileSystem fs,
                              Path file)
Return true only if the file is in ZIP format.

Parameters:
fs - the file system that the file is on
file - the path that represents this file
Returns:
true if this file can be split, false otherwise

listPathsInternal

protected Path[] listPathsInternal(JobConf jobConf)
                            throws IOException
Throws:
IOException

listStatus

protected FileStatus[] listStatus(JobConf jobConf)
                           throws IOException
Throws:
IOException

getSplits

public InputSplit[] getSplits(JobConf job,
                              int numSplits)
                       throws IOException
Splits files returned by listPathsInternal(JobConf). Each file is expected to be in zip format, and each split corresponds to one ZipEntry.

Parameters:
job - the JobConf data structure, see JobConf
numSplits - the number of splits required. Ignored here
Throws:
IOException - if input files are not in zip format
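The one-split-per-ZipEntry rule, together with the HDFS or FILE condition from the class description, can be sketched with plain java.util.zip. This is an illustration under stated assumptions, not the actual getSplits code: ZipSplitSketch, allowSplits, planSplits, and zipOf are hypothetical names, the file system is guessed from the URI scheme ("hdfs" or "file"), directory entries are skipped, and the "location!entryName" label is an invented notation for a split.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only; not part of cascading.tap.hadoop.
final class ZipSplitSketch {
    /** Assumption: splitting is allowed only for the "hdfs" and "file" URI schemes. */
    static boolean allowSplits(URI location) {
        String scheme = location.getScheme();
        return "hdfs".equalsIgnoreCase(scheme) || "file".equalsIgnoreCase(scheme);
    }

    /**
     * Returns one label per planned split: one per zip entry when splitting is
     * allowed, otherwise a single label covering the whole archive.
     */
    static List<String> planSplits(URI location, byte[] zipBytes) throws IOException {
        List<String> splits = new ArrayList<>();
        if (!allowSplits(location)) {
            // Whole archive is one unit; entries would be iterated by the reader.
            splits.add(location.toString());
            return splits;
        }
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    splits.add(location + "!" + entry.getName());
                }
            }
        }
        return splits;
    }

    /** Builds an in-memory zip from alternating (entryName, content) arguments. */
    static byte[] zipOf(String... nameThenContent) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (int i = 0; i + 1 < nameThenContent.length; i += 2) {
                zip.putNextEntry(new ZipEntry(nameThenContent[i]));
                zip.write(nameThenContent[i + 1].getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = zipOf("a.txt", "1\n", "b.txt", "2\n");
        System.out.println(planSplits(URI.create("hdfs://nn/in.zip"), zip));
        System.out.println(planSplits(URI.create("s3n://bucket/in.zip"), zip));
    }
}
```

Note that numSplits plays no role in the sketch, matching the parameter description above: the number of splits follows from the archive contents rather than from the requested count.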

getRecordReader

public  getRecordReader(InputSplit genericSplit,
                             JobConf job,
                             Reporter reporter)
                      throws IOException
Throws:
IOException

isAllowSplits

protected boolean isAllowSplits(FileSystem fs)


Copyright © 2007-2008 Concurrent, Inc. All Rights Reserved.