cascading.tap.hadoop
Class ZipInputFormat
java.lang.Object
cascading.tap.hadoop.ZipInputFormat
public class ZipInputFormat
- extends
Class ZipInputFormat is an InputFormat for zip files. Each file within a zip file is broken
into lines. Either a line-feed or a carriage-return signals the end of a
line. Keys are the position in the file, and values are the line of text.
If the underlying FileSystem is HDFS or FILE, each ZipEntry is returned
as a unique split. Otherwise this input format returns false for isSplitable, and will
subsequently iterate over each ZipEntry and treat all internal files as the 'same' file.
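The non-split behaviour described above can be sketched with the JDK's own `java.util.zip` classes. This is an illustration only, not the Cascading implementation: the `ZipLineSketch` class, its `Record` type, and the `readAll` method are hypothetical names. It also assumes that the position key keeps counting uncompressed bytes across entries (one reading of treating all internal files as the 'same' file) and that a carriage-return followed directly by a line-feed counts as a single line ending.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical sketch class; not part of cascading.tap.hadoop.
final class ZipLineSketch {

    /** One key/value pair: the position of the line and its text. */
    public static final class Record {
        public final long position;
        public final String line;

        Record(long position, String line) {
            this.position = position;
            this.line = line;
        }
    }

    /** Iterates over every ZipEntry in turn and emits one Record per line. */
    public static List<Record> readAll(InputStream in) throws IOException {
        List<Record> records = new ArrayList<>();
        long position = 0; // assumption: offsets continue across entries
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories hold no lines
                }
                position = readLines(zip, position, records);
            }
        }
        return records;
    }

    // Reads the current entry to its end; returns the offset after its last byte.
    private static long readLines(InputStream in, long start, List<Record> out)
            throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        long pos = start;       // offset of the next byte to read
        long lineStart = start; // offset where the current line began
        boolean lastWasCr = false;
        int b;
        while ((b = in.read()) != -1) {
            pos++;
            if (b == '\n' && lastWasCr) {
                // Second half of a CR LF pair: the line was already emitted.
                lastWasCr = false;
                lineStart = pos;
                continue;
            }
            lastWasCr = (b == '\r');
            if (b == '\n' || b == '\r') {
                out.add(new Record(lineStart, text(buffer)));
                buffer.reset();
                lineStart = pos;
            } else {
                buffer.write(b);
            }
        }
        if (buffer.size() > 0) {
            out.add(new Record(lineStart, text(buffer))); // unterminated last line
        }
        return pos;
    }

    private static String text(ByteArrayOutputStream buffer) {
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Reading byte by byte keeps the offset bookkeeping easy to follow; a real record reader would normally buffer its input.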
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ZipInputFormat
public ZipInputFormat()
configure
public void configure(JobConf conf)
isSplitable
protected boolean isSplitable(FileSystem fs,
Path file)
- Return true only if the file is in ZIP format.
- Parameters:
fs - the file system that the file is on
file - the path that represents this file
- Returns:
- is this file splitable?
listPathsInternal
protected Path[] listPathsInternal(JobConf jobConf)
throws IOException
- Throws:
IOException
listStatus
protected FileStatus[] listStatus(JobConf jobConf)
throws IOException
- Throws:
IOException
getSplits
public InputSplit[] getSplits(JobConf job,
int numSplits)
throws IOException
- Splits files returned by
listPathsInternal(JobConf). Each file is
expected to be in zip format and each split corresponds to a
ZipEntry.
- Parameters:
job - the JobConf data structure, see JobConf
numSplits - the number of splits required. Ignored here
- Throws:
IOException - if input files are not in zip format
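The one-split-per-ZipEntry idea can be illustrated without Hadoop, using only `java.util.zip.ZipFile`. This is a rough sketch under stated assumptions, not the actual `getSplits` code: `ZipEntrySplitSketch`, `EntrySplit`, and `splitsFor` are hypothetical names, local `java.io.File` objects stand in for Hadoop `Path` values, and directory entries are assumed to produce no split. As in the description above, the `numSplits` argument is accepted but ignored.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical sketch class; not part of cascading.tap.hadoop.
final class ZipEntrySplitSketch {

    /** Stand-in for an InputSplit: identifies one entry inside one archive. */
    public static final class EntrySplit {
        public final File archive;
        public final String entryName;

        EntrySplit(File archive, String entryName) {
            this.archive = archive;
            this.entryName = entryName;
        }
    }

    /**
     * Returns one split per file entry in each archive.
     * numSplits is only a hint in this sketch and is ignored.
     */
    public static List<EntrySplit> splitsFor(List<File> archives, int numSplits)
            throws IOException {
        List<EntrySplit> splits = new ArrayList<>();
        for (File archive : archives) {
            // ZipFile throws ZipException (an IOException) if the file is not a zip.
            try (ZipFile zip = new ZipFile(archive)) {
                Enumeration<? extends ZipEntry> entries = zip.entries();
                while (entries.hasMoreElements()) {
                    ZipEntry entry = entries.nextElement();
                    if (entry.isDirectory()) {
                        continue; // assumption: directories yield no split
                    }
                    splits.add(new EntrySplit(archive, entry.getName()));
                }
            }
        }
        return splits;
    }
}
```

Because each split names a single entry, the number of splits follows the archive contents rather than the requested count, which is why a hint such as `numSplits` has nothing to act on.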
getRecordReader
public getRecordReader(InputSplit genericSplit,
JobConf job,
Reporter reporter)
throws IOException
- Throws:
IOException
isAllowSplits
protected boolean isAllowSplits(FileSystem fs)
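The class description states that entries become separate splits only when the underlying FileSystem is HDFS or FILE. One way such a check could look is sketched below. It is an assumption that the decision can be modelled by comparing URI schemes against the strings `hdfs` and `file`; the `AllowSplitsSketch` class and its `allowSplits` method are hypothetical, and the real method takes a Hadoop `FileSystem` rather than a `URI`.

```java
import java.net.URI;

// Hypothetical sketch class; not part of cascading.tap.hadoop.
final class AllowSplitsSketch {

    /**
     * Returns true when the URI scheme names HDFS or the local file system.
     * A URI without a scheme is treated as not splittable in this sketch.
     */
    public static boolean allowSplits(URI fileSystemUri) {
        String scheme = fileSystemUri.getScheme();
        if (scheme == null) {
            return false; // assumption: no scheme means no known file system
        }
        return scheme.equalsIgnoreCase("hdfs") || scheme.equalsIgnoreCase("file");
    }
}
```

For example, `allowSplits(URI.create("hdfs://namenode/data/input.zip"))` is true under these assumptions, while a URI with any other scheme falls back to the single-pass behaviour.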
Copyright © 2007-2008 Concurrent, Inc. All Rights Reserved.