| Class Summary | |
|---|---|
| CascadingBuilder | CascadingBuilder is a Groovy 'builder' extension. |
Provides Groovy language scripting support.
A builder is obtained from the cascading.groovy.Cascading object:
def cascading = new Cascading()
def builder = cascading.builder();
To create a new Flow:
Flow flow = builder.flow("flow name")
{
  // map and assembly
}
or a new Cascade:
Cascade cascade = builder("cascade name")
{
  // flows
}
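The nested closure syntax above relies on Groovy's general builder mechanism. The following is a minimal sketch of that mechanism using the standard groovy.util.BuilderSupport class. TreeBuilder and the file names are hypothetical, each node is simply recorded as a map, and nothing here is meant to describe how CascadingBuilder itself is implemented.

```groovy
// Hypothetical builder that records every node call as a plain map.
class TreeBuilder extends BuilderSupport {
    // Called when a node is created inside another node's closure.
    protected void setParent(Object parent, Object child) {
        parent.children << child
    }

    protected Object createNode(Object name) {
        createNode(name, [:], null)
    }

    protected Object createNode(Object name, Object value) {
        createNode(name, [:], value)
    }

    protected Object createNode(Object name, Map attributes) {
        createNode(name, attributes, null)
    }

    // Named arguments such as 'delete: true' arrive in the attributes map.
    protected Object createNode(Object name, Map attributes, Object value) {
        [name: name, value: value, attributes: attributes, children: []]
    }
}

def treeBuilder = new TreeBuilder()
def tree = treeBuilder.flow("cut") {
    source("input.log")
    sink("output", delete: true)
}

assert tree.name == "flow"
assert tree.children*.name == ["source", "sink"]
```

Each method call inside the closure becomes a child of the enclosing node, which is how a builder can map nested closures onto nested objects.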
To pass properties to Hadoop and/or the internal planners:
def props = ["mapred.jar": "some-custom.jar", "mapred.map.tasks": 20, "mapred.reduce.tasks": 20]
def cascading = new Cascading(props)
def builder = cascading.builder();
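The property map above mixes String and Integer values. As a sketch only, assuming the values eventually need to be plain strings (java.util.Properties.getProperty returns null for non-String values), a conversion could look like the following. How the Cascading constructor actually handles the map is not shown here.

```groovy
def props = ["mapred.jar": "some-custom.jar", "mapred.map.tasks": 20, "mapred.reduce.tasks": 20]

// Copy every entry into a java.util.Properties, converting values to strings.
def properties = new Properties()
props.each { key, value ->
    properties.setProperty(key.toString(), value.toString())
}

assert properties.getProperty("mapred.map.tasks") == "20"
assert properties.getProperty("mapred.jar") == "some-custom.jar"
```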
The builder supports nested assembly of 'Tap maps' and Pipe assemblies:
def cascading = new Cascading()
def builder = cascading.builder();
Cascade cascade = builder("cut cascade")
{
  flow("cut")
  {
    source(inputFileApache)
    cut(/\s+/, results: [1])
    group([0])
    sink(outputPath + "cut-sort", delete: true)
    // trap(outputPath + "cut-sort-trap", delete: true) // optional sink to capture bad data
  }
}
cascade.complete()
Here is the same function in its full form:
def builder = new CascadingBuilder();
Cascade cascade = builder("cut cascade")
{
  flow("cut flow")
  {
    map
    {
      source(name: "cut")
      {
        lfs(inputFileApache)
        {
          text(["line"])
        }
      }
      sink(name: "cut")
      {
        hfs(outputPath + "cut-sort-full", delete: true)
        {
          text()
        }
      }
      // trap(name: "cut") // optional trap to capture bad data
      // {
      //   hfs(outputPath + "cut-sort-full-trap", delete: true)
      //   {
      //     text()
      //   }
      // }
    }
    assembly(name: "cut")
    {
      eachTuple(args: ["line"], results: [1])
      {
        regexSplitter(/\s+/)
      }
      group([0])
      everyGroup(args: [0], results: ALL)
      {
        count()
      }
    }
  }
}
cascade.complete()
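To make the data flow of the assembly above concrete, here is a plain Groovy sketch of the same three steps on a small in-memory list: split each line on whitespace and keep the field at position 1, group on that field, then count each group. It does not use Cascading at all. The sample lines are made up, and it assumes that field positions are zero-based.

```groovy
// Hypothetical input lines standing in for the contents of inputFileApache.
def lines = [
    "10.0.0.1 alice GET /index.html",
    "10.0.0.2 bob GET /about.html",
    "10.0.0.3 alice POST /login",
]

// eachTuple + regexSplitter(/\s+/) with results: [1]: keep the second field.
def cut = lines.collect { it.split(/\s+/)[1] }

// group([0]): group on the only remaining field.
def grouped = cut.groupBy { it }

// everyGroup + count(): number of values in each group.
def counts = grouped.collectEntries { key, values -> [(key): values.size()] }

assert counts == [alice: 2, bob: 1]
```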
The full form is necessary in order to support complex paths within and between flows.
Additionally, custom user classes can be specified within the eachTuple and everyGroup closures:
eachTuple(args: ["f1"], results: ["f1", "g1"])
{
  operation(new RegexParser(new Fields("g1"), ".*", [0, 1] as Integer[]));
}
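The integer array passed to RegexParser names regex groups. As an illustration of what group numbers mean, the following sketch uses java.util.regex directly rather than Cascading. The pattern and the sample line are hypothetical, and the sketch assumes the group numbers follow the usual Java convention, where group 0 is the whole match and group 1 is the first capturing group.

```groovy
import java.util.regex.Pattern

// Hypothetical pattern with two capturing groups.
def pattern = Pattern.compile(/^(\S+) \S+ (\S+)/)
def matcher = pattern.matcher("10.0.0.1 - alice")
assert matcher.find()

// Collect the requested groups, in the order given.
def groups = [0, 1] as Integer[]
def values = groups.collect { matcher.group(it) }

assert values == ["10.0.0.1 - alice", "10.0.0.1"]
```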
Cascade. Expects 'name' and optionally an AssertionLevel 'level'.
Flow. Expects 'name' and optionally an AssertionLevel 'assertionLevel' (or just 'level').
Each Operator. Accepts nested Function or cascading.operation.Filter Operations. Expects 'arguments' ('args') and 'results' ('res'), where the values are arrays. Optionally 'argumentFields' and 'resultFields' may be given, which are expected to be Fields instances.
Every Operator. Accepts nested Aggregator Operations. Expects the same arguments as eachTuple.
Operation classes to be included in the assembly.
Tap. Expects 'name' and optionally child arguments.
Hfs/Lfs Tap. Expects 'path' and optionally 'delete' if the resource should be deleted on execution.
TextLine scheme, with default source field 'line'. Optionally accepts a 'fields' argument.
SequenceFile scheme. Expects a 'fields' argument.
Tap classes. Expects a 'name' argument.
GroupBy. Accepts 'groupBy' and 'sortBy' fields.
CoGroup. Accepts 'groupBy' and 'declared' fields.
Debug Operation that simply prints out each Tuple to stdout.
Identity. Passes incoming arguments as results.
Identity. Coerces incoming arguments to the given types in the 'types' argument.
RegexParser. Expects regex 'pattern' and an int array of regex 'groups'.
RegexReplace. Expects a regex 'pattern', 'replacement' and optionally a boolean 'replaceAll'.
RegexFilter. Expects regex 'pattern'.
RegexSplitter. Expects regex 'pattern'.
RegexSplitGenerator. Expects regex 'pattern'.
DateFormatter. Expects a SimpleDateFormat 'format'.
DateParser. Expects a SimpleDateFormat 'format'.
FieldFormatter. Expects a Formatter 'format'.
FieldJoiner. Expects a value 'delimiter' string.
Sum Aggregator.
Count Aggregator.
First Aggregator.
Last Aggregator.
Min Aggregator.
Max Aggregator.
Average Aggregator.
AssertNull Assertion.
AssertNotNull Assertion.
AssertSizeEquals Assertion.
AssertSizeLessThan Assertion.
AssertSizeMoreThan Assertion.
AssertMatches Assertion. Expects 'pattern' and optionally 'negateMatch'.
AssertMatchesAll Assertion. Expects 'pattern' and optionally 'negateMatch'.
AssertEquals Assertion.
AssertEqualsAll Assertion.
AssertExpression Assertion. Expects a Java expression string named 'expression' and an array of types for the listed arguments named 'types'.
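Several entries above take a 'format' string in SimpleDateFormat or Formatter syntax. As a reminder of what such strings look like, this sketch uses the JDK classes directly rather than the Cascading operations. The sample values and the patterns are hypothetical, and the time zone is pinned to UTC so the output does not depend on the local machine.

```groovy
import java.text.SimpleDateFormat

// A SimpleDateFormat pattern of the kind a date parser could be given.
def parser = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.US)
def date = parser.parse("10/Oct/2000:13:55:36 -0700")

// A SimpleDateFormat pattern of the kind a date formatter could be given.
def formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm", Locale.US)
formatter.timeZone = TimeZone.getTimeZone("UTC")
def formatted = formatter.format(date)

// A java.util.Formatter style format string, applied through String.format.
def joined = String.format("%s:%05d", "alice", 42)

assert formatted == "2000-10-10 20:55"
assert joined == "alice:00042"
```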