News

Latest News & Updates

Driven Updates

We pushed out a new Driven update this week with a few major enhancements:

  • Ability to list all current and past applications by configuring an API Key
  • Additional field and operation details when selecting a pipe in the application view

To get an API Key, you will need to create a Driven account:

See the Getting Started Guide for additional details on configuring your application with the key.

You only need an account if you want the API Key and the enhanced functionality and security it will provide going forward.

As always, if you find any issues or have questions, please drop us a note on the Driven Forums:

Cascading SDK 2.5

We are happy to announce that Cascading SDK 2.5 is now publicly available.

The Cascading SDK is a collection of tools, documentation, libraries, tutorials and example projects from the greater Cascading community.

http://cascading.org/sdk/

What’s New:
  • Hadoop 2 support for all tools (Lingual 1.1, Multitool 2.5, Scalding, Cascalog)
  • Driven Plugin Installer – The SDK now ships with an installer for the Driven plugin (Learn more at: http://cascading.io/driven/)
  • EMR Bootstrap support – For EMR users, you now have the option to install the Driven plugin on your EMR cluster via the SDK’s EMR bootstrap action

For more details:

https://github.com/Cascading/CascadingSDK#cascading-25-sdk

Lingual 1.1

We are happy to announce that Lingual 1.1 is now publicly available.

The purpose of Lingual is to ease migration of SQL based workloads onto Hadoop, and to simplify integration with Hadoop through standards based APIs. Lingual provides an ANSI SQL interface on Cascading for Apache Hadoop.

http://cascading.org/downloads/

What’s New:
  • Support for Hadoop 2

Key Features:
  • ANSI SQL – A mature implementation of ANSI/ISO standard SQL-99
  • JDBC Driver – The standards compliant JDBC driver enables integration with many existing BI tools and application servers
  • SQL Shell – An interactive SQL command interface for interacting with Hadoop and executing SQL commands
  • Catalog – Command line tool that allows users to curate a catalog of database tables mapping to Hadoop files and resources
  • Data Provider API – Allows Lingual to query data simultaneously from multiple external systems with a single SQL statement

For more details:

http://cascading.org/lingual

Driven for Cascading

We’re happy to announce that a free public beta for Driven is available now.

Driven takes Cascading application development to the next level with management and monitoring capabilities for your Cascading apps:

  • Monitor your enterprise data apps built with Cascading (including Scalding, Cascalog, Lingual, Pattern and other DSLs)
  • Quickly identify failed and poorly performing apps
  • Visually see your data application execute

Driven Example - Load

Installation is easy: just add the free plug-in to your CLASSPATH to get going. Once installed, Driven begins collecting telemetry data from your running applications.

Get started with Driven:

http://docs.cascading.io/driven/1.0/getting-started/index.html

Also, we will be supporting Driven from our new Cascading Forums:

http://forums.cascading.io

Cascading Training

Due to popular demand, we’re happy to announce the availability of our Cascading Enterprise Developer Training course. Whether you are just getting started with Cascading or want to take your Cascading chops to the next level, this course was designed for you.

This comprehensive course covers topics ranging from basic to advanced Cascading concepts and best practices, all reinforced by hands-on labs.

Learn more at: http://www.concurrentinc.com/support/training/

Cascading 2.5

We are happy to announce that Cascading 2.5 is now publicly available for download.

With Cascading 2.5, Cascading applications are now fully supported on, and compatible with, both Apache Hadoop 1 and Hadoop 2, including YARN.

http://cascading.org/downloads/

What’s New in Cascading 2.5:

  • Support for Hadoop 2 and YARN
  • Added PartitionTap, which allows for pluggable data partitioning and can be used as either a sink or a source; TemplateTap is now deprecated
  • Updated Buffer to allow access to individual tuple streams, so that more complex join operations can be built
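
To make the idea of pluggable partitioning more concrete, here is a minimal, library-free Java sketch. It does not use Cascading's API: the `PathPartition` interface, the `DelimitedPathPartition` class, and the `year`/`month` field names are all hypothetical, chosen only to illustrate how a single strategy object might map field values to a directory path when writing (sink) and recover them from a path when reading (source).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class PartitionSketch {

    /** Hypothetical pluggable strategy; not part of Cascading. */
    public interface PathPartition {
        /** Sink direction: derive a relative path from a record's field values. */
        String toPath(Map<String, String> values);

        /** Source direction: recover the partition field values from a relative path. */
        Map<String, String> fromPath(String path);
    }

    /** Joins the chosen fields with a delimiter, e.g. year=2013, month=11 becomes "2013/11". */
    public static class DelimitedPathPartition implements PathPartition {
        private final List<String> fields;
        private final String delimiter;

        public DelimitedPathPartition(List<String> fields, String delimiter) {
            this.fields = fields;
            this.delimiter = delimiter;
        }

        @Override
        public String toPath(Map<String, String> values) {
            StringBuilder path = new StringBuilder();
            for (String field : fields) {
                if (path.length() > 0) {
                    path.append(delimiter);
                }
                path.append(values.get(field));
            }
            return path.toString();
        }

        @Override
        public Map<String, String> fromPath(String path) {
            String[] parts = path.split(Pattern.quote(delimiter));
            if (parts.length != fields.size()) {
                throw new IllegalArgumentException("unexpected partition depth: " + path);
            }
            Map<String, String> values = new LinkedHashMap<>();
            for (int i = 0; i < fields.size(); i++) {
                values.put(fields.get(i), parts[i]);
            }
            return values;
        }
    }

    public static void main(String[] args) {
        PathPartition byMonth = new DelimitedPathPartition(Arrays.asList("year", "month"), "/");

        Map<String, String> record = new LinkedHashMap<>();
        record.put("year", "2013");
        record.put("month", "11");
        record.put("entry", "some payload");

        System.out.println(byMonth.toPath(record));      // 2013/11
        System.out.println(byMonth.fromPath("2013/11")); // {year=2013, month=11}
    }
}
```

Because the strategy is an interface, a different layout (for example, one that hashes a key into buckets) could be swapped in without changing the code that reads or writes records. For the actual PartitionTap classes and constructors, see the Cascading 2.5 documentation.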

For more details on new features and resolved issues see:
https://github.com/Cascading/cascading/blob/2.5/CHANGES.txt

Lingual 1.0

We are happy to announce that Lingual 1.0 is now publicly available for download.

Lingual provides an ANSI SQL interface on Cascading for Apache Hadoop.

The purpose of Lingual is to ease migration of SQL based workloads onto Hadoop, and to simplify integration with Hadoop through standards based APIs, all while benefiting from the robustness of Cascading.

http://cascading.org/downloads/

Key Features:
  • ANSI SQL – A mature implementation of ANSI/ISO standard SQL-99
  • JDBC Driver – The standards compliant JDBC driver enables integration with many existing BI tools and application servers
  • SQL Shell – An interactive SQL command interface for interacting with Hadoop and executing SQL commands
  • Catalog – Command line tool that allows users to curate a catalog of database tables mapping to Hadoop files and resources
  • Data Provider API – Allows Lingual to query data simultaneously from multiple external systems with a single SQL statement

For more details:

http://cascading.org/lingual

Cascading 2.2

We are happy to announce that Cascading 2.2 is now publicly available for download.

http://www.cascading.org/downloads/

Cascading 2.2 includes a number of new features and updates:

What’s New in Cascading 2.2
- First class support for field level type information used by the planner
- Pluggable API for custom type coercion on custom types
- Support for blocks of Java “scripts” in addition to expressions
- AssemblyPlanner interface to allow for platform independent generative Flow planning
- Optional CombinedInputFormat support to improve handling with lots of small files
- Added FirstNBuffer and updated Unique to leverage it for faster performance
- MaxValue and MinValue Aggregators to allow for max/min on Comparable types, not just numbers
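
As an illustration of what max/min over Comparable types (rather than only numbers) means, the following library-free Java sketch computes both over any `Comparable` values. It is not the Cascading MaxValue/MinValue implementation: the class and method names are hypothetical, and skipping `null` values is an arbitrary choice made for this sketch.

```java
import java.util.Arrays;
import java.util.List;

public class ComparableMinMaxSketch {

    /** Returns the largest non-null value, or null if there is none. */
    public static <T extends Comparable<? super T>> T max(Iterable<T> values) {
        T best = null;
        for (T value : values) {
            if (value != null && (best == null || value.compareTo(best) > 0)) {
                best = value;
            }
        }
        return best;
    }

    /** Returns the smallest non-null value, or null if there is none. */
    public static <T extends Comparable<? super T>> T min(Iterable<T> values) {
        T best = null;
        for (T value : values) {
            if (value != null && (best == null || value.compareTo(best) < 0)) {
                best = value;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Strings are Comparable, so the same code works as for numbers.
        List<String> names = Arrays.asList("pig", "cascalog", "scalding");
        System.out.println(max(names)); // scalding
        System.out.println(min(names)); // cascalog

        List<Integer> counts = Arrays.asList(3, 11, 7);
        System.out.println(max(counts)); // 11
        System.out.println(min(counts)); // 3
    }
}
```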

What’s Improved in Cascading 2.2
- Updated relevant operations to optionally honor SQL like semantics with regard to NULL values
- Updated SubAssembly to support setting local and step properties via the ConfigDef interface
- Updated FlowDef to accept classpath elements for dynamic inclusion of Hadoop jobs
- Updated GlobHfs to minimize resource and CPU usage

For more details on new features and resolved issues see:
https://github.com/Cascading/cascading/blob/2.2/CHANGES.txt

Lingual – An Introduction

Check out our screencast showing how Lingual works, along with some brand new features.

We are ramping up to 1.0 in the near term, but as of today, Lingual already has some very exciting features, such as support for Data Providers: the ability to dynamically add Tap/Scheme implementations to be used as tables in SQL queries from the JDBC driver.

Cascading Lingual – An Introduction from Concurrent Inc. on Vimeo.


Pattern – an Open Source Project for Migrating Predictive Models from SAS, etc., onto Hadoop

Paco Nathan, the director of data science at Concurrent, Inc., speaks at the Hadoop Summit.

View the YouTube presentation here:
http://www.youtube.com/watch?v=RCTa2mlsFng&feature=youtu.be

Published on Jul 9, 2013
