Cascading Pattern is an extension to Cascading that provides various machine learning scoring algorithms and a utility for translating Predictive Model Markup Language (PMML) documents into applications on Apache Hadoop. You can deploy predictive models exported as PMML onto Hadoop, or use the Cascading Pattern Java API to deploy your models and more sophisticated ensembles.
Quickly Deploy Predictive Models to Hadoop
Hadoop is a cost-effective computation engine for running data-intensive models. Create your models in tools such as R, MicroStrategy and SAS, export those models in PMML, and then utilize Cascading Pattern to deploy them at scale on Hadoop.
Build Machine Scoring Applications
Leverage a simple Java API to start building applications for predictive model scoring on Hadoop. Also, utilize any of the other Cascading APIs and languages such as Cascading Lingual (SQL), Scalding (Scala), Cascalog (Clojure), etc. to extend the capabilities of your application.
- Quickly deploy machine scoring applications at scale on Apache Hadoop in as little as 4 lines of code
- Leverage existing intellectual property in predictive models, and investments in predictive modeling tooling and core competencies
- Accelerate application development and testing
- Unlock accessibility to Hadoop
Supported Predictive Model Types
- Random Forest Algorithm
Have a model we don’t support? Request it here!
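To illustrate what scoring a random forest involves, here is a minimal sketch in plain Java. It does not use the Cascading Pattern API: the `ForestScorer` class, the `stump` helper, the feature names, and the thresholds are all hypothetical. Each tree is reduced to a single-split stump, and the forest's prediction is the label that collects the most votes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: a toy majority-vote scorer, not Cascading Pattern code.
final class ForestScorer {

    // Hypothetical helper: a one-split "tree" that compares one feature to a cut point.
    static Function<Map<String, Double>, String> stump(
            String feature, double cut, String below, String above) {
        return row -> row.get(feature) < cut ? below : above;
    }

    // Hypothetical forest of three stumps; real forests hold many deeper trees.
    static final List<Function<Map<String, Double>, String>> SAMPLE_FOREST = List.of(
            stump("petal_length", 2.5, "setosa", "versicolor"),
            stump("petal_width", 0.8, "setosa", "versicolor"),
            stump("sepal_length", 5.0, "setosa", "versicolor"));

    // Scores one row: every tree votes, and the label with the most votes wins.
    // On a tie, the label that reached the top count first is returned.
    static String score(List<Function<Map<String, Double>, String>> trees,
                        Map<String, Double> row) {
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (Function<Map<String, Double>, String> tree : trees) {
            String label = tree.apply(row);
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> row =
                Map.of("petal_length", 1.4, "petal_width", 0.2, "sepal_length", 5.1);
        System.out.println(score(SAMPLE_FOREST, row));
    }
}
```

In this sketch two of the three stumps vote the same way for the sample row, so the majority label is returned even though one tree disagrees. That tolerance for individual trees being wrong is the point of using an ensemble.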
To add the Conjars repository:
<repository>
  <id>conjars.org</id>
  <url>http://conjars.org/repo</url>
</repository>
To include the Pattern core library:
<dependency>
  <groupId>cascading</groupId>
  <artifactId>pattern-core</artifactId>
  <version>1.0.0-wip-45</version>
</dependency>
To include the Pattern PMML library:
<dependency>
  <groupId>cascading</groupId>
  <artifactId>pattern-pmml</artifactId>
  <version>1.0.0-wip-45</version>
</dependency>
To include the Pattern Hadoop library:
<dependency>
  <groupId>cascading</groupId>
  <artifactId>pattern-hadoop</artifactId>
  <version>1.0.0-wip-45</version>
</dependency>
To include the Pattern local library:
<dependency>
  <groupId>cascading</groupId>
  <artifactId>pattern-local</artifactId>
  <version>1.0.0-wip-45</version>
</dependency>
How does Cascading Pattern work with R?
R is great for creating models, but it does not run efficiently on Hadoop. However, R can export models as PMML, a standards-based XML language for describing predictive models, including sophisticated ensembles. So, you can export your model from R into PMML, and pass the PMML to Pattern to translate your model into a Cascading application.
Additionally, R works well with Cascading Lingual's JDBC driver, so you can pull data out of Hadoop and into R by using Lingual.
With Cascading Pattern and Cascading Lingual, the connections now exist between modeling tools and Hadoop: you can deploy your models onto Hadoop and pull data off Hadoop for testing.
What is PMML?
Predictive Model Markup Language (PMML) is an XML-based language that lets applications define statistical and data mining models and share those models with other PMML-compliant applications.
PMML provides applications a vendor-independent method of defining models so that proprietary issues and incompatibilities are no longer a barrier to the exchange of models between applications. It allows users to develop models within one vendor’s application, and use other vendors’ applications to visualize, analyze, evaluate or otherwise use the models. Previously, this was very difficult, but with PMML, the exchange of models between compliant applications is now straightforward.
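Because PMML is plain XML, any XML parser can inspect a model file. The sketch below uses only the JDK's built-in DOM parser to list the fields declared in a PMML `DataDictionary`. The embedded document is a trimmed-down, hypothetical sample (a real export from a modeling tool would also contain a model element), and the `PmmlFieldLister` class name is an assumption made for illustration; it is not part of Cascading Pattern.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative only: reads field declarations from a PMML document with the JDK DOM parser.
final class PmmlFieldLister {

    // Hypothetical, trimmed-down PMML document (data dictionary only, no model).
    static final String SAMPLE_PMML =
            "<PMML version=\"4.1\" xmlns=\"http://www.dmg.org/PMML-4_1\">"
          + "<Header description=\"hypothetical sample\"/>"
          + "<DataDictionary numberOfFields=\"2\">"
          + "<DataField name=\"petal_length\" optype=\"continuous\" dataType=\"double\"/>"
          + "<DataField name=\"species\" optype=\"categorical\" dataType=\"string\"/>"
          + "</DataDictionary>"
          + "</PMML>";

    // Returns "name:dataType" for every DataField element, in document order.
    static List<String> listFields(String pmml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pmml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("DataField");
            List<String> fields = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element field = (Element) nodes.item(i);
                fields.add(field.getAttribute("name") + ":" + field.getAttribute("dataType"));
            }
            return fields;
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers need no checked-exception handling.
            throw new IllegalArgumentException("Could not parse PMML", e);
        }
    }

    public static void main(String[] args) {
        for (String field : listFields(SAMPLE_PMML)) {
            System.out.println(field);
        }
    }
}
```

The same vendor-independent declarations are what allow a model built in one tool to be read by another: the consumer only needs to understand the PMML elements, not the tool that wrote them.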
What software supports PMML?
For a list of software and projects that support the PMML format, visit: http://www.dmg.org/products.html.
- Pattern – Migrating Predictive Models to Hadoop (41 min)
- Introduction to Cascading – Pattern (7 min)
- Quickly Migrate Predictive Models to Hadoop
- Set up Cascading Pattern in Hortonworks
- Using Cascading Pattern with R and CDH
- Simple PMML Example
- Robust PMML Example
- Model API Usage Test