Cascading Pattern is an extension to Cascading that provides machine learning scoring algorithms and a utility for translating Predictive Model Markup Language (PMML) documents into applications on Apache Hadoop. You can deploy predictive models onto Hadoop from PMML, or use the Cascading Pattern Java API to deploy your models or sophisticated ensembles.

  • Quickly Deploy Predictive Models to Hadoop
    Hadoop is a cost-effective computation engine for running data-intensive models. Create your models in tools such as R, MicroStrategy and SAS, export those models in PMML, and then utilize Cascading Pattern to deploy them at scale on Hadoop.
  • Build Machine Scoring Applications
    Use a simple Java API to start building applications for predictive model scoring on Hadoop. You can also use any of the other Cascading APIs and languages, such as Cascading Lingual (SQL), Scalding (Scala) and Cascalog (Clojure), to extend the capabilities of your application.

Pattern Benefits

  • Quickly deploy machine scoring applications at scale on Apache Hadoop in as few as 4 lines of code
  • Leverage existing intellectual property in predictive models, and investments in predictive modeling tooling and core competencies
  • Accelerate application development and testing
  • Unlock accessibility to Hadoop


Machine learning scoring algorithms provided by Pattern include:

  • Hierarchical Clustering
  • K-Means Clustering
  • Linear Regression
  • Logistic Regression
  • Random Forest Algorithm
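
As a rough illustration of what "scoring" means for one of the algorithms listed above, the following plain Java sketch applies a logistic regression model to a few input rows. The class name, method name and coefficients are hypothetical, and this is not Pattern's implementation or API; it only shows the arithmetic that a scoring step performs once a model's parameters are known.

```java
public class LogisticScoreSketch {
    // Hypothetical coefficients chosen for illustration; a trained model
    // would supply its own intercept and weights.
    static final double INTERCEPT = -1.0;
    static final double[] WEIGHTS = {0.5, 0.25};

    /** Returns the predicted probability for one row of input features. */
    static double score(double[] features) {
        if (features.length != WEIGHTS.length) {
            throw new IllegalArgumentException(
                "expected " + WEIGHTS.length + " features, got " + features.length);
        }
        // Linear combination of the features plus the intercept.
        double z = INTERCEPT;
        for (int i = 0; i < WEIGHTS.length; i++) {
            z += WEIGHTS[i] * features[i];
        }
        // Logistic (sigmoid) function maps z to a probability in (0, 1).
        return 1.0 / (1.0 + Math.exp(-z));
    }

    public static void main(String[] args) {
        double[][] rows = {{2.0, 0.0}, {4.0, 4.0}};
        for (double[] row : rows) {
            System.out.println(score(row));
        }
    }
}
```

Scoring is a per-row computation with no dependence between rows, which is why it can be spread across a cluster.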

Using Maven

Pattern is currently under active development and is available as source in the Pattern project or as Maven artifacts on Conjars.

To build with Maven, add the Conjars repository to your project, then include the Pattern libraries your application needs:

  • Pattern core library
  • Pattern PMML library
  • Pattern Hadoop library
  • Pattern local library



Pattern integrates Cascading with the PMML format. Because Pattern is built on top of Cascading, any Pattern-based code will work with your other Cascading applications and flows.


Pattern applications will work with Driven. You can build your applications with Pattern and visualize them in Driven, just like any other Cascading application.


  • How does Cascading Pattern work with R?

    R is great for creating models, but it does not run efficiently on Hadoop. However, R does support PMML, a standards-based XML language for building and deploying sophisticated ensembles. So, you can export your model from R into PMML, and pass the PMML to Pattern to translate your model into a Cascading application.

    Additionally, R works well with Cascading Lingual’s JDBC driver, so you can pull data out of Hadoop and into R by using Lingual.

    With Cascading Pattern and Cascading Lingual, the connections now exist between modeling tools and Hadoop for you to deploy your models onto Hadoop and pull data out of Hadoop for testing.

  • What is PMML?

    Predictive Model Markup Language (PMML) is an XML-based language that provides a way for applications to define statistical and data mining models and to share models between PMML-compliant applications.

    PMML provides applications a vendor-independent method of defining models so that proprietary issues and incompatibilities are no longer a barrier to the exchange of models between applications. It allows users to develop models within one vendor’s application, and use other vendors’ applications to visualize, analyze, evaluate or otherwise use the models. Previously, this was very difficult, but with PMML, the exchange of models between compliant applications is now straightforward.

    Source: DMG.ORG
  • What software supports PMML?

    For a list of software and projects that support the PMML format, visit: DMG.ORG.
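
To make the description of PMML above more concrete, here is a minimal Java sketch that reads a small PMML document with the JDK's built-in XML parser and reports which model type and input fields it declares. The sample document is hand-written and heavily trimmed for illustration (it is not the output of any particular tool and may not be schema-complete), and the class and method names are hypothetical. It does not use the Pattern API; it only shows that a PMML file is ordinary XML that any compliant application can inspect.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class PmmlPeek {
    // Illustrative, trimmed PMML document (assumption: PMML 4.1 element names).
    static final String SAMPLE =
          "<PMML version=\"4.1\" xmlns=\"http://www.dmg.org/PMML-4_1\">"
        + "<Header description=\"illustrative sample\"/>"
        + "<DataDictionary numberOfFields=\"2\">"
        + "<DataField name=\"x\" optype=\"continuous\" dataType=\"double\"/>"
        + "<DataField name=\"y\" optype=\"continuous\" dataType=\"double\"/>"
        + "</DataDictionary>"
        + "<RegressionModel functionName=\"regression\">"
        + "<MiningSchema>"
        + "<MiningField name=\"x\"/>"
        + "<MiningField name=\"y\" usageType=\"predicted\"/>"
        + "</MiningSchema>"
        + "<RegressionTable intercept=\"1.5\">"
        + "<NumericPredictor name=\"x\" coefficient=\"2.0\"/>"
        + "</RegressionTable>"
        + "</RegressionModel>"
        + "</PMML>";

    /** Parses the XML text into a DOM document. */
    static Document parse(String pmml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(pmml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    /** Lists the names of all DataField elements in the data dictionary. */
    static List<String> dataFieldNames(String pmml) {
        NodeList fields = parse(pmml).getElementsByTagName("DataField");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < fields.getLength(); i++) {
            names.add(((Element) fields.item(i)).getAttribute("name"));
        }
        return names;
    }

    /** Returns the first top-level element whose name ends in "Model", or null. */
    static String modelElementName(String pmml) {
        NodeList children = parse(pmml).getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element && child.getNodeName().endsWith("Model")) {
                return child.getNodeName();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("Model element: " + modelElementName(SAMPLE));
        System.out.println("Data fields:   " + dataFieldNames(SAMPLE));
    }
}
```

Because the model's structure and parameters live in the document rather than in the tool that produced it, the same file can be handed from a modeling tool to a scoring engine without translation by hand.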