Powered By

Here are a few of the many companies using Cascading in production:

Adknowledge

Adknowledge is an ad network which provides an online pay-per-click marketplace for high quality traffic across multiple channels of email, web and search engine inventory.
Adknowledge currently uses Cascading for ad-hoc queries (internal business intelligence) and to develop clickstream analytics, using a data warehouse based on HDFS.

BackType

BackType is a real-time, conversational search engine. We index and connect millions of conversations from blogs, social networks and other social media so you can find out what people are saying about the topics that interest you.
Read about how BackType uses Cascading on their tech blog. BackType engineers are the authors of Cascading-DBMigrate.

Bixolabs

Bixolabs is an elastic web mining platform that makes it easy to create web mining apps, so customers can focus on what they know best - using the data - without the challenges of building out a reliable, scalable web crawling and data processing workflow.
Bixolabs is built on top of Hadoop, Cascading & Bixo, and runs in EC2. This makes it a flexible, scalable, on-demand solution for companies processing web data for internal use, as well as companies building products based on web mining.

Delve Networks

Delve provides a complete online video solution to manage, publish, measure, and monetize high-quality video content on the web. We power video for well-known sites in the media, sports, health, finance and other verticals.
At Delve we use Cascading in conjunction with AWS services such as EC2, S3 and Elastic MapReduce to scale video analytics for our rapidly growing collection of usage data. We are planning on leveraging it further to build additional business intelligence applications.

Etsy

Etsy's mission is to enable people to make a living making things, and to reconnect makers with buyers.
Read about how Etsy leverages Cascading on their blog in Analyzing Etsy's data with Hadoop and Cascading. The Etsy engineers also maintain the Cascading.JRuby DSL.

Feeva

Feeva has created a digital bridge between fixed and mobile service providers and the digital marketing industry. This bridge solution enhances the performance of digital marketing campaigns while maintaining the highest standards of consumer privacy.

Feeva uses Hadoop, HBase and Cascading for two areas right now.

The first is for analytics: we build aggregates for an OLAP cube from detailed logs which show non-PII web activity from our partners. The aggregates are created across 10 different dimensions, and aggregated on an hourly, daily, weekly and monthly basis. There are 4 distinct assemblies in this process, which are combined with various source and sink taps to create up to 26 flow instances in one Cascade execution. One of the flows is a period-merge, which can be used, for example, to merge the most recent hour with a pre-existing day-to-date. In this way, we avoid having to process the entire day at one time. Week-to-date and month-to-date work similarly. I think this is a strength of Cascading: it really enables and encourages reuse by providing a layer of abstraction on top of an actual map/reduce job. The final sink taps create TEXT files which are suitable for bulk loading directly into PostgreSQL. Incidentally, a lot of the code to construct this cascade is written in Groovy, which points out another strength: we can code in a convenient, expressive language like Groovy, but the steps (functions, filters, aggregators, etc.) that are used to construct the map/reduce job are coded in Java and can be optimized.
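The period-merge idea described above can be sketched in a few lines. This is an illustrative Python sketch only, not Feeva's code and not the Cascading API: the function name `merge_periods`, the `(site, country)` dimension keys, and the sample counts are all hypothetical, and it assumes the aggregates are simple additive counts.

```python
from collections import Counter


def merge_periods(day_to_date, latest_hour):
    """Combine existing day-to-date totals with the newest hour's totals.

    Both arguments map a dimension key (a tuple here) to an additive count.
    Only the new hour is read; earlier hours are never reprocessed.
    """
    merged = Counter(day_to_date)   # copy, so the inputs are left unchanged
    merged.update(latest_hour)      # Counter.update adds counts per key
    return dict(merged)


# Hypothetical sample data: keys are (site, country) dimension tuples.
day_to_date = {("site-a", "US"): 120, ("site-b", "DE"): 45}
latest_hour = {("site-a", "US"): 10, ("site-c", "FR"): 3}

merged = merge_periods(day_to_date, latest_hour)
```

The same function would apply unchanged to week-to-date or month-to-date totals, since it only assumes that the two inputs share a key layout.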

Our second process synthesizes subscriber data and third party data into our HBase database for use by other processes (e.g., supporting analytics as described above, and generating cache files which are used for real-time ad decisioning). This process is made up of 6 flows and updates 2 HBase tables. We use the HBaseTap (modified to allow for variable columns in one family) and also use the HBase APIs directly (for lookups and deletes).

FlightCaster

FlightCaster predicts flight delays. We use an advanced algorithm that scours data on every domestic flight for the past 10 years and matches it to real-time conditions.
Read about how FlightCaster predicts flight delays with Cascading on InfoQ, DataWrangling, SDTimes, and InformationWeek. Or listen to an interview on Cloud Cafe. FlightCaster engineers created and contribute to Cascading-Clojure.

Ning

Ning is the social platform for the world's interests and passions online. Ning offers an easy-to-use service that allows people to join and create Ning Networks.
Ning's data analytics team uses Cascading for ad-hoc log and data analysis.

OneSpot

OneSpot mines content from around the web to find the best possible content for specific communities of interest. They help content creators find a bigger audience and help web publishers become the one spot their readers need to go to find the best content.
OneSpot currently uses Cascading to generate arbitrary reports on data stored in HDFS. They are in the process of migrating their content scoring code to Cascading.

RapLeaf

Every day, people use Rapleaf to discover the information about themselves that is available on the internet. Businesses use Rapleaf's search service to better understand their customers, learn how their customers use the social web, and offer their customers new and enhanced services.
Recent blog posts by the RapLeaf engineering team: Engineering Rapleaf - Goodbye MapReduce, Hello Cascading and A new Cascading pipe - MultiGroupBy.

Razorfish

Razorfish, a digital advertising and marketing firm, segments users and customers based on the collection and analysis of non-personally identifiable data from browsing sessions.
Read about how Razorfish leverages Cascading and AWS in the Razorfish AWS Case Study.

StumbleUpon

StumbleUpon helps you discover and share great websites.
StumbleUpon uses Cascading to manage data stored in an Apache HBase cluster.

Veoh

Veoh is a revolutionary Internet TV service that gives viewers the power to easily discover, watch, and personalize their online viewing experience.
Veoh developers have been using Cascading since its initial public release in early 2008.

VideoEgg

VideoEgg is a new kind of rich media advertising network that guarantees brand engagement. Our network consists of over 100 million uniques across hundreds of leading sites, blogs, and gaming sites, as well as social and mobile applications.
We are currently using Cascading to process hundreds of gigabytes of daily logs before importing them into our Hive-based data warehouse. With Cascading, we can implement an otherwise complex set of map-reduce ETL tasks as a simple Cascade of reusable and easily extensible Flows.

Visible Technologies

Visible Technologies is a leading provider of online brand management solutions for companies and individuals in today's rapidly changing new media environment. Whether it's building or managing reputations across popular search engines, or helping companies track and participate in influential consumer-created content channels, we empower brands to do more online to build their businesses and bottom lines.
We're using Cascading to manage workflow between all of our algorithms. It abstracts your calculations, processing, and workflow. It's very, very nice because it saves quite a bit of time writing 'pipeline' code. --Bradford Stephens

If you would like your organization listed, drop us a note.