Powered By
Here are a few of the many companies using Cascading in production:
Adknowledge
Adknowledge is an ad network that provides an online pay-per-click marketplace for high-quality traffic across email, web and search engine inventory.
Adknowledge currently uses Cascading for ad-hoc queries (internal business intelligence) and to develop clickstream analytics, using a data warehouse based on HDFS.
Bixolabs
Bixolabs is an elastic web mining platform that makes it easy to create web mining apps, so customers can focus on what they know best - using the data - without the challenges of building out a reliable, scalable web crawling and data processing workflow.
Bixolabs is built on top of Hadoop, Cascading & Bixo, and runs in EC2. This makes it a flexible, scalable, on-demand solution for companies processing web data for internal use, as well as companies building products based on web mining.
Feeva
Feeva has created a digital bridge between fixed and mobile service providers and the digital marketing industry. This bridge solution enhances the performance of digital marketing campaigns while maintaining the highest standards of consumer privacy.
Feeva currently uses Hadoop, HBase and Cascading in two areas.
The first is analytics: we build aggregates for an OLAP cube from detailed logs which show non-PII web activity from our partners. The aggregates are created across 10 different dimensions, and aggregated on an hourly, daily, weekly and monthly basis. There are 4 distinct assemblies in this process, which are combined with various source and sink taps to create up to 26 flow instances in one Cascade execution. One of the flows is a period-merge, which can be used, for example, to merge the most recent hour with a pre-existing day-to-date aggregate. In this way, we avoid having to process the entire day at one time. Week-to-date and month-to-date work similarly.
I think this is a strength of Cascading: it enables and encourages reuse by providing a layer of abstraction on top of an actual map/reduce job. The final sink taps create TEXT files which are suitable for bulk loading directly into PostgreSQL. Incidentally, a lot of the code to construct this cascade is written in Groovy, which points to another strength: we can code in a convenient, expressive language like Groovy, while the steps (functions, filters, aggregators, etc.) that are used to construct the map/reduce job are coded in Java and can be optimized.
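As a rough illustration of the period-merge idea described above, the following sketch folds the latest hour's aggregates into an existing day-to-date aggregate. It is plain Java, not Feeva's code and not the Cascading API. The class name, method name and the "partner|country" key format are hypothetical, and it assumes the measures are additive (counts or sums), so merging is a per-key addition.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative period-merge in plain Java (not the Cascading API).
 * All names and the key format are hypothetical.
 */
public class PeriodMerge {

    /**
     * Returns a new map containing dayToDate plus latestHour, summing the
     * values of keys present in both. Neither input map is modified.
     * Assumes additive measures such as counts.
     */
    public static Map<String, Long> merge(Map<String, Long> dayToDate,
                                          Map<String, Long> latestHour) {
        Map<String, Long> merged = new HashMap<>(dayToDate);
        for (Map.Entry<String, Long> entry : latestHour.entrySet()) {
            // Add the hourly value to the running total, or insert it if new.
            merged.merge(entry.getKey(), entry.getValue(), Long::sum);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Existing day-to-date aggregate, keyed by a made-up dimension tuple.
        Map<String, Long> dayToDate = new HashMap<>();
        dayToDate.put("partnerA|US", 120L);
        dayToDate.put("partnerB|DE", 45L);

        // Aggregate for the most recent hour only.
        Map<String, Long> latestHour = new HashMap<>();
        latestHour.put("partnerA|US", 10L);
        latestHour.put("partnerC|FR", 7L);

        Map<String, Long> updated = merge(dayToDate, latestHour);
        System.out.println(updated.get("partnerA|US")); // 130
        System.out.println(updated.get("partnerB|DE")); // 45
        System.out.println(updated.get("partnerC|FR")); // 7
    }
}
```

Only the new hour is read on each run, so the cost of keeping the day-to-date total current does not grow with the amount of the day already processed. The same per-key merge would apply to week-to-date and month-to-date under the same additivity assumption.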
Our second process synthesizes subscriber data and third-party data into our HBase database for use by other processes (e.g., supporting analytics as described above, and generating cache files which are used for real-time ad decisioning). This process is made up of 6 flows and updates 2 HBase tables. We use the HBaseTap (modified to allow for variable columns in one family) and also use the HBase APIs directly (for lookups and deletes).
Ning
Ning is the social platform for the world's interests and passions online. Ning offers an easy-to-use service that allows people to join and create Ning Networks.
The Ning data analytics team uses Cascading for ad-hoc log and data analysis.
OneSpot
OneSpot mines content from around the web to find the best possible content for specific communities of interest. They help content creators find a bigger audience and help web publishers become the one spot their readers need to go to find the best content.
OneSpot currently uses Cascading to generate arbitrary reports on data stored in HDFS. They are in the process of migrating their content scoring code to Cascading.
Rapleaf
Every day, people use Rapleaf to discover the information about themselves that is available on the internet. Businesses use Rapleaf's search service to better understand their customers, learn how their customers use the social web, and offer their customers new and enhanced services.
Recent blog posts by the Rapleaf engineering team: "Engineering Rapleaf - Goodbye MapReduce, Hello Cascading" and "A new Cascading pipe - MultiGroupBy".
Veoh
Veoh is a revolutionary Internet TV service that gives viewers the power to easily discover, watch, and personalize their online viewing experience.
Veoh developers have been using Cascading since its initial public release in early 2008.
VideoEgg
VideoEgg is a new kind of rich media advertising network that guarantees brand engagement. Our network consists of over 100 million uniques across hundreds of leading sites, blogs, and gaming sites, as well as social and mobile applications.
We are currently using Cascading to process hundreds of gigabytes of daily logs before importing them into our Hive-based data warehouse. With Cascading, we can implement an otherwise complex set of map-reduce ETL tasks as a simple Cascade of reusable and easily extensible Flows.
Visible Technologies
Visible Technologies is a leading provider of online brand management solutions for companies and individuals in today's rapidly changing new media environment. Whether it's building or managing reputations across popular search engines, or helping companies track and participate in influential consumer-created content channels, we empower brands to do more online to build their businesses and bottom lines.
We're using Cascading to manage workflow between all of our algorithms. It abstracts your calculations, processing, and workflow. It's very, very nice because it saves quite a bit of time writing 'pipeline' code. --Bradford Stephens
If you would like your organization listed, drop us a note.