LinkLog: Streaming Data, Distributed Execution Engines

It is rare to come across four references to similar technologies within three days. That is what happened to me a couple of days ago.

  1. There was a reference to Hadoop on Twitter. I had almost forgotten about Hadoop, the open source equivalent of Map/Reduce.
  2. I was watching a rather unusual Google Tech Talk the other day. It was unusual because someone from Microsoft Research was talking about Dryad, their distributed execution engine, at Google.
  3. One of the participants asked whether the speaker could compare Dryad to IBM’s Stream Processing Core. So I had to look it up.
  4. Following a few links from the IBM article, I found SPADE, a declarative language for handling streaming data. I have always been fascinated by domain-specific languages that solve special problems, especially with data. You learn a lot just by understanding the high-level concepts.

So here they are. A set of related technologies with some overlap.

Google Map/Reduce

Apache Hadoop

Microsoft Research’s Dryad

IBM Stream Processing Core

IBM SPADE – Stream Processing Application Declarative Engine
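
For anyone who has not seen the Map/Reduce model that the first two entries build on, here is a minimal single-process sketch in Python. It is only an illustration of the idea: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are my own hypothetical choices, not the API of Hadoop or of Google's implementation, and nothing here is distributed.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, the step a framework runs between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single count."""
    return key, reduce(lambda a, b: a + b, values, 0)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog", "The end"]))
```

In a real engine the map calls and the reduce calls run on many machines, and the shuffle step moves data across the network. That scaling work is the hard part; the programming model itself stays this small.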

5 thoughts on “LinkLog: Streaming Data, Distributed Execution Engines”

  1. Isn’t Hadoop an open source implementation of Map/Reduce? I mean, Map/Reduce is an ancient LISP concept which Google implemented on a massively parallel scale, and which was then implemented independently as Apache Hadoop. Right?

    Just some nitpicking :D

    1. That is correct. Several companies contributed to the Hadoop project (with a lot of contributions from Yahoo). Just a couple of days ago, Amazon announced a cloud service for Hadoop.

      It may be an ancient concept, but building it and making it scalable is the real challenge.
