Does the Full-featured DBMS Scale to Web Scale?

I am a little attached to RDBMSs and SQL, since we worked on an engine called Integra SQL for a decade starting in 1985. Declarative queries fascinate me. In fact, declarative anything fascinates me, since I think it is the only way we can push complexity under the hood and let application developers work at higher levels of abstraction.

I have been watching the NoSQL space for a while. I really don’t like that name. I have watched ORMs (object-relational mappers), columnar databases, entity stores (just another name?), document databases, Linked Open Data, RDF stores and the whole lot. There is a common thread: large data sets, web-scale computing and distributed data keep coming up.

So when I saw this link to the panel discussions at VLDB, I thought it deserved a read. I am not sure why such a dated article fell into my infostream. Here is an interesting snippet:

Does the full-featured DBMS scale to web scale? Microsoft says the Azure version of SQL server does. Yahoo says they want no SQL but Hadoop and PNUTS. Twitter, Facebook, and other web names got their own discussion. Why do they not go to serious DBMS vendors for their data but make their own, like Facebook with Hive?

Who can divine the mind of the web developer? What makes them go to memcached, manually sharded MySQL, and MapReduce, walking away from the 40 years of technology invested in declarative query and ACID?

A few more interesting quotes from this page:

The appeal exerted by the diverse language/paradigm -isms on their followers seems to be based on hitting a simplification of reality that coincides with a problem in the air. MapReduce is an example of this. PHP is another. A quick fix for a present need: Scripting web servers (PHP) or processing tons of files (MapReduce).

query languages that were ever universally adopted were declarative, i.e., keyword search and SQL

It is an interesting space to watch. If this movement is the real thing, it may change much of the way we build data-driven apps.

Meta:

I found this article while testing InfoPro, a product we have been working on for a few months.

A few more things you may want to look up: CouchDB, Cassandra, MongoDB.

RDBMS: Tired Software?

Michael Stonebraker calls RDBMSs “Tired Software”. Stonebraker is a well-known guru in the DBMS community. As the architect of Berkeley Ingres, Postgres, Illustra and StreamBase, he has innovated constantly in the database space.

So when he speaks, a lot of us listen. If you are a database developer or designer, this article on the future of databases may be worth a read. A few snippets:

If we examine the nontrivial-sized DBMS markets, it turns out that current relational DBMSs can be beaten by approximately a factor of 50 in most any market I can think of. What follows are a few examples.

  • In the data warehouse market, a column store beats a row store by approximately a factor of 50 on typical business intelligence queries. The reason….
  • In the online transaction processing (OLTP) market, a lightweight main memory DBMS beats a row store by a factor of 50.
  • In the science DBMS market, users have never liked relational DBMSs and want a non-relational model and query facility.
  • If you are storing Resource Description Framework (RDF) data, which is popular in the bio community and elsewhere, then “Scalable Semantic Web Data Management Using Vertical Partitioning” points out that column stores are very good at certain RDF workloads.
  • Text applications have never used relational DBMSs.
  • Even in XML, specialized engines beat conventional RDBMSs.
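To get an intuition for the column-store point in the first bullet, here is a toy sketch in Python. It is only an illustration under simplifying assumptions: the table, its field names and the helper functions are hypothetical, and "fields read" stands in for I/O. Real column stores also rely on compression, vectorized execution and on-disk page layout, so this does not explain the full factor of 50 Stonebraker cites. It just shows that an aggregate over one attribute touches every field of every row in a row layout, but only one list of values in a column layout.

```python
# Hypothetical toy table: (order_id, customer, region, amount)
rows = [
    (1, "alice", "west", 120.0),
    (2, "bob", "east", 75.5),
    (3, "carol", "west", 42.0),
    (4, "dave", "north", 300.0),
]
names = ("order_id", "customer", "region", "amount")


def sum_amount_row_store(table):
    """Row layout: each record is read whole, although only 'amount' is needed."""
    total = 0.0
    fields_read = 0
    for record in table:
        fields_read += len(record)  # the entire row is brought in
        total += record[3]          # only the 'amount' field is used
    return total, fields_read


def to_column_store(table, column_names):
    """Column layout: one list of values per attribute."""
    return {name: [record[i] for record in table]
            for i, name in enumerate(column_names)}


def sum_amount_column_store(columns):
    """Only the 'amount' column is scanned; other attributes are never touched."""
    values = columns["amount"]
    return sum(values), len(values)


cols = to_column_store(rows, names)
print(sum_amount_row_store(rows))     # (537.5, 16)
print(sum_amount_column_store(cols))  # (537.5, 4)
```

Both layouts give the same answer; the difference is how much data the query has to wade through, and that gap grows with the number of columns in a wide warehouse table.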

Stonebraker goes on to explain why RDBMS technology shows signs of age and describes several possible alternatives.

Meta:

When we first built a relational engine in the mid-80s, the only resource we had was C. J. Date’s book. After the first iteration, I managed to get hold of a set of papers on relational database management systems from Michael Stonebraker. We learned a lot from them, so to me Michael Stonebraker is a bit of a hero. At a conference at Hyatt Rickey’s in Palo Alto, I was lucky to meet all my RDB heroes: Codd, Date, Michael, Lawrence Rowe and many others. After that I bumped into Stonebraker once at Illustra (we built an ODBC driver for them), and then I lost track of him. I heard about him a bit when I was doing some consulting work on streaming databases for an XML acceleration company. So when I found this article in the Semantic Web group on LinkedIn, I was really happy.

This is an area that bears a bit of investigation. I would love to get back and dabble in RDF stores, one of the most promising technologies on the horizon.