When the Web Becomes a Database

This was what we were dreaming about 7 years ago. This image is taken from the home page of our first web site (Feb 23, 2001), still sitting on the Internet Archive.

That is a bit of background, so you can see why I got excited when I saw this article.

Every time there is a major shift in technology, the shift needs to be motivated by addressing a new class of problem. This means doing something that could not be done before. The last time this happened was when the relational database became the dominant IT technology. At that time, the questions involved putting the enterprise in the database and building a cluster of line-of-business (LOB) applications around the database. The argument for the RDBMS was that, when designing the database, you did not have to constrain the set of queries that might later be made. In other words, it made things more ad hoc. This was opposed at the time on the grounds of being less efficient than the hierarchical and network databases, which the relational model eventually replaced.

Today, the point of the Data Web is that you do not have to constrain what your data can join or integrate with, when you design your database. The counter-argument is that this is slow and geeky and not scalable. See the similarity?
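To make the "join with anything" idea concrete, here is a minimal sketch in Python. It is not taken from the article: the two data sources, the example.org identifiers, the property names and the helper functions `match` and `company_countries` are all hypothetical, made up for illustration. It only shows that two sets of subject/predicate/object statements, designed separately, can be merged and queried together as long as they share an identifier.

```python
# Hypothetical data: two sources published independently of each other.
# All URIs and property names are invented for this illustration.
company_source = {
    ("http://example.org/id/acme", "name", "Acme Corp"),
    ("http://example.org/id/acme", "headquarters", "http://example.org/id/helsinki"),
}
geo_source = {
    ("http://example.org/id/helsinki", "label", "Helsinki"),
    ("http://example.org/id/helsinki", "country", "Finland"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def company_countries(triples):
    """Join company names to countries through the shared city identifier."""
    results = []
    for company, _, city in match(triples, p="headquarters"):
        for _, _, name in match(triples, s=company, p="name"):
            for _, _, country in match(triples, s=city, p="country"):
                results.append((name, country))
    return sorted(results)


# Integration is a plain set union: no schema had to be agreed on up front.
merged = company_source | geo_source
print(company_countries(merged))
```

Neither source alone can answer the question; the merged set can, because both happen to use the same identifier for the city. A real RDF store would use a query language and indexes rather than nested loops, which is where the "slow and not scalable" counter-argument comes in.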

A difference is that we are not specifically aiming at replacing the RDBMS. In fact, if you know exactly what you will query and have a well-defined workload, a relational representation optimized for the workload will give you about 10x the performance of the equivalent RDF warehouse. OLTP remains a relational-only domain.

I started my career working with CODASYL-based network databases (DBMS-11 to be precise) in 1978. Then we built a composite database (part network, part relational) in 1983 and a SQL engine in 1985-86. Computing was very different then. Relational databases had a slow start, but once performance improved (due to improvements in relational technology and hardware speeds), they took off.

Almost 20 years after the first commercial relational databases started becoming mainstream, we need to rethink data and access to data. The scale is different. The needs for access are different. What we need to get out of databases is going to be very different.

Along the way, there were attempts at extending relational databases in different directions (text databases, multimedia databases, object databases), but none of them had the simplicity or the elegance of relational databases.

What are some possible technologies for the new databases of the future? This article provides some insights.

However, when we are talking about doing queries and analytics against the Web, or even against more than a handful of relational systems, the things which make RDBMS good become problematic.

It is worth watching this space, especially if you make a living working with relational technology. When the web becomes a database (even parts of it), there will be lots of challenges and hence lots of new opportunities.