DataWeb

The last time I heard the term Data Web was in a presentation R. V. Guha gave at Stanford. While digging through some articles, I found this interesting piece by Seth Grimes on the intersection between Data Spaces and Information Analytics.

My take is that a network of interconnected database environments would make an ideal data web, but one that will never be close to completely realized. The programmer in me says that practical, task-oriented approaches like Grossman's are the way to get stuff done. Regardless of how it's realized, the data-space concept provides an excellent framework for work toward robust knowledge networks.

A data space is an abstraction for collecting data (or references to data) from multiple sources and in multiple formats, and organizing it for easy access by individuals and organizations. Currently, search engines such as Google serve this function with very little (or no) semantics.

Their analysis acknowledges that much of the information we use is outside our administrative control. It's in someone else's database or files. It's described by someone else's metadata schema (or none at all) and therefore possesses a low level of semantic integration (or common definitions) with other information that interests us. These are the conditions that launched Google and the other search giants. It's hard to find documents and even harder to find meaning, whether on your desktop or on the Internet. Per Franklin, Halevy and Maier, we should move toward data coexistence rather than enforced conformity.
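To make the abstraction a little more concrete, here is a minimal Python sketch of a data space that registers sources in their native formats and answers keyword queries across all of them, without first forcing them into a shared schema. The class names, the `READERS` table and the sample records are hypothetical illustrations, not part of any system described by Franklin, Halevy and Maier; a real data space would mostly hold references to remote sources and layer much richer semantics on top.

```python
import csv
import io
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Tuple


@dataclass
class Source:
    """One participant in the data space, kept in its native format."""
    name: str
    fmt: str   # e.g. "csv" or "json"; no common schema is imposed
    raw: str   # the data itself (a real system might store only a reference)


# Hypothetical per-format readers: each turns raw text into a list of dicts
# while keeping the source's own field names.
READERS: Dict[str, Callable[[str], List[dict]]] = {
    "csv": lambda raw: list(csv.DictReader(io.StringIO(raw))),
    "json": lambda raw: json.loads(raw),
}


@dataclass
class DataSpace:
    sources: List[Source] = field(default_factory=list)

    def add(self, source: Source) -> None:
        # Sources are registered as-is; nothing is converted up front.
        self.sources.append(source)

    def records(self) -> Iterator[Tuple[str, dict]]:
        # Read each source lazily with the reader for its own format.
        for src in self.sources:
            for rec in READERS[src.fmt](src.raw):
                yield src.name, rec

    def search(self, term: str) -> List[Tuple[str, dict]]:
        # Case-insensitive substring match over every field value:
        # best-effort access with almost no semantics, like a search engine.
        term = term.lower()
        return [(name, rec) for name, rec in self.records()
                if any(term in str(v).lower() for v in rec.values())]


if __name__ == "__main__":
    space = DataSpace()
    space.add(Source("talks.csv", "csv", "speaker,venue\nGuha,Stanford\n"))
    space.add(Source("people.json", "json",
                     '[{"fullName": "R. V. Guha", "employer": "Google"}]'))
    for name, rec in space.search("guha"):
        print(name, rec)
```

The two sources describe Guha with different field names (`speaker` vs. `fullName`), and the sketch simply lets them coexist instead of mapping them onto one schema. The cost is the one the paragraph above describes: the search matches strings, not meaning, so integrating the two records semantically is left for later, pay-as-you-go work.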

Several efforts are heading in similar directions. Some Semantic Web initiatives propose embedding RDF in HTML, while others, such as Microformats, propose a bottom-up way of structuring frequently used data (people, events, relationships).

While I was Googling for Guha's presentation, I found that he now works for Google.