Day Dreaming About Voice Web

IBM’s “Next Five in Five” is a list of innovations that have the potential to change the way people work, live and play over the next five years.

New technology will change how people create, build and interact with information and e-commerce websites – using speech instead of text. We know this can happen because the technology is available, but we also know it can happen because it must. In places like India, where the spoken word is more prominent than the written word in education, government and culture, “talking” to the Web is leapfrogging all other interfaces, and the mobile phone is outpacing the PC.

Here is the list from IBM’s article.

  • Energy saving solar technology will be built into asphalt, paint and windows
  • You will have a crystal ball for your health
  • You will talk to the Web . . . and the Web will talk back
  • You will have your own digital shopping assistants
  • Forgetting will become a distant memory

One of my favorite hobbies is to pick one or more of these and try to figure out what we need to get there. It is a good way to dream about the near future, see where the gaps are, and make some intermediate predictions.

Here are some random, incomplete thoughts on the Voice Web. There are several starting points depending on where your interests lie.

  • It has to be a layer on Web 1.0 and 2.0 (since a lot of useful content is already there).
  • The Web 2.0 layer may be a better starting point since some of the underlying technologies – REST-based APIs, social interfaces, mashup tools – are already available.
  • Some of the semantic technologies may help in providing contextual structure and metadata over existing content. This may be an alternate starting point (using Freebase/dbpedia/Open Calais).
  • Voice recognition is one starting point. Many of the mobile providers already have something in this space, but it is not perfect yet. Voice commands on our cell phones have limited context. There is room for a bunch of innovation there.
  • Voice output is another good starting point, and an easier problem than voice recognition if the input (web content) and the output (voice) are in the same language.
  • If the voice input and output are in different languages (instructions originally written in English translated for a Tamil farmer, for example), there are more chances for innovation. I am not talking about Babelfish-style translation but a couple of steps above that.
  • From a device point of view, hands-free operation of the cell phone may work better. This requires innovation in both audio input and output technologies, as well as miniaturization.
  • Obviously, integrating search into this equation is one of the steps. There are some early attempts at this from Google; I am not sure how well they work. But there are a few more opportunities here – layering voice search over meta search, for example.
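Just to make the daydream a little more concrete, here is a toy sketch of the “voice layer over existing web content” idea from the bullets above. The recognize() and speak() functions are stubs standing in for real speech engines (which is where the hard innovation lives), and a simple tag-stripper stands in for the Web 1.0/2.0 layer – names and logic are mine, not anyone’s actual API:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Strip tags from existing web content so it can be spoken aloud."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def recognize(audio):
    # Stub: a real recognizer would turn audio into text.
    # In this sketch, "audio" is already a transcript.
    return audio

def speak(text):
    # Stub: a real TTS engine would render this as audio.
    return f"[spoken] {text}"

def voice_layer(page_html, spoken_query):
    """Answer a spoken query from an existing web page."""
    query = recognize(spoken_query)
    extractor = TextExtractor()
    extractor.feed(page_html)
    text = " ".join(extractor.chunks)
    # Naive "answer": first sentence mentioning a query word.
    for sentence in text.split(". "):
        if any(word.lower() in sentence.lower() for word in query.split()):
            return speak(sentence)
    return speak("Sorry, I could not find that.")
```

For example, voice_layer("<p>Rice needs standing water. Wheat does not.</p>", "rice") would “speak” the first sentence. The point of the sketch is that the layer sits entirely on top of content that already exists; only the two stubs need real technology behind them.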
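And for the last bullet, a minimal sketch of what “layering voice search over meta search” might mean: hypothetical search backends each return a ranked list of URLs, a simple positional vote merges them, and only the top hit gets read back. Everything here – backend functions, scoring, the spoken summary – is an illustrative assumption, not a real search API:

```python
from collections import Counter

def merge_results(result_lists):
    """Meta search: combine ranked URL lists from several engines.

    Each URL is scored by how high it ranks in each list; a URL
    returned by more engines, or ranked higher, scores more points.
    """
    scores = Counter()
    for results in result_lists:
        for rank, url in enumerate(results):
            scores[url] += len(results) - rank  # higher rank, more points
    return [url for url, _ in scores.most_common()]

def voice_search(query, backends):
    """Run the query against every backend, merge, and 'speak' the top hit."""
    # Each backend is a function: query -> ranked list of URLs (stubs here).
    merged = merge_results([backend(query) for backend in backends])
    top = merged[0] if merged else "no results"
    return f"[spoken] Top result for '{query}': {top}"
```

With two stub backends – say one returning ["a.com", "b.com"] and another ["a.com", "c.com"] – the merge favors a.com because both engines agree on it. Voice is just the thin presentation layer on top; the interesting question is how little of the result you can speak and still be useful.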

I can go on. But you get the drift. One cool way to capture all this (and the collective intelligence) is through some kind of voice annotated mind map (which in itself is another innovation waiting to happen). Your thoughts?