Information About Information

I was jotting down ideas on the various aspects of Information that a business has to deal with. Not all of them are relevant to all businesses. However, as I was thinking about Information, I was amazed by the number of attributes and activities related to it. Here is a list.

  1. Gathering – Identifying the Right Sources
  2. Finding – Search and Other tools
  3. Aggregating
  4. Validating – Verifying the authenticity and sources
  5. Deduplicating – Enormous overload occurs due to slightly modified versions of Information occurring over a period of time
  6. Normalizing – Reducing it to some kind of canonical form (who are the players, what is happening, etc.)
  7. Filtering – The essential tool to manage the overload and separate signal from noise. But the noise of one person may be the signal for another. So can we customize and individualize filters? What do we do with the sediments left behind by the filtering process?
  8. Detecting patterns – occurrence patterns and source bias patterns and other cause-effect patterns
  9. Classification – Topic Aggregation, Topic Similarity, Topic Hierarchy
  10. Relating – independent, interdependent, co-occurrence and correlations
  11. Analysis – contextual analysis, source context, use context, bias, analysis of language, overtones/undertones
  12. Synthesis – Making sense of different pieces of information
  13. Identifying Propagation Patterns – How does it propagate? What is the correlation of information paths to styles of information?
  14. Insights – Detecting trends, velocity and currency
  15. Intelligence – Deriving actionable intelligence, mining, extracting facts, extracting entities, why/what/how/when/where analysis
  16. Layering – How does each layer map to the organization’s layers?
  17. Flow – An analysis of flow of information. Tracing information between people, teams, departments, up and down the organization. Also flows between an organization, its partners and customers.
  18. Structuring – How do we link these different pieces – unstructured, semi-structured and structured?
  19. Identifying barriers to use – stovepipes/silos, lost information
  20. Supplementing/Augmenting Information – with annotations and collaborative editing
  21. Visualizing – Different levels and types of visualization
  22. Alerts and Notifications  – Smart alerts/notifications based on analysis and detection of patterns and occurrence of events based on rules. Needed for both internal and external information.
  23. Synchronizing – Updating internal information based on changes taking place external to the organization.

This is just a partial list. As the information increases dramatically, we need to think about these various aspects of Information and how we can leverage it to help an organization. What is your IIQ (Information Intelligence Quotient)?
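To make one item from the list concrete, here is a minimal Python sketch of deduplicating (item 5). It assumes that slightly modified versions of the same item can be caught by comparing overlapping three-word shingles with Jaccard similarity. That technique, the function names and the 0.6 threshold are illustrative assumptions, not a description of any particular product.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(documents, threshold=0.6):
    """Keep the first copy of each document; drop later near-duplicates."""
    kept = []  # list of (text, shingle set) pairs
    for text in documents:
        candidate = shingles(text)
        # Assumed cutoff: anything at or above the threshold is a duplicate.
        if all(jaccard(candidate, seen) < threshold for _, seen in kept):
            kept.append((text, candidate))
    return [text for text, _ in kept]
```

This compares every new item against everything kept so far, which is fine for a small feed. A large collection would probably need an index or hashing scheme on top of the same idea.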

Update June 2012

The team at Next Wave Multimedia were kind enough to create a presentation from this post.  Do you want to create your own fun presentations? You can try ComicsHead, an iPad app.

InfoTools Survey Results

Yesterday, I gave a talk on InfoTools: Beyond Search at TiE Chennai. The slides of the presentation are here. I think it went well, but if I had cut down the slides and the talk and given more demos, it would have gone even better. Perhaps next time.

Before starting the talk, I requested people to give me (written) answers to three questions:

  1. What are your information needs?
  2. What are your problems with information?
  3. What tools do you use to manage information?

The questions were perhaps a bit vague; I realized that after going through the answers. The answers varied in their level of granularity (specific vs. generic problems) and in the definition of information itself. But here they are (slightly modified to reduce redundancy).

Here is what I got from the survey:

What are your information needs?

  1. Potential customer info
  2. Right info at the right time (whenever I need it)
  3. To structure unstructured web data
  4. Technology and Process to manage large newspaper portal
  5. Should be current, relevant (to my context). Should lead to (help?) actual decisions
  6. I need reference (information) of various consultants relating to start of business viz cost, web, management etc.
  7. Need for current information
  8. Need (to handle?) information from multiple sources and formats
  9. Collating information from multiple sources
  10. Information about competition
  11. About marketability and segments
  12. Company address information
  13. Company finance (annual reports)
  14. Executives within the company
  15. Trade details, products, services etc.
  16. Sales leads
  17. Knowledge enhancement
  18. Learning about old friends/acquaintances/family
  19. To learn to grow personally & business
  20. Needs to be local search for providers near to me (for ex: a photo copier shop near to my house)
  21. Technical solutions (day to day) for career and personal growth
  22. My business is providing information based services, package with recommendations. So need for information varies.
  23. Various technologies in market
  24. Information about market situation
  25. About stock/companies performance
  26. Details/support to solve issues
  27. Products available in the market for specifics(?)
  28. Focused News
  29. Similar business entity info
  30. Public info of competitor
  31. At a business level – market feelers about demand, ease of vendor options availability
  32. At an execution/implementation level – latent trends in tech
  33. Updated knowledge
  34. Price information about products etc.
  35. Information about technology
  36. Looking for acquiring an IT company. Need info on the industry they are in (macro) and more about that company (micro)
  37. Collect, compile for pattern understanding, plan for target customer
  38. Top IT temp staffing companies in India
  39. Total temp staff in IT in India
  40. How do I know the customer needs
  41. Scholarly articles on business entrepreneurship
  42. Product information, addresses from www
  43. Collecting/harvesting data from websites and collating, cleansing and delivering to clients
  44. Where is the resource for information?
  45. Where info is available, how to get data stream into our database
  46. How cost effective, credible, valuable is the data
  47. Accessibility
  48. About companies wanting to enter India -setup operations, joint ventures
  49. Companies in India wanting to enter other geographies
  50. Consultants from outside India needing partners in India
  51. Relevant, accurate data (specific to the task at hand)
  52. Info about prospective customers
  53. Info about vendors
  54. Info about current market
  55. Info about latest technology

Here is my list of information requirements (I took the survey along with the others):

  1. Leads
  2. Trends
  3. Best practices

What are your problems with information?

  1. Locating the right data at the right time
  2. At times info overload
  3. Unable to get the right (specific) information
  4. Sometimes get caught into loads of data, making it difficult to sift through
  5. Credibility, cost and accessibility
  6. Frequent website updates
  7. Different formats of information
  8. Getting data from complex templates and grouping into finite categories
  9. Precision, very difficult to get objective information
  10. Currency of data
  11. Comprehensiveness of data
  12. Need continuous monitoring
  13. Information overload and in such case, synthesizing & assimilating that information in a reasonable time frame is difficult
  14. Old data, not accurate
  15. Too much info
  16. Not easily accessible
  17. Irrelevant info
  18. Filter out the actual/real info from a large pool of junk data
  19. Do not have a scope to interact with peers in similar industries
  20. Direct actionable information takes several searches, navigation
  21. How to localize information (assume how to get local information) and get reliable info
  22. How to segregate info from the web
  23. Difficult to put together
  24. If put together, not sure whether it is the updated info
  25. If updated (up to date?) not sure about the integrity of the data source
  26. Availability (sources), Reliability (sources)
  27. Aggregation of data in a presentable manner
  28. Too much information
  29. Unable to identify precise locations quickly
  30. Quality of inputs not high (always)
  31. Too large varied and different
  32. Formats (word, pdf, excel etc. ), hard copies, books, magazines
  33. Difficult to authenticate, collate and organize based on requirement
  34. I like websearch engines but I strongly believe that these search engines are at a nascent stage. I just don’t need a site coming up in my search because it is in wikipedia or yahoo
  35. Inappropriate not timely
  36. Have to go through lots of notes/documents/pages to get a single piece of information
  37. Validating the information
  38. Storing and organizing information
  39. Time
  40. Where to see (sources?)
  41. Not a centralized reporting
  42. Assimilation requires a lot of pre-formatting
  43. Effective and speed search by everyone not followed
  44. Not sure what to look for, where to look for and how to get it
  45. Vast, use software to target timely, quick, on realtime
  46. Not able to source the information in the web
  47. We develop products based on blogs and emails. This is not enough.
  48. Too much info
  49. Info with noise

My List

  1. Signal vs noise
  2. Reliability
  3. Authenticity

What tools do you use?

  1. Blog, forums
  2. Google, web search
  3. Search engines
  4. Reliable third parties
  5. Friends
  6. Regular expressions
  7. Use bookmarking tools like delicious, share with team
  8. Knowledge repositories (wikipedia)
  9. Books (online/printed)
  10. Inhouse tools to capture through automation
  11. Infosource – www, infoanalysis – spreadsheets
  12. Search engines to identify information
  13. Customized perl/php/ programs to manage
  14. Scrape information from the web and manage it
  15. Search engines
  16. Networking sites (LinkedIn etc)
  17. Forums
  18. Email
  19. My brain power, word/excel
  20. justdial and few others provide localized service over phone but it is not so accurate
  21. Justdial
  22. Hakia
  23. None
  24. Excel/Computer/Notebooks
  25. Peer discussions
  26. IE Favorites (browser bookmarks)
  27. Bing
  28. Primary Research
  29. Internet, newspapers, meeting – software modules
  30. spreadsheet, email
  31. Internet, libraries
  32. Getting logic from other tools and using our own tools or languages
  33. Perl, regex
  34. Paid portals
  35. LinkedIn
  36. Spoke
  37. Ecademy
  38. Xing
  39. My memory (sigh)

What I use:

  1. Social bookmarks (delicious, stumble upon)
  2. Twitter Search
  3. Facebook groups
  4. LinkedIn Groups and Answers
  5. Custom search
  6. Blog/Feed Search
  7. Twine
  8. Semantic Search engines
  9. InfoMinder
  10. InfoStreams (feed aggregator/search)
  11. InfoPortals (just started)
  12. Tag clouds (generated)
  13. Concept Mapping tools
  14. OpenCalais
  15. Zemanta
  16. Wikis

This is a small sample (about 40+ people who attended my talk). But you can see some patterns. I think we have a long way to go beyond search.

Oh Mighty Search Engines, Please Let Me Tell You What I Want

I have been watching Bing, Wolfram Alpha and Google’s Snippets. All of these (and some others) advance search and provide lots of cool features. Their job is challenging since they are trying to guess what I want when I just type one keyword or phrase or a search expression.

What do I want from a search? It varies. Sometimes I want to find a website. Sometimes I want to find an address or a document. There are times I want a person, a discussion or a “how to”.

I wonder why search engines don’t ask you what you want to find and make their job simpler. Instead of assuming what you want, why not take that additional input from the user?

To be fair, they all do it to an extent. Google has operators like filetype: to specify the type of document you are trying to retrieve. Bing lists some of the categories as a set of options in the search results. Wolfram Alpha amazes me sometimes, but I am never sure what I am going to get back.

Here is my suggestion. Why not let the user specify a keyword/value pair as part of search? For example

cisco type:address

will give me the address of Cisco, the company. Type can be abbreviated to # (since hashtags are becoming popular) if needed.

cisco #address (could have synonyms like location etc.)

Or even extend this to retrieving multiple properties

cisco #address #stockinfo

The default will be #all, which falls back to the current mode of dumping all the stuff at you.

A feature like this would reduce the clutter (too many results) and increase relevancy. There will be many challenges, like coming up with a finite set of hashtags or hashtag synonyms. If the search engine does not recognize a hashtag, let it default to #all and do what it is doing now. The search engine can also learn from such input what people want.
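Here is a rough Python sketch of how such a query could be split into search terms and requested result types. The tag vocabulary, the synonym table and the function name are hypothetical; the only tags taken from the examples above are address, location (as a synonym), stockinfo and all.

```python
# Hypothetical vocabulary: a real engine would need a much larger,
# curated set of hashtags and synonyms.
KNOWN_TAGS = {"address", "stockinfo", "all"}
SYNONYMS = {"location": "address"}


def parse_query(query):
    """Split a query into (search terms, requested result types)."""
    terms, tags = [], []
    for token in query.split():
        if token.startswith("#") and len(token) > 1:
            tag = token[1:].lower()
        elif token.lower().startswith("type:") and len(token) > 5:
            tag = token[5:].lower()
        else:
            terms.append(token)
            continue
        tag = SYNONYMS.get(tag, tag)
        # Unrecognized hashtags fall back to #all.
        if tag not in KNOWN_TAGS:
            tag = "all"
        if tag not in tags:
            tags.append(tag)
    # No tag at all, or any fallback to #all, means "return everything".
    if not tags or "all" in tags:
        tags = ["all"]
    return " ".join(terms), tags
```

For example, parse_query("cisco #address #stockinfo") yields the term "cisco" with the result types address and stockinfo. One design choice here is an assumption: a single unrecognized hashtag widens the whole query back to #all rather than being silently dropped.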

I am a little tired of getting lots of useless information from the search engine and wading through it all to find the one little nugget of information I really need. I don’t need a million results or a decision engine or a computational knowledge engine. So, Mighty Search Engines, please let me tell you what I want. I may be stupid and this feature may already exist. In that case, you, the reader, can educate me a bit. I know there are a few others like me.

Day Dreaming About Voice Web

“IBM Next Five in Five” is a list of innovations that have the potential to change the way people work, live and play over the next five years.

New technology will change how people create, build and interact with information and e-commerce websites – using speech instead of text. We know this can happen because the technology is available, but we also know it can happen because it must. In places like India, where the spoken word is more prominent than the written word in education, government and culture, “talking” to the Web is leapfrogging all other interfaces, and the mobile phone is outpacing the PC.

Here is the list from IBM’s article.

  • Energy saving solar technology will be built into asphalt, paint and windows
  • You will have a crystal ball for your health
  • You will talk to the Web . . . and the Web will talk back
  • You will have your own digital shopping assistants
  • Forgetting will become a distant memory

One of my favorite hobbies is to pick one or more of these and try to figure out what we need to get there. It is a good way to dream about the near future and try to see where the gaps are and do some intermediate predictions.

Here are some random, incomplete thoughts for the Voice Web. There are several starting points depending on where your interests lie.

  • It has to be a layer on Web 1.0 and 2.0 (since a lot of useful content is already there).
  • The Web 2.0 layer may be a better starting point since some of the underlying technologies (REST-based APIs, social interfaces, mashup tools) are already available.
  • Some of the semantic technologies may help in providing some contextual structure and meta data over existing content. This may be an alternate starting point (using Freebase/dbpedia/Open Calais).
  • Voice recognition is one starting point. Many of the mobile providers already have something in this space but they are not perfect yet. Voice commands on our cell phones have limited context. There can be a bunch of innovations there.
  • Voice output is another starting point. This is an easier problem than voice recognition if the input (web content) and output (voice) are of the same language. This is another good starting point.
  • If the voice input and output are different languages (instructions originally written in English translated to a Tamil farmer, for example), we have some more chances of innovation. I am not talking about the babelfish style translation but a couple of steps above that.
  • From a device point of view, hands free operation of the cell phone may work better. These require innovation in both audio input and output technologies and miniaturization.
  • Obviously, integrating search into this equation is one of the steps. There are some early attempts at doing this from Google. Not sure how well they work. But there are a few more opportunities, such as layering voice search over meta search.

I can go on. But you get the drift. One cool way to capture all this (and the collective intelligence) is through some kind of voice annotated mind map (which in itself is another innovation waiting to happen). Your thoughts?

On a lighter note

WordPress recently introduced a feature to get all time stats on your blog. I was curious and started looking at the search keywords. The top 10 looks like this. This surprised me a bit since I rarely write about love. I then realized that search engines were leading people looking for love (pun intended) to my geeky entry on “Why I love Python”. I don’t know whether I should laugh or cry.


Making Search a Ubiquitous Infrastructure Component

Search is currently used as a Web Application. Imagine Search being a component of a wide variety of applications. The founder of Wikipedia, Jimmy Wales, plans to make that happen with Wikia.

Wikia is an open source search engine that plans to release both the source code and search data to the public.

“We have open-source software and cheap commodity computers in an open, neutral setting so that people can innovate very cheaply,” he says, adding that this allows people to experiment and perhaps make search a ubiquitous infrastructure component. Wales anticipates that many organizations will build their own search engine services thanks to the software’s availability on an open platform, and he says the trick to persuading people to use Wikia Search is delivering quality and a search experience that is at least as good if not better than their preferred search engine.

Almost every web site needs some kind of search engine and so do organizations. Imagine an open search API (currently you can use Lucene) broadly available for use. Imagine the ability to build your own ranking algorithm where the relevance factors are more in tune with your requirements.

You can do all of this with a set of open source components from Apache. But an effort like Wikia is likely to accelerate things somewhat. Wikia may take a while to take off. But herein lies the opportunity for a wide variety of new products and innovations in search.
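As a toy illustration of what building your own ranking algorithm could mean, here is a Python sketch that orders documents by a weighted sum of relevance factors you choose. The factor names, the document fields and the weights are all assumptions made for this example; it does not use Lucene or any real search API.

```python
def rank(documents, query, weights):
    """Order documents by a weighted sum of simple relevance factors."""
    terms = query.lower().split()

    def score(doc):
        title = doc["title"].lower().split()
        body = doc["body"].lower().split()
        # Hypothetical factors; swap in whatever matters for your site.
        factors = {
            "title_hits": sum(title.count(t) for t in terms),
            "body_hits": sum(body.count(t) for t in terms),
            "freshness": 1.0 / (1 + doc["age_days"]),
        }
        return sum(weights.get(name, 0.0) * value
                   for name, value in factors.items())

    return sorted(documents, key=score, reverse=True)
```

A news site might put most of the weight on freshness, while a reference site might care mainly about title matches. The point is that the weights become something you control rather than something the engine decides for you.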

via ACM Technews: Wikipedia Co-Founder Tries Similar Idea In Search

LinkLog: Cool Mashups

It took a while for me to get back to my Google Reader. I was going through the list of mashups from Programmable Web, and here is what I found.

Complete Schools

Education reference site based on US Department of Education data embedded with Yahoo Answers to deliver relevant questions and answers about colleges and universities in the United States.

The really cool thing  about this mashup is that I discovered a bunch of other “Answers” applications, notably one for Facebook.


Aggregates search trends from Google Trends and Yahoo Buzz. A complete list of the trends is supplemented by Google Search Results, Blog Search Results, News Results, Image Results, Book Results, Yahoo Search Results, and MSN Live Search Results.

Mashup of LinkedIn Questions and Answers 

This is an RSS feed that contains a mashup of the LinkedIn Questions RSS feed and LinkedIn Answers published on the LinkedIn website.

Search Exploder

This uses two Yahoo pipes chained together. The first searches a default list of RSS feeds for terms you provide. The results are processed and an extra link is added that calls a pipe to run a Yahoo Search to give more details about the article topic.

It is a cool Yahoo pipes app.  There is a great idea in it somewhere. I can imagine some interesting possibilities with some recursion thrown in.


A mashup of tags with a Yahoo Pipe containing aggregated results from Blogger, Technorati, Google Blog Search and several others. Click on a tag to see what people in the blogosphere are saying about that topic today.

Does sound delicious (no pun intended). One of the coolest mashups I have seen in a while. I just wish it had a language filter so that I can see just English language blogs.

Searching For A Better Search

When Google came on the scene, they raised the bar on search. Search went from something I occasionally use to something I use most of the time. I did not know that I had the need to search so much.

However, I am yearning for a better search engine. That is good for both Google (since they are slightly ahead of the game at this point) and competitors since they can do a Google on Google.

Depending on your search needs, your wish-list may be different than mine. I am also looking at it as a more geeky power user. Here is my list.

Better Profile Usage
I want better use of my search profile (I now provide this to Google) both implicit and explicit. I use Swicki and other similar tools to constrain search. So I am willing to provide explicit hints to the search engine if required. I would also like the search engine to implicitly see the results I am clicking on and bookmarking. I will be happy to provide a search engine my blog profile, my profile my flickr profile in addition to my search profile. That should give them enough hints to give me a few good, high quality results. Google does a great job when I am looking for something new. But I am not all that happy with what it does for the areas I am interested in.

Better Use of Contextual Information
An example of a context is whether I am searching for my work or for my personal needs. Other contextual information includes whether I am looking for vendors, recommendations, opinions, deals etc.

Better Clustering of Results
I would like better clustering of the results. Some search engines currently do this. But I have no control over the clustering parameters. Again, I will be happy to provide some hints to the engine (maybe my own classification structure).

Collaborative Search
Collaborative Search may be the next big thing. In a collaborative search, a set of users collaborate on filtering the search results to improve the quality of results. Collaborative search, by its very nature, may be useful for small clusters or groups where people are looking for similar things.

Better Formatting of Search Output
I would like better formatting of output. It is good to see RSS feeds being one option. It will be great, if I can choose what I want to see in my search results besides links and some brief text around the search terms that is currently provided by search engines.

Continuous Search
I would love to have a facility called Continuous Search (I picked the term from continuous query processing in the database world). A continuous search is a search that is repeated at specified intervals, with only the new items reported each time. This way I can keep tabs on various things I frequently search for.
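A minimal Python sketch of the idea, assuming results can be identified by their URLs and that the actual search is supplied as a function. The class name and the search_fn parameter are hypothetical, and nothing here calls a real search engine.

```python
class ContinuousSearch:
    """Re-run a saved query and report only results not seen before."""

    def __init__(self, query, search_fn):
        self.query = query
        self.search_fn = search_fn  # any callable: query -> iterable of URLs
        self.seen = set()

    def poll(self):
        """Run the search once and return the new items, in result order."""
        fresh = []
        for url in self.search_fn(self.query):
            if url not in self.seen:
                self.seen.add(url)
                fresh.append(url)
        return fresh
```

Calling poll() from a scheduler (a cron job, say) at the chosen interval gives the repeat-and-report-new behavior. A longer-running version would also need to persist the seen set between runs.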

Ability to Specify Search Objectives
Finally, I would like to specify what I want out of the search. Is it a person, a company, a place, a document? I will be happy to specify something like result:person or some such hint in the search.

I can go on and on, but you get the drift. I am a power searcher and I find some of the most powerful search engines available today still a bit too limiting. As we make search an integral part of our web lives, we will need more.

Many of these features can probably be handled through search extensions and mashups. So building a great search framework and letting other developers innovate may be a way to do this.

Cool Tools for the Web – Part-3

Sometimes I use a tool for a while before it appears in this list.


I stumbled upon StumbleUpon a while ago, while tracking the visitors to my blog. It is one of those compelling applications that you have to have. It is sitting in my Firefox toolbar right now. StumbleUpon made news this week: they were acquired by eBay. They seem to have added more services too. Definitely worth checking out.

Swicki is a cross between a wiki and a search engine (not sure why they decided to spell wiki as wicki). The swickibuilder lets you build a search (a few terms and information sources) and use it collaboratively. According to the swicki FAQ:

A swicki is new kind of search engine that allows anyone to create deep, focused searches on topics you care about. Unlike other search engines, you and your community have total control over the results and it uses the wisdom of crowds to improve search results. This search engine, or swicki, can be published on your site. Your swicki presents search results that you’re interested in, pulls in new relevant information as it is indexed, and organizes everything for you in a neat little customizable widget you can put on your web site or blog, complete with its very own buzz cloud that constantly updates to show you what are hot search terms in your community.

You can check it out here. I built a couple of them to play around with. There is not that much activity in my swickis. You may understand why, if you look at my swickis and the popular ones 🙂