InfoTools Survey Results

Yesterday, I gave a talk on InfoTools: Beyond Search at TiE Chennai. The slides of the presentation are here. I think it went well, but if I had cut down the slides and the talk and given more demos, it would have gone even better. Perhaps next time.

Before starting the talk, I asked people to give me written answers to three questions:

  1. What are your information needs?
  2. What are your problems with information?
  3. What tools do you use to manage information?

The questions were perhaps a bit vague; I realized that after going through the answers. The answers varied in their level of granularity (specific vs. generic problems) and in how they defined information itself. But here they are (slightly modified to reduce redundancy).

Here is what I got from the survey:

What are your information needs?

  1. Potential customer info
  2. Right info at the right time (whenever I need it)
  3. To structure unstructured web data
  4. Technology and Process to manage large newspaper portal
  5. Should be current, relevant (to my context). Should lead to (help?) actual decisions
  6. I need reference (information) of various consultants relating to start of business viz cost, web, management etc.
  7. Need for current information
  8. Need (to handle?) information from multiple sources and formats
  9. Collating information from multiple sources
  10. Information about competition
  11. About marketability and segments
  12. Company address information
  13. Company finance (annual reports)
  14. Executives within the company
  15. Trade details, products, services etc.
  16. Sales leads
  17. Knowledge enhancement
  18. Learning about old friends/acquaintances/family
  19. To learn to grow personally & business
  20. Needs to be local search for providers near to me (for ex: a photo copier shop near to my house)
  21. Technical solutions (day to day) for career and personal growth
  22. My business is providing information based services, package with recommendations. So need for information varies.
  23. Various technologies in market
  24. Information about market situation
  25. About stock/companies performance
  26. Details/support to solve issues
  27. Products available in the market for specifics(?)
  28. Focused News
  29. Similar business entity info
  30. Public info of competitor
  31. At a business level – market feelers about demand, ease of vendor options availability
  32. At an execution/implementation level – latent trends in tech
  33. Updated knowledge
  34. Price information about products etc.
  35. Information about technology
  36. Looking for acquiring an IT company. Need info on the industry they are in (macro) and more about that company (micro)
  37. Collect, compile for pattern understanding, plan for target customer
  38. Top IT temp staffing companies in India
  39. Total temp staff in IT in India
  40. How do I know the customer needs
  41. Scholarly articles on business entrepreneurship
  42. Product information, addresses from www
  43. Collecting/harvesting data from websites and collating, cleansing and delivering to clients
  44. Where is the resource for information?
  45. Where info is available, how to get data stream into our database
  46. How cost effective, credible, valuable is the data
  47. Accessibility
  48. About companies wanting to enter India (setup operations, joint ventures)
  49. Companies in India wanting to enter other geographies
  50. Consultants from outside India needing partners in India
  51. Relevant, accurate data (specific to the task at hand)
  52. Info about prospective customers
  53. Info about vendors
  54. Info about current market
  55. Info about latest technology

Here is my list of information requirements (I took the survey along with the others):

  1. Leads
  2. Trends
  3. Best practices

What are your problems with information?

  1. Locating the right data at the right time
  2. At times info overload
  3. Unable to get the right (specific) information
  4. Sometimes get caught into loads of data, making it difficult to sift through
  5. Credibility, cost and accessibility
  6. Frequent website updates
  7. Different formats of information
  8. Getting data from complex templates and grouping into finite categories
  9. Precision, very difficult to get objective information
  10. Currency of data
  11. Comprehensiveness of data
  12. Need continuous monitoring
  13. Information overload and in such case, synthesizing & assimilating that information in a reasonable time frame is difficult
  14. Old data, not accurate
  15. Too much info
  16. Not easily accessible
  17. Irrelevant info
  18. Filter out the actual/real info from a large pool of junk data
  19. Do not have a scope to interact with peers in similar industries
  20. Direct actionable information takes several searches, navigation
  21. How to localize information (assume how to get local information) and get reliable info
  22. How to segregate info from the web
  23. Difficult to put together
  24. If put together, not sure whether it is the updated info
  25. If updated (up to date?) not sure about the integrity of the data source
  26. Availability (sources), Reliability (sources)
  27. Aggregation of data in a presentable manner
  28. Too much information
  29. Unable to identify precise locations quickly
  30. Quality of inputs not high (always)
  31. Too large varied and different
  32. Formats (Word, PDF, Excel, etc.), hard copies, books, magazines
  33. Difficult to authenticate, collate and organize based on requirement
  34. I like websearch engines but I strongly believe that these search engines are at a nascent stage. I just don’t need a site coming up in my search because it is in wikipedia or yahoo
  35. Inappropriate not timely
  36. Have to go through lots of notes/documents/pages to get a single piece of information
  37. Validating the information
  38. Storing and organizing information
  39. Time
  40. Where to see (sources?)
  41. Not a centralized reporting
  42. Assimilation requires a lot of pre-formatting
  43. Effective and speed search by everyone not followed
  44. Not sure what to look for, where to look for and how to get it
  45. Vast, use software to target timely, quick, on realtime
  46. Not able to source the information in the web
  47. We develop products based on blogs and emails. This is not enough.
  48. Too much info
  49. Info with noise

My List

  1. Signal vs noise
  2. Reliability
  3. Authenticity

What tools do you use?

  1. Blog, forums
  2. Google, web search
  3. Search engines
  4. Reliable third parties
  5. Friends
  6. Regular expressions
  7. Use bookmarking tools like delicious, share with team
  8. Knowledge repositories (Wikipedia)
  9. Books (online/printed)
  10. Inhouse tools to capture through automation
  11. Infosource – www, infoanalysis – spreadsheets
  12. Search engines to identify information
  13. Customized perl/php/vb.net programs to manage
  14. Scrape information from the web and manage it
  15. Search engines
  16. Networking sites (LinkedIn etc)
  17. Forums
  18. Email
  19. My brain power, word/excel
  20. Justdial and a few others provide localized service over the phone, but it is not so accurate
  21. Justdial
  22. Hakia
  23. None
  24. Excel/Computer/Notebooks
  25. Peer discussions
  26. IE Favorites (browser bookmarks)
  27. Bing
  28. Primary Research
  29. Internet, newspapers, meeting – software modules
  30. spreadsheet, email
  31. Internet, libraries
  32. Getting logic from other tools and using our own tools or languages
  33. Perl, regex
  34. Paid portals
  35. LinkedIn
  36. Spoke
  37. Ecademy
  38. Xing
  39. My memory (sigh)

What I use:

  1. Social bookmarks (Delicious, StumbleUpon)
  2. Twitter Search
  3. Facebook groups
  4. LinkedIn Groups and Answers
  5. Custom search
  6. Blog/Feed Search
  7. Twine
  8. Semantic Search engines
  9. InfoMinder
  10. InfoStreams (feed aggregator/search)
  11. InfoPortals (just started)
  12. Tag clouds (generated)
  13. Concept Mapping tools
  14. OpenCalais
  15. Zemanta
  16. Wikis

This is a small sample (the 40+ people who attended my talk), but you can see some patterns. I think we have a long way to go beyond search.
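One rough way to look for patterns in answers like these is to tally them against a few themes. The sketch below is only an illustration: the responses are copied from the problems list above, but the theme names and keywords in `THEMES` are my own hand-picked assumptions, not categories the respondents used.

```python
from collections import Counter

# A handful of "problems with information" answers copied from the list above.
responses = [
    "At times info overload",
    "Too much info",
    "Old data, not accurate",
    "Irrelevant info",
    "Currency of data",
    "Different formats of information",
    "Validating the information",
    "Info with noise",
]

# Hypothetical themes and keywords, chosen by hand for illustration only.
# Note "formats" rather than "format": the latter also matches "information".
THEMES = {
    "overload": ("overload", "too much", "noise", "irrelevant"),
    "freshness": ("old data", "currency", "updated"),
    "trust": ("validating", "accurate", "reliab", "credib"),
    "formats": ("formats",),
}


def tally_themes(answers, themes=THEMES):
    """Count how many answers mention at least one keyword of each theme."""
    counts = Counter()
    for answer in answers:
        text = answer.lower()
        for theme, keywords in themes.items():
            if any(keyword in text for keyword in keywords):
                counts[theme] += 1
    return counts


counts = tally_themes(responses)
```

An answer can count toward more than one theme ("Old data, not accurate" touches both freshness and trust), so the totals are a rough signal rather than a clean partition. Keyword matching this crude would need a human pass before drawing any conclusions.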

Wikis and Information Intelligence

Information Intelligence is the practice of gathering intelligence useful to an organization. It uses Open Source Intelligence to enrich an organization’s ability to gather intelligence for internal use.

Open Source Intelligence (OSINT) is an information processing discipline that involves finding, selecting, and acquiring information from publicly available sources and analyzing it to produce actionable intelligence. In the Intelligence Community (IC), the term “open” refers to overt, publicly available sources (as opposed to covert or classified sources); it is not related to open-source software. OSINT includes a wide variety of information and sources:

  • Media – newspapers, magazines, radio, television, and computer-based information.
  • Public data – government reports, official data such as budgets and demographics, hearings, legislative debates, press conferences, speeches, marine and aeronautical safety warnings, environmental impact statements, contract awards.
  • Observation and reporting – Amateur airplane spotters, radio monitors and satellite observers among many others have provided significant information not otherwise available. The availability of worldwide satellite photography, often of high resolution, on the Web (e.g., Google Earth) has expanded open source capabilities into areas formerly available only to major intelligence services.
  • Professional and academic – conferences, symposia, professional associations, academic papers, and subject matter experts.[1]

In addition to the media mentioned above, there are several sources for web data mining. Improving Information Intelligence involves several aspects:

  1. Gathering information from a variety of openly available sources
  2. Supplementing the open source intelligence with internal information
  3. Providing a collaborative platform to share information
  4. Enriching information – tagging, interlinking, annotating
  5. Versioning information to keep it current
  6. Providing a semantic layer for easy retrieval and integration with other tools
  7. Providing both a horizontal view and specific vertical views of the information
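To make a few of these aspects concrete (supplementing open sources with internal information, enriching with tags and links, and versioning), here is a minimal in-memory sketch. `Page` and `MiniWiki` are hypothetical names invented for this example, as are the sample page and its contents; a real wiki engine does far more than this.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """One wiki page: every edit is kept, so older versions stay available."""
    title: str
    revisions: list = field(default_factory=list)
    tags: set = field(default_factory=set)
    links: set = field(default_factory=set)

    def edit(self, text, source):
        # source is "open" or "internal" in this sketch (an assumed convention)
        self.revisions.append({"text": text, "source": source})

    @property
    def current(self):
        return self.revisions[-1]["text"] if self.revisions else ""


class MiniWiki:
    """Hypothetical container that looks pages up by title or by tag."""

    def __init__(self):
        self.pages = {}

    def page(self, title):
        # Return the existing page, or create an empty one on first use.
        if title not in self.pages:
            self.pages[title] = Page(title)
        return self.pages[title]

    def by_tag(self, tag):
        return sorted(p.title for p in self.pages.values() if tag in p.tags)


wiki = MiniWiki()
page = wiki.page("Competitor X")
page.edit("Summary of the public annual report", source="open")
page.edit("Summary of the public annual report, plus sales-team notes", source="internal")
page.tags.update({"competition", "finance"})
page.links.add("Market overview")
```

Looking pages up by tag gives a simple vertical view (everything tagged "competition"), while the full set of pages is the horizontal view.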

A wiki is an ideal tool for managing Information Intelligence inside an organization. You can start with a base wiki technology like MediaWiki (used by Wikipedia) and build additional layers on it, such as Semantic MediaWiki, or provide structured data access the way DBpedia does. You can get information on several vertical information-sharing sites that use MediaWiki here.
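As a small illustration of what a semantic layer adds: Semantic MediaWiki lets authors annotate text inline with `[[Property::value]]`, optionally followed by `|display text`. The sketch below pulls such annotations out of a piece of wikitext with a regular expression. The sample text, the property names, and the `extract_annotations` helper are all made up for this example, and the pattern covers only the simple inline form, not the full syntax.

```python
import re

# Matches [[Property::value]] and [[Property::value|display text]].
# Plain links such as [[Market overview]] have no "::" and are skipped.
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_annotations(wikitext):
    """Return (property, value) pairs for simple inline annotations."""
    return [
        (prop.strip(), value.strip())
        for prop, value in ANNOTATION.findall(wikitext)
    ]


# Hypothetical page text, invented for illustration.
sample = (
    "Acme Corp is a [[Is a::Competitor]] based in "
    "[[Located in::Chennai|Chennai, India]]. See also [[Market overview]]."
)
pairs = extract_annotations(sample)
```

Once annotations are available as property and value pairs, other tools can query or integrate them without parsing free text.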

The US Government initiatives Diplopedia and Intellipedia are good examples of horizontal and vertical views.

Recent congressional testimony from Jimmy Wales, the founder of Wikipedia,[5] notes the difference between vertical and horizontal information sharing and suggests that both could be successful e-government endeavors. Intellipedia is an excellent example of sharing information horizontally across agencies, and Diplopedia has found similar success in sharing information within the Department of State bureaucracy. Statements on both wikis encourage cross posting of relevant information as appropriate.

Wikis provide a great foundation for Information Intelligence. Enriching wikis with semantic annotations, more powerful viewing options, granular addressing, and higher-quality links may go a long way toward increasing their effectiveness.

Meta:

This entry was triggered by an email invite to an Intellipedia session at the MIT Center for Collective Intelligence.