Research: How Do You Find What Blogs to Read?

Research question: What blogs should you read to stay up to date on newsworthy stories?

Given a budget of 100 blogs, the biggest bang for the buck belonged to the popular Instapundit blog, which featured more than 4,500 postings throughout the year. Assuming a budget of 5,000 posts, however, the top-scoring blog was the less well-known sisu site, which featured only 331 posts for all of 2006.

How do you find something like this? How do you even go looking for this information?

During the past couple of weeks, I have been reading about Sentiment Analysis. I started with a post in the Text Analytics mailing list by Seth Grimes and followed many good posts with links. I read a few and understood the concept. It is a fascinating idea.

I came across this article today. It is different, but very useful. How do you find what blogs to read? Researchers at Carnegie Mellon created an algorithm called Cascades.

A team of researchers and graduate students from Carnegie Mellon eventually created a complex mathematical equation called the cost-effective lazy forward-selection algorithm, later dubbed the Cascades algorithm for simplicity’s sake.

One part seeks to maximize reward, in this case detecting the most news in the least amount of time. Within the algorithm, that reward concept is captured by tallying the number of people who read a news item after it appears on a specific blog. If 10 million people read a story after its initial posting on Blog A but only 1,000 had read it beforehand, the story would be deemed both newsworthy and early-breaking for Blog A’s readers.

A second part of the algorithm seeks to minimize cost, namely the inordinate time that could be spent reading blogs. The team also exploited a mathematical relationship known as the law of diminishing returns.

The Cascades algorithm is useful not only for finding newsworthy blogs but also for detecting water pollution. The sensors are just different.
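To make the idea concrete for myself, here is a minimal Python sketch of a lazy, cost-aware greedy selection. It is not the researchers' code. I assume a toy reward (the number of distinct stories a set of blogs covers) and a toy cost (the number of posts you would have to read); the function name `lazy_greedy` and the data layout are my own. The "lazy" part relies on diminishing returns: a blog's gain can only shrink as more blogs are picked, so an old estimate is an upper bound and most blogs never need to be re-scored.

```python
import heapq


def lazy_greedy(blogs, costs, budget):
    """Pick blogs by best (new stories / cost) ratio without exceeding budget.

    blogs:  dict of blog name -> set of story ids it reports (toy data)
    costs:  dict of blog name -> cost of reading it, e.g. number of posts
    budget: total cost we are willing to spend
    """
    covered = set()   # stories already covered by chosen blogs
    chosen = []       # blogs in the order they were picked
    spent = 0
    round_no = 0      # increases by one after every pick

    # heapq is a min-heap, so ratios are stored negated. Each entry also
    # records the round in which its ratio was computed.
    heap = [(-len(stories) / costs[name], name, 0)
            for name, stories in blogs.items()]
    heapq.heapify(heap)

    while heap:
        neg_ratio, name, computed_in = heapq.heappop(heap)
        if spent + costs[name] > budget:
            continue  # unaffordable now and forever, since spent only grows
        if computed_in == round_no:
            # The ratio is fresh and still on top. Every other entry holds an
            # upper bound on its true ratio, so nothing can beat this blog.
            if neg_ratio == 0:
                break  # no remaining blog adds a new story
            chosen.append(name)
            covered |= blogs[name]
            spent += costs[name]
            round_no += 1
        else:
            # Stale estimate: re-score against what is covered and push back.
            gain = len(blogs[name] - covered)
            heapq.heappush(heap, (-gain / costs[name], name, round_no))

    return chosen, covered, spent


if __name__ == "__main__":
    toy_blogs = {"A": {1, 2, 3, 4}, "B": {3, 4}, "C": {5}}
    toy_costs = {"A": 4, "B": 1, "C": 1}
    print(lazy_greedy(toy_blogs, toy_costs, budget=6))
```

Swapping the story sets for contamination scenarios and the blogs for sensor locations gives the water pollution version of the same problem.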

LinkLog: Programming

From the food pyramid: for the journeyman programmer


It is a great way to structure:

  • Teaching Software Development
  • Training new employees (the environmental training is equally important)
  • A way to understand how developers perform in various activities
  • For each project move from bottom to top and repeat

Other activities, like reading and writing about programming, can help developers break the monotony of just doing work by combining it with a bit of learning.

Innovation Propagation

Democracy was probably one of the greatest innovations in the world. How did it propagate? For a visualization of this story visit March of Democracy. While you are there explore other maps too.

Where has democracy dominated and where has it retreated? This map gives us a visual ballet of democracy’s march across history as the most popular form of government. From the first ancient republics to the rise of self-governing nations, see the history of democracy: 4,000 years in 90 seconds…!

This is a great and very powerful way to track how a certain event or movement propagates around the globe. It is also a great way to teach history. Moving from the video to the meta problem it solves, we can think of a tool to track the propagation of innovation and other events. Many examples come to mind:

  1. Historic events – spread of religions, spreading of culture, propagation of ideas. These and many others originate in one or two places and spread globally over a period of time.
  2. This may also be a great tool for teaching economics, history and diffusion of various other types of innovation.
  3. I would love to see a map of the way Mathematics or Science spread.

With the advent of the Internet, ideas spread through packets. Bloggers are definitely catalysts for propagating information and ideas. Hopefully, we can trace the spread at a more granular level and understand why certain ideas spread and why others don’t.

Web Data Mining

One of my articles on Web Data Mining appeared in i.t.magazine. They were kind enough to permit me to make it available from my blog.

Almost all of us need information. A lot of information is freely available on the Web. Learning a few techniques on how to mine information on the Web is a useful skill. Here are some sample usage scenarios:

  • You are an entrepreneur who is planning to start a new software business. You hear that Web 2.0 and social applications are hot. You want to do some research to understand the marketplace, and want to prototype a few product ideas.
  • You are part of the CTO office of a software company, and are interested in short-, medium-, and long-term technology and business trends in your industry. You need this information to build skills in your organization, and to build a few concept prototypes.
  • You are part of the CIO office of an organization. You need to balance early adoption of technologies with providing a stable environment for your business; you don’t want to jump at every new technology. In addition to finding new tools and techniques, you also want to understand the risks and the maturity level of these technologies, which ones are being used for building applications, and you also want to track many non-technical factors.
  • You are an outsourcing company and want to find customers for your business and track trends in outsourcing. Being a jump ahead of your competition and carving a niche are important differentiators.
  • You are part of HR, or a Learning Officer, and need to plan for the skill development of your employees. You want to keep your software team happy and so need to know the latest technologies, tools and resources to plan training and skill development.
  • You are a development lead, and need to provide the team with the latest information on product releases, and access to product/technology knowledge bases. You need to know of any problems, including security issues, in the tools or software that you are currently using for your projects.

Broadly, there are several components to finding, using and sharing information.

  • Identifying and discovering information sources
  • Tracking information from various sources and filtering them for their relevance to your needs
  • Organizing collected information and sharing it with others
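As a small illustration of the tracking and filtering step, here is a minimal Python sketch that scores collected items against a list of keywords and keeps the relevant ones. The item layout (a `title` and a `summary`), the function names, and the single-word keyword matching are all assumptions I made for the example; a real setup would pull items from feeds or search results.

```python
import re


def relevance(text, keywords):
    """Count how many of the single-word keywords appear in the text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return sum(1 for keyword in keywords if keyword.lower() in words)


def filter_items(items, keywords, min_score=1):
    """Keep items whose title and summary match at least min_score keywords,
    most relevant first. Ties keep their original order."""
    scored = [(relevance(item["title"] + " " + item["summary"], keywords), item)
              for item in items]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in kept]
```

A development lead might call `filter_items(items, ["security", "release"])`, while someone in HR would pass a different keyword list over the same collected items.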

Information sources can be many. A few typical categories are listed below:

  • News sources
  • Company websites
  • Blogs
  • Search engines
  • Wikis
  • Discussion groups
  • Social bookmarking sites
  • Social networks


This article (webdata-mining.pdf) describes these sources and their significance in more detail. The article uses British spelling, which is common in India.

Startups In the Center

There is no better way to express what is going on in Chennai (and India) than this graphic which Vijay was kind enough to share.


There is a new kind of euphoria here. I was at TiECON on Friday and on Saturday. Very different approaches and very different styles. But they both had one thing in common – entrepreneurs at the center stage. Vijay said it better than I ever can:

Startups are the Center of this Universe.

The essential environment for startups to breathe rich oxygen is taking shape. As a young, energetic, voluntary organization, it has a very bottom-up approach. I sensed the excitement and could feel the energy yesterday. It is very difficult not to get infected by the enthusiasm and optimism. I walked out feeling a lot younger, with my head buzzing with the infinite possibilities. Vijay’s Be That One Percent is still ringing in my ears.

TiECON Chennai 2008 – A Memorable Event

First, a bit of background. I had my first two startups in India (both in Chennai) and the next two in the USA. I spend roughly half my time in Silicon Valley and the other half in Chennai. I attended several TiECONs in Santa Clara, but this was my first in India. I did not know what to expect. The one-day, information-packed event was one of the most inspiring conferences I have been to. I can write long accounts of what happened, but I will leave it to better narrators than me.
Everyone I talked to echoed what I felt – it was one of the best learning experiences and one of the most inspiring events. It certainly was like drinking from a fire hose.

“The world is driven by knowledge”

said the Chief Minister, Dr. M. Karunanidhi. He urged all of us to:

Let common man be the focus of innovation.

Great work is inspired by a great cause. I can’t think of a better cause than this.

Reach out to the downtrodden since they do not know how to seek help. Take that extra effort to bring them into the fold and teach them how to improve their lives.

The highly energetic and ever-smiling Smt. M K Kanimozhi, Member of Parliament, Tamilnadu, talked about the efforts we need to make to help people who are surprised, and may even be suspicious, when someone reaches out to them. She epitomizes the young leadership we so much need: smart, articulate, completely at ease and very interactive.

The idea of awarding entrepreneurs was a brilliant one. It was a privilege to be in the same room with these people, who do not take no for an answer, who had a vision and a dream and worked hard to make it happen.

Images of the struggles and achievements, painted vividly by various keynote speakers, were some of the most awe-inspiring moments of the day.

A great blend of keynotes and panels, a very interactive audience, and an event that ran like a well-oiled machine should make the organizers and volunteers proud.

Under the leadership of Gopal Srinivasan and Mr. Ramaraj, two of the most dynamic figures in the industry, the TiE organization and all the volunteers did an outstanding job. This event easily compares with, and even beats, some of the TiECON annual events in California. It is definitely one of the best and most memorable ones for me.

With TiECON Chennai 2008, the TiE organization set a high bar for future events. I am glad I was there.

LinkLog: Future of Learning

This is from a blog about Stephen Downes’s seminar in Malaysia on how to use Web 2.0 tools for learning. It has great links to lots of useful resources for both learners and teachers.

I especially liked the part about Future Learning Directions:

  • Learning as Creation
  • Social Learning
  • Personal Learning Environments
  • Immersive Learning
  • Living Arts

I regularly read Stephen Downes’s blog and get his newsletter, and I learn a lot about learning.

Personal Research Agenda: A Cool Concept

I came across Brad’s blog entry on Creating a Personal Research Agenda, an engrossing read. It seems to have triggered a few other comments, including this one, which is a summary list of topics.

My current focus is more on teaching technology to students by making them create small useful products. Brad’s ideas and other similar ideas will definitely inspire these students to think deeper.

Here is my own list, to which I will probably add more items:

  1. A set of Semantic Web Components
  2. Converting Web Pages to Linked Data Streams
  3. A simple concept map overlay of a Web Document
  4. A Semantic Search Project (for specific types of users – students, developers)
  5. A Simple text mining toolkit (student projects)

I will write more about these as I start thinking deeper.

Semantic Web – Do We Really Need a Killer App?

Alex Iskold has an insightful blog post, Semantic Web: What is the Killer App, that starts off a great conversation.

Before we talk about the Killer App for the Semantic Web, let us first look at some killer apps for the current Web. What do you think they are?

  • Search?
  • Open Information repositories like Wikipedia?
  • e-Commerce?
  • Social Networks?
  • Hosted Applications?
  • Social Resources (bookmarks, pictures, videos)?
  • Blogs?
  • <add your favorite app here>

Depending on where you are coming from, it could be any of these or a combination. The killer app varies with time and the evolution of the web. But when the web started, what got people excited was the easy accessibility of information. Even the static web served a useful purpose before the dynamic web and web applications showed up.

I think the same thing is going to be true of The Semantic Web. I look at the Semantic Web as a set of incremental improvements over the current web.

Here is a rather simplistic view of the Semantic Web:

  1. More accessible data (some of it encoded with Microformats or RDFa or some yet to be invented markup format)
  2. Better typed associations between items of information (Linked Data?)

Most of the data on the web comes from either documents or databases. Database data already has a lot of semantics associated with it (in the form of meta data) and links (in the form of foreign key relationships or symbolic keys). Somewhere along the journey from this more structured form to the display format (html), some of the valuable information is lost. How difficult is it to add this information back (in the form of semantic markup) to the display data?
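My hunch is that it is not very difficult. Here is a minimal Python sketch, under my own assumptions, of rendering one database row as HTML that keeps its meaning through RDFa attributes (`vocab`, `typeof`, `resource`, `property`). The vocabulary URL, the column names, and the function `row_to_rdfa` are made up for illustration; foreign keys are passed in as links so they come out as `href` values instead of plain text.

```python
from html import escape


def row_to_rdfa(row, type_name, vocab, subject_iri, links=None):
    """Render one row as an HTML fragment annotated with RDFa attributes.

    row:         dict of column name -> literal value
    type_name:   the kind of thing the row describes (hypothetical term)
    vocab:       base URL of the vocabulary (hypothetical)
    subject_iri: identifier for the row, e.g. built from its primary key
    links:       dict of column name -> IRI of the related row (foreign keys)
    """
    lines = [
        f'<div vocab="{escape(vocab)}" typeof="{escape(type_name)}" '
        f'resource="{escape(subject_iri)}">'
    ]
    for column, value in row.items():
        # Literal columns become text wrapped in a property-bearing span.
        lines.append(
            f'  <span property="{escape(column)}">{escape(str(value))}</span>'
        )
    for column, target in (links or {}).items():
        # Foreign keys become links, so the relationship survives in the page.
        lines.append(
            f'  <a property="{escape(column)}" href="{escape(target)}">'
            f'{escape(column)}</a>'
        )
    lines.append("</div>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(row_to_rdfa(
        {"name": "Ada & Co", "city": "Chennai"},
        type_name="Company",
        vocab="http://example.org/vocab#",
        subject_iri="http://example.org/company/1",
        links={"founder": "http://example.org/person/7"},
    ))
```

The page still looks the same to a person, but a program reading it can recover the column names and the relationships that the database already knew about.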

Documents will evolve to have more structure too. With ODF, OOXML and other similar formats, you are more likely to see better structured documents. As web pages are widgetized, you will see richer types of data on the web pages.

Structured data and richer data and more accessible data will take us several steps forward. Once we have data that makes more sense to people and programs, we will start noticing several useful applications.

When You Solve Your Own Problem…

When you solve your own problem, you create a tool that you’re passionate about. And passion is key. Passion means you’ll truly use it and care about it. And that’s the best way to get others to feel passionate about it too.

This and other great ideas are in a book called Getting Real. It is a book about smaller, faster, better ways to build web applications, with some great ideas about building software. Here is a list of my favorite ones.

Build Less
Fewer features mean you can get the product into the hands of customers earlier. You get to hear what they really like and what they would like to have. This can be invaluable.

Fund Yourself
You can focus on doing something good instead of spending time looking for money. Meebo did this and so did a lot of others. In fact, this is the norm in many Web 2.0 startups.

It Shouldn’t be a Chore
I love this one. If the app does not get you excited, it is not worth building. It should be fun to build. You need to enjoy every bit of the process. And if you built it for your own use, make sure that the experience of using it is fun, as well.

Seek and Celebrate Small Victories
Build incrementally. With each increment, make it more useful.

Check out the following advice.

Hire Less and Hire Later
You Can’t Fake Enthusiasm
The Blank Slate
Context Over Consistency

Open Doors
Ride the Blog Wave
Promote Through Education

Feel The Pain