Analyzing Unstructured Content

I am reading the UIMA (Unstructured Information Management Architecture) overview document. It is a fascinating description of an architecture for analyzing unstructured documents. In analyzing unstructured content, UIMA-based applications make use of a variety of analysis technologies, including:

• Statistical and rule-based Natural Language Processing (NLP)
• Information Retrieval (IR)
• Machine learning
• Ontologies
• Automated reasoning
• Knowledge sources (e.g., CYC, WordNet, FrameNet)

As the amount of unstructured information increases, it becomes important to make sense of it. The type of analysis is normally domain- and application-specific. You can take a collection of related documents and come up with various analysis views, using different analysis engines depending on the type of analysis.

Let us take a current topic: Apple vs. Samsung. If we gather a set of news items from the time the case started, we can analyze them in different ways:

• An analysis of innovations, including levels of innovation and what counts as an innovation and what does not
• An analysis of patents that may be useful to other vendors of smartphones and tablets
• An analysis of human interest stories from both companies (and their styles of product management)
• An analysis of product development processes

Same documents, different views based on your interests. This is a fascinating area.
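To make the idea of "same documents, different views" concrete, here is a minimal sketch in Python. It is not UIMA code: the keyword lists, the `analyze` function, and the sample news items are all made up for illustration, and real analysis engines would do far more than keyword matching.

```python
from collections import defaultdict

# Hypothetical keyword lists, one per analysis view (not taken from UIMA).
VIEW_KEYWORDS = {
    "patents": {"patent", "infringement", "license"},
    "innovation": {"innovation", "design", "invention"},
    "process": {"development", "prototype", "roadmap"},
}

def analyze(documents):
    """Return, for each view, the indices of documents mentioning its keywords."""
    views = defaultdict(list)
    for index, text in enumerate(documents):
        # Lowercase each word and strip surrounding punctuation.
        words = {w.strip(".,;:!?\"'()").lower() for w in text.split()}
        for view, keywords in VIEW_KEYWORDS.items():
            if words & keywords:
                views[view].append(index)
    return dict(views)

# Made-up sample items standing in for a collection of news stories.
news = [
    "The jury found patent infringement on several design claims.",
    "Both companies follow a tight product development roadmap.",
]
result = analyze(news)
print(result)
```

The first item lands in both the patents and the innovation view, while the second lands only in the process view, so one collection yields several overlapping views.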

The UIMA document provides an overview of how to develop simple and aggregate analysis engines. I found this document gripping (not a term you normally associate with technical documents). It not only explains the conceptual thinking behind UIMA, but also triggers several ideas for further reading.
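As I understand it, an aggregate analysis engine is composed of other analysis engines that each add their own annotations to a shared document. The sketch below illustrates that composition idea in plain Python. It does not use the UIMA API (which is Java-based); the dictionary layout and the names `company_annotator`, `year_annotator`, and `aggregate` are assumptions invented for this example.

```python
import re

def company_annotator(doc):
    """Simple engine: record character spans of two company names."""
    for match in re.finditer(r"\b(Apple|Samsung)\b", doc["text"]):
        doc["annotations"].append(("Company", match.start(), match.end()))
    return doc

def year_annotator(doc):
    """Simple engine: record character spans of four-digit years."""
    for match in re.finditer(r"\b(19|20)\d{2}\b", doc["text"]):
        doc["annotations"].append(("Year", match.start(), match.end()))
    return doc

def aggregate(*engines):
    """Aggregate engine: run the given engines in order on the same document."""
    def run(doc):
        for engine in engines:
            doc = engine(doc)
        return doc
    return run

pipeline = aggregate(company_annotator, year_annotator)
doc = pipeline({"text": "Apple sued Samsung in 2011.", "annotations": []})
print(doc["annotations"])
```

Because every engine reads and writes the same document structure, a new simple engine can be added to the pipeline without changing the existing ones.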

One thought on “Analyzing Unstructured Content”

Cohan says:

September 6, 2012 at 5:59 pm

Dorai, I might not have shared this with you yet, but we have made a lot of progress in an area of unstructured information analysis that is not mainstream yet. I thought you might like a demo we have at http://aiaioo.wordpress.com/2012/07/30/a-simple-tracker-to-follow-what-people-are-saying-about-the-jan-lokpal-protest/ and there are more demos on our website aiaioo.com.
