One of the ideas we are discussing in a small group at KCG is to build a minimal prototype of a Science Teacher robot.
Here are some thoughts about the first iteration of the prototype:
- We will restrict the domain of discourse to Science.
- A student can ask simple questions like “Who is…”, “What is…”, “How does X work?”
- The application will take voice input and may support English and one local language (initially Tamil)
- We will use Wikipedia as the knowledge base
- We will first build it as a tablet application, but may later move to other platforms that support touch
- The output may be voice, images, or video
- If students are comfortable and it works reasonably well, we may put it inside a more friendly humanoid robot
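The restricted question forms listed above could be recognized with a few patterns. Here is a minimal sketch in Python, assuming only the three forms mentioned ("Who is", "What is", "How does X work"). The names `CanonicalQuestion` and `canonicalize`, and the kind labels, are made up for illustration and are not part of any existing design.

```python
import re
from typing import NamedTuple, Optional


class CanonicalQuestion(NamedTuple):
    """Hypothetical canonical form: a question kind plus a topic."""
    kind: str   # "person", "definition", or "mechanism"
    topic: str  # lower-cased subject of the question


# Assumed patterns for the three simple question forms (illustrative only).
_PATTERNS = [
    ("person", re.compile(r"^who\s+(?:is|was)\s+(.+)$", re.IGNORECASE)),
    ("definition",
     re.compile(r"^what\s+(?:is|are)\s+(?:an?\s+|the\s+)?(.+)$", re.IGNORECASE)),
    ("mechanism",
     re.compile(r"^how\s+(?:does|do)\s+(?:an?\s+|the\s+)?(.+?)\s+work$",
                re.IGNORECASE)),
]


def canonicalize(question: str) -> Optional[CanonicalQuestion]:
    """Map a simple question to a canonical form, or None if unsupported."""
    # Drop trailing punctuation and collapse repeated whitespace.
    text = question.strip().rstrip("?.!").strip()
    text = re.sub(r"\s+", " ", text)
    for kind, pattern in _PATTERNS:
        match = pattern.match(text)
        if match:
            return CanonicalQuestion(kind, match.group(1).lower())
    return None


if __name__ == "__main__":
    for q in ["Who is Isaac Newton?", "What is a magnet?",
              "How does a rocket work?", "Why is the sky blue?"]:
        print(q, "->", canonicalize(q))
```

Anything outside the three forms returns `None`, which would be the point where the application asks the student to rephrase.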
Something like this may already exist. Here is what I already know:
- With Apple's Siri and competing products on other platforms, we may not need this effort. But I will wait until I see APIs for these natural-language agents.
- We can move from the simple language to full natural language later. For now we want to restrict the language to simple constructs, to reduce the front-end work of parsing and understanding the question.
- At a very simple level, it is a speech recognizer, a question translator (into some canonical form), a lookup in a specially constructed knowledge base derived from Wikipedia, a speech synthesizer, and a display (if the output is other than text).
- Plan to look at Cyc and a couple of other similar technologies
- Plan to look at UIMA
- Plan to look at sources other than Wikipedia – YouTube, Vimeo, HowStuffWorks and some Science books may be among the next set of sources.
- We need to build a higher-level semantic layer on top of the sources. This may be simple Linked Data (as DBpedia does on top of Wikipedia)
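To make the knowledge base lookup and the semantic layer a little more concrete, here is a rough Python sketch of a tiny in-memory store of subject/predicate/object triples, loosely in the spirit of Linked Data. Everything in it is an assumption for illustration: the class `TripleStore`, the predicate names, the mapping from question kind to predicate, and the two hand-written sample facts (which are placeholders, not data extracted from Wikipedia or DBpedia).

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class TripleStore:
    """Toy store of (subject, predicate, object) facts, indexed for lookup."""

    def __init__(self) -> None:
        self._index: Dict[Tuple[str, str], List[str]] = defaultdict(list)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        # Subjects are matched case-insensitively.
        self._index[(subject.lower(), predicate)].append(obj)

    def objects(self, subject: str, predicate: str) -> List[str]:
        # .get avoids creating empty entries for unknown keys.
        return list(self._index.get((subject.lower(), predicate), []))


# Hypothetical mapping from a canonical question kind to a predicate.
PREDICATE_FOR_KIND = {
    "person": "description",
    "definition": "description",
    "mechanism": "explanation",
}

UNKNOWN = "Sorry, I do not know about that yet."


def answer(store: TripleStore, kind: str, topic: str) -> str:
    """Return the first stored fact for the question, or a fallback message."""
    predicate = PREDICATE_FOR_KIND.get(kind)
    if predicate is None:
        return UNKNOWN
    facts = store.objects(topic, predicate)
    return facts[0] if facts else UNKNOWN


def build_demo_store() -> TripleStore:
    """Placeholder facts written by hand for this sketch."""
    store = TripleStore()
    store.add("magnet", "description",
              "A magnet is an object that produces a magnetic field.")
    store.add("isaac newton", "description",
              "Isaac Newton was an English physicist and mathematician.")
    return store


if __name__ == "__main__":
    demo = build_demo_store()
    print(answer(demo, "definition", "Magnet"))
    print(answer(demo, "mechanism", "rocket"))
```

In a real prototype the text answer would go to the speech synthesizer or display, and the store would be filled from curated sources rather than by hand.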
Thoughts, questions, pointers and suggestions are welcome.
Great initiative. I would like to be part of it. I have experience in large-scale data processing from doing the first annotation of the human genome as Chief Architect of DoubleTwist Inc.
1. Wikipedia data needs to be curated for accuracy for such a project, which has the potential to affect millions.
2. Do you really need NLP aka voice processing?
Thanks. We do plan to take only a small subset of data from Wikipedia for testing. I agree that we need to curate it. We may look at Wikibooks, which may be better for this.
NLP will let us support a wider range of question formats. But since we are currently dealing with 7th graders (as test subjects), we will teach them some simple syntax for asking questions.
Our first goal is to build something viable and test it to see the adoption rate and the difficulties. We can work in parallel on other technologies.
Let us know how we can work together. We are doing this in Chennai. Proximity is not a requirement but would be something nice to have.