How large a vocabulary do you need to support for a small but useful task-specific or domain-specific NLP app?
Let us start with a simple hypothesis: the vocabulary is the sum of commonly spoken English terms and a set of task- or domain-specific terms.
You can take these terms, build a vector space from them, and build a chatbot on that vector space from scratch.
The task-specific terms may come from the application: customer support, marketing, job/skill analysis, etc. I don’t know whether this idea will work, but it is inspired by a couple of similar efforts.
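To make the idea concrete, here is a minimal sketch in Python of what "general terms plus domain terms, then a vector space, then a chatbot" could look like. Everything in it is an assumption for illustration: the two tiny term sets, the FAQ entries, the names `vectorize`, `cosine` and `reply`, and the 0.3 threshold are all made up, and plain term counts with cosine similarity are just one simple way to build such a vector space.

```python
import math
from collections import Counter

# Hypothetical, tiny stand-ins for the two term sources discussed above.
GENERAL_TERMS = {"i", "my", "the", "is", "how", "do", "cannot",
                 "a", "to", "what", "hello", "need", "not"}
DOMAIN_TERMS = {"order", "refund", "invoice", "password",
                "reset", "shipping", "track"}

# The supported vocabulary is the union of both sets; each term is one axis.
VOCAB = sorted(GENERAL_TERMS | DOMAIN_TERMS)
INDEX = {term: i for i, term in enumerate(VOCAB)}


def vectorize(text):
    """Map text to a term-count vector over VOCAB; unknown words are dropped."""
    counts = Counter(w for w in text.lower().split() if w in INDEX)
    vec = [0.0] * len(VOCAB)
    for term, n in counts.items():
        vec[INDEX[term]] = float(n)
    return vec


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either one is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical question/answer pairs for a customer-support bot.
FAQ = [
    ("how do i reset my password", "Use the reset link on the sign-in page."),
    ("how do i track my order", "Open your order history to see shipping status."),
    ("i need a refund", "Refunds can be requested from the order page."),
]
FAQ_VECTORS = [(vectorize(question), answer) for question, answer in FAQ]

FALLBACK = "Sorry, I did not understand that."


def reply(message, threshold=0.3):
    """Return the answer whose question is closest to the message."""
    query = vectorize(message)
    score, answer = max(
        ((cosine(query, vec), ans) for vec, ans in FAQ_VECTORS),
        key=lambda pair: pair[0],
    )
    return answer if score >= threshold else FALLBACK
```

With these toy sets, `reply("where is my order")` picks the order-tracking answer even though "where" is outside the vocabulary, and a message made only of unknown words falls back to the default reply. A real app would need far larger term lists and probably a better weighting than raw counts.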
Just to get a rough idea of these numbers, I asked ChatGPT two questions, and the answers suggest the numbers are in the right ballpark. I already know the size of spoken English from other sources.
Question: What is the size of commonly used Vocabulary terms in English?
ChatGPT: The size of the commonly used vocabulary terms in English can vary depending on the definition of “commonly used” and the specific source or study that is referenced.
According to some estimates, the average adult native English speaker has a vocabulary of approximately 20,000 to 35,000 words. However, this number can be influenced by various factors such as education, profession, and social environment.
Another way to estimate the size of the English vocabulary is to consider the number of words in popular dictionaries. For example, the Oxford English Dictionary (OED) contains over 170,000 words in current use, while Merriam-Webster’s Collegiate Dictionary includes around 225,000 entries.
It’s worth noting that these figures may not represent the exact number of words that are commonly used in everyday speech or writing, as some words may be more specialized or obscure. Additionally, there are many regional and colloquial variations in the English language, which can affect the size and usage of different vocabulary terms.
Question: What is the size of the vocabulary of spoken English?
Estimating the size of the vocabulary of spoken English can be a challenging task, as there are many factors that can influence the number of words that are commonly used in spoken communication.
Some studies have attempted to estimate the vocabulary size of English speakers based on their performance on vocabulary tests. For example, one study found that native English speakers had an average vocabulary size of around 20,000 to 25,000 words, while non-native speakers had a vocabulary size of around 5,000 to 10,000 words.
However, it’s important to note that these estimates may not accurately reflect the size of the vocabulary used in everyday spoken communication. In spoken language, people often use a smaller set of words more frequently, while relying on nonverbal communication, intonation, and context to convey meaning.
Furthermore, the vocabulary used in spoken English can vary depending on factors such as regional dialects, social context, and the level of formality. For example, the vocabulary used in casual conversations with friends may be different from the vocabulary used in a professional or academic setting.
Overall, while it’s difficult to estimate the exact size of the vocabulary of spoken English, it’s likely that it is smaller than the total number of words in the English language and that it can vary depending on various factors.
Finally, I asked: does every chatbot require an LLM or a generative pre-trained model?