Does every chatbot require an LLM or a generative pre-trained model?

What vocabulary size does a small but useful task-specific or domain-specific NLP app need to support?

Let us start with a simple hypothesis: the vocabulary can be the sum of spoken English terms and a set of task- or domain-specific terms.
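In practice the "sum" is an upper bound. The two lists overlap (a word like "order" is both everyday English and customer-support jargon), so the combined vocabulary is the union of the two sets, and its size is at most the sum. Here is a minimal Python sketch. The word lists are made up and stand in for a real spoken-English frequency list and a real domain glossary:

```python
# Hypothetical, tiny word lists. A real app would load thousands of terms
# from a spoken-English frequency list and from its own domain documents.
GENERAL_TERMS = {"i", "my", "order", "is", "late", "please", "help", "the", "account"}
DOMAIN_TERMS = {"refund", "invoice", "order", "tracking", "account"}  # e.g. customer support


def build_vocabulary(general: set[str], domain: set[str]) -> list[str]:
    """Return the combined vocabulary as a sorted list of unique terms."""
    return sorted(general | domain)


vocab = build_vocabulary(GENERAL_TERMS, DOMAIN_TERMS)
upper_bound = len(GENERAL_TERMS) + len(DOMAIN_TERMS)
print(len(vocab), upper_bound)  # 12 14: "order" and "account" are counted once
```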

You could take these terms, build a vector space from them, and then build a chatbot on top of that vector space from scratch.
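One way to read "build a vector space from scratch" is to give every vocabulary term a vector of co-occurrence counts computed from the app's own text, and then compare terms with cosine similarity. The sketch below assumes that interpretation, which is a count-based model rather than a trained embedding. It also assumes a made-up four-sentence corpus and simple whitespace tokenization:

```python
import math


def cooccurrence_vectors(sentences, vocab, window=2):
    """Map each vocabulary term to a vector of co-occurrence counts.

    Dimension i of a term's vector counts how often vocab[i] appears within
    `window` positions of that term. Out-of-vocabulary tokens are dropped
    first, so the window is measured over in-vocabulary tokens only.
    """
    index = {term: i for i, term in enumerate(vocab)}
    vectors = {term: [0] * len(vocab) for term in vocab}
    for sentence in sentences:
        tokens = [t for t in sentence.lower().split() if t in index]
        for i, term in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[term][index[tokens[j]]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical customer-support vocabulary and corpus.
vocab = ["refund", "invoice", "order", "late", "tracking", "number", "request", "a"]
sentences = [
    "request a refund for a late order",
    "request a refund for the invoice",
    "send the tracking number for my order",
    "the order tracking number is missing",
]
vecs = cooccurrence_vectors(sentences, vocab)
print(round(cosine(vecs["tracking"], vecs["number"]), 2))  # 0.5
print(round(cosine(vecs["tracking"], vecs["refund"]), 2))  # 0.0
```

With a vocabulary of tens of thousands of terms, these count vectors become large and sparse. A real system would typically store them sparsely or reduce their dimension (for example, with SVD).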

The task-specific terms would come from the application itself: customer support, marketing, job/skill analysis, and so on. I don’t know whether this idea will work, but it is inspired by a couple of similar efforts.
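The application's own text is a natural source of these task-specific terms. One simple heuristic keeps words that recur in domain documents (for example, support tickets) but are absent from the general English list. This heuristic is an assumption for illustration and is not taken from the efforts mentioned above. The tickets and the general word list below are hypothetical:

```python
from collections import Counter


def domain_terms(domain_docs, general_vocab, min_count=2):
    """Return terms that occur at least `min_count` times in the domain
    documents but are missing from the general vocabulary, most frequent first."""
    counts = Counter(
        token.strip(".,?!").lower()
        for doc in domain_docs
        for token in doc.split()
    )
    candidates = [
        (term, n) for term, n in counts.items()
        if n >= min_count and term and term not in general_vocab
    ]
    # Sort by descending frequency, then alphabetically for a stable order.
    return [term for term, _ in sorted(candidates, key=lambda x: (-x[1], x[0]))]


general = {"my", "is", "the", "i", "need", "a", "for", "where", "please"}
tickets = [
    "Where is the invoice for my order?",
    "I need a refund for my order.",
    "Please resend the invoice.",
    "Refund please, the order arrived broken.",
]
print(domain_terms(tickets, general))  # ['order', 'invoice', 'refund']
```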

To get a rough idea of these numbers, I asked ChatGPT two questions, and the answers seem to be in the right ballpark. From other sources, I know the size of the spoken English vocabulary.

Question: What is the size of commonly used vocabulary terms in English?

According to some estimates, the average adult native English speaker has a vocabulary of approximately 20,000 to 35,000 words. However, this number can be influenced by various factors such as education, profession, and social environment.

Another way to estimate the size of the English vocabulary is to consider the number of words in popular dictionaries. For example, the Oxford English Dictionary (OED) contains over 170,000 words in current use, while Merriam-Webster’s Collegiate Dictionary includes around 225,000 entries.

It’s worth noting that these figures may not represent the exact number of words that are commonly used in everyday speech or writing, as some words may be more specialized or obscure. Additionally, there are many regional and colloquial variations in the English language, which can affect the size and usage of different vocabulary terms.

Question: What is the size of the vocabulary of spoken English?

Some studies have attempted to estimate the vocabulary size of English speakers based on their performance on vocabulary tests. For example, one study found that native English speakers had an average vocabulary size of around 20,000 to 25,000 words, while non-native speakers had a vocabulary size of around 5,000 to 10,000 words.

However, it’s important to note that these estimates may not accurately reflect the size of the vocabulary used in everyday spoken communication. In spoken language, people often use a smaller set of words more frequently, while relying on nonverbal communication, intonation, and context to convey meaning.

Furthermore, the vocabulary used in spoken English can vary depending on factors such as regional dialects, social context, and the level of formality. For example, the vocabulary used in casual conversations with friends may be different from the vocabulary used in a professional or academic setting.

Overall, while it’s difficult to estimate the exact size of the vocabulary of spoken English, it’s likely that it is smaller than the total number of words in the English language and that it can vary depending on various factors.
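You can check the observation that speech reuses a small set of words on any transcript by measuring token coverage: the share of all running words accounted for by the k most frequent distinct words. Here is a minimal sketch. The one-line transcript is invented and stands in for a real spoken-English corpus:

```python
from collections import Counter


def coverage_by_top_k(tokens, k):
    """Fraction of all tokens accounted for by the k most frequent word types."""
    counts = Counter(tokens)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(tokens)


transcript = (
    "yes i think so i mean it is fine it is just that i think "
    "the refund is late and i think it is not fine"
).split()
print(len(set(transcript)), len(transcript))  # 15 26
print([round(coverage_by_top_k(transcript, k), 2) for k in (1, 3, 5)])  # [0.15, 0.42, 0.62]
```

On a real corpus, you would increase k until coverage reaches a chosen target (95% is just an example). The resulting k is an empirical answer to the vocabulary-size question for that domain.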

Finally, I asked the question from the title.

Question: Does every chatbot require an LLM or a generative pre-trained model?

For example, there are rule-based chatbots that rely on a set of predefined rules to respond to user input. These chatbots are typically simpler and less flexible than LLM-based chatbots, but they can still be effective for certain use cases.
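A rule-based bot can be as small as an ordered list of patterns and canned responses. The sketch below uses Python's `re` module with hypothetical customer-support rules. It illustrates the idea and is not meant as a production design:

```python
import re

# Hypothetical rules for a customer-support bot: (pattern, response).
# Rules are checked in order and the first match wins.
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE),
     "Hello! How can I help you today?"),
    (re.compile(r"\brefund\b", re.IGNORECASE),
     "I can help with refunds. What is your order number?"),
    (re.compile(r"\b(track|tracking|where is)\b.*\border\b", re.IGNORECASE),
     "You can track your order from the Orders page."),
]
FALLBACK = "Sorry, I didn't understand that. Could you rephrase?"


def reply(message: str) -> str:
    """Return the response of the first rule whose pattern matches the message."""
    for pattern, response in RULES:
        if pattern.search(message):
            return response
    return FALLBACK


print(reply("Hello there"))                # Hello! How can I help you today?
print(reply("Where is my order?"))         # You can track your order from the Orders page.
print(reply("Can I change my password?"))  # Sorry, I didn't understand that. Could you rephrase?
```

Every capability has to be written as a rule. This makes such bots predictable but brittle: a message phrased in a way no pattern anticipates falls through to the fallback.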

Another type of chatbot is the retrieval-based chatbot, which uses a database of pre-existing responses and matches them to user input using keywords or other types of natural language processing (NLP) techniques. Retrieval-based chatbots are also typically simpler than LLM-based chatbots, but they can still provide useful and accurate responses in certain contexts.
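A retrieval-based bot never generates text. It only picks a stored response. Here is a minimal sketch that assumes a hypothetical three-entry FAQ. It matches messages with plain bag-of-words cosine similarity, which is one option among many (TF-IDF weighting or embeddings are common alternatives):

```python
import math
import re
from collections import Counter

# Hypothetical FAQ: each stored question maps to a pre-written answer.
FAQ = {
    "how do i reset my password": "Use the 'Forgot password' link on the sign-in page.",
    "how can i get a refund for my order": "Open the order and choose 'Request refund'.",
    "where can i track my order": "Tracking details are on the Orders page.",
}
STOPWORDS = {"how", "do", "i", "my", "can", "a", "for", "the", "where", "is"}
FALLBACK = "Sorry, I don't have an answer for that yet."


def bag_of_words(text):
    """Lowercase, extract letter runs, drop stopwords, and count the remaining terms."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity of two term-count bags (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a)  # a missing key in a Counter reads as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(message, threshold=0.2):
    """Return the answer whose stored question is most similar to the message."""
    query = bag_of_words(message)
    score, best_question = max((cosine(query, bag_of_words(q)), q) for q in FAQ)
    return FAQ[best_question] if score >= threshold else FALLBACK


print(retrieve("I forgot my password"))
print(retrieve("Can I get a refund?"))
print(retrieve("What are your opening hours?"))
```

The threshold (0.2 here, an arbitrary choice) decides when the bot admits it has no good match instead of returning the least-bad answer.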

However, if a chatbot is expected to generate human-like responses that are not limited to a predefined set of rules or responses, then an LLM or a generative pre-trained model may be necessary. These types of chatbots are typically more complex and require more advanced NLP techniques, but they can provide more personalized and engaging interactions with users.