Live Experiment: Training LLMs on your Knowledge Base
How to Structure and Organize Information for Optimal Retrieval-Augmented Generation
Recently, I have been experimenting with knowledge bases, and one thing has become clear: how information is structured is vital.
Structure carries context and overall meaning. It also makes it easier for LLMs to answer questions accurately and reduces hallucinations.
So we created a few experiments that you can play with and test yourself.
But first, why does information structure matter?
Imagine a library. One room has books strewn everywhere, titles mixed up, and no discernible order. In another room, the books are organized by subject, then by author, then by publication date. Which room would help you find the book you want more efficiently?
LLMs are similar. The way knowledge is presented and organized dictates not only their understanding but also their output.
Our recent experiments highlighted two crucial aspects:
1. Organization of Informational Taxonomies and Hierarchies: This considers elements like URL structures, folders, and how information is interrelated. By defining the proper context, you can highlight what's critical and organize it appropriately.
2. Organization Within a Document: Delving deeper, this looks at the composition of individual pieces of information, from structure and semantics to formatting and summaries.
Let's dive into each experiment and the results.
To make this more fun and interactive, I am providing the documents and URLs used to train the AI. Additionally, you can talk to the AI agents, Good Bot and Bad Bot, and experience the difference for yourself.
Informational Hierarchies: How Structures Imply Meaning
At its core, informational hierarchy is about context. Whether it's a URL on a website or the structure of folders within a system, hierarchies set the scene and help LLMs understand the importance and relevance of different data points and how they relate to each other.
Consider these URLs:
- ChatbotConferences.com/conferences/2019/nyc: This URL suggests there are multiple conferences across different cities. From this URL, we could even extrapolate that there are other types of events that are not conferences.
- ChatbotConferences.com/new-york-city: This URL just refers to the city of New York. We can infer that there is or was a conference there, but not much more. Overall, this is ambiguous.
- ChatbotConferences.com/nyc/2019: This URL indicates multiple NYC events but omits a broader context. For example, are there conferences in other cities?
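To make this concrete, here is a minimal sketch of how a URL's path can be parsed into retrieval metadata. This is not how any particular RAG framework works under the hood, and the field names are my own assumption, not a standard schema:

```python
from urllib.parse import urlparse

def url_to_metadata(url: str) -> dict:
    """Split a URL path into hierarchy levels that can be attached
    to a chunk as retrieval metadata (hypothetical schema)."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    meta = {"url": url, "depth": len(parts)}
    # e.g. /conferences/2019/nyc -> category / year / location
    if len(parts) == 3:
        meta.update(category=parts[0], year=parts[1], location=parts[2])
    return meta

print(url_to_metadata("https://chatbotconferences.com/conferences/2019/nyc"))
# {'url': ..., 'depth': 3, 'category': 'conferences', 'year': '2019', 'location': 'nyc'}
```

Once every chunk carries metadata like this, a retriever can filter or group by year and location instead of relying on the raw text alone.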
The Great Hierarchy Test
We set out on a quest to understand the weight and importance of hierarchies. We built two chatbots whose primary difference was the organization of their hierarchy:
Bot 1: Trained on multiple pages, each representing a distinct event and year with a well-defined hierarchical structure. Here are the URLs:
https://www.chatbotconference.com/conferences/2017/san-francisco
https://www.chatbotconference.com/conferences/2018/san-francisco
https://www.chatbotconference.com/conferences/2019/san-francisco
https://www.chatbotconference.com/conferences/2020/online-spring
https://www.chatbotconference.com/conferences/2020/online-fall
https://www.chatbotconference.com/conferences/2022/online-fall
https://www.chatbotconference.com/conferences/2022/online-spring
Bot 2: Trained on a single page containing all of the same information as above.
What were the results?
Overall, Bot 1 (Good Bot) answered more questions accurately than Bot 2 (Bad Bot).
Bad Bot struggled with even simple questions like 'How many conferences were there in 2019?' or 'How many conferences were there in San Francisco?' and got them wrong. Good Bot, on the other hand, could answer these questions without even looking inside the documents, simply based on the URLs!
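To see why, here is a toy illustration (not Good Bot's actual retrieval pipeline) of answering those counting questions from the URL hierarchy alone:

```python
from urllib.parse import urlparse

urls = [
    "https://www.chatbotconference.com/conferences/2017/san-francisco",
    "https://www.chatbotconference.com/conferences/2018/san-francisco",
    "https://www.chatbotconference.com/conferences/2019/san-francisco",
    "https://www.chatbotconference.com/conferences/2020/online-spring",
    "https://www.chatbotconference.com/conferences/2020/online-fall",
    "https://www.chatbotconference.com/conferences/2022/online-fall",
    "https://www.chatbotconference.com/conferences/2022/online-spring",
]

# Every path follows the pattern /conferences/<year>/<location>
events = []
for url in urls:
    _, year, location = [p for p in urlparse(url).path.split("/") if p]
    events.append((year, location))

print(sum(1 for year, _ in events if year == "2019"))         # conferences in 2019
print(sum(1 for _, loc in events if loc == "san-francisco"))  # conferences in San Francisco
```

The answers fall out of the hierarchy itself; Bad Bot's single flat page gave it no such structure to lean on.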
Are there any benefits to Bad Bot?
Surprisingly, yes. Because Bad Bot only searched a single document, it generally cost fewer tokens to ask questions and get responses, making it cheaper to run.
Now, let's look at ways to optimize a document so you get the most value from it.
Organizing Information Within a Document
In the same way that a group of documents gives rise to overall context and meaning, so does the structure within a single document.
Single documents are made up of words, sentences, paragraphs, headings, lists, and metadata, and these structures organize the context and overall meaning of a document.
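One practical consequence worth noting: when a document keeps its headings, a retrieval pipeline can chunk along them so every chunk carries its own context. A minimal sketch, assuming a Markdown source and a simplified splitting rule:

```python
import re

def split_by_headings(markdown: str) -> list[dict]:
    """Split a Markdown document into chunks, one per heading,
    so each chunk carries its heading as local context."""
    chunks, current = [], {"heading": None, "body": []}
    for line in markdown.splitlines():
        match = re.match(r"^(#{1,3})\s+(.+)", line)
        if match:
            if current["heading"] or current["body"]:
                chunks.append(current)
            current = {"heading": match.group(2), "body": []}
        else:
            current["body"].append(line)
    chunks.append(current)
    return chunks
```

Strip the headings away, and every chunk becomes an anonymous slab of text.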
But how much do these structures actually matter?
Do lists, tags, and headings have a special power? What about the semantics within a document? Does this have a major impact on performance?
To answer these questions, we created the document test! The test features two articles with identical content; the only difference between them is the use of tags, lists, definitions, and summary content. In other words, the meat and potatoes of the articles are the same.
The Document Test
We used an article about the benefits of drinking water to create our test. Here is a little information about the article:
- Title: 20 Health Benefits of Drinking Water: Physical, Psychological and Nutritional
- A list of 20 benefits
- Focuses on the physical, psychological, and nutritional benefits
- Defines what each of these categories means. For example, it defines the difference between the physical and nutritional benefits of drinking water.
- It also considers the opposite, dehydration, and situations when drinking water is unhealthy, like drinking too much water or drinking poor-quality water.
Now, this document was the 'Control', and we created a second document as the 'Experiment'. The Experiment document contained all of the same information, except that it was structured differently.
Below are the differences between the documents.
Document 1: Ideally Structured Article (Control)
- H1 title tag
- Semantic summary
- H2 & H3 subheadings
- Definitions
- Supplementary content

Document 2: Poorly Structured Article (Experiment)
- All tags converted to paragraph format
- Simple summary
- No definitions or questions
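If you want to replicate the Experiment document, here is a simplified sketch of how a structured Markdown article could be flattened into plain paragraphs. This is an assumption-laden approximation, not the exact transformation we used (which also simplified the summary and removed definitions):

```python
import re

def flatten_structure(markdown: str) -> str:
    """Strip heading markers and list bullets, then collapse
    everything into continuous paragraph text (approximating
    the 'poorly structured' Experiment document)."""
    lines = []
    for line in markdown.splitlines():
        line = re.sub(r"^#{1,6}\s+", "", line)    # drop H1-H6 markers
        line = re.sub(r"^\s*[-*]\s+", "", line)   # drop bullet-list markers
        line = re.sub(r"^\s*\d+\.\s+", "", line)  # drop numbered-list markers
        lines.append(line.strip())
    return " ".join(line for line in lines if line)
```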
Results
We then trained two bots on these articles and asked each bot the same simple questions. Below are the accuracy of each bot's answers and the cost in tokens.

What are the 20 benefits of drinking water?
- Good Bot: Accuracy 60% | Cost: 1265
- Bad Bot: Accuracy 80% | Cost: 1117

What are the benefits of drinking water?
- Good Bot: Accuracy 100% | Cost: 2771
- Bad Bot: Accuracy 80% | Cost: 2876

How many benefits are there of drinking water?
- Good Bot: Accuracy 100% | Cost: 2659
- Bad Bot: Accuracy 0% | Cost: 2550

What are the types of benefits of drinking water?
- Good Bot: Accuracy 100% | Cost: 2615
- Bad Bot: Accuracy 100% | Cost: 2580

How does drinking water help weight loss?
- Good Bot: Accuracy 100% | Cost: 2854
- Bad Bot: Accuracy 100% | Cost: 2639
Overall, the differences in accuracy speak for themselves: a well-structured document improves overall accuracy, which in turn decreases the cost per accurate answer.
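Here is the arithmetic behind 'cost per accurate answer', using the numbers above (average token cost divided by average accuracy):

```python
# (accuracy, token cost) per question, taken from the results above
good_bot = [(0.60, 1265), (1.00, 2771), (1.00, 2659), (1.00, 2615), (1.00, 2854)]
bad_bot  = [(0.80, 1117), (0.80, 2876), (0.00, 2550), (1.00, 2580), (1.00, 2639)]

def cost_per_accurate_answer(results):
    avg_accuracy = sum(acc for acc, _ in results) / len(results)
    avg_cost = sum(cost for _, cost in results) / len(results)
    return avg_cost / avg_accuracy

print(round(cost_per_accurate_answer(good_bot)))  # ~2644 tokens per accurate answer
print(round(cost_per_accurate_answer(bad_bot)))   # ~3267 tokens per accurate answer
```

So even though Bad Bot was often cheaper per question, Good Bot came out roughly 19% cheaper per accurate answer.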
Furthermore, there is a qualitative difference. For example, both bots answered the last question, "How does drinking water help weight loss?", correctly; however, Good Bot provided a much more detailed and clear answer. The quality and experience were simply better.
Now, try it yourself: play around with each bot below and test them for yourself.
What’s Next?
I hope you enjoyed this piece and took a lot away from it. Next, I’ll share with you my complete guide on ‘How to Organize a Knowledge Base for RAG’. Lastly, if you have any questions, let me know in the comments.
Join the Journey
Join me as I navigate these waters.
Over the coming weeks, I will be sharing more content on knowledge bases and RAG, and we will look at early results from more of our experiments!