Reducing LLM Hallucinations: A Deep Dive into Reflection LLM and Vector Stores
Raymond Bernard
Senior Engineer and Solutions Architect specializing in NAS, SAN, and NVIDIA BasePOD H100 systems. Passionate about Data Science, AI, and Open Source. LLM enthusiast driving innovative solutions in cloud and AI technologies.
September 8, 2024
Ray Bernard
ray.bernard@outlook.com
Video demo
Code
Large Language Models (LLMs) have become invaluable tools across various domains, from content creation to coding assistance. However, they are not without flaws and often produce hallucinations—outputs that seem plausible but are factually incorrect. This blog explores a practical solution for reducing LLM hallucinations, focusing on a new model called the Reflection LLM. This model employs a unique approach, where it reflects internally before generating responses. We’ll dive into examples that illustrate how this mechanism works, support the open-source development of such models, and discuss how community contributions can help refine this approach.
Our goal is to provide constructive criticism to the community, aiming to improve these models by enhancing the reflection mechanism. While the Reflection LLM moves us closer to self-correcting models that deliver more accurate results, it still has limitations. In this blog, we’ll demonstrate where these limitations lie using the new Reflection LLM based on LLaMa3.
You can explore the model here:
- Reflection LLM - Ollama
- Reflection Llama-3.1 70B - Huggingface
“Reflection Llama-3.1 70B is (currently) the world’s top open-source LLM, trained with a new technique called Reflection-Tuning that teaches an LLM to detect mistakes in its reasoning and correct course. The model was trained on synthetic data generated by Glaive. If you’re training a model, Glaive is incredible — use them.” — Huggingface
The Problem with LLM Hallucinations
A common misconception is that fine-tuning alone can resolve hallucinations. Fine-tuning often tweaks a model’s style and how it responds, but it doesn’t necessarily add new factual information. Fine-tuning is useful for improving coherence or tone, but hallucinations arise when the model is confident about something it knows nothing about.
Even with reflection, the model cannot generate new knowledge on its own to arrive at the correct answer.
LLMs require new sources of information to provide factual accuracy. This can be done using vector stores—databases designed to store and retrieve contextually relevant data. The challenge, however, is that even when multi-shot examples are supplied from your vector database, the Reflection LLM is still prone to hallucinations and may reflect its way away from the correct answer.
Here’s how we can do it:
- Create a Vector Store with Correct QA Examples: Include examples like the Monty Hall problem and its various forms, including the correct answer to specific cases (where the host does not reveal any information).
- Feed the Correct Context to the LLM: The model will retrieve the relevant example from the vector store and integrate that knowledge, reducing hallucinations.
Our example will demonstrate how to achieve this.
Even Reflection LLMs Can Get It Wrong: The Monty Hall Problem Example
While the Reflection LLM represents progress, it’s important to recognize that even state-of-the-art models can still falter in certain situations. A common example where LLMs frequently misinterpret the problem is a variation of the Monty Hall scenario:
Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a gold bar; behind the others, rotten vegetables. You pick a door, say No. 1, and the host asks, “Do you want to pick door No. 2 instead?” Is it to your advantage to switch your choice?
Even models equipped with reflection mechanisms often answer this incorrectly, treating it as the classic Monty Hall problem, where switching doors is statistically beneficial.
In the classic version, the host reveals one of the losing doors, providing new information that shifts the odds of winning from 1/3 to 2/3 if you switch doors. However, in this variation, no additional information is revealed, so the probability remains 1/3, and switching offers no advantage.
Despite this, LLMs—including Reflection LLMs—tend to assume that switching is always advantageous, hallucinating the presence of extra information that isn’t there.
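To make the probability claim concrete, here is a quick Monte Carlo sketch (not part of the article's main code; the function names and trial count are purely illustrative) that simulates both the classic game and this variation:

import random

def classic_monty(trials=100_000):
    # Classic game: the host opens a losing, unchosen door, and you always switch
    wins = 0
    for _ in range(trials):
        prize = random.randrange(3)
        choice = random.randrange(3)
        opened = next(d for d in range(3) if d != prize and d != choice)
        switched = next(d for d in range(3) if d != choice and d != opened)
        wins += (switched == prize)
    return wins / trials  # converges to ~2/3

def variation_no_reveal(trials=100_000):
    # This article's variation: nothing is revealed, the host merely offers door No. 2
    wins = 0
    for _ in range(trials):
        prize = random.randrange(3)
        wins += (prize == 1)  # you switch from door 0 to door 1
    return wins / trials  # converges to ~1/3, the same as staying with door 0

print(f"Classic, always switching:   {classic_monty():.3f}")
print(f"Variation, always switching: {variation_no_reveal():.3f}")

Running this prints roughly 0.667 for the classic game and 0.333 for the variation, matching the analysis above: without a reveal, switching offers no advantage.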
Why Reflection Alone Is Not Enough
We will demonstrate the limitations of reflection mechanisms. While the Reflection LLM checks its internal understanding of the problem, it doesn't gain new factual information unless it can access external knowledge, such as a vector database.
Even though the model reflects on the problem, it still hallucinates because it over-relies on the pattern it learned from the classic Monty Hall problem. In this instance, the reflection process is insufficient for the model to recognize that the host doesn’t provide any new information in this version of the problem.
To correct this, we need to go a step further.
Let’s Play with Some Python Code to Illustrate Our Premise.
Baseline Model with No External Information
import ollama
import chromadb

client = chromadb.Client()
convo = []

# Replace this message_history with the one-shot and multi-shot examples shown later
message_history = [
    {'id': 1, 'prompt': 'What is my name?', 'response': 'Your name is Ray Bernard.'},
    {'id': 2, 'prompt': 'Ray Bernard owns two cats?', 'response': 'Lucy and Penny'},
    {'id': 3, 'prompt': 'What is Ray Bernard’s astrological sign?', 'response': 'Virgo'}
]

# Vector database creation and embedding retrieval functions
def create_vector_db(conversations):
    vector_db_name = 'conversations'
    try:
        client.delete_collection(name=vector_db_name)
    except ValueError:
        pass  # Collection does not exist yet
    vector_db = client.create_collection(name=vector_db_name)
    for c in conversations:
        serialized_convo = f'prompt:{c["prompt"]} response:{c["response"]}'
        response = ollama.embeddings(model='nomic-embed-text', prompt=serialized_convo)
        embedding = response['embedding']
        vector_db.add(ids=[str(c['id'])], embeddings=[embedding], documents=[serialized_convo])

def retrieve_embedding(prompt):
    response = ollama.embeddings(model='nomic-embed-text', prompt=prompt)
    prompt_embedding = response['embedding']
    vector_db = client.get_collection(name='conversations')
    results = vector_db.query(query_embeddings=[prompt_embedding], n_results=1)
    return results['documents'][0][0]

# Using Llama 3 to generate responses without vector store context
def stream_response(prompt):
    convo.append({'role': 'user', 'content': prompt})
    response = ''
    # Comment out this line to disable llama3
    stream = ollama.chat(model='llama3', messages=convo, stream=True)
    # Uncomment the line below to use the Reflection LLM instead
    # stream = ollama.chat(model='reflection', messages=convo, stream=True)
    for chunk in stream:
        content = chunk['message']['content']
        response += content
        print(content, end='', flush=True)
    convo.append({'role': 'assistant', 'content': response})
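The listing above defines the helpers but never calls them, so here is a minimal driver sketch (my own, not from the original post) that builds the vector store and asks the Monty Hall variation. It assumes a local Ollama server with the llama3 and nomic-embed-text models already pulled:

if __name__ == '__main__':
    # Build the vector store from the example history (the baseline prompt below does not use it)
    create_vector_db(message_history)

    monty_hall_variation = (
        "Suppose you're on a game show, and you're given the choice of three doors: "
        "Behind one door is a gold bar; behind the others, rotten vegetables. "
        "You pick a door, say No. 1, and the host asks, 'Do you want to pick door No. 2 "
        "instead?' Is it to your advantage to switch your choice?"
    )

    # Baseline: the model answers from its training data alone
    stream_response(monty_hall_variation)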
Observation:
Without additional information, the model confidently hallucinates: it invents the user's name and the other personal facts that, in reality, exist only in our vector store. It also fails to answer the variation of the Monty Hall problem correctly.
Implementing One-Shot Learning with a Vector Store
In this example, we introduce one-shot learning by querying the vector store to retrieve relevant information. However, despite being provided with some context, the model can still hallucinate or overestimate its knowledge. To test this, you can replace the message_history in the code and run it again.
message_history = [
    {'id': 1,
     'prompt': """Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a gold bar; behind the others, rotten vegetables. You pick a door, say No. 1, and the host asks you “Do you want to pick door No. 2 instead?” Is it to your advantage to switch your choice?""",
     'response': """It is not an advantage to switch. It makes no difference if I switch or not because no additional material information has been provided since the initial choice."""},
    {'id': 2,
     'prompt': 'Ray Bernard owns two cats?',
     'response': 'Lucy and Penny'},
    {'id': 3,
     'prompt': 'What is Ray Bernard’s astrological sign?',
     'response': 'Virgo'}
]
Observation: The model gets the answer right with just a one-shot example. One-shot learning means the model only needs a single example to understand and correctly respond to the problem. In this case, I'm using Llama 3, and it accurately solved the task with just that single example.
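The listing leaves the wiring between retrieval and generation implicit. One way to feed the retrieved example to the model is to prepend it to the prompt before calling stream_response; the rag_response helper below is my own naming, a sketch rather than the article's exact method:

def rag_response(prompt):
    # Pull the most similar stored QA pair and prepend it as context
    context = retrieve_embedding(prompt)
    augmented_prompt = (
        "Use the following prior QA pair as context if it is relevant:\n"
        f"{context}\n\nQuestion: {prompt}"
    )
    stream_response(augmented_prompt)

# Rebuild the store from the one-shot history, then ask the variation again
create_vector_db(message_history)
rag_response("Suppose you're on a game show, and you're given the choice of three doors: "
             "Behind one door is a gold bar; behind the others, rotten vegetables. "
             "You pick a door, say No. 1, and the host asks, 'Do you want to pick door No. 2 "
             "instead?' Is it to your advantage to switch your choice?")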
Multi-Shot Learning with Reflection LLM
With multi-shot learning, hallucinations are greatly reduced because the model can compare different context sources in the vector database before settling on an answer. Now, replace the message_history with the example below.
message_history = [
    {'id': 1,
     'prompt': """Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a gold bar; behind the others, rotten vegetables. You pick a door, say No. 1, and the host asks you “Do you want to pick door No. 2 instead?” Is it to your advantage to switch your choice?""",
     'response': """It is not an advantage to switch. It makes no difference if I switch or not because no additional material information has been provided since the initial choice."""},
    {'id': 2,
     'prompt': """Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a gold bar; behind the others, rotten vegetables. You pick a door, say No. 1, and the host asks you “Do you want to pick door No. 2 instead?” Is it to your advantage to switch your choice?""",
     'response': 'the host has not revealed any new information'},
    {'id': 3,
     'prompt': 'User: What is my name',
     'response': 'Ray Bernard'}
]
Observation: By leveraging a vector store with accurate QA examples in a multi-shot approach, the LLM can retrieve the specific context of the problem. In our Monty Hall variation, the model accurately understands that the host hasn’t revealed any new information, meaning switching doors offers no advantage. Use multi-shot only if the model gets it wrong, as this will help improve its accuracy.
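Note that retrieve_embedding only returns the single best match. To actually compare several stored examples in the multi-shot case, a top-k variant along these lines could be used (again, my own naming, and the choice of k is an assumption):

def retrieve_top_k(prompt, k=3):
    # Embed the prompt and return up to k of the most similar stored QA pairs
    response = ollama.embeddings(model='nomic-embed-text', prompt=prompt)
    vector_db = client.get_collection(name='conversations')
    results = vector_db.query(query_embeddings=[response['embedding']], n_results=k)
    return results['documents'][0]  # list of serialized 'prompt:... response:...' strings

# The retrieved examples can then be joined with newlines and prepended to the
# user prompt, exactly as in rag_response above, before calling stream_response.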
Reflection LLM Test with Multi-Shot Example
For our final example, let’s use the Reflection LLM along with a multi-shot message_history to see if we can achieve better results. In my tests, the Reflection LLM failed even though multi-shot examples were provided.
# The llama3 call is commented out here
# stream = ollama.chat(model='llama3', messages=convo, stream=True)
# Uncomment the line below to use the Reflection LLM
stream = ollama.chat(model='reflection', messages=convo, stream=True)
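As a small quality-of-life tweak (my suggestion, not part of the original code), the model choice can be kept in one variable instead of commenting lines in and out. This reuses the imports and convo list from the earlier listing and assumes both model tags are available in the local Ollama instance:

MODEL = 'reflection'  # switch back to 'llama3' to reproduce the earlier runs

def stream_response(prompt):
    convo.append({'role': 'user', 'content': prompt})
    response = ''
    stream = ollama.chat(model=MODEL, messages=convo, stream=True)
    for chunk in stream:
        content = chunk['message']['content']
        response += content
        print(content, end='', flush=True)
    convo.append({'role': 'assistant', 'content': response})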
Observation: The Reflection LLM may initially offer the correct answer, but once it engages in the reflection process, it can overanalyze the problem and fail to recognize its own mistakes. Instead of adjusting, it tends to revert to the patterns it was originally trained on. As a result, it ultimately arrives at the wrong conclusion.
Conclusion
The release of the Reflection LLM is an exciting development in the battle against LLM hallucinations and in improving overall response quality. However, reflection is not new and can be accomplished in many other ways; many agentic frameworks have it built in. Reflection does come at a price, though, in system performance and token consumption, and it sometimes goes off the rails, drifting off-topic. But overall, this is a promising approach and will improve with the advent of better models.
So how can Reflection LLM address this issue? The solution lies in prioritizing factual information from the vector store and only activating the reflection mechanism when no relevant context is available. This approach could be a valuable enhancement for the team behind Reflection LLM to consider in future updates, ensuring the model focuses on retrieving accurate information before reflecting.
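As a rough sketch of what that prioritization could look like (my own illustration, not a feature of the released model), one could gate reflection on retrieval distance, reusing the ChromaDB collection from the earlier listings; the threshold value is purely illustrative and would need tuning:

SIMILARITY_THRESHOLD = 0.35  # illustrative cutoff on embedding distance

def answer(prompt):
    # Retrieve the closest stored example and its distance from the prompt embedding
    response = ollama.embeddings(model='nomic-embed-text', prompt=prompt)
    vector_db = client.get_collection(name='conversations')
    results = vector_db.query(query_embeddings=[response['embedding']], n_results=1)
    distance = results['distances'][0][0]
    if distance <= SIMILARITY_THRESHOLD:
        # Relevant context exists: answer from the stored example, no reflection needed
        context = results['documents'][0][0]
        stream_response(f"Answer using this context:\n{context}\n\nQuestion: {prompt}")
    else:
        # Nothing relevant in the store: fall back to the model and its reflection mechanism
        stream_response(prompt)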
Overall, I am very grateful to the open-source community and would like to thank Matt Shumer and Sahil Chaudhary at Glaive.ai for releasing the Reflection LLM as open source. I believe the community will play a crucial role in refining these methods, and we encourage further experimentation and development in this area. As we move forward, it's clear that reducing hallucinations is about more than just fine-tuning or reflection—it's about giving models access to accurate, contextually relevant information.
— Ray Bernard
ray.bernard@outlook.com
References
- Williams, S., & Huckle, J. (2024). Easy Problems That LLMs Get Wrong. arXiv preprint arXiv:2405.19616. https://doi.org/10.48550/arXiv.2405.19616
- Ai Austin. (2024, July 11). Local UNLIMITED Memory Ai Agent | Ollama RAG Crash Course [Video]. YouTube. https://www.youtube.com/watch?v=5xPvsMX2q2M
#LLM #ReflectionLLM #Llama3_70B #AIHallucinations #VectorStore #MultiShotLearning #OneShotLearning #MontyHallProblem #OpenSourceAI #AITraining #AIDevelopment #ReducingHallucinations #FineTuning #GlaiveAI #Ollama