In this post, we introduce a promising method for reducing hallucinations in RAG (Retrieval-Augmented Generation) systems. RAG systems are extremely useful for applying LLMs to a specific business setting; however, they don’t reduce ‘hallucinations’ to zero.
In particular, LLMs in a RAG setting are vulnerable to a specific class of problem: they assume that the content retrieved by the RAG system is complete, and can make ungrounded assumptions as a result.
Here, we describe our work combining answer generation with a hallucination checker to reduce the risk of hallucinations without sacrificing too much efficiency.
Our approach runs multiple Actor-Critic iterations in which the critic points out hallucinations and the actor uses previous failures as negative examples. This allows us to run Fin more safely without an unreasonable latency cost.
Hallucinations
By now, everyone reading this blog will be familiar with the hallucinations that LLMs can be prone to. Since the earliest days of Fin, we have worked hard to reduce the risk of hallucinations, so as not to provide wrong information to users. We have made a lot of progress around the structure of our prompts and how we word specific instructions. Additionally, LLMs have become increasingly better at following these instructions. Still, there are rare cases where a hallucination can slip through the net, so we need to reduce the risk further, beyond a single LLM call.
Actor-Critic using a hallucination checker
As part of our work on reducing hallucinations, we developed an offline hallucination checker. A simple prompt (example in Appendix) detects hallucinations in an answer, given the question and the content used to generate it. This approach has proven to be quite effective at detecting ‘obvious’ hallucinations. By obvious, we mean hallucinations that are pure fabrications or completely tangential answers. For example, in the case of a password reset process, the LLM could mention receiving a text, even though it is clearly stipulated in the documentation that the user will receive an email.
The Actor-Critic approach, a pattern inspired by reinforcement learning, is a natural way to combine answer generation with the hallucination checker. It structures a chain-of-thought process that iterates several times between an actor phase and a critic phase. Answer generation plays the role of the ‘actor’, while the hallucination checker, the ‘critic’, looks for hallucinations and points them out to the actor, if any. The following diagram summarizes this:

To close the loop, the critic outputs the problematic sentences in the candidate answer along with its reasoning. These are injected back as instructions into the generation prompt (see Appendix). Effectively, the critic here plays the role of a bespoke negative-example generator.
The system then iterates between the actor and the critic, adding new negative examples each time, until the critic approves, or ‘clears’, the answer. If the model fails to clear the check, we escalate the question to a customer representative. This is a simple-to-implement yet powerful way to ensure the quality of the answer, and it could also be used to apply other types of guidance. However, we believe that removing hallucinations is its most important application so far.
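The loop can be sketched in a few lines of Python. This is a minimal illustration, not our production code; `generate_answer` and `check_hallucinations` are hypothetical stand-ins for the actor and critic LLM calls:

```python
MAX_ITERATIONS = 3  # beyond this, we escalate to a human

def answer_with_critic(query, sources, generate_answer, check_hallucinations):
    """Iterate between the actor (answer generation) and the critic
    (hallucination checker) until the critic clears the answer."""
    negative_examples = []  # failed answers and reasons, fed back to the actor
    for _ in range(MAX_ITERATIONS):
        answer = generate_answer(query, sources, negative_examples)
        report = check_hallucinations(query, sources, answer)
        if report["severity_level"] == "NONE":
            return {"status": "cleared", "answer": answer}
        # Inject the critic's findings as bespoke negative examples
        negative_examples.append({"previous_answer": answer,
                                  "failed_checks": report["problematic_sentences"]})
    # The actor never cleared the check: escalate to a customer representative
    return {"status": "escalate", "answer": None}
```

The actor sees an ever-growing list of its own past mistakes, which is what makes the negative examples bespoke to the conversation at hand.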
Offline Experiments
To assess the effectiveness of this approach, we need a dataset. Using a mixture of direct customer feedback and farming with the checker, we gathered a dataset of 148 examples of hallucinations on informational queries (i.e. queries that can be resolved entirely with the appropriate static information in a Knowledge base). We backtested them using the new Actor-Critic approach, with a maximum of 3 iterations before escalation.
The results are very promising: 75% of the time we managed to remove the hallucination. 19% of the cases still had to be escalated, but these were mostly due to a lack of informational content available for the query, so escalation was inevitable.
In the remaining 6% of cases, we observed that the actor managed to ‘evade’ the critic by reformulating the sentences instead of addressing the error, but this is still a significant reduction in an already rare event.
Regarding iterations, 81% of the cases cleared the hallucination checker after 1 iteration and 7.3% after 2. We also noticed that all cases requiring more than 2 iterations led to an escalation, so 2 iterations could be the sweet spot for this approach.
Examples
We show 3 illustrative examples of how this new system works. Note that the outputs in the following examples were modified and paraphrased to mask any sensitive information.
Clear Fabrications
Most of the cases in our dataset are gross fabrications that can be easily spotted and removed. The Actor-Critic system allows us to surgically replace them and generate answers that are better aligned with the knowledge base.
Issue Summary | First Act | Critique | Second Act |
How can I reset my forgotten password for my account? | If you’ve forgotten your password, here’s how to reset it: 1. Go to the login page and click on the ‘Forgot your password?’ link. 2. You’ll receive an SMS with a six-digit code. 3. Enter this code into the ‘6 digit reset code’ field on your screen. 4. Change your password and log back into your account. If you don’t receive the reset SMS or encounter an ‘invalid token’ message, try sending yourself a new reset link. If you need further assistance, you can reach out to us at helpdesk@mycompany.com. Did that answer your question? | Problem sentence 1: You’ll receive an SMS with a six-digit code. Reason: The sources mention that the six-digit code is sent via email, not SMS. Problem Sentence 2: If you don’t receive the reset SMS or encounter an ‘invalid token’ message, try sending yourself a new reset link. Reason: The sources do not mention an SMS for the reset code; they mention an email. The ‘invalid token’ message is correctly referenced, but the method of receiving the code is incorrect. | To reset your forgotten password: 1. Go to the login page and click ‘Forgot your password?’ 2. You’ll receive an email with a password reset request. 3. Enter the six-digit code from the email into the ‘6 digit reset code’ field. 4. Change your password and log back in. If you get an ‘invalid token’ message, request a new reset link. The reset email expires after a while for security reasons. If you don’t receive the email, your account might be closed. If you need help, email us at helpdesk@mycompany.com. Did that answer your question? |
From negative to absence of positive
The hallucination checker also does not let the actor make assumptions like “Unfortunately, we do not support integration with Product X.” unless this is clearly stated in the sources. The actor usually regenerates the safe answer “I do not have information on whether we support Product X”. This type of assumption is dangerous because it can be misleading, and it is a common type of error LLMs make in a RAG setting. Although the root issue here should ultimately be fixed with better content in the Knowledge base, you’re never going to anticipate all customer questions, so a system that admits not knowing is a much safer option.
Issue Summary | First Act | Critique | Second Act |
Is there a fee to use the connection with CompanyX? | There’s no additional fee to use the CompanyX integration with CompanyY. However, it’s important to note that the integration is only available on our Advanced plans. If you’re already on one of these plans, you can use the CompanyX integration at no extra cost. It’s designed to help you streamline your payroll process by syncing timesheets and reimbursable expenses from CompanyY to CompanyX automatically. Is that what you were looking for? | Problem sentence 1: There’s no additional fee to use the CompanyX integration with CompanyY. Reason: The sources do not provide information on whether there is an additional fee for the CompanyX integration. This statement is an assumption not supported by the provided sources. | The CompanyX integration is available to CompanyY customers who are on our Advance plans. To check your current plan, you can go to the Burger Icon > Accounts in your CompanyY account. If you have questions about your plan’s features or pricing, our support team is available to help. Did that answer your question? |
Limiting Tangential Responses
To provide a helpful answer, LLMs can go off script and make common-sense recommendations. Although this behaviour can be useful in some chatbot applications, it is undesirable for customers in industries like Fintech, and can generate problematic answers.
Issue Summary | First Act | Critique | Second Act |
How can I seek financial advice, from where can I get it, and would it be chargeable? | We understand you’re looking for financial advice, but it’s important to note that we can’t provide personalized financial advice or refer you to a specific financial advisor. However, we encourage you to seek independent legal, financial, taxation or other advice when considering whether an investment is appropriate for your objectives, financial situation or needs. Financial advice is typically chargeable, but fees can vary depending on the advisor and the complexity of your situation. To find a financial advisor, you might consider: 1. Searching for registered financial advisors in your area 2. Asking for recommendations from trusted friends or family 3. Contacting professional financial planning associations Remember, it’s important to verify the credentials and experience of any advisor you’re considering working with. Was that helpful? | Problematic sentence 1: Financial advice is typically chargeable, but fees can vary depending on the advisor and the complexity of your situation. Reason: The sources do not mention the cost of financial advice or the variability of fees. Problematic sentence 2: To find a financial advisor, you might consider: 1. Searching for registered financial advisors in your area 2. Asking for recommendations from trusted friends or family 3. Contacting professional financial planning associations Reason: The sources do not provide specific methods for finding a financial advisor. | We understand you’re looking for financial advice, but it’s important to note that we can’t provide personalized financial advice or refer you to a financial advisor. However, we strongly recommend seeking independent legal, financial, taxation or other advice when considering whether an investment is appropriate for your objectives, financial situation or needs. This will help ensure you make informed decisions about your investments based on your specific circumstances. Is that what you were looking for? |
Next Steps
We constantly iterate on Fin to enable more innovative architectures and approaches. We believe the Actor-Critic pattern can be extended to other types of quality checks, as well as to enforcing the guidance that customers provide to Fin.
Appendix
Hallucination Checker Prompt
Consider the following sources used to generate the answer:
{sources_text}
And the following conversation history:
<conversation_history>
{conversation_history}
</conversation_history>
And this customer query:
<query>
{query}
</query>
Given the following generated answer:
<answer>
{answer_text}
</answer>
Go through the answer line by line and evaluate each sentence against the provided sources, looking for contradictions, misinterpretations, conflation, inaccuracies, fabrications, made up suggestions, hallucinations and false inference.
Pay particular attention to any case where the generated answer is trying to bend the meaning of the sources in order to answer the question.
It is ok if the answer doesn't directly address the query, the most important thing is that for every answer sentence: any facts, suggestions, steps or solutions proposed in the answer are fully grounded in the information provided in the resources.
Severity level definition:
* "NONE" means no hallucinations
* "LOW" means super minor extrapolations or extremely reasonable assumptions
* "HIGH" means any statements not supported by the sources
Think hard about how severe the problems are (my job depends on it!) and answer with a VALID JSON output:
(When quoting or listing sentences, remove all single or double quotes to prevent JSON parsing errors)
{{
"general_thoughts": str, # Your general thoughts about how grounded the answer is in the sources
"problematic_sentences": [[str]], # List of lists, each sublist being 3 strs [sentence, reason, severity_level]: Quote any sentences you think are problematic and state why alongside a severity level of "HIGH", "LOW", or "NONE", empty list if none.
"severity_level": str # Overall hallucination severity, must be one of "HIGH", "LOW", or "NONE" (if one problematic sentence is "HIGH", this should be "HIGH")
}}
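Before the loop acts on the critic's verdict, the JSON output can be parsed and sanity-checked against the schema above. This is a minimal sketch under the assumption that the model returns valid JSON; `parse_checker_output` is a hypothetical helper, not our exact implementation:

```python
import json

ALLOWED_SEVERITIES = {"NONE", "LOW", "HIGH"}

def parse_checker_output(raw: str) -> dict:
    """Parse the critic's JSON verdict and enforce the prompt's invariants."""
    report = json.loads(raw)
    if report["severity_level"] not in ALLOWED_SEVERITIES:
        raise ValueError(f"bad overall severity: {report['severity_level']}")
    for sentence, reason, severity in report["problematic_sentences"]:
        if severity not in ALLOWED_SEVERITIES:
            raise ValueError(f"bad sentence severity: {severity}")
    # The prompt requires: any single HIGH sentence makes the overall level HIGH
    if any(s[2] == "HIGH" for s in report["problematic_sentences"]):
        if report["severity_level"] != "HIGH":
            raise ValueError("overall severity must be HIGH")
    return report
```

Validating the verdict defensively matters because a malformed critique would otherwise silently break the feedback loop.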
Injection of hallucination checker failures as negative examples into our generation prompt
[...]
Make sure not to repeat the mistakes committed by <previous_answer> to avoid <failed_checks>:
<failed_checks>
List of quality checks that the previously generated answer (in <previous_answer>) failed and the reason why.
Make sure that the generated answer follows the instructions from the failed checks to avoid failing them again.
</failed_checks>
<previous_answer>
This is a previous answer provided with this prompt that has failed in one or several quality checks.
Make sure to write the <answer> so that remarks in <failed_checks> are respected.
</previous_answer>
[...]
<previous_answer>{previous_answer}</previous_answer>
<failed_checks>
<ungrounded_sentence>
The following sentence is ungrounded:
<sentence>
{sentence1}
</sentence>
Here is the reasoning:
<reason>
{reason1}
</reason>
Do not repeat this mistake. This is important to avoid quality checks to fail again.
</ungrounded_sentence>
<ungrounded_sentence>
The following sentence is ungrounded:
<sentence>
{sentence2}
</sentence>
Here is the reasoning:
<reason>
{reason2}
</reason>
Do not repeat this mistake. This is important to avoid quality checks to fail again.
</ungrounded_sentence>
</failed_checks>
[...]
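To illustrate how the critic's findings are turned into the template above, here is a hedged sketch of a renderer; `render_failed_checks` is a hypothetical helper name, and the sentence/reason triples are assumed to come from the checker's `problematic_sentences` output:

```python
def render_failed_checks(previous_answer: str, problems: list) -> str:
    """Render [sentence, reason, severity] triples from the checker into the
    <previous_answer> / <failed_checks> blocks injected into the actor prompt."""
    blocks = []
    for sentence, reason, _severity in problems:
        blocks.append(
            "<ungrounded_sentence>\n"
            "The following sentence is ungrounded:\n"
            f"<sentence>\n{sentence}\n</sentence>\n"
            "Here is the reasoning:\n"
            f"<reason>\n{reason}\n</reason>\n"
            "Do not repeat this mistake. This is important to avoid quality checks to fail again.\n"
            "</ungrounded_sentence>"
        )
    return (f"<previous_answer>{previous_answer}</previous_answer>\n"
            "<failed_checks>\n" + "\n".join(blocks) + "\n</failed_checks>")
```

Each Actor-Critic iteration appends another rendered block, so the actor accumulates an increasingly specific list of mistakes to avoid.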