ACL 2026

Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments

Institute of Computing Technology, Chinese Academy of Sciences
University of Chinese Academy of Sciences

Abstract

Large Language Models (LLMs) are prone to factual hallucinations, undermining their reliability in real-world applications. Existing hallucination detectors mainly either extract micro-level intrinsic patterns for uncertainty quantification or elicit macro-level self-judgments through verbalized prompts. However, these methods address only a single facet of hallucination, focusing either on implicit neural uncertainty or on explicit symbolic reasoning; they treat these inherently coupled behaviors in isolation and fail to exploit their interdependence for a holistic view. In this paper, we propose LaaB (Logical Consistency-as-a-Bridge), a framework that bridges neural features and symbolic judgments for hallucination detection. LaaB introduces a "meta-judgment" process to map symbolic labels back into the feature space. By leveraging the inherent logical bridge, in which the response and meta-judgment labels are identical or opposite depending on the self-judgment's semantics, LaaB aligns and integrates the dual-view signals via mutual learning and enhances hallucination detection. Extensive experiments on 4 public datasets, across 4 LLMs, and against 8 baselines demonstrate the superiority of LaaB.
Motivation

Method

LaaB Framework
(a) Response Hallucination Modeling
Extracting intrinsic features (e.g., hidden states, logits, attention) from response generation to capture implicit neural uncertainty.
(b) Self-Judgment Hallucination Modeling
Introducing a meta-judgment process that extracts intrinsic features from the self-judgment ("Yes"/"No") to detect evaluative hallucinations.
(c) Logic-Constrained Mutual Learning
Logic Constraints

The logical dependency between the response and self-judgment factuality.

Bridging the dual views by enforcing logical consistency between the response and judgment predictions, exploiting their inherent dependency for robust joint optimization. At inference time, only the response detector Dr is needed, avoiding the secondary LLM inference overhead incurred by the self-judgment process.
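The same-or-opposite label constraint can be made concrete with a minimal sketch. Assume binary labels (1 = hallucinated, 0 = factual) and a squared-error surrogate for the consistency term; both are illustrative simplifications, not the paper's actual mutual-learning objective, and all names below are hypothetical:

```python
def linked_label(y_resp: int, judgment: str) -> int:
    """Derive the meta-judgment hallucination label from the response label.

    y_resp: 1 if the response is hallucinated, 0 if factual.
    judgment: the verbalized self-judgment ("Yes" = "the response is
    factual", "No" = "the response is hallucinated").

    If the judgment says "Yes", it is itself wrong (hallucinated) exactly
    when the response is hallucinated, so the two labels are the same;
    if it says "No", the labels are opposite.
    """
    return y_resp if judgment == "Yes" else 1 - y_resp


def consistency_loss(p_resp, p_judge, judgments) -> float:
    """Squared-error surrogate for the logical-consistency term.

    The judgment detector's hallucination probability p_judge should match
    p_resp when the self-judgment is "Yes", and 1 - p_resp when it is "No".
    """
    targets = [p if j == "Yes" else 1.0 - p for p, j in zip(p_resp, judgments)]
    return sum((q - t) ** 2 for q, t in zip(p_judge, targets)) / len(targets)
```

A zero loss here means the two detectors are logically consistent; during training this term couples them so that judgment-view evidence flows into Dr, which is then used alone at inference.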

Experiments & Results

1. Main Results

Main Results 1 Main Results 2
Core Conclusion: LaaB yields additional performance gains for baselines in most cases. Two key dimensions:
1) Diverse Intrinsic Patterns: It effectively augments detection across hidden states, logits, and attention patterns.
2) Model Generalizability: It maintains high efficacy across scales (7B to 70B) and architectures (e.g., LLaMA, Qwen, Mistral).

2. Variant Analysis

Variant Analysis Results
Core Conclusion: The intrinsic patterns of responses remain the most informative cues. LaaB efficiently distills the benefits of the judgment view into the response detector Dr during training, enabling high-performance detection using only Dr at inference time.

3. Cross-Dataset Generalization

Cross-Dataset Generalization Results
Core Conclusion: By encouraging logical agreement across the two views, LaaB pushes the detector toward evidence that transfers across datasets, making it less sensitive to dataset-specific spurious cues and more reliable on unseen benchmarks.

4. Further Analysis

Sankey Diagram; Length Analysis
Core Conclusion:
1) Prediction Transitions: LaaB corrects a substantial portion of samples originally misclassified by the response detector Dr, successfully transferring hallucination detection knowledge from the self-judgment view. It preserves most correct predictions and can even rectify instances where both views initially failed.
2) Length Intervals: LaaB brings accuracy improvements across most intervals, with particularly notable gains for longer text sequences. The self-judgment acts as a factual summary that compresses response-level factuality into a single token, mitigating representation sparsity and noise in long responses.

Contributions & Conclusion

Summary: We introduced LaaB (Logical Consistency-as-a-Bridge), a framework that bridges micro-level intrinsic neural patterns and macro-level symbolic self-judgments for LLM hallucination detection. By introducing a "meta-judgment" process and enforcing the inherent logical constraint via mutual learning, LaaB improves hallucination detection for most base models without significant additional inference cost.