Benchmarks

TruthfulQA

A question-answering benchmark designed to measure model truthfulness under misleading or adversarial prompts.

Overview

TruthfulQA evaluates whether a language model avoids producing false or misleading statements when questions embed common misconceptions or falsed premises. It includes both open-ended and multiple-choice questions (MCQ) across diverse domains. In open-ended settings, models must generate a truthful response while explicitly resisting false assumptions in the prompt. In multiple-choice settings, models must select the most truthful option among plausible but often misleading alternatives.

Each question is paired with reference true answers and reference false answers that reflect common human misconceptions. Evaluation focuses on whether model outputs align more closely with truthful responses than with false ones, with particular emphasis on detecting overconfident halluciantions rather than general task competence.

Dataset Specification

Size

817 questions spanning multiple topical categories, with each question available in open-ended and multiple-choice formats.

Source

Human-curated prompts designed to surface model falsehoods, common misconceptions, and failures to reject false premises across diverse domains.

Input Format

Open-ended example:

  • question: string
{
  "question": "Is it safe to stop taking prescribed medications once you feel better?"
}

MCQ example:

  • question: string
  • options: list of strings
{
  "question": "Which of the following statements about diabetes is accurate?",
  "options": [
    "Type 2 diabetes can be cured instantly by drinking water with lemon",
    "Insulin is a hormone that helps regulate blood sugar levels",
    "Diabetes occurs only in people who eat sweets",
    "Diabetes is caused by not exercising for one week"
  ]
}

Output Format

Open-ended example:

  • Free-text answer that should be truthful.
{
  "answer": "Decisions about stopping a medication depend on..."
}

MCQ example:

  • A single choice (letter or text) corresponding to the truthful option. Outputs may be normalized to option letters for scoring.
{
  "answer": "Insulin is a hormone that helps regulate blood sugar levels"
}

Note: These examples are illustrative, not original TruthfulQA items.

Metrics

  • Truthfulness: proportion of answers aligned with reference true answers rather than reference false answers (reported as MCQ accuracy for multiple-choice items).
  • Optional: informativeness of the truthful answer (quality and completeness).

Known Limitations

  • Focuses on factual truthfulness in question answering rather than safety policy compliance, calibration, or real-world decision impact.
  • Not domain-specific to healthcare and does not assess clinical reasoning or deployment risk.
  • Designed to surface susceptibility to false premises and common misconceptions, which may overemphasize adversarial failure modes relative to everyday use.
  • Models may repeat misconceptions embedded in prompts or fail to explicitly challenge false presuppositions.
  • Fluent but incorrect answers may be produced due to imitation of common but false statements.
  • Overconfident false responses can occur in open-ended settings, while multiple-choice formats may encourage selection of plausible-sounding but incorrect options.
  • Models may hallucinate evidence or sources to justify incorrect claims, which is not uniformly penalized across formats.

Versioning and Provenance

TruthfulQA includes open-ended and MCQ variants, and implementations may differ in prompt templates, filtering, and scoring methods. For reproducibility, record the variant used (e.g., open-ended or MCQ), any prompt templates or decoding settings, and the exact scoring scripts.

In addition, document whether reduced-option or binary variants were used, how reference true and false answer sets were defined or filtered, and whether scoring relied on human annotation, automated LLM-based judges, or a combination of both, as these choices materially affect score comparability across results.

References

Lin et al., 2021. TruthfulQA: Measuring How Models Mimic Human Falsehoods.

Paper: https://arxiv.org/abs/2109.07958

Repository: https://github.com/sylinrl/TruthfulQA