AI explainability is the ability to give an affected person a meaningful, actionable reason for an AI decision, at the level of detail relevant to the decision being made. It is not a single technique, a single visualization, or a single line in a privacy policy. It is the load-bearing pillar that turns AI transparency from a public relations posture into a legally defensible practice.
This post is the long-form definition. It separates explainability from the related ideas it gets confused with (interpretability, transparency, fairness), names the four levels of explanation a complete program produces, walks through the major XAI techniques and what each is good for, surfaces the regulations that have made explainability a legal obligation, and finishes with the working playbook a company should be running today. Sources appear inline so any claim can be checked against the original framework or statute.
The four levels at a glance:

- Input attribution: which features pushed this decision which way.
- Decision rationale: a plain-language reason an affected person can act on.
- Model behavior: what the system does in aggregate, across populations.
- Lifecycle accountability: who built it, what it was tested on, how it changed.
A precise definition
The most cited operational definition comes from the DARPA XAI program, which framed explainability as the property that lets a user understand, appropriately trust, and effectively manage an AI system. The NIST four principles of explainable AI, published in 2021, distill the same idea into four testable requirements: a system must produce an explanation, the explanation must be meaningful to its audience, it must accurately reflect the system's process (explanation accuracy), and it must signal when the model is being asked to operate outside its knowledge limits.
Stripped to its working parts, an explainable AI system has to do four things at once. It has to surface which inputs drove a specific decision. It has to translate that into a reason an ordinary person can act on. It has to describe how the system behaves in aggregate, not just on the one case in front of you. And it has to carry an audit trail that lets a regulator, auditor, or court reconstruct who built the system, what it was trained on, and how it has changed over time.
A program that satisfies one of those four and skips the others is not "partially explainable." It is incomplete in a way that regulators, plaintiffs, and procurement teams now look for.
Why explainability is the load-bearing pillar
Of the four pillars of AI transparency (disclosure, explainability, data governance, oversight), explainability is the one that most often determines whether a company can actually defend a decision in front of a regulator, an underwriter, or a court. Disclosure tells people that AI is in use; explainability tells them, when something goes wrong, why the AI did what it did. The other pillars stage the system; explainability is what gets cross-examined.
The shift from "nice to have" to "load-bearing" happened in a narrow 24-month window. The GDPR right to "meaningful information about the logic involved" in solely automated decisions (Article 22, read with Articles 13–15), on the books since 2018, started seeing serious enforcement only after a string of 2023–2025 cases in the Netherlands, France, and Italy. The EU AI Act added Article 86 in 2024, an explicit right to explanation for affected persons subject to high-risk AI decisions, with full enforcement landing in August 2026. Colorado's SB24-205 operationalized adverse-action notices for "consequential decisions" with civil penalties up to $20,000 per violation per consumer. Each of these regimes assumes, implicitly or explicitly, that the company can produce a reason on demand.
The companies that produced model cards as a marketing exercise five years ago are now finding the same documents being read into the record by their adversaries.
The four levels of explanation
A complete explainability program produces artifacts at four distinct levels of abstraction. The labels differ between frameworks, but the operational layers line up.
1. Input attribution: which features pushed this decision
Input attribution answers the narrowest question: for this one prediction, how much did each input contribute? The standard tools are SHAP (Shapley Additive Explanations) and LIME (Local Interpretable Model-Agnostic Explanations). SHAP scores are grounded in cooperative game theory and have a clean additivity property; LIME fits a small linear surrogate near the decision point. Both produce per-feature contributions that engineers, auditors, and courts can read.
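For the tabular case, here is a minimal sketch of what per-decision attribution looks like in practice, assuming the shap and scikit-learn packages; the synthetic data and the credit-style feature names are illustrative stand-ins, not a real pipeline:

```python
# Minimal sketch: per-decision input attribution with SHAP on a tabular model.
# The synthetic data and feature names are stand-ins for a real credit pipeline.
import pandas as pd
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
features = ["income", "debt_ratio", "tenure_months", "late_payments", "utilization"]
X = pd.DataFrame(X, columns=features)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Explain one decision: how much did each input push this prediction, and which way?
explainer = shap.TreeExplainer(model)
case = X.iloc[[0]]
contributions = explainer.shap_values(case)[0]   # one signed value per feature

for name, value in sorted(zip(features, contributions), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.3f}")
```

The output is a ranked list of signed per-feature contributions for the single case under review, which is exactly the artifact the next two layers build on.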
Input attribution is necessary but not sufficient. A SHAP plot showing that "income" pushed a credit decision down is technically true and operationally useless if the affected person cannot tell what they would need to do differently.
2. Decision rationale: a reason the affected person can act on
Decision rationale is the layer most laws actually require. GDPR Article 22, Colorado's AI Act, and California's Automated Decision-Making Technology (ADMT) rules all require that affected individuals receive a meaningful, plain-language reason when an AI is part of a denial or adverse action. "Your application was declined because of low income" is a rationale; a SHAP scatterplot is not. The two are linked, but the regulatory standard sits at the human-readable layer.
The cleanest production approach is to derive rationales from attribution, not instead of it: pick the top-N drivers of the decision, translate them into action-language for the affected person ("if you increase X by Y, the decision flips"), and store both the rationale and the underlying attribution so the trail is reproducible.
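A minimal sketch of that derivation follows, with hypothetical reason templates and a made-up attribution dictionary; the top-N selection logic and the wording are assumptions, not a prescribed format:

```python
# Sketch: deterministic mapping from stored attribution to a plain-language rationale.
# Templates, threshold logic, and feature names are illustrative assumptions.
REASON_TEMPLATES = {
    "income": "Reported income was below the level required for approval.",
    "utilization": "Credit utilization was above the approval threshold.",
    "late_payments": "Recent late payments lowered the overall score.",
}

def rationale(attribution: dict[str, float], top_n: int = 3) -> list[str]:
    """Pick the top-N features that pushed the decision toward denial."""
    negatives = sorted(
        ((name, value) for name, value in attribution.items() if value < 0),
        key=lambda kv: kv[1],   # most negative (largest adverse push) first
    )
    return [
        REASON_TEMPLATES.get(name, f"{name} reduced the likelihood of approval.")
        for name, _ in negatives[:top_n]
    ]

print(rationale({"income": -0.42, "utilization": -0.18, "tenure_months": 0.07}))
```

Storing both the attribution dictionary and the rendered rationale keeps the trail reproducible: the plain-language notice can always be traced back to the numbers that produced it.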
3. Model behavior: what the system does in aggregate
Model behavior describes what the system does across populations, not on one case. This is the layer that ISO/IEC 42001 audits, NIST AI RMF assessments, and procurement reviews care most about. Concrete artifacts include performance metrics broken out by subgroup, calibration curves, drift monitoring, distributional comparisons between training data and live data, and known-limitation lists. The widely used Model Cards framework, originally from Google researchers in 2019, gives the canonical template.
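The simplest artifact at this layer is a metric table broken out by subgroup; in the sketch below the column names ("group", "y_true", "y_pred") and the chosen metrics are illustrative assumptions:

```python
# Sketch of the aggregate-behavior layer: the same metrics broken out by subgroup.
import pandas as pd
from sklearn.metrics import accuracy_score, recall_score

def subgroup_report(df: pd.DataFrame) -> pd.DataFrame:
    rows = []
    for group, part in df.groupby("group"):
        rows.append({
            "group": group,
            "n": len(part),
            "accuracy": accuracy_score(part["y_true"], part["y_pred"]),
            "recall": recall_score(part["y_true"], part["y_pred"]),
        })
    return pd.DataFrame(rows)

example = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "y_true": [1, 0, 1, 0],
    "y_pred": [1, 0, 0, 0],
})
print(subgroup_report(example))
```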
A company that can answer "how does this model behave across the people it affects?" with a published artifact is doing the work this layer requires. A company that cannot, even with strong attribution and rationales, will fail an external audit.
4. Lifecycle accountability: who built it and how it changed
The fourth layer is the one most companies skip until a lawsuit lands. It requires that the lineage of the system be reconstructible: which team built each version, which data the model was trained on, which evaluations were run before deployment, when the model was last updated, and what changed each release. ISO/IEC 42001's "AI management system" requirements, NIST AI RMF's Govern function, and the EU AI Act's technical documentation requirements all converge here.
Lifecycle accountability is the layer that turns an explanation from a one-off artifact into something a regulator, auditor, or court can verify. A model card without a versioned audit log is a snapshot; with one, it is evidence.
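One concrete form that evidence can take is a per-decision audit record; the field names and example values in this sketch are assumptions, not a prescribed schema:

```python
# Sketch of a versioned, per-decision audit record for the accountability layer.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class DecisionRecord:
    decision_id: str
    model_version: str
    training_data_version: str
    explanation_method: str          # e.g. "shap-tree"
    attribution: dict                # per-feature contributions, stored verbatim
    rationale: list                  # the plain-language reasons the person received
    decided_at: str

record = DecisionRecord(
    decision_id="d-000123",
    model_version="credit-risk-v14",
    training_data_version="loans-2025q4",
    explanation_method="shap-tree",
    attribution={"income": -0.42, "utilization": -0.18},
    rationale=["Reported income was below the level required for approval."],
    decided_at=datetime.now(timezone.utc).isoformat(),
)
print(json.dumps(asdict(record), indent=2))
```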
How AI explainability differs from related concepts
Explainability is sometimes used interchangeably with interpretability, transparency, or fairness. They overlap, but each one answers a different question and produces different artifacts.
Explainability asks: can affected people get a meaningful reason for an outcome? It produces reasons, attributions, and counterfactuals tied to specific decisions. It is the user-facing and regulator-facing layer.
Interpretability asks: can a researcher inspect the model's internal logic? It produces readable weights, rules, mechanistic descriptions, and circuits-level analyses of what individual model components compute. The Anthropic interpretability team's work on dictionary learning and feature visualization is the canonical example. Interpretability is a research property of the model itself; explainability is an operational property of the system around it. An interpretable model is easier to explain, but a non-interpretable model can still be explained at the input-attribution and decision-rationale layers.
Transparency asks: can the public see what the AI does and how it is governed? It produces disclosure pages, model cards, sub-processor lists, and oversight commitments. Explainability is one input to transparency; transparency is broader because it covers disclosure and governance, not just per-decision reasoning.
Fairness asks: do outcomes harm protected groups disproportionately? It produces bias audits, disparate-impact testing, and mitigation plans. Fairness work draws on explainability tools (a SHAP plot can surface a proxy variable that produces disparate impact) but the fairness question is distinct.
The cleanest way to keep the four straight: explainability is what an affected person, regulator, or auditor gets when they ask the system for a reason. Interpretability, transparency, and fairness are properties measured by different audiences against different evidence.
The major XAI techniques
The space of explainability techniques is broad, but a small core covers most production use cases. The right choice depends on the model family, the audience, and the legal context.
A few patterns are worth naming. SHAP has become the default in regulated industries because it has a single mathematical foundation, tooling that integrates with most ML frameworks, and a regulator-friendly story. LIME is faster and useful when SHAP is computationally infeasible at scale. Counterfactuals ("if your income were $5,000 higher, this would have been approved") have become the preferred format for adverse-action notices because they translate directly into action-language. Attention and saliency maps are dominant for vision and language models, where token-level or pixel-level highlights are intuitive. Surrogate models are useful when a global, simple approximation is needed for audit. Intrinsically interpretable models (linear models, GAMs, monotonic boosters, rule lists) are the default in domains like credit scoring and clinical decision support, where regulators and clinicians need to read the model end-to-end.
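A minimal sketch of the counterfactual idea for a single actionable numeric feature, assuming a fitted classifier that exposes predict_proba; production counterfactual tools handle multiple features and plausibility constraints, so treat this only as an illustration of the action-language translation:

```python
# One-feature counterfactual search: how much would this input need to change
# for the decision to flip? Step size, search range, and threshold are assumptions.
import numpy as np

def amount_to_flip(model, case, feature_idx, step=1000.0, max_steps=50, threshold=0.5):
    x = np.array(case, dtype=float)
    for _ in range(max_steps):
        if model.predict_proba(x.reshape(1, -1))[0, 1] >= threshold:
            return x[feature_idx] - case[feature_idx]   # e.g. "$5,000 more income"
        x[feature_idx] += step
    return None   # no flip found within the search range
```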
Most mature programs combine three or four of these. A credit-decision pipeline might use a monotonic gradient-boosting model for intrinsic readability, SHAP for per-decision attribution, counterfactuals for the adverse-action notice, and a model card for the aggregate-behavior layer. No single technique satisfies every audience.
Local vs. global, post-hoc vs. intrinsic
Two orthogonal axes organize the XAI space cleanly. Local explanations describe one prediction; global explanations describe the model's behavior across all inputs. SHAP and LIME are local by default; surrogate models and partial-dependence plots are global. Most regulations require local explanations for affected individuals and global descriptions for auditors and procurement teams.
Post-hoc methods derive explanations after the model is trained, by querying it from the outside. SHAP, LIME, attention maps, and counterfactuals are all post-hoc. Intrinsic methods build explainability into the model itself: linear models, decision trees, rule lists, GAMs, and monotonic boosters. Intrinsic models trade some predictive power for end-to-end readability, which is often the right trade in regulated domains.
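A sketch of one global post-hoc method, the surrogate model: fit a shallow decision tree to the black-box model's own predictions so an auditor can read the aggregate logic, and report how faithfully the surrogate mimics it. The function and its arguments are illustrative, assuming scikit-learn:

```python
# Global surrogate: a shallow, readable tree trained on the black-box model's outputs.
from sklearn.tree import DecisionTreeClassifier, export_text

def global_surrogate(black_box, X, feature_names, max_depth=3):
    labels = black_box.predict(X)                        # mimic the model, not the data
    surrogate = DecisionTreeClassifier(max_depth=max_depth).fit(X, labels)
    fidelity = surrogate.score(X, labels)                # how faithfully it mimics
    return export_text(surrogate, feature_names=feature_names), fidelity
```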
The practical recipe in 2026: use intrinsic models where the regulator demands end-to-end readability, post-hoc methods everywhere else, and always maintain both a per-decision rationale and an aggregate behavior description. Picking a side is the wrong frame.
What the regulators are formalizing
The four-level structure above maps cleanly onto the recent legislation. Reading the statutes side-by-side surfaces the operational consensus.
- 2024: EU AI Act enters into force (EU). Risk-tiered framework with general-purpose AI obligations.
- 2024: Colorado AI Act (SB24-205) passes (Colorado). Adverse-action notices for consequential decisions; fines up to $20,000 per violation per consumer.
- 2025: California TFAIA (California). Transparency in Frontier AI Act: public disclosure of training data and capabilities.
- 2026: EU AI Act full enforcement (EU). August 2026: high-risk AI systems must be fully compliant.
EU AI Act, Article 86
The EU AI Act creates an explicit right to explanation in Article 86 for any natural person subject to a decision made by a high-risk AI system that produces legal effects or similarly significantly affects them. The deployer must provide "clear and meaningful explanations of the role of the AI system in the decision-making procedure and the main elements of the decision taken." Full enforcement for high-risk AI systems begins August 2026.
GDPR Article 22
GDPR Article 22, in force since 2018, provides the right not to be subject to a decision "based solely on automated processing" that produces legal or similarly significant effects; Articles 13–15 add the right to "meaningful information about the logic involved." Enforcement intensified through 2024 and 2025 in the Netherlands, France, and Italy, producing the case-law foundation that the AI Act's Article 86 now builds on.
Colorado AI Act (SB24-205)
Colorado's SB24-205 requires deployers of high-risk AI systems making "consequential decisions" to provide affected consumers with the principal reason or reasons for an adverse decision, an opportunity to correct any incorrect data, and an opportunity to appeal. Civil penalties reach $20,000 per violation per consumer. The bill is the first U.S. state law to attach financial penalties to explainability failures.
California ADMT regulations
The California Privacy Protection Agency's Automated Decision-Making Technology rules require pre-use notices, opt-out rights, and access to "meaningful information" about how the technology works for any business making significant decisions about California residents using ADMT. The regulations were finalized in 2025 and apply alongside California's CCPA framework.
NIST AI Risk Management Framework
The NIST AI RMF is the soft-law backbone in the U.S. Its "Explainable and Interpretable" trustworthy-AI characteristic is what most U.S. federal contracting and many state regimes cite by reference. NIST does not score companies; it provides the structural language that other frameworks and procurement teams build on.
ISO/IEC 42001
ISO/IEC 42001, the international AI management-system standard published in late 2023, embeds explainability into auditable controls: documented processes for explanation generation, validation, and review. ISO 42001 certification is the closest thing to a commercial AI-explainability certification standard available today.
How AI explainability is evaluated
Three families of evaluation are emerging in production today, each measuring something different.
Faithfulness
Faithfulness asks: does the explanation reflect what the model actually did? Standard tests include input-perturbation checks (does removing the top-attributed feature actually drop the prediction the way the explanation implies?) and consistency checks (do similar inputs produce similar explanations?). Stanford's HELM and the TruLens library both publish faithfulness metrics for common XAI methods. A faithful explanation is not the same as a useful one, but an unfaithful explanation is automatically wrong.
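A minimal sketch of a perturbation check, assuming a fitted classifier with predict_proba and a feature matrix held as a NumPy array; mean imputation as the "removal" operation and any pass/fail threshold are assumptions:

```python
# Input-perturbation faithfulness check: if the attribution names a feature as the
# top driver, neutralizing that feature should move the score noticeably.
import numpy as np

def perturbation_drop(model, X, case_idx, top_feature_idx):
    case = np.array(X[case_idx], dtype=float)
    baseline = model.predict_proba(case.reshape(1, -1))[0, 1]

    perturbed = case.copy()
    perturbed[top_feature_idx] = X[:, top_feature_idx].mean()  # neutralize the feature
    shifted = model.predict_proba(perturbed.reshape(1, -1))[0, 1]

    return baseline - shifted   # a large drop supports the attribution's claim
```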
Comprehensibility
Comprehensibility asks: does the explanation help its intended audience? This is the layer most often missing in early-stage XAI programs. Standard tests are user studies with representative affected users (does the explanation lead to correct understanding?), reading-level analysis on adverse-action notices, and field measurement of correction or appeal rates after explanations are introduced. A SHAP plot may be perfectly faithful and entirely useless to a denied loan applicant. The two evaluations are complementary, not redundant.
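A sketch of a reading-level gate on a notice template, assuming the third-party textstat package; the notice text and the grade-level target are illustrative:

```python
# Reading-level check on an adverse-action notice template.
import textstat

NOTICE = (
    "Your application was declined because your reported income was below the "
    "level required for approval. If your income were $5,000 higher, the "
    "application would have been approved."
)

grade = textstat.flesch_kincaid_grade(NOTICE)
print(f"Flesch-Kincaid grade level: {grade:.1f}")
if grade > 9:
    print("Notice reads above the target grade level; rewrite in plainer language.")
```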
Auditability
Auditability asks: can a third party reconstruct and verify the explanation? This is the layer ISO 42001 audits, the EU AI Act technical documentation requirement, and procurement reviews focus on. Standard requirements include reproducible explanation pipelines, versioned model and explanation logs, and change-management records that connect every production decision to the model version and explanation method that produced it.
Mature programs evaluate against all three. A program that scores well on faithfulness and auditability but fails on comprehensibility produces explanations that satisfy regulators but not users. A program that scores well on comprehensibility alone has no defensible audit trail.
How to implement AI explainability at your company
The four levels and the evaluation taxonomy give a checklist. The difference between a company that can defend an AI decision and one that cannot is execution.
- Inventory the consequential decisions. Before picking a technique, list every place AI participates in a decision that affects an external person: hiring, lending, pricing, content moderation, fraud detection, healthcare triage, eligibility checks. Each one needs all four levels of explanation. The decision you forgot is the one that produces the litigation.
- Decide intrinsic vs. post-hoc per use case. For decisions in heavily regulated domains (credit, insurance, hiring), default to intrinsic models even at some predictive cost. For everything else, post-hoc methods are fine if the rest of the stack supports them.
- Wire attribution into the production path. SHAP or LIME values for every consequential decision, stored alongside the prediction with a stable schema. This is a one-time engineering investment that becomes the foundation for every downstream layer.
- Translate attribution into a rationale. For each consequential-decision flow, build a deterministic mapping from the top-N attributed features to a plain-language reason and, where applicable, a counterfactual ("you would need X to flip the decision"). Validate the mapping with reading-level checks and user research.
- Publish a model card per system. A single linkable URL describing intended use, performance metrics across relevant subgroups, known limitations, training data summary, and explanation methodology. The Model Cards template is the working starting point.
- Version everything. Model versions, explanation-method versions, training-data versions, evaluation results. Each consequential decision should be reproducible from logs months or years later. This is the auditability layer in operational form.
- Get evaluated externally. An independent assessment surfaces gaps before regulators or buyers do. The AI Clear methodology scores explainability across the same four levels described above.
Common myths
A handful of misconceptions slow companies down. Each one fails on first contact with the operational definition.
"We use a black-box model so we cannot explain it"
This is the most common and most expensive myth. Post-hoc explainability methods (SHAP, LIME, counterfactuals, surrogate models) work on any model, including deep neural networks and large foundation models. The choice is between harder explainability and no explainability, not between explainability and a black box. The frameworks cited above are explicit about this: GDPR, the EU AI Act, and Colorado's AI Act all assume explanations are achievable for the high-risk systems they regulate.
"A SHAP plot is an explanation"
A SHAP plot is one input to an explanation. It is faithful and well-defined, but it is not a meaningful, actionable reason for an affected person. Regulations like Article 22 and Article 86 require the decision rationale layer, in plain language. A complete program produces both: the technical attribution for auditors and the human-readable rationale for affected users.
"Explainability and accuracy are a strict tradeoff"
The "accuracy-explainability tradeoff" is real in some narrow cases but overstated as a general principle. Modern intrinsic models (monotonic boosters, GAMs with interaction detection, rule lists) reach within a small fraction of black-box performance on most structured tabular tasks, and post-hoc methods give explainability on any model. The companies that cite this tradeoff as a reason to skip explainability are usually skipping the engineering work, not making a mathematically forced tradeoff.
"Open-sourcing the model means it is explainable"
Open-sourcing is one input to interpretability. It does not, by itself, satisfy the explainability question for an affected person ("why did this decision come out this way?"). A closed-source model with strong attribution, rationales, model cards, and audit logs is more explainable than an open-weight model with none of those.
The 2027 outlook
The four-level structure of explainability is hardening, not softening. 2024 to 2026 was the regime-formation phase, when statutes and standards got drafted and adopted. 2027 is the case-law and audit phase, when explainability obligations move from drafting tables into court records, audit findings, and procurement defaults. Pressure builds across four fronts simultaneously.
Right-to-explanation expectations harden
- EU AI Act Article 86 right to explanation tested in court
- Colorado adverse-action notices reach standardized templates
- GDPR Article 22 enforcement converges with AI Act guidance
Model cards become a procurement default
- Vendor reviews require model cards and known-limitation lists
- Sector-specific explainability checklists in healthcare and finance
- Failed audits trace back to missing decision-level explanations
Third-party evaluation matures
- ISO/IEC 42001 audits formalize explainability evidence
- Independent benchmarks for explanation faithfulness gain ground
- Explanation quality starts being scored, not just present or absent
Plaintiffs cite missing explanations directly
- Adverse-action denials without reasons trigger class actions
- Counterfactual evidence used to argue actionable harm
- Discovery requests target explanation logs and audit trails
Regulators move from rulemaking to case law
The European AI Office, established under the EU AI Act, transitions from issuing guidance to producing the first formal interpretations of Article 86. National data-protection authorities continue the GDPR Article 22 enforcement they ramped up in 2024–2025. In the United States, the Colorado adverse-action notice template gets adopted (with local variations) by Connecticut, Texas, Virginia, and New York. By the end of 2027, somewhere between five and ten U.S. states are likely to have explainability requirements for consequential AI decisions.
Procurement standardizes around model cards
Vendor questionnaires now include explainability sections by default at large enterprises and federal agencies. This pattern becomes universal in 2027 as more buyers add model-card requirements to their standard due-diligence templates. A vendor without a published model card and a documented explanation methodology starts losing deals it would have won in 2024. The AI Clear registry and similar third-party scoring services get cited inside vendor reviews because they answer the questions procurement teams are already required to ask.
Audits formalize explanation evidence
ISO/IEC 42001 audits move from a small early-adopter cohort to a procurement-default certification in 2027. Audit firms publish standardized evidence requirements for explainability: reproducible explanation pipelines, faithfulness benchmarks, versioned logs, and documented user-facing rationale templates. Audit findings start being cited in regulatory enforcement and litigation, in much the same way SOC 2 findings are cited in security disputes today.
Plaintiffs cite missing explanations directly
Class-action and individual plaintiff filings start citing missing or inadequate explanations as the operational fact pattern. Adverse-action denials without reasons, explanations that contradict the underlying attribution, and missing audit logs all become the working template. The HireVue, UnitedHealth, Cigna, and RealPage cases that defined 2023 to 2025 become the case-law foundation for a much larger second wave focused specifically on the explainability layer.
A 2027 readiness comparison
The working test for whether your company is positioned for 2027 is whether you can answer six questions in writing today.
| Question your company must answer | If you can answer it | If you cannot |
|---|---|---|
| Which of our AI decisions are "consequential" under EU/Colorado/California rules? | Documented inventory of consequential decisions | Disclosure gap regulators find first |
| How does each consequential decision get attributed (SHAP, LIME, intrinsic)? | Reproducible per-decision attribution pipeline | Failed audits and procurement questionnaires |
| What plain-language rationale does an affected person receive? | Documented rationale templates | Article 22 and ADMT exposure |
| What does the system do in aggregate, by subgroup? | Published model card with subgroup metrics | Missing the layer auditors care about most |
| Can we reproduce a 12-month-old decision from logs? | Versioned model, data, and explanation logs | Defenseless when discovery starts |
| What does our public explainability score look like? | A baseline you can defend | A surprise you find out about during an audit |
The companies that go into 2027 with all six answers in writing keep their customers, close their deals, and stay out of court. The ones that do not will pay for it across all four fronts simultaneously.
A working summary
AI explainability is the ability to give an affected person a meaningful, actionable reason for an AI decision, at the level of detail relevant to that decision. It rests on four operational levels (input attribution, decision rationale, model behavior, and lifecycle accountability) that map onto every major framework (NIST, ISO 42001, EU AI Act, GDPR, Colorado ADMT) and onto the obligations imposed by recent legislation. It is now legally required for high-risk AI systems in the EU, financially incentivized in Colorado and California, and increasingly priced into procurement, audit, and insurance.
A company that can answer "why did this AI make this decision?" with a faithful attribution, a comprehensible rationale, a documented model behavior description, and a versioned audit trail keeps its customers, closes its deals, and stays out of court. A company that cannot does not.
The shortest possible definition: AI explainability is what lets you answer "why" when an affected person, an auditor, a regulator, or a court asks the question.
If you want to see how your company scores on explainability, search the AI Clear registry or read the published rubric. The 26 criteria across 5 domains operationalize exactly the four-level framework described in this post.
See where your company stands
AI Clear scores companies on AI transparency. Search the registry or request your scorecard.