{"id":219,"date":"2026-06-08T07:53:57","date_gmt":"2026-06-08T07:53:57","guid":{"rendered":"https:\/\/elva.ai\/articles\/?p=219"},"modified":"2026-06-08T07:53:58","modified_gmt":"2026-06-08T07:53:58","slug":"ai-coverage-determination","status":"publish","type":"post","link":"https:\/\/elva.ai\/articles\/ai-coverage-determination\/","title":{"rendered":"The Most Valuable Answer in Dental RCM Is &#8220;I Don&#8217;t Know&#8221;"},"content":{"rendered":"<p>Most &#8220;AI for dental insurance&#8221; pitches collapse under one question: <strong>where does the answer come from, and how do you know it&#8217;s right?<\/strong> Coverage determination \u2014 deciding whether a CDT code is covered, for this patient, at what cost \u2014 isn&#8217;t a single model call. It&#8217;s a data problem with two competing sources of truth, a reconciliation strategy, and an obligation to be honest about uncertainty. This is how ELVA&#8217;s RCM data engine is built, and why a system that sometimes says &#8220;I don&#8217;t know&#8221; is the one a DSO can build a decade on.<\/p>\n<p><strong>AI coverage determination is the process of deciding what a payer will cover and at what cost \u2014 and the architecture behind it matters more than the model.<\/strong> Here&#8217;s ours, at the depth a technical evaluator should demand from any vendor.<\/p>\n<h2>Two sources of truth, kept deliberately separate<\/h2>\n<p>There are two fundamentally different kinds of evidence about what a payer will cover, and most systems blur them. We don&#8217;t.<\/p>\n<ul>\n<li><strong>What the payer <em>says<\/em> it does<\/strong> \u2014 published provider manuals, policy bulletins, state Medicaid manuals, ADA\/CDT references, fee schedules. Authoritative and citable, but generic and frequently stale.<\/li>\n<li><strong>What the payer <em>actually does<\/em><\/strong> \u2014 the empirical record of how real claims adjudicated. 
Current and specific, but it has to be earned through volume.<\/li>\n<\/ul>\n<p>Each covers the other&#8217;s blind spot. Stated policy answers the cold-start case where there&#8217;s no history; observed behavior catches the gap between what a payer publishes and what it pays. The two live in separate pipelines, with separate provenance, and are reconciled only at decision time \u2014 because the <em>disagreement<\/em> between them is itself one of the most valuable signals in the system.<\/p>\n<h2>Path 1 \u2014 Learning payer behavior from real claims<\/h2>\n<p>Every adjudicated claim becomes a normalized observation. Observations pool into cells of payer \u00d7 plan \u00d7 CDT code, and where the evidence supports it, ELVA <strong>derives a rule<\/strong> \u2014 across categories like allowed amount, cost-share, denial rate, frequency, bundling, downgrade, deductible exemption, missing-tooth clause, waiting period, and authorization or documentation requirements. Three design choices make this trustworthy rather than a black box:<\/p>\n<ul>\n<li><strong>Confidence is earned, not assumed.<\/strong> A derived rule starts at medium confidence and is promoted only as additional claims corroborate it. 
Thin cells are explicitly recorded as gaps \u2014 not papered over with a guess.<\/li>\n<li><strong>Rules are validated against held-out claims.<\/strong> A temporal holdout is split off and the derived ruleset is scored against claims it never saw \u2014 measuring whether rules <em>generalize<\/em>, not whether they memorized history.<\/li>\n<li><strong>Cross-practice seeding has guardrails.<\/strong> A behavior observed consistently across many independent practices can seed a practice that hasn&#8217;t encountered it yet \u2014 but only above strict thresholds for organization count and claim count, so one practice&#8217;s anomaly never becomes everyone&#8217;s rule.<\/li>\n<\/ul>\n<p>The result is a self-correcting model of each payer&#8217;s real behavior that strengthens with every remittance \u2014 the institutional knowledge a veteran biller carries, made auditable and shared.<\/p>\n<h2>Path 2 \u2014 Digesting the documents, with a judge in the loop<\/h2>\n<p>The document side is a full ingestion pipeline, not a prompt. A source document moves through tracked stages \u2014 preflight, OCR and layout parsing, normalization, semantic chunking \u2014 before any extraction happens. Then three controls apply:<\/p>\n<ul>\n<li><strong>No citation, no rule.<\/strong> An extractor proposes candidate rules, each grounded to the exact source excerpt it came from. A rule that can&#8217;t point to its sentence doesn&#8217;t exist.<\/li>\n<li><strong>A separate judge reviews every candidate.<\/strong> Each proposed rule is evaluated against validation checks and assigned a status \u2014 auto-accepted, rejected, or escalated for human approval. Nothing an extractor produces goes live without passing the judge. 
This is the single most important design decision on the document side: it&#8217;s what lets us use language models for extraction without inheriting their tendency to confabulate.<\/li>\n<li><strong>Conflicts resolve by formal hierarchy.<\/strong> Surviving rules are scoped (payer, plan, jurisdiction, organization) and stamped with a source-authority tier. When two rules conflict, source authority decides first (ADA\/CDT references and state regulation outrank a payer FAQ), then scope specificity, then priority. Every state transition \u2014 created, approved, superseded, archived \u2014 is written to an audit chain.<\/li>\n<\/ul>\n<p>Knowledge is also scoped from organization-specific up to ELVA-wide: curated global knowledge is the floor; a practice&#8217;s, state&#8217;s, or payer&#8217;s specifics override it. This is what it means for the platform to handle <a href=\"https:\/\/www.elva.ai\/insurance\/eligibility-verification\">eligibility verification<\/a>, <a href=\"https:\/\/www.elva.ai\/insurance\/prior-authorization\">prior authorization<\/a>, and payer <a href=\"https:\/\/www.elva.ai\/insurance\/clinical-notes\">documentation requirements<\/a> from one governed knowledge layer rather than a pile of PDFs.<\/p>\n<h2>The decision engine \u2014 and why it abstains<\/h2>\n<p>At query time, a single contract answers &#8220;is this CDT code covered, for this patient, at what cost?&#8221; through a tiered search: direct eligibility first, then the derived and policy rules, then patient history. 
It returns not just an answer but a <strong>coverage state, a confidence level, the tier that produced it, and the evidence behind it.<\/strong><\/p>\n<p>The part we&#8217;re proudest of is what the engine does when the data doesn&#8217;t support an answer: <strong>it abstains.<\/strong> It has explicit states for &#8220;plan active, but no CDT-specific data,&#8221; &#8220;eligibility stale,&#8221; &#8220;request misrouted,&#8221; &#8220;payer unsupported.&#8221; A system that always answers is a system that is confidently wrong a predictable fraction of the time \u2014 and in RCM, a confidently wrong answer denies a real claim or misquotes a real patient. &#8220;I don&#8217;t know, and here&#8217;s why&#8221; is treated as a first-class, correct output \u2014 the same honesty-about-uncertainty discipline that runs through <a href=\"https:\/\/www.elva.ai\/articles\/trust-ai-in-your-practice\/\">everything ELVA is allowed to do in a practice<\/a>.<\/p>\n<h2>How we know it works<\/h2>\n<p>Anyone can ship a model. The differentiator is measurement, and ELVA&#8217;s is built like a test lab, not a demo. Ground truth is <strong>real adjudication outcomes<\/strong> \u2014 what the payer actually paid \u2014 on a patient-disjoint, temporally-forward split, so the system is never graded on data it could have memorized. Grading uses a confusion matrix, not a single accuracy percentage, because not all errors cost the same:<\/p>\n<ul>\n<li><strong>The headline metric is the false-confident rate<\/strong> \u2014 answers that are wrong <em>and<\/em> high-confidence \u2014 weighted an order of magnitude more heavily than a cautious miss. 
That&#8217;s the cell that denies a real claim.<\/li>\n<li><strong>Abstention is never counted as failure.<\/strong> Over-answering is tracked as its own metric; the model is never rewarded for guessing.<\/li>\n<li><strong>Every probability is calibrated.<\/strong> Reliability curves and expected calibration error are tracked, so that when the system says 80%, it is right about 80% of the time.<\/li>\n<li><strong>Results are stratified by seen-vs-unseen payer<\/strong>, so a blended number can&#8217;t hide a model that&#8217;s strong on familiar payers and weak on new ones.<\/li>\n<li><strong>The rule-extraction engine is measured separately<\/strong> \u2014 precision and recall of the rules it derives, not just final-answer accuracy \u2014 because a single wrong rule corrupts every downstream decision in its cell.<\/li>\n<\/ul>\n<h2>Why this is a foundation, not a feature<\/h2>\n<p>Three properties make the engine durable. <strong>Provenance everywhere:<\/strong> every fact traces to either a claim-volume-and-confidence trail or a source-document citation \u2014 which is what audits, appeals, and board-level scrutiny actually require. <strong>Self-correction:<\/strong> the behavioral model improves with every claim, and the document layer re-evaluates rules as feedback arrives. <strong>Honesty about uncertainty:<\/strong> the engine is engineered to know the boundary of what it knows. These are the properties that separate a system <a href=\"https:\/\/www.elva.ai\/articles\/ai-by-design-vs-ai-powered\/\">built as AI from one with AI bolted on<\/a> \u2014 and they&#8217;re the same engineering standard behind <a href=\"https:\/\/www.elva.ai\/articles\/practice-brain-engineering\/\">the Practice Brain itself<\/a>.<\/p>\n<p>For a technical evaluator, three questions are worth asking any vendor: <em>Where does each answer come from? Can it tell you when it doesn&#8217;t know? 
And can it prove it&#8217;s right against real adjudication outcomes \u2014 including on payers it has never seen?<\/em> ELVA is built so the answer to all three is yes. See the full RCM platform at <a href=\"https:\/\/www.elva.ai\/insurance\">ELVA Insurance<\/a>.<\/p>\n<h3>Frequently Asked Questions<\/h3>\n<h4>How does AI coverage determination work?<\/h4>\n<p>A serious system reconciles two evidence sources at decision time: the payer&#8217;s stated policy (manuals, bulletins, fee schedules) and the payer&#8217;s observed behavior (how real claims actually adjudicated). Each answer should return a coverage state, a confidence level, the evidence behind it \u2014 and an explicit &#8220;I don&#8217;t know&#8221; when the data doesn&#8217;t support a confident answer.<\/p>\n<h4>Why does ELVA&#8217;s engine sometimes answer &#8220;I don&#8217;t know&#8221;?<\/h4>\n<p>Because a system that always answers is confidently wrong a predictable fraction of the time, and in RCM a confidently wrong answer denies a real claim or misquotes a real patient. Abstention \u2014 with the reason stated \u2014 is treated as a first-class correct output, and the engine is never rewarded for guessing.<\/p>\n<h4>How does ELVA stop a language model from inventing payer rules?<\/h4>\n<p>Two controls: every candidate rule must be grounded to the exact source excerpt it came from (no citation, no rule), and a separate judge evaluates every candidate against validation checks before anything goes live \u2014 auto-accepting, rejecting, or escalating to human approval.<\/p>\n<h4>How is the engine&#8217;s accuracy actually measured?<\/h4>\n<p>Against real adjudication outcomes on a patient-disjoint, temporally-forward split, graded with a confusion matrix. 
The headline metric is the false-confident rate \u2014 wrong and high-confidence \u2014 weighted far more heavily than a cautious miss, with calibrated probabilities and results stratified by seen-vs-unseen payer.<\/p>\n<h4>What should a DSO ask any RCM AI vendor?<\/h4>\n<p>Three questions: where does each answer come from (provenance)? Can it tell you when it doesn&#8217;t know (calibrated abstention)? And can it prove accuracy against real payer outcomes, including payers it has never seen? A vendor without good answers to all three is selling a demo, not a foundation.<\/p>\n<p><strong>Evaluate the foundation, not the demo.<\/strong> See <a href=\"https:\/\/www.elva.ai\/insurance\">ELVA Insurance<\/a>, or the engineering story behind the wider system in <a href=\"https:\/\/www.elva.ai\/articles\/practice-brain-engineering\/\">how the Practice Brain is built<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Coverage determination isn&#8217;t a model call \u2014 it&#8217;s a data problem with two competing sources of truth and an obligation to be honest about uncertainty. 
Inside ELVA&#8217;s RCM engine: derived payer-behavior rules, a judge that reviews every extracted rule, and a decision engine that abstains rather than guesses.<\/p>\n","protected":false},"author":1,"featured_media":220,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[35],"tags":[43,23,19,60,95],"class_list":["post-219","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-infrastructure","tag-dso","tag-insurance","tag-rcm","tag-supervised-autonomy","tag-technical-deep-dive"],"_links":{"self":[{"href":"https:\/\/elva.ai\/articles\/wp-json\/wp\/v2\/posts\/219","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/elva.ai\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/elva.ai\/articles\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/elva.ai\/articles\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/elva.ai\/articles\/wp-json\/wp\/v2\/comments?post=219"}],"version-history":[{"count":1,"href":"https:\/\/elva.ai\/articles\/wp-json\/wp\/v2\/posts\/219\/revisions"}],"predecessor-version":[{"id":225,"href":"https:\/\/elva.ai\/articles\/wp-json\/wp\/v2\/posts\/219\/revisions\/225"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/elva.ai\/articles\/wp-json\/wp\/v2\/media\/220"}],"wp:attachment":[{"href":"https:\/\/elva.ai\/articles\/wp-json\/wp\/v2\/media?parent=219"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/elva.ai\/articles\/wp-json\/wp\/v2\/categories?post=219"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/elva.ai\/articles\/wp-json\/wp\/v2\/tags?post=219"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}