Bayesian RMM scoring as a calibration check, not a replacement
Big 4 firms have proprietary RMM frameworks (EY's GAM, KPMG's KAM, PwC's Aura, Deloitte's Omnia). VynFi's Bayesian RMM exists not to replace them, but to calibrate them — an objective second opinion with auditor-prior overrides for professional judgment.
Risk of material misstatement (RMM) is one of the load-bearing concepts in modern auditing. The whole structure of the engagement — what gets tested, how thoroughly, with what materiality — flows from the RMM assessment. Get RMM wrong and either the engagement is over-resourced (testing things that don't need testing, leaving the firm exposed on cost) or under-resourced (missing risks that materialise later as audit failures). Big 4 firms have spent decades building proprietary RMM frameworks: EY's GAM, KPMG's KAM, PwC's Aura, Deloitte's Omnia. Each is a substantial intellectual investment. None of them is going anywhere.
**TL;DR** — VynFi's Bayesian RMM is not a replacement for EY GAM / KPMG KAM / PwC Aura / Deloitte Omnia. It's a calibration check: an objective second opinion built on a 12-factor model with closed-form Beta posterior updates. Auditor priors are first-class — every prior can be overridden, and every override is captured in the audit trail. The model's job is to surface where your firm framework's output and the objective model diverge, so you can have an informed conversation about why.
The state of RMM in Big 4
Each of the four firms runs its own RMM framework, and they're more similar than they are different. All four are anchored to the ISA 315 (Revised) risk-assessment paradigm: identify the entity's risks of material misstatement at the financial-statement and assertion levels, evaluate the design and implementation of relevant controls, and design responses (substantive procedures, controls testing) proportionate to the assessed risk. The differences are mostly in the procedural overlay (how risks get scored, what factors get considered, how many rating bands the framework uses) and the technology overlay (the audit-tech platform that operationalises the framework — KPMG Clara, PwC Aura, Deloitte Omnia, EY's Helix-aligned GAM).
Pros: each framework has been refined over decades against real engagement experience, reflects the firm's accumulated audit knowledge, integrates with the firm's broader audit-tech stack (analytics, AI tools, document repositories, training material), and is what every partner and manager at the firm has been trained to use. Cons: each framework is proprietary (so the four firms can't easily benchmark against each other), each is to some extent a product of the firm's institutional culture (which can encode unstated biases — under-rating risks the firm has been comfortable with historically, over-rating risks that hit the firm in a recent litigation), and none has a formal mechanism for surfacing 'how does our scoring compare to a third-party objective baseline.'
That last point is where VynFi's Bayesian RMM enters. Not as a competitor to the proprietary frameworks (which would be doomed; nobody at a Big 4 firm is going to abandon GAM or Aura for a startup's risk model). As a calibration check: an objective second opinion that scores the same engagement on the same data, surfaces where the firm framework and the objective model agree (most of the time), and flags where they diverge (the interesting cases). The divergent cases drive the conversation that improves both — the firm framework can tighten its scoring on factors the objective model picks up; the objective model can incorporate the firm's domain reasoning where the auditor's professional judgment correctly overrides the data.
Why an objective Bayesian model adds value as a calibration check
Three properties make a Bayesian model well-suited to this calibration role. First, transparency: every factor's contribution to the posterior is explicit (no black-box neural net), which means partners and reviewers can trace any output back to its inputs. Second, override-friendliness: priors are first-class. The Bayesian model isn't 'we computed a number, take it or leave it' — it's 'given these priors and this evidence, here's the posterior; if you have better priors, the model will gladly re-score.' Third, formal uncertainty quantification: every score comes with a credibility interval, not just a point estimate, which is crucial for a calibration check (the interesting cases are where the firm framework and objective model disagree by more than the credibility interval).
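To make the calibration mechanic concrete, here is a rough sketch of the divergence check under stated assumptions: a normal approximation to the Beta posterior for the interval, a 95% z-value, and the idea that the firm framework's rating can be mapped onto the same 0–1 scale. None of these details are VynFi's documented implementation.

```typescript
// Sketch only: flag an account for discussion when the firm framework's score
// falls outside the objective model's approximate 95% credibility interval.
interface Beta { alpha: number; beta: number }

function approxCredibleInterval(d: Beta, z = 1.96): [number, number] {
  const n = d.alpha + d.beta;
  const mean = d.alpha / n;
  const sd = Math.sqrt((d.alpha * d.beta) / (n * n * (n + 1)));
  return [Math.max(0, mean - z * sd), Math.min(1, mean + z * sd)];
}

function diverges(objectivePosterior: Beta, firmScore: number): boolean {
  const [lo, hi] = approxCredibleInterval(objectivePosterior);
  return firmScore < lo || firmScore > hi;
}
```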
The contrast with ML-based 'AI risk scoring' (the MindBridge / proprietary-platform approach) is informative. ML scoring trained on historical audit data optimises for prediction accuracy on the training distribution. For a calibration check, prediction accuracy isn't the right objective — the right objective is interpretability. Why did the model assign this score? What would change the score? How confident is the model? An ML model can give you a number; it can't easily give you a defensible answer to those questions in front of a regulator. A Bayesian model can.
The 12-factor model
VynFi's Bayesian RMM consists of 12 factors per account, each modelled as a Beta-distributed random variable with explicit priors. The factor list is derived from common patterns across ISA 315, the four Big 4 frameworks, and academic risk-assessment literature; we landed on 12 as the point where adding more factors would deliver minimal incremental information and removing factors would cost interpretability:
- **Account complexity** — operational complexity of the account (e.g. revenue recognition for a SaaS company is more complex than cash for a money-market fund).
- **Transaction volume** — sheer number of transactions feeding the account in the period.
- **Control strength** — the assessed effectiveness of the controls over the account (preventive + detective).
- **Manual journal frequency** — proportion of postings that hit the account from manual journal entries (vs. system-generated).
- **Prior-period adjustments** — frequency and magnitude of adjustments to this account's prior-period balances.
- **Estimation uncertainty** — degree to which the account is driven by management estimates (allowance for credit losses, fair value, contingent liabilities).
- **Related-party exposure** — the account's exposure to related-party transactions.
- **Foreign-currency exposure** — proportion of the account that's denominated in foreign currency.
- **Sector volatility** — sector-specific volatility (banking and insurance carry higher inherent risk than utilities).
- **Going-concern proximity** — distance from going-concern triggers (e.g. operating losses, debt-covenant proximity).
- **Recent management turnover** — turnover among executives with influence over the account.
- **Audit history** — VynFi's record of prior-engagement findings on this account at this entity.
Each factor is modelled as Beta(α, β) where α + β encodes the prior strength. Factors compose multiplicatively (with appropriate normalisation) to produce the per-account RMM score, also Beta-distributed, with its own credibility interval. The composition is a closed-form computation — no MCMC sampling, no variational inference, no GPU. That's deliberate: closed-form means sub-3s p95 re-scoring on a 100-account engagement, which is fast enough for interactive use.
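The post doesn't spell out the exact weighting and normalisation, but one multiplicative composition that stays Beta in closed form is logarithmic (log-linear) pooling of the per-factor posteriors. The sketch below is that pooling rule with assumed equal weights; treat it as an illustration of the closed-form mechanic, not the production composeBeta used in the override pseudocode further down.

```typescript
// Sketch: log-linear pooling of Beta factors. The pooled density is
// proportional to the product of the factor densities raised to weights w_i,
// which (after normalisation) is again a Beta:
//   Beta(1 + Σ w_i(α_i − 1), 1 + Σ w_i(β_i − 1))
// Weights and the pooling rule itself are assumptions for illustration.
interface Beta { alpha: number; beta: number }

function composeFactorPosteriors(factors: Beta[], weights?: number[]): Beta {
  const w = weights ?? factors.map(() => 1 / factors.length);
  const alpha = 1 + factors.reduce((s, f, i) => s + w[i] * (f.alpha - 1), 0);
  const beta = 1 + factors.reduce((s, f, i) => s + w[i] * (f.beta - 1), 0);
  return { alpha, beta };
}
```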
Closed-form Beta posterior updates
The Bayesian update mechanic is conjugate-Beta-Bernoulli, the classic textbook case. A Beta(α, β) prior, observed data with k 'successes' in n 'trials', updates to Beta(α + k, β + n − k). For our purposes, 'success' is 'evidence that the factor is at higher risk than the prior assumed' — the specifics of how data points map to (k, n) are factor-dependent and are documented in the per-factor schema (which is exposed via the audit-methodology catalog at /audit-methodology/rmm/taxonomy).
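In code the update itself is one line; a minimal sketch (the factor-specific mapping from raw evidence to (k, n) is the part that lives in the taxonomy and is not reproduced here):

```typescript
// Conjugate Beta-Bernoulli update: k "higher-risk" observations out of n trials.
interface Beta { alpha: number; beta: number }

function updateBeta(prior: Beta, k: number, n: number): Beta {
  return { alpha: prior.alpha + k, beta: prior.beta + (n - k) };
}

// e.g. a Beta(2, 5) prior plus 9 higher-risk signals in 10 observations gives
// Beta(11, 6): the posterior mean moves from 2/7 ≈ 0.29 to 11/17 ≈ 0.65.
```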
The closed-form property is what makes interactive overrides feasible. When an auditor adjusts a prior — say, sets the prior on revenue-recognition complexity from Beta(2, 5) to Beta(8, 2) — the system doesn't need to re-run a sampler or re-train a model. It just updates the affected Beta parameters and re-composes. The full re-score on a 100-account engagement runs in sub-3s p95, which means partners can sit at the engagement-review meeting, override priors live, and watch the per-account deltas surface in real time:
```typescript
// Pseudocode: the closed-form RMM update on a prior override.
function applyPriorOverride(
  engagement: Engagement,
  factor: RmmFactor,
  newPrior: BetaDistribution,
  override: { actor: User; rationale: string },
): RmmScoreDelta[] {
  // 1. Persist the override to the audit trail.
  engagement.auditTrail.append({
    type: "rmm_prior_override",
    factor,
    oldPrior: engagement.rmm.priors[factor],
    newPrior,
    actor: override.actor,
    rationale: override.rationale,
    timestamp: now(),
  });

  // 2. Re-compute affected accounts' posteriors (closed-form).
  const affectedAccounts = engagement.rmm.accountsUsing(factor);
  const oldScores = affectedAccounts.map(a => a.score);
  for (const account of affectedAccounts) {
    account.score = composeBeta(
      account.factors,
      { ...engagement.rmm.priors, [factor]: newPrior },
    );
  }

  // 3. Return the deltas for UI surfacing.
  return affectedAccounts.map((a, i) => ({
    accountId: a.id,
    oldScore: oldScores[i],
    newScore: a.score,
    delta: betaMeanDelta(oldScores[i], a.score),
  }));
}
```

The audit-trail append is the crucial part. An override that doesn't capture who, why, and when is a hidden bias. An override that does capture all three is exactly what a reviewer or regulator wants to see: the partner had professional judgment, applied it, documented the rationale, and the model re-scored under the documented assumptions. That's defensible.
Prior overrides as professional judgment encoding
Auditor professional judgment is not noise to be averaged out — it's signal that's hard to encode in factors alone. Consider the Q4 contract structure example from the FSM walkthrough in the companion post. The objective Bayesian model looked at the entity's Q4 revenue-recognition complexity factor based on observable data: more contracts than usual, mixed currency, three new customers. It scored the factor at 'medium'. The partner, who had spoken to the CFO and read the new master service agreement that landed in October, knew that one of those contracts was a 7-year licence with multiple performance obligations and a significant financing component — an ASC 606 Step 5 minefield. The partner's assessment was 'high'. The model's data didn't see the contract terms; the partner did.
The right move here isn't 'fix the model so it can read contracts.' That's the wrong abstraction. The right move is 'let the partner override the prior, capture the rationale, re-score, and see the impact.' The partner overrides the revenue-recognition complexity prior to Beta(8, 2). The audit trail captures: actor=partner@firm, factor=revenue_recognition_complexity, oldPrior=Beta(3, 4), newPrior=Beta(8, 2), rationale='New 7-year licence with multiple POBs and SFC; ASC 606 Step 5 high-risk', timestamp=2026-01-15T14:32Z. The model re-scores in real time. The revenue account's RMM moves from medium to high; the testing strategy in the next FSM state (Fieldwork) reflects the updated posterior.
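Expressed against the applyPriorOverride sketch above (the identifiers and shapes are the sketch's, not a documented VynFi API), the override looks roughly like this:

```typescript
// Hypothetical call mirroring the audit-trail entry described in the prose.
const deltas = applyPriorOverride(
  engagement,
  "revenue_recognition_complexity",
  { alpha: 8, beta: 2 },   // new prior; the old Beta(3, 4) is read from engagement state
  {
    actor: currentUser,    // partner@firm
    rationale:
      "New 7-year licence with multiple POBs and SFC; ASC 606 Step 5 high-risk",
  },
);
// deltas carries the per-account old/new scores (e.g. revenue moving from
// medium to high), which the UI surfaces and Fieldwork picks up downstream.
```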
What we don't want is the partner thinking 'the model says medium, but I think high — and the audit trail is going to read whatever the model said.' That's an alignment failure: the official record diverges from the actual professional judgment, and any future review (engagement quality review, regulator inspection, litigation defence) is going to surface the divergence as a problem. We want the model to make professional judgment easy to encode, easy to apply, and easy to audit. The override mechanism is how that happens.
Interactive deltas: real-time response to overrides
The UI exposes the override flow as a per-factor adjuster: each factor has a slider that maps to its Beta prior strength. The partner moves the slider, the model re-composes the posterior, the affected accounts surface their deltas (old score → new score, with credibility-interval annotations), and the testing strategy panel updates to reflect the new substantive-procedure intensity. The whole loop is sub-3s p95.
Two design choices are worth noting. First, the affected-accounts panel is sorted by absolute delta — biggest movers first. Partners aren't interested in the 80 accounts where the override changed the score by less than the credibility interval; they're interested in the 4 accounts where the override pushed the score across a band boundary. Second, the override is staged: the partner sees the impact before committing. A common pattern is 'override → see impact → adjust → see impact → finalise', which is exactly the right shape for an interactive risk-assessment session.
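A sketch of the first design choice, reusing the RmmScoreDelta shape from the override pseudocode; the band() helper that maps a score to low/medium/high is a hypothetical stand-in:

```typescript
// Biggest absolute movers first, with band-boundary crossings flagged so the
// handful of accounts that actually changed band stand out at the top.
function orderAffectedAccounts(deltas: RmmScoreDelta[]) {
  return deltas
    .map(d => ({ ...d, crossedBand: band(d.newScore) !== band(d.oldScore) }))
    .sort((a, b) => Math.abs(b.delta) - Math.abs(a.delta));
}
```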
A walkthrough: 100-account engagement with a partner override
Here's a concrete walkthrough on a 100-account engagement (a mid-sized US listed retailer). Default scoring under VynFi's Bayesian RMM — no overrides applied yet — produces a distribution: 8 high-risk accounts, 23 medium-risk, 69 low-risk. The partner reviews this and overrides the revenue-recognition complexity prior on the basis of the Q4 contract (described above). The model re-scores. New distribution: 11 high-risk (3 new entrants from the override), 22 medium-risk (2 promotions to high, 1 arrival from low, no demotions), 67 low-risk (2 promotions out).
The audit trail captures the override and the resulting delta on every affected account. The UI surfaces the 4 promoted accounts (revenue, deferred revenue, contract assets, accrued revenue) at the top of the affected-accounts panel — each with its own pre/post score and credibility interval. The testing-strategy panel auto-flags additional substantive procedures for the promoted accounts (extended cut-off testing, contract-by-contract Step 5 walkthrough, deferred-revenue rollforward with significant-financing-component analysis). The Fieldwork state will pick up these procedures when the engagement transitions out of Risk Assessment.
End-to-end elapsed time from override to fully re-scored engagement: 2.4 seconds. The partner moves on to the next override question. Over a 90-minute risk-assessment review meeting, a typical engagement will have 8–15 prior overrides applied, each captured in the audit trail. By the end of the meeting, the engagement has a fully calibrated RMM, all overrides are documented with rationale, and the testing strategy is set for Fieldwork.
What to expect on perf
Closed-form Beta updates are cheap. The bottleneck on a re-score isn't computation; it's I/O — fetching the engagement state, persisting the audit-trail append, surfacing the deltas to the UI. We architect for sub-3s p95 on 100-account engagements; in practice, p50 is closer to 1.2s, p95 around 2.4s, p99 around 3.8s. Engagements with 200+ accounts (very large groups) extend the p95 to 4.5s; we're working on a per-component sharding strategy that should bring those engagements back under 3s by the next Wave.
The perf characteristics matter because they shape the UX. Sub-3s feels interactive; partners will explore overrides freely. Above 5s starts to feel sluggish; partners will batch overrides, miss the real-time delta surfacing, and the 'override → see impact → adjust' loop breaks down. Closed-form Beta is what keeps us in the interactive regime for the engagement sizes that matter most (100 accounts is a typical mid-cap audit; 50 is a small private; 200 is a large group). For the very largest engagements (1000+ accounts in a multinational consolidation), some sluggishness is acceptable — those reviews are typically multi-session anyway.
If you want to see this in action on a sample engagement — your own data, or a synthetic 100-entity group from VynFi's Group Audit catalog — schedule a design partner call. Bringing your firm's RMM framework up against VynFi's Bayesian model on a real engagement is the fastest way to see where they agree (most of the time, encouragingly) and where they diverge (the cases that matter). Background reading: the Audit Firm landing page covers the full v3.0 surface, the Active audit methodology post walks through how RMM scoring fits into the engagement FSM, and the 12-factor RMM taxonomy is the public reference for the factors and their priors.