Journal Entry Forensics: Benford's Law, Anomaly Detection, and Pre-Built Analytics
Use VynFi's pre-built analytics API to validate Benford's Law conformity, inspect amount distributions, and assess process variant entropy — without computing anything client-side.
Journal entry testing is a core procedure on every audit engagement. ISA 240 requires the auditor to respond to the risk of management override of controls, including by testing the appropriateness of journal entries, and the first-digit distribution (Benford's Law) is a standard first-pass screening test. If the leading digits of transaction amounts deviate from the expected logarithmic distribution, something may be off: round-number bias, duplicate entries, or deliberate manipulation.
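Benford's Law predicts that the leading digit d appears with probability P(d) = log10(1 + 1/d), so a 1 leads about 30.1% of amounts and a 9 only about 4.6%. The sketch below is only for intuition, since the analytics API computes the real figures server-side. `benford_expected` and `first_digit` are hypothetical helper names, not part of the VynFi SDK.

```python
import math
from typing import Dict, Optional


def benford_expected() -> Dict[int, float]:
    """Expected first-digit probabilities under Benford's Law: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(amount: float) -> Optional[int]:
    """Leading significant digit of |amount|, or None for a zero amount."""
    x = abs(amount)
    if x == 0:
        return None
    if not math.isfinite(x):
        raise ValueError("amount must be finite")
    # Rescale into [1, 10) so the integer part is the leading digit.
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)


for digit, p in benford_expected().items():
    print(f"{digit}: {p:.1%}")
```

Because the nine probabilities telescope to log10(10), they sum to exactly 1.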
DataSynth 3.1.1 computes Benford's Law conformity, amount distribution statistics, process variant summaries (with rework, skip, and out-of-order rates) and, for banking jobs, banking evaluation metrics as part of every generation run. The results land as pre-built JSON files in the archive, and VynFi's analytics API merges them into a single response, so no client-side computation is needed.
**Update (2026-04-19):** As of DataSynth 3.1.1, fraud-labeled entries carry behavioural signal that real forensic pipelines can actually detect. Weekend-posting lift jumps from ~1× to ~32×, round-dollar lift from ~0× to ~170×, and post-close lift reaches ~3,106× on fraud-marked JEs. Scheme-propagated fraud (document-seeded rings) is now distinguishable from direct line-level injections via `is_fraud_propagated` on every JE header — see the new fraud-split endpoint below.
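The update reports lifts without defining them. A common definition, assumed here, is the rate at which a behavioural flag fires on fraud-labeled entries divided by its rate on clean entries, so a ~32× weekend lift would mean fraud JEs land on weekends about 32 times as often as clean ones. The `JournalEntry` record, its field names, and the round-dollar rule below are hypothetical. Post-close lift would additionally need a period-close calendar, so it is left out of this sketch.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import Callable, List


@dataclass
class JournalEntry:
    # Hypothetical minimal record; real JE headers carry many more fields.
    posting_date: date
    amount: float
    is_fraud: bool


def lift(entries: List[JournalEntry], flag: Callable[[JournalEntry], bool]) -> float:
    """Lift = P(flag | fraud) / P(flag | non-fraud)."""
    fraud = [e for e in entries if e.is_fraud]
    clean = [e for e in entries if not e.is_fraud]
    if not fraud or not clean:
        raise ValueError("need both fraud and non-fraud entries")
    p_fraud = sum(flag(e) for e in fraud) / len(fraud)
    p_clean = sum(flag(e) for e in clean) / len(clean)
    if p_clean == 0:
        # No baseline rate: infinite lift if any fraud entry is flagged, else undefined.
        return math.inf if p_fraud > 0 else math.nan
    return p_fraud / p_clean


def is_weekend(e: JournalEntry) -> bool:
    return e.posting_date.weekday() >= 5  # Saturday = 5, Sunday = 6


def is_round_dollar(e: JournalEntry, multiple: int = 1000) -> bool:
    # Assumption: "round" means an exact multiple of 1,000.
    return e.amount % multiple == 0


entries = [
    JournalEntry(date(2026, 4, 18), 5000.0, True),   # Saturday, round
    JournalEntry(date(2026, 4, 20), 1234.5, True),   # Monday
    JournalEntry(date(2026, 4, 18), 10.0, False),    # Saturday
    JournalEntry(date(2026, 4, 20), 2000.0, False),  # Monday, round
    JournalEntry(date(2026, 4, 21), 30.0, False),
    JournalEntry(date(2026, 4, 22), 45.0, False),
]
print(f"weekend lift: {lift(entries, is_weekend):.1f}x")
print(f"round-dollar lift: {lift(entries, is_round_dollar):.1f}x")
```

A lift near 1× means the flag carries no signal; the jump to double- and triple-digit lifts is what makes these flags usable as detector features.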
Fetching Pre-Built Analytics
```python
import os

import vynfi

client = vynfi.VynFi(api_key=os.environ["VYNFI_API_KEY"])
job = client.jobs.list(status="completed", limit=1).data[0]
analytics = client.jobs.analytics(job.id)

# Benford's Law
b = analytics.benford_analysis
print(f"Benford's Law Analysis ({b.sample_size:,} amounts):")
print(f" MAD: {b.mad:.4f}")
print(f" Chi-squared: {b.chi_squared:.2f} (p={b.p_value:.4f})")
print(f" Conformity: {b.conformity}")
print(f" Passes: {b.passes}")
```

Interpreting the Results
The Mean Absolute Deviation (MAD) measures how far the observed first-digit distribution is from the theoretical Benford distribution. Nigrini's first-digit thresholds are: MAD below 0.006 indicates close conformity, 0.006 to 0.012 acceptable conformity, 0.012 to 0.015 marginally acceptable conformity, and above 0.015 nonconformity. VynFi's synthetic data typically falls in the close-conformity range because the underlying generators draw amounts from log-normal distributions calibrated on real financial data, and a log-normal distribution spread over several orders of magnitude conforms closely to Benford's Law.
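How a MAD value like `b.mad` maps onto these thresholds can be sketched in a few lines. This is a client-side illustration, not VynFi's implementation: it assumes MAD is the average absolute gap between observed and expected proportions over the nine first digits (Nigrini's definition), and `benford_mad` and `nigrini_conformity` are hypothetical names.

```python
import math
from collections import Counter
from typing import Iterable

# Nigrini's first-digit MAD thresholds as (exclusive upper bound, label).
NIGRINI_FIRST_DIGIT = [
    (0.006, "close"),
    (0.012, "acceptable"),
    (0.015, "marginal"),
]


def benford_mad(first_digits: Iterable[int]) -> float:
    """Mean absolute deviation between observed and Benford first-digit proportions."""
    counts = Counter(d for d in first_digits if 1 <= d <= 9)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no first digits in 1..9")
    deviations = (abs(counts[d] / n - math.log10(1 + 1 / d)) for d in range(1, 10))
    return sum(deviations) / 9


def nigrini_conformity(mad: float) -> str:
    """Classify a first-digit MAD using Nigrini's thresholds."""
    for upper, label in NIGRINI_FIRST_DIGIT:
        if mad < upper:
            return label
    return "nonconformity"


# A degenerate sample where every amount starts with 1 is far from Benford.
mad = benford_mad([1] * 100)
print(f"MAD={mad:.4f} -> {nigrini_conformity(mad)}")
```

MAD is preferred over chi-squared for large samples because it does not grow with sample size, so it stays meaningful on millions of journal lines.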
Amount Distribution Statistics
```python
d = analytics.amount_distribution
print(f"Amount Distribution ({d.sample_size:,} amounts):")
print(f" Mean: {d.mean}")
print(f" Median: {d.median}")
print(f" Skewness: {d.skewness:+.3f}")
print(f" Kurtosis: {d.kurtosis:+.3f}")
print(f" Round number ratio: {d.round_number_ratio:.2%}")
if d.fitted_mu is not None:
    print(f" Log-normal fit: mu={d.fitted_mu:.2f}, sigma={d.fitted_sigma:.2f}")
```

Positive skewness combined with high kurtosis is the hallmark of real financial data: many small transactions and a long tail of large ones. If your synthetic data has skewness near zero, its amounts are too symmetric for realistic audit testing. The round-number ratio is the share of amounts that end in 000; a high share is a useful red flag, particularly among manual journal entries.
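For readers who want to see what these fields measure, here is a minimal sketch that computes comparable statistics from a list of amounts. It rests on several assumptions not confirmed by the source: only positive amounts are analysed, skewness and kurtosis use population moments, kurtosis is reported as excess kurtosis (0 for a normal distribution), "round" means a multiple of 1,000, and the log-normal fit is the maximum-likelihood estimate (mean and population standard deviation of log amounts). `amount_stats` is a hypothetical name.

```python
import math
import statistics
from typing import Dict, List


def amount_stats(amounts: List[float], round_multiple: int = 1000) -> Dict[str, float]:
    """Moment statistics, round-number ratio and a log-normal fit for positive amounts."""
    xs = [a for a in amounts if a > 0]  # assumption: analyse positive magnitudes only
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two positive amounts")
    mean = statistics.fmean(xs)
    # Population central moments.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    if m2 == 0:
        raise ValueError("all amounts are identical")
    logs = [math.log(x) for x in xs]
    return {
        "mean": mean,
        "median": statistics.median(xs),
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,
        "round_number_ratio": sum(x % round_multiple == 0 for x in xs) / n,
        # Maximum-likelihood log-normal parameters.
        "fitted_mu": statistics.fmean(logs),
        "fitted_sigma": statistics.pstdev(logs),
    }


# Many small amounts plus one large one: positive skew, heavy tail.
print(amount_stats([12.5, 40.0, 75.0, 120.0, 3000.0, 250000.0]))
```

Library implementations may use bias-corrected sample estimators instead, so small samples can differ slightly from the server-side figures.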
Process Variant Analysis
```python
v = analytics.process_variant_summary
print(f"Process Variants ({v.total_cases:,} cases):")
print(f" Variant count: {v.variant_count}")
print(f" Entropy: {v.variant_entropy:.3f}")
print(f" Happy-path share: {v.happy_path_concentration:.2%}")
print(" Top variants:")
for vid, freq in v.top_variants[:5]:
    print(f" {vid}: {freq:.2%}")
```

High variant entropy means the process has many execution paths, which is typical of complex P2P flows with rework, returns, and partial deliveries. Low entropy with high happy-path concentration means the process is well-controlled. Auditors look for the gap: if a process should be controlled but has high entropy, that's a risk indicator.
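To make "variant entropy" concrete, the sketch below treats each distinct activity sequence as a variant and computes the Shannon entropy of variant frequencies. Two assumptions are labeled here: entropy is measured in bits (DataSynth could use a different log base, which would scale the values by a constant), and the P2P activity names are hypothetical. `variant_summary` is likewise an illustrative name.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def variant_summary(traces: List[Sequence[str]], happy_path: Sequence[str]) -> Dict[str, object]:
    """Variant frequencies, Shannon entropy (bits) and happy-path share for case traces."""
    if not traces:
        raise ValueError("no cases")
    variants = Counter(tuple(t) for t in traces)  # one variant = one distinct activity sequence
    total = sum(variants.values())
    freqs = {v: c / total for v, c in variants.items()}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    return {
        "variant_count": len(variants),
        "variant_entropy": entropy,
        "happy_path_concentration": freqs.get(tuple(happy_path), 0.0),
        "top_variants": sorted(freqs.items(), key=lambda kv: kv[1], reverse=True),
    }


# Hypothetical P2P activities: three happy-path cases, one out-of-order case.
HAPPY = ("create_po", "receive_goods", "receive_invoice", "pay")
OUT_OF_ORDER = ("create_po", "receive_invoice", "receive_goods", "pay")
summary = variant_summary([HAPPY, HAPPY, HAPPY, OUT_OF_ORDER], HAPPY)
print(f"entropy={summary['variant_entropy']:.3f} bits, "
      f"happy-path share={summary['happy_path_concentration']:.0%}")
```

A single-variant process scores 0 bits; k equally frequent variants score log2(k) bits, so entropy grows as executions spread across more paths.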
Banking Evaluation (AML Jobs)
```python
if analytics.banking_evaluation:
    be = analytics.banking_evaluation
    print(f"Banking Evaluation (passes={be.passes}):")
    if be.cross_layer:
        print(f" Fraud propagation: {be.cross_layer.fraud_propagation_rate:.2%}")
    if be.velocity:
        print(f" Velocity coverage: {be.velocity.coverage_rate:.2%}")
    if be.false_positive:
        print(f" FP rate: {be.false_positive.fp_rate:.2%}")
```

For banking/AML jobs, the analytics response includes 10 sub-analyses covering KYC completeness, typology mix, cross-layer fraud propagation, velocity feature quality, false-positive calibration, device fingerprint distributions, sanctions screening, sophistication diversity, lifecycle phase coverage, and network topology structure. Each sub-analysis reports whether the dataset passes its quality gate, so you know before training whether the data has the properties you need. AML typology coverage reaches **0.857** in DataSynth 3.1.1, comfortably above the 0.80 evaluator threshold (was 0.000 in 3.1.0).
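The post does not say how typology coverage is computed. One plausible reading, assumed here, is the share of expected AML typologies that have at least one labeled case, compared against the 0.80 gate. The typology names, the seven-item list, and `typology_coverage` are all hypothetical and say nothing about DataSynth's actual catalogue.

```python
from typing import Iterable, Set

COVERAGE_THRESHOLD = 0.80  # evaluator threshold quoted in the text


def typology_coverage(observed: Iterable[str], expected: Set[str]) -> float:
    """Share of expected typologies with at least one labeled case."""
    if not expected:
        raise ValueError("expected typology set is empty")
    return len(set(observed) & expected) / len(expected)


# Hypothetical catalogue of expected typologies.
EXPECTED = {"structuring", "layering", "funnel_account", "mule_network",
            "round_tripping", "trade_based", "rapid_movement"}
# Typology labels found on generated cases (duplicates do not add coverage).
observed = ["structuring", "layering", "layering", "funnel_account",
            "mule_network", "round_tripping", "trade_based"]

coverage = typology_coverage(observed, EXPECTED)
print(f"coverage={coverage:.3f} passes={coverage >= COVERAGE_THRESHOLD}")
```

Under this reading, coverage rewards breadth rather than volume: generating more cases of an already-covered typology leaves the score unchanged.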
Scheme vs line-level fraud split (DS 3.1.1)
Document-level fraud fans out to every derived journal entry when `fraud.documentFraudRate` is set and `propagate_to_lines` is on. The resulting JEs carry `is_fraud_propagated = true` and `fraud_source_document_id`. This lets you train two detector classes on the same dataset: a cross-document scheme detector on the propagated population and a noise-robust slip-level detector on the direct-injection population. The new endpoint aggregates the split server-side.
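To make the aggregation concrete, here is a minimal client-side sketch of the same split over a list of JE headers. Only `is_fraud_propagated` and `fraud_source_document_id` come from the text; the `JEHeader` type, the `is_fraud` and `fraud_type` fields, the fraud-type labels, and `fraud_split` itself are hypothetical. It also counts propagated JEs per source document as a rough measure of ring size.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class JEHeader:
    je_id: str
    is_fraud: bool                           # hypothetical label field
    fraud_type: Optional[str]                # hypothetical label field
    is_fraud_propagated: bool                # named in the DataSynth 3.1.1 notes
    fraud_source_document_id: Optional[str]  # named in the DataSynth 3.1.1 notes


def fraud_split(headers: List[JEHeader]) -> Dict[str, object]:
    """Split fraud JEs into scheme-propagated and direct-injection populations."""
    fraud = [h for h in headers if h.is_fraud]
    scheme = sum(h.is_fraud_propagated for h in fraud)
    by_type: Dict[str, Counter] = defaultdict(Counter)
    for h in fraud:
        bucket = "scheme_propagated" if h.is_fraud_propagated else "direct_injection"
        key = h.fraud_type or "unknown"
        by_type[key]["total"] += 1
        by_type[key][bucket] += 1
    # Ring size: number of propagated JEs that share one source document.
    rings = Counter(h.fraud_source_document_id for h in fraud if h.is_fraud_propagated)
    return {
        "fraud_entries": len(fraud),
        "scheme_propagated": scheme,
        "direct_injection": len(fraud) - scheme,
        "propagation_rate": scheme / len(fraud) if fraud else 0.0,
        "by_fraud_type": dict(by_type),
        "ring_sizes": dict(rings),
    }
```

Filtering on `is_fraud_propagated` gives the two training populations directly; the SDK call that follows returns the same kind of totals without downloading the JEs.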
```python
# VynFi Python SDK 1.5.1+
split = client.jobs.fraud_split(job.id)
print(f"Total fraud JEs: {split.fraud_entries:,}")
print(f"Scheme-propagated: {split.scheme_propagated:,} ({split.propagation_rate:.1%})")
print(f"Direct injection: {split.direct_injection:,}")
for fraud_type, counts in split.by_fraud_type.items():
    print(f" {fraud_type:30} total={counts.total:5} scheme={counts.scheme_propagated:4} direct={counts.direct_injection:4}")
```

Worked examples and the regenerated dataset
The VynFi Python SDK 1.5.1 ships three worked examples that exercise this post end-to-end: `examples/ml_training_pipeline.py` (fraud-split stratification), `examples/behavioral_fraud_patterns.py` (weekend / round-dollar / post-close lift verification), and `examples/02_audit_data_deep_dive.ipynb` (interactive Benford + variant notebook). Regenerated journal-entry datasets are published on Hugging Face: VynFi/vynfi-journal-entries-1m (2.1M lines, manufacturing, 12 periods) and VynFi/vynfi-audit-p2p (document-flow fraud with `is_fraud_propagated`).