Methodology
How VynFi generates statistically faithful synthetic financial data
The DataSynth Engine
VynFi is powered by DataSynth, a Rust engine purpose-built for high-throughput synthetic financial data generation.
- **16** Rust crates
- **100K+ rows/sec** throughput
- **Rust** implementation language
- **Proprietary** license
Architecture Layers
API Layer
Axum HTTP server with rate limiting and auth middleware
Orchestration
Job queue management, credit estimation, and webhook dispatch
Generation Core
Schema resolution, distribution sampling, correlation injection
Output Layer
Format serialization (JSON, CSV, Parquet) and compression
Generation Pipeline
Every generation request flows through a 5-step pipeline that transforms schema definitions into statistically faithful datasets.
Schema Selection
Resolves the target sector and table definitions. Loads column schemas, data types, and constraint rules from the catalog registry.
Distribution Sampling
Generates base values using statistical distributions calibrated to real-world financial data. Supports normal, log-normal, Poisson, and custom empirical distributions.
Correlation Injection
Applies cross-column and cross-table correlations. Ensures debits balance credits, foreign keys resolve correctly, and temporal sequences are coherent.
Anomaly Insertion
Optionally injects realistic anomalies (duplicate entries, round-number bias, off-hours transactions) with configurable frequency and labels.
Validation
Runs Benford compliance checks, referential integrity validation, and statistical quality scoring before returning the final dataset.
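The five steps above can be sketched in simplified Python. Every name here (`sample_base_values`, `inject_correlation`, and so on) is illustrative, not the actual DataSynth API, and the real engine runs in Rust:

```python
import random

# Step 1 (schema selection) reduced to a static definition for the sketch.
SCHEMA = {"table": "journal_entries", "columns": ["entry", "leg", "amount"]}

def sample_base_values(n, mu=6.0, sigma=1.2):
    """Step 2: draw base monetary amounts from a log-normal distribution."""
    return [round(random.lognormvariate(mu, sigma), 2) for _ in range(n)]

def inject_correlation(amounts):
    """Step 3: give each entry a balancing debit leg and credit leg."""
    lines = []
    for i, amt in enumerate(amounts):
        lines.append({"entry": i, "leg": "debit", "amount": amt})
        lines.append({"entry": i, "leg": "credit", "amount": -amt})
    return lines

def inject_anomalies(lines, rate=0.02):
    """Step 4: duplicate whole entries (both legs) as labeled anomalies."""
    by_entry = {}
    for line in lines:
        by_entry.setdefault(line["entry"], []).append(line)
    extras = []
    for legs in by_entry.values():
        if random.random() < rate:
            extras += [dict(leg, anomaly="duplicate") for leg in legs]
    return lines + extras

def validate(lines):
    """Step 5: every entry's legs must net to zero (debit-credit balance)."""
    totals = {}
    for line in lines:
        totals[line["entry"]] = totals.get(line["entry"], 0.0) + line["amount"]
    return all(abs(t) < 1e-9 for t in totals.values())
```

Note that the anomaly step duplicates both legs of an entry, so the dataset stays balanced while still carrying labeled duplicates for detection tasks.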
Statistical Models
The generation core uses several statistical models to produce realistic financial data.
Real-World Calibration
VynFi's distributions and statistical parameters are derived from extensive analysis of real-world financial data.
- **155** real-world datasets analyzed across 10 industry sectors for distribution calibration and statistical benchmarking
- **364M** journal entries in the calibration corpus used to derive realistic financial patterns and temporal dynamics
- **2.4B** line items processed to build inter-table correlation models and cross-entity relationship graphs
Copula Families
VynFi uses 5 copula families to model complex dependencies between financial variables. Each family captures different tail dependency and correlation structures.
| Copula Family | Tail Dependency | Use Case |
|---|---|---|
| Gaussian | None (symmetric) | General-purpose modeling of smooth, symmetric correlations between financial variables |
| Clayton | Lower tail | Correlated loss events and downside risk scenarios where defaults tend to cluster |
| Gumbel | Upper tail | Extreme revenue spikes and co-movement in high-value transactions |
| Frank | None (symmetric) | Weak to moderate dependencies without tail concentration; balanced risk profiles |
| Student-t | Both tails | Fat-tailed financial distributions with joint extreme events in stress-testing scenarios |
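As an illustration of how a copula separates dependency structure from marginals, here is a minimal stdlib-Python sketch of the Gaussian family (the simplest of the five). The function name is illustrative, not a VynFi API:

```python
import math
import random

def gaussian_copula_pairs(rho, n, seed=7):
    """Draw n correlated uniform pairs (u, v) via a Gaussian copula.

    rho is the correlation of the underlying latent normal variables.
    """
    rng = random.Random(seed)

    def norm_cdf(x):
        # Standard normal CDF expressed through the error function.
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Condition the second latent normal on the first to induce rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((norm_cdf(z1), norm_cdf(z2)))
    return pairs
```

Each uniform is then pushed through the inverse CDF of the desired marginal (for example a log-normal for transaction amounts) to produce correlated financial variables. The other four families replace only the latent-dependency step, which is how Clayton and Gumbel introduce lower- and upper-tail clustering respectively.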
Coherence Validators
15 coherence validators run on every generated dataset to ensure cross-table consistency and referential integrity.
| # | Validator | Description |
|---|---|---|
| 01 | Debit-Credit Balance | Every journal entry sums to zero across debit and credit legs |
| 02 | Trial Balance | Total debits equal total credits across the general ledger |
| 03 | Foreign Key Integrity | All references resolve to valid records in related tables |
| 04 | Temporal Ordering | Document dates follow logical sequence (PO before GR before Invoice) |
| 05 | Period Boundaries | Entries fall within valid fiscal periods and calendar constraints |
| 06 | Currency Consistency | FX amounts reconcile with exchange rates and base currency |
| 07 | Account Hierarchy | Posted accounts exist in the chart of accounts and follow hierarchy rules |
| 08 | Subledger Reconciliation | AR/AP/FA/INV subledger totals match corresponding GL control accounts |
| 09 | Document Numbering | Sequential document IDs with no gaps or duplicates within each series |
| 10 | Tax Calculation | Tax amounts match rate schedules for the jurisdiction and line items |
| 11 | Intercompany Elimination | IC transaction pairs balance and eliminate correctly in consolidation |
| 12 | Aging Consistency | Receivable/payable aging buckets sum to outstanding balances |
| 13 | Quantity-Value Match | Inventory quantities times unit costs equal total values |
| 14 | Approval Chain | Transactions above threshold have the required authorization records |
| 15 | Entity Cross-Reference | Multi-entity datasets maintain consistent entity identifiers across all tables |
Datasets that fail any validator are automatically rejected and regenerated. Validation results are included in the quality report for every job.
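Two of the validators above are simple enough to sketch directly. These are illustrative stdlib-Python reimplementations of the checks' logic, not the engine's Rust code:

```python
def trial_balance(lines, tol=0.005):
    """Validator 02: total debits must equal total credits across the ledger."""
    debits = sum(l["amount"] for l in lines if l["amount"] > 0)
    credits = -sum(l["amount"] for l in lines if l["amount"] < 0)
    return abs(debits - credits) <= tol

def foreign_key_integrity(lines, chart_of_accounts):
    """Validator 03: every posted account must resolve to a known account.

    Returns the offending lines, so a failed check can drive regeneration.
    """
    valid = set(chart_of_accounts)
    return [l for l in lines if l["account"] not in valid]
```

Returning the offending records rather than a bare boolean mirrors the reject-and-regenerate loop described above: the engine needs to know which rows to resample.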
Benford's Law Compliance
VynFi achieves excellent Benford's Law conformity across all monetary fields, a critical quality metric for financial data realism.
How It Works
The engine calibrates first-digit frequencies to match the expected distribution p(d) = log10(1 + 1/d). During validation, MAD is computed as the average absolute difference between observed and expected first-digit proportions. A MAD below 0.006 is classified as "close conformity" per Nigrini's threshold table. VynFi consistently achieves this benchmark across all sectors and monetary value columns.
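The MAD computation described above can be written out in a few lines of stdlib Python (illustrative code, not the engine's implementation):

```python
import math
import random
from collections import Counter

def first_digit(x):
    """Leading significant digit of a nonzero number."""
    x = abs(x)
    while x < 1:
        x *= 10
    while x >= 10:
        x /= 10
    return int(x)

def benford_mad(values):
    """Mean absolute deviation from Benford's p(d) = log10(1 + 1/d)."""
    expected = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
    counts = Counter(first_digit(v) for v in values if v != 0)
    n = sum(counts.values())
    return sum(abs(counts.get(d, 0) / n - expected[d]) for d in range(1, 10)) / 9
```

A quick sanity check: amounts whose logarithms are uniformly distributed (e.g. `10 ** random.uniform(0, 4)`) conform to Benford's Law exactly, so their MAD should land well under the 0.006 close-conformity threshold.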
Fingerprint System
VynFi Fingerprints capture the statistical DNA of a real dataset without storing any actual records. Upload your data to create a .dsf fingerprint, then use it to generate unlimited synthetic data that matches your production distributions.
Fingerprint Details
| Attribute | Detail |
|---|---|
| Format | .dsf (DataSynth Fingerprint) |
| Structure | ZIP archive containing schema, distributions, and correlation matrices |
| Encryption | AES-256-GCM with per-fingerprint key wrapping |
| Licensing | Fingerprints are licensed per-organization with usage metering |
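Since a .dsf file is a ZIP archive, its outer structure can be handled with standard tooling. The sketch below is hypothetical: the member name `schema.json` is an assumption for illustration (the real archive layout is not documented here), and real fingerprints are AES-256-GCM encrypted, which this sketch skips entirely:

```python
import io
import json
import zipfile

def read_fingerprint_schema(dsf_bytes):
    """Open a .dsf archive (plain ZIP here) and parse its schema member.

    ASSUMPTION: the member name 'schema.json' is illustrative only;
    decryption of the real format is intentionally omitted.
    """
    with zipfile.ZipFile(io.BytesIO(dsf_bytes)) as zf:
        return json.loads(zf.read("schema.json"))
```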
Quality Evaluation
Every generated dataset is scored across three dimensions to ensure it meets production-grade quality standards.
Fidelity
Measures how closely the synthetic data mirrors the statistical properties of real-world financial data. Evaluated using KS tests, Wasserstein distance, and correlation matrix similarity.
Utility
Assesses whether the synthetic data produces equivalent results when used for downstream tasks (model training, analytics, testing). Measured via train-on-synthetic/test-on-real benchmarks.
Privacy
Verifies that no individual record in the synthetic dataset can be linked back to a real entity. Assessed with membership inference attacks and nearest-neighbor distance ratios.
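The fidelity metrics listed above are standard statistical tests. As one concrete example, the two-sample Kolmogorov-Smirnov statistic (the maximum gap between two empirical CDFs) fits in a few lines of stdlib Python; this is a generic implementation, not VynFi's scoring code:

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max vertical gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in set(a) | set(b):
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d
```

A statistic near 0 means the synthetic and real marginals are nearly indistinguishable; near 1 means they barely overlap.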
Credit Formula
Credits consumed per request are calculated deterministically so you always know the cost before generating.
Formula
credits = rows × base_rate × sector_mult × label_mult

Base Rates
| Data Type | Rate | Unit |
|---|---|---|
| Journal entries | 1 credit | per row |
| Chart of accounts | 0.5 credits | per account |
| Master data | 1 credit | per record |
| Document flow chain | 5 credits | per chain |
| Intercompany matched pairs | 8 credits | per pair |
| Full P2P cycle | 10 credits | per cycle |
| Banking/KYC profile | 3 credits | per customer |
| OCEL 2.0 event log | 2 credits | per event |
| Audit workpaper package | 15 credits | per engagement |
Worked Example
Generate 10,000 journal entries for a curated banking sector pack with anomaly labels:
- rows = 10,000
- base_rate = 1 credit/row (journal entries)
- sector_mult = 1.5× (curated sector pack)
- label_mult = 1.3× (anomaly labels)

credits = 10,000 × 1 × 1.5 × 1.3 = 19,500 credits
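Because the formula is deterministic, cost can be pre-computed client-side. A minimal sketch (the function name is illustrative, not part of any VynFi SDK):

```python
import math

def estimate_credits(rows, base_rate, sector_mult=1.0, label_mult=1.0):
    """credits = rows x base_rate x sector_mult x label_mult."""
    return rows * base_rate * sector_mult * label_mult
```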