DataSynth 3.0

Neural Synthetic Financial Data

Score networks, tabular transformers, and GNN edge predictors — combined into a neural backend that captures the joint distributions rule-based engines miss.

Why Rule-Based Engines Miss Joint Distributions

Financial data has complex multi-way dependencies that statistical methods approximate poorly

Rule-Based Limits

Statistical engines generate columns independently or with pairwise correlations. They miss the joint distributions that make financial data realistic — the 5-way dependencies between amount, timing, counterparty, currency, and approval chain.
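To make the gap concrete, here is a minimal sketch on toy data (not DataSynth output). Three hypothetical binary flags are built so that the third is the XOR of the first two. Every pairwise correlation is close to zero, so a generator that matches only marginals and pairwise correlations cannot see the three-way rule.

```python
import math
import random

random.seed(0)

def pearson(x, y):
    """Plain Pearson correlation, written out to keep the sketch dependency-free."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Toy records: c is fully determined by (a, b) through a three-way rule.
rows = []
for _ in range(10_000):
    a = random.randint(0, 1)
    b = random.randint(0, 1)
    rows.append((a, b, a ^ b))

a_col, b_col, c_col = (list(col) for col in zip(*rows))

# Pairwise view: all three correlations are near zero.
pairwise = {
    "a,b": pearson(a_col, b_col),
    "a,c": pearson(a_col, c_col),
    "b,c": pearson(b_col, c_col),
}

# Joint view: the rule holds on every real row.
real_valid = sum(c == a ^ b for a, b, c in rows) / len(rows)

# Column-independent generator: keeps each marginal, loses the rule.
independent = [
    (random.choice(a_col), random.choice(b_col), random.choice(c_col))
    for _ in rows
]
synth_valid = sum(c == a ^ b for a, b, c in independent) / len(independent)

print(pairwise, real_valid, synth_valid)
```

The independent sampler satisfies the rule on only about half of its rows, even though its per-column statistics look right.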

Distribution Mismatch

Copula-based methods typically pair parametric marginals with a parametric dependence structure such as a Gaussian copula. Real financial data has heavy tails, regime switches, and conditional heteroskedasticity that these parametric families capture poorly.
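The heavy-tail point can be illustrated with a small sketch. It draws toy "returns" from a Student-t distribution with 3 degrees of freedom (an assumption chosen only because it is heavy-tailed), fits a normal distribution to them, and compares how often values land beyond four standard deviations with what the fitted normal predicts.

```python
import math
import random
from statistics import NormalDist, mean, stdev

random.seed(42)

def student_t(df):
    """Student-t draw built as Z / sqrt(chi2 / df); chi2(df) is Gamma(df/2, scale=2)."""
    z = random.gauss(0.0, 1.0)
    chi2 = random.gammavariate(df / 2, 2.0)
    return z / math.sqrt(chi2 / df)

# Toy heavy-tailed sample standing in for real returns.
returns = [student_t(3) for _ in range(50_000)]

# Fit a normal distribution by matching mean and standard deviation.
mu, sigma = mean(returns), stdev(returns)
fitted = NormalDist(mu, sigma)

# Share of observations more than 4 sigma from the mean.
threshold = 4 * sigma
empirical_tail = sum(abs(r - mu) > threshold for r in returns) / len(returns)

# Share the fitted normal expects in the same region (both tails).
normal_tail = 2 * (1 - fitted.cdf(mu + threshold))

print(f"empirical: {empirical_tail:.5f}  fitted normal: {normal_tail:.5f}")
```

The fitted normal underestimates the tail mass substantially, which is the kind of mismatch the paragraph above describes.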

Missing Graph Structure

Financial transactions form networks — entity-to-entity edges with temporal ordering. Tabular generators produce rows in isolation, destroying the relational structure ML models need.

Temporal Incoherence

Sequential financial events must respect ordering constraints, business calendars, and settlement cycles. Row-level shuffling breaks the temporal fabric that time-series models depend on.
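As a rough illustration of the ordering problem, the sketch below builds toy trades that settle two business days after the trade date (a T+2 rule assumed for the example, with weekends only and no holiday calendar). It then shuffles the settlement column independently of the trade dates and counts how many rows still satisfy the rule.

```python
import random
from datetime import date, timedelta

random.seed(1)

def add_business_days(start: date, days: int) -> date:
    """Step forward day by day, counting only Monday to Friday."""
    current = start
    while days > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 .. Friday=4
            days -= 1
    return current

def settles_t_plus_2(trade: date, settle: date) -> bool:
    return settle == add_business_days(trade, 2)

# Toy trades on business days, each with a consistent settlement date.
base = date(2024, 1, 1)  # a Monday
trades = []
for _ in range(200):
    trade_date = base + timedelta(days=random.randrange(0, 90))
    if trade_date.weekday() >= 5:
        trade_date = add_business_days(trade_date, 1)  # roll weekend to Monday
    trades.append((trade_date, add_business_days(trade_date, 2)))

valid_before = all(settles_t_plus_2(t, s) for t, s in trades)

# Shuffle one column on its own, as a row-agnostic generator might.
settlements = [s for _, s in trades]
random.shuffle(settlements)
shuffled = list(zip((t for t, _ in trades), settlements))
valid_after = sum(settles_t_plus_2(t, s) for t, s in shuffled) / len(shuffled)

print(valid_before, valid_after)
```

Each column keeps exactly the same values after the shuffle, yet only a small fraction of rows still respect the settlement cycle.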

Neural Architecture

Three specialized networks, each handling a different aspect of financial data structure

Score Networks

Diffusion-based score matching for continuous financial attributes. The network learns the gradient of the log-density (the score), which allows sampling from complex multimodal distributions while largely avoiding the mode collapse seen in adversarial generators.

Trained on anonymized financial distributions, fine-tuned per sector pack
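To show how a score function drives sampling, here is a minimal sketch that uses unadjusted Langevin dynamics on a one-dimensional, two-mode toy density. The closed-form score of a Gaussian mixture stands in for a trained score network. The mixture, step size, and step count are illustrative assumptions, not DataSynth internals.

```python
import math
import random

random.seed(0)

MEANS = (-3.0, 3.0)  # two modes of the toy density
SIGMA = 1.0

def score(x):
    """d/dx log p(x) for an equal-weight mixture of N(-3, 1) and N(3, 1)."""
    logits = [-((x - m) ** 2) / (2 * SIGMA**2) for m in MEANS]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    # Responsibility-weighted average of each component's own score.
    return sum(w / total * (m - x) / SIGMA**2 for w, m in zip(weights, MEANS))

def langevin_sample(steps=300, step_size=0.05):
    """x <- x + (eps / 2) * score(x) + sqrt(eps) * noise, from a broad start."""
    x = random.gauss(0.0, 5.0)
    for _ in range(steps):
        noise = random.gauss(0.0, 1.0)
        x += 0.5 * step_size * score(x) + math.sqrt(step_size) * noise
    return x

samples = [langevin_sample() for _ in range(1000)]

left_share = sum(s < 0 for s in samples) / len(samples)
mean_abs = sum(abs(s) for s in samples) / len(samples)
print(f"share in left mode: {left_share:.2f}, mean |x|: {mean_abs:.2f}")
```

Both modes end up populated in roughly equal proportion, because the sampler only follows the score and injected noise rather than a generator that could settle on a single mode.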

Tabular Transformers

Attention-based architecture for mixed-type tabular data. Handles the categorical-continuous interactions that define financial records — entity types, currency codes, approval hierarchies alongside amounts and dates.

Self-attention across columns lets every feature condition on every other, and stacked layers capture higher-order interactions
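A stripped-down sketch of attention across columns, treating each column of one record as a token. The embeddings are hypothetical, and the learned query, key, and value projections of a real transformer are left out to keep the mechanics visible.

```python
import math

# Hypothetical 3-dimensional embeddings for four columns of one record.
columns = {
    "amount":      [0.9, 0.1, 0.0],
    "currency":    [0.8, 0.2, 0.1],
    "entity_type": [0.0, 1.0, 0.2],
    "approver":    [0.1, 0.9, 0.3],
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """Scaled dot-product attention with identity projections (no learned weights)."""
    names = list(embeddings)
    dim = len(embeddings[names[0]])
    weights, outputs = {}, {}
    for query in names:
        scores = [
            dot(embeddings[query], embeddings[key]) / math.sqrt(dim)
            for key in names
        ]
        row = softmax(scores)
        weights[query] = dict(zip(names, row))
        # New representation: attention-weighted mix of all column embeddings.
        outputs[query] = [
            sum(w * embeddings[key][j] for w, key in zip(row, names))
            for j in range(dim)
        ]
    return weights, outputs

weights, outputs = self_attention(columns)
for name, row in weights.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

In this toy setup "amount" attends more strongly to "currency" than to "entity_type", simply because their embeddings are more aligned. A trained model would learn such affinities from data.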

GNN Edge Predictors

Graph neural networks that generate realistic transaction networks. They predict edge existence and edge attributes between entities while preserving degree distributions, community structure, and temporal motifs.

Message-passing architecture respects entity relationship constraints
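The idea behind message passing followed by edge prediction can be sketched on a six-node toy graph. This version uses one round of mean aggregation and an untrained dot-product scorer. The node features, the graph, and the scoring rule are all assumptions for illustration, not the shipped architecture.

```python
import math

# Two toy communities (nodes 0-2 and 3-5) joined by a single bridge edge 2-3.
features = {
    0: [1.0, 0.0], 1: [1.0, 0.0], 2: [1.0, 0.0],
    3: [0.0, 1.0], 4: [0.0, 1.0], 5: [0.0, 1.0],
}
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

def message_pass(features, edges):
    """One round of mean aggregation over each node and its neighbours."""
    neighbourhood = {node: [node] for node in features}  # include self
    for u, v in edges:
        neighbourhood[u].append(v)
        neighbourhood[v].append(u)
    dim = len(next(iter(features.values())))
    return {
        node: [
            sum(features[m][d] for m in members) / len(members)
            for d in range(dim)
        ]
        for node, members in neighbourhood.items()
    }

def edge_probability(hidden, u, v):
    """Sigmoid of the dot product between two node embeddings."""
    logit = sum(a * b for a, b in zip(hidden[u], hidden[v]))
    return 1.0 / (1.0 + math.exp(-logit))

hidden = message_pass(features, edges)

p_same = edge_probability(hidden, 0, 1)   # same community
p_cross = edge_probability(hidden, 0, 4)  # different communities
print(f"same community: {p_same:.3f}, cross community: {p_cross:.3f}")
```

Even without training, the aggregated embeddings make within-community edges score higher than cross-community ones, which is the signal a trained edge predictor would refine.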

Three Generation Modes

Choose your speed-fidelity tradeoff with a single parameter

Statistical

Default

Pure DataSynth rule engine. Fastest generation (100K+ rows/sec), deterministic output, full reproducibility. Best for high-volume testing where speed matters more than distributional fidelity.

100K+ rows/sec
Deterministic
1 credit/1K rows

Neural

New in 3.0

Full neural backend. Score networks for continuous columns, tabular transformers for mixed types, GNN for entity graphs. Highest distributional fidelity, captures joint distributions and tail behavior.

5K rows/sec
Stochastic
8 credits/1K rows

Hybrid

Recommended

Configurable blend weight between statistical and neural. The rule engine handles structure and constraints while neural components refine distributions. Balance speed and fidelity per use case.

10K-50K rows/sec
Blend weight 0.0-1.0
2-6 credits/1K rows
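For budgeting, the per-mode credit rates listed above turn into a quick estimate. This sketch assumes credits scale linearly with row count and ignores any rounding or minimum charge, neither of which is specified here.

```python
# Credits per 1,000 rows as (low, high), taken from the three modes above.
CREDITS_PER_1K = {
    "statistical": (1, 1),
    "neural": (8, 8),
    "hybrid": (2, 6),
}

def credit_range(rows: int, mode: str) -> tuple[float, float]:
    """Estimated (low, high) credit cost, assuming linear pro-rata billing."""
    low, high = CREDITS_PER_1K[mode]
    thousands = rows / 1000
    return (low * thousands, high * thousands)

for mode in CREDITS_PER_1K:
    print(mode, credit_range(25_000, mode))
# statistical (25.0, 25.0)
# neural (200.0, 200.0)
# hybrid (50.0, 150.0)
```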

Python SDK Example

import vynfi

client = vynfi.Client(api_key="vf_live_...")

# Generate with neural backend (hybrid mode)
job = client.generate(
    sector="banking",
    module="transactions",
    rows=25_000,
    backend="hybrid",
    blend_weight=0.7,  # 70% neural, 30% statistical
    neural_config={
        "score_network": True,
        "tabular_transformer": True,
        "gnn_edges": True,
        "temperature": 0.9,
    },
    output_format="parquet",
)

# Verify distributional fidelity
stats = job.quality_report()
print(f"KL divergence: {stats['kl_divergence']:.4f}")
print(f"Joint distribution score: {stats['joint_score']:.3f}")
print(f"Graph structure score: {stats['graph_score']:.3f}")
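For intuition about the first metric, here is one common way to estimate KL divergence between a real and a synthetic categorical column. This is a generic sketch with a hypothetical helper name. It is not a description of how quality_report() computes its value.

```python
import math
from collections import Counter

def kl_divergence(real, synthetic, eps=1e-9):
    """KL(real || synthetic) over observed categories, with additive smoothing."""
    categories = set(real) | set(synthetic)
    real_counts, synth_counts = Counter(real), Counter(synthetic)
    real_total = len(real) + eps * len(categories)
    synth_total = len(synthetic) + eps * len(categories)
    total = 0.0
    for category in categories:
        p = (real_counts[category] + eps) / real_total
        q = (synth_counts[category] + eps) / synth_total
        total += p * math.log(p / q)
    return total

# Toy currency columns: the synthetic one over-represents USD.
real = ["USD"] * 50 + ["EUR"] * 30 + ["GBP"] * 20
close = ["USD"] * 52 + ["EUR"] * 29 + ["GBP"] * 19
skewed = ["USD"] * 90 + ["EUR"] * 5 + ["GBP"] * 5

kl_close = kl_divergence(real, close)
kl_skewed = kl_divergence(real, skewed)
print(f"close: {kl_close:.4f}  skewed: {kl_skewed:.4f}")
```

Lower values mean the synthetic column tracks the real one more closely. Identical distributions give zero.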

Neural · Beyond rules

Generate the distributions your models actually need.

5,000 free credits to start. No credit card. Ground truth by construction.
