Neural Synthetic Financial Data
Score networks, tabular transformers, and GNN edge predictors — combined into a neural backend that captures the joint distributions rule-based engines miss.
Why Rule-Based Engines Miss Joint Distributions
Financial data has complex multi-way dependencies that statistical methods approximate poorly
Rule-Based Limits
Statistical engines generate columns independently or with pairwise correlations. They miss the joint distributions that make financial data realistic — the 5-way dependencies between amount, timing, counterparty, currency, and approval chain.
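To make the point concrete, here is a small sketch using only the Python standard library. The toy data and the `approval_level` policy are hypothetical, chosen purely for illustration: shuffling a column on its own keeps its marginal distribution intact while erasing its dependence on the other columns.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def approval_level(amount):
    # Hypothetical policy: larger payments need a longer approval chain.
    if amount < 1_000:
        return 1
    if amount < 10_000:
        return 2
    return 3

rng = random.Random(0)
log_amounts = [rng.gauss(7.0, 1.5) for _ in range(5_000)]
levels = [approval_level(math.exp(a)) for a in log_amounts]

# "Independent columns": resample one column on its own by shuffling it.
# Its marginal is preserved exactly; the pairing with amount is lost.
indep_levels = levels[:]
rng.shuffle(indep_levels)

real_corr = pearson(log_amounts, levels)
indep_corr = pearson(log_amounts, indep_levels)
print(f"real: {real_corr:.2f}  independent: {indep_corr:.2f}")
```

A histogram of either column looks identical before and after the shuffle, so a per-column quality check would not notice the difference.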
Distribution Mismatch
Copula-based methods typically pair parametric marginals with a parametric dependence structure, such as a Gaussian copula. Real financial data has heavy tails, regime switches, and conditional heteroskedasticity that these parametric families capture poorly.
Missing Graph Structure
Financial transactions form networks — entity-to-entity edges with temporal ordering. Tabular generators produce rows in isolation, destroying the relational structure ML models need.
Temporal Incoherence
Sequential financial events must respect ordering constraints, business calendars, and settlement cycles. Row-level shuffling breaks the temporal fabric that time-series models depend on.
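As an illustration of one such constraint, the sketch below checks a settlement date against a trade date. It assumes a two-business-day cycle and a calendar with weekends only (no holidays); both are simplifications for the example, and the function names are hypothetical.

```python
from datetime import date, timedelta

def add_business_days(start, n):
    """Return the date n business days after start (weekends skipped, holidays ignored)."""
    current = start
    remaining = n
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current

def is_coherent(trade_date, settle_date, cycle_days=2):
    """Check that settlement lands exactly cycle_days business days after the trade."""
    return settle_date == add_business_days(trade_date, cycle_days)

friday = date(2024, 3, 1)
print(add_business_days(friday, 2))           # 2024-03-05, the following Tuesday
print(is_coherent(friday, date(2024, 3, 3)))  # False: 3 March 2024 is a Sunday
```

A generator that draws trade and settlement dates as unrelated columns will fail checks like this on a large share of rows.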
Neural Architecture
Three specialized networks, each handling a different aspect of financial data structure
Score Networks
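Diffusion-based score matching for continuous financial attributes. The network learns the gradient of the log-density (the score), which lets the sampler draw from complex multimodal distributions while avoiding the mode collapse common in adversarial training. Sampling is iterative and approximate rather than exact.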
Trained on anonymized financial distributions, fine-tuned per sector pack
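The idea of sampling from a score can be sketched in a few lines. This example is not the product's score network: it swaps the learned network for the analytic score of a hypothetical two-mode Gaussian mixture and uses plain unadjusted Langevin dynamics, so that the whole thing runs with the standard library alone.

```python
import math
import random

MEANS = (-2.0, 2.0)  # two modes, equal weight (toy target, not real data)
SIGMA = 0.5

def score(x):
    """Analytic d/dx log p(x) for the equal-weight two-Gaussian mixture."""
    logits = [-((x - m) ** 2) / (2 * SIGMA ** 2) for m in MEANS]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    # Responsibility-weighted pull toward each component mean.
    return sum(w / total * (m - x) / SIGMA ** 2 for w, m in zip(weights, MEANS))

def langevin_sample(rng, steps=300, step_size=0.02):
    """Unadjusted Langevin dynamics: x <- x + (eps/2)*score(x) + sqrt(eps)*noise."""
    x = rng.gauss(0.0, 3.0)  # broad initial guess
    for _ in range(steps):
        x += 0.5 * step_size * score(x) + math.sqrt(step_size) * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(0)
samples = [langevin_sample(rng) for _ in range(1_000)]
frac_right = sum(s > 0 for s in samples) / len(samples)
mean_abs = sum(abs(s) for s in samples) / len(samples)
print(f"share in right-hand mode: {frac_right:.2f}, mean |x|: {mean_abs:.2f}")
```

Both modes receive samples because the sampler only follows the local gradient of the log-density. In a trained model, `score` would be a neural network evaluated at several noise levels.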
Tabular Transformers
Attention-based architecture for mixed-type tabular data. Handles the categorical-continuous interactions that define financial records — entity types, currency codes, approval hierarchies alongside amounts and dates.
Self-attention across columns captures feature interactions at any order
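A minimal sketch of attention across the columns of one record follows. It assumes each column has already been embedded as a small vector (the values here are made up) and uses a single head with no learned projections, so Q, K, and V are all the raw embeddings. A real tabular transformer would learn those projections and stack several layers.

```python
import math

def softmax(row):
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def self_attention(tokens):
    """Single-head attention with Q = K = V = tokens (no learned projections)."""
    d = len(tokens[0])
    weights = [softmax([dot(q, k) / math.sqrt(d) for k in tokens]) for q in tokens]
    mixed = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return weights, mixed

# Hypothetical 4-dimensional embeddings, one per column of a single record.
columns = ["amount", "currency", "approval_level"]
tokens = [
    [0.9, 0.1, 0.0, 0.3],  # amount (continuous, embedded)
    [0.0, 1.0, 0.2, 0.0],  # currency (categorical, embedded)
    [0.8, 0.0, 0.1, 0.4],  # approval_level (categorical, embedded)
]

weights, mixed = self_attention(tokens)
for name, row in zip(columns, weights):
    print(name, [round(w, 2) for w in row])
```

Each output vector in `mixed` is a weighted blend of every column's embedding, which is how one layer lets any column condition on any other regardless of type.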
GNN Edge Predictors
Graph neural networks that generate realistic transaction networks. Predicts edge existence and attributes between entities, preserving degree distributions, community structure, and temporal motifs.
Message-passing architecture respects entity relationship constraints
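The edge-prediction idea can be sketched on a toy graph. This is an assumption-laden illustration, not the product's model: one round of mean-aggregation message passing with fixed 0.5/0.5 mixing instead of learned weights, followed by a dot-product edge score. The entity names and features are hypothetical.

```python
import math

# Toy entity graph: A, B, C form one cluster; D, E another; C-D bridges them.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]
features = {
    "A": [1.0, 0.0], "B": [1.0, 0.0], "C": [1.0, 0.0],
    "D": [0.0, 1.0], "E": [0.0, 1.0],
}

def neighbors(node):
    return [v if u == node else u for u, v in edges if node in (u, v)]

def message_pass(feats):
    """One round: average a node's own features with the mean of its neighbors'."""
    updated = {}
    for node, own in feats.items():
        nbrs = [feats[n] for n in neighbors(node)]
        mean = [sum(vals) / len(nbrs) for vals in zip(*nbrs)]
        updated[node] = [0.5 * o + 0.5 * m for o, m in zip(own, mean)]
    return updated

def edge_score(h, u, v):
    """Sigmoid of the embedding dot product, read as an edge probability."""
    z = sum(a * b for a, b in zip(h[u], h[v]))
    return 1.0 / (1.0 + math.exp(-z))

h = message_pass(features)
print(f"A-B: {edge_score(h, 'A', 'B'):.2f}  A-E: {edge_score(h, 'A', 'E'):.2f}")
```

Entities in the same cluster end up with similar embeddings and score higher as candidate counterparties, which is the mechanism that preserves community structure in generated networks.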
Three Generation Modes
Choose your speed-fidelity tradeoff with a single parameter
Statistical
Default. Pure DataSynth rule engine. Fastest generation (100K+ rows/sec), deterministic output, full reproducibility. Best for high-volume testing where speed matters more than distributional fidelity.
- 100K+ rows/sec
- Deterministic
- 1 credit/1K rows
Neural
New in 3.0. Full neural backend. Score networks for continuous columns, tabular transformers for mixed types, GNN for entity graphs. Highest distributional fidelity, captures joint distributions and tail behavior.
- 5K rows/sec
- Stochastic
- 8 credits/1K rows
Hybrid
Recommended. Configurable blend weight between statistical and neural. The rule engine handles structure and constraints while neural components refine distributions. Balance speed and fidelity per use case.
- 10K-50K rows/sec
- Blend weight 0.0-1.0
- 2-6 credits/1K rows
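For a rough sense of the tradeoff, the arithmetic for a 25,000-row job can be worked out from the figures listed for each mode. The helper below is a hypothetical back-of-envelope calculator, not part of the SDK. It treats the statistical rate of 100K rows/sec as a floor, and it assumes credits scale linearly with row count, which the page does not state for partial thousands.

```python
# Figures copied from the three mode cards; hybrid is a range.
MODES = {
    "statistical": {"rows_per_sec": (100_000, 100_000), "credits_per_1k": (1, 1)},
    "neural": {"rows_per_sec": (5_000, 5_000), "credits_per_1k": (8, 8)},
    "hybrid": {"rows_per_sec": (10_000, 50_000), "credits_per_1k": (2, 6)},
}

def estimate(rows, mode):
    """Return ((min_credits, max_credits), (min_seconds, max_seconds))."""
    spec = MODES[mode]
    lo_c, hi_c = spec["credits_per_1k"]
    slow, fast = spec["rows_per_sec"]
    credits = (rows / 1_000 * lo_c, rows / 1_000 * hi_c)
    seconds = (rows / fast, rows / slow)
    return credits, seconds

for mode in MODES:
    credits, seconds = estimate(25_000, mode)
    print(
        f"{mode:<12} credits {credits[0]:.0f}-{credits[1]:.0f}  "
        f"seconds {seconds[0]:.2f}-{seconds[1]:.2f}"
    )
```

At this size the full neural backend costs 200 credits and about 5 seconds of generation time, against 25 credits and at most a quarter of a second for the statistical engine. Hybrid lands between the two depending on the blend weight.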
Python SDK Example
import vynfi

client = vynfi.Client(api_key="vf_live_...")

# Generate with neural backend (hybrid mode)
job = client.generate(
    sector="banking",
    module="transactions",
    rows=25_000,
    backend="hybrid",
    blend_weight=0.7,  # 70% neural, 30% statistical
    neural_config={
        "score_network": True,
        "tabular_transformer": True,
        "gnn_edges": True,
        "temperature": 0.9,
    },
    output_format="parquet",
)

# Verify distributional fidelity
stats = job.quality_report()
print(f"KL divergence: {stats['kl_divergence']:.4f}")
print(f"Joint distribution score: {stats['joint_score']:.3f}")
print(f"Graph structure score: {stats['graph_score']:.3f}")

Generate the distributions your models actually need.
5,000 free credits to start. No credit card. Ground truth by construction.