Causal Financial Data for ML Training
Generate synthetic financial data from a known causal graph with 23 nodes, 25 edges, and 6 categories. Train structural causal models (SCMs), validate causal discovery, and run intervention analysis against ground truth.
Correlation Is Not Causation in Financial Data
ML models trained on observational data learn spurious associations that fail under intervention
Correlation Is Not Causation
Standard synthetic data preserves correlations but not causal mechanisms. Models trained on correlated features learn spurious associations that fail under distribution shift.
No Intervention Support
You cannot apply do-calculus to data that lacks a structural causal model. Without explicit causal structure, there is no principled way to estimate the effect of an intervention.
Confounding Everywhere
Financial data is riddled with confounders — macro conditions that simultaneously affect revenue, credit risk, and operational metrics. Observational data alone cannot disentangle these effects.
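The confounding problem can be illustrated with a minimal sketch. It assumes a toy linear SCM in which a hypothetical macro variable drives both revenue and credit loss; the variable names and coefficients are illustrative assumptions, not part of the product's DAG.

```python
import random

def simulate(n=20_000, seed=0, do_revenue=None):
    """Toy linear SCM (hypothetical): macro -> revenue, macro -> loss, revenue -> loss."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        macro = rng.gauss(0, 1)  # confounder
        if do_revenue is None:
            revenue = 2.0 * macro + rng.gauss(0, 1)  # observational mechanism
        else:
            revenue = do_revenue  # intervention: revenue no longer listens to macro
        loss = -0.5 * revenue + 3.0 * macro + rng.gauss(0, 1)
        rows.append((revenue, loss))
    return rows

def ols_slope(rows):
    """Slope of a simple least-squares regression of loss on revenue."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    var = sum((x - mean_x) ** 2 for x, _ in rows)
    return cov / var

def mean_loss(rows):
    return sum(y for _, y in rows) / len(rows)

# Naive estimate from observational data (biased by the confounder)
naive_slope = ols_slope(simulate())

# Interventional contrast: do(revenue=1) versus do(revenue=0)
ate = mean_loss(simulate(do_revenue=1.0)) - mean_loss(simulate(do_revenue=0.0))

print(f"Observational slope:   {naive_slope:.2f}")
print(f"Interventional effect: {ate:.2f}")
```

Under these assumed coefficients the regression slope converges to 0.7, while the structural effect of revenue on loss is -0.5. The confounder flips the sign of the naive estimate.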
SCM Training Requires Ground Truth
Structural causal model algorithms need data generated from known causal graphs for validation. Real financial data comes without the true DAG, making it impossible to measure causal discovery accuracy.
The Financial Process DAG
23 nodes, 25 directed edges across 6 causal categories
Macro Environment
Interest rates, FX rates, credit spreads. Exogenous variables that drive downstream financial processes.
Revenue Chain
Sales pipeline through revenue recognition. Affected by macro conditions and customer credit quality.
Procurement Chain
Purchase requisition through payment. Supply chain costs propagate from FX and commodity prices.
Treasury Chain
Cash management, FX hedging, and settlement. Directly driven by macro rates and transaction volumes.
Credit Risk Chain
Counterparty assessment through provisioning. Credit spreads and revenue performance feed into loss estimates.
Control Chain
Authorization, segregation, and reconciliation. Control effectiveness modulates error rates across all other chains.
Use Cases
From causal discovery benchmarking to production intervention analysis
SCM Training and Validation
Train structural causal models on data from a known DAG. Measure discovery accuracy against ground truth using structural Hamming distance (SHD) and structural intervention distance (SID). Benchmark algorithms like PC, GES, NOTEARS, and DAG-GNN on financial graph structures.
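As a rough illustration of scoring a discovered graph against ground truth, the sketch below computes a structural Hamming distance over edge lists. It assumes the convention that a reversed edge counts as one error (some implementations count it as two), and the node names are hypothetical. SID is not shown.

```python
def shd(true_edges, est_edges):
    """Structural Hamming distance between two DAGs given as (parent, child) pairs.

    Counts missing, extra, and reversed edges; a reversal counts once.
    """
    true_set, est_set = set(true_edges), set(est_edges)
    # Estimated edges whose opposite direction is in the true graph
    reversed_edges = {(a, b) for (a, b) in est_set
                      if (b, a) in true_set and (a, b) not in true_set}
    # Estimated edges with no counterpart in the true graph in either direction
    extra = {e for e in est_set - true_set if e not in reversed_edges}
    # True edges absent from the estimate in both directions
    missing = {(a, b) for (a, b) in true_set - est_set if (b, a) not in est_set}
    return len(missing) + len(extra) + len(reversed_edges)

# Hypothetical ground truth and a hypothetical discovery result
true_edges = [
    ("interest_rate", "revenue"),
    ("interest_rate", "provisions"),
    ("revenue", "provisions"),
]
est_edges = [
    ("revenue", "interest_rate"),     # reversed
    ("interest_rate", "provisions"),  # correct
    ("credit_spread", "provisions"),  # extra
]                                     # revenue -> provisions is missing

print(shd(true_edges, est_edges))  # 3
```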
Counterfactual Inference
Generate matched factual/counterfactual pairs. Answer questions like 'what would this portfolio's loss have been if interest rates had risen 200bp instead of falling?' with structurally valid synthetic data.
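A counterfactual query of this kind follows the abduction, action, prediction recipe. The sketch below assumes a single hypothetical linear mechanism for provisions with made-up coefficients, and assumes credit spread does not depend on the interest rate; a real SCM would propagate the change through every downstream node.

```python
def counterfactual_provisions(factual, new_rate, b_rate=0.8, b_spread=0.3):
    """Counterfactual provisions under a different interest rate.

    Assumed mechanism (hypothetical):
        provisions = b_rate * interest_rate + b_spread * credit_spread + u
    """
    # Abduction: recover the unit-specific noise term from the factual record
    u = (factual["provisions"]
         - b_rate * factual["interest_rate"]
         - b_spread * factual["credit_spread"])
    # Action: replace the interest rate with its counterfactual value
    # Prediction: re-evaluate the mechanism with the same noise term
    return b_rate * new_rate + b_spread * factual["credit_spread"] + u

# Hypothetical factual record, rates in percentage points
factual = {"interest_rate": 3.0, "credit_spread": 1.5, "provisions": 4.0}

# What would provisions have been with rates 200bp higher?
cf = counterfactual_provisions(factual, new_rate=5.0)
print(f"Factual: {factual['provisions']:.2f}, counterfactual: {cf:.2f}")
```

Because the noise term is held fixed, the factual and counterfactual records describe the same unit, which is what makes the pair matched.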
Intervention Analysis
Estimate causal effects of policy changes: pricing adjustments, credit limit modifications, hedging strategy shifts. The data respects do-calculus semantics: intervening on a node removes the edges pointing into that node, while its outgoing edges stay in place.
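The edge-breaking behavior of an intervention can be sketched as graph surgery on an edge list. The four-edge graph below is a hypothetical fragment for illustration, not the product's DAG.

```python
def do(edges, node):
    """Return the mutilated graph for do(node): drop every edge pointing into `node`."""
    return [(parent, child) for parent, child in edges if child != node]

# Hypothetical fragment of a financial process graph
edges = [
    ("interest_rate", "credit_spread"),
    ("interest_rate", "revenue"),
    ("credit_spread", "provisions"),
    ("revenue", "provisions"),
]

mutilated = do(edges, "credit_spread")
for parent, child in mutilated:
    print(f"{parent} -> {child}")
```

After `do(edges, "credit_spread")`, the edge from `interest_rate` into `credit_spread` is gone, so the intervened value no longer responds to rates, while `credit_spread -> provisions` still carries its effect downstream.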
Sensitivity and Robustness
Systematically vary edge strengths in the causal graph to test model robustness. Identify which causal pathways your downstream models are most sensitive to, and where they fail under structural changes.
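One way to picture an edge-strength sweep: in a linear SCM, the total effect of one node on another is the sum over directed paths of the product of edge weights. The sketch below assumes such a linear model with hypothetical nodes and weights, and varies a single edge.

```python
def total_effect(weights, source, target):
    """Total linear effect of `source` on `target`: sum over directed paths
    of the product of edge weights. Assumes `weights` describes a DAG."""
    if source == target:
        return 1.0
    return sum(
        w * total_effect(weights, child, target)
        for (parent, child), w in weights.items()
        if parent == source
    )

# Hypothetical edge weights
base_weights = {
    ("interest_rate", "credit_spread"): 0.5,
    ("credit_spread", "provisions"): 0.6,
    ("interest_rate", "revenue"): -0.4,
    ("revenue", "provisions"): -0.5,
}

# Sweep the strength of one edge and record the downstream effect
edge = ("credit_spread", "provisions")
strengths = [0.0, 0.3, 0.6, 0.9]
effects = [
    total_effect({**base_weights, edge: s}, "interest_rate", "provisions")
    for s in strengths
]

for s, e in zip(strengths, effects):
    print(f"strength={s:.1f} -> total effect={e:.2f}")
```

With these assumed weights the effect of `interest_rate` on `provisions` moves linearly from 0.20 to 0.65 across the sweep, which shows how much of the total effect flows through the credit spread pathway.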
Python SDK Example
import vynfi

client = vynfi.Client(api_key="vf_live_...")

# Generate causal financial data with known DAG
job = client.generate(
    sector="banking",
    module="causal",
    dag="financial_process_v3",  # 23 nodes, 25 edges
    rows=50_000,
    include_interventions=True,
    interventions=[
        {"node": "interest_rate", "do": "+150bp"},
        {"node": "credit_spread", "do": "+100bp"},
    ],
    output_format="parquet",
)

# Get the ground truth DAG
dag = job.causal_graph()
print(f"Nodes: {dag.n_nodes}, Edges: {dag.n_edges}")

# Generate matched counterfactual pairs
pairs = job.counterfactual_pairs(
    treatment="interest_rate",
    outcome="provisions",
)
print(f"ATE estimate: {pairs.ate():.4f}")
From correlation to causation
Generate causal financial data with known ground truth for SCM training and validation. 5,000 free credits to start.