DataSynth 3.0

Causal Financial Data for ML Training

Generate synthetic financial data from a known causal graph with 23 nodes, 25 edges, and 6 categories. Train structural causal models (SCMs), validate causal discovery, and run intervention analysis against ground truth.

Correlation Is Not Causation in Financial Data

ML models trained on observational data learn spurious associations that fail under intervention

Correlation Is Not Causation

Standard synthetic data preserves correlations but not causal mechanisms. Models trained on correlated features learn spurious associations that fail under distribution shift.

No Intervention Support

You cannot apply do-calculus to data that lacks a structural causal model. Without explicit causal structure, there is no principled way to estimate the effect of an intervention.

Confounding Everywhere

Financial data is riddled with confounders — macro conditions that simultaneously affect revenue, credit risk, and operational metrics. Observational data alone cannot disentangle these effects.
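To make the confounding problem concrete, here is a minimal sketch using only the Python standard library. The three-variable linear model, its coefficients, and the variable names are assumptions chosen for illustration; they are not taken from the DataSynth graph. Macro conditions drive both revenue and credit risk, and revenue has no causal effect on credit risk at all.

```python
import random

def slope(x, y):
    """Ordinary least-squares slope of y on a single regressor x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

def residuals(x, y):
    """Residuals of y after regressing it on x (removes x's linear influence)."""
    b = slope(x, y)
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]

rng = random.Random(0)
n = 20_000

# Assumed SCM: macro -> revenue and macro -> credit_risk, nothing else.
macro = [rng.gauss(0, 1) for _ in range(n)]
revenue = [1.5 * m + rng.gauss(0, 1) for m in macro]
credit_risk = [-2.0 * m + rng.gauss(0, 1) for m in macro]

# Naive regression picks up the association created by the shared parent.
naive = slope(revenue, credit_risk)

# Adjusting for the confounder recovers the true effect, which is zero.
adjusted = slope(residuals(macro, revenue), residuals(macro, credit_risk))

print(f"naive slope:    {naive:.3f}")
print(f"adjusted slope: {adjusted:.3f}")
```

The naive slope is strongly negative even though the true effect is zero. The adjustment only works here because the confounder is observed and known, which is exactly the information that a ground-truth DAG supplies.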

SCM Training Requires Ground Truth

Causal discovery and SCM-fitting algorithms need data generated from a known causal graph for validation. Real financial data comes without the true DAG, so discovery accuracy cannot be measured against it.

The Financial Process DAG

23 nodes, 25 directed edges across 6 causal categories

Macro Environment

3 nodes, 4 edges

Interest rates, FX rates, credit spreads. Exogenous variables that drive downstream financial processes.

Fed funds rate, EUR/USD, IG credit spread

Revenue Chain

4 nodes, 5 edges

Sales pipeline through revenue recognition. Affected by macro conditions and customer credit quality.

Sales orders, Invoicing, Revenue recognition, Cash collection

Procurement Chain

4 nodes, 4 edges

Purchase requisition through payment. Supply chain costs propagate from FX and commodity prices.

Requisition, PO approval, Goods receipt, Invoice matching

Treasury Chain

4 nodes, 4 edges

Cash management, FX hedging, and settlement. Directly driven by macro rates and transaction volumes.

Cash position, FX exposure, Hedge execution, Settlement

Credit Risk Chain

4 nodes, 4 edges

Counterparty assessment through provisioning. Credit spreads and revenue performance feed into loss estimates.

Rating assessment, Exposure calc, Provision, Write-off

Control Chain

4 nodes, 4 edges

Authorization, segregation, and reconciliation. Control effectiveness modulates error rates across all other chains.

Authorization, SoD checks, Reconciliation, Reporting
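As a quick consistency check, the six categories above can be written down as a plain Python structure and summed. This is only a sketch: the dictionary layout is an assumption made for illustration, and because the page lists edge counts per category but not the edges themselves, only the counts are recorded.

```python
# Category -> listed node names and stated edge count, copied from the cards above.
categories = {
    "Macro Environment": {
        "nodes": ["Fed funds rate", "EUR/USD", "IG credit spread"],
        "edges": 4,
    },
    "Revenue Chain": {
        "nodes": ["Sales orders", "Invoicing", "Revenue recognition", "Cash collection"],
        "edges": 5,
    },
    "Procurement Chain": {
        "nodes": ["Requisition", "PO approval", "Goods receipt", "Invoice matching"],
        "edges": 4,
    },
    "Treasury Chain": {
        "nodes": ["Cash position", "FX exposure", "Hedge execution", "Settlement"],
        "edges": 4,
    },
    "Credit Risk Chain": {
        "nodes": ["Rating assessment", "Exposure calc", "Provision", "Write-off"],
        "edges": 4,
    },
    "Control Chain": {
        "nodes": ["Authorization", "SoD checks", "Reconciliation", "Reporting"],
        "edges": 4,
    },
}

total_nodes = sum(len(c["nodes"]) for c in categories.values())
total_edges = sum(c["edges"] for c in categories.values())

print(f"{len(categories)} categories, {total_nodes} nodes, {total_edges} edges")
```

The per-category figures add up to the headline totals of 23 nodes and 25 directed edges.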

Use Cases

From causal discovery benchmarking to production intervention analysis

SCM Training and Validation

Train structural causal models on data from a known DAG. Measure discovery accuracy against ground truth using structural Hamming distance (SHD) and structural intervention distance (SID). Benchmark algorithms like PC, GES, NOTEARS, and DAG-GNN on financial graph structures.
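For readers unfamiliar with SHD, the sketch below shows one common way to compute it between a true and an estimated edge list. It assumes the convention that a reversed edge counts as a single error (some papers count it as two) and that the graphs have no self-loops. The function name and the four node names are hypothetical and are not part of any SDK.

```python
def shd(true_edges, est_edges):
    """Structural Hamming distance between two directed graphs.

    Each unordered node pair contributes 1 if its edge is missing, extra,
    or reversed in the estimate. Assumes no self-loops.
    """
    true_edges, est_edges = set(true_edges), set(est_edges)
    pairs = {frozenset(edge) for edge in true_edges | est_edges}
    distance = 0
    for pair in pairs:
        a, b = sorted(pair)
        true_state = ((a, b) in true_edges, (b, a) in true_edges)
        est_state = ((a, b) in est_edges, (b, a) in est_edges)
        distance += true_state != est_state
    return distance

# Hypothetical ground truth and a discovery result with three mistakes.
truth = [("rate", "spread"), ("rate", "cash"), ("spread", "provision")]
estimate = [
    ("rate", "spread"),     # correct
    ("cash", "rate"),       # reversed
    ("rate", "provision"),  # extra; spread -> provision is missing
]

print(shd(truth, estimate))  # 3: one reversed, one missing, one extra
```

A score of 0 means the estimated graph matches the ground truth exactly, so lower is better when comparing discovery algorithms.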

Counterfactual Inference

Generate matched factual/counterfactual pairs. Answer questions like 'what would this portfolio's loss have been if interest rates had risen 200bp instead of falling?' with structurally valid synthetic data.
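The standard recipe for a counterfactual in an SCM is abduction, action, prediction: recover the unit's noise from what was observed, replace the treatment, then recompute the outcome with the same noise. The sketch below assumes a one-equation linear model, loss = beta * rate + noise, with an illustrative beta of 0.5. The function, the coefficient, and the numbers are hypothetical and are not how DataSynth computes its pairs.

```python
def counterfactual_loss(rate_factual, loss_factual, rate_counterfactual, beta=0.5):
    """Counterfactual loss under the assumed model: loss = beta * rate + noise."""
    # Abduction: infer this unit's noise term from the factual observation.
    noise = loss_factual - beta * rate_factual
    # Action and prediction: set the new rate, keep the same noise.
    return beta * rate_counterfactual + noise

# Factual world: rates fell by 1.0 percentage point and the loss was 3.0.
# Counterfactual world: rates rose by 2.0 percentage points (200bp) instead.
loss_cf = counterfactual_loss(
    rate_factual=-1.0,
    loss_factual=3.0,
    rate_counterfactual=2.0,
)
print(loss_cf)  # 4.5
```

Keeping the noise term fixed is what makes the pair "matched": both rows describe the same unit under two different treatments.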

Intervention Analysis

Estimate the causal effects of policy changes such as pricing adjustments, credit limit modifications, and hedging strategy shifts. The data respects do-calculus semantics: an intervention on a node removes that node's incoming edges in the DAG, so its value no longer depends on its parents.
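The edge-breaking semantics can be illustrated on a toy graph. In this sketch the three-node DAG, its coefficients, and the helper names `do` and `mean_losses` are all assumptions made for illustration. The first part performs the graph surgery; the second simulates the assumed linear model under two interventions and compares average losses.

```python
import random

def do(parents, node):
    """Graph surgery: copy the DAG and drop every edge pointing into `node`."""
    mutilated = {child: list(ps) for child, ps in parents.items()}
    mutilated[node] = []
    return mutilated

# Hypothetical DAG stored as child -> list of parents.
parents = {
    "macro": [],
    "credit_limit": ["macro"],
    "losses": ["macro", "credit_limit"],
}
surgered = do(parents, "credit_limit")

def mean_losses(n, do_credit_limit=None, seed=0):
    """Average losses under the assumed linear SCM, optionally intervened."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        macro = rng.gauss(0, 1)
        natural_limit = -1.0 * macro + rng.gauss(0, 1)
        # Under do(credit_limit = c) the value ignores macro entirely.
        credit_limit = natural_limit if do_credit_limit is None else do_credit_limit
        total += 0.5 * credit_limit + 2.0 * macro + rng.gauss(0, 1)
    return total / n

# Same seed means same exogenous noise, so the difference isolates the edge
# credit_limit -> losses, whose assumed strength is 0.5.
effect = mean_losses(10_000, do_credit_limit=1.0) - mean_losses(10_000, do_credit_limit=0.0)

print(surgered["credit_limit"])  # []
print(round(effect, 6))          # 0.5
```

Because the intervened value is set from outside the model, the confounding path through macro is cut and the estimated effect matches the structural coefficient.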

Sensitivity and Robustness

Systematically vary edge strengths in the causal graph to test model robustness. Identify which causal pathways your downstream models are most sensitive to, and where they fail under structural changes.
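One way to picture such a sensitivity sweep is sketched below. It assumes a hypothetical chain, rate -> spread -> provision, in which only the strength of the first edge is varied. A simple linear model is fitted at the baseline strength and then scored on data generated with stronger edges. All names and coefficients are illustrative.

```python
import random

def generate(edge_strength, n, seed):
    """Sample (rate, provision) pairs from the assumed chain."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        rate = rng.gauss(0, 1)
        spread = edge_strength * rate + rng.gauss(0, 0.5)
        provision = spread + rng.gauss(0, 0.5)
        rows.append((rate, provision))
    return rows

def fit_line(rows):
    """Least-squares slope and intercept of provision on rate."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, my - slope * mx

def mse(rows, slope, intercept):
    """Mean squared prediction error of the fitted line on `rows`."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in rows) / len(rows)

# Train at the baseline edge strength, then evaluate under structural change.
slope, intercept = fit_line(generate(1.0, 5_000, seed=1))
errors = {
    w: mse(generate(w, 5_000, seed=2), slope, intercept)
    for w in (1.0, 1.5, 2.0)
}

for w, err in errors.items():
    print(f"edge strength {w:.1f}: MSE {err:.3f}")
```

The error grows as the edge strength moves away from the value seen during training, which flags the rate -> spread pathway as one this model depends on.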

Python SDK Example

import vynfi

client = vynfi.Client(api_key="vf_live_...")

# Generate causal financial data with known DAG
job = client.generate(
    sector="banking",
    module="causal",
    dag="financial_process_v3",  # 23 nodes, 25 edges
    rows=50_000,
    include_interventions=True,
    interventions=[
        {"node": "interest_rate", "do": "+150bp"},
        {"node": "credit_spread", "do": "+100bp"},
    ],
    output_format="parquet",
)

# Get the ground truth DAG
dag = job.causal_graph()
print(f"Nodes: {dag.n_nodes}, Edges: {dag.n_edges}")

# Generate matched counterfactual pairs and estimate the average treatment effect (ATE)
pairs = job.counterfactual_pairs(
    treatment="interest_rate",
    outcome="provisions",
)
print(f"ATE estimate: {pairs.ate():.4f}")

From correlation to causation

Generate causal financial data with known ground truth for SCM training and validation. 5,000 free credits to start.
