
Multi-Party AML Networks, Cross-Layer Fraud Propagation, and 14 Typologies

DataSynth 2.3 rebuilds the banking module from the ground up: synthetic identity, trade-based money laundering, crypto integration, sanctions evasion, real-estate integration, Barabási-Albert network topology, Payment ↔ BankTransaction bridging, velocity features, and device fingerprints. Here is what changes for AML model training.

VynFi Team · Engineering · April 12, 2026 · 13 min read

AML models get stuck in the same failure mode: they detect the typologies that appeared in training, and miss everything else. The cost of real SAR-labeled data makes iterating on coverage expensive, so teams reuse the same training sets for years. Models drift. Compliance officers notice.

DataSynth 2.3 is our answer. It ships 14 fully implemented money laundering typologies, each with sophistication-scaled behavior, ground-truth per-transaction labels, and a matching evaluator that verifies the generator produces what it claims. A new network generator coordinates multiple synthetic customers into realistic criminal structures. Cross-layer coherence ties a fraudulent vendor payment through its document flow, its journal entry, and both sides of a mirrored bank transaction pair. Every enhancement ships with a validator.

**This post originally covered DataSynth 2.3; as of 2026-04-19 we've shipped 3.1.1.** The key AML improvements since this post was written: network density is up 38× (0.0014 → 0.053), `is_mule_link` / `is_shell_link` edges now populate from coordinated criminal structures (they were always zero before), typology coverage jumped from **0.000 to 0.857** (passing the ≥0.80 evaluator threshold), a `Spoofing` variant was added, and the `network_typology_rate` default was raised from 0.05 to 0.15. The regenerated VynFi/vynfi-aml-100k and VynFi/vynfi-sar-narratives datasets ship the 3.1.1 outputs directly.

14 Typologies, All Labeled

v2.2 shipped four core typologies. v2.3 adds ten more, bringing total coverage to 50% of the AmlTypology enum. Every suspicious transaction now carries a ground_truth_explanation field — a human-readable description of which typology produced it and which step in the typology it represents.

```yaml
banking:
  typologies:
    structuring_rate: 0.004        # Sub-threshold deposits
    mule_rate: 0.005               # Recruiter → middleman → cashout
    synthetic_identity_rate: 0.001 # Seasoning → bust-out
    trade_based_ml_rate: 0.001     # Over-invoicing cycles
    crypto_integration_rate: 0.001 # Fiat → exchange → peel chain
    sanctions_evasion_rate: 0.0005 # Name variation + transshipment
    false_positive_rate: 0.05      # Legit-but-suspicious-looking
    co_occurrence_rate: 0.10       # Multi-typology cases
    network_typology_rate: 0.05    # Coordinated networks
    payment_bridge_rate: 0.75      # P2P payment → bank txn
```

Barabási-Albert Network Topology

The previous network generator produced hub-and-spoke structures — one coordinator plus N identical smurfs. Real criminal networks don't look like that. They follow power-law degree distributions: one or two central hubs plus a long tail of leaf nodes. That matters because graph-based AML detectors (community detection, centrality scoring, subgraph matching) need realistic topology to train against.

v2.3 replaces the simple generator with Barabási-Albert preferential attachment. Each new node attaches to existing nodes with probability proportional to their degree. The result: max degree typically 3-5× the mean, exactly the hub-plus-tail shape that shows up in real seized-transaction graphs.
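To make the mechanism concrete, here is a minimal sketch of preferential attachment in plain Python. It is not DataSynth's implementation: the function names `barabasi_albert_edges` and `hub_ratio`, the star-shaped seed graph, and the parameters `n`, `m`, and `seed` are all illustrative assumptions.

```python
import random
from collections import Counter


def barabasi_albert_edges(n, m, seed=0):
    """Grow an undirected graph by preferential attachment.

    n: total number of nodes; m: edges added by each new node (1 <= m < n).
    Returns a list of (new_node, existing_node) edges.
    """
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges = []
    # `pool` lists every node once per edge endpoint, so a uniform draw
    # from it picks a node with probability proportional to its degree.
    pool = []
    for new in range(m, n):
        if not pool:
            # Seed step (an assumption): the first new node links to nodes 0..m-1.
            targets = set(range(m))
        else:
            targets = set()
            while len(targets) < m:
                targets.add(rng.choice(pool))
        for target in sorted(targets):
            edges.append((new, target))
            pool.extend((new, target))
    return edges


def hub_ratio(edges):
    """Max degree divided by mean degree, one way to quantify 'hub plus tail'."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    mean = sum(degree.values()) / len(degree)
    return max(degree.values()) / mean


if __name__ == "__main__":
    graph = barabasi_albert_edges(n=200, m=2, seed=7)
    print(f"{len(graph)} edges, hub ratio {hub_ratio(graph):.2f}")
```

Early nodes accumulate edges because every edge they receive raises their chance of receiving the next one. The exact ratio depends on `n`, `m`, and the seed, so treat the printed value as a property of this toy graph rather than of the generator.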

Velocity Feature Pre-Computation

Velocity features — transaction counts and amount sums over rolling windows — are the foundation of every production AML scoring model. v2.3 computes them for you, on every transaction, as a post-generation pass:

```python
import pandas as pd

# Every transaction row now includes:
#   txn_count_1h, txn_count_24h, txn_count_7d, txn_count_30d
#   amount_sum_24h, amount_sum_7d, amount_sum_30d
#   amount_max_24h, unique_counterparties_24h, unique_counterparties_7d
#   unique_countries_7d, avg_amount_30d, std_amount_30d, amount_zscore
df = pd.read_parquet("transactions.parquet")

# Rows with |amount_zscore| > 3 are candidate anomalies
anomalies = df[df["amount_zscore"].abs() > 3.0]
print(f"{len(anomalies)} high-z-score transactions out of {len(df)}")
```
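For readers who want to see what a rolling-window pass involves, the following is a rough sketch for a single account's transactions. It is an illustration under assumptions, not the DataSynth post-generation pass: the function name `add_velocity_features`, the input keys `timestamp` and `amount`, and the choice of trailing windows that include the current transaction are all hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

WINDOWS = {
    "1h": timedelta(hours=1),
    "24h": timedelta(hours=24),
    "7d": timedelta(days=7),
    "30d": timedelta(days=30),
}


def add_velocity_features(txns):
    """Add txn_count_* and amount_sum_* fields to one account's transactions.

    txns: list of dicts with 'timestamp' (datetime) and 'amount' (number).
    Windows are trailing, [t - width, t], and include the current row
    (an assumption; the real pass may define window edges differently).
    """
    ordered = sorted(txns, key=lambda t: t["timestamp"])
    times = [t["timestamp"] for t in ordered]
    # Prefix sums let each window total be read off in O(1).
    prefix = [0.0]
    for t in ordered:
        prefix.append(prefix[-1] + t["amount"])
    out = []
    for i, t in enumerate(ordered):
        row = dict(t)
        for name, width in WINDOWS.items():
            # First index whose timestamp falls inside the window.
            start = bisect_left(times, t["timestamp"] - width, 0, i + 1)
            row[f"txn_count_{name}"] = i + 1 - start
            if name != "1h":  # the column list above has no amount_sum_1h
                row[f"amount_sum_{name}"] = prefix[i + 1] - prefix[start]
        out.append(row)
    return out


if __name__ == "__main__":
    base = datetime(2026, 1, 1, 12, 0)
    sample = [
        {"timestamp": base - timedelta(days=10), "amount": 100.0},
        {"timestamp": base - timedelta(days=2), "amount": 50.0},
        {"timestamp": base - timedelta(minutes=30), "amount": 25.0},
        {"timestamp": base, "amount": 10.0},
    ]
    for row in add_velocity_features(sample):
        print(row["txn_count_1h"], row["txn_count_24h"],
              row["txn_count_7d"], row["txn_count_30d"])
```

Because each longer window contains every shorter one, the counts are non-decreasing from 1h to 30d by construction.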

Cross-Layer Fraud Propagation

This is the change that matters most for enterprise audit training. In v2.2, if you ran the banking module and the accounting module together, they produced independent worlds. A fraudulent payment in the procure-to-pay (P2P) document flow had no corresponding entry in the bank transaction table. You could not join on a business event.

v2.3 fixes that. The PaymentBridgeGenerator runs after the document flow completes and emits a mirrored BankTransaction on the enterprise's house bank account — with source_payment_id, source_invoice_id, journal_entry_id, and gl_cash_account populated. If the counterparty has a banking profile, a mirror transaction on the counterparty side also appears. Fraud labels propagate automatically: a Payment with is_fraud=true and fraud_type=DuplicatePayment surfaces on the bank transaction as is_suspicious=true with suspicion_reason=FirstPartyFraud.

The CrossLayerCoherenceAnalyzer enforces a fraud-propagation rate of at least 95% by default and fails generation if the rate drops below that threshold. You get a single cohesive case: document flow, GL, bank txn, OCPM event log, all linked by reference.
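One way to sanity-check propagation on your own export is sketched below. It assumes a particular definition of the rate (the share of fraudulent payments with at least one linked bank transaction flagged suspicious), which may differ from what the analyzer computes. The `payment_id` key and the function name `fraud_propagation_rate` are hypothetical; `source_payment_id`, `is_fraud`, and `is_suspicious` are the field names mentioned above.

```python
def fraud_propagation_rate(payments, bank_txns):
    """Share of fraudulent payments whose fraud label reached the bank layer.

    payments:  dicts with 'payment_id' (hypothetical key) and 'is_fraud'.
    bank_txns: dicts with 'source_payment_id' and 'is_suspicious'.
    """
    flagged_sources = {
        t["source_payment_id"]
        for t in bank_txns
        if t.get("is_suspicious") and t.get("source_payment_id") is not None
    }
    fraud_ids = [p["payment_id"] for p in payments if p["is_fraud"]]
    if not fraud_ids:
        return 1.0  # nothing to propagate, so nothing was missed
    propagated = sum(pid in flagged_sources for pid in fraud_ids)
    return propagated / len(fraud_ids)


if __name__ == "__main__":
    payments = [
        {"payment_id": "PAY-1", "is_fraud": True},
        {"payment_id": "PAY-2", "is_fraud": False},
        {"payment_id": "PAY-3", "is_fraud": True},
    ]
    bank_txns = [
        {"source_payment_id": "PAY-1", "is_suspicious": True},
        {"source_payment_id": "PAY-2", "is_suspicious": False},
        {"source_payment_id": "PAY-3", "is_suspicious": False},  # label lost
    ]
    rate = fraud_propagation_rate(payments, bank_txns)
    print(f"propagation rate: {rate:.2f}, passes 0.95 threshold: {rate >= 0.95}")
```

A rate below the threshold points at specific payment IDs you can trace back through the document flow, which is usually more useful than the aggregate number alone.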

Evaluators, Not Just Generators

Every new generator in v2.3 ships with a matching evaluator. BankingEvaluation now includes ten sub-analyses:

KYC analyzer — verifies customer-type, risk-tier, and persona distributions
AML analyzer — validates typology mix and sophistication distribution
CrossLayerCoherenceAnalyzer — Payment↔BankTxn referential integrity + fraud rate
VelocityQualityAnalyzer — window ordering invariants (1h ≤ 24h ≤ 7d ≤ 30d)
FalsePositiveAnalyzer — rate bounds and mutual exclusivity with is_suspicious
DeviceFingerprintAnalyzer — power-law distribution, single-device dominance
SanctionsScreeningAnalyzer — low-risk Clear rate, PEP name-variation coverage
SophisticationAnalyzer — typology-specific sophistication skews
LifecycleAnalyzer — phase diversity, stuck-in-New detection
NetworkStructureAnalyzer — power-law topology verification (hub ratio ≥2.5×)
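Two of the invariants in this list are simple enough to re-check independently. The sketch below shows the window-ordering check and a hub-ratio check. The function names are hypothetical, and reading "hub ratio" as max degree divided by mean degree is an assumption; the shipped analyzers may define these checks differently.

```python
COUNT_COLUMNS = ["txn_count_1h", "txn_count_24h", "txn_count_7d", "txn_count_30d"]


def velocity_ordering_violations(rows):
    """Return indices of rows where a shorter window count exceeds a longer one.

    rows: iterable of dicts containing the four txn_count_* columns.
    """
    bad = []
    for i, row in enumerate(rows):
        counts = [row[c] for c in COUNT_COLUMNS]
        # Invariant: 1h <= 24h <= 7d <= 30d
        if any(shorter > longer for shorter, longer in zip(counts, counts[1:])):
            bad.append(i)
    return bad


def meets_hub_ratio(degrees, threshold=2.5):
    """True if max degree is at least `threshold` times the mean degree."""
    mean = sum(degrees) / len(degrees)
    return max(degrees) / mean >= threshold


if __name__ == "__main__":
    rows = [
        {"txn_count_1h": 1, "txn_count_24h": 3, "txn_count_7d": 3, "txn_count_30d": 9},
        {"txn_count_1h": 4, "txn_count_24h": 2, "txn_count_7d": 5, "txn_count_30d": 9},
    ]
    print("rows violating window ordering:", velocity_ordering_violations(rows))
    print("hub ratio ok:", meets_hub_ratio([10, 1, 1, 1, 1, 1, 1]))
```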

You can wire these into /dashboard/quality to prove that the dataset you're about to train on actually has the properties you need. No more trusting the generator blindly.

What to Do Next

If you're already on VynFi: the banking panel in the Generate wizard now exposes all 14 typologies, network topology selection, velocity/device toggles, and the cross-layer bridge. Typology rates are available on Developer+; networks, velocity, and devices on Team+; cross-layer bridge and account lifecycle phases on Scale+.

If you're building an AML detector from scratch, start with our AML Compliance Testing tutorial. Then enable the network generator and re-run your evaluation. We have seen recall on coordinated-network detection tasks jump from 0.12 to 0.64 on the same model, just from switching from the old hub-and-spoke generator to Barabási-Albert topology.
