privacy · differential-privacy · data-governance

Privacy-Preserving Data Sharing with Differential Privacy Fingerprints

How VynFi enables cross-firm analytics without data exposure using epsilon-differential privacy fingerprints that separate the privacy boundary from data generation.

VynFi Research · Founder & Lead Researcher · April 9, 2026 · 7 min read

Financial institutions want to collaborate on analytics. Banks want to benchmark fraud patterns. Audit firms want to share anomaly profiles. Insurance companies want to compare claims distributions. But they cannot share data. Client confidentiality, regulatory restrictions, and competitive concerns make direct data sharing impossible in almost every scenario.

The standard workaround is anonymization: strip names, mask account numbers, generalize dates. But anonymization is fragile. Research has repeatedly shown that anonymized financial data can be re-identified, especially when combined with auxiliary information. And even if re-identification risk were zero, many firms are contractually prohibited from sharing any derivative of client data, anonymized or not.

This post describes the fingerprint module from "DataSynth: Reference Knowledge Graphs for Enterprise Audit Analytics through Synthetic Data Generation with Provable Statistical Properties" by the VynFi research team (April 2026, under peer review).

The Data Sharing Dilemma

The tension is real and well-documented. A fraud detection model trained on data from one bank will not generalize to another bank's transaction patterns. An audit analytics tool calibrated against one firm's engagements may be poorly calibrated for a different firm's client base. Industry benchmarks require cross-firm data that no single firm can provide.

Federated learning offers a partial solution (share model gradients instead of data), but it is operationally complex, requires all participants to run compatible infrastructure, and recent research shows that gradients can leak information about training data. What is needed is something simpler: a way to capture the statistical essence of a dataset without retaining any individual records.

Statistical Fingerprints Under Differential Privacy

VynFi's fingerprint module solves this problem with a clean two-step architecture. First, it extracts a statistical fingerprint from a source dataset. Second, it uses that fingerprint to generate synthetic data that matches the source's statistical properties without ever seeing the original records.

The fingerprint itself is a compact representation of the dataset's statistical properties: marginal distributions for each column, correlation structures between columns, temporal patterns, and anomaly frequency profiles. Critically, the fingerprint is extracted under epsilon-differential privacy (DP), which provides a mathematical guarantee about how much information any individual record contributes to the fingerprint.
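A fingerprint of this kind can be pictured as a small, serializable record. The sketch below is illustrative only; the field names are not VynFi's actual schema:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Fingerprint:
    """DP-protected statistical summary of a tabular dataset (illustrative)."""
    epsilon: float                      # privacy budget consumed at extraction
    marginals: Dict[str, List[float]]   # noised per-column histograms
    correlations: List[List[float]]     # noised pairwise correlation matrix
    anomaly_rate: float                 # noised fraction of flagged records

# Every value here is already noised at extraction time; sharing this object
# is what crosses the privacy boundary, never the underlying records.
fp = Fingerprint(
    epsilon=1.0,
    marginals={"amount": [0.15, 0.60, 0.25]},
    correlations=[[1.0, 0.3], [0.3, 1.0]],
    anomaly_rate=0.012,
)
```

Because the object is small and self-describing, it can be stored, versioned, and transmitted like any other artifact.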

How Differential Privacy Works Here

Differential privacy adds carefully calibrated noise to each statistical measurement during fingerprint extraction. The noise is calibrated to the sensitivity of each measurement (how much a single record could change the result) and the privacy parameter epsilon. A smaller epsilon means more noise and stronger privacy. A larger epsilon means less noise and higher fidelity.
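The calibration can be sketched with the classic Laplace mechanism: noise drawn with scale sensitivity/epsilon. This is a minimal sketch of the standard mechanism, not VynFi's implementation:

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: np.random.Generator) -> float:
    """Release true_value with Laplace noise of scale sensitivity/epsilon."""
    scale = sensitivity / epsilon
    return true_value + rng.laplace(loc=0.0, scale=scale)

rng = np.random.default_rng(0)
# Counting query: adding or removing one record changes the count by at
# most 1, so the sensitivity is 1.
noisy_count = laplace_mechanism(true_value=5000, sensitivity=1.0,
                                epsilon=1.0, rng=rng)
```

Halving epsilon doubles the noise scale, which is exactly the privacy/fidelity trade-off described above.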

VynFi supports four privacy levels, from standard (epsilon = 1.0, suitable for most use cases) to maximum (epsilon = 0.01, for environments requiring the strongest formal guarantees). The privacy guarantee is composable: if you extract multiple fingerprints from the same dataset, the total privacy loss is bounded by the sum of individual epsilons.
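Under basic sequential composition, budget accounting reduces to addition. A hypothetical tracker might look like this:

```python
def remaining_budget(total_epsilon: float, spent: list) -> float:
    """Basic sequential composition: privacy losses of repeated
    extractions from the same dataset simply add up."""
    used = sum(spent)
    if used > total_epsilon:
        raise RuntimeError("privacy budget exhausted")
    return total_epsilon - used

# Two extractions at eps=0.5 and eps=0.25 leave 0.25 of a 1.0 budget.
left = remaining_budget(total_epsilon=1.0, spent=[0.5, 0.25])
```

Tighter bounds exist via advanced composition theorems, but the sum is always a valid (conservative) upper bound on total privacy loss.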

The Clean Privacy Boundary

The architectural insight is the separation of the privacy boundary from the generation process. The fingerprint crosses the privacy boundary. The source data does not. Once a fingerprint has been extracted and the source data is no longer needed, the fingerprint can be shared, stored, and used for generation without any reference to the original records.

This separation has practical consequences:

  • No data leaves the source environment. The fingerprint extraction runs where the data lives. Only the DP-protected fingerprint is exported.
  • The generation engine never sees real data. It takes a fingerprint as input and produces synthetic data as output. There is no pathway from real records to generated records.
  • Fingerprints can be versioned and shared. A firm can extract a fingerprint annually and share it with partners who use it to generate calibrated synthetic data for benchmarking.
  • The privacy guarantee is verifiable. Because epsilon is a fixed parameter of the extraction, the privacy properties of any fingerprint are auditable.
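The separation can be sketched end to end. Both functions below are illustrative stand-ins (a mean-only fingerprint and a naive Gaussian generator), not VynFi's pipeline:

```python
import numpy as np

def extract_fingerprint(data: np.ndarray, epsilon: float,
                        rng: np.random.Generator) -> dict:
    """Runs inside the source environment; only the noised summary leaves.
    Assumes values are pre-clipped to [0, 1], so one record moves a column
    mean by at most 1/n (the sensitivity)."""
    n = len(data)
    noise = rng.laplace(scale=(1.0 / n) / epsilon, size=data.shape[1])
    return {"means": data.mean(axis=0) + noise, "n": n, "epsilon": epsilon}

def generate_synthetic(fp: dict, n_rows: int,
                       rng: np.random.Generator) -> np.ndarray:
    """Runs anywhere; sees only the fingerprint, never the source records."""
    means = np.asarray(fp["means"])
    return rng.normal(loc=means, scale=0.1, size=(n_rows, len(means)))

rng = np.random.default_rng(42)
source = rng.uniform(size=(10_000, 3))   # stand-in for real records
fp = extract_fingerprint(source, epsilon=1.0, rng=rng)
synthetic = generate_synthetic(fp, n_rows=5_000, rng=rng)
```

Note that `generate_synthetic` takes only `fp` as input: there is no code path through which a real record could reach the output, which is the architectural point.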

Enabling Cross-Firm Analytics

Consider a concrete scenario. Three regional banks want to jointly train a fraud detection model, but none can share transaction data. With VynFi's fingerprint module, each bank extracts a DP-protected fingerprint from its transaction data. The fingerprints are shared with a central coordinator (or each bank generates from the other banks' fingerprints). Synthetic data matching each bank's statistical profile is generated and used for model training.

The result is a model trained on data that reflects the diversity of all three banks' transaction patterns, without any bank seeing another bank's data or even a non-private summary of it. The differential privacy guarantee ensures that no individual transaction from any bank can be inferred from the fingerprints or the generated data.
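The coordinator's side of that workflow might look like the following sketch; the bank labels, fingerprint contents, and the simple generator are all hypothetical:

```python
import numpy as np

def generate_from_fingerprint(fp: dict, n_rows: int,
                              rng: np.random.Generator) -> np.ndarray:
    """Illustrative generator: samples around the fingerprint's noised means."""
    means = np.asarray(fp["means"])
    return rng.normal(loc=means, scale=0.1, size=(n_rows, len(means)))

rng = np.random.default_rng(7)
# Each bank exports only a DP fingerprint (values already noised on-site).
fingerprints = [
    {"bank": "A", "means": [0.40, 0.55], "epsilon": 1.0},
    {"bank": "B", "means": [0.52, 0.48], "epsilon": 1.0},
    {"bank": "C", "means": [0.61, 0.45], "epsilon": 1.0},
]
# The coordinator pools synthetic rows reflecting all three banks' profiles
# into a single training set for the shared fraud model.
pool = np.vstack([generate_from_fingerprint(fp, 2_000, rng)
                  for fp in fingerprints])
```

The pooled data carries each bank's statistical profile into training while the DP guarantee on each fingerprint bounds what any party can learn about another bank's individual transactions.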

Use Cases for Privacy-Preserving Fingerprints

  • Cross-bank fraud benchmarking: Generate synthetic data matching each participating bank's statistical profile. Train and evaluate shared models without exposing proprietary data.
  • Audit firm tool evaluation: Extract fingerprints from engagement data. Share fingerprints with audit analytics vendors who generate representative test data without seeing client records.
  • Regulatory stress testing: Regulators can collect DP-protected fingerprints from regulated entities and generate synthetic datasets for systemic risk analysis.
  • Academic research partnerships: Firms share fingerprints with university researchers who generate data for published studies. The research is reproducible and the source data is protected.
  • Internal data democratization: Sensitive datasets can be fingerprinted and the fingerprints used to generate non-sensitive synthetic versions for teams that need representative data but lack clearance for the originals.

What VynFi Provides

VynFi's fingerprint module is available on the Scale and Enterprise tiers. It includes a command-line tool for on-premises fingerprint extraction, an API for cloud-based extraction, configurable privacy levels (epsilon from 0.01 to 1.0), and full audit logging of extraction events including the epsilon budget consumed.

For organizations where even running an extraction tool on-premises is not feasible, VynFi offers a sector-calibrated baseline: synthetic data generated from fingerprints aggregated across 155 real-world datasets under differential privacy. This baseline does not match any individual organization's profile, but it provides a statistically representative starting point for development and testing.

The Free and Developer tiers use VynFi's sector-calibrated baselines. For fingerprint-based generation from your own data, the Scale and Enterprise tiers provide the extraction tools and configurable privacy parameters.

Ready to try VynFi?

Start generating synthetic financial data with 10,000 free credits. No credit card required.