audit · P2P · O2C · three-way matching · document flow · Python

Document Flow Traceability: P2P/O2C Three-Way Matching

Every payment should trace back through an invoice, a goods receipt, and a purchase order. Here is how to reconstruct, validate, and audit those document chains with VynFi and Python.

VynFi Team · Engineering · April 13, 2026 · 9 min read

Three-way matching is the foundation of procure-to-pay controls. Before a payment is released, the system verifies that: (1) a purchase order authorized the spend, (2) a goods receipt confirmed delivery, and (3) a vendor invoice matches both. When any link in this chain is missing or mismatched, you have a control gap — and a potential fraud vector.
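
The three checks can be sketched as a small function. This is a minimal illustration on hand-written records, not VynFi's schema: the field names `quantity`, `unit_price`, and `amount`, the helper `three_way_match`, and the 1% tolerance are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class MatchResult:
    quantity_ok: bool
    amount_ok: bool

    @property
    def passed(self) -> bool:
        return self.quantity_ok and self.amount_ok


def three_way_match(po: dict, gr: dict, invoice: dict,
                    tolerance: float = 0.01) -> MatchResult:
    """Compare a PO, goods receipt, and invoice (hypothetical field names)."""
    # Quantity: invoiced <= received <= ordered.
    quantity_ok = invoice["quantity"] <= gr["quantity"] <= po["quantity"]
    # Amount: invoice total within tolerance of received quantity * PO price.
    expected = gr["quantity"] * po["unit_price"]
    amount_ok = abs(invoice["amount"] - expected) <= tolerance * expected
    return MatchResult(quantity_ok, amount_ok)


po = {"quantity": 100, "unit_price": 12.50}
gr = {"quantity": 100}
good_invoice = {"quantity": 100, "amount": 1250.00}
padded_invoice = {"quantity": 100, "amount": 1400.00}

print(three_way_match(po, gr, good_invoice).passed)    # True
print(three_way_match(po, gr, padded_invoice).passed)  # False
```

Real controls usually match line by line and apply separate tolerances for price and quantity. The header-level check here only shows the shape of the comparison.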

VynFi generates complete P2P and O2C document chains with realistic reference linking. Each document carries a header with `document_id`, and a `document_references.json` file maps source-to-target relationships across the chain. This tutorial walks through loading those chains, validating three-way matching, and identifying gaps.

**DataSynth 3.1.1 update:** Every document-flow journal entry (JE) header now carries a `DocumentRef` (GoodsReceipt / VendorInvoice / Payment / Delivery / CustomerInvoice / Receipt) on `source_document`, so `is_fraud_propagated` is populated correctly when a fraudulent PO fans out through GR → invoice → payment. Previously, the chain broke at the reference level, so the flag did not reach downstream documents. The regenerated VynFi/vynfi-audit-p2p dataset ships the full P2P flow with correct propagation flags.
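
To make the propagation idea concrete, here is a small sketch of how a fraud flag could fan out along reference links. It illustrates the concept only and is not DataSynth's implementation. The document IDs, the `forward` map, and the helper `propagate_fraud` are made up for the example.

```python
from collections import deque

# Hypothetical chains: PO-1 fans out into two goods receipts; PO-2 is clean.
forward = {
    "PO-1": {"GR-1", "GR-2"},
    "GR-1": {"VI-1"},
    "GR-2": {"VI-2"},
    "VI-1": {"PAY-1"},
    "PO-2": {"GR-3"},
}


def propagate_fraud(roots, forward):
    """Return every document reachable from a fraudulent root (breadth-first)."""
    tainted = set(roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for nxt in forward.get(current, ()):
            if nxt not in tainted:
                tainted.add(nxt)
                queue.append(nxt)
    return tainted


tainted = propagate_fraud(["PO-1"], forward)
print(sorted(tainted))
```

Documents downstream of `PO-2` stay untouched, which is the behavior you would expect from a propagation flag: it follows links and does not spread across unrelated chains.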

Load the Document Flows

Python
import os

import vynfi

client = vynfi.VynFi(api_key=os.environ["VYNFI_API_KEY"])

config = {
    "sector": "retail",
    "rows": 1000,
    "companies": 5,
    "processModels": ["p2p", "o2c"],
    "exportFormat": "json",
}

job = client.jobs.generate_config(config=config)
completed = client.jobs.wait(job.id)
archive = client.jobs.download_archive(completed.id)

# Load each document type
pos = archive.json("document_flows/purchase_orders.json")
grs = archive.json("document_flows/goods_receipts.json")
vis = archive.json("document_flows/vendor_invoices.json")
pays = archive.json("document_flows/payments.json")
refs = archive.json("document_flows/document_references.json")

print(f"POs: {len(pos)}, GRs: {len(grs)}, VIs: {len(vis)}, Payments: {len(pays)}")
print(f"References: {len(refs)}")

Build the Reference Graph

Python
from collections import defaultdict

# Build adjacency: source_id -> {target_ids}, plus the reverse map
forward = defaultdict(set)
backward = defaultdict(set)
for ref in refs:
    src = ref.get("source_document_id") or ref.get("from_id")
    tgt = ref.get("target_document_id") or ref.get("to_id")
    if src and tgt:
        forward[str(src)].add(str(tgt))
        backward[str(tgt)].add(str(src))

# Index documents by ID
doc_index = {}
for doc_list, dtype in [(pos, "PO"), (grs, "GR"), (vis, "VI"), (pays, "PAY")]:
    for doc in doc_list:
        did = str(doc.get("header", {}).get("document_id", doc.get("id")))
        doc_index[did] = {"type": dtype, "doc": doc}

print(f"Document index: {len(doc_index)} documents")
print(f"Forward links: {sum(len(v) for v in forward.values())}")
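
With `backward` in place, a payment can be walked back to the purchase order that authorized it. The sketch below uses small stand-in versions of `backward` and `doc_index` so it runs on its own. The IDs are invented, and `trace_to_po` is a hypothetical helper, not part of the VynFi SDK.

```python
def trace_to_po(doc_id, backward, doc_index):
    """Follow backward links from doc_id; return (PO ids found, types visited)."""
    seen, stack = set(), [doc_id]
    po_ids, types_seen = set(), set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles and repeated paths
        seen.add(current)
        dtype = doc_index.get(current, {}).get("type")
        if dtype:
            types_seen.add(dtype)
        if dtype == "PO":
            po_ids.add(current)
        stack.extend(backward.get(current, ()))
    return po_ids, types_seen


# Stand-in data: PAY-1 has a complete chain; PAY-2 has an invoice but no GR or PO.
backward = {
    "PAY-1": {"VI-1"},
    "VI-1": {"GR-1"},
    "GR-1": {"PO-1"},
    "PAY-2": {"VI-2"},
}
doc_index = {
    "PO-1": {"type": "PO"}, "GR-1": {"type": "GR"},
    "VI-1": {"type": "VI"}, "VI-2": {"type": "VI"},
    "PAY-1": {"type": "PAY"}, "PAY-2": {"type": "PAY"},
}

for pay_id in ("PAY-1", "PAY-2"):
    po_ids, types_seen = trace_to_po(pay_id, backward, doc_index)
    missing = {"PO", "GR", "VI"} - types_seen
    print(pay_id, "-> POs:", sorted(po_ids), "missing:", sorted(missing))
```

Starting from payments rather than POs catches a different class of problem: money that left the company with no authorizing order behind it.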

Validate Three-Way Matching

Python
matched = 0
unmatched_pos = []

for po in pos:
    po_id = str(po["header"]["document_id"])

    # Find GRs linked to this PO
    gr_ids = [t for t in forward.get(po_id, set())
              if doc_index.get(t, {}).get("type") == "GR"]

    # Find VIs linked to this PO (directly or via GR)
    vi_ids = set()
    for gid in [po_id] + gr_ids:
        vi_ids.update(t for t in forward.get(gid, set())
                      if doc_index.get(t, {}).get("type") == "VI")

    if gr_ids and vi_ids:
        matched += 1
    else:
        unmatched_pos.append({
            "po_id": po_id,
            "has_gr": bool(gr_ids),
            "has_vi": bool(vi_ids),
        })

total = len(pos)
# Guard against an empty PO list to avoid dividing by zero
rate = matched / total if total else 0.0
print(f"Three-way match rate: {matched}/{total} ({rate:.1%})")
print(f"Gaps: {len(unmatched_pos)} POs missing GR and/or VI")
for gap in unmatched_pos[:5]:
    print(f"  PO {gap['po_id']}: GR={'yes' if gap['has_gr'] else 'NO'}, "
          f"VI={'yes' if gap['has_vi'] else 'NO'}")
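
The gap list is easier to triage when grouped by which link is missing. A minimal sketch, using a hand-written list shaped like the `unmatched_pos` entries built above; the IDs, the category labels, and the helper `classify_gap` are invented for illustration.

```python
from collections import Counter

# Stand-in for the unmatched_pos list produced by the matching loop.
unmatched_pos = [
    {"po_id": "PO-7", "has_gr": True, "has_vi": False},
    {"po_id": "PO-9", "has_gr": False, "has_vi": True},
    {"po_id": "PO-12", "has_gr": False, "has_vi": False},
    {"po_id": "PO-15", "has_gr": True, "has_vi": False},
]


def classify_gap(gap):
    """Label a gap by the missing link. Entries here never have both links."""
    if gap["has_gr"] and not gap["has_vi"]:
        return "received, not invoiced"
    if gap["has_vi"] and not gap["has_gr"]:
        return "invoiced, not received"
    return "no receipt or invoice"


gap_counts = Counter(classify_gap(g) for g in unmatched_pos)
for label, count in gap_counts.most_common():
    print(f"{label}: {count}")
```

An invoice with no goods receipt is often the category to review first, since it can indicate a request to pay for goods that never arrived, whereas a receipt with no invoice may simply be a timing difference.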

Fraud Labels on Document Headers (v2.3.1)

As of DataSynth 2.3.1, document headers carry `is_fraud` and `fraud_type` directly, so you can filter for fraudulent POs, GRs, or payments without joining through `document_references.json`. This makes gap analysis actionable: if an unmatched PO is also flagged `is_fraud: true` with `fraud_type: FictitiousVendor`, that is your test case for the control-weakness finding.

Python
# Filter for fraudulent documents (v2.3.1+)
fraud_docs = [
    doc for doc in pos + grs + vis + pays
    if doc.get("header", {}).get("is_fraud", False)
]

print(f"\nFraudulent documents: {len(fraud_docs)}")
for doc in fraud_docs[:3]:
    h = doc["header"]
    print(f"  {h['document_id']}: {h.get('fraud_type', 'unknown')}")
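
To connect the two analyses, the gap list can be joined against the fraud labels. The sketch below runs on stand-in data shaped like `pos` and `unmatched_pos` from the earlier steps; the IDs are invented, and only the `is_fraud` and `fraud_type` header fields come from the text above.

```python
# Stand-in POs and gap list; in the tutorial these come from `pos` and `unmatched_pos`.
pos = [
    {"header": {"document_id": "PO-1", "is_fraud": False}},
    {"header": {"document_id": "PO-2", "is_fraud": True,
                "fraud_type": "FictitiousVendor"}},
    {"header": {"document_id": "PO-3", "is_fraud": True,
                "fraud_type": "FictitiousVendor"}},
]
unmatched_pos = [
    {"po_id": "PO-2", "has_gr": False, "has_vi": True},
    {"po_id": "PO-1", "has_gr": True, "has_vi": False},
]

# Map each fraudulent PO id to its fraud type.
fraud_type_by_po = {
    str(po["header"]["document_id"]): po["header"].get("fraud_type", "unknown")
    for po in pos
    if po["header"].get("is_fraud", False)
}

# Gaps that coincide with a fraud label: candidate control-weakness findings.
flagged_gaps = [
    {**gap, "fraud_type": fraud_type_by_po[gap["po_id"]]}
    for gap in unmatched_pos
    if gap["po_id"] in fraud_type_by_po
]

# Fraudulent POs whose chain looks complete: the match alone would not catch them.
undetected = sorted(set(fraud_type_by_po) - {g["po_id"] for g in unmatched_pos})

print("Gaps that are labeled fraud:", flagged_gaps)
print("Fraud POs with a complete chain:", undetected)
```

The second list matters as much as the first. A fraudulent PO with a full GR and invoice chain passes a link-existence check, so it is a reminder that three-way matching needs amount and quantity comparison as well.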
