Process Mining on OCEL 2.0 Financial Event Logs
Generate OCEL-compliant event logs from synthetic P2P/O2C/manufacturing processes, reconstruct case traces from document references, and run variant analysis — all from a single VynFi job.
Process mining needs event logs. Good event logs need three things: a case ID that groups related events into end-to-end traces, activity labels that describe what happened, and timestamps that order them. In ERP systems, extracting this from raw tables (BKPF, BSEG, EKKO, EKPO) takes weeks of data engineering. VynFi generates OCEL 2.0-compliant event logs directly — or, if you prefer, raw document flows that you can reconstruct into traces yourself.
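In tabular form, that minimal schema is just those three columns. A toy illustration (the case ID, activities, and timestamps here are invented for the example):

```python
import pandas as pd

# Minimal event log: one row per event, three mandatory columns
events = pd.DataFrame({
    "case_id":   ["PO-1001", "PO-1001", "PO-1001"],
    "activity":  ["Create PO", "Goods Receipt", "Post Invoice"],
    "timestamp": pd.to_datetime([
        "2024-01-02T09:15:00", "2024-01-05T11:40:00", "2024-01-09T16:05:00",
    ]),
})

# Sorting by timestamp within a case yields its end-to-end trace
trace = events.sort_values("timestamp")["activity"].tolist()
print(" -> ".join(trace))  # Create PO -> Goods Receipt -> Post Invoice
```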
**DataSynth 3.1.1 update:** OCEL timestamps are now microsecond-precision (previously nanosecond; before 3.1, `pd.to_datetime(..., utc=True)` silently dropped 95% of rows). `process_variant_summary.json` always ships in the archive. Inductive Miner fitness lands in the realistic 0.70–0.92 band now that rework (15%), skip-step (10%), and out-of-order (8%) imperfections are injected by default. Regenerated dataset: VynFi/vynfi-supply-chain-ocel (native OCEL events + objects + anomaly labels).
Generate the Data
```python
import os
import vynfi

client = vynfi.VynFi(api_key=os.environ["VYNFI_API_KEY"])

config = {
    "sector": "manufacturing",
    "rows": 1000,
    "companies": 5,
    "periods": 3,
    "processModels": ["p2p", "s2c", "manufacturing"],
    "exportFormat": "json",
}

job = client.jobs.generate_config(config=config)
completed = client.jobs.wait(job.id)
archive = client.jobs.download_archive(completed.id)
```

Load the Event Log
VynFi jobs with OCPM enabled (default since v2.3) include an `ocel-event-log.json` file. If your job predates v2.3 or OCPM was disabled, you can reconstruct traces from the document_flows/ directory using document references as case-linking edges.
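The reconstruction helper used below is defined in the SDK notebook rather than here; a minimal sketch of the idea, assuming each document-flow record carries a `doc_id`, an optional `ref_doc_id` back-reference, an `activity`, and a `timestamp` (field names are an assumption, not the documented schema), could use union-find over the reference edges:

```python
import pandas as pd

def reconstruct_from_doc_flows(flow_events):
    """Group documents into cases via union-find over reference edges."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for ev in flow_events:
        find(ev["doc_id"])
        if ev.get("ref_doc_id"):
            union(ev["doc_id"], ev["ref_doc_id"])

    df = pd.DataFrame(flow_events)
    df["case_id"] = df["doc_id"].map(find)  # root document labels the case
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    return df.sort_values(["case_id", "timestamp"]).reset_index(drop=True)

# Toy flow: PO-1 -> GR-1 -> INV-1 form one case; PO-2 stands alone
flows = [
    {"doc_id": "PO-1", "ref_doc_id": None, "activity": "Create PO", "timestamp": "2024-01-01T08:00:00"},
    {"doc_id": "GR-1", "ref_doc_id": "PO-1", "activity": "Goods Receipt", "timestamp": "2024-01-03T09:30:00"},
    {"doc_id": "INV-1", "ref_doc_id": "GR-1", "activity": "Post Invoice", "timestamp": "2024-01-05T14:00:00"},
    {"doc_id": "PO-2", "ref_doc_id": None, "activity": "Create PO", "timestamp": "2024-01-02T08:00:00"},
]
traces = reconstruct_from_doc_flows(flows)
print(traces["case_id"].nunique())  # 2 cases
```

Union-find keeps the linking transitive: an invoice that references a goods receipt lands in the same case as the purchase order the receipt references, without ever linking the two directly.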
```python
import pandas as pd

# Try native OCEL first
try:
    ocel_raw = archive.json("ocel-event-log.json")
    events_df = pd.json_normalize(ocel_raw)
    events_df["timestamp"] = pd.to_datetime(events_df["timestamp"])
    print(f"Loaded native OCEL: {len(events_df)} events")
except (KeyError, FileNotFoundError):
    # Reconstruct from document flows (see SDK notebook 05)
    events_df = reconstruct_from_doc_flows(archive)
    print(f"Reconstructed: {len(events_df)} events")

print(f"Activities: {events_df['activity'].nunique()}")
print(f"Cases: {events_df['case_id'].nunique()}")
print(f"Time span: {events_df['timestamp'].min()} to {events_df['timestamp'].max()}")
```

Variant Analysis
```python
from collections import Counter

# Build variant traces
variants = {}
for case_id, group in events_df.sort_values("timestamp").groupby("case_id"):
    trace = tuple(group["activity"].tolist())
    variants[case_id] = trace

# Count variant frequencies
variant_counts = Counter(variants.values())
total_cases = len(variants)

print(f"\nTotal variants: {len(variant_counts)}")
print("Top 5 variants:")
for trace, count in variant_counts.most_common(5):
    pct = count / total_cases * 100
    label = " -> ".join(trace)
    print(f"  {pct:5.1f}% ({count:3d} cases)  {label}")
```

Happy Path vs. Exceptions
The most frequent variant is your happy path — the expected process execution. Everything else is an exception: rework loops, skipped steps, out-of-order activities. The ratio of happy-path cases to total cases is your process conformance rate. VynFi's pre-built analytics include this as `happy_path_concentration` in the process variant summary.
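Both metrics are easy to cross-check by hand from the variant counts. A self-contained sketch (toy counts standing in for the `variant_counts` built above; note the summary's `variant_entropy` log base isn't documented here, so base 2 is an assumption):

```python
import math
from collections import Counter

# Toy variant counts; in the analysis above you'd reuse `variant_counts`
variant_counts = Counter({
    ("Create PO", "Goods Receipt", "Post Invoice"): 6,                   # happy path
    ("Create PO", "Goods Receipt", "Goods Receipt", "Post Invoice"): 3,  # rework loop
    ("Create PO", "Post Invoice"): 1,                                    # skipped step
})
total_cases = sum(variant_counts.values())

# Conformance rate = share of cases following the most frequent variant
happy_trace, happy_count = variant_counts.most_common(1)[0]
conformance = happy_count / total_cases

# Shannon entropy over the variant distribution (base 2: bits)
entropy = -sum(
    (n / total_cases) * math.log2(n / total_cases)
    for n in variant_counts.values()
)

print(f"Happy-path concentration: {conformance:.1%}")  # 60.0%
print(f"Variant entropy: {entropy:.3f} bits")
```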
```python
# Pre-built analytics (no computation needed)
analytics = client.jobs.analytics(completed.id)
if analytics.process_variant_summary:
    v = analytics.process_variant_summary
    print(f"Happy-path concentration: {v.happy_path_concentration:.1%}")
    print(f"Variant entropy: {v.variant_entropy:.3f}")
```

Export to pm4py / Disco / Celonis
VynFi also generates OCEL 2.0 XML, XES 2.0, Celonis IBC, and Disco CSV formats. Request them via `exportFormat` in your generation config, or find them in the archive alongside the JSON event log. For pm4py, the DataFrame above is already in the right shape — just rename columns to match pm4py's expected schema (`case:concept:name`, `concept:name`, `time:timestamp`).
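The rename is a one-liner. A sketch, assuming the `case_id`/`activity`/`timestamp` columns from the loading step above:

```python
import pandas as pd

# Stand-in for the events_df loaded earlier
events_df = pd.DataFrame({
    "case_id": ["C1", "C1"],
    "activity": ["Create PO", "Goods Receipt"],
    "timestamp": pd.to_datetime(["2024-01-02", "2024-01-05"]),
})

# Map VynFi columns onto pm4py's XES-style schema
pm4py_df = events_df.rename(columns={
    "case_id": "case:concept:name",
    "activity": "concept:name",
    "timestamp": "time:timestamp",
})

# pm4py (if installed) can consume the frame directly, e.g.:
# import pm4py
# net, im, fm = pm4py.discover_petri_net_inductive(pm4py_df)
```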