Process Mining on OCEL 2.0 Financial Event Logs
Generate OCEL-compliant event logs from synthetic P2P/O2C/manufacturing processes, reconstruct case traces from document references, and run variant analysis — all from a single VynFi job.
Process mining needs event logs. Good event logs need three things: a case ID that groups related events into end-to-end traces, activity labels that describe what happened, and timestamps that order them. In ERP systems, extracting this from raw tables (BKPF, BSEG, EKKO, EKPO) takes weeks of data engineering. VynFi generates OCEL 2.0-compliant event logs directly — or, if you prefer, raw document flows that you can reconstruct into traces yourself.
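In tabular form, that minimal schema is just those three columns. A toy illustration (the case ID, activities, and timestamps here are invented for the example):

```python
import pandas as pd

# Minimal event log: one row per event, three mandatory columns
events = pd.DataFrame({
    "case_id":   ["PO-1001", "PO-1001", "PO-1001"],
    "activity":  ["Create PO", "Goods Receipt", "Post Invoice"],
    "timestamp": pd.to_datetime([
        "2024-01-02T09:15:00", "2024-01-05T11:40:00", "2024-01-09T16:05:00",
    ]),
})

# Sorting by timestamp within a case yields its end-to-end trace
trace = events.sort_values("timestamp")["activity"].tolist()
print(" -> ".join(trace))  # Create PO -> Goods Receipt -> Post Invoice
```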
**DataSynth 3.1.1 update:** OCEL timestamps are now microsecond-precision (previously nanosecond; before 3.1, `pd.to_datetime(..., utc=True)` silently dropped 95% of rows). `process_variant_summary.json` always ships in the archive. Inductive Miner fitness lands in the realistic 0.70–0.92 band now that rework (15%), skip-step (10%), and out-of-order (8%) imperfections are injected by default. Regenerated dataset: VynFi/vynfi-supply-chain-ocel (native OCEL events + objects + anomaly labels).
Generate the Data
```python
import os
import vynfi

client = vynfi.VynFi(api_key=os.environ["VYNFI_API_KEY"])

config = {
    "sector": "manufacturing",
    "rows": 1000,
    "companies": 5,
    "periods": 3,
    "processModels": ["p2p", "s2c", "manufacturing"],
    "exportFormat": "json",
}

job = client.jobs.generate_config(config=config)
completed = client.jobs.wait(job.id)
archive = client.jobs.download_archive(completed.id)
```

Load the Event Log
VynFi jobs with OCPM enabled (default since v2.3) include an `ocel-event-log.json` file. If your job predates v2.3 or OCPM was disabled, you can reconstruct traces from the document_flows/ directory using document references as case-linking edges.
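The reconstruction helper used below is defined in the SDK notebook rather than here; a minimal sketch of the idea, assuming each document-flow record carries a `doc_id`, an optional `ref_doc_id` back-reference, an `activity`, and a `timestamp` (field names are an assumption, not the documented schema), could use union-find over the reference edges:

```python
import pandas as pd

def reconstruct_from_doc_flows(flow_events):
    """Group documents into cases via union-find over reference edges."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for ev in flow_events:
        find(ev["doc_id"])
        if ev.get("ref_doc_id"):
            union(ev["doc_id"], ev["ref_doc_id"])

    df = pd.DataFrame(flow_events)
    df["case_id"] = df["doc_id"].map(find)  # root document labels the case
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    return df.sort_values(["case_id", "timestamp"]).reset_index(drop=True)

# Toy flow: PO-1 -> GR-1 -> INV-1 form one case; PO-2 stands alone
flows = [
    {"doc_id": "PO-1", "ref_doc_id": None, "activity": "Create PO", "timestamp": "2024-01-01T08:00:00"},
    {"doc_id": "GR-1", "ref_doc_id": "PO-1", "activity": "Goods Receipt", "timestamp": "2024-01-03T09:30:00"},
    {"doc_id": "INV-1", "ref_doc_id": "GR-1", "activity": "Post Invoice", "timestamp": "2024-01-05T14:00:00"},
    {"doc_id": "PO-2", "ref_doc_id": None, "activity": "Create PO", "timestamp": "2024-01-02T08:00:00"},
]
traces = reconstruct_from_doc_flows(flows)
print(traces["case_id"].nunique())  # 2 cases
```

Union-find keeps the linking transitive: an invoice that references a goods receipt lands in the same case as the purchase order the receipt references, without ever linking the two directly.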
```python
import pandas as pd

# Try native OCEL first
try:
    ocel_raw = archive.json("ocel-event-log.json")
    events_df = pd.json_normalize(ocel_raw)
    events_df["timestamp"] = pd.to_datetime(events_df["timestamp"])
    print(f"Loaded native OCEL: {len(events_df)} events")
except (KeyError, FileNotFoundError):
    # Reconstruct from document flows (see SDK notebook 05)
    events_df = reconstruct_from_doc_flows(archive)
    print(f"Reconstructed: {len(events_df)} events")

print(f"Activities: {events_df['activity'].nunique()}")
print(f"Cases: {events_df['case_id'].nunique()}")
print(f"Time span: {events_df['timestamp'].min()} to {events_df['timestamp'].max()}")
```

Variant Analysis
```python
from collections import Counter

# Build variant traces
variants = {}
for case_id, group in events_df.sort_values("timestamp").groupby("case_id"):
    trace = tuple(group["activity"].tolist())
    variants[case_id] = trace

# Count variant frequencies
variant_counts = Counter(variants.values())
total_cases = len(variants)

print(f"\nTotal variants: {len(variant_counts)}")
print("Top 5 variants:")
for trace, count in variant_counts.most_common(5):
    pct = count / total_cases * 100
    label = " -> ".join(trace)
    print(f"  {pct:5.1f}% ({count:3d} cases)  {label}")
```

Happy Path vs. Exceptions
The most frequent variant is your happy path — the expected process execution. Everything else is an exception: rework loops, skipped steps, out-of-order activities. The ratio of happy-path cases to total cases is your process conformance rate. VynFi's pre-built analytics include this as `happy_path_concentration` in the process variant summary.
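Both metrics are easy to cross-check by hand from the variant counts. A self-contained sketch (toy counts standing in for the `variant_counts` built above; note the summary's `variant_entropy` log base isn't documented here, so base 2 is an assumption):

```python
import math
from collections import Counter

# Toy variant counts; in the analysis above you'd reuse `variant_counts`
variant_counts = Counter({
    ("Create PO", "Goods Receipt", "Post Invoice"): 6,                   # happy path
    ("Create PO", "Goods Receipt", "Goods Receipt", "Post Invoice"): 3,  # rework loop
    ("Create PO", "Post Invoice"): 1,                                    # skipped step
})
total_cases = sum(variant_counts.values())

# Conformance rate = share of cases following the most frequent variant
happy_trace, happy_count = variant_counts.most_common(1)[0]
conformance = happy_count / total_cases

# Shannon entropy over the variant distribution (base 2: bits)
entropy = -sum(
    (n / total_cases) * math.log2(n / total_cases)
    for n in variant_counts.values()
)

print(f"Happy-path concentration: {conformance:.1%}")  # 60.0%
print(f"Variant entropy: {entropy:.3f} bits")
```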
```python
# Pre-built analytics (no computation needed)
analytics = client.jobs.analytics(completed.id)
if analytics.process_variant_summary:
    v = analytics.process_variant_summary
    print(f"Happy-path concentration: {v.happy_path_concentration:.1%}")
    print(f"Variant entropy: {v.variant_entropy:.3f}")
```

Export to pm4py / Disco / Celonis
VynFi also generates OCEL 2.0 XML, XES 2.0, Celonis IBC, and Disco CSV formats. Request them via `exportFormat` in your generation config, or find them in the archive alongside the JSON event log. For pm4py, the DataFrame above is already in the right shape — just rename columns to match pm4py's expected schema (`case:concept:name`, `concept:name`, `time:timestamp`).
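The rename is a one-liner. A sketch, assuming the `case_id`/`activity`/`timestamp` columns from the loading step above:

```python
import pandas as pd

# Stand-in for the events_df loaded earlier
events_df = pd.DataFrame({
    "case_id": ["C1", "C1"],
    "activity": ["Create PO", "Goods Receipt"],
    "timestamp": pd.to_datetime(["2024-01-02", "2024-01-05"]),
})

# Map VynFi columns onto pm4py's XES-style schema
pm4py_df = events_df.rename(columns={
    "case_id": "case:concept:name",
    "activity": "concept:name",
    "timestamp": "time:timestamp",
})

# pm4py (if installed) can consume the frame directly, e.g.:
# import pm4py
# net, im, fm = pm4py.discover_petri_net_inductive(pm4py_df)
```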