Tags: pm4py · process mining · OCEL · integration · Python · conformance · DataSynth 3.1.1

pm4py + VynFi: Process Mining on Synthetic OCEL Event Logs

Generate OCEL 2.0 event logs with VynFi, load them into pm4py, discover process models, check conformance, and detect bottlenecks — no ERP data extraction needed.

VynFi Team · Engineering · April 13, 2026 · 9 min read

pm4py is the go-to Python library for process mining. It supports process discovery (alpha miner, inductive miner, heuristics miner), conformance checking, social network analysis, and performance visualization. The hard part isn't the library — it's getting clean event logs to feed it.

Extracting event logs from SAP (BKPF, EKKO, EKPO → case/activity/timestamp triples) takes weeks of data engineering. VynFi generates OCEL 2.0-compliant event logs directly — complete with case IDs, activity labels, timestamps, object types, and variant annotations. This tutorial shows the full pipeline: generate data, load into pm4py, discover a process model, and check conformance.

**Update (2026-04-19, DataSynth 3.1.1):** OCEL timestamps are now microsecond-precision (previously nanosecond), so `pandas.to_datetime(..., utc=True)` retains **100%** of events — the prior nanosecond format silently dropped 95% of rows. Process-variant imperfections (rework, skip-step, out-of-order) are now injected at realistic default rates (15% / 10% / 8%), producing Inductive-Miner fitness in the 0.70–0.92 band — much closer to real ERP data than the near-perfect 1.00 fitness the engine used to produce. The regenerated VynFi/vynfi-supply-chain-ocel and VynFi/vynfi-ocel-manufacturing datasets on Hugging Face include these improvements (162 variants, 55% happy-path concentration on a sample retail job).
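The timestamp fix from the update is easy to sanity-check with pandas alone. Here is a minimal sketch using hypothetical OCEL-style events (the field names mirror the schema used later in this post): with `utc=True`, every microsecond-precision row parses into a tz-aware datetime and nothing is dropped.

```python
import pandas as pd

# Hypothetical OCEL-style events with microsecond-precision ISO timestamps
events = [
    {"case_id": "PO-001", "activity": "Create PO", "timestamp": "2026-01-05T09:14:02.000123Z"},
    {"case_id": "PO-001", "activity": "Approve PO", "timestamp": "2026-01-05T11:30:45.500000Z"},
    {"case_id": "PO-002", "activity": "Create PO", "timestamp": "2026-01-06T08:02:10.250000Z"},
]
df = pd.DataFrame(events)

# utc=True parses every row into a tz-aware datetime; no rows are silently dropped
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

assert df["timestamp"].notna().all()        # 100% of events retained
print(df["timestamp"].dt.microsecond.tolist())  # microsecond components survive parsing
```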

Generate the Event Log

Python
import os

import vynfi

client = vynfi.VynFi(api_key=os.environ["VYNFI_API_KEY"])

config = {
    "sector": "manufacturing",
    "rows": 5000,
    "companies": 5,
    "periods": 6,
    "processModels": ["p2p", "o2c", "manufacturing"],
    "exportFormat": "json",
    "ocpm": {"enabled": True, "computeVariants": True},  # emit OCEL events with variant annotations
}

# Submit the job, block until it completes, then download the result archive
job = client.jobs.generate_config(config=config)
completed = client.jobs.wait(job.id)
archive = client.jobs.download_archive(completed.id)

Load into pm4py

Python
import pandas as pd
import pm4py

# Load the OCEL event log from the downloaded archive
events = archive.json("ocel-event-log.json")
df = pd.json_normalize(events)

# utc=True keeps every microsecond-precision timestamp (see the 3.1.1 update above)
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)
df = df.sort_values("timestamp").reset_index(drop=True)

# Rename to pm4py's expected schema
df = df.rename(columns={
    "case_id": "case:concept:name",
    "activity": "concept:name",
    "timestamp": "time:timestamp",
})

print(f"Events: {len(df)}")
print(f"Cases: {df['case:concept:name'].nunique()}")
print(f"Activities: {df['concept:name'].nunique()}")

# Convert to pm4py EventLog
event_log = pm4py.convert_to_event_log(df)

Discover a Process Model

Python
# Inductive Miner — guaranteed sound model
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(event_log)
# Visualize
pm4py.view_petri_net(net, initial_marking, final_marking)
# Or use a BPMN model
bpmn = pm4py.discover_bpmn_inductive(event_log)
pm4py.view_bpmn(bpmn)

Conformance Checking

Python
# Token-based replay
replayed = pm4py.conformance_diagnostics_token_based_replay(event_log, net, initial_marking, final_marking)
# Fitness score
fitness = pm4py.fitness_token_based_replay(event_log, net, initial_marking, final_marking)
print(f"Fitness: {fitness['average_trace_fitness']:.3f}")
# Alignment-based (exact, slower)
aligned = pm4py.conformance_diagnostics_alignments(event_log, net, initial_marking, final_marking)
precision = pm4py.precision_alignments(event_log, net, initial_marking, final_marking)
print(f"Precision: {precision:.3f}")
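Alignments are expensive on large logs, so a cheap pre-check helps. As a sketch using plain pandas on a tiny hypothetical log (column names match the pm4py schema above): a variant is just the ordered tuple of activities per case, and counting variants shows roughly how much nonconforming behavior to expect before you pay for alignments.

```python
import pandas as pd

# Hypothetical traces: two cases follow the happy path, one has rework
df = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3", "c3", "c3"],
    "concept:name": ["Create", "Approve", "Pay",
                     "Create", "Approve", "Pay",
                     "Create", "Approve", "Approve", "Pay"],
})

# A variant is the ordered sequence of activities within a case
# (assumes df is already sorted by timestamp within each case)
variants = (
    df.groupby("case:concept:name")["concept:name"]
      .agg(tuple)
      .value_counts()
)
print(variants)

happy_path_share = variants.iloc[0] / variants.sum()
print(f"Happy-path concentration: {happy_path_share:.0%}")  # 67%
```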

Bottleneck Detection

Python
# Performance analysis: median time between activities
from pm4py.algo.filtering.log.timestamp import timestamp_filter

# Filter to a specific time window
filtered = timestamp_filter.filter_traces_intersecting(
    event_log,
    "2024-01-01 00:00:00",
    "2024-06-30 23:59:59",
)

# Discover a directly-follows graph annotated with performance
dfg, start, end = pm4py.discover_performance_dfg(filtered)
pm4py.view_performance_dfg(dfg, start, end)
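The performance DFG surfaces slow edges visually; the same waiting times can be computed numerically with pandas if you want to rank bottlenecks programmatically. A minimal sketch on a hypothetical four-event log (column names follow the pm4py schema used above):

```python
import pandas as pd

# Hypothetical event log: case c2 stalls between Approve and Pay
df = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c2", "c2"],
    "concept:name": ["Approve", "Pay", "Approve", "Pay"],
    "time:timestamp": pd.to_datetime([
        "2026-01-01 09:00", "2026-01-01 10:00",
        "2026-01-02 09:00", "2026-01-05 09:00",
    ], utc=True),
})

# Waiting time between consecutive events within each case
df = df.sort_values(["case:concept:name", "time:timestamp"])
df["wait"] = df.groupby("case:concept:name")["time:timestamp"].diff()

# Median wait per (previous activity -> activity) edge flags bottleneck transitions
df["prev"] = df.groupby("case:concept:name")["concept:name"].shift()
edge_wait = df.dropna(subset=["prev"]).groupby(["prev", "concept:name"])["wait"].median()
print(edge_wait)
```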

Pre-Built Variant Analysis

VynFi's analytics API includes a process variant summary with variant count, entropy, and happy-path concentration — computed server-side. Use it to verify data quality before running expensive conformance checks.

Python
analytics = client.jobs.analytics(completed.id)

if analytics.process_variant_summary:
    v = analytics.process_variant_summary
    print(f"Variants: {v.variant_count}")
    print(f"Entropy: {v.variant_entropy:.3f}")
    print(f"Happy-path: {v.happy_path_concentration:.1%}")
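To interpret those numbers, here is a sketch of the usual definitions, assuming Shannon entropy over variant frequencies (the server-side formula is an assumption; check it against your own counts). The variant counts below are hypothetical:

```python
import math

# Hypothetical variant frequency counts (e.g. from a pandas value_counts)
variant_counts = {
    ("Create", "Approve", "Pay"): 55,
    ("Create", "Approve", "Approve", "Pay"): 30,
    ("Create", "Pay"): 15,
}

total = sum(variant_counts.values())
probs = [c / total for c in variant_counts.values()]

# Shannon entropy in bits: low entropy means traffic concentrates on few variants
entropy = -sum(p * math.log2(p) for p in probs)
happy_path = max(probs)

print(f"Entropy: {entropy:.3f}")
print(f"Happy-path: {happy_path:.1%}")  # 55.0%
```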

Other Export Formats

VynFi also generates XES 2.0 (ProM/pm4py native), Celonis IBC (with metadata sidecar), Disco CSV, and Parquet. Request these via exportFormat in your config or find them in the archive alongside the JSON event log.

SDK examples + regenerated datasets

The pm4py_integration.py example runs this whole pipeline against the live API and prints conformance metrics. 05_process_mining_ocel.ipynb is the interactive notebook companion. For sector-specific process DAGs (manufacturing supply-chain, retail O2C, financial-services correspondent banking), see sector_dag_presets.py. The regenerated HF datasets VynFi/vynfi-supply-chain-ocel and VynFi/vynfi-ocel-manufacturing include native OCEL events, objects, and anomaly labels in one parquet bundle — load directly with `datasets.load_dataset("VynFi/vynfi-supply-chain-ocel", "events")`.

Ready to try VynFi?

Start generating synthetic financial data with 10,000 free credits. No credit card required.