DataSynth 5.29 · business-unit · segment-reporting · ifrs-8 · dimensions · cost-center · analytics

Business-unit dimensions: the segment column that makes segment analytics coherent

DataSynth 5.29 adds a line-level `business_unit` field: a deterministic FNV roll-up of the cost center (or the profit center, as a fallback). The corpus carries ~11 distinct BU codes, but the engine previously had no BU dimension at all. Here's why this matters for IFRS 8 segment reporting, why a deterministic design is the right one, and what BU-level analytics can now do.

VynFi Team · Engineering · May 27, 2026 · 5 min read

Real audit datasets carry an organisational dimension on every line — a business unit (BU) or segment that rolls up several cost centers into a reporting boundary an internal-controls team or IFRS 8 segment-reporting analyst can pivot against. Until 5.29, DataSynth's output had cost centers and profit centers but no BU. Customers running BU-level analytics had to invent one client-side from naming conventions — error-prone and not consistent across runs.

**TL;DR**: `transactions.business_unit_dimension` (default-on) populates a new line-level `business_unit` column. It's a deterministic FNV roll-up of the cost center (or the profit center, as a fallback): the same CC/PC always maps to the same BU, across runs and across customers. The default config produces up to 11 distinct BU codes per company (BU01..BU11), matching the ~11 the corpus carries. Default fill is ~82% (every line with a CC or PC). The column is emitted in JSON, journal_entries.csv (column 47), and the Parquet sink (column 16). Set the flag to `false` for the legacy (empty) behavior.

Why a deterministic roll-up

The obvious design, sampling a random BU per line, would have broken BU-level analytics from day one. If two lines with the same cost center can get different BUs, BU aggregations stop meaning anything: a JE posted within a single cost center no longer balances in the BU view, segment totals can't be traced back to master data, and the reader assumes the data is wrong.

The 5.29 design uses an FNV hash of the cost center, bucketed into 11 BU codes. Same CC → same BU, always. This means:

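  • JEs balance in the BU view: when a JE's lines share a cost center, its debits and credits land in the same BU and offset.
  • Segment totals reconcile to the trial balance: the sum across BUs, plus the unassigned bucket for lines with no CC or PC, equals the global total.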
  • Cross-run consistency — the same CC produces the same BU on every run, so longitudinal BU analytics are coherent.
  • Cross-customer comparability — every dataset using the same CC list (e.g., a standard chart of accounts) produces the same BU layout, which makes head-to-head benchmarking trivial.
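The roll-up can be sketched in a few lines. This is a minimal illustration, not DataSynth's code: it assumes FNV-1a 64-bit over the UTF-8 bytes of the CC code and a plain `hash % 11` bucketing rendered as BU01..BU11, so the codes it produces will not necessarily match the engine's. The names `bu_for_cost_center` and `net_by_je_and_bu` are hypothetical.

```python
from collections import defaultdict

# Standard FNV-1a 64-bit parameters.
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
NUM_BUS = 11  # BU01..BU11, as in the default config


def fnv1a_64(text: str) -> int:
    """FNV-1a 64-bit hash of the UTF-8 bytes of `text`."""
    h = FNV64_OFFSET
    for byte in text.encode("utf-8"):
        h ^= byte                                   # xor first (the "1a" order)
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF  # then multiply, keep 64 bits
    return h


def bu_for_cost_center(cc: str) -> str:
    """Hypothetical bucketing: hash modulo NUM_BUS, formatted BU01..BU11."""
    return f"BU{fnv1a_64(cc) % NUM_BUS + 1:02d}"


def net_by_je_and_bu(lines):
    """Net signed amount (debit +, credit -) per (JE, BU); 0 means balanced."""
    net = defaultdict(int)
    for je_id, cc, cents in lines:
        net[(je_id, bu_for_cost_center(cc))] += cents
    return dict(net)


# Toy lines: (je_id, cost_center, signed amount in cents). Not DataSynth output.
lines = [
    ("JE1", "CC1000", 25_000), ("JE1", "CC1000", -25_000),
    ("JE2", "CC2040", 8_000), ("JE2", "CC2040", -5_000), ("JE2", "CC2040", -3_000),
]

# Each JE's lines share one CC, so each JE nets to zero inside its single BU.
print(net_by_je_and_bu(lines))
```

The fixed hash is what buys cross-run stability: Python's built-in `hash()` on strings is salted per process (see `PYTHONHASHSEED`), so it would hand the same CC a different BU on different runs.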

The cost of determinism: you can't override the BU on a specific line without also changing its CC. That's intentional. In a typical ERP the BU (or segment) is derived from master data rather than keyed per line, and DataSynth mirrors that. If you need a different BU layout, change the cost-center master data (the BU follows from the CC) and re-run.

Cost center OR profit center, with fallback

The first 5.29 iteration rolled up from the cost center only, which gave ~24% BU fill, since only lines with a CC got a BU. A tuning pass broadened the input: roll up from the CC if present, and fall back to the PC if the CC is empty. That lifted fill to ~82%, matching the corpus.

The engine's helper was renamed to `business_unit_for_dimension(cc, pc)` to reflect the dual-source behavior. Lines with neither a CC nor a PC (e.g., header-level summary postings) still get an empty BU.
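A sketch of the CC-then-PC fallback, assuming the same hypothetical FNV-1a 64-bit bucketing (the engine's exact hash variant isn't specified in this post). This `business_unit_for_dimension` is an illustrative re-implementation that borrows the engine helper's name, not its source; it also assumes CC and PC codes are hashed in one shared space, and `fill_rate` is a hypothetical helper.

```python
from typing import Optional

# Standard FNV-1a 64-bit parameters; the bucketing below is an assumption.
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
NUM_BUS = 11


def fnv1a_64(text: str) -> int:
    h = FNV64_OFFSET
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h


def business_unit_for_dimension(cc: Optional[str], pc: Optional[str]) -> str:
    """Illustrative: hash the CC if present, else the PC; "" if neither."""
    source = cc or pc  # both None and "" count as missing
    if not source:
        return ""      # e.g. header-level summary postings
    return f"BU{fnv1a_64(source) % NUM_BUS + 1:02d}"


def fill_rate(lines) -> float:
    """Share of (cc, pc) pairs that receive a non-empty BU."""
    if not lines:
        return 0.0
    filled = sum(1 for cc, pc in lines if business_unit_for_dimension(cc, pc))
    return filled / len(lines)


# Toy (cc, pc) pairs: CC present, CC empty twice (PC fallback), neither present.
toy = [("CC1000", "PC10"), ("", "PC20"), (None, "PC20"), (None, None)]
print([business_unit_for_dimension(cc, pc) for cc, pc in toy])
print(fill_rate(toy))
```

Hashing the PC only when the CC is missing keeps CC-based assignments unchanged from the first iteration; the fallback only fills lines that were previously empty.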

IFRS 8 segment reporting

IFRS 8 requires entities within its scope (broadly, those whose debt or equity instruments are publicly traded) to report financial information by operating segment: the components of the business whose operating results are regularly reviewed by the chief operating decision maker. For multi-jurisdictional audit teams, generating test data that exercises the full IFRS 8 disclosure pack used to require hand-crafting a CC → segment mapping per dataset. With 5.29, every dataset ships a coherent segment view out of the box.

Concretely, the IFRS 8 disclosures you can pivot on after 5.29:

  • Segment revenue (net credits on revenue accounts, grouped by business_unit).
  • Segment expense and profit (net debits on expense accounts; profit is revenue minus expense per BU).
  • Inter-segment transfers (filter to JEs whose lines post to more than one BU; because the roll-up is deterministic, a cross-BU JE always reflects lines from different cost or profit centers, not sampling noise).
  • Major-customer and supplier concentration by segment (join JE lines to the customer or vendor master, group by BU).
  • Segment asset and liability totals (trial balance grouped by BU).
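These pivots reduce to a group-by over line-level fields. A minimal sketch, assuming each line exposes `account_class` (as in the SQL query later in this post), a populated `business_unit`, and separate debit/credit amounts in cents; the `Line` shape, the helper names, and the toy entries are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Line:
    je_id: str
    account_class: str  # hypothetical values: "revenue", "expense", "asset", ...
    business_unit: str  # "" when the line had neither CC nor PC
    debit: int          # cents, to avoid float rounding
    credit: int


def segment_pnl(lines):
    """Per-BU revenue (net credits), expense (net debits), and profit."""
    pnl = defaultdict(lambda: {"revenue": 0, "expense": 0})
    for ln in lines:
        if ln.account_class == "revenue":
            pnl[ln.business_unit]["revenue"] += ln.credit - ln.debit
        elif ln.account_class == "expense":
            pnl[ln.business_unit]["expense"] += ln.debit - ln.credit
    return {bu: {**v, "profit": v["revenue"] - v["expense"]} for bu, v in pnl.items()}


def inter_segment_jes(lines):
    """JEs whose non-empty BUs span more than one segment (transfer candidates)."""
    bus_by_je = defaultdict(set)
    for ln in lines:
        if ln.business_unit:
            bus_by_je[ln.je_id].add(ln.business_unit)
    return sorted(je for je, bus in bus_by_je.items() if len(bus) > 1)


# Toy JEs (hypothetical). JE3 recharges a BU07 expense to BU03.
lines = [
    Line("JE1", "asset", "BU03", 12_000, 0),
    Line("JE1", "revenue", "BU03", 0, 12_000),
    Line("JE2", "expense", "BU07", 4_000, 0),
    Line("JE2", "liability", "BU07", 0, 4_000),
    Line("JE3", "expense", "BU03", 1_500, 0),
    Line("JE3", "expense", "BU07", 0, 1_500),
]

print(segment_pnl(lines))
print(inter_segment_jes(lines))
```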

What changed for downstream consumers

The journal_entries.csv file now has 47 columns (48 if foreign_currency_rate is also enabled), and the Parquet sink schema has 16. Both changes are additive: consumers that resolve columns by name (as the VynFi portal visualizers do) keep working. Column-positional consumers (e.g., older CSV pipelines that hardcode column 42 or a fixed column count) need to be updated to accept the wider row; the regression test we added during DS 5.29 adoption (PR #5) pins this.
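Resolving by name amounts to a header lookup that tolerates additive columns and fails loudly only when a column you actually need is missing. A minimal consumer-side sketch; `resolve_columns` and the short toy headers are hypothetical, not the real 47-column layout.

```python
import csv
import io
from typing import Dict, List


def resolve_columns(header: List[str], required: List[str]) -> Dict[str, int]:
    """Map each required column name to its index in `header`.

    Extra columns (such as a newly added business_unit) are ignored;
    only a missing required column raises.
    """
    index = {name: i for i, name in enumerate(header)}
    missing = [name for name in required if name not in index]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    return {name: index[name] for name in required}


# Toy exports (hypothetical): before and after the new column was added.
old_csv = "je_id,account,amount\nJE1,4000,120.00\n"
new_csv = "je_id,account,amount,business_unit\nJE1,4000,120.00,BU03\n"

for text in (old_csv, new_csv):
    rows = list(csv.reader(io.StringIO(text)))
    cols = resolve_columns(rows[0], ["je_id", "amount"])
    print({name: rows[1][i] for name, i in cols.items()})
```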

DataSynth's header-parity self-checks and Parquet schema assertions have been updated to match. The new column is covered by the integration tests that resolve columns by name, and the older tests that count columns positionally have been updated.

How to try it

Default-on for every run. To inspect the resulting BU distribution on a 10k JE retail run:

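```bash
# After downloading the run: locate business_unit by header name (not $NF,
# which is foreign_currency_rate when that column is enabled), then count
# lines per BU. Assumes an unquoted header and no commas inside quoted fields.
awk -F, 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "business_unit") col = i; next }
         col && $col != "" { print $col }' journal_entries.csv | sort | uniq -c | sort -rn
# Typical output:
# 2841 BU03
# 2103 BU07
# 1822 BU01
# 1654 BU09
# ...
```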
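The same count can be done in Python with the standard `csv` module, which also copes with quoted fields that contain commas (the awk version assumes none). A sketch; `bu_distribution` and `print_distribution` are hypothetical helpers, and only the `business_unit` column name comes from the output schema.

```python
import csv
from collections import Counter
from typing import Iterable


def bu_distribution(csv_lines: Iterable[str]) -> Counter:
    """Count lines per business_unit, resolving the column by header name.

    Lines with an empty BU, or a file without the column, contribute nothing.
    """
    reader = csv.DictReader(csv_lines)
    return Counter(row["business_unit"] for row in reader if row.get("business_unit"))


def print_distribution(counts: Counter) -> None:
    # Same shape as `sort | uniq -c | sort -rn`: count first, most frequent first.
    for bu, n in counts.most_common():
        print(f"{n:>7} {bu}")


# Toy input (hypothetical); note the quoted memo containing a comma.
demo = [
    "je_id,memo,business_unit\n",
    'JE1,"Sales, retail",BU03\n',
    "JE2,Rent,BU07\n",
    "JE2,Cash,BU03\n",
]
print_distribution(bu_distribution(demo))
```

To run it on the downloaded file: `with open("journal_entries.csv", newline="", encoding="utf-8") as f: print_distribution(bu_distribution(f))` (UTF-8 is an assumption about the export's encoding).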

Or load the CSV into pandas and pivot on `business_unit` against revenue/expense accounts to get a segment P&L. Or, in SQL over the Parquet sink: `SELECT business_unit, SUM(local_amount) FROM je WHERE account_class = 'revenue' GROUP BY business_unit`.

To disable: set `transactions.business_unit_dimension: false` (legacy behavior, empty BU column). Most customers should leave it default-on.
