Slotting Architecture

Sales History Data Mapping for Velocity Slotting

Accurate inventory velocity is the foundational input for every slotting decision, and it is only as trustworthy as the sales history it is derived from. This guide is part of the Velocity Data Ingestion & WMS Sync Pipelines system, and it owns the normalization layer specifically: turning raw, heterogeneous ERP transaction logs — spread across channels, pack configurations, returns, and promotions — into one canonical per-SKU demand baseline that the scoring engine can compute a defensible velocity coefficient from. Get this layer wrong and every downstream tier assignment inherits a silent distortion: a single pallet line masquerades as a hyper-mover, a returns-heavy SKU reads as fast, and the optimizer confidently ships move directives that make travel worse.

What Sales History Mapping Is

Sales history mapping is the deterministic transformation of source-system transaction records into a flat, time-indexed units_sold series per SKU, expressed in a single canonical unit of measure and cleansed of the artifacts that would bias a demand estimate. It is mapping because the core work is a field-by-field and value-by-value translation between a source contract you do not control (the ERP’s ORDERS / ORDER_LINES / ITEM_MASTER shape) and a target contract the velocity engine owns (a VelocityFact row). It is not aggregation and it is not scoring; those happen downstream. This layer’s single responsibility is to make every unit of demand comparable to every other unit of demand.

Three translation problems separate a production mapper from a naive SELECT qty, sku FROM order_lines:

Unit-of-measure reconciliation. ERPs record a line in whatever UOM the order was placed in — eaches, inner packs, cases, pallets. A pallet line of 1 with a pack_factor of 240 is 240 picks, not one. Mapping must resolve every line to a single base unit before any total is meaningful.
Channel and return netting. The same SKU sells through retail, wholesale, and e-commerce with different line semantics, and returns arrive either as negative quantities or as separately flagged lines. A velocity baseline for slotting wants net outbound pick demand, not gross sales: a unit sold and returned generated two touches but zero net demand.
Event isolation, not deletion. Promotional spikes, one-off bulk orders, and discontinued-item tails are real events that must be flagged and routed, never silently dropped. They belong in a separate seasonal model so steady-state tier assignment stays stable while peak-season slotting can still see the spike.
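To make the first two problems concrete, here is a minimal sketch comparing what a naive quantity sum reports against a base-unit, returns-netted total for one SKU-day. The three raw lines are hypothetical, the field names follow the input table in the next section, and `naive_units` / `net_base_units` are illustrative helpers rather than part of the mapper.

```python
# Hypothetical raw ERP lines for one SKU on one day.
raw_lines = [
    {"qty": 1, "uom": "pallet", "pack_factor": 240, "line_type": "sale"},
    {"qty": 12, "uom": "each", "pack_factor": 1, "line_type": "sale"},
    {"qty": 2, "uom": "case", "pack_factor": 24, "line_type": "return"},
]


def naive_units(lines: list[dict]) -> int:
    """Roughly what `SELECT qty ... FROM order_lines` sums: raw qty, mixed UOMs, returns added."""
    return sum(line["qty"] for line in lines)


def net_base_units(lines: list[dict]) -> int:
    """Resolve every line to eaches, then subtract return lines instead of adding them."""
    total = 0
    for line in lines:
        is_return = line["line_type"] == "return" or line["qty"] < 0
        units = abs(line["qty"]) * line["pack_factor"]
        total += -units if is_return else units
    return total


print(naive_units(raw_lines))     # 15
print(net_base_units(raw_lines))  # 240 + 12 - 48 = 204
```

The naive figure is wrong in both directions at once: the pallet is undercounted by 239 eaches while the returned cases are added instead of subtracted.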

The output of this layer is the historical baseline that the rolling windows in Async Batch Processing for Velocity score against, so its precision sets the ceiling on the entire pipeline’s accuracy.

Input Data Requirements

The mapper consumes raw delta records produced upstream and emits VelocityFact rows. It assumes the extract already resolved transport-level faults but does not assume clean field values — value-level cleansing is this layer’s job. The source feed must carry enough context to resolve UOM and net returns; a feed missing pack_factor or a return indicator cannot be mapped correctly no matter how good the code is.

Source field	Type	Maps to	Precondition
`item_code`	`str`	`sku_id`	Non-null; retired aliases resolved against `ITEM_MASTER`
`trans_date`	`date`	`demand_date`	Timezone-normalized to warehouse-local before windowing
`qty`	`int`	(→ `units_base`)	Signed; negative denotes a return line
`uom`	`str`	(drives conversion)	One of the known UOM codes; unknown codes quarantine the row
`pack_factor`	`int`	(→ `units_base`)	`>= 1`; base-units-per-UOM from item master
`channel`	`str`	`channel`	Retail / wholesale / ecom; drives netting policy
`line_type`	`str`	(forces the sign of `units_base`)	`sale` / `return` / `adjustment`; a `return` line always subtracts
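The `trans_date` precondition is easy to violate, because many ERPs stamp lines in server time. A minimal sketch of the normalization, assuming naive timestamps are UTC, bare `YYYY-MM-DD` values are already warehouse-local, and the warehouse zone is supplied as a `tzinfo`; `to_warehouse_date` is a hypothetical helper name.

```python
from datetime import date, datetime, timezone, tzinfo


def to_warehouse_date(stamp: str, warehouse_tz: tzinfo) -> date:
    """Map an ERP date or timestamp string to the warehouse-local calendar day."""
    if len(stamp) == 10:
        return date.fromisoformat(stamp)          # bare date: assumed already warehouse-local
    if stamp.endswith("Z"):
        stamp = stamp[:-1] + "+00:00"             # fromisoformat() rejects "Z" before 3.11
    dt = datetime.fromisoformat(stamp)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)      # assumption: naive stamps are ERP-server UTC
    return dt.astimezone(warehouse_tz).date()
```

With `zoneinfo.ZoneInfo("America/Chicago")` as the zone, `2026-06-02T03:30:00Z` lands on June 1 local time (CDT is UTC-5 in June), which is exactly the day a plain `[:10]` slice gets wrong.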

The canonical target record is defined as a frozen dataclass so every downstream consumer reads the same shape. Keeping it immutable makes the mapping contract explicit and prevents each cleansing pass from mutating the record in place.

from __future__ import annotations

import logging
from dataclasses import dataclass
from datetime import date
from enum import Enum

logger = logging.getLogger("velocity.mapping")


class Channel(str, Enum):
    RETAIL = "retail"
    WHOLESALE = "wholesale"
    ECOM = "ecom"


@dataclass(frozen=True)
class VelocityFact:
    """One canonical, base-unit demand fact the scoring engine consumes."""
    sku_id: str
    demand_date: date
    units_base: int          # net outbound demand in base (each) units
    channel: Channel
    is_promotional: bool = False   # flagged, not dropped — routed to seasonal model
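Because the record is frozen, a later pass that flags a promotion derives a new record instead of editing the old one. A minimal sketch using the standard-library `dataclasses.replace`; the trimmed `VelocityFact` below mirrors the definition above (with `channel` simplified to a plain string) so the snippet runs on its own.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from datetime import date


@dataclass(frozen=True)
class VelocityFact:  # trimmed copy of the canonical record; channel simplified to str
    sku_id: str
    demand_date: date
    units_base: int
    channel: str
    is_promotional: bool = False


fact = VelocityFact("SKU1", date(2026, 6, 1), units_base=240, channel="retail")
flagged = replace(fact, is_promotional=True)   # a new object; `fact` is untouched

try:
    fact.units_base = 0                        # in-place mutation is rejected
except FrozenInstanceError:
    print("frozen: derive a new record with replace()")
```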

The quality gate that matters here is value correctness, not transport. A run polluted by unconverted UOM (pallet lines counted as single picks) or un-netted returns produces a baseline that is confidently wrong. Records with an unknown UOM code or a pack_factor below 1 must be quarantined rather than guessed — the contract for that quarantine boundary is enforced by Schema Validation for Inventory Feeds, and this layer trusts that a row reaching it is structurally valid even if its values still need cleansing.

Step-by-Step Implementation

The mapper runs in four passes: pull the incremental delta by watermark, translate each raw line into base-unit VelocityFact records with validation, net returns and channels into a single demand series, then classify promotional and bulk-order events into a separate seasonal store. Each pass is isolated so a fault in cleansing never corrupts the extract bookmark.

1. Pull the Incremental Delta by Watermark

Full-table scans of an ERP sales history lock rows during peak fulfillment and re-fetch data you already have. Instead, pull only records newer than the last committed watermark — a modified_at timestamp or a monotonic order_line_id — and advance the bookmark only after the batch is durably mapped. This is the same watermark contract the extractors in WMS & ERP Polling Strategies emit, so the mapper consumes deltas rather than snapshots.

from datetime import datetime

import httpx


async def fetch_incremental_sales(
    base_url: str,
    watermark: datetime,
    batch_size: int = 5_000,
) -> tuple[list[dict], datetime]:
    """Fetch delta sales lines newer than `watermark`, returning rows and the advanced bookmark."""
    params = {
        "modified_since": watermark.isoformat(),
        "limit": batch_size,
        "sort": "modified_at",
        "order": "asc",
    }
    async with httpx.AsyncClient(timeout=30.0) as client:
        resp = await client.get(f"{base_url}/api/v1/sales/orders", params=params)
        resp.raise_for_status()
        rows = resp.json().get("data", [])

    if not rows:
        logger.info("no new sales lines since %s", watermark.isoformat())
        return [], watermark

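    # Accept a trailing "Z" on Python < 3.11; the caller persists this bookmark only after commit.
    new_watermark = max(datetime.fromisoformat(r["modified_at"].replace("Z", "+00:00")) for r in rows)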
    logger.info("pulled %d sales lines; watermark %s -> %s",
                len(rows), watermark.isoformat(), new_watermark.isoformat())
    return rows, new_watermark
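A pure timestamp watermark has one sharp edge: if the batch limit cuts through a group of lines that share the same `modified_at`, an exclusive `modified_since` filter skips the rest of the group and an inclusive one re-delivers it. One common remedy, sketched here under the assumption that each row also carries the monotonic `order_line_id` mentioned above, is a composite `(modified_at, order_line_id)` keyset watermark. The helper names are hypothetical, and the sketch filters client-side; in production the same comparison would belong in the source query.

```python
from datetime import datetime

Watermark = tuple[datetime, int]  # (modified_at, order_line_id)


def row_key(row: dict) -> Watermark:
    """Composite sort key; the line id breaks ties between identical timestamps."""
    return (datetime.fromisoformat(row["modified_at"]), int(row["order_line_id"]))


def rows_after(rows: list[dict], watermark: Watermark) -> list[dict]:
    """Keep only rows strictly past the watermark, in key order, so re-pulls are idempotent."""
    return sorted((r for r in rows if row_key(r) > watermark), key=row_key)


def advance(rows: list[dict], watermark: Watermark) -> Watermark:
    """New bookmark is the largest key seen; unchanged when the batch is empty."""
    return max((row_key(r) for r in rows), default=watermark)
```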

2. Map and Validate Each Line to Base Units

This is the heart of the layer. Each raw line is renamed into the canonical field names, its quantity resolved to base units via pack_factor, and its UOM checked against the known set. Unknown UOM codes and non-positive pack factors are quarantined — never coerced — so a bad reference value cannot silently deflate a SKU’s velocity.

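KNOWN_UOM = {"each", "inner", "case", "pallet"}
KNOWN_CHANNELS = {c.value for c in Channel}

# Source-to-canonical renames from the input table; other fields keep their names.
_FIELD_MAP = {
    "item_code": "sku_id",
    "trans_date": "demand_date",
}


def map_line(raw: dict) -> VelocityFact | None:
    """Translate one raw ERP line to a base-unit VelocityFact, or None if it must quarantine."""
    row = {_FIELD_MAP.get(key, key): value for key, value in raw.items()}
    uom = str(row.get("uom") or "").lower()
    pack_factor = int(row.get("pack_factor") or 0)
    channel = str(row.get("channel") or "retail").lower()
    if uom not in KNOWN_UOM or pack_factor < 1 or channel not in KNOWN_CHANNELS:
        logger.warning("quarantined line sku=%s uom=%r pack_factor=%s channel=%r",
                       row.get("sku_id"), uom, pack_factor, channel)
        return None

    signed_qty = int(row["qty"])                       # negative == return line
    if str(row.get("line_type") or "").lower() == "return":
        signed_qty = -abs(signed_qty)                  # flagged returns subtract even if qty > 0
    units_base = signed_qty * pack_factor              # resolve the UOM to base (each) units
    return VelocityFact(
        sku_id=row["sku_id"],
        # Assumes trans_date was already normalized to warehouse-local time upstream.
        demand_date=date.fromisoformat(str(row["demand_date"])[:10]),
        units_base=units_base,
        channel=Channel(channel),
    )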
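Returning `None` tells the caller to quarantine, but the rejected rows still have to land somewhere reviewable rather than vanish. A minimal sketch of a batch driver that keeps both outcomes; `partition_batch` is a hypothetical helper, and it takes the mapper as a parameter so it runs standalone (in practice you would pass `map_line`).

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def partition_batch(
    rows: list[dict], mapper: Callable[[dict], Optional[T]]
) -> tuple[list[T], list[dict]]:
    """Split raw rows into mapped records and quarantined raw rows; nothing is dropped."""
    mapped: list[T] = []
    quarantined: list[dict] = []
    for row in rows:
        result = mapper(row)
        if result is None:
            quarantined.append(row)     # keep the untouched raw row for review and replay
        else:
            mapped.append(result)
    return mapped, quarantined
```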

3. Net Returns and Channels into One Demand Series

A slotting baseline wants net outbound pick demand per SKU per day, aggregated across channels. Summing units_base, where returns are already negative from the signed quantity, collapses gross sales and returns into net demand in a single reduction. A SKU whose returns exceed its sales on a given day nets negative and must be floored at zero, because negative demand is meaningless for slotting and would corrupt a rolling average.

from collections import defaultdict


def net_daily_demand(facts: list[VelocityFact]) -> dict[tuple[str, date], int]:
    """Collapse mapped facts into net base-unit demand per (sku, day), floored at zero."""
    totals: dict[tuple[str, date], int] = defaultdict(int)
    for fact in facts:
        totals[(fact.sku_id, fact.demand_date)] += fact.units_base

    netted = {key: max(units, 0) for key, units in totals.items()}
    clamped = sum(1 for v in totals.values() if v < 0)
    if clamped:
        logger.info("floored %d (sku, day) buckets with net-negative demand", clamped)
    return netted

4. Classify Promotional and Bulk-Order Events

Promotions and one-off bulk orders are genuine demand, but they are not steady-state demand and must not set a SKU’s baseline tier. The practical heuristic: any single day whose net demand exceeds a facility-tuned multiple of the SKU’s median daily demand is flagged promotional. Flagged buckets are routed to a separate seasonal store, where they are retained for peak-season slotting, while the steady-state series that drives everyday tier assignment stays clean.

import statistics


def flag_promotional_days(
    netted: dict[tuple[str, date], int],
    spike_multiple: float = 4.0,
) -> dict[tuple[str, date], bool]:
    """Flag (sku, day) buckets whose demand exceeds `spike_multiple` x the SKU's median day."""
    by_sku: dict[str, list[int]] = defaultdict(list)
    for (sku, _day), units in netted.items():
        by_sku[sku].append(units)

    # Median over the days present in `netted`; days with no lines at all are absent, not zero.
    # A zero median falls back to 1 so the ratio below stays defined.
    medians = {sku: statistics.median(days) or 1 for sku, days in by_sku.items()}

    flags: dict[tuple[str, date], bool] = {}
    for (sku, day), units in netted.items():
        median = medians[sku]
        is_promo = units > spike_multiple * median
        flags[(sku, day)] = is_promo
        if is_promo:
            logger.info("promo spike sku=%s day=%s units=%d (%.1fx median)",
                        sku, day, units, units / median)
    return flags
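`flag_promotional_days` compares each day with the median over every day in the batch, so a long history dilutes recent shifts and the spike itself sits inside its own reference. If a trailing reference is preferred, a variant might look like the sketch below. `flag_trailing_spikes`, `lookback`, and `min_history` are hypothetical; `lookback` counts days present in the series rather than calendar days, and days with too little history are left unflagged rather than guessed.

```python
import statistics
from collections import defaultdict
from datetime import date


def flag_trailing_spikes(
    netted: dict[tuple[str, date], int],
    spike_multiple: float = 4.0,
    lookback: int = 28,
    min_history: int = 7,
) -> dict[tuple[str, date], bool]:
    """Flag a (sku, day) whose demand exceeds spike_multiple x the median of preceding days."""
    series: dict[str, list[tuple[date, int]]] = defaultdict(list)
    for (sku, day), units in netted.items():
        series[sku].append((day, units))

    flags: dict[tuple[str, date], bool] = {}
    for sku, points in series.items():
        points.sort()                                              # chronological per SKU
        for i, (day, units) in enumerate(points):
            window = [u for _, u in points[max(0, i - lookback):i]]  # excludes the day itself
            if len(window) < min_history:
                flags[(sku, day)] = False                          # too little history to judge
                continue
            median = statistics.median(window) or 1                # same zero guard as above
            flags[(sku, day)] = units > spike_multiple * median
    return flags
```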

The steady-state VelocityFact rows leave this layer and become the historical baseline the rolling windows score against; the flagged seasonal buckets feed the peak-season path so genuine demand spikes are captured without corrupting the everyday tier assignment.
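The flag map still has to be applied. A minimal sketch of that routing step, assuming the netted and flag dictionaries produced by the two functions above; `split_steady_seasonal` is a hypothetical name, and persisting each half to the baseline or seasonal store is left to the caller.

```python
from datetime import date

Bucket = tuple[str, date]


def split_steady_seasonal(
    netted: dict[Bucket, int], flags: dict[Bucket, bool]
) -> tuple[dict[Bucket, int], dict[Bucket, int]]:
    """Route flagged buckets to the seasonal series; everything else stays steady-state."""
    steady: dict[Bucket, int] = {}
    seasonal: dict[Bucket, int] = {}
    for key, units in netted.items():
        (seasonal if flags.get(key, False) else steady)[key] = units
    return steady, seasonal
```

Every bucket lands in exactly one of the two series, which keeps the "flagged, not dropped" rule checkable with a simple count.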

Tuning & Calibration

The two parameters that move outcomes most are spike_multiple (how aggressively single days are treated as promotional) and the set of rolling-window lengths the downstream engine averages over. Too low a spike_multiple and ordinary busy days get excluded, starving the baseline; too high and a genuine promotion contaminates the steady-state tier. Window lengths trade responsiveness against stability: a short window chases noise, a long one lags a real trend. Set these against the facility’s demand volatility, then hold them stable — churning them re-slots SKUs for no operational gain.
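The responsiveness-versus-stability trade-off is easy to see with a trailing mean over a demand step. This is an illustration only, not the downstream engine's scoring code; `trailing_mean` is a hypothetical helper and the series is synthetic.

```python
def trailing_mean(series: list[int], window: int) -> list[float]:
    """Mean of the last `window` points at each position (fewer points at the start)."""
    out: list[float] = []
    running = 0
    for i, value in enumerate(series):
        running += value
        if i >= window:
            running -= series[i - window]      # drop the point that left the window
        out.append(running / min(i + 1, window))
    return out


# Synthetic demand that steps from 10 to 20 units/day at index 30.
demand = [10] * 30 + [20] * 30
fast, slow = trailing_mean(demand, 7), trailing_mean(demand, 30)
print(round(fast[36], 2), round(slow[36], 2))  # 20.0 12.33
```

A week after the step, the 7-day mean has fully caught up while the 30-day mean still reports barely more than half the new rate.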

# sales_mapping.yaml — one profile per facility
mapping:
  base_uom: each                # canonical unit every line resolves to
  known_uom: [each, inner, case, pallet]
  quarantine_unknown_uom: true  # never guess a missing pack_factor
netting:
  floor_negative: true          # net-negative day demand clamps to 0
  channels: [retail, wholesale, ecom]
cleansing:
  spike_multiple: 4.0           # day > 4x SKU median => promotional
  route_promos_to: seasonal_store
windows:
  rolling_days: [30, 90, 365]   # baselines the scoring engine averages

# Equivalent Python config dict consumed by the mapper
SALES_MAPPING = {
    "mapping": {
        "base_uom": "each",
        "known_uom": ["each", "inner", "case", "pallet"],
        "quarantine_unknown_uom": True,
    },
    "netting": {"floor_negative": True, "channels": ["retail", "wholesale", "ecom"]},
    "cleansing": {"spike_multiple": 4.0, "route_promos_to": "seasonal_store"},
    "windows": {"rolling_days": [30, 90, 365]},
}

Facilities with pronounced seasonality should keep the 365-day window and lean on the seasonal store rather than widening spike_multiple — a wide multiple hides real promotions inside the baseline, which is exactly the distortion this layer exists to prevent. Facilities with flat, high-turn catalogs can drop the 365-day window entirely and run tighter 14/30-day baselines for faster response.
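One way to keep per-facility profiles consistent is to derive them from the shared dictionary and validate them before a run. A minimal sketch: `facility_profile` and `validate_profile` are hypothetical helpers, the specific checks are assumptions drawn from the relationships described in this section, and the merge is one level deep (keys inside each section are overridden individually).

```python
import copy

SALES_MAPPING = {  # trimmed copy of the shared profile above
    "mapping": {"base_uom": "each", "known_uom": ["each", "inner", "case", "pallet"]},
    "cleansing": {"spike_multiple": 4.0, "route_promos_to": "seasonal_store"},
    "windows": {"rolling_days": [30, 90, 365]},
}


def facility_profile(base: dict, overrides: dict) -> dict:
    """Apply per-facility overrides section by section without mutating the shared base."""
    profile = copy.deepcopy(base)
    for section, values in overrides.items():
        profile.setdefault(section, {}).update(values)
    return profile


def validate_profile(profile: dict) -> list[str]:
    """Return human-readable problems; an empty list means the profile is usable."""
    problems: list[str] = []
    mapping = profile["mapping"]
    if mapping["base_uom"] not in mapping["known_uom"]:
        problems.append("base_uom must be one of known_uom")
    if profile["cleansing"]["spike_multiple"] <= 1.0:
        problems.append("spike_multiple <= 1 would flag ordinary above-median days")
    days = profile["windows"]["rolling_days"]
    if not days or any(d <= 0 for d in days):
        problems.append("rolling_days must be a non-empty list of positive day counts")
    return problems


# A flat, high-turn catalog dropping the 365-day window, as described above.
flat_catalog = facility_profile(SALES_MAPPING, {"windows": {"rolling_days": [14, 30]}})
```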

Validation & Testing

Never ship a mapped batch without asserting its invariants. Three properties must hold: UOM conversion actually multiplies by pack_factor, returns net against sales instead of inflating demand, and net-negative buckets floor at zero. These pytest checks encode all three and run in the mapping job’s CI gate.

from datetime import date

# map_line, net_daily_demand, VelocityFact, and Channel are imported from the mapper module under test.


def test_pallet_line_converts_to_base_units() -> None:
    fact = map_line({"item_code": "SKU1", "trans_date": "2026-06-01",
                     "qty": 1, "uom": "pallet", "pack_factor": 240,
                     "channel": "retail"})
    assert fact is not None and fact.units_base == 240


def test_unknown_uom_is_quarantined() -> None:
    assert map_line({"item_code": "SKU1", "trans_date": "2026-06-01",
                     "qty": 5, "uom": "barrel", "pack_factor": 1}) is None


def test_returns_net_and_floor_at_zero() -> None:
    facts = [
        VelocityFact("SKU1", date(2026, 6, 1), units_base=3, channel=Channel.RETAIL),
        VelocityFact("SKU1", date(2026, 6, 1), units_base=-5, channel=Channel.ECOM),
    ]
    netted = net_daily_demand(facts)
    assert netted[("SKU1", date(2026, 6, 1))] == 0   # -2 floored to 0

A sample expected result for a healthy run: test_pallet_line_converts_to_base_units yields units_base == 240, test_unknown_uom_is_quarantined returns None and logs a quarantine warning, and test_returns_net_and_floor_at_zero floors the −2 net to 0. If conversion or netting drifts, fix the mapper before scoring — a distorted baseline corrupts velocity silently and no downstream tuning recovers it.
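Unit tests cover the functions; a batch-level check covers the data. A minimal sketch of a pre-commit reconciliation, assuming each raw line carries a hypothetical unique `line_id` and that the mapper's per-line output is available before daily netting. It checks two properties the unit tests cannot see across a whole batch: no line silently disappears, and every mapped line's magnitude equals `qty * pack_factor`.

```python
def reconcile(
    raw_lines: list[dict], mapped: dict[str, int], quarantined: set[str]
) -> list[str]:
    """Return invariant violations for one batch; an empty list means it is safe to commit."""
    errors: list[str] = []
    if mapped.keys() & quarantined:
        errors.append("some lines are both mapped and quarantined")
    for row in raw_lines:
        line_id = row["line_id"]
        if line_id in mapped:
            expected = abs(int(row["qty"])) * int(row["pack_factor"])
            if abs(mapped[line_id]) != expected:   # compare magnitudes: returns are negative
                errors.append(f"{line_id}: units_base {mapped[line_id]} != {expected}")
        elif line_id not in quarantined:
            errors.append(f"{line_id}: silently dropped (neither mapped nor quarantined)")
    return errors
```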

Integration Points

Sales history mapping is a normalization layer, not a data source or a decision-maker. It sits between four sibling systems, and each imposes a contract:

Upstream extraction. The delta feed and its watermark come from WMS & ERP Polling Strategies; the mapper advances the bookmark only after a batch is durably mapped, so a mapping crash re-pulls rather than skips.
Contract enforcement. Structurally malformed rows are quarantined at the ingestion boundary by Schema Validation for Inventory Feeds; this layer handles value-level cleansing (UOM, returns, spikes), keeping the two failure domains diagnosable independently.
Downstream scoring. The canonical VelocityFact baseline is what the rolling windows in Async Batch Processing for Velocity average and score into velocity coefficients — the precision of this mapping sets the ceiling on that engine’s accuracy.
Deeper transforms. Facility-specific legacy quirks — composite item codes, embedded UOM in free-text fields, split order headers — are handled in Transforming Legacy ERP Sales Logs for Velocity, which extends the mapper for older ERPs.

The scored coefficients that eventually result flow into the tier logic in ABC Classification Tuning for Warehouse Slotting, and the taxonomy those tiers populate is defined by the SKU Velocity Taxonomy Design layer — both of which inherit whatever distortion this mapping fails to remove.

Failure Modes & Edge Cases

Unconverted UOM distorting velocity. A pallet or case line counted as a single pick reads as one unit of demand instead of hundreds, deflating a real mover; the reverse, an each line multiplied by a stale pack_factor, inflates a slow one. Remediation: resolve every line to base units against the current item master and quarantine any line whose UOM is unknown, as map_line does.
Gross demand from un-netted returns. Counting returns as positive demand, or ignoring them entirely, inflates a returns-heavy SKU into a false fast mover. Remediation: carry the signed quantity through netting so returns subtract, and floor net-negative days at zero.
Promotion contaminating the steady-state baseline. A Black Friday spike averaged straight into the rolling window pins a slow mover into a golden zone for months. Remediation: flag spike days by the median multiple and route them to the seasonal store rather than the baseline.
Timezone drift misaligning demand days. Lines stamped in ERP-server UTC but windowed in warehouse-local time smear demand across the wrong day and blur weekend/weekday patterns. Remediation: normalize trans_date to warehouse-local before bucketing by day.
Watermark advanced before the batch is durable. Advancing the bookmark on pull rather than on commit silently drops a batch if mapping crashes mid-run. Remediation: persist mapped facts first, then advance the watermark, so a failed run re-pulls the same delta.
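The last remediation is simplest when the facts and the bookmark share one database transaction. A minimal sketch with the standard-library `sqlite3` module, where the table names `velocity_fact` and `extract_watermark` and the `erp_sales` source key are hypothetical: either both writes commit or neither does, so a crash leaves the old watermark in place and the next run re-pulls the same delta.

```python
import sqlite3


def commit_batch(
    conn: sqlite3.Connection,
    facts: list[tuple[str, str, int, str]],  # (sku_id, demand_date ISO, units_base, channel)
    new_watermark: str,
) -> None:
    """Persist mapped facts and advance the bookmark atomically."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.executemany(
            "INSERT INTO velocity_fact (sku_id, demand_date, units_base, channel) "
            "VALUES (?, ?, ?, ?)",
            facts,
        )
        conn.execute(
            "UPDATE extract_watermark SET value = ? WHERE source = 'erp_sales'",
            (new_watermark,),
        )
```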

FAQ

Should returns be subtracted from sales or tracked separately?

For a slotting baseline, subtract them. Slotting optimizes for net outbound pick demand — where a SKU should live so pickers travel less — and a unit sold then returned generated touches but no net movement of stock out the door. Carry the return as a signed negative line through netting and floor the day at zero. Keep the gross returns rate in a separate quality metric if you need it for other reporting, but do not let it inflate the velocity the optimizer sees.
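If the gross returns rate is wanted as a separate quality metric, it can be computed from the same signed base-unit facts before netting. A minimal sketch; `gross_return_rates` is a hypothetical helper that takes `(sku_id, units_base)` pairs where returns are negative, and its output is for reporting only, never for the velocity baseline.

```python
from collections import defaultdict
from typing import Optional


def gross_return_rates(facts: list[tuple[str, int]]) -> dict[str, Optional[float]]:
    """Returned base units divided by sold base units per SKU, computed before netting."""
    sold: dict[str, int] = defaultdict(int)
    returned: dict[str, int] = defaultdict(int)
    for sku, units in facts:
        if units >= 0:
            sold[sku] += units
        else:
            returned[sku] -= units          # store returns as a positive count
    # None when nothing sold: a rate against zero sales is undefined, not zero.
    return {
        sku: (returned.get(sku, 0) / sold[sku] if sold.get(sku) else None)
        for sku in sold.keys() | returned.keys()
    }
```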

How do I stop a seasonal promotion from wrecking my tiers?

Flag any day whose net demand exceeds a facility-tuned multiple (a spike_multiple of about 4x the SKU’s median day is a sane default) and route those buckets to a separate seasonal store instead of deleting them. The steady-state baseline that drives everyday tier assignment stays clean, while the seasonal series is still available to pre-slot for a known peak. Widening the multiple until the spike “disappears” into the baseline is the wrong fix — it hides the very event you need to plan for.

What unit should velocity be expressed in when SKUs sell in different pack sizes?

One canonical base unit — almost always the each (the individual pickable unit) — and every line must be converted to it before any total is taken. Resolve the conversion from the item master’s pack_factor at mapping time, not at query time, so downstream consumers never have to know a line’s original UOM. Mixing eaches and cases in one units_sold column is the single most common cause of a phantom hyper-mover.
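Resolving the factor at mapping time can mean looking it up per `(sku, uom)` in the current item master rather than trusting a `pack_factor` carried on the order line, which may be stale. A minimal sketch, assuming an in-memory item-master snapshot keyed by `(sku_id, uom)`; the dictionary contents and the `to_base_units` name are hypothetical.

```python
from typing import Optional

# Hypothetical item-master snapshot: base (each) units per UOM, per SKU.
ITEM_MASTER: dict[tuple[str, str], int] = {
    ("SKU1", "each"): 1,
    ("SKU1", "case"): 24,
    ("SKU1", "pallet"): 240,
}


def to_base_units(sku_id: str, uom: str, qty: int,
                  master: dict[tuple[str, str], int]) -> Optional[int]:
    """Convert a signed line quantity to eaches, or None to quarantine an unknown pairing."""
    factor = master.get((sku_id, uom.lower()))
    if factor is None or factor < 1:
        return None                     # never guess a missing pack factor
    return qty * factor
```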

How much sales history do I need before the baseline is trustworthy?

Enough to cover at least one full demand cycle for the window you score against: a 30-day rolling baseline needs at least 30 days of clean data, and a 365-day window needs a full year to reflect true seasonality. New SKUs with thin history should be assigned a provisional tier from a category or affinity proxy rather than a noisy short-window average, and promoted to a data-driven tier once they accumulate enough clean days.
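The promotion rule for new SKUs can be expressed as a small gate. A minimal sketch, assuming the caller already has a category-level proxy rate and treats "enough clean days" as at least one full window; both that threshold and the `baseline_for` name are hypothetical.

```python
def baseline_for(clean_daily_units: list[int], window_days: int,
                 category_proxy: float) -> tuple[float, str]:
    """Return (units per day, source): the proxy until a full window of clean days exists."""
    if len(clean_daily_units) < window_days:
        return category_proxy, "provisional"
    recent = clean_daily_units[-window_days:]
    return sum(recent) / window_days, "data_driven"
```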

Where does mapping end and velocity scoring begin?

Mapping ends when you have a canonical, base-unit, returns-netted, promotion-flagged demand series per SKU per day. Everything after that — rolling averages, decay weighting, tier-boundary math — is scoring, and it lives in the batch layer. Keeping the boundary sharp means a scoring change never requires re-touching the mapper, and a mapping fix never silently changes tier math.

WMS & ERP Polling Strategies — the watermark extraction that produces the delta feed this mapper consumes.
Schema Validation for Inventory Feeds — the structural contract enforced before value-level cleansing runs here.
Async Batch Processing for Velocity — the scoring layer that averages this baseline into velocity coefficients.
Transforming Legacy ERP Sales Logs for Velocity — extends the mapper for older ERPs with composite codes and free-text UOM.
Velocity Data Ingestion & WMS Sync Pipelines — the parent architecture this normalization layer feeds.