
How to Build Python Async Batch Jobs for SKU Velocity Tracking

You have a WMS endpoint that returns SKU movement in pages, and you need one recurring job that pulls every page concurrently, validates each record, and hands clean velocity rows to the slotting engine — without opening so many sockets that the ERP rate-limiter locks you out or a mid-run 429 loses a whole page of picks. This page walks that job end to end: the config it reads, the single async function that runs it, how to verify the output, and the failure cases that silently drop movement. It is the concrete, runnable counterpart to the execution theory in Async Batch Processing for Velocity, which is itself part of the Velocity Data Ingestion & WMS Sync Pipelines system.

Paged reads fan through the semaphore, are validated one row at a time, then split — clean rows to the slotting engine, failures to the dead-letter queue.

Prerequisites

Before running the job, confirm each of these is in place:

Python 3.10+: the code uses built-in generic type hints (list[dict], dict[str, object]) and plain asyncio primitives rather than asyncio.TaskGroup, which only arrived in 3.11.
aiohttp 3.9+ — earlier releases size TCPConnector limits differently and lack the timeout granularity used below.
tenacity 8.2+ and pydantic 2.x — retry decorators and model_dump() assume these majors; Pydantic 1.x will not validate the record model as written.
A paged WMS velocity endpoint that accepts offset/limit and returns a JSON array of movement rows. Confirm its documented requests-per-second ceiling before you set concurrency — see WMS/ERP Polling Strategies for how to emit those delta pages in the first place.
A canonical SKU field contract — every row must already match the shape enforced by Schema Validation for Inventory Feeds; one renamed field silently zeroes a velocity input.

Configuration Block

Every tunable lives in one externalized config so retuning concurrency or batch size never needs a redeploy. The YAML and the Python-dict equivalent below are the same contract in two forms.

sku_velocity_job:
  wms_base_url: "https://api.wms.internal/v1"   # no trailing slash required; stripped in code
  endpoint: "/inventory/velocity"               # paged movement feed
  batch_size: 1000            # rows per page; cap by memory footprint (500-2000 typical)
  max_concurrency: 8          # in-flight requests; MUST stay under the ERP's documented TPS
  total_timeout_s: 30         # hard ceiling per request incl. connect + read
  connect_timeout_s: 10       # socket connect budget
  sock_read_timeout_s: 15     # per-chunk read budget; trips on stalled keep-alive sockets
  pool_limit: 50              # total sockets across all hosts
  pool_limit_per_host: 15     # sockets to the single WMS host
  retry_attempts: 3           # tenacity stop_after_attempt
  backoff_min_s: 2            # exponential wait floor
  backoff_max_s: 10           # exponential wait ceiling

JOB_CONFIG: dict[str, object] = {
    "wms_base_url": "https://api.wms.internal/v1",
    "endpoint": "/inventory/velocity",
    "batch_size": 1000,
    "max_concurrency": 8,
    "total_timeout_s": 30,
    "connect_timeout_s": 10,
    "sock_read_timeout_s": 15,
    "pool_limit": 50,
    "pool_limit_per_host": 15,
    "retry_attempts": 3,
    "backoff_min_s": 2,
    "backoff_max_s": 10,
}
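
Because a mis-sized pool or timeout usually only shows up under load, it can help to check the contract before the job starts. The following is a minimal sketch, assuming a hypothetical validate_job_config helper that is not part of the original script; its rules simply encode the constraints stated on this page (per-host pool at least max_concurrency, inner timeouts no larger than the total, backoff floor no larger than the ceiling).

```python
def validate_job_config(cfg: dict) -> list[str]:
    """Return human-readable problems; an empty list means the config is consistent.

    Hypothetical helper: these checks mirror constraints described in this guide,
    not rules that aiohttp or tenacity enforce themselves. aiohttp treats a pool
    limit of 0 as "no limit"; this sketch does not special-case that value.
    """
    problems: list[str] = []
    if cfg["batch_size"] < 1:
        problems.append("batch_size must be at least 1")
    if cfg["max_concurrency"] < 1:
        problems.append("max_concurrency must be at least 1")
    if cfg["pool_limit_per_host"] < cfg["max_concurrency"]:
        problems.append("pool_limit_per_host is below max_concurrency")
    if cfg["pool_limit"] < cfg["pool_limit_per_host"]:
        problems.append("pool_limit is below pool_limit_per_host")
    for key in ("connect_timeout_s", "sock_read_timeout_s"):
        if cfg[key] > cfg["total_timeout_s"]:
            problems.append(f"{key} exceeds total_timeout_s")
    if cfg["backoff_min_s"] > cfg["backoff_max_s"]:
        problems.append("backoff_min_s exceeds backoff_max_s")
    return problems


# Same numeric values as the config block above.
sample = {
    "batch_size": 1000, "max_concurrency": 8, "total_timeout_s": 30,
    "connect_timeout_s": 10, "sock_read_timeout_s": 15, "pool_limit": 50,
    "pool_limit_per_host": 15, "backoff_min_s": 2, "backoff_max_s": 10,
}
ok_problems = validate_job_config(sample)
bad_problems = validate_job_config({**sample, "pool_limit_per_host": 4})
print(ok_problems)   # []
print(bad_problems)  # ['pool_limit_per_host is below max_concurrency']
```

Returning a list rather than raising on the first problem lets a deploy check report every inconsistency in one pass.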

Implementation

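The job below reads pages through a max_concurrency semaphore, retries transient failures with exponential backoff, validates each row against a strict record model, and splits the output into a clean set for the slotting engine and a dead-letter list for reconciliation. As listed, it discovers pages one at a time, because it only learns that the feed has ended when a page comes back short; the semaphore becomes the binding limit once those reads are issued concurrently. It is one focused script with type hints, a docstring, and structured logging.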

import asyncio
import logging
import time
from dataclasses import dataclass, field

import aiohttp
from pydantic import BaseModel, Field, ValidationError
from tenacity import (
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
)
logger = logging.getLogger("sku_velocity_job")


class SKUVelocityRecord(BaseModel):
    """One validated unit-movement row from the WMS velocity feed."""

    sku_id: str = Field(pattern=r"^[A-Z0-9-]+$")
    warehouse_id: str
    units_moved: int = Field(ge=0)
    timestamp_utc: str = Field(pattern=r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")


@dataclass
class JobResult:
    valid: list[dict] = field(default_factory=list)
    dead_letter: list[str] = field(default_factory=list)


async def track_sku_velocity(cfg: dict) -> JobResult:
    """Pull every page of SKU movement concurrently, validate it, and return
    clean rows plus a dead-letter list of rows that failed schema validation."""
    sem = asyncio.Semaphore(int(cfg["max_concurrency"]))
    timeout = aiohttp.ClientTimeout(
        total=cfg["total_timeout_s"],
        connect=cfg["connect_timeout_s"],
        sock_read=cfg["sock_read_timeout_s"],
    )
    connector = aiohttp.TCPConnector(
        limit=cfg["pool_limit"], limit_per_host=cfg["pool_limit_per_host"]
    )
    url = f"{str(cfg['wms_base_url']).rstrip('/')}{cfg['endpoint']}"
    result = JobResult()

    @retry(
        stop=stop_after_attempt(int(cfg["retry_attempts"])),
        wait=wait_exponential(min=cfg["backoff_min_s"], max=cfg["backoff_max_s"]),
        retry=retry_if_exception_type((aiohttp.ClientError, asyncio.TimeoutError)),
    )
    async def fetch_page(session: aiohttp.ClientSession, offset: int) -> list[dict]:
        params = {"offset": offset, "limit": cfg["batch_size"]}
        async with sem:  # bound in-flight requests under the ERP's TPS ceiling
            async with session.get(url, params=params) as resp:
                resp.raise_for_status()  # 429/5xx -> retryable ClientResponseError
                return await resp.json()

    async with aiohttp.ClientSession(timeout=timeout, connector=connector) as session:
        # Pages are discovered sequentially here: keep reading until a page
        # comes back short, which signals the end of the feed.
        offset, pages = 0, []
        while True:
            rows = await fetch_page(session, offset)
            if rows:
                pages.append(rows)
            if len(rows) < cfg["batch_size"]:
                break
            offset += cfg["batch_size"]

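        for page_no, page in enumerate(pages):
            for idx, row in enumerate(page):
                # Global feed position: every page but the last is full, so this
                # keeps dead-letter entries unambiguous across pages.
                pos = page_no * int(cfg["batch_size"]) + idx
                try:
                    result.valid.append(SKUVelocityRecord(**row).model_dump())
                except ValidationError as exc:
                    result.dead_letter.append(f"row {pos}: {exc.error_count()} errors")
                    logger.warning("schema drift at row %d: %s", pos, exc)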

    logger.info(
        "job done: %d valid, %d dead-letter", len(result.valid), len(result.dead_letter)
    )
    return result


if __name__ == "__main__":
    started = time.monotonic()
    out = asyncio.run(track_sku_velocity(JOB_CONFIG))
    logger.info("elapsed %.2fs", time.monotonic() - started)
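
The page loop in the listing awaits one page at a time. One way to issue reads concurrently without knowing the page count is to fetch in waves of max_concurrency and stop at the first wave that contains a short page. The sketch below is an illustration under assumptions, not the original job: fetch_all_pages is a hypothetical helper, the fetch callable stands in for fetch_page, and it assumes the WMS returns an empty array (not an error) for offsets past the end of the feed.

```python
import asyncio
from typing import Awaitable, Callable

Page = list[dict]


async def fetch_all_pages(
    fetch_page: Callable[[int], Awaitable[Page]],
    batch_size: int,
    max_concurrency: int,
) -> list[Page]:
    """Fetch pages in waves of max_concurrency until a wave contains a short page."""
    pages: list[Page] = []
    offset = 0
    while True:
        offsets = [offset + i * batch_size for i in range(max_concurrency)]
        # gather returns results in argument order, so pages stay offset-sorted.
        wave = await asyncio.gather(*(fetch_page(o) for o in offsets))
        for rows in wave:
            if rows:
                pages.append(rows)
            if len(rows) < batch_size:
                return pages  # a short or empty page marks the end of the feed
        offset += max_concurrency * batch_size


async def _demo() -> list[int]:
    feed = [{"n": i} for i in range(25)]  # stand-in for the WMS feed

    async def fake_fetch(offset: int) -> Page:
        await asyncio.sleep(0)  # yield to the event loop as a real request would
        return feed[offset : offset + 10]

    pages = await fetch_all_pages(fake_fetch, batch_size=10, max_concurrency=2)
    return [len(p) for p in pages]


page_sizes = asyncio.run(_demo())
print(page_sizes)  # [10, 10, 5]
```

The trade-off is that the final wave can request offsets past the end of the feed, so this only suits endpoints that answer those with an empty array. In the job, the callable could be lambda o: fetch_page(session, o), which keeps the retry decorator and the semaphore in play.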

Step-by-Step Walkthrough

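Bound concurrency first. asyncio.Semaphore(max_concurrency) caps the number of in-flight GETs. This is the single most important line: an unbounded gather would hand every page to the connector at once, saturate pool_limit_per_host, and trip the ERP rate limiter. The semaphore bounds requests in flight, not requests per second; the request rate is roughly max_concurrency divided by average response time, so size it against the endpoint's documented TPS with that in mind.
Set three timeouts, not one. total_timeout_s is the outer ceiling. connect_timeout_s bounds the time to acquire a connection, including any wait for a free pool slot, and sock_read_timeout_s bounds the gap between chunks of data. Together they turn the two slow failure modes, a WMS that accepts the connection then never sends and a keep-alive socket that stalls mid-stream, into quick errors instead of a 30-second wait on the total.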
Size the pool to the host. pool_limit_per_host matters more than pool_limit when you talk to one WMS; it must be at least max_concurrency or the semaphore and the connector fight each other.
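Retry only transient faults. retry_if_exception_type limits retries to ClientError and TimeoutError. raise_for_status() turns 429 and 5xx into a ClientResponseError (a ClientError subclass), so backoff covers rate limits automatically. The same class also covers non-transient 4xx responses such as 401 or 404, which are retried as written; narrow the predicate if those should fail fast. A ValidationError is never retried because replaying bad data cannot fix it. Once the attempts are exhausted, tenacity raises RetryError unless reraise=True is set.
Page until a short read. The while loop advances offset by batch_size and stops the moment a page returns fewer rows than batch_size, the standard end-of-feed signal. Empty pages are skipped rather than stored, but the short-read check cannot avoid every empty request: a feed whose length is an exact multiple of batch_size still returns one empty trailing page.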
Split, never drop. Each row is validated individually; a bad row goes to dead_letter and is logged, while good rows accumulate in valid. The job never aborts a whole page because one SKU has a malformed timestamp_utc.
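
The retry step above can be narrowed so that only rate limits, server errors, and connection-level faults are retried. This is a minimal sketch, assuming a hypothetical is_transient predicate that is not in the original script; it relies on the fact that aiohttp.ClientResponseError exposes a numeric status attribute, and uses a stand-in exception class so it runs without a network.

```python
import asyncio


def is_transient(exc: BaseException) -> bool:
    """True when a failed request is worth retrying.

    Exceptions with an integer .status (as aiohttp.ClientResponseError has) are
    retried only for 429 and 5xx. Anything without a status is treated as a
    connection-level or timeout fault and retried.
    """
    status = getattr(exc, "status", None)
    if isinstance(status, int):
        return status == 429 or 500 <= status <= 599
    return True


class FakeResponseError(Exception):
    """Stand-in for an HTTP error response; hypothetical, for illustration only."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


decisions = {code: is_transient(FakeResponseError(code)) for code in (404, 429, 503)}
timeout_retry = is_transient(asyncio.TimeoutError())
print(decisions)      # {404: False, 429: True, 503: True}
print(timeout_retry)  # True
```

With tenacity, a predicate like this can be combined with the existing type filter, for example retry=retry_if_exception_type((aiohttp.ClientError, asyncio.TimeoutError)) & retry_if_exception(is_transient), so a 404 fails on the first attempt instead of burning the whole backoff budget.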

Verification

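Check both halves of the split without touching the network. The script below pushes two clean rows and one with a bad SKU pattern through the same record model the job uses, then confirms the counts. It assumes SKUVelocityRecord from the implementation above is already in scope.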

import asyncio

STUB = [
    {"sku_id": "A-100", "warehouse_id": "WH-01", "units_moved": 42, "timestamp_utc": "2026-07-01T08:00:00Z"},
    {"sku_id": "B-200", "warehouse_id": "WH-01", "units_moved": 17, "timestamp_utc": "2026-07-01T08:05:00Z"},
    {"sku_id": "bad sku", "warehouse_id": "WH-01", "units_moved": 5, "timestamp_utc": "2026-07-01T08:10:00Z"},
]

async def _check() -> None:
    # No network involved: validate STUB directly through the same record model.
    from pydantic import ValidationError
    valid, dead = [], []
    for i, row in enumerate(STUB):
        try:
            valid.append(SKUVelocityRecord(**row).model_dump())
        except ValidationError:
            dead.append(i)
    assert len(valid) == 2, valid
    assert dead == [2], dead
    print(f"OK: {len(valid)} valid, {len(dead)} dead-letter")

asyncio.run(_check())

Expected output:

OK: 2 valid, 1 dead-letter

A run against the live endpoint logs one INFO line per completed job, e.g. job done: 98431 valid, 12 dead-letter, followed by elapsed 41.87s — track that ratio over time, because a rising dead-letter count is the earliest signal of upstream schema drift.

Common Pitfalls

limit_per_host below max_concurrency. The semaphore lets 8 requests through, but a pool_limit_per_host of 4 parks half of them waiting for a connection, so effective concurrency is 4 and the wait counts against connect_timeout_s. Keep per-host at least equal to concurrency.
Retrying validation errors. Moving SKUVelocityRecord(**row) inside the @retry scope and widening the predicate to catch ValidationError replays the same malformed payload on every attempt for nothing. Keep validation strictly outside the network retry boundary, as the code does.
Trusting total_timeout_s alone. A WMS behind a load balancer often accepts the TCP connection and then stalls; without sock_read_timeout_s the request hangs until the 30s total elapses, blocking a semaphore slot the whole time.
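Assuming fetch order is chronological. The listing above appends pages in offset order, and asyncio.gather also returns results in argument order, but anything that collects pages as they complete (asyncio.as_completed, or appending from inside the tasks) yields completion order instead. Offset order may not be time order either, so if the slotting engine assumes chronological rows, sort by timestamp_utc before pushing rather than relying on fetch order.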
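
The ordering pitfall can be handled with a single sort before the push. A minimal sketch, assuming a hypothetical sort_chronologically helper and rows that already passed the record model, so every timestamp_utc has the same fixed-width form:

```python
def sort_chronologically(rows: list[dict]) -> list[dict]:
    """Return a new list ordered by timestamp_utc, oldest first.

    Plain string comparison is enough only because every validated row uses the
    fixed-width YYYY-MM-DDTHH:MM:SSZ form enforced by the record model.
    """
    return sorted(rows, key=lambda r: r["timestamp_utc"])


rows = [
    {"sku_id": "B-200", "timestamp_utc": "2026-07-01T08:05:00Z"},
    {"sku_id": "C-300", "timestamp_utc": "2026-06-30T23:59:59Z"},
    {"sku_id": "A-100", "timestamp_utc": "2026-07-01T08:00:00Z"},
]
ordered_ids = [r["sku_id"] for r in sort_chronologically(rows)]
print(ordered_ids)  # ['C-300', 'A-100', 'B-200']
```

sorted is stable, so rows that share a timestamp keep their fetch order relative to each other.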

FAQ

How high can I set `max_concurrency`?

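No higher than pool_limit_per_host, and low enough that the resulting request rate stays under the WMS documented TPS with a safety margin. The semaphore caps requests in flight, so the rate is roughly max_concurrency divided by average response time: 8 in flight at about one second each is about 8 req/s, but the same 8 at 250 ms each is about 32 req/s. On a legacy ERP that publishes 10 req/s, start at 6–8 only if responses take around a second, and watch for 429s; the backoff will absorb an occasional burst, but sustained 429s mean you are over the ceiling. For large assortments, shard the SKU set across scheduled runs rather than pushing concurrency up.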
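
The relationship between in-flight requests and request rate can be made concrete with Little's law (throughput is roughly concurrency divided by latency). The helper below is a rough sketch, not a guarantee: max_safe_concurrency and its 0.8 default margin are hypothetical, and the estimate only holds at steady state with a stable average latency.

```python
import math


def max_safe_concurrency(
    tps_ceiling: float, avg_latency_s: float, margin: float = 0.8
) -> int:
    """Largest in-flight count whose steady-state rate stays under margin * tps_ceiling.

    Little's law: rate ~= concurrency / latency, so concurrency ~= rate * latency.
    Always returns at least 1 so the job can make progress.
    """
    if tps_ceiling <= 0 or avg_latency_s <= 0 or not 0 < margin <= 1:
        raise ValueError("tps_ceiling and avg_latency_s must be > 0, margin in (0, 1]")
    return max(1, math.floor(tps_ceiling * margin * avg_latency_s))


slow_wms = max_safe_concurrency(tps_ceiling=10, avg_latency_s=1.0)
fast_wms = max_safe_concurrency(tps_ceiling=10, avg_latency_s=0.25)
print(slow_wms, fast_wms)  # 8 2
```

Measure the latency on the real endpoint before trusting the number, and still cap the result at pool_limit_per_host.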

What should the dead-letter list feed into?

Route it to the same reconciliation path used by Schema Validation for Inventory Feeds — a durable queue or table keyed by run ID — so a human or a repair job can reprocess the rows. Dropping them inline hides the exact SKUs whose velocity silently reads zero.
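
As written, the job's dead_letter list holds only a summary string per row, which is not enough to reprocess anything. A possible record shape for the durable queue is sketched below; DeadLetterEntry, its field names, and the JSON Lines encoding are all assumptions for illustration, and the error text is a placeholder rather than Pydantic's exact message.

```python
import json
import uuid
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DeadLetterEntry:
    """Hypothetical record shape; field names are illustrative, not a WMS contract."""

    run_id: str
    row_index: int
    payload: dict
    errors: list[str]


def to_jsonl(entries: list[DeadLetterEntry]) -> str:
    """Serialize entries as JSON Lines, one rejected row per line."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in entries)


run_id = uuid.uuid4().hex  # one ID per job run, shared by every entry it produces
entry = DeadLetterEntry(
    run_id=run_id,
    row_index=2,
    payload={"sku_id": "bad sku", "warehouse_id": "WH-01", "units_moved": 5},
    errors=["sku_id: does not match the expected pattern"],  # placeholder text
)
line = to_jsonl([entry])
restored = json.loads(line)
```

Keeping the raw payload next to the run ID is what lets a repair job replay the row after the upstream field is fixed.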

Where do the validated rows go next?

The valid list is the input to velocity scoring: it feeds the tiering job described in How to Classify SKUs by Inventory Velocity, which turns unit movement into the A/B/C/D/Z tiers the assignment layer consumes.

Async Batch Processing for Velocity — the parent execution layer this job runs inside, including idempotent checkpoints for large recalculations.
Schema Validation for Inventory Feeds — the record contract every row must satisfy before it reaches this job.
WMS/ERP Polling Strategies — how to emit the paged delta feed this job consumes.
How to Classify SKUs by Inventory Velocity — the scoring job that turns these validated rows into velocity tiers.
Velocity Data Ingestion & WMS Sync Pipelines — the parent architecture this ingestion job belongs to.