Slotting Architecture

How to Poll a WMS API With No Published Rate Limit

You need current inventory velocity, but the WMS exposes only a paginated REST endpoint and publishes no rate limit, no X-RateLimit-Remaining header, and no Retry-After on its 503s. The absence of a documented ceiling is not a licence to hammer it: mid-tier and legacy WMS platforms enforce limits implicitly — through connection-pool exhaustion, database lock contention during ETL windows, and silent request queuing that returns 200 OK with a truncated array. This page builds a client-side adaptive poller that infers a safe cadence from observed latency and error signals instead of a server-published quota, so your velocity feed stays fresh without degrading WMS transaction throughput. It is the “no documented ceiling” case of the WMS & ERP Polling Strategies cluster, which sits inside the wider Velocity Data Ingestion & WMS Sync Pipelines architecture.

Prerequisites

Confirm each of these before wiring the poller into a live feed:

Python 3.10+ — the implementation uses X | None unions, list[...] generics, and dataclass config.
requests 2.31+ — for the pooled Session and HTTPAdapter; swap for httpx if you want the same pattern over async.
Read-only WMS/ERP credentials — a bearer token or API key scoped to the inventory and movement-transaction endpoints only.
A diagnostic baseline — 24–48 hours of captured response latency and status codes across a full picking cycle, so latency_ceiling_ms reflects this facility’s peak-window behaviour rather than a guessed default.
A schema validator for the payload — reuse the contract from Schema Validation for Inventory Feeds so a truncated 200 OK is rejected, not scored.
Optional: a Redis instance — to externalize interval state and the dedup set if you run more than one poller replica.
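
One way to turn the diagnostic baseline into starting values for the two latency thresholds is sketched below. The mapping is an assumption rather than anything the WMS dictates: it takes the median of the captured latencies as healthy_latency_ms and the 95th percentile as latency_ceiling_ms. The function name thresholds_from_baseline and the 20-sample minimum are hypothetical.

```python
from statistics import median, quantiles


def thresholds_from_baseline(latencies_ms: list[float]) -> dict[str, float]:
    """Suggest starting thresholds from captured per-request latencies (ms)."""
    if len(latencies_ms) < 20:
        raise ValueError("need at least 20 samples for a stable percentile")
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = quantiles(latencies_ms, n=20)[-1]
    return {
        "healthy_latency_ms": float(median(latencies_ms)),  # assumed mapping
        "latency_ceiling_ms": float(p95),                   # assumed mapping
    }
```

Treat the result as a first guess to review against the peak picking window, not a value to deploy unchecked.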

Configuration Block

Every tunable lives in one externalized profile, one block per facility or endpoint. The two levers that decide behaviour are latency_ceiling_ms (the backpressure trigger) and backoff_multiplier (how hard the poller retreats when it fires).

# adaptive_poll.yaml — one profile per WMS/ERP endpoint
poller:
  base_url: "https://wms.internal/api"
  min_interval_s: 2.0          # floor between polls; prevents micro-bursts even when healthy
  max_interval_s: 300.0        # ceiling; a poller pinned here signals systemic WMS degradation
  latency_window: 25           # sliding window of recent latencies used for the moving average
  latency_ceiling_ms: 800.0    # avg latency above this = implicit backpressure -> multiplicative increase
  healthy_latency_ms: 200.0    # avg latency below this = system healthy -> additive decrease
  backoff_multiplier: 2.0      # multiplicative-increase factor on 429/503/504 or ceiling breach
  additive_decrease_s: 1.0     # seconds shaved per healthy poll (additive decrease)
  request_timeout_s: 15.0      # hard per-request timeout; a hang counts as backpressure
  dedup_ttl_s: 30.0            # window in which an identical endpoint+params poll is suppressed

# Equivalent Python config dict consumed by the poller
ADAPTIVE_POLL = {
    "base_url": "https://wms.internal/api",
    "min_interval_s": 2.0,
    "max_interval_s": 300.0,
    "latency_window": 25,
    "latency_ceiling_ms": 800.0,
    "healthy_latency_ms": 200.0,
    "backoff_multiplier": 2.0,
    "additive_decrease_s": 1.0,
    "request_timeout_s": 15.0,
    "dedup_ttl_s": 30.0,
}
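
Because the controller only behaves sensibly when the tunables agree with each other, a small sanity check before startup can catch a bad profile early. The sketch below is illustrative: validate_profile is a hypothetical helper, and the rules it enforces are inferred from how the settings are described above rather than taken from any WMS documentation.

```python
def validate_profile(p: dict) -> list[str]:
    """Return a list of problems; an empty list means the profile is coherent."""
    problems: list[str] = []
    if not 0 < p["min_interval_s"] <= p["max_interval_s"]:
        problems.append("need 0 < min_interval_s <= max_interval_s")
    if not 0 < p["healthy_latency_ms"] < p["latency_ceiling_ms"]:
        problems.append("need 0 < healthy_latency_ms < latency_ceiling_ms")
    if p["backoff_multiplier"] <= 1.0:
        problems.append("backoff_multiplier must exceed 1.0 or backoff never widens")
    if p["additive_decrease_s"] <= 0:
        problems.append("additive_decrease_s must be positive or the interval never recovers")
    if p["latency_window"] < 1:
        problems.append("latency_window must be at least 1")
    # A timeout at or below the ceiling would cut requests off before they can register as slow.
    if p["request_timeout_s"] * 1000 <= p["latency_ceiling_ms"]:
        problems.append("request_timeout_s should exceed latency_ceiling_ms")
    return problems
```

Calling validate_profile(ADAPTIVE_POLL) on the dict above should return an empty list; anything else is worth failing fast on.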

Implementation

The poller keeps a sliding window of recent latencies and drives the interval with multiplicative increase, additive decrease (MIAD): it retreats fast under backpressure and creeps back toward the floor only while the moving average stays healthy. Every poll is gated by a time-boxed dedup set so a resumed or duplicated request never triggers a redundant materialized-view refresh on the WMS. A payload that fails schema validation is treated as soft backpressure — the interval widens rather than accepting a truncated array.

from __future__ import annotations

import hashlib
import logging
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable

import requests

logger = logging.getLogger("velocity.poll")


@dataclass
class PollerConfig:
    base_url: str
    min_interval_s: float = 2.0
    max_interval_s: float = 300.0
    latency_window: int = 25
    latency_ceiling_ms: float = 800.0
    healthy_latency_ms: float = 200.0
    backoff_multiplier: float = 2.0
    additive_decrease_s: float = 1.0
    request_timeout_s: float = 15.0
    dedup_ttl_s: float = 30.0


class AdaptivePoller:
    """Self-regulating WMS poller for endpoints that publish no rate limit."""

    def __init__(self, cfg: PollerConfig, api_key: str) -> None:
        self.cfg = cfg
        self.interval = cfg.min_interval_s
        self._latencies: deque[float] = deque(maxlen=cfg.latency_window)
        self._seen: dict[str, float] = {}          # req_hash -> expiry monotonic ts
        self._session = requests.Session()
        self._session.headers.update({"Authorization": f"Bearer {api_key}"})
        adapter = requests.adapters.HTTPAdapter(pool_connections=10, pool_maxsize=50)
        self._session.mount("https://", adapter)

    def _hash(self, endpoint: str, params: dict[str, Any]) -> str:
        raw = f"{endpoint}:{sorted(params.items())}"
        return hashlib.sha256(raw.encode()).hexdigest()[:16]

    def _adjust(self, latency_ms: float, backpressure: bool) -> None:
        self._latencies.append(latency_ms)
        avg = sum(self._latencies) / len(self._latencies)
        if backpressure or avg > self.cfg.latency_ceiling_ms:
            self.interval = min(self.interval * self.cfg.backoff_multiplier, self.cfg.max_interval_s)
            logger.warning("backpressure: avg=%.0fms interval->%.1fs", avg, self.interval)
        elif avg < self.cfg.healthy_latency_ms:
            self.interval = max(self.interval - self.cfg.additive_decrease_s, self.cfg.min_interval_s)

    def poll(self, endpoint: str, params: dict[str, Any],
             is_valid: Callable[[dict], bool]) -> dict | None:
        """Poll once; return a validated payload or None, then sleep the adaptive interval."""
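        key, now = self._hash(endpoint, params), time.monotonic()
        # Drop expired dedup entries so _seen cannot grow without bound.
        self._seen = {k: exp for k, exp in self._seen.items() if exp > now}
        if key in self._seen:
            logger.debug("dedup: suppressing repeat poll of %s", key)
            time.sleep(self.interval)  # still pace the caller, as the docstring promises
            return None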
        payload = None
        try:
            start = time.perf_counter()
            resp = self._session.get(f"{self.cfg.base_url}{endpoint}", params=params,
                                     timeout=self.cfg.request_timeout_s)
            latency_ms = (time.perf_counter() - start) * 1000
            backpressure = resp.status_code in (429, 503, 504)
            body = resp.json() if resp.status_code == 200 else None  # parse the body once
            if body is not None and is_valid(body):
                payload = body
                self._seen[key] = now + self.cfg.dedup_ttl_s
            elif resp.status_code == 200:
                backpressure = True  # truncated/stale body = soft backpressure
                logger.warning("schema gate rejected 200 body on %s", endpoint)
            self._adjust(latency_ms, backpressure)
        except requests.RequestException as exc:
            logger.error("request failed (%s); treating as backpressure", exc)
            self._adjust(self.cfg.latency_ceiling_ms + 1, backpressure=True)
        time.sleep(self.interval)
        return payload
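
The is_valid callable is supplied by the caller, so the poller itself does not say what a complete payload looks like. A minimal sketch of such a gate follows. It checks the four row fields this page treats as mandatory (sku_id, location_code, pick_count, last_updated_ts); the envelope keys "results" and "next_cursor" are assumptions about the response shape and will differ between WMS platforms.

```python
REQUIRED_FIELDS = frozenset({"sku_id", "location_code", "pick_count", "last_updated_ts"})


def is_valid(body: object) -> bool:
    """Accept a page only if the envelope and every row are complete."""
    if not isinstance(body, dict):
        return False
    rows = body.get("results")             # assumed envelope key
    if not isinstance(rows, list):
        return False
    if "next_cursor" not in body:          # assumed cursor key; absent means a broken page
        return False
    # Every row must be a dict that carries all four mandatory fields.
    return all(isinstance(r, dict) and REQUIRED_FIELDS.issubset(r) for r in rows)
```

This gate cannot tell a short page from a silently truncated one on its own; if the endpoint reports a total or page size, comparing it against len(rows) is a natural extension.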

Step-by-Step Walkthrough

Pool connections once. The constructor builds a single requests.Session with an HTTPAdapter (pool_maxsize=50) so every poll reuses a warm socket. A fresh session per call is the fastest way to exhaust the WMS connection pool — the very throttle you cannot see.
Gate on the dedup set. poll hashes endpoint + sorted(params) and checks _seen; a hash still inside its dedup_ttl_s window short-circuits. WMS snapshots are eventually consistent, so a duplicate poll only buys a redundant view refresh and more backend load.
Measure, then classify. Each request is timed with perf_counter. A 429, 503, or 504 is explicit backpressure; a 200 OK whose body fails is_valid (a truncated array or missing cursor) is treated as soft backpressure by flipping the flag — the poller never scores a partial payload.
Apply MIAD in _adjust. The latency feeds a deque(maxlen=latency_window). If the moving average breaches latency_ceiling_ms or any backpressure fired, the interval is multiplied by backoff_multiplier up to max_interval_s. Only a sustained average below healthy_latency_ms shaves additive_decrease_s back toward min_interval_s. An average between the two thresholds leaves the interval unchanged. Retreat is fast; recovery is deliberate.
Sleep the adaptive interval. The time.sleep(self.interval) at the tail enforces the current cadence. Because a network exception routes through _adjust with a synthetic over-ceiling latency, a hung endpoint widens the interval exactly like a slow one.
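
To see how asymmetric the retreat and recovery are, the interval rule from the walkthrough can be restated as a pure function and replayed over a scripted sequence. This is a simplified sketch: it takes the moving average as a direct input instead of maintaining the sliding window, and next_interval and trace are hypothetical names. The defaults mirror the configuration block.

```python
def next_interval(interval: float, avg_ms: float, backpressure: bool, *,
                  floor: float = 2.0, ceiling: float = 300.0,
                  latency_ceiling_ms: float = 800.0,
                  healthy_latency_ms: float = 200.0,
                  multiplier: float = 2.0, step: float = 1.0) -> float:
    """Pure-function restatement of the interval rule applied in _adjust."""
    if backpressure or avg_ms > latency_ceiling_ms:
        return min(interval * multiplier, ceiling)   # multiplicative increase
    if avg_ms < healthy_latency_ms:
        return max(interval - step, floor)           # additive decrease
    return interval                                  # between thresholds: hold


def trace(events: list[tuple[float, bool]], start: float = 2.0) -> list[float]:
    """Replay (avg_ms, backpressure) events and record the interval after each."""
    out, interval = [], start
    for avg_ms, backpressure in events:
        interval = next_interval(interval, avg_ms, backpressure)
        out.append(interval)
    return out


# Three 503s, two polls between the thresholds, then four healthy polls.
events = [(900.0, True)] * 3 + [(500.0, False)] * 2 + [(100.0, False)] * 4
print(trace(events))
# [4.0, 8.0, 16.0, 16.0, 16.0, 15.0, 14.0, 13.0, 12.0]
```

Three backoffs take the interval from 2 s to 16 s, and with these defaults it then needs 14 consecutive healthy polls to return to the floor.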

Verification

Assert the controller’s invariants directly: the interval must climb on backpressure and never fall below the floor. These checks run without a live WMS because they drive _adjust and _hash directly and never send a request.

import logging

# Configure logging first so the backpressure warning prints with its logger name.
logging.basicConfig(level=logging.INFO)

# Assumes PollerConfig and AdaptivePoller from the Implementation section are in scope.
cfg = PollerConfig(base_url="https://wms.internal/api",
                   min_interval_s=2.0, backoff_multiplier=2.0)
poller = AdaptivePoller(cfg, api_key="test")

# Backpressure must widen the interval multiplicatively.
poller._adjust(latency_ms=50.0, backpressure=True)
assert poller.interval == 4.0, poller.interval

# Sustained healthy latency must decrease additively, never below the floor.
for _ in range(10):
    poller._adjust(latency_ms=50.0, backpressure=False)
assert poller.interval == cfg.min_interval_s

# The dedup key must be stable for identical endpoint + params, whatever the key order.
assert (poller._hash("/inv", {"page": 1, "size": 50})
        == poller._hash("/inv", {"size": 50, "page": 1}))
print(f"OK — interval floor held at {poller.interval:.1f}s")

Sample expected output:

WARNING:velocity.poll:backpressure: avg=50ms interval->4.0s
OK — interval floor held at 2.0s

Common Pitfalls

Trusting a 200 OK. During high-concurrency windows the WMS returns 200 with a silently truncated results array and an expired pagination cursor. Without the is_valid gate those partial payloads become corrupted velocity baselines. Always reject a body missing sku_id, location_code, pick_count, or last_updated_ts and widen the interval.
A floor of zero. Setting min_interval_s to 0 lets the poller micro-burst the instant latency dips, re-triggering the contention it just backed off from. Keep the floor at 2–3 seconds so healthy periods still cannot produce a thundering herd.
Polling through the nightly ETL. Latency spikes that correlate with the WMS reconciliation job are lock contention, not client fault. A poller pinned at max_interval_s during that window is behaving correctly — schedule around known batch jobs rather than chasing the backoff.
Sharing no state across replicas. Two poller processes each keep their own in-memory interval and _seen, so they collectively double the load and defeat dedup. Externalize both to Redis before scaling horizontally, and hand the heavy full-catalog reconciliation to Async Batch Processing for Velocity instead of a tighter poll.
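
A minimal sketch of moving the dedup set into Redis is shown below. It assumes a redis-py style client whose set method accepts nx and px arguments, passed in by the caller; claim_poll and the "poll:seen:" key prefix are hypothetical. It also changes the semantics slightly: the in-memory _seen records a key only after a valid payload, whereas this version claims the key before the request, so a failed poll still holds the slot until the TTL expires.

```python
from typing import Any


def claim_poll(client: Any, key: str, ttl_s: float) -> bool:
    """Return True if this replica won the right to poll `key`, else False.

    `client` is assumed to expose set(name, value, nx=..., px=...) as redis-py does.
    """
    # SET ... NX PX is atomic: create the key only if absent, with a millisecond expiry.
    won = client.set(f"poll:seen:{key}", "1", nx=True, px=int(ttl_s * 1000))
    return bool(won)  # redis-py returns True on success and None when NX blocks the write
```

A replica would call claim_poll with the request hash and dedup_ttl_s before issuing the request and skip the poll when it returns False. Sharing the interval itself needs a separate key and is not covered here.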

FAQ

How do I detect an implicit rate limit with no headers to read?

Watch three client-side signals over your diagnostic baseline: P95 latency climbing past latency_ceiling_ms during peak picking, 200 OK bodies with truncated arrays or missing cursors, and intermittent 503/504s with no Retry-After. Cross-reference each latency spike against the WMS transaction log to confirm the bottleneck is the backend, not network jitter. At run time the poller reacts to the same signals through the status check and schema gate in poll and the moving average in _adjust, which together turn them into a cadence automatically.

What should max_interval_s actually be?

High enough that a genuinely degraded WMS is not polled into the ground; 300 seconds is a safe default. Treat a poller pinned at the ceiling as an alert, not a steady state: it means the WMS itself is degraded, and the right response is to fall back to a scheduled batch pull until it stabilizes rather than forcing the interval back down.

Can I run this pattern asynchronously?

Yes — swap requests for httpx.AsyncClient and time.sleep for await asyncio.sleep; the MIAD logic is transport-agnostic. If you need to score hundreds of thousands of SKUs on a schedule rather than poll a live cursor, the bounded-concurrency approach in Async Batch Processing for Velocity is the better fit.
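
The shape of the async variant can be sketched with the standard library alone. In the sketch below the transport is an injected coroutine (in practice it would wrap a call on httpx.AsyncClient), and the interval rule is an injected callable, so both are placeholders. timed_fetch and run_poller are hypothetical names, and status-code classification and the schema gate are left out for brevity.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable


async def timed_fetch(fetch: Callable[[], Awaitable[Any]],
                      timeout_s: float) -> tuple[Any, float, bool]:
    """Await fetch(); return (result, latency_ms, backpressure)."""
    start = time.perf_counter()
    try:
        result = await asyncio.wait_for(fetch(), timeout=timeout_s)
        backpressure = False
    except asyncio.TimeoutError:
        result, backpressure = None, True    # a hang counts as backpressure
    return result, (time.perf_counter() - start) * 1000, backpressure


async def run_poller(fetch: Callable[[], Awaitable[Any]],
                     next_interval: Callable[[float, float, bool], float],
                     interval: float, polls: int, timeout_s: float = 15.0) -> float:
    """Poll `polls` times, adapting the interval after each attempt."""
    for _ in range(polls):
        _, latency_ms, backpressure = await timed_fetch(fetch, timeout_s)
        interval = next_interval(interval, latency_ms, backpressure)
        await asyncio.sleep(interval)        # yields to other tasks, unlike time.sleep
    return interval
```

Because the sleep is awaited, several endpoints can be polled as separate tasks on one event loop, each with its own interval.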

WMS & ERP Polling Strategies — the parent guide covering watermark cursors, delta emission, and 429 governance for endpoints that do publish limits.
Schema Validation for Inventory Feeds — the contract behind the is_valid gate that rejects truncated 200 OK payloads.
Async Batch Processing for Velocity — the scheduled, bounded-concurrency alternative when a live cursor poll is the wrong tool.
Velocity Data Ingestion & WMS Sync Pipelines — the parent architecture this extraction layer feeds.