Operational Baseline

The system is only launch-ready if the critical paths are measurable.

Observability is treated as part of the financial safety model. Trade execution, position monitoring, trigger evaluation, and emergency close each have their own SLOs, metrics, dashboards, and alerting rules because the team needs to know when a degraded mode is still safe and when it has crossed into dangerous territory.

Trade latency

<30s

Target end-to-end latency on Arbitrum, from cast request to finalized state.

Monitor cycle

<45s

A healthy position-monitor cycle must finish well inside its 60-second cadence.

Alert delivery

<5s

From threshold breach to dispatch completion for critical alerts.

Emergency close

99.99%

Availability target for the safety net endpoint.
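As a sketch, the four SLOs can live in one shared constants module so dashboards and alerting rules read the same numbers. The names below are illustrative, not from the spec; the classifier shows how "degraded but safe" versus "dangerous" can be made mechanical for the monitor cycle.

```typescript
// Hypothetical module; the real constants package may differ.
export const SLO = {
  tradeLatencyMs: 30_000,          // cast request -> finalized state on Arbitrum
  monitorCycleMs: 45_000,          // healthy-cycle budget
  monitorCadenceMs: 60_000,        // hard cadence the monitor must fit inside
  alertDeliveryMs: 5_000,          // threshold breach -> dispatch completion
  emergencyCloseAvailability: 0.9999,
} as const;

// A cycle is degraded-but-safe while it beats the cadence,
// and dangerous once it overruns the cadence itself.
export function classifyMonitorCycle(
  durationMs: number,
): "healthy" | "degraded" | "dangerous" {
  if (durationMs < SLO.monitorCycleMs) return "healthy";
  if (durationMs < SLO.monitorCadenceMs) return "degraded";
  return "dangerous";
}
```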

System health dashboard

Worker, workflows, TLM, Neon, KV, queues, bundlers, and RPCs each contribute to a unified status grid.

Execution dashboard

Trade funnel, gas efficiency, failover events, paymaster health, and active exposure distribution.

Product dashboard

Traffic, active users, active triggers, route latency, and user-facing error composition.

Testing strategy

The test plan is threat-driven, not coverage-driven theater.

The stack uses static analysis, unit and integration tests, on-chain tests, E2E journeys, load tests, and chaos tests. The key point is not using every tool. It is matching the tool to the type of failure the system can actually suffer.

Unit + integration

Protect shared business logic, schemas, query modules, worker contracts, and the TLM state machine.

E2E

Only the narrowest critical journeys: cast, trigger fire, emergency close, and DISARM.

Load + chaos

Stress the monitor, edge API, queue replay, bundler/RPC failover, and database outage cascades.

Quality gates that matter most
`packages/core` targets at least 90% line coverage and 85% branch coverage. Mutation score threshold is 80%. Critical-path SLOs are validated in load and synthetic monitoring, not just in unit suites.
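The coverage gate can be enforced in the test runner config rather than in review. The spec names the thresholds but not the tooling, so the Vitest config below is an assumption (the 80% mutation gate would live in a separate mutation-testing config, e.g. Stryker, not shown here):

```typescript
// vitest.config.ts for packages/core — runner choice is an assumption;
// only the 90%/85% thresholds come from the spec.
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    coverage: {
      thresholds: {
        lines: 90,    // gate: at least 90% line coverage
        branches: 85, // gate: at least 85% branch coverage
      },
    },
  },
});
```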
Chaos scenarios that must pass
Neon outage, KV outage, bundler failover chain, RPC outage, paymaster depletion, Vercel down, queue consumer failure, and recovery backpressure are all explicit test cases.
Implementation roadmap

Delivery is staged around dependency order, not team convenience.

Shared contracts and business logic lock first. Execution backbone and GMX integration follow. API and web come after the backend seams stabilize. Resilience and intelligence hardening are added before launch-readiness closes the loop.

Phase 0–1: Preparation and foundation

Audit edge cases, scaffold the monorepo, lock schemas and shared constants, and stabilize package contracts.

Phase 2: Execution backbone

Implement TLM, core workflows, registry, GMX calldata and wrapper path, and prove end-to-end execution on testnets.

Phase 3–4: API, frontend, resilience, and intelligence

Ship the user-facing surfaces, then harden queue replay, fallback reads, monitoring, TP/SL, and signal pipelines.

Phase 5–6: Launch and deferred hardening

Close observability and runbooks, then leave cross-region Neon, stronger key infrastructure, and advanced nonce parallelism for justified later work.

Launch guardrails

Guardrails are where launch quality becomes concrete.

The edge-case audit and GMX integration spec are effectively a list of things the team must not get wrong. These are not “nice-to-have hardening tasks.” They are the practical controls that prevent protocol quirks from becoming user-facing losses.

  • USDT approve-to-zero reset
  • Safe withdrawal margin 95%
  • Asymmetric slippage bounds
  • Stale price and sequencer detection
  • GMX wrapper receiver pinning
  • External modification reconciliation
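The approve-to-zero reset exists because USDT-style tokens revert when `approve` is called while the current allowance is nonzero. A minimal planning helper makes the rule explicit; the actual ERC-20 calls and token addresses are elided, and the function name is illustrative:

```typescript
// Returns the sequence of approve() amounts to submit.
// USDT-style tokens require resetting a nonzero allowance to 0
// before setting a new nonzero value.
export function planApproval(currentAllowance: bigint, needed: bigint): bigint[] {
  if (currentAllowance >= needed) return [];    // existing allowance suffices
  if (currentAllowance === 0n) return [needed]; // single approve is safe
  return [0n, needed];                          // reset to zero, then approve
}
```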
GMX launch must-haves
Wrapper-deployed automated path, curated markets, execution-fee reserve checks, order-key tracking, 5-minute keeper timeout alerts, and cancellation handling.
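Order-key tracking and the 5-minute keeper timeout combine naturally: every submitted order is recorded with its timestamp, and a sweep flags any still-pending order older than the timeout. This is a sketch under assumed names, not the actual workflow code:

```typescript
// Keeper-timeout watchdog: any pending GMX order older than 5 minutes
// should raise a critical alert. Map shape and names are illustrative.
const KEEPER_TIMEOUT_MS = 5 * 60_000;

export function timedOutOrders(
  pending: Map<string, number>, // orderKey -> submittedAt (ms epoch)
  nowMs: number,
): string[] {
  return [...pending.entries()]
    .filter(([, submittedAt]) => nowMs - submittedAt > KEEPER_TIMEOUT_MS)
    .map(([orderKey]) => orderKey);
}
```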
DeFi correctness must-haves
Safe approval strategy, swap target validation, optimistic locking on position status, dust-threshold close verification, and reliable stale data detection.
Explicitly unresolved

These decisions are intentionally not closed for launch.

The spec marks a small set of questions as deferred rather than silently settled. They should stay visible so the team does not mistake present launch scope for a permanent architecture ceiling.

Infra and scale questions

  • When is cross-region Neon worth its complexity?
  • When does serialized per-vault execution need 2D nonce parallelism?
  • When does key management move from Workers secrets to external KMS or HSM?

Platform and vendor questions

  • How reliable is GMX SDK read-only usage inside Workers at scale?
  • Which long-term observability sink should own logs and metrics?
Cost + reliability

Costs, constants, and SLOs anchor the operating envelope.

The constants reference exists so the team can audit drift between documents. It is where cadence, retry budgets, fee caps, wrapper bounds, alert thresholds, and quality gates become numerically explicit rather than narrative.

Cloud costs

Cloudflare, Neon, and Vercel stay modest until monitoring scale, replay load, or observability storage grows materially.

Reliability constants

Cadences, timeout windows, fee caps, alert thresholds, and wrapper bounds all live in one audit surface.

User-facing guarantees

Trade latency, alert delivery, monitor cycle completion, and emergency-close availability remain the visible reliability promises.

  • 50 — parallel trigger DO reads allowed per sweep
  • 250 — default accounts per monitoring partition
  • 0.01 ETH — GMX wrapper max execution fee per order
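Collecting these numbers in one module is what makes drift auditable. A minimal sketch, with illustrative names and a small derived helper for partition sizing:

```typescript
// Single audit surface for cross-document constants; names are illustrative.
export const CONSTANTS = {
  maxParallelTriggerDoReads: 50,      // per sweep
  defaultAccountsPerPartition: 250,   // monitoring partition size
  gmxWrapperMaxExecutionFeeEth: 0.01, // per order
} as const;

// Derived: how many monitoring partitions a given account count needs.
export function partitionsNeeded(accounts: number): number {
  return Math.ceil(accounts / CONSTANTS.defaultAccountsPerPartition);
}
```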