
Designing an event-sourced state machine for real-world fleet operations

Why the rental row never had a status column, and what we built instead.

companion: Building an EV Rental Operating System for Last-Mile Delivery

Most engineers reach for a "current state" column on the rental row when they build a fleet platform. Updates land via UPDATE rentals SET status = ?. Six months in, you discover the database thinks a rider has a vehicle that the showroom thinks was returned, the battery operator thinks is in for service, and the rider's app thinks is paused.

The state column drifted. There is no way back to ground truth. Disputes resolve to whoever shouts loudest.

This was the failure mode we designed against on day one of the EV rental platform. We never let it happen, because the rental row never had a status column.

The decision

Every rental was modelled as an append-only event log with derived current-state views. No mutable status field anywhere. The state of a rental at any moment was a projection - replay the events, fold them into a state - not a stored value that could drift out of sync with reality.

The events table sat at the centre:

create table rental_events (
  id              uuid primary key,
  rental_id       uuid not null,
  event_type      text not null,
  payload         jsonb not null,
  reason_code     text,
  actor_id        uuid not null,        -- who emitted the event
  actor_role      text not null,        -- rider | showroom | admin | battery_op
  occurred_at     timestamptz not null,
  recorded_at     timestamptz not null default now(),
  idempotency_key text not null unique
);

create index on rental_events (rental_id, occurred_at);

Three things matter here:

  • occurred_at and recorded_at are different. A battery operator at a swap point with no signal logs an event at 06:42, the network reconnects at 07:15, and the event lands in the database at 07:15. Both timestamps are preserved. Replays use occurred_at; reconciliation uses recorded_at (a reconciliation query is sketched after this list).
  • idempotency_key is a hard unique. The battery operator's offline-tolerant app retries on reconnect. The same swap event must not double-write. The key is deterministic - {actor_id}-{rental_id}-{event_type}-{minute_bucket} - so a retry produces the same key and collides on the unique constraint.
  • actor_role is on every row. Disputes ("the showroom said the vehicle was returned") resolve by filtering events by role.
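A reconciliation query in that spirit - a sketch, not a report we actually shipped, and the 10-minute threshold is illustrative - flags events that sat in an offline queue before syncing:

-- Events recorded well after they occurred, i.e. writes that waited
-- for the network before landing in the database.
select rental_id, event_type, actor_role, occurred_at, recorded_at,
       recorded_at - occurred_at as sync_lag
from rental_events
where recorded_at - occurred_at > interval '10 minutes'
order by sync_lag desc;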

The alternatives we considered

Plain CRUD with a status column. Fastest to build. Drift within weeks. Every dispute becomes a finger-pointing exercise because there is no audit of how the status got that way.

Status column plus an audit log table. Better - at least there is a record. But the status column is still the source of truth, and the audit log is the apology when reality disagrees with it. We would have spent operations time reconciling the status column against the log instead of against reality.

Full event sourcing with a CQRS framework (e.g. Eventuate, Axon). Tempting, but the framework was bigger than the problem. We needed event sourcing on one aggregate (the rental); we did not need it on customers, payments, vehicles. The framework cost outweighed the benefit.

The chosen approach was event sourcing on the rental aggregate, plain CRUD on everything else. Hybrid is fine. Don't event-source the customer profile.

What the projections looked like

The "current state" of a rental was a SQL view that folded the events:

create view rental_current_state as
select
  rental_id,
  -- last assignment or unassignment event wins
  (
    select case when e2.event_type = 'vehicle_assigned'
                then e2.payload->>'vehicle_id' end
    from rental_events e2
    where e2.rental_id = e.rental_id
      and e2.event_type in ('vehicle_assigned', 'vehicle_unassigned')
    order by e2.occurred_at desc limit 1
  ) as current_vehicle_id,
  -- last battery swap wins
  (
    select payload->>'battery_id' from rental_events e3
    where e3.rental_id = e.rental_id
      and e3.event_type = 'battery_swapped'
    order by occurred_at desc limit 1
  ) as current_battery_id,
  -- derived status
  case
    when exists (
      select 1 from rental_events x
      where x.rental_id = e.rental_id
        and x.event_type = 'rental_returned'
    ) then 'returned'
    when exists (
      select 1 from rental_events x
      where x.rental_id = e.rental_id
        and x.event_type = 'rental_paused'
        and not exists (
          select 1 from rental_events y
          where y.rental_id = e.rental_id
            and y.event_type = 'rental_resumed'
            and y.occurred_at > x.occurred_at
        )
    ) then 'paused'
    else 'active'
  end as status,
  max(occurred_at) as last_event_at
from rental_events e
group by rental_id;

Hot reads went through this view in development. In production, we materialised it - refresh materialized view concurrently after every event insert, batched at 1-second granularity. Read latency stayed under 30ms even with millions of events.
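In sketch form - the names here are illustrative, not the production DDL - the materialisation and its refresh looked roughly like this:

-- Materialise the view; refresh materialized view concurrently requires
-- a unique index on the materialised view.
create materialized view rental_current_state_mat as
  select * from rental_current_state;

create unique index rental_current_state_mat_rental_id
  on rental_current_state_mat (rental_id);

-- Run after event inserts, batched at roughly 1-second granularity:
refresh materialized view concurrently rental_current_state_mat;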

The materialisation was a cache, not the source of truth. If it ever drifted (it never did), we could rebuild it by replaying. That property mattered more than the latency.

Idempotent events for the offline app

The battery operator's tablet ran inside a swap point with intermittent backhaul. Operators logged swaps in seconds and walked to the next bay. The app could not pause for the network.

Local-first writes solved the latency. Idempotency keys solved the duplicate problem.

// Battery operator app - local write, queued for sync
function logSwap(rentalId: string, oldBatteryId: string, newBatteryId: string) {
  const occurredAt = new Date();
  const minuteBucket = occurredAt.toISOString().slice(0, 16); // YYYY-MM-DDTHH:MM
  const idempotencyKey = `battery_op-${operatorId}-${rentalId}-battery_swapped-${minuteBucket}`;

  const event = {
    rentalId,
    eventType: "battery_swapped",
    payload: { oldBatteryId, newBatteryId },
    actorId: operatorId,
    actorRole: "battery_op",
    occurredAt,
    idempotencyKey,
  };

  db.localEvents.put(event);          // commits to local SQLite
  syncQueue.enqueue(event);           // best-effort sync to server
  return event;
}

The same operator logging the same swap twice inside the same minute bucket collapsed to one event. The minute bucket was the right granularity for our workflow - finer was unnecessary, coarser would have lost legitimate retries.

On the server side, the idempotency_key unique constraint did the rest. Conflicting inserts threw, the sync layer caught the conflict, and the local event was marked synced.
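The write path itself can stay a single statement. A sketch - here the duplicate is swallowed with on conflict rather than thrown and caught as described above, but either way the retry never double-writes:

-- reason_code is omitted (nullable) and recorded_at takes its default.
insert into rental_events
  (id, rental_id, event_type, payload, actor_id, actor_role,
   occurred_at, idempotency_key)
values
  (gen_random_uuid(), $1, $2, $3, $4, $5, $6, $7)
on conflict (idempotency_key) do nothing;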

The race the model dissolves

The original "scary" race: the rider finishes a shift and arrives at a swap point. Simultaneously, the showroom manager updates the rider's vehicle assignment in the back-office because they think the rider returned the bike yesterday.

In a status-column world, this is a nightmare. Two writes hit the same row from different actors with different intents. Whoever lands last wins. The other loses, silently.

In the event log:

12:01:30  swap_point: battery_swapped
            { rental_id: R1, vehicle_id: V1, old_battery: B1, new_battery: B2 }
12:01:32  showroom: vehicle_unassigned
            { rental_id: R1, vehicle_id: V1, reason: "rider returned yesterday" }

Both events land. Both are visible. The projection now shows: current_vehicle_id = null (last event wins on assignment) but the swap event for V1 is still in the log. An operator looking at the rental can see that two contradicting events happened in two seconds, by two different roles, and can adjudicate. The system isn't broken; it has captured a real-world disagreement that needs human resolution.
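Adjudication is then just a read over the log. A sketch of the kind of query an operator tool can run:

-- Every event for the disputed rental, in order, with the role that
-- emitted it; the two contradicting writes sit two seconds apart.
select occurred_at, actor_role, event_type, reason_code, payload
from rental_events
where rental_id = $1
order by occurred_at;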

That's the entire point. Software shouldn't paper over disagreements between actors. It should surface them.

What surprised me

The reason_code column carried more weight than expected. Every event had to ship with a structured reason from a fixed enum (vehicle_in_service, rider_no_show, battery_overheated, etc.). At first this felt like ceremony. After three months, the reason codes were the most-queried field in the database - they powered fleet-wide diagnostics, billing exception reports, and the operating playbook itself.
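Most of those reports were simple aggregates over the log. A sketch of the shape (the actual reports aren't shown here):

-- Why rentals were paused across the fleet in the last week.
select reason_code, count(*) as occurrences
from rental_events
where event_type = 'rental_paused'
  and occurred_at > now() - interval '7 days'
group by reason_code
order by occurrences desc;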

Materialised view refresh strategy mattered more than I thought. refresh materialized view concurrently requires a unique index on the view, and the lock semantics are different from a non-concurrent refresh. We hit subtle issues at scale and ended up with a custom incremental projection updater that subscribed to the event stream and wrote the projection rows in the same transaction as the event insert. Faster, simpler, no refresh dance.
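One way to get that same-transaction property - a sketch, not necessarily how our updater was wired - is a trigger-maintained projection table:

-- A plain projection table kept in step with the log. The trigger fires
-- in the same transaction as the event insert, so the projection can
-- never be observed ahead of or behind the log.
create table rental_projection (
  rental_id     uuid primary key,
  status        text not null,
  last_event_at timestamptz not null
);

create function apply_rental_event() returns trigger as $$
begin
  insert into rental_projection (rental_id, status, last_event_at)
  values (new.rental_id,
          case new.event_type
            when 'rental_returned' then 'returned'
            when 'rental_paused'   then 'paused'
            else 'active'
          end,
          new.occurred_at)
  on conflict (rental_id) do update
    set status = case new.event_type
                   when 'rental_returned' then 'returned'
                   when 'rental_paused'   then 'paused'
                   when 'rental_resumed'  then 'active'
                   else rental_projection.status
                 end,
        last_event_at = greatest(rental_projection.last_event_at, new.occurred_at);
  -- assumes events arrive roughly in order; a real updater would guard
  -- against late arrivals before overwriting status
  return new;
end;
$$ language plpgsql;

create trigger rental_projection_updater
  after insert on rental_events
  for each row execute function apply_rental_event();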

Event types should be additive forever. We renamed an event type once early on. The migration to rewrite historical events lost a day. After that, deprecated event types stayed in the enum forever; new types were added; nothing was renamed.

What I'd do differently

Build the projections in the application layer from the start. Postgres views and refresh materialized view got us 80% of the way. Going from 80% to 100% would have been smoother with an explicit projection updater that owned its own table from day one. The migration when we eventually built that was painful only because we had inherited the materialised-view tooling first.

Snapshots after a threshold. Replaying every event for a long-running rental is wasteful. Our heaviest fleet renters crossed 1,200 events per rental within 18 months, mostly battery swaps. We added snapshots at the 1,000-event mark, but only after we had to. Should have built snapshot infrastructure into the projection layer from day one.
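A snapshot layer along these lines - a sketch, with assumed names - caps how much of the log a replay has to walk:

-- Persist the folded state every N events; replay only the tail after
-- the latest snapshot.
create table rental_snapshots (
  rental_id   uuid not null,
  as_of       timestamptz not null,   -- occurred_at of the last folded event
  event_count int not null,
  state       jsonb not null,
  primary key (rental_id, as_of)
);

-- Replay = latest snapshot plus the events that follow it.
select e.*
from rental_events e
where e.rental_id = $1
  and e.occurred_at > coalesce(
        (select max(s.as_of) from rental_snapshots s where s.rental_id = e.rental_id),
        '-infinity');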

A typed event registry. We had loose JSON payloads with runtime validation. Should have had a typed registry per event type - code generation from a schema, mandatory shapes, breaking-change protection. Worth the up-front cost.
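A minimal version of the idea in plain TypeScript - hypothetical event shapes, not our actual schema-generated registry:

// One type per event, a union over all of them, and a registry of
// runtime guards so a payload cannot silently change shape.
type BatterySwapped = {
  eventType: "battery_swapped";
  payload: { oldBatteryId: string; newBatteryId: string };
};

type VehicleAssigned = {
  eventType: "vehicle_assigned";
  payload: { vehicleId: string };
};

type RentalEvent = BatterySwapped | VehicleAssigned; // extend additively, never rename

const validators: { [K in RentalEvent["eventType"]]: (payload: unknown) => boolean } = {
  battery_swapped: (p) =>
    typeof (p as any)?.oldBatteryId === "string" &&
    typeof (p as any)?.newBatteryId === "string",
  vehicle_assigned: (p) => typeof (p as any)?.vehicleId === "string",
};

function assertValidPayload(eventType: RentalEvent["eventType"], payload: unknown): void {
  if (!validators[eventType](payload)) {
    throw new Error(`invalid payload for ${eventType}`);
  }
}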


If your software sits on top of a physical operation where multiple humans can update the same thing in the same minute: do not give it a status column. Give it events. The status column is the lie the database tells when reality is more complicated than the schema. The events are reality.