Pessimistic vs optimistic concurrency for slot reservation: a benchmark and a decision
The standard 'use OCC' advice fits the median case. Under sustained contention with users who've already entered payment details, the honest choice is different.
Three players hit the same Friday-evening slot within 800ms of each other. One booking should land. Two should see "already booked." None should see a confirmation page that turns out to be a lie.
This is the load case for any real-world booking platform. Conflict is not the rare exception - Friday evenings are when conflicts happen. Building a booking engine for this load case is a different exercise from building one for the median case.
The internet's standard advice is "use optimistic concurrency control with a version column." We benchmarked it. We didn't pick it. Here's why.
The decision
We used pessimistic locking at the slot level - a SELECT ... FOR UPDATE inside the booking transaction, and a strict ordering of operations such that nothing else can interleave between "check availability" and "write the booking."
The standard alternative is optimistic concurrency control (OCC): each slot has a version column, you read it, you do your booking work, you write back with WHERE version = <what we read>, and if the row was changed under you (by another booking) the update returns 0 rows and you retry.
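As a concrete sketch of that read-check-write cycle, here is an in-memory TypeScript simulation. The `Slot` type and function names are illustrative, not our production code; the version comparison stands in for `UPDATE ... WHERE version = <what we read>` returning 0 rows:

```typescript
type Slot = { version: number; bookedBy?: string };

// Mirrors the conditional write: succeed only if nobody changed the row
// since we read it, and the slot is still free.
function tryBookOcc(slot: Slot, readVersion: number, playerId: string): boolean {
  if (slot.version !== readVersion || slot.bookedBy) return false;
  slot.bookedBy = playerId;
  slot.version += 1;
  return true;
}

function bookWithRetry(slot: Slot, playerId: string, attempts = 3): boolean {
  for (let i = 0; i < attempts; i++) {
    const readVersion = slot.version; // read the version
    // ...booking work would happen here...
    if (tryBookOcc(slot, readVersion, playerId)) return true;
    if (slot.bookedBy) return false; // slot genuinely taken; retrying is pointless
  }
  return false;
}

const slot: Slot = { version: 1 };
const first = bookWithRetry(slot, "player-a");  // wins
const second = bookWithRetry(slot, "player-b"); // sees "taken"
```

The retry loop is where the tail latency in the benchmark below comes from: every failed attempt is a full read-work-write round trip.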
Both work in the abstract. They have different real-world behaviours under our specific load.
The benchmark
We ran a synthetic load test: 10 concurrent clients each trying to book the same slot, repeated 1000 times.
| Approach | Successful bookings (should be 1) | Avg latency under contention | Tail behaviour |
|---|---|---|---|
| Pessimistic (FOR UPDATE) | 1 every time | 22ms p50, 80ms p99 | Waits queue cleanly |
| OCC with retry (3 attempts) | 1 every time | 18ms p50, 240ms p99 | Tail bloats with retry storm |
| OCC with no retry | 1 every time | 15ms p50, 35ms p99 | 9 of 10 see "try again" |
| Naive (no concurrency control) | 2-4 per run, randomly | 8ms p50 | Double-bookings - broken |
The numbers tell a specific story. OCC with retry is faster when every retry succeeds, but under sustained contention the retry storm makes the tail worse than pessimistic locking. OCC with no retry has the best tail but pushes "try again" to the user - the wrong UX for a booking flow, where the user has already entered their details and shouldn't be asked to redo them.
Pessimistic locking serialises the contended path with predictable latency. Under our load (peak ~50 conflicts/second on Friday evenings), it was the right shape.
The schema and the lock
```sql
-- The exclusion constraint's "turf_id with =" needs the btree_gist extension:
-- create extension if not exists btree_gist;

create table turfs (
  id uuid primary key,
  location_id uuid not null,
  name text not null,
  surface text not null, -- 'turf' | 'concrete' | 'grass'
  active boolean not null default true
);

create table bookings (
  id uuid primary key,
  turf_id uuid not null references turfs(id),
  player_id uuid not null,
  slot_start timestamptz not null,
  slot_end timestamptz not null,
  status text not null, -- 'reserved' | 'confirmed' | 'cancelled' | 'no_show'
  amount_paise bigint not null,
  payment_id text,
  created_at timestamptz not null default now(),
  cancelled_at timestamptz,
  -- The exclusion constraint enforces that no two reserved or confirmed
  -- bookings overlap on the same turf
  exclude using gist (
    turf_id with =,
    tstzrange(slot_start, slot_end, '[)') with &&
  ) where (status in ('reserved', 'confirmed'))
);
```
The exclusion constraint is the database-level safety net. Even if the application-level locking fails, the database will not accept two overlapping reserved or confirmed bookings on the same turf. This is the "in case our concurrency reasoning is wrong, what protects us" layer.
The booking transaction:
```typescript
async function reserveSlot(params: {
  turfId: string;
  playerId: string;
  slotStart: Date;
  slotEnd: Date;
}): Promise<BookingResult> {
  return await db.transaction(async (tx) => {
    // 1. Lock the turf row to serialise concurrent bookings on this turf
    const turfRows = await tx.query(
      "select * from turfs where id = $1 for update",
      [params.turfId]
    );
    if (turfRows.length === 0 || !turfRows[0].active) {
      return { ok: false, reason: "turf_inactive" };
    }

    // 2. Check overlap (with the same predicate as the exclusion constraint)
    const conflict = await tx.query(
      `select 1 from bookings
        where turf_id = $1
          and status in ('reserved', 'confirmed')
          and tstzrange(slot_start, slot_end, '[)')
              && tstzrange($2::timestamptz, $3::timestamptz, '[)')
        limit 1`,
      [params.turfId, params.slotStart, params.slotEnd]
    );
    if (conflict.length > 0) {
      return { ok: false, reason: "slot_taken" };
    }

    // 3. Compute price (peak-hour rules etc - see below)
    const amount = await computePrice(params);

    // 4. Insert the reservation. The exclusion constraint is the final safety net.
    const booking = await tx.query(
      `insert into bookings (
         id, turf_id, player_id, slot_start, slot_end, status, amount_paise
       ) values ($1, $2, $3, $4, $5, 'reserved', $6)
       returning *`,
      [uuid(), params.turfId, params.playerId, params.slotStart, params.slotEnd, amount]
    );

    return { ok: true, bookingId: booking[0].id, amountPaise: amount };
  });
}
```
The for update on the turf row is the lock. Two concurrent transactions targeting the same turf serialise - the second waits for the first to commit. The wait is short (the booking transaction is small), and the queue is fair.
The for update on the turf, not the bookings table, was deliberate. Locking the turf row is a coarser lock - it serialises all bookings on that turf, not just on the contested slot. The slot-level alternative would be a row in a turf_slots table, which we considered. Coarse turf-level locking was simpler, the contention was acceptable (a single turf has at most a handful of concurrent booking attempts at peak), and the alternative table added joins everywhere.
Pricing as a function, not a column
A common mistake: store price on each slot. It looks fine until you change pricing.
Slots don't have prices. Slot bookings have prices, computed at booking time:
```typescript
type PricingContext = {
  turfId: string;
  slotStart: Date;
  playerId: string;
  promoCode?: string;
};

async function computePrice(ctx: PricingContext): Promise<number> {
  const turf = await db.turfs.findById(ctx.turfId);
  const baseRate = turf.baseRatePaise;
  let multiplier = 1.0;

  // Peak-hour rule
  const hour = ctx.slotStart.getHours();
  const day = ctx.slotStart.getDay();
  if ([5, 6].includes(day) && hour >= 18 && hour < 22) {
    multiplier *= 1.5; // Friday/Saturday evening peak
  }

  // Member discount
  const player = await db.players.findById(ctx.playerId);
  if (player.membershipTier === "premium") {
    multiplier *= 0.85;
  }

  // Promo code
  if (ctx.promoCode) {
    const promo = await validatePromoCode(ctx.promoCode, ctx);
    if (promo) multiplier *= 1 - promo.discount;
  }

  return Math.round(baseRate * multiplier);
}
```
The function takes a context and returns a price. The price is recomputed every time. Operators can change the underlying rules (turf rate, peak hours, member tiers) without touching anything booking-side.
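To make the arithmetic easy to check, here is a pure sketch of the same multiplier rules with the turf rate and membership tier passed in rather than fetched. `priceFor` is an illustrative standalone version, not the production function:

```typescript
// Pure sketch of the pricing rules: peak multiplier, then member discount.
function priceFor(baseRatePaise: number, slotStart: Date, membershipTier: string): number {
  let multiplier = 1.0;
  const hour = slotStart.getHours();
  const day = slotStart.getDay(); // 5 = Friday, 6 = Saturday
  if ([5, 6].includes(day) && hour >= 18 && hour < 22) multiplier *= 1.5;
  if (membershipTier === "premium") multiplier *= 0.85;
  return Math.round(baseRatePaise * multiplier);
}

// Friday 7pm, premium member: 100000 * 1.5 * 0.85 = 127500 paise
const peak = priceFor(100_000, new Date(2024, 0, 5, 19, 0), "premium");
// Tuesday 10am, no membership: base rate unchanged
const offPeak = priceFor(100_000, new Date(2024, 0, 2, 10, 0), "basic");
```

Note that the multipliers compose: a premium member at peak pays 1.5 × 0.85 of base, which is why the order of rules doesn't matter here.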
This was crucial when the central operator wanted to A/B test peak-hour multipliers across cities. We changed the function. We didn't migrate slot rows.
Cancellation rules: also a function
Same pattern for refund logic on cancellation:
```typescript
function refundAmount(booking: Booking, cancelledAt: Date): number {
  const hoursUntilSlot = hoursBetween(cancelledAt, booking.slotStart);
  if (hoursUntilSlot >= 24) return booking.amountPaise; // full refund
  if (hoursUntilSlot >= 6) return Math.round(booking.amountPaise * 0.5); // 50%
  return 0; // no refund
}
```
The refund function reads booking metadata and a clock. It does not look at any "refund_eligibility" flag. The flag would have rotted; the function did not.
Operator-discretion overrides existed (weather cancellation, regular customer goodwill) but were always additive - the function returned the policy refund, the operator could choose to grant more on top, and that choice was logged with a reason.
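The additive shape can be sketched as a pure function: policy first, goodwill on top. The `Booking` shape, `hoursBetween`, and the goodwill parameter are illustrative; in the real system the grant and its reason were logged:

```typescript
type Booking = { amountPaise: number; slotStart: Date };

function hoursBetween(a: Date, b: Date): number {
  return (b.getTime() - a.getTime()) / 3_600_000;
}

// Same tiers as the policy function above
function refundAmount(booking: Booking, cancelledAt: Date): number {
  const hoursUntilSlot = hoursBetween(cancelledAt, booking.slotStart);
  if (hoursUntilSlot >= 24) return booking.amountPaise;
  if (hoursUntilSlot >= 6) return Math.round(booking.amountPaise * 0.5);
  return 0;
}

function totalRefund(
  booking: Booking,
  cancelledAt: Date,
  goodwill?: { amountPaise: number; reason: string }
): number {
  const policy = refundAmount(booking, cancelledAt);
  // Operator discretion is additive: it can only increase the refund,
  // and the grant + reason would be written to an audit log here.
  return policy + (goodwill?.amountPaise ?? 0);
}
```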
Buffer-time enforcement
Most "slot calendars" let bookings butt up against each other. Real cleaning needs 15 minutes. We enforced this at the slot-availability check:
```typescript
const BUFFER_MINUTES = 15;

function effectiveSlotRange(slotStart: Date, slotEnd: Date) {
  return {
    start: addMinutes(slotStart, -BUFFER_MINUTES),
    end: addMinutes(slotEnd, BUFFER_MINUTES),
  };
}
// Used in both the conflict check and the exclusion constraint
```
A 7pm-to-8pm booking on a turf made the 6:45pm-to-8:15pm window unavailable. The exclusion constraint enforced it. The UI showed it. The cleaning crew got their 15 minutes.
The constant lived in one place. When the operator wanted to test 10-minute buffers in some cities, we made the constant per-location and the rest of the code didn't change.
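A self-contained sketch of the buffer arithmetic with that per-location knob. The `overlaps` helper mirrors the half-open `[)` semantics of the `tstzrange ... &&` check; all names here are illustrative:

```typescript
function addMinutes(d: Date, m: number): Date {
  return new Date(d.getTime() + m * 60_000);
}

// Per-location buffer instead of the global constant
function effectiveRange(slotStart: Date, slotEnd: Date, bufferMinutes: number) {
  return {
    start: addMinutes(slotStart, -bufferMinutes),
    end: addMinutes(slotEnd, bufferMinutes),
  };
}

// Half-open [start, end) overlap - same semantics as tstzrange(..., '[)') &&
function overlaps(a: { start: Date; end: Date }, b: { start: Date; end: Date }): boolean {
  return a.start < b.end && b.start < a.end;
}

const at = (h: number, m = 0) => new Date(2024, 0, 5, h, m);
const booked = effectiveRange(at(19), at(20), 15); // 7-8pm blocks 6:45-8:15
const conflicts = overlaps(booked, { start: at(20), end: at(21) });      // 8pm request
const clears = overlaps(booked, { start: at(20, 15), end: at(21, 15) }); // 8:15pm request
```

The half-open convention matters: an 8:15pm booking starting exactly when the buffered window ends does not conflict.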
What surprised me
Database-level constraints saved us once. Early in the build, an application-level race got past the lock under a specific timing. The exclusion constraint caught it - the second booking failed with a constraint violation, the user saw "slot taken," nobody got a double-booking. The constraint is the safety net you hope to never need; we needed it once and were glad for it.
Cancellation races were as bad as booking races. Two players cancel the same booking simultaneously (one from a phone, one from a desktop, both refreshing the page). Without locking, both refunds get issued. We applied the same for update pattern on cancellation. Boring, important, easy to forget.
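The double-refund guard reduces to a status check and a status write that must happen atomically. An in-memory sketch of the invariant (the real system holds it under `select ... for update`; the types and refund policy here are simplified):

```typescript
type CancellableBooking = { status: "confirmed" | "cancelled"; amountPaise: number };

// Check-then-write must be atomic; only the first cancel issues a refund.
function cancel(booking: CancellableBooking): { refundedPaise: number } {
  if (booking.status === "cancelled") return { refundedPaise: 0 }; // already cancelled: no-op
  booking.status = "cancelled";
  return { refundedPaise: booking.amountPaise }; // refund tiers elided for brevity
}

const b: CancellableBooking = { status: "confirmed", amountPaise: 50_000 };
const firstCancel = cancel(b);  // issues the refund
const secondCancel = cancel(b); // phone + desktop race: the loser is a no-op
```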
Peak pricing felt obvious in hindsight but took a week to convince stakeholders. The "we should charge more for Friday evenings" rule was unintuitive to the operator at first. We shipped it as a per-city configurable multiplier, and adoption was rapid once they saw the revenue lift on Friday-evening slots.
What I'd do differently
Slot-level rows from day one. We started with turf-level locking and bookings as the only table. Eventually we added a turf_slots table for performance and richer per-slot metadata (operator notes, weather closures). Earlier introduction would have made some queries simpler.
Idempotent booking endpoint. A user double-tapping the "Book" button could in theory create two reservations. Application-level deduplication caught this in 99% of cases; making the endpoint truly idempotent (with a client-supplied idempotency key) would have closed the last 1%.
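The idempotency-key shape can be sketched in memory. The map stands in for a persisted key-to-booking table, and in the real system the lookup and insert would live inside the booking transaction so the key check is race-free; all names are illustrative:

```typescript
const seenKeys = new Map<string, string>(); // idempotency key -> booking id

function reserveIdempotent(idempotencyKey: string, doReserve: () => string): string {
  const existing = seenKeys.get(idempotencyKey);
  if (existing) return existing; // double-tap replays the same booking id
  const bookingId = doReserve();
  seenKeys.set(idempotencyKey, bookingId);
  return bookingId;
}

let calls = 0;
const make = () => { calls += 1; return `booking-${calls}`; };
const first = reserveIdempotent("key-1", make);
const second = reserveIdempotent("key-1", make); // the double-tap
```

The client generates the key once per booking attempt (per tap-through of the flow, not per HTTP request), which is what makes the retry safe.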
Booking holds with explicit TTL. We had reservations that auto-cancelled after 5 minutes if payment didn't complete. The TTL was implemented as a job that ran every minute. Better: an explicit expires_at column, with the booking-availability query treating expired-without-payment reservations as available, so the cleanup job is strictly cosmetic.
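The availability predicate for that approach is small. A sketch, assuming an `expiresAt` field set on unpaid holds (field names are illustrative):

```typescript
type Reservation = {
  status: "reserved" | "confirmed" | "cancelled" | "no_show";
  expiresAt?: Date; // set while status is 'reserved' and payment is pending
};

// An expired unpaid hold no longer blocks the slot, regardless of whether
// the cleanup job has flipped its status yet.
function blocksSlot(r: Reservation, now: Date): boolean {
  if (r.status === "confirmed") return true;
  if (r.status === "reserved") return !r.expiresAt || r.expiresAt > now;
  return false;
}
```

The correctness no longer depends on how often the cleanup job runs; the job only tidies rows the query already ignores.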
The load case is Friday at 7pm. Pick for that, not for the median row.