/writing/tech · 2026-02-22 · 8 min read

Daily route generation under real-world constraints: a hand-rolled scheduler

Why we abandoned OR-Tools for a deterministic algorithm operators could read. Trust beats optimality for local services.

companion: Building a Subscription Commerce Platform for On-Demand Car Washing

Five hundred customers, twelve workers, six clusters, sunrise to noon. The job: every customer's car gets washed, accessible parking windows respected, yesterday's complaints handled first, no worker overloaded.

You can solve this by hand for thirty customers. You cannot solve it by hand for five hundred. The platform's scheduler had to.

I started with Google OR-Tools and abandoned it. The decision is in the title.

The decision

We built a deterministic, rule-based assignment engine instead of using a constraint-programming solver, for three reasons:

  1. Operators needed to trust the schedule. A constraint solver produces a schedule that's optimal under its objective function, but the operator can't articulate why a particular assignment was made. When an operator says "swap these two," the platform either lets them or doesn't. A solver-generated schedule that gets manually overridden 30% of the time is a worse user experience than a hand-rolled engine where every decision is traceable.
  2. The objective function shifted weekly. This week we want to minimise drive time. Next week the supervisor wants experienced workers on premium customers. The week after, we're testing whether starting from the city's east side reduces complaint rates. A solver with a slow-moving objective function doesn't fit a business that learns weekly.
  3. The data shape didn't fit a solver. Constraint solvers want a clean problem. Our problem had a long tail of edge cases - "this customer has a basement parking with a 9–11am window," "this worker doesn't drive, only does walkable clusters," "this address requires the supervisor's permission to access." Encoding these into a solver's constraint language was harder than writing them as code.

The right tool for the job was a small assignment engine, written in TypeScript, with a deterministic algorithm that operators could read.
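The long tail of edge cases is easier to see in code than in a solver's constraint language. A hedged sketch of what "writing them as code" can look like; the rule names and context fields here are illustrative, not the production schema:

```typescript
// Each edge case is a named predicate over an assignment context.
// Names and fields are illustrative, not the production schema.
type Ctx = {
  workerWalkOnly: boolean;
  customerCluster: string;
  walkableClusters: Set<string>;
  requiresSupervisorApproval: boolean;
  supervisorApproved: boolean;
};

type Rule = { name: string; eligible: (ctx: Ctx) => boolean };

const rules: Rule[] = [
  {
    name: "walk_only_worker_needs_walkable_cluster",
    eligible: (ctx) => !ctx.workerWalkOnly || ctx.walkableClusters.has(ctx.customerCluster),
  },
  {
    name: "restricted_address_needs_supervisor_approval",
    eligible: (ctx) => !ctx.requiresSupervisorApproval || ctx.supervisorApproved,
  },
];

// Returns the names of the rules that failed -- which is exactly the
// trace an operator needs to understand why a pairing was rejected.
function failedRules(ctx: Ctx, ruleset: Rule[] = rules): string[] {
  return ruleset.filter((r) => !r.eligible(ctx)).map((r) => r.name);
}
```

Adding next week's edge case is one entry in the list, and the failing rule's name doubles as the explanation shown to the operator.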

The algorithm

Pseudocode for the nightly schedule run:

type Customer = {
  id: string;
  cluster: string;            // 'cluster_north_1' etc
  vehicleType: "sedan" | "suv" | "ev";
  parkingType: "open" | "covered" | "basement";
  parkingWindow?: { start: string; end: string };  // 'HH:MM' local
  skipDates: Set<string>;     // 'YYYY-MM-DD'
  complaintsToRedo: string[]; // wash IDs from previous days
};

type Worker = {
  id: string;
  homeCluster: string;
  capacityMinutes: number;
  skills: Set<"sedan" | "suv" | "ev">;
  walkOnly: boolean;
};

type Assignment = {
  workerId: string;
  customerId: string;
  estimatedStart: string;   // 'HH:MM'
  estimatedDuration: number; // minutes
  carriedOverComplaint?: string;
};

function generateSchedule(date: Date, customers: Customer[], workers: Worker[]): Assignment[] {
  const assignments: Assignment[] = [];

  // 1. Filter out skips
  const dateKey = formatDate(date);
  const eligible = customers.filter((c) => !c.skipDates.has(dateKey));

  // 2. Group by cluster
  const byCluster = groupBy(eligible, (c) => c.cluster);

  for (const [cluster, clusterCustomers] of byCluster) {
    const clusterWorkers = workers.filter((w) => w.homeCluster === cluster);

    // 3. Carry over complaints first - yesterday's loose ends are today's first job
    const complaintCustomers = clusterCustomers.filter(
      (c) => c.complaintsToRedo.length > 0
    );
    const regularCustomers = clusterCustomers.filter(
      (c) => c.complaintsToRedo.length === 0
    );

    // 4. Order regular customers by parking window start (customers without a window sort last)
    regularCustomers.sort((a, b) => {
      const aStart = a.parkingWindow?.start || "23:59";
      const bStart = b.parkingWindow?.start || "23:59";
      return aStart.localeCompare(bStart);
    });

    // 5. Greedy assign - each worker fills their capacity from the priority list
    const ordered = [...complaintCustomers, ...regularCustomers];
    const workerCapacity = new Map(clusterWorkers.map((w) => [w.id, w.capacityMinutes]));

    for (const customer of ordered) {
      const eligibleWorkers = clusterWorkers
        .filter((w) => w.skills.has(customer.vehicleType))
        .filter((w) => (workerCapacity.get(w.id) ?? 0) >= durationFor(customer))
        .sort((a, b) => (workerCapacity.get(b.id) ?? 0) - (workerCapacity.get(a.id) ?? 0));

      if (eligibleWorkers.length === 0) {
        // unassigned - supervisor exception queue
        assignments.push(unassigned(customer));
        continue;
      }

      const worker = eligibleWorkers[0];
      const start = nextSlotFor(worker, customer.parkingWindow);
      const duration = durationFor(customer);

      assignments.push({
        workerId: worker.id,
        customerId: customer.id,
        estimatedStart: start,
        estimatedDuration: duration,
        carriedOverComplaint: customer.complaintsToRedo[0],
      });

      workerCapacity.set(worker.id, (workerCapacity.get(worker.id) ?? 0) - duration);
    }
  }

  return assignments;
}

That's the entire scheduler. It is not optimal in any formal sense. It is deterministic, traceable, and good enough that supervisors trust it.

When a supervisor asks "why did Ramesh get this customer," the answer is: he was the worker with the most remaining capacity in that cluster who has the skill for that vehicle type and was available during the customer's parking window. The supervisor can argue with that or override it. They can't argue with a black-box optimiser.

Capacity smoothing

The greedy assignment skewed work toward the same workers - whoever had the most capacity got the next customer, again and again. After a week, our most-skilled workers were doing 8-hour days while less-experienced workers were under-loaded.

The fix was to smooth capacity after the initial assignment:

function smoothCapacity(assignments: Assignment[], workers: Worker[]): Assignment[] {
  const byWorker = groupBy(assignments, (a) => a.workerId);
  const targetMinutes = totalAssignedMinutes(assignments) / workers.length;
  const SWAP_THRESHOLD = 0.15;  // 15% deviation triggers a smoothing pass

  for (const [workerId, workerAssignments] of byWorker) {
    const workerMinutes = workerAssignments.reduce((s, a) => s + a.estimatedDuration, 0);
    const deviation = (workerMinutes - targetMinutes) / targetMinutes;
    if (deviation <= SWAP_THRESHOLD) continue;

    // Worker is overloaded. Find an underloaded worker in the same cluster
    // who has the skill for one of this worker's assignments.
    const overloadedWorker = workers.find((w) => w.id === workerId)!;
    const underloaded = workers
      .filter((w) => w.homeCluster === overloadedWorker.homeCluster && w.id !== workerId)
      .map((w) => {
        const wm = (byWorker.get(w.id) ?? []).reduce(
          (s, a) => s + a.estimatedDuration,
          0
        );
        return { worker: w, minutes: wm };
      })
      .filter((x) => (x.minutes - targetMinutes) / targetMinutes < -SWAP_THRESHOLD)
      .sort((a, b) => a.minutes - b.minutes);

    if (underloaded.length === 0) continue;

    // Move assignments off the overloaded worker, smallest first, until
    // they're back inside the threshold.
    const sortedByDuration = [...workerAssignments].sort(
      (a, b) => a.estimatedDuration - b.estimatedDuration
    );
    let overMinutes = workerMinutes;
    for (const a of sortedByDuration) {
      const target = underloaded[0];
      const customer = customerById(a.customerId);
      if (!target.worker.skills.has(customer.vehicleType)) continue;

      a.workerId = target.worker.id;
      target.minutes += a.estimatedDuration;
      overMinutes -= a.estimatedDuration;
      // Keep the least-loaded worker at the front for the next move.
      underloaded.sort((x, y) => x.minutes - y.minutes);
      if ((overMinutes - targetMinutes) / targetMinutes <= SWAP_THRESHOLD) break;
    }
  }

  return assignments;
}

Swaps were structured events in the audit trail. The original assignment and the smoothing override coexisted. A supervisor could see "this customer was originally assigned to Ramesh, then smoothed to Suresh, because Ramesh was 25% over target."
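A minimal sketch of what one of those structured swap events could look like, and how it renders for the supervisor; the field names are assumptions, not the production schema:

```typescript
// One smoothing swap as an audit-trail event. Field names are
// illustrative, not the production schema.
type SmoothingSwapEvent = {
  kind: "capacity_smoothing_swap";
  customerId: string;
  fromWorkerId: string;
  toWorkerId: string;
  fromWorkerDeviation: number; // e.g. 0.25 for "25% over target" at decision time
  occurredAt: string;          // ISO timestamp
};

// Renders the event as the sentence a supervisor reads in the audit trail.
function describeSwap(e: SmoothingSwapEvent, names: Map<string, string>): string {
  const from = names.get(e.fromWorkerId) ?? e.fromWorkerId;
  const to = names.get(e.toWorkerId) ?? e.toWorkerId;
  const pct = Math.round(e.fromWorkerDeviation * 100);
  return `originally assigned to ${from}, smoothed to ${to}, because ${from} was ${pct}% over target`;
}
```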

Handling exceptions: the queue, not the alert

The hardest design question was what to do with customers the scheduler couldn't place. A worker fell sick. A vehicle broke. A new customer came in mid-day. These were normal events; the scheduler had to handle them gracefully.

Initially, we did this with alerts. The supervisor got a notification: "5 unassigned customers in cluster_east_2." They'd open the dashboard, click through, reassign each one. It felt active.

It was wrong. Alerts created urgency. Urgency created mistakes. Supervisors started missing alerts when there were too many.

We replaced alerts with a queue:

create table assignment_exceptions (
  id              uuid primary key,
  customer_id     uuid not null,
  date            date not null,
  exception_type  text not null,        -- 'no_eligible_worker' | 'capacity_exceeded' |
                                        -- 'parking_window_unmatched' | 'walk_in'
  detected_at     timestamptz not null default now(),
  resolved_at     timestamptz,
  resolution      text,                 -- 'reassigned' | 'rescheduled' | 'cancelled'
  resolved_by     uuid
);

The supervisor's dashboard showed the queue, sorted by detection time. Each exception had context - why it happened, which workers were eligible but unavailable, what nearby clusters had spare capacity. The supervisor worked the queue from the top. There were no alerts.

This single change cut the supervisor's "always-on" feeling and made the operating model sustainable.
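The queue semantics are simple enough to state in a few lines. A sketch of the in-memory equivalent of the table above (the type mirrors the SQL columns; the helper name is mine):

```typescript
type ExceptionType =
  | "no_eligible_worker"
  | "capacity_exceeded"
  | "parking_window_unmatched"
  | "walk_in";

// Mirrors the assignment_exceptions table; resolvedAt absent means open.
type AssignmentException = {
  customerId: string;
  date: string;        // 'YYYY-MM-DD'
  exceptionType: ExceptionType;
  detectedAt: number;  // epoch ms
  resolvedAt?: number;
};

// The dashboard works the queue top-down: open exceptions only,
// oldest first. No notification fires anywhere in this path.
function queueView(exceptions: AssignmentException[]): AssignmentException[] {
  return exceptions
    .filter((e) => e.resolvedAt === undefined)
    .sort((a, b) => a.detectedAt - b.detectedAt);
}
```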

The skip mechanic

A subscription customer travelling for the weekend wanted to skip Saturday and Sunday. The platform supported it. The implementation was small and matters more than it looks:

-- A subscription skip is just an event on the customer's wash schedule
create table wash_skip_events (
  id          uuid primary key,
  customer_id uuid not null,
  skip_date   date not null,
  requested_at timestamptz not null default now(),
  reason      text,
  unique (customer_id, skip_date)
);

-- The scheduler reads from a view that excludes skipped dates
create view eligible_customers_for_date as
select c.*
from customers c
where not exists (
  select 1 from wash_skip_events s
  where s.customer_id = c.id
    and s.skip_date = current_date
);

A skip was a row, not a status change. A customer could skip and unskip the same date any number of times before midnight: unskipping deleted the row, re-skipping re-inserted it, and the unique constraint guaranteed at most one live skip per customer per date. A skip didn't reduce the customer's monthly billing - they had paid for 30 wash-days; skipping one shifted their renewal date by one day.

This last bit was the expensive lesson. We initially treated skip as "you don't get billed for that day." Customers loved it. The unit economics broke. We changed it to "you don't get billed but your renewal moves out by one day." Customers still loved it. The operator stayed solvent.
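The renewal arithmetic is small enough to state precisely. A sketch, assuming a plan priced per wash-day (the helper name is illustrative):

```typescript
// A plan covers a fixed number of wash-days. Skips don't refund;
// each one pushes the renewal date out by a day, so the customer
// still receives every wash-day they paid for.
function renewalDate(planStart: Date, paidWashDays: number, skipCount: number): Date {
  const d = new Date(planStart);
  d.setDate(d.getDate() + paidWashDays + skipCount);
  return d;
}
```

A 30-day plan starting 1 February with two skips renews on 5 March instead of 3 March; revenue per wash is unchanged.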

What surprised me

The scheduler's accuracy didn't matter as much as its consistency. A scheduler that's 95% optimal but produces wildly different assignments week-to-week is worse than one that's 80% optimal but predictable. Workers wanted to know roughly which neighbourhoods they'd serve. Predictability was the actual product feature.

Photo evidence was the highest-leverage feature in the field worker app. Every wash ended with a photo. The photo settled disputes, enabled supervisor QA, and gave the customer emotional reassurance. Nothing else we built came close.

The capacity smoothing pass was bypassed more often than expected. Supervisors learned that for certain workers ("Suresh handles complaints best"), they'd manually override the smoothing. We accepted this. The platform's job wasn't to optimise; it was to give the supervisor a starting point they could trust and edit.

What I'd do differently

Make the assignment cost function configurable per cluster. Some clusters are dense, walking-friendly, single-block routes. Others are sparse, vehicle-heavy, multi-stop routes. We used the same algorithm for both and tuned by accepting more overrides in sparse clusters. A pluggable per-cluster cost function would have reduced the override count.
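What a pluggable per-cluster cost function might have looked like; a sketch, not something we shipped, with illustrative cluster names and weights:

```typescript
// Score a candidate worker for a customer; lower is better. Each cluster
// supplies its own weighting instead of sharing one hard-coded sort.
type Candidate = { remainingMinutes: number; walkDistanceKm: number; driveMinutes: number };
type CostFn = (c: Candidate) => number;

const costByCluster: Record<string, CostFn> = {
  // Dense, walkable cluster: penalise walking distance, ignore driving.
  cluster_north_1: (c) => c.walkDistanceKm * 10 - c.remainingMinutes * 0.1,
  // Sparse cluster: drive time dominates.
  cluster_east_2: (c) => c.driveMinutes * 2 - c.remainingMinutes * 0.1,
};

// Today's behaviour as the fallback: most remaining capacity wins.
const defaultCost: CostFn = (c) => -c.remainingMinutes;

function pickWorker<T extends Candidate>(cluster: string, candidates: T[]): T | undefined {
  const cost = costByCluster[cluster] ?? defaultCost;
  return [...candidates].sort((a, b) => cost(a) - cost(b))[0];
}
```

The greedy loop stays intact; only the sort key becomes data.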

Persist the assignment trace, not just the assignment. When a supervisor asked "why this assignment," the answer was a re-computation. Saving the trace - which workers were considered, why each was rejected - would have made operator trust faster to build and overrides easier to authorise.
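Persisting the trace could be as small as recording, per customer, every worker considered and why each was rejected. A sketch with hypothetical names:

```typescript
// Hypothetical trace record; the reason values mirror the scheduler's filters.
type RejectionReason = "missing_skill" | "insufficient_capacity" | "outside_parking_window";

type AssignmentTrace = {
  customerId: string;
  considered: { workerId: string; rejected?: RejectionReason }[];
  chosenWorkerId?: string; // absent -> the customer went to the exception queue
};

// "Why did Ramesh get this customer?" becomes a lookup, not a re-computation.
function explain(trace: AssignmentTrace): string {
  if (!trace.chosenWorkerId) return `unassigned: all ${trace.considered.length} workers rejected`;
  const rejected = trace.considered.filter((c) => c.rejected);
  return `${trace.chosenWorkerId} chosen; rejected: ${rejected
    .map((c) => `${c.workerId} (${c.rejected})`)
    .join(", ")}`;
}
```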

Skip and complaint should have been the same primitive. They're both "events that affect the schedule." We modelled them separately; they should have been one event-type table with a discriminator. We refactored toward this later.
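In TypeScript terms, the unified primitive is a discriminated union the scheduler consumes as one stream. A sketch of the refactor direction, not the shipped schema:

```typescript
// Both skip and complaint-redo are "events that affect the schedule";
// one discriminated union replaces two separate tables/shapes.
type ScheduleEvent =
  | { kind: "skip"; customerId: string; date: string }                       // 'YYYY-MM-DD'
  | { kind: "complaint_redo"; customerId: string; washId: string; date: string };

// The scheduler asks one question of the stream per customer per day.
function effectOn(events: ScheduleEvent[], customerId: string, date: string) {
  const todays = events.filter((e) => e.customerId === customerId && e.date === date);
  return {
    skipped: todays.some((e) => e.kind === "skip"),
    redoWashIds: todays.flatMap((e) => (e.kind === "complaint_redo" ? [e.washId] : [])),
  };
}
```

New event kinds (worker sick-day, vehicle breakdown) slot into the union without touching the scheduler's read path.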


If your scheduling problem fits a constraint solver cleanly, use one. If it doesn't - if your objective function shifts, if operators need to override, if your data has a long tail of edge cases - write the algorithm yourself, keep it small, make every decision traceable. Trust beats optimality in most local-services businesses.