/writing/tech · 2026-03-08 · 8 min read

Building a multi-source financial state model: reconciling CRM, payments, and bank into one view

How the console became a system of display, not a system of record - and why that decision held up under load.

companion: Building a Financial Operations Console for MSME Businesses

Three sources of truth, none of them yours, all of them disagreeing about whether an invoice has been paid.

That was the operational reality of the MSME console. The CRM said one thing about the customer relationship; the payment gateway said another about the actual debit; the bank statement said a third when the money landed. The console's job was to be the fourth source of truth - the one the business owner could trust - derived from the other three.

This writeup is about how that derivation worked.

The decision

The console was a system of display, not a system of record. We owned no canonical financial data. We owned a projection over data we didn't write.

Every external update - CRM webhook, payment gateway event, bank statement row - was ingested as an internal event in a normalised schema. The console rendered views over those events, not over the source systems directly. The source systems remained writable from outside (the sales rep updates the CRM, the payment gateway issues the invoice, the bank settles the debit). The console caught up in seconds.

This decision had three consequences:

  1. The console was eventually consistent. A change in the CRM was visible in the console seconds, sometimes minutes, after it landed. This was a feature, not a bug - the customer was working in their CRM, not in our app.
  2. The console could survive a source going down. If the payment gateway's webhooks broke for an hour, the console kept showing the last-known state, with a freshness indicator. Polling fallback caught up on recovery.
  3. The console could absorb a new integration in days, not quarters. Customers who wanted their accounting tool integrated got a connector exercise, not a re-architecture.

The integration spine

Every integrated system had two paths into the platform: webhook (push) and polling (pull, fallback). The webhook was preferred; polling caught up on what the webhook missed.

                    ┌──────────────────┐
   webhook ───────► │  ingest endpoint │ ─────► event log
                    └──────────────────┘
                              ▲
                              │
   polling worker ────────────┘
   (every N minutes per source)

The ingest endpoint was the only place external data became internal. It did three things:

  1. Verify - signature check, expected schema, no replay attacks.
  2. Normalise - vendor terminology mapped to internal vocabulary (more on this below).
  3. Persist - write to the event log atomically.
create table external_events (
  id                uuid primary key,
  source            text not null,        -- 'crm:hubspot' | 'payment:razorpay' | 'bank:icici'
  source_event_id   text not null,        -- the source's own identifier
  event_type        text not null,        -- normalised: 'invoice.issued' | 'payment.received' | etc
  raw_payload       jsonb not null,       -- preserved for replay/debugging
  normalised        jsonb not null,       -- canonical shape
  customer_id       uuid,                 -- resolved at ingest where possible
  occurred_at       timestamptz not null, -- from the source
  ingested_at       timestamptz not null default now(),
  unique (source, source_event_id)
);

The unique (source, source_event_id) constraint was load-bearing. Webhooks deliver at-least-once. Polling overlaps with webhook delivery. Without that constraint, every event would have lived twice within a quarter.
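The dedupe behaviour can be sketched in a few lines (a minimal sketch using SQLite in place of Postgres; the `ingest` function and the payload shape are illustrative, not the real handler):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table external_events (
        source          text not null,
        source_event_id text not null,
        event_type      text not null,
        raw_payload     text not null,
        unique (source, source_event_id)
    )
""")

def ingest(source, payload):
    # The unique constraint absorbs at-least-once delivery: a webhook
    # retry or an overlapping poll becomes a no-op insert.
    conn.execute(
        "insert or ignore into external_events "
        "(source, source_event_id, event_type, raw_payload) values (?, ?, ?, ?)",
        (source, payload["id"], payload["type"], json.dumps(payload)),
    )

event = {"id": "evt_1", "type": "payment.received", "amount_paise": 50000}
ingest("payment:razorpay", event)   # webhook delivery
ingest("payment:razorpay", event)   # polling picks up the same event later
count = conn.execute("select count(*) from external_events").fetchone()[0]
print(count)  # 1 - the duplicate was dropped, not stored twice
```

In Postgres the same effect comes from `insert ... on conflict do nothing` against the unique constraint.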

raw_payload was preserved indefinitely. Every time we discovered a new edge case in a vendor's data, we replayed historical events through an updated normaliser. The raw payload was the time machine.
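A replay looks roughly like this (a sketch; the payload shapes and the rupees-vs-paise edge case are illustrative, not a real vendor's):

```python
# When the normaliser learns a new vendor edge case, historical raw
# payloads are re-run through the updated version and the normalised
# column is rewritten.
def normalise_v2(raw):
    # v2 handles a gateway that started reporting amounts in rupees
    if "amount_rupees" in raw:
        paise = int(raw["amount_rupees"] * 100)
    else:
        paise = raw["amount_paise"]
    return {"event_type": "payment.received", "amount_paise": paise}

stored_raw = [{"amount_paise": 50000}, {"amount_rupees": 750.0}]
replayed = [normalise_v2(r) for r in stored_raw]
print(replayed[1]["amount_paise"])  # 75000
```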

Status mapping: the deterministic dictionary

Every source system has its own status vocabulary for the same underlying state. CRM A calls a paid invoice "Closed-Won." CRM B calls it "Settled." Payment gateway A reports "captured." Payment gateway B reports "successful." The bank says "credit posted."

These all describe the same fact: the money has arrived. The console had to know.

We built a deterministic mapping table:

create table status_mappings (
  id              uuid primary key,
  source          text not null,        -- 'crm:hubspot'
  source_status   text not null,        -- 'closed-won'
  internal_status text not null,        -- 'invoice.paid'
  confidence      smallint not null,    -- 100 = certain, 50 = ambiguous
  notes           text,
  unique (source, source_status)
);

The confidence column was unusual but mattered. Some source statuses were unambiguous ("invoice voided" → invoice.voided). Some were ambiguous - "Pending" in HubSpot can mean "awaiting customer signature" or "awaiting collection," depending on which pipeline the deal is in. For ambiguous mappings, the normaliser flagged the event for operator review instead of auto-mapping.
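The lookup itself is mechanical; what matters is the branch on confidence (a sketch - the mappings, threshold, and return codes are illustrative):

```python
# Deterministic status mapping with a confidence threshold: unambiguous
# statuses map automatically, ambiguous ones go to operator review.
STATUS_MAPPINGS = {
    ("crm:hubspot", "closed-won"):    ("invoice.paid", 100),
    ("crm:hubspot", "pending"):       ("invoice.issued", 50),  # ambiguous
    ("payment:razorpay", "captured"): ("payment.received", 100),
}

REVIEW_THRESHOLD = 100  # below this, a human decides

def normalise_status(source, source_status):
    mapping = STATUS_MAPPINGS.get((source, source_status.lower()))
    if mapping is None:
        return None, "unmapped"          # new vendor status: add a mapping row
    internal, confidence = mapping
    if confidence < REVIEW_THRESHOLD:
        return internal, "needs_review"  # suggested, not auto-applied
    return internal, "mapped"

print(normalise_status("crm:hubspot", "Closed-Won"))  # ('invoice.paid', 'mapped')
print(normalise_status("crm:hubspot", "Pending"))     # ('invoice.issued', 'needs_review')
```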

Adding a new mapping was a configuration exercise, not a code change. New CRM rolled out by the customer? The operations team added rows to status_mappings and replayed the affected events through the normaliser.

The unified financial state

The console's main view answered one question per row: what is the current state of this customer relationship?

create view customer_financial_state as
select
  c.id           as customer_id,
  c.name,

  -- outstanding receivables
  coalesce((
    select sum(amount_paise) from invoices i
    where i.customer_id = c.id
      and i.internal_status in ('invoice.issued', 'invoice.partially_paid')
  ), 0) as outstanding_receivable,

  -- aging bucket of the oldest unpaid invoice
  (
    select extract(epoch from (now() - issued_at)) / 86400
    from invoices i
    where i.customer_id = c.id
      and i.internal_status = 'invoice.issued'
    order by issued_at asc limit 1
  ) as oldest_invoice_age_days,

  -- last successful payment from this customer
  (
    select max(occurred_at) from external_events e
    where e.customer_id = c.id
      and e.event_type = 'payment.received'
  ) as last_payment_at,

  -- staleness - when did each source last update
  (
    select jsonb_object_agg(source, last_ingested)
    from (
      select source, max(ingested_at) as last_ingested
      from external_events e
      where e.customer_id = c.id
      group by source
    ) per_source
  ) as source_freshness

from customers c;

The source_freshness column answered the question "is this view trustworthy?" If the CRM hadn't updated in 48 hours, the row showed a "CRM stale" indicator, and the account owner could chase the integration instead of blaming the console.

Reconciliation as a workflow

The harder cases were when sources actively disagreed. The CRM said the invoice was paid; the payment gateway showed no successful charge for that invoice. Or the bank statement showed a credit that didn't match any invoice.

These were not "bugs to fix in code." They were business reality: invoices paid by cheque outside the gateway, bank credits that didn't match any tracked invoice, customers paying the wrong amount. The console treated them as a workflow.

The conflict queue was supposed to be a rare-exception view. It became the operations team's home page. Conflicts were the actual signal; the rest of the dashboard was background. If we had known on day one, we would have invested the design polish there first.

create table reconciliation_conflicts (
  id              uuid primary key,
  conflict_type   text not null,        -- 'crm_says_paid_no_payment_event'
                                        -- 'unmatched_bank_credit'
                                        -- 'amount_mismatch'
                                        -- 'duplicate_payment'
  severity        text not null,        -- 'info' | 'warn' | 'block'
  customer_id     uuid,
  related_events  uuid[] not null,      -- the events that disagree
  detected_at     timestamptz not null default now(),
  resolved_at     timestamptz,
  resolved_by     uuid,
  resolution_note text
);

A daily job swept the event log for the patterns above and inserted conflict rows. The console surfaced them in a queue. An operator clicked into a conflict, saw the disagreeing events with timestamps and source URLs, picked a resolution ("yes, this was a cheque payment recorded by the customer in the CRM"), and the resolution was logged.

The resolution itself became an event in the log - reconciliation.resolved. Subsequent views took the resolution into account.
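One sweep rule can be sketched as a pure function over the event log (the event shapes and customer IDs are illustrative; the real job ran in SQL against external_events):

```python
# Detect the 'crm_says_paid_no_payment_event' pattern: a CRM-sourced
# paid status with no matching payment event for the same customer.
def detect_crm_paid_no_payment(events):
    paid_in_crm = {e["customer_id"] for e in events
                   if e["source"].startswith("crm:")
                   and e["event_type"] == "invoice.paid"}
    payments = {e["customer_id"] for e in events
                if e["event_type"] == "payment.received"}
    return [{"conflict_type": "crm_says_paid_no_payment_event",
             "severity": "warn", "customer_id": c}
            for c in sorted(paid_in_crm - payments)]

events = [
    {"customer_id": "c1", "source": "crm:hubspot",      "event_type": "invoice.paid"},
    {"customer_id": "c2", "source": "crm:hubspot",      "event_type": "invoice.paid"},
    {"customer_id": "c2", "source": "payment:razorpay", "event_type": "payment.received"},
]
print(detect_crm_paid_no_payment(events))
# c1 is flagged; c2 has a matching payment event and is not
```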

Webhook reordering and idempotency

Webhooks don't arrive in order. Two events from the same source can land at the ingest endpoint reversed: payment.succeeded before invoice.issued. The system has to be safe under that.

The pattern that worked: every event had its own occurred_at from the source. Projections sorted by occurred_at, not ingested_at. An out-of-order arrival rewrote the projection to its correct state once both events had landed.
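The convergence property is easiest to see with a reversed pair (a sketch; the event shapes are illustrative):

```python
from datetime import datetime

# Projections derive state from events sorted by the source's occurred_at,
# so arrival order at the ingest endpoint doesn't matter.
def project_invoice_status(events):
    status = None
    for e in sorted(events, key=lambda e: e["occurred_at"]):
        status = e["event_type"]  # last event by business time wins
    return status

# The void arrived first; the issue event arrived late and out of order.
arrived = [
    {"event_type": "invoice.voided", "occurred_at": datetime(2026, 3, 1, 10, 5)},
    {"event_type": "invoice.issued", "occurred_at": datetime(2026, 3, 1, 10, 0)},
]
# Naive last-arrival-wins would leave the invoice 'invoice.issued' - wrong.
print(project_invoice_status(arrived))  # invoice.voided
```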

For the projections that could not be eventually consistent - say, an alert that fires when an invoice is overdue - we waited for a quiet window. If no new event for that invoice landed in 5 minutes, we considered the state stable enough to alert on. The window was tunable per signal.
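The quiet-window check reduces to a timestamp comparison (a sketch; the function name and the 5-minute default mirror the description above):

```python
from datetime import datetime, timedelta

# Only alert once no new event for the invoice has landed for `window`
# (tunable per signal).
def stable_enough_to_alert(last_event_at, now, window=timedelta(minutes=5)):
    return now - last_event_at >= window

now = datetime(2026, 3, 8, 12, 0)
print(stable_enough_to_alert(datetime(2026, 3, 8, 11, 52), now))  # True
print(stable_enough_to_alert(datetime(2026, 3, 8, 11, 58), now))  # False - still settling
```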

What surprised me

Bank statement reconciliation was the slowest-moving piece. Bank rails don't have webhooks; they have CSV downloads or scraped statements, so the polling cadence was hourly at best. Customers who relied on bank-side reconciliation accepted a 1-hour latency on credit visibility - acceptable in context, but always noticeably slower than the rest of the platform.

The "raw payload kept forever" decision paid off three times. Each time we discovered a vendor change or our own normaliser bug, the fix was "rerun the normaliser over historical raw payloads." A platform that throws away raw payloads can never recover from the day a vendor changes a status code without telling anyone.

What I'd do differently

Per-source schema versioning at ingest. Vendors evolve their webhook payloads silently. We caught this by alerting on schema mismatches but only after a few outages. A schema-version pin per source, with explicit "we accept v1 and v2 of this payload" rules, would have caught vendor drift earlier.

Reconciliation rules as configuration, not code. The conflict-detection job was code. Adding a new conflict type meant a deploy. Cleaner is to make conflict detection a set of declarative rules that the operations team can add without an engineer.
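What "rules as configuration" might look like (a hypothetical sketch - the rule shape is invented for illustration, not what we shipped):

```python
# Each conflict rule is data, not code: present/absent event-type pairs
# the operations team could add without a deploy.
RULES = [
    {
        "conflict_type": "crm_says_paid_no_payment_event",
        "severity": "warn",
        "requires": "invoice.paid",     # this event type present...
        "missing": "payment.received",  # ...without this one
    },
]

def apply_rules(event_types_for_customer):
    return [r["conflict_type"] for r in RULES
            if r["requires"] in event_types_for_customer
            and r["missing"] not in event_types_for_customer]

print(apply_rules({"invoice.paid"}))                      # flagged
print(apply_rules({"invoice.paid", "payment.received"}))  # clean
```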

A per-customer event timeline view. We had this internally but never exposed it. Customers (especially their CAs) would have valued seeing the full event timeline for a customer, not just the rolled-up state. The events were already there; we just hadn't put them on a page.


If you are building a system on top of data you don't own, two principles are non-negotiable: keep the raw payload forever, and make conflicts a workflow, not a bug. The first lets you recover from any vendor change. The second turns disagreements between sources into the most useful surface in the product.