Streaming under flaky bandwidth: bitrate adaptation, audio fallback, and reconnection logic
Buy the transcoding stack. Build the reliability layer above it. Make audio-only first-class. The notification is the product.
The broadcaster is a 67-year-old priest holding a smartphone. The viewer is his grandson on a cracked Android in another city, on a 4G connection that drops once every twelve minutes.
You can build a streaming platform for premium creators on premium phones on premium networks. Most YouTube tutorials do. We had to build for the median of our actual audience - a population the streaming industry's defaults are not designed for. Almost every default had to change.
This writeup is about the streaming choices we made and the ones we explicitly didn't.
The decision
We bought the streaming infrastructure rather than building it. The choice was between Mux, Cloudflare Stream, AWS IVS, and self-hosted MediaSoup or Janus. We picked AWS IVS for the broadcaster side because the latency was right and the SDK was good enough; HLS for the viewer side because everything plays HLS.
We built almost nothing of the underlying stack. We invested entirely in the layer above it - the broadcaster's one-button workflow, the viewer's zero-friction experience, notifications, donations, the cultural surface.
The discipline was: do not optimise the part of the stack that has commodity solutions. Optimise the part where your audience's specific needs are not served by anyone else.
The broadcaster side: one button
The temple's broadcaster is the priest, sometimes a young volunteer, sometimes the temple's accountant who happens to know smartphones. The platform had to assume zero technical literacy and zero patience for setup.
The mobile app's "go live" button was three things behind the scenes:
```ts
async function goLive(channelId: string) {
  // 1. Acquire stream credentials from the platform
  const { ingestUrl, streamKey } = await api.requestStreamKey(channelId);

  // 2. Probe the network to set initial quality
  const networkProbe = await probeUploadBandwidth();
  const initialQuality = pickInitialQuality(networkProbe);

  // 3. Start the encoder + push to the ingest URL
  await encoder.start({
    url: ingestUrl,
    streamKey,
    video: initialQuality.video,
    audio: { codec: "aac", bitrate: 64_000, sampleRate: 22_050 },
  });

  // 4. Notify the platform - fan-out to viewers happens server-side
  await api.markChannelLive(channelId);
}

function pickInitialQuality(probe: NetworkProbe) {
  // Aggressive defaults; the priest's phone is rarely on 5G
  if (probe.kbpsUp >= 1500) {
    return { video: { width: 720, height: 1280, fps: 24, bitrate: 1_200_000 } };
  }
  if (probe.kbpsUp >= 700) {
    return { video: { width: 480, height: 854, fps: 24, bitrate: 600_000 } };
  }
  // Audio-only fallback
  return { video: null };
}
```
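The probe itself doesn't need to be clever. A minimal sketch, assuming a small discard endpoint on our API that accepts a payload and returns immediately (the endpoint path and the 256 KB payload size are illustrative, not what we shipped):

```ts
// The shape pickInitialQuality above expects
type NetworkProbe = { kbpsUp: number };

async function probeUploadBandwidth(): Promise<NetworkProbe> {
  // Push a small payload to a discard endpoint and time it. 256 KB is
  // enough to ride out TCP slow start without burning the broadcaster's
  // data plan. fetch does not compress request bodies, so zeros are fine.
  const payload = new Uint8Array(256 * 1024);

  const started = Date.now();
  await fetch("https://api.example.com/probe/upload", {
    method: "POST",
    headers: { "content-type": "application/octet-stream" },
    body: payload,
  });
  const elapsedSec = (Date.now() - started) / 1000;

  // Usable upload bandwidth in kilobits per second
  const kbpsUp = (payload.byteLength * 8) / 1000 / elapsedSec;
  return { kbpsUp };
}
```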
The audio-only fallback was real. If the priest's network couldn't carry video, the platform would still stream the aarti's audio - which, for many of our viewers, was 80% of what they came for. The chant mattered more than the visual.
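The fallback applied mid-stream, not just at start. A sketch of the shape it took - the `congestion` event, `updateConfig`, and the toast helper are illustrative names, not an SDK API; the thresholds are examples:

```ts
function installAudioFallback(encoder: Encoder) {
  let congestedSince: number | null = null;

  // Hypothetical congestion callback: reports the bitrate the uplink
  // actually sustained over the last few seconds.
  encoder.on("congestion", ({ sustainedKbps }: { sustainedKbps: number }) => {
    const now = Date.now();

    if (sustainedKbps >= 300) {
      congestedSince = null; // uplink recovered, keep video
      return;
    }

    congestedSince = congestedSince ?? now;

    // Only drop video after ~20s of sustained congestion - a single
    // base-station handoff should not kill the picture.
    if (now - congestedSince > 20_000) {
      encoder.updateConfig({ video: null }); // keep pushing audio
      showBroadcasterToast("Network is weak - continuing with audio only");
      congestedSince = null;
    }
  });
}
```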
Adaptive bitrate ladder
On the viewer side, AWS IVS did most of the work - it transcoded the broadcaster's stream into multiple HLS variants, and the player picked the right one for the viewer's bandwidth.
We tuned the ladder to favour our population:
| Variant    | Resolution | Video bitrate | Audio bitrate |
|------------|------------|---------------|---------------|
| audio-only | -          | -             | 64 kbps       |
| 144p       | 256×144    | 180 kbps      | 64 kbps       |
| 240p       | 426×240    | 400 kbps      | 64 kbps       |
| 360p       | 640×360    | 800 kbps      | 64 kbps       |
| 480p       | 854×480    | 1,200 kbps    | 64 kbps       |
| 720p       | 1280×720   | 2,400 kbps    | 96 kbps       |
Two unusual choices:
- Audio-only as a first-class variant. Most ABR ladders skip this. For our audience, audio-only was the graceful floor, not a failure state. A viewer on 2G should hear the aarti even if the player can't decode video.
- No 1080p. Our broadcasters' phones rarely produced 1080p worth watching, and our viewers' devices and screens rarely benefited. Skipping it saved transcode cost and removed a tier the player would otherwise oscillate into.
If you're on Mux or Cloudflare Stream instead of IVS, the exact bitrates do not transfer. The lessons do: audio-only as a real floor, no 1080p ceiling for low-bandwidth audiences, fewer rungs on the ladder so the player doesn't oscillate.
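If it helps to see the ladder as configuration rather than a table, a sketch - the type and the floor rule are ours; the mapping to IVS, Mux, or Cloudflare settings differs per provider:

```ts
type Rung = {
  name: string;
  width: number | null;      // null = audio-only
  height: number | null;
  videoKbps: number | null;
  audioKbps: number;
};

const LADDER: Rung[] = [
  { name: "audio-only", width: null, height: null, videoKbps: null, audioKbps: 64 },
  { name: "144p", width: 256,  height: 144, videoKbps: 180,  audioKbps: 64 },
  { name: "240p", width: 426,  height: 240, videoKbps: 400,  audioKbps: 64 },
  { name: "360p", width: 640,  height: 360, videoKbps: 800,  audioKbps: 64 },
  { name: "480p", width: 854,  height: 480, videoKbps: 1200, audioKbps: 64 },
  { name: "720p", width: 1280, height: 720, videoKbps: 2400, audioKbps: 96 },
];

// The floor rule: if the player's bandwidth estimate can't carry the
// lowest video rung with headroom, pin it to audio-only rather than
// letting it stall on 144p.
function pickFloor(estimatedKbps: number): Rung {
  const lowestVideo = LADDER.find((r) => r.videoKbps !== null)!;
  const needed = (lowestVideo.videoKbps! + lowestVideo.audioKbps) * 1.5;
  return estimatedKbps < needed ? LADDER[0] : lowestVideo;
}
```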
Reconnection: assume the network will fail
The reconnection logic was the most-tested piece of the player. It had to handle:
- Brief 2-3 second drops (the most common case - base station handoff).
- Mid-length 30-second outages (the broadcaster's phone briefly disconnects from WiFi).
- Long failures (the broadcaster's phone died, the temple's WiFi router rebooted).
- Player-side network changes (viewer's 4G dropped to 2G mid-stream).
The pattern that worked:
```ts
function viewerReconnectStrategy(player: Player) {
  let attempt = 0;
  const MAX_ATTEMPTS = 60; // ~5 minutes of trying
  let backoffMs = 1000;

  player.on("error", async (err) => {
    if (!isRecoverable(err)) {
      showFatalError(err);
      return;
    }

    while (attempt < MAX_ATTEMPTS) {
      attempt += 1;
      showReconnectingToast(attempt);
      await sleep(backoffMs);

      try {
        await player.reload();
        attempt = 0;
        backoffMs = 1000;
        showReconnectedToast();
        return;
      } catch (e) {
        backoffMs = Math.min(backoffMs * 1.5, 8000); // cap at 8s
      }
    }

    showStreamEndedToast();
  });
}
```
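The code above leans on `isRecoverable`, which in practice was a short allow-list of network-shaped failures. A sketch, assuming the player surfaces an error type and an HTTP status where one exists (the field names are illustrative):

```ts
type PlayerError = { type: "network" | "decode" | "other"; httpStatus?: number };

function isRecoverable(err: PlayerError): boolean {
  // Worth retrying: segment fetch timeouts, 5xx from the CDN, a manifest
  // that momentarily 404s while the broadcaster's ingest recovers.
  if (err.type === "network") return true;
  if (err.httpStatus !== undefined && err.httpStatus >= 500) return true;
  if (err.httpStatus === 404) return true; // playlist gap during broadcaster reconnect

  // Decode errors sometimes clear on a reload (discontinuity mid-segment).
  if (err.type === "decode") return true;

  // Unsupported codec, invalid stream, permission errors: don't spin.
  return false;
}
```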
The 5-minute ceiling was deliberate. If a stream was actually dead, we wanted the viewer to know and stop spinning. Most successful reconnects happened in the first 30 seconds.
The toast ("Reconnecting…") mattered. A viewer who saw a frozen frame without explanation thought the platform had crashed. The same frozen frame with "Reconnecting…" was understood as a network event, not a product failure.
Notifications: the actual entry point
For most of our viewers, the notification was the product. They didn't open the app to "discover content"; they opened it because a notification said "morning aarti starting at temple X."
The notification pipeline had to be reliable across:
- iOS via APNs
- Android via FCM
- Older Android variants where push delivery was unreliable
- Multi-language delivery (Hindi, Tamil, Telugu, Kannada, Marathi, Malayalam, English - and the right script per language)
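Per-language rendering stayed deliberately boring: a template per language in that language's script, picked off the subscriber's stored preference. A sketch - the template strings and the `db.channels.displayName` helper are illustrative:

```ts
type Lang = "hi" | "ta" | "te" | "kn" | "mr" | "ml" | "en";

// Illustrative templates; the real strings were written per language,
// in that language's script.
const LIVE_TEMPLATES: Partial<Record<Lang, { title: string; body: string }>> = {
  en: { title: "Live now: {temple}", body: "Morning aarti has started. Tap to watch." },
  hi: { title: "{temple} में आरती शुरू", body: "लाइव देखने के लिए टैप करें" },
  // ta, te, kn, mr, ml follow the same shape
};

async function renderMessage(lang: Lang, channelId: string) {
  const temple = await db.channels.displayName(channelId, lang);
  const t = LIVE_TEMPLATES[lang] ?? LIVE_TEMPLATES.en!;
  return {
    title: t.title.replace("{temple}", temple),
    body: t.body.replace("{temple}", temple),
  };
}
```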
We treated notifications as a hard SLA, not best-effort:
```ts
async function notifyChannelLive(channelId: string) {
  const subscribers = await db.subscribers.forChannel(channelId);

  // Fan-out in batches with retries
  for (const batch of chunk(subscribers, 1000)) {
    await Promise.allSettled(
      batch.map(async (sub) => {
        const message = await renderMessage(sub.languagePref, channelId);

        const result = await pushProvider.send({
          token: sub.deviceToken,
          title: message.title,
          body: message.body,
          data: { channelId, action: "open_player" },
          // collapse key - if multiple "live" notifications stack,
          // only the latest shows
          collapseKey: `channel-${channelId}`,
        });

        // Persist the delivery attempt for observability
        await db.notificationLog.insert({
          subscriberId: sub.id,
          channelId,
          status: result.success ? "delivered" : "failed",
          providerResponse: result.body,
          attemptedAt: new Date(),
        });
      })
    );
  }
}
```
The collapseKey was load-bearing. Without it, a temple that started and stopped a stream three times in five minutes (priest accidentally hit stop, restarted) sent three notifications. Viewers turned them off. Collapse keys ensured only the latest "live" notification appeared.
The notification log was the platform's most-queried table during incidents. When a viewer reported "I didn't get notified about this morning's aarti," the answer was a query, not a guess.
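The shape of that incident query, written against the same data-layer style as the fan-out code above (the `db.notificationLog.find` filter syntax is illustrative - any query interface works):

```ts
// "I didn't get notified about this morning's aarti" - look up exactly
// what the platform did for this subscriber and channel this morning.
async function whyNoNotification(subscriberId: string, channelId: string, since: Date) {
  const attempts = await db.notificationLog.find({
    subscriberId,
    channelId,
    attemptedAt: { gte: since },
  });

  if (attempts.length === 0) {
    return "no attempt - check the subscription and the fan-out job";
  }

  const lastFailed = attempts.filter((a) => a.status === "failed").at(-1);
  if (lastFailed) {
    // providerResponse usually names the cause: expired token,
    // unregistered device, OEM-level throttling
    return `attempted and failed: ${lastFailed.providerResponse}`;
  }

  return "delivered by the provider - the device or OS dropped it";
}
```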
The donation flow
Donors gave in two taps. The flow was designed for elderly viewers who would not fight a payment form.
```tsx
import { useState } from "react";

function DonateSheet({ templeId }: { templeId: string }) {
  const [amount, setAmount] = useState<number | null>(null);

  return (
    <Sheet>
      <Heading>Support this temple</Heading>
      <PresetAmounts amounts={[51, 101, 251, 501, 1001]} onPick={setAmount} />
      <Button
        disabled={amount === null}
        onClick={() => amount !== null && triggerUpiIntent({ templeId, amount })}
      >
        Pay via UPI
      </Button>
    </Sheet>
  );
}
```
UPI intent (the Android system-level UPI handoff) was the right primitive - the donor's bank app opened pre-filled, they tapped Pay, the donation completed. No form fields, no card details, no signup gate. The two taps assumed the donor was already logged into their bank app, which they were.
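Under the hood, `triggerUpiIntent` is mostly string-building: the UPI linking spec is a `upi://pay` URL that any installed UPI app can claim. A sketch - the temple-VPA lookup and the `openExternalUrl` helper are ours; the query parameters come from the UPI deep-link spec:

```ts
async function triggerUpiIntent({ templeId, amount }: { templeId: string; amount: number }) {
  // Each temple has a verified UPI VPA (virtual payment address) on file -
  // hypothetical API call.
  const { vpa, displayName } = await api.getTempleUpiDetails(templeId);

  const params = new URLSearchParams({
    pa: vpa,                          // payee address
    pn: displayName,                  // payee name
    am: amount.toFixed(2),            // amount
    cu: "INR",                        // currency
    tn: "Donation during live aarti", // transaction note shown in the bank app
  });

  // Any installed UPI app can claim this URL. How you open it is
  // platform-specific (Linking.openURL in React Native, an Intent on
  // native Android); openExternalUrl is a stand-in.
  await openExternalUrl(`upi://pay?${params.toString()}`);
}
```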
Recurring donations used UPI mandates. We deliberately did not chase recurring globally - for most of our donors, monthly automated debits were a foreign concept. We offered them but defaulted to one-time.
What surprised me
The single largest spike in viewers came in the first 60 seconds after a notification went out. Half of all viewing for any given event happened in the first 5 minutes. The streaming infrastructure had to absorb a 100x jump in concurrent viewers within 60 seconds, every time. The transcoding ladder paid for itself in those moments.
Older Android push delivery was 70% of our notification debugging. Various OEM Android builds aggressively kill background apps to "save battery," so FCM delivery rates on those devices were measurably below 100% and outside our control. We added a server-side reachability fallback (a low-frequency poll the app did when foregrounded) to catch missed events.
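The fallback is worth being specific about because it's cheap. A sketch, assuming a hypothetical subscriptions-live endpoint and an app-foreground hook (all names here are illustrative):

```ts
// When the app comes to the foreground, ask the server what is live among
// this user's subscriptions. Catches every notification an OEM battery
// manager silently dropped, at the cost of one small request.
async function reachabilityFallback(userId: string) {
  const lastChecked = await storage.get("lastLiveCheck");

  const live = await api.get(`/users/${userId}/subscriptions/live`, {
    since: lastChecked,
  });

  if (live.channels.length > 0) {
    // Surface the same UI the push notification would have opened.
    showLiveBanner(live.channels);
  }

  await storage.set("lastLiveCheck", new Date().toISOString());
}

// Wire it to the foreground event (AppState in React Native, onResume on
// native Android) - the hook name is a stand-in.
onAppForeground(() => reachabilityFallback(currentUserId));
```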
Viewers chained sessions. A viewer would watch morning aarti, close the app, come back at evening for the same temple's aarti. Per-session analytics underestimated engagement because most usage was multiple short sessions, not single long ones. Cohort retention was the right metric, not session length.
What I'd do differently
Server-side recording with a 30-day retention from day one. We added it later for a specific use case (devotees in different time zones wanting to watch later). Should have been default. The marginal cost is low; the audience benefit is high.
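On IVS, recording is a channel-level setting pointed at S3, and the 30-day retention is an S3 lifecycle rule rather than anything stream-specific. A sketch with the AWS SDK v3 - bucket and channel names are placeholders, and you should check the current IVS API shape before copying:

```ts
import {
  IvsClient,
  CreateRecordingConfigurationCommand,
  CreateChannelCommand,
} from "@aws-sdk/client-ivs";
import { S3Client, PutBucketLifecycleConfigurationCommand } from "@aws-sdk/client-s3";

const ivs = new IvsClient({});
const s3 = new S3Client({});

// 1. Recordings land in an S3 bucket.
const recording = await ivs.send(
  new CreateRecordingConfigurationCommand({
    name: "temple-recordings",
    destinationConfiguration: { s3: { bucketName: "temple-stream-recordings" } },
  })
);

// 2. Channels reference the recording configuration.
await ivs.send(
  new CreateChannelCommand({
    name: "temple-123",
    recordingConfigurationArn: recording.recordingConfiguration?.arn,
  })
);

// 3. Retention is just a lifecycle rule on the bucket.
await s3.send(
  new PutBucketLifecycleConfigurationCommand({
    Bucket: "temple-stream-recordings",
    LifecycleConfiguration: {
      Rules: [{ ID: "expire-30d", Status: "Enabled", Filter: { Prefix: "" }, Expiration: { Days: 30 } }],
    },
  })
);
```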
A "scheduled live" abstraction. Many temple events were daily and predictable (sunrise aarti, sunset aarti). The broadcaster going live each morning at the same time was a workflow, not a one-off. We modelled it as one-off events; we should have modelled it as a recurring schedule with the "live" event being a confirmation, not a creation.
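The model we should have used is small. A sketch of what the recurring schedule might look like (field names are illustrative):

```ts
// A temple's daily events are a schedule; "going live" confirms an
// occurrence of that schedule, it doesn't create a new event.
type LiveSchedule = {
  channelId: string;
  name: string;                // "Morning aarti"
  rrule: string;               // e.g. "FREQ=DAILY;BYHOUR=6;BYMINUTE=30" in temple-local time
  timezone: string;            // "Asia/Kolkata"
  notifyMinutesBefore: number; // pre-announce to subscribers
};

type LiveOccurrence = {
  scheduleId: string;
  expectedStart: Date;
  status: "scheduled" | "live" | "missed" | "ended";
  streamId?: string;           // filled in when the broadcaster actually goes live
};
```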
Caption / transcript track. We never built it. For a stream where the audio is in a regional language, an automatic caption track in the viewer's preferred language would have unlocked an entire elderly demographic for whom listening to a different regional language is a barrier.
If you are streaming to an audience the streaming industry doesn't usually design for: the failure modes you optimise for are different. Buy the transcoding stack. Build the reliability layer above it. Make audio-only first-class. The notification is the product.