Case Study

KaHa Wearable Platform

Two data pipelines at scale: a real-time health telemetry platform for 10M+ wearable users, and an application performance analytics system that diagnosed how Bluetooth (BT) connect time and sync duration varied across phone makes, models, and OS versions.

Apache Kafka · Python · AWS · MongoDB · DynamoDB · Firebase · BigQuery · NumPy

Context

KaHa Technologies builds the firmware and data platform that powers white-label health wearables — smartwatches and fitness bands sold under various consumer brands across Southeast Asia and beyond. Two distinct data problems came with the scale: getting health sensor data from 10M+ devices into ML-ready storage in under 5 seconds, and understanding why the companion app behaved differently depending on what phone a user was holding.

The second problem turned out to be more complex than it sounds. A Bluetooth connection event on a Samsung Galaxy behaves differently from the same event on a Xiaomi or a Vivo — different OS versions, different BT stack implementations, different background process management policies. Without instrumented data across device segments, QA could only test on the handsets they had in the office. The goal was a fully automated pipeline that ran daily and delivered a segmented performance report to QA and stakeholders — no manual queries, no ad-hoc exports.

Pipeline 1 — Health Telemetry Platform

Devices publish sensor events to Kafka. Python consumers normalize and route each event type to the right datastore. A separate batch layer handles feature extraction for ML model training and inference.

flowchart LR
    subgraph Devices ["10M+ Wearable Devices"]
        A["Health Sensors\nHR · SpO2 · Steps\nAccelerometer · Sleep"]
    end

    subgraph Ingestion ["Real-time Ingestion"]
        B["Apache Kafka\n2B+ events/month\nunder 5s lag"]
    end

    subgraph Processing ["Stream and Batch Processing"]
        C["Python Consumers\nEvent normalization\nStream aggregation"]
        D["AWS Lambda\nEvent-driven handlers\nAlert triggers"]
    end

    subgraph Storage ["Data Layer"]
        E[("MongoDB\nHealth time-series\nUser + device profiles")]
        F[("DynamoDB\n10k concurrent requests\nSession and device state")]
        G[/"S3\nRaw event archive\nBatch replay buffer"/]
    end

    subgraph ML ["Analytics and ML"]
        H["Feature Pipelines\nPython · NumPy · Pandas"]
        I["ML Model Serving\nHealth insight inference\nAnomaly detection"]
        J["Health Dashboards\nUser-facing metrics\nPartner reporting"]
    end

    A -->|"real-time"| B
    B --> C & D
    C --> E & G
    D --> F
    E & G --> H
    H --> I & J

    style B fill:#2997ff,color:#fff,stroke:#0077ed

Data Flow Detail

Every raw event passes through the same normalization step before branching — this keeps the storage layer clean regardless of device firmware version or sensor calibration differences between hardware partners.

flowchart TD
    RAW["Raw Sensor Events\nPulse · motion · SpO2\nhigh-frequency, unvalidated"]
    --> KAFKA["Kafka Topic\nPartitioned by device_id\nretention: 7 days"]
    --> CONSUMER["Python Consumer\nParse · validate · normalize\nfirmware version routing"]
    --> BRANCH{Event type}

    BRANCH -->|"health metric"| MONGO["MongoDB\ntime-series collection\nindexed by user_id + timestamp"]
    BRANCH -->|"session state"| DYNAMO["DynamoDB\ndevice + session state\nlow-latency reads"]
    BRANCH -->|"raw archive"| S3["S3\nParquet — daily partitions\nbatch ML training data"]

    MONGO & S3 --> FEATURE["Feature Extraction\nwindowed aggregations\nPandas + NumPy"]
    FEATURE --> MODEL["ML Inference\nhealth score · anomaly flag\nrecovery index"]
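The parse-validate-normalize-branch step above can be sketched in a few lines of Python. Everything here is illustrative — the field mappings, firmware version keys, and route labels are assumptions, not the production schema:

```python
import json

# Hypothetical per-firmware field mappings; real mappings vary by OEM partner.
FIELD_MAP = {
    "fw_v1": {"hr": "heart_rate", "spo2": "spo2", "ts": "timestamp"},
    "fw_v2": {"heartRate": "heart_rate", "oxygen": "spo2", "eventTime": "timestamp"},
}

HEALTH_METRICS = {"heart_rate", "spo2", "steps", "sleep"}

def normalize(raw: bytes) -> dict:
    """Parse one raw Kafka message and map firmware-specific field names
    onto a single canonical schema."""
    event = json.loads(raw)
    mapping = FIELD_MAP[event["fw"]]
    canonical = {dst: event[src] for src, dst in mapping.items()}
    canonical.update(device_id=event["device_id"], type=event["type"])
    return canonical

def route(event: dict) -> str:
    """Decide which store a normalized event belongs in."""
    if event["type"] in HEALTH_METRICS:
        return "mongodb"    # time-series collections, user_id + timestamp index
    if event["type"] == "session":
        return "dynamodb"   # low-latency device/session state
    return "s3"             # Parquet archive for batch ML training
```

The useful property is that everything downstream of `normalize` sees one schema, so a new firmware version is a new `FIELD_MAP` entry rather than a change to the branching logic.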

Pipeline 2 — Application Performance Analytics

The companion app instruments every performance-relevant event — BT connection time under different scenarios (toggle, out-of-range return, airplane mode recovery), sync duration by payload size, and mode-change response latency — and sends them to Firebase. BigQuery's native Firebase export streams these events into a queryable table within minutes. SQL aggregations segment performance by OS, device make, and model. A scheduled job runs these aggregations daily and emails a formatted performance report to the QA team and relevant stakeholders — replacing what had previously been ad-hoc queries run whenever someone had time to look.

flowchart LR
    subgraph App ["Mobile Application"]
        A["Companion App\nAndroid + iOS\nSamsung · Xiaomi · Apple · Vivo"]
        B["Smartwatch\nFirmware events\nBLE communication"]
    end

    subgraph Events ["Performance Events Captured"]
        C["BT Connect Time\nby scenario + device"]
        D["Data Sync Duration\ntransfer timing · payload size"]
        E["Mode Toggle Events\nairplane · BT on/off\nout of range + return"]
    end

    subgraph Pipeline ["Analytics Pipeline"]
        F["Firebase Analytics\nReal-time event ingestion\nSDK-level instrumentation"]
        G["BigQuery Export\nNative Firebase connector\nstreaming tables"]
        H["SQL Aggregations\nBigQuery + Python\nsegmentation queries"]
    end

    subgraph Reports ["Performance Reports"]
        I["By OS\nAndroid vs iOS\nperformance delta"]
        J["By Device\nMake + model breakdown\nSamsung · Xiaomi · Apple · Vivo"]
        K["By Scenario\nBT toggle · sync · range events\nP50 · P95 latencies"]
    end

    L["Scheduled Job\nDaily aggregation\nauto-report generation"]
    M["Email Delivery\nQA Team + Stakeholders\ndaily report"]

    B -->|"BLE events"| A
    A --> C & D & E
    C & D & E --> F
    F -->|"streaming export"| G
    G --> H
    H --> I & J & K
    H --> L --> M

    style G fill:#2997ff,color:#fff,stroke:#0077ed
    style M fill:#2997ff,color:#fff,stroke:#0077ed
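The daily segmentation step runs as BigQuery SQL, but the shape of the aggregation is easy to show in pandas. Column names here are assumptions for illustration, not the actual Firebase export schema:

```python
import pandas as pd

def segment_report(events: pd.DataFrame) -> pd.DataFrame:
    """P50/P95 BT-connect latency per (os_version, make, model, scenario).

    `events` is assumed to carry columns os_version, make, model,
    scenario, connect_ms; the real Firebase/BigQuery export differs.
    """
    return (
        events.groupby(["os_version", "make", "model", "scenario"])["connect_ms"]
        .agg(p50=lambda s: s.quantile(0.5),
             p95=lambda s: s.quantile(0.95),
             n="count")
        .reset_index()
    )
```

The equivalent BigQuery SQL is a GROUP BY over the same keys (e.g. with APPROX_QUANTILES for the percentiles). The point is that make, model, and OS ride along on every event, so slicing a new segment never requires re-instrumenting the app.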

Key Engineering Decisions

Decision #1 — Kafka for Health Telemetry

Why Kafka over SQS or direct writes to the datastore?

Direct writes at 2B events/month create a thundering herd problem — morning workout spikes and sleep tracking start times produce write bursts that overwhelm any single datastore. Kafka absorbs those spikes and makes consumer lag visible and manageable. The log retention (7 days) was the other key reason: when a firmware update changed the event schema, we could replay events through a new consumer version without losing data. SQS doesn't support replay. That replay capability paid off multiple times during firmware iteration cycles.

Decision #2 — Three-Store Split for Health Data

Why MongoDB + DynamoDB + S3 instead of one datastore?

The three access patterns are genuinely different. Health time-series (heart rate over 30 days, sleep stages per night) maps naturally to MongoDB's document model with time-indexed collections. Session and device state (is this device online? last sync timestamp?) needs single-digit millisecond reads — DynamoDB's job. Raw event archive for ML batch training doesn't need low latency; it needs cheap, durable storage with efficient range reads — S3 Parquet. Forcing any of these into a general-purpose database would mean overpaying or accepting worse performance on two of the three use cases.

Decision #3 — Firebase + BigQuery for APM

Why not Kafka for the performance analytics pipeline too?

The health telemetry pipeline needs sub-5-second lag because it feeds real-time ML inference. The APM pipeline has a different requirement: daily or weekly aggregated reports for QA and product teams. The latency requirement is hours, not seconds. Firebase's SDK handles client-side event batching and retry logic out of the box — instrumentation that would take weeks to build on a custom Kafka producer. The native BigQuery export means the data is queryable in SQL without building a consumer or a schema registry. For a reporting pipeline where freshness is measured in hours, Firebase + BigQuery is significantly less infrastructure for the same output.

Decision #4 — Device Segmentation in APM Reports

Why segment by make + model + OS, not just aggregate?

Aggregate BT connection time looked acceptable. Segmented by device, it was not: specific Xiaomi models running Android 12 showed 3x higher connect latency in airplane-mode-recovery scenarios compared to the same event on Samsung. An aggregate P95 latency number would have hidden this completely. The segmentation schema — capturing device manufacturer, model, and OS version on every event — was designed specifically so that QA could drill into failure modes by device segment rather than debugging against a single test handset. This is what made the analytics system actionable rather than just informational.

Scale and Constraints

The hardest constraint on the health telemetry side wasn't raw throughput — Kafka handles that. It was device heterogeneity. KaHa's platform runs on hardware from multiple OEM partners, each with slightly different sensor firmware, sampling rates, and data formats. The normalization layer in the Python consumer had to handle all of these consistently, and new device variants arrived regularly. Keeping that logic maintainable without making every new device a production incident required disciplined schema versioning and a test suite covering known edge cases from each hardware partner.
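One way to keep that normalization logic maintainable is a parser registry keyed by hardware partner and schema version, so a new device variant becomes a new registered entry plus tests rather than an edit to shared branching code. A minimal sketch — partner names, version scheme, and fields are all hypothetical:

```python
# Registry keyed by (partner, firmware major version).
PARSERS = {}

def parser(partner: str, fw_major: int):
    """Decorator registering a parse function for one partner + schema version."""
    def register(fn):
        PARSERS[(partner, fw_major)] = fn
        return fn
    return register

def parse(event: dict) -> dict:
    partner, fw_major = event["partner"], int(event["fw"].split(".")[0])
    fn = PARSERS.get((partner, fw_major))
    if fn is None:
        # Unknown variant: fail loudly toward a dead-letter path instead of
        # silently corrupting the normalized stream.
        raise ValueError(f"no parser for {partner} fw {fw_major}")
    return fn(event)

@parser("oem_a", 1)
def _oem_a_v1(event):
    return {"heart_rate": event["hr"], "ts": event["ts"]}

@parser("oem_a", 2)
def _oem_a_v2(event):
    # v2 firmware renamed fields and reports time in milliseconds.
    return {"heart_rate": event["bpm"], "ts": event["time_ms"] // 1000}
```

Each registered parser gets its own fixture of known edge-case events from that partner, which is what keeps a new variant from becoming a production incident.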

The 5-second lag target is harder than it sounds when you account for device connectivity patterns. Wearables aren't always online — they buffer locally and sync in batches when a phone is nearby. The pipeline had to handle out-of-order events gracefully: a batch of buffered readings arriving 20 minutes late is normal, not an error. The Kafka log retention window of 7 days provided the buffer needed to handle delayed delivery without data loss.
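Handling those late batches comes down to keying every aggregation on the event's own timestamp rather than its arrival time, with a lateness horizon matching the replay window. A sketch of event-time windowing under that assumption (window size and cutoff are illustrative):

```python
from collections import defaultdict

WINDOW_S = 60                         # 1-minute aggregation windows (illustrative)
ALLOWED_LATENESS_S = 7 * 24 * 3600    # mirrors the 7-day Kafka retention

windows: dict[int, list[float]] = defaultdict(list)

def ingest(event_ts: int, value: float, now: int) -> bool:
    """Place a reading in its event-time window.

    Returns False only if the event is older than the lateness horizon —
    a 20-minute-late buffered batch is accepted like any other event.
    """
    if now - event_ts > ALLOWED_LATENESS_S:
        return False                          # past replay horizon; archive only
    window_start = event_ts - event_ts % WINDOW_S
    windows[window_start].append(value)
    return True
```

Because windows are addressed by event time, a buffered batch simply backfills the windows it belongs to; nothing downstream needs a special "late data" path until the 7-day horizon is crossed.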