Data · Infrastructure · Analytics
I build the pipelines, platforms, and analytics layers that turn raw data into decisions.
2B+
events/month processed
10M+
users served
5M+
records/day through current pipelines
About
I've spent the last several years building pipelines that process billions of events per month, cloud data platforms that cut executive reporting from days to under an hour, and data visualizations that make complex datasets worth looking at.
Before my MS in Computer Science at UT Arlington, I built the data backbone for a wearable health platform at KaHa Technologies — Kafka ingestion, real-time telemetry, 10M+ users. I'm most interested in the full picture: from how data gets ingested to whether the person reading the dashboard actually trusts what it shows.
⚡ Outside of work: 🏸 racket sports, 🎸 music, 🧑‍🍳 cooking — and yes, I once worked as a chef for my university.
Capabilities
Kafka ingestion at 2B+ events/month, Python and PySpark transformations, dbt modeling, dimensional schemas — the whole chain from raw source to analytics-ready table.
AWS (Glue, Athena, Spark, S3), Snowflake, Databricks, DuckDB. I care about what each tool actually costs and whether engineers will be able to maintain it six months later.
Executive dashboards, D3.js data stories, self-serve reporting. I think about who is going to open this dashboard at 8am and what they actually need to see — not just what the data model can technically produce.
Feature pipelines and data layers for real-time ML inference — health wearable telemetry at 10M+ user scale, high-frequency biosensor research at 200Hz. The ML model is only as good as the data it gets.
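The raw-source-to-analytics-ready chain mentioned above can be sketched minimally. This is an illustrative toy, not production code: pandas stands in for PySpark/dbt, and every table and column name here is hypothetical.

```python
import pandas as pd

# Hypothetical raw event feed, as it might land from Kafka.
raw = pd.DataFrame({
    "event_id": [1, 2, 3, 4],
    "user_email": ["a@x.com", "a@x.com", "b@x.com", "b@x.com"],
    "event_type": ["click", "view", "click", "click"],
    "ts": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"]),
})

# Dimension table: one row per user, with a deterministic surrogate key.
dim_user = (
    raw[["user_email"]]
    .drop_duplicates()
    .reset_index(drop=True)
    .rename_axis("user_key")
    .reset_index()
)

# Fact table: events keyed by the dimension's surrogate key instead of the
# natural key — the analytics-ready shape a BI tool joins against.
fact_events = raw.merge(dim_user, on="user_email").drop(columns=["user_email"])
```

The point of the split is the same at any scale: dimensions absorb descriptive change, facts stay narrow and append-friendly.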
Experience
State of Michigan — Ottawa Area ISD
KaHa Technologies
Nutanix
University of Texas at Arlington
Education
MS Computer Science
University of Texas at Arlington
Also while there
Work
Projects with a Case Study include architecture diagrams, data models, and the key engineering decisions behind them.
BI Dashboard & Modern Data Stack Prototype
A BI dashboard prototype built on DuckDB, dbt-core, and Streamlit. Two tabs for two audiences: a capital expenditure view for business stakeholders, and a pipeline health monitor for data engineers — both powered by the same fact table.
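The "two audiences, one fact table" idea can be illustrated in a few lines. This is a hypothetical stand-in (plain pandas instead of DuckDB/Streamlit, invented column names) just to show both views reading the same rows.

```python
import pandas as pd

# One shared fact table (hypothetical schema).
fact = pd.DataFrame({
    "dept": ["Ops", "Ops", "IT"],
    "capex": [100.0, 50.0, 200.0],
    "load_ok": [True, True, False],
})

# Business tab: capital expenditure rolled up by department.
capex_view = fact.groupby("dept", as_index=False)["capex"].sum()

# Engineering tab: pipeline health derived from the very same rows.
health_view = {
    "rows_loaded": len(fact),
    "load_failures": int((~fact["load_ok"]).sum()),
}
```

Serving both tabs from one table keeps the numbers consistent by construction — there is no second pipeline for the two views to disagree about.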
Interactive Data Storytelling with D3.js
500MB of raw Kaggle data, pre-aggregated down to 50KB via Python, rendered as a smooth animated bar chart race in D3.js. 13 years of App Store genre competition in one visualization. Published on Medium.
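The pre-aggregation step works like this in miniature — collapse the raw rows to one row per (year, genre) and ship only that to the browser. A toy sketch with made-up data standing in for the Kaggle dump:

```python
import pandas as pd

# Hypothetical raw app records (stand-in for the 500MB source).
raw = pd.DataFrame({
    "genre": ["Games", "Games", "Music", "Games", "Music"],
    "year": [2010, 2010, 2010, 2011, 2011],
})

# Pre-aggregate to one row per (year, genre) — the only data D3 animates.
agg = (
    raw.groupby(["year", "genre"], as_index=False)
       .size()
       .rename(columns={"size": "count"})
)

# A compact JSON payload replaces the raw file entirely.
compact = agg.to_json(orient="records")
```

The browser never sees the raw logs; the 10,000x size reduction comes from doing the heavy lifting once, in Python, ahead of time.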
Geospatial Analytics Dashboard
Streamlit dashboard exploring Uber pickup and dropoff patterns across New York City. Neighborhood-level breakdowns, time filters, and statistical summaries — built to make the spatial patterns in the data easy to read.
Multi-Provider AI Chat Application
AI chat application with support for multiple LLM providers (Gemini, OpenAI) via a Streamlit interface. Actively developed — model switching and prompt memory are on the roadmap.
Toolkit
Tools I reach for — and know well enough to have opinions about.
Data Engineering & Orchestration
Cloud & Data Platforms
Databases & Storage
Visualization & Analytics
Infrastructure & DevOps
Writing
I write about data engineering patterns, visualization architecture, and lessons from building real systems.
A deep-dive into diagnosing four critical anti-patterns in a Matillion + Snowflake ETL design — memory exhaustion, brittle truncate-and-load, slow row-by-row inserts, and hardcoded credentials — and redesigning the pipeline into a robust, idempotent ELT architecture using S3 staging, high-water mark incremental loads, and MERGE-based upserts.
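The high-water-mark plus MERGE pattern described above can be sketched compactly. This is an illustrative stand-in, not the article's actual code: SQLite's `INSERT ... ON CONFLICT` plays the role of Snowflake's `MERGE`, and the table and column names are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT, updated_at TEXT)")
con.execute("INSERT INTO target VALUES (1, 'old', '2024-01-01')")

# High-water mark: the newest timestamp already loaded into the target.
high_water = con.execute(
    "SELECT COALESCE(MAX(updated_at), '') FROM target"
).fetchone()[0]

# Incremental load: only source rows newer than the high-water mark.
source = [(1, "new", "2024-01-02"), (2, "fresh", "2024-01-02")]
delta = [row for row in source if row[2] > high_water]

# MERGE-style upsert — update matched keys, insert new ones. Re-running the
# same delta produces the same end state, which is what makes it idempotent.
con.executemany(
    "INSERT INTO target VALUES (?, ?, ?) "
    "ON CONFLICT(id) DO UPDATE SET val = excluded.val, "
    "updated_at = excluded.updated_at",
    delta,
)
```

Because the upsert is keyed and the extract is bounded by the high-water mark, a failed run can simply be retried — no truncate-and-load window where the target is empty.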
A Data Engineer's guide to turning 500MB of raw logs into a silky-smooth 50KB data story using Python and D3.js. Covers pre-aggregation strategy, D3.js animation architecture, and performance optimization.
An investigation into PPG-based HRV tracking for stress and recovery monitoring. Covers signal processing, data pipeline design for biosensor streams, and insights from real wearable data.