Data Fleet

data-fleet.work | System Status: Operational


v2.1.0 | Last updated: March 19, 2026

  • What is Data Fleet?

    Data Fleet is a real-time fleet observability and infrastructure monitoring platform built on production-grade Kubernetes infrastructure. It simulates a logistics fleet of 30 trucks operating across Northeast India, continuously generating telemetry data that flows through a complete data engineering pipeline — from edge collection to storage, processing, and live visualization.

    The system is not a demo. Every component runs on real hardware, real Kubernetes, real SSL certificates, and real DNS. The dashboard you see is pulling live data from a 3-node Kubernetes cluster hosted on Hetzner Cloud.

  • Data Pipeline

    Telemetry originates from a Python-based fleet simulator that models 30 trucks with realistic physics — fuel consumption, traffic zones, route generation via OSRM, refueling stops, and SLA tracking. Each truck broadcasts position, speed, fuel level, engine temperature, RPM, and vibration data every 25-35 seconds.
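A single telemetry event might look like the following sketch. The field names and value ranges here are illustrative assumptions, not the simulator's actual schema:

```python
import json
import random
import time

def make_telemetry_event(truck_id: str) -> dict:
    """Build one telemetry event; field names are illustrative, not the real schema."""
    return {
        "truck_id": truck_id,
        "ts": time.time(),
        "lat": round(random.uniform(24.0, 28.0), 6),  # rough Northeast India bounds
        "lon": round(random.uniform(89.0, 96.0), 6),
        "speed_kmh": round(random.uniform(0.0, 80.0), 1),
        "fuel_pct": round(random.uniform(5.0, 100.0), 1),
        "engine_temp_c": round(random.uniform(70.0, 110.0), 1),
        "rpm": random.randint(600, 2500),
        "vibration_g": round(random.uniform(0.0, 1.5), 3),
    }

# The JSON-encoded bytes are what a Kafka producer would publish.
payload = json.dumps(make_telemetry_event("truck-07")).encode("utf-8")
```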

    This telemetry flows through Apache Kafka as the central streaming backbone. Two Kafka consumer groups process the stream in parallel — one persists raw telemetry events to MongoDB Atlas with TTL-based expiration, the other processes completed trip summaries with upsert deduplication. Dead letter queues handle failed messages. Pipeline lag, batch sizes, and consumer group health are all tracked via Prometheus metrics.
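The upsert-deduplication and dead-letter behavior of the trip-summary consumer can be sketched as a pure function. Here `store` stands in for the MongoDB collection keyed by `trip_id`, and the `updated_at` field is an assumed tiebreaker, not the project's actual document shape:

```python
def upsert_trip_summaries(store: dict, batch: list, dlq: list) -> None:
    """Apply a batch of trip-summary messages with upsert deduplication.

    `store` stands in for the MongoDB collection keyed by trip_id; malformed
    messages are routed to `dlq` instead of failing the whole batch.
    """
    for msg in batch:
        try:
            trip_id = msg["trip_id"]
            # Upsert: the newest summary for a trip_id wins, so a message
            # redelivered by Kafka does not create a duplicate document.
            existing = store.get(trip_id)
            if existing is None or msg["updated_at"] >= existing["updated_at"]:
                store[trip_id] = msg
        except (KeyError, TypeError):
            dlq.append(msg)  # dead letter queue for unprocessable messages
```

With a real driver, the `store[trip_id] = msg` line would be a `replace_one(..., upsert=True)` against MongoDB.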

  • Infrastructure

    The platform runs on a 3-node Kubernetes cluster:

    The first node (data-fleet-vps-1, 4GB RAM) serves as the control plane and hosts cluster management components. The second node (data-fleet-vps-2-worker, 8GB RAM) runs all application workloads — the FastAPI backend, Kafka broker, Redis cache, OSRM routing engine, and both Kafka consumers. The third node (data-fleet-observability, 4GB RAM) hosts the entire observability stack — Prometheus with 30-day retention, Grafana, and node exporters.

    All nodes run in a private Kubernetes network with Calico CNI. External traffic enters through nginx ingress with cert-manager issuing Let's Encrypt TLS certificates. Cloudflare sits in front of the entire stack providing DDoS protection, edge caching, and IP masking — the real VPS IP is never exposed publicly.

  • Storage Architecture

    Persistent storage is managed through Kubernetes PersistentVolumes backed by local SSD on each node. Kafka uses a 5GB volume for message retention. Prometheus uses a 5GB volume for time-series data with 30-day retention. Redis is backed by a 2.5GB volume with RDB snapshots for state persistence across pod restarts. OSRM uses a 15GB read-only volume for the Northeast India road network map data.

    Redis serves as both a hot cache and state store — truck positions, fuel levels, and trip state are checkpointed every 30 seconds so the simulator resumes exactly where it left off after any restart. Cumulative metrics like total fuel consumed and diesel cost are stored as Redis counters that survive indefinitely.
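The checkpoint/restore cycle can be sketched as below. The key layout and state fields are assumptions, and `client` is any object exposing Redis-style `set`/`get` (e.g. a `redis.Redis` instance from redis-py):

```python
import json

CHECKPOINT_KEY = "fleet:checkpoint:{truck_id}"  # key layout is an assumption

def checkpoint_truck(client, truck_id: str, state: dict) -> None:
    """Persist a truck's in-memory state so the simulator resumes after a restart."""
    client.set(CHECKPOINT_KEY.format(truck_id=truck_id), json.dumps(state))

def restore_truck(client, truck_id: str):
    """Load the last checkpoint, or None if the truck never checkpointed."""
    raw = client.get(CHECKPOINT_KEY.format(truck_id=truck_id))
    return json.loads(raw) if raw is not None else None
```

Cumulative counters such as total fuel consumed would use Redis `INCRBYFLOAT` rather than this JSON blob, since they must survive across trips rather than be overwritten each checkpoint.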

  • Observability Stack

The SRE dashboard you're viewing is built entirely on self-hosted observability tooling. Prometheus scrapes six jobs — the FastAPI backend, both Kafka consumers, kube-state-metrics for Kubernetes object health, node-exporter on all 3 nodes for hardware metrics, and the nginx ingress controller.

    SLO compliance is calculated using a 30-day rolling window over HTTP request counters. Error budget consumption is derived from the ratio of 5xx responses to total requests. The dashboard shows both the 30-day monthly picture and a 5-minute real-time window simultaneously — because an outage that happened 2 weeks ago should still show in your monthly SLO even if the system is healthy right now.
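The arithmetic behind those two numbers is simple to sketch. The 99.9% target below is an assumed default, not a figure stated anywhere in this document:

```python
def slo_compliance(total_requests: int, error_5xx: int) -> float:
    """Fraction of requests served without a 5xx over the window."""
    if total_requests == 0:
        return 1.0  # no traffic: treat the window as fully compliant
    return 1.0 - error_5xx / total_requests

def error_budget_consumed(total_requests: int, error_5xx: int,
                          slo_target: float = 0.999) -> float:
    """Share of the window's error budget burned; > 1.0 means it is exhausted."""
    allowed_errors = total_requests * (1.0 - slo_target)
    if allowed_errors == 0:
        return 0.0
    return error_5xx / allowed_errors
```

In Prometheus terms, `total_requests` and `error_5xx` would come from `increase()` over the HTTP request counters for the 30-day and 5-minute windows respectively, which is why a 2-week-old outage still depresses the monthly number.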

The Kubernetes workload health view shows live pod status, restart counts, deployment replica health, node CPU and memory with sparkline trend charts, and PVC storage allocation across all persistent volumes.

  • Security Model

    API access is restricted at two layers. At the Cloudflare layer, a WAF rule blocks all traffic that doesn't match known API paths. At the nginx layer, an IP whitelist allows only Cloudflare edge IPs and a specific personal IP — direct VPS access returns 403. SSL is enforced end-to-end in Full (strict) mode, meaning both the Cloudflare-to-origin connection and the browser-to-Cloudflare connection are encrypted with valid certificates.

    Rate limiting is implemented via a Redis token bucket algorithm — each API key gets 10 tokens refilled at 0.5 tokens per second, with the Lua script executing atomically in Redis to prevent race conditions.
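The refill-and-consume arithmetic that the Lua script performs atomically inside Redis can be modeled in plain Python. This sketch only illustrates the math with the parameters stated above (10 tokens, 0.5 tokens/s); in production the whole sequence runs as a single Redis script per API key so concurrent requests cannot race:

```python
import time

class TokenBucket:
    """Pure-Python model of the per-API-key bucket (10 tokens, 0.5 tokens/s refill)."""

    def __init__(self, capacity: float = 10.0, refill_rate: float = 0.5,
                 now=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.now = now          # injectable clock, handy for testing
        self.tokens = capacity  # buckets start full
        self.last = now()

    def allow(self) -> bool:
        t = self.now()
        # Refill based on elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (t - self.last) * self.refill_rate)
        self.last = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # consume one token for this request
            return True
        return False            # caller should return HTTP 429
```

Keeping the same read-refill-consume sequence inside one Lua `EVAL` is what makes the Redis version atomic: no other command can interleave between the read and the write for that key.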

  • Tech Stack

    Backend: Python, FastAPI, Apache Kafka, Redis, MongoDB Atlas, OSRM

    Infrastructure: Kubernetes (kubeadm), Calico CNI, nginx ingress, cert-manager, Hetzner Cloud

    Observability: Prometheus, Grafana, kube-state-metrics, node-exporter

    Frontend: Next.js, TypeScript, Recharts, Tailwind CSS, Vercel

    Security: Cloudflare (WAF, DDoS, TLS), Let's Encrypt, Redis token bucket rate limiting
