Event-Driven Pipeline Starter
Stream everything. Lose nothing.
A production starter for event-driven data pipelines: Apache Spark, Kafka, and Elasticsearch wired together with schema registry, dead letter queue, and monitoring. The backbone of real-time analytics platforms.
The Problem
- Connecting Spark + Kafka + ES from scratch takes a week of config debugging
- Most tutorials skip the hard parts: schema evolution, DLQ, and backpressure
- Starting a new data pipeline project means re-solving the same infrastructure problems
What's Included
Everything you need to ship production-grade code
Kafka + Spark Streaming
Structured Streaming job consuming from Kafka with exactly-once semantics.
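As a sketch of what the consumer side looks like (option names are Spark's real Kafka source options; the `kafka_source_options` helper and topic/broker values are illustrative, not the template's actual API):

```python
def kafka_source_options(bootstrap_servers: str, topic: str, starting: str = "earliest") -> dict:
    """Options for spark.readStream.format("kafka"). End-to-end exactly-once
    comes from checkpointing on the write side plus an idempotent sink."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting,
        "failOnDataLoss": "false",  # tolerate retention-expired offsets in dev
    }

opts = kafka_source_options("localhost:9092", "events")
# In the actual job (requires a SparkSession):
# df = spark.readStream.format("kafka").options(**opts).load()
```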
Elasticsearch Sink
Bulk indexing with retry, error handling, and index rotation by date.
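The two ideas behind the sink, sketched in plain Python (function names are illustrative; in the template the bulk call would wrap something like `elasticsearch.helpers.bulk`):

```python
import time
from datetime import datetime, timezone

def daily_index(prefix: str, ts: datetime) -> str:
    """Index rotation: route a document to a date-suffixed index,
    e.g. events-2024.01.15, so old indices can be dropped wholesale."""
    return f"{prefix}-{ts.strftime('%Y.%m.%d')}"

def bulk_with_retry(send, actions, retries: int = 3, backoff: float = 0.5):
    """Retry a bulk call with exponential backoff. `send` is any callable
    that raises on failure and returns on success."""
    for attempt in range(retries):
        try:
            return send(actions)
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(backoff * 2 ** attempt)
```

Dropping whole daily indices is far cheaper in Elasticsearch than deleting individual documents, which is why rotation by date is the default here.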
Schema Registry
Confluent Schema Registry integration with Avro serialization and forward/backward compatibility.
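What backward-compatible evolution means in practice: a new field must carry a default so consumers on the new schema can still read records written with the old one. A minimal sketch (the `Click` schema and subject name are made up; the real check happens against Schema Registry's compatibility endpoint):

```python
import json

# v1 of a hypothetical event schema
v1 = {
    "type": "record", "name": "Click", "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "url", "type": "string"},
    ],
}

# Backward-compatible v2: the added field is nullable with a default,
# so v2 readers can decode v1 records (missing field -> None).
v2 = {**v1, "fields": v1["fields"] + [
    {"name": "referrer", "type": ["null", "string"], "default": None},
]}

# Shape of the registration payload POSTed to Schema Registry:
register_payload = {"schema": json.dumps(v2)}
```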
Dead Letter Queue
Failed message routing to DLQ topic with metadata enrichment for debugging.
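A sketch of the enrichment step, assuming a helper like this (field names are illustrative, not the template's actual DLQ envelope):

```python
import time

def to_dlq_record(raw_value, error, source_topic: str, partition: int, offset: int) -> dict:
    """Wrap a failed message with enough metadata to debug and replay it.
    Bytes payloads are decoded lossily so the record stays JSON-serializable."""
    if isinstance(raw_value, bytes):
        raw_value = raw_value.decode("utf-8", errors="replace")
    return {
        "payload": raw_value,
        "error": str(error),
        "source_topic": source_topic,   # where the message came from
        "partition": partition,         # and exactly where in that topic,
        "offset": offset,               # so it can be re-fetched or replayed
        "failed_at": int(time.time() * 1000),
    }
```

Keeping topic/partition/offset on every DLQ record is the part tutorials usually drop; without it a failed message can't be traced back to its source.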
Docker Compose Dev Stack
Full local environment: Kafka, ZooKeeper, Schema Registry, ES, Kibana. One command.
Get the Template
One-time payment. Full source code. Lifetime updates.
Personal License
- Full source code (Scala/Python)
- Docker Compose stack
- README + setup guide
- Lifetime updates
Frequently Asked Questions
Scala or Python?
Both. The repo includes Scala (production-grade) and PySpark versions of every job.
Does this work with Confluent Cloud?
Yes. Connection configs for Confluent Cloud are included alongside local Docker setup.