Event-Driven Architecture in Data Engineering

Most data pipelines are built around a simple assumption: data moves on a schedule. A job runs every hour, every night, or every fifteen minutes, pulls what has changed since the last run, and loads it into the destination. This batch-oriented thinking is deeply embedded in how most data teams work, and for a large class of problems it is entirely appropriate. But there is a growing category of use cases where a schedule is the wrong primitive entirely — where what you actually care about is not time, but events.

Event-driven architecture is the alternative. Instead of asking “what changed in the last hour?”, an event-driven system asks “what just happened, and what should we do about it?” The difference sounds subtle, but it reshapes the entire pipeline from source to destination.

What an Event Actually Is

An event is a record of something that happened. A user placed an order. A sensor recorded a temperature reading. A payment was declined. A file was uploaded. Events are immutable — once something happened, it happened. You do not update an event; you append new events that describe what happened next.

This is a fundamentally different mental model from the table-centric thinking that dominates most data work. A database table represents the current state of the world. An event log represents the history of how the world got to its current state. Both are useful, but they answer different questions and support different kinds of analysis.

In an event-driven architecture, systems communicate by producing and consuming events rather than by directly calling each other or polling shared databases. A producer emits an event when something happens. One or more consumers react to that event, each doing whatever their particular responsibility requires. The producer does not know or care who is listening. The consumers do not need the producer to be aware of them. This decoupling is one of the core properties that makes event-driven systems powerful — and also one of the things that makes them genuinely harder to reason about.
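
To make that producer/consumer relationship concrete, here is a minimal in-process sketch in Python. The event type, handler names, and publish/subscribe helpers are illustrative rather than any particular library's API; the point is that the event is an immutable record of a fact, and the producer never references the consumers that react to it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# An event is an immutable record of something that happened.
# frozen=True means the record cannot be changed after creation.
@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    user_id: str
    amount: float
    occurred_at: datetime

# Consumers register interest; the producer never references them directly.
_subscribers = []

def subscribe(handler):
    _subscribers.append(handler)

def publish(event):
    # The producer's only job is to announce what happened.
    for handler in _subscribers:
        handler(event)

# Two independent consumers, each with its own responsibility.
def update_inventory(event: OrderPlaced):
    print(f"reserving stock for order {event.order_id}")

def send_confirmation_email(event: OrderPlaced):
    print(f"emailing user {event.user_id}")

subscribe(update_inventory)
subscribe(send_confirmation_email)

publish(OrderPlaced("o-123", "u-456", 42.00, datetime.now(timezone.utc)))
```

Adding a third consumer later requires no change to the producer, which is exactly the decoupling described above.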

The Infrastructure Layer: Message Brokers and Streaming Platforms

Event-driven data architecture is only possible because of infrastructure designed to handle high-throughput, low-latency event streams reliably. Apache Kafka is the dominant platform in this space and worth understanding in some depth.

Kafka is a distributed log. Producers write events to topics, which are divided into partitions spread across a cluster of brokers. Consumers read from those partitions, and Kafka retains the events for a configurable period — hours, days, or indefinitely — regardless of whether they have been consumed. This retention is critical: it means that if a downstream consumer fails, it can restart from where it left off rather than losing data. It also means you can add new consumers at any time and have them read the entire history of a topic from the beginning.

The partition model is how Kafka achieves horizontal scalability. Each partition is an ordered, immutable sequence of events. By distributing partitions across multiple brokers and allowing multiple consumers to read from different partitions in parallel, Kafka can handle millions of events per second without becoming a bottleneck.
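
A minimal sketch of this interaction using the kafka-python client, assuming a broker at localhost:9092 and a topic named orders (both illustrative): events keyed by user land on the same partition, which preserves per-user ordering, and a fresh consumer group configured to start from the earliest offset reads the topic's retained history from the beginning.

```python
import json
from kafka import KafkaProducer, KafkaConsumer  # pip install kafka-python

# Produce: the key determines the partition, so all events for a given
# user stay in order relative to each other.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",            # assumed local broker
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("orders", key="u-456", value={"order_id": "o-123", "amount": 42.0})
producer.flush()

# Consume: a new consumer group with auto_offset_reset="earliest"
# reads the topic's retained history from the start.
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="analytics-loader",                   # hypothetical group name
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for record in consumer:
    print(record.partition, record.offset, record.value)
```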

Cloud-managed alternatives like Amazon Kinesis, Google Pub/Sub, and Confluent Cloud have made Kafka-style streaming more accessible to teams that do not want to manage the operational complexity of running their own Kafka cluster. Each has its own trade-offs around throughput, retention, and pricing, but the conceptual model is similar.

Stream Processing

Producing and storing events is only half the picture. The other half is processing them — applying transformations, aggregations, joins, and enrichments to event streams in motion rather than at rest.

Stream processing frameworks like Apache Flink, Apache Spark Structured Streaming, and Kafka Streams allow you to write logic that operates on unbounded streams of data. Instead of waiting for a batch to accumulate and then processing it, stream processors handle each event as it arrives, maintaining state across events and emitting results continuously.

A practical example makes this concrete. Imagine you are building a fraud detection system for a payments platform. In a batch world, you might run a daily job that looks for suspicious transaction patterns in the previous day’s data. By the time you flag a fraudulent transaction, the money is long gone. In a stream processing world, each transaction event is evaluated in real time against a model that considers the user’s recent transaction history, geographic location, and spending patterns. A suspicious transaction can be flagged and blocked within milliseconds of being submitted.
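
A conceptual sketch of that per-event evaluation, in plain Python rather than a real stream processing framework: the state a processor maintains across events is reduced here to an in-memory dictionary, and the fraud rules and thresholds are purely illustrative. A production system would keep this state in the framework's fault-tolerant state store.

```python
from collections import defaultdict, deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Transaction:
    user_id: str
    amount: float
    country: str

# State maintained across events: each user's recent transaction history.
recent = defaultdict(lambda: deque(maxlen=20))

def evaluate(txn: Transaction) -> bool:
    """Return True if the transaction looks suspicious. Rules are illustrative."""
    history = recent[txn.user_id]
    countries = {t.country for t in history}
    avg = sum(t.amount for t in history) / len(history) if history else 0.0

    suspicious = bool(history) and (
        txn.country not in countries      # unfamiliar geography
        or txn.amount > 10 * avg          # spike versus spending pattern
    )
    history.append(txn)
    return suspicious

# Each event is evaluated as it arrives, not in a nightly batch.
for txn in [Transaction("u-1", 30.0, "DE"),
            Transaction("u-1", 28.0, "DE"),
            Transaction("u-1", 900.0, "BR")]:
    if evaluate(txn):
        print("flag:", txn)
```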

The window is the key abstraction in stream processing. Because streams are unbounded — data keeps arriving indefinitely — you need a way to scope aggregations to a meaningful time range. Tumbling windows divide time into fixed, non-overlapping buckets. Sliding windows move continuously and overlap. Session windows group events by activity, closing when a period of inactivity is detected. Choosing the right window type for your use case is one of the fundamental design decisions in any stream processing system.
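
The three window types can be expressed directly in, for example, Spark Structured Streaming. The sketch below assumes a Kafka topic named events, a local broker, the spark-sql-kafka package on the classpath, and Spark 3.2 or later for session_window; the durations are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, session_window, window

spark = SparkSession.builder.appName("windows").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")   # assumed broker
    .option("subscribe", "events")                          # assumed topic
    .load()
    .select(col("timestamp"), col("key").cast("string").alias("user_id"))
    # The watermark bounds how late events may arrive before a window is finalized.
    .withWatermark("timestamp", "10 minutes")
)

# Tumbling window: fixed, non-overlapping 5-minute buckets.
tumbling = events.groupBy(window(col("timestamp"), "5 minutes")).agg(count("*"))

# Sliding window: 10-minute windows that advance every 5 minutes, so they overlap.
sliding = events.groupBy(window(col("timestamp"), "10 minutes", "5 minutes")).agg(count("*"))

# Session window: grows with activity, closes after 30 minutes of inactivity per user.
sessions = events.groupBy(
    col("user_id"), session_window(col("timestamp"), "30 minutes")
).agg(count("*"))

# Start one of the queries; the others are defined the same way.
query = tumbling.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```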

The Lambda and Kappa Architectures

Two architectural patterns have been widely discussed in the context of combining batch and streaming processing. Understanding them is useful, even if you ultimately reject both.

The Lambda architecture maintains two parallel processing paths: a batch layer that reprocesses historical data periodically to produce accurate results, and a speed layer that processes the live stream to produce approximate results with low latency. A serving layer merges outputs from both. The problem with Lambda is the operational burden of maintaining two separate codebases that need to produce consistent results — one in a batch framework and one in a streaming framework.

The Kappa architecture, proposed as a simplification, eliminates the batch layer entirely. Everything is processed as a stream, and historical reprocessing is handled by replaying the event log from the beginning. Kappa is cleaner conceptually, but it places heavy demands on your streaming infrastructure and requires that your stream processing logic be replayable and deterministic.
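
In Kafka terms, a Kappa-style reprocessing run can be as simple as pointing a consumer back at the start of the topic and running the same logic over its retained history. A sketch with the kafka-python client, assuming the orders topic from earlier; the consumer group name and the handler are hypothetical.

```python
from kafka import KafkaConsumer, TopicPartition  # pip install kafka-python

def process(record):
    # Stand-in for the real streaming logic; it must be deterministic
    # so that replaying the log reproduces the same results.
    print(record.offset, record.value)

consumer = KafkaConsumer(
    bootstrap_servers="localhost:9092",
    group_id="orders-reprocess-v2",   # new group, so committed offsets start fresh
    enable_auto_commit=False,
)

# Explicitly rewind every partition of the topic to offset zero.
partitions = [TopicPartition("orders", p)
              for p in consumer.partitions_for_topic("orders")]
consumer.assign(partitions)
consumer.seek_to_beginning(*partitions)

for record in consumer:
    process(record)   # the same logic used for live traffic
```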

In practice, most production systems are neither pure Lambda nor pure Kappa but something pragmatic in between — streaming for low-latency use cases, batch for heavy historical analysis, with tooling like dbt and Flink coexisting in the same platform.

Where Event-Driven Architecture Fits

Not every data problem benefits from event-driven architecture. The operational complexity is real — distributed systems are harder to debug, exactly-once semantics are difficult to achieve and reason about, and the skills required to build and operate streaming infrastructure are less common than batch engineering skills.

The right question is whether the latency of batch processing is actually a problem for your use case. If your stakeholders are happy with dashboards that update once a day, a batch pipeline is simpler, cheaper, and easier to maintain. If your business needs to react to what is happening right now — fraud, personalization, operational monitoring, real-time pricing — then event-driven architecture is not a luxury, it is a requirement.

The teams that do this well are the ones who resist the temptation to stream everything just because they can, and instead draw a deliberate boundary between what genuinely needs to be real-time and what is perfectly served by a well-engineered batch process.

The Shift in Thinking

The most fundamental change that event-driven architecture demands is a shift in how you think about data. Tables and snapshots describe the world as it is. Events describe the world as it moves. Building systems that model that movement — that capture not just the current state but the transitions between states — opens up capabilities that batch systems simply cannot match.

For data engineers, this means adding a new set of tools and mental models to the toolkit. Not to replace batch thinking, but to know when to reach for something different.
