Introduction to Kafka

Kafka is leading the way to move from ETL - extract, transform and load and batch workflows to near-real-time data feeds.

What is Kafka

as a distributed streaming platform. It has three main capabilities:

  • Reading and writing records like a message queue
  • Storing records with fault-tolerant
  • Processing streams as they occur
graph TD;
A[Producer service]-->B[Kafka cluster]-->C[Consumer service 1];
D[Producer service]-->B;
B-->E[Consumer service 2]

Data doesn’t have to be limited to only a single destination. The producers and consumers are completely decoupled, allow each client to work independently.

Delivery methods:

  • At least-once semantics - A message is sent as needed until it is acknowledged.
  • At most-once semantics - A message is only sent once and not resent on failure
  • Exactly-once semantics - A message is only seen once by the consumer of the message.
At least once semantics

At most once sematics

Exactly once semantics

  • “Dogfoods” itself For example, Kafka uses topics internally to manage consumer’s offsets.
Kafka is not the same as other message broker
  • The ability to replay messages by default
  • parallel processing of data
  • Kafka was designed to have multiple consumers.
Kafka in real-world
  • XMPP (Extensible Messaging and Presence Protocol)
  • JMS - Java Message Service (Jakarta EE)
  • OASIS Advanced Message Queuing Protocol (AMQP)

Features:

  • HA Apache Flume - data replication
  • Log aggregation - the log files are sent as messages into Kafka, and then different applications have a single logical topic to consume that information.
When Kafka might not be the right fit
  • You only need a once-monthly or even once-yearly summary of aggregate data.
  • Random lookup of data
  • exact ordering of messages
  • larger than 1MB message size

Getting to know Kafka

Kafka is a distributed system at heart, but it also possible to install and run it on a single host.

Producing and consuming a message

What are brokers?

Brokers can be thought of as the server side of Kafka.

Kafka CLI