Introduction to Apache Kafka on Linux
Apache Kafka has become the distributed messaging platform of choice for applications that require high performance, fault tolerance and scalability. On Linux, installing and configuring it is particularly straightforward thanks to the wide availability of packages and compatibility with standard system management tools. This article shows step by step how to deploy Kafka on a Linux server, optimize its performance and monitor its basic operation.
Prerequisites
Before you start, make sure you have a recent Linux distribution (Ubuntu 22.04 LTS, CentOS Stream 9 or similar) with at least 4 GB of RAM and 2 CPUs. Kafka runs on the JVM, so you also need Java (OpenJDK 11 or later). In addition, it is recommended to create a dedicated user for the service and to configure the firewall to allow port 9092 (Kafka) and, if ZooKeeper is used, port 2181 (ZooKeeper).
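The requirements above can be checked and prepared with a few Bash helpers. This is a minimal sketch, assuming Bash, a /proc filesystem and either ufw (common on Ubuntu) or firewalld (common on RHEL/CentOS). The function names, the account name kafka and the 10% tolerance on reported memory are choices made for this example, not Kafka requirements.

```shell
#!/usr/bin/env bash
# Preflight sketch for a single Kafka host (all helper names are hypothetical).

# Returns 0 when the host meets the minimums listed above: 4 GB of RAM and
# 2 CPUs. MemTotal in /proc/meminfo is usually a little below the installed
# RAM, so 90% of 4 GiB is accepted (values are in kB).
meets_min_requirements() {
  local mem_kb="$1" cpus="$2"
  local min_mem_kb=$(( 4 * 1024 * 1024 * 9 / 10 ))
  (( mem_kb >= min_mem_kb && cpus >= 2 ))
}

# Reads the real values from this machine and reports the result.
check_host() {
  local mem_kb cpus
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  cpus=$(nproc)
  if meets_min_requirements "$mem_kb" "$cpus"; then
    echo "OK: ${mem_kb} kB RAM, ${cpus} CPUs"
  else
    echo "WARNING: ${mem_kb} kB RAM, ${cpus} CPUs (below 4 GB / 2 CPUs)" >&2
    return 1
  fi
}

# Creates a system account without a home directory or login shell.
create_kafka_user() {
  id -u kafka >/dev/null 2>&1 && return 0   # account already exists
  sudo useradd --system --no-create-home --shell /usr/sbin/nologin kafka
}

# Opens 9092/tcp, plus 2181/tcp when called with --with-zookeeper.
open_kafka_ports() {
  local ports=(9092) p
  [[ "${1:-}" == "--with-zookeeper" ]] && ports+=(2181)
  if command -v ufw >/dev/null 2>&1; then
    for p in "${ports[@]}"; do sudo ufw allow "${p}/tcp"; done
  elif command -v firewall-cmd >/dev/null 2>&1; then
    for p in "${ports[@]}"; do sudo firewall-cmd --permanent --add-port="${p}/tcp"; done
    sudo firewall-cmd --reload   # --permanent rules take effect after a reload
  else
    echo "Neither ufw nor firewalld found; open ports ${ports[*]} manually" >&2
    return 1
  fi
}
```

For example, check_host && create_kafka_user && open_kafka_ports --with-zookeeper runs the three steps in order and stops at the first failure.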
Java installation
On Debian/Ubuntu-based systems, run:
sudo apt update
sudo apt install -y openjdk-11-jdk
On RHEL/CentOS:
sudo dnf install -y java-11-openjdk-devel
Check the installation with java -version.
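To script this check, the version banner printed by java -version can be parsed. A minimal sketch, assuming Bash and the usual OpenJDK banner format; java_major_version, java_is_recent and ensure_java are hypothetical helper names. The parser also handles the legacy 1.x numbering used by Java 8 and earlier, and skips extra lines such as "Picked up JAVA_TOOL_OPTIONS".

```shell
#!/usr/bin/env bash
# Sketch: verify that the installed JDK is new enough for Kafka (hypothetical helpers).

# Extracts the major version from `java -version` output, e.g.
#   openjdk version "11.0.22" 2024-01-16  -> 11
#   java version "1.8.0_392"              -> 8   (legacy 1.x numbering)
java_major_version() {
  local ver
  # Take the quoted version string from the first line that contains one.
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  [[ -n "$ver" ]] || return 1
  [[ "$ver" == 1.* ]] && ver=${ver#1.}   # 1.8.0_392 -> 8.0_392
  printf '%s\n' "${ver%%[._-]*}"          # keep the digits before the first . _ or -
}

# Returns 0 if `java` is on PATH and its major version is >= the minimum (default 11).
java_is_recent() {
  local min="${1:-11}" major
  command -v java >/dev/null 2>&1 || return 1
  # java -version writes its banner to stderr, hence 2>&1.
  major=$(java_major_version "$(java -version 2>&1)") || return 1
  (( major >= min ))
}

# Installs OpenJDK 11 with whichever package manager is present (same packages as above).
ensure_java() {
  java_is_recent 11 && return 0
  if command -v apt-get >/dev/null 2>&1; then
    sudo apt-get update && sudo apt-get install -y openjdk-11-jdk
  elif command -v dnf >/dev/null 2>&1; then
    sudo dnf install -y java-11-openjdk-devel
  else
    echo "No supported package manager found; install JDK 11+ manually" >&2
    return 1
  fi
}
```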
Kafka download and extraction
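Visit the Apache Kafka download page and choose the latest version (e.g. 3.7.0). Download the binary .tgz archive and extract it to a directory of your choice, such as /opt/kafka. Older releases are removed from downloads.apache.org over time but remain available at archive.apache.org.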
wget https://downloads.apache.org/kafka/3.7.0/kafka_2.13-3.7.0.tgz
tar -xzf kafka_2.13-3.7.0.tgz
sudo mv kafka_2.13-3.7.0 /opt/kafka
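The same download can be wrapped in a function that tries the main download site first and falls back to archive.apache.org, where superseded releases are kept. A sketch under these assumptions: wget and tar are installed, the Scala 2.13 build is wanted, and kafka_download_url and install_kafka are names made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: download and unpack a Kafka release (hypothetical helper names).

# Builds the release URL; the default Scala build (2.13) matches the example above.
kafka_download_url() {
  local kafka_version="$1" scala_version="${2:-2.13}"
  local base="${3:-https://downloads.apache.org/kafka}"
  printf '%s/%s/kafka_%s-%s.tgz\n' "$base" "$kafka_version" "$scala_version" "$kafka_version"
}

# Downloads the archive (falling back to archive.apache.org for releases that
# are no longer on the main site), extracts it and moves it to $dest.
install_kafka() {
  local kafka_version="$1" scala_version="${2:-2.13}" dest="${3:-/opt/kafka}"
  local name="kafka_${scala_version}-${kafka_version}" tmp
  # mv would nest the release inside an existing directory, so refuse instead.
  if [[ -e "$dest" ]]; then
    echo "$dest already exists; refusing to overwrite" >&2
    return 1
  fi
  tmp=$(mktemp -d) || return 1
  if ! wget -q -O "$tmp/$name.tgz" "$(kafka_download_url "$kafka_version" "$scala_version")" &&
     ! wget -q -O "$tmp/$name.tgz" "$(kafka_download_url "$kafka_version" "$scala_version" https://archive.apache.org/dist/kafka)"; then
    echo "download failed" >&2
    rm -rf "$tmp"
    return 1
  fi
  tar -xzf "$tmp/$name.tgz" -C "$tmp" && sudo mv "$tmp/$name" "$dest"
  local rc=$?
  rm -rf "$tmp"   # clean up the archive and any partial extraction
  return "$rc"
}
```

For example, install_kafka 3.7.0 would place the release in /opt/kafka. Verifying the archive against the .sha512 file published next to it is a sensible extra step before extracting.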
Then add the directory /opt/kafka/bin to the PATH environment variable to make the scripts easier to run.
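One way to make the PATH change permanent for login shells is a drop-in file under /etc/profile.d. A minimal sketch: the file name kafka.sh and the KAFKA_HOME variable are assumptions of this example, and the target directory is a parameter so the file can be generated and inspected before it is installed.

```shell
#!/usr/bin/env bash
# Sketch: write a profile snippet that puts $KAFKA_HOME/bin on PATH.
# In real use target_dir is /etc/profile.d (requires root).
write_kafka_profile() {
  local target_dir="${1:-/etc/profile.d}" kafka_home="${2:-/opt/kafka}"
  # $kafka_home is expanded now; \$PATH and \$KAFKA_HOME are left for the
  # login shell to expand when it sources the file.
  cat > "$target_dir/kafka.sh" <<EOF
export KAFKA_HOME=$kafka_home
case ":\$PATH:" in
  *":\$KAFKA_HOME/bin:"*) ;;   # already on PATH: do nothing
  *) export PATH="\$PATH:\$KAFKA_HOME/bin" ;;
esac
EOF
}
```

Called with no arguments from a root shell, it writes /etc/profile.d/kafka.sh, which takes effect in new login shells. The case guard keeps PATH from growing if the file is sourced more than once.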
ZooKeeper configuration (standalone mode)
In ZooKeeper mode, Kafka relies on ZooKeeper for cluster coordination (recent releases can also run in KRaft mode without ZooKeeper). For a test environment, you can start ZooKeeper from the Kafka directory:
cd /opt/kafka
bin/zookeeper-server-start.sh config/zookeeper.properties &
In production, it is recommended to run ZooKeeper as a separate service, deployed as an ensemble with an odd number of nodes: an ensemble of 2f + 1 nodes tolerates f node failures (for example, 3 nodes tolerate 1 failure).
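The sizing rule follows from ZooKeeper serving requests only while a majority of the ensemble is up: n nodes tolerate floor((n - 1) / 2) failures, so an even-sized ensemble adds no tolerance over the next smaller odd one. The sketch below computes that and renders a zookeeper.properties for an ensemble. The host names (zk1, zk2, ...), the data directory and the timing values are illustrative assumptions; the keys tickTime, initLimit, syncLimit, dataDir, clientPort and server.N are standard ZooKeeper settings.

```shell
#!/usr/bin/env bash
# Sketch: size and configure a ZooKeeper ensemble (hypothetical helper names).

# A majority must stay alive, so n nodes tolerate floor((n - 1) / 2) failures:
# 3 -> 1, 4 -> 1, 5 -> 2.
zk_fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

# Prints a zookeeper.properties for the given hosts. Ports 2888 (follower
# connections) and 3888 (leader election) are the conventional defaults.
render_zk_config() {
  local data_dir="$1"; shift
  cat <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=$data_dir
clientPort=2181
EOF
  local id=1 host
  for host in "$@"; do
    echo "server.${id}=${host}:2888:3888"
    id=$(( id + 1 ))
  done
}

# Each node also needs its own id, matching its server.N line, in dataDir/myid.
write_zk_myid() {
  local data_dir="$1" id="$2"
  mkdir -p "$data_dir" && echo "$id" > "$data_dir/myid"
}
```

Each broker would then list the whole ensemble in server.properties, for example zookeeper.connect=zk1:2181,zk2:2181,zk3:2181.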
Start the Kafka Broker
With ZooKeeper running, launch the broker:
bin/kafka-server-start.sh config/server.properties &
By default, the broker listens on port 9092. You can check that it is responding by listing the topics:
bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
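Starting both processes with a trailing & gives no signal of when they are ready. A sketch, assuming Bash (for its /dev/tcp redirection) and the coreutils timeout command, that starts ZooKeeper and the broker with the scripts' -daemon option and polls each port before continuing; wait_for_port and start_kafka_stack are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: start ZooKeeper and the broker in the background, waiting until each
# port accepts TCP connections (hypothetical helper names).

# Polls host:port once per second via Bash's /dev/tcp pseudo-device.
# Returns 0 as soon as a connection succeeds, 1 after max_wait attempts.
wait_for_port() {
  local host="$1" port="$2" max_wait="${3:-30}" i
  for (( i = 0; i < max_wait; i++ )); do
    # `timeout 1` bounds hosts that drop packets instead of refusing them.
    if timeout 1 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

start_kafka_stack() {
  local home="${KAFKA_HOME:-/opt/kafka}"
  # -daemon backgrounds the process and sends its console output to a .out
  # file under $home/logs instead of the terminal.
  "$home/bin/zookeeper-server-start.sh" -daemon "$home/config/zookeeper.properties"
  wait_for_port localhost 2181 30 || { echo "ZooKeeper did not open port 2181" >&2; return 1; }
  "$home/bin/kafka-server-start.sh" -daemon "$home/config/server.properties"
  wait_for_port localhost 9092 60 || { echo "Broker did not open port 9092" >&2; return 1; }
  # A listening port only shows the process started; listing topics shows it answers.
  "$home/bin/kafka-topics.sh" --bootstrap-server localhost:9092 --list
}
```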
Creating a basic topic and testing it
To create a topic called test-topic with one partition and a replication factor of 1:
bin/kafka-topics.sh --create --topic test-topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
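When topic creation is part of a provisioning script, running it twice should not fail. A minimal sketch, assuming the scripts live under $KAFKA_HOME/bin. ensure_topic, KAFKA_HOME and BOOTSTRAP_SERVER are names chosen for this example, while --if-not-exists and --describe are options of kafka-topics.sh.

```shell
#!/usr/bin/env bash
# Sketch: idempotent topic creation (ensure_topic, KAFKA_HOME and
# BOOTSTRAP_SERVER are hypothetical names for this example).
ensure_topic() {
  local topic="$1" partitions="${2:-1}" replication="${3:-1}"
  local bootstrap="${BOOTSTRAP_SERVER:-localhost:9092}"
  local bin="${KAFKA_HOME:-/opt/kafka}/bin"
  # Reject obviously invalid values before calling the CLI.
  if ! [[ "$partitions" =~ ^[1-9][0-9]*$ && "$replication" =~ ^[1-9][0-9]*$ ]]; then
    echo "partitions and replication factor must be positive integers" >&2
    return 2
  fi
  # --if-not-exists makes re-running the script harmless; --describe then
  # prints the partition and replica layout that actually exists.
  "$bin/kafka-topics.sh" --create --if-not-exists --topic "$topic" \
    --partitions "$partitions" --replication-factor "$replication" \
    --bootstrap-server "$bootstrap" &&
  "$bin/kafka-topics.sh" --describe --topic "$topic" --bootstrap-server "$bootstrap"
}
```

ensure_topic test-topic 1 1 reproduces the command above. On a single broker the replication factor cannot exceed 1, because each replica of a partition must live on a different broker.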
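Then produce and consume messages to validate the flow. Run the producer in one terminal and type a few lines, then start the consumer in a second terminal; each line should appear in the consumer (press Ctrl+C to stop either tool):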
bin/kafka-console-producer.sh --topic test-topic --bootstrap-server localhost:9092
bin/kafka-console-consumer.sh --topic test-topic --from-beginning --bootstrap-server localhost:9092
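The console tools are interactive, which makes them awkward in scripts. A sketch of a non-interactive check, assuming the same hypothetical KAFKA_HOME and BOOTSTRAP_SERVER variables: it pipes one uniquely named message into the producer, then reads the topic from the beginning using the consumer's --timeout-ms option so that the consumer exits instead of waiting forever. kafka_smoke_test is a made-up name.

```shell
#!/usr/bin/env bash
# Sketch: non-interactive produce/consume check (kafka_smoke_test is hypothetical).
kafka_smoke_test() {
  local topic="${1:-test-topic}"
  local bootstrap="${BOOTSTRAP_SERVER:-localhost:9092}"
  local bin="${KAFKA_HOME:-/opt/kafka}/bin"
  # A unique marker, so an older message in the topic cannot cause a false pass.
  local marker
  marker="smoke-$(date +%s)-$$"
  # The producer sends one message per stdin line and exits at end of input.
  printf '%s\n' "$marker" |
    "$bin/kafka-console-producer.sh" --topic "$topic" --bootstrap-server "$bootstrap" || return 1
  # --timeout-ms stops the consumer after 10 s without new messages;
  # grep succeeds only if the marker was read back as a whole line.
  "$bin/kafka-console-consumer.sh" --topic "$topic" --bootstrap-server "$bootstrap" \
    --from-beginning --timeout-ms 10000 2>/dev/null | grep -qxF "$marker"
}
```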
Linux performance optimization
Some key settings to improve the performance of Kafka in Linux include:
- Increase the file descriptor limit (ulimit -n) to 65535 or more.
- Raise the maximum socket buffer sizes (net.core.rmem_max and net.core.wmem_max) to values such as 25 MB.
- Use XFS or ext4 and mount the data volume with noatime; on ext4, data=writeback further reduces journaling overhead at the cost of weaker crash consistency.
- Set the I/O scheduler to deadline or noop for SSD disks (mq-deadline or none on kernels that use the multi-queue block layer).
- Monitor CPU, memory and disk I/O usage with tools such as top, vmstat and iostat.
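Before changing anything, it helps to see where the host currently stands. A sketch, assuming Bash on Linux with /proc and /sys mounted; check_setting, check_kafka_host_tuning and the PROC_ROOT override (used only to point the check at a copy of /proc) are hypothetical names. Note that ulimit -n reports the limit of the current shell; for a running broker, the relevant value is in /proc/<pid>/limits.

```shell
#!/usr/bin/env bash
# Sketch: compare current host settings with the targets listed above
# (hypothetical helper names; PROC_ROOT defaults to /proc).

# Prints OK/LOW for one numeric setting; returns 1 if it is below target.
check_setting() {
  local name="$1" current="$2" target="$3"
  if [[ "$current" == unlimited ]] || { [[ "$current" =~ ^[0-9]+$ ]] && (( current >= target )); }; then
    printf 'OK   %-24s %s\n' "$name" "$current"
  else
    printf 'LOW  %-24s %s (target >= %s)\n' "$name" "$current" "$target"
    return 1
  fi
}

check_kafka_host_tuning() {
  local proc="${PROC_ROOT:-/proc}" rc=0 f
  local buf=$(( 25 * 1024 * 1024 ))   # 25 MB = 26214400 bytes
  check_setting "open files (ulimit -n)" "$(ulimit -n)" 65535 || rc=1
  check_setting net.core.rmem_max "$(cat "$proc/sys/net/core/rmem_max" 2>/dev/null)" "$buf" || rc=1
  check_setting net.core.wmem_max "$(cat "$proc/sys/net/core/wmem_max" 2>/dev/null)" "$buf" || rc=1
  # Informational: the active scheduler is shown in brackets, e.g. "[mq-deadline] none".
  for f in /sys/block/*/queue/scheduler; do
    [[ -r "$f" ]] && printf 'INFO %-24s %s\n' "$(basename "$(dirname "$(dirname "$f")")")" "$(cat "$f")"
  done
  return "$rc"
}
```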
The settings listed above can be made permanent through /etc/security/limits.conf, /etc/sysctl.conf and udev rules.
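As a sketch of what those files could contain, the function below writes drop-in files into the corresponding .d directories, which are read in addition to /etc/sysctl.conf and /etc/security/limits.conf. The file names, the kafka user, the staging root parameter and the choice of mq-deadline for non-rotational disks are assumptions of this example. 25 MB is taken as 25 * 1024 * 1024 = 26214400 bytes.

```shell
#!/usr/bin/env bash
# Sketch: write the tuning as drop-in files under $root (use / to install
# directly; a scratch directory lets you review the files first).
render_kafka_tuning() {
  local root="${1:-/}" user="${2:-kafka}" nofile="${3:-65535}"
  local buf=$(( 25 * 1024 * 1024 ))   # 25 MB socket buffer limit in bytes
  mkdir -p "$root/etc/sysctl.d" "$root/etc/security/limits.d" "$root/etc/udev/rules.d"

  # Kernel network buffers; load with: sudo sysctl --system
  cat > "$root/etc/sysctl.d/99-kafka.conf" <<EOF
net.core.rmem_max = $buf
net.core.wmem_max = $buf
EOF

  # Open-file limit, applied by pam_limits at login. A broker started by
  # systemd ignores this file and needs LimitNOFILE= in its unit instead.
  cat > "$root/etc/security/limits.d/99-kafka.conf" <<EOF
$user soft nofile $nofile
$user hard nofile $nofile
EOF

  # Non-rotational disks (SSD/NVMe) get mq-deadline; apply with:
  #   sudo udevadm control --reload && sudo udevadm trigger
  cat > "$root/etc/udev/rules.d/60-kafka-scheduler.rules" <<'EOF'
ACTION=="add|change", KERNEL=="sd[a-z]*|nvme[0-9]*n[0-9]*", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="mq-deadline"
EOF
}
```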
Monitoring and maintenance
To maintain the health of the cluster, collect Kafka's built-in JMX metrics and export them to a monitoring system such as Prometheus using the Prometheus JMX exporter. Regularly review the logs in /opt/kafka/logs and configure their rotation with logrotate. Finally, plan upgrades carefully: follow the upgrade notes for each version and test changes in a staging environment before applying them to production.
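A sketch of the first two tasks, assuming the Prometheus JMX exporter jar and its YAML configuration have been downloaded to /opt/jmx_exporter (these paths, port 7071 and the retention policy are illustrative assumptions). The exporter is attached as a Java agent through KAFKA_OPTS, which Kafka's start scripts pass to the JVM. The logrotate policy only targets the broker's application logs in /opt/kafka/logs; topic data under log.dirs is governed by Kafka's own retention settings and must not be rotated this way.

```shell
#!/usr/bin/env bash
# Sketch: JMX exporter options and a logrotate policy (hypothetical helper names).

# The exporter runs as a Java agent: -javaagent:<jar>=<port>:<config.yaml>
kafka_prometheus_opts() {
  local jar="${1:-/opt/jmx_exporter/jmx_prometheus_javaagent.jar}"
  local port="${2:-7071}" config="${3:-/opt/jmx_exporter/kafka.yml}"
  printf -- '-javaagent:%s=%s:%s\n' "$jar" "$port" "$config"
}

# Prints a logrotate policy for the broker's log and console-output files.
# copytruncate is used because the JVM keeps these files open.
render_kafka_logrotate() {
  local log_dir="${1:-/opt/kafka/logs}" keep="${2:-7}"
  cat <<EOF
$log_dir/*.log $log_dir/*.out {
    daily
    rotate $keep
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
EOF
}
```

For example, export KAFKA_OPTS="$(kafka_prometheus_opts)" before starting the broker would expose metrics on port 7071, and render_kafka_logrotate | sudo tee /etc/logrotate.d/kafka >/dev/null installs the policy.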
Conclusion
Deploying Apache Kafka on Linux provides a solid foundation for building scalable, resilient real-time data pipelines. By following the installation steps, tuning system parameters and establishing monitoring practices, you can make the most of Kafka's potential while keeping critical workloads stable and performing as expected.


