How to install and optimize Apache Kafka on Linux servers

Introduction to Apache Kafka on Linux

Apache Kafka has become the distributed messaging platform of choice for applications that require high throughput, fault tolerance and scalability. On Linux, installing and configuring it is particularly straightforward thanks to the wide availability of packages and its compatibility with standard system management tools. This article shows step by step how to deploy Kafka on a Linux server, optimize its performance and monitor its basic operation.

Prerequisites

Before you start, make sure you have a recent Linux distribution (Ubuntu 22.04 LTS, CentOS Stream 9 or similar) with at least 4 GB of RAM and 2 CPU cores. Kafka runs on the JVM, so you also need OpenJDK 11 or later. In addition, it is advisable to create a dedicated user for the service and to open port 9092 (Kafka) in the firewall, plus port 2181 (ZooKeeper) if you use ZooKeeper.
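Before installing anything, the hardware part of these prerequisites can be checked with a short shell sketch. It reads MemTotal from /proc/meminfo and the core count from nproc; the function names and the 90% tolerance on memory are illustrative assumptions, not something Kafka itself requires.

```bash
# Minimum hardware from the prerequisites: 4 GB of RAM and 2 CPU cores.
# MemTotal in /proc/meminfo is reported in kB and is usually a little below the
# installed RAM, so this sketch accepts anything above 90% of 4 GB.
MIN_MEM_KB=$((4 * 1024 * 1024 * 9 / 10))
MIN_CPUS=2

# meets_minimum MEM_KB CPUS: exit status 0 when both thresholds are met
meets_minimum() {
  [ "$1" -ge "$MIN_MEM_KB" ] && [ "$2" -ge "$MIN_CPUS" ]
}

# preflight_check: reads the real values of this host and reports them
preflight_check() {
  local mem_kb cpus
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  cpus=$(nproc)
  echo "MemTotal: ${mem_kb} kB, CPUs: ${cpus}"
  if meets_minimum "$mem_kb" "$cpus"; then
    echo "OK: hardware meets the minimum requirements"
  else
    echo "WARNING: less than ~4 GB of RAM or fewer than 2 CPUs" >&2
    return 1
  fi
}

# Example: preflight_check
```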

Java installation

On Debian/Ubuntu-based systems, run:

sudo apt update
sudo apt install -y openjdk-11-jdk

On RHEL/CentOS:

sudo dnf install -y java-11-openjdk-devel

Check the installation with java -version.
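In scripts, the same check can be automated. java -version writes to stderr, so the sketch below redirects it and parses the quoted version string, handling both the modern scheme ("11.0.22") and the legacy one ("1.8.0_392" for Java 8). The function names are hypothetical helpers, not part of any package.

```bash
# java_major_version "VERSION_LINE": prints the major version found in the
# first line of `java -version`, e.g. 'openjdk version "11.0.22" ...' -> 11
java_major_version() {
  local ver
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    1.*) ver=${ver#1.} ;;   # legacy scheme: "1.8.0_392" means Java 8
  esac
  printf '%s\n' "${ver%%[._-]*}"   # keep what precedes the first '.', '_' or '-'
}

# check_java: succeeds only if the installed JVM is Java 11 or later
check_java() {
  local line major
  line=$(java -version 2>&1 | head -n 1)
  major=$(java_major_version "$line")
  if [ -n "$major" ] && [ "$major" -ge 11 ]; then
    echo "Java $major detected: OK"
  else
    echo "Java 11 or later is required (got: ${line:-nothing})" >&2
    return 1
  fi
}

# Example: check_java
```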

Kafka download and extraction

Visit the Apache Kafka download page and choose the latest version (e.g. 3.7.0). Download the binary .tgz file and extract it into a directory of your choice, such as /opt/kafka.

wget https://downloads.apache.org/kafka/3.7.0/kafka_2.13-3.7.0.tgz
tar -xzf kafka_2.13-3.7.0.tgz
sudo mv kafka_2.13-3.7.0 /opt/kafka
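The download steps can be wrapped in a small function so the Scala and Kafka versions are parameters. downloads.apache.org only hosts current releases, and older ones are moved to archive.apache.org, so this sketch falls back to the archive when the first URL fails. The function names are hypothetical; wget, tar and mv are used exactly as in the commands above.

```bash
# kafka_tarball_url SCALA_VERSION KAFKA_VERSION [BASE_URL]
# Builds the binary tarball URL following the naming used above.
kafka_tarball_url() {
  printf '%s/%s/kafka_%s-%s.tgz\n' "${3:-https://downloads.apache.org/kafka}" "$2" "$1" "$2"
}

# fetch_kafka SCALA_VERSION KAFKA_VERSION DEST_DIR
# Tries downloads.apache.org first and falls back to archive.apache.org,
# where releases are kept once they are no longer current.
fetch_kafka() {
  local scala=$1 kafka=$2 dest=$3 tmp
  tmp=$(mktemp -d)
  wget -q -O "$tmp/kafka.tgz" "$(kafka_tarball_url "$scala" "$kafka")" ||
    wget -q -O "$tmp/kafka.tgz" \
      "$(kafka_tarball_url "$scala" "$kafka" https://archive.apache.org/dist/kafka)" ||
    { rm -rf "$tmp"; return 1; }
  tar -xzf "$tmp/kafka.tgz" -C "$tmp"
  sudo mv "$tmp/kafka_${scala}-${kafka}" "$dest"
  rm -rf "$tmp"
}

# Example: fetch_kafka 2.13 3.7.0 /opt/kafka
```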

Then add the /opt/kafka/bin directory to the PATH environment variable so the scripts are easier to run.
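One way to make the PATH change persistent is to append an export line to a shell profile. The sketch below does it idempotently, so running it twice does not duplicate the line. It defaults to ~/.bashrc for the current user; for all users, a file such as /etc/profile.d/kafka.sh is a common alternative. The function name is a hypothetical helper.

```bash
# add_kafka_to_path [PROFILE_FILE] [KAFKA_BIN_DIR]
# Appends the export line only once, so running it again is harmless.
add_kafka_to_path() {
  local profile=${1:-$HOME/.bashrc} bin=${2:-/opt/kafka/bin}
  local line="export PATH=\"\$PATH:${bin}\""
  touch "$profile"
  grep -qxF "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
}

# Example: add_kafka_to_path && . ~/.bashrc
```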

ZooKeeper configuration (standalone mode)

In ZooKeeper mode, Kafka relies on ZooKeeper for cluster coordination (since Kafka 3.3, the ZooKeeper-free KRaft mode is also production-ready). For a test environment, you can start ZooKeeper from the same directory:

cd /opt/kafka
bin/zookeeper-server-start.sh config/zookeeper.properties &

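In production, run ZooKeeper as a separate service, ideally as an ensemble with an odd number of nodes sized to the failure tolerance you need: an ensemble of 2f + 1 nodes keeps a majority quorum with up to f nodes down (3 nodes tolerate 1 failure, 5 tolerate 2).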
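For a multi-node ensemble, each server lists all members in its configuration and stores its own numeric id in a myid file inside dataDir. The sketch below generates such a file; the timing values are the ones from ZooKeeper's sample configuration, while the host names, paths and function names are illustrative assumptions. Ports 2888 and 3888 are the conventional peer and leader-election ports.

```bash
# zk_tolerated_failures N: how many nodes an ensemble of N can lose while
# keeping a majority quorum, i.e. floor((N - 1) / 2)
zk_tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

# write_zk_config OUT_FILE DATA_DIR HOST1 HOST2 ...
# Writes a zookeeper.properties for an ensemble (2888 = peers, 3888 = election).
write_zk_config() {
  local out=$1 data_dir=$2 id=1 host
  shift 2
  {
    echo "tickTime=2000"
    echo "initLimit=10"
    echo "syncLimit=5"
    echo "dataDir=${data_dir}"
    echo "clientPort=2181"
    for host in "$@"; do
      echo "server.${id}=${host}:2888:3888"
      id=$((id + 1))
    done
  } > "$out"
}

# write_zk_myid DATA_DIR ID: each node stores its own id in DATA_DIR/myid
write_zk_myid() {
  mkdir -p "$1"
  echo "$2" > "$1/myid"
}

# Example: write_zk_config zookeeper.properties /var/lib/zookeeper zk1 zk2 zk3
```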

Starting the Kafka broker

With ZooKeeper running, start the broker:

bin/kafka-server-start.sh config/server.properties &
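Starting the broker with & ties it to the current shell session. The bundled start scripts also accept a -daemon flag (bin/kafka-server-start.sh -daemon config/server.properties), but for a service that should survive logouts and reboots a systemd unit is the usual choice. The sketch below generates one; it assumes the dedicated user is called kafka and owns /opt/kafka, and that ZooKeeper (if used) is already running. If ZooKeeper is also managed by systemd, you would add After= and Requires= lines naming its unit.

```bash
# write_kafka_unit OUT_FILE [KAFKA_HOME] [RUN_AS_USER]
# Generates a systemd service unit for the broker (user name is an assumption).
write_kafka_unit() {
  local out=$1 home=${2:-/opt/kafka} user=${3:-kafka}
  cat > "$out" <<EOF
[Unit]
Description=Apache Kafka broker
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=${user}
ExecStart=${home}/bin/kafka-server-start.sh ${home}/config/server.properties
ExecStop=${home}/bin/kafka-server-stop.sh
Restart=on-failure
LimitNOFILE=100000

[Install]
WantedBy=multi-user.target
EOF
}

# Example:
#   write_kafka_unit /tmp/kafka.service
#   sudo mv /tmp/kafka.service /etc/systemd/system/kafka.service
#   sudo systemctl daemon-reload && sudo systemctl enable --now kafka
```

LimitNOFILE is set in the unit because services started by systemd do not read /etc/security/limits.conf.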

By default, the broker listens on port 9092. You can check that it is responding by listing the existing topics:

bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
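Right after startup the broker may not accept connections yet, so a script that runs the command above immediately can fail. A generic retry loop, sketched below, repeats the check a few times before giving up. Each attempt is capped with coreutils timeout because the admin client behind kafka-topics.sh may keep retrying internally while no broker answers. The attempt count, delays and function names are assumptions.

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most ATTEMPTS times, sleeping in between.
retry() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while :; do
    "$@" && return 0
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# wait_for_broker [BOOTSTRAP] [KAFKA_BIN_DIR]
# Up to 10 attempts, 3 s apart, each limited to 15 s.
wait_for_broker() {
  local bootstrap=${1:-localhost:9092} bin=${2:-/opt/kafka/bin}
  retry 10 3 timeout 15 "$bin/kafka-topics.sh" --bootstrap-server "$bootstrap" --list >/dev/null
}

# Example: wait_for_broker && echo "broker is up"
```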

Basic topic creation and testing

To create a topic called test-topic with one partition and a replication factor of 1:

bin/kafka-topics.sh --create --topic test-topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
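When topic creation is part of a provisioning script, the --if-not-exists flag of kafka-topics.sh lets the script run repeatedly without failing on an existing topic, and --describe shows the resulting partitions, leaders, replicas and in-sync replicas. The wrapper names and the KAFKA_BIN / BOOTSTRAP variables below are hypothetical conveniences.

```bash
# Assumed locations; override KAFKA_BIN / BOOTSTRAP in the environment if they differ.
KAFKA_BIN=${KAFKA_BIN:-/opt/kafka/bin}
BOOTSTRAP=${BOOTSTRAP:-localhost:9092}

# ensure_topic NAME PARTITIONS REPLICATION_FACTOR
# --if-not-exists makes re-runs succeed instead of failing on an existing topic.
ensure_topic() {
  "$KAFKA_BIN/kafka-topics.sh" --create --if-not-exists \
    --topic "$1" --partitions "$2" --replication-factor "$3" \
    --bootstrap-server "$BOOTSTRAP"
}

# describe_topic NAME: shows partitions, leaders, replicas and ISR
describe_topic() {
  "$KAFKA_BIN/kafka-topics.sh" --describe --topic "$1" --bootstrap-server "$BOOTSTRAP"
}

# Example: ensure_topic test-topic 1 1 && describe_topic test-topic
```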

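Then produce and consume messages to validate the flow. Run the producer and the consumer in separate terminals; each line typed into the producer should appear in the consumer: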

bin/kafka-console-producer.sh --topic test-topic --bootstrap-server localhost:9092
bin/kafka-console-consumer.sh --topic test-topic --from-beginning --bootstrap-server localhost:9092
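The same check can run non-interactively, for example after each deployment. The console producer reads one message per line from stdin, and the console consumer's --timeout-ms option makes it exit after a period without new messages. The sketch below sends a unique marker and looks for it in the consumer output; the function name and the 10-second timeout are assumptions.

```bash
# Assumed locations; override KAFKA_BIN / BOOTSTRAP in the environment if they differ.
KAFKA_BIN=${KAFKA_BIN:-/opt/kafka/bin}
BOOTSTRAP=${BOOTSTRAP:-localhost:9092}

# smoke_test [TOPIC]: produces one unique message and checks it can be read back.
smoke_test() {
  local topic=${1:-test-topic} marker
  marker="smoke-$(date +%s)-$$"
  # The console producer reads one message per line from stdin.
  printf '%s\n' "$marker" |
    "$KAFKA_BIN/kafka-console-producer.sh" --topic "$topic" \
      --bootstrap-server "$BOOTSTRAP" || return 1
  # --timeout-ms stops the consumer after 10 s without new messages;
  # its exit status is ignored, only the grep result counts.
  "$KAFKA_BIN/kafka-console-consumer.sh" --topic "$topic" --from-beginning \
    --bootstrap-server "$BOOTSTRAP" --timeout-ms 10000 2>/dev/null |
    grep -qxF "$marker"
}

# Example: smoke_test test-topic && echo "produce/consume OK"
```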

Linux performance optimization

Key settings for improving Kafka performance on Linux include:

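Raise the open file descriptor limit (ulimit -n) to 65535 or more.
Increase the maximum socket buffer sizes (net.core.rmem_max and net.core.wmem_max) to values such as 25 MB (26214400 bytes).
Use a low-latency file system such as XFS or ext4, mounted with noatime; on ext4, the data=writeback option can further reduce latency at the cost of weaker crash-consistency guarantees for file data.
Set the I/O scheduler to mq-deadline or none for SSDs (called deadline and noop on older kernels without multi-queue block I/O).
Monitor CPU, memory and I/O usage with tools such as top, vmstat and iostat.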
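To see where a host stands before changing anything, the current values can be compared with these targets. The sketch below reads the socket buffer limits from /proc/sys, checks ulimit -n and prints the active I/O scheduler of each block device (the one shown in brackets). SYSCTL_ROOT is a hypothetical hook that only exists to make the check testable; the thresholds are the ones from the list.

```bash
# check_min NAME CURRENT MINIMUM: prints OK/LOW; returns 1 when CURRENT < MINIMUM
check_min() {
  if [ "$2" -ge "$3" ]; then
    echo "OK   $1 = $2"
  else
    echo "LOW  $1 = $2 (suggested >= $3)"
    return 1
  fi
}

# SYSCTL_ROOT defaults to /proc/sys; pointing it elsewhere is only useful for testing.
SYSCTL_ROOT=${SYSCTL_ROOT:-/proc/sys}
TARGET_BUF=$((25 * 1024 * 1024))   # 25 MB = 26214400 bytes

# check_kafka_host: compares the current values with the targets listed above
check_kafka_host() {
  local rc=0 f dev
  check_min "ulimit -n" "$(ulimit -n)" 65535 || rc=1
  check_min net.core.rmem_max "$(cat "$SYSCTL_ROOT/net/core/rmem_max")" "$TARGET_BUF" || rc=1
  check_min net.core.wmem_max "$(cat "$SYSCTL_ROOT/net/core/wmem_max")" "$TARGET_BUF" || rc=1
  # The active I/O scheduler is the one in brackets, e.g. "[mq-deadline] kyber bfq none"
  for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=$(basename "$(dirname "$(dirname "$f")")")
    echo "scheduler $dev: $(cat "$f")"
  done
  return "$rc"
}

# Example: check_kafka_host
```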

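The settings listed above can be made persistent through /etc/security/limits.conf, /etc/sysctl.conf and udev rules. For a broker started by systemd, limits.conf does not apply (it is read by PAM at login), so set LimitNOFILE in the service unit instead.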
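As a sketch of that persistence step, the function below writes drop-in files instead of editing the main files: /etc/sysctl.d/ is read by sysctl --system alongside /etc/sysctl.conf, and pam_limits reads /etc/security/limits.d/ alongside limits.conf. It writes under a ROOT directory so the result can be previewed before installing it with ROOT=/. The file names, the kafka user and the udev match (whole SATA/SAS disks that report as non-rotational) are assumptions to adapt.

```bash
# write_kafka_tuning ROOT [KAFKA_USER]
# Writes drop-in files under ROOT (use / on a real host, a scratch dir to preview).
write_kafka_tuning() {
  local root=$1 user=${2:-kafka}
  mkdir -p "$root/etc/sysctl.d" "$root/etc/security/limits.d" "$root/etc/udev/rules.d"

  # Maximum socket receive/send buffers: 25 MB = 26214400 bytes
  cat > "$root/etc/sysctl.d/99-kafka.conf" <<EOF
net.core.rmem_max = 26214400
net.core.wmem_max = 26214400
EOF

  # Open-file limit for the service account (applies to PAM login sessions)
  cat > "$root/etc/security/limits.d/kafka.conf" <<EOF
${user} soft nofile 100000
${user} hard nofile 100000
EOF

  # mq-deadline for non-rotational (SSD) sdX disks
  cat > "$root/etc/udev/rules.d/60-kafka-scheduler.rules" <<'EOF'
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="mq-deadline"
EOF
}

# On a real host, as root:
#   write_kafka_tuning / && sysctl --system && udevadm control --reload && udevadm trigger
```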

Monitoring and maintenance

To keep the cluster healthy, collect basic metrics through JMX and export them to systems such as Prometheus using the Prometheus JMX exporter. Regularly review the logs in /opt/kafka/logs and configure rotation with logrotate. Finally, plan upgrades carefully: follow the upgrade notes for each version and test in a staging environment before applying changes to production.
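A minimal sketch of both pieces follows. The logrotate stanza uses copytruncate because the broker keeps its log files open; the daily schedule and seven-file retention are assumptions. For JMX, kafka-run-class.sh enables remote JMX when the JMX_PORT variable is set and appends KAFKA_OPTS to the JVM command line, which is where the exporter's -javaagent option goes. The jar path, exporter port 7071 and kafka.yml file are hypothetical placeholders.

```bash
# write_kafka_logrotate OUT_FILE [LOG_DIR]
# copytruncate is used because the broker keeps its log files open while running.
write_kafka_logrotate() {
  local out=$1 dir=${2:-/opt/kafka/logs}
  cat > "$out" <<EOF
${dir}/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
EOF
}

# kafka_monitoring_env [JMX_PORT]: lines for an environment file read before the
# broker starts (e.g. via EnvironmentFile= in a systemd unit).
# The jmx_exporter paths, port 7071 and kafka.yml are hypothetical placeholders.
kafka_monitoring_env() {
  echo "JMX_PORT=${1:-9999}"
  echo "KAFKA_OPTS=-javaagent:/opt/jmx_exporter/jmx_prometheus_javaagent.jar=7071:/opt/jmx_exporter/kafka.yml"
}

# Example:
#   write_kafka_logrotate /tmp/kafka && sudo mv /tmp/kafka /etc/logrotate.d/kafka
#   kafka_monitoring_env | sudo tee /etc/default/kafka
```

If logrotate complains about insecure permissions on the log directory, an su directive naming the service account inside the stanza is the usual fix.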

Conclusion

Deploying Apache Kafka on Linux provides a solid foundation for building scalable, resilient real-time data pipelines. By following the installation steps, tuning system parameters and establishing monitoring practices, you can make the most of Kafka while maintaining stability and the expected performance on critical workloads.