Introduction
In Linux system administration, file synchronization is a daily task that underpins backups, deployments and data replication between servers. A tool that is fast, reliable and light on resources is essential for maintaining the integrity of that data.
The rsync command has become the de facto standard for these operations thanks to its delta-transfer algorithm, which sends only the parts of files that have changed, sharply reducing bandwidth use and transfer time.
What is rsync and how does it work?
rsync (remote sync) is an open source program that copies and synchronizes files and directories both locally and through networks. Its core uses the delta algorithm, which divides the files into blocks and compares checksums to determine which blocks have changed.
Thanks to this approach, rsync transfers only the modified blocks and avoids resending identical data. In addition, it can compress the data stream with the -z option and tunnel it over SSH for encryption, which makes it a complete solution for backup and replication.
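The block comparison described above can be illustrated with a deliberately simplified sketch. It assumes fixed, aligned blocks and uses cksum as the checksum; real rsync additionally uses a rolling checksum so that matching blocks are found at any offset, which this sketch does not attempt. The function name changed_blocks and the 4-byte default block size are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# changed_blocks OLD NEW [BLOCK_SIZE]
# Prints the index (starting at 0) of every fixed-size block of NEW whose
# checksum differs from the block at the same position in OLD.
# Simplified illustration only: this is not rsync's actual algorithm.
changed_blocks() {
    old=$1 new=$2 size=${3:-4}
    new_len=$(wc -c < "$new")
    blocks=$(( (new_len + size - 1) / size ))
    i=0
    while [ "$i" -lt "$blocks" ]; do
        # dd extracts block number i; cksum prints its CRC and length.
        a=$(dd if="$old" bs="$size" skip="$i" count=1 2>/dev/null | cksum)
        b=$(dd if="$new" bs="$size" skip="$i" count=1 2>/dev/null | cksum)
        [ "$a" = "$b" ] || echo "$i"
        i=$((i + 1))
    done
}
```

For two files containing aaaabbbbcccc and aaaaXbbbcccc, calling changed_blocks old new 4 reports only block 1, which is the kind of information that lets a tool send one block instead of the whole file.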
Brief history
Andrew Tridgell and Paul Mackerras began developing rsync in 1996, initially as an improvement on rdist and scp. Since then, it has received numerous contributions from the community and has been included in virtually all Linux distributions.
Today, rsync is used not only on servers but also on desktops, NAS devices and in continuous integration pipelines, which demonstrates its versatility and robustness.
Basic syntax and essential options
The simplest way to invoke rsync is:
rsync [options] source destination
Here source may be an absolute or relative path, and destination is the location where the data will be written; either one (but not both) may be a remote location in the form user@host:path. Some of the most commonly used options are:
- -a (archive): enables recursion and preserves permissions, owners, groups, timestamps and symbolic links.
- -v (verbose): increases the level of detail in the output, listing each file transferred.
- -z (compress): compresses data during transfer, which helps on slow links.
- --progress: shows per-file progress and transfer speed in real time.
- -e ssh: selects SSH as the remote shell, providing encryption and authentication (current rsync versions already use ssh by default for remote transfers).
Advanced options for fine control
Beyond the basic options, rsync offers numerous flags that adapt the transfer to specific scenarios:
- --exclude and --include: filter files by pattern, useful for omitting cache directories or temporary files.
- --delete: removes files from the destination that no longer exist in the source, keeping an exact replica.
- --partial: keeps partially transferred files if the transfer is interrupted, so the task can be resumed later.
- --temp-dir: specifies the directory where temporary files are created during the transfer, which avoids filling a file system that is short on space.
- --checksum: compares files by a full-file checksum (MD5 in many versions; the algorithm depends on the rsync version) instead of by size and timestamp, so changes are detected even when metadata is misleading.
- --bwlimit=KBPS: limits the bandwidth consumed, avoiding network saturation in shared environments.
- --log-file: records all operations in a log file for auditing and troubleshooting.
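Several of these flags are often combined. The following is a minimal sketch, assuming a hypothetical function name sync_with_retry, a hypothetical '*.tmp' exclusion pattern, and two hypothetical environment variables (RSYNC_LOG and RETRY_DELAY); the 2000 bandwidth cap is an arbitrary example value. It retries an interrupted transfer and relies on --partial so that each attempt can pick up partially transferred files.

```shell
#!/bin/sh
# sync_with_retry SRC DEST [MAX_TRIES]
# Retries rsync up to MAX_TRIES times (default 3); returns 0 on the first success.
sync_with_retry() {
    src=$1 dest=$2 max=${3:-3}
    try=1
    while [ "$try" -le "$max" ]; do
        # --partial keeps incomplete files so the next attempt can reuse them;
        # --bwlimit=2000 caps the transfer rate (value in units of 1024 bytes/s).
        if rsync -a --partial --bwlimit=2000 \
                 --exclude='*.tmp' \
                 --log-file="${RSYNC_LOG:-/tmp/rsync_sync.log}" \
                 "$src" "$dest"; then
            return 0
        fi
        echo "attempt $try of $max failed" >&2
        try=$((try + 1))
        # Wait before the next attempt, but not after the last one.
        [ "$try" -le "$max" ] && sleep "${RETRY_DELAY:-5}"
    done
    return 1
}
```

A call might look like sync_with_retry /home/usuario/dev/ usuario@servidor.example.com:/respaldos/dev/ 5, reusing the example paths from this article.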
Practical examples of use
1. Local copy of a directory
To mirror a project directory to a local backup drive:
rsync -avz --progress /home/usuario/proyectos/ /mnt/respaldo/proyectos/
2. Remote synchronization using SSH
Copy a website to a production server, preserving permissions and compressing traffic:
rsync -avz -e ssh --delete /var/www/html/ usuario@servidor.example.com:/var/www/html/
3. Use of exclusions to ignore temporary files
Exclude Node.js dependency directories and Python bytecode caches:
rsync -avz --exclude='node_modules/' --exclude='__pycache__/' /home/usuario/dev/ /respaldos/dev/
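4. Creation of incremental snapshots with --link-dest
A common strategy for daily backups that avoids duplicating identical data, since files unchanged from the snapshot named in --link-dest are hard-linked rather than copied: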
rsync -avz --delete --link-dest=/respaldos/ultimo /home/usuario/ /respaldos/$(date +%F)
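The command above assumes that /respaldos/ultimo already points at the previous snapshot. One way to keep it current, sketched here under the assumption that ultimo is a symbolic link maintained by the script itself, is to repoint it after each successful run. The function name snapshot is hypothetical, and the backup root is expected to be an absolute path.

```shell
#!/bin/sh
# snapshot SRC BACKUP_ROOT
# Creates BACKUP_ROOT/YYYY-MM-DD, hard-linking unchanged files against the
# snapshot that the "ultimo" symlink points to, then repoints the symlink.
snapshot() {
    src=$1 root=$2
    today=$(date +%F)
    # On the very first run "ultimo" does not exist yet; rsync then simply
    # copies everything because there is nothing to link against.
    rsync -a --delete --link-dest="$root/ultimo" "$src" "$root/$today" || return 1
    # Update the symlink only after rsync succeeded. -n replaces the link
    # itself instead of creating a new link inside the old target directory.
    ln -sfn "$today" "$root/ultimo"
}
```

With the paths used in this article, the call would be snapshot /home/usuario/ /respaldos.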
5. Transfer from a list file
When you need to synchronize only a subset of files listed in a text file (paths in the list are relative to the source directory):
rsync -avz --files-from=/home/usuario/lista.txt /origen/ /destino/
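The list file holds one path per line, each relative to the source directory. As a rough sketch, the list could be generated with find; the function names make_list and sync_from_list and the '*.conf' pattern are hypothetical and stand in for whatever selection rule applies.

```shell
#!/bin/sh
# make_list SRC_DIR LIST_FILE
# Writes the paths of all *.conf files under SRC_DIR to LIST_FILE,
# one per line and relative to SRC_DIR, as --files-from expects.
make_list() {
    ( cd "$1" && find . -type f -name '*.conf' | sed 's|^\./||' | sort ) > "$2"
}

# sync_from_list SRC_DIR LIST_FILE DEST_DIR
# Transfers only the listed files. With --files-from, -a does not imply
# recursion, so directories named in the list are not descended into.
sync_from_list() {
    rsync -a --files-from="$2" "$1" "$3"
}
```

For example, make_list /origen /home/usuario/lista.txt followed by sync_from_list /origen/ /home/usuario/lista.txt /destino/ would reproduce the command above for that subset.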
Good practices and performance recommendations
- Always try --dry-run before running an operation that can delete or overwrite data.
- Use absolute paths to avoid ambiguity, especially in tasks scheduled with cron.
- Combine -z with --progress (or --stats for a summary) to observe the effect of compression.
- On high-latency networks, consider experimenting with a larger block size via --block-size (by default rsync chooses it from each file's size, starting at 700 bytes) to reduce the number of comparisons.
- Monitor CPU and disk usage; on heavily loaded systems, disabling compression (-z) can improve performance if bandwidth is not the bottleneck.
- Keep rsync updated; recent versions include improvements in the handling of sparse files and in compatibility with modern file systems such as Btrfs and ZFS.
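The first recommendation can be turned into a habit with a small wrapper. This is a minimal interactive sketch, assuming a hypothetical function name preview_then_sync; it previews a destructive synchronization and applies it only after confirmation.

```shell
#!/bin/sh
# preview_then_sync SRC DEST
# Shows what "rsync -a --delete" would change, then asks before doing it.
preview_then_sync() {
    src=$1 dest=$2
    # -n (--dry-run) makes no changes; -i (--itemize-changes) lists each
    # file that would be transferred or deleted.
    rsync -a --delete -n -i "$src" "$dest" || return 1
    printf 'Apply these changes? [y/N] '
    read -r answer
    if [ "$answer" != "y" ]; then
        echo "aborted"
        return 1
    fi
    rsync -a --delete "$src" "$dest"
}
```

Because it reads from standard input, this wrapper suits manual runs rather than cron jobs.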
Automation with cron and scripts
The real power of rsync shows when it is scheduled to run automatically. A typical crontab entry for a daily backup at 02:30 would be:
30 2 * * * /usr/bin/rsync -az --delete --log-file=/var/log/rsync_backup.log /home/usuario/ /respaldos/diario/ >> /dev/null 2>&1
In addition, calls can be wrapped in shell scripts that verify connectivity, send mail notifications in case of failures and maintain a version history.
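Such a wrapper might look like the following sketch. It assumes a hypothetical function name backup_remote and reports failures on standard error; cron normally mails a job's output to the job owner (or to the MAILTO address), so that is one way to get notified without calling a mail program directly, provided the crontab entry does not discard the output.

```shell
#!/bin/sh
# backup_remote HOST SRC DEST
# Checks that HOST is reachable over SSH, then pushes SRC to HOST:DEST.
# Returns 2 if the host is unreachable, otherwise rsync's own exit status.
backup_remote() {
    host=$1 src=$2 dest=$3
    # BatchMode makes ssh fail instead of prompting for a password;
    # ConnectTimeout bounds how long the connectivity check may take.
    if ! ssh -o BatchMode=yes -o ConnectTimeout=10 "$host" true; then
        echo "backup: cannot reach $host" >&2
        return 2
    fi
    status=0
    rsync -az --delete "$src" "$host:$dest" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "backup: rsync exited with status $status" >&2
    fi
    return "$status"
}
```

A version history could be layered on top by pointing DEST at a dated directory, as in the --link-dest example.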
Security and use of SSH
When a transfer crosses untrusted networks, protecting the data is essential. The -e ssh option ensures that all traffic is encrypted. For greater security, a passphrase-less SSH key can be restricted to a single command with the command="rsync --server ..." option (completed with the exact server arguments for the transfer) in the ~/.ssh/authorized_keys file, which limits that key to synchronization only.
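As an illustration of such a restricted key, the sketch below prints an authorized_keys line. It swaps in rrsync, the restriction helper script distributed with rsync, as the forced command instead of a hand-written rsync --server string; that it is installed and on the PATH of the remote account is an assumption, as its location varies between distributions. The function name restricted_key_line is hypothetical.

```shell
#!/bin/sh
# restricted_key_line DIR PUBKEY
# Prints an authorized_keys entry that confines the key to rsync transfers
# inside DIR and disables terminals and all forwarding for that key.
restricted_key_line() {
    dir=$1 pubkey=$2
    # Assumption: rrsync is available on the remote account's PATH.
    printf 'command="rrsync %s",no-pty,no-port-forwarding,no-agent-forwarding,no-X11-forwarding %s\n' \
        "$dir" "$pubkey"
}
```

The output would be appended to ~/.ssh/authorized_keys on the server, for example with restricted_key_line /respaldos "$(cat id_backup.pub)", where id_backup.pub is a placeholder for the public key file. rrsync also accepts -ro or -wo before the directory to allow only reading or only writing.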
Another layer of protection is to use --rsync-path to point at a wrapper on the remote host that logs each invocation or runs rsync inside a chroot environment.
Conclusion
rsync combines efficiency, flexibility and security in a single command-line tool. Its delta algorithm minimizes bandwidth use, while its wide range of options lets it adapt to virtually any backup, deployment or replication scenario. Mastering rsync is an essential skill for any Linux administrator who wants to keep data synchronized reliably and efficiently.


