The Linux shuf command: shuffle lines randomly

Introduction

In the Linux command line world, there are numerous utilities designed to handle text quickly and efficiently. Among them, shuf stands out for its ability to shuffle input lines randomly, a task that is useful for testing, generating sample data, creating random lists and many other scenarios. This article explores the shuf command in depth: its syntax, its most common options and some practical examples that you can apply immediately in your workflow.

What exactly does shuf do?

shuf belongs to the GNU coreutils package and its main function is to read lines from a file or from standard input and reorder them randomly. Each execution normally produces a different permutation, unless the source of randomness is fixed. The tool does not modify the original file; it simply writes the result to standard output, so it can be redirected to another file or chained with other commands through pipes.

Basic syntax

The most direct way to use shuf is to indicate the input file:

shuf file_name.txt

If no file is specified, shuf will read from the standard input, which allows it to be combined with other commands:

cat lista.txt | shuf

The result will be the same lines but in an unpredictable order every time you invoke it.

Most commonly used options

  • -n NUM or --head-count=NUM: outputs at most NUM lines after shuffling. This is equivalent to taking a random sample of size NUM.
  • -e or --echo: treats each command-line argument as an input line. For example, shuf -e orange pear apple will produce a permutation of those three words.
  • -i LO-HI or --input-range=LO-HI: generates the sequence of integers from LO to HI, both inclusive, and then shuffles it. Very useful for creating draws or selecting random indexes.
  • -r or --repeat: allows the output to contain repeated lines. Without this option, each input line appears at most once in the result. Combine it with -n, because otherwise the output continues indefinitely.
  • -z or --zero-terminated: changes the line delimiter to the NUL character, which makes it easier to handle file names containing spaces or line breaks.
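
As a quick illustration of the options above, the following sketch runs three of them on small inputs. The word list and ranges are arbitrary values chosen for the example, and the output changes from run to run.

```shell
# Shuffle three words passed as arguments (-e / --echo).
shuf -e orange pear apple

# Pick 3 distinct numbers from the range 1..10 (-i combined with -n).
shuf -i 1-10 -n 3

# Simulate 5 dice rolls: -r allows the same value to appear more than once.
# -n bounds the output; without it, -r would keep printing lines.
shuf -r -i 1-6 -n 5
```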

Practical examples

Sample lines from a log file

Suppose you have a log.txt file with thousands of entries and want to get a random sample of 100 lines for quick inspection:

shuf -n 100 log.txt

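This command selects 100 lines from the file at random, without repeating any of them, and prints them in random order. Because every line has the same chance of being chosen, the result is a reasonable sample of the full content for a quick inspection.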

Generate lottery numbers

To simulate a six-number draw between 1 and 49, you can use the input range:

shuf -i 1-49 -n 6

Each execution will most likely produce a different set of six numbers, and no number appears twice within a draw as long as the --repeat option is not used.
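
If the draw should be printed in ascending order, as lottery results usually are, the output can be passed through sort. The function name draw_numbers below is a hypothetical helper introduced only for this sketch.

```shell
# Hypothetical helper: draw COUNT distinct numbers between LO and HI
# and print them in ascending numeric order.
# Usage: draw_numbers LO HI COUNT
draw_numbers() {
  shuf -i "$1-$2" -n "$3" | sort -n
}

# Six numbers between 1 and 49, one per line.
draw_numbers 1 49 6
```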

Create a random list of words

If you have a dictionary file with one word per line called dictionario.txt and you want to get ten words at random:

shuf -n 10 dictionario.txt

This technique is often used in the generation of memorable passwords or in the creation of test data for applications.
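
As a minimal sketch of the passphrase idea, the selected words can be joined on a single line with paste. The helper name make_passphrase and the hyphen separator are assumptions made for this example, not something shuf provides.

```shell
# Hypothetical helper: pick COUNT random words from WORDLIST
# (one word per line) and join them with hyphens.
# Usage: make_passphrase WORDLIST COUNT
make_passphrase() {
  wordlist=$1
  count=$2
  shuf -n "$count" -- "$wordlist" | paste -sd '-' -
}

# Example call, assuming dictionario.txt exists in the current directory:
# make_passphrase dictionario.txt 4
```

The strength of such a passphrase depends on the size of the word list and on the number of words chosen.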

Combine shuf with other commands

shuf becomes even more useful in pipelines. For example, to obtain a random sample of the processes that consume the most memory:

ps -eo pid,ppid,cmd,%mem --sort=-%mem --no-headers | head -n 20 | shuf -n 5

First all processes are listed in descending order of memory use, without the header line; head keeps the 20 largest consumers, and shuf then picks five of them at random. Without the head step, shuf would sample from all processes and the sort would have no effect on the result.

Handling file names with spaces

When file names may contain spaces or special characters, it is recommended to use the NUL delimiter:

find . -type f -print0 | shuf -z -n 5 | xargs -0 ls -lh

This command finds all regular files under the current directory, passes them to shuf as a NUL-delimited stream, selects five at random and then displays their details with ls.

Alternatives and complements

Although shuf is the most direct tool for shuffling lines, other options can be useful depending on the context. The command sort -R also produces a random order, but it works differently: it sorts by a random hash of the keys, so identical lines always end up next to each other and the result is not a uniform permutation when the input contains duplicates. Another alternative is to use awk with the rand() function to assign a random number to each line and then sort by that value. These techniques can be useful in environments where shuf is not available or when finer control over the random seed is needed.
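
The awk approach can be sketched as a decorate, sort, undecorate pipeline. The function name shuffle_with_awk is hypothetical, and this is only one possible way to write it.

```shell
# Hypothetical helper: shuffle standard input without shuf.
# 1. awk prefixes each line with a random number and a tab.
# 2. sort orders the lines by that numeric prefix.
# 3. cut removes the prefix again.
# Note: srand() without arguments seeds from the current time in seconds,
# so two runs within the same second produce the same order.
shuffle_with_awk() {
  awk 'BEGIN { srand() } { printf "%.10f\t%s\n", rand(), $0 }' |
    LC_ALL=C sort -n |
    cut -f2-
}

printf '%s\n' orange pear apple | shuffle_with_awk
```

To control the seed explicitly, srand() can be given a number, for example by passing it in with awk -v.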

Set a seed for reproducibility

In some scenarios, such as debugging scripts or generating reports that must be identical between executions, it is useful to fix the source of randomness. shuf does not take a numeric seed; instead, it provides the --random-source=FILE option, which makes it read its random bytes from FILE rather than from the system's default source. For example, running shuf --random-source=./semilla.bin -n 10 datos.txt makes shuf take its random bytes from semilla.bin, which produces the same result as long as the contents of semilla.bin and datos.txt do not change. The file must contain enough bytes for the requested shuffle; otherwise shuf stops with an end-of-file error.
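
The following sketch shows the idea with a temporary file filled once from /dev/urandom and then reused. The 4096-byte size is an arbitrary assumption that is more than enough for this small example.

```shell
# Create a fixed pool of random bytes once and reuse it for every call.
seed_file=$(mktemp)
head -c 4096 /dev/urandom > "$seed_file"

# Both calls read the same bytes, so they select the same numbers.
first=$(shuf --random-source="$seed_file" -i 1-100 -n 10)
second=$(shuf --random-source="$seed_file" -i 1-100 -n 10)

if [ "$first" = "$second" ]; then
  echo "both runs produced the same output"
fi

rm -f "$seed_file"
```

Keeping the seed file under version control, instead of regenerating it, makes the result repeatable across sessions as well, provided the input and the shuf version stay the same.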

Performance tips

shuf generally needs to hold all the input in memory in order to produce a uniform permutation. Therefore, with very large files (several gigabytes), RAM consumption can become high. In such cases, you can divide the file into smaller pieces using split, apply shuf to each fragment and then combine the results with cat or an additional shuffle; note that this only approximates a full shuffle, because each line stays within its own fragment. Another option is a one-liner such as perl -MList::Util=shuffle -e 'print shuffle <>', which can be handy where shuf is not installed, although it also reads every line into memory before shuffling.

Good practices when using shuf in scripts

When you integrate shuf into shell scripts, consider the following points:

  • Always check the exit status of shuf to detect errors, especially when reading from files that might be missing or inaccessible.
  • Use absolute paths or reliable relative paths to avoid surprises when the script is executed from different directories.
  • If you need reproducible results for testing, create a fixed random-source file at the beginning of the script and pass it to every shuf call with --random-source.
  • Keep temporary files to a minimum; if you create intermediate files, ensure they are cleaned up in a trap or at the end of the script.
  • When dealing with user-supplied input, validate that the data does not contain unexpected null bytes if you are not using the -z option.
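
Several of these points can be combined in a small wrapper function. This is a minimal sketch under stated assumptions: the name sample_lines and its argument order are hypothetical, and the body runs in a subshell so that the EXIT trap only affects this function.

```shell
# Hypothetical helper: write COUNT random lines of INPUT to OUTPUT.
# Usage: sample_lines INPUT COUNT OUTPUT
# The parentheses make the body run in a subshell, so the trap below
# fires when the function finishes and does not touch the caller's traps.
sample_lines() (
  input=$1
  count=$2
  output=$3

  tmp=$(mktemp) || exit 1
  # Remove the temporary file however the subshell ends.
  trap 'rm -f "$tmp"' EXIT

  # Check the exit status of shuf, e.g. for a missing or unreadable file.
  if ! shuf -n "$count" -- "$input" > "$tmp"; then
    echo "sample_lines: shuf failed on '$input'" >&2
    exit 1
  fi

  # Only create the output once shuf has succeeded.
  cp "$tmp" "$output"
)

# Example call, assuming log.txt exists in the current directory:
# sample_lines log.txt 100 sample.txt
```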

Conclusion

The shuf command is a simple but powerful tool for any Linux user who needs to randomize lines of text. Its clear syntax, combined with useful options such as -n, -e, -i and --repeat, makes it a valuable ally for testing, data generation, draws and task automation. By understanding how it works and what its limitations are, you can integrate it effectively into scripts and daily workflows, saving time and getting unpredictable results when you need them.

This work is licensed under a Creative Commons Attribution 4.0 International License by Francesc Roig (francesc@vivaldi.net).
