The cut command in Linux: extract text columns

Introduction

In system administration and data processing, tools that handle text quickly are essential. One of the simplest and most useful Linux commands is cut, whose main function is to extract columns or fields from a text stream based on delimiters or on byte or character positions. Although it may seem limited at first sight, combining it with pipes and other commands makes it a key building block for scripts and log analysis.

Basic syntax

The general form of the command is:

cut OPTION... [FILE]...

If no file is given, cut reads from standard input, which makes it ideal for use in pipes. The most important options define which part of each line is extracted and how the fields are separated.
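As a minimal sketch of reading from standard input, the following pipes one tab-separated line into cut without naming a file. The sample words and the variable name second are illustrative assumptions, not part of the original article.

```shell
# No FILE argument: cut reads the line that printf writes to the pipe.
# The default delimiter is the tab character, so -d is not needed here.
second=$(printf 'alpha\tbeta\tgamma\n' | cut -f 2)
echo "$second"   # prints: beta
```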

Most commonly used options

  • -f N: selects field N (e.g., -f 2 for the second field). Several fields can be given as a comma-separated list (-f 1,3,5) or as a range (-f 2-4).
  • -d DELIM: sets the delimiter that separates the fields. By default, cut uses the tab character, but with this option you can specify any single character, such as a comma (-d ',') or a semicolon (-d ';').
  • -b LIST: extracts bytes according to LIST (useful when working with binary or fixed-width data).
  • -c LIST: extracts characters by position, similar to -b but counting characters instead of bytes.
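The options above can be tried on a single line of text. This is only a sketch: the semicolon-separated sample string and the variable names are made up for illustration, and the text is plain ASCII so that bytes and characters coincide.

```shell
line='one;two;three;four'   # hypothetical sample data

fields=$(printf '%s\n' "$line" | cut -d ';' -f 1,3)   # fields 1 and 3
range=$(printf '%s\n' "$line" | cut -d ';' -f 2-4)    # fields 2 through 4
bytes=$(printf '%s\n' "$line" | cut -b 1-3)           # bytes 1 to 3
chars=$(printf '%s\n' "$line" | cut -c 5-7)           # characters 5 to 7

echo "$fields"   # one;three
echo "$range"    # two;three;four
echo "$bytes"    # one
echo "$chars"    # two
```

Note that selected fields are joined again with the input delimiter, which is why the first result still contains a semicolon.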

Practical examples

Let's imagine a CSV file called datos.csv with the following content:

name,age,city
Juan,30,Madrid
Ana,25,Barcelona
Luis,28,Sevilla

To get only the city column, we use:

cut -d ',' -f 3 datos.csv

This returns:

city
Madrid
Barcelona
Sevilla

If we want to drop the header and keep only the values, we can combine it with tail:

cut -d ',' -f 3 datos.csv | tail -n +2

Another common case is to extract the first and third fields:

cut -d ',' -f 1,3 datos.csv

Result:

name,city
Juan,Madrid
Ana,Barcelona
Luis,Sevilla
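The CSV examples can be reproduced end to end with a throwaway file. This sketch assumes the file holds a name,age,city header plus three rows with no spaces after the commas; it writes to a temporary path from mktemp rather than to datos.csv so nothing in the working directory is touched.

```shell
# Create a temporary stand-in for datos.csv (contents are an assumption).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
name,age,city
Juan,30,Madrid
Ana,25,Barcelona
Luis,28,Sevilla
EOF

# Third column without the header line.
cities=$(cut -d ',' -f 3 "$tmp" | tail -n +2)

# First and third columns, header included.
pairs=$(cut -d ',' -f 1,3 "$tmp")

printf '%s\n' "$cities"
printf '%s\n' "$pairs"

rm -f "$tmp"
```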

When fields are separated by a run of several spaces rather than by a single character, we can use -d ' ' and first squeeze the repeated spaces with tr -s ' ' before running cut.
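A short sketch of that combination follows. The sample line is invented to resemble space-aligned command output, and the variable names are illustrative.

```shell
line='root     1234   0.0   bash'   # hypothetical space-aligned record

# tr -s ' ' squeezes each run of spaces into one, so cut sees clean fields.
pid=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 2)
echo "$pid"   # prints: 1234

# Caveat: a line that starts with spaces still begins with one space after
# squeezing, so field 1 is empty and the data starts at field 2.
first=$(printf '   a b\n' | tr -s ' ' | cut -d ' ' -f 1)
```

Without tr -s, every extra space would count as its own delimiter and produce empty fields, so the field numbers would shift.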

For log files where the information sits at fixed positions, the -c option is very practical. Suppose each line has a 19-character timestamp followed by a message; to get only the message:

cut -c 20- archivo.log

This prints everything from character 20 to the end of each line.
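The sketch below applies this to one made-up log entry. It assumes a 19-character timestamp in the form YYYY-MM-DD HH:MM:SS followed by a single space and then the message; under that assumption character 20 is the separating space, so starting at 21 gives the message without it.

```shell
entry='2024-01-15 10:30:00 Service started'   # hypothetical log line

stamp=$(printf '%s\n' "$entry" | cut -c 1-19)    # the timestamp only
rest=$(printf '%s\n' "$entry" | cut -c 20-)      # includes the leading space
message=$(printf '%s\n' "$entry" | cut -c 21-)   # message without the space

echo "$stamp"     # 2024-01-15 10:30:00
echo "$message"   # Service started
```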

Tips and tricks

  • Remember that cut does not support regular expressions as delimiters; if you need something more complex, use awk instead.
  • Use single quotes around the delimiter to prevent the shell from interpreting special characters.
  • When you work with ranges, -f 2- means from field 2 to the last one, while -f -2 means from the first field up to field 2.
  • To get a quick look at the structure of a file, try cut -f 1-5 -d ',' archivo.csv | head.
  • In scripts, save the result in a variable: COL2=$(cut -d ';' -f 2 entrada.txt).
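Several of these tips can be checked with a few lines of shell. This is a sketch under stated assumptions: the sample strings are invented, the input comes from printf rather than from entrada.txt, and the awk field separator '[,;] *' is just one possible regular expression for mixed delimiters.

```shell
row='a;b;c;d'   # hypothetical semicolon-separated record

from_two=$(printf '%s\n' "$row" | cut -d ';' -f 2-)    # field 2 to the end
up_to_two=$(printf '%s\n' "$row" | cut -d ';' -f -2)   # start to field 2
COL2=$(printf '%s\n' "$row" | cut -d ';' -f 2)         # result kept in a variable

echo "$from_two"    # b;c;d
echo "$up_to_two"   # a;b
echo "$COL2"        # b

# Mixed delimiters need a regular expression, which cut cannot express.
# awk accepts one as its field separator: a comma or semicolon followed
# by any number of spaces.
mixed='a, b;  c'
third=$(printf '%s\n' "$mixed" | awk -F '[,;] *' '{ print $3 }')
echo "$third"       # c
```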

Conclusion

The cut command is a lightweight but powerful tool for extracting text columns in Linux. Its simplicity makes it ideal for quick data processing tasks, while its ability to combine with other shell utilities makes it an indispensable part of any administrator's toolbox. Mastering its options and knowing when to use it will save you time and help you write cleaner, more efficient scripts.

This work is licensed under a Creative Commons Attribution 4.0 International License by Francesc Roig (francesc@vivaldi.net).
