The awk command in Linux: advanced text processing

Introduction

The awk command is one of the most powerful text-processing tools in the Unix/Linux environment. Although its syntax may seem cryptic at first, it lets you filter, transform and analyze data quickly without writing complex scripts in other languages. This article covers everything from basic concepts to advanced techniques that help you get the most out of awk.

Basic syntax

The general form of an awk command is: awk 'pattern { action }' file. If the pattern is omitted, the action runs on every input line. If the action is omitted, each matching line is printed in full by default. Fields are accessed with $1, $2, and so on, while $0 represents the whole line.
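As a minimal sketch of the three forms just described, the commands below run awk on a small made-up data set (the names, departments and numbers are purely illustrative, as are the shell variable names):

```shell
# Hypothetical sample input: name, department, amount
data='alice sales 120
bob ops 80
carol sales 95'

# Pattern and action: print the first field of lines whose third field exceeds 90
both=$(printf '%s\n' "$data" | awk '$3 > 90 { print $1 }')

# Action only: the action runs on every line
action_only=$(printf '%s\n' "$data" | awk '{ print $2 }')

# Pattern only: matching lines are printed in full ($0)
pattern_only=$(printf '%s\n' "$data" | awk '/ops/')

printf '%s\n' "$both" "$action_only" "$pattern_only"
```

The first command prints alice and carol, the second prints the department column, and the third prints the whole "bob ops 80" line.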

Field separators

By default, awk treats any run of spaces or tabs as a field separator. This can be changed with the -F option or the built-in variable FS. For example, to process a simple CSV file (one without quoted fields that contain commas), use: awk -F',' '{print $1,$3}' file.csv. A regular expression can also serve as the separator, which is useful when fields are divided by several different characters.
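To illustrate a regular expression used as a separator, here is a small sketch on an invented record that mixes commas and semicolons. The sample line and the variable names are assumptions made for the example.

```shell
# Hypothetical record that mixes ", " and ";" as delimiters
line='alice, 30;madrid'

# -F with a regular expression: a comma or semicolon followed by optional spaces
fields=$(printf '%s\n' "$line" | awk -F'[,;] *' '{ print $1 "|" $2 "|" $3 }')

# The same separator set through FS in a BEGIN block
via_fs=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[,;] *" } { print NF }')

printf '%s\n' "$fields" "$via_fs"
```

Both forms split the record into three fields. Setting FS inside BEGIN matters because the assignment has to happen before the first line is read.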

Built-in variables

Awk provides several useful variables: NR (current record number), NF (number of fields in the current record), FILENAME (name of the current input file), OFS (output field separator) and ORS (output record separator). Modifying OFS and ORS lets you change the output format without affecting the processing logic.
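The following sketch shows NR, NF, OFS and ORS at work on two invented lines of input. One detail worth knowing: OFS only affects $0 once awk rebuilds the record, which the assignment $1 = $1 forces.

```shell
# Hypothetical two-line input
data='a b c
d e'

# NR is the record number, NF the field count of the current record
numbered=$(printf '%s\n' "$data" | awk '{ print NR ": " NF " fields" }')

# Assigning to a field makes awk rebuild $0 with OFS between the fields
rejoined=$(printf '%s\n' "$data" | awk 'BEGIN { OFS = "-" } { $1 = $1; print }')

# ORS replaces the newline normally written after each print
one_line=$(printf '%s\n' "$data" | awk 'BEGIN { ORS = ";" } { print $1 }')

printf '%s\n' "$numbered" "$rejoined" "$one_line"
```

The last command prints a;d; on a single line, since every record now ends with a semicolon instead of a newline.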

Control structures

Awk supports if-else, for and while loops, and associative arrays. These make more complex accumulations, counts and other operations possible. For example, to count how many times each word appears in a text, you can use:

awk '{for(i=1;i<=NF;i++) freq[$i]++} END {for(w in freq) print w,freq[w]}' file

The END block runs after the entire input has been processed, which makes it ideal for printing summaries.
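As a hedged illustration of the other control structures, the sketch below sums the fields of each line with a while loop and labels the result with if-else. The threshold of 20 and the sample numbers are arbitrary choices for the example. It also reruns the word count on a tiny input, piping the result through sort because the order in which for (w in freq) visits the keys is not specified.

```shell
# Hypothetical numeric input
data='3 4 5
10 20'

totals=$(printf '%s\n' "$data" | awk '{
    sum = 0
    i = 1
    while (i <= NF) {          # walk the fields one by one
        sum += $i
        i++
    }
    if (sum > 20) {            # arbitrary threshold for the example
        print sum, "high"
    } else {
        print sum, "low"
    }
}')

# Word count on a small sample; sort gives a stable order
counts=$(printf 'a b a\nb a\n' |
    awk '{ for (i = 1; i <= NF; i++) freq[$i]++ } END { for (w in freq) print w, freq[w] }' |
    sort)

printf '%s\n' "$totals" "$counts"
```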

Built-in functions

Awk includes mathematical and string functions, and some implementations such as gawk add time functions. Some of the most used are length() (length of a string), substr() (extract a substring), match() (regular expression matching), toupper() and tolower(). You can also define your own functions with the function keyword.
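A short sketch of these string functions, together with a user-defined one, might look like this. The function name capitalize and the input text are invented for the example.

```shell
out=$(printf 'hello world\n' | awk '
# User-defined function: upper-case the first character of s
function capitalize(s) {
    return toupper(substr(s, 1, 1)) substr(s, 2)
}
{
    print length($0)                  # number of characters in the line
    print substr($0, 1, 5)            # 5 characters starting at position 1
    if (match($0, /wor/))             # sets RSTART and RLENGTH on success
        print RSTART, RLENGTH
    print capitalize($2), tolower("ABC")
}')

printf '%s\n' "$out"
```

Note that string positions in awk start at 1, so match() reports the start of "wor" as position 7.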

Practical examples

Log analysis

Suppose an Apache access log in the format: IP - - [date] "method path protocol" status size. To find the IPs that made the most requests:

awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -10

Or, directly with awk:

awk '{ip[$1]++} END {for(i in ip) print ip[i],i}' access.log | sort -nr
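To see what the awk version produces, here is a sketch on three invented log lines. The second command is an extra illustration that counts requests per status code. It assumes the date contains a space before the time zone, as in Apache's common log format, which puts the status in field 9 under default whitespace splitting.

```shell
# Hypothetical sample lines in the format described above
log='10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512
10.0.0.2 - - [01/Jan/2024:10:00:01 +0000] "GET /about.html HTTP/1.1" 200 256
10.0.0.1 - - [01/Jan/2024:10:00:02 +0000] "GET /missing HTTP/1.1" 404 128'

# Requests per IP, busiest first
top=$(printf '%s\n' "$log" |
    awk '{ ip[$1]++ } END { for (i in ip) print ip[i], i }' | sort -nr)

# Requests per status code (assumes the status is field 9, see above)
by_status=$(printf '%s\n' "$log" |
    awk '{ status[$9]++ } END { for (s in status) print s, status[s] }' | sort)

printf '%s\n' "$top" "$by_status"
```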

CSV processing

To extract the second and fourth fields from a CSV file and change the output separator to a semicolon:

awk -F',' -v OFS=';' '{print $2,$4}' datos.csv

Filtering by condition

Show only the lines where the third field is greater than 100:

awk '$3 > 100' file.txt

Tips and tricks

Use -v to pass shell values to awk as variables: awk -v lim=50 '$2 > lim' file
Combine awk with other commands through pipes to create powerful workflows.
When you need control over numeric precision, use the sprintf function to set the floating-point format.
Save awk scripts in files with the .awk extension and run them with awk -f script.awk file for greater readability.
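The tips above can be sketched together as follows. The sample data, the variable names and the script name total.awk are assumptions for the example, and mktemp -d is used only to get a scratch directory for the script file.

```shell
# -v copies a shell value into an awk variable before the program starts
limit=50
above=$(printf 'a 40\nb 60\nc 75\n' | awk -v lim="$limit" '$2 > lim { print $1 }')

# sprintf returns a formatted string; here the average is rounded to two decimals
avg=$(printf '1\n2\n2\n' | awk '{ s += $1 } END { print sprintf("%.2f", s / NR) }')

# A program stored in a file is run with -f (total.awk is a hypothetical name)
dir=$(mktemp -d)
cat > "$dir/total.awk" <<'EOF'
# total.awk: sum the second column
{ total += $2 }
END { print total }
EOF
total=$(printf 'a 40\nb 60\nc 75\n' | awk -f "$dir/total.awk")
rm -r "$dir"

printf '%s\n' "$above" "$avg" "$total"
```

Passing the shell variable through -v, rather than splicing it into the program text, keeps the awk code inside single quotes and avoids quoting problems.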

Conclusion

Awk remains an indispensable tool for any system administrator, developer or data analyst working in Linux environments. Its ability to combine patterns, actions and structured programming makes it a Swiss Army knife for text processing. Practicing the examples presented here and exploring the official documentation will let you automate tasks that previously required longer and more complex scripts.