Scripting in Unix

A single line of AWK is enough to extract results from a text file.

For example:
Suppose you have a text file containing results like the following, and you are interested in the number of packets. You could manually copy the value from each line and paste it into an Excel sheet or some other graph-plotting tool, but you can extract the same data with one line of script. Read on... it's really simple.

CSThreshold: 3.652128e-10, packets: 21270, countBytes: 21780480, Num_Collisions: 19331
CSThreshold: 3.121854e-10, packets: 16057, countBytes: 16442368, Num_Collisions: 38348
CSThreshold: 2.684423e-10, packets: 15345, countBytes: 15713280, Num_Collisions: 42943
CSThreshold: 2.320993e-10, packets: 15110, countBytes: 15472640, Num_Collisions: 27455
CSThreshold: 2.017038e-10, packets: 15473, countBytes: 15844352, Num_Collisions: 16108
CSThreshold: 1.761250e-10, packets: 15692, countBytes: 16068608, Num_Collisions: 2641
 

prompt$ cat grid-output40.txt | awk '{print $4}' RS="\n" FS=",*:*"

This script extracts a single column (field) from the above output file.


The script prints the packets value from each line, without the commas. Sample output is as follows:

21270
16057
15345
15110
15473
15692


This is a simple and powerful way of parsing a file that takes little effort and saves a lot of time.

The parts of the command are explained below:

RS stands for record separator, i.e. how the records are separated. Here I have set it to "\n", which is also the default, so each line of the file is one record.

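FS stands for field separator and can be any regular expression. Here I have set it to ",*:*", which matches a run of commas followed by a run of colons, so in practice a field ends at a comma or a colon. Note that setting FS replaces the default whitespace splitting, so the space after each separator stays at the front of the next field and the printed values may carry a leading space.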
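
Since an explicit FS replaces whitespace splitting, a separator that also swallows the spaces after each comma or colon gives cleaner fields. The following is a minimal sketch, assuming the same line layout as the sample data; the file name sample.txt and the pattern "[,:] *" are illustrative choices, not part of the original command:

```shell
# Recreate two lines of the sample data in a scratch file (hypothetical name).
cat > sample.txt <<'EOF'
CSThreshold: 3.652128e-10, packets: 21270, countBytes: 21780480, Num_Collisions: 19331
CSThreshold: 3.121854e-10, packets: 16057, countBytes: 16442368, Num_Collisions: 38348
EOF

# "[,:] *" means: a comma or a colon, followed by any number of spaces.
# With this separator $4 is the packets value with no leading blank.
packets=$(awk -F'[,:] *' '{print $4}' sample.txt)
echo "$packets"
```

Passing the separator with -F is equivalent to writing FS="..." after the program text; either form should work here.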

$4 refers to the fourth field of each record, which here is the packets value.
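
Other fields can be picked out the same way, and several can be printed together. As a rough sketch (assuming the sample layout above, with sample.txt and the "[,:] *" separator being illustrative choices), this prints the CSThreshold and packets values as comma-separated pairs that a spreadsheet or plotting tool could read:

```shell
# Recreate two lines of the sample data in a scratch file (hypothetical name).
cat > sample.txt <<'EOF'
CSThreshold: 3.652128e-10, packets: 21270, countBytes: 21780480, Num_Collisions: 19331
CSThreshold: 3.121854e-10, packets: 16057, countBytes: 16442368, Num_Collisions: 38348
EOF

# $2 is the CSThreshold value and $4 is the packets value;
# the "," between them is a literal string joined into the output line.
pairs=$(awk -F'[,:] *' '{print $2 "," $4}' sample.txt)
echo "$pairs"
```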


Both of these variables have default values and only need to be set explicitly if you want different separators. The default value for RS is "\n" and for FS is white space.

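If the lines in the text file are separated by blank lines, you can change the record separator to include another "\n", i.e. RS="\n\n". A multi-character RS is supported by gawk and mawk; setting RS to the empty string ("") is the portable way to split records on blank lines.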
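
For records separated by blank lines, one option is awk's paragraph mode, where RS is set to the empty string. This is only a sketch under the same assumptions as before: the file name spaced.txt and the "[,:] *" field separator are made up for illustration.

```shell
# Sample data with a blank line between records (hypothetical file name).
cat > spaced.txt <<'EOF'
CSThreshold: 3.652128e-10, packets: 21270, countBytes: 21780480, Num_Collisions: 19331

CSThreshold: 3.121854e-10, packets: 16057, countBytes: 16442368, Num_Collisions: 38348
EOF

# RS="" switches awk to paragraph mode: one or more blank lines end a record,
# so the blank line is not treated as an empty record of its own.
spaced=$(awk -v RS="" -F'[,:] *' '{print $4}' spaced.txt)
echo "$spaced"
```

With the default RS the blank line would count as a record too, and the command would print an empty line for it.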