Commit 18c0a0f

updates for version 1.4
1 parent 9a626f3 commit 18c0a0f

7 files changed, +62 -14 lines changed

Version_changes.md (+11)

@@ -1,5 +1,16 @@
 <br>
 
+### 1.4
+
+* Added example for `NF` value when input line doesn't contain the input field separator or if it is empty.
+* Added example which uses both `nextfile` and `ENDFILE`.
+* Added example for working with floating-point numbers according to locale formatting.
+* Clarified use of `\0` with `gensub` function.
+* Updated error message for file not found.
+* Added further reading links for regexp metacharacter escaping and `NR==FNR` alternatives.
+
+<br>
+
 ### 1.3
 
 * Added note regarding use of `NR==FNR` if the first file is empty
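
The `NR==FNR` entries above refer to a common two-file idiom. As a rough illustration of why it breaks when the first file is empty, the sketch below compares it with one possible alternative, testing `FILENAME` against `ARGV[1]`. That alternative is an assumption chosen for illustration and is not necessarily what the book or its linked resources recommend; the file names are hypothetical.

```bash
# scratch files for the demonstration (hypothetical names)
tmpdir=$(mktemp -d)
: > "$tmpdir/empty.txt"
printf 'teal\ngreen\n' > "$tmpdir/colors.txt"

# goal: print lines of the second file that are not present in the first

# NR==FNR is also true while reading the second file here, because the
# empty first file contributes no records; every line is stored and skipped
with_nr_fnr=$(awk 'NR==FNR{seen[$0]; next} !($0 in seen)' \
    "$tmpdir/empty.txt" "$tmpdir/colors.txt")

# comparing FILENAME against the first argument does not depend on record counts
with_filename=$(awk 'FILENAME==ARGV[1]{seen[$0]; next} !($0 in seen)' \
    "$tmpdir/empty.txt" "$tmpdir/colors.txt")

printf 'NR==FNR: [%s]\nFILENAME==ARGV[1]: [%s]\n' "$with_nr_fnr" "$with_filename"
rm -r "$tmpdir"
```

The `FILENAME` test has its own limits: it misbehaves if the same file is passed twice, or if the first argument is a `var=value` assignment instead of a file.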

code_snippets/Field_separators.sh (+2)

@@ -40,6 +40,8 @@ echo 'one;two;three;four' | awk -F';' '{print $3}'
 
 echo '=a=b=c=' | awk -F= '{print $1 "[" $NF "]"}'
 
+printf '\nhello\napple,banana\n' | awk -F, '{print NF}'
+
 echo 'goal:amazing:whistle:kwality' | awk -v FS=: '{print $2}'
 
 echo '1e4SPT2k6SPT3a5SPT4z0' | awk 'BEGIN{FS="SPT"} {print $3}'
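
The added `printf` line covers three cases in one input. A minimal sketch of the same check, with the role of each input line spelled out; `count_fields` is a hypothetical helper name and not part of the repository:

```bash
# count_fields: hypothetical helper that prints NF for every input line,
# using ',' as the input field separator
count_fields() {
    awk -F, '{print NF}'
}

# line 1 is empty      -> no fields at all
# line 2 has no ','    -> the whole line is a single field
# line 3 has one ','   -> two fields
nf_values=$(printf '\nhello\napple,banana\n' | count_fields)
echo "$nf_values"
```

This prints `0`, `1` and `2` on separate lines, matching the output added to `gnu_awk.md` in this commit.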

code_snippets/Gotchas_and_Tips.sh (+6)

@@ -64,6 +64,12 @@ awk '{sum += $1} END{print sum}' /dev/null
 
 awk '{sum += $1} END{print +sum}' /dev/null
 
+echo '3.14' | awk '{$0++} 1'
+
+echo '3,14' | awk '{$0++} 1'
+
+echo '3,14' | LC_NUMERIC=de_DE awk -N '{$0++} 1'
+
 ## Forcing string context
 
 echo '5 5.0' | awk '{print ($1==$2 ? "same" : "different"), "number"}'
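
The first two added commands depend on the locale of the machine that runs them. As a hedged sketch, the variant below pins the locale with `LC_ALL=C` (an addition for reproducibility, not part of the commit) so that `.` is always the decimal point. The `-N` case is left out because it needs gawk and an installed `de_DE` locale.

```bash
# LC_ALL=C pins the decimal point to '.', so the results do not depend on
# the locale of the machine running this
with_dot=$(echo '3.14' | LC_ALL=C awk '{$0++} 1')

# only the leading numeric prefix '3' of '3,14' is used as the number
with_comma=$(echo '3,14' | LC_ALL=C awk '{$0++} 1')

echo "$with_dot $with_comma"
```

The second result is `4` rather than `4,14`, because everything from the comma onwards is ignored when the string is converted to a number.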

code_snippets/Multiple_file_input.sh (+3)

@@ -11,6 +11,9 @@ awk '/I/{print FILENAME; nextfile}' f[1-3].txt greeting.txt
 awk 'BEGINFILE{m1=m2=0} /o/{m1=1} /at/{m2=1}
 m1 && m2{print FILENAME; nextfile}' f[1-3].txt greeting.txt
 
+awk 'BEGINFILE{m1=m2=0} /o/{m1=1; nextfile} /at/{m2=1}
+ENDFILE{if(!m1 && m2) print FILENAME}' f[1-3].txt greeting.txt
+
 ## ARGC and ARGV
 
 awk 'BEGIN{for(i=0; i<ARGC; i++) print ARGV[i]}' f[1-3].txt greeting.txt

code_snippets/Processing_multiple_records.sh (+1 -1)

@@ -20,7 +20,7 @@ awk 'n && n--; /language/{n=1}' context.txt
 
 awk '!n && /toy|flower/{n=2; next} n && n--' context.txt
 
-awk -v n=2 'a[NR-n]; /toy|flower/{a[NR]=1}' context.txt
+awk -v n=2 'NR in a; /toy|flower/{a[NR+n]}' context.txt
 
 awk 'n && !--n; /language/{n=3}' context.txt
 
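
The replaced and replacing commands are meant to behave the same. A small sketch that runs both on identical input; the sample lines are hypothetical and stand in for `context.txt`, which is not part of this diff:

```bash
# hypothetical sample input; matches occur on lines 2, 3 and 6
sample=$(printf '%s\n' blue toy flower 'sand stone' 'light blue' flower sky water)

# old form: when a line matches, remember its own record number,
# then look back n records on every line
old=$(printf '%s\n' "$sample" | awk -v n=2 'a[NR-n]; /toy|flower/{a[NR]=1}')

# new form: when a line matches, create the key of the record to print,
# then test whether the current record number is such a key
new=$(printf '%s\n' "$sample" | awk -v n=2 'NR in a; /toy|flower/{a[NR+n]}')

echo "$new"
```

Both forms print lines 4, 5 and 8 of the sample. In the new form, merely referencing `a[NR+n]` creates the key, which is enough for the `NR in a` test.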

gnu_awk.md (+39 -13)

@@ -64,7 +64,7 @@ Resources mentioned in Acknowledgements section are available under original lic
 
 ## Book version
 
-1.3
+1.4
 
 See [Version_changes.md](https://github.com/learnbyexample/learn_gnuawk/blob/master/Version_changes.md) to track changes across book versions.
 
@@ -316,7 +316,7 @@ There are some more types of blocks that can be used, you'll see them in coming
 
 ## Strings and Numbers
 
-Some examples so far have already used string and numeric literals. As mentioned earlier, `awk` tries to provide a concise way to construct a solution from the command line. The data type of a value is determined based on the syntax used. String literals are represented inside double quotes. Numbers can be integers or floating point. Scientific notation is allowed as well. See [gawk manual: Constant Expressions](https://www.gnu.org/software/gawk/manual/gawk.html#Constants) for more details.
+Some examples so far have already used string and numeric literals. As mentioned earlier, `awk` tries to provide a concise way to construct a solution from the command line. The data type of a value is determined based on the syntax used. String literals are represented inside double quotes. Numbers can be integers or floating-point. Scientific notation is allowed as well. See [gawk manual: Constant Expressions](https://www.gnu.org/software/gawk/manual/gawk.html#Constants) for more details.
 
 ```bash
 $ # BEGIN{} is also useful to write awk program without any external input
@@ -1030,7 +1030,7 @@ A **named character set** is defined by a name enclosed between `[:` and `:]` an
 | `[:alpha:]` | `[a-zA-Z]` |
 | `[:alnum:]` | `[0-9a-zA-Z]` |
 | `[:xdigit:]` | `[0-9a-fA-F]` |
-| `[:cntrl:]` | control characters - first 32 ASCII characters and 127th (DEL) |
+| `[:cntrl:]` | control characters: first 32 ASCII characters and 127th (DEL) |
 | `[:punct:]` | all the punctuation characters |
 | `[:graph:]` | `[:alnum:]` and `[:punct:]` |
 | `[:print:]` | `[:alnum:]`, `[:punct:]` and space |
@@ -1157,9 +1157,9 @@ $ echo '1 good 2 apples' | awk '{$4 = gensub(/[aeiou]/, "X", "g", $4)} 1'
 
 ## Backreferences
 
-The grouping metacharacters `()` are also known as **capture groups**. They are like variables, the string captured by `()` can be referred later using backreference `\N` where `N` is the capture group you want. Leftmost `(` in the regular expression is `\1`, next one is `\2` and so on up to `\9`. As a special case, `\0` or `&` metacharacter represents entire matched string. As `\` is special inside double quotes, you'll have to use `"\\1"` to represent `\1`.
+The grouping metacharacters `()` are also known as **capture groups**. They are like variables, the string captured by `()` can be referred later using backreference `\N` where `N` is the capture group you want. Leftmost `(` in the regular expression is `\1`, next one is `\2` and so on up to `\9`. As a special case, `&` metacharacter represents entire matched string. As `\` is special inside double quotes, you'll have to use `"\\1"` to represent `\1`.
 
->![info](images/info.svg) Backreferences of the form `\N` can only be used with `gensub` function. `&` can be used with `sub`, `gsub` and `gensub` functions.
+>![info](images/info.svg) Backreferences of the form `\N` can only be used with `gensub` function. `&` can be used with `sub`, `gsub` and `gensub` functions. `\0` can also be used instead of `&` with `gensub` function.
 
 ```bash
 $ # reduce \\ to single \ and delete if it is a single \
@@ -1171,8 +1171,7 @@ $ # duplicate first column value as final column
 $ echo 'one,2,3.14,42' | awk '{print gensub(/^([^,]+).*/, "&,\\1", 1)}'
 one,2,3.14,42,one
 
-$ # add something at start and end of string
-$ # as only '&' is used, gensub isn't needed here
+$ # add something at start and end of string, gensub isn't needed here
 $ echo 'hello world' | awk '{sub(/.*/, "Hi. &. Have a nice day")} 1'
 Hi. hello world. Have a nice day
 
@@ -1271,7 +1270,7 @@ $ echo '23 154 12 26 34' | awk -v ip="$r" '{gsub(ip, "X")} 1'
 X 154 X X 34
 ```
 
->![info](images/info.svg) See [Using shell variables](#using-shell-variables) chapter for a way to avoid having to use `\\` instead of `\`.
+>![info](images/info.svg) See [Using shell variables](#using-shell-variables) chapter for a way to avoid having to escape backslashes.
 
 Sometimes, you need to get user input and then treat it literally instead of regexp pattern. In such cases, you'll need to first escape the metacharacters before using in substitution functions. Below example shows how to do it for search section. For replace section, you only have to escape the `\` and `&` characters.
 
@@ -1289,6 +1288,8 @@ $ echo 'f*(a^b) - 3*(a^b)' |
 f*(a^b) - 3*c
 ```
 
+>![info](images/info.svg) See [my blog post](https://learnbyexample.github.io/escaping-madness-awk-literal-field-separator/) for more details about escaping metacharacters.
+
 >![info](images/info.svg) If you need to match instead of substitution, you can use the `index` function. See [index](#index) section for details.
 
 ## Summary
@@ -1526,6 +1527,7 @@ $ awk '{print $2.999999999999999}' table.txt
 bread
 cake
 banana
+
 $ # same as: awk '{print $3}' table.txt
 $ awk '{print $2.9999999999999999}' table.txt
 mat
@@ -1551,6 +1553,12 @@ three
 $ # first and last fields will have empty string as their values
 $ echo '=a=b=c=' | awk -F= '{print $1 "[" $NF "]"}'
 []
+
+$ # difference between empty lines and lines without field separator
+$ printf '\nhello\napple,banana\n' | awk -F, '{print NF}'
+0
+1
+2
 ```
 
 You can also directly set the special `FS` variable to change the input field separator. This can be done from the command line using `-v` option or within the code blocks.
@@ -3512,7 +3520,7 @@ yellow banana window shoes 3.14
 
 ## nextfile
 
-`nextfile` will skip remaining records from current file being processed and move on to the next file.
+`nextfile` will skip remaining records from the current file being processed and move on to the next file.
 
 ```bash
 $ # print filename if it contains 'I' anywhere in the file
@@ -3526,6 +3534,11 @@ $ awk 'BEGINFILE{m1=m2=0} /o/{m1=1} /at/{m2=1}
 m1 && m2{print FILENAME; nextfile}' f[1-3].txt greeting.txt
 f2.txt
 f3.txt
+
+$ # print filename if it contains 'at' but not 'o'
+$ awk 'BEGINFILE{m1=m2=0} /o/{m1=1; nextfile} /at/{m2=1}
+ENDFILE{if(!m1 && m2) print FILENAME}' f[1-3].txt greeting.txt
+f1.txt
 ```
 
 >![warning](images/warning.svg) `nextfile` cannot be used in `BEGIN` or `END` or `ENDFILE` blocks. See [gawk manual: nextfile](https://www.gnu.org/software/gawk/manual/gawk.html#Nextfile-Statement) for more details, how it affects `ENDFILE` and other special cases.
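
The added example relies on `ENDFILE` still running after `nextfile` cuts a file short. Below is an annotated sketch of the same program wrapped in a shell function. `at_without_o` is a hypothetical name, and `gawk` is called explicitly because `BEGINFILE` and `ENDFILE` are gawk extensions that other `awk` implementations may not provide.

```bash
# at_without_o: hypothetical wrapper around the command added in this hunk;
# prints the name of every file that contains "at" but no "o"
at_without_o() {
    gawk 'BEGINFILE{m1=m2=0}      # reset both flags for each file
          /o/{m1=1; nextfile}     # an "o" rules the file out, stop reading it
          /at/{m2=1}              # remember that "at" was seen
          ENDFILE{if(!m1 && m2) print FILENAME}' "$@"
}
```

It would be called with the same arguments as the command in the hunk, for example `at_without_o f[1-3].txt greeting.txt`. Setting `m1` before `nextfile` matters: the `ENDFILE` block still runs for the skipped file and needs the flag to suppress the filename.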
@@ -3725,7 +3738,8 @@ $ awk '!n && /toy|flower/{n=2; next} n && n--' context.txt
 $ # print only the 2nd line found after matching line
 $ # the array saves matching result for each record
 $ # doesn't rely on a counter, thus works for overlapping cases
-$ awk -v n=2 'a[NR-n]; /toy|flower/{a[NR]=1}' context.txt
+$ # same as: awk -v n=2 'a[NR-n]; /toy|flower/{a[NR]=1}'
+$ awk -v n=2 'NR in a; /toy|flower/{a[NR+n]}' context.txt
 sand stone
 light blue
 water
@@ -4144,7 +4158,6 @@ teal
 light blue
 green
 yellow
-
 $ cat color_list2.txt
 light blue
 black
@@ -4179,7 +4192,7 @@ teal
 green
 ```
 
->![warning](images/warning.svg) Note that the `NR==FNR` logic will fail if the first file is empty.
+>![warning](images/warning.svg) Note that the `NR==FNR` logic will fail if the first file is empty. See [this unix.stackexchange thread](https://unix.stackexchange.com/a/237110/109046) for workarounds.
 
 ## Comparing fields
 
@@ -4303,7 +4316,7 @@ If a file is passed as argument to `awk` command and cannot be opened, you get a
 
 ```bash
 $ awk '{print $2}' xyz.txt
-awk: fatal: cannot open file `xyz.txt' for reading (No such file or directory)
+awk: fatal: cannot open file `xyz.txt' for reading: No such file or directory
 ```
 
 It is recommended to always check for return value when using `getline` or perhaps use techniques from previous sections to avoid `getline` altogether.
@@ -4896,6 +4909,19 @@ $ awk '{sum += $1} END{print +sum}' /dev/null
 0
 ```
 
+The `-N` option (or `--use-lc-numeric`) is useful to work with floating-point numbers based on the current locale.
+
+```bash
+$ # my locale uses . for decimal point
+$ echo '3.14' | awk '{$0++} 1'
+4.14
+
+$ echo '3,14' | awk '{$0++} 1'
+4
+$ echo '3,14' | LC_NUMERIC=de_DE awk -N '{$0++} 1'
+4,14
+```
+
 ## Forcing string context
 
 Concatenate empty string to force string comparison.

sample_chapters/gnu_awk_sample.pdf

406 Bytes
Binary file not shown.
