gnu_awk.md (+39 -13)
@@ -64,7 +64,7 @@ Resources mentioned in Acknowledgements section are available under original lic
 ## Book version

-1.3
+1.4

 See [Version_changes.md](https://github.com/learnbyexample/learn_gnuawk/blob/master/Version_changes.md) to track changes across book versions.
@@ -316,7 +316,7 @@ There are some more types of blocks that can be used, you'll see them in coming
 ## Strings and Numbers

-Some examples so far have already used string and numeric literals. As mentioned earlier, `awk` tries to provide a concise way to construct a solution from the command line. The data type of a value is determined based on the syntax used. String literals are represented inside double quotes. Numbers can be integers or floatingpoint. Scientific notation is allowed as well. See [gawk manual: Constant Expressions](https://www.gnu.org/software/gawk/manual/gawk.html#Constants) for more details.
+Some examples so far have already used string and numeric literals. As mentioned earlier, `awk` tries to provide a concise way to construct a solution from the command line. The data type of a value is determined based on the syntax used. String literals are represented inside double quotes. Numbers can be integers or floating-point. Scientific notation is allowed as well. See [gawk manual: Constant Expressions](https://www.gnu.org/software/gawk/manual/gawk.html#Constants) for more details.
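
As a minimal sketch of the literal forms described above (the wrapper function `show_literals` and the variable names are made up for illustration and are not part of the book), each kind of literal can be assigned and printed from a `BEGIN` block:

```bash
# Hypothetical helper: one string literal and three numeric literals
show_literals() {
    awk 'BEGIN{
        s = "hi"      # string literal, inside double quotes
        n = 42        # integer
        f = 3.14      # floating-point
        e = 1.5e3     # scientific notation, same value as 1500
        print s, n, f, e
    }'
}

show_literals
# hi 42 3.14 1500
```

The value written as `1.5e3` prints as `1500` because integer-valued numbers are printed as integers.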

 ```bash
 $ # BEGIN{} is also useful to write awk program without any external input
@@ -1030,7 +1030,7 @@ A **named character set** is defined by a name enclosed between `[:` and `:]` an
 |`[:alpha:]`|`[a-zA-Z]`|
 |`[:alnum:]`|`[0-9a-zA-Z]`|
 |`[:xdigit:]`|`[0-9a-fA-F]`|
-|`[:cntrl:]`| control characters - first 32 ASCII characters and 127th (DEL) |
+|`[:cntrl:]`| control characters — first 32 ASCII characters and 127th (DEL) |
-The grouping metacharacters `()` are also known as **capture groups**. They are like variables, the string captured by `()` can be referred later using backreference `\N` where `N` is the capture group you want. Leftmost `(` in the regular expression is `\1`, next one is `\2` and so on up to `\9`. As a special case, `\0` or `&` metacharacter represents entire matched string. As `\` is special inside double quotes, you'll have to use `"\\1"` to represent `\1`.
+The grouping metacharacters `()` are also known as **capture groups**. They are like variables, the string captured by `()` can be referred later using backreference `\N` where `N` is the capture group you want. Leftmost `(` in the regular expression is `\1`, next one is `\2` and so on up to `\9`. As a special case, `&` metacharacter represents entire matched string. As `\` is special inside double quotes, you'll have to use `"\\1"` to represent `\1`.

-> Backreferences of the form `\N` can only be used with `gensub` function. `&` can be used with `sub`, `gsub` and `gensub` functions.
+> Backreferences of the form `\N` can only be used with `gensub` function. `&` can be used with `sub`, `gsub` and `gensub` functions. `\0` can also be used instead of `&` with `gensub` function.

 ```bash
 $ # reduce \\ to single \ and delete if it is a single \
@@ -1171,8 +1171,7 @@ $ # duplicate first column value as final column
-> See [Using shell variables](#using-shell-variables) chapter for a way to avoid having to use `\\` instead of `\`.
+> See [Using shell variables](#using-shell-variables) chapter for a way to avoid having to escape backslashes.

 Sometimes, you need to get user input and then treat it literally instead of regexp pattern. In such cases, you'll need to first escape the metacharacters before using in substitution functions. Below example shows how to do it for search section. For replace section, you only have to escape the `\` and `&` characters.
> See [my blog post](https://learnbyexample.github.io/escaping-madness-awk-literal-field-separator/) for more details about escaping metacharacters.
+
 > If you need to match instead of substitution, you can use the `index` function. See [index](#index) section for details.
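
A small sketch of the difference between literal and regexp matching, assuming a made-up two-line input and a search string `a.b` (the function names are hypothetical): `index` returns the position of the string, or `0` if it is absent, so `.` is not treated as a metacharacter.

```bash
# Hypothetical helpers comparing literal and regexp matching of 'a.b'
literal_match() {
    # index() treats s as a plain string, so only the line 'a.b' matches
    printf 'a.b\naxb\n' | awk -v s='a.b' 'index($0, s)'
}

regexp_match() {
    # here s is used as a regexp, so '.' also matches the 'x' in 'axb'
    printf 'a.b\naxb\n' | awk -v s='a.b' '$0 ~ s'
}

literal_match
regexp_match
```

The first command prints only `a.b`, while the second prints both input lines.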
You can also directly set the special `FS` variable to change the input field separator. This can be done from the command line using `-v` option or within the code blocks.
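
Both ways of setting `FS` can be sketched as follows, assuming a made-up colon-separated input line (the function names are hypothetical):

```bash
# Hypothetical helpers: set FS with -v, or inside a BEGIN block
fs_from_option() {
    printf 'a:b:c\n' | awk -v FS=: '{print $2}'
}

fs_from_begin() {
    printf 'a:b:c\n' | awk 'BEGIN{FS=":"} {print $2}'
}

fs_from_option
fs_from_begin
```

Both commands print `b`, the second colon-separated field.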
> `nextfile` cannot be used in `BEGIN` or `END` or `ENDFILE` blocks. See [gawk manual: nextfile](https://www.gnu.org/software/gawk/manual/gawk.html#Nextfile-Statement) for more details, how it affects `ENDFILE` and other special cases.
$ # same as: awk -v n=2 'a[NR-n]; /toy|flower/{a[NR]=1}'
+$ awk -v n=2 'NR in a; /toy|flower/{a[NR+n]}' context.txt
 sand stone
 light blue
 water
@@ -4144,7 +4158,6 @@ teal
 light blue
 green
 yellow
-
 $ cat color_list2.txt
 light blue
 black
@@ -4179,7 +4192,7 @@ teal
 green
 ```

-> Note that the `NR==FNR` logic will fail if the first file is empty.
+> Note that the `NR==FNR` logic will fail if the first file is empty. See [this unix.stackexchange thread](https://unix.stackexchange.com/a/237110/109046) for workarounds.
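
To illustrate the failure, here is a sketch using made-up temporary files. The `FILENAME==ARGV[1]` test shown is one commonly used alternative and is an assumption here, not necessarily the approach given in the linked thread. It also assumes the same file is not passed twice.

```bash
# Hypothetical files: an empty first file and a two-line second file
dir=$(mktemp -d)
: > "$dir/empty.txt"
printf 'light blue\nblack\n' > "$dir/colors.txt"

# empty.txt contributes no records, so NR==FNR stays true for every
# line of colors.txt; all lines are stored and nothing is printed
nr_fnr() {
    awk 'NR==FNR{a[$0]; next} !($0 in a)' "$dir/empty.txt" "$dir/colors.txt"
}

# FILENAME==ARGV[1] is true only while reading the first file argument,
# so both lines of colors.txt reach the second rule and are printed
by_name() {
    awk 'FILENAME==ARGV[1]{a[$0]; next} !($0 in a)' "$dir/empty.txt" "$dir/colors.txt"
}

nr_fnr
by_name
```

The first command prints nothing, while the second prints both lines of the second file.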

 ## Comparing fields
@@ -4303,7 +4316,7 @@ If a file is passed as argument to `awk` command and cannot be opened, you get a
 ```bash
 $ awk '{print $2}' xyz.txt
-awk: fatal: cannot open file `xyz.txt' for reading (No such file or directory)
+awk: fatal: cannot open file `xyz.txt' for reading: No such file or directory
 ```

 It is recommended to always check for return value when using `getline` or perhaps use techniques from previous sections to avoid `getline` altogether.
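
A minimal sketch of such a check, assuming a hypothetical wrapper `check_getline` that takes a file path: `getline` returns `1` on success, `0` at end of file and `-1` on error, so comparing the result with `> 0` covers both failure cases.

```bash
# Hypothetical helper: read the first line of a file, checking getline's result
check_getline() {
    awk -v f="$1" 'BEGIN{
        # parentheses are needed so that the comparison applies to the
        # return value of getline and not to the file name
        if ((getline line < f) > 0)
            print "first line: " line
        else
            print "cannot read: " f
    }'
}

check_getline /nonexistent/xyz.txt
# cannot read: /nonexistent/xyz.txt
```

Unlike a file passed as a command line argument, a failed `getline` redirection is not a fatal error, which is why the explicit check matters.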