Usage
- How to run JSON.awk?
- Do I need to care about the she-bang?
- How to parse multiple JSON data files as a single unit?
Applications
- Is mawk supported (Debian/Ubuntu)?
- It doesn't work with mawk (large input file)
- How to fix error: mawk: JSON.awk: line NNN: function cb_somename never defined?
TL;DR
awk -f JSON.awk 1.json [2.json ...]
gawk -f JSON.awk 1.json [2.json ...]
mawk -f callbacks.awk -f JSON.awk 1.json [2.json ...]
echo -e "1.json\n2.json" | awk -f JSON.awk
cat 1.json | awk -f JSON.awk "-" [2.json ...]
awk -v BRIEF=0 -f JSON.awk 1.json
Read the docs
The she-bang is the first line of file JSON.awk and reads
#!/usr/bin/awk -f
but could also be changed to
#!/bin/awk -f
or one of several other forms supported by your operating system.
The default value was chosen for performance reasons. Both binaries could be
installed on your system: many Linux distributions link /bin/awk to
/bin/busybox, and /usr/bin/awk to either /usr/bin/gawk or /usr/bin/mawk.
Busybox awk is under-powered and takes much longer to run JSON.awk than gawk
and mawk do on identical data.
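To see which interpreter each path points to on a given system, a quick diagnostic like the following can help (a sketch, not part of JSON.awk; the paths vary by distribution):

```shell
# Show what /bin/awk and /usr/bin/awk actually resolve to on this system.
# readlink -f follows the whole symlink chain (e.g. down to busybox, gawk
# or mawk); a path that is not a symlink resolves to itself.
for a in /bin/awk /usr/bin/awk; do
  if [ -e "$a" ]; then
    printf '%s resolves to %s\n' "$a" "$(readlink -f "$a")"
  fi
done
```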
Yes. JSON.awk is reported to work with mawk 1.3.4 20150503 and 20161120. Version 1.3.3 is known not to work. Please upgrade mawk to a supported version.
By default, JSON.awk parses each input file separately from all other input files. Therefore, for each input file it resets its internal data structures and restarts all output array indices from zero. If your application needs to parse all data files as a single JSON object, you have two options:
- Pipe all data as a single JSON object, as illustrated by the last notation shown at the end of QA 1, section Piping Data.
- Modify function reset() in file JSON.awk.
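For the first option, one way to feed several files as one JSON text is to wrap them in a JSON array yourself before piping (a sketch; the file names, the array wrapper, and combined.json are illustrative, and "-" makes JSON.awk read JSON text from standard input as in the usage examples above):

```shell
# Create two sample input files (illustrative content).
printf '{"a":1}' > 1.json
printf '{"b":2}' > 2.json

# Wrap them into a single JSON array so the parser sees one JSON text.
{ printf '['; cat 1.json; printf ','; cat 2.json; printf ']'; } > combined.json
cat combined.json    # -> [{"a":1},{"b":2}]

# Then parse everything as a single unit:
#   cat combined.json | awk -f JSON.awk "-"
```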
TL;DR
awk -v STREAM=0 -f my-callbacks.awk -f JSON.awk 1.json
Read the docs
I do not recommend running JSON.awk with mawk on large input files (1+ MB) because mawk shows serious limitations on my Linux test system (mawk 1.3.4 20171017, sprintf buffer size 8192). I noticed at least two issues:
- Mawk complains that its internal sprintf buffer is too small. Solution: mawk -Wsprintf=<new size> ...
- Mawk seems stuck. It isn't; it just takes a very long time to process some regular expressions. When this happens, mawk eventually drops the ball silently, which then results in a parse error message. Solution: use gawk (recommended) or busybox awk. They both can handle large input files (tested with 3+ MB JSON text input).
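To reproduce these limits yourself, a multi-megabyte test file can be generated with awk itself (the size, the contents, and the name big.json are arbitrary; the -Wsprintf value below is just an example):

```shell
# Generate a JSON array of small objects (a few MB) to exercise the parser.
awk 'BEGIN {
  printf "["
  for (i = 0; i < 200000; i++)
    printf "%s{\"k%d\":%d}", (i ? "," : ""), i, i
  print "]"
}' > big.json
wc -c < big.json

# Then compare, for example:
#   mawk -Wsprintf=32768 -f callbacks.awk -f JSON.awk big.json
#   gawk -f JSON.awk big.json
```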
Nothing is wrong with either mawk or JSON.awk. This error message is just an unfortunate consequence of mawk's parser design. Run
mawk -f callbacks.awk -f JSON.awk 1.json
to shut off the error message. Read section Mawk of the docs to learn why this works.
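The fix works because mawk resolves function names while loading source files, so loading a file of no-op stubs first makes every cb_* name known before JSON.awk refers to it. A minimal sketch (the stub names cb_parse_value and cb_fail1 are illustrative; define one empty stub per name mawk reports as "never defined", and the stand-in main.awk below merely simulates JSON.awk calling a callback):

```shell
# Write a minimal callbacks.awk containing no-op stubs.
cat > callbacks.awk <<'EOF'
function cb_parse_value(jpath, value) { }
function cb_fail1(message)            { }
EOF

# Stand-in main program that calls a callback, to show that loading the
# stubs first satisfies the parser. With the real thing you would run:
#   mawk -f callbacks.awk -f JSON.awk 1.json
cat > main.awk <<'EOF'
BEGIN { cb_parse_value("dummy", 42); print "ok" }
EOF
awk -f callbacks.awk -f main.awk </dev/null   # -> ok
```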
Since JSON.awk version 1.4.1 the source code must be patched in order to run
under busybox awk. The patch is very simple: replace the literal string \000
(four characters) with the literal string \001 everywhere in file JSON.awk.
Busybox awk does not support the NUL character. However, the JSON spec
considers NUL a valid input character. As long as your input JSON texts do
not include NUL characters, you will not notice a difference between the
patched and unpatched source code. To apply the patch, run script
patch-for-busybox-awk.sh in the root folder of the repository.
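If the script is unavailable, the same substitution can be done by hand with sed (a sketch, demonstrated on a stand-in file named demo.awk; run the same sed command over JSON.awk itself):

```shell
# Stand-in file containing the four-character literal \000 twice.
printf 'x = "\\000" "\\000"\n' > demo.awk

# Replace every literal \000 with \001 (busybox awk cannot handle NUL).
sed 's/\\000/\\001/g' demo.awk   # -> x = "\001" "\001"
```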