Frequently Asked Questions

Usage

Applications

Mawk

Busybox awk

top

1. Usage: how do I run JSON.awk?

TL;DR

awk  -f JSON.awk 1.json [2.json ...]
gawk -f JSON.awk 1.json [2.json ...]
mawk -f callbacks.awk -f JSON.awk 1.json [2.json ...]

echo -e "1.json\n2.json" | awk -f JSON.awk

cat 1.json | awk -f JSON.awk "-" [2.json ...]

awk -v BRIEF=0 -f JSON.awk 1.json

Read the docs

top

2. Do I need to care about the she-bang?

The she-bang is the first line of file JSON.awk and reads

#!/usr/bin/awk -f

but could also be changed to

#!/bin/awk -f

or one of several other forms supported by your operating system.

The default value was chosen for performance reasons. Both /bin/awk and /usr/bin/awk may be installed on your system, and they are not necessarily the same program. Many Linux distributions link /bin/awk to /bin/busybox, and /usr/bin/awk to either /usr/bin/gawk or /usr/bin/mawk. Busybox awk is under-powered and takes much longer to run JSON.awk than gawk or mawk do on identical data.
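To see which implementation each path points to on your system, a small check like the following may help. It is only a sketch: the helper name resolve_awk is made up for this example, and it assumes your readlink supports the -f option (GNU coreutils and busybox do).

```shell
# Hypothetical helper: print the real file behind a path, following symlinks.
# Assumes `readlink -f` is available.
resolve_awk() {
    readlink -f "$1" 2>/dev/null
}

# Report where each common awk path actually leads, if it exists.
for candidate in /usr/bin/awk /bin/awk; do
    if [ -e "$candidate" ]; then
        printf '%s -> %s\n' "$candidate" "$(resolve_awk "$candidate")"
    fi
done
```

If the output shows busybox behind the path used in the she-bang, switching the she-bang to the other path (or calling gawk or mawk explicitly) should avoid the slowdown described above.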

top

3. Is mawk supported (Debian/Ubuntu)?

Yes. JSON.awk is reported to work with mawk 1.3.4 20150503 and 20161120. Version 1.3.3 is known not to work. Please upgrade mawk to a supported version.

top

4. How to parse multiple JSON data files as a single unit?

By default, JSON.awk parses each input file independently of all other input files. For each input file it resets its internal data structures and restarts all output array indices from zero. If your application needs to parse all data files as a single JSON object, you have two options:

  • Pipe all data as a single JSON object, using the cat ... | awk -f JSON.awk "-" notation shown in question 1 (section Piping Data).
  • Modify function reset() in file JSON.awk.
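The first option can be sketched as follows. Simply concatenating several JSON files does not produce one valid JSON text, so this example wraps them in a JSON array before piping. The helper name merge_json_files is hypothetical, and wrapping in an array is an assumption of this sketch: each file's content then appears as one element of the outer array in the output.

```shell
# Hypothetical helper: emit one JSON array whose elements are the
# contents of the given JSON files, in order.
merge_json_files() {
    sep=''
    printf '['
    for f in "$@"; do
        printf '%s' "$sep"   # comma between elements, none before the first
        cat "$f"
        sep=','
    done
    printf ']\n'
}

# Usage (assumes JSON.awk is in the current directory):
#   merge_json_files 1.json 2.json | awk -f JSON.awk "-"
```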

top

5. How to use JSON.awk in my application?

TL;DR

awk -v STREAM=0 -f my-callbacks.awk -f JSON.awk 1.json

Read the docs

top

6. It doesn't work with mawk (large input file)

I do not recommend running JSON.awk with mawk on large input files (1+ MB) because mawk shows serious limitations on my Linux test system (mawk 1.3.4 20171017, sprintf buffer size 8192). I noticed at least two issues:

  • Mawk complains that its internal sprintf buffer is too small. Solution: enlarge the buffer with mawk -Wsprintf=<new size>.
  • Mawk seems stuck. It isn't. It just takes a very long time to process some regular expressions. When this happens, mawk eventually gives up silently, which results in a parse error message. Solution: use gawk (recommended) or busybox awk. Both can handle large input files (tested with 3+ MB JSON text input).
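If a script has to handle both small and large inputs, it could choose the interpreter by file size. The following is a rough sketch only: the helper name is_large_json is made up, the 1 MB threshold is taken from the recommendation above, and the usage comment assumes gawk is installed.

```shell
# Hypothetical helper: succeed when the file is 1 MB (1048576 bytes) or larger.
is_large_json() {
    # Arithmetic expansion strips the padding some wc implementations add.
    size=$(( $(wc -c < "$1") ))
    [ "$size" -ge 1048576 ]
}

# Usage (assumes gawk is installed and JSON.awk is in the current directory):
#   if is_large_json data.json; then
#       gawk -f JSON.awk data.json
#   else
#       mawk -f callbacks.awk -f JSON.awk data.json
#   fi
```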

7. How to fix error: mawk: JSON.awk: line NNN: function cb_somename never defined?

Nothing is wrong with either mawk or JSON.awk. This error message is just an unfortunate consequence of mawk's parser design. Run

mawk -f callbacks.awk -f JSON.awk 1.json

to suppress the error message. Read section Mawk of the docs to learn why this works.

top

8. How to run JSON.awk with busybox awk?

Since JSON.awk version 1.4.1, the source code must be patched in order to run under busybox awk. The patch is very simple: replace the literal string \000 (four characters) with the literal string \001 everywhere in file JSON.awk. The patch is needed because busybox awk does not support the NUL character, whereas the JSON spec considers NUL a valid input character. So long as your input JSON texts do not include NUL characters, you will not notice a difference between the patched and unpatched source code. To apply the patch, run the script patch-for-busybox-awk.sh in the root folder of the repository.
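For illustration, the replacement described above can be expressed as a sed one-liner. This is a sketch of the substitution only, not the contents of patch-for-busybox-awk.sh; the helper name patch_nul_escapes and the output file name are assumptions made for this example.

```shell
# Hypothetical helper: print the given file with every four-character
# sequence \000 replaced by \001. The input file is left unchanged.
patch_nul_escapes() {
    sed 's/\\000/\\001/g' "$1"
}

# Usage (the output file name is only an example):
#   patch_nul_escapes JSON.awk > JSON-busybox.awk
#   busybox awk -f JSON-busybox.awk 1.json
```

Writing the result to a separate file keeps the original JSON.awk intact for use with gawk and mawk.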

top