proposal to add 'error' function #27

Open
wants to merge 7 commits into
base: main
80 changes: 80 additions & 0 deletions proposals/0013-error-function.md
@@ -0,0 +1,80 @@
# Add 'error' function

|||
|---|---
| **JEP** | 20
| **Author** | Sebastien Rosset
| **Created**| 22-March-2023
| **[SemVer](https://semver.org/spec/v2.0.0.html#summary)** | MINOR

## Abstract

This JEP proposes a new function `error()` that raises a runtime error.
This could be used to detect and raise an error when input JSON documents have unexpected data.


## Motivation

As far as I know, there is no dedicated function for raising a runtime error from within a JMESPath expression. A new `error()` function would let authors raise an error at the exact point in the expression where the problem is detected. See the "Concrete use cases" section for a complete ETL pipeline example.

### Example

Suppose we have the following JSON documents:
```json
{ "id": "eth0", "status": "up" }
{ "id": "eth1", "status": "down" }
```

In this example, the values of `status` are either `up` or `down`.
Suppose the JMESPath author wants to:
1. Map the values of `status` to `0` and `1`.
1. Raise an error when an unexpected `status` value is encountered. For example, when the value of `status` is set to `degraded`.

```
((status == 'up') && `1`) || ((status == 'down') && `0`) || error(join(' ', ['invalid value', status]))
```
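
The evaluation order of this expression can be sketched in Python. This is only an illustrative model, not a JMESPath implementation: `JMESPathError`, `is_truthy`, `error`, and `map_status` are hypothetical names introduced here. The sketch relies on the JMESPath rule that only `false`, `null`, and empty strings, arrays, and objects are false-like, so the number `0` is truthy and the `||` chain stops at the `down` branch instead of falling through to `error()`.

```python
class JMESPathError(Exception):
    """Hypothetical exception standing in for a JMESPath runtime error."""


def is_truthy(value):
    """JMESPath truthiness: false, null, and empty string/array/object are false-like."""
    if value is None or value is False:
        return False
    if isinstance(value, (str, list, dict)) and len(value) == 0:
        return False
    return True  # numbers, including 0, are truthy in JMESPath


def error(message):
    """Stand-in for the proposed error() function: always raises."""
    raise JMESPathError(message)


def map_status(doc):
    """Model of: (status=='up' && 1) || (status=='down' && 0) || error(...)."""
    status = doc.get("status")

    # (status == 'up') && `1`  -> 1 if the comparison holds, otherwise false
    up_branch = 1 if status == "up" else False
    if is_truthy(up_branch):
        return up_branch

    # (status == 'down') && `0`  -> 0 is truthy, so the chain stops here
    down_branch = 0 if status == "down" else False
    if is_truthy(down_branch):
        return down_branch

    # Only reached when neither branch matched; assumes status is a string.
    return error(" ".join(["invalid value", status]))
```

With this model, `map_status({"status": "up"})` yields `1`, `map_status({"status": "down"})` yields `0`, and `map_status({"status": "degraded"})` raises `JMESPathError` with the message `invalid value degraded`.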

### Concrete use cases

Consider using JMESPath in an ETL pipeline that extracts data from a large and diverse set of endpoints (devices and services).

1. The data is obtained from the endpoints using several methods. This may include REST APIs, CLI commands that produce JSON output or semi-structured text output, ansible playbooks, CSV files, or XML content.
1. For example, ansible modules may be used to return values in [JSON format](https://docs.ansible.com/ansible/2.9/user_guide/modules_intro.html).
1. The JSON data produced by CLI commands and ansible tasks can be very dynamic, e.g., ansible may invoke arbitrary shell commands where the task return values depend on the shell command output.
1. Data is converted to JSON format using a variety of tools.
1. JMESPath is then used to extract/filter data from the JSON documents.

Authors who write the ETL pipelines (including the JMESPath expressions) may want to catch unexpected conditions, such as unexpected input, at various points of the pipeline. When errors are detected, the pipeline can be improved, either by fixing a JMESPath expression or by changing another part of the pipeline (e.g., the CLI commands).

For example, the author of a JMESPath expression may know that an input JSON document contains a `status` property. She needs to extract and transform `status`, but she is not sure what all the possible input values for that property are. She wants to write the JMESPath expression defensively so that she can catch errors and report them. She may start with the following JMESPath expression:

```
((status == 'up') && `1`) || ((status == 'down') && `0`)
```

Because she is not 100% sure the ETL pipeline will always produce the input values `up` and `down`, she needs to be able to catch the error at the specific JMESPath expression where the unexpected condition is encountered.

## Specification

Add a new `error` function that takes a string expression.
If the `error` function is evaluated, an error is raised with the specified string message.
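
A minimal sketch of this contract in Python follows. `JMESPathError` and `jmespath_error` are hypothetical names used only for illustration. The proposal does not state what happens when the argument is not a string or when the argument count is wrong; the sketch assumes these cases are reported with `invalid-type` and `invalid-arity` messages, in the style of other JMESPath functions.

```python
class JMESPathError(Exception):
    """Hypothetical exception standing in for a JMESPath runtime error."""


def jmespath_error(*args):
    """Sketch of the proposed error(string) function.

    The function never returns: it always raises, either with the
    caller-supplied message or with an argument-validation message.
    """
    # Assumption: exactly one argument is required.
    if len(args) != 1:
        raise JMESPathError("invalid-arity: error() expects exactly one argument")

    (message,) = args

    # Assumption: a non-string argument is rejected rather than converted.
    if not isinstance(message, str):
        raise JMESPathError("invalid-type: error() expects a string argument")

    # The specified behavior: raise an error carrying the given message.
    raise JMESPathError(message)
```

Because the function is only evaluated when the expression reaches it, an expression that short-circuits before the `error(...)` call never raises.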

## Other options that have been considered

Without support for the `error` function in JMESPath expressions, one approach is to use separate tools for validation. For example, write a JSON schema that specifies two enum values for the `status` property. However, users then have to write the validation logic in a different language and replicate the runtime evaluation logic of the JMESPath expression. In the above example with the `status` property, the logic is simple (two enum values), but in other cases the JMESPath expression may reach an unexpected condition in a more complex scenario. This may lead to problems where the validation logic does not match the JMESPath logic. Furthermore, when the ETL pipeline contains thousands of JMESPath expressions, it becomes very difficult to ensure that the validation performed by external tools matches the expectations of the JMESPath expressions, especially as the ETL pipeline evolves over time.

1. When appropriate, use a JSON schema to validate the input. However, this may not always be the right tool:
   1. The JSON schema may not be available.
   1. Constructing a JSON schema may be difficult when the input data is generated by a 3rd-party component.
   1. JSON schema validation may have a significant runtime cost, especially for large documents with a complex schema.
1. Leverage the fact that some JMESPath expressions are invalid, e.g., `to_number('bad')` would raise an error. However, this is a kludgy workaround.

```
((status == 'up') && `1`) || ((status == 'down') && `0`) || to_number('bad')
```

Another option is to fork. Adding a new `error()` function to a fork of `go-jmespath` is very easy and takes fewer than 20 lines, but the fork then has to be maintained. Forking projects comes with its own set of problems and brings few benefits to the community.

Another approach would be to support function extensibility in the JMESPath specification. This is similar to what exists in many domain-specific languages (DSLs) such as OData, rule-based languages, and static analysis tools. In these DSLs, the user can define custom functions to solve domain-specific problems. The JMESPath libraries would allow users to add functions without forking the git repos.
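
As a rough illustration of what such extensibility could look like, the following Python sketch models a function table that callers can extend at runtime. `FunctionRegistry`, `register`, `call`, and `raise_error` are hypothetical names and do not correspond to the API of any existing JMESPath library.

```python
class FunctionRegistry:
    """Hypothetical table mapping function names to callables."""

    def __init__(self):
        self._functions = {}

    def register(self, name, func):
        """Add a user-defined function; refuse to silently replace an existing one."""
        if name in self._functions:
            raise ValueError(f"function already registered: {name}")
        self._functions[name] = func

    def call(self, name, *args):
        """Invoke a registered function, as an interpreter would for name(args)."""
        if name not in self._functions:
            raise LookupError(f"unknown-function: {name}")
        return self._functions[name](*args)


def raise_error(message):
    """User-supplied implementation of error(): always raises."""
    raise RuntimeError(message)


# The user adds error() without modifying or forking the library.
registry = FunctionRegistry()
registry.register("error", raise_error)
```

With a mechanism like this, `registry.call("error", "invalid value degraded")` raises `RuntimeError` with that message, while calling an unregistered name raises `LookupError`.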

Yet another approach is to return a JSON document that is unique enough across all the JMESPath expressions in the ETL pipeline to indicate an error. For example, return `{ "error": "'status' has invalid value: 'degraded'", "path": "status" }`. The caller must then somehow determine whether the returned JSON document represents an error or a valid result. However, in some ETL cases, the JMESPath expression may be used to parse well-formed JSON documents and extract errors from them, so it can be difficult to distinguish field extraction problems from valid input.
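
The ambiguity can be made concrete with a small Python sketch. The helper `is_error_sentinel` and both sample documents are hypothetical; the check simply assumes that an error is signalled by an object with exactly the keys `error` and `path`, as in the example above.

```python
def is_error_sentinel(result):
    """Caller-side check: does the result look like the agreed error document?"""
    return isinstance(result, dict) and set(result) == {"error", "path"}


# Result of an expression that hit an unexpected value and returned the sentinel.
extraction_failure = {
    "error": "'status' has invalid value: 'degraded'",
    "path": "status",
}

# Result of an expression that legitimately extracted an error record
# from a well-formed input document (e.g., a device log entry).
extracted_log_entry = {
    "error": "disk full",
    "path": "/var/log",
}

# Both results are flagged, so the caller cannot tell them apart.
flags = [is_error_sentinel(extraction_failure), is_error_sentinel(extracted_log_entry)]
```

Here `flags` is `[True, True]`: the shape check alone cannot separate a failed extraction from valid data that happens to have the same shape, which is the weakness of this approach.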