Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change ingestion input to JSON lines format #639

Merged
merged 8 commits into from
Feb 24, 2025

Conversation

ohltyler
Copy link
Member

@ohltyler ohltyler commented Feb 21, 2025

Description

This PR updates the ingestion input format to JSON lines format: https://jsonlines.org/. This is a more standard data file format for persisting JSON data, and makes it easier for importing.

More details:

  • adds new JsonLinesField component, which is similar to JsonField but has specific logic around error handling and formatting to be specific to JSON lines.
  • changes file upload type extension to be *.jsonl instead of JSON inputs
  • adds new JSON lines config field type and corresponding schema/value presets
  • updates all instances of the ingest docs parsing to handle a JSON lines string instead of a JSON array string
  • bug fix: missing parameters on processorConfigsToTemplateProcessors. Externally, this has no impact.

Demo video, showing the new input type, validation, error handling, and successful ingestion:

screen-capture.26.webm

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@ohltyler
Copy link
Member Author

@ohltyler ohltyler marked this pull request as draft February 21, 2025 21:05
@kamingleung
Copy link

@ohltyler Can we add a learn more external link in the helper text? Users who are not familiar with JSON lines can learn more about it.
Something like:
Documents must be in JSON lines form. Learn more[icon]
Goes to https://jsonlines.org/

@ohltyler
Copy link
Member Author

@ohltyler Can we add a learn more external link in the helper text? Users who are not familiar with JSON lines can learn more about it. Something like: Documents must be in JSON lines form. Learn more[icon] Goes to https://jsonlines.org/

Yes, good idea :)

Screenshot 2025-02-21 at 1 35 25 PM

@ohltyler
Copy link
Member Author

Dynamically display multiple error messages if found (currently defaulting to 3):

Screenshot 2025-02-21 at 3 48 05 PM

Signed-off-by: Tyler Ohlsen <[email protected]>
@ohltyler ohltyler marked this pull request as ready for review February 22, 2025 00:10
@kamingleung
Copy link

Dynamically display multiple error messages if found (currently defaulting to 3):

Screenshot 2025-02-21 at 3 48 05 PM

@ohltyler Nice, does the list update dynamically if users fixes an error or a new error came up?

@ohltyler
Copy link
Member Author

Dynamically display multiple error messages if found (currently defaulting to 3):
Screenshot 2025-02-21 at 3 48 05 PM

@ohltyler Nice, does the list update dynamically if users fixes an error or a new error came up?

The list is cleared until users click out of focus again - this keeps any performance or lag issues from occurring, in addition to low-level details like line numbers becoming stale as users make changes.

@kamingleung
Copy link

@ohltyler Do you have a quick demo video on multiple error messages?

Inline validations feels much more clear and intuitive to me. Although this coder editor doesn't support inline validation, we should invest into an editor that support modern code editing patterns.

However, this shouldn't be a blocker to JSON lines.

@ohltyler
Copy link
Member Author

@ohltyler Do you have a quick demo video on multiple error messages?

Inline validations feels much more clear and intuitive to me. Although this coder editor doesn't support inline validation, we should invest into an editor that support modern code editing patterns.

However, this shouldn't be a blocker to JSON lines.

screen-capture.27.webm

Sure. And strongly agree, which is why I've brought this up again in opensearch-project/OpenSearch-Dashboards#2875

@ohltyler ohltyler merged commit a484199 into opensearch-project:2.x Feb 24, 2025
5 of 6 checks passed
@ohltyler ohltyler deleted the jsonlines branch February 24, 2025 17:37
opensearch-trigger-bot bot pushed a commit that referenced this pull request Feb 24, 2025
Signed-off-by: Tyler Ohlsen <[email protected]>
(cherry picked from commit a484199)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
ohltyler pushed a commit that referenced this pull request Feb 24, 2025
(cherry picked from commit a484199)

Signed-off-by: Tyler Ohlsen <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport main enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants