Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

markup_parser: skip parsing HTML content when it is empty #161

Merged

Conversation

otegami
Copy link
Contributor

@otegami otegami commented Dec 17, 2024

Issue

This PR resolves the following CI error.

Error:
FullTextSearch::BatchRunnerSynchronizeTest#test_archived_changeset:
RuntimeError: input string cannot be empty
    plugins/full_text_search/lib/full_text_search/markup_parser.rb:20:in `parse'
    plugins/full_text_search/lib/full_text_search/issue_mapper.rb:25:in `upsert_fts_target'
    plugins/full_text_search/lib/full_text_search/batch_runner.rb:129:in `block (2 levels) in synchronize_fts_targets_internal'
    plugins/full_text_search/lib/full_text_search/batch_runner.rb:335:in `each'
    plugins/full_text_search/lib/full_text_search/batch_runner.rb:335:in `each'
    plugins/full_text_search/lib/full_text_search/batch_runner.rb:335:in `iterate'
    plugins/full_text_search/lib/full_text_search/batch_runner.rb:121:in `block in synchronize_fts_targets_internal'
    plugins/full_text_search/lib/full_text_search/resolver.rb:43:in `each'
    plugins/full_text_search/lib/full_text_search/resolver.rb:43:in `each'
    plugins/full_text_search/lib/full_text_search/batch_runner.rb:116:in `synchronize_fts_targets_internal'
    plugins/full_text_search/lib/full_text_search/batch_runner.rb:12:in `synchronize'
    plugins/full_text_search/test/unit/full_text_search/batch_runner_synchronize_test.rb:34:in `setup'

bin/rails test plugins/full_text_search/test/unit/full_text_search/batch_runner_synchronize_test.rb:219

Cause

The error occurs because Nokogiri's behavior for handling an empty string changed in version 1.17.0.

  • Before 1.17.0: Nokogiri::HTML::SAX::Parser#parse returned nil when passed an empty string.
  • After 1.17.0: Nokogiri::HTML::SAX::Parser#parse raises a RuntimeError when passed an empty string.

Refer to the following Nokogiri commit for details: sparklemotion/nokogiri@35596e7

Solution

Skip parsing the HTML content when it is empty by adding a presence check.

Issue

This PR resolves the following CI error.

```
Error:
FullTextSearch::BatchRunnerSynchronizeTest#test_archived_changeset:
RuntimeError: input string cannot be empty
    plugins/full_text_search/lib/full_text_search/markup_parser.rb:20:in `parse'
    plugins/full_text_search/lib/full_text_search/issue_mapper.rb:25:in `upsert_fts_target'
    plugins/full_text_search/lib/full_text_search/batch_runner.rb:129:in `block (2 levels) in synchronize_fts_targets_internal'
    plugins/full_text_search/lib/full_text_search/batch_runner.rb:335:in `each'
    plugins/full_text_search/lib/full_text_search/batch_runner.rb:335:in `each'
    plugins/full_text_search/lib/full_text_search/batch_runner.rb:335:in `iterate'
    plugins/full_text_search/lib/full_text_search/batch_runner.rb:121:in `block in synchronize_fts_targets_internal'
    plugins/full_text_search/lib/full_text_search/resolver.rb:43:in `each'
    plugins/full_text_search/lib/full_text_search/resolver.rb:43:in `each'
    plugins/full_text_search/lib/full_text_search/batch_runner.rb:116:in `synchronize_fts_targets_internal'
    plugins/full_text_search/lib/full_text_search/batch_runner.rb:12:in `synchronize'
    plugins/full_text_search/test/unit/full_text_search/batch_runner_synchronize_test.rb:34:in `setup'

bin/rails test plugins/full_text_search/test/unit/full_text_search/batch_runner_synchronize_test.rb:219
```

Cause

The error occurs because Nokogiri's behavior for handling an empty
string changed in version 1.17.0.

- Before 1.17.0: `Nokogiri::HTML::SAX::Parser#parse` returned
  `nil` when passed an empty string.
- After 1.17.0: `Nokogiri::HTML::SAX::Parser#parse` raises a
  `RuntimeError` when passed an empty string.

Refer to the following Nokogiri commit for details:
sparklemotion/nokogiri@35596e7

Solution

Skip parsing the HTML content when it is empty by adding a presence
check.
if html.presence
parser = Nokogiri::HTML::SAX::Parser.new(document)
parser.parse(html)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use an early return instead?

def parse
  html = ...
  return ["", []] unless html.present?
  document = Document.new
  ...
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix: 1a0fc37 Sure!

@kou kou merged commit 6c3e1dc into clear-code:master Dec 17, 2024
24 checks passed
@otegami otegami deleted the fix-markup-parser-behavior-for-empty-string branch December 17, 2024 23:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants