
Bug: Exported DOM for rich-formatted nodes is malformed #7955

@kettanaito

Description

The result of $generateHtmlFromNodes(editor) is a malformed HTML string as it contains unnecessary HTML elements, classes, and style attributes.

Lexical version: 0.38.1

Steps To Reproduce

  1. Initiate a Lexical editor with @lexical/rich-text.
  2. Create content that looks like this: [image]
  3. Export the editor's content to HTML using @lexical/html.

Link to code example:

import { $createParagraphNode, $createTextNode, $getSelection, createEditor } from 'lexical'
import { registerRichText } from '@lexical/rich-text'

const editor = createEditor({
  namespace: 'test'
})

registerRichText(editor)

editor.update(() => {
  const paragraph = $createParagraphNode()
  const one = $createTextNode('One. ')
  const two = $createTextNode('Two')
  two.setFormat('italic')
  const three = $createTextNode('. Three.')

  paragraph.append(one, two, three)
  $getSelection()?.insertNodes([paragraph])
})

Then, export the content:

import { $generateHtmlFromNodes } from '@lexical/html'

editor.read(() => {
  const html = $generateHtmlFromNodes(editor)
  console.log(html)
})

The current behavior

<p><span style="white-space: pre-wrap;">One. </span><i><em class="italic" style="white-space: pre-wrap;">Two</em></i><span style="white-space: pre-wrap;">. Three.</span></p>

The following criteria lead me to conclude the resulting HTML string is malformed:

  • Redundant <i><em> nesting (the same happens for the "bold" format, resulting in <b><strong> nesting).
  • Redundant class attributes on HTML elements whose styling the user agent already implies: <em class="italic">
  • Redundant inline style="white-space: pre-wrap;" attributes. Repeating this style on every element makes the HTML unnecessarily verbose for larger documents. If users want that behavior, they can apply the same style once on the parent element of the exported HTML.
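To illustrate the last point, here is a minimal sketch of applying the style once on a parent element instead of on every node. `wrapExportedHtml` is a hypothetical helper written for this issue, not part of Lexical, and the choice of a `<div>` wrapper is an assumption:

```typescript
// Hypothetical helper (not a Lexical API): wraps already-exported HTML in a
// single container that carries the white-space rule for all descendants.
function wrapExportedHtml(html: string): string {
  return `<div style="white-space: pre-wrap;">${html}</div>`;
}

// Usage: the inner markup no longer needs per-element inline styles.
const wrapped = wrapExportedHtml('<p><span>One. </span><em>Two</em></p>');
console.log(wrapped);
```

Since `white-space` is an inherited CSS property, the children pick it up from the wrapper without repeating it.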

The expected behavior

I expect the resulting HTML to look like this:

<p><span>One. </span><em>Two</em><span>. Three.</span></p>

This represents the content of the editor with semantic and valid HTML tags.
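As a stopgap, the difference between the current and expected output can be sketched as a string post-processing step. This is only an illustration under stated assumptions: `simplifyExportedHtml` is a hypothetical helper, not a Lexical API, and it relies on regular expressions that only handle the simple markup shown in this issue (a real fix would belong in the exporter itself).

```typescript
// Hypothetical helper (not part of Lexical): strips the redundancies described
// in this issue from an exported HTML string. Regex-based, so it is only
// suitable for simple, non-nested markup like the example here.
function simplifyExportedHtml(html: string): string {
  return html
    // Drop the per-element white-space inline style.
    .replace(/ style="white-space: pre-wrap;"/g, '')
    // Drop class attributes on <em> and <strong>.
    .replace(/<(em|strong) class="[^"]*"/g, '<$1')
    // Unwrap <i><em>...</em></i> and <b><strong>...</strong></b>.
    .replace(/<i>(<em[^>]*>[\s\S]*?<\/em>)<\/i>/g, '$1')
    .replace(/<b>(<strong[^>]*>[\s\S]*?<\/strong>)<\/b>/g, '$1');
}

// The "current behavior" string from this issue.
const current =
  '<p><span style="white-space: pre-wrap;">One. </span>' +
  '<i><em class="italic" style="white-space: pre-wrap;">Two</em></i>' +
  '<span style="white-space: pre-wrap;">. Three.</span></p>';

console.log(simplifyExportedHtml(current));
// prints: <p><span>One. </span><em>Two</em><span>. Three.</span></p>
```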

Impact of fix

  • The behavior happens every time; no context affects it.
  • Everybody would benefit from this fix since I assume the intention of $generateHtmlFromNodes is to get the valid HTML representation of the editor's contents.

Labels: copy+paste, html
