Punctuation characters aren't stripped from auto-generated heading ID attributes #22248
Labels: needs:triage ([triage] this needs to be triaged by the Ghost team)
Issue Summary
The Problem
Headings automatically include an id attribute with a lowercased and dashed 'slug'-ish version of the heading text. E.g., the second-level heading 'My Favorite Book' will be rendered as <h2 id="my-favorite-book">.
This transformation also strips out a number of non-alphanumeric characters and encodes non-ASCII ones. E.g., this heading:
"What's my favorite book?" you ask? Why, 'Moby Dick' of course! (I smile/laugh.)
…is turned into this HTML:
<h2 id="whats-my-favorite-book-you-ask-why-moby-dick-of-course-i-smilelaugh">"What's my favorite book?" you ask? Why, 'Moby Dick' of course! (I smile/laugh.)</h2>
As you can see, the punctuation is stripped out: single and double quotation marks, question marks, exclamation points, commas, parentheses, slashes, and periods.
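To make the observed behavior concrete, here is a rough TypeScript sketch that reproduces the examples in this issue. It is not Ghost's code: the helper name observedHeadingId is hypothetical, and the stripped-character set comes from my testing (listed under The Request), not from Ghost's source. It lowercases the text, drops those characters, turns whitespace into dashes, and percent-encodes whatever is left with encodeURIComponent.

```typescript
// Characters observed to be stripped from heading ids:
// ' " ; , . < > / \ ? ! [ ] ( ) { } @ # $ % ^ & * = _ + ~
// (an observed list, not taken from Ghost's source code).
const OBSERVED_STRIPPED = /['";,.<>\/\\?!\[\](){}@#$%^&*=_+~]/g;

// Hypothetical helper approximating the observed behavior:
// lowercase, drop the characters above, turn whitespace runs into dashes,
// then percent-encode the rest (non-ASCII becomes %XX UTF-8 bytes).
function observedHeadingId(text: string): string {
  const kept = text.toLowerCase().replace(OBSERVED_STRIPPED, "");
  const dashed = kept.trim().replace(/\s+/g, "-");
  return encodeURIComponent(dashed);
}

console.log(observedHeadingId("My Favorite Book"));
// my-favorite-book
console.log(
  observedHeadingId(`"What's my favorite book?" you ask? Why, 'Moby Dick' of course! (I smile/laugh.)`)
);
// whats-my-favorite-book-you-ask-why-moby-dick-of-course-i-smilelaugh
console.log(observedHeadingId("“It’s me,” I said."));
// %E2%80%9Cit%E2%80%99s-me%E2%80%9D-i-said
```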
But only some punctuation is stripped out. If I use curly/fancy/typographer's quotation marks or other punctuation or special characters, they are encoded instead of stripped. E.g., this heading:
“It’s me,” I said.
…is turned into this HTML:
<h2 id="%E2%80%9Cit%E2%80%99s-me%E2%80%9D-i-said">“It’s me,” I said.</h2>
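The encoded id is just the UTF-8 bytes of the curly quotes and apostrophe, percent-encoded. A quick TypeScript check (a sketch, not anything from Ghost's codebase) decodes it back and shows which code point produced each %XX triplet:

```typescript
// The id from the example above, as it appears in the rendered HTML.
const renderedId = "%E2%80%9Cit%E2%80%99s-me%E2%80%9D-i-said";

// decodeURIComponent reverses the percent-encoding: each %XX is one UTF-8
// byte, and three bytes together form one typographic character.
const decoded = decodeURIComponent(renderedId);
console.log(decoded); // “it’s-me”-i-said

// Show which code point each encoded sequence came from.
for (const ch of ["“", "’", "”"]) {
  const codePoint = ch.charCodeAt(0).toString(16).toUpperCase();
  console.log(`U+${codePoint} -> ${encodeURIComponent(ch)}`);
}
// U+201C -> %E2%80%9C
// U+2019 -> %E2%80%99
// U+201D -> %E2%80%9D
```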
Why It's a Problem
I see two problems here:
The Request
Would you consider stripping more characters from the heading id attribute? In my testing, the following characters are removed from heading id attributes:
' " ; , . < > / \ ? ! [ ] ( ) { } @ # $ % ^ & * = _ + ~
But these characters are not removed:
‘ ’ “ ” ` ¡ ¿ - – — •
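A rough TypeScript sketch of what I'm asking for, assuming the same pipeline as today with the characters above added to the strip list. proposedHeadingId is a hypothetical name, and this is an illustration, not a patch against Ghost's actual slug code:

```typescript
// Characters already stripped today, per my testing (not from Ghost's source).
const OBSERVED_STRIPPED = /['";,.<>\/\\?!\[\](){}@#$%^&*=_+~]/g;
// Characters this issue asks to strip as well: ‘ ’ “ ” ` ¡ ¿ - – — •
const REQUESTED_STRIPPED = /[‘’“”`¡¿\-–—•]/g;

// Hypothetical helper sketching the requested behavior. Both sets are
// stripped before whitespace becomes dashes, so the separator dashes added
// afterwards are the only dashes left in the id.
function proposedHeadingId(text: string): string {
  const kept = text
    .toLowerCase()
    .replace(OBSERVED_STRIPPED, "")
    .replace(REQUESTED_STRIPPED, "");
  const dashed = kept.trim().replace(/\s+/g, "-");
  // Letters such as é are still percent-encoded; only punctuation is dropped.
  return encodeURIComponent(dashed);
}

console.log(proposedHeadingId("“It’s me,” I said.")); // its-me-i-said
console.log(proposedHeadingId("“It’s me—” I said")); // its-me-i-said
```

One design choice worth flagging: because the ASCII hyphen is in the list, a hyphenated heading like 'Well-known tips' would become wellknown-tips. Leaving - out of the strip list would avoid that.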
Related Tickets
This has been brought up before, in #13876 and #14179, but those tickets were closed because encoding the characters is intentional: it ensures that "when links or URLs are displayed by browsers they will appear as native characters."
I understand this goal, but I don't think punctuation should be preserved, and I think the characters listed above could safely be removed from these attribute values without causing problems or losing important information.
Steps to Reproduce
‘ ’ “ ” ` ¡ ¿ - – — •
(e.g., “It’s me—” I said)
id attribute is full of encoded punctuation characters.
Ghost Version
5.109.2
Node.js Version
18.20.5
How did you install Ghost?
macOS Sequoia 15.3.1, ghost-cli, ghost install local
Database type
MySQL 5.7
Browser & OS version
n/a
Relevant log / error output
n/a