Punctuation characters aren't stripped from auto-generated heading ID attributes #22248

Open
1 task done
matthewmcvickar opened this issue Feb 20, 2025 · 2 comments
Labels
needs:triage [triage] this needs to be triaged by the Ghost team

matthewmcvickar commented Feb 20, 2025

Issue Summary

The Problem

Headings automatically include an id attribute with a lowercased and dashed 'slug'-ish version of the heading text. E.g., the second-level heading 'My Favorite Book' will be rendered as <h2 id="my-favorite-book">.

This transformation also strips out a number of non-alphanumeric characters and percent-encodes non-ASCII ones. E.g., this heading:

"What's my favorite book?" you ask? Why, 'Moby Dick' of course! (I smile/laugh.)

…is turned into this HTML:

<h2 id="whats-my-favorite-book-you-ask-why-moby-dick-of-course-i-smilelaugh">"What's my favorite book?" you ask? Why, 'Moby Dick' of course! (I smile/laugh.)</h2>

As you can see, the punctuation is stripped out: single and double quotation marks, question marks, exclamation points, commas, parentheses, slashes, and periods.

But only some punctuation is stripped out. If I use curly/fancy/typographer's quotation marks or other punctuation or special characters, they are encoded instead of stripped. E.g., this heading:

“It’s me,” I said.

…is turned into this HTML:

<h2 id="%E2%80%9Cit%E2%80%99s-me%E2%80%9D-i-said">“It’s me,” I said.</h2>
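
The two results above can be reproduced with a few lines of JavaScript. This is only a sketch, not Ghost's actual code: `sketchHeadingId` and `ASCII_PUNCTUATION` are hypothetical names, and the stripped set is an assumption taken from the two example headings. It shows that `encodeURIComponent` percent-encodes the UTF-8 bytes of any non-ASCII character that survives the stripping step.

```javascript
// Hypothetical sketch of the observed behaviour; NOT Ghost's actual implementation.
// Assumption: only these ASCII punctuation characters are stripped before encoding.
const ASCII_PUNCTUATION = /['",.?!()\/]/g;

function sketchHeadingId(headingText) {
    const slug = headingText
        .trim()
        .toLowerCase()
        .replace(ASCII_PUNCTUATION, '') // ASCII punctuation disappears here
        .replace(/\s+/g, '-');          // each run of whitespace becomes one dash

    // Any non-ASCII character that is left (e.g. a curly quote) is
    // percent-encoded as its UTF-8 bytes, which is where "%E2%80%9C" comes from.
    return encodeURIComponent(slug);
}

console.log(sketchHeadingId('"What\'s my favorite book?" you ask? Why, \'Moby Dick\' of course! (I smile/laugh.)'));
// whats-my-favorite-book-you-ask-why-moby-dick-of-course-i-smilelaugh
console.log(sketchHeadingId('“It’s me,” I said.'));
// %E2%80%9Cit%E2%80%99s-me%E2%80%9D-i-said
```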

Why It's a Problem

I see two problems here:

  1. The anchor URLs for linking to these headings are ugly and hard to read.
  2. The anchor URLs for these headings are not easy to guess, which means that editors who are trying to link to headings further down the page don't know what to put for URLs for internal links.

The Request

Would you consider stripping more characters from the heading id attribute?

In my testing, the following characters are removed from heading id attributes:
' " ; , . < > / \ ? ! [ ] ( ) { } @ # $ % ^ & * = _ + ~

But these characters are not removed:
‘ ’ “ ” ` ¡ ¿ - – — •
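
To make the request concrete, here is a rough sketch of what the extra stripping might look like. Everything in it is an assumption for illustration: `proposedHeadingId`, `ALREADY_STRIPPED`, and `ALSO_STRIP` are hypothetical names, the two character sets are copied from the lists above, and the ASCII hyphen is left alone because it also serves as the word separator in the id.

```javascript
// Hypothetical sketch of the requested behaviour; NOT Ghost's actual implementation.
// ASCII characters reported in this issue as already being removed.
const ALREADY_STRIPPED = /['";,.<>\/\\?!\[\](){}@#$%^&*=_+~]/g;
// Characters reported as surviving. The ASCII hyphen is deliberately left out
// (assumption) because it doubles as the word separator in the id.
const ALSO_STRIP = /[‘’“”`¡¿–—•]/g;

function proposedHeadingId(headingText) {
    const slug = headingText
        .trim()
        .toLowerCase()
        .replace(ALREADY_STRIPPED, '')
        .replace(ALSO_STRIP, '')
        .replace(/\s+/g, '-');
    // Non-ASCII letters are still percent-encoded, only punctuation is dropped.
    return encodeURIComponent(slug);
}

console.log(proposedHeadingId('“It’s me,” I said.'));
// its-me-i-said
console.log(proposedHeadingId('¿Qué pasa?'));
// qu%C3%A9-pasa
```

Note that letters such as `é` are still encoded rather than removed, so the goal from the earlier tickets (native characters shown by browsers) would be unaffected for actual words.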

Related Tickets

This has been brought up before, in #13876 and #14179, but those tickets were closed because it is intentional that characters are encoded so that "when links or URLs are displayed by browsers they will appear as native characters."

I understand this goal, but I don't think punctuation should be preserved, and I think the characters listed above could safely be removed from these attribute values without causing problems or losing important information.

Steps to Reproduce

  1. In a post, make a new heading (e.g., a second-level heading).
  2. Use any of these special characters in a heading: ‘ ’ “ ” ` ¡ ¿ - – — • (e.g., “It’s me—” I said)
  3. Publish the post.
  4. In the published post, inspect the HTML for the heading.
  5. Note that the heading's id attribute is full of encoded punctuation characters.
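
Steps 4 and 5 can also be checked with a small script instead of the browser inspector. This is a minimal sketch: `findEncodedHeadingIds` is a hypothetical helper, and it assumes the rendered post HTML is already available as a string and that ids are written with double quotes.

```javascript
// Hypothetical helper for steps 4 and 5: list heading ids that contain
// percent-encoded bytes. A regex is enough for a quick check of rendered
// post HTML; it is not a general-purpose HTML parser.
function findEncodedHeadingIds(html) {
    const headingId = /<h[1-6]\b[^>]*\bid="([^"]*)"/gi;
    const encoded = [];
    for (const match of html.matchAll(headingId)) {
        const id = match[1];
        if (/%[0-9A-F]{2}/i.test(id)) {
            // decodeURIComponent shows what the id looks like to a reader.
            encoded.push({id, decoded: decodeURIComponent(id)});
        }
    }
    return encoded;
}

const html = '<h2 id="%E2%80%9Cit%E2%80%99s-me%E2%80%9D-i-said">“It’s me,” I said.</h2>' +
    '<h2 id="my-favorite-book">My Favorite Book</h2>';
console.log(findEncodedHeadingIds(html));
```

Only the first heading is reported, since the second id contains no percent-encoded sequences.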

Ghost Version

5.109.2

Node.js Version

18.20.5

How did you install Ghost?

macOS Sequoia 15.3.1, ghost-cli, ghost install local

Database type

MySQL 5.7

Browser & OS version

n/a

Relevant log / error output

n/a

Code of Conduct

  • I agree to be friendly and polite to people in this repository

github-actions bot added the needs:triage label Feb 20, 2025

cathysarisky (Contributor) commented Feb 20, 2025

To point 2: a good enhancement would be exposing the heading ids directly in the post editor in some way.

kevinansfield (Member) commented

The related code can be found here https://github.com/TryGhost/Koenig/blob/main/packages/kg-utils/lib/slugify.js#L23-L30. PRs are always welcome 🙂

Something to note is that this would be a breaking change, so it would need to be added as a new version conditional in that util. Otherwise, links to content created in earlier Ghost versions can break if that content is edited or re-rendered with a different slugify behaviour.
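
As a rough illustration of the version-conditional idea, the sketch below keeps the existing ids for older content and applies the extra stripping only from a cutoff version onward. It is not the actual kg-utils code: `slugifyHeading`, the `ghostVersion` option name, and the cutoff `NEW_BEHAVIOUR_MAJOR` are all assumptions made for the example, and the character sets are taken from the lists in this issue.

```javascript
// Hypothetical sketch of a version conditional; NOT the actual kg-utils code.
// NEW_BEHAVIOUR_MAJOR is a placeholder: the issue does not say which Ghost
// version would introduce the change.
const NEW_BEHAVIOUR_MAJOR = 6;

const ASCII_PUNCTUATION = /['";,.<>\/\\?!\[\](){}@#$%^&*=_+~]/g;
const EXTRA_PUNCTUATION = /[‘’“”`¡¿–—•]/g;

function slugifyHeading(headingText, {ghostVersion = '5.0'} = {}) {
    const major = Number.parseInt(String(ghostVersion).split('.')[0], 10);
    let slug = headingText.trim().toLowerCase().replace(ASCII_PUNCTUATION, '');

    // Only content rendered for the newer version gets the extra stripping,
    // so ids (and links to them) in older content stay unchanged.
    if (major >= NEW_BEHAVIOUR_MAJOR) {
        slug = slug.replace(EXTRA_PUNCTUATION, '');
    }

    return encodeURIComponent(slug.replace(/\s+/g, '-'));
}

console.log(slugifyHeading('“It’s me,” I said.', {ghostVersion: '5.109.2'}));
// %E2%80%9Cit%E2%80%99s-me%E2%80%9D-i-said
console.log(slugifyHeading('“It’s me,” I said.', {ghostVersion: '6.0.0'}));
// its-me-i-said
```

Gating on the version the content was created with is what keeps existing anchor links stable: the same heading text yields the old id for old content and the cleaner id only for new content.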
