Skip to content

Add support for character-level and glyph-level mirroring of RTL operators #277

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

icesfont
Copy link

@icesfont icesfont commented Jan 6, 2025

Depends on #276.

Please see #67 and https://people.igalia.com/fwang/mathml-operator-mirroring-explainer.html.

This augments the various steps that get the glyph for the character in the "layout of operators" algorithm to additionally mirror the character/glyph in RTL writing modes, using Unicode's BIDI or OpenType's rtlm respectively.

spec.html Outdated
<ul>
<li>
If <code>c</code> can be mirrored, then return
the glyph corresponding to the result of mirroring <code>c</code>, and exit with success. [[BIDI]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please give us some references to the spec definition of "can be mirrored" and "mirroring"? That will help review this patch.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure -- in particular, it's this section of UAX9:

https://www.unicode.org/reports/tr9/#Mirroring

OpenType rtlm falls under HL6, I believe, so no check on Bidi_Mirrored is necessary:

https://www.unicode.org/reports/tr9/#HL6

The "corresponding codepoints" mentioned are given in Data9:

https://www.unicode.org/reports/tr41/tr41-34.html#Data9

spec.html Outdated
the glyph corresponding to the result of mirroring <code>c</code>, and exit with success. [[BIDI]]
</li>
<li>
Otherwise, if there exists an OpenType rtlm variant of the input glyph, return that and exit with success. [[OPEN-FONT-FORMAT]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly for rtlm.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spec.html Outdated
Otherwise, if there exists an OpenType rtlm variant of the input glyph, return that and exit with success. [[OPEN-FONT-FORMAT]]
</li>
<li>
Otherwise, return the input glyph and exit with success.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The algorithm is always successful... so I guess we don't need to specify "with success" here and elsewhere?

I think we indeed don't need to fail when a glyph is not mirrorable in RTL, we should just stretch the non-mirrored glyph.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree, I've added this extra fallback step. I initially explicitly specified "with success" since I thought that this could fail if it wasn't possible to get any glyph for a codepoint (i.e. if the font didn't map one), but comparing this to the old version of the spec this should be fine(?).

@icesfont icesfont force-pushed the feat/rtl branch 3 times, most recently from 7ed76b0 to 349f89b Compare January 14, 2025 22:21
@icesfont
Copy link
Author

I forgot to reference it, this is related to this issue: #67

Copy link
Contributor

@fred-wang fred-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed that with Harry last week and I believe the text looks good.

The only doubts were when the "algorithm to get a glyph corresponding to a character c given a directionality dir" is going to fail, and how to deal with cases when "OpenType rtlm" and "Bidi_Mirrored" properties interact.

Anyway, Harry is going to experiment more in Chromium and to write tests, so maybe we can have a clearer idea later.

Also the word "typically" in the Bidi_Mirrored case is a bit weird, but apparently that's what they use in BIDI.

@icesfont
Copy link
Author

Regarding the "typically" usage, I agree it is weird -- I was basing it on comments about mirrored codepoints from the BIDI spec:

This data file lists characters that have the Bidi_Mirrored=Yes property
value, for which there is another Unicode character that typically has a glyph
that is the mirror image of the original character's glyph.

Looking back on it now, this doesn't make sense with my changes since the point of that is just to help define what a mirrored codepoint is. I've amended the spec changes accordingly

@khaledhosny
Copy link

Note that text shaping engines (HarfBuzz, DirectWrite, Core Text), all implement this and the behavior here should be the same as regular text, so any implementation that already use a text shaping engine would need to mark the direction of the chunk of text RTL and let the shaping engine handle the mirroring.

@fred-wang
Copy link
Contributor

@khaledhosny At least for HarfBuzz (which is used in MathML implementations of Firefox and Chromium, and some ports of WebKit) we only landed the API to expose the data, not the stretchy op shaping API in harfbuzz/harfbuzz#241 ; IIRC Behdad suggested the stretchy op shaping API should really be a separate project. So for now each browser implement their own shaper for stretchy op, and they need to deal with the mirroring themselves.

For cases when stretchy op is not needed, the spec "fall back to the layout algorithm of 3.2.1.1 Layout of ", so the normal text layout should just work. Maybe we need to have a non-normative note for that and we definitely need to have tests.

@khaledhosny
Copy link

@fred-wang My comment as about getting the base glyph before doing math shaping. Say a parenthesis or root with RTL text. If you are using HarfBuzz or other shaping engine, you don’t need to worry about mirroring or applying rtlm feature manually.

$ hb-shape XITSMath-Regular.otf "(√)"
[parenleft=0+333|uni221A=1+928|parenright=2+333]
$ hb-shape XITSMath-Regular.otf "(√)" --direction=rtl
[parenleft=2+333|uni221A.rtlm=1+928|parenright=0+333]

Note how in the RTL case the parentheses were swapped (parenright correspond to character at index 0 and parenleft correspond to character at index 2), and the rtlm variant of the radical is used.

The math layout engine can then proceed to apply MATH table as usual to the mirrored glyphs.

@fred-wang
Copy link
Contributor

@khaledhosny Ah, yes. I believe that's what Firefox does and what @icesfont is doing in his patch for Chromium too (not sure whether his CL has been uploaded already)? But that should match what this PR describes of course.

@khaledhosny
Copy link

khaledhosny commented Feb 5, 2025

The code here icesfont/icesfont.github.io@70f5364#diff-579b07614a73c0529876012ea22a2ea07dc7f460663f123afde39a0497d0498aR127 seems to be trying to do the mirroring manually, but hb_buffer_set_direction(hb_buffer, HB_DIRECTION_RTL); should be all that is needed. Namely:

if (direction_ == TextDirection::kRtl) {
    hb_font_t* hb_font = harfbuzz_face->GetScaledFont();
    hb_buffer_t* hb_buffer = hb_buffer_create();
    hb_buffer_add(hb_buffer, stretchy_character_, 0);
    hb_buffer_set_direction(hb_buffer, HB_DIRECTION_RTL);
    hb_buffer_set_script(hb_buffer, HB_SCRIPT_MATH);
    hb_shape(hb_font, hb_buffer, nullptr, 0);
    ...
}

This sets direction to RTL, and script to math (the linked patch sets it to common, but math layout should be in math script per OT spec), and removed the if (stretchy_character_ <= 0xffff) as Unicode might encode other character with mirroring in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants