|
| 1 | +# 27 January 2025 | MessageFormat Working Group Teleconference |
| 2 | + |
| 3 | +### Attendees: |
| 4 | + |
| 5 | +- Eemeli Aro \- Mozilla (EAO) |
| 6 | +- Matt Radbourne \- Bloomberg (MRR) |
| 7 | +- Richard Gibson \- OpenJSF (RGN) |
| 8 | +- Mark Davis \- Google (MED) |
| 9 | +- Shane Carr \- Google (SFC) |
| 10 | + |
| 11 | +**Scribe:** MRR |
| 12 | + |
| 13 | +## Topic: PR Review |
| 14 | + |
| 15 | +| PR | Description | Recommendation | |
| 16 | +| ----- | ----- | ----- | |
| 17 | +| \#989 | Simplify syntax character definitions | Discuss | |
| 18 | +| \#988 | Add :percent | Discuss | |
| 19 | +| \#986 | Deduplicate the section "Default Value of `select` Option" | Discuss | |
| 20 | +| \#985 | Drop :currency currencyDisplay=formalSymbol option value | Discuss | |
| 21 | +| \#983 | Drop reference to “registry” | Discuss, Merge | |
| 22 | +| \#923 | Test schema ‘src’ property | Discuss | |
| 23 | + |
| 24 | +### 989 |
| 25 | + |
| 26 | +EAO: Hopefully this simplifies the character definitions for the ranges we have. When I started to compare, having a base range that’s extended no longer was useful. We now support unpaired surrogates without restriction. Has anyone had time to look at that? It’s an editorial PR but it might make it easier for issue \#724 to land. |
| 27 | + |
| 28 | +### 983 |
| 29 | + |
| 30 | +EAO: APP had a recommendation to discuss and merge this one but has not specifically approved the current state. Do you have any thoughts or should I continue with APP? |
| 31 | + |
| 32 | +MED: I think we need APP here for that part. |
| 33 | + |
| 34 | +EAO: I’ll leave it open then. |
| 35 | + |
| 36 | +### 985 |
| 37 | + |
| 38 | +EAO: We’d identified in TC39 as being not too well defined or specifically used for the Taiwanese dollar and has no other data in CLDR. We may end up with something a little bit in this space in ECMA402 later. Let’s drop this particular thing for now so we can define it better later. Do you have any thoughts? |
| 39 | + |
| 40 | +MED: I think I’m OK \- it’s not currently populated. There are some proposals for populating. I think SFC has some ideas for changes also. We can drop for now. |
| 41 | + |
| 42 | +\[Merged\] |
| 43 | + |
| 44 | +### 986 |
| 45 | + |
| 46 | +EAO: Effectively helps in defining :percent. It’s editorial. |
| 47 | + |
| 48 | +MED: I think in general, editorially, we do a lot of repetition. This makes the document longer than it needs to be and more confusing and see “what are the differences if any”. I’m in favour of this style of cleanup. |
| 49 | + |
| 50 | +EAO: If anyone were willing to click on the approve button, we can merge. |
| 51 | + |
| 52 | +\[Approved \+ merged\] |
| 53 | + |
| 54 | +### 988 - Whether to include an option for whether the option is x100 or not |
| 55 | + |
| 56 | +EAO: We are agreed that the default should be x100 like in ICU and Intl.numberFormat. SFFC said it might be useful to include an option for ‘100’ for 100%. Is there a reason to leave out this option that we currently don’t have. Last comment was from SFC about not wanting an unadorned style |
| 57 | + |
| 58 | +SFC: What you said was mostly correct but the attribution to myself… I support this option but I don’t think it’s a requirement for CLDR47. I do really like the direction this is going \- to turn the option on and off, and it’s superior to the ICU design. It would maybe be interesting to bring into ECMA-402. I think there are some good ideas posted on the thread. I don’t think it needs to be rushed. |
| 59 | + |
| 60 | +MED: I think it makes it more complicated for users to have lots of different functions that have lots of similar but not the same options (e.g. integer, number, percent). It bulks up the perceived API for the user. I’m weary of this separation. Internally you could redirect to the :percent function and duplicate APIs to your hearts content but, for the surface, it’s more complicated. |
| 61 | + |
| 62 | +SFC: I’m perfectly happy merging these back into a single function, however I think % should be separate, since the others are separate. I don't think we should land on this weird thing where % is the one thing that’s left in the catch-all. We should either use :unit or make a :percent function. |
| 63 | + |
| 64 | +MED: Merging with :unit is an interesting option. We do have a bunch of interesting things that people want to have (e.g. basis-points, per 100000). |
| 65 | + |
| 66 | +SFC: I think it’s already supported in :unit. If you do :unit unit=percent, you don’t have scaling, which I think is the right choice. |
| 67 | + |
| 68 | +EAO: I like this method of defining :percent |
| 69 | +I had not remembered that :unit already supports %. |
| 70 | + |
| 71 | +SFC: It does and, under the hood in ICU, is that it uses the same patterns. It’s been a separate style option for ages, and There are a certain contingent of people that feel strongly about this. I’m fine having :percent or not having. I know there are people that feel it’s worthwhile. |
| 72 | + |
| 73 | +EAO: I think I want to change the PR so it drops the ‘style’ from :integer. |
| 74 | +The other topic that we could discuss on :percent (and :unit) is \- do we want to include selection. Right now the spec includes selection. The PR does not include selection on :percent. I think :percent is a type of unit when formatted. |
| 75 | + |
| 76 | +MED: In English, you would say “3 meters is the length from A to B”. You don’t change the plural category. I’m not sure what other languages do. |
| 77 | + |
| 78 | +SFC: Before we have more data, I think we should select on the number by itself. We dont rally have data for how to expand that across units as well. We shouldn’t add preemptively \- it feels like a footgun. |
| 79 | + |
| 80 | +EAO: I think CLDR does contain information of compact notation but not scientific notation. |
| 81 | + |
| 82 | +MED: In CLDR we should be adding plural categories for types of things like we do for grammatical categories (e.g. grammatically categories can be narrower for units) If we did the same for plural, we could say “English plural for units is just ‘other’” but this is down the road. |
| 83 | + |
| 84 | +EAO: Yes, and I think, like with % and a dedicated :percent that does multiplication. I think these are the sort of holes we should leave, and fill them later when/if somebody shows a need for a certain type of behaviour. Instead of defining a x100 in :unit, we could have a :math multiply=100. |
| 85 | +I think we can work on that later. |
| 86 | + |
| 87 | +MED: I think that’s a good solution. The other concern I had is that we had a solution for everything in MF1 so we could get people migrated over. If we just added the math-x and percent is supported by :unit, we’re golden. |
| 88 | + |
| 89 | +SFC: :math times makes me slightly worried. It would be cheaper if we could make it multiplied by a power of 10\. I'd rather not have to support that. If I'm in message formatting, I’m probably going to have a fixed decimal and multiplying by values other than powers of 10 is expensive. It’s expensive because I already have a fixed decimal. |
| 90 | + |
| 91 | +MED: I think that’s a false economy, that’s not a lot of work. Rust supports doubles. It’s not rocket science. |
| 92 | + |
| 93 | +EAO: Even if the syntax would allow multiplication by any number, we should know ahead of time. If I’m right, the current math… |
| 94 | + |
| 95 | +SFC: Really they’re adding or subtracting 1 or 2 \- common integers. I’m very much in favour of restricting to the things we know are useful. That’s why I like the option discussed in the PR. It’s clean ,easy to implement, uncontroversial. |
| 96 | + |
| 97 | +EAO: We don’t absolutely have an identified need for multiplication by 100\. |
| 98 | + |
| 99 | +SFC: One need is that MF1 supports it. |
| 100 | + |
| 101 | +MED: We’re making a grievous mistake if we can’t handle what MF1 handles unless deprecated by choice. |
| 102 | + |
| 103 | +SFC: Do we support an input and output unit that are different? |
| 104 | + |
| 105 | +MED: I have to drop. |
| 106 | + |
| 107 | +SFC: I think we should just land the PR and move on. MED was concerned about the explosion but it’s already exploded \- his concern about MF1 allowing x100 is the strongest concern. We need to have the behavior somewhere. I think :percent is the cleanest. It seems like MED is opposed to using :unit without a way to scale by 100\. |
| 108 | + |
| 109 | +EAO: On the other hand we’d also get that satisfaction by having :math multiply with restrictions on multipliers like 10, 100 ,1000 |
| 110 | + |
| 111 | +SFC: That would be a path we could potentially explore |
| 112 | + |
| 113 | +EAO: I’d prefer adding :math multiply because I don’t like that we’re duplicating functionality that’s in :unit already. |
| 114 | +I’ll create a PR to drop the selection behavior from :unit and :currency. |
| 115 | + |
| 116 | +SFC: We didn’t have time to finish discussing with MED but I have more thoughts. |
| 117 | + |
| 118 | +EAO: If we have the ‘multiply’ in :math it allows us to get the number we are formatting as a :percent or :unit as something we can have a selection on, which is what we could be seen to be removing with :unit selection |
| 119 | + |
| 120 | +SFC: I’m open to :math multiply to a very small number of integers. I’d kind of prefer if :math add had the same. What’s the range of that? |
| 121 | + |
| 122 | +EAO: 0-99 at the moment |
| 123 | + |
| 124 | +SFC: …and that’s integers so that’s good. |
| 125 | + |
| 126 | +EAO: We’re limiting in the spec what can be literal value input. We’re allowing implementations to do other things. |
| 127 | + |
| 128 | +SFC: We really need to have a message validator. People complain with web engines that what Chrome does is standard, even if it’s not. If we end up with a hand-wavy spec, we have to follow what ICU does even if it’s stupid. I’m frustrated that I’m fighting an uphill battle that this is convenient. The only purpose of a spec is to aid implementation. It needs to be stricter to force this. We need to have a way to force that MF2 are compliant. |
| 129 | + |
| 130 | +## Topic: Issue review |
| 131 | + |
| 132 | +### 724 |
| 133 | + |
| 134 | +RGN: It’s not going to match any other technology. Everyone who comes to the technology is going to have to learn it anew. |
| 135 | + |
| 136 | +MED: If we look at the description, under requirements, 5 and 6 are pretty uncontentious – There are characters that look like whitespace that we don’t want to get confused. We could reserve all of the non-ASCII, either a narrow set (doesn’t interfere with the current syntax) or a broad set (forbid things). We could say, with ASCII `A-Za-z0-9-+_.` |
| 137 | + |
| 138 | +On the other end, if we say we allow all characters that don’t make the current syntax ambiguous. EAO has pointed out a few cases where we really can’t do this because it would make the syntax ambiguous. Otherwise, Non-initial \- only interior. E.g. we could have a hash mark interior because it would only cause collisions as initial. The other case is the slash, because of markup. |
| 139 | + |
| 140 | +EAO: We also need to exclude \* because it’s used in variant keys. |
| 141 | + |
| 142 | +MED: Variant keys say it has a special meaning syntactically. |
| 143 | + |
| 144 | +EAO: Currently, it’s at the syntax level where we detect it. |
| 145 | + |
| 146 | +MED: You could allow internal, but not initial |
| 147 | + |
| 148 | +EAO: The problem then is that it would be really easy to make a mistake that parses fine. |
| 149 | + |
| 150 | +RGN: I consider unquoted literals to be an attractive nuisance. There’s l;ots of opportunity for humans and machines to get confused. I’m against any extension, including extensions that have already happened. |
| 151 | + |
| 152 | +MED: We can’t change that but, 5 and 6, plus no ASCII characters \[mentioned in EAO’s comment on 724\] |
| 153 | + |
| 154 | +RGN: What’s the current state? |
| 155 | + |
| 156 | +MED: Anything that’s the name production, which is slightly narrower than the XML, I believe. |
| 157 | + |
| 158 | +EAO: We exclude U+fffd, and the arabic letter mark. Message.abnf is the definitive source and matches what’s in syntax.md. U+fffd is the replacement character that we consider to be a bit special. It ensures that, if you end up serializing or formatting something that includes the replacement character, that does not parse as a valid thing. Whether this is sufficient reason to leave it out is debatable, but the ALM we have to because we rely on it in the bidi production and it gets really complicated. |
| 159 | + |
| 160 | +RGN: Given that, I’m happier with the restrictions that currently exist rather than the broader set. I don’t know what the broader set brings to the table. Who is going to complain about not having unquoted strings that XML would also forbid. |
| 161 | + |
| 162 | +MED: XML talks about these as identifiers. Literals are not limited to identifiers. There’s no real intersection. |
| 163 | + |
| 164 | +RGN: That’s the question, what should be allowed as unquoted literals. |
| 165 | + |
| 166 | +MED: They’re not intended as XML element identifiers, anything like that. |
| 167 | + |
| 168 | +RGN: True in a strict sense, they're not identifiers but they can’t conflict with the syntax. I’m asking about scenarios in which, if they’re restricted in the same way as XML identifiers, what would be the benefit? |
| 169 | + |
| 170 | +MED: The XML identifiers were developed pretty early on. In order to be immutable, they have vast ranges of characters with arbitrary new stuff. They’re not principled and there are lots of areas of possible confusion already. |
| 171 | + |
| 172 | +RGN: Immutability is a huge benefit. |
| 173 | + |
| 174 | +MED: I’m not arguing with that. I’m saying the restriction of characters to solve the problem of confusability, that’s not going to be the case. |
| 175 | + |
| 176 | +RGN: I agree. It’s going to be arbitrary. |
| 177 | + |
| 178 | +MED: The purpose is to simplify the syntax for people to write messages, especially since we eliminated the use of quote marks to write literals. This means some literals will be more complicated than necessary. When I looked I thought it would be nice to use unquoted rational numbers… and for ranges. |
| 179 | + |
| 180 | +EAO: You’re stepping back from the use-cases presented here. |
| 181 | + |
| 182 | +MED: You could still use some of the characters \- e.g. symbol characters that happened since XML stabilized, but not before. This would allow you to use all sorts of mathematical characters that have happened since then,. It’s somewhat arbitrary to the user. |
| 183 | + |
| 184 | +RGN: I don’t think that can be fixed. |
| 185 | + |
| 186 | +MED: The only problem is in the ASCII range. If we disallow all the big blocks of weird s\*\*t. If we disallow whitespace etc, I can use mathematical symbols. The only reason for restricting those is that we’re using them syntactically. A lot of people know what the ASCII characters are. |
| 187 | + |
| 188 | +RGN: Do translators |
| 189 | + |
| 190 | +MED: With translating as it is, we’d be having a much bigger problem. |
| 191 | + |
| 192 | +RGN: For what population does it intend to improve experience? |
| 193 | + |
| 194 | +MED: That’s a good question. The same population that would put in name characters right now. |
| 195 | + |
| 196 | +RGN: We’ve excluded translators because we expect them to use tooling. |
| 197 | + |
| 198 | +MED: Right. And the people that are writing the message in the first place \- programmers. That’s currently the process \- programmers write the English and it gets fluffed up to translators. That’s MF1. |
| 199 | + |
| 200 | +RGN: Does MF1 have optional quoting of any tokens. |
| 201 | + |
| 202 | +MED: It doesn’t use quoting for operands. I think the closest thing would be \- ECMAScript has an equivalent for what we’re doing. |
| 203 | + |
| 204 | +EAO: The Intl formatters? \[MED: Yes\] It doesn’t seem to be changing that much. In the ASCII array, we have \[...\]. If we were to change the rule for unquoted literal to any sequence of name-char, as it currently is, would it satisfy the condition we’re looking for here? |
| 205 | + |
| 206 | +MED: I’d have to do some research on that. Name-char does exclude. I can do that before the next meeting. |
| 207 | + |
| 208 | +RGN: The difference between name-start and name-char is that the latter includes \[...\]. It seems to not be of much consequence. Names in XML are not allowed to start with middle-dot or combining characters |
| 209 | + |
| 210 | +MED: Some combining characters. There are some outside of that combining block. I could tell you how many combined characters are not in this range… almost 2400 that aren’t in name-char that are in name-start. Following name-start is a relatively arbitrary set of restrictions. |
| 211 | + |
| 212 | +RGN: If that’s the case we’re dealing with garbage-in garbage-out. If name-start already includes combining characters, you’re disrupting the human’s ability to detect any punctuation as syntactiucally relevant. |
| 213 | + |
| 214 | +MED: You can have linters that say “we’re gonna flag this as bizarre” or, if you’ve got editors, you can raise to people’s attention. We have a whole UTS devoted to that. |
| 215 | + |
| 216 | +RGN: To EAO’s question, is there an argument for removing start-char, given the restrictions on following characters aren’t things that we might care about? Relaxations have already taken that form. Allowing middle-dot is not a problem. Allowing combined characters is a remaining problem. Having already relaxed it, relaxing further on the first char significance is a justifiable step down the slippery slope. |
| 217 | + |
| 218 | +MED: I think you’re right EAO. This is basically turning into a big simplification of what we have for unquoted-literal |
| 219 | + |
| 220 | +EAO: Making unquoted-literal be some non-empty sequence of name-char would let us drop number production completely, which would be nice. It’s possible that there would be some ranges in name-start or name-char that we inherit because they’re in XML taht we might want to mess with more (e.g. include further). I think the next step here is to create a PR and iterate further. |
| 221 | + |
| 222 | +MED: I’ll do some research offline for the blocks that are currently restricted. See what the effect of that would be. |
| 223 | + |
| 224 | +EAO: We already have some problematic characters in name-start. Given we’ve allowed unpaired surrogates elsewhere, we can have the discussion about making it more free-for-all. We can discuss in the PR. |
| 225 | + |
| 226 | +MED: We can have a separate set of recommendations for linters. We can have much better policies for those because they’re much more flexible. |
| 227 | + |
| 228 | +EAO: Should that effectively be included in a “should” type instruction? We don’t include any directives like that at the moment but we ought to include in the spec a canonical serialization of messages, which might include this, but this is a slightly different concern. |
| 229 | + |
| 230 | +RGN: Guidance for people writing linters? \[Yes\] I can see the value of pointing out scenarios worth flagging. |
| 231 | + |
| 232 | +EAO: Also for function developers. |
| 233 | + |
| 234 | +MED: We also ought to include the link about source code issues \- [https://www.unicode.org/reports/tr55/](https://www.unicode.org/reports/tr55/) |
| 235 | + |
| 236 | +EAO: I’m going to write the initial PR draft changing the syntax, but just that change. We can iterate from there. Let’s see if we can add smaller PRs for the note/recommendation we just discussed. |
| 237 | + |
| 238 | +MED: I’ll research the gaps in the names \- which of them should be hard-and-fast exclusions and which are arbitrary. |
| 239 | + |
| 240 | +RGN: looks like 823 combining characters outside the Combining\_Diacritical\_Marks block: [https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%5B%5E%5B%3ACanonical\_Combining\_Class%3DNot\_Reordered%3A%5D%5D-%5B%3ABlock%3DCombining\_Diacritical\_Marks%3A%5D%5D\&g=\&i=](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%5B%5E%5B%3ACanonical_Combining_Class%3DNot_Reordered%3A%5D%5D-%5B%3ABlock%3DCombining_Diacritical_Marks%3A%5D%5D&g=&i=) |
0 commit comments