New lispusers CHARCODEUTILS module. #2078

MattHeffron · 2025-03-24T07:05:49Z

New lispusers CHARCODEUTILS implements CHARCODE.ENCODE, the inverse of standard FNS: CHARCODE.DECODE (and CHARCODE).

It takes one argument, a 16-bit character code (integer), and returns the name (a string) that could be given to CHARCODE.
E.g. (CHARCODE "FUNCTION,#^Q") == 657, so (CHARCODE.ENCODE 657) == "Function,#^Q".
(CHARCODE "#^GREEK,A") == 9857, so (CHARCODE.ENCODE 9857) == "Greek,#^A".
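The arithmetic behind those two codes can be checked with a small Python model. This is only a sketch of the numbers, not Medley code: `compose` and `CHARSETS` are hypothetical names, Function = character set 2 is taken from later in this thread, and Greek = character set 38 (octal 46) is an assumption. The "^" mask follows the behavior reported later in the thread (it clears the 6th and 7th bits) and "#" is modeled as adding 200Q.

```python
# Hypothetical model of how a CHARCODE name maps to a 16-bit code.
# Function = 2 is stated in this thread; Greek = 38 (octal 46) is an assumption.
CHARSETS = {"FUNCTION": 2, "GREEK": 38}

def compose(charset, char, control=False, meta_bit=False):
    """Return charset*256 + low byte, applying ^ and # to the low byte."""
    low = ord(char)
    if control:
        low &= 0x9F   # "^" clears the 6th and 7th bits (values 32 and 64)
    if meta_bit:
        low += 0o200  # "#" adds 200Q (128)
    return CHARSETS[charset] * 256 + low

print(compose("FUNCTION", "Q", control=True, meta_bit=True))  # 657
print(compose("GREEK", "A", control=True, meta_bit=True))     # 9857
```

So 657 = 2*256 + 128 + 17 and 9857 = 38*256 + 128 + 1.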

pamoroso · 2025-03-24T07:24:47Z

I tested this under Linux Mint 22 Cinnamon and it works as expected.

MattHeffron · 2025-03-24T17:03:00Z

Just an FYI:
I wrote this to help with looking at the values from (KEYACTION keyname) when we were looking at the character-to-action operations, as in PR #2070.

Allow (CL:CHARACTERP x) as well as SMALLP arguments. Cleanup handling of char=255 in any character set.

Added optional argument NONCHAR.IDENTITY (default NIL). If the C argument isn't SMALLP or CL:CHARACTERP, return C itself if NONCHAR.IDENTITY is non-null, else NIL.

pamoroso · 2025-03-27T10:42:29Z

I tested up to commit f701cdf on Linux Mint 22 Cinnamon and found nothing unusual.

nbriggs · 2025-03-27T17:51:00Z

What are the valid inputs to CHARCODE.ENCODE ? (CHARCODE.ENCODE 128) fails with TYPE-MISMATCH "357,55" is not a NUMBER - it also fails with the same error on the same string if given -1, or anything else it seems to be unable to encode.

EDIT: Looks as though I got an out-of-date version of the code. It's behaving better now.

nbriggs · 2025-03-27T18:29:34Z

With the latest version, (CHARCODE.ENCODE n) for n < 0 produces 2-part octal strings that can't be decoded by CHARCODE.DECODE instead of NIL (e.g. -1 => "77777777,377").

It's also odd that it produces a control-modifier "^" on named characters that are already in the control space -- e.g., 775 => 3,^Bell yet it correctly produces "3,6" for 774 rather than "3,^6" which would be 790.

(CHARCODE.ENCODE 412) => "Meta,#^WHEELSCROLL-UP" which is a bad character specification according to CHARCODE.DECODE. (there'll be others in the same pattern, I haven't exhaustively enumerated the failures).

rmkaplan · 2025-03-27T19:47:15Z

Try CHARNAME in the TEDIT architecture branch.

nbriggs · 2025-03-27T20:06:07Z

CHARCODE.DECODE handles the output of all values produced by CHARNAME on arguments from 0 to 65535, and (EQ (CHARCODE.DECODE (CHARCODE I)) I) is true on the same domain.

MattHeffron · 2025-03-28T03:54:59Z

I've rewritten this from scratch, so ignore this until I put up a new commit.

Uses names from CHARACTERNAMES wherever it can. Simplified by pre-processing CHARACTERNAMES, which can be done once ahead of time and the result passed to multiple calls. Added a test function which checks that all 65536 character codes correctly "round trip" through (CHARCODE.DECODE (CHARCODE.ENCODE cc)).
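The shape of such a round-trip check can be sketched in Python. This is not the test function from the commit. `round_trip_failures`, `encode_octal` and `decode_octal` are hypothetical names, and the octal "charset,char" codec is only a stand-in for the real CHARCODE.ENCODE / CHARCODE.DECODE pair.

```python
def round_trip_failures(encode, decode, codes=range(65536)):
    """Return every code cc for which decode(encode(cc)) != cc."""
    return [cc for cc in codes if decode(encode(cc)) != cc]

# Stand-in codec (an assumption): "charset,char" with both parts in octal.
def encode_octal(cc):
    return f"{cc >> 8:o},{cc & 0xFF:o}"

def decode_octal(name):
    charset, char = name.split(",")
    return int(charset, 8) * 256 + int(char, 8)

print(encode_octal(8455))                                # 41,7
print(round_trip_failures(encode_octal, decode_octal))   # []
```

An empty list means every code from 0 to 65535 survived the round trip. Returning the failing codes rather than a boolean makes it easier to see patterns in what breaks.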

nbriggs · 2025-03-28T16:25:22Z

This version passes the test for (EQ I (CHARCODE.DECODE (CHARCODE.ENCODE I))) for all I from 0 to 65535, so much better. However I find it really odd that it chooses to generate "^Zero" for (CHARCODE.ENCODE 16) rather than "^P" - I don't know anyone who would express it that way.

MattHeffron · 2025-03-29T18:29:55Z

> However I find it really odd that it chooses to generate "^Zero" for (CHARCODE.ENCODE 16) rather than "^P" - I don't know anyone who would express it that way.

This falls out of the design choice "Uses names from CHARACTERNAMES wherever it can." (mentioned in the commit message), which was made for a case like (CHARCODE.ENCODE 8455), which returns "^INFINITY" instead of "41,^G".

In what kinds of cases do you think that using the "names from CHARACTERNAMES" is (in)appropriate?

Should it use names only for exact match to the entry in CHARACTERNAMES?
What about for values in Meta or Function character sets? E.g. "Meta,Tab" vs "Meta,^I".
Are those special cases by design?
Is it only the cases of the names of the digits that are weird, because they aren't control chars but get used instead of the "^expected-char" as you pointed out?

rmkaplan · 2025-03-29T19:27:12Z

FWIW, the logic in CHARNAME, after recursing down to a valid raw numeric code, is:

1. If the code has a defined name, return it.
2. Get the charset, as a name or number, from the high-order byte.
3. Build the asciiname from ASCIICODE = the lower 7 bits:
   - if there is a defined name for ASCIICODE, use it;
   - else if it is a ctrl code (below (CHARCODE space)), concat ^ in front of its non-control equivalent (add (CHARCODE @));
   - else use (CHARACTER ASCIICODE).
4. If the low byte of the code is above 128, pack # in front of the asciiname.
5. Finally, unless the charset is 0, concat the charset and , in front of the asciiname.

(And a little wrinkle to say just do octal for the character part, in which case pack 0 too. Octal names are easier to look up in XCCS tables.)
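The logic described above can be sketched in Python as follows. This is a model, not the CHARNAME source. `CHAR_NAMES` and `CHARSET_NAMES` are small hypothetical tables standing in for the real CHARACTERNAMES and charset-name lists. Two details are assumptions: unnamed charsets are printed in octal, and "above 128" is read as "8th bit set" (low byte >= 128). The octal-only wrinkle is left out.

```python
# Hypothetical stand-ins for Medley's tables; the real lists are larger.
CHAR_NAMES = {7: "Bell", 9: "Tab", 32: "Space"}
CHARSET_NAMES = {1: "Meta", 2: "Function"}

def charname(code):
    """Model of the described CHARNAME logic for a raw code in 0..65535."""
    # If the full code has a defined name, return it.
    if code in CHAR_NAMES:
        return CHAR_NAMES[code]
    # High-order byte is the charset, low byte is the character.
    charset, low = divmod(code, 256)
    ascii_code = low & 0x7F               # lower 7 bits
    if ascii_code in CHAR_NAMES:
        name = CHAR_NAMES[ascii_code]
    elif ascii_code < 32:                 # control code, below space
        name = "^" + chr(ascii_code + ord("@"))
    else:
        name = chr(ascii_code)
    # Assumption: "above 128" means the 8th bit of the low byte is set.
    if low >= 128:
        name = "#" + name
    # Unless charset 0, prefix the charset (name, or octal number) and a comma.
    if charset != 0:
        name = f"{CHARSET_NAMES.get(charset, format(charset, 'o'))},{name}"
    return name

print(charname(657))  # Function,#^Q
print(charname(16))   # ^P
print(charname(358))  # Meta,f
```

With these toy tables, 16 comes out as "^P" rather than "^Zero" because a name is used only on an exact match, either for the whole code or for the 7-bit ASCII part.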

nbriggs · 2025-03-29T20:07:17Z

I need to think about it some more, but my first thought is that it should use names for exact match only, rather as Ron described above.

There seems to be some confusion around the names of KEYS vs the names of CHARACTERS -- I don't think WHEELSCROLL-[UP,DOWN,LEFT,RIGHT] are/should be the names of characters, they should not appear in CHARACTERNAMES. That would get rid of things like (CHARCODE "GREEK,WHEELSCROLL-UP") => 9884.

The fact that we have the "Meta" character set name is a little confusing when we have the "#" that sets the "traditional" meta (8th) bit - so when you speak the name you have to be careful -- (CHARCODE "Meta,A") != (CHARCODE "#A") != (CHARCODE "Meta,#A") -- though this isn't an issue for turning codes back into names.

The IRM does use (CHARCODE "#^GREEK,A") (=> 9857) as an example (and you can put the # and ^ independently on either half of the name, their bit setting and clearing apply to the final code).

While the IRM says "^" derives the code from clearing the 7th bit (normally set), in fact it clears the 6th and 7th bits (CL:FORMAT T "~b" (CHARCODE "^0,377")) => 10011111, and it also says that # sets the 8th bit, normally cleared -- but then implements it as an addition so that (CHARCODE "#A") => 193, (CHARCODE "##A") == (CHARCODE "META,A") => 321, and (CHARCODE "###A") => 449 -- I definitely think we should not use that in creating character names.

nbriggs · 2025-03-29T20:13:13Z

BTW: while (CHARCODE "META,##A") is an error, you can put as many hashes in front of the character set name as you want -- (CHARCODE "##META,#A") => 705

rmkaplan · 2025-03-29T21:25:57Z

I think some of what is described in the IRM are accidents of implementation, not the result of design. The hash-addition thing, for example, should be de-documented. There were character names in TEDITKEY of the form ##a, to get up into character set 1, assuming that that's where the "meta" characters were located (Meta,a = ##a). That wouldn't work if Meta moves to a clean part of the Unicode charcode space.

I also think that ^GREEK,A shouldn't be mentioned, even if it is implemented as if it is GREEK,^A. It suggests that somehow there is a control-greek character set.

Note that if you do (READCCODE T) and type Meta-f, it prints as #1,102 and returns 358 (which maps to Meta,f and not #f). I.e., we don't implement the "traditional" notion of meta as the 8th bit. In that sense, calling it as we do is a misnomer. We could transition to another name for the meta character set, currently 1, like we have a separate name Function currently for character set 2. But it would still be the case that the keyboard tables don't return #x when you type x with the meta key down. And we probably don't want to mix up the notion of a mode-shift with the typing of upper charset 0 characters.

If the WHEELSCROLL characters are moved outside of character set 0, then GREEK,WHEELSCROLL-UP would fail. I chose those particular codes because they aren't affected by whether the control key is down, and I was probably confused about meta. But perhaps I should move those to some control codes in Function, to get them out of this discussion. Unlike the clipboard characters, the wheelscroll characters are never actually typed.

MattHeffron · 2025-03-31T04:58:47Z

> BTW: while (CHARCODE "META,##A") is an error, you can put as many hashes in front of the character set name as you want -- (CHARCODE "##META,#A") => 705

Yes, because a leading # just adds 200Q to the result of recursing on the rest of the string, but after the comma (or other allowed delimiter) the result of the recursion at that point is checked to be in the range 0-255.

This whole multiple-# business is an awful hack that should be removed.
(Make it LOGOR, not IADD; optionally make it an error, though that is more difficult to check considering the recursion.)
The few (?) places that use it ought to be changeable to 1,nn or Meta,nn.
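To make the IADD versus LOGOR difference concrete, here is a small Python model of the leading-# handling for single-character names only. Both function names are hypothetical. The first mirrors the additive behavior described in this thread, and the second shows what the LOGOR suggestion would give instead.

```python
def hashes_by_addition(name):
    """Model of the current behavior: each leading # adds 200Q (128)."""
    if len(name) > 1 and name.startswith("#"):
        return 0o200 + hashes_by_addition(name[1:])
    return ord(name)

def hashes_by_logor(name):
    """Model of the suggestion: each leading # ORs in 200Q, so repeats are no-ops."""
    if len(name) > 1 and name.startswith("#"):
        return 0o200 | hashes_by_logor(name[1:])
    return ord(name)

for name in ("#A", "##A", "###A"):
    print(name, hashes_by_addition(name), hashes_by_logor(name))
# #A 193 193
# ##A 321 193
# ###A 449 193
```

With LOGOR, extra hashes stop spilling into the character-set byte, so "##A" would no longer be a synonym for "META,A".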

MattHeffron · 2025-03-31T18:33:37Z

Discussion in 3/31 meeting was that this doesn't belong in LISPUSERS, but probably in LLREAD along with CHARCODE.DECODE. I'll defer moving it there until I finish stabilizing it.
It'll be easier to test if it doesn't require remaking loadups.

masinter

discussed 3/7/25 meeting

MattHeffron · 2025-04-09T22:02:57Z

Thinking more about this implementation, if I take out the work to try to use CHARACTERNAMES entries as much as possible, then I'm not sure that there's much value in this compared with CHARNAME in TEdit.
My only suggestions for CHARNAME would be:

1. to have it accept Common Lisp CHARACTER as encodable input, as well as SMALLP (0-65535);
2. use the first matching entry in CHARACTERNAMES as the name, to be consistent with Common Lisp generated named characters (e.g., #\Delete);
3. (optionally?) return any invalid argument as itself, so CHARCODE can be used with whole KEYACTION entries, etc.

MattHeffron · 2025-04-24T01:33:01Z

It appears that PR #2119 will address points 1 & 3 that I made above. Point 2 is not that significant.
I will close this PR now.

MattHeffron added the enhancement New feature or request label Mar 24, 2025

MattHeffron self-assigned this Mar 24, 2025

MattHeffron marked this pull request as draft March 24, 2025 17:47

MattHeffron added 2 commits March 25, 2025 15:53

Handle recursion in the CHARACTERNAMES alist.

e3be9c1

Allow (CL:CHARACTERP x) as well as SMALLP arguments. Cleanup handling of char=255 in any character set.

Merge branch 'master' into mth39--CHARCODEUTILS-lispusers

981db34

MattHeffron mentioned this pull request Mar 25, 2025

TEDIT: New architecture for key bindings, plus better suggestions for initial window regions #2070

Merged

MattHeffron requested review from hjellinek and rmkaplan March 25, 2025 23:04

Add support for list argument recursion a la CHARCODE.DECODE.

f701cdf

Added optional argument NONCHAR.IDENTITY (default NIL). If the C argument isn't SMALLP or CL:CHARACTERP, return C itself if NONCHAR.IDENTITY is non-null, else NIL.

MattHeffron marked this pull request as ready for review March 28, 2025 06:33

Merge branch 'master' into mth39--CHARCODEUTILS-lispusers

f921d46

masinter approved these changes Apr 8, 2025

masinter marked this pull request as draft April 21, 2025 17:07

MattHeffron closed this Apr 24, 2025
