New lispusers CHARCODEUTILS module. #2078
I tested this under Linux Mint 22 Cinnamon and it works as expected.
Just an FYI: …
Allow (CL:CHARACTERP x) as well as SMALLP arguments. Clean up handling of char=255 in any character set.
Added optional argument NONCHAR.IDENTITY (default NIL). If the C argument isn't SMALLP or CL:CHARACTERP, return C itself if NONCHAR.IDENTITY is non-null, else NIL.
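The argument rules in this commit can be modeled roughly as follows. This is a minimal Python sketch, not the Interlisp code: a Python `int` stands in for a SMALLP code, a one-character `str` for a CL:CHARACTERP value, `None` for NIL, and the function name `encode_or_passthrough` and its `encode` parameter are hypothetical.

```python
from typing import Any, Callable


def encode_or_passthrough(c: Any, encode: Callable[[int], Any],
                          nonchar_identity: Any = None) -> Any:
    """Hypothetical model of the C / NONCHAR.IDENTITY rules of CHARCODE.ENCODE.

    int ~ SMALLP, one-character str ~ CL:CHARACTERP, None ~ NIL.
    `encode` is whatever turns a numeric code into a name.
    """
    # bool is a subclass of int in Python; don't mistake True/False for codes.
    if isinstance(c, int) and not isinstance(c, bool):
        return encode(c)                  # SMALLP: encode the code directly
    if isinstance(c, str) and len(c) == 1:
        return encode(ord(c))             # CHARACTERP: encode its code
    # Not a character: return C itself only if NONCHAR.IDENTITY is non-null.
    return c if nonchar_identity is not None else None
```

For example, `encode_or_passthrough("A", hex)` gives `"0x41"`, while a list argument yields `None` unless `nonchar_identity` is supplied.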
I tested up to commit f701cdf on Linux Mint 22 Cinnamon and found nothing unusual.
What are the valid inputs to CHARCODE.ENCODE? (CHARCODE.ENCODE 128) fails with TYPE-MISMATCH: "357,55" is not a NUMBER. It also fails with the same error on the same string if given -1, or anything else it seems unable to encode. EDIT: It looks as though I had an out-of-date version of the code. It's behaving better now.
With the latest version, (CHARCODE.ENCODE n) for n < 0 produces 2-part octal strings that can't be decoded by CHARCODE.DECODE, instead of NIL (e.g. -1 => "77777777,377"). It's also odd that it produces a control modifier "^" on named characters that are already in the control space -- e.g., 775 => "3,^Bell", yet it correctly produces "3,6" for 774 rather than "3,^6", which would be 790. (CHARCODE.ENCODE 412) => "Meta,#^WHEELSCROLL-UP", which is a bad character specification according to CHARCODE.DECODE. (There'll be others in the same pattern; I haven't exhaustively enumerated the failures.)
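A 16-bit code splits into a character set (high byte) and a low byte, and the purely numeric names in these examples write both parts in octal, so "3,6" is charset 3, code 6, i.e. 774. The following Python sketch (the helper names `split_char_code` and `octal_pair` are hypothetical) adds the range check that would make -1 yield NIL-like `None` instead of a name:

```python
from typing import Optional, Tuple


def split_char_code(code: int) -> Optional[Tuple[int, int]]:
    """Split a 16-bit character code into (character set, low byte).

    Codes outside 0..65535 return None rather than a bogus split.
    """
    if not 0 <= code <= 0xFFFF:
        return None
    return code >> 8, code & 0xFF


def octal_pair(code: int) -> Optional[str]:
    """Hypothetical purely numeric name "charset,low", both parts in octal."""
    parts = split_char_code(code)
    if parts is None:
        return None
    charset, low = parts
    return f"{charset:o},{low:o}"
```

Assuming the unchecked code treats -1 as a 32-bit value, `f"{(-1 & 0xFFFFFFFF) >> 8:o},{-1 & 0xFF:o}"` reproduces the reported "77777777,377"; with the check, `octal_pair(-1)` is `None`.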
Try CHARNAME in the TEDIT architecture branch.
(Replying to Nick Briggs' comment of Mar 27, 2025, 11:29 AM, above.)
CHARCODE.DECODE handles the output of all values produced by CHARNAME on arguments from 0 to 65535, and (EQ (CHARCODE.DECODE (CHARNAME I)) I) is true on the same domain.
I've rewritten this from scratch, so ignore this until I put up a new commit.
Uses names from CHARACTERNAMES wherever it can. Simpler, because CHARACTERNAMES is pre-processed; that can be done once ahead of time and the result passed to multiple calls. Added a test function which checks that all 65536 character codes correctly "round trip" through (CHARCODE.DECODE (CHARCODE.ENCODE cc)).
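The shape of such a round-trip test is a loop over every 16-bit code. This Python sketch only illustrates the harness: `round_trip_failures`, `encode_with_hash`, `encode_dropping_hash`, and `decode_toy` are hypothetical stand-ins for CHARCODE.ENCODE / CHARCODE.DECODE, using a toy octal naming scheme rather than CHARACTERNAMES.

```python
from typing import Callable, List, Optional


def round_trip_failures(encode: Callable[[int], Optional[str]],
                        decode: Callable[[str], Optional[int]]) -> List[int]:
    """Return every 16-bit code cc for which decode(encode(cc)) != cc."""
    failures = []
    for cc in range(0x10000):                  # all 65536 character codes
        name = encode(cc)
        if name is None or decode(name) != cc:
            failures.append(cc)
    return failures


def encode_with_hash(cc: int) -> str:
    """Toy name: octal charset, ",", "#" if the 8th bit is set, octal low 7 bits."""
    low = cc & 0xFF
    prefix = "#" if low & 0x80 else ""
    return f"{cc >> 8:o},{prefix}{low & 0x7F:o}"


def encode_dropping_hash(cc: int) -> str:
    """Deliberately lossy variant that forgets the "#" prefix."""
    return f"{cc >> 8:o},{cc & 0x7F:o}"


def decode_toy(name: str) -> int:
    """Inverse of encode_with_hash: parse "charset,[#]low" with octal numbers."""
    charset, low = name.split(",")
    high_bit = 0x80 if low.startswith("#") else 0
    digits = low[1:] if high_bit else low
    return (int(charset, 8) << 8) | high_bit | int(digits, 8)
```

The lossless pair yields an empty failure list, while the lossy encoder fails on exactly the 32768 codes whose low byte has the 8th bit set: the kind of silent mismatch this test is meant to catch.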
This version passes the test for …
FWIW, the logic in CHARNAME, after recursing down to a valid raw numeric code, is:
If the code has a defined name, return it.
Get the charset as a name or number for the high-order byte.
Build the ASCIINAME from ASCIICODE = the lower 7 bits:
  if (ASCIICODE has a defined name) then use that name
  elseif (ASCIICODE is a control code, i.e. below (CHARCODE SPACE))
    then concat ^ in front of its non-control equivalent (add (CHARCODE @))
  else (CHARACTER ASCIICODE)
If the low byte of the code is 128 or above, pack # in front of the ASCIINAME.
Finally, unless the charset is 0, concat the charset and "," in front of the ASCIINAME.
(And a little wrinkle: an option to just do octal for the character part, in which case pack 0 too. Octal names are easier to look up in XCCS tables.)
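The CHARNAME steps listed above translate almost line for line into code. This Python sketch is an illustration under stated assumptions, not the Interlisp source: `charname`, `CHARACTER_NAMES`, and `CHARSET_NAMES` are hypothetical, the name tables are tiny made-up subsets (Greek as octal 46 matches the 9857 example), and the recursion and octal-only wrinkle are omitted.

```python
from typing import Dict, Optional

# Hypothetical subset of CHARACTERNAMES (name -> code); the real table is larger.
CHARACTER_NAMES: Dict[str, int] = {"Bell": 7, "Tab": 9, "Space": 32}
# Hypothetical subset of character-set names (charset number -> name).
CHARSET_NAMES: Dict[int, str] = {1: "Meta", 2: "Function", 0o46: "Greek"}

CODE_TO_NAME = {code: name for name, code in CHARACTER_NAMES.items()}


def charname(code: int) -> Optional[str]:
    """Sketch of the CHARNAME naming steps (no recursion, no octal-only mode)."""
    if not 0 <= code <= 0xFFFF:
        return None                            # not a 16-bit character code
    if code in CODE_TO_NAME:                   # the whole code has a name
        return CODE_TO_NAME[code]
    charset, low = code >> 8, code & 0xFF      # high byte is the charset
    ascii_code = low & 0x7F                    # lower 7 bits
    if ascii_code in CODE_TO_NAME:
        name = CODE_TO_NAME[ascii_code]        # named ASCII code: use the name
    elif ascii_code < 0x20:                    # control code: ^ + (code + (CHARCODE @))
        name = "^" + chr(ascii_code + 0x40)
    else:
        name = chr(ascii_code)                 # otherwise the character itself
    if low & 0x80:                             # 8th bit set: prefix #
        name = "#" + name
    if charset != 0:                           # non-zero charset: prefix "charset,"
        name = CHARSET_NAMES.get(charset, format(charset, "o")) + "," + name
    return name
```

With these tables, 657 comes out as "Function,#^Q", 775 as "3,Bell" (not "3,^Bell"), and 16 as "^P" (not "^Zero"), because a name is only used when the ASCII code matches it exactly.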
On Mar 29, 2025, at 11:30 AM, Matt Heffron wrote:
However I find it really odd that it chooses to generate "^Zero" for (CHARCODE.ENCODE 16) rather than "^P" - I don't know anyone who would express it that way.
This falls out from the design "Uses names from CHARACTERNAMES where ever it can." (mentioned in the commit message) which was done for a case like (CHARCODE.ENCODE 8455) which returns "^INFINITY" instead of "41,^G".
What are the types of cases where you think that using the "names from CHARACTERNAMES" is (in)appropriate?
Should it use names only for exact match to the entry in CHARACTERNAMES?
What about for values in Meta or Function character sets? E.g. "Meta,Tab" vs "Meta,^I".
Are those special cases by design?
Is it only the cases of the names of the digits that are weird, because they aren't control chars but get used instead of the "^expected-char" as you pointed out?
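Both "^Zero" and "^P" denote 16 because, per the behavior Nick reports in this thread, ^ clears the 0x20 and 0x40 bits, so three printable characters land on each control code. A small Python sketch (the names `caret` and `caret_preimages` are hypothetical) lists them:

```python
from typing import List


def caret(code: int) -> int:
    """^ modifier as observed in this thread: clear the 0x20 and 0x40 bits."""
    return code & ~0x60


def caret_preimages(target: int) -> List[str]:
    """All printable ASCII characters whose ^ form equals `target`."""
    return [chr(c) for c in range(0x20, 0x7F) if caret(c) == target]
```

Here `caret_preimages(16)` gives `["0", "P", "p"]` and `caret_preimages(9)` gives `[")", "I", "i"]`, which is the "Meta,Tab" vs "Meta,^I" question: picking one preimage is a naming policy, not something the arithmetic decides.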
BTW: while …
I think some of what is described in the IRM consists of accidents of implementation, not the result of design.
The hash-addition thing, for example, should be dedocumented. There were character names in TEDITKEY of the form ##a, to get up into character set 1, assuming that that's where the "meta" characters were located (Meta,a = ##a). That wouldn't work if Meta moves to a clean part of the Unicode charcode space.
I also think that ^GREEK,A shouldn't be mentioned, even if it is implemented as if it is GREEK,^A. It suggests that somehow there is a control-Greek character set.
Note that if you do (READCCODE T) and type Meta-f, it prints as #1,102 and returns 358 (which maps to Meta,f and not #f). I.e., we don't implement the "traditional" notion of meta as the 8th bit. In that sense, calling it "meta", as we do, is a misnomer.
We could transition to another name for the meta character set, currently 1, just as we have a separate name, Function, for character set 2. But it would still be the case that the keyboard tables don't return #x when you type x with the meta key down. And we probably don't want to mix up the notion of a mode shift with the typing of upper charset 0 characters.
If the WHEELSCROLL characters are moved outside of character set 0, then GREEK,WHEELSCROLL-UP would fail. I chose those particular codes because they aren't affected by whether the control key is down, and I was probably confused about meta. But perhaps I should move them to some control codes in Function, to get them out of this discussion. Unlike the clipboard characters, the wheelscroll characters are never actually typed.
On Mar 29, 2025, at 1:07 PM, Nick Briggs wrote:
I need to think about it some more, but my first thought is that it should use names for exact match only, rather as Ron described above.
There seems to be some confusion around the names of KEYS vs the names of CHARACTERS -- I don't think WHEELSCROLL-[UP,DOWN,LEFT,RIGHT] are/should be the names of characters, they should not appear in CHARACTERNAMES. That would get rid of things like (CHARCODE "GREEK,WHEELSCROLL-UP") => 9884.
The fact that we have the "Meta" character set name is a little confusing when we have the "#" that sets the "traditional" meta (8th) bit - so when you speak the name you have to be careful -- (CHARCODE "Meta,A") != (CHARCODE "#A") != (CHARCODE "Meta,#A") -- though this isn't an issue for turning codes back into names.
The IRM does use (CHARCODE "#^GREEK,A") (=> 9857) as an example (and you can put the # and ^ independently on either half of the name, their bit setting and clearing apply to the final code).
While the IRM says "^" derives the code from clearing the 7th bit (normally set), in fact it clears the 6th and 7th bits (CL:FORMAT T "~b" (CHARCODE "^0,377")) => 10011111, and it also says that # sets the 8th bit, normally cleared -- but then implements it as an addition so that (CHARCODE "#A") => 193, (CHARCODE "##A") == (CHARCODE "META,A") => 321, and (CHARCODE "###A") => 449 -- I definitely think we should not use that in creating character names.
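The gap between the documented and the observed modifiers can be checked with plain integer arithmetic. In this Python sketch the four function names are hypothetical; it models only the bit operations described here, not the CHARCODE parser:

```python
def caret_irm(code: int) -> int:
    """^ as the IRM describes it: clear only the 7th bit (0x40)."""
    return code & ~0x40


def caret_observed(code: int) -> int:
    """^ as observed: clear both the 6th and 7th bits (0x20 and 0x40)."""
    return code & ~0x60


def hash_bitset(code: int) -> int:
    """# as the IRM describes it: set the 8th bit (0x80); repeating it is a no-op."""
    return code | 0x80


def hash_additive(code: int, count: int = 1) -> int:
    """# as observed: each # adds 128, so ## reaches character set 1."""
    return code + 128 * count
```

For instance, `format(caret_observed(0o377), "b")` is `"10011111"`, and `hash_additive(ord("A"), 2)` is 321, the same code as Meta,A, whereas applying `hash_bitset` twice still gives 193. That non-idempotence is why relying on multiple # prefixes when creating names is fragile.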
Yes, because leading … This whole multiple …
The discussion in the 3/31 meeting was that this doesn't belong in LISPUSERS, but probably in LLREAD along with …
Discussed in the 3/7/25 meeting.
Thinking more about this implementation, if I take out the work to try to use …
It appears that PR #2119 will address points 1 and 3 that I made above. Point 2 is not that significant.
New lispusers CHARCODEUTILS implements CHARCODE.ENCODE, the inverse of the standard FNS CHARCODE.DECODE (and CHARCODE).
It takes one argument, the 16-bit character integer, and returns the name (a string) in a form that could be given to CHARCODE. E.g.:
(CHARCODE "FUNCTION,#^Q") == 657, so (CHARCODE.ENCODE 657) == "Function,#^Q"
(CHARCODE "#^GREEK,A") == 9857, so (CHARCODE.ENCODE 9857) == "Greek,#^A"