New lispusers CHARCODEUTILS module. #2078

Closed
wants to merge 6 commits into from

Conversation

MattHeffron
Contributor

New lispusers CHARCODEUTILS implements CHARCODE.ENCODE, the inverse of standard FNS: CHARCODE.DECODE (and CHARCODE).

One argument, the 16-bit character integer. Returns the name (string) as could be given to CHARCODE.
E.g. (CHARCODE "FUNCTION,#^Q") == 657. So (CHARCODE.ENCODE 657) == "Function,#^Q"
(CHARCODE "#^GREEK,A") == 9857. So (CHARCODE.ENCODE 9857) == "Greek,#^A"
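
To make the arithmetic behind these two examples concrete, here is a minimal Python sketch (not the Interlisp code in this PR) that splits a 16-bit code into its parts. It assumes the high byte selects the character set and the low byte is the character within it, which is consistent with the numbers quoted above; the function name `split_code` is hypothetical.

```python
def split_code(code):
    """Split a 16-bit character code into (charset, meta_bit_set, low7).

    Assumption: high byte = character set, low byte = character,
    and 0x80 in the low byte is the bit that "#" contributes.
    """
    if not 0 <= code <= 0xFFFF:
        raise ValueError("expected a 16-bit character code")
    charset = code >> 8      # high byte selects the character set
    char = code & 0xFF       # low byte is the character within the set
    return charset, bool(char & 0x80), char & 0x7F


print(split_code(657))   # (2, True, 17): set 2, "#" bit set, 17 is control-Q
print(split_code(9857))  # (38, True, 1): set 38, "#" bit set, 1 is control-A
```

Read this way, 657 is 2 * 256 + 128 + 17 and 9857 is 38 * 256 + 128 + 1.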

@MattHeffron MattHeffron added the enhancement New feature or request label Mar 24, 2025
@MattHeffron MattHeffron self-assigned this Mar 24, 2025
@pamoroso
Contributor

I tested this under Linux Mint 22 Cinnamon and it works as expected.

@MattHeffron
Contributor Author

Just an FYI:
I wrote this to help with looking at the values from (KEYACTION keyname) when we were looking at the character-to-action operations, as in PR #2070.

@MattHeffron MattHeffron marked this pull request as draft March 24, 2025 17:47
Allow (CL:CHARACTERP x) as well as SMALLP arguments.
Cleanup handling of char=255 in any character set.
Added optional argument NONCHAR.IDENTITY (default NIL).
If the C argument isn't SMALLP or CL:CHARACTERP, return C itself if NONCHAR.IDENTITY is non-null, else NIL.
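
The argument handling described in this commit message can be modeled roughly as follows. This is a Python sketch under stated assumptions, not the PR's code: a single-character string stands in for a Common Lisp character, the valid range is taken to be 0 to 65535, and `encode_checked` and its `encode` parameter are hypothetical names.

```python
def encode_checked(c, encode, nonchar_identity=False):
    """Encode c if it looks like a character, else fall back.

    Assumptions: a one-character str stands in for CL:CHARACTERP, and a
    valid code is an int in 0..65535. `encode` is whatever function turns
    a valid code into a name.
    """
    code = ord(c) if isinstance(c, str) and len(c) == 1 else c
    is_code = (isinstance(code, int) and not isinstance(code, bool)
               and 0 <= code <= 0xFFFF)
    if is_code:
        return encode(code)
    # Not a character: hand back the original argument or nothing.
    return c if nonchar_identity else None


# hex() is only a stand-in for a real name encoder.
print(encode_checked(65, hex))                              # 0x41
print(encode_checked("A", hex))                             # 0x41
print(encode_checked(-1, hex))                              # None
print(encode_checked("SHIFT", hex, nonchar_identity=True))  # SHIFT
```

Returning the argument unchanged is what lets a caller map the function over mixed data in which only some entries are characters.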
@pamoroso
Contributor

I tested up to commit f701cdf on Linux Mint 22 Cinnamon and found nothing unusual.

@nbriggs
Contributor

nbriggs commented Mar 27, 2025

What are the valid inputs to CHARCODE.ENCODE? (CHARCODE.ENCODE 128) fails with TYPE-MISMATCH: "357,55" is not a NUMBER. It also fails with the same error on the same string if given -1, or anything else it seems to be unable to encode.

EDIT: Looks as though I got an out-of-date version of the code. It's behaving better now.

@nbriggs
Contributor

nbriggs commented Mar 27, 2025

With the latest version, (CHARCODE.ENCODE n) for n < 0 produces 2-part octal strings that can't be decoded by CHARCODE.DECODE instead of NIL (e.g. -1 => "77777777,377").

It's also odd that it produces a control-modifier "^" on named characters that are already in the control space, e.g., 775 => "3,^Bell", yet it correctly produces "3,6" for 774 rather than "3,^6", which would be 790.

(CHARCODE.ENCODE 412) => "Meta,#^WHEELSCROLL-UP", which is a bad character specification according to CHARCODE.DECODE. (There will be others in the same pattern; I haven't exhaustively enumerated the failures.)

@rmkaplan
Contributor

rmkaplan commented Mar 27, 2025 via email

@nbriggs
Contributor

nbriggs commented Mar 27, 2025

CHARCODE.DECODE handles the output of all values produced by CHARNAME on arguments from 0 to 65535, and (EQ (CHARCODE.DECODE (CHARNAME I)) I) is true on the same domain.

@MattHeffron
Contributor Author

I've rewritten this from scratch, so ignore this until I put up a new commit.

Uses names from CHARACTERNAMES wherever it can.
Simplified by pre-processing CHARACTERNAMES, which can be done once ahead of time, with the result passed to multiple calls.
Added a test function which checks that all 65536 character codes correctly "round trip" through (CHARCODE.DECODE (CHARCODE.ENCODE cc)).
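
The two ideas in this commit message, a name table pre-processed once and an exhaustive round-trip check, can be sketched in Python as below. Everything here is a toy model rather than the PR's implementation: `NAMES` is a three-entry stand-in for CHARACTERNAMES, the encoder uses a name only on an exact match (a simplification, not what the commit did), and the fallback is an octal "charset,char" pair.

```python
# Tiny hypothetical stand-in for CHARACTERNAMES (name -> code).
NAMES = {"Bell": 7, "Tab": 9, "Zero": 48}


def build_reverse(names):
    """Pre-process once: map each code to the first name listed for it."""
    rev = {}
    for name, code in names.items():
        rev.setdefault(code, name)
    return rev


def encode(code, rev):
    """Use a name on exact match, else an octal 'charset,char' pair."""
    if code in rev:
        return rev[code]
    return f"{code >> 8:o},{code & 0xFF:o}"


def decode(text, names):
    """Inverse of encode for the two forms it can produce."""
    if text in names:
        return names[text]
    charset, char = text.split(",")
    return int(charset, 8) * 256 + int(char, 8)


def round_trip_failures(names):
    """Return every 16-bit code that does not survive encode then decode."""
    rev = build_reverse(names)  # built once, reused for all 65536 calls
    return [c for c in range(0x10000) if decode(encode(c, rev), names) != c]


print(encode(774, build_reverse(NAMES)))  # 3,6
print(round_trip_failures(NAMES))         # []
```

Returning the list of failing codes, rather than a single boolean, makes it easy to see which patterns break when the encoder changes.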
@MattHeffron MattHeffron marked this pull request as ready for review March 28, 2025 06:33
@nbriggs
Contributor

nbriggs commented Mar 28, 2025

This version passes the test for (EQ I (CHARCODE.DECODE (CHARCODE.ENCODE I))) for all I from 0 to 65535, so much better. However I find it really odd that it chooses to generate "^Zero" for (CHARCODE.ENCODE 16) rather than "^P" - I don't know anyone who would express it that way.

@MattHeffron
Contributor Author

However I find it really odd that it chooses to generate "^Zero" for (CHARCODE.ENCODE 16) rather than "^P" - I don't know anyone who would express it that way.

This falls out from the design "Uses names from CHARACTERNAMES wherever it can." (mentioned in the commit message), which was done for a case like (CHARCODE.ENCODE 8455), which returns "^INFINITY" instead of "41,^G".

What are the types of cases where you think that using the "names from CHARACTERNAMES" is (in)appropriate?

Should it use names only for exact match to the entry in CHARACTERNAMES?
What about for values in Meta or Function character sets? E.g. "Meta,Tab" vs "Meta,^I".
Are those special cases by design?
Is it only the cases of the names of the digits that are weird, because they aren't control chars but get used instead of the "^expected-char" as you pointed out?
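
The ambiguity behind "^Zero" versus "^P" can be illustrated with a short Python sketch. It assumes that "^" clears the 0x60 bits, as observed elsewhere in this thread, and it is not the PR's code: the function names and the one-entry name table are hypothetical.

```python
CTRL_CLEARED = 0x60  # bits that "^" clears, per the behaviour observed in this thread


def control_sources(code):
    """All 7-bit codes that "^" would map onto `code`."""
    return [c for c in range(128) if c & ~CTRL_CLEARED == code]


def name_preferring(code, names_by_code):
    """Pick the first source character that has a name (names wherever possible)."""
    for c in control_sources(code):
        if c in names_by_code:
            return "^" + names_by_code[c]
    return None


def letter_preferring(code):
    """Pick the source in the 0x40-0x5F column, the conventional ^letter form."""
    for c in control_sources(code):
        if 0x40 <= c <= 0x5F:
            return "^" + chr(c)
    return None


NAMES_BY_CODE = {48: "Zero"}  # hypothetical one-entry table

print(control_sources(16))                 # [16, 48, 80, 112]
print(name_preferring(16, NAMES_BY_CODE))  # ^Zero
print(letter_preferring(16))               # ^P
```

Both answers decode to 16, so the choice between them is purely about which spelling a reader expects.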

@rmkaplan
Contributor

rmkaplan commented Mar 29, 2025 via email

@nbriggs
Contributor

nbriggs commented Mar 29, 2025

I need to think about it some more, but my first thought is that it should use names for exact match only, rather as Ron described above.

There seems to be some confusion around the names of KEYS vs the names of CHARACTERS -- I don't think WHEELSCROLL-[UP,DOWN,LEFT,RIGHT] are/should be the names of characters, they should not appear in CHARACTERNAMES. That would get rid of things like (CHARCODE "GREEK,WHEELSCROLL-UP") => 9884.

The fact that we have the "Meta" character set name is a little confusing when we have the "#" that sets the "traditional" meta (8th) bit - so when you speak the name you have to be careful -- (CHARCODE "Meta,A") != (CHARCODE "#A") != (CHARCODE "Meta,#A") -- though this isn't an issue for turning codes back into names.

The IRM does use (CHARCODE "#^GREEK,A") (=> 9857) as an example (and you can put the # and ^ independently on either half of the name, their bit setting and clearing apply to the final code).

While the IRM says "^" derives the code by clearing the 7th bit (normally set), in fact it clears the 6th and 7th bits: (CL:FORMAT T "~b" (CHARCODE "^0,377")) => 10011111. It also says that # sets the 8th bit, normally cleared, but then implements it as an addition, so that (CHARCODE "#A") => 193, (CHARCODE "##A") == (CHARCODE "META,A") => 321, and (CHARCODE "###A") => 449. I definitely think we should not use that in creating character names.
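
The gap between the documented and the observed behaviour can be checked with plain integer arithmetic. The Python sketch below models the four operations as described in this comment; the function names are hypothetical, and bits are counted from 1 at the least significant end, so the 6th, 7th and 8th bits are 0x20, 0x40 and 0x80.

```python
def ctrl_documented(code):
    """IRM wording: clear the 7th bit."""
    return code & ~0x40


def ctrl_observed(code):
    """Observed: the 6th and 7th bits are both cleared."""
    return code & ~0x60


def meta_documented(code):
    """IRM wording: set the 8th bit."""
    return code | 0x80


def meta_observed(code):
    """Observed: 128 is added, so repeated "#" keeps climbing."""
    return code + 0x80


print(format(ctrl_documented(0o377), "b"))  # 10111111
print(format(ctrl_observed(0o377), "b"))    # 10011111

a = ord("A")
print(meta_observed(a), meta_observed(meta_observed(a)))        # 193 321
print(meta_documented(a), meta_documented(meta_documented(a)))  # 193 193
```

With a true bit-set, a second "#" would be a no-op; with addition it carries into the character-set byte, which is why "##A" lands on the same code as "META,A".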

@nbriggs
Contributor

nbriggs commented Mar 29, 2025

BTW: while (CHARCODE "META,##A") is an error, you can put as many hashes in front of the character set name as you want -- (CHARCODE "##META,#A") => 705

@rmkaplan
Contributor

rmkaplan commented Mar 29, 2025 via email

@MattHeffron
Contributor Author

MattHeffron commented Mar 31, 2025

BTW: while (CHARCODE "META,##A") is an error, you can put as many hashes in front of the character set name as you want -- (CHARCODE "##META,#A") => 705

Yes, because a leading # just adds 200Q to the result of the tail recursion on the rest of the string, but following the comma (or other allowed delimiter) the result of the recursion at that point is checked to be in the range 0-255.

This whole multiple-# behavior is an awful hack that should be removed.
(Make it LOGOR, not IADD; optionally make it an error, though that is more difficult to check considering the recursion.)
The few (?) places that use it ought to be changeable to 1,nn or Meta,nn.
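
The recursion described above can be modeled in a few lines of Python. This is a toy decoder, not CHARCODE.DECODE itself: it handles only single characters, the "#" and "^" prefixes, and three character-set names whose numbers are inferred from the codes quoted in this thread. The `CHARSETS` table and the `decode` name are assumptions, and numeric and named characters are left out.

```python
# Character-set numbers inferred from codes quoted in this thread (assumption).
CHARSETS = {"META": 1, "FUNCTION": 2, "GREEK": 38}


def decode(spec):
    """Toy model of the recursive decoding described above."""
    if len(spec) == 1:
        return ord(spec)                     # a single character is itself
    if spec.startswith("#"):
        return 0o200 + decode(spec[1:])      # addition, not a bitwise OR
    if spec.startswith("^"):
        return decode(spec[1:]) & ~0x60      # clears the 6th and 7th bits
    if "," in spec:
        set_name, rest = spec.split(",", 1)
        low = decode(rest)
        if not 0 <= low <= 255:              # range check after the comma
            raise ValueError(f"bad character specification: {spec!r}")
        if set_name.upper() not in CHARSETS:
            raise ValueError(f"unknown character set: {set_name!r}")
        return CHARSETS[set_name.upper()] * 256 + low
    raise ValueError(f"bad character specification: {spec!r}")


print(decode("FUNCTION,#^Q"))  # 657
print(decode("#^GREEK,A"))     # 9857
print(decode("##META,#A"))     # 705
try:
    decode("META,##A")
except ValueError as err:
    print("error:", err)       # rejected: "##A" is 321, outside 0-255
```

Because the range check applies only to the part after the comma, any number of "#" before the character-set name is accepted, which matches the behaviour reported above.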

@MattHeffron
Contributor Author

Discussion in 3/31 meeting was that this doesn't belong in LISPUSERS, but probably in LLREAD along with CHARCODE.DECODE. I'll defer moving it there until I finish stabilizing it.
It'll be easier to test if it doesn't require remaking loadups.

Member

@masinter masinter left a comment

discussed 3/7/25 meeting

@MattHeffron
Contributor Author

Thinking more about this implementation, if I take out the work to try to use CHARACTERNAMES entries as much as possible, then I'm not sure that there's much value in this compared with CHARNAME in TEdit.
My only suggestions for CHARNAME would be:

  1. to have it accept Common Lisp CHARACTER as encodable input, as well as SMALLP (0-65535);
  2. use the first matching entry in CHARACTERNAMES as the name, to be consistent with Common Lisp generated named characters (e.g., #\Delete);
  3. (optionally?) return any invalid argument as itself, so CHARCODE can be used with whole KEYACTION entries, etc.

@masinter masinter marked this pull request as draft April 21, 2025 17:07
@MattHeffron
Contributor Author

It appears that PR #2119 will address points 1 and 3 that I made above; point 2 is not that significant.
I will close this PR now.
