URL detection with * in URL is strange #9543

@tomkel

Describe the bug
echo 'https://crontab.guru/#1_*_*_*_*'

When hovering over or clicking the URL in kitty, the detected URL differs depending on which * or _ you hover over.
If you hover before the first _, the URL is https://crontab.guru/#1
If you hover on the first _, the URL is https://crontab.guru/#1_
If you hover on the first *, the URL is https://crontab.guru/#1_*
If you hover on the second _, the URL is https://crontab.guru/#1_*_
etc.

To Reproduce
Steps to reproduce the behavior:

  1. kitty --config NONE
  2. echo 'https://crontab.guru/#1_*_*_*_*'
  3. Hover/click on the URL in different parts to see what kitty thinks the URL is

Environment details
kitty 0.45.0 (c26b770530) created by Kovid Goyal
Darwin aa-32-26-25-09-68 25.3.0 Darwin Kernel Version 25.3.0: Wed Jan 28 20:53:05 PST 2026; root:xnu-12377.81.4~5/RELEASE_ARM64_T6020 arm64
ProductName:		macOS ProductVersion:		26.3 BuildVersion:		25D125
OpenGL: '4.1 Metal - 90.5' Detected version: 4.1
Frozen: True
Fonts:
  medium: Menlo-Regular: /System/Library/Fonts/Menlo.ttc
          Features: ()
    bold: Menlo-Bold: /System/Library/Fonts/Menlo.ttc
          Features: ()
  italic: Menlo-Italic: /System/Library/Fonts/Menlo.ttc
          Features: ()
      bi: Menlo-BoldItalic: /System/Library/Fonts/Menlo.ttc
          Features: ()
Paths:
  kitty: /Applications/kitty.app/Contents/MacOS/kitty
  base dir: /Applications/kitty.app/Contents/Resources/kitty
  extensions dir: /Applications/kitty.app/Contents/Resources/Python/lib/kitty-extensions
  system shell: /bin/zsh
System color scheme: light. Applied color theme type: none

Config options different from defaults:
Changed shortcuts:
	cmd+k →  clear_terminal_and_scrollback
	cmd+l →  clear_last_command
	ctrl+cmd+, →  reload_config
	ctrl+cmd+l →  clear_screen
	opt+cmd+k →  clear_scrollback
	opt+cmd+r →  reset_terminal
	shift+cmd+/ →  open_kitty_website

Important environment variables seen by the kitty process:
	LANG                                en_US.UTF-8
	EDITOR                              nvim
	SHELL                               /bin/zsh

Additional context
URL detection in hints mode (ctrl+alt+e or ctrl+alt+p>f) works OK: only a single URL is detected there.
Kitty handles https://crontab.guru/#1_%2A_%2A_%2A_%2A just fine. %2A is the URI encoding for *.
This seems to be a URI encoding question.
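
For reference, the percent-encoded form that kitty handles correctly can be reproduced with a small sketch like the one below. It uses Python's standard `urllib.parse` purely as an illustration; the variable names are my own and nothing here reflects how kitty itself processes URLs.

```python
from urllib.parse import quote, unquote

# The fragment from the URL in this report (everything after the "#").
fragment = "1_*_*_*_*"

# safe="" asks quote() to escape every character except letters, digits
# and "_.-~", so "*" becomes "%2A" while "_" is left untouched.
encoded = quote(fragment, safe="")
url = "https://crontab.guru/#" + encoded
print(url)

# Decoding restores the original fragment.
assert unquote(encoded) == fragment
```

This prints `https://crontab.guru/#1_%2A_%2A_%2A_%2A`, the variant that is detected as a single URL.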
Apparently * is a "reserved" character in URLs.
In RFC 3986 Section 2.2, * is defined more specifically as a reserved "sub-delim", and Section 3.3 includes the sub-delims in "pchar".
Apparently these sub-delims / pchars are allowed in many different parts of the URI: userinfo, host, path, query, fragment.
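
To make the grammar concrete, here is a rough sketch of a fragment checker built from my reading of the RFC 3986 productions (`unreserved`, `sub-delims`, `pct-encoded`, `pchar`, and `fragment = *( pchar / "/" / "?" )` from Section 3.5). The constant and function names are hypothetical, and this only covers the generic URI syntax, not kitty's URL detection.

```python
import re

# Character classes transcribed from RFC 3986 (Sections 2.1, 2.2, 2.3, 3.3).
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="          # note that "*" is a sub-delim
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT_ENCODED})"

# RFC 3986 Section 3.5: fragment = *( pchar / "/" / "?" )
FRAGMENT_RE = re.compile(rf"(?:{PCHAR}|[/?])*")


def is_valid_fragment(fragment: str) -> bool:
    """Return True if the string matches the generic fragment grammar."""
    return FRAGMENT_RE.fullmatch(fragment) is not None


print(is_valid_fragment("1_*_*_*_*"))           # literal "*" characters
print(is_valid_fragment("1_%2A_%2A_%2A_%2A"))   # percent-encoded form
print(is_valid_fragment("1_#_2"))               # a second "#" is not allowed
```

Under this reading, both the literal and the percent-encoded fragments match the generic grammar, while a fragment containing another `#` does not.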
Section 2.2 also says:

URI producing applications should percent-encode data octets that correspond to characters in the reserved set unless these characters are specifically allowed by the URI scheme to represent data in that component.

"specifically allowed by the URI scheme" - so this would depend on the "http/https" scheme whether or not * should be encoded?

RFC 9110 Section 4.2.5:

Note: The fragment identifier component is not part of the scheme definition for a URI scheme (see Section 4.3 of [URI]), thus does not appear in the ABNF definitions for the "http" and "https" URI schemes above.

RFC 3986 (URI) Section 4.4:

When a URI reference refers to a URI that is, aside from its fragment component (if any), identical to the base URI (Section 5.1), that reference is called a "same-document" reference.
When a same-document reference is dereferenced for a retrieval action, the target of that reference is defined to be within the same entity (representation, document, or message) as the reference;

So the fragment syntax is specified by the dereferenced resource, which in this case is also the HTTP/HTML page (a same-document reference). But Section 4.2.5 of the HTTP spec says it doesn't define fragment syntax. So is it undefined behavior?

In the case of the query component, however, Section 4.1 of the HTTP spec reuses the query definition from Section 3.4 of the URI spec, which includes pchars: *( pchar / "/" / "?" ). So * is probably valid in the query component.

Apologies if I'm misunderstanding the specs, it's my first time reading them.
