Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System.Uri does not allow * (or other sub-delims) in hostname #64707

Open
jimmylewis opened this issue Feb 2, 2022 · 16 comments
Open

System.Uri does not allow * (or other sub-delims) in hostname #64707

jimmylewis opened this issue Feb 2, 2022 · 16 comments
Assignees
Labels
area-System.Net enhancement Product code improvement that does NOT require public API changes/additions
Milestone

Comments

@jimmylewis
Copy link

jimmylewis commented Feb 2, 2022

Description

Per RFC 3987 (and 3986), the host name is allowed to contain certain delimiter characters according to the ABNF grammar:

   IRI            = scheme ":" ihier-part [ "?" iquery ]
                         [ "#" ifragment ]

   ihier-part     = "//" iauthority ipath-abempty
                  / ipath-absolute
                  / ipath-rootless
                  / ipath-empty
   iauthority     = [ iuserinfo "@" ] ihost [ ":" port ]
   ihost          = IP-literal / IPv4address / ireg-name
   ireg-name      = *( iunreserved / pct-encoded / sub-delims )
   sub-delims     = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

However, attempting to create a Uri containing these in the host portion throws an exception.

Reproduction Steps

new Uri("http://*:5000", UriKind.Absolute)

Expected behavior

This is a grammatically valid URI, and should be parsed.

Actual behavior

Unhandled exception. System.UriFormatException: Invalid URI: The hostname could not be parsed.
   at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind, UriCreationOptions& creationOptions)
   at System.Uri..ctor(String uriString, UriKind uriKind)

Regression?

Not a regression AFAICT

Known Workarounds

No response

Configuration

.NET 6

Other information

No response

@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Net untriaged New issue has not been triaged by the area owner labels Feb 2, 2022
@ghost
Copy link

ghost commented Feb 2, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Per RFC 3987 (and 3986), the host name is allowed to contain certain delimiter characters according to the ABNF grammar:

   IRI            = scheme ":" ihier-part [ "?" iquery ]
                         [ "#" ifragment ]

   ihier-part     = "//" iauthority ipath-abempty
                  / ipath-absolute
                  / ipath-rootless
                  / ipath-empty
   iauthority     = [ iuserinfo "@" ] ihost [ ":" port ]
   ihost          = IP-literal / IPv4address / ireg-name
   ireg-name      = *( iunreserved / pct-encoded / sub-delims )
   sub-delims     = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

However, attempting to create a Uri containing these in the host portion throws an exception.

Reproduction Steps

new Uri("http://*:5000", UriKind.Absolute)

Expected behavior

This is a grammatically valid URI, and should be parsed.

Actual behavior

Unhandled exception. System.UriFormatException: Invalid URI: The hostname could not be parsed.
at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind, UriCreationOptions& creationOptions)
at System.Uri..ctor(String uriString, UriKind uriKind)

Regression?

Not a regression AFAICT

Known Workarounds

No response

Configuration

.NET 6

Other information

No response

Author: jimmylewis
Assignees: -
Labels:

area-System.Net, untriaged

Milestone: -

@MihaZupan
Copy link
Member

You are probably right in that this is a possible Uri authority according to the spec, but the Uri class is more restrictive than the general ABNF when it comes to the authority part. For example, we don't allow pct-encoded hosts either.

What is the use-case for such a Uri? What is the value of being able to represent it with this class?

@jimmylewis
Copy link
Author

jimmylewis commented Feb 3, 2022

In my case, I was using Uri.IsWellFormed to verify that a string is valid URI according to the RFC. As the documentation states:

The existing Uri class has been extended in .NET Framework v3.5, 3.0 SP1, and 2.0 SP1 to provide IRI support based on RFC 3987.

My code in part of a JSON Schema validator, and the JSON Schema spec specifies the RFC as the definition for valid/invalid values.

The user reporting the problem with my code is using this value in their appsettings.json for an ASP.NET Core project for a Kestrel endpoint, e.g.

{
  "Kestrel": {
    "Endpoints": {
      "MyHttpsEndpoint": {
        "Url": "https://*:5001",   // warning on this line
        "SslProtocols": [ "Tls12", "Tls13" ],
        "Certificate": {
          "Path": "<path to .pfx file>",
          "Password": "$CREDENTIAL_PLACEHOLDER$"
        }
      }
    }
  }
}

The JSON Schema for appsettings.json defines the "Url" property here as using the "uri" format defined in the JSON schema spec.

If this is out of scope for the System.Uri class to handle, that's fine, I'll can look into finding or writing another parser. I didn't see any remarks in the (admittedly long) documentation about non-conformance to the specifications.

@BhaaLseN
Copy link

BhaaLseN commented Feb 6, 2022

This also affects the UriBuilder class, in case any changes are planned as result of this issue.

Our use-case was a configuration tool that both reads and writes configuration files that contain host URLs, and the plan was to simply put * as hostname to not limit hosting to a specific hostname (which, in many of our cases, is undesireable).

Our workaround is to assume * is only in the host portion (which might not apply to everyone reading this), and replacing it with localhost for the duration of Uri/UriBuilder use; then replace localhost back with * before it goes in the config file.

@karelz
Copy link
Member

karelz commented Feb 15, 2022

Triage: Uri does more than just URI RFC. It also validates DNS schema.
To change it, we would technically break behavior and change rejection from Uri to HTTP stack or other parts of Networking.
Given lower demand, we think it is fine. Let's keep it open to see if more users need it.

We admit it would be nice to have ability to parse just according to the RFCs (skipping DNS validation).

@karelz karelz added this to the Future milestone Feb 15, 2022
@karelz karelz added enhancement Product code improvement that does NOT require public API changes/additions and removed untriaged New issue has not been triaged by the area owner labels Feb 15, 2022
@dibarbet
Copy link
Member

dibarbet commented Sep 6, 2023

I think we're hitting this issue in our vscode C# extension. We're getting passed a URI like perforce://@=1454483/some/file/here/source.cs from the vscode client, and System.URI is throwing when we try and create a URI for it. I believe this is because of the '=' in the host, which looks like a valid host character in the RFC (link).

We use URIs here because vscode sends us document identifiers as URI formatted strings. They can represent a real file on disk or a virtual document and we use the URI type to get info on it (e.g. local path if it has one).

Original issue - dotnet/vscode-csharp#6256

@karelz
Copy link
Member

karelz commented Sep 7, 2023

@dibarbet thanks for your details. I think that Uri validation of hostnames likely kicks in here.
As mentioned above #64707 (comment), we do more than just Uri RFC.

Do you have a reasonable workaround in your code base? (e.g. treating such special Uris as string)
We are tracking this as feature request and not a common one so far, so it is not clear when or if it will be implemented.

@dibarbet
Copy link
Member

dibarbet commented Sep 7, 2023

I don't have a good workaround unfortunately - the URI conversion comes from the LSP protocol library we use (internal link). This library contains C# type definitions for the LSP spec that we use to deserialize LSP message, and is used in VS, Roslyn, vscode C#, etc.

Switching the protocol libraries to use string instead of the URI type would be a large undertaking, for example the base document identifier type, used by almost all other definitions, is based on a URI (internal link).

cc also @olegtk as an FYI and to see if he had any opinions. Switching over to string might be a good long term move anyway (due to other issues with encoded paths). But I don't think it will be simple.

@karelz
Copy link
Member

karelz commented Sep 8, 2023

OK, I could imagine that in future versions we could solve it in some way.
For example, add a new UriCreationOptions property (something like DisableDnsHostnameValidation). @MihaZupan what do you think?

However, it will be in future versions of .NET, so you will have to wait and upgrade.
Also, if it won't solve all your problems with Uri (as you hinted at encoded paths), then perhaps it is not that much needed after all ...

@CyrusNajmabadi
Copy link
Member

@karelz @MihaZupan Do you guys think something could be done here? This really is unpleasant on our side as we have a lot of APIs designed around System.Uri, and now we're painfully finding out that it doesn't work for all the types of Uris our clients use (like vscode). Thanks!

@dibarbet
Copy link
Member

To add on to that, while we are able to move the parsing of the URI out from a critical path (e.g. treat it as a string), at some point we do actually need to parse them to extract information like the scheme, file path, or query parameters from the URI strings. Other than writing our own URI parser, there doesn't seem to be a great way to workaround this issue in System.Uri. Additionally, various other components (not owned by us) do use System.Uri as API parameters, so we end up having to convert anyway.

This does seem to be a pretty common issue for VSCode extensions written in C#. For example O# created it's own URI parser to handle VSCode URIs. And it looks like the bicep extension has feature limitations due to this issue - Azure/bicep#13274

Anyway - just wanted to provide more information on the cost this has on us

@MihaZupan
Copy link
Member

Given that and related issues (#77464, #94195), it sounds to me like there's sufficient reasoning for us to make a change here (either behind a flag, or just by default).

@karelz
Copy link
Member

karelz commented Feb 5, 2025

Agreed, we should fix it given the pain it causes. Thanks @CyrusNajmabadi for the detailed info -- it definitely helps in prioritization.
And thanks @rokonec for self-assigning!

@karelz
Copy link
Member

karelz commented Feb 5, 2025

Key question: If we fix just .NET 10, is it going to help all these scenarios in a way they won't be unhappy too much? (at least not more than now)

@BhaaLseN
Copy link

BhaaLseN commented Feb 5, 2025

I've sat here for a while and thought about this question, and I still don't think I have a clear answer (at least for us). We ran into this with a .NET Framework application (and a fix there would be cool, but fairly unrealistic), but we also have a very trivial workaround (just string-replace before/after with something that is valid - 2 lines difference). Plus, it only affects utility applications meant to give our end users a nicer configuration UI rather than a bare JSON or XML file (which, at this point, have nothing stopping us from upgrading them to .NET 8 or .NET 10 when it's out).
Our bigger applications don't really have to deal with this, aside from what CoreWCF has run into (but that's still on the list, to migrate from regular WCF to CoreWCF...we have other things keeping us on .NET Framework there that are more difficult to replace with compatible alternatives). And once those are ready to move, we'd go straight to .NET 10 (since I don't really expect us to get all of this done before end of the year, with an intermediate stop at .NET 8 LTS).

So, TL;DR: I don't know.

But: I'd like to request that UriBuilder receives the same treatment, and not just Uri. At the very least to accept */+ for UriBuilder.Host (where ireg-name and sub-delims ultimately go); or in some other new property that allows us to read and write it. The majority of this discussion seemed to be focused on the latter; but the former also exists (and unless I missed something, will continue to do so).

@dibarbet
Copy link
Member

dibarbet commented Feb 5, 2025

Key question: If we fix just .NET 10, is it going to help all these scenarios in a way they won't be unhappy too much? (at least not more than now)

A fix in .NET 10 will definitely help for Roslyn's VSCode scenarios (we can upgrade our server to .NET 10 when it releases). And most of the problematic URIs we get come from our VSCode side.

It won't help us in VS at the moment, but we also rarely encounter these kinds of URIs in VS. And adding to .NET framework I assume is not feasible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Net enhancement Product code improvement that does NOT require public API changes/additions
Projects
None yet
Development

No branches or pull requests

7 participants