-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Setters can cause failure to roundtrip #651
Comments
A few options here:
Option 1 is probably the most web-compatible for the most content, but leaves a weird edge case and a sour taste in my mouth. Option 2 can be a bit surprising to the user (the pathname can change even when only hash is being set), but probably also web-compatible too. Option 3 is the most theoretically pleasing, but I would not be surprised if some web content depends on I'm leaning towards option 2. |
I think I like these a bit better than 2. 4 would be problematic for fragment navigation, which is definitely a thing in |
I think I prefer 2 over 5. With 2, users have more options: if they want to keep the space and the delimiter, then they can explicitly opt into it by setting 6 seems a little surprising – we never only percent-encode one character. Chrome is also already running into some trouble trying to percent-encode more characters in non-special URLs (https://crbug.com/1252739), so ideally I'd like to keep the path the way it is. |
I can't think of anywhere else where naked spaces are allowed (not in the scheme, query, fragment, userinfo, not in domains, not in opaque hostnames, etc) and RFC-3986 also doesn't allow them. Are there use-cases which absolutely depend on non-encoded spaces being preserved verbatim in CBAB URL paths? |
@karwa The ability to percent-encode unusual characters in CBAB paths would be the ideal case. Sadly, I doubt that will be web-compatible. In fact, I think it'll be difficult just to encode spaces in hierarchical non-special URL paths (in Chrome and Firefox that is – Safari already does that). |
@achristensen07 @valenting thoughts? I could live with 2. |
I think that no.2 is the least likely to cause breakage, so that's a ➕ for me. |
Not particularly relevant to the discussion here, but be aware that setters do not preserve validity. |
@alwinb I think we are using different definitions of "valid" here. When I say "setters will keep the URL valid" I don't mean "valid URL string" as defined in the spec, but rather "roundtrippable". If you have other examples of where setters make the URL non-roundtrippable, I'd certainly like to hear it. |
Setters being round-trippable is nice. I'd like to add that I think setters should preserve the structure of the URL component, so that One issue I've been wondering about bringing up is that non-special paths can have unescaped backslashes, but if you go through the setter, the URL parser will then interpret those backslashes as path separators and normalise them to forwardslashes. This violates the // Backslashes are not percent-encoded in non-special URLs,
// so copying the path to a special URL can cause it to be interpreted differently.
var specialURL = WebURL("https://example.com/")!
specialURL.serialized() // "https://example.com/"
specialURL.path, // "/"
specialURL.pathComponents.count // 1
let nonSpecialURL = WebURL(#"foo://example.com/baz\qux"#)!
nonSpecialURL.serialized() // "foo://example.com/baz\qux"
nonSpecialURL.path // "/baz\qux"
nonSpecialURL.pathComponents.count. // 1
specialURL.path = nonSpecialURL.path
specialURL.serialized() // "https://example.com/baz/qux" (!)
specialURL.path // "/baz/qux" (!)
specialURL.pathComponents.count // 2 (!) We could get around this if the URL parser percent-encoded each component in list-style paths with a "path component" encode-set, which would consist of the path set + backslash. Currently the path set does not include path separators themselves. |
@karwa Escaping Either way it should probably be in a separate ticket. (Edit: created #675 to track this.) |
I like 2, which is effectively what Safari already does and we haven't seen any compatibility issues from it. |
@achristensen07 Oh interesting! Didn't realize Safari already fixed this in some way. Sounds like we have alignment on 2 then. |
As opaque paths can end in U+0020, those trailing U+0020 code points need to be removed from the path when both query and fragment become null. Tests: ... Fixes #651.
I got around to creating tests for this as well as a PR: #728. Safari does indeed seem to do this correctly (although there are some unrelated I noted Gecko down as supportive per @valenting's comment above. |
Those It turns out that you cannot use Tests will be updated to reflect all this. |
As opaque paths can end in U+0020, those trailing U+0020 code points need to be removed from the path when both query and fragment become null. Tests: web-platform-tests/wpt#37556. Fixes #651.
Generally, we assume setters will keep the URL valid and roundtrippable. However, this turns out not to be true in a specific case:
This is only an issue because we allow spaces to appear verbatim in cannot-be-a-base URL paths.
(first discovered in https://crbug.com/291747)
The text was updated successfully, but these errors were encountered: