Commit 7b0e201

All: Fix invalid URLs in sitemap
https://github.com/jquery/jquerymobile.com/actions/runs/6490030580/job/17625221359

```
DEBUG:scrapy.core.engine:Crawled (200) <GET https://api.jquerymobile.com/wp-sitemap.xml> (referer: None)
ERROR:scrapy.core.scraper:Spider error processing <GET https://api.jquerymobile.com/wp-sitemap.xml> (referer: None)
Traceback (most recent call last):
…
  File "/home/seleuser/.local/share/virtualenvs/seleuser-AdYDHarm/lib/python3.10/site-packages/scrapy/spiders/sitemap.py" in _parse_sitemap
  File "/home/seleuser/.local/share/virtualenvs/seleuser-AdYDHarm/lib/python3.10/site-packages/scrapy/http/request/__init__.py" in self._set_url(url)
  File "/home/seleuser/.local/share/virtualenvs/seleuser-AdYDHarm/lib/python3.10/site-packages/scrapy/http/request/__init__.py" in _set_url
ValueError: Missing scheme in request url: //api.jquerymobile.com/wp-sitemap-posts-post-1.xml

2023-10-12 01:21:37 [scrapy.core.scraper] ERROR: Spider error processing <GET https://api.jquerymobile.com/wp-sitemap.xml> (referer: None)
```

Ref jquery/infrastructure-puppet#33
1 parent 234e75e commit 7b0e201

3 files changed: +42 -50 lines changed

CONTRIBUTING.md

+18 -1
@@ -1,5 +1,22 @@
-Welcome! Thanks for your interest in contributing to jquery-wp-content. You're **almost** in the right place. More information on how to contribute to this and all other jQuery Foundation projects is over at [contribute.jquery.org](https://contribute.jquery.org). You'll definitely want to take a look at the articles on contributing [to our websites](https://contribute.jquery.org/web-sites/) and [code](https://contribute.jquery.org/code).
+# Contributing
+
+Welcome! Thanks for your interest in contributing to jquery-wp-content. More information on how to contribute to this and other projects is over at [contribute.jquery.org](https://contribute.jquery.org). You'll definitely want to take a look at the articles on contributing [to our websites](https://contribute.jquery.org/web-sites/) and [code](https://contribute.jquery.org/code).

 You may also want to take a look at our [commit & pull request guide](https://contribute.jquery.org/commits-and-pull-requests/) and [style guides](https://contribute.jquery.org/style-guide/) for instructions on how to maintain your fork and submit your code. Before we can merge any pull request, we'll also need you to sign our [contributor license agreement](https://contribute.jquery.org/cla).

 You can [Chat on Gitter](https://gitter.im/jquery/dev), should you have any questions. If you've never contributed to open source before, we've put together [a short guide with tips, tricks, and ideas on getting started](https://contribute.jquery.org/open-source/).
+
+## Code knowledge
+
+### Protocol-relative URLs
+
+As of 2023, we run with the default WordPress settings to formatting and cleaning URLs. If revisiting this in the future, consider the following constraints:
+
+* When accessing sites in older browsers over HTTP instead of HTTPS, references to theme assets (e.g. stylesheets) must either use the current scheme, or use a protocol-relative URL, or be an absolute path URL without protocol or hostname (`theme_root_uri`).
+
+* Intra-site links to pages and categories should generally use a path or the canonical URL.
+
+* Avoid stripping the protocol from a `clean_url` filter as various uses require a full URL:
+  * Server-side requests, such as for `downloads.wordpress.org`, must specify an explicit protocol in the URL.
+  * When building `/wp-sitemap.xml`, URLs must be full and with the canonical protocol explicitly set. Sitemaps are invalid if they contain relative URLs.
+  * When outputting `<link rel=canonical>` via `wp_head/rel_canonical`, the URL must be full and canonical. Or `rel_canonical` must be remove_action'ed replaced with a custom version that calls `esc_attr()` instead of `esc_url()` to avoid the `clean_url` filter.

plugins/jquery-actions.php

+24 -37
@@ -14,47 +14,34 @@
 remove_action( 'wp_head', 'wp_shortlink_wp_head', 10 );
 remove_action( 'template_redirect', 'wp_shortlink_header', 11 );

+// Ensure relative links remain on the current protocol
+// (such as references to theme assets and intra-site links).
+// This does not influence 'home' and 'siteurl' options, and thus
+// does not affect <link rel=canonical> and sitemap output.
+if ( @$_SERVER['HTTP_X_FORWARDED_PROTO'] == 'https' ) {
+	$_SERVER['HTTPS'] = '1';
+} elseif ( @$_SERVER['HTTP_X_FORWARDED_PROTO'] == 'http' ) {
+	$_SERVER['HTTPS'] = '0';
+}
+
 /**
- * Add rel=canonical on singular pages (API pages, and blog posts)
+ * Add rel=me link to HTML head for Mastodon domain verification
+ *
+ * Usage:
+ *
+ * Put one or more comma-separated URLs in the 'jquery_xfn_rel_me' WordPress option.
+ *
+ * Example:
+ *
+ * 'jquery_xfn_rel_me' => 'https://example.org/@foo,https://social.example/@bar'
  *
- * Derived from WordPress 6.3.1 rel_canonical:
+ * See also:
  *
- * - Avoid applying esc_url and its 'clean_url' filter so that
- * 'https://' is not stripped, and thus the URL is actually canonical.
+ * - https://docs.joinmastodon.org/user/profile/#verification
+ * - https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes/rel/me
+ * - https://microformats.org/wiki/rel-me
+ * - https://gmpg.org/xfn/
  */
-function jq_rel_canonical() {
-	if ( !is_singular() ) {
-		return;
-	}
-	$id = get_queried_object_id();
-	if ( $id === 0 ) {
-		return;
-	}
-
-	$url = wp_get_canonical_url( $id );
-	if ( $url) {
-		echo '<link rel="canonical" href="' . esc_attr( $url ) . '" />' . "\n";
-	}
-}
-remove_action( 'wp_head', 'rel_canonical' );
-add_action( 'wp_head', 'jq_rel_canonical' );
-
-// Add rel=me link to HTML head for Mastodon domain verification
-//
-// Usage:
-//
-// Put one or more comma-separated URLs in the 'jquery_xfn_rel_me' WordPress option.
-//
-// Example:
-//
-// 'jquery_xfn_rel_me' => 'https://example.org/@foo,https://social.example/@bar'
-//
-// See also:
-//
-// - https://docs.joinmastodon.org/user/profile/#verification
-// - https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes/rel/me
-// - https://microformats.org/wiki/rel-me
-// - https://gmpg.org/xfn/
 function jquery_xfnrelme_wp_head() {
 	$option = get_option( 'jquery_xfn_rel_me' , '' );
 	$links = $option !== '' ? explode( ',', $option ) : array();

plugins/jquery-filters.php

-12
@@ -120,18 +120,6 @@ function jquery_unfiltered_html_for_term_descriptions() {
 	return $sortedTerms;
 }, 20, 3 );

-// Strip protocol from urls making them protocol agnostic.
-add_filter( 'theme_root_uri', 'strip_https', 10, 1 );
-add_filter( 'clean_url', 'strip_https', 11, 1 );
-function strip_https($url) {
-	// WordPress core updates need a protocol.
-	if ( 'downloads.wordpress.org' === parse_url( $url, PHP_URL_HOST ) ) {
-		return $url;
-	}
-
-	return preg_replace( '/^https?:/', '', $url );
-}
-
 add_filter( 'xmlrpc_wp_insert_post_data', function ( $post_data, $content_struct ) {
 	if ( $post_data['post_type'] !== 'page' ) {
 		return $post_data;
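
For illustration only, the effect of the removed `strip_https` filter can be reproduced outside WordPress. The sketch below is not part of the commit: the function names `strip_https_sketch` and `has_scheme_sketch` are hypothetical, the WordPress hooks are omitted, and the `https://` form of the sitemap URL is an assumption inferred from the scheme-less URL in the crawler log.

```php
<?php
// Hypothetical stand-alone copy of the removed filter body (no WordPress hooks).
function strip_https_sketch( string $url ): string {
	// WordPress core updates need a protocol.
	if ( 'downloads.wordpress.org' === parse_url( $url, PHP_URL_HOST ) ) {
		return $url;
	}

	return preg_replace( '/^https?:/', '', $url );
}

// Hypothetical helper: true when the URL carries an explicit scheme,
// which is what a crawler needs from a sitemap entry.
function has_scheme_sketch( string $url ): bool {
	$scheme = parse_url( $url, PHP_URL_SCHEME );
	return is_string( $scheme ) && $scheme !== '';
}

// Assumed canonical form of the URL that appears scheme-less in the log.
$canonical = 'https://api.jquerymobile.com/wp-sitemap-posts-post-1.xml';
$stripped  = strip_https_sketch( $canonical );

echo $stripped, "\n";                        // //api.jquerymobile.com/wp-sitemap-posts-post-1.xml
var_dump( has_scheme_sketch( $canonical ) ); // bool(true)
var_dump( has_scheme_sketch( $stripped ) );  // bool(false)
```

With the filter attached to `clean_url`, any URL passed through `esc_url()` would lose its scheme in this way, which is consistent with the `Missing scheme in request url` error quoted in the commit message.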
