Commit c63cb0a
authored
Reduce crawler load on servers (#1581)
The crawler has a hard time crawling all specs nowadays due to more stringent
restrictions on servers that lead to network timeouts and errors. See:
w3c/webref#1244
The goal of this update is to reduce the load of the crawler onto servers. Two
changes:
1. The list of specs to crawl gets sorted to distribute origins. This should
help with diluting requests sent to a specific server at once. The notion of
"origin" used in the code is loose and more meant to identify the server that
serves the resource than the actual origin.
2. Requests sent to a given origin are serialized, and sent 2 seconds minimum
after the last request was sent (and processed). The crawler still processes
the list 4 specs at a time otherwise (provided the specs are to be retrieved
from different origins).
The consequence of 1. is that the specs are no longer processed in order, so
logs will make the crawler look a bit drunk, processing specs seemingly
randomly, as in:
```
1/610 - https://aomediacodec.github.io/afgs1-spec/ - crawling
8/610 - https://compat.spec.whatwg.org/ - crawling
12/610 - https://datatracker.ietf.org/doc/html/draft-davidben-http-client-hint-reliability - crawling
13/610 - https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-rfc6265bis - crawling
12/610 - https://datatracker.ietf.org/doc/html/draft-davidben-http-client-hint-reliability - done
16/610 - https://drafts.css-houdini.org/css-typed-om-2/ - crawling
13/610 - https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-rfc6265bis - done
45/610 - https://fidoalliance.org/specs/fido-v2.1-ps-20210615/fido-client-to-authenticator-protocol-v2.1-ps-errata-20220621.html - crawling
https://compat.spec.whatwg.org/ [error] Multiple event handler named orientationchange, cannot associate reliably to an interface in Compatibility Standard
8/610 - https://compat.spec.whatwg.org/ - done
66/610 - https://registry.khronos.org/glTF/specs/2.0/glTF-2.0.html - crawling
https://aomediacodec.github.io/afgs1-spec/ [log] extract refs without rules
1/610 - https://aomediacodec.github.io/afgs1-spec/ - done
```1 parent 2be9f4c commit c63cb0a
2 files changed
+154
-20
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
123 | 123 | | |
124 | 124 | | |
125 | 125 | | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
126 | 132 | | |
127 | 133 | | |
128 | 134 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
31 | 31 | | |
32 | 32 | | |
33 | 33 | | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
34 | 116 | | |
35 | 117 | | |
36 | 118 | | |
| |||
95 | 177 | | |
96 | 178 | | |
97 | 179 | | |
98 | | - | |
99 | | - | |
100 | | - | |
101 | | - | |
102 | | - | |
103 | | - | |
104 | | - | |
105 | | - | |
106 | | - | |
107 | | - | |
108 | | - | |
109 | | - | |
110 | | - | |
111 | | - | |
112 | | - | |
113 | | - | |
114 | | - | |
115 | | - | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
| 222 | + | |
| 223 | + | |
| 224 | + | |
116 | 225 | | |
117 | 226 | | |
118 | 227 | | |
| |||
343 | 452 | | |
344 | 453 | | |
345 | 454 | | |
| 455 | + | |
| 456 | + | |
| 457 | + | |
| 458 | + | |
| 459 | + | |
| 460 | + | |
| 461 | + | |
| 462 | + | |
| 463 | + | |
| 464 | + | |
| 465 | + | |
| 466 | + | |
| 467 | + | |
| 468 | + | |
| 469 | + | |
| 470 | + | |
| 471 | + | |
346 | 472 | | |
347 | 473 | | |
348 | 474 | | |
349 | 475 | | |
350 | 476 | | |
351 | 477 | | |
352 | | - | |
353 | | - | |
| 478 | + | |
| 479 | + | |
| 480 | + | |
| 481 | + | |
354 | 482 | | |
355 | 483 | | |
356 | 484 | | |
| |||
0 commit comments