site discoverability/web crawlers/bots #180
-
The Git page provides the following info:

_The location of these URLs must be provided over HTTPS to ensure the integrity of the data. Robots.txt: this typically follows the format of … or … for a robots.txt file using the Disallow directive._

Question/Clarification: The above information relates specifically to search engine discoverability, but does this mean that "all" web crawlers, bots, etc. should be allowed to access the site and its relevant data without any impediments? Are there any cases where such external agents should not be allowed?
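For illustration only (the exact URLs quoted from the guidance are truncated above, so this is an assumed, typical example rather than the guidance's own wording), a robots.txt that allows all crawlers, plus a commented variant showing the Disallow directive, might look like:

```
# Allow every crawler to access the whole site
# (an empty Disallow value means nothing is blocked)
User-agent: *
Disallow:

# Hypothetical variant: block one path for all crawlers
# User-agent: *
# Disallow: /private/
```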
Replies: 1 comment 4 replies
-
By design, web crawlers and bots look for the robots.txt file and then act accordingly. My interpretation is that there should be no restriction on who or what system can retrieve the files.
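As a minimal sketch of that behaviour, well-behaved crawlers fetch robots.txt first and only request URLs the rules permit. The example below uses Python's standard urllib.robotparser; the site URL and user-agent name are placeholders, not taken from the discussion.

```python
from urllib import robotparser

# Fetch and parse the site's robots.txt, as a compliant crawler would
# before requesting any other page. "example.gov" is a placeholder host.
rp = robotparser.RobotFileParser()
rp.set_url("https://example.gov/robots.txt")
rp.read()

# can_fetch() returns True when the named user agent is permitted to
# retrieve the given URL under the site's Allow/Disallow rules.
if rp.can_fetch("MyCrawler", "https://example.gov/some/page"):
    print("Allowed to crawl this URL")
else:
    print("Blocked by robots.txt")
```

Note that robots.txt is advisory: it tells cooperative crawlers what the site owner prefers, but it does not technically prevent access, which is consistent with the interpretation that the files themselves remain retrievable by anyone.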