site discoverability/web crawlers/bots #180
-
The Git page provides the following info:

_The location of these URLs must be provided over HTTPS to ensure the integrity of the data. Robots.txt: this typically follows the format of … or … for a robots.txt file using the Disallow directive._

Question/Clarification: The above information relates specifically to search engine discoverability, but does this mean that "all" web crawlers, bots, etc. should be allowed to access the site and its relevant data without any impediments? Are there any cases where such external agents should not be allowed?
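For illustration only (the exact URLs quoted from the guidance are truncated above, so this is an assumed, typical example rather than the guidance's own wording), a robots.txt that allows all crawlers, plus a commented variant showing the Disallow directive, might look like:

```
# Allow every crawler to access the whole site
# (an empty Disallow value means nothing is blocked)
User-agent: *
Disallow:

# Hypothetical variant: block one path for all crawlers
# User-agent: *
# Disallow: /private/
```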
Replies: 1 comment 4 replies
-
By design, web crawlers and bots look for the robots.txt file and then act accordingly. My interpretation is that there should be no restriction on who or what system can retrieve the files.
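As a minimal sketch of that behaviour, well-behaved crawlers fetch robots.txt first and only request URLs the rules permit. The example below uses Python's standard urllib.robotparser; the site URL and user-agent name are placeholders, not taken from the discussion.

```python
from urllib import robotparser

# Fetch and parse the site's robots.txt, as a compliant crawler would
# before requesting any other page. "example.gov" is a placeholder host.
rp = robotparser.RobotFileParser()
rp.set_url("https://example.gov/robots.txt")
rp.read()

# can_fetch() returns True when the named user agent is permitted to
# retrieve the given URL under the site's Allow/Disallow rules.
if rp.can_fetch("MyCrawler", "https://example.gov/some/page"):
    print("Allowed to crawl this URL")
else:
    print("Blocked by robots.txt")
```

Note that robots.txt is advisory: it tells cooperative crawlers what the site owner prefers, but it does not technically prevent access, which is consistent with the interpretation that the files themselves remain retrievable by anyone.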