| 1 | +--- |
| 2 | +title: "Block the Bots in Enhance Projects" |
| 3 | +image: '/_public/blog/post-assets/stop-sign.jpg' |
| 4 | +image_alt: "An all ways stop sign." |
| 5 | +photographer: "John Matychuk" |
| 6 | +photographer_url: "https://unsplash.com/@john_matychuk?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash" |
| 7 | +category: ai, enhance |
| 8 | +description: "Introducing a new plugin for Architect and Enhance projects to block AI crawler bots." |
| 9 | +author: 'Simon MacDonald' |
| 10 | +avatar: 'simon.png' |
| 11 | + |
| 12 | +published: "July 31, 2024" |
| 13 | +--- |
| 14 | + |
| 15 | +The backlash against Artificial Intelligence bots scraping the web seems to be growing. Web luminaries like [Ethan Marcotte](https://follow.ethanmarcotte.com/@beep) have written about [how and why](https://ethanmarcotte.com/wrote/blockin-bots/) they are opting out of their _work being hoovered up to train “AI” data models_. Sites like [Read the Docs](https://about.readthedocs.com/) are stating that [AI crawlers need to be more respectful](https://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse/) after seeing their bandwidth usage drop by 75% once they blocked AI bots. Cloud providers like Cloudflare have made it much [easier to block bots](https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click). |
| 16 | + |
| 17 | +With Enhance applications, you’ve always been able to block AI crawlers by providing your own `robots.txt` file. Today we are introducing a new plugin, `@enhance/arc-plugin-block-bots`, to handle this for you. |
| 18 | + |
| 19 | +## Functionality |
| 20 | + |
| 21 | +The plugin adds a new route to your application at `/robots.txt`. This route tells web crawlers and bots which parts of your website they are allowed to access. By default, the response generated by the plugin looks like this: |
| 22 | + |
| 23 | +``` |
| 24 | +User-agent: Amazonbot |
| 25 | +User-agent: anthropic-ai |
| 26 | +User-agent: Applebot-Extended |
| 27 | +User-agent: Bytespider |
| 28 | +User-agent: CCBot |
| 29 | +User-agent: ChatGPT-User |
| 30 | +User-agent: ClaudeBot |
| 31 | +User-agent: Claude-Web |
| 32 | +User-agent: cohere-ai |
| 33 | +User-agent: Diffbot |
| 34 | +User-agent: FacebookBot |
| 35 | +User-agent: FriendlyCrawler |
| 36 | +User-agent: Google-Extended |
| 37 | +User-agent: GoogleOther |
| 38 | +User-agent: GoogleOther-Image |
| 39 | +User-agent: GoogleOther-Video |
| 40 | +User-agent: GPTBot |
| 41 | +User-agent: ImagesiftBot |
| 42 | +User-agent: img2dataset |
| 43 | +User-agent: Meta-ExternalAgent |
| 44 | +User-agent: OAI-SearchBot |
| 45 | +User-agent: omgili |
| 46 | +User-agent: omgilibot |
| 47 | +User-agent: PerplexityBot |
| 48 | +User-agent: YouBot |
| 49 | +Disallow: / |
| 50 | +``` |
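
As a rough illustration of the format above, here is a minimal sketch of how a response body like this could be assembled from a list of user agent names. The `buildRobotsTxt` helper is hypothetical and is not taken from the plugin's source; it only shows that every agent is listed in one group that shares a single `Disallow: /` rule.

```javascript
// Hypothetical helper (not the plugin's actual code): builds a robots.txt
// body that groups every listed user agent under one "Disallow: /" rule.
function buildRobotsTxt(agents) {
  // With no agents there is no group to write, so return an empty body.
  if (agents.length === 0) return ''
  const lines = agents.map((agent) => `User-agent: ${agent}`)
  lines.push('Disallow: /')
  return lines.join('\n') + '\n'
}

console.log(buildRobotsTxt(['GPTBot', 'ClaudeBot', 'CCBot']))
```

Grouping all agents ahead of a single rule keeps the file short, and crawlers that follow the Robots Exclusion Protocol treat consecutive `User-agent` lines as one group.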
| 51 | + |
| 52 | +Once a day, the plugin checks the well-maintained [ai.robots.txt](https://github.com/ai-robots-txt/ai.robots.txt) list for new user agents to block. If the list has changed, your site’s `robots.txt` file is updated accordingly. You don’t need to keep the file current yourself, because the plugin takes care of that chore for you. |
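
To make the daily check more concrete, the sketch below shows one way to detect newly listed user agents by comparing an upstream `robots.txt` text against the agents already being blocked. This is an assumption about the approach, not the plugin's actual implementation: `parseUserAgents` and `findNewAgents` are hypothetical names, and fetching the upstream file and scheduling the check are left out.

```javascript
// Hypothetical helper: extract the user agent names from robots.txt text.
function parseUserAgents(robotsTxt) {
  return robotsTxt
    .split(/\r?\n/)
    .map((line) => line.match(/^\s*User-agent:\s*(.+?)\s*$/i))
    .filter(Boolean)
    .map((match) => match[1])
}

// Hypothetical helper: return the agents in `latest` that are missing
// from `current`, compared case-insensitively.
function findNewAgents(current, latest) {
  const known = new Set(current.map((agent) => agent.toLowerCase()))
  return latest.filter((agent) => !known.has(agent.toLowerCase()))
}

// Sample upstream text standing in for the fetched list.
const upstream = 'User-agent: GPTBot\nUser-agent: NewBot\nDisallow: /\n'
const added = findNewAgents(['GPTBot'], parseUserAgents(upstream))
console.log(added) // [ 'NewBot' ]
```

If `findNewAgents` returns an empty array, nothing has been added upstream and the generated file can be left alone.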
| 53 | + |
| 54 | +## Setup |
| 55 | + |
| 56 | +To add `@enhance/arc-plugin-block-bots` to your Enhance application, first install the package: |
| 57 | + |
| 58 | +```bash |
| 59 | +npm i @enhance/arc-plugin-block-bots |
| 60 | +``` |
| 61 | + |
| 62 | +Then edit your `.arc` file to add the plugin: |
| 63 | + |
| 64 | +```arc |
| 65 | +@plugins |
| 66 | +enhance/arc-plugin-block-bots |
| 67 | +``` |
| 68 | + |
| 69 | +After that, all you need to do is deploy your application, and the `/robots.txt` route will be available. |
| 70 | + |
| 71 | +## Future Plans |
| 72 | + |
| 73 | +This is just the first release of our bot-blocking plugin. We’ve noticed that not all bots are well-behaved citizens of the interwebs, as some will ignore your `robots.txt` directives. We are looking at ways to protect each and every route of your application from bots, either with Enhance middleware or by automatically configuring [AWS WAF Bot Control](https://aws.amazon.com/waf/features/bot-control/). |
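
Until something like that ships, here is a minimal sketch of what a per-request check might look like. Everything in it is an assumption for illustration: `isBlockedAgent` and `blockBots` are hypothetical names, the request is assumed to expose a `headers` object, and the return shape (a `statusCode` to stop the request, `undefined` to let it continue) is a guess at a middleware-style contract rather than a confirmed plugin or Enhance API.

```javascript
// Hypothetical check: does the User-Agent header mention a blocked bot?
function isBlockedAgent(userAgentHeader, blockedAgents) {
  if (!userAgentHeader) return false
  const header = userAgentHeader.toLowerCase()
  return blockedAgents.some((agent) => header.includes(agent.toLowerCase()))
}

// Hypothetical middleware-style function: returns a 403 response object for
// blocked bots, or undefined so the request can continue to the next handler.
function blockBots(req, blockedAgents) {
  const headers = req.headers || {}
  const userAgent = headers['user-agent'] || headers['User-Agent']
  if (isBlockedAgent(userAgent, blockedAgents)) {
    return { statusCode: 403, body: 'Forbidden' }
  }
}

const req = { headers: { 'user-agent': 'Mozilla/5.0 (compatible; GPTBot/1.0)' } }
console.log(blockBots(req, ['GPTBot', 'ClaudeBot']))
```

A header check like this only stops bots that identify themselves honestly, which is one reason a network-level option is also being considered.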
| 74 | + |
| 75 | +## Next Steps |
| 76 | + |
| 77 | +* Try out the [plugin](https://github.com/enhance-dev/arc-plugin-block-bots) in your project, and let us know if you have any issues. |
| 78 | +* Let us know what feature you want to see next in the plugin. Better yet, send us a PR! |
| 79 | +* [Follow](https://fosstodon.org/@enhance_dev) Axol, the Enhance mascot, on Mastodon. |
| 80 | +* Join the [Enhance Discord](https://enhance.dev/discord) and share what you’ve built, or ask for help. |
| 81 | + |