A fast, simple tool that crawls the links in your sitemap and reports the broken ones.
Just clone this repo, run php crawler.php http://example.com/sitemap.xml, check the generated logs, and improve your SEO quality.
This module has a few simple configuration parameters. They let you control the crawling speed and include or exclude certain reports.
The parameters are:
errorLog - boolean value that includes or excludes error reports (links to pages that return 4XX or 5XX status codes).
redirectLog - boolean value that includes or excludes redirect reports (links to pages that return 3XX status codes).
internalOnly - boolean value that controls whether external links are checked.
streamCount - integer value that sets the number of parallel parsing streams. The default is 10; a higher value can improve crawling speed.
excludeList - array of link patterns that the crawler must skip, for example ['/meta/', '/search/'].
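
As a rough illustration, the parameters above could be combined as in the sketch below. The option names and the default of 10 come from the list; the array layout, the $config variable, the reading of internalOnly, and the isExcluded() helper are assumptions made for this example, not the crawler's actual code. In particular, the helper assumes that excludeList patterns are matched as plain substrings of the URL, which the description above does not confirm.

```php
<?php
// Hypothetical configuration: option names are taken from the parameter list,
// but the array layout and the variable name are assumptions.
$config = [
    'errorLog'     => true,  // report links to pages returning 4XX or 5XX
    'redirectLog'  => true,  // report links to pages returning 3XX
    'internalOnly' => true,  // assumed meaning: true skips external links
    'streamCount'  => 10,    // documented default number of parallel streams
    'excludeList'  => ['/meta/', '/search/'],
];

// Illustrative helper, not part of the crawler.
// Assumes each pattern is a plain substring to look for in the URL.
function isExcluded(string $url, array $excludeList): bool
{
    foreach ($excludeList as $pattern) {
        if (strpos($url, $pattern) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(isExcluded('http://example.com/search/?q=php', $config['excludeList']));
var_dump(isExcluded('http://example.com/about', $config['excludeList']));
```

Under the substring assumption, the first URL would be skipped and the second would be crawled. If the crawler treats the patterns as regular expressions instead, the matching logic would differ.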