lists of domains other than .gov and .mil? #1
Comments
Uh, those lists are old. From pre-crawl research. I hope to get the final (or final as of now) lists up soon. That said, we have not been separating the lists by .org/.us/.com but instead by "basic" (i.e. http), FTP, and "social media." One could easily take the "basic" list and grep out the .org/.us/.com etc.
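For anyone who wants to try the "grep out the .org/.us/.com" step once a "basic" list is available, here is a minimal sketch in Python. It assumes the list is plain text with one URL or bare hostname per line, which the thread does not confirm; the function name `filter_seeds_by_tld` and the default TLD tuple are hypothetical and only for illustration.

```python
from urllib.parse import urlsplit


def filter_seeds_by_tld(seeds, tlds=(".org", ".us", ".com")):
    """Return the seed entries whose hostname ends with one of `tlds`.

    `seeds` is any iterable of strings (for example, an open text file).
    Assumption: one URL or bare hostname per line.
    """
    wanted = tuple(t.lower() for t in tlds)
    matches = []
    for line in seeds:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        # urlsplit only fills in the hostname when a scheme or "//" is
        # present, so bare hostnames get a "//" prefix first.
        parts = urlsplit(url if "//" in url else "//" + url)
        host = (parts.hostname or "").lower()
        if host.endswith(wanted):
            matches.append(url)
    return matches
```

Matching on the parsed hostname rather than the whole line avoids false hits such as a .gov URL whose path happens to contain ".com". To run it over a file, pass the open file object, e.g. `filter_seeds_by_tld(open("basic.txt"))`, where `basic.txt` is a placeholder name.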
I guess they were never added? The scrapes of those sites would be pretty darn useful if they aren't somewhere else by now.
We decided to wait until all the data was indexed. We are wrapping up ingesting the data from all the crawling partners now. That should finish this month. At that point we will have a "complete" copy of all EOT data and will be working with folks at the GSA to make (and hopefully maintain) a complete list.
We're happy to help, just let us know!
I see the seed lists for .gov and .mil, but I'm wondering about .org/.us/.com etc. Is there a way to generate those lists as well?