search all silo posts for links to users' sites and send mentions #456
Comments
cc @kylewm in case you're interested in adding flickr search support... (see above)
the remaining part here is to send mention posts themselves, not just their responses. this needs a new
finally soft launched this, and it worked well, but evidently has a memory leak, so i had to roll it back.
ugh. there's FUD here and there about the sockets API maybe causing memory leaks due to badly handled range requests, but i can't tell how real it is or if it could be causing this. i suspect i've just been wasteful with memory, e.g. lots of string concatenations and
silver lining: at least i know the window of commits where the leak was introduced!
silver lining: it's working ok, at least! e.g. the top response here: https://www.brid.gy/twitter/kylewmahan#responses is this tweet: https://twitter.com/anarcho/status/643921641664200704 which propagated as a mention to https://kylewm.com/2015/09/repost-of-glenn-greenwald-the-new-revolving-door
wow, that mention is hidden behind a redirect too, pretty cool!
some of this might be just because our slow poll frequency is once a day, so we're still working through the first set of search results for many users. that should be done by around noon PST. i'll revisit if latency is still consistently bad after that.
scratch that, we'll be caught up by ~1:30pm PST today, since we're ~90m behind. math!
the poll queue is still behind by 45m :/, but i'm hoping some of that was due to #490. i pushed out a change there (1ebfe1c) a few hours ago that adds a bunch of shortlink generator domains to the blacklist and checks the blacklist before searching for a domain, so i'm hoping that will help some too.
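A minimal Python sketch of the "check the blacklist before searching" idea described above. The `BLACKLIST` contents and the helper names `is_blacklisted` and `searchable_domains` are assumptions for illustration; the actual change in 1ebfe1c may look quite different.

```python
from urllib.parse import urlparse

# Hypothetical blacklist of shortlink generator domains (illustrative only).
BLACKLIST = {'t.co', 'bit.ly', 'goo.gl'}


def is_blacklisted(domain, blacklist=BLACKLIST):
    """Return True if domain is a blacklisted domain or a subdomain of one."""
    domain = domain.lower()
    return any(domain == b or domain.endswith('.' + b) for b in blacklist)


def searchable_domains(urls, blacklist=BLACKLIST):
    """Return the unique, non-blacklisted domains from urls, in order.

    Skipping blacklisted domains here avoids issuing a search for them at all.
    """
    domains = []
    for url in urls:
        domain = urlparse(url).netloc.lower()
        if domain and not is_blacklisted(domain, blacklist) and domain not in domains:
            domains.append(domain)
    return domains
```

For example, `searchable_domains(['http://bit.ly/abc', 'https://kylewm.com/2015/09/x'])` would keep only `kylewm.com`, so no search query is spent on the shortener.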
tentatively closing. this has been running in prod and stable for a few days. I'm sure there are more bugs left to fix, but we can open new issues for them.
Does brid.gy also turn @-mentions of my twitter username into webmentions to my domain? That would be similar to this and very nice.
@singpolyma not right now, but that's an interesting feature request. just to confirm, you're proposing they'd be sent to your front page, e.g.
@snarfed yes. or whatever URL is on my twitter profile
i currently craft search queries by stripping the scheme (i.e. http://), putting quotes around the remaining domain and path, and ORing all of those together, e.g.
i added the scheme back to G+ searches in 485af73, and it looks like that cut out the false positives but didn't add any false negatives. still working on Twitter. here's some research so far for the example domain
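The query construction described in this comment (strip the scheme, quote the remaining domain and path, OR the terms together) could be sketched in Python roughly as follows. The function name `build_search_query` and the example URLs are hypothetical, and this is not Bridgy's actual implementation.

```python
from urllib.parse import urlparse


def build_search_query(urls):
    """Build a search query from a user's site URLs.

    For each URL: drop the scheme, keep domain + path, wrap it in quotes.
    The quoted terms are then joined with OR.
    """
    terms = []
    for url in urls:
        parsed = urlparse(url)
        # netloc + path is everything after the scheme (query/fragment dropped);
        # a trailing slash is removed so "example.com/" becomes "example.com".
        target = (parsed.netloc + parsed.path).rstrip('/')
        quoted = f'"{target}"'
        if target and quoted not in terms:
            terms.append(quoted)
    return ' OR '.join(terms)
```

Called with `['http://example.com/', 'https://foo.org/bar']`, this produces `"example.com" OR "foo.org/bar"`. Adding the scheme back, as was done for G+, would mean quoting the full URL instead of `target`.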
hrmph.
i'm now thinking about still using the
Filtering false positives seems like an essential thing to do. Getting as many results as possible and then filtering afterward is probably best.
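One way the "search broadly, then filter" suggestion above might look in Python: take the links found in each search result and keep only those whose host really is the user's domain. The function name `links_to_domain` is hypothetical, and it assumes the links have already been extracted from the result.

```python
from urllib.parse import urlparse


def links_to_domain(urls, domain):
    """Return the urls whose host is domain or one of its subdomains.

    This drops false positives where the domain's words merely appear
    in the text or inside some other, unrelated host name.
    """
    domain = domain.lower()
    matches = []
    for url in urls:
        # hostname is None for strings that aren't absolute URLs.
        host = (urlparse(url).hostname or '').lower()
        if host == domain or host.endswith('.' + domain):
            matches.append(url)
    return matches
```

This only reduces false positives after the fact; the searches themselves still cost the same, which is the workload concern raised in the reply below.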
i wish! sadly many users' domains are common words, or have common words in them, so their false positive rate can be 1K:1 or even 1M:1 for domains with words like blog or web. :/ and bridgy is approaching 1k twitter users, so I'd like to try to cut down that workload (and cost) a bit.
filter out common words and only search for the unique part maybe?
oh boy, and now i'm in the business of maintaining a stop word list and search query rewriter. :P you're definitely right, it's doable, i'm just not sure i want to take that plunge...
Sorry. Was a thought
np! definitely appreciated. 👬
spun out of #51. from #51 (comment):
silo support for this is mixed:
/search/tweets.json?q=
/activities?query=