
Parameterize dateinc field, change the default date increment #3

Open
wants to merge 1 commit into master

Conversation

@rush00121

Instead of pulling data one day at a time, this extends the search filter parameter to cover a bigger date range. This should make the scraping process faster.
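A minimal sketch of the idea, assuming the scraper builds one Twitter search URL per date window (the function and parameter names below are illustrative, not the repo's actual identifiers):

```python
from datetime import date, timedelta

# Hypothetical sketch of the PR's change: instead of one search URL per
# day, step through the range in windows of `dateinc` days.
def date_windows(start, end, dateinc=50):
    """Yield (since, until) pairs covering [start, end) in steps of dateinc days."""
    step = timedelta(days=dateinc)
    since = start
    while since < end:
        until = min(since + step, end)
        yield since, until
        since = until

def search_url(user, since, until):
    # Twitter search supports since:/until: date filters on a query.
    return ('https://twitter.com/search?f=tweets&q='
            'from%3A{}%20since%3A{}%20until%3A{}'.format(user, since, until))

for since, until in date_windows(date(2017, 1, 1), date(2017, 2, 1), dateinc=50):
    print(search_url('realdonaldtrump', since, until))
```

With `dateinc=1` this degenerates to the original day-by-day behavior, so the window size becomes a tunable trade-off between speed and completeness.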

@bpb27 (Owner) commented Mar 29, 2017

In my experience, you actually get fewer results when you use an interval, as opposed to going day by day. Have you tried running this on a user with a lot of tweets (20K+) and compared the total with the interval method?

@rush00121 (Author)

I was trying to get tweets from realdonaldtrump, who has > 30k tweets. I refactored the code and ran it with a date interval of 50 days. It was way faster than fetching one day at a time. I did not record metrics to prove this, but it definitely sped up the scraping process for me.

@bpb27 (Owner) commented Mar 29, 2017

By fewer results I mean you only get/collect 28K total tweets (with the interval method) instead of the 30K total tweets (with the day-by-day method).

@rush00121 (Author)

I did not compare the number of tweets scraped. Let me test it and see whether both methods return the same results.

@rush00121 (Author)

I tested the date range 2010-01-01 to 2017-03-01.

With the previous code: total tweet count: 26141

With my modifications: total tweet count: 27357

I also took a smaller sample, 2017-01-01 to 2017-02-01. Both runs gave a total tweet count of 204.

I am not sure why my code gave more results in the previous run. Is it a timeout issue with the Twitter page, or something else?

But in both cases, I got a significant speed improvement.

@ryanbateman

I also ran this PR branch and got 14479 IDs for 2010-01-01 to 2017-06-30. This may well be an artifact of how pages load on my machine (a factor that would be an issue regardless), but it is definitely worth considering.

@ryanbateman

Increasing the page-load wait time to 2 seconds netted 15687 IDs for that same time period.
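For context, the wait in question is the pause after each page load before tweet IDs are read out of the DOM. A sketch of what bumping it looks like, assuming a Selenium-based scraper of that era (the variable name, CSS selector, and attribute are assumptions, not the repo's actual code):

```python
import time
from selenium import webdriver

load_delay = 2  # seconds to wait after each page load (up from 1)

driver = webdriver.Firefox()
driver.get('https://twitter.com/search?f=tweets&q=from%3Arealdonaldtrump'
           '%20since%3A2017-01-01%20until%3A2017-02-01')
time.sleep(load_delay)  # give slow pages time to render all results

# Assumed 2017-era Twitter markup: tweet elements carry a data-tweet-id.
ids = [tweet.get_attribute('data-tweet-id')
       for tweet in driver.find_elements_by_css_selector('div.tweet')]
print(len(ids))
driver.quit()
```

The trade-off is direct: a longer delay collects more of what each page actually shows, at the cost of some of the speed the interval change was meant to buy.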
