twitter-scraper-install-notes.txt
$ sudo apt-get update
$ apt-cache search pip
$ sudo apt-get install python-pip
$ sudo pip install twarc
# 1) Sign up for Twitter if you don't have an account.
# I needed to add a phone number to my profile before
# I could register an app. This involved getting a verification
# code by text message and entering it on Twitter.
#
# 2) Register the app with Twitter: http://apps.twitter.com
# Once the app is registered, go to the "Keys and Access Tokens"
# tab and generate an access token. Then note the following:
# Consumer Key (e.g. 6q3avm7iWk9YqGKzWAYWuDJIq)
# Consumer Secret (e.g. hHReWGbOfnYazHlOkJeInR13k1adqRumHaInnVth0on7qJ31LN)
# Access Token (e.g. 567223483-Ew5zX3w4N3t9Cv6YFx1rfgIk4Wkj1r9PzwYPeWKk)
# Access Secret (e.g. LtOtuTNsjQrIlwqLibBRaC0z6WSgkd0Kw0WKD3Mr4mCFH)
# Under the "Permissions" tab, set the access to "Read only".
# Return to your terminal and run twarc.py. On the first run the tool
# asks for the keys and secrets generated above. Copying and pasting
# them directly from the Twitter page is the most reliable way to
# enter them. NOTE: This run ends with an error. The tool complains
# because it doesn't have enough information to actually run. That
# is fine: the only goal here is to have the tool create a .twarc
# file in the home directory and save the keys to that file in the
# correct format.
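# A quick way to confirm that the keys were saved is to read the file back.
# The sketch below is in Python and assumes the .twarc file is INI-style,
# with one section per profile holding consumer_key, consumer_secret,
# access_token and access_token_secret. That layout is an assumption about
# twarc 1.x, not something these notes confirm, and missing_twarc_keys is a
# hypothetical helper name.

```python
import configparser
import os

# Key names assumed to be stored by twarc 1.x (an assumption; check your file).
EXPECTED_KEYS = ("consumer_key", "consumer_secret",
                 "access_token", "access_token_secret")


def missing_twarc_keys(path):
    """Return {profile: [missing key names]} for an INI-style credentials file."""
    # interpolation=None so a '%' inside a secret is not treated as a placeholder.
    config = configparser.ConfigParser(interpolation=None)
    config.read(path)  # reads nothing (no error) if the file does not exist
    report = {}
    for profile in config.sections():
        report[profile] = [key for key in EXPECTED_KEYS
                           if not config.has_option(profile, key)]
    return report


if __name__ == "__main__":
    path = os.path.expanduser("~/.twarc")
    report = missing_twarc_keys(path)
    if not report:
        print("no profiles found in", path)
    for profile, missing in report.items():
        print(profile, "is missing:", ", ".join(missing) if missing else "nothing")
```

# An empty result means the file is absent or has no profile section, in
# which case the configuration step needs to be run again.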
# OK, see the twarc docs here: https://github.com/DocNow/twarc
$ twarc configure
# Seems to work: streams of info print to the terminal until I hit Ctrl+C.
# The rest doesn't really work. Maybe it did at one point, but it doesn't now.
$ mkdir tweets
$ twarc search ponies >tweets/search-ponies.json
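# twarc writes its results as line-oriented JSON, one tweet object per line,
# so the search output can be checked without extra tools. A minimal Python
# sketch that counts the tweets in the file and shows the first few.
# load_tweets and tweet_text are hypothetical helpers, and the "full_text"
# fallback is an assumption about twarc versions that request extended tweets.

```python
import json
import os


def load_tweets(path):
    """Yield one dict per non-empty line of a line-oriented JSON file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def tweet_text(tweet):
    """Return the tweet body from "full_text" if present, else from "text"."""
    return tweet.get("full_text") or tweet.get("text", "")


if __name__ == "__main__":
    path = "tweets/search-ponies.json"
    if os.path.isfile(path):
        tweets = list(load_tweets(path))
        print(len(tweets), "tweets")
        for tweet in tweets[:5]:
            print(tweet.get("created_at"), tweet_text(tweet)[:80])
```

# If the count is zero, the search returned nothing or the keys were rejected.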
$ sudo apt-get install git
$ git clone https://github.com/recrm/ArchiveTools
$ ./ArchiveTools/json-extractor.py -path tweets/ created_at retweet_count favorite_count text
# This creates a CSV file containing the data from the tweet search I did on ponies, pulling out the date each tweet was created, its retweet and favorite counts, and the tweet text.
# There seems to be some issue with how the tweet text is handled, though.
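# One possible cause of the text problem (an assumption, not verified against
# json-extractor.py) is that tweet text often contains commas, quotes and
# line breaks, which corrupt a CSV unless every field is quoted. Another is
# that some twarc versions may store the body under "full_text" rather than
# "text". A minimal Python sketch that avoids both using only the standard
# library. tweets_to_csv is a hypothetical helper, not part of twarc or
# ArchiveTools.

```python
import csv
import json

FIELDS = ["created_at", "retweet_count", "favorite_count", "text"]


def tweets_to_csv(json_path, csv_path):
    """Convert line-oriented tweet JSON to CSV and return the number of rows."""
    rows = 0
    # newline="" lets the csv module control line endings itself.
    with open(json_path, encoding="utf-8") as src, \
         open(csv_path, "w", encoding="utf-8", newline="") as dst:
        writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
        writer.writerow(FIELDS)
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            tweet = json.loads(line)
            # Assumed fallback: extended tweets carry the body in "full_text".
            text = tweet.get("full_text") or tweet.get("text", "")
            writer.writerow([
                tweet.get("created_at", ""),
                tweet.get("retweet_count", ""),
                tweet.get("favorite_count", ""),
                text,
            ])
            rows += 1
    return rows
```

# Example call: tweets_to_csv("tweets/search-ponies.json", "tweets/search-ponies.csv")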