There is still a way to go, but so far in my attempt to write a Twitter monitoring program I have managed to get a program to listen to the Twitter feed for a list of keywords and summarise the matching tweets after the program exits.
So let's walk through this section by section (my apologies for the lack of comments in the code, but it's just a hacky test so far). To begin with we import a few libraries: time, to get the current timestamp; datetime, to help turn timestamps into something readable; and Twython, the Python Twitter API library, which gives us access to the tweets as they come in.
Next up we define a few variables. First is our list of search terms, then the keys that Twitter needs in order to grant access to the API. APP_KEY and APP_SECRET identify the program that is accessing the Twitter feed. We could probably get away with just these as we are not sending tweets, but OAUTH_TOKEN and OAUTH_TOKEN_SECRET would, in theory, allow me to get the program to tweet its results from my account. I will write up how to get these keys another time. Next in our variable list, the program stores when it was started and the number of seconds it is to run for, in this case set to 24 hours.
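The run-time limit boils down to comparing the current time against the start time plus the allowed duration. Here is a minimal sketch of that check on its own; the helper name `within_window` and the optional `now` argument are my own additions for illustration and do not appear in the program.

```python
import time

RUNTIME = 60 * 60 * 24  # 24 hours expressed in seconds


def within_window(start, runtime, now=None):
    """Return True while fewer than `runtime` seconds have passed since `start`."""
    if now is None:
        now = time.time()
    return now < start + runtime


start = time.time()
print(within_window(start, RUNTIME))                           # just started
print(within_window(start, RUNTIME, now=start + RUNTIME + 1))  # one second past the limit
```

Passing `now` explicitly is only there so the check can be tried out without waiting a day.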
So now we get onto the parts that actually talk to Twitter. We create the class MyStreamer, which is a child class of TwythonStreamer, and define two methods in it: on_success and on_error.
on_success first checks that we are still within the time frame we wanted to use and, if not, disconnects the program from the Twitter stream. After this it checks that the tweet actually has some text and prints the tweet to the screen. We then capture the data we want from the tweet (the user's screen name, the text of the tweet, and when the tweet was sent) and save it into the list tweets. To make the timestamp more user friendly we use datetime.fromtimestamp() to interpret Twitter's timestamp, remembering that Twitter supplies it in milliseconds rather than seconds, so it has to be divided by 1000 first.
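To make the data capture step concrete, here is a rough sketch of it pulled out into a standalone function. The function name `extract_tweet` and the sample payload are made up for illustration; only the keys `text`, `user`, `screen_name` and `timestamp_ms` come from the program. Unlike the program, this sketch converts to UTC rather than local time, purely so that the result is the same on every machine.

```python
from datetime import datetime, timezone


def extract_tweet(data):
    """Pull screen name, text and send time out of one stream payload.

    Returns None for payloads that carry no 'text' field.
    """
    if 'text' not in data:
        return None
    # timestamp_ms is in milliseconds; fromtimestamp() expects seconds
    sent = datetime.fromtimestamp(int(data['timestamp_ms']) / 1000.0,
                                  tz=timezone.utc)
    return [data['user']['screen_name'], data['text'], sent]


# Made-up payload, for illustration only
sample = {
    'user': {'screen_name': 'example_user'},
    'text': 'Testing #physics and #LEDs',
    'timestamp_ms': '1400000000000',
}
print(extract_tweet(sample))
```

Forgetting the division by 1000 would hand datetime a value a thousand times too large, which lands tens of thousands of years in the future or raises an error, so it is an easy mistake to notice.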
on_error is the code that is run when there is a problem with the Twitter feed, such as no internet connection. It is set up to simply print the error code to the screen and then disconnect from the Twitter stream.
Next up we define our own function, finishup(). This code is designed to process the contents of tweets and display them to the user, eventually via email but for now simply on the screen.
The first output from finishup() displays the contents of tweets, in the order in which they were captured, in a human-friendly way. We then run through each search term and display the tweets that match it, counting them as we go and summarising the results at the end.
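The per-term counting can be sketched in a few lines. This version keeps the same substring test as the program but stores the counts in a dictionary keyed by term instead of a parallel list; the function name `count_by_term` and the sample tweets are invented for illustration.

```python
def count_by_term(terms, texts):
    """Map each term to the number of texts that contain it as a substring."""
    return {term: sum(1 for text in texts if term in text) for term in terms}


# Made-up tweets, for illustration only
texts = [
    'New #GaN results for #LEDs',
    'Talk on #physics in #Manchester',
    'Cheap #LED bulbs',
]
counts = count_by_term(['#GaN', '#LED', '#LEDs', '#physics'], texts)
for term, n in counts.items():
    print(term, '\t', n)
```

One thing this makes visible: because the test is a plain substring match, a tweet containing #LEDs is also counted under #LED.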
The final output looks for tweets that use more than one of our search terms. To do this we first create a loop over all terms and, inside this loop, make another loop over all terms. We check that the two loop values are not the same, then search over all stored tweets to see if they contain both search terms; if both are present we display the tweet.
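One side effect of looping over all terms twice is that every pair is visited in both orders, so a matching tweet is shown once for (term, term2) and again with the two swapped. As a possible alternative, and not what the program currently does, here is a sketch that uses itertools.combinations from the standard library to visit each pair only once. The function name `tweets_with_pairs` and the sample tweets are made up.

```python
from itertools import combinations


def tweets_with_pairs(terms, texts):
    """Yield (term_a, term_b, text) for each text containing both terms.

    combinations() visits every unordered pair exactly once, so a match
    is not reported a second time with the terms swapped.
    """
    for term_a, term_b in combinations(terms, 2):
        for text in texts:
            if term_a in text and term_b in text:
                yield term_a, term_b, text


# Made-up tweets, for illustration only
texts = [
    'New #GaN results in #physics',
    'Talk on #physics in #Manchester',
    'Only #science here',
]
matches = list(tweets_with_pairs(['#GaN', '#physics', '#Manchester'], texts))
for term_a, term_b, text in matches:
    print(term_a, term_b, '\n', text)
```

Collecting the matches in a list rather than printing them straight away would also make it easier to send them by email later.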
This is a very early-stage project, but below is the Python code that I have running so far. Let me know if you spot anything that can be done better, or something silly I have overlooked.
    import time
    from datetime import datetime

    from twython import Twython
    from twython import TwythonStreamer

    # Search terms
    TERMS = ['#GaN', '#gan', '#physics', '#Physics', '#LEDs', '#LED',
             '#journorequest', '#science', '#manchester', '#Manchester',
             '#cambridge', '#Cambridge']

    # Twitter API keys
    APP_KEY = 'xxxxxxxxxxxxxxxxxxxxxxxxx'
    APP_SECRET = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
    OAUTH_TOKEN = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
    OAUTH_TOKEN_SECRET = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'

    starttime = time.time()  # when the program was started
    runtime = 60 * 60 * 24   # how long to listen for, in seconds (24 hours)


    class MyStreamer(TwythonStreamer):
        def on_success(self, data):
            # Keep listening only while we are inside the time window
            if time.time() < starttime + runtime:
                if 'text' in data:
                    print(data['text'])
                    tweets.append([
                        data['user']['screen_name'],
                        data['text'],
                        # timestamp_ms is in milliseconds, not seconds
                        datetime.fromtimestamp(int(data['timestamp_ms']) / 1000.0)])
            else:
                self.disconnect()

        def on_error(self, status_code, data):
            print(status_code)
            self.disconnect()


    def finishup():
        stream.disconnect()

        # Every captured tweet, in the order it arrived
        print('\n\n')
        for tweet in tweets:
            print('User "', tweet[0], '" Said "', tweet[1], '" at ', tweet[2], '\n')

        # Tweets matching each term, counted as we go
        results = []
        print('\nTweets by Term\n')
        for term in TERMS:
            result = 0
            print(term, '\n')
            for tweet in tweets:
                if term in tweet[1]:
                    print(tweet[1], '\n')
                    result += 1
            results.append(result)

        i = 0
        print('\nResults per Term\n')
        for term in TERMS:
            print(term, '\t', results[i])
            i += 1

        # Tweets that contain two different search terms
        print('\nResult pairs\n')
        for term in TERMS:
            for term2 in TERMS:
                if term2 != term:
                    for tweet in tweets:
                        if term in tweet[1]:
                            if term2 in tweet[1]:
                                print(term, term2, '\n', tweet[1])


    tweets = []
    print(starttime, starttime + runtime)

    try:
        stream = MyStreamer(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
        stream.statuses.filter(track=TERMS)
        finishup()
    except KeyboardInterrupt:
        finishup()