TL;DR: skip the reading and download TweetThief from GitHub to search for uncredited copies of your tweets.
Over the last year, I've participated in a number of Twitter chats. The National Cyber Security Alliance hosts Twitter conversations every couple of months, under the hashtags #ChatSTC (Stop. Think. Connect., their cyber awareness campaign slogan) and #ChatDPD (Digital Privacy Day). It's a great way to share information with people interested in security advice, as well as to learn from like-minded professionals.
During several of these chats, I've noticed an oddity. Most of the participants contribute original thoughts to the conversation, or retweet pertinent comments to their own audiences. A couple of participants, though, appear to copy and paste the comments of others verbatim, with no credit given. They aren't retweeting someone else's thoughts, but are instead claiming them as their own.
For instance, Chilean software company XaviWare parroted several legitimate security professionals and companies during Twitter chats over the past few months. A few examples:
After taking a closer look, it appears virtually everything tweeted by XaviWare is a copy of something first posted by another user. This particular account has garnered nearly 2,000 followers without a shred of original content.
Even better, occasionally these accounts will plagiarize one another while plagiarizing original content. For example, Ukrainian user PasswordRandom stole a comment from John Borst, only to have XaviWare duplicate the act:
Given the randomness of what is copied, I suspect these are "Twitter bots," or software programs automatically tweeting, rather than live humans. I am not sure of their purpose, though I can make a few guesses. One possibility is that the bots follow people and tweet random material in the hope of building up their own following. A second possibility is that these are part of a "buy more followers" scam, in which bogus accounts build up a following, then sell their follows.
At first I thought it curious, but not particularly noteworthy. It never caught my attention when I had free time to ponder it. However, another well-traveled infosec Twitter personality commented on the same thing last week, when he noticed that someone else's reply to him had been copied by the same PasswordRandom account I had noticed earlier:
It seems perhaps tweet-stealing bots are common enough to annoy a few people - and thus something that might be worth playing around with.
And so play I did.
Today I am releasing TweetThief, an open-source Python project to search for copied tweets. Given a Twitter alias (I suggest your own, but frankly the script doesn't care), TweetThief retrieves recent tweets from that account, then searches to see if anyone else has said the exact same thing without crediting the original source.
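The flow described above can be sketched roughly as follows. This is a minimal illustration, not TweetThief's actual code: the `Tweet` tuple, the `find_uncredited_copies` function, and the injected `search` callable are all hypothetical names, and the callable stands in for whatever Twitter API search call the real script makes.

```python
from typing import Callable, Dict, Iterable, List, NamedTuple


class Tweet(NamedTuple):
    """Hypothetical minimal record; a real API status object carries far more."""
    author: str  # screen name without the leading "@"
    text: str


def find_uncredited_copies(
    originals: Iterable[Tweet],
    search: Callable[[str], Iterable[Tweet]],
) -> Dict[str, List[Tweet]]:
    """Map each original tweet's text to the verbatim copies posted by others.

    `search` is an assumption: any callable that takes a query string and
    returns candidate tweets, for example a thin wrapper around an API client.
    """
    copies: Dict[str, List[Tweet]] = {}
    for original in originals:
        wanted = original.text.strip()
        hits = [
            candidate
            for candidate in search(original.text)
            # Skip the author's own tweet, which the search will also return.
            if candidate.author.lower() != original.author.lower()
            # Exact comparison also drops "RT @user: ..." style retweets,
            # because their text is longer than the original.
            and candidate.text.strip() == wanted
        ]
        if hits:
            copies[original.text] = hits
    return copies
```

Passing the search function in as a parameter keeps the comparison logic testable without network access or API tokens; a stub that returns a fixed list of tweets is enough to exercise it.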
    usage: tweetthief.py [-h] -a TWITTER_ALIAS [-n NUMTWEETS] [-l] [-p PROXY]

    Find copied but not attributed tweets. TweetThief retrieves the most recent
    tweets from a specific Twitter user, then searches Twitter for other tweets
    that are exact copies but are not retweets.

    optional arguments:
      -h, --help            show this help message and exit
      -a TWITTER_ALIAS, --alias TWITTER_ALIAS
                            Twitter alias whose tweets to analyze
      -n NUMTWEETS, --numtweets NUMTWEETS
                            Maximum number of tweets to analyze for specified
                            Twitter user; default 20
      -l, --loose-match     Loose matching. By default, TweetThief matches only
                            an exact copy of the full text of a tweet; loose
                            matching will match if the copy contains the
                            original tweet. For example, if the original tweet
                            is "Help Me" and the copy is "Help Me Rhonda"
                            TweetThief normally will not report this as a
                            match, but in loose-match mode it will.
      -p PROXY, --proxy PROXY
                            HTTPS proxy to use, if necessary, in the form of
                            https://proxy.com:port

A few caveats:
- TweetThief requires Tweepy, a Python module that provides hooks into the Twitter API. Tweepy may be found on GitHub.
- TweetThief also requires an access token for the Twitter API. Refer to documentation at https://dev.twitter.com/oauth/overview/application-owner-access-tokens, and set up your own app-specific tokens at https://apps.twitter.com
- The Twitter API limits the number of queries you can do in a 15-minute period; since TweetThief retrieves tweets and then performs a search for each individual tweet, practically speaking it is limited to about the 200 most recent tweets.
- The default setting for TweetThief is to look only for an exact copy of a tweet. For instance, if you tweeted "Help Me" and someone else tweeted "Help Me Rhonda", TweetThief does not consider that to be a match. However, the "--loose-match" option (or "-l") loosens the search strictness and will match tweets that contain your tweet.
- The Twitter API search module serves up only tweets from the last week or so, and is "tuned" toward most relevant results rather than most complete results. I have found some limited cases where the search API does not return results for a search, even though I created a duplicate tweet to search for.
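To make the difference between exact and loose matching concrete, here is a small sketch of how the two modes might be compared. It is not taken from TweetThief itself. The function names are hypothetical, and two behaviors are my own assumptions: whitespace is collapsed before comparing, and any mention of the original author counts as credit.

```python
def normalize(text: str) -> str:
    """Collapse runs of whitespace (an assumption, not confirmed behavior)."""
    return " ".join(text.split())


def is_copy(original: str, candidate: str, loose: bool = False) -> bool:
    """Exact mode requires identical text; loose mode accepts containment."""
    original, candidate = normalize(original), normalize(candidate)
    if not original:
        # An empty original is contained in everything, so never report it.
        return False
    if loose:
        return original in candidate
    return original == candidate


def is_uncredited_copy(
    original: str, candidate: str, author: str, loose: bool = False
) -> bool:
    """Treat any mention of the author as credit (hypothetical rule).

    This matters mostly in loose mode, where "RT @author: <text>" contains
    the original text and would otherwise be reported as a copy.
    """
    if f"@{author}".lower() in candidate.lower():
        return False
    return is_copy(original, candidate, loose)
```

With these definitions, "Help Me Rhonda" is rejected as a copy of "Help Me" in the default mode and accepted in loose mode, while a retweet that names the original author is rejected in both.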
So have at it. Download TweetThief from GitHub and explore your own tweets. Do what you will with any parrots you find: ignore them, block them, report them as abuse / spambots.
Is there a pattern to the copied tweets? Are certain individuals copied more often? Are the offenders related to "buy more followers" scams? If you find anything interesting, I'd love to hear about it, either in the comments below, or by tweeting to me (@dnlongen).
Updated September 29: I'm adding a few reference links below that could come in handy for anyone looking at this and wanting to do more with Python and Twitter:
- Tweepy, the Twitter API module for Python
- Tweepy documentation
- Tweepy User Object structure (via "tkang's blog")
- Tweepy Status Object structure (via "tkang's blog")
- Twitter API native documentation
- Twitter API authentication token documentation