Stock tweets

To begin backtesting, the tweets for each stock must be stored locally, grouped by day, for a given range of dates. Although around 5000 stocks are monitored and scraped from Stocktwits, many of them don't receive a significant number of tweets on a daily basis. Therefore, for each trading day, only the 100 most-tweeted-about stocks are saved for backtesting. After the daily top stocks are determined, their tweets are queried by date. The example query below finds all AAPL tweets labeled bullish or bearish that were posted on March 1st, 2019 (from midnight on March 1st up to, but not including, midnight on March 2nd).

{
  '$and': [
    {'symbol': 'AAPL'},
    {'$or': [
      {'isBull': True},
      {'isBull': False}
    ]},
    {'time': {
      '$gte': datetime.datetime(2019, 3, 1),
      '$lt': datetime.datetime(2019, 3, 2)
    }}
  ]
}
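
As a rough sketch, a filter like this could be built for any symbol and trading day with a small helper. The function name build_tweet_query is hypothetical and not part of the project; the field names (symbol, isBull, time) are taken from the example query, and the three conditions are combined with MongoDB's $and operator.

```python
import datetime


def build_tweet_query(symbol, day):
    """Build a MongoDB filter for labeled tweets about `symbol` posted on `day`.

    Hypothetical helper for illustration; `day` is a datetime.date.
    """
    # Half-open interval [start, end): midnight of `day` up to the next midnight.
    start = datetime.datetime(day.year, day.month, day.day)
    end = start + datetime.timedelta(days=1)
    return {
        '$and': [
            {'symbol': symbol},
            # Keep only tweets explicitly labeled bullish or bearish.
            {'$or': [{'isBull': True}, {'isBull': False}]},
            {'time': {'$gte': start, '$lt': end}},
        ]
    }


query = build_tweet_query('AAPL', datetime.date(2019, 3, 1))
```

The resulting dictionary can be passed as the filter argument to a pymongo collection's find() method. The collection that holds the tweets is not named here, so that part is left out.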

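Once fetched, each stock's tweets are grouped by date in a dictionary, so that all tweets for a given day can be looked up directly by its date string. Each tweet is stored as a list holding its sentiment flag (True for bullish, False for bearish), the user, and the timestamp, as in the following example.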

{
    '2019-03-01': [
        [True, 'user1', '2019-03-01 13:40:00'],
        [True, 'user2', '2019-03-01 14:42:00'],
        [False, 'user3', '2019-03-01 14:53:00'],
        ...
    ],
    ...
}
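
A minimal sketch of how fetched tweets might be arranged into this structure is shown below. It assumes each fetched document is a dictionary with isBull and time fields (as in the query) plus a user field; the user field name, the chronological sort, and the function name group_tweets_by_day are assumptions made for illustration.

```python
import datetime
from collections import defaultdict


def group_tweets_by_day(tweets):
    """Group tweet documents into {'YYYY-MM-DD': [[isBull, user, timestamp], ...]}.

    Hypothetical helper; assumes each tweet has 'isBull', 'user' and 'time' keys,
    where 'time' is a datetime.datetime.
    """
    grouped = defaultdict(list)
    # Sort by time so that each day's list is in chronological order.
    for tweet in sorted(tweets, key=lambda t: t['time']):
        day = tweet['time'].strftime('%Y-%m-%d')
        stamp = tweet['time'].strftime('%Y-%m-%d %H:%M:%S')
        grouped[day].append([tweet['isBull'], tweet['user'], stamp])
    return dict(grouped)


sample = [
    {'isBull': True, 'user': 'user2', 'time': datetime.datetime(2019, 3, 1, 14, 42)},
    {'isBull': True, 'user': 'user1', 'time': datetime.datetime(2019, 3, 1, 13, 40)},
    {'isBull': False, 'user': 'user3', 'time': datetime.datetime(2019, 3, 1, 14, 53)},
]
tweets_by_day = group_tweets_by_day(sample)
```

With the data keyed by date, a backtest can fetch one day's tweets with a single dictionary lookup, such as tweets_by_day['2019-03-01'], rather than scanning every tweet for the stock.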

The data for each stock is then stored in the stock_files folder as a pickle file, one file per stock (e.g. stock_files/AAPL.pkl).
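
Saving and reloading one of these files could look roughly like the sketch below, which uses Python's standard pickle module. The helper names save_stock_tweets and load_stock_tweets are hypothetical, and the demo writes to a temporary directory rather than the real stock_files folder.

```python
import os
import pickle
import tempfile


def save_stock_tweets(symbol, tweets_by_day, folder='stock_files'):
    """Pickle one stock's tweets to <folder>/<symbol>.pkl and return the path."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, symbol + '.pkl')
    with open(path, 'wb') as f:
        pickle.dump(tweets_by_day, f)
    return path


def load_stock_tweets(symbol, folder='stock_files'):
    """Load the tweets previously saved for `symbol`."""
    path = os.path.join(folder, symbol + '.pkl')
    with open(path, 'rb') as f:
        return pickle.load(f)


data = {'2019-03-01': [[True, 'user1', '2019-03-01 13:40:00']]}
# Round trip through a temporary folder so the demo leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_stock_tweets('AAPL', data, folder=tmp)
    restored = load_stock_tweets('AAPL', folder=tmp)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.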
