Generate User Features

Before generating user features, first save the user tweets locally.
Across the 65,000 stored users, each user's sentiment on a stock should not be weighted equally. Instead, each user must be given a weight based on their past predictions. Below is a list of raw features extracted from historical user tweets. Each feature is based only on tweets that the user labeled bullish or bearish; all other tweets are ignored.

1. Number of Overall Predictions

The number of unique predictions a user has made across all labeled tweets. A prediction is counted once per stock per day. For example, if a user tweets about AAPL 3 times in one day, this counts as only one prediction.
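
A minimal sketch of the deduplication rule, assuming each labeled tweet has already been reduced to a stock symbol, a trading day, and a sentiment. The `LabeledTweet` record and `count_predictions` function are hypothetical names for illustration, not part of the project's code.

```python
from datetime import date
from typing import NamedTuple


class LabeledTweet(NamedTuple):
    """Hypothetical record for one tweet the user labeled bullish or bearish."""
    symbol: str     # stock ticker, e.g. "AAPL"
    day: date       # trading day the tweet counts toward
    sentiment: str  # "bullish" or "bearish"


def count_predictions(tweets: list[LabeledTweet]) -> int:
    """Count unique (stock, day) pairs: repeated tweets about the same
    stock on the same day collapse into a single prediction."""
    return len({(t.symbol, t.day) for t in tweets})


tweets = [
    LabeledTweet("AAPL", date(2019, 5, 20), "bullish"),
    LabeledTweet("AAPL", date(2019, 5, 20), "bullish"),
    LabeledTweet("AAPL", date(2019, 5, 20), "bullish"),
    LabeledTweet("TSLA", date(2019, 5, 20), "bearish"),
]
print(count_predictions(tweets))  # 2: three AAPL tweets count once, plus TSLA
```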

2. Number of Correct Predictions

The number of unique predictions that the user got right. Correctness is based on the stock's price change from 4 PM to 9:30 AM on the next trading day: if the user's sentiment matches the direction of that price change (bullish and the price rose, or bearish and the price fell), the prediction is counted as correct.
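
A sketch of the correctness check, assuming the 4 PM price and the next trading day's 9:30 AM price are already available for each prediction. The function names and the `(sentiment, close_4pm, open_930am)` tuple layout are hypothetical, and treating an unchanged price as an incorrect prediction is an assumption the text does not settle.

```python
def is_correct(sentiment: str, close_4pm: float, open_930am: float) -> bool:
    """True if the sentiment matches the direction of the move from the
    4 PM price to the 9:30 AM price on the next trading day.
    Assumption: a flat price counts as incorrect for both sentiments."""
    change = open_930am - close_4pm
    if sentiment == "bullish":
        return change > 0
    if sentiment == "bearish":
        return change < 0
    raise ValueError(f"unexpected sentiment: {sentiment!r}")


def count_correct(predictions: list[tuple[str, float, float]]) -> int:
    """Number of (sentiment, close_4pm, open_930am) predictions that were right."""
    return sum(is_correct(s, c, o) for s, c, o in predictions)


preds = [
    ("bullish", 100.0, 101.5),  # price rose overnight: correct
    ("bearish", 50.0, 49.0),    # price fell overnight: correct
    ("bullish", 20.0, 19.8),    # price fell overnight: incorrect
]
print(count_correct(preds))  # 2
```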

3. Percent Return

The theoretical total return a user would have earned based on their historical tweet sentiments and the stocks' percent price changes. There are also variations of percent return still to be explored:
  • weighted return - returns are weighted by the time of the tweet relative to 4 PM. The weight is highest at 4 PM and diminishes the further the tweet time is from the end of the trading day.
  • logged return - returns are weighted by the logarithm of the number of tweets about the stock in a day.
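
The sketch below shows one way the plain, weighted, and logged returns could be computed. Several details are assumptions rather than the documented method: bearish calls are treated as short positions (the sign of the % change is flipped), returns are summed without compounding, the time weight decays exponentially with a hypothetical `half_life_hours` parameter, and the logged weight uses `log(1 + n)` so a single tweet still carries weight. `Prediction` and its fields are illustrative names.

```python
import math
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical (user, stock, day) prediction; field names are illustrative."""
    sentiment: str             # "bullish" or "bearish"
    close_4pm: float           # price at 4 PM on the prediction day
    open_930am: float          # price at 9:30 AM the next trading day
    hours_before_close: float  # time between the tweet and 4 PM (>= 0)
    tweets_that_day: int       # number of tweets about this stock that day


def signed_pct_change(p: Prediction) -> float:
    """Overnight % change, sign-flipped for bearish calls (assumed short)."""
    pct = (p.open_930am - p.close_4pm) / p.close_4pm * 100.0
    return pct if p.sentiment == "bullish" else -pct


def percent_return(preds: list[Prediction]) -> float:
    """Plain total return: simple sum of signed % changes (no compounding)."""
    return sum(signed_pct_change(p) for p in preds)


def weighted_return(preds: list[Prediction], half_life_hours: float = 2.0) -> float:
    """Weight 1.0 for a tweet at 4 PM, halving every `half_life_hours` earlier.
    Exponential decay is an assumed shape, not taken from the source."""
    return sum(0.5 ** (p.hours_before_close / half_life_hours) * signed_pct_change(p)
               for p in preds)


def logged_return(preds: list[Prediction]) -> float:
    """Weight each return by log(1 + tweets about the stock that day)."""
    return sum(math.log1p(p.tweets_that_day) * signed_pct_change(p) for p in preds)


preds = [
    Prediction("bullish", 100.0, 102.0, hours_before_close=0.0, tweets_that_day=1),
    Prediction("bearish", 50.0, 51.0, hours_before_close=2.0, tweets_that_day=3),
]
print(percent_return(preds), weighted_return(preds), logged_return(preds))
```

Exponential decay is only one shape that satisfies "highest at 4 PM, diminishing with distance"; a linear or step decay would fit the description equally well.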

For backtesting, a user's features must be based only on historical data from before the date of the prediction. For example, if a user tweeted bullish on May 20th, the user's features should be based on predictions made before May 20th. This ensures that future tweets are never used for user weighting in backtesting. For this reason, a new set of user features must be stored whenever any feature is updated.
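
A minimal sketch of a point-in-time lookup, assuming a user's history is stored as hypothetical `(day, was_correct)` pairs sorted by day. Only predictions strictly before the `as_of` date are counted, which is what keeps later tweets out of a backtest; `features_as_of` is an illustrative name.

```python
from bisect import bisect_left
from datetime import date


def features_as_of(history: list[tuple[date, bool]], as_of: date) -> dict[str, int]:
    """Compute count features from predictions strictly before `as_of`.
    `history` must be sorted by day."""
    days = [d for d, _ in history]
    cutoff = bisect_left(days, as_of)  # index of the first prediction on/after as_of
    past = history[:cutoff]
    return {
        "num_predictions": len(past),
        "num_correct": sum(1 for _, ok in past if ok),
    }


history = [
    (date(2019, 5, 16), True),
    (date(2019, 5, 17), False),
    (date(2019, 5, 20), True),  # same day as the prediction: excluded
]
print(features_as_of(history, date(2019, 5, 20)))
```

Recomputing from the raw history on every lookup is simple but slow across 65,000 users; storing a snapshot each time a feature changes trades storage for faster lookups.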

Notes about these features

  • Each feature is split into a bullish and a bearish version, computed from the user's past bullish and bearish predictions respectively.
  • Each feature also has a stock-specific counterpart, computed only from the predictions made about that particular stock.
  • Each feature is based on user tweets up to 4 PM on the current trading day and predicts the price movement at 9:30 AM on the next trading day.
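
A sketch of the bull/bear and stock-specific split, assuming each prediction has already been reduced to a hypothetical `(symbol, sentiment, was_correct)` tuple. Each prediction updates one overall bucket keyed by sentiment and one stock-specific bucket keyed by `(symbol, sentiment)`; only the two count features are shown, and `split_features` is an illustrative name.

```python
from collections import defaultdict
from collections.abc import Iterable


def split_features(
    predictions: Iterable[tuple[str, str, bool]],
) -> tuple[dict, dict]:
    """Return (overall, per_stock) count features.
    overall is keyed by sentiment; per_stock by (symbol, sentiment)."""
    def new_bucket() -> dict[str, int]:
        return {"num_predictions": 0, "num_correct": 0}

    overall = defaultdict(new_bucket)
    per_stock = defaultdict(new_bucket)
    for symbol, sentiment, correct in predictions:
        # Every prediction counts toward both its overall and its stock-specific bucket.
        for bucket in (overall[sentiment], per_stock[(symbol, sentiment)]):
            bucket["num_predictions"] += 1
            bucket["num_correct"] += int(correct)
    return dict(overall), dict(per_stock)


preds = [
    ("AAPL", "bullish", True),
    ("AAPL", "bullish", False),
    ("TSLA", "bearish", True),
]
overall, per_stock = split_features(preds)
print(overall["bullish"])            # {'num_predictions': 2, 'num_correct': 1}
print(per_stock[("TSLA", "bearish")])  # {'num_predictions': 1, 'num_correct': 1}
```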