for date in stock_features:
    for user in stock_features[date]:
        user_weight = weight(user)
After transformation, all features are scaled so that feature values are comparable across all stocks. Each feature is standardized and then normalized to a scalar between 0 and 1. Each normalized feature is then weighted based on the bull or bear sentiment of the corresponding tweet. If w_a and w_n are the weights for the associated and non-associated sentiment feature values f_a and f_n, the final feature value is w_a * f_a + w_n * f_n.
Associated sentiment is defined as the feature value (bull or bear) that matches the predicted sentiment, and non-associated is the opposite feature value. For example, if a user tweeted bullish about a stock and their return features were return_bull = 10 and return_bear = 20, then using the weights w_a = 0.8 and w_n = 0.2, the resulting feature for user return would be 0.8*10 + 0.2*20 = 12. If the user had tweeted bearish, the result would be 0.8*20 + 0.2*10 = 18.
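The blending rule from the example above can be sketched as a small helper. The function name `blend_feature` and its argument names are illustrative assumptions, not taken from the project code.

```python
def blend_feature(bull_value, bear_value, is_bullish, w_a=0.8, w_n=0.2):
    """Blend bull/bear feature values by associated vs. non-associated weight."""
    # The associated value is the one matching the tweet's predicted sentiment.
    if is_bullish:
        associated, non_associated = bull_value, bear_value
    else:
        associated, non_associated = bear_value, bull_value
    return w_a * associated + w_n * non_associated


# Reproduces the two worked cases: bullish tweet, then bearish tweet.
bullish_result = blend_feature(10, 20, is_bullish=True)
bearish_result = blend_feature(10, 20, is_bullish=False)
```

With the values from the text, the bullish case evaluates to 12 and the bearish case to 18.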
After feature transformation, the user weighting function returns a normalized weight based on a given user prediction. This weight function is a linear combination of the user's features with their associated weight parameters.
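A minimal sketch of such a linear combination is shown below. The feature names (`accuracy`, `return`), the parameter values, and the choice to normalize by the sum of the parameters are all assumptions made for illustration; the text does not specify them.

```python
def user_weight(features, params):
    """Linear combination of feature values (each in [0, 1]) and their weight parameters."""
    total = sum(params[name] * features[name] for name in params)
    # Assumed normalization: dividing by the parameter sum keeps the
    # result in [0, 1] when all parameters are non-negative.
    norm = sum(params.values())
    return total / norm if norm else 0.0


# Hypothetical feature values and weight parameters.
example_features = {"accuracy": 0.6, "return": 0.4}
example_params = {"accuracy": 2.0, "return": 1.0}
example_weight = user_weight(example_features, example_params)
```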
After the user weight is computed, a corresponding time weight is calculated based on the time the tweet (prediction) was posted. This time weight is calculated by taking the time difference between 4pm on the current trading day and the time of posting. This difference is then passed through a sigmoid function with the goal of giving more weight to tweets closer to 4pm.
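import math

def sigmoid(time_posted, curr_trading_day):
    # Both arguments are assumed to be in hours; diff is negative before the 4pm close
    diff = time_posted - curr_trading_day
    x = diff + 5.2  # hours back where w = 0.5
    return 1 / (1 + math.exp(-x))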
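For readers working with timestamps rather than hour offsets, the time weight described above might look like the following. This is only a sketch: the name `time_weight`, the `midpoint_hours` parameter, and the assumption that the tweet is posted on the same calendar day as the 4pm close (with time zones ignored) are not from the original code.

```python
import math
from datetime import datetime


def time_weight(time_posted, midpoint_hours=5.2):
    """Sigmoid weight that approaches 1 as a tweet nears the 4pm close."""
    # Assumed: the relevant close is 4pm on the same calendar day as the tweet.
    close = time_posted.replace(hour=16, minute=0, second=0, microsecond=0)
    # Hours relative to the close; negative for tweets posted before 4pm.
    diff_hours = (time_posted - close).total_seconds() / 3600
    return 1 / (1 + math.exp(-(diff_hours + midpoint_hours)))


morning = time_weight(datetime(2020, 1, 2, 9, 30))
afternoon = time_weight(datetime(2020, 1, 2, 15, 30))
```

A tweet posted 5.2 hours before the close (10:48am) receives a weight of about 0.5, and the weight rises toward 1 as the posting time approaches 4pm.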
The final tweet weight for each user is then defined as:
Each tweet's final weight is first added to that day's raw bull or bear sum for the stock. The total raw weight per day is then computed by subtracting the raw bear sum from the raw bull sum, using pre-defined bull and bear weights.
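The daily aggregation can be sketched as follows. The function name, the `(is_bullish, tweet_weight)` input shape, and the reading of the bull and bear weights as multipliers on each raw sum are assumptions; the text does not spell out how those weights are applied.

```python
def raw_stock_weight(tweets, bull_weight=1.0, bear_weight=1.0):
    """tweets: list of (is_bullish, tweet_weight) pairs for one stock on one day."""
    bull_sum = sum(w for is_bullish, w in tweets if is_bullish)
    bear_sum = sum(w for is_bullish, w in tweets if not is_bullish)
    # Assumed interpretation: each raw sum is scaled by its pre-defined weight.
    return bull_weight * bull_sum - bear_weight * bear_sum


# Hypothetical day with two bullish tweets and one bearish tweet.
day_tweets = [(True, 0.5), (True, 0.25), (False, 0.5)]
day_weight = raw_stock_weight(day_tweets)
```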
On some days, a stock may not receive many tweets, so its total weight can be skewed by only a few users. A cutoff c is therefore added, and any stock with fewer than c tweets is not considered. Below is a time series example of the raw computed stock weights for ticker BYND on each day of a pre-defined date range.
Raw stock features ($BYND)
The computed stock weights for each day are stored along with their historical average and standard deviation. Each stock's weight is standardized using its historical weights from the n days before the current trading day, so that it is expressed as a deviation from that stock's n-day mean.
Standardized stock features
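One way to express the standardization step is a z-score against the previous n days. The sketch below assumes a population standard deviation and returns 0 when the history has no variation; both choices, along with the function name, are assumptions rather than details given in the text.

```python
from statistics import mean, pstdev


def standardize(history, current):
    """Deviation of the current weight from the mean of the prior days, in standard deviations."""
    mu = mean(history)
    sigma = pstdev(history)  # population standard deviation (assumed)
    if sigma == 0:
        return 0.0  # no variation in the history, so no meaningful deviation
    return (current - mu) / sigma


# Hypothetical raw weights for the n = 3 days before the current trading day.
previous_weights = [1.0, 2.0, 3.0]
deviation = standardize(previous_weights, 4.0)
```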
Using each stock's daily weight, backtesting can be done alongside historical stock price data. First, on each trading day, a list of standardized stock weights is compiled. An example object is shown below:
A final cutoff c is applied each trading day to all stocks, so that only stocks satisfying the inequality abs(deviation) > c are considered. Below are the same standardized weights for $BYND, with the cutoff visualized.
Standardized stock weights w/ cutoffs
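The deviation cutoff amounts to a simple filter over the day's standardized weights. In this sketch the function name, the dictionary layout, and the tickers other than BYND are hypothetical.

```python
def select_stocks(deviations, c):
    """Keep only tickers whose standardized weight exceeds the cutoff in magnitude."""
    return {ticker: d for ticker, d in deviations.items() if abs(d) > c}


# Hypothetical standardized weights for one trading day.
todays_deviations = {"BYND": 2.1, "AAA": -0.4, "BBB": -1.8}
chosen = select_stocks(todays_deviations, c=1.5)
```

Both strongly positive and strongly negative deviations pass the filter, since the inequality uses the absolute value.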
Then, accuracy and theoretical returns can be calculated by comparing the direction of each stock's deviation (positive or negative) with the stock's next-day price movement from close to open.
Daily price % change ($BYND)
The red dots indicate the days the stock was chosen and the corresponding percent price change.
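The accuracy and return calculation can be sketched as below. Several details are assumptions: the price movement is taken as the change from the current day's close to the next day's open, a positive deviation is treated as a long position and a negative one as a short, and returns are summed rather than compounded. The function name and input shape are hypothetical as well.

```python
def backtest(picks):
    """picks: list of (deviation, close_price, next_open_price), one per chosen stock-day."""
    correct = 0
    total_return = 0.0
    for deviation, close_price, next_open in picks:
        change = (next_open - close_price) / close_price
        direction = 1 if deviation > 0 else -1  # long on positive, short on negative
        if direction * change > 0:
            correct += 1
        total_return += direction * change
    accuracy = correct / len(picks) if picks else 0.0
    return accuracy, total_return


# Hypothetical picks: one correct bullish call and one incorrect bearish call.
sample_picks = [(2.0, 100.0, 102.0), (-1.8, 50.0, 51.0)]
accuracy, total_return = backtest(sample_picks)
```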