Try to Predict S&P500
Posted on Sun 23 September 2018 in Machine Learning
Predict the S&P500 Index¶
The goal is to predict the S&P500 index from indicators computed on its historical prices.
I will train a model on data from 1950-2012 and make predictions for 2013-2015.
In [44]:
import pandas as pd
import numpy as np
from datetime import datetime
In [45]:
df = pd.read_csv('YAHOO-INDEX_GSPC.csv')
Reading Data¶
In [46]:
df["Date"] = pd.to_datetime(df["Date"])
# DataFrame.sort() was removed from pandas; sort_values() is its replacement.
df.sort_values(by="Date", ascending=True, inplace=True)
df.head()
Out[46]:
Indicators¶
I want to teach the model to predict the current price from historical prices.
The indicators must not include the current price; otherwise the model would rely on information that is not available when predicting a future value of the index.
In [78]:
# pd.rolling_mean() was removed from pandas; Series.rolling(...).mean() is its replacement.
df['mean_5day'] = df['Close'].rolling(window=5).mean().shift(1)
df['mean_30day'] = df['Close'].rolling(window=30).mean().shift(1)
df['mean_365day'] = df['Close'].rolling(window=365).mean().shift(1)
# rolling() includes the current row's price, so shift(1) moves each mean down to the next row.
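To see why the `shift(1)` matters, here is a minimal sketch on a made-up five-row series (the prices and the window of 2 are assumptions chosen to keep the output readable, not values from the dataset). Without the shift, the mean on a given row already contains that row's own price.

```python
import pandas as pd

# Made-up closing prices, for illustration only.
close = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], name="Close")

# rolling(window=2).mean() on row i averages rows i-1 and i,
# so it includes the price we are trying to predict.
mean_with_today = close.rolling(window=2).mean()

# shift(1) moves every value one row down: row i now holds the
# mean of rows i-2 and i-1, i.e. past prices only.
mean_past_only = mean_with_today.shift(1)

demo = pd.DataFrame({
    "Close": close,
    "with_today": mean_with_today,
    "past_only": mean_past_only,
})
print(demo)
```

On row 2 (price 30.0), `with_today` is 25.0 because it averages 20.0 and 30.0, while `past_only` is 15.0 because it averages only the two earlier prices.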
Filter Data¶
In [82]:
# Row 365 is the first row that has a 365-day mean.
print(df.iloc[365])
In [83]:
filtered_df = df[df["Date"] >= datetime(year=1951, month=6, day=19)]
filtered_df.head()
Out[83]:
1951-06-19 is the first date for which the 365-day mean is available.
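Rather than reading the cutoff date off a printed row, the first usable row can also be located programmatically. The sketch below uses a hypothetical stand-in frame (`toy`, with made-up prices on 400 business days) instead of the real CSV, and relies on pandas' `first_valid_index()`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 400 business days of made-up prices.
dates = pd.date_range("1950-01-03", periods=400, freq="B")
toy = pd.DataFrame({"Date": dates, "Close": np.arange(400, dtype=float)})

# Same indicator as above: 365-row rolling mean, shifted down by one row.
toy["mean_365day"] = toy["Close"].rolling(window=365).mean().shift(1)

# first_valid_index() returns the index label of the first non-NaN value.
first_label = toy["mean_365day"].first_valid_index()
first_date = toy.loc[first_label, "Date"]
print(first_label, first_date)
```

The rolling mean needs 365 rows (positions 0 to 364) and the shift pushes the result one row further, which is why position 365 is the first row with a value.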
Split Data for Training and Test¶
In [111]:
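df_drop_na = filtered_df.dropna(axis = 0)
# Build the boolean masks from df_drop_na itself so their index matches the frame being filtered.
train = df_drop_na[df_drop_na["Date"] < datetime(year=2013, month=1, day=1)]
test = df_drop_na[df_drop_na["Date"] >= datetime(year=2013, month=1, day=1)]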
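The split is chronological: everything before 2013 trains the model and everything from 2013 onward tests it, so the model is never fit on prices from the period it is evaluated on. A minimal sketch of that idea follows; `split_by_date` and the `toy` frame are hypothetical names with made-up prices, not part of the notebook.

```python
import pandas as pd

def split_by_date(frame, cutoff, date_col="Date"):
    """Return (train, test): rows strictly before cutoff, then the remaining rows."""
    before = frame[date_col] < cutoff  # mask built from the frame being filtered
    return frame[before], frame[~before]

# Made-up rows around the cutoff, for illustration only.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2012-12-28", "2012-12-31", "2013-01-02", "2013-01-03"]),
    "Close": [1.0, 2.0, 3.0, 4.0],
})

train_toy, test_toy = split_by_date(toy, pd.Timestamp("2013-01-01"))
print(len(train_toy), len(test_toy))
```

Building the mask from the same frame that is being indexed keeps the mask and the rows aligned, whatever rows were dropped earlier.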
Prediction¶
In [112]:
from sklearn.linear_model import LinearRegression
import math
model = LinearRegression()
x = ['mean_5day', 'mean_30day', 'mean_365day']
y = ['Close']
model.fit(train[x], train[y])
predictions = model.predict(test[x])
# Mean absolute error: the average absolute distance between the predictions and the actual index.
mae = np.mean(np.abs(predictions - test[y].values))
print(mae)
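The mean absolute error is the average of |prediction - actual| over the test rows. Because the model was fit with `y` given as a list of columns, `predict` returns a column array of shape `(n, 1)`, and the shapes need to match when subtracting. A small sketch with made-up numbers (not results from the model):

```python
import numpy as np

# Made-up values, shaped like the (n, 1) arrays in the notebook.
actual = np.array([[100.0], [102.0], [105.0]])
predicted = np.array([[101.0], [100.0], [105.0]])

# Flatten both sides so the subtraction is element by element.
mae = np.mean(np.abs(predicted.ravel() - actual.ravel()))
print(mae)

# Pitfall: subtracting a 1-D array from an (n, 1) array broadcasts to (n, n),
# which would average n*n differences instead of n.
wrong = predicted - actual.ravel()
print(wrong.shape)
```

Here the absolute errors are 1, 2 and 0, so the MAE is 1.0, while the mismatched subtraction silently produces a 3 by 3 matrix.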
In [101]:
print(model.score(train[x], train[y]))
Adding a Standard Deviation Indicator to Improve the Prediction¶
In [113]:
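# pd.rolling_std() was removed from pandas; assign() adds the column without modifying a slice in place.
df_drop_na = df_drop_na.assign(std_5day=df['Close'].rolling(window=5).std().shift(1))
train = df_drop_na[df_drop_na["Date"] < datetime(year=2013, month=1, day=1)]
test = df_drop_na[df_drop_na["Date"] >= datetime(year=2013, month=1, day=1)]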
x = ['mean_5day', 'mean_30day', 'mean_365day', 'std_5day']
y = ['Close']
model.fit(train[x], train[y])
predictions = model.predict(test[x])
# Mean absolute error: the average absolute distance between the predictions and the actual index.
mae = np.mean(np.abs(predictions - test[y].values))
print(mae)