Predict the S&P500 Index

The goal is to predict the S&P500 index from indicators computed on its past prices.

I will train a model on data from 1950 to 2012 and make predictions for 2013 to 2015.

In [44]:

import pandas as pd
import numpy as np
from datetime import datetime

In [45]:

df = pd.read_csv('YAHOO-INDEX_GSPC.csv')

Reading Data

In [46]:

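df["Date"] = pd.to_datetime(df["Date"])
# sort_values replaces the deprecated DataFrame.sort; the Yahoo export is newest-first, so sort oldest-first
df.sort_values(by="Date", ascending=True, inplace=True)
df.head()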


Out[46]:

	Date	Open	High	Low	Close	Volume	Adjusted Close
16847	1950-01-03	16.66	16.66	16.66	16.66	1260000.0	16.66
16846	1950-01-04	16.85	16.85	16.85	16.85	1890000.0	16.85
16845	1950-01-05	16.93	16.93	16.93	16.93	2550000.0	16.93
16844	1950-01-06	16.98	16.98	16.98	16.98	2010000.0	16.98
16843	1950-01-09	17.08	17.08	17.08	17.08	2520000.0	17.08

Indicators

I want to teach the model to predict the current price from historical prices.

The indicators must not include the current price; otherwise the model would be given the very value it is supposed to predict, and it could not be used to forecast a future index value.
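To see why `shift(1)` is needed, here is a minimal sketch on a made-up six-value series (not the S&P500 data): the plain rolling mean at a row includes that row's own price, while the shifted version only averages earlier rows.

```python
import pandas as pd

# Made-up closing prices, for illustration only
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])

# rolling(3).mean() at row i averages rows i-2, i-1 and i, so it includes today's price
same_day = close.rolling(window=3).mean()

# shift(1) moves each mean down one row, so row i only averages rows i-3 .. i-1
lagged = close.rolling(window=3).mean().shift(1)

print(pd.DataFrame({"close": close, "same_day": same_day, "lagged": lagged}))
```

Row 3 of `lagged` is the mean of 10, 11 and 12, so the price 13 on that row never enters its own feature.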

In [78]:

# Series.rolling replaces the removed pd.rolling_mean; windows count rows (trading days), not calendar days
df['mean_5day'] = df['Close'].rolling(window=5).mean().shift(1)
df['mean_30day'] = df['Close'].rolling(window=30).mean().shift(1)
df['mean_365day'] = df['Close'].rolling(window=365).mean().shift(1)
# shift(1) is needed because a rolling window includes the current row, so each mean is moved to the next row.


Filter Data

In [82]:

# Position 365 is the first row with 365 earlier closes available for the shifted mean
print(df.iloc[365])

Date              1951-06-19 00:00:00
Open                            22.02
High                            22.02
Low                             22.02
Close                           22.02
Volume                        1.1e+06
Adjusted Close                  22.02
Name: 16482, dtype: object

In [83]:

filtered_df = df[df["Date"] >= datetime(year=1951, month=6, day=19)]
filtered_df.head()

Out[83]:

	Date	Open	High	Low	Close	Volume	Adjusted Close	mean_5day	mean_30day	mean_365day
16482	1951-06-19	22.020000	22.020000	22.020000	22.020000	1100000.0	22.020000	21.800	21.703333	19.447726
16481	1951-06-20	21.910000	21.910000	21.910000	21.910000	1120000.0	21.910000	21.900	21.683000	19.462411
16480	1951-06-21	21.780001	21.780001	21.780001	21.780001	1100000.0	21.780001	21.972	21.659667	19.476274
16479	1951-06-22	21.549999	21.549999	21.549999	21.549999	1340000.0	21.549999	21.960	21.631000	19.489562
16478	1951-06-25	21.290001	21.290001	21.290001	21.290001	2440000.0	21.290001	21.862	21.599000	19.502082

1951-06-19 is the first date on which the 365-day mean is available, so earlier rows are dropped.
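Rather than hard-coding 1951-06-19, the first date with a defined 365-day mean can be looked up with `first_valid_index()`. The sketch below uses a synthetic business-day series (an assumption standing in for the real data) to show that a 365-row window followed by `shift(1)` first becomes defined at position 365. Because `first_valid_index()` returns an index label, `.loc` also works with the notebook's non-sequential Yahoo index.

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for the sorted S&P500 data (one row per trading day)
dates = pd.bdate_range("1950-01-03", periods=400)
toy = pd.DataFrame({"Date": dates, "Close": np.arange(400, dtype=float)})

window = 365
toy["mean_365day"] = toy["Close"].rolling(window=window).mean().shift(1)

# rolling(365) is first defined at position 364; shift(1) pushes that to position 365
first_idx = toy["mean_365day"].first_valid_index()
first_date = toy.loc[first_idx, "Date"]
print(first_idx, first_date.date())
```

On the real data, `df[df["Date"] >= first_date]` would then give the same filter as the hard-coded date.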

Split Data for Training and Test

In [111]:

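df_drop_na = filtered_df.dropna(axis = 0)

# Build the masks from df_drop_na itself so they share its index (no reindexing warning)
train = df_drop_na[df_drop_na["Date"] < datetime(year=2013, month=1, day=1)]
test = df_drop_na[df_drop_na["Date"] >= datetime(year=2013, month=1, day=1)]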

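The split is chronological: everything before 2013-01-01 is training data and everything from that date on is test data, so the model never sees a future row while fitting. A minimal sketch of the same idea as a reusable function; `chronological_split` is a hypothetical helper name and the toy prices are placeholders, not real index values.

```python
import pandas as pd


def chronological_split(frame, date_col, cutoff):
    """Hypothetical helper: rows strictly before `cutoff` go to train, the rest to test.

    The boolean mask is built from `frame` itself, so it shares frame's index
    and pandas does not need to reindex it.
    """
    cutoff = pd.Timestamp(cutoff)
    is_train = frame[date_col] < cutoff
    return frame[is_train], frame[~is_train]


# Toy frame with a descending index, like the notebook's newest-first Yahoo index
toy = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2012-12-28", "2012-12-31", "2013-01-02", "2013-01-03"]),
        "Close": [1.0, 2.0, 3.0, 4.0],  # placeholder values
    },
    index=[4, 3, 2, 1],
)
train_toy, test_toy = chronological_split(toy, "Date", "2013-01-01")
print(len(train_toy), len(test_toy))
```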

Prediction

In [112]:

from sklearn.linear_model import LinearRegression

model = LinearRegression()
x = ['mean_5day', 'mean_30day', 'mean_365day']
y = ['Close']

model.fit(train[x], train[y])
predictions = model.predict(test[x])

# Mean absolute error: the average distance, in index points, between each prediction and the actual close
mae = abs(predictions - test[y]).mean()
print(mae)

Close    16.621037
dtype: float64
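The mean absolute error is MAE = (1/n) * sum(|y_i - yhat_i|), measured in the same units as `Close` (index points), so the value above means the predictions were off by about 16.6 points on average over the test period. A quick check on made-up numbers that the hand-written formula matches scikit-learn's `mean_absolute_error`:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up closes and predictions, only to check the arithmetic
y_true = np.array([100.0, 102.0, 101.0, 105.0])
y_pred = np.array([101.0, 100.0, 101.5, 104.0])

# Absolute errors are 1, 2, 0.5 and 1, so the MAE is 4.5 / 4
mae_manual = np.sum(np.abs(y_true - y_pred)) / len(y_pred)
mae_sklearn = mean_absolute_error(y_true, y_pred)
print(mae_manual, mae_sklearn)  # 1.125 1.125
```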

In [101]:

# R^2 on the training data, i.e. in-sample fit only
print(model.score(train[x], train[y]))

0.999550132408
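`model.score` returns the coefficient of determination, R^2 = 1 - SS_res / SS_tot. Called on the training set it only measures in-sample fit; calling it on `test[x]`, `test[y]` would give the out-of-sample figure. The sketch below, on made-up, nearly linear data, checks that the manual formula matches both `score` and `sklearn.metrics.r2_score`. The names `toy_model`, `x_toy` and `y_toy` are chosen so they do not overwrite the notebook's `model`, `x` and `y`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Made-up, nearly linear data standing in for the indicators (x) and the close (y)
rng = np.random.default_rng(0)
x_toy = np.arange(20, dtype=float).reshape(-1, 1)
y_toy = 3.0 * x_toy.ravel() + 5.0 + rng.normal(scale=0.5, size=20)

# Fit on the first 15 rows only, keep the last 5 as a held-out set
toy_model = LinearRegression().fit(x_toy[:15], y_toy[:15])

# R^2 = 1 - SS_res / SS_tot on the held-out rows
pred_test = toy_model.predict(x_toy[15:])
ss_res = np.sum((y_toy[15:] - pred_test) ** 2)
ss_tot = np.sum((y_toy[15:] - y_toy[15:].mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(toy_model.score(x_toy[:15], y_toy[:15]), toy_model.score(x_toy[15:], y_toy[15:]), r2_manual)
```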

Adding a Standard Deviation Indicator to Improve the Prediction

In [113]:

# Series.rolling replaces the removed pd.rolling_std; assign aligns the new column on the index
df_drop_na = df_drop_na.assign(std_5day=df['Close'].rolling(window=5).std().shift(1))

train = df_drop_na[df_drop_na["Date"] < datetime(year=2013, month=1, day=1)]
test = df_drop_na[df_drop_na["Date"] >= datetime(year=2013, month=1, day=1)]

x = ['mean_5day', 'mean_30day', 'mean_365day', 'std_5day']
y = ['Close']

model.fit(train[x], train[y])
predictions = model.predict(test[x])

# Mean absolute error, computed the same way as before
mae = abs(predictions - test[y]).mean()
print(mae)

Close    16.617805
dtype: float64

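With the new feature the MAE moves only from 16.621 to 16.618. For reference, `std_5day` is the sample standard deviation of the five previous closes (pandas' rolling `std` uses `ddof=1` by default). A small check on made-up values against Python's `statistics.stdev`:

```python
import statistics

import pandas as pd

# Made-up closes for illustration
close = pd.Series([10.0, 12.0, 11.0, 13.0, 14.0, 12.0, 15.0])

# Same construction as std_5day: 5-row sample std (ddof=1), then shift(1)
std_5day = close.rolling(window=5).std().shift(1)

# Row 5 should only use rows 0..4, i.e. [10, 12, 11, 13, 14]: variance 10 / 4 = 2.5
expected = statistics.stdev([10.0, 12.0, 11.0, 13.0, 14.0])
print(std_5day.iloc[5], expected)
```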