Time-Series Autocorrelation vs Partial Autocorrelation

Abhishek Chikara · Published in Analytics Vidhya · 4 min read · Feb 17, 2021

Not a blog, just writing notes for myself :)

Problem Statement: We have monthly data for a stock, and we want to predict its average monthly price from the prices of previous months. Before that, we need to see how strongly each predictor variable is related to our dependent variable (the output).

Here’s the notation for our situation. We have:

Sti = price of the stock this month

Sti_1 = price of the stock one month ago

Sti_2 = price of the stock two months ago

The most important idea in time series is that past values of a variable (here, the stock price) help us predict its future behavior.

In our situation, we have to find the value of Sti (the output) from two predictor variables, Sti_1 and Sti_2.

To check whether these predictor variables can actually help us predict the output, we have to look at the correlation between them and Sti.
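As a quick aside (my own minimal sketch, not part of the original post): the lagged series Sti_1 and Sti_2 map directly onto pandas columns via shift(). The prices here are made up for illustration:

```python
import pandas as pd

# Hypothetical monthly closing prices (illustrative, not real MSFT data)
df = pd.DataFrame({'Close': [100.0, 104.0, 101.0, 108.0, 112.0, 110.0]})
df['t_1'] = df['Close'].shift(1)  # Sti_1: price one month earlier
df['t_2'] = df['Close'].shift(2)  # Sti_2: price two months earlier
df = df.dropna()                  # the first two months lack full lag history
print(df)
```

Each remaining row now pairs a month's price with its one- and two-month-old predecessors, which is exactly the layout the correlation checks below need.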

Let’s take the stock MSFT’s monthly closing prices.

(Table: MSFT last 3 months’ prices)

So we have to find the correlation between the current month’s price and the price two months ago. In notation form, we have to find Corr(Sti_2, Sti).

One way is to compute the autocorrelation (ACF) between Sti_2 and Sti.

We can use the pearsonr() function from SciPy, which calculates Pearson’s correlation coefficient:

import pandas as pd
import pandas_datareader.data as web
from scipy.stats import pearsonr

data = web.get_data_yahoo('MSFT', '01/31/2019', '01/31/2021', interval='m')
corr_data = pd.DataFrame(data.Close, index=data.index)
corr_data['t_2'] = corr_data['Close'].shift(2)  # price two months earlier
corr_data.dropna(inplace=True)
corr, _ = pearsonr(corr_data.t_2, corr_data.Close)
print('Pearsons correlation: %.3f' % corr)

Pearsons correlation: 0.942

In theoretical terms, autocorrelation (ACF) comprises two factors:

  1. A direct effect (direct route)
  2. An indirect effect (indirect route)

Sti_2’s price has some direct effect on Sti_1’s price, and Sti_1’s price in turn has some direct effect on Sti. So Sti_2 has an indirect effect on Sti via Sti_1, but there is also some direct effect of Sti_2 on Sti.

(In practice these effects may be caused by a variety of reasons linked to MSFT’s decisions about their products, e.g. management bringing a new product to market.)

Both of these components together make up the autocorrelation function.

As we can see, the Pearson correlation of 0.942 is quite high, but part of that comes from the indirect-effect component, so we can’t yet be sure that Sti_2 is a good predictor of Sti.
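To see how large a correlation the indirect route alone can produce, here is a small simulation (my own illustration, not from the post): an AR(1) process in which each value depends directly only on the previous one, yet the lag-2 Pearson correlation is still high because the effect propagates through the lag-1 term:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
# AR(1): x[t] = 0.9 * x[t-1] + noise -- only a direct lag-1 effect exists
x = np.zeros(2000)
for t in range(1, 2000):
    x[t] = 0.9 * x[t - 1] + rng.normal()

corr2, _ = pearsonr(x[:-2], x[2:])  # lag-2 autocorrelation
print('lag-2 corr: %.3f' % corr2)
```

The true lag-2 autocorrelation here is 0.9² ≈ 0.81 even though the direct lag-2 effect is exactly zero, which is why a high ACF alone can’t tell us whether Sti_2 is a useful predictor.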

To find the direct relationship between them, we use partial autocorrelation (PACF).

Partial autocorrelation looks only at the direct effect of Sti_2 (the predictor variable) on Sti (the output).

For that, we build a regression model:

St = Φ21·St_1 + Φ22·St_2 + εt

The coefficient Φ22 gives us the direct effect of St_2 on St (which is the PACF at lag k = 2).

import numpy as np
from sklearn.linear_model import LinearRegression

corr_data['t_1'] = corr_data['Close'].shift(1)  # the lag-1 column was not built yet
corr_data.dropna(inplace=True)
linReg = LinearRegression(fit_intercept=True)
linReg.fit(corr_data[['t_1', 't_2']], corr_data.Close)
loadings = np.insert(linReg.coef_, 0, linReg.intercept_)
out = pd.DataFrame(loadings, columns=['regression model']).transpose()
out.columns = ['Intercept', 't_1', 't_2']

(Table: our regression model’s intercept and coefficients)

So the direct effect of St_2 on St, the coefficient Φ22, comes out to 0.243278.

Another way of calculating the partial autocorrelation comes from this post:

Understanding Partial Auto-Correlation | by Sachin Date | Towards Data Science

Summary (directly from that blog)

To calculate the first variable in the correlation, namely the amount of variance in Ti that cannot be explained by the variance in Ti_1, we do two things:

  1. Step 1: We fit a linear regression model (i.e., a straight line) to the distribution of Ti (the current price) versus Ti_1. This linear model lets us predict Ti from Ti_1. Conceptually, it allows us to explain the variance in Ti as a function of the variance in Ti_1. But like all optimally fitted models, our model will not be able to explain all of the variance in Ti. This fact takes us to step 2.
  2. Step 2: In this step, we calculate the residual errors of the linear model that we built in Step 1. The residual error is the difference between the observed value of Ti and the value predicted by the model. We do this residual calculation for each value of Ti to get a time series of residuals. This residual series gives us what we are looking for: the amount of variance in Ti which cannot be explained by the variance in Ti_1, plus of course some noise.

To calculate the second variable in the correlation, namely the amount of variance in Ti_2 that cannot be explained by the variance in Ti_1, we execute steps 1 and 2 above in the context of Ti_2 and Ti_1 instead of Ti and Ti_1. This gives us the residual series for variable 2.

The final step is to apply the formula for Pearson’s correlation coefficient to these two time series of residuals.
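As a sanity check (my own sketch on synthetic data, since the downloaded prices aren’t reproducible here), the residual-correlation recipe and the regression-coefficient definition of PACF should land on nearly the same number, just as 0.2398 sits close to 0.2433 in this post:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
# Synthetic AR(1) stand-in for the price series
x = np.zeros(1000)
for t in range(1, 1000):
    x[t] = 0.8 * x[t - 1] + rng.normal()

y, l1, l2 = x[2:], x[1:-1], x[:-2]   # Ti, Ti_1, Ti_2

# Way 1: coefficient of Ti_2 in a regression of Ti on Ti_1 and Ti_2
X = np.column_stack([np.ones_like(y), l1, l2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
pacf_reg = beta[2]

# Way 2: correlate the two residual series described in the steps above
r1 = y - np.polyval(np.polyfit(l1, y, 1), l1)     # Ti minus what Ti_1 explains
r2 = l2 - np.polyval(np.polyfit(l1, l2, 1), l1)   # Ti_2 minus what Ti_1 explains
pacf_res, _ = pearsonr(r1, r2)

print('regression: %.3f, residuals: %.3f' % (pacf_reg, pacf_res))
```

Because this series is AR(1), both estimates should sit near zero: once Ti_1 is accounted for, Ti_2 carries no extra direct information.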

corr_data = pd.DataFrame(data.Close, index=data.index)
corr_data['t_1'] = corr_data['Close'].shift(1)
corr_data['t_2'] = corr_data['Close'].shift(2)
corr_data.dropna(inplace=True)

from sklearn.linear_model import LinearRegression
linReg = LinearRegression(fit_intercept=True)

# Residuals of Close regressed on t_1
linReg.fit(corr_data[['t_1']], corr_data.Close)
corr_data['Predicted_Close|T_1'] = linReg.predict(corr_data[['t_1']])
corr_data['Residuals_Close|T_1'] = corr_data['Close'] - corr_data['Predicted_Close|T_1']

# Residuals of t_2 regressed on t_1
linReg.fit(corr_data[['t_1']], corr_data['t_2'])
corr_data['Predicted_t_2|t_1'] = linReg.predict(corr_data[['t_1']])
corr_data['Residuals_t_2|t_1'] = corr_data['t_2'] - corr_data['Predicted_t_2|t_1']

corr, _ = pearsonr(corr_data['Residuals_Close|T_1'], corr_data['Residuals_t_2|t_1'])
print(corr)

0.23979474295389985
