Extracting Features/Patterns in Price Data Using the Wavelet Transform (Part 1)

Abhishek Chikara
Published in The Startup
5 min read · Jan 29, 2021


Not a blog, just writing notes for myself :)

Why not use the FFT or Fourier Transform, rather than the Wavelet Transform, to extract patterns or components in stock price data?

Fourier Transform will work very well when the frequency spectrum is stationary. That is, the frequencies present in the signal are not time-dependent; if a signal contains a frequency of x Hz this frequency should be present equally anywhere in the signal.
The more non-stationary/dynamic a signal is, the worse the results will be.

Stock price data is dynamic (non-stationary) in nature. A much better approach for analyzing dynamic signals is to use the Wavelet Transform instead of the Fourier Transform.

What’s the Difference between Fourier Transform and Wavelet Transform?

The Wavelet Transform also transforms the signal into the frequency domain, just like the Fourier Transform. The difference is that the Fourier Transform has a very high resolution in the frequency domain and zero resolution in the time domain, while the output of the Wavelet Transform has usable resolution in both the frequency and the time domain, trading one against the other depending on the scale.

In layman’s terms: A Fourier transform (FT) will tell you what frequencies are present in your signal. A wavelet transform (WT) will tell you what frequencies are present and where (or at what scale). If you had a signal that was changing in time, the FT wouldn’t tell you when (time) this has occurred. You can also think of replacing the time variable with a space variable, with a similar analogy.
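To make this concrete, here is a minimal sketch on synthetic data (the 50 Hz and 120 Hz tones and the 1000 Hz sampling rate are assumptions chosen only for illustration). Two signals contain the same tones in opposite order, and the magnitude spectrum cannot tell them apart:

```python
import numpy as np

fs = 1000                      # sampling rate in Hz (assumed for illustration)
t = np.arange(0, 1, 1 / fs)    # one second of samples

# Signal A: 50 Hz during the first half, 120 Hz during the second half
sig_a = np.where(t < 0.5, np.sin(2 * np.pi * 50 * t), np.sin(2 * np.pi * 120 * t))
# Signal B: the same two tones in the opposite order
sig_b = np.where(t < 0.5, np.sin(2 * np.pi * 120 * t), np.sin(2 * np.pi * 50 * t))

freqs = np.fft.rfftfreq(len(t), d=1 / fs)
mag_a = np.abs(np.fft.rfft(sig_a))
mag_b = np.abs(np.fft.rfft(sig_b))

def top_two(mag):
    """Return the two frequencies (in Hz) with the largest spectral magnitude."""
    return sorted(float(f) for f in freqs[np.argsort(mag)[-2:]])

print('signal A peaks:', top_two(mag_a))
print('signal B peaks:', top_two(mag_b))
```

Both spectra peak at the same two frequencies, so the FT alone says nothing about which tone came first.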

So with the Fourier Transform we learn which frequencies are present in the signal, but not at which time each frequency occurs. How did scientists overcome this problem?

Short-Time Fourier Transform. In this approach, the original signal is split into several parts of equal length (which may or may not overlap) by using a sliding window before applying the Fourier Transform. The idea is quite simple: if we split our signal into 10 parts, and the Fourier Transform detects a specific frequency in the second part, then we know for sure that this frequency occurred between 1/10th and 2/10th of our original signal.

The main problem with this approach is that you run into the theoretical limits of the Fourier Transform known as the uncertainty principle. The smaller we make the size of the window the more we will know about where a frequency has occurred in the signal, but less about the frequency value itself. The larger we make the size of the window the more we will know about the frequency value and less about the time.
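The trade-off can be sketched with a bare-bones STFT in NumPy. This is an illustrative sketch, not a production STFT: it uses non-overlapping rectangular windows, and the test signal, the sampling rate and the helper names `stft_resolution` and `stft_magnitude` are assumptions made up for this example.

```python
import numpy as np

def stft_resolution(n_window, fs):
    """Return (window duration in seconds, frequency bin spacing in Hz)."""
    return n_window / fs, fs / n_window

def stft_magnitude(signal, n_window):
    """Split the signal into non-overlapping windows and FFT each one."""
    n_parts = len(signal) // n_window
    parts = signal[: n_parts * n_window].reshape(n_parts, n_window)
    return np.abs(np.fft.rfft(parts, axis=1))

fs = 1000                                   # assumed sampling rate in Hz
t = np.arange(0, 1, 1 / fs)
# 50 Hz during the first half second, 120 Hz during the second half
signal = np.where(t < 0.5, np.sin(2 * np.pi * 50 * t), np.sin(2 * np.pi * 120 * t))

for n_window in (50, 100, 500):
    dt, df = stft_resolution(n_window, fs)
    spec = stft_magnitude(signal, n_window)
    peak_hz = np.argmax(spec, axis=1) * df  # dominant frequency in each window
    print('window=%3d samples: dt=%.2fs, df=%4.1fHz, peaks=%s'
          % (n_window, dt, df, peak_hz))
```

With 100-sample windows the signal is split into 10 parts, as in the example above. Shrinking the window shortens `dt` but widens `df`: their product stays at 1, so one can only be improved at the expense of the other.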

The Wavelet Transform has:

for small frequency values, a high resolution in the frequency domain and a low resolution in the time domain,
for large frequency values, a low resolution in the frequency domain and a high resolution in the time domain.
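As a rough illustration, assume the dyadic discrete wavelet transform (the kind used later in these notes), where each level halves the frequency band and doubles the spacing between coefficients. The helper `dwt_detail_band` below is hypothetical and describes idealised bands; real wavelet filters overlap somewhat.

```python
def dwt_detail_band(level, fs=1.0):
    """Idealised band covered by the detail coefficients at a given DWT level.

    Returns (lowest frequency, highest frequency, spacing between coefficients).
    Frequencies are in the units of fs; spacing is in units of 1 / fs.
    """
    high = fs / 2 ** level
    low = fs / 2 ** (level + 1)
    spacing = 2 ** level / fs
    return low, high, spacing

# Daily prices: fs = 1 sample per day, so frequencies are in cycles per day
for level in range(1, 5):
    low, high, spacing = dwt_detail_band(level)
    print('cD%d: %.4f to %.4f cycles/day (bandwidth %.4f), one coefficient every %d days'
          % (level, low, high, high - low, spacing))
```

The low-frequency levels have narrow bands (fine frequency resolution) but widely spaced coefficients (coarse time resolution), and the high-frequency levels are the other way round.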

How does the Wavelet Transform work?

The Fourier Transform uses a series of sine-waves with different frequencies to analyze a signal. That is, a signal is represented through a linear combination of sine-waves.
The Wavelet Transform uses a series of functions called wavelets, each with a different scale. The word wavelet means a small wave, and this is exactly what a wavelet is.

A sine wave extends from -inf to +inf, while a wavelet exists only for a limited duration and has zero mean.
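Both properties can be checked numerically with PyWavelets. This is a small sketch, assuming the Mexican hat ('mexh') and Daubechies-4 ('db4') wavelets as examples; `wavefun` samples the wavelet on a finite grid.

```python
import numpy as np
import pywt

# 'mexh' (Mexican hat) is a continuous wavelet; wavefun returns (wavelet, grid)
psi, x = pywt.ContinuousWavelet('mexh').wavefun(level=10)
mexh_area = np.sum(psi) * (x[1] - x[0])      # numerical integral of the wavelet

# 'db4' is a discrete wavelet; wavefun returns (scaling function, wavelet, grid)
_, psi_db4, x_db4 = pywt.Wavelet('db4').wavefun(level=8)
db4_area = np.sum(psi_db4) * (x_db4[1] - x_db4[0])

print('mexh sampled on [%.1f, %.1f], integral = %.2e' % (x[0], x[-1], mexh_area))
print('db4  sampled on [%.1f, %.1f], integral = %.2e' % (x_db4[0], x_db4[-1], db4_area))
```

Each wavelet lives on a short interval and integrates to (numerically) zero, unlike a sine wave, which never dies out.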

There are many types of wavelets to choose from:

import pywt
print(pywt.families(short=False))
['Haar', 'Daubechies', 'Symlets', 'Coiflets', 'Biorthogonal', 'Reverse biorthogonal',
'Discrete Meyer (FIR Approximation)', 'Gaussian', 'Mexican hat wavelet', 'Morlet wavelet',
'Complex Gaussian wavelets', 'Shannon wavelets', 'Frequency B-Spline wavelets', 'Complex Morlet wavelets']

Wavelets rely on two main concepts:

1. Scaling: Scaling is the process of stretching and shrinking the wavelet in time. It is represented by φ(t/s), where φ(t) is the wavelet and s is the scaling factor, which determines how much the wavelet is stretched in time. Scale is inversely proportional to frequency.

  • We first look at the signal with a large scale/window (stretched wavelet) to analyze 'large' features (which correspond to lower frequencies), and then with smaller scales (shrunken wavelet) to analyze smaller features (which correspond to higher frequencies).
  • A stretched wavelet helps with capturing slowly varying changes in the signal, while a shrunken wavelet helps in capturing abrupt changes.

2. Shifting: Shifting simply means advancing or delaying the onset of the wavelet along the length of the signal. A shifted wavelet is usually represented by φ(t-k). We need to shift the wavelet in order to find the feature we are looking for in the signal.

Since the Wavelet is localized in time, we can multiply our signal with the wavelet at different locations in time. We start with the beginning of our signal and slowly move the wavelet towards the end of the signal. This procedure is also known as a convolution. After we have done this for the original (mother) wavelet, we can scale it such that it becomes larger and repeat the process.
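Scaling and shifting can be sketched together in plain NumPy. The example below is only illustrative: it assumes a Mexican hat wavelet for φ, a synthetic signal (a slow sine wave with one abrupt jump at sample 300), and two arbitrary scales. The helper names `mexican_hat` and `wavelet_response` are made up for this sketch.

```python
import numpy as np

def mexican_hat(t):
    """Unnormalised Mexican hat wavelet: zero mean, localized around t = 0."""
    return (1 - t ** 2) * np.exp(-t ** 2 / 2)

def wavelet_response(signal, scale):
    """Slide a wavelet of the given scale along the signal (shifting) and
    record the inner product with the signal at every position."""
    half_width = int(5 * scale)                        # wavelet is ~0 beyond 5 scales
    t = np.arange(-half_width, half_width + 1)
    wavelet = mexican_hat(t / scale) / np.sqrt(scale)  # phi(t/s): stretched by s
    # The wavelet is symmetric, so convolution and correlation coincide here
    return np.convolve(signal, wavelet, mode='same')

n = np.arange(512)
signal = np.sin(2 * np.pi * n / 256)   # slowly varying component
signal[300] += 2.0                     # abrupt change at sample 300

small = wavelet_response(signal, scale=2)    # shrunken wavelet
large = wavelet_response(signal, scale=40)   # stretched wavelet

print('small scale reacts most strongly at sample', int(np.argmax(np.abs(small))))
print('largest response, small vs large scale: %.2f vs %.2f'
      % (np.abs(small).max(), np.abs(large).max()))
```

The shrunken wavelet pinpoints the abrupt jump, while the stretched wavelet mostly responds to the slow oscillation.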

How can wavelets help extract features from our price data?

1. Denoising a series

2. Determining the principal component of movement in the series

  • Denoising is accomplished by recomposing the series by summing up the components from the decomposition, less the last few highest-frequency components. This denoised (or filtered) series, if chosen well, often gives a view of the core price process. Assuming the movement continues in the same direction, it can be used to extrapolate for a short period forward.
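The idea of dropping the highest-frequency components can be sketched on synthetic data before touching real prices. The signal, noise level, wavelet ('db4') and number of levels below are assumptions picked for illustration, not recommendations.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
n = np.arange(1024)
trend = np.sin(2 * np.pi * n / 512)                  # slow "core" movement
noisy = trend + 0.2 * rng.standard_normal(n.size)    # core movement plus noise

# Decompose into [cA4, cD4, cD3, cD2, cD1]
coeffs = pywt.wavedec(noisy, 'db4', mode='symmetric', level=4)

# Drop the two highest-frequency detail bands (cD1 and cD2)
for k in (-1, -2):
    coeffs[k] = np.zeros_like(coeffs[k])

# Recompose the series from what is left
denoised = pywt.waverec(coeffs, 'db4', mode='symmetric')[: noisy.size]

err_before = np.mean((noisy - trend) ** 2)
err_after = np.mean((denoised - trend) ** 2)
print('mean squared error vs. the clean trend: %.4f before, %.4f after'
      % (err_before, err_after))
```

How many detail bands to drop is a judgment call: removing too few leaves noise in, removing too many starts to erase the movement itself.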

So let’s fetch MSFT price data and smooth the data using wavelet transform as a filter bank:

import copy

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pywt
import yfinance as yf

# Define the ticker symbol
tickerSymbol = 'MSFT'
# Get data on this ticker
tickerData = yf.Ticker(tickerSymbol)
# Get the historical daily prices for this ticker
# (interval='1d' requests daily bars; start/end select the date range)
tickerDf = tickerData.history(interval='1d', start='2010-01-01', end='2020-01-25')
composite_signal = tickerDf['Close'].values

def filter_bank(index_list, wavefunc='db4', lv=4, m=1, n=4, plot=False):
    '''
    WT: Wavelet Transformation Function

    index_list: Input Sequence;
    lv: Decomposing Level;
    wavefunc: Function of Wavelet, 'db4' default;
    m, n: Level of Threshold Processing
    '''
    # Decomposing
    # coeff = [cA_lv, cD_lv, ..., cD_1]; cD are the detail coefficients
    coeff = pywt.wavedec(index_list, wavefunc, mode='symmetric', level=lv)

    # Denoising
    # Soft Threshold Processing Method
    # Process coeff[m]..coeff[n] (detail levels only); the approximation
    # coefficients cA in coeff[0] are left untouched
    for i in range(m, n + 1):
        cD = coeff[i]
        # Compute Threshold (note: the standard universal threshold uses the
        # natural log and a noise-scale factor; the log2 form is kept as is)
        Tr = np.sqrt(2 * np.log2(len(cD)))
        # Shrink coefficients whose magnitude reaches the threshold towards
        # zero, and set all smaller ones to zero
        coeff[i] = np.where(np.abs(cD) >= Tr, np.sign(cD) * (np.abs(cD) - Tr), 0.0)

    # Reconstructing
    # For each band i, keep only coeff[i] and zero out all other bands
    coeffs = {}
    for i in range(len(coeff)):
        coeffs[i] = copy.deepcopy(coeff)
        for j in range(len(coeff)):
            if j != i:
                coeffs[i][j] = np.zeros_like(coeff[j])

    # Rebuild one time-domain component per band
    for i in range(len(coeff)):
        coeff[i] = pywt.waverec(coeffs[i], wavefunc, mode='symmetric')
        # waverec returns one extra sample when the input length is odd
        if len(coeff[i]) > len(index_list):
            coeff[i] = coeff[i][:-1]

    if plot:
        # The denoised series is the sum of all reconstructed components
        denoised_index = np.sum(coeff, axis=0)
        data = pd.DataFrame({'CLOSE': index_list, 'denoised': denoised_index})
        data.plot(figsize=(10, 10), subplots=True)
        data.plot(figsize=(10, 5))

    return coeff

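coeff = filter_bank(composite_signal, plot=True)

# Plot each reconstructed component on its own axis
fig, ax = plt.subplots(len(coeff), 1, figsize=(10, 20))
for i in range(len(coeff)):
    if i == 0:
        # First entry: approximation component at the deepest level
        ax[i].plot(coeff[i], label='cA[%d]' % (len(coeff) - i - 1))
    else:
        # Remaining entries: detail components, from coarsest to finest
        ax[i].plot(coeff[i], label='cD[%d]' % (len(coeff) - i))
    ax[i].legend(loc='best')
plt.show()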
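The function above also returns an ordered list of arrays, in the same order as the coefficient list produced by pywt.wavedec, where n denotes the level of decomposition. The first element (cA_n) corresponds to the approximation coefficients and the following elements (cD_n to cD_1) correspond to the detail coefficients. Each returned array has been reconstructed back into the time domain, so it is a component of the price series rather than the raw coefficients.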
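A quick way to see what this ordered list means is to rebuild one component per band and check that they add back up to the input. The sketch below skips the thresholding step and uses a synthetic random walk as a stand-in for prices; both are assumptions made to keep the example self-contained.

```python
import numpy as np
import pywt

rng = np.random.default_rng(1)
x = np.cumsum(rng.standard_normal(500))   # random walk standing in for a price series

coeffs = pywt.wavedec(x, 'db4', mode='symmetric', level=4)
labels = ['cA4', 'cD4', 'cD3', 'cD2', 'cD1']   # order returned by wavedec

components = []
for i in range(len(coeffs)):
    # Keep band i, zero every other band, and reconstruct in the time domain
    only_i = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
    components.append(pywt.waverec(only_i, 'db4', mode='symmetric')[: x.size])

for label, c, comp in zip(labels, coeffs, components):
    print('%s: %3d coefficients -> component of length %d' % (label, len(c), len(comp)))

print('components sum back to the input:', np.allclose(np.sum(components, axis=0), x))
```

Because the reconstruction is linear, summing all components recovers the series that went in; with thresholding applied first, the same sum gives the denoised series instead.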

In the next post, we will use other forms of the Wavelet Transform to generate features for our price data.
