Time Series Analysis of Stock Market Data with Python

Codeayan Team · Apr 20, 2026
[Figure: Closing price chart for Reliance Industries]

From Raw Prices to Actionable Financial Insights

Time series analysis of stock market data is the cornerstone of quantitative finance and algorithmic trading. Whether you’re a data enthusiast or an aspiring quant, understanding how stock prices evolve over time is essential. In this project, we walk through a complete Python pipeline that fetches three years of historical data for Reliance Industries (RELIANCE.NS) using the yfinance library. You will learn to visualize price trends, compute moving averages, detect buy/sell signals, analyze monthly returns, decompose seasonal patterns, and calculate key risk metrics like volatility and maximum drawdown.

The techniques demonstrated here form the foundation for backtesting trading strategies and assessing portfolio risk. Moreover, they illustrate how pandas, matplotlib, and statsmodels work in harmony to transform raw price data into meaningful intelligence. If you’re new to predictive modeling, you might also appreciate our guide on House Price Prediction with Linear Regression.

About the Dataset

We source live data from Yahoo Finance via yfinance. The dataset spans three full years of daily Open, High, Low, Close, and Volume values. For this analysis, we focus on the closing price; because yfinance's history() method auto-adjusts prices by default, the Close column already accounts for corporate actions like dividends and stock splits. The data is stored in a pandas DataFrame, making manipulation and visualization seamless.

What You Will Learn

  • Fetching stock data programmatically with yfinance.
  • Plotting time series and overlaying moving averages.
  • Identifying buy/sell signals using SMA crossovers.
  • Resampling data to monthly frequency and computing returns.
  • Analyzing return distributions to understand volatility.
  • Decomposing a time series into trend, seasonal, and residual components.
  • Calculating annualized return, volatility, and maximum drawdown.

Note: The code snippets are presented separately after each explanatory section. All analysis was performed in a Jupyter Notebook using Python 3.



Step 1: Data Acquisition with yfinance

The first task in time series analysis of stock market data is obtaining reliable historical prices. The yfinance library offers a clean interface to download data from Yahoo Finance. After installation, we create a Ticker object for RELIANCE.NS (the NSE ticker for Reliance Industries). The history(period="3y") method returns a pandas DataFrame containing daily OHLCV data for the last three years. We then strip timezone information from the index to simplify subsequent plotting.

This step ensures we have a clean DataFrame ready for analysis. The resulting dataset includes columns like Open, High, Low, Close, and Volume. For the remainder of this project, we will work primarily with the adjusted closing price. For readers interested in other data sources, the pandas-datareader library provides similar functionality.

Code Preview (what follows): Installing yfinance, importing libraries, fetching data, and cleaning the index.

!pip install yfinance pandas matplotlib statsmodels numpy

import yfinance as yf
import pandas as pd

ticker = yf.Ticker("RELIANCE.NS")
df = ticker.history(period="3y")
df.index = df.index.tz_localize(None)
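The tz_localize(None) call simply drops the timezone from the index while keeping the wall-clock timestamps. A standalone sketch (using a hypothetical date range, since the real index comes from yfinance) shows the effect:

```python
import pandas as pd

# A small tz-aware index, similar to what yfinance returns for NSE tickers
idx = pd.date_range("2024-01-01", periods=3, freq="D", tz="Asia/Kolkata")

# Stripping the timezone leaves naive timestamps, which plot more cleanly
naive = idx.tz_localize(None)
```

The wall time is preserved: the first naive timestamp is still 2024-01-01 00:00.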


Step 2: Plotting the Closing Price Over Time

Visualization is the first step in any time series analysis of stock market data. A line chart of the closing price reveals the overall trajectory: long‑term growth punctuated by occasional dips. We use matplotlib to create a clean, informative plot. The chart immediately shows that Reliance has trended upward over the three‑year window, though several sharp corrections are visible—likely corresponding to market‑wide events.

This visual context is invaluable before diving into numerical metrics. It also confirms that the price series is non‑stationary, which justifies the use of moving averages in later steps. If you’re new to exploratory data analysis, our Brazilian E‑Commerce EDA covers similar visualization techniques in a different domain.

Code Preview (what follows): Plotting the closing price column with matplotlib, adding labels and a title.

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
plt.plot(df.index, df['Close'], label='RELIANCE.NS Close Price')
plt.title('Closing Price Time Series')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()


Step 3: Smoothing Price Data with Moving Averages

Raw daily prices are noisy; moving averages smooth out short‑term fluctuations and highlight longer‑term trends. In this step, we compute two simple moving averages (SMAs): a 20‑day SMA (representing about one month of trading) and a 50‑day SMA (about one quarter). These are added as new columns to the DataFrame using pandas’ rolling() method followed by mean().

The 20‑day average reacts more quickly to price changes, while the 50‑day average provides a slower, more stable baseline. Later, we will use the interaction between these two lines to generate trading signals. This technique is a fundamental component of time series analysis of stock market data and is widely used by technical analysts.

Code Preview (what follows): Creating SMA_20 and SMA_50 columns with df['Close'].rolling(window).mean().

df['SMA_20'] = df['Close'].rolling(window=20).mean()
df['SMA_50'] = df['Close'].rolling(window=50).mean()
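Note that rolling(window=n) produces NaN for the first n-1 rows, because a full window is not yet available. A toy series makes this explicit:

```python
import pandas as pd

s = pd.Series([10.0, 12.0, 14.0, 16.0, 18.0])
sma3 = s.rolling(window=3).mean()
# First two entries are NaN; the third is (10 + 12 + 14) / 3 = 12.0
```

This is why the SMA_50 line in the plots only begins 50 trading days into the sample.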


Step 4: Generating Buy and Sell Signals from SMA Crossovers

A classic trading heuristic is the SMA crossover: when the faster (20‑day) average crosses above the slower (50‑day) average, it may signal an uptrend (a “buy” signal). Conversely, a cross below suggests a downtrend (a “sell” signal). We generate a binary Signal column (1 when SMA_20 > SMA_50, else 0) and then use diff() to detect crossover points. These points are overlaid on the price chart as green up‑arrows (buy) and red down‑arrows (sell).

This visualization makes it easy to evaluate the effectiveness of the crossover strategy at a glance. While not a guarantee of future performance, this method helps traders time entries and exits. For those interested in building predictive models, our Titanic Survival Prediction project demonstrates similar binary classification logic.

Code Preview (what follows): Creating the Signal column, identifying crossover positions, and plotting arrows on the chart.

import numpy as np

# Determine signals and crossover positions
df['Signal'] = np.where(df['SMA_20'] > df['SMA_50'], 1.0, 0.0)
df['Position'] = df['Signal'].diff()

plt.figure(figsize=(12, 6))
plt.plot(df.index, df['Close'], label='Close', alpha=0.5)
plt.plot(df.index, df['SMA_20'], label='20-day SMA')
plt.plot(df.index, df['SMA_50'], label='50-day SMA')

# Plotting the signals
plt.plot(df[df['Position'] == 1].index, df['SMA_20'][df['Position'] == 1], '^', markersize=10, color='green', label='Buy')
plt.plot(df[df['Position'] == -1].index, df['SMA_20'][df['Position'] == -1], 'v', markersize=10, color='red', label='Sell')

plt.title('SMA Crossover Signals')
plt.legend()
plt.show()
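The diff() trick at the heart of this step is easy to verify on a toy Signal series: a value of +1 marks a 0-to-1 crossover (buy) and -1 marks a 1-to-0 crossover (sell):

```python
import pandas as pd

signal = pd.Series([0.0, 0.0, 1.0, 1.0, 0.0, 1.0])
position = signal.diff()
# position: [NaN, 0, 1 (buy), 0, -1 (sell), 1 (buy)]

buys = position[position == 1].index.tolist()
sells = position[position == -1].index.tolist()
```

On the real DataFrame the same filtering yields the dates used for the green and red arrows.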


Step 5: Aggregating to Monthly Frequency

Daily data is granular, but investors often think in terms of months. Time series analysis of stock market data frequently involves resampling to a lower frequency. Using pandas' resample('ME').mean() (where 'ME' denotes month-end), we aggregate daily prices to monthly averages. Next, we compute the percentage change from one month to the next using pct_change().

A bar chart of monthly returns provides a quick visual summary. Positive bars indicate months where the stock gained, while negative bars show losses. This view helps identify seasonal patterns or periods of heightened volatility. For instance, you might notice that certain quarters consistently outperform others—an insight useful for tactical allocation.

Code Preview (what follows): Resampling to month‑end averages, calculating monthly returns, and generating a bar plot.

# Note: pandas uses 'ME' for Month-End frequency in newer versions
monthly_data = df.resample('ME').mean()
monthly_data['Monthly_Return'] = monthly_data['Close'].pct_change()

plt.figure(figsize=(12, 6))
plt.bar(monthly_data.index, monthly_data['Monthly_Return'], width=20)
plt.title('Monthly Returns')
plt.xlabel('Date')
plt.ylabel('Return')
plt.show()


Step 6: Understanding Volatility Through Return Distribution

Volatility measures the degree of variation in returns. A common way to visualize it is through a histogram of daily returns. We calculate daily percentage changes with pct_change() and then drop missing values. Plotting a histogram with 50 bins reveals the distribution’s shape, central tendency, and tails.

In a typical stock, daily returns approximate a normal distribution but exhibit fat tails, meaning extreme moves occur more often than a true normal distribution would predict. This phenomenon is crucial for risk management. If you’ve explored regression models in our House Price Prediction project, you’ll recognize the importance of understanding residual distributions.

Code Preview (what follows): Computing daily returns and generating a histogram with 50 bins.

df['Daily_Return'] = df['Close'].pct_change()

plt.figure(figsize=(10, 6))
plt.hist(df['Daily_Return'].dropna(), bins=50, edgecolor='black')
plt.title('Daily Return Distribution')
plt.xlabel('Daily Return')
plt.ylabel('Frequency')
plt.show()
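Fat tails can be quantified with excess kurtosis, which pandas' .kurt() reports (approximately 0 for a normal distribution, positive for heavy tails). A sketch on synthetic returns, contrasting a heavy-tailed Student-t sample with a normal one:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Student-t draws with 3 degrees of freedom are markedly heavier-tailed than normal
heavy = pd.Series(rng.standard_t(df=3, size=5000) * 0.01)
normal = pd.Series(rng.normal(0, 0.01, size=5000))

heavy_kurt = heavy.kurt()    # large positive excess kurtosis
normal_kurt = normal.kurt()  # close to zero
```

Running df['Daily_Return'].kurt() on the real series typically yields a clearly positive value, confirming the fat tails visible in the histogram.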


Step 7: Decomposing Trend, Seasonality, and Residuals

Time series analysis of stock market data often requires separating the signal into underlying components. The seasonal_decompose function from statsmodels breaks the series into three parts: trend (long‑term direction), seasonal (repeating patterns), and residual (unexplained noise). We use a multiplicative model with a period of 252 (approximate trading days in a year).

The decomposition plot reveals that Reliance’s price exhibits a strong upward trend. The seasonal component shows subtle but consistent fluctuations—perhaps related to quarterly earnings cycles or festival‑driven market activity in India. Meanwhile, the residuals capture random shocks that cannot be attributed to trend or seasonality. This breakdown is essential for building forecasting models like ARIMA.

Code Preview (what follows): Dropping NaNs, applying seasonal decomposition with period=252, and displaying the four‑panel plot.

from statsmodels.tsa.seasonal import seasonal_decompose

# Dropping NaNs to avoid statsmodels errors
close_prices = df['Close'].dropna()
decomposition = seasonal_decompose(close_prices, model='multiplicative', period=252)

fig = decomposition.plot()
fig.set_size_inches(12, 8)
plt.show()


Step 8: Quantifying Risk and Reward

The final step in our time series analysis of stock market data is calculating summary statistics that investors care about most. We compute three key metrics:

  • Annualized Average Return: The mean daily return multiplied by 252 (trading days).
  • Annualized Volatility: The standard deviation of daily returns multiplied by the square root of 252.
  • Maximum Drawdown: The largest peak‑to‑trough decline in cumulative returns, reflecting worst‑case loss.

To compute drawdown, we first build a cumulative return series. Then we track the running maximum (peak) and calculate the percentage drop from that peak at each point. The minimum of these percentages is the maximum drawdown. Finally, we annotate a cumulative returns plot with these statistics, providing a concise dashboard of performance.
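A tiny worked example makes the drawdown logic concrete: with cumulative returns [1.0, 1.2, 0.9, 1.1, 1.5], the deepest drop from a running peak is (0.9 - 1.2) / 1.2 = -25%:

```python
import pandas as pd

cum = pd.Series([1.0, 1.2, 0.9, 1.1, 1.5])
peak = cum.cummax()               # running maximum: [1.0, 1.2, 1.2, 1.2, 1.5]
drawdown = (cum - peak) / peak    # percentage drop from the peak at each point
max_drawdown = drawdown.min()     # -0.25, i.e. a 25% peak-to-trough decline
```

The full-scale computation below applies exactly this recipe to the cumulative returns of the stock.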

For Reliance over this period, the annualized return is positive, indicating growth. Volatility remains moderate, while the maximum drawdown shows the largest temporary loss an investor would have endured. Such metrics are standard in portfolio analysis and are directly comparable across assets. To see how regression can also be used for financial modeling, revisit our House Price Prediction guide.

Code Preview (what follows): Calculating annualized return and volatility, computing maximum drawdown, and plotting cumulative returns with annotation.

# Calculations
avg_return = df['Daily_Return'].mean() * 252
volatility = df['Daily_Return'].std() * np.sqrt(252)

cumulative_returns = (1 + df['Daily_Return']).cumprod()
peak = cumulative_returns.cummax()
max_drawdown = ((cumulative_returns - peak) / peak).min()

# Plotting and annotating
plt.figure(figsize=(12, 6))
plt.plot(cumulative_returns, label='Cumulative Returns')
plt.title('Performance with Key Stats')

# Annotation
stats_text = (f"Avg Annual Return: {avg_return*100:.2f}%\n"
              f"Annual Volatility: {volatility*100:.2f}%\n"
              f"Max Drawdown: {max_drawdown*100:.2f}%")

plt.annotate(stats_text, xy=(0.02, 0.85), xycoords='axes fraction', 
             bbox=dict(boxstyle="round", fc="white", alpha=0.8))

plt.legend()
plt.show()

📁 Complete project code available on GitHub: https://github.com/codeayan/Time-Series-Analysis-on-Stock-Market-Data