Welcome to Day 28 of the 30 Days of Data Science series! 🎉 Today, we will explore Time Series Forecasting, one of the most critical techniques in data science used for analyzing sequential data over time. We'll cover key concepts and popular models like ARIMA and Prophet. By the end of this lesson, you’ll have the tools to forecast future trends and patterns effectively.
- 📚 Introduction to Time Series Forecasting
- 📈 Understanding Time Series Data
- 🔮 ARIMA (AutoRegressive Integrated Moving Average)
- 📜 Seasonal-Trend Decomposition using Loess (STL)
- 📦 SARIMA (Seasonal ARIMA)
- 🌍 Prophet: Time Series Forecasting Made Easy
- 🧠 LSTM (Long Short-Term Memory Networks)
- ✍️ Practice Exercise
- 📝 Summary
Time series forecasting predicts future values based on previously observed data. It is widely used in areas like:
- Finance: Stock price prediction 📈
- Weather Forecasting: Temperature and rainfall prediction 🌧️
- Retail: Sales forecasting 🛒
Forecasting allows businesses and researchers to plan effectively and make informed decisions.
A time series is a sequence of data points collected or recorded at regular time intervals. It is usually described in terms of four components:
- Trend: Overall upward or downward movement over time.
- Seasonality: Regular patterns that repeat over a fixed period.
- Cyclic Patterns: Long-term rises and falls with no fixed period, such as business cycles.
- Noise: Random variations or outliers in data.
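To make these components concrete, here is a minimal sketch that builds a synthetic monthly series by adding a trend, a 12-month seasonal pattern, and random noise. The names and constants (`n_months`, the slope of 2.0, the amplitude of 10) are illustrative assumptions, not values from a real dataset, and the cyclic component is left out for brevity.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed so the run is repeatable

n_months = 48
t = np.arange(n_months)

trend = 100 + 2.0 * t                          # steady upward movement
seasonality = 10 * np.sin(2 * np.pi * t / 12)  # pattern repeating every 12 months
noise = rng.normal(loc=0.0, scale=3.0, size=n_months)  # random variation

# An additive series is simply the sum of its components
series = trend + seasonality + noise

print(series[:12].round(1))
```

Plotting `series` alongside `trend` and `seasonality` is a quick way to see how each component shapes the final curve.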
import pandas as pd
import matplotlib.pyplot as plt
# Sample data
data = {
'Date': pd.date_range(start='2023-01-01', periods=12, freq='MS'),  # 'MS' = month start; the 'M' alias is deprecated in pandas 2.2+
'Sales': [200, 220, 250, 270, 300, 350, 400, 420, 450, 470, 500, 550]
}
df = pd.DataFrame(data)
# Plot
plt.plot(df['Date'], df['Sales'], marker='o', linestyle='-')
plt.title("Monthly Sales Data")
plt.xlabel("Date")
plt.ylabel("Sales")
plt.grid()
plt.show()
ARIMA is a statistical model for analyzing and forecasting time series data. It combines three components, each controlled by one order parameter:
- AR (AutoRegressive): Regresses the current value on its own past values (order p).
- I (Integrated): Differences the data d times to make it stationary.
- MA (Moving Average): Models the current value using past forecast errors (order q).
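To show how the three components interact, the sketch below computes a one-step-ahead forecast for an ARIMA(1, 1, 1) by hand. The function name `arima_111_next_value` and the coefficients `phi`, `theta`, and `c` are hypothetical, hand-picked values; a real model estimates them from the data, and libraries handle the start-up error more carefully than the simple zero assumption used here.

```python
def arima_111_next_value(y, phi, theta, c=0.0):
    """One-step-ahead forecast from an ARIMA(1, 1, 1) with given coefficients."""
    # I: difference once so the model works on changes between observations
    diffs = [b - a for a, b in zip(y, y[1:])]

    prev_error = 0.0  # assumption: no forecast error before the first prediction
    for t in range(1, len(diffs)):
        # AR term uses the previous change; MA term uses the previous error
        predicted = c + phi * diffs[t - 1] + theta * prev_error
        prev_error = diffs[t] - predicted

    next_diff = c + phi * diffs[-1] + theta * prev_error
    # Undo the differencing to return to the original scale
    return y[-1] + next_diff


sales = [112, 118, 132, 129, 121, 135]
print(arima_111_next_value(sales, phi=0.5, theta=0.3))
```

With `phi=0` and `theta=0` the forecast collapses to the last observed value, which is the random-walk forecast that differencing alone implies.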
- Visualize the Data: Plot the series and check for trends, seasonality, and stationarity.
- Stationarity Test: Use tests like the Augmented Dickey-Fuller (ADF) test.
- Differencing: Transform non-stationary data to stationary.
- Parameter Selection: Use p, d, q to define the ARIMA model.
- Model Training: Fit the ARIMA model to your data.
- Forecasting: Predict future values.
from statsmodels.tsa.arima.model import ARIMA
import pandas as pd
import matplotlib.pyplot as plt
# Example data
data = [112, 118, 132, 129, 121, 135, 148, 145, 140, 155, 164, 170]
df = pd.DataFrame(data, columns=['Sales'])
# Fit ARIMA model
model = ARIMA(df['Sales'], order=(1, 1, 1))
model_fit = model.fit()
# Summary of the model
print(model_fit.summary())
# Forecast future values
forecast = model_fit.forecast(steps=5)
print("Forecasted Values:", forecast)
STL (Seasonal and Trend decomposition using Loess) splits a series into trend, seasonal, and residual components.
from statsmodels.tsa.seasonal import STL
import pandas as pd
import matplotlib.pyplot as plt
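# Sample monthly data (illustrative values): three years, so that a 12-month
# period is covered by several complete cycles
data = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
        115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140,
        145, 150, 178, 163, 172, 178, 199, 199, 184, 162, 146, 166]
df = pd.DataFrame(data, columns=['value'])
# STL decomposition with a 12-month seasonal period
stl = STL(df['value'], period=12)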
result = stl.fit()
# Plot components
result.plot()
plt.show()
SARIMA extends ARIMA by incorporating seasonality.
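The model is defined by the parameters (p, d, q) x (P, D, Q, s), where:
- (p, d, q) are the non-seasonal ARIMA parameters.
- (P, D, Q, s) are the seasonal parameters, with s the number of observations per seasonal cycle (for example, 12 for monthly data with a yearly pattern).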
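To illustrate what D and s do, here is a small sketch of seasonal differencing with pandas on a made-up series (a linear trend plus a pattern that repeats every 12 steps). The series and its constants are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: linear trend plus a pattern repeating every 12 months
t = np.arange(36)
series = pd.Series(50 + 1.5 * t + 8 * np.sin(2 * np.pi * t / 12))

s = 12
seasonal_diff = series.diff(s).dropna()     # D = 1: y[t] - y[t - s]
both_diff = seasonal_diff.diff().dropna()   # d = 1 on top of the seasonal difference

print(seasonal_diff.round(2).unique())      # the seasonal pattern cancels out
print(round(both_diff.abs().max(), 6))      # the remaining trend effect is removed too
```

Seasonal differencing removes the repeating pattern and leaves a constant (the trend's growth over one cycle); the ordinary difference then removes that constant as well.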
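import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX
# Three years of illustrative monthly values (seasonal differencing with s=12 needs more than one cycle)
data = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
        115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140,
        145, 150, 178, 163, 172, 178, 199, 199, 184, 162, 146, 166]
df = pd.DataFrame(data, columns=['value'])
# Fit SARIMA model
model = SARIMAX(df['value'], order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
sarima_result = model.fit(disp=False)
# Forecast
forecast = sarima_result.forecast(steps=5)
print("SARIMA Forecast:", forecast)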
Prophet is an open-source library developed by Facebook for time series forecasting. It is highly flexible, easy to use, and handles missing data, holidays, and seasonal patterns effectively.
- Handles seasonality and holiday effects.
- Robust to missing data.
- Requires minimal tuning.
from prophet import Prophet
import pandas as pd
import matplotlib.pyplot as plt
# Create example data
data = {
'ds': pd.date_range(start='2023-01-01', periods=12, freq='MS'),  # 'MS' = month start; 'M' is deprecated in pandas 2.2+
'y': [200, 220, 250, 270, 300, 350, 400, 420, 450, 470, 500, 550]
}
df = pd.DataFrame(data)
# Fit Prophet model
model = Prophet()
model.fit(df)
# Create future dataframe (same month-start frequency as the history)
future = model.make_future_dataframe(periods=6, freq='MS')
# Forecast
forecast = model.predict(future)
# Plot results
fig = model.plot(forecast)
plt.show()
LSTMs are a type of recurrent neural network (RNN) capable of learning long-term dependencies.
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense
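# Sample data
data = np.array([112, 118, 132, 129, 121, 135, 148, 136, 119, 104, 118, 115], dtype='float32')
# Turn the series into supervised samples: 3 past values predict the next one
window = 3
X = np.array([data[i:i + window] for i in range(len(data) - window)])
X = X.reshape((X.shape[0], window, 1)) # (samples, timesteps, features)
y = data[window:] # Labels: the value that follows each window
# Define LSTM model
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(window, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# Train
model.fit(X, y, epochs=200, verbose=0)
# Forecast the next value from the last observed window
last_window = data[-window:].reshape((1, window, 1))
forecast = model.predict(last_window)
print("LSTM Forecast:", forecast)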
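LSTM layers expect input shaped as (samples, timesteps, features), and neural networks generally train more reliably on scaled values. The sketch below shows one way to prepare a series accordingly; `make_windows` is a hypothetical helper written for this lesson, and min-max scaling is one common choice among several.

```python
import numpy as np


def make_windows(values, window):
    """Return (X, y): X has shape (samples, window, 1), y has shape (samples,)."""
    values = np.asarray(values, dtype="float32")
    X = np.stack([values[i:i + window] for i in range(len(values) - window)])
    y = values[window:]  # the value that follows each window
    return X[..., np.newaxis], y


data = np.array([112, 118, 132, 129, 121, 135, 148, 136, 119, 104, 118, 115])

# Min-max scale to [0, 1]; keep lo and hi to convert predictions back later
lo, hi = data.min(), data.max()
scaled = (data - lo) / (hi - lo)

X, y = make_windows(scaled, window=3)
print(X.shape, y.shape)  # (9, 3, 1) (9,)
```

A prediction `p` made on the scaled data can be mapped back to the original units with `p * (hi - lo) + lo`.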
Try the following:
- Load a time series dataset of your choice (e.g., stock prices, weather data).
- Preprocess the data to handle missing values.
- Train an ARIMA model and forecast future values.
- Compare the performance of ARIMA and Prophet on the same dataset.
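For the comparison exercise, one possible approach is to hold out the last few observations and score each model's forecast against them. The sketch below does this with a naive baseline (repeating the last training value) in place of a real model; the helper names `mae` and `rmse` and the 9/3 split are assumptions chosen for illustration.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


sales = [200, 220, 250, 270, 300, 350, 400, 420, 450, 470, 500, 550]

# Time-ordered split: never shuffle, the test set must come after the training set
train, test = sales[:9], sales[9:]

# Naive baseline: repeat the last training value for every future step
naive_forecast = [train[-1]] * len(test)

print("MAE :", mae(test, naive_forecast))
print("RMSE:", rmse(test, naive_forecast))
```

Replacing `naive_forecast` with the ARIMA or Prophet predictions for the same held-out period gives directly comparable scores, and a model that cannot beat the naive baseline is a sign that something needs revisiting.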
In this lesson, we covered the fundamentals of Time Series Forecasting: the components of a time series, ARIMA and its seasonal extension SARIMA, STL decomposition, Prophet, and LSTM networks. Forecasting is a powerful tool for uncovering trends and patterns in sequential data. Mastering these techniques will empower you to tackle real-world problems in diverse domains.