Hourly Energy Consumption Forecasting with XGBoost (Calendar Features Only)
Introduction to XGBoost for Time-Series Forecasting
XGBoost (Extreme Gradient Boosting) is a powerful, scalable machine learning algorithm renowned for its speed, accuracy, and interpretability in structured data tasks. While not inherently time-aware, it excels at time-series forecasting when paired with thoughtful feature engineering to encode temporal patterns.
Key Advantages for Time Series
- Nonlinear Pattern Capture: XGBoost's tree-based structure models complex interactions (e.g., how hour-of-day and day-of-week jointly influence energy demand) that linear models like ARIMA cannot capture.
- Feature Importance Insights: Built-in metrics and tools like SHAP (SHapley Additive exPlanations) quantify how each engineered feature (e.g., hour, month) contributes to predictions, aiding model debugging and trust.
- Robustness & Efficiency: Handles large datasets (e.g., hourly data over decades) gracefully, resists overfitting via regularization, and supports early stopping for optimal training.
Why Calendar Features?
In this workflow, XGBoost relies solely on calendar-derived features (no weather/exogenous data) to forecast energy demand. This approach:
- Extracts cyclical patterns (e.g., daily, weekly, seasonal cycles) from timestamps.
- Avoids dependency on external data sources, making it lightweight and broadly applicable.
- Demonstrates how even simple temporal features can encode rich, actionable signals for tree-based models.
Workflow Logic
By treating time series as a supervised regression problem (predicting future values from engineered past features), XGBoost bypasses traditional time-series assumptions (e.g., stationarity) and leverages its strength in tabular data modeling. The final forecast combines learned hierarchical relationships (e.g., "3 PM on summer weekends" correlates with high AC usage) into a single prediction.
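As a minimal sketch of this framing (toy timestamps and load values, not the PJM data), a raw (timestamp, load) series becomes ordinary tabular rows of features plus a target:
import pandas as pd
# Toy example: each hourly timestamp becomes one supervised-regression row,
# with calendar features as X and the observed load as the target y.
idx = pd.date_range('2017-07-01 00:00', periods=4, freq='h')  # hypothetical hours
load = pd.Series([28000, 27500, 27100, 26900], index=idx, name='load_mw')
X = pd.DataFrame({'hour': idx.hour, 'dayofweek': idx.dayofweek}, index=idx)
print(pd.concat([X, load], axis=1))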
Workflow Overview
Data Preparation
- Load PJM East hourly load data (2002–2018) and parse the timestamp into a DatetimeIndex.
- Drop the final incomplete day (2018-08-03) and ensure the series is sorted chronologically.
Chronological Train/Val/Test Split
- Split into 80% train, 10% validation, and 10% test by time (no shuffling), to mimic a real forecasting setup and prevent data leakage.
Feature Engineering
- Derive eight purely calendar-based features from the timestamp: hour, dayofweek, quarter, month, year, dayofyear, dayofmonth, and weekofyear.
Model Training
- Fit an XGBRegressor (1,000 trees, learning rate 0.01) using the train set, with early stopping (50 rounds) monitored on the validation set.
Interpretability with SHAP
- Compute SHAP values on the validation set to rank and visualize feature importance via both beeswarm and bar charts.
Forecasting and Evaluation
- Predict on the held-out test set and compute RMSE, MAE, and MAPE.
- Plot the full series (actual vs. predicted) and zoom in on representative weeks to visually inspect performance.
Error Analysis
- Aggregate errors by day and identify the top 10 worst- and best-predicted dates.
- Hypothesize root causes (e.g., extreme cold/heat, holiday effects) for the largest errors.
Key Highlights
Baseline Accuracy
- RMSE: ~3,973 MW
- MAE: ~3,137 MW
- MAPE: 10.24%
Most Influential Features
- Day of week and hour of day dominate, followed by day of year, reflecting strong weekly and daily cycles in electricity demand.
Systematic Error Patterns
- Largest misses occur on deep-freeze weekends and holiday-week days (e.g., New Year's Eve, Presidents' Day weekend), when calendar features alone can't capture weather spikes or industrial closures.
- Best performance aligns with "normal" weekdays in stable seasons (e.g., Dec 30, 2016).
Next Steps
- Integrate Weather Variables (temperature, humidity, wind) to capture extreme-temperature spikes.
- Add Holiday Flags (and surrounding days) to account for anomalous load patterns.
- Introduce Lagged Load Features so the model can learn autoregressive patterns.
- Expand to More Advanced Models (e.g., hybrid XGBoost + LSTM, or purely deep-learning approaches) for further accuracy gains.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import xgboost as xgb
from sklearn.metrics import mean_squared_error, mean_absolute_error
import shap
import warnings
warnings.filterwarnings("ignore")
plt.style.use('fivethirtyeight')
Load and Prepare Data
Load the PJM East hourly energy consumption data. Ensure the index is a DatetimeIndex, sort it chronologically, and remove the last day, which has incomplete data.
pjme = pd.read_csv('../data/3/PJME_hourly.csv', index_col='Datetime', parse_dates=['Datetime'])
# Remove 08-03-2018, it's the last day of the dataset and has incomplete data
pjme = pjme[pjme.index < '2018-08-03']
# Ensure DatetimeIndex and sort
pjme.index = pd.to_datetime(pjme.index)
pjme = pjme.sort_index()
# Verify
print(f'Data shape: {pjme.shape}')
print(f'Index monotonic? {pjme.index.is_monotonic_increasing}')
pjme.head()
Data shape: (145365, 1)
Index monotonic? True
| Datetime | PJME_MW |
| --- | --- |
| 2002-01-01 01:00:00 | 30393.0 |
| 2002-01-01 02:00:00 | 29265.0 |
| 2002-01-01 03:00:00 | 28357.0 |
| 2002-01-01 04:00:00 | 27899.0 |
| 2002-01-01 05:00:00 | 28057.0 |
color_pal = sns.color_palette() # Use seaborn's default palette
pjme.plot(style='.', figsize=(15,5), color=color_pal[0], title='PJM East Hourly Energy Consumption (MW)')
plt.ylabel('MW')
plt.show()
Train/Validation/Test Split
Split the data chronologically to simulate a real-world forecasting scenario:
- Training Set: First ~80% of the data.
- Validation Set: Next ~10% of the data (used for early stopping).
- Test Set: Final ~10% of the data (used for final model evaluation).
# Define split points based on time
total_hours = len(pjme)
test_split_point = int(total_hours * 0.9)
val_split_point = int(total_hours * 0.8)
test_split_date = pjme.index[test_split_point]
val_split_date = pjme.index[val_split_point]
# Perform the split using the calculated dates
test_df = pjme.loc[pjme.index >= test_split_date].copy()
val_df = pjme.loc[(pjme.index >= val_split_date) & (pjme.index < test_split_date)].copy()
train_df = pjme.loc[pjme.index < val_split_date].copy()
# Check proportions and sizes
total_len = len(pjme)
train_len = len(train_df)
val_len = len(val_df)
test_len = len(test_df)
print(f"Total: {total_len} rows")
print(f"Train: {train_len} rows ({train_len/total_len:.1%})")
print(f"Val : {val_len} rows ({val_len/total_len:.1%})")
print(f"Test : {test_len} rows ({test_len/total_len:.1%})")
Total: 145365 rows
Train: 116292 rows (80.0%)
Val : 14536 rows (10.0%)
Test : 14537 rows (10.0%)
# Verify the chronological split
print("\n--- Split Verification ---")
print(f"Original sorted? {pjme.index.is_monotonic_increasing}")
print(f"Train ends at: {train_df.index.max()}")
print(f"Val starts at: {val_df.index.min()}")
print(f"Val ends at: {val_df.index.max()}")
print(f"Test starts at: {test_df.index.min()}")
# Confirm no overlap and chronological order within splits
assert train_df.index.max() < val_df.index.min()
assert val_df.index.max() < test_df.index.min()
assert train_df.index.is_monotonic_increasing
assert val_df.index.is_monotonic_increasing
assert test_df.index.is_monotonic_increasing
print("Split boundaries and ordering confirmed.")
--- Split Verification ---
Original sorted? True
Train ends at: 2015-04-09 14:00:00
Val starts at: 2015-04-09 15:00:00
Val ends at: 2016-12-05 05:00:00
Test starts at: 2016-12-05 06:00:00
Split boundaries and ordering confirmed.
# Plot the splits
fig, ax = plt.subplots(figsize=(15, 5))
train_df['PJME_MW'].plot(ax=ax, label='Training Set', title='Data Train/Validation/Test Split', style='.')
val_df['PJME_MW'].plot(ax=ax, label='Validation Set', style='.')
test_df['PJME_MW'].plot(ax=ax, label='Test Set', style='.')
ax.axvline(val_split_date, color='black', ls='--')
ax.axvline(test_split_date, color='black', ls='--')
ax.legend()
plt.ylabel('MW')
plt.show()
Create Time Series Features
Define a function to extract calendar-based features from the DatetimeIndex.
def create_features(df, label=None):
"""
Creates time series features from a datetime index.
Args:
df (pd.DataFrame): DataFrame with a DatetimeIndex.
label (str, optional): Name of the target column. Defaults to None.
Returns:
pd.DataFrame or tuple(pd.DataFrame, pd.Series):
If label is None, returns DataFrame with features.
If label is provided, returns tuple (feature DataFrame X, target Series y).
"""
df = df.copy()
df['hour'] = df.index.hour
df['dayofweek'] = df.index.dayofweek
df['quarter'] = df.index.quarter
df['month'] = df.index.month
df['year'] = df.index.year
df['dayofyear'] = df.index.dayofyear
df['dayofmonth'] = df.index.day
df['weekofyear'] = df.index.isocalendar().week.astype(int)
X = df[['hour', 'dayofweek', 'quarter', 'month', 'year',
'dayofyear', 'dayofmonth', 'weekofyear']]
if label:
y = df[label]
return X, y
return X
TARGET = 'PJME_MW'
X_train, y_train = create_features(train_df, label=TARGET)
X_val, y_val = create_features(val_df, label=TARGET)
X_test, y_test = create_features(test_df, label=TARGET)
print(f"X_train shape: {X_train.shape}, y_train shape: {y_train.shape}")
print(f"X_val shape: {X_val.shape}, y_val shape: {y_val.shape}")
print(f"X_test shape: {X_test.shape}, y_test shape: {y_test.shape}")
X_train shape: (116292, 8), y_train shape: (116292,)
X_val shape: (14536, 8), y_val shape: (14536,)
X_test shape: (14537, 8), y_test shape: (14537,)
Train XGBoost Model
Train an XGBoost Regressor model using only the calendar features. Use the validation set for early stopping to prevent overfitting and find the optimal number of boosting rounds.
# Instantiate the XGBoost regressor
reg = xgb.XGBRegressor(
n_estimators=1000,
early_stopping_rounds=50,
learning_rate=0.01, # Lower learning rate often requires more estimators but can generalize better
objective='reg:squarederror',
n_jobs=-1,
random_state=42
)
# Fit the model
print("Training XGBoost model...")
reg.fit(
X_train, y_train,
eval_set=[(X_train, y_train), (X_val, y_val)],
verbose=100 # Print evaluation results every 100 rounds
)
print("Model training complete.")
Training XGBoost model...
[0]   validation_0-rmse:6389.23892  validation_1-rmse:6966.29632
[100] validation_0-rmse:3911.96295  validation_1-rmse:4455.85693
[200] validation_0-rmse:3241.13734  validation_1-rmse:3897.31998
[300] validation_0-rmse:2999.51806  validation_1-rmse:3792.87927
[397] validation_0-rmse:2845.39505  validation_1-rmse:3783.11235
Model training complete.
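With early stopping enabled, the fitted regressor records the round that minimized the validation metric; a quick sanity check (these attributes are populated by recent XGBoost versions when early stopping is used):
# best_iteration is the boosting round with the lowest validation RMSE;
# best_score is that RMSE value.
print(f"Best iteration: {reg.best_iteration}")
print(f"Best validation RMSE: {reg.best_score:,.2f}")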
Feature Importance Analysis (SHAP)
Use SHAP values to understand the contribution of each calendar feature to the model's predictions on the validation set.
# Explain model predictions using SHAP
print("Calculating SHAP values...")
explainer = shap.TreeExplainer(reg)
shap_values = explainer.shap_values(X_val)
print("SHAP values calculated.")
# Plot SHAP summary plot (beeswarm)
print("Generating SHAP summary plot (beeswarm)...")
shap.summary_plot(shap_values, X_val, feature_names=X_val.columns)
plt.show()
# Plot SHAP summary plot (bar)
print("Generating SHAP summary plot (bar)...")
shap.summary_plot(shap_values, X_val, feature_names=X_val.columns, plot_type="bar")
plt.show()
Calculating SHAP values...
SHAP values calculated.
Generating SHAP summary plot (beeswarm)...
Generating SHAP summary plot (bar)...
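To drill into a single feature, SHAP's dependence plot shows how the model's output varies across that feature's range; a minimal addition for hour (the coloring feature is auto-selected as the strongest interaction partner):
# SHAP dependence plot for 'hour': x-axis is the hour value, y-axis its
# SHAP value; point color shows an automatically chosen interacting feature.
shap.dependence_plot('hour', shap_values, X_val)
plt.show()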
Forecast on Test Set and Evaluate
Use the trained model to make predictions on the unseen test set. Visualize the predictions against the actual values.
# Make predictions on the test set
test_df['MW_Prediction'] = reg.predict(X_test)
# Combine train/val/test for plotting the full series
# (train_df and val_df already contain the target column, so nothing needs re-adding)
all_df = pd.concat([train_df, val_df, test_df], sort=False)
# Plot actuals vs predictions for the entire dataset (test predictions highlighted)
fig, ax = plt.subplots(figsize=(15, 6))
all_df[TARGET].plot(ax=ax, label='Actual', style='.', alpha=0.5, ms=2)
test_df['MW_Prediction'].plot(ax=ax, label='Prediction (Test Set)', style='-')
ax.set_title('Actual vs. Predicted Energy Consumption (Calendar Features Only)')
ax.set_ylabel('MW')
ax.legend()
plt.show()
Visualize Forecast for Specific Periods
Zoom in on specific weeks in the test set to better visualize the model's performance.
# Plot the forecast with the actuals for the first week of December 2016
fig, ax = plt.subplots(figsize=(15, 6))
start_date = '2016-12-05'
end_date = '2016-12-12'
test_df.loc[start_date:end_date][TARGET].plot(ax=ax, label='Actual', style='.-')
test_df.loc[start_date:end_date]['MW_Prediction'].plot(ax=ax, label='Prediction', style='-')
ax.set_title(f'Forecast vs Actuals ({start_date} to {end_date})')
ax.set_ylabel('MW')
ax.legend()
plt.show()
# Plot the forecast with the actuals for the first week of July 2017
fig, ax = plt.subplots(figsize=(15, 6))
start_date = '2017-07-01'
end_date = '2017-07-08'
test_df.loc[start_date:end_date][TARGET].plot(ax=ax, label='Actual', style='.-')
test_df.loc[start_date:end_date]['MW_Prediction'].plot(ax=ax, label='Prediction', style='-')
ax.set_title(f'Forecast vs Actuals ({start_date} to {end_date})')
ax.set_ylabel('MW')
ax.legend()
plt.show()
Test Set Error Metrics
Evaluate the model's performance on the held-out test set using standard regression metrics.
Define the MAPE function.
def mean_absolute_percentage_error(y_true, y_pred):
"""Calculates MAPE given y_true and y_pred."""
y_true, y_pred = np.array(y_true), np.array(y_pred)
mask = y_true != 0
if not np.all(mask):
print("Warning: Zero values found in y_true, MAPE calculation excludes these points.")
return np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])) * 100
Calculate and display the error metrics.
y_true_test = test_df[TARGET]
y_pred_test = test_df['MW_Prediction']
rmse = np.sqrt(mean_squared_error(y_true=y_true_test, y_pred=y_pred_test))
mae = mean_absolute_error(y_true=y_true_test, y_pred=y_pred_test)
mape = mean_absolute_percentage_error(y_true=y_true_test, y_pred=y_pred_test)
print("--- Test Set Error Metrics (Calendar Features Only) ---")
print(f"RMSE: {rmse:,.2f} MW")
print(f"MAE: {mae:,.2f} MW")
print(f"MAPE: {mape:.2f}%")
--- Test Set Error Metrics (Calendar Features Only) ---
RMSE: 3,973.63 MW
MAE: 3,137.22 MW
MAPE: 10.24%
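As a cross-check on the custom function above, scikit-learn (0.24+) provides mean_absolute_percentage_error, which returns a fraction rather than a percentage:
from sklearn.metrics import mean_absolute_percentage_error as sk_mape
# sklearn's MAPE returns a fraction (e.g., 0.1024), so scale by 100 to compare
print(f"sklearn MAPE: {sk_mape(y_true_test, y_pred_test) * 100:.2f}%")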
Error Analysis: Worst and Best Predicted Days
Worst Predicted Days
Analyze the prediction errors by day to identify systematic issues.
# Calculate error and absolute error
test_df['error'] = test_df[TARGET] - test_df['MW_Prediction']
test_df['abs_error'] = test_df['error'].abs()
# Group by date and calculate mean errors
error_by_day = test_df.groupby(test_df.index.date) \
.agg(mean_PJME_MW=(TARGET, 'mean'),
mean_MW_Prediction=('MW_Prediction', 'mean'),
mean_error=('error', 'mean'),
mean_abs_error=('abs_error', 'mean'))
# Add weekday name for context
error_by_day['weekday'] = pd.to_datetime(error_by_day.index).strftime('%A')
print("Top 10 Days with Worst Mean Absolute Error:")
error_by_day.sort_values('mean_abs_error', ascending=False).head(10)
Top 10 Days with Worst Mean Absolute Error:
| Date | mean_PJME_MW | mean_MW_Prediction | mean_error | mean_abs_error | weekday |
| --- | --- | --- | --- | --- | --- |
| 2018-01-06 | 43565.750000 | 33872.621094 | 9693.130371 | 9693.130371 | Saturday |
| 2017-02-24 | 26445.083333 | 36059.433594 | -9614.351725 | 9614.351725 | Friday |
| 2018-01-07 | 42159.708333 | 32589.742188 | 9569.965658 | 9569.965658 | Sunday |
| 2017-02-25 | 24344.458333 | 33805.746094 | -9461.285645 | 9461.285645 | Saturday |
| 2017-02-19 | 24555.500000 | 33521.769531 | -8966.270996 | 8966.270996 | Sunday |
| 2017-12-31 | 39016.000000 | 30089.259766 | 8926.738200 | 8926.738200 | Sunday |
| 2017-05-19 | 38032.583333 | 29108.976562 | 8923.606120 | 8923.606120 | Friday |
| 2017-02-20 | 27070.583333 | 35970.457031 | -8899.874837 | 8899.874837 | Monday |
| 2018-02-21 | 27572.500000 | 36334.296875 | -8761.797201 | 8761.797201 | Wednesday |
| 2017-02-23 | 27663.416667 | 36159.445312 | -8496.028239 | 8496.028239 | Thursday |
Here's a row-by-row breakdown in table form, with a quick hypothesis for each large-error day:
| Date | Weekday | MAE (MW) | Observed Avg Load (MW) | Predicted Avg Load (MW) | Hypothesized Cause | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| 2018-01-06 | Saturday | 9,693 | 43,565 | 33,873 | Mid-winter cold snap → heating demand | Temp plunged below seasonal norm; the Saturday baseline under-forecasted peak load. |
| 2017-02-24 | Friday | 9,614 | 26,445 | 36,059 | Holiday-week Friday drop | Friday of Presidents' Day week often sees lower commercial activity. |
| 2018-01-07 | Sunday | 9,570 | 42,160 | 32,590 | Cold snap carries over → heating | Another deep-freeze weekend day; the model assumes a "normal Sunday," but load spiked. |
| 2017-02-25 | Saturday | 9,461 | 24,344 | 33,806 | Holiday-week anomaly → baseline load | Saturday closing Presidents' Day week; further drop in industrial/commercial demand. |
| 2017-02-19 | Sunday | 8,966 | 24,556 | 33,522 | Presidents' Day weekend | Actual Sunday load depressed by the holiday; the model sees a typical Sunday. |
| 2017-12-31 | Sunday | 8,927 | 39,016 | 30,089 | New Year's Eve celebrations → residential | Party lighting, cooking, heating; load well above a "normal" Sunday forecast. |
| 2017-05-19 | Friday | 8,923 | 38,033 | 29,109 | Unusual weather or event → demand | Late-spring heat wave or local event; no calendar feature to capture it. |
| 2017-02-20 | Monday | 8,900 | 27,071 | 35,970 | Presidents' Day holiday | Federal holiday; Monday load down, but the model treats it as a normal Monday. |
| 2018-02-21 | Wednesday | 8,762 | 27,572 | 36,334 | Post-holiday rebound | Days after Presidents' Day; load still recovering, model on the "weekday" norm. |
| 2017-02-23 | Thursday | 8,496 | 27,663 | 36,159 | Holiday-week carryover | The Thursday after Presidents' Day sees atypical load patterns through the week. |
Key takeaway:
Because this model only sees calendar features (hour, day of week, month, and so on), it can't adapt to:
- Weather extremes (deep-freeze or heat anomalies)
- Holiday/holiday-weekend effects (industrial closures, celebration-driven spikes)
Adding external regressors (temperature or heating-degree days, holiday flags, special-event indicators) will give the model the context it needs to correct these large errors; a minimal holiday-flag sketch follows below.
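For example, pandas ships a US federal holiday calendar that can back a simple holiday flag (flagging adjacent "bridge" days is a common refinement, not shown here):
from pandas.tseries.holiday import USFederalHolidayCalendar
# Build the holiday list spanning the data and flag matching calendar dates.
cal = USFederalHolidayCalendar()
holidays = cal.holidays(start=pjme.index.min(), end=pjme.index.max())
is_holiday = pjme.index.normalize().isin(holidays)
print(f"{is_holiday.sum()} holiday hours flagged out of {len(pjme)}")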
Best Predicted DaysΒΆ
print("\nTop 10 Days with Best Mean Absolute Error:")
error_by_day.sort_values('mean_abs_error', ascending=True).head(10)
Top 10 Days with Best Mean Absolute Error:
| Date | mean_PJME_MW | mean_MW_Prediction | mean_error | mean_abs_error | weekday |
| --- | --- | --- | --- | --- | --- |
| 2016-12-30 | 32213.458333 | 32159.609375 | 53.847900 | 330.570068 | Friday |
| 2017-05-02 | 28546.916667 | 28805.884766 | -258.968343 | 404.729899 | Tuesday |
| 2016-12-12 | 32757.208333 | 32902.617188 | -145.408447 | 496.963460 | Monday |
| 2017-03-18 | 29677.083333 | 29711.373047 | -34.288981 | 506.869222 | Saturday |
| 2017-04-01 | 27047.041667 | 26972.796875 | 74.244466 | 508.144043 | Saturday |
| 2017-12-08 | 32779.291667 | 32935.121094 | -155.828532 | 562.034098 | Friday |
| 2016-12-08 | 32704.666667 | 33030.042969 | -325.375651 | 577.592773 | Thursday |
| 2017-04-07 | 28840.250000 | 29130.056641 | -289.808512 | 579.313721 | Friday |
| 2018-04-19 | 28949.958333 | 28955.976562 | -6.018066 | 588.082194 | Thursday |
| 2017-10-24 | 28657.416667 | 29212.718750 | -555.302165 | 620.493896 | Tuesday |
Plotting Example Days
Worst Predicted Day
Visualize the predictions versus actuals for the day with the highest average absolute error. This often highlights limitations of the feature set (e.g., inability to capture extreme weather or holiday effects).
# Find and plot the worst day
worst_day = error_by_day['mean_abs_error'].idxmax()
worst_day_str = worst_day.strftime('%Y-%m-%d')
fig, ax = plt.subplots(figsize=(15, 6))
test_df.loc[worst_day_str][TARGET].plot(ax=ax, label='Actual', style='.-')
test_df.loc[worst_day_str]['MW_Prediction'].plot(ax=ax, label='Prediction', style='-')
ax.set_title(f'Worst Predicted Day ({worst_day_str}) - MAE: {error_by_day.loc[worst_day, "mean_abs_error"]:.2f} MW')
ax.set_ylabel('MW')
ax.legend()
plt.show()
Best Predicted DayΒΆ
Visualize the predictions versus actuals for the day with the lowest average absolute error. These days typically represent 'normal' conditions that the calendar features capture well.
# Find and plot the best day
best_day = error_by_day['mean_abs_error'].idxmin()
best_day_str = best_day.strftime('%Y-%m-%d')
fig, ax = plt.subplots(figsize=(15, 6))
test_df.loc[best_day_str][TARGET].plot(ax=ax, label='Actual', style='.-')
test_df.loc[best_day_str]['MW_Prediction'].plot(ax=ax, label='Prediction', style='-')
ax.set_title(f'Best Predicted Day ({best_day_str}) - MAE: {error_by_day.loc[best_day, "mean_abs_error"]:.2f} MW')
ax.set_ylabel('MW')
ax.legend()
plt.show()
Conclusion & Next Steps
This model, using only calendar features, establishes a baseline performance. The significant errors on certain days (often related to weather extremes or holidays) highlight the need for additional features.
Next Steps:
- Incorporate Weather Data: Add features like temperature, humidity, and wind speed.
- Add Holiday Features: Explicitly flag holidays and potentially the days surrounding them.
- Consider Lagged Features: Include past energy consumption values as features (a minimal sketch follows below).
These additions, explored in the subsequent notebook (time-series-forecasting-with-xgboost-2.ipynb), significantly improve model accuracy by providing context beyond simple calendar cycles.
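As a preview of the lagged-feature idea, a minimal sketch (leakage-safe because shift() only looks backward in time; it assumes a regular hourly index):
# Add the load from 24 hours and one week (168 hours) earlier as
# autoregressive context; the first rows have no history and become NaN.
lagged = pjme.copy()
lagged['lag_24h'] = lagged['PJME_MW'].shift(24)
lagged['lag_168h'] = lagged['PJME_MW'].shift(168)
lagged = lagged.dropna()
print(lagged.head(3))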