Budget Allocation with PyMC-Marketing#

This notebook explores the budget allocation functionality recently added to the PyMC-Marketing library. The approach is inspired by Bolt’s article, “Budgeting with Bayesian Models”.

Prerequisite Knowledge#

The notebook assumes familiarity with the core functionality of PyMC-Marketing. If you are new to the library, the “MMM Example Notebook” is a good starting point and offers a comprehensive introduction to media mix models.

Introducing the budget allocator#

This notebook examines the budget allocation function in the PyMC-Marketing library, which uses Bayesian models to address the problem of distributing a marketing budget across channels. The function is intended to provide:

Quantitative measures of the effectiveness of different media channels.
Probabilistic ROI estimates under a range of budget scenarios.

Basic Setup#

Like previous PyMC-Marketing notebooks, this one relies on a specific set of libraries. The imports below are required to run the code that follows.

import warnings

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr

from pymc_marketing.mmm.builders.yaml import build_mmm_from_yaml
from pymc_marketing.mmm.multidimensional import (
    MultiDimensionalBudgetOptimizerWrapper,
)
from pymc_marketing.paths import data_dir

warnings.filterwarnings("ignore")

az.style.use("arviz-darkgrid")
plt.rcParams["figure.figsize"] = [12, 7]
plt.rcParams["figure.dpi"] = 100

%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = "retina"

OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
/Users/carlostrujillo/Documents/GitHub/pymc-marketing/pymc_marketing/mmm/multidimensional.py:72: FutureWarning: This functionality is experimental and subject to change. If you encounter any issues or have suggestions, please raise them at: https://github.com/pymc-labs/pymc-marketing/issues/new
  warnings.warn(warning_msg, FutureWarning, stacklevel=1)
/var/folders/f0/rbz8xs8s17n3k3f_ccp31bvh0000gn/T/ipykernel_38045/3141621575.py:9: UserWarning: The pymc_marketing.mmm.builders module is experimental and its API may change without warning.
  from pymc_marketing.mmm.builders.yaml import build_mmm_from_yaml

These imports and configurations form the fundamental setup necessary for the entire span of this notebook.

We assume that a model has already been trained with the functionality available in earlier versions of the PyMC-Marketing library. Data generation and training are therefore covered in a separate notebook rather than repeated here. Readers unfamiliar with these steps should refer to the “MMM Example Notebook.”

Loading a Pre-Trained Model#

To use a saved model, load it into a new instance of the MMM class with the build_mmm_from_yaml function, as shown below.

seed: int = sum(map(ord, "mmm_allocation_example"))
rng: np.random.Generator = np.random.default_rng(seed=seed)

data_path = data_dir / "multidimensional_mock_data.csv"
data_df = pd.read_csv(data_path, parse_dates=["date"], index_col=0)
data_df.head()

	date	y	x1	dayofyear	t	geo
0	2018-04-02	3984.662237	159.290009	92	0	geo_a
1	2018-04-09	3762.871794	56.194238	99	1	geo_a
2	2018-04-16	4466.967388	146.200133	106	2	geo_a
3	2018-04-23	3864.219373	35.699276	113	3	geo_a
4	2018-04-30	4441.625278	193.372577	120	4	geo_a

x_train = data_df.drop(columns=["y"])
y_train = data_df["y"]

mmm = build_mmm_from_yaml(
    X=x_train,
    y=y_train,
    config_path=data_dir / "config_files" / "multi_dimensional_example_model.yml",
)

For more details on build_mmm_from_yaml, consult the PyMC-Marketing documentation on Model Deployment.

Alternatively, load a model that has been saved to MLflow via pymc_marketing.mlflow.log_inference_data or has been autologged to MLflow via pymc_marketing.mlflow.autolog(log_mmm=True), from the PyMC-Marketing MLflow module.

## If you have a hosted MLflow server, you will of course need to authenticate first.
# RUN_ID = "your_run_id"
# from pymc_marketing.mlflow import load_mmm
# mmm = load_mmm(RUN_ID)

# # Load the full model with the InferenceData
# mmm = load_mmm(
#     run_id=RUN_ID,         # The MLflow run ID from which to load the model
#     full_model=True,       # Set to True to get the full MMM model with InferenceData
#     keep_idata=True,       # Set to True if you want to keep the downloaded InferenceData saved locally
# )

Problem Statement#

Before jumping into the data, let’s define the business problem we are trying to solve. In an increasingly competitive market, marketers must distribute a fixed marketing budget across several channels to maximize a chosen response. Consider an upcoming quarter in which a marketing team must decide how to split its budget between two advertising channels, x1 and x2. These could stand for any medium, such as TV, digital advertising, or print.

The challenge is to make decisions that are grounded in data and evidence while still respecting business logic. For instance, how can one incorporate prior information such as budget restrictions, platform trends, other constraints, or the distinctive features of each channel into the decision-making process?

Introducing Budget Allocation Function#

The budget allocation capabilities in PyMC-Marketing aim to tackle this problem by offering a Bayesian framework for optimal allocation. This enables marketers to:

Integrate the outcomes of Media Mix Modeling (MMM), quantifying each channel’s effectiveness in metrics like ROI, incremental sales, etc.
Merge this empirical data with prior business knowledge and logic for making holistic and robust decisions.

With this function, marketers can produce a budget split that is grounded in the MMM results and also reflects business-specific factors, yielding a balanced and optimized budget plan.
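To make the idea concrete, here is a minimal sketch of a constrained two-channel allocation. It is not the PyMC-Marketing optimizer: it uses a plain grid search, a single set of point-estimate parameters, and only the standard library. The function names, the `params` values, and the `bounds` are hypothetical and chosen only for illustration; spend is expressed in scaled units.

```python
import math


def logistic_saturation(x: float, lam: float) -> float:
    """Logistic saturation: 0 at x = 0, approaching 1 as x grows."""
    return (1 - math.exp(-lam * x)) / (1 + math.exp(-lam * x))


def allocate_two_channels(total, params, bounds, steps=1000):
    """Grid-search the split of `total` between channels x1 and x2.

    params: {"x1": (beta, lam), "x2": (beta, lam)}  response parameters
    bounds: {"x1": (low, high), "x2": (low, high)}  business constraints
    """
    best_split, best_response = None, -math.inf
    for i in range(steps + 1):
        x1 = total * i / steps
        spend = {"x1": x1, "x2": total - x1}
        # Skip splits that violate any per-channel bound.
        if not all(bounds[c][0] <= spend[c] <= bounds[c][1] for c in spend):
            continue
        response = sum(
            params[c][0] * logistic_saturation(spend[c], params[c][1])
            for c in spend
        )
        if response > best_response:
            best_split, best_response = spend, response
    return best_split, best_response


# Hypothetical inputs in scaled units (not taken from the fitted model).
params = {"x1": (0.4, 4.0), "x2": (0.3, 3.0)}
bounds = {"x1": (0.1, 0.8), "x2": (0.1, 0.8)}
split, response = allocate_two_channels(total=1.0, params=params, bounds=bounds)
print(split, round(response, 4))
```

The bounds play the role of prior business knowledge (for example, a minimum presence on each channel), while the response curves play the role of the MMM results. A Bayesian allocator would evaluate the response over posterior draws rather than a single parameter set.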

Getting started#

Media Mix Modeling (MMM) is a dependable way to estimate the contribution of each channel (e.g., x1, x2) to a target variable such as sales.

The saturation_scatterplot() function visualizes this direct channel impact. Keep in mind, however, that it only shows the “observable space” of X (spend) and Y (contribution) values.

mmm.plot.saturation_scatterplot(original_scale=True);

[Figure: saturation scatterplot of spend versus contribution for each channel]

The observable space only covers our data points and says nothing about what happens beyond them. As a result, there is no guarantee that the maximum contribution point for each channel lies within this observable range.

To visualize the response at a given level of spend, we can use sample_curve to estimate the response in scaled space, given a maximum value of X that is also in scaled space. In the example below we use the value 3, which represents 3x the maximum historical value of each channel. Depending on your scaling method, max_value may mean something different.

Then, with the saturation_curves function, we can plot the shape of the fitted curve at spend levels that were not previously observed.

curve = mmm.saturation.sample_curve(
    mmm.idata.posterior[["saturation_beta", "saturation_lam"]], max_value=3
)
fig, axes = mmm.plot.saturation_curves(
    curve,
    original_scale=True,
    n_samples=10,
    hdi_probs=0.85,
    random_seed=rng,
    subplot_kwargs={"figsize": (12, 8), "ncols": 2},
    rc_params={
        "xtick.labelsize": 10,
        "ytick.labelsize": 10,
        "axes.labelsize": 10,
        "axes.titlesize": 10,
    },
)

for ax in axes.ravel():
    ax.title.set_fontsize(10)

if fig._suptitle is not None:
    fig._suptitle.set_fontsize(12)

plt.tight_layout()
plt.show()

Sampling: []

[Figure: sampled saturation curves for each channel with HDI bands, extended beyond the observed spend range]
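What max_value=3 means in original units depends on how spend was scaled. The sketch below assumes max-abs scaling (each channel divided by its largest absolute value), which is an assumption about the model configuration rather than something shown in this notebook. It uses only the five x1 values visible in data_df.head(), so the resulting number is illustrative; the real peak would be taken over the full history.

```python
# First five x1 spends for geo_a, copied from data_df.head() above.
x1_spend = [159.290009, 56.194238, 146.200133, 35.699276, 193.372577]


def max_abs_scale(values):
    """Divide by the largest absolute value so the series peaks at 1.0."""
    peak = max(abs(v) for v in values)
    return [v / peak for v in values], peak


scaled, peak = max_abs_scale(x1_spend)

max_value = 3  # same value passed to sample_curve
upper_spend = max_value * peak  # original-scale spend at the curve's right edge
print(round(max(scaled), 2), round(upper_spend, 2))
```

Under a different scaler, the same max_value would map to a different spend level, so it is worth checking the scaling before reading the x-axis of the extended curves.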

We can identify which saturation function was used in the pre-trained model:

print(f"Model was trained using the {mmm.saturation.__class__.__name__} function")
print(f"and the {mmm.adstock.__class__.__name__} function")

Model was trained using the LogisticSaturation function
and the GeometricAdstock function
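As a rough illustration of what a geometric adstock does, the sketch below carries past spend forward with weights alpha**lag, truncated at a maximum lag. It is a standalone approximation, not the library's implementation; the `l_max=8` default and the `normalize` flag here are assumptions made for the example.

```python
def geometric_adstock(spend, alpha, l_max=8, normalize=True):
    """Carry over past spend with geometrically decaying weights alpha**lag."""
    weights = [alpha**lag for lag in range(l_max)]
    if normalize:
        total = sum(weights)
        weights = [w / total for w in weights]
    adstocked = []
    for t in range(len(spend)):
        # Sum current and lagged spend, truncated at l_max lags.
        value = sum(
            weights[lag] * spend[t - lag] for lag in range(min(l_max, t + 1))
        )
        adstocked.append(value)
    return adstocked


# A single burst of spend followed by silence, with a hypothetical alpha.
pulse = [100.0, 0.0, 0.0, 0.0, 0.0]
carryover = geometric_adstock(pulse, alpha=0.395, normalize=False)
print([round(v, 2) for v in carryover])
```

A larger alpha means the effect of a given week's spend fades more slowly, which matters for allocation because part of the response to today's budget arrives in later periods.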

PyMC-Marketing offers several saturation functions; you can find them all in the transformer module.

Once these parameters are obtained, you can summarize them with the arviz.summary function (each parameter carries the prefix saturation or adstock, respectively) and, if desired, recreate the curve for each channel independently. More importantly, these parameter values are indispensable to the budget_allocator function, which uses them to optimize your marketing budget across channels. This section is fundamental to budget optimization.

az.summary(
    data=mmm.fit_result,
    var_names=[
        "saturation_beta",
        "saturation_lam",
        "adstock_alpha",
    ],
)

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
saturation_beta[x1]	0.370	0.021	0.332	0.409	0.001	0.001	565.0	537.0	1.01
saturation_beta[x2]	0.269	0.061	0.182	0.384	0.003	0.003	552.0	499.0	1.01
saturation_lam[x1]	4.025	0.409	3.242	4.756	0.016	0.013	625.0	490.0	1.00
saturation_lam[x2]	2.815	1.091	1.139	4.744	0.053	0.060	492.0	430.0	1.01
adstock_alpha[x1]	0.395	0.032	0.331	0.453	0.001	0.001	728.0	555.0	1.00
adstock_alpha[x2]	0.178	0.043	0.104	0.266	0.002	0.001	624.0	325.0	1.01
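The curves can be recreated by hand from the summary above. The sketch below plugs the posterior means into a logistic saturation scaled by beta, assuming the contribution takes the form beta * (1 - exp(-lam * x)) / (1 + exp(-lam * x)) with x in scaled spend units. Plugging in means ignores posterior uncertainty, so treat the result as a rough center line rather than the model's prediction.

```python
import math

# Posterior means copied from the summary table above.
posterior_means = {
    "x1": {"beta": 0.370, "lam": 4.025},
    "x2": {"beta": 0.269, "lam": 2.815},
}


def saturation_contribution(x, beta, lam):
    """beta * logistic saturation, with x in the model's scaled spend units."""
    return beta * (1 - math.exp(-lam * x)) / (1 + math.exp(-lam * x))


# Evaluate each channel's curve on a grid from 0 to 3 (scaled units).
grid = [i * 0.5 for i in range(7)]
curves = {
    channel: [saturation_contribution(x, p["beta"], p["lam"]) for x in grid]
    for channel, p in posterior_means.items()
}
for channel, curve in curves.items():
    print(channel, [round(v, 3) for v in curve])
```

Each curve starts at zero and flattens toward its beta, so beta acts as the ceiling of a channel's contribution and lam controls how quickly that ceiling is approached.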

Other methods to explore#

The current optimization uses the full posterior, so it can be used for more than simple minimization or maximization: it can take all of the available information into account to perform risk assessments. For details, see Risk Allocation for Media Mix Models. This can be a powerful approach, as described in the blog post “Using bayesian decision making to optimize supply chains”.

The current methodology is similar to those used in other libraries, such as Robyn from Meta and LightweightMMM from Google. You can explore these solutions and compare them if needed.
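A small sketch of why a distribution of outcomes is useful for risk assessment: two candidate allocations are compared on both their mean response and a pessimistic quantile. The draws here are not the model's posterior. They are independent normal approximations built from the means and standard deviations in the summary table, clipped to stay positive, which ignores correlations between parameters. The candidate allocations are hypothetical and in scaled units.

```python
import math
import random
import statistics

random.seed(0)

# (mean, sd) per parameter, copied from the summary table above.
summary = {
    "x1": {"beta": (0.370, 0.021), "lam": (4.025, 0.409)},
    "x2": {"beta": (0.269, 0.061), "lam": (2.815, 1.091)},
}


def draw_params():
    """One approximate draw: independent normals, clipped to stay positive."""
    return {
        ch: {name: max(random.gauss(mu, sd), 1e-6) for name, (mu, sd) in p.items()}
        for ch, p in summary.items()
    }


def total_response(allocation, params):
    """Sum of beta * logistic saturation over channels, in scaled units."""
    total = 0.0
    for ch, x in allocation.items():
        beta, lam = params[ch]["beta"], params[ch]["lam"]
        total += beta * (1 - math.exp(-lam * x)) / (1 + math.exp(-lam * x))
    return total


draws = [draw_params() for _ in range(2000)]
candidates = {
    "favor_x1": {"x1": 0.7, "x2": 0.3},
    "favor_x2": {"x1": 0.3, "x2": 0.7},
}
report = {}
for name, allocation in candidates.items():
    responses = sorted(total_response(allocation, d) for d in draws)
    report[name] = {
        "mean": statistics.fmean(responses),
        "p05": responses[int(0.05 * len(responses))],  # pessimistic outcome
    }
    print(name, {k: round(v, 3) for k, v in report[name].items()})
```

With real posterior draws in place of the normal approximation, the same comparison lets a decision-maker prefer an allocation with a slightly lower mean but a better worst case.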

Conclusion#

The MMM models and methodologies used here are designed to bridge the gap between theoretical rigor and actionable marketing insights. They represent a significant stride towards a more data-driven, analytical approach to marketing budget allocation, which could change how organizations invest in customer acquisition and retention.

Your engagement, feedback, and thoughts are therefore not merely welcome but actively solicited, to make this tool as practical and widely applicable as possible.

%load_ext watermark
%watermark -n -u -v -iv -w -p pytensor

Last updated: Sat Jul 26 2025

Python implementation: CPython
Python version       : 3.12.11
IPython version      : 9.4.0

pytensor: 2.31.7

matplotlib    : 3.10.3
xarray        : 2025.7.1
pandas        : 2.3.1
pytensor      : 2.31.7
pymc_marketing: 0.15.1
numpy         : 2.2.6
arviz         : 0.22.0

Watermark: 2.5.0