CLV Quickstart#

Customer Lifetime Value (CLV) is a measure of a customer’s total contribution to a business over the course of their relationship. This metric informs spending levels on new customer acquisition, retention, and other marketing and sales efforts, so reliable estimation is essential.

PyMC-Marketing provides tools to segment customers based on their past behavior (see RFM Segmentation), as well as the following Buy Till You Die (BTYD) probabilistic models to predict future behavior:

  • BG/NBD model for continuous time, non-contractual modeling

  • Pareto/NBD model for continuous time, non-contractual modeling with covariates

  • Shifted BG model for discrete time, contractual modeling

  • BG/BB model for discrete time, non-contractual modeling

  • Exponential Gamma model for continuous time, contractual modeling (coming soon)

  • Gamma-Gamma model for expected monetary value

  • Modified BG/NBD model, similar to the BG/NBD model, but allows non-repeat customers to churn (the standard BG/NBD assumes they are still active).
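For reference, these map onto the following classes in the clv module imported below (a hedged mapping; exact class names may vary across pymc-marketing versions):

clv.BetaGeoModel                   # BG/NBD
clv.ParetoNBDModel                 # Pareto/NBD, with covariate support
clv.ShiftedBetaGeoModelIndividual  # Shifted BG
clv.BetaGeoBetaBinomModel          # BG/BB
clv.GammaGammaModel                # Gamma-Gamma
clv.ModifiedBetaGeoModel           # MBG/NBD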

This table contains a breakdown of the four BTYD modeling domains, with examples of each:

|            | Non-contractual          | Contractual             |
|------------|--------------------------|-------------------------|
| Continuous | online purchases         | ad conversion time      |
| Discrete   | concerts & sports events | recurring subscriptions |

In this notebook we will demonstrate how to estimate future purchasing activity and CLV with the CDNOW dataset, a popular benchmarking dataset in CLV and BTYD research. Data is available here, with additional details here.

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from arviz.labels import MapLabeller
from pymc_extras.prior import Prior

from pymc_marketing import clv
az.style.use("arviz-darkgrid")

%config InlineBackend.figure_format = "retina" # nice looking plots

Data Requirements#

For all models, the following nomenclature is used:

  • customer_id represents a unique identifier for each customer.

  • frequency represents the number of repeat purchases that a customer has made, i.e. one less than the total number of purchases.

  • T represents a customer’s “age”, i.e. the duration between a customer’s first purchase and the end of the period of study. In this example notebook, the units of time are in weeks.

  • recency represents a customer’s “age” at the time of their most recent purchase, i.e. the duration between a customer’s first and last purchase. If a customer has made only 1 purchase, their recency is 0.

  • monetary_value represents the average value of a given customer’s repeat purchases. Customers who have only made a single purchase have monetary values of zero.
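To make these definitions concrete, here is a minimal hand computation on a toy purchase history (invented numbers, not from the CDNOW data; rfm_summary additionally merges purchases falling in the same time period, which this sketch ignores):

# Toy history: purchases in weeks 0, 3, and 7, observed through week 10
purchase_weeks = [0, 3, 7]
spends = [20.0, 10.0, 30.0]
period_end = 10

frequency = len(purchase_weeks) - 1               # 2 repeat purchases
recency = purchase_weeks[-1] - purchase_weeks[0]  # 7 weeks
T = period_end - purchase_weeks[0]                # 10 weeks
monetary_value = sum(spends[1:]) / frequency      # mean repeat spend: 20.0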

The rfm_summary function can be used to preprocess raw transaction data for modeling:

raw_trans = pd.read_csv(
    "https://raw.githubusercontent.com/pymc-labs/pymc-marketing/main/data/cdnow_transactions.csv"
)

raw_trans.head(5)
_id id date cds_bought spent
0 4 1 19970101 2 29.33
1 4 1 19970118 2 29.73
2 4 1 19970802 1 14.96
3 4 1 19971212 2 26.48
4 21 2 19970101 3 63.34
rfm_data = clv.utils.rfm_summary(
    raw_trans,
    customer_id_col="id",
    datetime_col="date",
    monetary_value_col="spent",
    datetime_format="%Y%m%d",
    time_unit="W",
)

rfm_data
customer_id frequency recency T monetary_value
0 1 3.0 49.0 78.0 23.723333
1 2 1.0 2.0 78.0 11.770000
2 3 0.0 0.0 78.0 0.000000
3 4 0.0 0.0 78.0 0.000000
4 5 0.0 0.0 78.0 0.000000
... ... ... ... ... ...
2352 2353 2.0 53.0 66.0 19.775000
2353 2354 5.0 24.0 66.0 44.928000
2354 2355 1.0 44.0 66.0 24.600000
2355 2356 6.0 62.0 66.0 31.871667
2356 2357 0.0 0.0 66.0 0.000000

2357 rows × 5 columns

It is important to note these definitions differ from those used in RFM segmentation, where the first purchase is included, T is not used, and recency is the number of time periods since a customer’s most recent purchase.

To visualize data in RFM format, we can plot the recency and T of the customers with the plot_customer_exposure function. We can see that a large share (>60%) of customers haven’t made a repeat purchase in a while.

fig, ax = plt.subplots(figsize=(10, 5))
(
    rfm_data.sample(n=100, random_state=42)
    .sort_values(["recency", "T"])
    .pipe(clv.plot_customer_exposure, ax=ax, linewidth=0.5, size=0.75)
);
../../_images/7ed7ba687ead75818776261d5dbe41858cb228992e462e0b38d90fac0ed152f0.png

Predicting Future Purchasing Behavior with the BG/NBD Model#

This dataset is an example of continuous time, non-contractual transactions because it comprises purchases from an online music store. PyMC-Marketing provides several models for this use case.

We will be using the BG/NBD model in this notebook because it works well for basic use cases. For more comprehensive modeling, the Pareto/NBD and MBG/NBD models offer expanded functionality and fewer limitations.

Let’s take a look at the underlying structure of the BG/NBD model:

bgm = clv.BetaGeoModel(data=rfm_data)

bgm.build_model()
bgm.graphviz()
../../_images/ba760052a47a6bf6f8ec11769ffc524137e54b9d4db3054a74e116d8c2e7e92a.svg

The default priors for the r and alpha purchase rate parameters follow a HalfFlat distribution, which is an uninformative positive distribution. The a and b dropout parameters are pooled with the hierarchical kappa_dropout and phi_dropout priors for improved performance.
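To see the full set of defaults for a given model, its configuration can be inspected directly (a quick check; the default_model_config attribute name is assumed from the pymc-marketing model-builder interface):

bgm.default_model_config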

More informative priors can be specified in a custom model configuration by passing a dictionary with parameter names as keys, and distributions defined with the Prior class as values.

model_config = {
    "a": Prior("HalfNormal", sigma=10),
    "b": Prior("HalfNormal", sigma=10),
    "alpha": Prior("HalfNormal", sigma=10),
    "r": Prior("HalfNormal", sigma=10),
}
bgm = clv.BetaGeoModel(
    data=rfm_data,
    model_config=model_config,
)
bgm.build_model()
bgm
BG/NBD
            alpha ~ HalfNormal(0, 10)
                a ~ HalfNormal(0, 10)
                b ~ HalfNormal(0, 10)
                r ~ HalfNormal(0, 10)
recency_frequency ~ BetaGeoNBD(a, b, r, alpha, <constant>)
bgm.graphviz()
../../_images/763e7af5d041cea21686f5df79e1c3e9edbf0873d928664851981143b5c04102.svg

Having specified the model, we can now fit it.

Model Fitting with MAP#

By default, fit generates full Bayesian posteriors via MCMC sampling. For extremely large datasets where uncertainty estimates are not needed and/or MCMC is too slow, maximum a posteriori (MAP) estimation can be used to obtain point estimates of the model parameters.

Use rfm_train_test_split for train/test splits. Here we are training on 52 weeks of data, and withholding the remaining 26 weeks for testing:

rfm_train_test_data = clv.utils.rfm_train_test_split(
    raw_trans,
    customer_id_col="id",
    datetime_col="date",
    monetary_value_col="spent",
    train_period_end="19980101",
    datetime_format="%Y%m%d",
    time_unit="W",
)

rfm_train_test_data
customer_id frequency recency T monetary_value test_frequency test_monetary_value test_T
0 1 3.0 49.0 52.0 23.723333 0.0 0.000 26.0
1 2 1.0 2.0 52.0 11.770000 0.0 0.000 26.0
2 3 0.0 0.0 52.0 0.000000 0.0 0.000 26.0
3 4 0.0 0.0 52.0 0.000000 0.0 0.000 26.0
4 5 0.0 0.0 52.0 0.000000 0.0 0.000 26.0
... ... ... ... ... ... ... ... ...
2352 2353 0.0 0.0 40.0 0.000000 2.0 19.775 26.0
2353 2354 5.0 24.0 40.0 44.928000 0.0 0.000 26.0
2354 2355 0.0 0.0 40.0 0.000000 1.0 24.600 26.0
2355 2356 4.0 26.0 40.0 33.317500 2.0 28.980 26.0
2356 2357 0.0 0.0 40.0 0.000000 0.0 0.000 26.0

2357 rows × 8 columns

Note that any customers who made their first purchase during the test period are excluded. Use rfm_summary to retain all customers when fitting the final model.

bgm_map = clv.BetaGeoModel(data=rfm_train_test_data)

bgm_map.fit(method="map")
bgm_map.fit_summary()

alpha            6.754
phi_dropout      0.219
kappa_dropout    2.171
r                0.283
a                0.475
b                1.696
Name: value, dtype: float64

Let’s evaluate model performance by tracking predictions against historical purchases.

clv.plot_expected_purchases_over_time(
    model=bgm_map,
    purchase_history=raw_trans,
    datetime_col="date",
    customer_id_col="id",
    datetime_format="%Y%m%d",
    time_unit="W",
    t=78,
    set_index_date=True,
    t_start_eval=52,
);
../../_images/ea7600ff6286f9b9c3109976483a458ca90a5850085da7d4b226a15cf3654fa5.png

Looks like MAP is over-predicting the number of purchases during the testing period, and these predictions also don’t have any uncertainty estimates.

Model Fitting with MCMC#

MCMC sampling is recommended to capture uncertainty in parameter estimates and create credible intervals around predictions.

bgm.fit()
bgm.fit_summary()
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha, a, b, r]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 7 seconds.
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
alpha 7.092 0.509 6.106 8.020 0.012 0.009 1727.0 2042.0 1.0
a 0.687 0.156 0.431 0.972 0.004 0.004 1667.0 1616.0 1.0
b 3.259 0.977 1.763 5.103 0.025 0.030 1624.0 1739.0 1.0
r 0.276 0.012 0.254 0.299 0.000 0.000 1558.0 2098.0 1.0

We can use ArviZ, a Python library tailored to produce visualizations for Bayesian models, to plot the posterior distribution of each parameter.

az.plot_posterior(bgm.fit_result);
../../_images/dd21c8b39b4b2ab544e21a565f7484fe8a931872de57a747f816d01d54055976.png

Visualizing Predictions over Time#

Looking at the time plots again, we can see results track much more closely when fit with MCMC:

clv.plot_expected_purchases_over_time(
    model=bgm,
    purchase_history=raw_trans,
    datetime_col="date",
    customer_id_col="id",
    datetime_format="%Y%m%d",
    time_unit="W",
    t=78,
);
../../_images/05ea83671616b29a3a39b6ba0ef64f544fbdba4881cc0fd9c4f82cc927340f37.png

Purchases can be plotted cumulatively as well as incrementally:

clv.plot_expected_purchases_over_time(
    model=bgm,
    purchase_history=raw_trans,
    datetime_col="date",
    customer_id_col="id",
    datetime_format="%Y%m%d",
    time_unit="W",
    t=78,
    set_index_date=True,
    plot_cumulative=False,
);
../../_images/3eda80ef48fdd982d7194682f1fee05f818ad692cc752bed793fa5adcf746d3e.png

Note how the model is able to respond to trend changes over time.

Visualizing Prediction Matrices#

clv.plot_frequency_recency_matrix(bgm);
../../_images/c6d13b032fc1545f829cb4852af5cdaec3178c1ea215a007ead40ad84854894c.png

We can see our best customers have been active for over 60 weeks and have made over 20 purchases (bottom-right). Note the “tail” sweeping up towards the upper-left corner - these customers are infrequent and/or may not have purchased recently. What is the probability they are still active?

clv.plot_probability_alive_matrix(bgm)
<Axes: title={'center': 'Probability Customer is Alive,\nby Frequency and Recency of a Customer'}, xlabel="Customer's Historical Frequency", ylabel="Customer's Recency">
../../_images/6f092e8738883a7adacfefda2808725e16719bc1570f9919aed04a690dcf64f3.png

Note that all non-repeat customers have an alive probability of 1, which is one of the quirks of BetaGeoModel. In many use cases this is still a valid assumption, but if non-repeat customers are a key focus in your use case, you may want to try ParetoNBDModel instead.

Looking at the probability alive matrix, we can infer that customers who purchased frequently in the past but not recently are unlikely to still be active, and may be worth targeting for retention.
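If non-repeat customers are central to your analysis, the Pareto/NBD model can be swapped in with the same RFM input. A minimal sketch, assuming its methods mirror the BG/NBD API used in this notebook:

pnbd = clv.ParetoNBDModel(data=rfm_data)
pnbd.build_model()
pnbd.fit()  # full MCMC sampling, as above
p_alive_pnbd = pnbd.expected_probability_alive(data=rfm_data)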

Ranking customers from best to worst#

Having fit the model, we can ask what the expected number of purchases is for our customers over the next 10 time periods. Let’s look at the four most promising customers.

num_purchases = bgm.expected_purchases(future_t=10)

sdata = rfm_data.copy()
sdata["expected_purchases"] = num_purchases.mean(("chain", "draw")).values
sdata.sort_values(by="expected_purchases").tail(4)
customer_id frequency recency T monetary_value expected_purchases
812 813 30.0 72.0 74.0 35.654000 3.442797
1202 1203 32.0 71.0 72.0 47.172187 3.813601
156 157 36.0 74.0 77.0 30.603611 3.900803
1980 1981 35.0 66.0 68.0 46.748857 4.307768

We can also plot credibility intervals for the expected purchases:

ids = [813, 1203, 157, 1981]
ax = az.plot_posterior(num_purchases.sel(customer_id=ids), grid=(2, 2))
for axi, id in zip(ax.ravel(), ids, strict=False):
    axi.set_title(f"Customer: {id}", size=20)
plt.suptitle("Expected Number of Purchases over 10 Time Periods", fontsize=28, y=1.05);
../../_images/99f9c33bce771f00448c1eeb40153bb45fdec9034d94bbaf880e05b5e4dc4efa.png

Predicting purchase behavior of a new customer#

We can use the fitted model to predict the number of purchases for a brand-new customer.

az.plot_posterior(bgm.expected_purchases_new_customer(t=10).sel(customer_id=1))
plt.title("Expected purchases of a new customer in the first 10 periods");
../../_images/6a0e6ee8f8e2339fe14f60e219badc40ae6191487bafe8a6801d889414fec4a3.png

Customer Probability Histories#

Given a customer’s transaction history, we can calculate their historical probability of being alive according to our trained model.

Let’s look at active customer 1516 and assess how the probability that they will ever return changes if they make no further purchases over the next 9 time periods.

customer_1516 = rfm_data.loc[1515]
customer_1516
customer_id       1516.000000
frequency           27.000000
recency             67.000000
T                   70.000000
monetary_value      51.944074
Name: 1515, dtype: float64
customer_1516_history = pd.DataFrame(
    dict(
        customer_id=np.arange(10),
        frequency=np.full(10, customer_1516["frequency"], dtype="int"),
        recency=np.full(10, customer_1516["recency"]),
        T=(np.arange(0, 10) + customer_1516["recency"]).astype("int"),
    )
)
customer_1516_history
customer_id frequency recency T
0 0 27 67.0 67
1 1 27 67.0 68
2 2 27 67.0 69
3 3 27 67.0 70
4 4 27 67.0 71
5 5 27 67.0 72
6 6 27 67.0 73
7 7 27 67.0 74
8 8 27 67.0 75
9 9 27 67.0 76
p_alive = bgm.expected_probability_alive(data=customer_1516_history)
az.plot_hdi(customer_1516_history["T"], p_alive, color="C0")
plt.plot(customer_1516_history["T"], p_alive.mean(("draw", "chain")), marker="o")
plt.axvline(
    customer_1516_history["recency"].iloc[0], c="black", ls="--", label="Purchase"
)

plt.title("Probability Customer 1516 will purchase again")
plt.xlabel("T")
plt.ylabel("p")
plt.legend();
../../_images/8547975dade5c085ea990d42ed484b0ed57c88c25c2cda91d8a404bc83b5f03a.png

We can see that, if no purchases are made in the next 9 weeks, the model has low confidence that the customer will ever return. What if they had made one purchase in between?

customer_1516_history.loc[7:, "frequency"] += 1
customer_1516_history.loc[7:, "recency"] = customer_1516_history.loc[7, "T"] - 0.5
customer_1516_history
customer_id frequency recency T
0 0 27 67.0 67
1 1 27 67.0 68
2 2 27 67.0 69
3 3 27 67.0 70
4 4 27 67.0 71
5 5 27 67.0 72
6 6 27 67.0 73
7 7 28 73.5 74
8 8 28 73.5 75
9 9 28 73.5 76
p_alive = bgm.expected_probability_alive(data=customer_1516_history)
az.plot_hdi(customer_1516_history["T"], p_alive, color="C0")
plt.plot(customer_1516_history["T"], p_alive.mean(("draw", "chain")), marker="o")
plt.axvline(
    customer_1516_history["recency"].iloc[0], c="black", ls="--", label="Purchase"
)
plt.axvline(customer_1516_history["recency"].iloc[-1], c="black", ls="--")

plt.title("Probability Customer 1516 will purchase again")
plt.xlabel("T")
plt.ylabel("p")
plt.legend();
../../_images/2653fba41c4261250645b44674ea95cd6864e9ecfa82932dbd65eb90bb02a830.png

In the plot above, customer 1516 makes a purchase at week 73.5, just over 6 weeks after their last recorded purchase. We can see that the probability of the customer returning quickly goes back up!

Estimating Customer Lifetime Value Using the Gamma-Gamma Model#

Until now we’ve focused mainly on transaction frequencies and probabilities, but to estimate economic value we can use the Gamma-Gamma model.

The Gamma-Gamma model assumes at least one repeat transaction has been observed per customer, so we filter out customers with zero repeat purchases.

nonzero_data = rfm_data.query("frequency>0")
nonzero_data
customer_id frequency recency T monetary_value
0 1 3.0 49.0 78.0 23.723333
1 2 1.0 2.0 78.0 11.770000
5 6 14.0 76.0 78.0 76.503571
6 7 1.0 5.0 78.0 11.770000
7 8 1.0 61.0 78.0 26.760000
... ... ... ... ... ...
2351 2352 1.0 47.0 66.0 14.490000
2352 2353 2.0 53.0 66.0 19.775000
2353 2354 5.0 24.0 66.0 44.928000
2354 2355 1.0 44.0 66.0 24.600000
2355 2356 6.0 62.0 66.0 31.871667

1126 rows × 5 columns

If computing the monetary value from your own data, note that it is the mean of a given customer’s purchase values, not the sum. monetary_value can represent profit, revenue, or any other value, as long as it is calculated consistently for each customer.
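A minimal sketch of computing monetary_value by hand from the raw transactions (this ignores the merging of same-period purchases that rfm_summary performs, so the numbers will not match exactly):

def mean_repeat_spend(spend):
    # drop the first purchase and average the rest; zero if no repeats
    repeats = spend.iloc[1:]
    return repeats.mean() if len(repeats) > 0 else 0.0

manual_mv = (
    raw_trans.sort_values("date")
    .groupby("id")["spent"]
    .apply(mean_repeat_spend)
)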

The Gamma-Gamma model relies upon the important assumption that there is no relationship between monetary value and purchase frequency. In practice we need to check that the Pearson correlation between the two is weak, typically less than 0.3:

nonzero_data[["monetary_value", "frequency"]].corr()
monetary_value frequency
monetary_value 1.000000 0.052819
frequency 0.052819 1.000000

Transaction frequencies and monetary values are uncorrelated, so we can now fit our Gamma-Gamma model to predict the average spend and expected lifetime value of our customers.

The Gamma-Gamma model takes a data parameter: a pandas DataFrame with three columns representing the customer ID, the average spend across repeat purchases, and the number of repeat purchases for each customer. As with the BG/NBD model, its parameters are given HalfFlat priors by default, which can be too diffuse for small datasets. For this example we will use the default priors, but other priors can be specified just as in the BG/NBD example above.
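For intuition, the conditional expected spend under this model is a weighted average of the population mean spend and a customer’s observed mean spend (the standard Gamma-Gamma result, written with the model’s p, q, v parameterization, where v plays the role of Fader and Hardie’s γ):

$$
\mathbb{E}\left[M \mid \bar{m}_x, x\right]
= \frac{q-1}{px+q-1}\cdot\frac{pv}{q-1}
+ \frac{px}{px+q-1}\cdot\bar{m}_x
$$

where x is the number of repeat purchases and $\bar{m}_x$ is the customer’s observed mean spend: the more transactions we observe, the less the estimate is shrunk toward the population mean $pv/(q-1)$.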

gg = clv.GammaGammaModel(data=nonzero_data)
gg.build_model()
gg
Gamma-Gamma Model (Mean Transactions)
         p ~ HalfFlat()
         q ~ HalfFlat()
         v ~ HalfFlat()
likelihood ~ Potential(f(q, p, v))
gg.fit();
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [p, q, v]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 4 seconds.
gg.fit_summary()
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
p 4.799 0.750 3.456 6.195 0.025 0.022 929.0 1099.0 1.0
q 3.933 0.280 3.374 4.409 0.010 0.008 893.0 928.0 1.0
v 23.707 5.264 14.324 33.157 0.189 0.186 818.0 970.0 1.0
az.plot_posterior(gg.fit_result);
../../_images/86511cdfd451f24985c5f876b7b3926b01e53ced997571370413e27a93a1f00b.png

Predicting spend value of customers#

Having fit our model, we can now use it to predict the conditional expected average spend of our customers, including those with zero repeat purchases.

expected_spend = gg.expected_customer_spend(data=rfm_data)
az.summary(expected_spend.isel(customer_id=range(10)), kind="stats")
mean sd hdi_3% hdi_97%
x[1] 26.115 0.438 25.339 26.975
x[2] 21.643 1.325 19.206 24.174
x[3] 37.612 0.891 35.961 39.305
x[4] 37.612 0.891 35.961 39.305
x[5] 37.612 0.891 35.961 39.305
x[6] 74.829 0.368 74.169 75.483
x[7] 21.643 1.325 19.206 24.174
x[8] 30.901 0.605 29.739 32.028
x[9] 36.430 0.153 36.148 36.727
x[10] 37.612 0.891 35.961 39.305
labeller = MapLabeller(var_name_map={"x": "customer"})
az.plot_forest(
    expected_spend.isel(customer_id=(range(10))), combined=True, labeller=labeller
)
plt.xlabel("Expected mean spend");
../../_images/ea0fca4043be6ae4862d7b7aa1071408c58edb812039617d2f2ee6301ee2f048.png

We can also look at the average expected mean spend across all customers:

az.summary(expected_spend.mean("customer_id"), kind="stats")
mean sd hdi_3% hdi_97%
x 37.998 0.562 36.966 39.073
az.plot_posterior(expected_spend.mean("customer_id"))
plt.axvline(expected_spend.mean(), color="k", ls="--")
plt.title("Expected mean spend of all customers");
../../_images/1c0c38c99d5881225e243d71ee56a6df95333fa0b4fbab7872aedfa9e81d4237.png

Predicting spend value of a new customer#

az.plot_posterior(gg.expected_new_customer_spend())
plt.title("Expected mean spend of a new customer");
../../_images/2e99f74b2feee3e3f6f22e81f10845ff0f3dbf1029cd377244277ed94eb90c1a.png

Estimating CLV#

Finally, we can combine the Gamma-Gamma model with the BG/NBD model to obtain an estimate of customer lifetime value. This relies on the discounted cash flow (DCF) model, adjusting for the cost of capital.
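Concretely, the estimate takes roughly this discounted cash flow form (a sketch of the standard formulation with monthly discount rate d; the exact discretization used by pymc-marketing may differ):

$$
\mathrm{CLV}_i \approx \sum_{t=1}^{T}
\frac{\mathbb{E}[\text{spend}_i]\cdot\mathbb{E}[\text{purchases}_i \text{ in month } t]}
{(1+d)^{t}}
$$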

If computational issues are encountered, use the thin_fit_result method prior to estimating CLV.

bgm = bgm.thin_fit_result(keep_every=2)
bgm
BG/NBD
            alpha ~ HalfNormal(0, 10)
                a ~ HalfNormal(0, 10)
                b ~ HalfNormal(0, 10)
                r ~ HalfNormal(0, 10)
recency_frequency ~ BetaGeoNBD(a, b, r, alpha, <constant>)
clv_estimate = gg.expected_customer_lifetime_value(
    transaction_model=bgm,
    data=rfm_data,
    future_t=12,  # months
    discount_rate=0.01,  # monthly discount rate ~ 12.7% annually
    time_unit="W",  # original data is in weeks
)
az.summary(clv_estimate.isel(customer_id=range(10)), kind="stats")
mean sd hdi_3% hdi_97%
x[1] 29.244 1.049 27.306 31.182
x[2] 3.102 0.311 2.540 3.708
x[3] 5.615 0.224 5.205 6.047
x[4] 5.615 0.224 5.205 6.047
x[5] 5.615 0.224 5.205 6.047
x[6] 501.233 16.738 470.280 531.458
x[7] 4.080 0.352 3.414 4.735
x[8] 16.210 0.437 15.358 16.984
x[9] 46.878 1.312 44.499 49.302
x[10] 5.615 0.224 5.205 6.047
az.plot_forest(
    clv_estimate.isel(customer_id=range(10)), combined=True, labeller=labeller
)
plt.xlabel("Expected CLV");
../../_images/ac8fd90751faa54acde779cbe820b9286ef80e4a51184961de345b8fc5345983.png

According to our models, customer[6] has a much higher expected CLV than the rest. There is also large variability in this estimate, which arises solely from uncertainty in the parameters of the BG/NBD and Gamma-Gamma models.

In general, these models tend to induce a strong correlation between expected CLV and its uncertainty. This modeling of uncertainty can be very useful when making marketing decisions.
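A quick sketch of checking that correlation from the posterior draws we already have, using each customer’s posterior mean and standard deviation:

clv_mean = clv_estimate.mean(("chain", "draw"))
clv_sd = clv_estimate.std(("chain", "draw"))
print(np.corrcoef(clv_mean.values, clv_sd.values)[0, 1])  # expect a strong positive value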

%load_ext watermark
%watermark -n -u -v -iv -w -p pymc,pytensor
Last updated: Sun Jul 27 2025

Python implementation: CPython
Python version       : 3.12.11
IPython version      : 9.4.0

pymc    : 5.25.1
pytensor: 2.31.7

arviz         : 0.22.0
pandas        : 2.3.1
numpy         : 2.2.6
pymc_marketing: 0.15.1
matplotlib    : 3.10.3
pytensor      : 2.31.7
pymc_extras   : 0.4.0

Watermark: 2.5.0