CRM Analysis for Lifecycle
CRM analysis is an analysis that begins by understanding the customer lifecycle. It starts with advertising, which generates leads—where leads are customers who initiate communication with customer service due to their interest in the ads.
In this dataset, whenever a customer clicks on an ad, they are counted as a lead because it directly leads to communication with customer service. These leads will then convert into purchasers if the transaction is successful.
Therefore, it’s necessary to generate dummy data consisting of ads data, leads data, and purchase data to perform this analysis.
import pandas as pd
import numpy as np
np.random.seed(42)
start_date = '2024-01-01'
end_date = '2024-12-31'
dates = pd.date_range(start=start_date, end=end_date)
sources = ['Google', 'TikTok', 'Facebook']
campaigns_per_source = 3
sku_list = [f'SKU{i:03d}' for i in range(1, 11)]
# Generate campaigns
campaigns = []
for source in sources:
for i in range(1, campaigns_per_source + 1):
campaigns.append((source, f'{source}_Camp{i}'))
# Generate customer_id
customer_counter = 1
ads_data = []
leads_records = []
purchase_records = []
transaction_id = 1
for date in dates:
for source, campaign in campaigns:
impressions = np.random.randint(1000, 10000)
clicks = np.random.randint(50, max(51, impressions // 10))
spend = round(np.random.uniform(100, 1000), 2)
# Leads = clicks, for unique customer_id
customer_ids = [f'CUST{customer_counter + i:07d}' for i in range(clicks)]
customer_counter += clicks
for cust_id in customer_ids:
sku_interest = np.random.choice(sku_list)
leads_records.append([date, cust_id, sku_interest, campaign, source])
# Conversion 64% from leads to purchase
num_purchase = int(clicks * 0.64)
if num_purchase > 0:
purchase_customers = np.random.choice(customer_ids, size=num_purchase, replace=False)
for cust_id in purchase_customers:
num_transactions = np.random.choice([1, 2, 3], p=[0.4, 0.4, 0.2])
last_purchase_date = date
for n in range(num_transactions):
if n == 0:
purchase_date = date
else:
gap_days = np.random.randint(7, 60)
purchase_date = last_purchase_date + pd.Timedelta(days=gap_days)
if purchase_date > dates[-1]:
purchase_date = dates[-1]
num_sku_bought = np.random.choice([1, 2], p=[0.7, 0.3])
skus_bought = np.random.choice(sku_list, size=num_sku_bought, replace=False)
for sku in skus_bought:
price = round(np.random.uniform(50, 500), 2)
qty = np.random.choice([1, 2, 3], p=[0.8, 0.15, 0.05])
purchase_records.append([
f'TXN{transaction_id:07d}',
purchase_date,
cust_id,
sku,
price,
qty,\
])
transaction_id += 1
last_purchase_date = purchase_date
ads_data.append([
campaign,
source,
date,
impressions,
clicks,
spend,
clicks,
round(spend / clicks, 2) if clicks > 0 else 0,
num_purchase
])
df_ads = pd.DataFrame(ads_data, columns=[
'campaign_id', 'source', 'date', 'impressions', 'clicks', 'spend', 'leads', 'cpl', 'conversions'
])
df_leads = pd.DataFrame(leads_records, columns=[
'lead_date', 'customer_id', 'sku_interest', 'campaign_id', 'source'
])
df_purchase = pd.DataFrame(purchase_records, columns=[
'transaction_id', 'purchase_date', 'customer_id', 'sku', 'price', 'qty'
])
After generating the data, I need to calculate the total purchase amount per customer, label whether a transaction is a first purchase, identify retention by checking for repeat purchases, and also flag second purchases if they occur within 45 days of the first purchase.
uniquetx = cek[['customer_id', 'purchase_date', 'transaction_id']].drop_duplicates().sort_values(['customer_id', 'purchase_date', 'transaction_id'])
uniquetx['purchase_order'] = uniquetx.groupby('customer_id').cumcount() + 1
# for labelling transaction as first_purchase or retention
uniquetx = uniquetx.sort_values('purchase_order', ascending=False)
uniquetx['first_purchase'] = np.where(uniquetx['purchase_order']==1, 'yes', '')
uniquetx['retention'] = np.where(uniquetx['purchase_order']>=2, 'yes', '')
# for labelling trnasaction of second purchase in range 65 days after first_purchase
uniquetx['gap_days_since_first'] = (
uniquetx['purchase_date'] - uniquetx.groupby('customer_id')['purchase_date'].transform('min')
).dt.days
uniquetx['second_purchase'] = np.where(
(uniquetx['gap_days_since_first'] <= 65) & (uniquetx['purchase_order'] == 2), 'yes', ''
)
and the result will be like this
transaction_id | purchase_date | customer_id | total | amount | purchase_order | first_purchase | retention | gap_days_since_first | second_purchase |
---|---|---|---|---|---|---|---|---|---|
TXN0000001 | 2024-01-01 | CUST0000016 | 466.90 | 538.54 | 1 | yes | 0 | ||
TXN0001878 | 2024-01-01 | CUST0001746 | 98.37 | 98.37 | 1 | yes | 0 | ||
TXN0001880 | 2024-01-01 | CUST0001696 | 148.38 | 148.38 | 1 | yes | 0 | ||
TXN0001882 | 2024-01-01 | CUST0001671 | 418.27 | 866.80 | 1 | yes | 0 | ||
TXN0001884 | 2024-01-01 | CUST0001677 | 303.46 | 500.86 | 1 | yes | 0 | ||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
TXN1065894 | 2024-12-31 | CUST0927262 | 434.93 | 1634.36 | 2 | yes | 21 | yes | |
TXN0871594 | 2024-12-31 | CUST0758628 | 330.17 | 330.17 | 3 | yes | 85 | ||
TXN0936038 | 2024-12-31 | CUST0814528 | 495.26 | 495.26 | 3 | yes | 65 | ||
TXN0871616 | 2024-12-31 | CUST0758551 | 337.68 | 337.68 | 3 | yes | 85 | ||
TXN1132881 | 2024-12-31 | CUST0985650 | 460.04 | 460.04 | 1 | yes | 0 |
I also need tiering to complete the customer lifecycle analysis in CRM. I’ve designed the tiering segmentation based on customer purchase behavior in the last 6 months, with the following criteria:
Power User: Total purchases over 2000 and more than 6 transactions. Tier 3: Total purchases over 1000 with 4 or more transactions. Tier 2: Total purchases over 600 with 3 or more transactions. Tier 1: At least 1 purchase in the last 6 months, but total purchases under 550 and fewer than 2 transactions.
This segmentation helps in identifying high-value customers for targeted marketing and retention strategies.
cutoff_date = pd.to_datetime('2024-12-31') - pd.DateOffset(months=6)
recent_purchase = cek[cek['purchase_date'] >= cutoff_date]
customer_cltv = recent_purchase.groupby('customer_id').agg(
total_amount=('total', 'sum'),
freq=('transaction_id', 'nunique')
).reset_index()
def assign_tier(row):
if row['total_amount'] > 2000 or row['freq'] > 6:
return 'Power User'
elif row['total_amount'] > 1000 or row['freq'] >= 4:
return 'Tier 3'
elif row['total_amount'] > 600 or row['freq'] >= 3:
return 'Tier 2'
else:
return 'Tier 1'
customer_cltv['tier'] = customer_cltv.apply(assign_tier, axis=1)
customer_cltv =customer_cltv[['customer_id', 'tier']]
cek = pd.merge(cek, customer_cltv, on='customer_id', how='left')
cek['tier'] = cek['tier'].fillna('No Tier')
I applied fillna() to anticipate cases where customers don’t have any purchases within the last 6 months. This ensures that these customers are still included in the dataset with a default tier label (e.g., ‘No Tier’), making it easier to differentiate active from inactive customers during analysis. The result will be like this.
transaction_id | purchase_date | customer_id | total | amount | purchase_order | first_purchase | retention | gap_days_since_first | second_purchase | tier |
---|---|---|---|---|---|---|---|---|---|---|
TXN0000001 | 2024-01-01 | CUST0000016 | 466.90 | 538.54 | 1 | yes | 0 | No Tier | ||
TXN0001878 | 2024-01-01 | CUST0001746 | 98.37 | 98.37 | 1 | yes | 0 | No Tier | ||
TXN0001880 | 2024-01-01 | CUST0001696 | 148.38 | 148.38 | 1 | yes | 0 | No Tier | ||
TXN0001882 | 2024-01-01 | CUST0001671 | 418.27 | 866.80 | 1 | yes | 0 | No Tier | ||
TXN0001884 | 2024-01-01 | CUST0001677 | 303.46 | 500.86 | 1 | yes | 0 | No Tier | ||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
TXN1065894 | 2024-12-31 | CUST0927262 | 434.93 | 1634.36 | 2 | yes | 21 | yes | Tier 2 | |
TXN0871594 | 2024-12-31 | CUST0758628 | 330.17 | 330.17 | 3 | yes | 85 | Tier 3 | ||
TXN0936038 | 2024-12-31 | CUST0814528 | 495.26 | 495.26 | 3 | yes | 65 | Tier 2 | ||
TXN0871616 | 2024-12-31 | CUST0758551 | 337.68 | 337.68 | 3 | yes | 85 | Tier 3 | ||
TXN1132881 | 2024-12-31 | CUST0985650 | 460.04 | 460.04 | 1 | yes | 0 | Tier 1 |
But how do i know the conversion from ads to leads to purchase? I need to merge 2 tables (leads, and purchase, because for lifecycle only need 2 of these) and visualize it with funnel chart
leads = df_leads.groupby('lead_date')['customer_id'].nunique().sum()
df_leads_total = pd.DataFrame({'desc': ['Leads'], 'unique_customers': [leads]})
first_purchase = cek[cek['first_purchase'] == 'yes']['customer_id'].nunique()
df_first_purchase = pd.DataFrame({'desc': ['First Purchase'], 'unique_customers': [first_purchase]})
second_purchase = cek[cek['second_purchase'] == 'yes']['customer_id'].nunique()
df_second_purchase = pd.DataFrame({'desc': ['Second Purchase'], 'unique_customers': [second_purchase]})
tiering = cek[cek['tier'] != 'No Tier'].groupby('tier')['customer_id'].nunique().reset_index()
tiering = tiering.rename(columns={'tier': 'desc', 'customer_id': 'unique_customers'})
lifecycle_total = pd.concat([df_leads_total, df_first_purchase, df_second_purchase, tiering])
mapping = {
'Leads': 1,
'First Purchase': 2,
'Second Purchase': 3,
'Tier 1': 4,
'Tier 2': 5,
'Tier 3': 6,
'Power User': 7
}
lifecycle_total['sorted_by'] = lifecycle_total['desc'].map(mapping)
lifecycle_total = lifecycle_total.sort_values('sorted_by').reset_index(drop=True)
lifecycle_total
and the result must be like this
desc | unique_customers | sorted_by |
---|---|---|
Leads | 985,858 | 1 |
First Purchase | 629,389 | 2 |
Second Purchase | 377,702 | 3 |
Tier 1 | 206,636 | 4 |
Tier 2 | 100,726 | 5 |
Tier 3 | 49,896 | 6 |
Power User | 1,839 | 7 |
And now funnel left
import plotly.express as px
lifecycle_total = lifecycle_total.sort_values('sorted_by').reset_index(drop=True)
conversion_rates = [100]
for i in range(1, len(lifecycle_total)):
rate = lifecycle_total.loc[i, 'unique_customers'] / lifecycle_total.loc[i-1, 'unique_customers'] * 100
conversion_rates.append(rate)
lifecycle_total['conversion_rate_%'] = conversion_rates
fig = px.funnel(lifecycle_total, x='unique_customers', y='desc', title='Customer Lifecycle Funnel with Conversion Rates')
for i, row in lifecycle_total.iterrows():
fig.add_annotation(
x=row['unique_customers'] + 0.6,
y=row['desc'],
text=f"{row['conversion_rate_%']:.1f}%",
showarrow=False,
font=dict(color="black", size=12)
)
fig.show()
From the coding i got this funnel chart that can represent about lifecycle of customer from leads until tiering.
It is evident that the funnel from leads to power users gradually narrows.
Initially, there were 985K customers showing interest in purchasing the product. Out of those, 629K customers actually completed their first purchase. This means that 63.8% of the leads converted into buyers, while the remaining 36.2% decided not to proceed with the purchase despite showing interest through advertisements.
Furthermore, of the 63.8% who made their first purchase, only 60% (or 377K customers) made a repeat purchase within 2 months. This implies that, out of the original 985K leads, only 377K customers (around 30%) converted into repeat buyers within 2 months after their first purchase.
To improve conversion from interest to purchase, the 36.2% of leads who didn’t convert can be targeted through remarketing campaigns offering first-purchase promotions.
Meanwhile, to encourage repeat purchases from the 40% of customers who bought once but did not return within 2 months, strategies such as membership programs or “buy 1, get 1 free” promotions can be utilized.