Here is today's study share.
The RFM model is an important tool for measuring customer value and customer profitability. Among the many analysis models used in customer relationship management (CRM), it is one of the most widely cited.
That makes the RFM model something every data analyst should master. This article explains the model in detail and then works through a Kaggle project, so that by the end you should understand RFM and know how to use it to classify users.
The RFM model describes a customer's value with 3 indicators: how recently the customer purchased, how often the customer purchases overall, and how much the customer spends.
R value: Recency, the most recent purchase
- Recency is the time elapsed between the customer's last purchase and now. For example: when did I last buy a car, or when did I last buy a record?
In theory, a customer whose last purchase was more recent is a better customer and is the most likely to respond to an immediate offer of goods or services. If marketers can only grow their results by taking market share from competitors, they have to watch consumers' purchase behavior closely, so recency is the first tool they should reach for.
Function: the R value is useful for more than targeting promotions; a marketer's recency report also monitors the health of the business. Good marketers check it regularly to follow the trend. If the monthly report shows that the number of customers whose last purchase was very recent (within 1 month) is rising, the company is growing steadily; conversely, if fewer and fewer customers have purchased within the last month, the company is heading in an unhealthy direction.
F value: Frequency, purchase frequency
- Frequency is the number of purchases a customer makes within a defined period. The most frequent buyers are usually the most satisfied customers. Getting customers to buy more often means taking market share from competitors and winning revenue that would otherwise go to others.
Based on this indicator we can divide customers into five tiers: a customer who has purchased once is a new customer, twice a potential customer, three times an old customer, four times a mature customer, and five or more times a loyal customer. The operator's goal is to move customers up the tiers.
Note: purchase frequency differs widely between product categories. Take wedding products and snacks: the former are usually bought only once (society would be chaotic otherwise, ha ha), while the latter are consumables that are easy to buy again. The F value is therefore not suitable for cross-category comparison.
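The five tiers described above can be written as a small lookup. This is only a sketch: the helper name `frequency_tier` and the sample purchase counts are made up for illustration.

```python
import pandas as pd


def frequency_tier(purchase_count: int) -> str:
    """Map a purchase count to one of the five tiers (hypothetical helper)."""
    if purchase_count < 1:
        raise ValueError("a customer needs at least one purchase")
    tiers = {
        1: "new customer",
        2: "potential customer",
        3: "old customer",
        4: "mature customer",
    }
    # Five or more purchases all fall into the top tier.
    return tiers.get(purchase_count, "loyal customer")


# Made-up purchase counts, indexed by made-up customer ids.
purchases = pd.Series([1, 2, 3, 4, 5, 12], index=list("ABCDEF"))
tier = purchases.map(frequency_tier)
print(tier)
```

Because frequency is not comparable across categories, a lookup like this would be applied within one product category at a time.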
M value: Monetary, purchase amount
- Like frequency, the purchase amount is measured over a defined period: it is the amount spent within that period (usually 1 year). It can also be used to check Pareto's law (commonly known as the 80/20 rule), namely that 80% of revenue comes from 20% of customers.
Compared with the R and F values, the M value is the hardest of the three to use, but also the most valuable. For beauty products of the same brand, prices vary only within a range that a given consumer group finds acceptable, and the purchase frequency within a single category is not high, so for ordinary stores the M value has a relatively weak effect on customer segmentation.
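The 80/20 check mentioned for the M value takes only a few lines. The sketch below uses made-up yearly spend figures, chosen so the arithmetic is easy to follow; real data will rarely split this cleanly.

```python
import pandas as pd

# Made-up yearly spend per customer (hypothetical data).
spend = pd.Series(
    [5000, 3000, 400, 300, 300, 250, 250, 200, 150, 150],
    index=[f"c{i}" for i in range(10)],
)

# Rank customers by spend and take the top 20% of them.
ranked = spend.sort_values(ascending=False)
top_n = max(1, int(len(ranked) * 0.2))

# Share of total revenue that comes from those top customers.
top_share = ranked.iloc[:top_n].sum() / ranked.sum()
print(f"top 20% of customers bring in {top_share:.0%} of revenue")
```

On a real data set, the same calculation could be run on the per-customer amount totals to see how far the store is from an 80/20 split.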
Customer segmentation based on the RFM model
In CRM practice you can segment customers on 1 to 3 of the RFM indicators, as shown in the following table. Keep the number of segmentation indicators within a range you can manage; more is not better. Once users are split into too many groups, your own marketing plan becomes harder to execute, and later you may miss a user group or contact the same user several times.
Two criteria guide the final choice of indicators: the size of the store's customer base, and the store's product and customer structure.
Size of the customer base: when the store has few customers, segmenting on 1-2 dimensions is enough; otherwise choose 2-3 indicators.
Product and customer structure: if the store's product range is fairly uniform and the average order value varies little, purchase frequency (F value) and purchase amount (M value) are highly correlated, and you can use the easier-to-handle purchase frequency (F value) in place of purchase amount (M value). A store that has just opened and has not yet built customer stickiness can drop purchase frequency (F value) and work directly with recency (R value) or purchase amount (M value).
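Whether F can stand in for M can be checked by measuring how strongly the two are correlated. The sketch below uses made-up per-customer values, and the 0.9 cut-off is an assumption for illustration, not a figure from this article.

```python
import pandas as pd

# Made-up per-customer purchase frequency (F) and purchase amount (M).
customers = pd.DataFrame({
    "F": [1, 2, 3, 5, 8, 13],
    "M": [20.0, 41.0, 59.0, 102.0, 158.0, 262.0],
})

# Pearson correlation between frequency and amount.
corr = customers["F"].corr(customers["M"])

CORR_THRESHOLD = 0.9  # assumed cut-off; choose one that suits the store
use_f_only = bool(corr > CORR_THRESHOLD)
print(f"corr(F, M) = {corr:.3f}, segment on F alone: {use_f_only}")
```

When the correlation is well below the chosen cut-off, the two indicators carry different information and both are worth keeping.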
Selecting target users by RFM score
RFM scoring has three main parts:
- determine the segments for each of the three RFM indicators and the score of each segment;
- calculate each customer's score on the three RFM indicators;
- calculate each customer's total score and select high-quality customers by total score.
Take the picture above as an example .
Adding up the scores each user obtains on each indicator gives the final score.
Note, however, that the score assigned to each segment of each indicator should not simply be copied from the figure above; it should be set separately for each store (other users say the Analytic Hierarchy Process, AHP, can be used for this; I have not studied it yet).
Also, it is best to give each indicator a weight before adding. For example, the final formula could be: score = 0.5R + 0.3F + 0.2M.
For the specific weights, refer to the two criteria mentioned above.
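As a rough illustration of the weighted formula score = 0.5R + 0.3F + 0.2M, the sketch below applies it to three made-up customers whose R, F and M scores are already on a 1-5 scale. The customer ids and scores are hypothetical.

```python
import pandas as pd

# Made-up 1-5 scores per customer on each indicator.
scores = pd.DataFrame(
    {"R": [5, 3, 1], "F": [4, 2, 5], "M": [3, 1, 5]},
    index=["A", "B", "C"],
)

# Weights from the example formula; they should add up to 1.
weights = pd.Series({"R": 0.5, "F": 0.3, "M": 0.2})
assert abs(weights.sum() - 1.0) < 1e-9

# Multiply each column by its weight, then add across the columns.
scores["score"] = scores[["R", "F", "M"]].mul(weights, axis=1).sum(axis=1)

# Highest total score first.
ranked = scores.sort_values("score", ascending=False)
print(ranked)
```

With these weights a recent but low-spending customer can outrank an older big spender, which is why the weights should follow the store's own situation.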
Common strategies based on RFM
RFM suits enterprises that sell many kinds of goods with relatively low unit prices, or goods that complement each other and need to be bought repeatedly, such as consumer goods, clothing and small appliances. RFM also suits enterprises that sell high-value durable goods together with supporting parts or maintenance services, such as precision machine tools, complete sets of production equipment and printers. It is equally suitable for commodity wholesale, raw material trading and some service industries (such as travel, insurance, transport, courier and entertainment).
RFM can be used to increase customers' number of transactions. Direct mail (DM), which is common in the industry, often sends out thousands of mailings at a time, which in practice wastes a lot of money. According to statistics (for general mail-order daily necessities), if all customers are divided into five levels by R (Recency), the response rate of the best level, level 5, is three times that of level 4, because these customers have just completed a transaction and pay more attention to product information from the same company. If customers are divided into five levels by M (Monetary), the average response rates of the best and second-best levels show almost no significant difference.
Some people use customers' absolute contribution to judge whether customers are being lost, but absolute amounts sometimes misrepresent customer behavior. Prices differ between products, and promotions apply different discounts to different products, so relative grading (for example, dividing R, F and M into five levels each) and tracking how a consumer moves between levels shows relative behavior. From changes in a customer's R and F we can infer how their consumption is changing and list customers by their likelihood of churning. Then, from the M (purchase amount) perspective, we can focus on customers with both high contribution and high churn risk, prioritize visiting or contacting them, and recover business in the most effective way.
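Relative grading into five levels can be sketched with quantiles, so that each level holds the same share of customers whatever the absolute amounts are. The spend figures below are made up, and ranking before cutting is just one way to handle ties.

```python
import pandas as pd

# Made-up total spend for ten customers.
amount = pd.Series(
    [12, 35, 60, 90, 150, 220, 400, 800, 1500, 9000],
    index=[f"c{i}" for i in range(10)],
)

# rank(method="first") breaks ties, so qcut always finds five distinct edges.
m_level = pd.qcut(
    amount.rank(method="first"), q=5, labels=[1, 2, 3, 4, 5]
).astype(int)
print(m_level)
```

For R the labels would be reversed (`[5, 4, 3, 2, 1]`), because fewer days since the last purchase is better.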
Supplement
If each of the three indicators is subdivided into 4 levels, we get 4x4x4=64 classes of users and could then market precisely to each class. Obviously 64 classes are more than an ordinary human brain can keep track of, let alone design 64 customized marketing strategies for. In practice we only need to split each dimension in two once, and with 3 dimensions we still get 8 groups of users.
(Codes are in R, F, M order; 1 stands for high and 0 for low.)
High value customers (111): purchased recently, with high frequency and high amount. These must be VIPs!
Focus on keeping customers (011): the last purchase was a long time ago, but frequency and amount are both high. This is a loyal customer who has not been around for a while, so we need to take the initiative to stay in touch.
Focus on developing customers (101): purchased recently with a high amount, but not often. Loyalty is low, but these are users with potential and must be developed.
Focus on retaining customers (001): the last purchase was a long time ago and frequency is low, but the amount is high. These users may be about to churn or may already have churned, so retention measures are needed.
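The two-way split per dimension can be sketched as three 0/1 flags joined into a code such as 111 or 001. The customers and the three thresholds below are made up for illustration; in the project later in this article the split is made at the mean score instead.

```python
import pandas as pd

# Made-up customers: days since last purchase, order count, total amount.
rfm_demo = pd.DataFrame(
    {
        "R_days": [5, 200, 10, 300],
        "F": [12, 9, 1, 1],
        "M": [900.0, 700.0, 800.0, 650.0],
    },
    index=["A", "B", "C", "D"],
)

# Hypothetical thresholds separating "high" (1) from "low" (0).
R_MAX_DAYS, F_MIN, M_MIN = 30, 5, 500.0

r_flag = (rfm_demo["R_days"] <= R_MAX_DAYS).astype(int)  # recent purchase -> 1
f_flag = (rfm_demo["F"] >= F_MIN).astype(int)
m_flag = (rfm_demo["M"] >= M_MIN).astype(int)

# Join the flags in R, F, M order to get codes such as "111".
rfm_demo["code"] = r_flag.astype(str) + f_flag.astype(str) + m_flag.astype(str)
print(rfm_demo)
```

The four sample customers land in the groups 111, 011, 101 and 001 described above.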
Project source:
www.kaggle.com/carrie1/ecommerce-data
Project overview:
This is a transnational data set. It contains all online retail transactions made between 1 December 2010 and 9 December 2011 by a store registered in the UK. The company mainly sells unique all-occasion gifts, and many of its customers are wholesalers.
The main purpose of this project is to classify users with the RFM model.
PS: This project runs in Jupyter.
Import modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
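# SimHei is only needed to render Chinese labels in the plots
plt.rcParams['font.sans-serif'] = ['SimHei']
Load data
# Read CustomerID as text so the IDs are not parsed as numbers
df = pd.read_csv('data.csv', encoding='ISO-8859-1', dtype={'CustomerID': str})
Next, let's officially start the analysis!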
1 Data exploration and data cleaning
1.1 Data exploration
df.shape
df
The data contains 541910 rows and 8 fields. The fields are:
InvoiceNo: order number. Each transaction has a 6-digit integer; the order numbers of returns begin with the letter 'C'.
StockCode: product code, a 5-digit integer.
Description: product description.
Quantity: product quantity; a minus sign indicates a return.
InvoiceDate: date and time of the order.
UnitPrice: price per unit of product, in pounds sterling.
CustomerID: customer number, a 5-digit integer.
Country: name of each customer's country or region.
df.info()
Clearly we need to convert the date format and carry out the usual missing-value statistics, de-duplication, and outlier detection and handling.
1.2 Missing value statistics
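# Share of missing values in each column
df.isnull().mean()
df.drop(['Description'], axis=1, inplace=True)
df.head()
# Fill missing IDs before casting to str: casting first can turn NaN into the
# text 'nan', after which fillna finds nothing to fill.
# Note that all rows without an ID then share the single ID 'unknown'.
df['CustomerID'] = df['CustomerID'].fillna('unknown').astype('str')
df.info()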
1.3 Date format conversion
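# Keep only the date part of InvoiceDate
df['date'] = pd.to_datetime(df['InvoiceDate'].str.split(' ').str[0])
# First day of the month; astype('datetime64[M]') is not accepted by pandas 2.x
df['month'] = df['date'].dt.to_period('M').dt.to_timestamp()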
df[['date', 'month']]
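Truncating a date to the first day of its month, as the month column does, can be expressed through a monthly period. A small sketch follows; the two sample strings are made up and assumed to follow a month/day/year layout.

```python
import pandas as pd

# Made-up timestamps, assumed to be in month/day/year order.
raw = pd.Series(["12/1/2010 8:26", "12/9/2011 12:50"])

# Drop the time of day, then parse the remaining date part.
dates = pd.to_datetime(raw.str.split(" ").str[0], format="%m/%d/%Y")

# Convert to a monthly period and back to a timestamp (first day of month).
months = dates.dt.to_period("M").dt.to_timestamp()
print(pd.DataFrame({"date": dates, "month": months}))
```

Passing an explicit `format` avoids any doubt about whether the day or the month comes first.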
1.4 De-duplication
df = df.drop_duplicates()
df.shape
1.5 Outlier handling
Here we treat return orders as abnormal data (that is, rows with a negative quantity or a negative unit price).
# Inspect the abnormal rows
df[(df['Quantity']<0) | (df['UnitPrice']<0)]
# Keep only rows with a positive quantity and a positive unit price
df = df[(df['Quantity']>0) & (df['UnitPrice']>0)]
# Check again: the result should now be empty
df[(df['Quantity']<0) | (df['UnitPrice']<0)]
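Since the order numbers of returns begin with 'C', returns can also be flagged by that prefix and compared with the quantity and price filter. The rows below are made up for illustration.

```python
import pandas as pd

# Made-up order lines: one return ('C' prefix) and one negative-price row.
orders = pd.DataFrame({
    "InvoiceNo": ["536365", "C536379", "536381", "A563185"],
    "Quantity": [6, -1, 2, 1],
    "UnitPrice": [2.55, 27.50, 1.25, -11062.06],
})

# Returns identified by the order-number prefix.
is_return = orders["InvoiceNo"].str.startswith("C")

# Rows rejected by the quantity / unit price filter.
is_invalid = (orders["Quantity"] <= 0) | (orders["UnitPrice"] <= 0)

clean = orders[~is_invalid]
print(clean)
print("returns that survive the filter:", int((is_return & ~is_invalid).sum()))
```

In this sample the numeric filter removes the return as well as the row with a negative price, which the prefix alone would not catch.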
2 User classification
# R: days between each customer's last purchase and the last date in the data
R_value = df.groupby('CustomerID')['date'].max()
R_value = (df['date'].max() - R_value).dt.days
R_value
# F: number of distinct orders per customer
F_value = df.groupby('CustomerID')['InvoiceNo'].nunique()
F_value
# M: total amount spent per customer
df['amount'] = df['Quantity'] * df['UnitPrice']
M_value = df.groupby('CustomerID')['amount'].sum()
M_value
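The three values can also be computed in a single pass with named aggregation. The sketch below uses a made-up transaction table with the same column names as the project; the output column names R, F and M are chosen here for illustration.

```python
import pandas as pd

# Made-up transactions with the same column names as the project data.
tx = pd.DataFrame({
    "CustomerID": ["A", "A", "A", "B", "B", "C"],
    "InvoiceNo": ["1", "1", "2", "3", "4", "5"],
    "date": pd.to_datetime([
        "2011-01-05", "2011-01-05", "2011-03-01",
        "2011-02-10", "2011-02-20", "2011-03-10",
    ]),
    "amount": [10.0, 5.0, 20.0, 7.5, 2.5, 100.0],
})

# Reference date: the last day that appears in the data.
snapshot = tx["date"].max()

rfm_values = tx.groupby("CustomerID").agg(
    R=("date", lambda d: (snapshot - d.max()).days),  # days since last order
    F=("InvoiceNo", "nunique"),                       # distinct orders
    M=("amount", "sum"),                              # total amount spent
)
print(rfm_values)
```

Keeping R, F and M in one frame makes the later scoring step a matter of adding columns rather than concatenating separate series.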
R_value.describe()
R_value.hist(bins = 30)
M_value.describe()
M_value.hist(bins = 30)
M_value.plot.box()
The distribution of M is clearly very uneven.
M_value[M_value<2000].hist(bins = 30)
F_value.quantile([0.1,0.2,0.3,0.4,0.5,0.9,1])
F_value.hist(bins = 30)
F_value.plot.box()
F_value[F_value<30].hist(bins = 30)
The distribution of F is just as uneven.
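# Bin edges chosen from the distributions above. With right=False each bin is
# closed on the left, e.g. [0, 30); values outside the outermost edges get NaN.
R_bins = [0,30,90,180,360,720]
F_bins = [1,2,5,10,20,5000]
M_bins = [0,500,2000,5000,10000,200000]
First, the R value (the more recent the purchase, the higher the score):
R_score = pd.cut(R_value,R_bins,labels=[5,4,3,2,1],right=False)
R_score
Next, the F value:
F_score = pd.cut(F_value,F_bins,labels=[1,2,3,4,5],right=False)
F_score
And finally the M value:
M_score = pd.cut(M_value,M_bins,labels=[1,2,3,4,5],right=False)
M_score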
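How `pd.cut` treats the bin edges with `right=False` is easiest to see on a few boundary values. The sketch below reuses the R bin edges on a made-up series of day counts.

```python
import pandas as pd

R_bins = [0, 30, 90, 180, 360, 720]

# Made-up day counts sitting on or near the bin edges.
days = pd.Series([0, 29, 30, 89, 359, 720, 1000])

# right=False: bins are [0, 30), [30, 90), [90, 180), [180, 360), [360, 720).
score = pd.cut(days, R_bins, labels=[5, 4, 3, 2, 1], right=False)
print(pd.DataFrame({"days": days, "score": score}))

# Values at or beyond the last edge fall in no bin and become NaN.
print("unscored rows:", int(score.isna().sum()))
```

A customer exactly 30 days out lands in the second bin, and anything at 720 days or more is left unscored, so it is worth checking for NaN after cutting.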
Combine the scores into a new data frame and take a look:
rfm = pd.concat([R_score,F_score,M_score],axis=1)
# The scores still carry the names of the columns they were computed from
rfm.rename(columns={'date':'R_score','InvoiceNo':'F_score','amount':'M_score'},inplace=True)
rfm
rfm.info()
Convert the categorical scores to floats:
rfm['R_score'] = rfm['R_score'].astype('float')
rfm['F_score'] = rfm['F_score'].astype('float')
rfm['M_score'] = rfm['M_score'].astype('float')
rfm.describe()
Split each indicator into high and low at its mean score:
# The thresholds are the mean scores shown by rfm.describe() above
rfm['R'] = np.where(rfm['R_score']>3.82,'high','low')
rfm['F'] = np.where(rfm['F_score']>2.03,'high','low')
rfm['M'] = np.where(rfm['M_score']>1.89,'high','low')
rfm
# Join the three flags in R, F, M order, e.g. 'high-low-high'
rfm['RFM']=rfm['R']+'-'+rfm['F']+'-'+rfm['M']
rfm
def rfm2grade(x):
    if x=='high-high-high':
        return 'High value customers'
    elif x=='high-low-high':
        return 'Focus on developing customers'
    elif x=='low-high-high':
        return 'Focus on keeping customers'
    elif x=='low-low-high':
        return 'Focus on retaining customers'
    elif x=='high-high-low':
        return 'General value customers'
    elif x=='high-low-low':
        return 'General development customers'
    elif x=='low-high-low':
        return 'General keeping customers'
    else:
        # low-low-low
        return 'General retaining customers'
rfm['User level']=rfm['RFM'].apply(rfm2grade)
rfm
3 Classification results
rfm['User level'].value_counts()
rfm['User level'].hist(figsize=(12,9))
# Share of each user level
rfm['User level'].value_counts() / rfm['User level'].value_counts().sum()
rfm['User level'].value_counts().plot(kind = 'pie',
                                      figsize = (15, 9),
                                      autopct='%.1f%%',
                                      title = 'RFM user classification',
                                      textprops = {'fontsize':8},
                                      subplots=True)
plt.legend(loc=2, bbox_to_anchor=(1.05,1.0),borderaxespad = 0.)
4 Conclusions and suggestions
Looking at the share of each user class, high value customers and focus-on-developing customers together account for 47% of the total and are an important source of the company's income.
- High value customers (111): all three RFM values are high. Provide VIP service.
- Focus on developing customers (101): purchase frequency is low, but the other two values are high. We have to find ways to raise their purchase frequency; it is recommended to push the company's activity information or new product information to them in good time to attract them.
- Focus on keeping customers (011): the last purchase was a long time ago, that is, the R value is low, but purchase frequency and amount are both high. This kind of user is a loyal customer who has not come by for a while. Take the initiative to stay in touch and raise the repurchase rate, for example by giving coupons or pushing product discount information to increase the number of purchases.
- Focus on retaining customers (001): the last purchase was a long time ago and purchase frequency is low, but the purchase amount is high. This kind of user is about to churn. Contact them proactively, find out what went wrong, and look for a way to win them back. Coupons or product discount information can also be used to increase the number of purchases.
- General development customers (100): the company should obtain detailed data on these customers and understand their consumption attributes. It is recommended to carry out precision marketing and push product information to them in good time.
Of course, the final marketing strategy should also depend on how much the company itself can invest.
RFM should not be overused, or customers with many transactions will keep receiving mailings. Every enterprise should design a rule for how often it contacts customers. For example, make a thank-you call or send an email within three days or a week of the purchase and actively ask whether the customer has any problems using the product; after one month, send an inquiry about whether they are satisfied; after three months, offer cross-selling suggestions and start watching for signs of customer loss. Keep creating opportunities to contact customers proactively, and the chance that they buy again will improve greatly.
For the convenience of anyone who wants to run the code, I have also put the complete code and data files on a network disk; help yourself if you need them.
Link: pan.baidu.com/s/1qzVvW2tFYquertL6jbWx8w
Extraction code: 1024
References and thanks: www.jianshu.com/p/4b60880f24e2