

What is scoring

In many businesses and fields, it is important to assess the likelihood that a particular event will occur. Each field has its own target event. For credit institutions and banks, it is the probability of default (non-payment of a loan). For marketing, it is the probability that a client responds to an advertising campaign or to a particular contact. For collection, it is the probability that a particular debtor fulfills their obligations after a certain action by the collector.
Knowing the probability of an event in advance helps to minimize risks and reduce unjustified expenses.
For credit institutions, it is important to decide whether to issue the loan a client requests, because the institution earns interest on the loan only if the client repays it without causing unnecessary additional costs. Accordingly, it is important to estimate the client's probability of default in advance.
For marketing and collection, the estimate matters because it reduces the cost of achieving the goal. If you know in advance that a particular client is most likely to respond to email, you do not need to send SMS notifications or call them. For some clients, a cheaper communication channel can therefore be used with the same effect.
A widely used way to estimate the likelihood of an event occurring is scoring.
The scoring system takes its name from the English word "score". A set of attributes is used to estimate the probability of the target event, and each attribute is assigned a certain number of points depending on its value. Once the values of all attributes are known, the points (which can be negative or positive) are summed into a total score. It is this total score that reflects the probability of the target event occurring.
This sounds simple, but building the scorecards and deciding how many points to assign to each value of each attribute requires a number of mathematical calculations on historical data.
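The point-summing idea described above can be sketched as follows. This is only an illustration: the attribute names, value buckets, and point values are invented, and in a real scorecard they would come from calculations on historical data.

```python
# Hypothetical scorecard: attribute -> {attribute value -> points}.
# All names and point values are invented for illustration.
SCORECARD = {
    "employment": {"employed": 25, "self_employed": 10, "unemployed": -30},
    "age_band": {"18-25": -10, "26-40": 15, "41+": 20},
    "repeat_client": {True: 30, False: 0},
}


def total_score(client: dict) -> int:
    """Sum the points assigned to each attribute value of one client."""
    score = 0
    for attribute, points_by_value in SCORECARD.items():
        value = client.get(attribute)
        # Missing or unknown values contribute 0 points in this sketch.
        score += points_by_value.get(value, 0)
    return score


client = {"employment": "employed", "age_band": "26-40", "repeat_client": False}
print(total_score(client))  # 25 + 15 + 0 = 40
```

Treating missing values as 0 points is a simplification made here; a real scorecard would usually assign missing values their own bucket.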
Attributes can be:
- personal data filled in by the client (employment, social status, type of activity, age, gender, number of children, etc.);
- data received from partners or from your own system (the number of past transactions by the client with you or with competitors, the speed of filling out an application or questionnaire, the frequency and purposes of calls to the company, the client's presence and activity on social networks, geolocation, and others).
More precisely, almost any information that you know or can learn about the client before your target action can serve as an attribute.
Scoring is most often used for:
Risk (risk assessment):
- Application (the goal is the decision to issue the product, based on the static data in the client's application and data about the client);
- Behavioral (the goal is the decision to issue the product, based on the client's behavior: the frequency and types of calls, activity in using products, etc.);
- Collection (a combination of the two previous types, but the purpose is to select actions toward the client that make collection less costly and more effective);
- Fraud (the goal is to estimate the probability that the client is a fraudster).
Marketing:
- Response (the probability of a response to a marketing activity);
- Attrition (the probability of customer churn, i.e. which customers will soon stop using the company's services);
- Utilization (the goal is maximum use of the company's products, and finding which actions will motivate which customers to use them);
- Income (identifying the clients that bring the highest profitability).

Initial data preparation

The construction of scorecards is based on statistical models. Building them requires sufficient, high-quality information about customers. The quality of the initial data determines the forecasting accuracy of the model and the success of the scoring system as a whole.
The development of a scoring model is based on the analysis of previous experience. A sufficient amount of information is one of the main prerequisites for building a model. The amount of data may vary depending on specific models, but in general the data should satisfy the requirements of statistical significance and randomness.
Ideally, a scoring model should be applied to the same products, market sector, and economic situation as those reflected in the historical data. For example, information on consumer loans cannot be adequately used to develop a scorecard for car loans. This requirement also constrains the period for which data is collected: the historical period is usually determined by the type of scoring, the type of product, and the type of client. This means that if your customers are segmented, it is important to create separate scorecards not only for different products (loan, credit card, car loan, early debt collection, late debt collection, email advertising, SMS advertising, telemarketing), but also for different customer segments. The simplest client segmentation is new clients versus repeat clients who have already used the company's services. Completely different attributes may be important for each.
Therefore, using several scorecards and models often gives a much better result.
It is best to take data that is as fresh as possible, but only where the target event has already had time to occur. For example, you have issued loans and your goal is that the client does not fall more than 1 month overdue. Then it is best to take data for the longest possible period, but only for the period in which every client could, in theory, have either closed the loan or fallen into arrears. It is desirable to avoid ambiguities as much as possible. Or suppose you send an SMS mailing to clients and the target is to get the desired activity from them within a week. Then you can use these clients for analysis only once at least a week has passed, not earlier.
Data about certain types of customers must be excluded from the source database. These are atypical clients: fraudsters, employees, VIP clients, deceased clients, i.e. all customers who stand out sharply from the mass. For each such type, if necessary, it is better to build a separate scoring model.
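The two selection rules above (wait until the outcome could have been observed, and drop atypical clients) can be sketched as a simple filter. The field names, client types, dates, and the one-week window below are assumptions made for illustration only.

```python
from datetime import date, timedelta

# Hypothetical records; field names and values are invented for illustration.
clients = [
    {"id": 1, "contact_date": date(2024, 1, 10), "type": "regular"},
    {"id": 2, "contact_date": date(2024, 3, 1), "type": "regular"},
    {"id": 3, "contact_date": date(2024, 1, 5), "type": "employee"},
]

OUTCOME_WINDOW = timedelta(days=7)  # time the target event needs to play out
EXCLUDED_TYPES = {"fraudster", "employee", "vip", "deceased"}


def usable_for_modeling(record: dict, today: date) -> bool:
    """Keep a record only if its outcome window has closed and the client is typical."""
    window_closed = record["contact_date"] + OUTCOME_WINDOW <= today
    typical = record["type"] not in EXCLUDED_TYPES
    return window_closed and typical


today = date(2024, 3, 5)
sample = [c for c in clients if usable_for_modeling(c, today)]
print([c["id"] for c in sample])  # [1]
```

Client 2 is dropped because a week has not yet passed since the contact, and client 3 because employees are treated as atypical.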

Definition of dependent variable

The choice of the dependent variable is determined by the purpose of the scoring model. Examples include the fact of a payment becoming overdue, a response to activity through a specific channel, or the purchase of additional goods.
At the stage of determining the dependent variable, customers are divided into three groups: "bad", "good" and "uncertain". Bad clients are those for whom the desired goal was not achieved. Good clients are those for whom the goal was achieved. Uncertain clients are those for whom the goal could not yet have been achieved, or for whom there is too little data, for example an incomplete questionnaire with many missing attributes or data that mostly cannot be calculated.
When building a scorecard, only clients defined as "bad" and "good" are used. Uncertain clients are excluded from the source data used to create the model.
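As a rough sketch of this three-way split, the function below labels loan clients for the goal "no more than 1 month overdue". The 30-day limit, the parameter names, and the sample records are assumptions for illustration; the actual rule depends on the purpose of your model.

```python
from typing import Optional

OVERDUE_LIMIT_DAYS = 30  # assumed definition of "more than 1 month overdue"


def label_client(max_days_overdue: Optional[int], window_closed: bool) -> str:
    """Return 'good', 'bad', or 'uncertain' for one client."""
    if max_days_overdue is None:
        return "uncertain"  # too little data to judge
    if max_days_overdue > OVERDUE_LIMIT_DAYS:
        return "bad"        # goal already missed
    if not window_closed:
        return "uncertain"  # goal could not yet have been achieved
    return "good"


# Hypothetical clients: (id, worst delay in days, observation window closed?)
raw = [(1, 0, True), (2, 45, True), (3, 5, False), (4, None, True)]
labeled = [(cid, label_client(days, closed)) for cid, days, closed in raw]

# Only "good" and "bad" clients go into the modeling data.
modeling_set = [(cid, lab) for cid, lab in labeled if lab != "uncertain"]
print(modeling_set)  # [(1, 'good'), (2, 'bad')]
```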

Formation of training and test samples

The data available for building a scoring model is often referred to as the historical sample. The historical sample should reflect the target population of customers as closely as possible, i.e. be representative. After preparing this customer data and breaking it down into segments by customer type, region, or product, you can move on to the next step.
To check the adequacy and accuracy of the model's predictions during development, the historical sample must be divided into two groups:
- training sample - observations on which the model is directly built;
- test (or control) sample - observations for which the value of the dependent variable is known, but which do not take part in building the model and are used only to test the accuracy of its predictions.
The training and control samples should be formed by random selection, usually in a ratio of 70–80% to 30–20%, respectively, of the historical sample.
The reliability of the model is checked by applying it to the training and test samples and comparing the results. The model should give correct predictions not only on the training set, but also when it is applied in practice. Typically, this two-sample strategy is used to check how well the model generalizes. Similar accuracy on the training and test sets is a sign that the scoring model will work in much the same way in practice.
If the quality of the model is insufficient, you can change the attributes, the attribute weights, or the coefficients for the final score, and then rerun the test.
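A minimal sketch of the random 70/30 split and the accuracy comparison described above, using only the Python standard library. The historical records are synthetic, and `toy_model` is a hypothetical stand-in (a fixed score cutoff) rather than a model actually fitted on the training sample.

```python
import random


def split_sample(records: list, train_share: float = 0.7, seed: int = 42):
    """Randomly split records into (training, test) lists."""
    shuffled = list(records)  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, records: list) -> float:
    """Share of records whose predicted label matches the known label."""
    hits = sum(1 for r in records if predict(r) == r["label"])
    return hits / len(records)


# Synthetic historical sample: a score and a known outcome per client.
# Every 10th label is flipped so the toy rule below is not perfect.
history = []
for i in range(100):
    label = "good" if i >= 50 else "bad"
    if i % 10 == 0:
        label = "bad" if label == "good" else "good"
    history.append({"score": i, "label": label})


def toy_model(record: dict) -> str:
    """Stand-in for a scoring model: a fixed score cutoff."""
    return "good" if record["score"] >= 50 else "bad"


train, test = split_sample(history)
print(len(train), len(test))  # 70 30
print(accuracy(toy_model, train), accuracy(toy_model, test))
```

If the two accuracy values are close, that is the sign of stable behavior the text describes; a large gap would suggest the model does not generalize well.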

Determination of the sample size

A model can be built only on dependencies that repeat in the data. Accordingly, the larger the sample, the better. In practice, a sample of at least 5,000–10,000 clients is recommended for model building, although models are sometimes built on as few as 1,000 clients. In any case, try to use as many clients as possible, provided that they are divided into the necessary segments and that clients with incomplete data are excluded, as described above.
A 50/50 ratio of good to bad clients is best, but this is usually not possible. In any case, we do not recommend artificially forcing such a ratio in the sample. It is better to take everyone for a certain period, even if bad clients are an order of magnitude fewer than good ones.
After preparing this data, you can find the necessary dependencies and build a scoring model, and thereby achieve the desired results more reliably and at lower cost.
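Before building a model, it can help to check the sample against these guidelines. The sketch below is only an illustration: the two thresholds are taken from the figures mentioned above, while the function name and the sample labels are invented.

```python
from collections import Counter

RECOMMENDED_MIN = 5_000  # lower end of the recommended range
PRACTICAL_MIN = 1_000    # smaller samples are sometimes used in practice


def sample_summary(labels: list) -> dict:
    """Report the size and good/bad balance of a labeled sample."""
    counts = Counter(labels)
    total = counts["good"] + counts["bad"]
    return {
        "total": total,
        "bad_share": counts["bad"] / total if total else 0.0,
        "meets_recommended_size": total >= RECOMMENDED_MIN,
        "meets_practical_minimum": total >= PRACTICAL_MIN,
    }


# Hypothetical sample: everyone from one period, bad clients are the minority.
labels = ["good"] * 5400 + ["bad"] * 600
summary = sample_summary(labels)
print(summary)
```

The sample is deliberately left unbalanced here, in line with the advice not to force a 50/50 ratio.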
Our “Scoring Machine” system will help you with this: you can build a model simply by loading your historical data into it, and testing the model is just as easy. If the quality of the model does not suit you, you can change the coefficients used in the algorithm and build the model again. If you wish, you can also change individual attributes in the finished model manually.
Enter the portal to the “Scoring Machine” system and build a scoring model simply and easily!