Last month we summarized the backtest results of a model we developed using performance data from European P2P lending platforms that use our application.

The model divides the population into five risk bands (A to E) and relies mostly on data from Google accounts, LinkedIn and device recognition.


1. PD for score buckets:

| Score band | Score range (xxx points) | Estimated PD | Observed PD |
|---|---|---|---|
| A | 950–1000 | < 2.33% | < 3.1% |
| B | 920–949 | 2.33%–9.05% | 3.1%–8.1% |
| C | 670–919 | 9.06%–13.32% | 8.2%–14.3% |
| D | 520–669 | 13.33%–20% | 14.4%–23.6% |
| E | 0–519 | > 20% | > 23.6% |

– we used our standard performance definition: a Bad customer is one who is 30 or more days past due (30DPD+) or has 1 unpaid installment;

– performance is measured on loans originated after Aug’16 and up to Mar’17 (6-month window), i.e. the most recent performance data available;

– each range represents the span of PD in that bucket across our clients, who represent different markets / geos.
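As a rough illustration of how the band boundaries and the Bad definition above translate into an observed PD per band, here is a minimal Python sketch. The function names, the input tuple layout and the sample loans are all hypothetical, and reading "1 unpaid installment" as "at least one" is an assumption; this is not the production logic.

```python
# Lower score bound of each band, taken from the table above.
BAND_FLOORS = [("A", 950), ("B", 920), ("C", 670), ("D", 520), ("E", 0)]


def score_to_band(score):
    """Map a 0-1000 score to its risk band (A is the lowest risk)."""
    if not 0 <= score <= 1000:
        raise ValueError(f"score out of range: {score}")
    for band, floor in BAND_FLOORS:
        if score >= floor:
            return band


def is_bad(max_days_past_due, unpaid_installments):
    """Bad = 30+ days past due, or at least one unpaid installment (assumed reading)."""
    return max_days_past_due >= 30 or unpaid_installments >= 1


def observed_pd_by_band(loans):
    """loans: iterable of (score, max_days_past_due, unpaid_installments)."""
    totals, bads = {}, {}
    for score, dpd, unpaid in loans:
        band = score_to_band(score)
        totals[band] = totals.get(band, 0) + 1
        bads[band] = bads.get(band, 0) + is_bad(dpd, unpaid)
    # Share of Bad customers among all scored customers in each band.
    return {band: bads[band] / totals[band] for band in totals}


# Made-up loans, purely for illustration.
sample = [(960, 0, 0), (930, 45, 0), (930, 0, 0), (700, 0, 1), (500, 60, 2)]
print(observed_pd_by_band(sample))
```

Running the same aggregation per client would give the per-client PDs whose minimum and maximum form the ranges shown in the table.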


2. Distribution of the scores:

| Score band | Score range | Score distribution | Most recent sample result |
|---|---|---|---|
| A | 950–1000 | 14% | 10% |
| B | 920–949 | 18% | 18% |
| C | 670–919 | 29% | 30% |
| D | 520–669 | 24% | 26% |
| E | 0–519 | 15% | 16% |

The observed score distribution is fairly stable (the stability index is about 2%).
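For readers who want to reproduce the stability figure, the sketch below computes a population stability index (PSI) from the two distribution columns in the table. It assumes that the "stability index" quoted here is the standard PSI, which the text does not state explicitly; the function name is ours.

```python
import math

# Band shares A..E from the table above.
reference = [0.14, 0.18, 0.29, 0.24, 0.15]  # "Score distribution" column
recent = [0.10, 0.18, 0.30, 0.26, 0.16]     # "Most recent sample result" column


def population_stability_index(expected, actual):
    """PSI = sum over bands of (actual - expected) * ln(actual / expected)."""
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


psi = population_stability_index(reference, recent)
print(f"PSI = {psi:.3f}")
```

With these inputs the result is roughly 0.016, which matches the "about 2%" quoted above. Almost all of it comes from band A, where the share dropped from 14% to 10%.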


3. Model performance.

AUC 0.76

Gini 0.52

KS 0.44
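The three metrics are closely related: Gini is simply 2 × AUC − 1, which is why 0.76 and 0.52 go together. The following sketch shows one way to compute all three from scores and Bad flags in plain Python. The function name and the toy data are hypothetical, and it assumes a higher score means lower risk, as in the bands above.

```python
def auc_gini_ks(scores, is_bad):
    """Return (AUC, Gini, KS) for a score where higher means lower risk."""
    goods = [s for s, b in zip(scores, is_bad) if not b]
    bads = [s for s, b in zip(scores, is_bad) if b]

    # AUC: probability that a random good outscores a random bad (ties count half).
    wins = sum((g > b) + 0.5 * (g == b) for g in goods for b in bads)
    auc = wins / (len(goods) * len(bads))

    # Gini is a rescaling of AUC.
    gini = 2 * auc - 1

    # KS: largest gap between the cumulative score distributions of bads and goods.
    ks = max(
        abs(sum(b <= t for b in bads) / len(bads)
            - sum(g <= t for g in goods) / len(goods))
        for t in sorted(set(scores))
    )
    return auc, gini, ks


# Toy data, purely for illustration (1 = Bad customer).
scores = [980, 940, 930, 700, 650, 600, 500, 400]
flags = [0, 0, 0, 1, 0, 0, 1, 1]
auc, gini, ks = auc_gini_ks(scores, flags)
print(f"AUC={auc:.2f} Gini={gini:.2f} KS={ks:.2f}")

# Consistency check on the reported figures: 2 * 0.76 - 1 = 0.52.
print(round(2 * 0.76 - 1, 2))
```

The pairwise loop is quadratic in sample size, which is fine for a validation sample of a few hundred customers; for larger samples a rank-based implementation would be the more usual choice.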

Since implementation, the model has shown highly stable performance metrics (1–2% deviation). The deviation is larger when we look at individual clients’ data (variation is within 10%), although this can be explained by the low sample volumes.

Characteristics of validation sample:

850 unique customers scored; Bad rate of 21.2%.
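To give a feel for why low-volume samples produce larger variation, the sketch below estimates the sampling margin of error of the Bad rate using a normal approximation. The 850 customers and 21.2% Bad rate come from the validation sample above; the 100-customer client is a hypothetical example, and the function name is ours.

```python
import math


def bad_rate_margin(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


bad_rate = 0.212

# Full validation sample.
margin_all = bad_rate_margin(bad_rate, 850)

# Hypothetical single client with only 100 scored customers.
margin_small = bad_rate_margin(bad_rate, 100)

print(f"n=850: +/- {margin_all:.1%}")
print(f"n=100: +/- {margin_small:.1%}")
```

On the full sample the margin is a little under 3 percentage points, while for a client with 100 customers it grows to about 8 points, so per-client deviations of several percent can arise from sample size alone.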