Best Practices in Measuring Analytic Project Performance / Success

In the 5th Annual Survey (2011), 236 data miners shared their best practices for measuring analytics project performance / success. Their measurement methodologies are highly diverse, and many described using multiple measurement techniques. The five methodologies mentioned most often were:

  • Model performance (accuracy, F-measure, ROC, AUC, lift)
  • Financial performance (ROI and other financial measures)
  • Performance in a control or other group
  • Feedback from users, clients, or management
  • Cross-validation

Below are examples of the best practices they shared. As you can see, some of the richest descriptions include measurements that cross several of these categories. Complete lists of the data miners' verbatim best practice descriptions are also available by following the links below.

Model Performance

Fifty-three data miners participating in the 5th Annual Survey described best practice methodologies that included measures of model performance. (All 53 responses can be seen here.)

Selected examples:

  • Cross-validation and sliding-window validation during model training, the data mining process, and parameter optimization. Metrics: accuracy, recall, precision, ROC, AUC, lift, confidence, support, conversion rates, churn rates, ROI, increase in sales volume and profit, cost savings, run times, etc. Continuous monitoring of model performance metrics. Use of control groups and independent test sets.
  • Standard statistical measurements (KS, ROC, R-square etc.), profitability metrics, loss impact etc.
  • Two phases: (1) There are expected results for the AUC or the K-S test that the model must meet to be accepted by the supervisor (in credit scoring, the banking supervisor). (2) Once the model is implemented, we recommend conducting stress testing and back-testing at least once a month, and we've developed tools to alert users to potential disruption of the original patterns in the model.
  • For classification models I use AUC; for regression models I use RMSE. Everything is cross-validated during model building, and performance is then assessed on a hold-out sample.
  • Model quality: standard performance measures such as precision, recall, accuracy, etc. Model complexity: memory usage & computation time.
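
Several of these respondents describe the same basic workflow: cross-validate during model building, then assess final performance on a hold-out sample. The Python sketch below illustrates that workflow with scikit-learn; the synthetic data, logistic-regression model, and 0.5 classification threshold are assumptions made for illustration, not details reported by any respondent.

```python
# Sketch of the workflow several respondents describe: cross-validate during
# model building, then assess performance on a hold-out sample.
# The synthetic data, model choice, and 0.5 threshold are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Reserve a hold-out sample that plays no part in model building.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000)

# Cross-validated AUC during model building.
cv_auc = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")
print(f"5-fold CV AUC: {cv_auc.mean():.3f} (+/- {cv_auc.std():.3f})")

# Final assessment on the untouched hold-out sample.
model.fit(X_train, y_train)
hold_scores = model.predict_proba(X_hold)[:, 1]
hold_pred = (hold_scores >= 0.5).astype(int)
print(f"Hold-out AUC:       {roc_auc_score(y_hold, hold_scores):.3f}")
print(f"Hold-out accuracy:  {accuracy_score(y_hold, hold_pred):.3f}")
print(f"Hold-out precision: {precision_score(y_hold, hold_pred):.3f}")
print(f"Hold-out recall:    {recall_score(y_hold, hold_pred):.3f}")
```

In practice the metric mix varies with the problem; for regression models, respondents substitute measures such as RMSE for AUC.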

Financial Performance

Forty-three data miners participating in the 5th Annual Survey described best practice methodologies that included measures of financial performance. (All 43 responses can be seen here.)

Selected examples:

  • We measure ROI, cost, gain, model accuracy, precision, recall, ROC, AUC, lift charts, and customized metrics. The focus is on the benefit for the business and for the customer.
  • Longitudinal validation based on hard, objective outcomes, preferably financial where sensible and achievable.
  • We now know approximately how much it will cost to develop a workable solution. We factor that cost into the feasibility. We also continuously refine our cost/benefit analysis throughout the evolution of the project. In the end, success is what the sponsor and team say it is. It does not always end up as increased revenue or lowered costs.
  • I work in Lean Six Sigma, so we routinely quantify the financial benefits of analytic projects and opportunities.
  • Real-world analysis of results, almost always tied to a financial measure (i.e., something that can be expressed in dollars or readily converted to dollars).
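
Several of the financial measures mentioned reduce to comparing incremental gain against project cost. The sketch below shows that ROI arithmetic for a hypothetical model-driven campaign; every figure is a made-up assumption, and a real project would substitute its own cost and revenue accounting.

```python
# Illustrative ROI arithmetic for a hypothetical model-driven campaign.
# All figures are made-up assumptions, not survey data.
project_cost = 50_000.00        # modeling, scoring, and deployment costs
campaign_revenue = 180_000.00   # revenue from the model-targeted campaign
baseline_revenue = 110_000.00   # revenue expected without the model (e.g., from a control group)

incremental_gain = campaign_revenue - baseline_revenue
roi = (incremental_gain - project_cost) / project_cost

print(f"Incremental gain: ${incremental_gain:,.0f}")
print(f"Net benefit:      ${incremental_gain - project_cost:,.0f}")
print(f"ROI:              {roi:.1%}")
```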

Performance in Control or Other Group

Thirty-five data miners participating in the 5th Annual Survey described best practice methodologies that included measures of performance in control or other groups. (All 35 responses can be seen here.)

Selected examples:

  • Evaluate model accuracy using cross-validation, out-of-bag samples, or hold-out data (if the data set is truly large). Once happy with the method, conduct a pilot study to measure accuracy and make sure the model works in the real environment.
  • Model performance: (1) overall accuracy on a validation data set; (2) sensitivity and specificity; (3) ROC curve. Analytic project success: (1) significant increase in rates of marketing returns; (2) adoption of the model by the pertinent business unit.
  • Out of sample performance. Ease of implementation. Understanding & buy-in from organization.
  • Always against a hold-out control group, tracked over time and across multiple campaigns.
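
A common thread in these examples is comparing a model-targeted group against a hold-out control group and checking that the observed difference is more than noise. The sketch below illustrates one way to do that with a chi-square test on response counts; the counts themselves are invented for illustration.

```python
# Sketch of comparing a model-targeted group against a hold-out control group.
# The response counts below are invented for illustration.
from scipy.stats import chi2_contingency

treated_n, treated_responders = 10_000, 420   # customers targeted using the model
control_n, control_responders = 10_000, 260   # random hold-out control, no model

treated_rate = treated_responders / treated_n
control_rate = control_responders / control_n

# 2x2 table of responders vs. non-responders in each group.
table = [
    [treated_responders, treated_n - treated_responders],
    [control_responders, control_n - control_responders],
]
chi2, p_value, dof, expected = chi2_contingency(table)

print(f"Treated response rate: {treated_rate:.2%}")
print(f"Control response rate: {control_rate:.2%}")
print(f"Lift over control:     {treated_rate / control_rate:.2f}x (p = {p_value:.4f})")
```

A two-proportion z-test would serve equally well; the essential design choice, echoed in the responses above, is that success is measured against what would have happened without the model.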

Other Measures of Analytic Success

Twenty-nine data miners participating in the 5th Annual Survey described best practice methodologies that included feedback from users, clients, or management; fourteen described methodologies that included cross-validation; and 131 described methodologies built on other measures. Together they provide insight into a diverse set of best practices. (Verbatim responses can be seen here.)

Selected examples:

  • Always build (at least) one metric that measures the experience of the most granular (boots-on-the-ground) user. You have to have a metric that connects to what your user's experience (read: complaint) is.
  • Positive feedback and changed habits.
  • We continuously monitor cross-validation results, feeding new incoming data for which the outcome is known into the validation loop. When performance decreases, the model is automatically rebuilt and optimized.
  • Diagnostics, cross-validation, post market evaluation.
  • We track the performance of the direct response models on a daily basis as campaigns are in the field and look for segments where the model seems to be under-performing. Using this data, we sometimes re-train or build secondary models to compensate.
  • Conviction rate
  • Comparative analysis of manufacturing results against laboratory results.
  • Defects/1000; by months-in-service.
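
The monitoring practices described above (feeding newly labeled data back into a validation loop, tracking campaigns in the field, and rebuilding the model when performance degrades) can be sketched as a simple check-and-retrain routine. The example below is a minimal illustration; the AUC threshold, synthetic data, and logistic-regression model are assumptions rather than details taken from the survey responses.

```python
# Sketch of a check-and-retrain monitoring loop: score newly labeled data as it
# arrives and rebuild the model when performance drops below a floor.
# The AUC threshold, synthetic data, and model choice are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

AUC_FLOOR = 0.70  # assumed alerting threshold


def check_and_maybe_rebuild(model, X_new, y_new, X_all, y_all):
    """Evaluate the deployed model on newly labeled data; retrain if degraded."""
    auc = roc_auc_score(y_new, model.predict_proba(X_new)[:, 1])
    if auc < AUC_FLOOR:
        model.fit(X_all, y_all)  # rebuild on all labeled data to date
        return auc, "rebuilt"
    return auc, "ok"


# Toy demonstration with synthetic data standing in for incoming campaign results.
X, y = make_classification(n_samples=3000, n_features=15, random_state=1)
deployed = LogisticRegression(max_iter=1000).fit(X[:2000], y[:2000])
auc, status = check_and_maybe_rebuild(deployed, X[2000:], y[2000:], X, y)
print(f"AUC on new data: {auc:.3f} ({status})")
```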