Best Practices in Measuring Analytic Project Performance / Success: Performance in Control or Other Group

In the 5th Annual Survey (2011), data miners shared their best practices for measuring analytics project performance / success. The previous web page summarizes the most frequently mentioned measures. Because some of the richest descriptions contain measurements that cross several categories, a data miner's best practice description may appear in more than one of the verbatim lists (model performance, financial performance, performance in a control or other group). Verbatims that do not fit these categories appear in the list of other best practice measures.

Below is the full text of the best practice methodologies that included measures of performance in a control or other group. Two brief illustrative sketches, one of hold-out evaluation and one of test-versus-control measurement, follow the list.

  • Evaluate model accuracy using cross-validation, or out-of-bag samples, or hold out data (if data set truly large). Once happy with method, conduct pilot study to measure accuracy to make sure model works in real environment.
  • Model Performance: 1. Overall accuracy on a validation data set; 2. Sensitivity and specificity; 3. ROC curve. Analytic Project Success: 1. Significant increase in rates of marketing returns; 2. Adoption of the model by the pertinent business unit.
  • Out of sample performance. Ease of implementation. Understanding & buy-in from organization.
  • Always against a hold out control group and tracked over time and multiple campaigns.
  • Always backtesting of new models with unseen, more recent data. Model quality evaluation of most existing models on a monthly basis.
  • The best we can really do is wait several years and test it retrospectively. From the time of rollout to the time of being able to evaluate is at least 3 years, probably 4-5 to have confidence. The best we can do is evaluate it on test data before rollout.
  • Test and control
  • Cross-validation and sliding-window validation during model training and data mining process and parameter optimization. Metrics: accuracy, recall, precision, ROC, AUC, lift, confidence, support, conversion rates, churn rates, ROI, increase in sales volume and profit, cost savings, run times, etc. Continuous monitoring of model performance metrics. Use of control groups and independent test sets.
  • For classification models, I use AUC, for regression models I use RMSE. Everything is cross-validated during model building, and performance is then assessed on a hold-out sample.
  • Test & control groups. Incremental ROI gain.
  • Cross-validation, using independent test sets.
  • Campaign scores results / use of test samples or groups.
  • Churn management = Net Save % (Target vs Control methodology)
  • Efficiency of models. Using control group in deployment phase.
  • Sensitivity analysis, benchmarking.
  • Using treatment and control groups, matching, deploying pilot experiences.
  • We regularly conduct studies to review performance, ensure data integrity, and maintain baseline measures. Many undergo peer review.
  • Champion - Challenger
  • Champion vs challenger methods to show incremental gain.
  • Compare model to independent test data.
  • Comparing control groups to scored groups.
  • Comparison of model to holdout sample or control group.
  • Consult with client to apply control groups.
  • Control Group
  • Control group comparison (with predictive models).
  • Control group comparison, model evaluation on testing data.
  • Control groups, control groups, control groups..! Control groups to determine real model prediction accuracy. Control groups to determine success of CRM activities.
  • Control groups; comparison to older models.
  • Next-Best-Offer: customers with data mining based NBO vs. customers in control group with random NBO.
  • Performance of predictive models for retention, cross-sell, or acquisition measured against hold out group.
  • Random controls and "naive" controls (comparison against what would have been done if the models hadn't been used, which usually differs from purely random sampling).
  • Statistically designed test and control groups.
  • Using hold out samples not used to build models to validate them.
  • We build models that are used by our clients. We test model performance before we provide the models to our clients, with performance information so the client knows how well they work when applied appropriately to new data.
  • Whenever possible, we perform split runs by comparing current practice with analytics-driven output.
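Several respondents describe the same general pattern of cross-validating during model building and then confirming performance (for example, AUC) on a hold-out sample that was never used to fit the model. The sketch below is a minimal, hypothetical illustration of that pattern using scikit-learn and synthetic data; it is not drawn from any respondent's actual workflow, and the data set, model choice, and parameters are placeholders.

```python
# Minimal sketch (assumed setup, not a respondent's method): cross-validated
# AUC during model building, then a final check on an untouched hold-out sample.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import roc_auc_score

# Placeholder data standing in for a real modeling table.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Reserve a hold-out sample that is never touched during model building.
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)

# Cross-validated AUC on the training data guides model selection.
cv_auc = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")
print(f"Cross-validated AUC: {cv_auc.mean():.3f} (+/- {cv_auc.std():.3f})")

# Final assessment: AUC on the hold-out sample.
model.fit(X_train, y_train)
holdout_auc = roc_auc_score(y_holdout, model.predict_proba(X_holdout)[:, 1])
print(f"Hold-out AUC: {holdout_auc:.3f}")
```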
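Many of the verbatims also describe comparing a model-targeted (test) group against a randomly held-out control group and reporting the incremental gain. The snippet below is an illustrative sketch of that comparison on simulated campaign outcomes, with a simple two-proportion z-test for the difference in response rates; the response rates and group sizes are invented for the example and are not survey results.

```python
# Illustrative sketch (assumed numbers): incremental response rate of a
# model-targeted test group versus a random hold-out control group.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated campaign outcomes: 1 = responded, 0 = did not respond.
test_outcomes = rng.binomial(1, 0.060, size=20000)    # contacted using the model
control_outcomes = rng.binomial(1, 0.045, size=5000)  # random hold-out control

test_rate = test_outcomes.mean()
control_rate = control_outcomes.mean()
incremental = test_rate - control_rate

# Two-proportion z-test on the difference in response rates.
successes = np.array([test_outcomes.sum(), control_outcomes.sum()])
nobs = np.array([test_outcomes.size, control_outcomes.size])
pooled = successes.sum() / nobs.sum()
se = np.sqrt(pooled * (1 - pooled) * (1 / nobs[0] + 1 / nobs[1]))
z = (test_rate - control_rate) / se
p_value = 2 * (1 - stats.norm.cdf(abs(z)))

print(f"Test rate:        {test_rate:.3%}")
print(f"Control rate:     {control_rate:.3%}")
print(f"Incremental lift: {incremental:.3%}  (z = {z:.2f}, p = {p_value:.4f})")
```

The same structure applies to churn "net save %" measures: replace the response indicator with a save indicator and compare the target group against the control group.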