Best Practices in Measuring Analytic
Project Performance / Success:
Performance in Control or Other Group

In the 5th Annual Survey (2011), data miners shared their best practices for
measuring analytics project performance / success.  The previous web page
summarizes the most frequently mentioned measures.  Since some of the richest
descriptions contain measurements that cross several of the categories, a data
miner's best-practice description may appear in several of the verbatim lists
(performance, financial performance, outside group performance).  The remaining
verbatims are in the other best practice measures list.

Below is the full text of the best practice methodologies that included measures of
performance in a control or other group.  

  • Evaluate model accuracy using cross-validation, or out-of-bag samples,
    or hold out data (if data set truly large).  Once happy with method,
    conduct pilot study to measure accuracy to make sure model works in
    real environment.

  • Model Performance: 1. Overall accuracy on a validation data set;
    2. Sensitivity and specificity; 3. ROC curve.  Analytic Project Success:
    1. Significant increase in rates of marketing returns; 2. Adoption of the
    model by the pertinent business unit.

  • Out of sample performance.  Ease of implementation.  Understanding &
    buy-in from organization.

  • Always against a hold out control group and tracked over time and
    multiple campaigns.

  • Always backtesting of new models with unseen, more recent data.  
    Model quality evaluation of most existing models on a monthly basis.

  • The best we can really do is wait several years and test it
    retrospectively.  From the time of rollout to the time of being able to
    evaluate is at least 3 years, probably 4-5 to have confidence. The best
    we can do is evaluate it on test data before rollout.

  • Test and control

  • Cross-validation and sliding-window validation during model training
    and data mining process and parameter optimization.  Metrics:
    accuracy, recall, precision, ROC, AUC, lift, confidence, support,
    conversion rates, churn rates, ROI, increase in sales volume and profit,
    cost savings, run times, etc.  Continuous monitoring of model
    performance metrics.  Use of control groups and independent test sets.

  • For classification models, I use AUC, for regression models I use
    RMSE.  Everything is cross-validated during model building, and
    performance is then assessed on a hold-out sample.

  • Test & control groups. Incremental ROI gain.

  • Cross-validation, using independent test sets.

  • Campaign scores results / use of test samples or groups.

  • Churn management = Net Save % (Target vs Control methodology)

  • Efficiency of models. Using control group in deployment phase.

  • Sensitivity analysis, benchmarking.

  • Using treatment and control groups, matching, deploying pilot

  • We regularly conduct studies to review performance, ensure data
    integrity, and maintain baseline measures.  Many undergo peer review.

  • Champion - Challenger

  • Champion vs challenger methods to show incremental gain.

  • Compare model to independent test data.

  • Comparing control groups to scored groups.

  • Comparison of model to holdout sample or control group.

  • Consult with client to apply control groups.

  • Control Group

  • Control group comparison (with predictive models).

  • Control group comparison, model evaluation on testing data.

  • Control groups, control groups, control groups..!  Control groups to
    determine real model prediction accuracy.  Control groups to determine
    success of CRM activities.

  • Control groups; comparison to older models.

  • Next-Best-Offer: customers with data mining based NBO vs. customers
    in control group with random NBO.

  • Performance of predictive models for retention, cross-sell, or acquisition
    measured against hold out group.

  • Random controls and "naive" controls (comparison against what would
    have been done if the models hadn't been used, which usually differs
    from purely random sampling).

  • Statistically designed test and control groups.

  • Using hold out samples not used to build models to validate them.

  • We build models that are used by our clients.  We test model
    performance before we provide the models to our clients, with
    performance information so the client knows how well they work when
    applied appropriately to new data.

  • Whenever possible, we perform split runs by comparing current practice
    with analytics-driven output.
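
Two themes recur throughout the verbatims above: scoring a model on hold-out
data with a metric such as AUC, and measuring campaign impact against a
control group (the "Net Save %" target-vs-control comparison).  A minimal
sketch of both, using small made-up numbers purely for illustration:

```python
# Illustrative sketch of two measures named in the responses above:
# (1) AUC on a hold-out sample, and (2) a Net Save %-style comparison of a
# targeted group against a control group.  All data below is hypothetical.

def auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen positive
    case receives a higher score than a randomly chosen negative case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def net_save_pct(saved_target, n_target, saved_control, n_control):
    """Difference in save (retention) rates: targeted group vs. control."""
    return saved_target / n_target - saved_control / n_control

# Hold-out evaluation: model scores and true churn labels for customers
# not used in model building.
holdout_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
holdout_labels = [1,   1,   0,   1,   0,   1,   0,   0]
print(f"hold-out AUC: {auc(holdout_scores, holdout_labels):.2f}")

# Control-group comparison: a retention campaign saved 300 of 1,000
# targeted customers vs. 220 of 1,000 in a randomly held-out control group.
print(f"net save: {net_save_pct(300, 1000, 220, 1000):.1%}")
```

The rank-based AUC here is equivalent to the Mann-Whitney U statistic
normalized by the number of positive-negative pairs; in practice most
respondents would compute it with a library routine rather than by hand, but
the pairwise form makes the definition explicit.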
Copyright (c) 2012 Rexer Analytics, All rights reserved