Best Practices in Measuring Analytic
Project Performance / Success

In the 5th Annual Survey (2011), 236 data miners shared their best practices in
how they measure analytic project performance / success.  There is great diversity
in data miners' performance / success measurement methodologies, and many data
miners described using multiple measurement techniques.  The five methodologies
mentioned by the most data miners were:
  • Model performance (accuracy, F, ROC, AUC, lift); a sketch of these
    metrics follows this list
  • Financial performance (ROI and other financial measures)
  • Performance in a control or other group
  • Feedback from users, clients, or management
  • Cross-validation
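
As a concrete illustration of the first category, the minimal sketch below shows
one way several of these model-performance metrics could be computed on a scored
hold-out set.  It is not drawn from the survey responses; it assumes Python with
scikit-learn, and the arrays y_true and y_score are hypothetical hold-out labels
and model scores.

    import numpy as np
    from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

    # Hypothetical hold-out labels and model scores (illustration only).
    y_true  = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1, 0])
    y_score = np.array([0.20, 0.80, 0.70, 0.30, 0.90,
                        0.40, 0.10, 0.60, 0.55, 0.35])
    y_pred  = (y_score >= 0.5).astype(int)   # class predictions at a 0.5 cutoff

    print("accuracy:", accuracy_score(y_true, y_pred))
    print("F1      :", f1_score(y_true, y_pred))
    print("ROC AUC :", roc_auc_score(y_true, y_score))

    def lift_at(y, scores, top_frac=0.10):
        """Response rate in the top-scored fraction divided by the overall rate."""
        n_top = max(1, int(round(len(scores) * top_frac)))
        top = np.argsort(scores)[::-1][:n_top]
        return y[top].mean() / y.mean()

    print("lift in top 10%:", lift_at(y_true, y_score))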

Below are examples of the best practices they shared.  As you can see, some of
the richest descriptions include measurements that cross several of these
categories.  Complete lists of the data miners' verbatim best practice descriptions
are also available by following the links below.


Model Performance:

Fifty-three data miners participating in the 5th Annual Survey described best
practice methodologies that included measures of model performance.  (All 53
responses can be seen here.)

Selected examples:

  • Cross-validation and sliding-window validation during model training,
    the data mining process, and parameter optimization.  Metrics:
    accuracy, recall, precision, ROC, AUC, lift, confidence, support,
    conversion rates, churn rates, ROI, increase in sales volume and profit,
    cost savings, run times, etc.  Continuous monitoring of model
    performance metrics.  Use of control groups and independent test sets.

  • Standard statistical measurements (KS, ROC, R-square etc.),
    profitability metrics, loss impact etc.

  • Two phases:  1. There are expected results for the AUC or K-S test in
    order for the models to be accepted by the supervisor (in Credit Scoring
    this is the banking supervisor).  2. Once implemented, we recommend
    conducting stress testing and back testing at least once a month, and
    we've developed tools to alert users to potential disruptions in the
    original patterns of the model.

  • For classification models I use AUC; for regression models I use
    RMSE.  Everything is cross-validated during model building, and
    performance is then assessed on a hold-out sample.  (A sketch of this
    workflow appears after this list.)

  • Model quality: standard performance measures such as precision,
    recall, accuracy, etc.  Model complexity: memory usage & computation
    time.
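
The fourth example above mentions cross-validating during model building and
then assessing performance once on a hold-out sample.  A minimal sketch of that
workflow, assuming Python with scikit-learn and a synthetic data set (not part
of the survey), might look like this:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import cross_val_score, train_test_split

    # Synthetic data standing in for a real modeling table.
    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    X_train, X_hold, y_train, y_hold = train_test_split(
        X, y, test_size=0.3, random_state=0)

    model = LogisticRegression(max_iter=1000)

    # Cross-validated AUC on the training data guides model building.
    cv_auc = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")
    print("5-fold CV AUC on training data: %.3f (+/- %.3f)"
          % (cv_auc.mean(), cv_auc.std()))

    # The hold-out sample is scored only once, as the final performance estimate.
    model.fit(X_train, y_train)
    hold_auc = roc_auc_score(y_hold, model.predict_proba(X_hold)[:, 1])
    print("AUC on hold-out sample:         %.3f" % hold_auc)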


Financial Performance:

Forty-three data miners participating in the 5th Annual Survey described best
practice methodologies that included measures of financial performance.  (All 43
responses can be seen here.)

Selected examples:

  • We measure ROI, cost, gain, model accuracy, precision, recall, ROC,
    AUC, lift charts, and customized metrics. The focus is on the benefit for
    the business and for the customer.

  • Longitudinal validation based on hard, objective outcomes, preferably
    financial where sensible and achievable.

  • We now know approximately how much it will cost to develop a
    workable solution.  We factor that cost into the feasibility.  We also
    continuously refine our cost/benefit analysis throughout the evolution of
    the project.  In the end, success is what the sponsor and team say it is.
    It does not always end up as increased revenue or lowered costs.

  • I work in Lean Six Sigma, so we routinely quantify the financial benefits
    of analytic projects and opportunities.

  • Real-world analysis of results, almost always tied to a financial measure
    (i.e., something that can be expressed in dollars or readily converted to
    dollars).
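
For illustration, a simple return-on-investment calculation for a targeting
campaign might look like the sketch below.  The figures and variable names are
hypothetical and are not taken from the survey responses.

    # Translating campaign results into a financial measure (ROI).
    contacts         = 50_000     # customers targeted by the model
    response_rate    = 0.042      # observed response rate in the targeted group
    revenue_per_resp = 180.00     # average revenue per responder (dollars)
    cost_per_contact = 1.25       # mailing / contact cost (dollars)
    project_cost     = 25_000.00  # modeling and deployment cost (dollars)

    revenue = contacts * response_rate * revenue_per_resp
    cost    = contacts * cost_per_contact + project_cost
    roi     = (revenue - cost) / cost

    print("revenue: $%.0f   cost: $%.0f   ROI: %.1f%%"
          % (revenue, cost, roi * 100))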


Performance in Control or Other Group:

Thirty-five data miners participating in the 5th Annual Survey described best
practice methodologies that included measures of performance in control or other
groups.  (All 35 responses can be seen here.)

Selected examples:

  • Evaluate model accuracy using cross-validation, out-of-bag samples, or
    hold-out data (if the data set is truly large).  Once happy with the
    method, conduct a pilot study to measure accuracy and make sure the
    model works in the real environment.

  • Model Performance:  1. Overall accuracy on a validation data set.
    2. Sensitivity and specificity.  3. ROC curve.  Analytic Project Success:
    1. Significant increase in rates of marketing returns.  2. Adoption of
    the model by the pertinent business unit.

  • Out of sample performance.  Ease of implementation.  Understanding &
    buy-in from organization.

  • Always against a hold-out control group, tracked over time and across
    multiple campaigns.  (A sketch of such a comparison appears after this
    list.)
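
As an illustration of the last example, the sketch below compares response
rates in a model-targeted group against a hold-out control group and tests
whether the difference is statistically significant.  It assumes Python with
SciPy, and all counts are hypothetical.

    from scipy.stats import chi2_contingency

    # Hypothetical campaign counts: responders / group size.
    treated_resp, treated_n = 1_260, 30_000   # targeted by the model
    control_resp, control_n = 410, 15_000     # hold-out control group

    treated_rate = treated_resp / treated_n
    control_rate = control_resp / control_n
    incremental  = treated_rate - control_rate

    # 2x2 table of responders vs. non-responders for a chi-square test.
    table = [[treated_resp, treated_n - treated_resp],
             [control_resp, control_n - control_resp]]
    chi2, p_value, _, _ = chi2_contingency(table)

    print("treated %.2f%%   control %.2f%%   incremental %.2f pts   p = %.4f"
          % (100 * treated_rate, 100 * control_rate, 100 * incremental, p_value))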


Other Measures of Analytic Success:

In the 5th Annual Survey, twenty-nine data miners described best practice
methodologies that included measures of feedback from users, clients, or
management.  Fourteen data miners described best practice methodologies that
included cross-validation.  And 131 data miners described best practice
methodologies with other measures.  They provided insights into a diverse set of
best practices.  (Verbatim responses can be seen here.)

Selected examples:

  • Always build (at least) one metric which measures the experience from
    the most granular (boots on the ground) user.  You have to have a
    metric that connects to what your user's experience (read: complaint) is.

  • Positive feedback and changed habits.

  • We continuously monitor cross-validation results, feeding new
    incoming data for which the outcome is known into the validation loop.
    On a decrease in performance, the model is automatically rebuilt and
    optimized.  (A sketch of such a monitoring loop appears after this list.)

  • Diagnostics, cross-validation, post market evaluation.

  • We track the performance of the direct response models on a daily
    basis as campaigns are in the field and look for segments where the
    model seems to be under-performing.  Using this data, we sometimes
    re-train or build secondary models to compensate.

  • Conviction rate

  • Comparative analysis of manufacturing results against laboratory
    results.

  • Defects/1000; by months-in-service.
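
Several of the examples above describe monitoring deployed models on newly
labeled data and rebuilding them when performance slips.  The sketch below
outlines one way such a check could be wired up.  It assumes Python with
scikit-learn; the baseline figures are hypothetical, and retrain_model() is a
placeholder for whatever rebuild process an organization actually uses.

    from sklearn.metrics import roc_auc_score

    BASELINE_AUC = 0.78   # AUC measured at deployment time (hypothetical)
    TOLERANCE    = 0.05   # allowed drop before the model is rebuilt

    def check_model(y_true, y_score, retrain_model):
        """Compare current AUC with the deployment baseline; retrain if it slips."""
        current_auc = roc_auc_score(y_true, y_score)
        if current_auc < BASELINE_AUC - TOLERANCE:
            retrain_model()   # hypothetical hook: rebuild / re-optimize the model
        return current_auc
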
Copyright (c) 2012 Rexer Analytics, All rights reserved