Best Practices in Measuring Analytic
Project Performance / Success:
Model Performance Measures

In the 5th Annual Survey (2011), data miners shared their best practices for
measuring analytics project performance / success.  The previous page
summarizes the most frequently mentioned measures.  Because some of the richest
descriptions contain measurements that cross several of the categories, a data
miner's best practice description may appear in several of the verbatim lists
(model performance, financial performance, outside group performance).  The
remaining verbatims are in the other best practice measures list.

Below is the full text of the best practice methodologies that included measures
of model performance (accuracy, F-measure, ROC, AUC, lift, etc.).

  • Cross-validation and sliding-window validation during model training,
    the data mining process, and parameter optimization.  Metrics:
    accuracy, recall, precision, ROC, AUC, lift, confidence, support,
    conversion rates, churn rates, ROI, increase in sales volume and profit,
    cost savings, run times, etc.  Continuous monitoring of model
    performance metrics.  Use of control groups and independent test sets.

  • Standard statistical measurements (KS, ROC, R-square etc.),
    profitability metrics, loss impact etc.

  • Two phases:  1. There are expected results for AUC or the K-S test in
    order for them to be accepted by the supervisor (in Credit Scoring this
    is the banking supervisor).  2. Once implemented, we recommend
    conducting stress testing and back testing at least once a month, and
    we've developed tools to alert the users of potential disruption in the
    original patterns of the model.

  • For classification models I use AUC; for regression models I use
    RMSE.  Everything is cross-validated during model building, and
    performance is then assessed on a hold-out sample.

  • Model quality: standard performance measures such as precision,
    recall, accuracy, etc.  Model complexity: memory usage & computation

  • We measure ROI, cost, gain, model accuracy, precision, recall, ROC,
    AUC, lift charts, and customized metrics. The focus is on the benefit for
    the business and for the customer.

  • Longitudinal validation based on hard, objective outcomes, preferably
    financial where sensible and achievable.

  • Metrics: model prediction accuracy, saved costs, gained increase in
    sales volume, gained increase in customer satisfaction, reduction of
    churn rate, ROI, gained insights.  Best practice: ask for target metrics
    from day one, i.e. as soon as project and application requirements are
    being discussed; measure project success along these metrics and
    optimize these metrics.

  • Accuracy of model predictions, ROI.

  • Try to translate results/lift in terms of money.

  • Evaluate model accuracy using cross-validation, or out-of-bag samples,
    or hold-out data (if the data set is truly large).  Once happy with the
    method, conduct a pilot study to measure accuracy and make sure the
    model works in the real environment.

  • Model Performance  1. Overall accuracy on a validation data set            
    2. Sensitivity and Specificity  3. ROC curve.  Analytic Project Success  
    1. Significant increase in rates of marketing returns  2. Adoption of the
    model by the pertinent business unit.

  • In fact I am more on data mining deployment than on modeling.  Hence,
    accuracy of the model is important, but I focus on: data mining
    application availability (e.g. the score is computed when needed), the
    ability of the customers to efficiently use the application (I am
    working in a consultant

  • Customer feedback on accuracy, reliability.

  • Cross validated precision, recall, F-measure.

  • Cross-validation, precision and recall index, ROC curve.

  • For model predictions, I use k-fold cross-validation and AUC measures.

  • Accuracy of demand/service forecasts; impacts cost of subcontracting
    that may be required.

  • For supervised tasks: use of classical measures, such as precision,
    recall, F-measure, etc.  For unsupervised tasks: use of validity
    criteria from the literature.

  • Model lift, model robustness, explanatory variables.

  • Project length, Analysis Accuracy, Actionability, Scalability.

  • Project performance is evaluated based on the following metrics for
    model prediction:  accuracy, sensitivity, and specificity.  Also the
    McNemar's test is used to compare results and to estimate the
    significance of the results.

  • Sensitivity & Specificity, AUC.

  • Standard risk measures, lift and loss.

  • Uplift, stability of results through time, ability to handle data changes.

  • We have many measures that range from the "standard" like R^2
    through proprietary measures of (non-financial) risk, and we can and do
    compare estimated outcomes to actual performance.

  • We normally measure the performance of the project by standard
    measurements in Text Mining such as precision, recall, F1, and ROUGE,
    which are easy to compute.

  • Accuracy over time.

  • Analysis of model success over time, performance by percentile.

  • Calculate lift by slices of 5% of scored customers.

  • Compare projection to actual results.

  • Constant follow up of predicted vs actual figures.

  • Empirical validation

  • Examine predicted vs actual outcomes.

  • For example, monthly comparison of model-detected fraud and actual

  • I create my own lift metrics, and try to get executives familiar
    with the fact that we need to (1) measure lift, and (2) use sound
    metrics to measure lift.

  • I use ROC curves.

  • In predictive models we use RAUC to measure performance.

  • KPIs on a monthly basis, based on the confusion matrix: True Positive
    Rate, False Positive Rate, False Negative Rate, Gain Charts.

  • Lift charts and area under ROC.

  • Lift charts, ROC curves, RMSE - actual vs. predicted.

  • Lift in x%, ROC (area under curve).

  • Map actual results vs model predictions

  • Mean Average Percent Error for time series

  • Model performance - Lift and KS.

  • Monthly validation of all model scores against actual outcomes.

  • Percent correctly identified

  • Prediction capabilities.

  • Prediction vs. reality (time series).

  • Predictions will be matched against real data as it comes in.

  • ROC curve / error rate

  • The whole focus of our activity is on accuracy of performance.  This is
    assessed simply as the R^2 of the relationship between outcomes and
    predictions made ahead of time.

  • Tracking Type I and Type II errors by implementing multivariate
    analysis methods vs. standard SQC.
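
Many respondents above cite AUC as their core classification measure.  As a
minimal, dependency-free sketch (the function name and code are illustrative,
not any respondent's actual implementation), AUC can be computed directly from
its rank interpretation: the probability that a randomly chosen positive case
outscores a randomly chosen negative one, with ties counted as one half.

```python
def auc(y_true, scores):
    """AUC via the rank (Mann-Whitney U) identity.

    Assumes binary labels (0/1) with at least one example of each class.
    O(n_pos * n_neg); fine for an illustration, not for large data sets.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive outscores negative
            elif p == n:
                wins += 0.5      # ties count half
    return wins / (len(pos) * len(neg))
```

A perfectly separating scorer yields 1.0, a perfectly inverted one 0.0, and a
constant scorer 0.5.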
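
One respondent calculates "lift by slices of 5% of scored customers."  A
hedged sketch of that idea (names and slicing policy are assumptions, not the
respondent's code): sort customers by score, then report cumulative lift, i.e.
the response rate in the top slice divided by the overall base rate.

```python
def lift_by_slices(y_true, scores, slice_pct=0.05):
    """Cumulative lift at each slice of scored customers, best scores first.

    Assumes binary outcomes with a nonzero base rate.  A trailing partial
    slice (when n is not divisible by the slice size) is ignored.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    base_rate = sum(y_true) / len(y_true)
    n = len(order)
    k = max(1, int(round(slice_pct * n)))  # customers per slice
    lifts = []
    for top in range(k, n + 1, k):
        hits = sum(y_true[i] for i in order[:top])
        lifts.append((hits / top) / base_rate)
    return lifts
```

Lift near the first slices shows how much better than random the model targets
responders; by the final slice it converges to 1.0 by construction.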
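
Several verbatims mention the KS (Kolmogorov-Smirnov) statistic, common in
credit scoring.  As an illustrative sketch (not any respondent's tooling), KS
is the maximum gap between the cumulative score distributions of the two
classes; 1.0 means perfect separation, 0.0 means none.

```python
def ks_statistic(y_true, scores):
    """Max separation between positive- and negative-class score CDFs.

    Assumes binary labels with at least one example of each class.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):           # candidate thresholds
        cdf_pos = sum(1 for s in pos if s <= t) / len(pos)
        cdf_neg = sum(1 for s in neg if s <= t) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best
```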
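
The confusion-matrix KPIs that several respondents track monthly (True
Positive Rate, False Positive Rate, False Negative Rate, plus precision,
recall, and F-measure) all derive from the same four counts.  A minimal
sketch, assuming binary labels with both classes represented:

```python
def confusion_rates(y_true, y_pred):
    """TPR, FPR, FNR, precision, recall, and F1 from binary predictions.

    Assumes tp+fn, fp+tn, and tp+fp are all nonzero (no degenerate cases).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn)                 # recall / sensitivity
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp)
    return {"tpr": tpr, "fpr": fpr, "fnr": fn / (tp + fn),
            "precision": precision, "recall": tpr,
            "f1": 2 * precision * tpr / (precision + tpr)}
```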
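
One respondent reports "Mean Average Percent Error" for time series, which is
usually written as Mean Absolute Percent Error (MAPE).  A minimal sketch of
that metric (the zero-actual handling is an assumption, since MAPE is
undefined when an actual value is zero):

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent, skipping zero actuals."""
    terms = [abs((a - f) / a) for a, f in zip(actual, forecast) if a != 0]
    return 100.0 * sum(terms) / len(terms)
```

A forecast that misses each point by 10% of its actual value scores 10.0.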
Copyright (c) 2012 Rexer Analytics, All rights reserved