Best Practices in Measuring Analytic Project Performance / Success: Model Performance Measures

In the 5th Annual Survey (2011), data miners shared their best practices for measuring analytics project performance / success. The previous web page summarizes the most frequently mentioned measures. Because some of the richest descriptions contain measurements that cross several of the categories, a data miner's best practice description may appear in several of the verbatim lists (model performance, financial performance, outside group performance). The remaining verbatims are in the other best practice measures list.

Below is the full text of the best practice methodologies that included measures of model performance (accuracy, F-measure, ROC, AUC, lift, etc.). Illustrative calculation sketches for a few of the most frequently cited measures follow the list.

  • Cross-validation and sliding-window validation during model training, the data mining process, and parameter optimization. Metrics: accuracy, recall, precision, ROC, AUC, lift, confidence, support, conversion rates, churn rates, ROI, increase in sales volume and profit, cost savings, run times, etc. Continuous monitoring of model performance metrics. Use of control groups and independent test sets.
  • Standard statistical measurements (KS, ROC, R-square etc.), profitability metrics, loss impact etc.
  • Two phases: 1. There are expected results for AUC or the K-S test that must be met for the models to be accepted by the supervisor (in credit scoring, the banking supervisor). 2. Once implemented, we recommend conducting stress testing and back testing at least once a month, and we've developed tools to alert users to potential disruption of the original patterns of the model.
  • For classification models, I use AUC, for regression models I use RMSE. Everything is cross-validated during model building, and performance is then assessed on a hold-out sample.
  • Model quality: standard performance measures such as precision, recall, accuracy, etc. Model complexity: memory usage & computation time.
  • We measure ROI, cost, gain, model accuracy, precision, recall, ROC, AUC, lift charts, and customized metrics. The focus is on the benefit for the business and for the customer.
  • Longitudinal validation based on hard, objective outcomes, preferably financial where sensible and achievable.
  • Metrics: model prediction accuracy, saved costs, increase in sales volume, increase in customer satisfaction, reduction of churn rate, ROI, gained insights. Best practice: ask for target metrics from day one, i.e., as soon as project and application requirements are being discussed; measure project success along these metrics and optimize them.
  • Accuracy of model predictions, ROI.
  • Try to translate results/lift into monetary terms.
  • Evaluate model accuracy using cross-validation, out-of-bag samples, or hold-out data (if the data set is truly large). Once happy with the method, conduct a pilot study to measure accuracy and make sure the model works in a real environment.
  • Model Performance 1. Overall accuracy on a validation data set 2. Sensitivity and Specificity 3. ROC curve. Analytic Project Success 1. Significant increase in rates of marketing returns 2. Adoption of the model by the pertinent business unit.
  • In fact, I work more on data mining deployment than on modeling. Hence, model accuracy is important, but I focus on data mining application availability (e.g., the score is computed when needed) and the ability of customers to use the application efficiently (I work for a consulting firm).
  • Customer feedback on accuracy, reliability.
  • Cross validated precision, recall, F-measure.
  • Cross-validation, precision and recall index, ROC curve.
  • For model predictions, I use k-fold cross-validation and AUC measures.
  • Accuracy of demand/service forecasts; impacts cost of subcontracting that may be required.
  • For supervised tasks: use of classical measures such as precision, recall, F-measure, etc. For unsupervised tasks: use of validity criteria from the literature.
  • Model lift, model robustness, explanatory variables.
  • Project length, Analysis Accuracy, Actionability, Scalability.
  • Project performance is evaluated based on the following metrics for model prediction: accuracy, sensitivity, and specificity. Also, McNemar's test is used to compare results and to estimate the significance of the results.
  • Sensitivity & Specificity, AUC.
  • Standard risk measures, lift and loss.
  • Uplift, stability of results through time, ability to handle data changes.
  • We have many measures that range from the "standard," like R^2, through proprietary measures of (non-financial) risk, and we can and do compare estimated outcomes to actual performance.
  • We normally measure the performance of the project by standard measurements in text mining such as precision, recall, F1, and ROUGE, which are easy to compute.
  • Accuracy over time.
  • Analysis of model success over time, performance by percentile.
  • Calculate lift by slices of 5% of scored customers.
  • Compare projection to actual results.
  • Constant follow-up of predicted vs. actual figures.
  • Empirical validation
  • Examine predicted vs actual outcomes.
  • For example, monthly comparison of model-detected fraud and actual fraud.
  • I create my own lift metrics and try to get executives familiar with the fact that we need to (1) measure lift and (2) use sound metrics to measure lift.
  • I use ROC curves.
  • In predictive models we use RAUC to measure performance.
  • KPIs on a monthly basis, based on the confusion matrix: true positive rate, false positive rate, false negative rate, gain charts.
  • Lift charts and area under ROC.
  • Lift charts, ROC curves, RMSE - actual vs. predicted.
  • Lift in x%, ROC (area under curve).
  • Map actual results vs model predictions
  • Mean absolute percentage error (MAPE) for time series.
  • Model performance - Lift and KS.
  • Monthly validation of all model scores against actual outcomes.
  • Percent correctly identified
  • Prediction capabilities.
  • Prediction vs. reality (time series).
  • Predictions will be matched against real data as it comes in.
  • ROC curve / error rate
  • The whole focus of our activity is on accuracy of performance. This is assessed simply as the R^2 of the relationship between outcomes and predictions made ahead of time.
  • Tracking Type I and Type II errors by implementing multivariate analysis methods vs. standard SQC.
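
Many of the verbatims above rely on standard classification measures (accuracy, precision, recall, F-measure, ROC AUC) computed on a hold-out sample. The following is a minimal sketch, not taken from any survey response; the synthetic data, logistic regression model, and 30% hold-out split are assumptions chosen only to show scikit-learn's standard metric functions in context.

    # Illustrative only: synthetic data and a simple logistic regression stand in
    # for a real scoring model; the 30% hold-out split is an assumption.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, roc_auc_score)
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    y_pred = model.predict(X_test)               # hard labels for accuracy/precision/recall/F1
    y_score = model.predict_proba(X_test)[:, 1]  # probabilities for ROC AUC

    print("accuracy :", accuracy_score(y_test, y_pred))
    print("precision:", precision_score(y_test, y_pred))
    print("recall   :", recall_score(y_test, y_pred))
    print("F-measure:", f1_score(y_test, y_pred))
    print("ROC AUC  :", roc_auc_score(y_test, y_score))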
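
Several respondents also cross-validate everything during model building rather than relying on a single hold-out split. A minimal sketch of five-fold cross-validated AUC, under the same illustrative assumptions as above:

    # Illustrative only: five-fold out-of-fold AUC for an assumed model and data set.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    fold_auc = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                               cv=cv, scoring="roc_auc")
    print("AUC per fold:", fold_auc)
    print("mean AUC    :", fold_auc.mean())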
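
Lift by score slices and the KS statistic appear repeatedly as ongoing monitoring measures. The sketch below uses synthetic scores and outcomes purely for illustration: lift is taken as the response rate within each 5% slice of scored customers divided by the overall response rate, and KS as the maximum separation between the score distributions of actual positives and actual negatives.

    # Illustrative only: synthetic scores and outcomes stand in for a scored
    # customer file with observed results.
    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, size=10_000)                        # actual outcomes (0/1)
    y_score = np.clip(0.2 * y_true + rng.random(10_000), 0.0, 1.0)  # model scores

    # Lift per 5% slice: response rate in the slice divided by the overall rate.
    order = np.argsort(-y_score)                # highest-scored customers first
    slices = np.array_split(y_true[order], 20)  # 20 slices of roughly 5% each
    overall_rate = y_true.mean()
    lift_by_slice = [s.mean() / overall_rate for s in slices]
    print("lift in top 5% slice:", round(lift_by_slice[0], 2))

    # KS statistic: maximum distance between the score distributions of the
    # actual positives and the actual negatives.
    ks_stat, _ = ks_2samp(y_score[y_true == 1], y_score[y_true == 0])
    print("KS statistic:", round(ks_stat, 3))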