Copyright (c) 2011 Rexer Analytics All Rights Reserved
2008 Data Miner Survey:

Thank you for your interest in the 2nd Annual Rexer Analytics Data Miner Survey.  

This research examined the analytic behaviors, needs, preferences, and views of
data mining professionals.  It was conducted as a service to the data mining
community.  It was not conducted for, or sponsored by, any third party.  Rexer
Analytics is committed to freely disseminating our research findings through report
summaries, conference presentations, and personal contact.  If you would like a
copy of our 40-page summary report, please contact us at
DataMinerSurvey@RexerAnalytics.com.  Summaries of this research were also
presented at the November 2008 SPSS Directions Conference and the December
2008 Oracle BIWA Summit.

This survey has been conducted since 2007.  Highlights for each year are available
online.  
Contact us to receive the full summary reports (FREE).

2008 HIGHLIGHTS:

  • 34-item survey of data miners, conducted online in early 2008

  • 348 responses from individuals in 44 countries

  • The most commonly used algorithms are decision trees, regression, and
    cluster analysis.  The use of time series and survival analysis increased this
    year.

  • Dirty data, data access issues, and explaining data mining to others remain
    the top challenges faced by data miners.

  • Data miners are most likely to use descriptive stats, outlier detection, and
    face validity to identify and address dirty data.

  • Data miners spend only 20% of their time on actual modeling.  More than a
    third of their time is spent accessing and preparing data.

  • Data mining is playing an important role in organizations.  Half of data miners
    indicate their results are helping to drive strategic decisions and operational
    processes.

  • The most prevalent concerns about how data mining is being used are
    resistance to using data mining in contexts where it would be beneficial,
    insufficient training of some data miners, and lack of model refreshing.
  • SPSS Clementine was identified as the primary software used by more data
    miners than any other software product.  SPSS and SAS continue to
    dominate the software market.  However, Statistica, R, and the Salford
    products saw increased usage this year.

  • In selecting their analytic software, data miners place a high value on
    dependability, the ability to handle very large datasets, and quality output

  • The findings vary somewhat depending on the domain in which the data
    miner works, the tools used, geography, and several other dimensions.