Copyright (c) 2011 Rexer Analytics. All Rights Reserved.
2008 Data Miner Survey:
Thank you for your interest in the 2nd Annual Rexer Analytics Data Miner Survey.
This research examined the analytic behaviors, needs, preferences, and views of data mining professionals. It was conducted as a service to the data mining community. It was not conducted for, or sponsored by, any third party. Rexer Analytics is committed to freely disseminating our research findings through report summaries, conference presentations, and personal contact. If you would like a copy of our 40-page summary report, please contact us at DataMinerSurvey@RexerAnalytics.com. Summaries of this research were also presented at the November 2008 SPSS Directions Conference and the December 2008 Oracle BIWA Summit.
This survey has been conducted annually since 2007. Highlights for each year are available online. Contact us to receive the full summary reports (FREE).
34-item survey of data miners, conducted online in early 2008
348 responses from individuals in 44 countries
The most commonly used algorithms are decision trees, regression, and cluster analysis. The use of time series and survival analysis increased this year.
Dirty data, data access issues, and explaining data mining to others remain the top challenges faced by data miners
Data miners are most likely to use descriptive stats, outlier detection, and face validity to identify and address dirty data
Data miners spend only 20% of their time on actual modeling. More than a third of their time is spent accessing and preparing data.
Data mining is playing an important role in organizations. Half of data miners indicate that their results are helping to drive strategic decisions and operational processes.
The most prevalent concerns with how data mining is being utilized are: resistance to using data mining in contexts where it would be beneficial, insufficient training of some data miners, and lack of model refreshing
SPSS Clementine was identified as the primary software used by more data miners than any other software product. SPSS and SAS continue to dominate the software market. However, Statistica, R, and the Salford products saw increased usage this year.
In selecting their analytic software, data miners place a high value on dependability, the ability to handle very large datasets, and quality output
The findings vary somewhat depending on the domain in which the data miner works, the tools used, geography, and several other dimensions