Data Science Survey

  • 2017 Survey

    The 2017 Data Science Survey data collection is now complete.

  • 2015 Survey

    1,220 analytic professionals from 72 countries participated in the 2015 survey.

    Highlights:

    • CORE ALGORITHM TRIAD:  Regression, decision trees, and cluster analysis remain the most commonly used algorithms in the field.

     

    • THE ASCENDANCE OF R:  76% of respondents report using R. This is up dramatically from just 23% in 2007. More than a third of respondents (36%) identify R as their primary tool.

     

    • JOB SATISFACTION:  Job satisfaction in the field remains high, but has slipped since the 2013 survey. A number of factors predict Data Scientist job satisfaction levels.

     

    • DEPLOYMENT:  Deployment continues to be a challenge for organizations, with fewer than two-thirds of respondents indicating that their models are deployed most or all of the time. Getting organizational buy-in is the largest barrier to deployment, with real-time scoring and other technology issues also causing significant deployment problems.

     

    • TERMINOLOGY:  The term “Data Scientist” has surged in popularity, with over 30% of us now describing ourselves as data scientists, compared to only 17% in 2013.

    The full summary report includes additional material about algorithms and software usage, analytic goals, big data, work environments, and more.

  • 2013 Survey

    1,259 analytic professionals from 75 countries participated in the 2013 survey.

    Highlights:

    • FOCUS ON CRM:  In the past few years, data miners have increased their focus on the already substantial area of customer-focused analytics. Respondents are looking for a better understanding of customers and seeking to improve the customer experience. This can be seen in their goals, analyses, big data endeavors, and in the focus of their text mining.

     

    • BIG DATA:  Many in the field are talking about the phenomenon of Big Data. There are clearly some areas in which the volume and sources of data have grown. However, it is unclear how much Big Data has impacted the typical data miner. While data miners believe that the size of their datasets has increased over the past year, data from previous surveys indicate consistent dataset size over time.

     

    • THE ASCENDANCE OF R:  The proportion of data miners using R is rapidly growing, and since 2010, R has been the most-used data mining tool. While R is frequently used along with other tools, an increasing number of data miners also select R as their primary tool.

     

    • CHALLENGES IN THE USE OF ANALYTICS:  Data miners continue to report challenges at each level of the analytic process. Companies often are not using analytics to its fullest and have continuing issues in the areas of deployment and performance measurement.

     

    • ENGAGEMENT & JOB SATISFACTION:  The data miners in our survey are highly engaged with the analytic community: consuming and producing content, entering competitions, and searching for education and growth within their jobs. All of these activities lead to high job satisfaction, which has been increasing over time.

     

    • ANALYTIC SOFTWARE:  Data miners are a diverse group who are looking for different things from their data mining tools. Ease-of-use and cost are two distinguishing dimensions. Software packages vary in their strengths and features. STATISTICA, KNIME, SAS JMP and IBM SPSS Modeler all receive high satisfaction ratings.

    The full summary report includes additional material about algorithms and software usage, computing environments, text mining, and more.

  • 2011 Survey   +  “Best Practices” Verbatim Responses

    1,319 analytic professionals from over 60 countries participated in the 2011 survey.

    Highlights:

    • FIELDS & GOALS:  Data miners work in a diverse set of fields. CRM / Marketing has been the #1 field in each of the past five years. Fittingly, “improving the understanding of customers”, “retaining customers” and other CRM goals continue to be the goals identified by the most data miners.

     

    • TEXT MINING:  A third of data miners currently report using text mining and another third plan to in the future. Text mining is most often used to analyze customer surveys and blogs/social media.

     

     

    • VISUALIZATION:  Data miners frequently use data visualization techniques. More than four in five use them to explain results to others. MS Office is the most often used tool for data visualization. Data visualization is less prevalent in the Asia-Pacific region.

     

     

    The full summary report includes additional material about algorithms and software usage, the fields applying analytics, text mining, computing environments, data visualization tools, job satisfaction, and more.

  • 2010 Survey   +  “Best Practices” Verbatim Responses

    735 analytic professionals from 60 countries participated in the 2010 survey.

    Highlights:

    • FIELDS & GOALS:  Data miners work in a diverse set of fields. CRM / Marketing has been the #1 field in each of the past four years. Fittingly, “improving the understanding of customers”, “retaining customers” and other CRM goals are also the goals identified by the most data miners surveyed.

     

    • MODELS:  About one-third of data miners typically build final models with 10 or fewer variables, while about 28% generally construct models with more than 45 variables.

     

    • TOOLS:  After a steady rise across the past few years, the open source data mining software R overtook other tools to become the tool used by more data miners (43%) than any other. STATISTICA, which has also been climbing in the rankings, is selected as the primary data mining tool by the most data miners (18%). STATISTICA, IBM SPSS Modeler, and R received the strongest satisfaction ratings in both 2010 and 2009.

     

    • TECHNOLOGY:  Data mining most often occurs on a desktop or laptop computer, and frequently the data is stored locally. Model scoring typically happens using the same software used to develop models. STATISTICA users are more likely than other tool users to deploy models using PMML.

     

    The full summary report includes additional material about algorithms and software usage, tool selection priorities, data quality, model deployment, future trends, and more.

  • 2009 Survey

    710 analytic professionals from 58 countries participated in the 2009 survey.

    Highlights:

    • ALGORITHMS:  As in previous years, data miners’ most commonly used algorithms are regression, decision trees, and cluster analysis.

     

    • ORGANIZATIONAL IMPORTANCE:  Half of data miners say their results are helping to drive strategic decisions and operational processes. 58% say they are adding to the knowledge base in the field.

     

    • IMPACT OF ECONOMY:  Most data miners feel that the economy will not negatively impact them.

     

    • CHALLENGES:  The top challenges facing data miners are dirty data, explaining data mining to others, and difficult access to data. However, in 2009 fewer data miners listed data quality and data access as challenges than in the previous year.

     

    • TOOLS:  IBM SPSS Modeler (SPSS Clementine), Statistica, and IBM SPSS Statistics (SPSS Statistics) are identified as the “primary tools” used by the most data miners. Open-source tools Weka and R made substantial movement up data miners’ tool rankings this year, and are now used by large numbers of both academic and for-profit data miners. Users of IBM SPSS Modeler, Statistica, and RapidMiner are the most satisfied with their software.

    The full summary report includes additional material about algorithms and software usage, the fields applying analytics, corporate analytic capabilities, analytic challenges, concerns, analytic success measurement, and more.

  • 2008 Survey

    348 analytic professionals from 44 countries participated in the 2008 survey.

    Highlights:

    • ADDRESSING CHALLENGES:  Dirty data, data access issues, and explaining data mining to others remain the top challenges faced by data miners. Data miners are most likely to use descriptive statistics, outlier detection, and face validity to identify and address dirty data.

     

    • TIME ALLOTMENT:  Data miners spend only 20% of their time on actual modeling. More than a third of their time is spent accessing and preparing data.

     

    • CONCERNS:  The most prevalent concerns with how data mining is being utilized are: resistance to using data mining in contexts where it would be beneficial, insufficient training of some data miners, and lack of model refreshing.

     

    • TOOLS:  SPSS Clementine was identified as the primary software used by more data miners than any other software product. SPSS and SAS continue to dominate the software market. However, Statistica, R, and the Salford products saw increased usage this year. In selecting their analytic software, data miners place a high value on dependability, the ability to handle very large datasets, and quality output.

    The full summary report includes additional material about algorithms and software usage, tool selection priorities, allocation of time across analytic tasks, analytic challenges, data quality, and more.

  • 2007 Survey

    314 analytic professionals from 35 countries participated in the inaugural 2007 survey.

    Highlights:

    • ALGORITHMS:  Regression, decision trees and cluster analysis were the most commonly used algorithms (mean number of algorithms used: 6.8).

     

    • CHALLENGES:  Top challenges data miners report are dirty data, data access, and explaining data mining to others.

     

    • TOOLS:  SPSS, SPSS Clementine, and SAS are the three most frequently utilized tools (mean number of tools used: 4.5). There is increasing interest in the Oracle Data Mining tool, and decreasing interest in C4.5/C5.0/See5. The primary factors data miners consider when selecting an analytic tool are: 1) the dependability and stability of software, 2) the ability to handle large data sets, and 3) data manipulation capabilities.

    The full summary report includes additional material about algorithms and software usage, tool selection priorities, allocation of time across analytic tasks, analytic challenges, data quality, and more.

© 2017 Rexer Analytics. All Rights Reserved.
