Insights from R Users
Copyright (c) 2012 Rexer Analytics, All rights reserved

Respondents to the 2007-2011 Data Miner Surveys have reported increasing use
of R over the years.  In the 5th Annual Survey (2011) we asked R users to tell
us more about their use of R.  The question asked, "If you use R, please tell
us more about your use of R.  For example, tell us why you have chosen to use
R, why you use the R interface you identified in the previous question, the
pros and cons of R, or tell us how you use R in conjunction with other tools."
In response, 225 R users shared a wealth of useful and detailed information
about their use of R.  Below are the verbatim comments they shared.

  • Best variety of algorithms available, biggest mindshare in online data
    mining community, free/open source.  Previous con of 32-bit in-memory
    data limitation removed with release of 64-bit.  Still suffers some
    compared to other solutions in handling big data. Stat-ET plugin gives
    best IDE for R, using widely used Eclipse framework, and is available
    for free.  RapidMiner add-in that supports R is extremely useful, and
    increasingly used as well.

  • The main reason for selecting R, back in the late 1990's, was that it had
    the algorithms I needed readily available, and it was free of charge.
    After getting familiar with R, I have never seen a reason to exchange it
    for anything else. R is open-source, and runs on many different system
    architectures, so distributing code and results is easy. Compared to
    some latest commercial software I've evaluated, R is sluggish for
    certain tasks, and can't handle very large datasets (mainly because I do
    not have a 64-bit machine to work with). On top of that, to be really
    productive with R, one needs to learn other languages, e.g., SQL, but
    that's just how things are. Besides, knowledge of those other languages
    is needed anyway.

  • I've migrated to R as my primary platform. The fact that it's free, robust,
    comprehensive, extensible, and open.  More than offsets it's
    shortcomings - and those are disappearing over time anyway.  I'm using
    Revolution's IDE because it improves my productivity.  R has a
    significant learning curve, but once mastered, it is very elegant.  R's
    shortcomings are it's in-memory architecture and immature data
    manipulation operations (i.e. lack of a built-in SQL engine and inability
    to handle very large datasets).  Nevertheless, I use R for all analytic  
    and modeling tasks once I get my data properly prepared.  I usually
    import flat files in CSV format, but I am now using Revolution's XDF
    format more frequently.

  • Why I use R: Initial adoption was due to the availability of the
    randomForest package, which was the best option for random forest
    research, as rated on speed, modification possibility, model information
    and tweaking. Since adopting R, I have further derived satisfaction from
    the availability of open source R packages for the newest developments
    in statistical learning (e.g. conditional inference forests, boosting
    algorithms, etc.).  I use the R command line interface, as well as
    scripting and function creation, as this gives me maximum modification
    ability. I generally stay away from GUIs, as I find them generally
    restrictive to model development, application and analysis.  I use R in
    conjunction with Matlab, through the statconnDCOM connection
    software, primarily due to the very slow performance of the TreeBagger
    algorithm in Matlab.  Pros of R: Availability of state-of-the-art
    algorithms, extensive control of model development, access to model
    information, relatively easy access from Matlab.  Cons of R: Lack of
    friendly editor (such as Matlab's Mlint; although I am considering
    TinnR, and have tried Revolution R); less detailed support than Matlab.

  • 1.  I employ R-Extension in the Rapid Miner interface  2.  I use R for its
    graphing and data munging capability  3.  I use R for it's ability to
    scrape data from different sources (FTP, ODBC) and implement
    frequent and automated tasks    Pros:  -Highly customizable  -Great
    potential for growth and improved efficiencies  -Fantastic selection of
    packages - versatility  -Growing number of video tutorials and blogs
    where users are happy to share their code    CONS:  -x64 not fully
    integrated yet  -Steep learning curve  -Limited number of coding and
    output examples in package vignettes  -Difficult to setup properly
    ( examples are scarce online)  -Memory constraints with
    i386 32-bit  -Output - reporting design limitations suitable for the
    business environment.

  • I have about 12 years experience with R. It's free, the available libraries
    are robust and efficient. I mostly work with the command line, but I am
    moving towards RStudio because it's available both as a desktop
    application and a browser-based client-server tool set. I occasionally
    use Rcmdr. The only other tool I use as heavily as R is Perl - again, lots
    of experience with it, it's free, and there are thousands of available
    library packages.

  • The analytics department at my company is very new. Hence we  
    haven't yet decided which analytics-tool we'll be using. R is a great
    interim solution as it is free and compatible with a reasonable amount  
    of commercial analytics tools. The reason I use the standard R GUI  
    and script editor is simply because I haven't invested the time in trying
    out different GUI's, and I haven't really had any reason to. The
    advantage of using R (at least compared to SAS, which was the
    analytics-tool at my old job) is mainly that you have much more control
    over your data and your algorithms. The main problem with R is that it
    can't really handle the data sets we need to analyze. Furthermore, your
    scripts can tend to be a bit messy if you are not sure what kind of
    analysis or models you are going to use.

  • I use R for the diversity of its algorithms, packages.  I use Emacs for
    other tasks and it's a natural to use it to run R, and Splus for that   
    matter.  I usually do data preparation in Splus, if the technique I want to
    use is available in Splus I will do all the analysis in Splus.  Otherwise I'll
    export the data to R, do the analysis in R, export results to Splus where
    I'll prepare tables and graphs for presentations of the model(s).  The
    main drawback to R, in my opinion, is that R loads in live memory all
    the work space it is linked to which is a big waste of time and memory
    and makes it difficult to use R in a multi-users environment where
    typical projects consist of several very large data sets.

  • We continue to evaluate R.  As yet it doesn't offer the ease of use and
    ability to deploy models that are required for use by our internationally
    distributed modeling team.  "System" maintenance of R is too high a
    requirement at the moment and the enormous flexibility and range of
    tools it offers is offset by data handling limitations (on 32 bit systems)
    and difficulty of standardizing quick deployment solutions into our
    environment.  But we expect to continue evaluation and training on R
    and other open source tools.  We do, for instance, make extensive use
    of open source ETL tools.

  • I use R extensively for a variety of tasks and I find the R GUI the most
    flexible way to use it. On occasion I've used JGR and Deducer, but I've
    generally found it more convenient to use the GUI.    R's strengths are
    its support network and the range of packages available for it and its
    weaknesses are its ability to handle very large datasets and, on
    occasion, its speed.    More recently, with large or broad datasets I've
    been using tools such as Tiberius or Eureqa to identify important
    variables and then building models in R based on the identified

  • I'm using R in order to know why R is "buzzing" in analytical areas and
    to discover some new algorithms.  R has many problems with big data,
    and I don't really believe that Revolution can effectively support that. R
    language is not mature for production, but really efficient for research :
    for my personal researches, I also use SAS/IML programming (which is
    for me the real equivalent for R, not SAS/STAT).  I'm not against R, it's
    a perfect tool to learn statistics, but I'm not really for data mining : don't
    forget that many techniques used in data mining comes from
    Operational Research, in convergence with statistics.  Good language,
    but not really conceived for professional efficiency.

  • R is used for the whole data loading process (importing, cleaning,
    profiling, data preparation), the model building as well as for creating
    graphical results through other technologies like Python.  We use it also
    using the PL/R procedural language to do in-database analytics &

  • Pro: versatility, big number of algorithms.  Con: needs coding.  
    Currently learning to use it and determining usability for what I do.  If I
    continue to use it, it will be via Rapidminer.

  • Utilize R heavily for survey research sampling and analysis and
    political data mining. The R TextMate bundle is fantastic although
    RStudio is quickly becoming a favorite as well. Use heavily in
    conjunction with MySQL databases.

  • I make a lot of use of the  CDK (Chemistry Development Kit)
    implementation (rcdk).  Mainly for PCA, Clustering, Descriptor profiling
    and Fragmenting.  Pros:   reuse of scripts, flexibility, ability to load lot of
    data, performance.  Cons:  Long learning curve.

  • PRO: free of charge, availability of huge variety of algorithms - access
    to fine-tune method parameters, huge user base with IRC and forum
    help (stackoverflow).  CON: takes much more time to see results of a
    mining project, takes much longer learning curve to produce output if
    you are not familiar with the concept.  RStudio is a nice interface for
    developing quick and dirty solutions as well as testing some ideas.  
    STATISTICA is much simpler to use but does not offer the variety of
    algorithms and access to the parameters, VB for STATISTICA has
    fewer support (e.g. examples on the web, help for VB is of limited use).
    For a very quick data mining task PASW Model is even simpler to use
    than STATISTICA but with a very restricted access to parameters.

  • I use R in conjunction with Matlab mostly, programming my
    personalized algorithms in Matlab and using R for running statistical
    test, ROC curves, and other simple statistical models. I do this since I
    feel more comfortable with this setting. R is a very good tool for
    statistical analysis basically because there are many packages
    covering most of statistical activities, but still I find Matlab more easy to
    code in.

  • I greatly prefer to use R and do use it when working on more "research"
    type projects versus models that will be reviewed and put into
    production. Because of the type of work that I do, SAS is the main
    software that everyone is familiar with and the most popular one that  
    has a strong license. Our organization needs to be able to share across
    departments and with vendors and governmental divisions. We can't
    use R as much as I would like because of how open the software is -
    good for sharing code, bad for ensuring regulators that data is safe.

  • Main reasons for use: 1) Strong and flexible programming language,
    making it very flexible.   2) No cost, allowing me to also having it on my
    personal computer so that I can test things at home that I later use at
    work.  I use RODBC to get the data from server and let the server do
    some of the data manipulation , but control it from within R.  Have also
    started to use RExcel with the goal as using that as a method to deploy
    models to analysts more familiar with Excel than R.

  • Personally I find R easier to use than SAS, mostly because I am not
    constrained in getting where I want to go.  SAS has a canned   
    approach.  I see using GUI's as a "sign of weakness" and as preventing
    understanding the language at its core.  I have not found Rattle to be
    particularly helpful. I have also tried JGR and Sciviews and found I
    could not surmount the installation learning curve.  Their
    documentation did not produce a working environment for me.

  • I use R primarily for my personal educational use - I am in an
    Economics PhD program analyzing household surveys from
    developing countries.  I usually load the household survey tables onto
    an MS SQL server and then manipulate/prepare the data to be
    imported into R, and then run my analysis (typically OLS/Probit/Logit
    regressions along with a range of descriptive statistics).  I use R gui
    because I don't know the R syntax very well yet.  It helps me generate
    the 'baseline' scripts that I tweak to get specifically what I want/need.  
    Once I get used to the syntax, I will probably move away from the GUI

  • We use R only occasionally. Some of our analysts know R from their
    university studies and hence are very familiar with R. Whenever we use
    R, we use it from within RapidMiner processes in RapidMiner (or in
    RapidAnalytics = RapidMiner server), because we use
    RapidMiner/RapidAnalytics for the overall data mining process design,
    automated process optimization, process sharing among team
    members, automated scheduled process deployment, result and report
    sharing, etc.

  • (Primary use of R dates from 2001-2006)  Reason to use R: I thaught R
    and SPSS to students + most diverse statistical package for research.
    Availability of diverse packages allowed to use R in any domain, e.g. for
    me: sound analysis, categorization, linear models, ...  Pro: the only limit
    of R, is your knowledge; no GUI limitations; Open source (check by
    community + free).  Con: starting knowledge as limit for analysis. This
    doesn't stimulate advanced analytics as from the beginning.

  • Free and so many packages.  Very powerful and convenient on small
    data sets, but slow and a memory hog.   Can't handle large data sets
    (often refuses to allocate memory, so a memory strategy needs to be
    built into the application code).  Extremely inefficient even for simple
    tasks.  For example, I rewrote the "which" function to run twice faster  
    than the native function.  Very limited support for parallel processing.

  • R has a good price point compared to other packages.  R-Studio seems
    to be the best of the crop aside from the Revo GUI.  R is mostly used
    here in conjunction with the python  rpy2  library.

  • We use R because of its very powerful libraries for data mining and
    forecasting in finance and because of its open source nature.  
    RapidMiner best supports the overall data mining process and
    seamlessly integrates R.  Therefore we use RapidMiner as overall
    framework and R within RapidMiner whenever it adds value and
    functionality needed.

  • I use R primarily for analysis of psychology experiment results. Some
    data mining based on psychological factors is often necessary.  I use   
    R because it is Free, powerful, script-able, and multi-platform. I use  
    emacs because I can write my experiments (Python), process the data
    (Python and R), analyze the data (R), and write papers (LaTeX) from
    within the same environment.

  • It is fast to write code for quick test (answer to both, why R and why
    command line interface). It has strong visualization tools and is easy to
    use. Definitely strong tool, but might not be that easy to use for people
    without (functional) programming background. I use R within KNIME
    mostly for data manipulation or outside KNIME for plotting.

  • Powerful scripting language for non standard operations.  The
    integration in RapidMiner enables me to use any R script within my
    workflows, this ensures the reusability, understandability and eases the
    deployment a lot! I use the RapidAnalytics Community Edition for
    deploying everything together into productive environments.

  • The ease of integrating R via the JMP interface has enabled me to take
    advantage of several methods I would not otherwise be able to do as
    easily or expeditiously. The vast of amount work that has been done
    with R is mind boggling. Unfortunately, my R programming skills are not
    where I wish they were and therefore integrate with JMP and its strengths.

  • We've used R (and S) for many years.  It has an extensive feature set
    (with add-ins) and is open source.  RapidMiner is a great front-end for R
    in our in depth analytics.  We use WebFOCUS RStat to add predictive
    analytics to our management reports.

  • I use R because of the wide range of community support available, the
    ability to write my own code, its interface with Python, and the large
    number of packages available.  Not to mention it is very stable, able to
    handle data sets of a moderate size better than WEKA, and the
    visualization tools are incredible.

  • R add-on packages are comprehensive and so far reliable.
    Documentations are also good. The command line interface gives the
    most natural experience for exploring models and debugging scripts. I
    frequently use R in conjunction with RapidMiner (the R extension).

  • R allows for very rapid development and a huge variety of packages.
    Also, it was what I learned in grad school, so when thinking how to do a
    problem, R solutions jump to mind first.    I use Rstudio b/c of the Linux
    web service IDE.    I have huge troubles with scale, but am investigating
    RHIPE as a solution.

  • Many of our newly hired university grads know R and to write scripts in
    R.  RapidMiner nicely integrates R and has the best GUI and highest
    flexibility for the overall data mining process design from ETL and
    modeling to automated deployment and reporting.

  • I tend to write large scripts/small applications in R, and the TextMate
    editor with an R plugin is a great way of working with moderately sized
    sets of scripts, libraries, etc.

  • I'm a bioinformatician and the Bioconductor collection of R packages is
    one of the best platform for the analysis of 'omic' data.  I'm confident  
    with the unix-like terminal so the command line plus a good editor (vim,
    textmate) is my platform of choice for developing.

  • R has a huge variety of algorithms originating from the active
    community. These algorithms can be integrated within the necessary
    data processing within RapidMiner and thus be easily deployed in
    conjunction with complex data transformation settings.

  • I use SPSS base for what it can do. For what it doesn't do well or offer I
    use R.  Found learning curve a bit steep but worth the effort. Mainly
    confined to Cmdr & Rattle but can use the command line interface. My
    biggest gripe with R is getting the results into Excel [which my clients
    use] in a neat fashion.

  • I recently used R to fit our data. I used command line and script-based
    interfaced because I am familiar with those in Python.

  • I use some elements, such as factor analysis, statistical criterias.  
    KNIME is data mining soft, so statistical analysis features is not the
    primary in this package.

  • R is in general powerful since you can program everything - although
    sometimes this is hard work. The reusability of parts and the missing
    functions for process design are cons and here RapidMiner is much
    better. I have just started to use the combination with RapidMiner but
    this seems to be the optimal combination of both pros to me.

  • Availability and extent of a range of methodologies.  Tinn-R allows for
    programming and debugging.

  • I've been using R for marketing analytics and customer intelligence for
    7 years.   R Studio was released only in Feb & has become extremely
    popular in a short time.

  • I've chosen R for its flexibility and wide user base in academia. Pros:
    large user base to hire from; ability to script and automate. Cons:
    arcane interface, data management/handling is no where near as useful
    as SAS. We use R for experimental purposes and to close gaps when
    our primary tool cannot address and we don't want to pay exorbitant
    SAS/STAT fees.

  • Pros:  powerful and flexible, free and open source, large community,
    huge number of algorithms.  Cons:  not friendly to the beginner, poor
    data import/export, lots of things have to be done in console even when
    using a GUI, poor handling of big data sets.

  • RStudio Server is a great tool for working remotely with R.  In general, I
    like R because I can use pure code, no clicking around.

  • CRAN packages available for R often incorporate procedures not
    available elsewhere. R is clearly more clumsy and slower than its
    competitors for doing routine estimation on large datasets.

  • We're moving to R from SAS because of the cost difference, flexibility,
    integration with other tools (graphics/reporting/presentation) and
    community support.  Cons are the learning curve and re-work to
    transition from existing tools.

  • R is a language in our research group due to its flexibility and good
    graphics. We increasingly use it in conjunction with KNIME to establish
    workflows and report results.

  • Pro: free, open-source, quiet complete.  Con: documentation and
    interface not so easy to exploit, compared to Visual C# with MSDN.    
    Combining RapidMiner & R allow to tweak perfectly inputs & outputs
    and to extend available functions of RM.

  • We use R because of its extensive set of machine learning libraries. A
    single commercial data mining application simply does not have
    everything that we need. Pros - a rapidly expanding community of
    users. Cons - speed.

  • R is an open source with plenty of packages for variety fields. It is very
    user friendly and easy to be conducted with KNIME.

  • Using R in preparation to data mining, i.e. primarily for descriptive
    statistics. Biggest pro is ease of use. Biggest con is the handling of
    large data sets.

  • I use R because a) it is free, b) it can do anything I want it to do as long
    as I can figure out how to do it.  It doesn't do data management very   
    well so I have to prepare the data in another software, usually SQL
    Server or Access and save it as text before I can actually use R.  R has
    the best visualization packages and data mining algorithm packages,
    it's just not always easy to figure out how to use them.  Most of what I
    would do in R I could do in other software, but it would be several rather
    than all in one place.

  • R is easy to use, has  a lot of methods, is free. I write my scripts and  
    use them.  There are errors in some programs; so we have to be careful
    when using it. But I find it relevant for teaching and processing data.

  • Why I have chosen R:  - Open Source  - Huge user base  - Accurate
    results   - Based on C and C++.    Cons:  - R programming language is
    not as intuitive like Matlab.

  • R is very good for academic research. It is really good for student to
    learn how DM algorithm work. But for industrial and business use, R is
    not very suitable for me, because of low speed and big data.

  • I use R because it supports me in many steps of my data analysis
    needs: data preprocessing, descriptive analysis and plots. I used to plot
    data with another tool and then convert it to eps. With R I don't have to
    worry about it. I can focus on the analysis, not on the tool. The
    disadvantage is that R has a steep learning curve. Because of the
    many contributors, sometimes it is hard to find the best package for you.

  • Great programming environment in spite of steep learning curve.  I
    complement R with KNIME for data preparation and presentation.

  • I use R mostly for graphic capabilities. The modeling tools are rarely
    used as R cannot easily handle large datasets.

  • Pros: free, powerful language, excellent choice of models and methods
    via packages.  Horrible user interface.

  • R seems to be gaining traction in industry; more of my clients are
    asking me to create models using it.  While I don't like the database
    interface tools, they are useable.

  • Rarely for some lesser known, nonstandard algorithms.  Pros - vast
    selection of even exotic methods.  Cons - unstable, unfriendly.

  • Pro: Free, easy to identify packages that does what I need.                 
    Con: Having to know the command lines.

  • Pro: Price, general capability, open source encourages growth.          
    Con: Utterly unsupported, so can't base enterprise applications on it.

  • PRO's  R has much innovation.  CON's  would like web service
    forecasting, or Java code generation for forecasting.

  • Pros: large library of statistical function.  Cons: interface/syntax very
    ugly compared to matlab.

  • Pros: Powerful, free, flexible, scalable.  Cons: Does not handle well
    large datasets.

  • We use R in combination with the Pentaho BI Suite. Pentaho developed
    an R plugin which lets a user make selections of data to analyze which
    is pushed to RServe through the plugin and which pushes back the
    output to the Pentaho portal.

  • I use R because it's free and capable of drawing graphs. Now looking  
    for a GUI to be used.  Lack of Japanese interface is a con of R, at this

  • I have used the time series, trees and tests capabilities of R. It is very
    powerful but : (1) the language is not so easy, and (2) you have to
    understand what you do, that is it is not possible to do some quick and
    dirty analysis without reading some statistical materials.

  • Free; provides statistical methods or relevant packages; I am used to
    command line and do not need GUIs yet; con: visual output is not as
    good as in matlab.

  • Pro: comprehensive.  Con: not supported (once called an author of
    some R code and he could not speak English).

  • Pro: AVAILABILITY, thoroughness.  Con: manual (or lack thereof).

  • Free, stable, and lots package, but still difficult to combine with other
    program languages.

  • Extended range of analytic routines, all callable from within KNIME.

  • I'm learning how to use efficiently R, to gain access to specific
    algorithms. Other interfaces used: RExcel, JMP and SAS IML Studio.

  • Great user community and works well w/ Matlab.  Anything that plugs
    into these 2 packages will be used.

  • R for sophisticated statistical analysis (analysis scripting), KNIME for
    the rest.

  • R is free and many of my measurement friends use this so I had to
    learn the program. I still prefer SAS.

  • R offers some things that SPSS Statistics doesn't offer.  And SPSS
    interfaces with R.

  • RapidMiner offers access to R. The advantage of R is that a new
    algorithm can easily be developed -- and then be applied within

  • Pros: * ease of graphical output for data exploration  * full coding
    capability ( conditional execution - i.e.: "if" -, loops, subroutines - i.e.
    functions)  * broad availability of innovation driven by user community  *
    support from user community  * matrix language (vectorized
    computations)  * open source technology

  • Con: it's not easy to us -  that's why I use R from within IBM SPSS
    Statistics, where you can create user defined dialog boxes for R code
    and you can also use the tables API to generate very good looking

  • R has something useful not yet implemented in KNIME.  Revolution
    interface is pretty cool.  R has learning curve flatter than KNIME.  R has
    specific capabilities nothing else provides.  R has huge forums for
    help.  R is sound.

  • A mix of ad hoc descriptive analyses and visualizations, along with
    more engineered code for predictive and prescriptive analytics that
    integrate with other systems.

  • R is the only software package that can within a few lines of code run  
    the transformations and modeling calculations I require.  Other software
    packages do not have the flexibility to do high end maths, and nth
    dimensional matrixs.

  • No other software package has access to the latest output from the
    research community.  The fields we work in (social network analysis,  
    link analysis) need cutting-edge solutions so it is impossible to stay with
    a package such as SAS where the main algorithms haven't changed in
    numerous years.  We would be left behind by the competition.

  • R combines the features of a programming language with the need for
    cutting edge algorithms, diagnostics, and data management.  It can't be
    replicated elsewhere.

  • New data mining methods quickly available with R packages on
    internet.  Use available packages to build own data mining methods.

  • R is limited and not able to support enterprise requirements.   Quality of
    R software is unacceptable.

  • Prototyping only, not suitable for commercial use given licensing.

  • I have been using several R interfaces (R Commander GUI, JGR GUI
    and R's command line interface and RStudio) with my personal
    projects.  I believe that RStudio is very user friendly.

  • I use it on its own, and occasionally from within SAS or SPSS. RStudio
    is a huge improvement in ease-of-use for programming, especially with
    our cross-platform users and cluster users. Our users need a point-and-
    click interface though. If the RStudio team gets Deducer integrated into
    it, we'll be all set.

  • You can always find a library for the algorithm you want (no matter how
    obscure the algorithm or how new). Cost is zero. Millions of users. You
    can always find an answer to any doubt you might have. Manuals are
    always available. Dozens of books available.

  • I consider R the standard software in statistical matters. I've learned it at
    University and I continue to use it. I trust in it in terms of mathematical
    and numeric stability.  I use the standard command line interface
    because I haven't had (yet) the opportunity to explore other user
    interface, but I would be glad to learn new GUIs (and I hope you can
    suggest me the best GUI through the results of your survey).

  • Started using R after being introduced to it in Graduate School. I like it
    because of a large availability of packages (without cost!). It's a   
    software that respects the intellectual capability of the end user -   
    doesn't dumb down things.

  • Excellent graphics, wide variety of routines available, runs on multiple
    platforms (including Linux), many graphical interfaces available (some
    better than others for specific purpose), flexibility of programming
    language and interface to various databases.

  • I am using R since '99 in academic research and then for enterprises
    needs. His modular structure could very pleasant if you work on a
    desktop or on a remote server (e.g.. in batch mode). An additional and
    usable GUI could be a benefit, additional love on performance
    (execution) a need for the future an big dataset analysis.

  • Drawbacks of mining in R are difficulty in using surrogate split, and not
    good input/output handling.

  • I chose R because is open source and I'm work in a non profit research
    organization.  I like the possibility of development software using R as
    statistics engine of this application.  I'm develop with Pyton and .NET.  I
    like the graphics possibilities.  I'm using R for my thesis and I try to
    introduce this software in my job.

  • R is open source and easy to expand by creating packages.  R is a
    high-level language that allows one to quickly build a solution to a
    specific problem at hands.  There is an active community that quickly
    encode new algorithms into R packages.

  • Not supported, buggy, but comprehensive.

  • For any advanced analysis.  To avoid buying extremely expensive
    software add-ons.

  • For decision trees and other algorithms not available in base SAS.

  • R integration in KNIME makes both extremely powerful together. I
    usually perform data shaping in KNIME, and many modelling tasks on
    the shaped data in R.

  • More options around algorithms not contained or not tweakable in
    SPSS.  Handles different levels of data aggregation more easily.  Some
    tasks easier to automate in R than in SPSS syntax.

  • Algorithms that are not available in SPSS.

  • Complement some missing algorithms implementation.

  • R offers several packages not available on Matlab.

  • I use it to add algorithms not available in SAS or JMP.

  • Because it extends the great RapidMiner tool.

  • Packages like 'caret' are why I use R.  Nowhere else will get you as far
    out on the cutting edge of machine learning research.

  • I use it just for tests.

  • I use R because it's free and perfect for casual, interactive, ad hoc
    analysis that isn't available in the RDBMS.

  • R is free and offers alternative methods that can easily be incorporated
    into other packages.

  • High speed, programmability and flexibility make it the best choice.
    Combined with acquisition cost and it was a no brainer.

  • Great for custom code, or to modify existing procedures. Great variety
    of available algorithm, easy to integrate codes in C or Fortran for speed
    calculations when needed.

  • I think that every dataminer must master "R" in todays day, it's a great
    software that have a long learning curve, and it can be a good
    alternative for companies who can't afford a commercial software.

  • Late in 2010, I started using R. It quickly became my analytics software
    of choice because of its data visualization capabilities and the great
    amount of available libraries.

  • It's hard to imagine not using R via scripts - at least for anything except
    very straightforward tasks. It is also hard to imagine not using R in a
    research environment - its flexibility is very hard to approach.

  • It's powerful, I can use it on any platform, it's free, it has nearly every
    technique that commercial software has, there is seemingly a better
    user community who are always willing to help, the studies being done
    with R are cutting edge, and it's easy to expand and modify since it is
    open source.

  • It's powerful, it's extensible, it's flexible, it's very well community
    supported and it's free.

  • Why use R: for the moment it is simply the best.  For this time being, my
    major use of R is programming and experimenting with new methods.

  • R is usually good enough for what we do.  It's free.  We didn't need to
    bother with interfaces beyond the native command window.

  • R is widely used (=supported & developed & robust); has strong
    analytic capabilities (=requirement); has many GUIs (= fun and easy to

  • TeradataR (developer exchange package) provides easy to use and
    feature rich capabilities to use R against *massive* data sets in

  • Flexibility and good quality and variety of graphics/charts.

  • Flexibility, expandability

  • Flexibility, Packages

  • It is much easier to program repetitive tasks in R, especially when I
    want to automate optimization of algorithm parameters by trying all
    possibilities. That is something that is very difficult to automate in
    visual tools.  The other reason is that almost every algorithm is
    implemented in R, or will be implemented in the future.

  • I started by using it to teach graduate classes and ended up growing
    fond of it.

  • Free, easy to use, many user written algorithms.

  • Freeware, lots of implemented methods.

  • I use R primarily because of its increasingly widespread use.  The
    software is also free, which makes it even more attractive.

  • I use R because is open source, I have Linux at home, is very good for
    academic purposes.

  • Free fallback; double-checking; includes kitchen sink.

  • Free libre open source software.

  • I was using R on the university, and it is free of charge and easily

  • Low cost.  Wide functionality.  Access to source/code and authors.

  • I have chosen to use R because it allows free access to the tool. It also
    incorporates a large quantity of algorithms and it can manage large

  • It is free open source and so are all add-on libraries.  Graphical
    facilities, high graphical quality.  Ease of combining existing tools with
    our own code and tools.  Possibility to use huge number of already
    optimized processes.

  • It's free, there are big amount of algorithms not available directly in

  • It is free, but there are many limitations that are not easily overcome,
    so I only use it in specific situations.

  • It's powerful and free.

  • It's free and everyone starts to use.

  • Chosen because free and popular among colleagues.

  • Cost and capability

  • Cost!  Also much more able to customize analysis as needed.

  • Free and has a large set of functions.

  • It is a free open source tool.

  • Being a free software is a R competitive advantage.

  • It's free.

  • Free

  • Free

  • Free

  • All type of dataset is read in R. So I like this.

  • As a simple exploratory tool before more serious mining.

  • I am better at programming with R than with SAS.

  • Graphics mostly

  • Some graphics

  • Copulas

  • Copulas

  • Copulas

  • Mostly for Copulas.

  • Bioconductor

  • Graphics, ggplot2, automation of tasks, marketing experimental design.

  • Pros: superior graphics tools, many packages, object-oriented
    programming capabilities, open source, free, excellent community for
    sharing ideas/problems.

  • Limited IBI.

  • Just to incorporate borrowed algorithms when needed.

  • Large base of contributed analysis packages (CRAN).

  • SVM, Boosting, and Bagging.

  • Time series

  • Various time series routines.

  • I use R together with IBM SPSS Statistics to create time series models
    using the algorithm ARCH / GARCH.

  • Also use R scripts inside STATISTICA (to augment analysis flows).

  • Use scripts that are already available for running R routines from JMP.

  • Mostly distribution fitting routines, some variance estimation routines.

  • Mostly simulation related functions.

  • I use R from KNIME.  I am not an R expert.  So I do the data
    manipulation in KNIME and only the statistics in R.

  • I chose R to deploy time series models. I learned to program the
    command line interface.

  • Basically for data regression.

  • Very specific preprocessings can be quickly coded.

  • Mostly time series and specialized variance estimation procedures.

  • Number of models and graphics.

  • I started to use R for monthly time series forecasts but expanded the
    use to statistical modelling etc.

  • Use some open source modules.

  • Open system, many good algorithms, good programming language.

  • Power and versatility of available methods/libraries.

  • Powerful, universally available, free.

  • R has a wonderfully large collection of algorithms and tools to work
    with.  Papers often accompany these tools explaining the ideas
    underlying them. The python like nature of R allows one to easily
    modify data and automate analysis.

  • Variety of algorithms.

  • Variety of user contributed modules are available.

  • Wide breadth of models to test.

  • Wide variety of algorithms, free.

  • I also use R from within KNIME.

  • I use R through KNIME.

  • Vendor independence; more innovation.

  • Script based R is actually very useful but learning curve is steeper.
    Time could be better spent on analyzing outputs.

  • Scripting capabilities

  • Simulation tools

  • Because R is free and powerful for data manipulation.

  • Open source and do not have to deal with licensing issues.  Pros and
    cons - too much to write down here!

  • Community

  • Reputation in academic circles, availability of different packages.

  • Rapid prototyping tool, occasional need for sophisticated graphics.

  • We currently use R because the subset of our client doing the modeling
    already know it.

  • Preliminary scoping out of models, for articulation to developers.

  • Primarily for social network analysis not available in commercial tools I

  • I don't use other interfaces because I think could be outdated.

  • Best suitable

  • Clients preferences

  • Very rare just for R & D, or a one off report.

  • Limited use.

  • Learning

  • Learning

  • When I really have to use that crap.

  • I'd like to, but the learning curve is rather too steep at the moment -
    especially for neophyte researchers - e.g. research students.

  • I use add-ons.  I think they are R, not really sure.

  • Occasionally use R for a quick look at data.

  • Just trying another tools, but not primary.

  • R is just another tool.

  • Currently evaluating R; don't know much about the available interfaces.

  • I would like to use R with STATISTICA, but just haven't explored this.