This podcast provides the content from DataScience.LA in a convenient, easy-to-listen format. Interviews with prominent (and not so prominent) practitioners in or around (or visiting) the LA area! Data Science at its finest :)
First things first, Dirk Eddelbuettel was recently named ordinary. This seems contradictory, since Dirk is a known HPC expert, an organizer of the R in Finance conference, the creator of Rcpp, and a Debian contributor. These are only a few of his many accomplishments, and listing them involves not even a hint of puffery. And yet, Dirk Eddelbuettel is considered ordinary. What makes Dirk ordinary? It should be mentioned that, to the R Foundation for Statistical Computing (the Vienna-based nonprofit that provides R’s leadership), ‘ordinary’ means something quite different than it does to the lay person. Ordinary members provide guidance and direction for the R Project for Statistical Computing. It’s hard to imagine someone more qualified than Dirk for this task.
A longtime R user, Dirk works professionally in finance as a self-described quant (one of the ‘Rocket Scientists of Wall Street’). Dirk has worked for some of the largest financial organizations in the world, and has open-sourced packages for tasks ranging from vanilla options pricing to discounted cash flow analysis. Even though the world of finance is a “massive net importer” of open source tools, he has succeeded in providing finance packages for the open source community. In addition to these contributions, he has also provided the larger R community with the infinitely useful Rcpp package.
It’s easy to see why, with all of these contributions, the R Foundation would find Dirk ordinary. I was incredibly fortunate at useR! 2014 to sit down with Dirk and have a long conversation about many of these ordinary topics, the video of which is included here. Enjoy!
At the useR! 2014 conference, one of the overriding themes was, without a doubt, R’s history, legacy, and future as an interface to the “best of the best” available algorithms. Romain Francois’ package, Rcpp11, is at the forefront of this effort and was explicitly showcased as one of the ways in which R is staying true to its roots as an interface.
An R programmer for over a decade, Romain started as an “R first” developer, only later moving to more traditional software engineering languages. In this interview, Romain and I discuss a wide range of topics, from R’s history as an interface for algorithms to Romain’s experiences in HPC and his collaboration with Hadley Wickham on dplyr. Romain also shares a little bit about his other passion, stand-up comedy.
Heather Turner is a biostatistician and Senior Research Fellow at the University of Warwick, as well as a consultant. She first became familiar with R as a user during her PhD studies, and a few years later she developed the Generalized Non-Linear Models package, which is still in use today. Our conversation, at useR! 2014, begins with this transition from R user to R developer.
As an academic whose background lies outside traditional computer science, Dr. Turner had some fascinating insights about what makes the R programming language interesting and different. She is a former editor of the R Journal and was the local organizer for the useR! 2011 conference, and she has a wealth of knowledge about the field of data science, the state of the technology, and the wider community.
There is certainly more in this interview than I can mention in this brief introduction - from her new work using Shiny to communicate in statistical education to how to encourage younger generations to enter the fields of statistics and data science. This is genuinely a must-listen podcast, with lots of great information and some fantastic level-headed advice. I hope you find it as enlightening as I have!
Yihui Xie set out to make his homework easier, but instead found himself tackling one of the greatest problems in modern science: reproducibility of results. Through his work on the knitr package, he has assembled a toolchain that allows the user to produce beautiful, ready-to-distribute documents containing a complete, self-contained, and reproducible analysis. In this interview, Yihui discusses how he came to the R programming language and how he set about building knitr. He also mentions the great momentum and energy of the R community in China, and what he’s currently focused on at RStudio.
David Smith is an integral part of the R community. His background in computational statistics goes back to the early 90s. David has worked at Revolution Analytics, one of the leading R companies, for nearly a decade. His name has been included in lists such as:
Top 20 influencers in big data
30 most influential Data Scientists on Twitter
#3 Best Big Data Twitter account
Top Big Data Executives and Experts to Follow on Twitter
David Smith is a powerhouse of connections, information, and knowledge, not only about the R community but also about the data ecosystem as a whole. After having the privilege of a lovely conversation about the state of our industry at useR! 2014, however, I can tell you that above all things, David Smith cares about the happiness and well-being of the community he has seen flourish around him in recent years.
In this wide-ranging interview, David and I talk about how he became involved in mathematical software, his transition from statistician to his current role as Chief Community Officer at Revolution Analytics, and what that role entails.
Tal Galili is, in many ways, a central hub of the R community. Both gregarious and thoughtful, he has grown his website r-bloggers.com into the definitive aggregation of the R community’s voice through his genuine, passion-driven intensity. Tal had a simple desire as a young programmer - to learn more about his chosen tools - and looked to the internet to find other voices like his. When googling for “R blogs”, Tal found numerous blogs about pirates, but only a handful about R. This interview details how Tal started R-Bloggers and decided to challenge the status quo, and it also covers his new projects.
Max Kuhn provides some amazing insights into the why and how of caret, an R package he created. He also discusses his book, Applied Predictive Modeling, which he co-authored with Kjell Johnson, including details on how he set out to write the book he wished he had had. As a special bonus, Max also describes Quinlan’s C5.0, an alternative “forest of decision trees” algorithm whose secrets were hidden behind commercial licensing for years, and which has recently been ported and made available to the R ecosystem. Whether you are a beginner just getting your feet wet with R and predictive modeling or a seasoned data scientist, this interview has something for everyone.
Hadley Wickham is famous. He’s not Kardashian famous, but walking around useR! and seeing the community’s reaction to him, there’s no question: he’s ‘R famous’. If you have the opportunity to see his talks, tutorials, or sessions in person, you owe it to yourself to take it. He projects depth and wisdom with a booming voice, paired with a hard-won confidence built over years of honing his craft and developing his expertise. He takes as much time as is needed to answer questions, listens to every single bit of feedback, and succeeds in making you feel that what you say truly matters. Hadley Wickham has poise. It’s also quite obvious, if you watch him for long enough, that this fame suits him like an itchy sweater made by a loving grandparent. It brings warmth and it comes from a place of love, but it’s always a little uncomfortable, regardless of how well it may fit.
Hadley and I had a wide-ranging interview at useR! 2014, shown above, discussing R’s strengths and revealing its weaknesses together. We reveal Hadley’s evil plans for world domination, as well as his not-so-evil plans to help users better manage their workflow. Enjoy!