Removing all R CMD check warnings

Making R packages is an important aspect of the statistician’s work. Or at least it should be: it is quite annoying when a new method appears in the literature but no implementation is readily available.

A favourite mantra of mine when making R packages is the following: an R package is more than the sum of its functions. A functioning R package needs to be able to interact properly with the R environment (through the NAMESPACE); a good R package also needs great documentation; a great R package will also include a vignette to guide new users and explain how all the functions interact with one another.

The main reference for how to make R packages is Writing R Extensions. Everything you need to know is there, if you know what you are looking for. Another very useful reference is Hadley Wickham’s book on R packages. This book explains the different components of an R package, and it also serves as an introduction to his devtools package.

In what follows, I don’t want to go over how to make an R package; the above references do a better job than I could hope to do. Rather, I want to share my experience with one of the most annoying parts of making an R package: passing R CMD check. Removing the errors is the most important part, and the kind of errors you get really depends on the package (the log file is typically quite useful in figuring out what triggered them). On the other hand, you also want to minimize the number of warnings and notes, and most warnings you probably want to remove altogether.


By how much will Clinton win?

American politics is great for statistics: a huge number of polls are conducted every week, some positions are up for re-election every other year, and there are really only two parties. Moreover, the complicated nature of the whole election process, which for example involves the electoral college for the presidential election, makes it more interesting than elections in most democracies around the world. It’s for all these reasons that an incredible website like FiveThirtyEight is possible.


Oscar 2016 predictions

For the past three years, I have tried to predict the winners in all categories at the Academy Awards. But last year, I was able to combine my passion for both movies and statistics: as part of my Data Analysis course at McGill University, we had to come up with a prediction model for four categories: Best Picture, Best Director, Best Actor, and Best Actress. And my model performed quite well: it was the only one to correctly predict all four winners.

This year, I decided to repeat the experience, especially since the Best Picture category is more competitive than it was last year. I have shared my predictions below, for all categories; however, I have used a statistical model only for the four categories mentioned above. All other categories are based on my own judgement (and on readings I have done). My predictions are in bold font.

After the Academy Awards, I will update this post and point out the winners (I will indicate them in italics). I may also write a post on my prediction model.

Update (2016/02/28): Well, I didn’t do as well as I would have liked: 14/24.


Makefile and Beamer presentations

I have been wondering about Makefiles for some time now, and recently I finally got around to learning about them so that I could use make to regenerate all the different versions of a manuscript I’m working on. And I thought I would take the opportunity to explain how they can be useful for Beamer presentations.


Tutorial: Optimising R code

The R language is very good for statistical computations, due to its strong functional capabilities, its open source philosophy, and its extensive package ecosystem. However, it can also be quite slow, because of some design choices (e.g. lazy evaluation and extreme dynamic typing).

This tutorial is mainly based on Hadley Wickham’s book Advanced R.

Before optimising…

First of all, before we start optimising our R code, we need to ask ourselves a few questions:

  1. Is my code doing what I want it to do?

  2. Do I really need to make my code faster?

  3. Is a considerable speed-up even possible?
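As a rough illustration of how these three questions might be checked in practice, here is a minimal sketch in base R. The two functions, `sum_squares_loop` and `sum_squares_vec`, are hypothetical examples made up for this purpose; they are not taken from the tutorial or from Advanced R. The sketch first confirms that both versions agree, then times each one with `system.time`.

```r
# Hypothetical example: two ways of computing a sum of squares.
sum_squares_loop <- function(x) {
  total <- 0
  for (xi in x) {
    total <- total + xi^2
  }
  total
}

sum_squares_vec <- function(x) {
  sum(x^2)
}

set.seed(1)
x <- runif(1e6)

# Question 1: is the code doing what I want? Check that both versions agree
# (up to floating-point tolerance) before worrying about speed.
same_answer <- isTRUE(all.equal(sum_squares_loop(x), sum_squares_vec(x)))
stopifnot(same_answer)

# Questions 2 and 3: measure how long each version takes, to see whether
# the current code is actually slow and how much there is to gain.
time_loop <- system.time(sum_squares_loop(x))[["elapsed"]]
time_vec <- system.time(sum_squares_vec(x))[["elapsed"]]
print(c(loop = time_loop, vectorised = time_vec))
```

The timings will vary from one machine to the next, and a single call to `system.time` is only a crude measurement; it is meant to show the order of the checks (correctness first, timing second), not to serve as a careful benchmark.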
