Multivariate t distribution

I am currently teaching a graduate course in Multivariate Analysis (the course website can be found here). A few weeks ago, I introduced the family of elliptical distributions. In this blog post, I want to discuss the multivariate t distribution, show how to generate samples from it, and highlight the difference between uncorrelatedness and independence.
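One standard way to simulate from a multivariate t distribution is to divide a multivariate normal vector by an independent scaled chi-square: if Z ~ N_p(0, Σ) and W ~ χ²_ν are independent, then Y = Z / sqrt(W/ν) follows a multivariate t with ν degrees of freedom and scale matrix Σ. The sketch below assumes this construction (the full post may proceed differently), and the helper name `rmvt_mixture` is hypothetical. Taking Σ = I also shows why uncorrelated does not mean independent here: both coordinates are divided by the same random factor, so their squares are correlated even though the coordinates themselves are not.

```r
# Hypothetical helper (not from the original post): draw n samples from a
# p-variate t distribution with nu degrees of freedom and scale matrix Sigma,
# using the mixture representation Y = Z / sqrt(W / nu), where
# Z ~ N_p(0, Sigma) and W ~ chi-square(nu) are independent.
rmvt_mixture <- function(n, Sigma, nu) {
  p <- ncol(Sigma)
  # Rows of X %*% chol(Sigma) have covariance t(R) %*% R = Sigma
  Z <- matrix(rnorm(n * p), nrow = n, ncol = p) %*% chol(Sigma)
  W <- rchisq(n, df = nu)   # one chi-square draw per observation
  Z / sqrt(W / nu)          # row i is divided by sqrt(W[i] / nu)
}

set.seed(2019)
n <- 100000
nu <- 10
Y <- rmvt_mixture(n, Sigma = diag(2), nu = nu)

# Uncorrelated: the sample correlation of the two components is near 0
cor(Y[, 1], Y[, 2])

# Not independent: the squared components share the common factor nu / W,
# so they are positively correlated (theoretical value 1 / (nu - 1) when
# Sigma is the identity and nu > 4)
cor(Y[, 1]^2, Y[, 2]^2)
```

The helper only uses base R so that each step of the construction is visible; the mvtnorm package also provides `rmvt()`, which can be used instead.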

Elliptical distributions

If we generate samples from a multivariate normal distribution, we can easily see that the contour lines of its density are ellipses:


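library(mvtnorm)
library(ggplot2)

n <- 10000
p <- 2
Sigma <- matrix(c(1, 0.5, 0.5, 1), ncol = p)
# Draw n samples from a bivariate normal with mean 0 and covariance Sigma
Y <- data.frame(rmvnorm(n, sigma = Sigma))

# Plot the data, with contour lines of the estimated 2D density
ggplot(Y, aes(X1, X2)) + 
  geom_point(alpha = 0.2) +
  geom_density_2d()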

Elliptical contours of multivariate normal

Elliptical distributions are a generalization of the multivariate normal distribution that retain the property that lines of constant density are ellipses.
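Concretely, and using notation introduced here rather than taken from the post, a p-dimensional elliptical distribution with location vector μ and positive definite scale matrix Σ has a density that depends on y only through the quadratic form (y − μ)ᵀΣ⁻¹(y − μ), so its level sets are ellipsoids (ellipses when p = 2). Here g is a nonnegative generator function and c_p a normalizing constant; the multivariate normal and the multivariate t with ν degrees of freedom are two special cases:

$$
f(\mathbf{y}) = c_p\,|\Sigma|^{-1/2}\, g\!\left((\mathbf{y}-\boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{y}-\boldsymbol{\mu})\right)
$$

$$
\text{normal: } g(u) = e^{-u/2},\quad c_p = (2\pi)^{-p/2};
\qquad
t_\nu:\ g(u) = \left(1 + \frac{u}{\nu}\right)^{-(\nu+p)/2},\quad c_p = \frac{\Gamma\!\left(\frac{\nu+p}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)(\nu\pi)^{p/2}}
$$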

Oscar 2019 predictions

Again this year, I will be using a prediction model to try to predict the winners in the top four categories: Best Picture, Best Director, Best Actor, and Best Actress. As in previous years, I also provide predictions for the other categories, but these were derived on a more ad hoc basis.

According to most pundits, the Best Picture race is wide open. But as you’ll see below, my prediction is less equivocal. To see this, it’s informative to look at how the winning probabilities changed over time as more information came in:

Winning probabilities

As we can see, my model does not weight all guild awards equally: a win at the Producers’ Guild Awards (PGA) is worth a lot less than a win at the Directors’ Guild Awards (DGA) or at the BAFTAs. Still, given the preferential ballot system used by the Academy, my model is probably underestimating the chances of a movie with broad appeal like Green Book.
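To see why a preferential ballot can favour a movie with broad appeal, here is a minimal instant-runoff count. It assumes a simplified version of the rule: each ballot counts for its highest-ranked surviving film, and while no film has a majority, the film with the fewest first choices is eliminated. The function name `instant_runoff` and the ballots are hypothetical, made up for illustration rather than taken from Academy data.

```r
# Hypothetical helper: instant-runoff count for ranked ballots.
# Each ballot is a character vector of films, most preferred first.
instant_runoff <- function(ballots) {
  remaining <- unique(unlist(ballots))
  repeat {
    if (length(remaining) == 1) return(remaining)
    # Each ballot counts for its highest-ranked film still in the race
    top <- vapply(ballots, function(b) b[b %in% remaining][1], character(1))
    counts <- table(factor(top, levels = remaining))
    if (max(counts) > sum(counts) / 2) {
      return(names(counts)[which.max(counts)])
    }
    # No majority: drop the film with the fewest first choices
    # (ties broken by order of appearance, a simplification)
    remaining <- setdiff(remaining, names(counts)[which.min(counts)])
  }
}

# Toy electorate: A is the most common first choice, but C is
# ranked second by A's fans and by B's fans
ballots <- c(
  rep(list(c("A", "C", "B")), 4),
  rep(list(c("C", "B", "A")), 3),
  rep(list(c("B", "C", "A")), 2)
)

# Counting first choices only picks A ...
names(which.max(table(sapply(ballots, `[`, 1))))  # "A"
# ... but after B is eliminated, its ballots move to C, which wins 5 to 4
instant_runoff(ballots)                            # "C"
```

A model that only tracks which film is most often ranked first, as a plurality count does, would miss this kind of second-choice support.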

My predictions are below, in bold. After the Academy Awards tomorrow, I will update this post and point out the winners–I will indicate them in italics.

Update (2019/02/25): I got 16 out of 24 right. This year really convinced me that my model for Best Picture is missing something about the broad appeal of movies like Spotlight, The Shape of Water, and Green Book, so it may be time to update it… Stay tuned!

What I (currently) do at the Saskatchewan Health Authority

During the summer of 2016, my wife and I moved to Saskatoon so that she could pursue her medical residency training at the University of Saskatchewan. At that point, I was only wrapping up the third year of my PhD, and so I wasn’t necessarily looking for a job. But one thing led to another, and I ended up applying for a new biostatistician position with the Saskatoon Health Region (SHR).

The job description described the role as a connection point between the health care system and the Saskatchewan Centre for Patient-Oriented Research (SCPOR), providing both study design and methodological support to researchers. I thought I would be a good fit: McGill’s focus on data analysis and report writing had given me the tools to do the job, and meanwhile I could learn more about the health care system itself and use my academic background to provide effective consultation to researchers.

Predictive model for the Oscars

A few years ago, as part of the graduate course Data Analysis and Report Writing in the Department of Epidemiology, Biostatistics and Occupational Health at McGill University, we explored the topic of predictive modeling using a dataset containing movies, directors, and actors who were nominated for an Academy Award. The goal was to select some variables and build a predictive model for the winner in four categories: Best Picture, Best Director, Best Actor, and Best Actress. As a movie fan, this was the dream assignment: I could combine my love of movies with my love of statistics! And it paid off: I was the only one in my class to correctly predict all four winners.

Oscar 2018 predictions

For the past few years, I have tried to predict the winners in all categories at the Academy Awards. Again, I will be using statistics and data analysis to inform my predictions in six categories: Best Picture, Best Director, Best Actor, Best Actress, Best Supporting Actor, and Best Supporting Actress.

As in the last three years, I stick with what the model tells me for these categories. However, I’m skeptical about my Best Picture prediction: several pundits see The Shape of Water as the front-runner, but my model only gives it a 16% chance of winning. Because of rule changes that now require a preferential ballot for Best Picture, the winner has been difficult to predict in recent years. Since The Shape of Water possibly has a broader appeal than Lady Bird and Three Billboards Outside Ebbing, Missouri, it may prevail in the end. I still believe Three Billboards Outside Ebbing, Missouri is the actual front-runner, but I do think my model is underestimating The Shape of Water’s chances and overestimating Lady Bird’s.

In the next few days, I will write another post describing how my model works. I’ll take the opportunity to explain why my model is so bearish on The Shape of Water.

My predictions are below, in bold. After the Academy Awards next weekend, I will update this post and point out the winners–I will indicate them in italics.
