trial and stderr


A 2-post collection

Advice for Using R: Pure Invective

 •  Filed under r

I think the best piece of advice I could give to someone considering doing an R project of more than twenty lines of code is the following:

Don't. Use Python with pandas.

I have many beefs with R, but the main one is the type system. It is nearly impossible to predict what most functions will do based on the types of their arguments. This is largely because everything is a vector, so it's ambiguous whether a given function operates on the vector as a whole or on each of its elements.
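Here is a small base R sketch of that ambiguity, using made-up vectors. Some functions work element by element, others collapse the whole vector into one value, and nothing in the name or the input type tells you which. max() versus pmax() and length() versus nchar() are typical pairs.

```r
x <- c(1, 5, 9)
y <- c(3, 2, 10)

sqrt(x)        # element-wise: one result per element
sum(x)         # whole vector: a single number, 15
max(x, y)      # whole vector: the largest value across both inputs, 10
pmax(x, y)     # element-wise: c(3, 5, 10)

words <- c("apple", "kiwi")
length(words)  # 2: the number of elements in the vector
nchar(words)   # c(5, 4): the number of characters in each element
```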

Also, the vector might accidentally be a factor, which behaves differently from a normal vector. Converting a factor to a numeric vector, like much of the language, does the least intuitive thing imaginable: as.numeric() returns the factor's internal integer codes rather than the numbers its labels spell out.
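A minimal illustration with a hypothetical factor f. The workarounds are the ones the R FAQ (7.10) gives: convert the labels, not the factor. For context, before R 4.0.0, data.frame() and read.csv() turned strings into factors by default (stringsAsFactors = TRUE), which is a common way a vector ends up a factor without you asking for it.

```r
# A factor stores integer codes plus a table of labels ("levels").
f <- factor(c("10", "20", "10"))

as.numeric(f)                # c(1, 2, 1): the internal codes, not the values
as.numeric(as.character(f))  # c(10, 20, 10): convert the labels instead
as.numeric(levels(f))[f]     # c(10, 20, 10): same result, converting each level only once
```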

All of the more complex structures are extremely wily. Data frames behave in absurd ways. They were badly designed. The whole language was badly designed. That's why the tidyverse was created, with tibbles as a saner replacement for data frames. But few functions outside the tidyverse document how they behave when given a tibble, and very often you'll get a plain data frame out the other end.
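One concrete example of data frame behavior that trips people up, sketched in base R with a made-up data frame: indexing a single column with [ silently drops the result to a plain vector unless you pass drop = FALSE. Tibbles don't do this; subsetting a tibble with [ returns a tibble.

```r
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

class(df[, c("a", "b")])        # "data.frame": two columns stay a data frame
class(df[, "a"])                # "integer": one column drops to a bare vector
class(df[, "a", drop = FALSE])  # "data.frame": only if you ask explicitly
```

Code that selects columns programmatically can therefore return a different type depending on how many columns happen to match.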

The debugger is awful; breakpoints cannot be set from sourced files. The typing difficulties make this especially egregious, since you constantly need to stop and inspect what a value actually is.

There is no namespacing for your own code: source() loads every definition into the global environment, so you must invent namespacing yourself.
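R packages do get namespaces (pkg::fn), but a script you source() does not. One way to improvise namespacing is to source a file into its own environment. This sketch assumes a plain environment is good enough as a "module"; the file contents and the names module_path, strings and clean_name are hypothetical.

```r
# Hypothetical helper file, written to a temporary path for this example.
module_path <- tempfile(fileext = ".R")
writeLines("clean_name <- function(x) tolower(trimws(x))", module_path)

# source() evaluates in the global environment by default; passing an
# environment as `local` keeps the definitions inside it instead.
strings <- new.env()
source(module_path, local = strings)

strings$clean_name("  Alice ")                               # "alice"
exists("clean_name", envir = globalenv(), inherits = FALSE)  # FALSE: nothing leaked
```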

Even the basic types behave in absurd ways:

> "111" > 27
[1] FALSE
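What happens here: when a comparison mixes a string and a number, R coerces the number to a string, and strings are compared character by character in the locale's collation order, so "111" sorts before "27". A short sketch of the trap and of the explicit conversion that avoids it:

```r
"111" > 27              # FALSE: 27 becomes "27", and "1" sorts before "2"
as.numeric("111") > 27  # TRUE: convert explicitly to compare as numbers
"9" > 10                # TRUE: compared as the strings "9" and "10"
```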

In short, even once you have a good grasp of the syntax and the usual operations, it is incredibly difficult to predict what any given R statement will actually do. It's like solving a puzzle in the dark: you only know you have the right piece when it snaps into place.

Beginning R Resources for Experienced Programmers

 •  Filed under r

Basics and Syntax

R Language for Programmers

5 Kinds of Subscripts in R

The Google Style Guide, which is better than the one in Advanced R

In RStudio, auto-format your code with the styler addin: install.packages('styler')

Technique and Structure

Structuring R Projects

Advanced R, starting with Data Structures and Functional Programming

Data Manipulation using dplyr and tidyr


1:1 generates a sequence of length one, 1. But 1:0 counts down and generates a sequence of length two, c(1, 0). So when an object x with zero rows is passed into for (i in 1:nrow(x)), the body of the loop runs twice (with i = 1, then i = 0), which is almost certainly not what you want. Always use seq_len() or seq_along() instead of : ranges in your for loops, e.g. for (i in seq_len(nrow(x))). Reference.
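A quick way to see the difference, using a hypothetical zero-row data frame and counting how many times each loop body runs:

```r
x <- data.frame(a = numeric(0))  # a data frame with zero rows

colon_runs <- 0
for (i in 1:nrow(x)) colon_runs <- colon_runs + 1     # 1:0 is c(1, 0): runs twice

seq_runs <- 0
for (i in seq_len(nrow(x))) seq_runs <- seq_runs + 1  # seq_len(0) is empty: never runs

v <- character(0)
along_runs <- 0
for (i in seq_along(v)) along_runs <- along_runs + 1  # seq_along() of an empty vector is empty too

c(colon_runs, seq_runs, along_runs)                   # c(2, 0, 0)
```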

Trying to learn R reminds me of when I, as an undergraduate math-philosophy double major, had to take a logic class in the philosophy department. In the math department, the class could have been compressed into two weeks. But in the philosophy department, even spread out over three months, it was the subject of much frustration for the less mathematically inclined philosophy students.

Similarly, most resources for learning R are aimed at non-programmers, and as such are painfully gentle. So I'll maintain a list here of satisfyingly terse explainers and tutorials.