I think the best piece of advice I could give to someone considering doing an R project of more than twenty lines of code is the following:
Don't. Use Python with Pandas.
I have many beefs with R, but the main one is the type system. It is nearly impossible to predict what most functions will do based on their type. This is largely due to the fact that everything is a vector, so it's ambiguous whether a function operates on a whole vector or on each of its elements.
Also, the vector might accidentally be a factor, which behaves differently from a normal vector. Casting from a factor to a numeric vector, like much of the language, does the least intuitive imaginable thing.
All of the more complex structures are extremely wiley. Data frames behave in absurd ways. They were badly designed. The whole language was badly designed. That's why tidyverse was invented, with tibbles as a nice replacement for data frames. But few functions outside the tidyverse document their behavior when run on tibbles, and very often you'll get a data frame out the other end.
The debugger is awful; breakpoints cannot be set from sourced files. The typing difficulties make this especially egregious.
There is no namespacing; you must invent it yourself.
Even the basic types behave in absurd ways:
> "111" > 27  FALSE
In short, even once you have a good grasp on the syntax and some of the usual operations, it is incredibly difficult to predict what any given R statement will actually do. It's like solving a puzzle in the dark: you will know when you have the right piece exactly when it snaps into place.