I use the brms package, which takes standard R formula syntax and generates Stan code. rstan translates that into C++ code, which is then compiled and executed. In this process, an optimized C++ compiler can make a big difference.
Here I ran the brms sample code for the spatial simultaneous autoregressive (SAR) models using clang++ and zapcc, and benchmarked the performance (three times):
clang++: 66.89 (lagsar), 49.02 (errorsar)
zapcc: 56.80 (lagsar), 18.43 (errorsar)
Such a performance boost can be achieved simply by switching compilers!
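
To put the size of that boost in relative terms, here is a minimal sketch that turns the four numbers above into speedup ratios. It assumes the figures are elapsed times where lower is better (the unit is not stated above), and the names `timings`, `speedup` and `results` are made up for this illustration.

```python
# Timings copied from the benchmark above.
# Assumption: these are elapsed times, so a smaller number means faster.
timings = {
    "clang++": {"lagsar": 66.89, "errorsar": 49.02},
    "zapcc": {"lagsar": 56.80, "errorsar": 18.43},
}


def speedup(baseline, candidate):
    """Return how many times faster the candidate is than the baseline."""
    return baseline / candidate


# Speedup of zapcc over clang++ for each SAR model.
results = {
    model: speedup(timings["clang++"][model], timings["zapcc"][model])
    for model in ("lagsar", "errorsar")
}

for model, ratio in results.items():
    # Share of the clang++ time that zapcc saves, in percent.
    saved = (1 - 1 / ratio) * 100
    print(f"{model}: {ratio:.2f}x faster with zapcc ({saved:.0f}% less time)")
```

Under that assumption, zapcc comes out about 1.18x faster for lagsar (roughly 15% less time) and about 2.66x faster for errorsar (roughly 62% less time), so the gain differs a lot between the two models.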