I’ve been reading some very good things about Puma (all-Rack stack, proper thread support, & more). I gave it a try with the thread-friendly Rails 4, on the thread-friendly Rubinius. To my surprise, the Heroku + Rails 4 + Puma + Rubinius performance was woeful. I’m not sure what I got wrong, but I was seeing requests 2 orders of magnitude slower than before, with most timing out. I’m not trying to bag Rubinius here, I think it’s a fantastic idea, but for whatever reason it isn’t working for me. Even though Heroku now supports it, perhaps it’s still early days – I’m going to leave it for a while and let it stabilise, at least until the 2.0 release.
So Rubinius didn’t work so well for me – but what about Puma itself? Puma suggests you use JRuby or Rubinius, as they support true multi-threading, unlike MRI (the standard/reference Ruby interpreter), whose global interpreter lock (GIL) stops threads from running Ruby code in parallel. However, it turns out you can use it quite effectively as a Unicorn replacement simply by treating it like Unicorn.
In Unicorn you only have workers (forked processes) for parallelism. Puma introduces threads, and the default maximum is 16. But it also has workers (which are not enabled by default). In a true concurrency environment you would normally have 1 worker per CPU core, and use threads for the rest – in the case of MRI though, where the GIL lets only one thread run Ruby code at a time, these workers are the only way to run Ruby code on more than one core.
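To see why workers matter more than threads under a GIL, here is a minimal sketch (not taken from Puma itself) that runs the same CPU-bound work in threads and then in forked processes and times both. The helper names `busy_work`, `run_in_threads` and `run_in_processes` are my own, and the iteration count is arbitrary. It assumes a platform where `fork` is available.

```ruby
require "benchmark"

# CPU-bound busy work in pure Ruby: sum of the integers below n.
def busy_work(n)
  sum = 0
  n.times { |i| sum += i }
  sum
end

# Run the work in `count` threads inside a single process.
def run_in_threads(count, n)
  count.times.map { Thread.new { busy_work(n) } }.map(&:value)
end

# Run the work in `count` forked processes, reading each result back over a pipe.
def run_in_processes(count, n)
  children = count.times.map do
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      writer.write(busy_work(n).to_s)
      writer.close
      exit!(0) # leave the child immediately, skipping at_exit handlers
    end
    writer.close
    [pid, reader]
  end

  children.map do |pid, reader|
    result = reader.read.to_i
    reader.close
    Process.wait(pid)
    result
  end
end

if __FILE__ == $PROGRAM_NAME
  n = 2_000_000 # arbitrary amount of work
  threads_time = Benchmark.realtime { run_in_threads(4, n) }
  procs_time   = Benchmark.realtime { run_in_processes(4, n) }
  puts format("4 threads:   %.2fs", threads_time)
  puts format("4 processes: %.2fs", procs_time)
end
```

On a multi-core machine running MRI you would expect the process version to finish noticeably faster, while on JRuby or Rubinius the gap should narrow. Treat the timings as a rough illustration only; they say nothing about a real Rails request, which usually mixes CPU work with I/O.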
So the key to making Puma behave like Unicorn is to specify multiple workers, but just 1 thread per worker. Your config will look like this:
# config/puma.rb
workers 5
threads 1,1
That is, 5 workers (for my app, 5 is the golden number to work within Heroku’s 512MB memory limit) and 1 thread per worker. As long as you leave the threads at 1,1, your app is using Puma as a drop-in replacement for Unicorn, ready to be scaled with threads when your underlying Ruby implementation can support them.
Actually, what I do on Heroku is use ENV variables (as they suggest for Unicorn):
# config/puma.rb
workers Integer(ENV["PUMA_WORKERS"] || 5)
threads Integer(ENV["PUMA_THREADS_MIN"] || 1), Integer(ENV["PUMA_THREADS"] || 1)
That allows you to tune the performance with a simple heroku config:set command rather than needing to push & redeploy.
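If you’re wondering how the `Integer(ENV[...] || default)` pattern behaves, here is a small sketch. The helper `int_setting` is hypothetical (it is not part of Puma), and plain hashes stand in for `ENV` so it can be run anywhere.

```ruby
# Hypothetical helper mirroring the Integer(ENV[...] || default) pattern
# used in config/puma.rb. `env` is anything hash-like, such as ENV itself.
def int_setting(env, key, default)
  Integer(env[key] || default)
end

# Simulated environments (plain hashes standing in for ENV).
unset = {}
tuned = { "PUMA_WORKERS" => "3" }
typo  = { "PUMA_WORKERS" => "three" }

puts int_setting(unset, "PUMA_WORKERS", 5) # variable not set: falls back to the default
puts int_setting(tuned, "PUMA_WORKERS", 5) # variable set: uses the configured value

begin
  int_setting(typo, "PUMA_WORKERS", 5)
rescue ArgumentError => e
  # Integer() refuses non-numeric strings instead of quietly returning 0
  puts "rejected: #{e.message}"
end
```

The reason to prefer `Integer(...)` over `to_i` here is that a mistyped config value fails loudly at boot, rather than silently becoming 0.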
To complete the setup, modify your Procfile and Gemfile:
# Procfile
web: bundle exec puma -p $PORT -e $RACK_ENV -C config/puma.rb

# Gemfile
gem "puma", "~> 2.0.1"
If you want to try it with Rubinius, read my next post.
What about allowing just 2 threads per worker with Puma on MRI, you ask? I ran a bunch of simple ‘ab’ profiles. It seems that with any more than 1 thread, the threads just fall over each other and block horribly (not particularly surprising, given the GIL). It’s better to leave those requests in the Heroku request queue than jam them through a single worker. No doubt this will be a different story with a true-concurrency Ruby engine, but until then my conclusion is that extra threads have a distinctly negative effect.
I also tried doubling the workers on a 2x sized Heroku dyno. I didn’t get any performance benefit from this (performance actually went down), so my conclusion is that two 1x dynos with 5 workers each are better than one 2x dyno with 10. As always, your mileage will certainly vary!
Why bother with Puma if you’re going to treat it just like Unicorn? Aside from a small speed increase, and being thread-ready, Puma also handles incoming requests better. In Unicorn, the worker is tied up from the moment the client hits the server until the request is finished, so if the client is slow to send its request, or the HTTP body is large, that worker is wasted. You can read more about that problem on Heroku.
In conclusion: you can treat Puma like Unicorn by setting the threads to 1, and using workers as you do in Unicorn. The speed increase may not be massive, since you’re still process bound, but it’s a more modern server and ready to go multi-threaded when Ruby is. Play around with your worker count: the perfect number is 1 below whatever number causes memory warnings on Heroku.
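If you’d rather start from a rough estimate than pure trial and error, here is a minimal sketch of the arithmetic. The function `estimated_max_workers` is my own, and the per-worker and master figures in the usage lines are made-up placeholders, not measurements; you would need to measure your own app.

```ruby
# Largest worker count whose estimated memory footprint stays under the limit.
# memory_limit_mb: the dyno's memory limit
# per_worker_mb:   memory used by one worker (you must measure this yourself)
# master_mb:       memory used by the master process (assumed 0 if unknown)
def estimated_max_workers(memory_limit_mb, per_worker_mb, master_mb = 0)
  raise ArgumentError, "per_worker_mb must be positive" unless per_worker_mb > 0

  # Never return fewer than 1 worker, even if a single worker exceeds the limit.
  [((memory_limit_mb - master_mb) / per_worker_mb).floor, 1].max
end

# Placeholder numbers purely for illustration.
puts estimated_max_workers(512, 90, 50)
puts estimated_max_workers(1024, 90, 50)
```

This only gives a starting point: memory per worker tends to grow as the app warms up, so watching for memory warnings under real traffic is still the final word.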