I’ve been reading some very good things about Puma (all-Rack stack, proper thread support, & more). I gave it a try with the thread-friendly Rails4, on the thread-friendly Rubinius. To my surprise, the heroku+rails4+puma+rubinius performance was woeful. I’m not sure what I got wrong, but I was seeing 2 orders of magnitude slower requests than before, with most timing out. I’m not trying to bag Rubinius here, I think it’s a fantastic idea, but for whatever reason it isn’t working for me. Even though Heroku now supports it, perhaps it’s still early days – I’m going to leave it for a while and let it stabilise, at least to the 2.0 release.
So Rubinius didn’t work so great for me – but what about Puma itself? Puma suggests1 you use JRuby or Rubinius as they support true multi-threading, not the total lack of concurrency present in the RMI (the standard/reference ruby interprater). However, it turns out you can use it quite effectively as a Unicorn replacement simply by treating it like Unicorn.
In Unicorn you only have workers (forked processes) for parallelism. Puma introduces threads, and the default is 16. But it also has workers (which are not enabled by default). In a true concurrency environment you would normally have 1 worker per CPU core, and use threads for the rest – in the case of the RMI though, without operating-system-level threads, these workers are the only way to use OS concurrency.
So the key to make Puma behave like Unicorn is to specify multiple workers, but just 1 thread per worker. Your config will look like so:
# config/puma.rb workers 5 threads 1,1
That is, 5 workers (for my app, 5 is the golden number to work within Heroku’s 512MB memory limit) and 1 thread per worker. As long as you leave the threads at 1,1. Now your app is using Puma as a drop-in replacement for Unicorn, ready to be scaled with threads when your underlying Ruby implementation can support them.
Actually what I do on heroku is to use ENV variables (as they suggest for unicorn):
# config/puma.rb workers Integer(ENV["PUMA_WORKERS"] || 5) threads Integer(ENV["PUMA_THREADS_MIN"] || 1), Integer(ENV["PUMA_THREADS"] || 1)
That allows you to tune the performance with a simple heroku config:set
command rather than needing to push & redeploy.
To complete the setup, modify your Profile and Gemfile
# Procfile web: bundle exec puma -p $PORT -e $RACK_ENV -C config/puma.rb
# Gemfile gem "puma", "~> 2.0.1"
If you want to try it with rubinius, read my next post.
What about allowing just 2 threads per worker with Puma on RMI you ask? I ran a bunch of simple ‘ab’ profiles. It seems with any more than 1 thread, the threads just fall over each other and block horribly (not particularly surprisingly due to the GIL). It’s better to leave those requests in the Heroku request queue than jam them through a single worker. No doubt this will be a different story with a true-concurrency ruby engine, but until then my conclusion is that it has a distinctly negative effect.
I also tried doubling the workers on a 2x sized Heroku dyno. I didn’t get any performance benefit from this (it actually went down), so my conclusion is that two 1x dynos with 5 workers each is better than one 2x dyno with 10. As always, your milage will certainly vary!
Why bother with Puma if you’re going to treat it just like Unicorn? Aside from a small speed increase, and being thread-ready, Puma also handles incoming requests better. In Unicorn, the worker is tied up from the moment the client hits the server until the request is finished, so if the client request is slow, or the HTTP body is large, that worker is wasted. You can read more about that problem on heroku.
In conclusion: you can treat Puma like unicorn by setting the threads to 1, and using workers as you do in unicorn. The speed increase may not be massive, since you’re still process bound, but it’s a more modern server and ready to go multi-threaded when ruby is. Play around with your worker count, the perfect number is 1 below whatever number causes memory warnings on Heroku.
Pingback: Trying out Rubinus | Don't Dream: Do.
I was sad to read this because I’ve contemplating just the same thing – reducing memory usage drastically by using a true-threaded server instead of process-based.
Even with ruby 2.0 my shared memory is horrible (only about 20mb/160mb), I wish I knew what was causing it.
Were you using ruby 2.0 with the poor threaded puma performance? I was hoping even with the GIL it would still thread decently…
Thanks,
Kevin
I tried using ruby 1.9.3, rubinus2.0.0rc1. I put my details up at: http://omegadelta.net/2013/06/16/trying-out-rubinus/ it’s really easy to deploy a rubinius test to Heroku, just 1 config line change!
I don’t know what went wrong. Maybe I need to try a different mysql gem or something. But it was really slow. Everything works fine on my dev machine. If I get some time I’ll try investigating what’s going on with Heroku. For now, I figure 1 RC library (Rails 4) is enough change to deal with.
I think you mean “MRI” not “RMI” :-)
Pingback: Free Background Processes on Heroku with One Web Dyno - travisluong.com
بوابة We indulge in, cause I came across precisely what I’d been in search of. You could have wrapped up this four working day very long seek out! Lord Many thanks dude. Use a good morning. Bye