Puma on Heroku with MRI

Or, How to use Puma like Unicorn

I’ve been reading some very good things about Puma (all-Rack stack, proper thread support, & more). I gave it a try with the thread-friendly Rails 4, on the thread-friendly Rubinius. To my surprise, the Heroku + Rails 4 + Puma + Rubinius performance was woeful. I’m not sure what I got wrong, but requests were two orders of magnitude slower than before, with most timing out. I’m not trying to bag Rubinius here, I think it’s a fantastic idea, but for whatever reason it isn’t working for me. Even though Heroku now supports it, perhaps it’s still early days – I’m going to leave it for a while and let it stabilise, at least until the 2.0 release.

So Rubinius didn’t work so great for me – but what about Puma itself? Puma suggests [1] you use JRuby or Rubinius, as they support true multi-threading, unlike MRI (the standard/reference Ruby interpreter), whose global interpreter lock (GIL) lets only one thread run Ruby code at a time. However, it turns out you can use it quite effectively as a Unicorn replacement simply by treating it like Unicorn.

In Unicorn you only have workers (forked processes) for parallelism. Puma introduces threads, and the default is 16. But it also has workers (which are not enabled by default). In a true concurrency environment you would normally have 1 worker per CPU core, and use threads for the rest – in the case of MRI though, where the GIL stops threads from running Ruby code in parallel, these workers are the only way to use OS-level concurrency.
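To make the worker/thread distinction concrete, here is a minimal Ruby sketch. It is not how Puma implements either feature; the helper names `run_in_workers` and `run_in_threads` are made up for illustration, and it assumes a platform where `fork` is available (for example MRI on Linux or macOS). Each "worker" is a separate forked process with its own PID, while every thread lives inside the one parent process.

```ruby
# Run the block in `count` forked child processes ("workers") and
# return each child's result, sent back to the parent over a pipe.
def run_in_workers(count)
  children = Array.new(count) do
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      writer.write(Marshal.dump(yield))
      writer.close
      exit!(0) # leave the child without running the parent's at_exit hooks
    end
    writer.close
    [pid, reader]
  end

  children.map do |pid, reader|
    result = Marshal.load(reader.read)
    reader.close
    Process.wait(pid)
    result
  end
end

# Run the block in `count` threads inside the current process.
def run_in_threads(count)
  Array.new(count) { Thread.new { yield } }.map(&:value)
end

if __FILE__ == $PROGRAM_NAME
  puts "parent pid:  #{Process.pid}"
  puts "worker pids: #{run_in_workers(3) { Process.pid }.inspect}"
  puts "thread pids: #{run_in_threads(3) { Process.pid }.inspect}"
end
```

Run directly, this should print three different worker PIDs and the parent's own PID three times for the threads, which is why workers can use several cores under MRI while threads share one interpreter lock.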

So the key to make Puma behave like Unicorn is to specify multiple workers, but just 1 thread per worker. Your config will look like so:

# config/puma.rb
workers 5
threads 1,1

That is, 5 workers (for my app, 5 is the golden number to work within Heroku’s 512MB memory limit) and 1 thread per worker. As long as you leave the threads at 1,1, your app is using Puma as a drop-in replacement for Unicorn, ready to be scaled with threads when your underlying Ruby implementation can support them.

Actually, what I do on Heroku is use ENV variables (as they suggest for Unicorn):

# config/puma.rb
workers Integer(ENV["PUMA_WORKERS"] || 5)
threads Integer(ENV["PUMA_THREADS_MIN"] || 1), Integer(ENV["PUMA_THREADS"] || 1)
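If you want those variables checked before Puma sees them, something like the following sketch could sit alongside the config. The `puma_settings` helper is hypothetical (it is not part of Puma); it reads the same three variable names used above, applies the same defaults, and refuses combinations that make no sense, such as a minimum thread count above the maximum.

```ruby
# Hypothetical helper, not part of Puma: read the sizing variables used in
# config/puma.rb from an ENV-like object, apply defaults, and validate them.
def puma_settings(env = ENV)
  workers     = Integer(env.fetch("PUMA_WORKERS", 5))
  threads_min = Integer(env.fetch("PUMA_THREADS_MIN", 1))
  threads_max = Integer(env.fetch("PUMA_THREADS", 1))

  raise ArgumentError, "PUMA_WORKERS must be at least 1" if workers < 1
  raise ArgumentError, "PUMA_THREADS must be at least 1" if threads_max < 1
  if threads_min.negative? || threads_min > threads_max
    raise ArgumentError, "PUMA_THREADS_MIN must be between 0 and PUMA_THREADS"
  end

  { workers: workers, threads: [threads_min, threads_max] }
end

if __FILE__ == $PROGRAM_NAME
  # With nothing set, this falls back to the Unicorn-style defaults.
  p puma_settings
end
```

Inside config/puma.rb you would still call `workers` and `threads` yourself, passing in the values this returns.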

That allows you to tune the performance with a simple heroku config:set command rather than needing to push & redeploy.

To complete the setup, modify your Procfile and Gemfile:

# Procfile
web: bundle exec puma -p $PORT -e $RACK_ENV -C config/puma.rb
# Gemfile
gem "puma", "~> 2.0.1"

If you want to try it with Rubinius, read my next post.

What about allowing just 2 threads per worker with Puma on MRI, you ask? I ran a bunch of simple ‘ab’ profiles. It seems that with any more than 1 thread, the threads just fall over each other and block horribly (not particularly surprising, given the GIL). It’s better to leave those requests in the Heroku request queue than jam them through a single worker. No doubt this will be a different story with a true-concurrency Ruby engine, but until then my conclusion is that extra threads have a distinctly negative effect.
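The ‘ab’ runs above were against a live app, which is not reproduced here. As a rough stand-in, this sketch times the same CPU-bound job run back to back and then run in parallel threads within one process. The function names and the iteration count are arbitrary choices for illustration. On MRI you would expect the threaded time to be about the same as the serial time rather than half of it, though the exact numbers will depend on your machine and Ruby version.

```ruby
require "benchmark"

# A purely CPU-bound job: no I/O, so the GIL is never released for long.
def cpu_work(iterations)
  sum = 0
  iterations.times { |i| sum += i * i }
  sum
end

# Run the block and return [result, elapsed wall-clock seconds].
def timed
  result = nil
  seconds = Benchmark.realtime { result = yield }
  [result, seconds]
end

# Run `jobs` copies of the work serially, then in `jobs` threads.
def compare(jobs, iterations)
  serial, serial_s = timed { Array.new(jobs) { cpu_work(iterations) } }
  threaded, threaded_s = timed do
    Array.new(jobs) { Thread.new { cpu_work(iterations) } }.map(&:value)
  end
  { serial: serial, serial_seconds: serial_s,
    threaded: threaded, threaded_seconds: threaded_s }
end

if __FILE__ == $PROGRAM_NAME
  report = compare(2, 2_000_000)
  puts format("serial:   %.3fs", report[:serial_seconds])
  puts format("threaded: %.3fs", report[:threaded_seconds])
end
```

This only shows the CPU-bound case. Requests that spend most of their time waiting on a database or an external API release the lock while they wait, so a real app may behave differently from this toy loop.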

I also tried doubling the workers on a 2x sized Heroku dyno. I didn’t get any performance benefit from this (performance actually went down), so my conclusion is that two 1x dynos with 5 workers each are better than one 2x dyno with 10. As always, your mileage will certainly vary!

Why bother with Puma if you’re going to treat it just like Unicorn? Aside from a small speed increase, and being thread-ready, Puma also handles incoming requests better. In Unicorn, the worker is tied up from the moment the client hits the server until the request is finished, so if the client request is slow, or the HTTP body is large, that worker is wasted. You can read more about that problem on Heroku.

In conclusion: you can treat Puma like Unicorn by setting the threads to 1, and using workers as you do in Unicorn. The speed increase may not be massive, since you’re still process-bound, but it’s a more modern server and ready to go multi-threaded when Ruby is. Play around with your worker count; the perfect number is 1 below whatever number causes memory warnings on Heroku.
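For a starting point before you experiment, the arithmetic can be sketched as below. The `max_workers` helper is hypothetical, and the roughly 100MB per worker in the example is an assumed figure picked so that the result matches the 5 workers that fit in 512MB for my app. Measure your own app's memory per process, since it will differ.

```ruby
# Hypothetical helper: estimate how many workers fit under a memory limit.
#   limit_mb      - the dyno's memory limit
#   per_worker_mb - measured memory of one worker process (you must measure this)
#   master_mb     - optional allowance for the master process
def max_workers(limit_mb:, per_worker_mb:, master_mb: 0)
  raise ArgumentError, "per_worker_mb must be positive" unless per_worker_mb.positive?

  fitting = ((limit_mb - master_mb) / per_worker_mb).floor
  [fitting, 1].max # always run at least one worker
end

if __FILE__ == $PROGRAM_NAME
  # Assumed figures for illustration only.
  puts max_workers(limit_mb: 512, per_worker_mb: 100)
end
```

Treat the result as a first guess for PUMA_WORKERS, then watch for memory warnings and adjust.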

  1. “Puma is designed to be used on a Ruby implementation which provides true parallelism, such as Rubinius and JRuby.” – Puma

4 comments on “Puma on Heroku with MRI”

  2. I was sad to read this because I’ve been contemplating just the same thing – reducing memory usage drastically by using a true-threaded server instead of a process-based one.

    Even with Ruby 2.0 my shared memory is horrible (only about 20MB of 160MB); I wish I knew what was causing it.

    Were you using Ruby 2.0 with the poor threaded Puma performance? I was hoping even with the GIL it would still thread decently…

    Thanks,
    Kevin

  3. I tried using Ruby 1.9.3 and Rubinius 2.0.0rc1. I put my details up at: http://omegadelta.net/2013/06/16/trying-out-rubinus/ It’s really easy to deploy a Rubinius test to Heroku, just 1 config line change!

    I don’t know what went wrong. Maybe I need to try a different MySQL gem or something. But it was really slow. Everything works fine on my dev machine. If I get some time I’ll try investigating what’s going on with Heroku. For now, I figure 1 RC library (Rails 4) is enough change to deal with.
