After upgrading to Rails 4, I started to see this error in my logs:
ActiveRecord::StatementInvalid: Mysql2::Error: Lost connection to MySQL server during query
It seemed random in some places (ordinary queries, though generally larger than average), and less random in others (a high failure rate in long-running queries, e.g. in workers).
It turns out I had inadvertently set reaping_frequency to 10s during my upgrade to Rails 4. It wasn’t Rails 4’s fault, but part of the config changes I made around this upgrade.
At one point, Rails made 10 seconds the default (replacing the previous “no reaping” default). This caused problems and was reverted. Sadly, that was not before the practice spread over the internet, including into Heroku’s official recommendation for threaded servers, which I implemented and which, at the time of writing, is still Heroku’s recommendation.
ActiveRecord’s connection reaper is designed to remove “dead” connections, and it seems this was promoted as a good thing to do at the time, especially in multi-threaded environments. However, it appears that it does not function correctly and can kill live connections too. Rails issue #9907 indicates that it can kill longer-running queries, and the Rails commit message from when it was disabled by default (again) indicates it could cause segfaults in multi-threaded environments. Sounds bad all-round!
Searching my codebase for reaping_frequency and removing all traces of this config (especially the || 10 on the config line recommended by Heroku, which was enabling the reaper with a frequency of 10s by default) fixed the issue.
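As a rough illustration of that fix, here is a minimal sketch of building connection settings without any reaping_frequency key. The helper name connection_settings and the DB_POOL / MAX_THREADS variable names are assumptions made for this example, not something taken from Rails or from Heroku's docs, so adapt them to whatever your own initializer uses.

```ruby
# Hypothetical helper for illustration only; not a Rails or Heroku API.
# It drops any reaping_frequency entry from the base settings, so the
# reaper is never explicitly switched on by our own configuration.
def connection_settings(base, env = ENV)
  # Remove the key whether it was given as a string or a symbol.
  settings = base.reject { |key, _value| key.to_s == "reaping_frequency" }

  # Pool size still comes from the environment, with a fallback.
  # Note there is deliberately no "|| 10" style default for the reaper.
  pool_size = env["DB_POOL"] || env["MAX_THREADS"] || 5
  settings.merge("pool" => Integer(pool_size))
end

settings = connection_settings(
  { "adapter" => "mysql2", "pool" => 5, "reaping_frequency" => 10 },
  { "MAX_THREADS" => "16" }
)
puts settings.inspect
```

In a real app, the resulting hash would be what you hand to ActiveRecord::Base.establish_connection in place of the version that set reaping_frequency.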
If you needed the reaper for its “good” side effects (as in, you have connections that are for some reason actually dying and need clearing up), then I suggest it might be better to solve that problem another way, one with fewer side effects. For starters, you could try to stop them dying in the first place.