Changes to how Rails 3.2.13 and 4.0 encode Unicode in JSON

Heads up: how Rails handles Unicode in JSON has changed in Rails 3.2.13 and 4.0.

Previously, Unicode characters would be encoded with \u1234 notation. This encoding was actually a bit buggy: anything that needs more than two bytes in UTF-16 (i.e. characters outside the Basic Multilingual Plane) would not render correctly [1]. With 3.2.13 and 4.0, Rails now just passes through UTF-8. That’s legal under the JSON spec [2].
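To see what a correct \u escape looks like for a character outside the BMP, here is a minimal sketch in plain Ruby (no Rails needed). It asks Ruby for the UTF-16 code units of a character and formats each one the way JSON escapes are written. The helper name json_escape_units is made up for illustration; it is not part of Rails or the json library.

```ruby
# Sketch: derive the JSON \u escapes for a single character by
# converting it to UTF-16 and formatting each 16-bit code unit.
# `json_escape_units` is a hypothetical helper, not a Rails or JSON API.
def json_escape_units(char)
  char.encode("UTF-16BE").unpack("n*").map { |unit| format("\\u%04x", unit) }.join
end

puts json_escape_units("\u{1F49B}")  # prints \ud83d\udc9b (surrogate pair)
puts json_escape_units("\u00e9")     # prints \u00e9 (BMP, one escape)
```

A character outside the BMP needs two escapes (a surrogate pair), which is the case the old encoder got wrong.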

Does this change matter? Probably not, unless you interface with a buggy JSON parser that doesn’t support UTF-8. Also, if you’re using .length to get the size in bytes of a JSON message, you will now need to use .bytesize instead (previously the two were equivalent, because the output was pure ASCII).
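The difference between .length and .bytesize is easy to check with the json library from Ruby's standard library. This is only a sketch outside Rails: plain JSON.generate stands in for the new pass-through behaviour, and the :ascii_only option stands in for the old all-ASCII output.

```ruby
require 'json'

object = { "unicode" => "I \u{1F49B} you" }

utf8_json  = JSON.generate(object)                       # UTF-8 passed through
ascii_json = JSON.generate(object, :ascii_only => true)  # \u escapes only

puts utf8_json.length     # 21 characters
puts utf8_json.bytesize   # 24 bytes (the heart takes 4 bytes in UTF-8)
puts ascii_json.length    # 32
puts ascii_json.bytesize  # 32, identical because every character is ASCII
```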

If you need to get the old ASCII encoding back, good news, you can! Replace your use of object.to_json with JSON.generate(object, :ascii_only => true). For example:

require 'json'  # already loaded in a Rails console

object = {:unicode => "I \u{1F49B} you"}
JSON.generate(object, :ascii_only => true)
 => "{\"unicode\":\"I \\ud83d\\udc9b you\"}"

Not only does this escape the Unicode characters, it does so correctly even for characters outside the BMP, which the old Rails 3.2.12 to_json did not. Awesome!
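One way to convince yourself that the escaped output is correct is a round trip: parse both forms and compare them with the original. A small sketch using only the standard json library (the variable names are illustrative):

```ruby
require 'json'

original = { "unicode" => "I \u{1F49B} you" }

ascii_json = JSON.generate(original, :ascii_only => true)
utf8_json  = JSON.generate(original)

puts JSON.parse(ascii_json) == original  # true
puts JSON.parse(utf8_json) == original   # true
puts ascii_json.ascii_only?              # true
```

Both encodings describe the same data, so a spec-compliant parser on the other end should not care which one it receives.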

This is useful for the odd time you have a service that can’t handle UTF-8 encoded JSON. But what if you don’t have control over the code that calls to_json?

If you really need to reinstate the old to_json method, you can do exactly that with a monkey patch. Create a file named config/initializers/json_escape.rb (make sure all your initialisers will auto-load) and give it the contents of this gist.

This monkey-patches ActiveSupport to bring the old to_json back. You may notice some extra commented-out code. That code was the patch provided on this bug report. It was never used in Rails, but if you prefer, you could switch to it to support characters outside the Basic Multilingual Plane (BMP). I tried it, and while it did make the non-BMP characters work, it also escaped the BMP ones further than necessary, bloating the size of the JSON (which was important for my APNS use-case). It would also be possible to monkey-patch to_json to use JSON.generate with :ascii_only, but since I solved my problem a neater way I never wrote that code.
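For what it's worth, the JSON.generate route could look roughly like the sketch below. It is untested inside Rails and deliberately more cautious than a real monkey patch: rather than replacing to_json, it adds a separate, opt-in method. The module name AsciiJson and the method name to_ascii_json are invented for this example.

```ruby
require 'json'

# Hypothetical opt-in helper: adds `to_ascii_json` next to the existing
# `to_json` instead of replacing it. Not part of Rails or the json library.
module AsciiJson
  def to_ascii_json
    JSON.generate(self, :ascii_only => true)
  end
end

# Only top-level containers get the helper in this sketch.
Hash.send(:include, AsciiJson)
Array.send(:include, AsciiJson)

puts({ :unicode => "I \u{1F49B} you" }.to_ascii_json)
# prints {"unicode":"I \ud83d\udc9b you"}
```

Keeping to_json untouched means code you don't control keeps its current behaviour. Whether JSON.generate serialises Rails-specific objects the same way ActiveSupport's to_json does is an open question that this sketch does not answer.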

The code is taken directly from Rails 3.2.12. If a later Rails version removes or changes ESCAPED_CHARS or escape_regex, you could redefine them in this initialiser from that same source (escape_regex is not a constant; just pick one of its two forms).

If you simply want the old functionality back, this monkey patch may be for you. If you want proper JSON encoding in both UTF-8 and ASCII, then I recommend leaving to_json alone and using JSON.generate as above. Incidentally, I’m not using this monkey patch. I did test it as a workaround, but in the end I was able to fix my issue and now use UTF-8 directly.

  1. Try "\u{1F49B}".to_json on an old install if you don’t believe me (it should give "\ud83d\udc9b").
  2. From section 3 ("Encoding") of the JSON spec (RFC 4627): "JSON text SHALL be encoded in Unicode. The default encoding is UTF-8."
