author Zhiming Wang <zmwangx@gmail.com> 2015-01-01 23:45:48 -0800
committer Zhiming Wang <zmwangx@gmail.com> 2015-01-01 23:45:48 -0800
commit eea089c884a1f15bdd7600d4d72573189ce6ff9f
tree ee600c4245cc88925e8dd9d6ed804c08120b2bd8
parent 8c7797ee2ed19a165d2b4b41db31205309495f0d
20150101 OS X system ruby encoding annoyance
-rw-r--r-- source/_posts/2015-01-01-os-x-system-ruby-encoding-annoyance.md | 43
1 files changed, 43 insertions, 0 deletions
diff --git a/source/_posts/2015-01-01-os-x-system-ruby-encoding-annoyance.md b/source/_posts/2015-01-01-os-x-system-ruby-encoding-annoyance.md
new file mode 100644
index 00000000..2bef694d
--- /dev/null
+++ b/source/_posts/2015-01-01-os-x-system-ruby-encoding-annoyance.md
@@ -0,0 +1,43 @@
+---
+layout: post
+title: "OS X system ruby encoding annoyance"
+date: 2015-01-01 22:49:39 -0800
+comments: true
+categories:
+---
+I've been using RVM (with fairly up-to-date Rubies) and pry since my day one with Ruby (well, almost), so it actually surprised me today when I found out by chance how poorly the system Ruby behaves when it comes to encoding.
+
+The major annoyance with the current system Ruby (2.0.0p481) is that it can't convert `UTF8-MAC` to `UTF-8` (namely, NFD to NFC, as far as I can tell), at least not with Korean characters. Consider the following script:
+
+```ruby utf8-mac.rb
+# coding: utf-8
+require 'hex_string'
+str = "에이핑크"
+puts str.to_hex_string
+puts str.encode("UTF-8", "UTF8-MAC").to_hex_string
+```
+
+Here is what I get with the system Ruby and the latest brewed Ruby:
+
+```bash
+> /usr/bin/ruby --version
+ruby 2.0.0p481 (2014-05-08 revision 45883) [universal.x86_64-darwin14]
+> /usr/local/bin/ruby --version
+ruby 2.2.0p0 (2014-12-25 revision 49005) [x86_64-darwin14]
+> /usr/bin/ruby utf8-mac.rb
+e1 84 8b e1 85 a6 e1 84 8b e1 85 b5 e1 84 91 e1 85 b5 e1 86 bc e1 84 8f e1 85 b3
+e1 84 8b e1 85 a6 e1 84 8b e1 85 b5 e1 84 91 e1 85 b5 e1 86 bc e1 84 8f e1 85 b3
+> /usr/local/bin/ruby utf8-mac.rb
+e1 84 8b e1 85 a6 e1 84 8b e1 85 b5 e1 84 91 e1 85 b5 e1 86 bc e1 84 8f e1 85 b3
+ec 97 90 ec 9d b4 ed 95 91 ed 81 ac
+```
+
+As you can see, in the case of the system Ruby, NFD is left untouched. This leads to problems with, for instance, Google Translate. One obvious solution is to outsource the task to `iconv`, but I have the impression that outsourcing language features to shell commands is a generally despised practice.
+
+There's one more surprise. While `pry` with recent Rubies tends to handle Unicode very well (unlike `irb`), I tried `pry` with the current system Ruby, and it doesn't work; due to this annoying limitation, I couldn't even test the above problem interactively, and had to resort to a script. Maybe the problem can be resolved by compiling Ruby with `readline` or whatever; I didn't bother. The bottom line is, the system Ruby is not very pleasant for men in the 21st century; good Unicode support ought to be a must. (By the way, NFD in HFS+ is maddening. It breaks Terminal, iTerm, Google Translate, scp with Linux hosts, and the list goes on.)
+
+P.S. In Dropzone 3 custom actions you can select a custom Ruby with the `RubyPath` meta field, e.g.,
+
+```ruby
+# RubyPath: /usr/local/bin/ruby
+```
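
A note on the NFD-to-NFC conversion discussed in the post: for Hangul specifically, composition is pure arithmetic on code points, so it can be done without `String#encode` or `iconv`. The following is only a minimal sketch under that assumption, not a general NFC implementation; the method name `compose_hangul` and the constant names are hypothetical, and anything that is not a conjoining jamo sequence (accented Latin letters, for instance) is passed through untouched. It uses only `unpack`/`pack`, so it should not depend on the transcoding tables that differ between the two Rubies above.

```ruby
# coding: utf-8
# Sketch: compose conjoining Hangul jamo (as found in NFD text) into
# precomposed syllables. Hangul only; other code points pass through.

L_BASE  = 0x1100 # first leading consonant (choseong)
V_BASE  = 0x1161 # first vowel (jungseong)
T_BASE  = 0x11A7 # one below the first trailing consonant (jongseong)
S_BASE  = 0xAC00 # first precomposed syllable
L_COUNT = 19
V_COUNT = 21
T_COUNT = 28

# Hypothetical helper: returns a UTF-8 string with L+V(+T) jamo runs composed.
def compose_hangul(str)
  cps = str.unpack("U*") # UTF-8 string -> array of code points
  out = []
  i = 0
  while i < cps.length
    l = cps[i] - L_BASE
    v = i + 1 < cps.length ? cps[i + 1] - V_BASE : -1
    if l.between?(0, L_COUNT - 1) && v.between?(0, V_COUNT - 1)
      # Leading consonant followed by a vowel: build the LV syllable.
      syllable = S_BASE + (l * V_COUNT + v) * T_COUNT
      i += 2
      # An optional trailing consonant turns LV into LVT.
      t = i < cps.length ? cps[i] - T_BASE : -1
      if t.between?(1, T_COUNT - 1)
        syllable += t
        i += 1
      end
      out << syllable
    else
      out << cps[i]
      i += 1
    end
  end
  out.pack("U*") # array of code points -> UTF-8 string
end
```

Calling `compose_hangul(str)` on the decomposed string from `utf8-mac.rb` should yield the same bytes that the brewed Ruby prints on its second line. On Ruby 2.2 and later, `String#unicode_normalize(:nfc)` covers the general case and is the better choice.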