From 9a88e9ff0385f66e7c565a394908503dc6e916ad Mon Sep 17 00:00:00 2001 From: neodarz Date: Fri, 28 Apr 2017 00:30:19 +0200 Subject: Site updated at 2017-04-28T00:29:42+02:00 source branch was at: f1965c50670f611ef54f9471490d45a554f7d866 Correct a link --- ...-01-01-os-x-system-ruby-encoding-annoyance.html | 62 ++++++++++++++++++++++ 1 file changed, 62 insertions(+) create mode 100644 build/blog/2015-01-01-os-x-system-ruby-encoding-annoyance.html (limited to 'build/blog/2015-01-01-os-x-system-ruby-encoding-annoyance.html') diff --git a/build/blog/2015-01-01-os-x-system-ruby-encoding-annoyance.html b/build/blog/2015-01-01-os-x-system-ruby-encoding-annoyance.html new file mode 100644 index 00000000..89f18c5d --- /dev/null +++ b/build/blog/2015-01-01-os-x-system-ruby-encoding-annoyance.html @@ -0,0 +1,62 @@ + + + + + + + +OS X system ruby encoding annoyance + + + + + + + + + +
This blog has been archived.
Visit my home page at zhimingwang.org.
+ +
+
+

OS X system ruby encoding annoyance

+ +
+

I've been using RVM (with fairly up-to-date Rubies) and pry since my day one with Ruby (well, almost), so it actually surprises me today when I found out by chance how poorly the system Ruby behaves when it comes to encoding.

+

The major annoyance with the current system Ruby (2.0.0p481) is that it can't convert UTF8-MAC to UTF-8 (namely, NFD to NFC, as far as I can tell), at least not with Korean characters. Consider the following script:

+
# coding: utf-8
+require 'hex_string'
+str = "에이핑크"
+puts str.to_hex_string
+puts str.encode("UTF-8", "UTF8-MAC").to_hex_string
+

Here are what I get with the system Ruby and the latested brewed Ruby:

+
> /usr/bin/ruby --version
+ruby 2.0.0p481 (2014-05-08 revision 45883) [universal.x86_64-darwin14]
+> /usr/local/bin/ruby --version
+ruby 2.2.0p0 (2014-12-25 revision 49005) [x86_64-darwin14]
+> /usr/bin/ruby utf8-mac.rb
+e1 84 8b e1 85 a6 e1 84 8b e1 85 b5 e1 84 91 e1 85 b5 e1 86 bc e1 84 8f e1 85 b3
+e1 84 8b e1 85 a6 e1 84 8b e1 85 b5 e1 84 91 e1 85 b5 e1 86 bc e1 84 8f e1 85 b3
+> /usr/local/bin/ruby utf8-mac.rb
+e1 84 8b e1 85 a6 e1 84 8b e1 85 b5 e1 84 91 e1 85 b5 e1 86 bc e1 84 8f e1 85 b3
+ec 97 90 ec 9d b4 ed 95 91 ed 81 ac
+

As you can see, in the case of the system Ruby, NFD is left untouched. This leads to problems with, for instance, Google Translate. One obvious solution is to outsource the task to iconv, but I have the impression that outsourcing language features to shell commands is a generally despised practice.

+

There's one more surprise. While pry with latest Rubies tend to handle Unicode very well (unlike irb), I tried pry with the current system Ruby, and it doesn't work; due to this annoying limitation, I couldn't even test the above problem interactively, and had to resort to a script. Maybe the problem can be resolved by compiling Ruby with readline or whatever; I didn't bother. The bottom line is, the system Ruby is not very pleasant for men in the 21st century — good Unicode support ought to be a must. (By the way, NFD in HFS+ is maddening. It breaks Terminal, iTerm, Google Translate, scp with Linux hosts, and the list goes on.)

+

P.S. In Dropzone 3 custom actions you can select a custom Ruby with the RubyPath meta field, e.g.,

+
# RubyPath: /usr/local/bin/ruby
+
+
+ + + -- cgit v1.2.1