<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"/>
<meta content="pandoc" name="generator"/>
<meta content="Zhiming Wang" name="author"/>
<meta content="2015-01-01T22:49:39-0800" name="date"/>
<title>OS X system ruby encoding annoyance</title>
<link href="/img/apple-touch-icon-152.png" rel="apple-touch-icon-precomposed"/>
<meta content="#FFFFFF" name="msapplication-TileColor"/>
<meta content="/img/favicon-144.png" name="msapplication-TileImage"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<link href="/css/normalize.min.css" media="all" rel="stylesheet" type="text/css"/>
<link href="/css/theme.css" media="all" rel="stylesheet" type="text/css"/>
<link href="/css/highlight.css" media="all" rel="stylesheet" type="text/css"/>
</head>
<body>
<div id="archival-notice">This blog has been archived.<br/>Visit my home page at <a href="https://zhimingwang.org">zhimingwang.org</a>.</div>
<nav class="nav">
<a class="nav-icon" href="/" title="Home"><!--blog icon--></a>
<a class="nav-title" href="/"><!--blog title--></a>
<a class="nav-author" href="https://github.com/zmwangx" target="_blank"><!--blog author--></a>
</nav>
<article class="content">
<header class="article-header">
<h1 class="article-title">OS X system ruby encoding annoyance</h1>
<div class="article-metadata">
<time class="article-timestamp" datetime="2015-01-01T22:49:39-0800">January 1, 2015</time>
</div>
</header>
<p>I've been using RVM (with fairly up-to-date Rubies) and pry since day one with Ruby (well, almost), so I was genuinely surprised today when I found out by chance how poorly the system Ruby behaves when it comes to encoding.</p>
<p>The major annoyance with the current system Ruby (2.0.0p481) is that it can't convert <code>UTF8-MAC</code> to <code>UTF-8</code> (namely, NFD to NFC, as far as I can tell), at least not with Korean characters. Consider the following script:</p>
<div class="sourceCode"><pre class="sourceCode ruby"><code class="sourceCode ruby"><span class="co"># coding: utf-8</span>
require <span class="st">'hex_string'</span>
str = <span class="st">"에이핑크"</span>
puts str.to_hex_string
puts str.encode(<span class="st">"UTF-8"</span>, <span class="st">"UTF8-MAC"</span>).to_hex_string</code></pre></div>
<p>Here is what I get with the system Ruby and the latest brewed Ruby:</p>
<div class="sourceCode"><pre class="sourceCode bash"><code class="sourceCode bash"><span class="op">></span> <span class="ex">/usr/bin/ruby</span> --version
<span class="ex">ruby</span> 2.0.0p481 (2014-05-08 revision 45883) [<span class="ex">universal.x86_64-darwin14</span>]
<span class="op">></span> <span class="ex">/usr/local/bin/ruby</span> --version
<span class="ex">ruby</span> 2.2.0p0 (2014-12-25 revision 49005) [<span class="ex">x86_64-darwin14</span>]
<span class="op">></span> <span class="ex">/usr/bin/ruby</span> utf8-mac.rb
<span class="ex">e1</span> 84 8b e1 85 a6 e1 84 8b e1 85 b5 e1 84 91 e1 85 b5 e1 86 bc e1 84 8f e1 85 b3
<span class="ex">e1</span> 84 8b e1 85 a6 e1 84 8b e1 85 b5 e1 84 91 e1 85 b5 e1 86 bc e1 84 8f e1 85 b3
<span class="op">></span> <span class="ex">/usr/local/bin/ruby</span> utf8-mac.rb
<span class="ex">e1</span> 84 8b e1 85 a6 e1 84 8b e1 85 b5 e1 84 91 e1 85 b5 e1 86 bc e1 84 8f e1 85 b3
<span class="ex">ec</span> 97 90 ec 9d b4 ed 95 91 ed 81 ac</code></pre></div>
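<p>To make the two byte dumps above easier to read, here is a minimal sketch that prints the bytes of just the first syllable in both forms. It uses only core Ruby; <code>hex_bytes</code> is a hypothetical helper standing in for the <code>hex_string</code> gem, and the string is written with explicit escapes so that the normalization form doesn't depend on the editor or the file system.</p>

```ruby
# Hypothetical helper (not from the script above): dump a string's bytes
# as space-separated hex, using only core Ruby.
def hex_bytes(str)
  str.bytes.map { |b| format("%02x", b) }.join(" ")
end

nfd = "\u110B\u1166"  # decomposed: two conjoining jamo (NFD)
nfc = "\uC5D0"        # precomposed: one Hangul syllable (NFC)

puts hex_bytes(nfd)   # six bytes, two code points
puts hex_bytes(nfc)   # three bytes, one code point

# The conversion under test; whether this line matches the NFD or the
# NFC dump depends on the Ruby build, as the output above shows.
puts hex_bytes(nfd.encode("UTF-8", "UTF8-MAC"))
```

<p>The first two lines of output correspond to the leading bytes of the long and short dumps above, respectively.</p>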
<p>As you can see, in the case of the system Ruby, NFD is left untouched. This leads to problems with, for instance, Google Translate. One obvious solution is to outsource the task to <code>iconv</code>, but I have the impression that outsourcing language features to shell commands is a generally despised practice.</p>
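<p>For what it's worth, the <code>iconv</code> route could look roughly like the sketch below. Both method names are hypothetical, and it assumes an <code>iconv</code> binary that understands <code>UTF-8-MAC</code> (the one bundled with OS X does). On Rubies that provide <code>String#unicode_normalize</code> (2.2 and later), no shelling out is needed at all, so the wrapper prefers that.</p>

```ruby
require 'open3'

# Hypothetical helper: pipe a string through the iconv command-line tool.
# Assumes an iconv that knows the UTF-8-MAC encoding.
def nfc_via_iconv(str)
  out, status = Open3.capture2("iconv", "-f", "UTF-8-MAC", "-t", "UTF-8",
                               stdin_data: str)
  raise "iconv failed" unless status.success?
  out.force_encoding("UTF-8")
end

# Hypothetical wrapper: use String#unicode_normalize where it exists and
# shell out only on older Rubies such as the system 2.0.0.
def to_nfc(str)
  if str.respond_to?(:unicode_normalize)
    str.unicode_normalize(:nfc)
  else
    nfc_via_iconv(str)
  end
end
```

<p>With this in place, <code>to_nfc(str)</code> can replace the <code>str.encode("UTF-8", "UTF8-MAC")</code> call in the script above.</p>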
<p>There's one more surprise. While <code>pry</code> with the latest Rubies tends to handle Unicode very well (unlike <code>irb</code>), I tried <code>pry</code> with the current system Ruby, and Unicode handling doesn't work there; because of this annoying limitation, I couldn't even test the above problem interactively, and had to resort to a script. Maybe the problem can be resolved by compiling Ruby with <code>readline</code> or whatever; I didn't bother. The bottom line is, the system Ruby is not very pleasant for anyone living in the 21st century; good Unicode support ought to be a must. (By the way, NFD in HFS+ is maddening. It breaks Terminal, iTerm, Google Translate, scp with Linux hosts, and the list goes on.)</p>
<p>P.S. In Dropzone 3 custom actions you can select a custom Ruby with the <code>RubyPath</code> meta field, e.g.,</p>
<div class="sourceCode"><pre class="sourceCode ruby"><code class="sourceCode ruby"><span class="co"># RubyPath: /usr/local/bin/ruby</span></code></pre></div>
</article>
<hr class="content-separator"/>
<footer class="footer">
<span class="rfooter">
<a class="rss-icon" href="/rss.xml" target="_blank" title="RSS feed"><!--RSS feed icon--></a><a class="atom-icon" href="/atom.xml" target="_blank" title="Atom feed"><!--Atom feed icon--></a><a class="cc-icon" href="https://creativecommons.org/licenses/by/4.0/" target="_blank" title="Released under the Creative Commons Attribution 4.0 International license."><!--CC icon--></a>
<a href="https://github.com/zmwangx" target="_blank">Zhiming Wang</a>
</span>
</footer>
</body>
</html>