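Hello, this is Oda from NaCl's Tokyo branch. I'm a dyed-in-the-wool Rails programmer who was surprised to learn that Ada has type parameters (^^). Right now I'm reading 「ソフトウェア考現学」 by Masami Hagiya (萩谷昌己), which my boss happened to recommend. It is an old book, written in the year I was born, but much of what it says is time-tested, and I'm enjoying it a great deal. People often say that the particular pleasure of reading a book from your birth year resembles that of drinking a wine from your birth year, and I now know what they mean. What follows is part of the book's preface. With its overflowing Buddhist sensibility, you will surely hear the bell of impermanence tolling. If that piques your interest, do pick it up.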
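A character property denotes a set of characters; in this article I will simply call it a property. You use a property inside a regular expression literal to match any character that belongs to the set, writing the property's name in the form \p{property name}. For example, a regular expression that matches hiragana can be written with the Hiragana property as /\p{Hiragana}/.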
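As a minimal sketch of the notation described above, the following assumes a UTF-8 source file. The constant and method names (ALL_HIRAGANA, hiragana_only?) are hypothetical and exist only for illustration.

```ruby
# Sketch: matching hiragana with the Hiragana property (UTF-8 source assumed).

# Matches a string that consists only of hiragana characters.
ALL_HIRAGANA = /\A\p{Hiragana}+\z/

# Hypothetical helper: true when every character of text is hiragana.
def hiragana_only?(text)
  ALL_HIRAGANA.match?(text)
end

puts hiragana_only?("ひらがな")   # hiragana only
puts hiragana_only?("カタカナ")   # katakana, so the property does not match

# \p{...} can be combined with quantifiers like any other character class.
p "Rubyはたのしい".scan(/\p{Hiragana}+/)

# \P{...} (capital P) is the negated form: any character outside the set.
puts "abc".match?(/\P{Hiragana}/)
```

The negated form \P{Hiragana} is handy when you want to reject input that contains anything other than hiragana.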
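Besides Hiragana, many other properties can be used inside a regular expression literal. Note, however, that the set of available properties differs from one encoding scheme to another. With UTF-8 you can use every property that Ruby supports, but encoding schemes such as EUC-JP and Shift_JIS, whose character set is ASCII plus JIS X 0208, support only a limited subset of them.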
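One advantage of this feature is that some properties are available in all three of UTF-8, Shift_JIS, and EUC-JP. In particular, because each of these encoding schemes supports Hiragana and Katakana, you can write regular expressions without thinking about byte sequences. That makes me very happy.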
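To illustrate the point about encoding schemes, here is a small sketch that applies the same property name to a string in each of the three encodings. The helper hiragana_in? is a hypothetical name of my own, and passing Regexp::FIXEDENCODING is simply one way to make sure the pattern keeps the encoding of its source string.

```ruby
# Sketch: one property name, three encoding schemes.
# hiragana_in? is a hypothetical helper introduced for illustration.

def hiragana_in?(text, encoding_name)
  # Transcode the pattern source and the subject so their encodings agree,
  # and pin the regexp to that encoding.
  source  = '\A\p{Hiragana}+\z'.encode(encoding_name)
  pattern = Regexp.new(source, Regexp::FIXEDENCODING)
  pattern.match?(text.encode(encoding_name))
end

%w[UTF-8 Shift_JIS EUC-JP].each do |name|
  puts "#{name}: hiragana=#{hiragana_in?('ひらがな', name)} katakana=#{hiragana_in?('カタカナ', name)}"
end
```

The pattern text is identical in every case; only the encoding of the strings changes, and no byte ranges appear anywhere.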
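I suspect few programmers can list every character in each property from memory. Most probably read the Ruby source code each time they need to understand the semantics. That wastes everyone's valuable time, which is not a happy situation.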
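One way to avoid reading the source is to ask the regular expression engine itself. The following is only a sketch under my own assumptions, not a definitive implementation: it tests every Unicode code point against \p{name} in UTF-8 and groups the matches into contiguous ranges. The method names (property_ranges, format_codepoint, print_property) are hypothetical, and the exact ranges reported depend on the Unicode version bundled with your Ruby.

```ruby
#!/usr/bin/env ruby
# Sketch: list the code points that belong to a character property.
# All method names here are hypothetical.

# Return the contiguous code point ranges whose characters match \p{name}.
def property_ranges(name)
  pattern = Regexp.new("\\p{#{name}}".encode("UTF-8"), Regexp::FIXEDENCODING)
  ranges = []
  (0..0x10FFFF).each do |cp|
    next if (0xD800..0xDFFF).cover?(cp) # surrogates are not valid characters
    next unless pattern.match?([cp].pack("U"))
    if ranges.last && ranges.last.last == cp - 1
      ranges[-1] = (ranges.last.first..cp) # extend the current range
    else
      ranges << (cp..cp)                   # start a new range
    end
  end
  ranges
end

# Format a code point in the usual U+XXXX notation.
def format_codepoint(cp)
  format("U+%04X", cp)
end

# Print each range as a header followed by rows of four [code point, character] pairs.
def print_property(name, out = $stdout)
  property_ranges(name).each do |range|
    out.puts "From #{format_codepoint(range.first)} to #{format_codepoint(range.last)}"
    range.each_slice(4) do |row|
      cells = row.map { |cp| %([ "#{format_codepoint(cp)}" , "#{[cp].pack('U')}" ]) }
      out.puts "[#{cells.join(' , ')}]"
    end
  end
end

# Run only when a property name is given on the command line.
print_property(ARGV[0]) unless ARGV.empty?
```

An unknown property name makes Regexp.new raise a RegexpError, so a typo is reported immediately instead of producing an empty listing.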
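Usage is simple: save this program under the name "script_to_codepoints", give it execute permission, and run it with a property name as its argument. For what it's worth, I confirmed that it also works on Windows 10. I have not tried it on Mac OS X, but it will probably work there too.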
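Below is an example run with Katakana as the argument. Depending on your terminal, some characters may not be displayed because the required font is missing. In that case, redirect standard output to a file, move the file to a Windows machine, and check it in IE.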
$ ruby script_to_codepoints Katakana
From U+30A1 to U+30FA
[[ "U+30A1" , "ァ" ] , [ "U+30A2" , "ア" ] , [ "U+30A3" , "ィ" ] , [ "U+30A4" , "イ" ]]
[[ "U+30A5" , "ゥ" ] , [ "U+30A6" , "ウ" ] , [ "U+30A7" , "ェ" ] , [ "U+30A8" , "エ" ]]
[[ "U+30A9" , "ォ" ] , [ "U+30AA" , "オ" ] , [ "U+30AB" , "カ" ] , [ "U+30AC" , "ガ" ]]
[[ "U+30AD" , "キ" ] , [ "U+30AE" , "ギ" ] , [ "U+30AF" , "ク" ] , [ "U+30B0" , "グ" ]]
[[ "U+30B1" , "ケ" ] , [ "U+30B2" , "ゲ" ] , [ "U+30B3" , "コ" ] , [ "U+30B4" , "ゴ" ]]
[[ "U+30B5" , "サ" ] , [ "U+30B6" , "ザ" ] , [ "U+30B7" , "シ" ] , [ "U+30B8" , "ジ" ]]
[[ "U+30B9" , "ス" ] , [ "U+30BA" , "ズ" ] , [ "U+30BB" , "セ" ] , [ "U+30BC" , "ゼ" ]]
[[ "U+30BD" , "ソ" ] , [ "U+30BE" , "ゾ" ] , [ "U+30BF" , "タ" ] , [ "U+30C0" , "ダ" ]]
[[ "U+30C1" , "チ" ] , [ "U+30C2" , "ヂ" ] , [ "U+30C3" , "ッ" ] , [ "U+30C4" , "ツ" ]]
[[ "U+30C5" , "ヅ" ] , [ "U+30C6" , "テ" ] , [ "U+30C7" , "デ" ] , [ "U+30C8" , "ト" ]]
[[ "U+30C9" , "ド" ] , [ "U+30CA" , "ナ" ] , [ "U+30CB" , "ニ" ] , [ "U+30CC" , "ヌ" ]]
[[ "U+30CD" , "ネ" ] , [ "U+30CE" , "ノ" ] , [ "U+30CF" , "ハ" ] , [ "U+30D0" , "バ" ]]
[[ "U+30D1" , "パ" ] , [ "U+30D2" , "ヒ" ] , [ "U+30D3" , "ビ" ] , [ "U+30D4" , "ピ" ]]
[[ "U+30D5" , "フ" ] , [ "U+30D6" , "ブ" ] , [ "U+30D7" , "プ" ] , [ "U+30D8" , "ヘ" ]]
[[ "U+30D9" , "ベ" ] , [ "U+30DA" , "ペ" ] , [ "U+30DB" , "ホ" ] , [ "U+30DC" , "ボ" ]]
[[ "U+30DD" , "ポ" ] , [ "U+30DE" , "マ" ] , [ "U+30DF" , "ミ" ] , [ "U+30E0" , "ム" ]]
[[ "U+30E1" , "メ" ] , [ "U+30E2" , "モ" ] , [ "U+30E3" , "ャ" ] , [ "U+30E4" , "ヤ" ]]
[[ "U+30E5" , "ュ" ] , [ "U+30E6" , "ユ" ] , [ "U+30E7" , "ョ" ] , [ "U+30E8" , "ヨ" ]]
[[ "U+30E9" , "ラ" ] , [ "U+30EA" , "リ" ] , [ "U+30EB" , "ル" ] , [ "U+30EC" , "レ" ]]
[[ "U+30ED" , "ロ" ] , [ "U+30EE" , "ヮ" ] , [ "U+30EF" , "ワ" ] , [ "U+30F0" , "ヰ" ]]
[[ "U+30F1" , "ヱ" ] , [ "U+30F2" , "ヲ" ] , [ "U+30F3" , "ン" ] , [ "U+30F4" , "ヴ" ]]
[[ "U+30F5" , "ヵ" ] , [ "U+30F6" , "ヶ" ] , [ "U+30F7" , "ヷ" ] , [ "U+30F8" , "ヸ" ]]
[[ "U+30F9" , "ヹ" ] , [ "U+30FA" , "ヺ" ]]
From U+30FD to U+30FF
[[ "U+30FD" , "ヽ" ] , [ "U+30FE" , "ヾ" ] , [ "U+30FF" , "ヿ" ]]
From U+31F0 to U+31FF
[[ "U+31F0" , "ㇰ" ] , [ "U+31F1" , "ㇱ" ] , [ "U+31F2" , "ㇲ" ] , [ "U+31F3" , "ㇳ" ]]
[[ "U+31F4" , "ㇴ" ] , [ "U+31F5" , "ㇵ" ] , [ "U+31F6" , "ㇶ" ] , [ "U+31F7" , "ㇷ" ]]
[[ "U+31F8" , "ㇸ" ] , [ "U+31F9" , "ㇹ" ] , [ "U+31FA" , "ㇺ" ] , [ "U+31FB" , "ㇻ" ]]
[[ "U+31FC" , "ㇼ" ] , [ "U+31FD" , "ㇽ" ] , [ "U+31FE" , "ㇾ" ] , [ "U+31FF" , "ㇿ" ]]
From U+32D0 to U+32FE
[[ "U+32D0" , "㋐" ] , [ "U+32D1" , "㋑" ] , [ "U+32D2" , "㋒" ] , [ "U+32D3" , "㋓" ]]
[[ "U+32D4" , "㋔" ] , [ "U+32D5" , "㋕" ] , [ "U+32D6" , "㋖" ] , [ "U+32D7" , "㋗" ]]
[[ "U+32D8" , "㋘" ] , [ "U+32D9" , "㋙" ] , [ "U+32DA" , "㋚" ] , [ "U+32DB" , "㋛" ]]
[[ "U+32DC" , "㋜" ] , [ "U+32DD" , "㋝" ] , [ "U+32DE" , "㋞" ] , [ "U+32DF" , "㋟" ]]
[[ "U+32E0" , "㋠" ] , [ "U+32E1" , "㋡" ] , [ "U+32E2" , "㋢" ] , [ "U+32E3" , "㋣" ]]
[[ "U+32E4" , "㋤" ] , [ "U+32E5" , "㋥" ] , [ "U+32E6" , "㋦" ] , [ "U+32E7" , "㋧" ]]
[[ "U+32E8" , "㋨" ] , [ "U+32E9" , "㋩" ] , [ "U+32EA" , "㋪" ] , [ "U+32EB" , "㋫" ]]
[[ "U+32EC" , "㋬" ] , [ "U+32ED" , "㋭" ] , [ "U+32EE" , "㋮" ] , [ "U+32EF" , "㋯" ]]
[[ "U+32F0" , "㋰" ] , [ "U+32F1" , "㋱" ] , [ "U+32F2" , "㋲" ] , [ "U+32F3" , "㋳" ]]
[[ "U+32F4" , "㋴" ] , [ "U+32F5" , "㋵" ] , [ "U+32F6" , "㋶" ] , [ "U+32F7" , "㋷" ]]
[[ "U+32F8" , "㋸" ] , [ "U+32F9" , "㋹" ] , [ "U+32FA" , "㋺" ] , [ "U+32FB" , "㋻" ]]
[[ "U+32FC" , "㋼" ] , [ "U+32FD" , "㋽" ] , [ "U+32FE" , "㋾" ]]
From U+3300 to U+3357
[[ "U+3300" , "㌀" ] , [ "U+3301" , "㌁" ] , [ "U+3302" , "㌂" ] , [ "U+3303" , "㌃" ]]
[[ "U+3304" , "㌄" ] , [ "U+3305" , "㌅" ] , [ "U+3306" , "㌆" ] , [ "U+3307" , "㌇" ]]
[[ "U+3308" , "㌈" ] , [ "U+3309" , "㌉" ] , [ "U+330A" , "㌊" ] , [ "U+330B" , "㌋" ]]
[[ "U+330C" , "㌌" ] , [ "U+330D" , "㌍" ] , [ "U+330E" , "㌎" ] , [ "U+330F" , "㌏" ]]
[[ "U+3310" , "㌐" ] , [ "U+3311" , "㌑" ] , [ "U+3312" , "㌒" ] , [ "U+3313" , "㌓" ]]
[[ "U+3314" , "㌔" ] , [ "U+3315" , "㌕" ] , [ "U+3316" , "㌖" ] , [ "U+3317" , "㌗" ]]
[[ "U+3318" , "㌘" ] , [ "U+3319" , "㌙" ] , [ "U+331A" , "㌚" ] , [ "U+331B" , "㌛" ]]
[[ "U+331C" , "㌜" ] , [ "U+331D" , "㌝" ] , [ "U+331E" , "㌞" ] , [ "U+331F" , "㌟" ]]
[[ "U+3320" , "㌠" ] , [ "U+3321" , "㌡" ] , [ "U+3322" , "㌢" ] , [ "U+3323" , "㌣" ]]
[[ "U+3324" , "㌤" ] , [ "U+3325" , "㌥" ] , [ "U+3326" , "㌦" ] , [ "U+3327" , "㌧" ]]
[[ "U+3328" , "㌨" ] , [ "U+3329" , "㌩" ] , [ "U+332A" , "㌪" ] , [ "U+332B" , "㌫" ]]
[[ "U+332C" , "㌬" ] , [ "U+332D" , "㌭" ] , [ "U+332E" , "㌮" ] , [ "U+332F" , "㌯" ]]
[[ "U+3330" , "㌰" ] , [ "U+3331" , "㌱" ] , [ "U+3332" , "㌲" ] , [ "U+3333" , "㌳" ]]
[[ "U+3334" , "㌴" ] , [ "U+3335" , "㌵" ] , [ "U+3336" , "㌶" ] , [ "U+3337" , "㌷" ]]
[[ "U+3338" , "㌸" ] , [ "U+3339" , "㌹" ] , [ "U+333A" , "㌺" ] , [ "U+333B" , "㌻" ]]
[[ "U+333C" , "㌼" ] , [ "U+333D" , "㌽" ] , [ "U+333E" , "㌾" ] , [ "U+333F" , "㌿" ]]
[[ "U+3340" , "㍀" ] , [ "U+3341" , "㍁" ] , [ "U+3342" , "㍂" ] , [ "U+3343" , "㍃" ]]
[[ "U+3344" , "㍄" ] , [ "U+3345" , "㍅" ] , [ "U+3346" , "㍆" ] , [ "U+3347" , "㍇" ]]
[[ "U+3348" , "㍈" ] , [ "U+3349" , "㍉" ] , [ "U+334A" , "㍊" ] , [ "U+334B" , "㍋" ]]
[[ "U+334C" , "㍌" ] , [ "U+334D" , "㍍" ] , [ "U+334E" , "㍎" ] , [ "U+334F" , "㍏" ]]
[[ "U+3350" , "㍐" ] , [ "U+3351" , "㍑" ] , [ "U+3352" , "㍒" ] , [ "U+3353" , "㍓" ]]
[[ "U+3354" , "㍔" ] , [ "U+3355" , "㍕" ] , [ "U+3356" , "㍖" ] , [ "U+3357" , "㍗" ]]
From U+FF66 to U+FF6F
[[ "U+FF66" , "ヲ" ] , [ "U+FF67" , "ァ" ] , [ "U+FF68" , "ィ" ] , [ "U+FF69" , "ゥ" ]]
[[ "U+FF6A" , "ェ" ] , [ "U+FF6B" , "ォ" ] , [ "U+FF6C" , "ャ" ] , [ "U+FF6D" , "ュ" ]]
[[ "U+FF6E" , "ョ" ] , [ "U+FF6F" , "ッ" ]]
From U+FF71 to U+FF9D
[[ "U+FF71" , "ア" ] , [ "U+FF72" , "イ" ] , [ "U+FF73" , "ウ" ] , [ "U+FF74" , "エ" ]]
[[ "U+FF75" , "オ" ] , [ "U+FF76" , "カ" ] , [ "U+FF77" , "キ" ] , [ "U+FF78" , "ク" ]]
[[ "U+FF79" , "ケ" ] , [ "U+FF7A" , "コ" ] , [ "U+FF7B" , "サ" ] , [ "U+FF7C" , "シ" ]]
[[ "U+FF7D" , "ス" ] , [ "U+FF7E" , "セ" ] , [ "U+FF7F" , "ソ" ] , [ "U+FF80" , "タ" ]]
[[ "U+FF81" , "チ" ] , [ "U+FF82" , "ツ" ] , [ "U+FF83" , "テ" ] , [ "U+FF84" , "ト" ]]
[[ "U+FF85" , "ナ" ] , [ "U+FF86" , "ニ" ] , [ "U+FF87" , "ヌ" ] , [ "U+FF88" , "ネ" ]]
[[ "U+FF89" , "ノ" ] , [ "U+FF8A" , "ハ" ] , [ "U+FF8B" , "ヒ" ] , [ "U+FF8C" , "フ" ]]
[[ "U+FF8D" , "ヘ" ] , [ "U+FF8E" , "ホ" ] , [ "U+FF8F" , "マ" ] , [ "U+FF90" , "ミ" ]]
[[ "U+FF91" , "ム" ] , [ "U+FF92" , "メ" ] , [ "U+FF93" , "モ" ] , [ "U+FF94" , "ヤ" ]]
[[ "U+FF95" , "ユ" ] , [ "U+FF96" , "ヨ" ] , [ "U+FF97" , "ラ" ] , [ "U+FF98" , "リ" ]]
[[ "U+FF99" , "ル" ] , [ "U+FF9A" , "レ" ] , [ "U+FF9B" , "ロ" ] , [ "U+FF9C" , "ワ" ]]
[[ "U+FF9D" , "ン" ]]
From U+1B000 to U+1B000
[[ "U+1B000" , "𛀀" ]]