Hacking Sanzang¶ ↑
Supported platforms¶ ↑
These programs should work on all platforms with Ruby 1.9 or later. Regular testing takes place on GNU/Linux operating systems. Sanzang also runs on JRuby, although the runtime performance is usually slower than MRI, and text encoding support is limited to UTF-8. Sanzang has not been tested on other Ruby implementations such as Rubinius.
Languages and scope¶ ↑
The translation program may not be very useful for all languages. It was designed specifically for dealing with the more difficult aspects of translating from ancient Chinese. For a language like Sanskrit or Tibetan, it may not be so useful.
Multiprocessing and fork(2)¶ ↑
Most every Unix-like system supports the fork(2) system call and therefore has the potential to run Sanzang in a multiprocessing mode for batches. However, each Ruby implementation and port may be different, and Sanzang will check to see if the fork method has been implemented before attempting to utilize multiprocessing. To avoid potential errors, the translator will not attempt to use multiprocessing on platforms in which fork(2) is not implemented.
Note that for Windows, the Cygwin port of Ruby does implement does support fork(2) perfectly, so Ruby 1.9+ on Cygwin can utilize the fork method, so it supports standard multiprocessing. Therefore, Cygwin is the most robust environment for running Sanzang on a computer with Windows.
For JRuby, Sanzang does not use fork(2) and multiprocessing, but rather uses Java threads to achieve high performance for batch processing.
Text encoding quirks¶ ↑
Converters for several encodings have not yet been implemented by MRI. Most of these are obscure and not widely used. Perhaps the most notable is EUC-TW, which is an old Unix encoding for traditional Chinese. Text encodings that cannot be converted to and from UTF-8 are not currently supported.
Reserved characters¶ ↑
Sanzang internally uses an ASCII control character as a temporary marker. The following character will be removed from any translated text:
-
0x1F – “US” – “Unit separator”