Sunday, April 16, 2006

Chinese Debian Mini Howto

http://isis.poly.edu/~qiming/chinese-debian-mini-howto.html

0. Introduction
The purpose of this mini howto is to help users of Debian GNU/Linux build a minimal Chinese environment, so that they can read and input Chinese on their systems.

To support Chinese language display and input under Debian GNU/Linux, you will need to do the following basic steps.
*Generate relevant locales
*Install Chinese fonts
*Install an input method (IM) engine
*Set locale
*Adjust application settings (if necessary)

In the rest of this text, I will explain these steps one by one. Most of the commands below need to be run in a terminal with root privileges.

1. Generating Locales
Run dpkg-reconfigure locales, and choose the following items.
*en_US ISO-8859-1
*zh_CN GB2312
*zh_CN.GBK GBK
*zh_CN.UTF-8 UTF-8
*zh_TW BIG5
*zh_TW.UTF-8 UTF-8
Some of these are optional. For example, if you are using Simplified Chinese only, you would not need the last two items. After this you will be prompted for the default locale you want to use.

NOTE: Sometimes you need to reboot to get the new locales working. To avoid potential problems, it is strongly recommended that you get this correct from the beginning when you install your system.
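
To confirm that the locales were actually generated, you can compare the output of locale -a against the names you selected. The following is a minimal sketch; norm_locale and has_locale are hypothetical helper names, not part of Debian. It compares names case-insensitively and ignores hyphens, because locale -a may print zh_CN.UTF-8 as zh_CN.utf8.

```shell
#!/bin/sh
# norm_locale NAME (hypothetical helper): lower-case the name and drop
# hyphens, so that "zh_CN.UTF-8" and "zh_CN.utf8" compare as equal.
norm_locale() {
    printf '%s\n' "$1" | tr 'A-Z' 'a-z' | tr -d '-'
}

# has_locale NAME (hypothetical helper): read a list of locale names on
# stdin (one per line) and succeed if NAME is among them.
has_locale() {
    want=$(norm_locale "$1")
    while IFS= read -r line; do
        [ "$(norm_locale "$line")" = "$want" ] && return 0
    done
    return 1
}

# Typical use:
#   locale -a | has_locale zh_CN.UTF-8 && echo "zh_CN.UTF-8 is available"
```

If the check fails for a locale you need, run dpkg-reconfigure locales again and make sure the item is selected.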

2. Installing Chinese Fonts
Install at least the following free fonts. Each entry below is of the form package_name (font_name).
*ttf-arphic-gbsn00lp (AR PL SungtiL GB)
*ttf-arphic-gkai00mp (AR PL KaitiM GB)
*ttf-arphic-bsmi00lp (AR PL Mingti2L Big5)
*ttf-arphic-bkai00mp (AR PL KaitiM Big5)
The first two are for Simplified Chinese, and the other two for Traditional Chinese.

These packages can be installed by running
apt-get install ttf-arphic-bkai00mp ttf-arphic-bsmi00lp ttf-arphic-gbsn00lp ttf-arphic-gkai00mp

There are other fonts available. You can find them by searching for "xfonts" using dselect.

3. Installing Input Method (IM) Engine(s)
You will need an IM engine to input Chinese characters under X. There are a few IMs around, including xcin, chinput, scim, etc. Personally I found scim a good tool.

There are several packages related to scim. The easiest way to install it is by running
apt-get install scim scim-chinese scim-tables-zh
The package names may change in the future (as they did in the past). In that case, you can search for packages beginning with "scim" using dselect, examine their descriptions, and choose the input methods you need. After that dselect will do the rest of the job by selecting all dependent packages.

After that, create a new file /etc/X11/Xsession.d/95xinput with the following content.

/usr/bin/scim -d
XMODIFIERS="@im=SCIM"
export XMODIFIERS

This script will be run every time X starts. If you want more flexibility, you can put something more complicated in the file. For example,

case "$LANG" in
zh_TW*)
/usr/bin/scim -d
XMODIFIERS="@im=SCIM"
;;
zh_CN*)
/usr/bin/scim -d
XMODIFIERS="@im=SCIM"
;;
esac
export XMODIFIERS

This takes effect only when you restart X. The simplest way to do that is to press "Ctrl-Alt-BackSpace".
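
A slightly more defensive variant of the 95xinput file is sketched below. It treats the two Chinese locales in a single test and starts scim only if the program is actually installed. The function name wants_scim is a hypothetical helper introduced here for illustration; the rest uses only standard shell features.

```shell
# wants_scim LOCALE (hypothetical helper): succeed for the Chinese
# locales used in this howto, fail for everything else.
wants_scim() {
    case "$1" in
        zh_CN*|zh_TW*) return 0 ;;
        *)             return 1 ;;
    esac
}

# Start scim only for Chinese sessions, and only when it is installed.
if wants_scim "${LANG:-}" && command -v scim >/dev/null 2>&1; then
    scim -d
    XMODIFIERS="@im=SCIM"
    export XMODIFIERS
fi
```

With this version, a session started in an English locale simply skips the block and XMODIFIERS is left untouched.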

To use scim, simply press "Ctrl-Space", and a small window will appear at the lower right corner of the desktop.

It is advisable to configure scim (right click on its icon on the panel, then choose configure) and remove all unwanted input methods. You will need to restart scim (and probably X, too) for this to take effect.

4. Setting Locale
It is highly recommended that you use gdm or kdm as your X display manager, because then you can select your language at the login window. That setting can differ from the system default, and can differ from one login to the next.

If you are using an X display manager that does not support this, you will have to put an additional line such as
export LANG=zh_CN.gb2312
in /etc/X11/Xsession.d/95xinput.
NOTE: This will not work if you run it after you log in. You will need to restart X for it to take effect.

REMARK: One "side effect" of this is that once you set the language to Chinese, all the menus become Chinese. If you want to keep English menus but still view and input Chinese, you can set the locale to zh_CN.gb2312 and override some of the environment variables. For example, I have the following lines in the above file.

ENCODING="en_US"
#export LC_ALL=$ENCODING
export LC_MESSAGES=$ENCODING
#export LC_COLLATE=$ENCODING
#export LC_CTYPE=$ENCODING
export LC_TIME=$ENCODING
export LC_NUMERIC=$ENCODING
#export LC_MONETARY=$ENCODING
#export LC_PAPER=$ENCODING
#export LC_NAME=$ENCODING
export LC_ADDRESS=$ENCODING
export LC_TELEPHONE=$ENCODING
export LC_MEASUREMENT=$ENCODING
export LC_IDENTIFICATION=$ENCODING

Then I got English display of menu, time and date, etc. You should comment/uncomment these items according to your needs.
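
The reason LC_ALL is commented out in the list above is the standard lookup order: a non-empty LC_ALL overrides every category, then the individual LC_* variable applies, and LANG is only the fallback. The sketch below mimics that order so you can see which value wins for a given category. The function effective_locale is a hypothetical helper written for this illustration; on a real system the locale command reports the same information.

```shell
# effective_locale CATEGORY (hypothetical helper): print the value that
# applies to CATEGORY (e.g. LC_MESSAGES), following the usual order
# LC_ALL, then the category variable itself, then LANG, then "C".
# CATEGORY must be a plain variable name such as LC_TIME.
effective_locale() {
    eval "cat_value=\${$1:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$cat_value" ]; then
        printf '%s\n' "$cat_value"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'
    fi
}

# Typical use, after setting the variables as in the file above:
#   effective_locale LC_MESSAGES
#   effective_locale LC_CTYPE
```

With LANG set to a Chinese locale and LC_MESSAGES set to en_US, the first call reports en_US while the second still reports the Chinese locale, which is exactly the mixed setup described above.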

Note that "scim" works fine no matter what locale you choose.

5. Application Settings
5.1 Web Browsers
For applications such as Mozilla (1.7) and/or other browsers, usually you do not have to change much. If a Chinese webpage does not display correctly, try to check if the character encoding is correct, and make sure that you have installed the corresponding fonts.

For Mozilla version 1.6, there are some locale packages such as mozilla-locale-zh-cn or mozilla-locale-zh-tw.

For Mozilla-Firefox, you might need to install one of the mozilla-firefox-locale-zh-cn or mozilla-firefox-locale-zh-tw packages.

5.2 Editors
My favorite text editor is VIM with GTK support, or simply gvim. For gvim to display Chinese characters correctly, just add the following lines to $HOME/.gvimrc.

set enc=euc-cn
set tenc=euc-cn
set fileencoding=euc-cn
set guifont=AR\ PL\ KaitiM\ GB\ 12

The last line specifies the font and the font size to use. You can change it to any of the four fonts as in Section 2 above, and adjust the font size until you feel comfortable.

Note that even when you set LC_MESSAGES to en_US, gvim might still display Chinese menus if your locale is set to Chinese. In this case, you need a small trick.

First, create a file with content like the following.

#!/bin/sh
# Start application $1 with English environment

if [ -z "$1" ]; then
echo "Usage: $0 app arg1 arg2 ... "
exit 1
fi

export LANG=en_US

PROG=$1
shift
exec "$PROG" "$@"

Let us call this file enstart.sh, and put the file in a directory that is in your $PATH, e.g., /usr/local/bin. Make sure that it is executable by running chmod +x /usr/local/bin/enstart.sh in a terminal.

This small shell script will set $LANG so that the application it runs would think that it is running in a full English system.

To run gvim, we run enstart.sh gvim instead. You can add this to your desktop/panel shortcut, or make it an alias.
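
When a wrapper like this forwards its arguments, quoting matters as soon as a file name contains a space. The short sketch below shows the difference between an unquoted $* and a quoted "$@". The three function names are hypothetical and exist only for this demonstration.

```shell
#!/bin/sh
# count_args: print how many arguments were received.
count_args() { echo "$#"; }

# Forward the arguments unquoted: words are split again on whitespace.
forward_unquoted() { count_args $*; }

# Forward the arguments quoted: each argument is passed on unchanged.
forward_quoted() { count_args "$@"; }

forward_unquoted "my notes.txt"   # prints 2
forward_quoted "my notes.txt"     # prints 1
```

So a wrapper that ends in exec "$PROG" "$@" opens "my notes.txt" as one file, whereas the unquoted form would hand the editor two separate names.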

5.3 Terminal Emulators
RXVT (www.rxvt.org) is a nice terminal-emulator that intends to replace xterm.
It has a variant rxvt-ml which supports display of Chinese and Japanese characters.

The gnome-terminal, the default terminal emulator in the GNOME environment, supports Chinese once you select the appropriate character encoding from the terminal menu.

The default terminal emulator in KDE, Konsole, works fine with Chinese automatically if the locale is set correctly. You can also select/change character encoding from the menu.

5.4 Display of Chinese File Names in FAT Partitions
You will need kernel support for this. To mount a FAT partition (either 16 or 32 bit), you will need the following modules.

fat
vfat

To display Chinese characters properly, you will need at least one of the following modules:

nls_cp936 (for simplified Chinese)
nls_cp950 (for traditional Chinese)
nls_utf8 (for Unicode characters)

Then in the file /etc/fstab, add another line like the following
/dev/hda5 /mnt/dos vfat noauto,user,codepage=936,iocharset=cp936 0 0
for Simplified Chinese; replace 936 with 950 for Traditional Chinese, or with utf8 for Unicode characters.

NOTE: you should change the partition (/dev/hda5) and mount point (/mnt/dos) to the partition you want to mount, and the directory you want it to be mounted to, respectively.
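
To avoid typing mistakes in the two code page options, the fstab line can be generated from its three variable parts. This is only a sketch: fstab_vfat_line is a hypothetical helper, and it covers the numeric code pages (936 and 950) only, not the utf8 case.

```shell
#!/bin/sh
# fstab_vfat_line DEVICE MOUNTPOINT CODEPAGE (hypothetical helper):
# print an /etc/fstab entry in the same form as the example above.
fstab_vfat_line() {
    printf '%s %s vfat noauto,user,codepage=%s,iocharset=cp%s 0 0\n' \
        "$1" "$2" "$3" "$3"
}

# Simplified Chinese example, same values as in the text:
fstab_vfat_line /dev/hda5 /mnt/dos 936
# prints: /dev/hda5 /mnt/dos vfat noauto,user,codepage=936,iocharset=cp936 0 0
```

The printed line can be appended to /etc/fstab by root. If the modules are not built into your kernel, loading them first with modprobe (for example modprobe vfat and modprobe nls_cp936) should be enough.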

5.5 XMMS
Unfortunately, unlike other programs, my XMMS does not display Chinese file/song names correctly, even when I set both the system locale and the language option of gdm to zh_CN.gb2312, which is quite weird. My guess is that this has to do with the locale I set when I first installed the system.

Fortunately, a trick similar to the one used for gvim can be applied here, but this time we want to run with Chinese support. Let us create the following file and call it chstart.sh

#!/bin/sh
# Start application $1 with Chinese environment

if [ -z "$1" ]; then
echo "Usage: $0 app arg1 arg2 ... "
exit 1
fi

export LANG=zh_CN.gb2312

PROG=$1
shift
exec "$PROG" "$@"

To run XMMS, just type chstart.sh xmms in a terminal, or create a shortcut on desktop/panel to do it, or use an alias.
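
Since the English and Chinese wrappers differ only in the value of LANG, they could also be replaced by one function that takes the locale as its first argument. The sketch below uses the standard env command for that; run_with_lang is a hypothetical name chosen for this example.

```shell
# run_with_lang LOCALE COMMAND [ARGS...] (hypothetical helper):
# run COMMAND with LANG set to LOCALE, leaving the calling shell's
# own environment unchanged.
run_with_lang() {
    lang=$1
    shift
    env LANG="$lang" "$@"
}

# Typical use:
#   run_with_lang zh_CN.gb2312 xmms
#   run_with_lang en_US gvim "my notes.txt"
```

You could put this function in your shell startup file, or define aliases on top of it.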

5.6 Instant Messengers
It was actually amazing to find out that popular instant messengers such as AMSN and Yahoo Messenger do not support Chinese. It is possible to input Chinese in YM, but difficult to get it displayed properly. AMSN simply stops taking any keyboard input when the locale is set to Chinese.

A work-around for AMSN is to use the enstart.sh to start it instead (but of course you cannot expect to input Chinese with it).

Fortunately, there are instant messengers that do support Chinese. Skype is an example, but it only works when gdm/kdm has the locale set to zh_CN.gb2312; otherwise it will not work, even if you start it with the chstart.sh script above.

Pidgin (formerly GAIM) is another cute IM that was reported to work with Chinese. It is compatible with AIM, Yahoo! Messenger, MSN Messenger, Google Talk (Jabber), etc. It now even supports QQ, the most popular ICQ-like messenger in China.

5.7 Java

(This section is contributed by Guohan Lu, lguohan at gmail.com)

One thing that troubled me a lot is that Java on Debian does not support Chinese natively. I recently found a solution for Java 1.5 (tested for GB2312).

Here is the way (two steps):
1. make a directory fallback under JavaHome/lib/fonts/
2. add a Chinese TrueType or Type 1 font under fallback

An example:

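mkdir /usr/lib/j2sdk1.5-sun/jre/lib/fonts/fallback
cd /usr/lib/j2sdk1.5-sun/jre/lib/fonts/fallback
ln -s /usr/share/fonts/truetype/arphic/gbsn00lp.ttf .

Note the trailing dot at the end of the last line; it makes ln create the link in the current directory, which is why you must change into the fallback directory first.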

Some explanations for the fallback:

"... If the runtime environment has a directory lib/fonts/fallback and this directory contains valid TrueType or Type 1 fonts, the runtime automatically adds these fonts as fallback fonts for 2D rendering..."
--- from http://java.sun.com/j2se/1.5.0/docs/guide/intl/fontconfig.html

Note: This solution should be considered quick and dirty. The best solution would be to change the fontconfig file for Java, but I haven't figured that out yet.
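
The two fallback steps can also be wrapped in a small function that does not depend on the current directory. This is a sketch under the assumption that your Java home and font file follow the layout used in the example above; link_fallback_font is a hypothetical helper name.

```shell
# link_fallback_font JAVA_HOME FONT_FILE (hypothetical helper):
# create JAVA_HOME/lib/fonts/fallback if needed and link FONT_FILE
# into it under its own file name.
link_fallback_font() {
    fallback_dir="$1/lib/fonts/fallback"
    mkdir -p "$fallback_dir" || return 1
    ln -sf "$2" "$fallback_dir/$(basename "$2")"
}

# Typical use (as root), with the paths from the example above:
#   link_fallback_font /usr/lib/j2sdk1.5-sun/jre \
#       /usr/share/fonts/truetype/arphic/gbsn00lp.ttf
```

Because the link target is given as a full path, the function can be run from any directory and can be repeated safely.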