UTF-8 translator

Summary: Translates to alternative coding
Download link: UTF8.zip
Author(s): Chris Moss
Homepage: mossfamilytree.info

The UTF-8 translator is a small utility program designed to help creators of TNG mods prepare language translation additions to the standard language files. The .cfg file needs to be able to represent both UTF-8 and ISO-8859-1 characters (such as é), yet this is not directly possible in a text-only file. The solution is to represent the second character set (one or the other) as HTML entities (e.g. &eacute;).
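As a rough illustration of the entity representation described above, the sketch below replaces every non-ASCII character with a named HTML entity where one exists and with a numeric reference otherwise. It is written in Python rather than the utility's own PHP and is not taken from its source; the function name `to_entities` is hypothetical.

```python
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Replace each non-ASCII character with an HTML entity."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # plain ASCII passes through unchanged
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named entity, e.g. &eacute;
        else:
            out.append(f"&#{cp};")  # numeric fallback when no HTML 4 name exists
    return "".join(out)


print(to_entities("café"))  # prints caf&eacute;
```

The result is pure ASCII, so it reads the same whether the surrounding file is treated as UTF-8 or as ISO-8859-1.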

It is written in PHP but should be run only from the command line, on a home WAMP-type server or a web server, and not through a browser.

It works on the .cfg file and by default translates any files in the languages/ folder, producing two sets of output: the file in its native coding and a version in which the other coding is represented by HTML entities. Any existing non-native version is removed. By default it works from the UTF-8 version, but it can also work from the ISO version. It looks for lines starting "%target:languages/" and processes the text up to the next "%" directive, translating or ignoring it as appropriate.
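The scanning step can be pictured with the following minimal sketch. It is in Python, not the utility's own PHP, and simplifies what the real program does: it assumes that every directive sits on a line of its own beginning with "%", and it only collects the text that follows each "%target:languages/" line. The function name `language_sections` and the sample .cfg content are made up for illustration.

```python
def language_sections(lines):
    """Yield (target_line, body_lines) for each %target:languages/ block."""
    target, body = None, []
    for line in lines:
        if line.startswith("%"):
            # Any "%" directive closes the language block currently open.
            if target is not None:
                yield target, body
            # Only languages/ targets open a new block; others are skipped.
            target = line if line.startswith("%target:languages/") else None
            body = []
        elif target is not None:
            body.append(line)
    if target is not None:
        yield target, body


# Hypothetical sample content, not taken from a real mod.
sample = [
    "%target:languages/French/cust_text.php%",
    "$text['hello'] = \"Enchanté\";",
    "%target:files%",
    "this line is outside any language block",
]

for target, body in language_sections(sample):
    print(target, body)
```

Only the text inside the collected blocks would be a candidate for conversion; everything else passes through untouched.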

Using the translator

The translator is a PHP program designed to be run from the command line of a system that has PHP installed. It takes one argument, the .cfg file, and its output should be redirected to a file, e.g.

php utf8.php xxx.cfg > yyy.cfg

This should work OK on a Mac or Linux server, but if you are using a PC with a WAMP-type server you will probably need to find out where the php binary is, and may end up with something like:

c:\wamp\bin\php\php5.3.8\php utf8.php xxx.cfg > yyy.cfg

Another possibility is to run it on your web server, but DON'T do it through a browser (the HTML entities will disappear). If you don't have console access you may be able to do it via a cron job. For such a one-off job you may need to provide full pathnames for the files involved.
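The reason entities disappear in a browser is that a browser decodes them back into characters when it renders the page. Python's standard `html.unescape` performs the same decoding, so this small sketch (illustrative only, unrelated to the utility's own code) shows the effect:

```python
from html import unescape

entity_form = "caf&eacute;"       # what the translator writes to the file
rendered = unescape(entity_form)  # what a browser would display instead
print(rendered)                   # prints café
```

Copying the rendered text out of a browser window would therefore give you raw characters again rather than the entities the .cfg file needs.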

If your file is in ISO-8859-1 (or so-called ANSI) and doesn't have a UTF-8 version, then you should add an ISO directive after the file name, i.e.

php utf8.php xxx.cfg ISO > yyy.cfg

The output will still be in UTF-8 and HTML entities, so this provides a way of converting the file to UTF-8 even if you aren't using ISO yourself. In this case any Polish or Czech files are treated as ISO-8859-2.
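The choice of source character set matters because the same byte means different things in ISO-8859-1 and ISO-8859-2. The sketch below, in Python and not the utility's PHP, shows a conversion to UTF-8 that picks the codec by language. The names `codec_for` and `iso_to_utf8` are hypothetical, and how the real program identifies a file's language is an assumption here.

```python
# Languages decoded as ISO-8859-2, per the text above.
LATIN2_LANGUAGES = {"Polish", "Czech"}


def codec_for(language: str) -> str:
    """Pick the ISO-8859 variant to decode from (assumed rule)."""
    return "iso-8859-2" if language in LATIN2_LANGUAGES else "iso-8859-1"


def iso_to_utf8(raw: bytes, language: str) -> bytes:
    """Decode ISO-8859 bytes for the given language and re-encode as UTF-8."""
    return raw.decode(codec_for(language)).encode("utf-8")


# Byte 0xB3 is "ł" in ISO-8859-2 but "³" in ISO-8859-1.
print(iso_to_utf8(b"\xb3", "Polish").decode("utf-8"))  # prints ł
print(iso_to_utf8(b"\xb3", "French").decode("utf-8"))  # prints ³
```

Decoding a Polish or Czech file with the wrong variant would not fail; it would silently produce the wrong characters, which is one more reason to check the output.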

If you wish to keep the ISO version, you can use the directive ISOKEEP instead of ISO. In this case the UTF-8 file will contain the HTML entities and the ISO file will have the ISO characters.

Do check the output after running this program on a cfg for the first time. Note that both versions of each language section are regenerated from one version (either the UTF-8 or the ISO version). Any comments or other differences in the other version will therefore be lost (including, e.g., comments that apply to a following section), so you may wish to adjust these. Non-ASCII characters in other parts of the file, if any, are unaffected. Any copyfile or newfile sections following the language sections will be preceded by a %target:files% statement.