Difference between revisions of "Changing to UTF-8"

From TNG_Wiki
(46 intermediate revisions by 8 users not shown)
Line 1: Line 1:
<font color=gray><div align="center" style="text-align: center;font-size: 200%; font-weight: bold; margin: 10px auto;">
+
{{Languages|Changing to UTF-8}}
'' Changing your TNG site to UTF-8''
+
{{TNGmod
</div></font>
+
| mod_name        = Change coding
 
+
| mod_summary    = Allows a simple change of character set and collation sequence for the TNG database
{|align=right
+
| mod_validation  =  
|__TOC__
+
| mod_last_update = 23 Mar 2018
|}
+
| download_link  = [http://mossfamilytree.info/download.php?mod=changecoding&version=11.1.0.2 11.1.0.2]
 +
| download_stats  = [http://mossfamilytree.info/downloadstats.php?mod=changecoding show]
 +
| mod_author      = [[user:Chris Moss|Chris Moss]]
 +
| mod_url        =
 +
| mod_support    = [http://mossfamilytree.info/suggest.php contact author]
 +
| mod_contact    = [http://mossfamilytree.info/suggest.php contact author]
 +
| mod_version    = 11.1.0.2
 +
| min_TNG_ver    =
 +
| max_TNG_ver    = 12.1.2
 +
| TNG_file_list  =  
 +
| related_mods    =
 +
| notes          = only available with English instructions
 +
}}
 +
Increasingly UTF-8 is being used on the web as it handles all character sets in use. But often a TNG site is uploaded from a local database which uses Windows 1252 (ANSI) or ISO-8859-1 which only handle some Western European languages. Converting the TNG site involves changing both the database and a number of settings within TNG.
  
 
Notes on changing a TNG Site and database to UTF-8 provided by [[User:TheKiwi]]
 
Notes on changing a TNG Site and database to UTF-8 provided by [[User:TheKiwi]]
  
 +
== Create a copy or backup of your database ==
 +
You can use phpMyAdmin to make a copy of your database (always make a copy in case something goes wrong!!!!!!!)
 +
# In the databases list on the left click on the Database to select it.
 +
#  On the right side click on the Operations Tab. In the section called "Copy Database to" enter a name for your new database, and make sure that "Structure and Database" and "CREATE DATABASE before copying" are both checked, then click "Go".
  
== Create a copy of your database ==
+
When done this will have created a copy of your database.  
1 - Use phpMyAdmin to make a copy of your database (always make a copy in case something goes wrong!!!!!!!)
 
 
 
1.1 - In the databases list on the left click on the Database to select it.
 
 
 
1.2 - On the right side click on the Operations Tab. In the section called "Copy Database to" enter a name for your new database, and make sure that "Structure and Database" and "CREATE DATABASE before copying" are both checked, then click "Go".
 
  
When done this will have created a copy of your database.
+
Alternatively you can back up your database using any of the methods described in [[Database - Backup]].
  
 
== Change database to UTF-8 ==
 
== Change database to UTF-8 ==
2 - You need to run a script to change the structure of your database to UTF-8
 
 
2.1 - download the script from
 
 
http://www.phoca.cz/documents/38-tools/154-how-to-change-collation-in-database
 
 
and put it into your site's folder, then load the page to your site.
 
 
http://URLToYourSite/tool_phoca_changing_collation/
 
 
(NOTE - you need to replace "URLToYourSite" above with the actual URL to your site.)
 
 
2.2 - Fill out the 5 boxes with the values needed for your site including choosing a collation of the form
 
 
utf8_xxxxxx_ci
 
 
xxxxxx is whatever you choose to use as your utf8 collation - eg swedish, general, unicode etc. The collation affects how characters are sorted, eg does ø sort with o or come at the end of the alphabet? Does å sort with a or come at the end of the alphabet? Does ß sort with s or with ss? etc. This page
 
 
http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html
 
 
has some information about how some characters are handled in different collations. Choose the collation based on the principal language that your site is about.
 
 
Once you've decided on what collation you want to use and entered it into the 5th box, click the Submit button. This will convert the database, tables and columns to the collation you've chosen. The progress will be shown as it goes along. The last tables altered are tng_xnotes, so if you don't see these as the last items listed in the output the script didn't complete.
 
 
2.3 - when this has completed (on a large database it can take some time) a message appears at the bottom of the page with a link back to the Home Page. Until this link appears the script hasn't completed.
 
 
2.4 - When the script has completed, you should delete the folder "tool_phoca_changing_collation" from your website.
 
 
== TNG 6 Mod to globallib.php ==
 
3 - edit the tng_db_connect function in  globallib.php to get the queries performed in UTF-8. Otherwise even though the data is now stored in the database as UTF-8 it's not being retrieved properly by the queries
 
 
NOTE THIS STEP ISN'T REQUIRED FOR TNG 7
 
 
3.1 - Change line 52 of globallib.php to be

<pre>
global $textpart, $session_charset;
</pre>

3.2 - Add these two lines

<pre>
if ($session_charset == 'UTF-8')
    @mysql_query("SET NAMES 'utf8'");
</pre>

after line 53 so that the whole function becomes

<pre>
function tng_db_connect($dbhost,$dbname,$dbusername,$dbpassword) {
    global $textpart, $session_charset;
    $link = @mysql_connect($dbhost, $dbusername, $dbpassword);
    // When TNG is set to UTF-8, tell MySQL to exchange data with this connection in UTF-8
    if ($session_charset == 'UTF-8')
        @mysql_query("SET NAMES 'utf8'");
    if( $link && mysql_select_db($dbname))
        return $link;
    else if( $textpart != "setup" && $textpart != "index" ) {
        echo "Error: TNG is not communicating with your database. Please check your database settings and try again.";
        exit;
    }
    return( FALSE );
}
</pre>
 
 
 
== Change TNG settings ==
 
4 Change the settings that TNG is using for Character Set.
 
 
4.1 - In the TNG Admin ------> Setup ------> General Settings ------> Language change the Character Set to "UTF-8" (without the "" marks) and save the changes.
 
 
4.2 - Do the same in TNG Admin ------> Languages ------> each language you support to change them to UTF-8
 
 
 
== Change text files to UTF-8 ==
 
5 - Change the encoding in the text files for each language that you support.
 
  
=== on Macintosh system ===
+
You need to run a script to change the structure of your database to UTF-8. The previously recommended script has some problems so a new mod has been created which is much simpler to use and works in a flexible way.
  
5.1 - Here's what works on Macintosh. You need [[TextWrangler]] - a free text editor. If you don't already have TextWrangler, or its big brother BBEdit, you can download the free TextWrangler from
+
Download the mod in the normal way, unzip, place it in your mods directory and click the install button.
  
http://www.barebones.com/products/textwrangler/
+
[[File:change_to_utf8.png|thumb|left|380px|After running the change coding mod]]
 +
When it is installed, click on the green "Installed" line and then hit the "Change database" button below. The conversion will take a number of seconds and when it is complete a summary will appear similar to that shown at the left. Note in this case that the default language is English but the language actually in use when the conversion was done is German. The "utf8_swedish_ci" collation is suitable for most languages (including German), but if you want a different one, then you can edit the options of the mod and put another collation, as long as it's consistent with the character set chosen.
  
5.1.1 - For each language folder open the files admintext.php, alltext.php, text.php and cust_text.php in TextWrangler
+
If for any reason you want to change back to one of the other two supported character sets, ISO-8859-1 and ISO-8859-2, then choose the Edit options button on the mod and put one of these in the first box. In this case you will need to change the collation to a matching one, such as latin1_swedish_ci for ISO-8859-1 or a latin2 collation for ISO-8859-2 (latin1 and latin2 are the MySQL variants of ISO 8859).
  
5.1.2 - For each file after it is open, go to the File menu and choose "Reopen using Encoding ------> Western (Windows Latin 1)".
+
The mod alters all the appropriate settings for the different language folders, including cookies and session variables, so you shouldn't need to do anything else. Once you have changed the database it's ok to uninstall the mod as it shouldn't be needed again. But it's ok to run the mod again if, for instance, an incompatible table has been installed into your database or the settings have been disturbed by an update.
  
5.1.3 - The pop up menu at the bottom of the TextWrangler window should now say "Western (Windows Latin 1)". Click on this and choose "Unicode™ (UTF-8, no BOM)"
+
But '''don't''' try and import an ISO Gedcom file into a UTF-8 database (or vice versa). There will be no checking or conversion and you will end up with a mess if it contains accented characters.
  
5.1.4 - Save the file.
+
===The Places table===
 +
One particular issue that occurs in some cases is the table "tng_places" in which the main entry "place" is required to be unique. An earlier version of this mod used utf8_general_ci which didn't exactly correspond to latin1_swedish_ci in that characters with accents such as ä are treated as identical to a and it may therefore happen that the conversion of that table fails.
  
5.1.5 - repeat for the other files for each of your languages.
+
In the case of a truly multinational database it may be best to convert the whole database first to utf8_bin and then attempt a translation to utf8_general_ci, which will work on most tables but fail on tng_places (and possibly tng_countries). Having the tng_places table as utf8_bin will ensure that there are no clashes caused by the translation. Leaving the whole database as utf8_bin will make many other operations such as searches fail. Changing the collation only is generally quicker as no data is changed.
  
That's it. Your site should now be running in UTF-8.
+
== Make sure that you have all the necessary UTF-8 Encoded language files ==
 +
In the languages directory, there are pairs of folders for each language, e.g. French and French-UTF8, which have all the strings used in the two different character sets. If you have deleted the -UTF8 directories to save space, then these should be restored before running the change script above. Most mods will also include changes to both of these sets, normally changing the cust_text.php file.
  
=== on Windows system ===
+
However if you made changes to cust_text.php file yourself then you need to make sure these changes are also in the corresponding -UTF8 version of cust_text.php. The steps below will help you do that on Macintosh and Windows computers
  
5.2 - need instructions here on how to do this on Windows. It needs to use a text editor that can handle saving files in UTF-8 format WITHOUT writing the BOM (Byte Order Mark) to the file, which causes odd characters <font style="font-size:200%">ï»¿</font> to appear on TNG pages, and your Admin Menu may display a blank page because there are characters before the <?php which should be first in the file.
+
=== On Macintosh system ===
  
5.2.1 - Until someone provides better instructions, you can convert the ANSI $text variable files in your language folder using Notepad. When doing a Save As change the Encoding to UTF-8. You may want to save the file as xxx_UTF8.php so you can then rename the original files to xxx_ANSI.php as a backup before renaming the new files from xxx_UTF8.php to xxx.php.
+
Here's what works on Macintosh. You need BBEdit - a free text editor. If you don't already have BBEdit, you can download it from [https://www.barebones.com/products/bbedit/download.html https://www.barebones.com/products/bbedit/download.html]
  
5.2.2 - You will then need to open the files with an ASCII editor that sees the BOM '''ï»¿''' before the <?php so you can remove the BOM. PHP Designer is such an editor but is no longer free. [http://www.brothersoft.com/php-editor-36654.html PHP Editor] is another editor that will allow removing the BOM.
+
# For each language folder open the file cust_text.php in BBEdit
 +
# After the file is open, go to the File menu and choose "Reopen using Encoding ------> Western (Windows Latin 1)".
 +
# The pop up menu at the bottom of the BBEdit window should now say "Western (Windows Latin 1)". Click on this and choose "Unicode (UTF-8)".
 +
# Save the file and place in the appropriate directory.
  
5.2.3 - Notepad++ version 5.1.2 will allow converting the $text variable files to UTF-8 without a BOM.
+
=== On Windows system ===
  
=== Download UTF-8 versions ===
+
You need to use a text editor that can handle saving files in UTF-8 format WITHOUT writing the BOM (Byte Order Mark) to the file such as Notepad++ ([http://notepad-plus-plus.org/download/ Current Version]).
  
5.3 - Or alternatively have a set of files for each language available for download.
+
Load the cust_text.php file and save it specifying UTF-8 encoding without BOM, and place it in the corresponding folder (e.g. French-UTF8).
  
 
== Related links ==
 
== Related links ==
Line 134: Line 76:
 
*[[Setup - Language#Cust_Text|Cust Text in Setup - Language]] provides examples of overriding $text in both the text.php and admintext.php files
 
*[[Setup - Language#Cust_Text|Cust Text in Setup - Language]] provides examples of overriding $text in both the text.php and admintext.php files
 
* [[TNG charset]]
 
* [[TNG charset]]
 
+
*[[Database_Collation_-_Explain_Choosing|Selecting your TNG Database Collation]]
 +
*[[UTF-8 translator]] - a utility for replicating the ISO or UTF parts of a mod using html entities
  
 
[[Category:Charset]]
 
[[Category:Charset]]
[[Category:TNGprogrammerguide]]
+
[[Category:Programmer]]
 +
[[Category: Mods for TNG v12]]
 +
[[Category: Mods for TNG v11]]
 +
[[Category: Mods for TNG v10]]

Revision as of 09:08, 8 August 2018

Change coding
Summary: Allows a simple change of character set and collation sequence for the TNG database
Validation:
Mod Updated: 23 Mar 2018
Download link: 11.1.0.2
Download stats: show
Author(s): Chris Moss
Homepage:
Mod Support: contact author
Contact Developer: contact author
Latest Mod: 11.1.0.2
Min TNG V:
Max TNG V: 12.1.2
Files modified:
Related Mods:
Notes: only available with English instructions


UTF-8 is increasingly used on the web because it can represent the characters of every language in use. However, a TNG site is often uploaded from a local database that uses Windows-1252 (ANSI) or ISO-8859-1, which can only represent some Western European languages. Converting the TNG site to UTF-8 involves changing both the database and a number of settings within TNG.

Notes on changing a TNG Site and database to UTF-8 provided by User:TheKiwi

Create a copy or backup of your database

You can use phpMyAdmin to make a copy of your database. Always make a copy in case something goes wrong!

  1. In the databases list on the left, click the database to select it.
  2. On the right side, click the Operations tab. In the section called "Copy Database to", enter a name for your new database and make sure that "Structure and data" and "CREATE DATABASE before copying" are both selected, then click "Go".

When this has finished, you will have a copy of your database.

Alternatively you can back up your database using any of the methods described in Database - Backup.
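If you prefer to make the copy with SQL rather than through the phpMyAdmin menus, you can generate the statements as in the minimal Python sketch below. It is not part of TNG or phpMyAdmin, and it only approximates what phpMyAdmin's copy does. It assumes the copy is made on the same MySQL server. The database names "tng" and "tng_backup" and the table list are hypothetical; you could get the real table list with SHOW TABLES.

```python
def copy_database_sql(source, target, tables):
    """Return MySQL statements that copy each table in `tables` from `source` into a new `target` database."""

    def quote(identifier):
        # MySQL quotes identifiers with backticks; a backtick inside a name is doubled.
        return "`" + identifier.replace("`", "``") + "`"

    src, dst = quote(source), quote(target)
    statements = [f"CREATE DATABASE {dst};"]
    for table in tables:
        name = quote(table)
        # LIKE copies the column definitions (including character set and collation) and indexes ...
        statements.append(f"CREATE TABLE {dst}.{name} LIKE {src}.{name};")
        # ... and INSERT ... SELECT copies the rows.
        statements.append(f"INSERT INTO {dst}.{name} SELECT * FROM {src}.{name};")
    return statements


if __name__ == "__main__":
    # Hypothetical database names and two of the TNG tables.
    for sql in copy_database_sql("tng", "tng_backup", ["tng_people", "tng_places"]):
        print(sql)
```

Run the printed statements in phpMyAdmin's SQL tab or the mysql client. Check that the row counts of the copied tables match the originals before you go any further.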

Change database to UTF-8

You need to run a script that changes the character set and collation of your database to UTF-8. The previously recommended script has some problems, so a new mod has been created that is much simpler to use and more flexible.

Download the mod in the usual way, unzip it, place it in your mods directory and click the Install button.

[Screenshot: After running the change coding mod]

When the mod is installed, click the green "Installed" line and then click the "Change database" button below it. The conversion takes a number of seconds. When it is complete, a summary similar to the one in the screenshot appears. Note that in this example the default language is English but the language actually in use when the conversion was done is German. The "utf8_swedish_ci" collation is suitable for most languages (including German). If you want a different one, you can edit the options of the mod and enter another collation, as long as it matches the character set chosen (for UTF-8, a collation whose name starts with utf8_).
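To show what a matching collation means in practice, here is a minimal Python sketch. It checks a collation against the chosen character set and builds the kind of MySQL statements a conversion like this could use. It is only an illustration, not the mod's actual code. The mapping from TNG's character set names to MySQL's (utf8, latin1, latin2) and the table names are assumptions.

```python
# Assumed mapping from TNG's character set names to MySQL character set names.
MYSQL_CHARSETS = {"UTF-8": "utf8", "ISO-8859-1": "latin1", "ISO-8859-2": "latin2"}


def conversion_sql(database, tables, tng_charset="UTF-8", collation="utf8_swedish_ci"):
    """Build ALTER statements for a database and its tables, refusing mismatched collations."""
    charset = MYSQL_CHARSETS[tng_charset]
    # MySQL collation names normally begin with their character set's name and an
    # underscore, e.g. utf8_swedish_ci belongs to utf8 and latin1_swedish_ci to latin1.
    if not collation.startswith(charset + "_"):
        raise ValueError(f"{collation} is not a collation of the {charset} character set")
    statements = [f"ALTER DATABASE `{database}` CHARACTER SET {charset} COLLATE {collation};"]
    for table in tables:
        # CONVERT TO changes the table default and every text column,
        # re-encoding the stored text when the character set changes.
        statements.append(
            f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


if __name__ == "__main__":
    for sql in conversion_sql("tng", ["tng_people", "tng_places"]):
        print(sql)
```

Pairing UTF-8 with latin1_swedish_ci, for example, raises a ValueError instead of producing statements that MySQL would reject.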

If for any reason you want to change back to one of the other two supported character sets, ISO-8859-1 or ISO-8859-2, click the Edit options button on the mod and enter one of these in the first box. In this case you will also need to change the collation to a matching one. Use, for example, latin1_swedish_ci for ISO-8859-1 or a latin2 collation for ISO-8859-2 (latin1 and latin2 are the MySQL character sets corresponding to ISO-8859-1 and ISO-8859-2).

The mod alters all the appropriate settings for the different language folders, including cookies and session variables, so you shouldn't need to do anything else. Once you have changed the database, it's OK to uninstall the mod, as it shouldn't be needed again. But it's also OK to run the mod again if, for instance, an incompatible table has been installed in your database or the settings have been disturbed by an update.

However, don't import an ISO GEDCOM file into a UTF-8 database (or vice versa). The import does no checking or conversion, so if the file contains accented characters you will end up with a mess.
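Before importing, you can check which character set a GEDCOM file declares in its header (the "1 CHAR" line) and whether its bytes really are valid UTF-8. The Python sketch below is a rough pre-import check and is not part of TNG. It only counts a file as UTF-8 when it is declared as UTF-8 and decodes cleanly. Everything else (ANSI, ANSEL, ASCII and so on) is treated as non-UTF-8. The function names are hypothetical.

```python
def declared_charset(data):
    """Return the value of the header's "1 CHAR" line in upper case, or None if there is none."""
    # bytes.splitlines() only splits on \n, \r and \r\n, so any encoding is safe here.
    for i, line in enumerate(data.splitlines()):
        parts = line.strip().split(b" ", 2)
        if i > 0 and parts[0] == b"0":
            break  # a new level-0 record means the HEAD record has ended
        if len(parts) == 3 and parts[0] == b"1" and parts[1] == b"CHAR":
            return parts[2].strip().decode("ascii", "replace").upper()
    return None


def is_valid_utf8(data):
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def suits_site(data, site_is_utf8):
    """Rough check: a UTF-8 site wants a file declared as UTF-8 whose bytes really are UTF-8;
    an ISO site wants any other file."""
    looks_utf8 = declared_charset(data) == "UTF-8" and is_valid_utf8(data)
    return looks_utf8 if site_is_utf8 else not looks_utf8
```

Read the file with open(path, "rb").read() and pass the bytes in. If the check fails, convert the file with your genealogy program before importing it.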

The Places table

One issue that occurs in some cases involves the table "tng_places", in which the main entry "place" must be unique. An earlier version of this mod used utf8_general_ci, which does not correspond exactly to latin1_swedish_ci. It treats accented characters such as ä as identical to a, so two places that differ only in such accents become duplicates, and the conversion of that table may therefore fail.
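To find out in advance which places would clash, you can group place names by an accent- and case-insensitive key. The Python sketch below approximates utf8_general_ci by removing accents with Unicode normalization and then case-folding. It is not MySQL's exact algorithm. For example, Python's casefold() turns ß into ss, whereas utf8_general_ci treats ß as s. The place names are made up.

```python
import unicodedata
from collections import defaultdict


def fold(name):
    """Approximate an accent- and case-insensitive collation such as utf8_general_ci."""
    # NFD splits "ä" into "a" plus a combining diaeresis, which is then dropped.
    decomposed = unicodedata.normalize("NFD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() ignores case; note that it maps "ß" to "ss", unlike utf8_general_ci.
    return without_accents.casefold()


def find_collisions(places):
    """Group distinct place names that would compare as equal under fold()."""
    groups = defaultdict(list)
    for place in places:
        groups[fold(place)].append(place)
    return {key: names for key, names in groups.items() if len(set(names)) > 1}


if __name__ == "__main__":
    # Made-up place names; in practice you would read the "place" column of tng_places.
    print(find_collisions(["Märsta, Sweden", "Marsta, Sweden", "Berlin, Germany"]))
```

Each group returned is a set of places you would need to merge or distinguish before an accent-insensitive collation could keep "place" unique.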

For a truly multinational database, it may be best to convert the whole database to utf8_bin first and then attempt to change the collation to utf8_general_ci. This will work on most tables but may fail on tng_places (and possibly tng_countries). Keeping tng_places at utf8_bin ensures that the change causes no clashes. Don't leave the whole database as utf8_bin, though. utf8_bin compares characters exactly, so many other operations, such as searches, will then fail to find matches that differ only in case or accents. Changing only the collation is generally quicker, as no data is changed.
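The two-phase approach can be written out as a list of MySQL statements. In this minimal Python sketch, the tables left at utf8_bin (tng_places and tng_countries) follow the paragraph above, while the other table names are only examples. Check the generated statements and try them on a copy of your database first.

```python
def two_phase_plan(tables, keep_binary=("tng_places", "tng_countries"),
                   final_collation="utf8_general_ci"):
    """Return ALTER TABLE statements for the utf8_bin-first approach."""
    # Phase 1: every table to utf8 with the binary collation, which compares
    # characters by code value, so ä and a never count as equal.
    phase1 = [
        f"ALTER TABLE `{t}` CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;"
        for t in tables
    ]
    # Phase 2: the data is already utf8, so no re-encoding is needed and only the
    # collation changes; tables whose unique keys might clash stay at utf8_bin.
    phase2 = [
        f"ALTER TABLE `{t}` CONVERT TO CHARACTER SET utf8 COLLATE {final_collation};"
        for t in tables
        if t not in keep_binary
    ]
    return phase1 + phase2
```

If a phase-2 statement fails with a duplicate-key error on some other table, add that table to keep_binary and rerun phase 2.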

Make sure that you have all the necessary UTF-8 Encoded language files

The languages directory contains a pair of folders for each language, e.g. French and French-UTF8, which hold all the strings used by TNG in the two character sets. If you have deleted the -UTF8 folders to save space, restore them before running the Change coding mod above. Most mods also make their changes to both folders of each pair, normally in the cust_text.php file.
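A quick way to check whether any -UTF8 folders are missing is sketched below in Python. It assumes the folder naming shown above (French and French-UTF8) and that you pass it the path of your TNG languages directory. The function name is hypothetical.

```python
from pathlib import Path


def missing_utf8_folders(languages_dir):
    """Return the names of language folders that have no matching "-UTF8" folder."""
    folders = {p.name for p in Path(languages_dir).iterdir() if p.is_dir()}
    return sorted(
        name for name in folders
        if not name.endswith("-UTF8") and name + "-UTF8" not in folders
    )
```

For example, missing_utf8_folders("/path/to/tng/languages") returns an empty list when every language has its -UTF8 partner. Otherwise it lists the languages whose UTF-8 files need restoring.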

However, if you have made changes to a cust_text.php file yourself, you need to make sure that these changes are also in the corresponding -UTF8 version of cust_text.php. The steps below show how to do that on Macintosh and Windows computers.

On Macintosh system

Here's what works on Macintosh. You need BBEdit, a free text editor. If you don't already have BBEdit, you can download it from https://www.barebones.com/products/bbedit/download.html.

  1. For each language folder, open the file cust_text.php in BBEdit.
  2. After the file is open, go to the File menu and choose "Reopen using Encoding ------> Western (Windows Latin 1)".
  3. The pop up menu at the bottom of the BBEdit window should now say "Western (Windows Latin 1)". Click on it and choose "Unicode (UTF-8)".
  4. Save the file and place it in the corresponding -UTF8 folder (e.g. French-UTF8).
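If you have many files, you can script the same re-encoding. This Python sketch assumes, as in step 2, that the original cust_text.php is in Windows Latin 1 (cp1252). The file paths are placeholders. Python's "utf-8" codec writes no BOM, so the output has none.

```python
from pathlib import Path


def convert_to_utf8(source, target):
    """Re-encode a Windows Latin 1 (cp1252) text file as UTF-8 without a BOM."""
    # Raises UnicodeDecodeError if the file contains bytes that cp1252 leaves undefined.
    text = Path(source).read_bytes().decode("cp1252")
    # The "utf-8" codec writes no BOM (only "utf-8-sig" would add one).
    # Working with bytes keeps the original line endings unchanged.
    Path(target).write_bytes(text.encode("utf-8"))
```

For example, convert_to_utf8("languages/French/cust_text.php", "languages/French-UTF8/cust_text.php") overwrites the UTF-8 copy. Back that file up first in case it holds strings that are not in the ISO version.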

On Windows system

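You need a text editor that can save files in UTF-8 format WITHOUT writing a BOM (Byte Order Mark) to the file, such as Notepad++ (the current version is available from http://notepad-plus-plus.org/download/). A BOM puts extra characters before the <?php at the start of the file, which can make odd characters appear on TNG pages or leave the Admin menu blank.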

Open the cust_text.php file, save it with the encoding set to UTF-8 without BOM, and place it in the corresponding -UTF8 folder (e.g. French-UTF8).
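To check a saved file for a BOM, or to remove one that an editor added, a short script is enough. This Python sketch looks for the three UTF-8 BOM bytes (EF BB BF) at the start of the file. The function names and paths are hypothetical.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def strip_bom(path):
    """Remove a leading UTF-8 BOM from the file in place; return True if one was removed."""
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(UTF8_BOM):
        p.write_bytes(data[len(UTF8_BOM):])
        return True
    return False


def starts_with_php_tag(path):
    """True if the file begins exactly with "<?php"."""
    # Anything before "<?php" (a BOM, a blank line, ...) is sent to the browser as output.
    return Path(path).read_bytes().startswith(b"<?php")
```

Running starts_with_php_tag on each cust_text.php after saving is a quick way to catch both a BOM and stray blank lines at the top of the file.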

Related links