Database import/export/conversion utility

From PGVWiki
Revision as of 03:22, 8 June 2010 by GhostInTheMachine (talk | contribs) (→‎Upgrading)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

The capability to backup the PhpGedView database tables is an important feature for security of your data The utility described here is still in development (as of January 2009) and is not yet part of an official release. If you are cautious, however, you can protect your data using this tool, and also improve handling of non-latin character searches and sorting by converting database to UtF-8.

Overview

The utility has 3 simple functions:

  • Export (individual database tables, or all of them). The table data are stored as SQL files in the index directory, and can be used to restore a corrupt database (but with table structure intact). During export, a conversion is made to arrive at readable, UTF-8 encoded text.
  • Import. The saved files are used to re-populate the database tables.
  • Conversion. The actual database (works for MySQL only ) tables are converted from default (8 bit latin) to UTF-8 encoding.

Where to get it?

The utility db_export.php is currently in SourceForge tracker. After retrieving it from the tracker, copy it to main (root) directory of phpGedView.

How to use it?

The instructions below are from the tracker, written by Greg Roach, with minor adjustments based on email exchange.

This module allows you to export and import the contents of your database to text files containing SQL scripts. This is not the most compact or efficient format, but it simple, reliable and portable. If you want efficient, there are much more appropriate tools available.

A separate file is generated for each table, and each can be imported/exported separately. Although an "import all" option is provided, you may find that your server time limits prevent this from completing.

Files are created in the index directory, so this can be useful for creating backups of your system. They can also be useful for transferring data between different databases, for example, between sqlite on your home PC and MySQL on your webserver.

For MYSQL users only, there is also an option to convert your database tables from latin (single byte) encoding to UTF8 (multibyte) encoding. This change is necessary to allow the database to perform case-insensitive searching of letters containing diacritics and non-latin character sets.

By default, PHP communicates with MySQL in latin text. When PGV sends UTF8 text to the database, MySQL treats multibyte characters as a number of "extended ascii" characters, and converts these characters to UTF. This incorrect encoding (and possible double-encoding) is what prevents the database from searching data. Luckily, the reverse encodings take place when we fetch the data back, so apart from the wasted conversion effort, this process is transparent to PGV.

In order to use UTF8, we need to tell MySQL that we are using UTF8. However, to do this on an existing database means that we will fetch old/existing data using different conversions to those used to store it - and our data will be corrupted.

We therefore need to convert our data before changing the way we talk to MySQL. The steps to be taken are:

  1. Export all the tables
  2. Convert all the tables to utf8
  3. Update your configuration files as follows:
    • For version 4.2 (anything above 4.1.6) edit config.php to set $DB_UTF8_COLLATION=true;
    • For versions 4.1.6 and below, edit file includes/functions.php and uncomment the line(s) dbquery("SET NAMES 'UTF8'");
  4. import all the tables

Notes

NOTE: If your login name contains non-ascii characters, then you *may* get logged out between steps 3 and 4. To prevent this, create an admin account with an all-ascii username before starting.

  1. If you have the default MySQL configuration (latin tables and latin communication), then steps 1 and 4 are optional. However, we can't reliably detect this configuration, so unless you are certain, don't skip them. Also, running step 2 repeatedly may corrupt data, meaning you do need to run steps 1 and 4. In fact, unless you have a very good reason (e.g. you know exactly what you are doing and have large tables on a server with very low execution time limits), then don't skip these steps.
  2. If you have low server time limits, then it may not be possible to import some of your large tables before it times out. These tables are likely to be your gedcom-data tables. They can easily be reconstructed by reimporting the gedcom.
  3. This procedure can destroy your data and prevent you from logging in. Do not do it unless you understand what you are doing. Even if you know what you are doing, you should make a backup before you start.

Upgrading

If you upgrade phpGedView to new version after making the conversion to UTF-8, make sure that you make the changes to the configuration files as described above before you start using phpGedView, otherwise the data may get corrupted.