Import GEDCOM

From PGVWiki
Revision as of 21:40, 9 May 2009 by Peerx (talk | contribs)

This article describes in detail the Import step of creating or updating a Family Tree. See the Manage Gedcoms article for a description of all three steps. The Import step is always performed, even when creating a new Family Tree. It is invoked when you use the Add Gedcom, Upload Gedcom, Upload Replacement, or Import function. The Import process involves several steps, and in each you may be asked some questions.

Select the File

You can either Import a file that is already on the server, or combine Upload with Import by selecting a file on your computer. If the file already exists on the server, an appropriate warning will be issued. When you re-import the same file, the file name will already be filled in for you. If you are using the Upload Replacement function, the replacement file should have exactly the same name as the original file.

To continue to the next step, press the "Save Configuration" button.

Validate Gedcom

In this step the software reads your Gedcom and points out potential problems that may require fixing. For some problems you will be given the option to accept a proposed fix. After reviewing the listed problems you can press Continue to proceed with the fixes, Skip Cleanup (not recommended) to proceed without fixes, or Cancel to go back.

A sample screen showing the Validate step. The Upload has already been completed, and the screen shows several errors.

The problems generally fall into two categories, File Format and Gedcom:

File format fixes

These issues usually arise because different software sources cannot agree on a common standard. They are similar to the video (Beta versus VHS) or DVD (Blu-ray versus HD DVD) format wars. They include the byte-order mark (BOM), the line-ending convention, text encoding, and so on. These errors should always be fixed.

BOM Cleanup

A byte-order mark (BOM) is the Unicode character at code point U+FEFF ("zero-width no-break space"). It is conventionally placed at the start of a file as a marker to indicate that the text is encoded in UTF-8, UTF-16 or UTF-32. PHP has problems with a BOM, so it is removed here to avoid trouble later.
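
As an illustration of what removing a BOM involves, here is a minimal Python sketch. It is not PhpGedView's own code; the function name strip_utf8_bom is hypothetical, and only the UTF-8 form of the BOM is handled.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte-order mark, if one is present.

    Hypothetical helper for illustration only; PhpGedView itself is written in PHP.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

A file without a BOM passes through unchanged, so the cleanup is safe to apply to every upload.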

MAC Style Line Endings Cleanup

PGV runs better with DOS (Return/Line Feed) or UNIX (Line Feed) style line endings than with Mac (Return). Fortunately the Teletype machine did not have more symbols for ending the line, so only three options are in use. The Mac endings will be replaced with DOS endings.
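
The replacement described above can be sketched in Python as follows. This is an illustration under the assumption that a Mac line ending is a Return that is not followed by a Line Feed; the function name is hypothetical and this is not the program's actual code.

```python
import re


def mac_to_dos_line_endings(text: str) -> str:
    """Replace lone Return characters (Mac style) with Return/Line Feed (DOS style).

    Existing DOS (\\r\\n) and UNIX (\\n) endings are left untouched.
    """
    return re.sub(r"\r(?!\n)", "\r\n", text)
```

When trying this on a real file in Python, open it in binary mode or with newline="" so that Python does not translate the line endings before the function sees them.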

Convert ANSI to UTF-8

The Gedcom standard originally had its own encoding for non-ASCII characters, called ANSEL. The program checks whether the Gedcom header has the (ANSI/ANSEL) marker and, if so, converts the file to UTF-8, the common standard in current use.
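
The header check can be illustrated with a short Python sketch. It assumes that the marker is the value of the header's 1 CHAR line, as in GEDCOM 5.5 files; the function names are hypothetical, and the conversion itself is not shown because Python has no built-in ANSEL codec.

```python
import re


def declared_encoding(gedcom_text: str):
    """Return the value of the header's '1 CHAR' line in upper case, or None."""
    for raw in gedcom_text.splitlines():
        line = raw.strip()
        # Stop at the first level-0 record after HEAD: the header has ended.
        if line.startswith("0 ") and not line.startswith("0 HEAD"):
            break
        match = re.match(r"1 CHAR\s+(\S+)", line)
        if match:
            return match.group(1).upper()
    return None


def needs_conversion(gedcom_text: str) -> bool:
    """True if the header declares ANSI or ANSEL (assumed trigger for conversion)."""
    return declared_encoding(gedcom_text) in {"ANSI", "ANSEL"}
```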

Gedcom fixes

These are problems with the Gedcom file itself, possibly resulting from manual editing, non-standard usage by another Genealogy program, or data entry that does not follow established rules. You will have a chance to verify each error based on an example, and to agree (or not) to a global fix of all such errors.

Head Cleanup

The Gedcom file should begin with a record 0 HEAD and end with 0 TRLR. If there are lines before the Head record, they will be removed.
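
A minimal Python sketch of this cleanup, assuming the file has already been split into lines (the function name is hypothetical, and leaving the input unchanged when no 0 HEAD line is found is a choice made for this example):

```python
def drop_lines_before_head(lines):
    """Return the lines starting at the first '0 HEAD' record.

    If no HEAD record is found, the input is returned unchanged.
    """
    for index, line in enumerate(lines):
        if line.strip().startswith("0 HEAD"):
            return lines[index:]
    return lines
```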

Remove Empty Lines

The Gedcom file should not have empty lines; each line should start with a number followed by a Tag. Empty lines will be removed from the file.
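
The rule above (a number followed by a Tag) can be sketched in Python. Both function names are hypothetical. The second function only reports lines that break the rule; that extra check is included for illustration and is not described as part of the import.

```python
import re

# A line should start with a level number, a space, and then a tag or @ID@.
LINE_START = re.compile(r"^\d+ \S+")


def remove_empty_lines(lines):
    """Drop lines that are empty or contain only whitespace."""
    return [line for line in lines if line.strip()]


def malformed_lines(lines):
    """Return non-empty lines that do not start with a number followed by a tag."""
    return [line for line in lines
            if line.strip() and not LINE_START.match(line.strip())]
```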

Cleanup Dates

Valid Gedcom dates are in the form DD MMM YYYY (e.g. 01 JAN 2004). It is possible to enter dates in an incorrect format, especially if the Genealogy program used does not enforce the correct practice. Dates in the forms YYYY-MM-DD, DD-MM-YYYY, and MM-DD-YYYY, as well as dates using "/", "\", "-" and "." as separators, will be corrected if possible.
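
To illustrate one of these corrections, the Python sketch below converts only the unambiguous YYYY-MM-DD form (with any of the listed separators) to DD MMM YYYY. The function name is hypothetical. The DD-MM-YYYY and MM-DD-YYYY forms are deliberately left out, because a value such as 03-04-2004 cannot be told apart without further information, and the article does not say how the program decides.

```python
import re

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

# Year first, then month and day, separated by "-", "/", "\" or ".".
YEAR_FIRST = re.compile(r"^(\d{4})[-/\\.](\d{1,2})[-/\\.](\d{1,2})$")


def fix_year_first_date(value: str) -> str:
    """Convert 'YYYY-MM-DD' style values to 'DD MMM YYYY'; return others unchanged."""
    match = YEAR_FIRST.match(value.strip())
    if not match:
        return value
    year = match.group(1)
    month = int(match.group(2))
    day = int(match.group(3))
    if not (1 <= month <= 12 and 1 <= day <= 31):
        return value  # not a plausible date, so do not touch it
    return f"{day:02d} {MONTHS[month - 1]} {year}"
```

For example, fix_year_first_date("2004-01-01") gives "01 JAN 2004", while a value already in the valid form is returned as it is.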

Cleanup Places

Some programs, most notoriously FTM, put data in the PLAC field when it should be on the same line as the event. For example:

1 SSN
2 PLAC 123-45-6789

should really be:

1 SSN 123-45-6789

The program will attempt to find such places. However, it currently also detects "false positives" like

1 OCCU 
2 DATE 1984 
2 PLAC Jamestown, James City County, Virginia, USA

and attempting to fix it will place Jamestown, James City County, Virginia, USA as Occupation. If you used FTM and know that the errors are mostly of the first type, go ahead and accept the fix; otherwise it is safer to not accept this correction. You can complete the import and then use the Check functionality to weed out such problems.
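
The Python sketch below shows why a simple detection rule produces such false positives. It is a naive illustration, not PhpGedView's actual algorithm: it flags any line with one of the given tags and no value that has a PLAC line one level below it. The function names and the default tag set (SSN and OCCU, taken from the two examples above) are assumptions.

```python
def parse_line(line):
    """Split a Gedcom line into (level, tag, value)."""
    parts = line.strip().split(" ", 2)
    value = parts[2].strip() if len(parts) > 2 else ""
    return int(parts[0]), parts[1], value


def find_place_candidates(lines, tags=("SSN", "OCCU")):
    """Return (tag, place_value) pairs the naive rule would want to 'fix'."""
    parsed = [parse_line(line) for line in lines if line.strip()]
    hits = []
    for index, (level, tag, value) in enumerate(parsed):
        if tag not in tags or value:
            continue
        for sub_level, sub_tag, sub_value in parsed[index + 1:]:
            if sub_level <= level:
                break  # left the sub-records of this line
            if sub_level == level + 1 and sub_tag == "PLAC" and sub_value:
                hits.append((tag, sub_value))
                break
    return hits
```

Given the SSN example, the rule correctly reports ("SSN", "123-45-6789"). Given the OCCU example, it also reports the Jamestown place name, which is exactly the false positive described above.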

Import options

Third stage of the import process: Verify and Import options

After the Gedcom file has been successfully (or not) validated, you will be presented with possible warnings (for example, A GEDCOM with this file name has already been imported into the database.) and the options below. In some cases you will need to press Continue after answering the first questions.

Replace Data?

Do you want to erase the old data and replace it with this new data? The Import process replaces the Family tree data with those found in the imported Gedcom. Select Yes to proceed with the import.

Keep Media Links?

Select Yes to keep the original media links and not overwrite them with the data from the new Gedcom. The No option removes existing media links from the database and replaces them with Gedcom data.

This option is useful when you export your Gedcom from PhpGedView to an off-line maintenance program that does not handle embedded media pointers properly, and then subsequently re-import that changed Gedcom into PhpGedView. Under such circumstances, the media pointers within the Gedcom you exported to your off-line editing program are destroyed, and you would have to re-link all of your media files to the proper Person, Family, and Source records after you re-import the Gedcom into PhpGedView.

The Yes option tells PhpGedView to keep the existing media links so that you do not have to re-create them after you import the changed Gedcom, but this requires the off-line editing program to always produce the same Person, Family, and Source identification numbers.

Family Tree Maker is one of several off-line editing programs that do not properly handle media object pointers within the GEDCOM. Legacy, among many others, does handle these properly.

Note

In version 4.1.4 of PhpGedView and above, the "Keep Media Links" feature is only available if it applies to the GEDCOM being imported. It only shows up if all of the following are true: you are importing a GEDCOM that existed before, the new GEDCOM being imported does not have any OBJE records in it, and the database has media in it. Selecting "Keep Media Links" on a GEDCOM that has OBJE records in it will cause duplication of media items.
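
The three conditions in this note can be written out as a small Python sketch. The function and parameter names are hypothetical, and the OBJE check assumes that a record's tag is either the second token on a line or the third token when the second is an @ID@ pointer.

```python
def has_obje_records(lines):
    """True if any line carries the OBJE tag (e.g. '0 @M1@ OBJE' or '1 OBJE @M1@')."""
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        # On lines such as '0 @M1@ OBJE' the tag follows the @ID@ pointer.
        tag = parts[2] if parts[1].startswith("@") and len(parts) > 2 else parts[1]
        if tag == "OBJE":
            return True
    return False


def keep_media_links_offered(previously_imported, new_gedcom_lines, database_has_media):
    """Combine the three conditions under which the option is shown."""
    return (previously_imported
            and not has_obje_records(new_gedcom_lines)
            and database_has_media)
```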

Time Limit

Sets the maximum time the program will run continuously during import. The Import process may take quite a long time, depending on the size of your Gedcom. If the import hits the PHP time limit before completion, it will fail (you can find the limit in the PHP Info entry of the admin menu; look for the variable max_execution_time).

Somewhat counterintuitively, you need to reduce the time limit for the Import to succeed. If the time limit elapses before the Import completes, the program suspends processing and waits for your input. When you press the "Continue" button, the Import resumes, with the time period counting from scratch. In other words, if the Import would normally take 3 minutes and the PHP time limit is 60 seconds, you can enter 50 seconds as the time limit and complete the Import in 4 steps.
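
The arithmetic behind the example is a rounding-up division: 180 seconds of work split into runs of at most 50 seconds needs 4 runs. A small Python sketch, with a hypothetical function name, makes this explicit:

```python
import math


def import_steps(total_seconds: float, time_limit_seconds: float) -> int:
    """Number of runs needed when each run stops after time_limit_seconds."""
    if time_limit_seconds <= 0:
        raise ValueError("time limit must be positive")
    return math.ceil(total_seconds / time_limit_seconds)


print(import_steps(180, 50))  # 4, since 180 / 50 = 3.6 is rounded up
```

The estimate ignores any overhead of suspending and resuming, so a real import may need an extra step.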

Import Married Names

In version 4.1.4 this function is called Calculate married names, because it calculates (rather than imports) Married Names. Adding married names can be very useful, especially if you are searching for a person known to you by her (his) name after marriage (activate this option with Show Married Names On Individual List). It will also display the married names alongside the maiden names.

If you choose this option, PhpGedView will look through all of the females in your GEDCOM file and automatically create a married name subrecord in each one's Gedcom record, using the husband's name. This rule follows the English naming tradition only; for others, such as Spanish, it is probably better to add married names by hand.

For example, if Jane Doe married John Smith, she will acquire a new married name, Jane Smith, and a record in the Gedcom:

2 _MARNM Jane /Smith/

If a Married Name already exists for the person, an additional one will be created if the Married Last Name does not match the husband's last name (see Known problems below). If you have already edited Married Names, do not use this function.
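
The English-tradition rule described above can be sketched in Python. This is an illustration only, assuming that names are written Gedcom style with the surname between slashes; the function name is hypothetical and the sketch does not check for existing Married Names.

```python
import re


def married_name_line(wife_name: str, husband_name: str) -> str:
    """Build a '2 _MARNM' line from the wife's given names and the husband's surname."""
    surname = re.search(r"/([^/]*)/", husband_name)
    if surname is None:
        raise ValueError("husband's name has no /surname/ part")
    # Remove the wife's own surname and collapse any leftover whitespace.
    given = " ".join(re.sub(r"/[^/]*/", " ", wife_name).split())
    return f"2 _MARNM {given} /{surname.group(1)}/"


print(married_name_line("Jane /Doe/", "John /Smith/"))  # 2 _MARNM Jane /Smith/
```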

Change Individual ID

This tool was designed for users whose Genealogy programs use a different Gedcom ID for the individuals every time the Gedcom is exported. These changing IDs make it difficult to administer PhpGedView because the ID is how people are referenced in the program.

Many genealogy programs also use the RIN or REFN tag to give each person a unique identifier that can be used to reference the individual. This tool will replace all of the individual IDs in the Gedcom file with the value of whichever field (RIN or REFN) you specify.
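
The Python sketch below illustrates the idea of this tool. It is a simplified assumption of how such a replacement could work, not PhpGedView's implementation: it collects the chosen field from each individual's record, then rewrites every @ID@ pointer that refers to that individual. The function name is hypothetical, and individuals without the chosen field keep their original ID.

```python
import re


def replace_individual_ids(lines, tag="REFN"):
    """Replace individual IDs with the value of each person's RIN or REFN field."""
    mapping = {}
    current = None
    field = re.compile(rf"1 {re.escape(tag)} (\S+)")
    for raw in lines:
        line = raw.strip()
        start = re.match(r"0 @([^@]+)@ INDI$", line)
        if start:
            current = start.group(1)
        elif line.startswith("0 "):
            current = None  # a different kind of record has started
        elif current is not None and current not in mapping:
            found = field.match(line)
            if found:
                mapping[current] = found.group(1)

    def swap(match):
        old_id = match.group(1)
        return "@" + mapping.get(old_id, old_id) + "@"

    return [re.sub(r"@([^@]+)@", swap, raw) for raw in lines]
```

Pointers in other records, such as the WIFE line of a family, are rewritten as well, so the references stay consistent.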

Complete the Import

After you press Continue yet again, the Import process will begin. The program will read the Gedcom file, populate the database with data, calculate married names if requested, and at the end present you with statistics of the work: the number of top-level records found and the time spent processing each of them. If no errors occurred, your Family Tree is ready.

The process usually completes successfully, even if the Gedcom file does not conform strictly to the Gedcom 5.5.1 standard; in fact you can import Gedcoms from virtually any Genealogy program. The most frequently encountered problems concern time and memory, which are limitations of the webserver and PHP installation. You can adjust the time with the Time Limit parameter and configure the memory (see Configure memory usage for php).

Known Problems/Bugs

This section needs to be reviewed / corrected by knowledgeable developers.

Version 4.1

  • Import Married names will add a new Married Name if the existing Married Name does not exactly match the husband's name. Use with caution if you already have Married names and have edited them.