How Remote Linking Works

This page describes how remote linking of GEDCOM records works under the hood. For information on how to set up and use a new remote link on your site, go to the How to Remote Link page.

GEDCOM for Remote Links

First, let's look at what happens in the GEDCOM. According to the GEDCOM specification, the RFN tag was designed in anticipation of this purpose, so we have taken advantage of it instead of creating our own tag. Here is the definition of the RFN tag from the GEDCOM spec:

n @XREF:INDI@ INDI 
...
+1 RFN <PERMANENT_RECORD_FILE_NUMBER> 

RFN {REC_FILE_NUMBER}:= A permanent number assigned to a record that uniquely identifies it within a known file. 

PERMANENT_RECORD_FILE_NUMBER:= {Size=1:90} <REGISTERED_RESOURCE_IDENTIFIER>:<RECORD_IDENTIFIER> 
The record number that uniquely identifies this record within a registered network resource. The number will be 
usable as a cross-reference pointer. The use of the colon (:) is reserved to indicate the separation of the 
"registered resource identifier" (which precedes the colon) and the unique "record identifier" within that resource 
(which follows the colon). If the colon is used, implementations that check pointers should not expect to find a 
matching cross-reference identifier in the transmission but would find it in the indicated database within a network. 
Making resource files available to a public network is a future implementation. 

RECORD_IDENTIFIER:= {Size=1:18} An identification number assigned to each record within a specific database. The 
database to which the RECORD_IDENTIFIER pertains is indicated by the REGISTERED_RESOURCE_NUMBER which precedes the 
colon (:). If the RECORD_IDENTIFIER is not preceded by a colon, it is a reference to a record within the current 
GEDCOM transmission. 

REGISTERED_RESOURCE_IDENTIFIER:= {Size=1:25} This is an identifier assigned to a resource database that is available 
through access to a network. This is for future GEDCOM releases.

In summary, the RFN tag allows us to say that the given record is a resource found somewhere else. The value of the RFN tag has two parts, the Registered Resource ID and the Record ID, separated by a colon (:). For example: 1 RFN S1:I1. The Record ID (I1) is pretty self-explanatory, but what about the Registered Resource ID (S1)? It appears to be an ID for some sort of genealogical resource. That resource could be local or remote, but it is not in the same GEDCOM. So what does the ID (S1) point to? We decided that the best way to store this information about another genealogical resource would be as a source (SOUR) record. So the Resource ID (S1) becomes a pointer to information about a remote resource stored in a source (SOUR) record.
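
As a rough illustration of the two-part value, the split at the colon can be sketched in Python. This is not PhpGedView's own code (PGV is written in PHP); the names RemoteRef and split_rfn are hypothetical and exist only for this example.

```python
from typing import NamedTuple, Optional


class RemoteRef(NamedTuple):
    """The two parts of an RFN value (hypothetical helper type)."""
    resource_id: Optional[str]  # part before the colon, e.g. "S1"; None if no colon
    record_id: str              # part after the colon, e.g. "I1"


def split_rfn(value: str) -> RemoteRef:
    """Split an RFN value at the first colon."""
    value = value.strip()
    resource_id, sep, record_id = value.partition(":")
    if not sep:
        # No colon: per the spec text above, the value refers to a record
        # within the current GEDCOM transmission.
        return RemoteRef(None, value)
    return RemoteRef(resource_id, record_id)


ref = split_rfn("S1:I1")
print(ref.resource_id, ref.record_id)
```

A value without a colon is treated as a plain local record ID, following the RECORD_IDENTIFIER definition quoted above.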

Let's look at some GEDCOM now. Let's say that someone in my GEDCOM (I100) was recently married, but all of the spouse's information is already available on another website (http://remotesite/pgv), where her ID is I555. We don't want to copy all of her information, along with all of her pedigree, into our GEDCOM; we just want to link to it. The first thing we do is create a source record for the remote site:

0 @S100@ SOUR
1 TITL Title of the remote site
1 URL http://remotesite/pgv/genservice.php?wsdl
1 _DBID gedcom.ged

Next we create a stub for the remote person that we can use to link to our local person. Only the RFN line is required, but we also add a 1 SOUR citation to the person so that other GEDCOM programs will show something intelligent about the remote link:

0 @I1111@ INDI
1 RFN S100:I555
1 SOUR @S100@
2 PAGE I555
1 FAMS @F59@

The purpose of creating the stub record is to make these changes as compatible as possible with other GEDCOM programs. First, the stub record ensures that we have a local record for the remote ID. Second, it makes sure that we don't have IDs like "S1:I100" in our family records.

We also need a local family record to link them up:

0 @F59@ FAM
1 HUSB @I100@
1 WIFE @I1111@

PhpGedView will recognize the RFN tag in the stub INDI record, split its value at the colon (:), and look up S100 to find the location and details of the remote genealogical resource. It then contacts the remote resource and asks for the record for I555. Finally, it merges the record of I555 with the stub record to create the record used to display the data on the screen.
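
The lookup step can be sketched as follows, in Python for illustration only. The helper names level1_values and resolve_stub, and the idea of passing records as a dictionary keyed by ID, are assumptions made for this example and are not part of PhpGedView.

```python
def level1_values(record: str) -> dict:
    """Map each level-1 tag of a GEDCOM record to its value (first occurrence wins)."""
    values = {}
    for line in record.strip().splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) >= 2 and parts[0] == "1":
            values.setdefault(parts[1], parts[2] if len(parts) > 2 else "")
    return values


def resolve_stub(stub: str, records: dict) -> dict:
    """Work out where the record behind a stub's RFN line lives.

    `records` maps IDs such as "S100" to raw GEDCOM text (an assumption
    of this sketch; PGV reads them from its own storage).
    """
    rfn = level1_values(stub)["RFN"]
    source_id, _, remote_id = rfn.partition(":")
    source = level1_values(records[source_id])
    return {
        "url": source.get("URL", ""),     # where the remote service lives
        "dbid": source.get("_DBID", ""),  # which dataset on that service
        "remote_id": remote_id,           # which record to ask for
    }


records = {
    "S100": "0 @S100@ SOUR\n"
            "1 TITL Title of the remote site\n"
            "1 URL http://remotesite/pgv/genservice.php?wsdl\n"
            "1 _DBID gedcom.ged",
}
stub = "0 @I1111@ INDI\n1 RFN S100:I555\n1 SOUR @S100@\n2 PAGE I555\n1 FAMS @F59@"
print(resolve_stub(stub, records))
```

The sketch stops at the point where the remote request would be made; the actual request and the merge are handled by PGV itself.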

We were able to accomplish all of this by only adding one new GEDCOM tag, the _DBID tag. This tag has two purposes. The first purpose is to tell PGV that the SOUR record contains information about a genealogical resource that we will be linking to. The second purpose is to store a database ID if the remote resource located at the given URL hosts multiple genealogical datasets. In PhpGedView these equate to GEDCOM files and you would enter the filename of the GEDCOM you are linking to.

We also want to be able to link to people in other GEDCOMs hosted on the same site. This can be accomplished by leaving out the URL field in the source record or by leaving it blank. In this case, however, you must fill in the _DBID field:

0 @S100@ SOUR
1 TITL Title of the remote site
1 _DBID gedcom.ged
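
The rules above for telling the kinds of source record apart can be summarized in a short Python sketch. The function name classify_source and its three return labels are hypothetical, and the sketch assumes that the mere presence of a _DBID line marks a source as a link target; PhpGedView's actual checks may differ.

```python
def classify_source(source_record: str) -> str:
    """Return "remote", "local", or "plain" for a SOUR record."""
    tags = {}
    for line in source_record.strip().splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) >= 2 and parts[0] == "1":
            tags[parts[1]] = parts[2].strip() if len(parts) > 2 else ""
    if "_DBID" not in tags:
        return "plain"   # an ordinary source, not a link target
    if tags.get("URL"):
        return "remote"  # URL present: the resource is on another site
    return "local"       # URL missing or blank: another GEDCOM on this site


remote = ("0 @S100@ SOUR\n1 TITL Title of the remote site\n"
          "1 URL http://remotesite/pgv/genservice.php?wsdl\n1 _DBID gedcom.ged")
local = "0 @S100@ SOUR\n1 TITL Title of the remote site\n1 _DBID gedcom.ged"
print(classify_source(remote), classify_source(local))
```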

SOAP

PhpGedView uses SOAP (Simple Object Access Protocol) for all of its communication with remote sites. SOAP is an XML-based standard meant to be a cross-platform, implementation-independent way for services to communicate with each other. SOAP and its related technologies are often referred to by the term Web Services.

For information on the SOAP API in PhpGedView, please see the API documentation here: http://www.phpgedview.net/devdocs/webservice_api.php

You can also get the WSDL file for any PGV site by going to the URL: http://yoursite/pgv/genservice.php?wsdl

What Happens in PhpGedView

When PhpGedView comes across a record with an RFN, it splits the RFN at the colon (:) and gets an instance of a ServiceClient object from the ID before the colon. The ServiceClient object (serviceclient_class.php) handles all of the communication with the remote site and links. You can get a ServiceClient for a given ID by using the getInstance() static function:

$SClient = ServiceClient::getInstance('S1');

For handling local resources we have the LocalClient object (localclient_class.php), which extends ServiceClient. Based on the type of resource specified by the given ID (S1), the getInstance method returns either a ServiceClient or a LocalClient; both implement the same API.

For a stub record like this:

0 @I1111@ INDI
1 RFN S100:I555
1 SOUR @S100@
2 PAGE I555
1 FAMS @F59@

PGV will retrieve the remote record for the linked individual and merge it with the local stub. Any ID references in the remote record will be converted to the form SS:ID (the source ID, a colon, then the remote record ID) to signal that they also point to remote records. So if the remote record looks like this:

0 @I555@ INDI
1 NAME Jane /Smith/
2 GIVN Jane
2 SURN Smith
1 SEX F
1 BIRT
2 DATE 18 MAR 1930
1 FAMC @F345@

After it is merged with the local stub record it will look like this:

0 @I1111@ INDI
1 RFN S100:I555
1 SOUR @S100@
2 PAGE I555
1 FAMS @F59@
1 NAME Jane /Smith/
2 GIVN Jane
2 SURN Smith
1 SEX F
1 BIRT
2 DATE 18 MAR 1930
1 FAMC @S100:F345@
1 CHAN
2 DATE date record was downloaded

Notice how the FAMC link changed to S100:F345. We also set the change record to the date the record was merged. This merged record is then stored in the database so that we have a cache of the data locally and don't have to go to the remote site all of the time. The only thing stored in the GEDCOM file is the stub record.
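
The merge described above can be sketched in Python. This is an illustration under stated assumptions, not PGV's implementation: merge_remote, gedcom_date, and the XREF pattern are hypothetical names, and the sketch simply appends the remote lines to the stub without trying to reconcile duplicate tags.

```python
import re
from datetime import date

# Matches @ID@ pointers that do not yet contain a colon.
XREF = re.compile(r"@([^@:\s]+)@")
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def gedcom_date(d: date) -> str:
    """Format a date in GEDCOM style, e.g. 18 MAR 1930."""
    return f"{d.day} {MONTHS[d.month - 1]} {d.year}"


def merge_remote(stub: str, remote: str, source_id: str, today: date) -> str:
    """Append a remote record's lines to a local stub, prefixing remote IDs."""
    merged = stub.strip().splitlines()
    # Skip the remote level-0 line: the merged record keeps the stub's own ID.
    for line in remote.strip().splitlines()[1:]:
        merged.append(XREF.sub(lambda m: f"@{source_id}:{m.group(1)}@", line))
    # Record when the merge happened so staleness can be checked later.
    merged.append("1 CHAN")
    merged.append("2 DATE " + gedcom_date(today))
    return "\n".join(merged)


stub = "0 @I1111@ INDI\n1 RFN S100:I555\n1 SOUR @S100@\n2 PAGE I555\n1 FAMS @F59@"
remote = ("0 @I555@ INDI\n1 NAME Jane /Smith/\n2 GIVN Jane\n2 SURN Smith\n"
          "1 SEX F\n1 BIRT\n2 DATE 18 MAR 1930\n1 FAMC @F345@")
print(merge_remote(stub, remote, "S100", date.today()))
```

Only lines that come from the remote record are rewritten, so the stub's own pointers (@S100@, @F59@) stay local.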

The date in the CHAN record is used to decide whether we should check the remote site for updates. Currently the remote site is checked every 14 days if the record is accessed (eventually this will need to be made a configurable parameter, but for now it is hard-coded). After 14 days the web service is contacted again to see if there were any updates, and the CHAN record is updated again in the database.
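
The refresh rule amounts to a simple date comparison. The Python sketch below is illustrative only: needs_refresh and REFRESH_AFTER are hypothetical names, and treating exactly 14 days as stale is an assumption, since the text does not say which side of the boundary PGV uses.

```python
from datetime import date, timedelta

# The hard-coded interval described above.
REFRESH_AFTER = timedelta(days=14)


def needs_refresh(chan_date: date, today: date) -> bool:
    """True if the cached record is old enough to re-check the remote site.

    Assumption: a record that is exactly 14 days old counts as stale.
    """
    return today - chan_date >= REFRESH_AFTER


print(needs_refresh(date(2024, 1, 1), date(2024, 1, 10)))
print(needs_refresh(date(2024, 1, 1), date(2024, 2, 1)))
```

Because the check runs only when a record is accessed, a record that nobody views is never re-downloaded, no matter how old its cached copy is.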

So things are looking good, except now we need to do something with the FAMC @S100:F345@ link. We don't want to create stub records every time we come across a remote link, and we don't want to just download everything from the remote site. So we download records one at a time as they are accessed from within our local PGV. PGV is set to recognize IDs with colons, like S100:F345, and download F345 from the remote S100 site. So let's suppose that the remote F345 record looks like this:

0 @F345@ FAM
1 HUSB @I500@
1 WIFE @I501@
1 CHIL @I555@
1 CHIL @I556@

When PGV downloads it, it converts it to the following record and stores it in the database:

0 @S100:F345@ FAM
1 HUSB @S100:I500@
1 WIFE @S100:I501@
1 CHIL @I1111@
1 CHIL @S100:I556@

PhpGedView does the same thing for the other records (S100:I500, S100:I501, etc.), converting the IDs and caching them in the database as it needs them.
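
The ID conversion for a downloaded record, including the mapping of I555 back to its local stub I1111, can be sketched in Python. The function localize and the stubs dictionary are assumptions of this example; how PGV actually finds existing stubs is not described here.

```python
import re

# Matches @ID@ pointers that do not yet contain a colon.
XREF = re.compile(r"@([^@:\s]+)@")


def localize(record: str, source_id: str, stubs: dict) -> str:
    """Rewrite every ID in a downloaded record.

    `stubs` maps remote IDs that already have a local stub to the stub's
    ID, e.g. {"I555": "I1111"}. All other IDs get the source prefix.
    """
    def rewrite(match: re.Match) -> str:
        remote_id = match.group(1)
        if remote_id in stubs:
            return f"@{stubs[remote_id]}@"
        return f"@{source_id}:{remote_id}@"

    return "\n".join(XREF.sub(rewrite, line)
                     for line in record.strip().splitlines())


family = ("0 @F345@ FAM\n1 HUSB @I500@\n1 WIFE @I501@\n"
          "1 CHIL @I555@\n1 CHIL @I556@")
print(localize(family, "S100", {"I555": "I1111"}))
```

Pointing the CHIL line at the stub rather than at S100:I555 keeps a single local identity for the linked person.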

For local references to GEDCOMs in the same PGV site, we don't need to cache the records locally, since they are already accessible. Instead we just look up their ID in the other GEDCOM file and load them. We still replace the IDs after they are loaded so that we know where they came from, but we do not cache them again in the database.