How to recreate the pgv news table from the news.dat backup file

From PGVWiki
Revision as of 18:48, 16 May 2010 by Momse (talk | contribs) (→‎Checking the file)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Overview

The news.dat file from a previous backup can be used to recreate the pgv_news table. Basically it involves scrubbing unnecessary information from news.dat, and importing the new file using a CSV import into pgv_news from the phpMyAdmin console. You will only need to do this if your PGV database gets hosed and you don't have a good recent backup of the tables but do have a recent backup from the PGV Admin menu. These news and journal items are not stored in the gedcom file.

Scrubbing news.dat

The bulk of the task lies in scrubbing news.dat. The first step is to copy news.dat to a new file. Let's call the new file news_clean.dat. At this point, news_clean.dat looks a little like this:

a:27:{i:36;a:5:{s:2:"id";s:2:"36";s:8:"username";s:6:"vhhawk";s:4:"date";s:10:"1232506030";s:5:
"title";s:28:"New Arkansas twigs to expand";s:4:"text";s:682:"By chance I discovered 3 new Doe 
families living in proximity in the 1860 census in Clark County, Arkansas. I also still have 
work to do to expand the Amanda and "Jonathan" Bacon family, not yet done.";}i:35;a:5:s:2:"id";
s:2:"35";s:8:"username";s:13:"crudupfamilies.ged";s:4:"date";s:10:"1231822465";s:5:"title";s:20
:"The Illinois Bacons";s:4:"text";s:488:"I have been working recently to update the Illinois 
Bacon lines through the 1930 census records. I would surely welcome any input to help me 
straighten them out. ";}i:34;a:5:s:2:"id";s:2:"34";s:8:"username";s:6:"vhhawk";s
:4:"date";s:10:"1231132454";s:5:"title";s:16:"Nobels of Texas";s:4:"text";s:148:"Currently 
crawling the censuses and the "Texas Death Index" for these people. There's more than the 
usual number of misspellings. It's challenging.";}i:33;a:5:

You can use any text editor to work on this file. Above is the first three items taken from an actual news.dat backup file, with some minor changes to the text for the sake of privacy. You want the above data in news_clean.dat to look like this after your first edit:

"36";"vhhawk";"1232506030";"New Arkansas twigs to expand";"By chance I discovered 3 new 
Doe families living in proximity in the 1860 census in Clark County, Arkansas. I also still 
have work to do to expand the Amanda and "Jonathan" Bacon family, not yet done."
"35";"crudupfamilies.ged";"1231822465";"The Illinois Bacons";"I have been working 
recently to update the Illinois Bacon lines through the 1930 census records. I would surely 
welcome any input to help me straighten them out."
"34";"vhhawk";"1231132454";"Nobels of Texas";"Currently crawling the censuses and the 
"Texas Death Index" for these people. There's more than the usual number of misspellings. 
It's challenging."

The key thing to note is that you've reduced the data to five elements for each news/journal item. Each element is delimited by double-quotes on each side, and each element is separated by the semicolon character. The end of each row has no semicolon. The five elements are the item number, the user or gedcom, the date in Unix format, the title of the item, and the text of the item. If there is a userid in field 2, then it's a user journal item. If there is a gedcom in field 2, then it is a welcome page news item. You don't have to put these in any order: the import process and the display software are able to figure out what goes where.

Embedded special characters " and ;

Notice that the above file uses the doublequote and the semicolon as delimiters. If any of these characters are embedded within the text field they must be changed. In the above, the text fields contain "Jonathan" and "Texas Death Index". Replace those with single-quotes such as 'Jonathan' and 'Texas Death Index'. In the same way, if your text field contains semicolons (;) replace those with other punctuation.

Embedded HTML and vertical spaces

If you have any hypertext links or other embedded HTML, the original news.dat file will contain interpreted HTML code. That is, a less-than symbol < will be written as an ampersand & followed by "lt;" -- you must change these back to the symbolic characters prior to import. Any vertical blank spaces should be converted to HTML paragraph < P > code.

I attempted to show those special characters here but Wiki keeps interpreting them.

Checking the file

A good way to further check your file is to log onto your site and create a new (bogus) news or journal item, exit from your site, open a phpMyAdmin page, click on your pgv_news table, click on Export, and then choose CSV format with everything else left at default. Compare your news_clean.dat file with the just exported datafile.

Importing news_clean.dat

When you're ready to import news_clean.dat, open up a phpMyAdmin session, go to pgv_news, and click on Import / CSV with all other defaults left alone. Browse to news_clean.dat, and proceed. If there are any problems with the import, it will tell you at what line number the fault occurred. Reopen news_clean.dat, navigate to that line number, and fix the problem. In phpMyAdmin, you can choose to delete previously imported lines in the pgv_news table and restart the import. If you don't do this you will get errors about reusing an index number. When the import is finally clean, you're done.