Difference between revisions of "Gedcom Import Purge"


Revision as of 12:14, 9 August 2019

Gedcom Import Purge
Summary: Causes the Gedcom Import process to delete Medialinks records that have been deleted in the source database, and to retain some Places records that otherwise would be purged.
Mod Updated: 15 May 2018
Download link: v12.0.0.4.zip (TNG 12.0)
Author(s): Robin Richmond
Homepage: Robin Richmond's Genealogy Database
Mod Support: My Mod Support form or TNG Community Forums
Contact Developer: My Mod Support form
Latest Mod: v10.1.0.3 & v12.0.0.4
Min TNG V: 10.1
Max TNG V: at least 12.0
Files modified: admin_dataimport.php, admin_gedimport.php, gedimport_trees.php, gedimport_misc.php, English/data_help.php, English/cust_text.php; installs rrgedcomimportpurge_dbsetup.php


Purpose of the Mod

This mod changes the Gedcom Import process to

  1. Add a flag (a new database field) to Medialinks that are created by Gedcom Imports,
  2. Optionally, purge Medialinks that have that flag, but retain Medialinks without the flag (i.e., those that were created through TNG data entry), and
  3. Retain Places with Placelevels and/or Medialinks.

(The Medialink purge can be prevented if the user unchecks a new "Purge Medialinks" checkbox on the Gedcom Import kickoff form.)
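The flag-and-purge idea described above can be sketched in a few lines. This is only an illustration, not the mod's actual code: it uses Python's built-in sqlite3 instead of TNG's MySQL database, the flag column name `from_gedcom` is hypothetical (the page does not name the field), and the `purge` argument stands in for the "Purge Medialinks" checkbox.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE tng_medialinks (
           medialinkID INTEGER PRIMARY KEY,
           mediaID     INTEGER,
           personID    TEXT,
           eventID     TEXT,
           from_gedcom INTEGER NOT NULL DEFAULT 0  -- hypothetical flag column
       )"""
)

def add_link(media_id, person_id, event_id="", from_gedcom=False):
    """Create a Medialink; Gedcom Imports set the flag, data entry does not."""
    conn.execute(
        "INSERT INTO tng_medialinks (mediaID, personID, eventID, from_gedcom) "
        "VALUES (?, ?, ?, ?)",
        (media_id, person_id, event_id, int(from_gedcom)),
    )

def purge_imported_links(purge=True):
    """Delete only flagged Medialinks; return how many rows were removed."""
    if not purge:  # the checkbox was unchecked
        return 0
    cur = conn.execute("DELETE FROM tng_medialinks WHERE from_gedcom = 1")
    return cur.rowcount

add_link(1, "I1", from_gedcom=True)      # created by an earlier Gedcom Import
add_link(2, "I2", from_gedcom=True)      # created by an earlier Gedcom Import
add_link(3, "Boston, Massachusetts")     # created manually (a Place link)

removed = purge_imported_links(purge=True)
remaining = [row[0] for row in conn.execute("SELECT personID FROM tng_medialinks")]
```

In this toy run, the two flagged links are purged and the manually created Place link is left alone.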

Without this mod, the Gedcom Import Process:

  • Leaves all Medialinks intact, including those that have been removed from the source database, and
  • May purge some Places that have Placelevels and/or Medialinks that cannot be replaced through the Gedcom Import.

Background

When the "pristine" (unchanged by a mod) Gedcom Import process starts, and you tell it to replace "all current data", TNG deletes all People, Families, Events, Notes, Citations, Sources, and Repositories in that tree, because TNG assumes that they will all be replaced from data in the Gedcom file. But it does not delete any Media Links at all, presumably because

  • Many TNG admins don't count on Gedcom to load (all) of their media items and links, and instead manually create Media items and links, and
  • Gedcom does not support Medialinks for places or cemeteries, and any such links in the database must have been created manually, using TNG data entry screens.

Note that

  1. Media, Source, People, and Family records all have Gedcom IDs that should be (and almost always are) consistent from one Gedcom import to the next,
  2. Medialink records form a link between a Media record and another object - specifically, a Source, Person, Family, Event, Place, or Cemetery,
  3. In Medialink records, (1) the mediaID field, (2) the awkwardly-named personID field (which might be a sourceID, personID, familyID, cemeteryID, or even a Placename), and (3) the eventID form a unique key, ensuring that existing Source, People, and Family Medialinks to a given Media item are overwritten during a new Gedcom Import.

Thus
Problem 1 - The Gedcom Import process retains Media links that have been deleted from the source database that produced the Gedcom, forcing the site Admin to delete those Medialinks manually (assuming that the Admin realizes that they exist).
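
A small sketch may make Problem 1 concrete. It assumes nothing about TNG's real import code: it uses Python's sqlite3, a table reduced to the three key fields named above, and a hypothetical `imported_in` column that only records which import run last wrote each row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE tng_medialinks (
           mediaID     INTEGER,
           personID    TEXT,
           eventID     TEXT,
           imported_in INTEGER,               -- hypothetical, for illustration
           UNIQUE (mediaID, personID, eventID)
       )"""
)

def gedcom_import(run_no, links):
    """Write each link; a row with the same unique key is overwritten."""
    for media_id, person_id, event_id in links:
        conn.execute(
            "INSERT OR REPLACE INTO tng_medialinks VALUES (?, ?, ?, ?)",
            (media_id, person_id, event_id, run_no),
        )

# First import: media item 10 is linked to persons I1 and I2.
gedcom_import(1, [(10, "I1", ""), (10, "I2", "")])
# The link to I2 is then deleted in the source database, so the
# second Gedcom file only carries the link to I1.
gedcom_import(2, [(10, "I1", "")])

rows = conn.execute(
    "SELECT personID, imported_in FROM tng_medialinks ORDER BY personID"
).fetchall()
```

After the second run, the I1 link has been overwritten by run 2, while the I2 link from run 1 is still in the table even though it no longer exists in the source database.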

(Custom Events do not retain an eventID from one Gedcom Import to the next, so, for the moment, I cannot explain why old Custom Event medialinks do not survive a new Gedcom Import, but they do not. This result is surely related to the fact that, in Medialinks to Events, the primary link relationship is to the personID or familyID (in the Medialinks.personID field), and eventIDs, in effect, just modify those Medialinks.)

Also, the "pristine" Gedcom Import process purges Places that do not have geocodes or descriptive notes, presumably because it assumes that Places without such data will be replaced (as needed) by the data in the Gedcom file. However, it does not pay attention to the Placelevel field, or to Medialinks. As a result,
Problem 2 - The Gedcom Import process deletes Places that have Placelevels or Medialinks (but no geocodes or notes), thus deleting data that cannot be replaced by data in the Gedcom file.
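
The difference between the two purge rules can be written as two small predicates. This is a sketch under assumptions: the field names (`latitude`, `longitude`, `notes`, `placelevel`, `medialink_count`) are illustrative stand-ins, not confirmed TNG column names, and the real import code may test these conditions differently.

```python
from dataclasses import dataclass

@dataclass
class Place:
    name: str
    latitude: str = ""
    longitude: str = ""
    notes: str = ""
    placelevel: int = 0        # 0 means "no Placelevel set" in this sketch
    medialink_count: int = 0   # number of Medialinks pointing at this Place

def pristine_keeps(p: Place) -> bool:
    """Pristine rule: keep only Places with geocodes or descriptive notes."""
    return bool(p.latitude or p.longitude or p.notes)

def mod_keeps(p: Place) -> bool:
    """Rule with this mod: also keep Places with a Placelevel or Medialinks."""
    return pristine_keeps(p) or p.placelevel > 0 or p.medialink_count > 0

places = [
    Place("Paris, France", latitude="48.85", longitude="2.35"),
    Place("Old Mill Cemetery", medialink_count=2),
    Place("Ohio, USA", placelevel=2),
    Place("Unknown"),
]

pristine_kept = [p.name for p in places if pristine_keeps(p)]
mod_kept = [p.name for p in places if mod_keeps(p)]
```

Here the pristine rule keeps only the geocoded Place, while the modified rule also keeps the two Places whose Placelevel or Medialinks could not be rebuilt from the Gedcom file.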

Citation Medialinks

(This section describes a problem that was introduced in TNGv12 and is not yet addressed by this mod.)

TNGv12 introduced Citation Medialinks, resulting in some unintended consequences. The TNG data entry process does not provide a way for users to link Citations to Media items, and, until TNGv12, the Gedcom Import ignored Citation Media. In most cases, ignoring Citation Media was OK because those media links (which are Gedcom OBJE references) were duplicated in higher-level records; that is, in the parent Event record and/or the Event's parent Person or Family record.

It is important to note that

  • In many Genealogy applications, one citation can be applied to multiple events, even in multiple records. That is, a citation that describes the birth, death, and marriage of a husband and wife can be linked to the husband's name, birth, and death events, the wife's name, birth, and death events, plus their Family record's marriage event.
  • In contrast, in Gedcom, and in TNG, citations are strictly subordinate to a parent Event record. Consequently, the citation mentioned above would become seven unique Citation records, each with a unique citationID. Also, if the citation is linked to a Media item, there would also be seven essentially-duplicate Citation Medialinks.
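
The fan-out from one shared citation to seven records can be counted out explicitly. The record IDs (`I1`, `I2`, `F1`), source ID (`S1`), and media ID (`42`) below are made-up examples, and the dictionaries are only a stand-in for the Citation and Medialink tables.

```python
from itertools import count

# The seven places the one shared citation is attached to in the source application.
attachments = [
    ("I1", "NAME"), ("I1", "BIRT"), ("I1", "DEAT"),   # husband
    ("I2", "NAME"), ("I2", "BIRT"), ("I2", "DEAT"),   # wife
    ("F1", "MARR"),                                   # their Family record
]

next_citation_id = count(1)

# One Citation record per parent event, each with its own citationID.
citations = [
    {"citationID": next(next_citation_id), "record": rec, "event": ev, "sourceID": "S1"}
    for rec, ev in attachments
]

# If the citation carries one Media item, each Citation gets its own Medialink,
# with the citationID stored in the personID field.
MEDIA_ID = 42
citation_medialinks = [
    {"mediaID": MEDIA_ID, "personID": str(c["citationID"])} for c in citations
]

distinct_media = {link["mediaID"] for link in citation_medialinks}
```

The result is seven Citation records and seven Medialinks that all point at the same single Media item.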

Citation Medialinks in TNG present several problems:

  1. Since Citations do not have Gedcom IDs, they get new citationIDs in each Gedcom Import, and old Citation Medialinks survive a new Gedcom Import.
    • Significantly, in Citation Medialinks, the citationID is the primary link destination. That is, unlike Event Medialinks, Citation Medialinks have the citationID in the field medialinks.personID. Citation Medialinks might not be retained, and might not present the other problems below, if citationID were a separate field in the Medialinks table, making Citation Medialinks subordinate to the Citation's parent eventID and grandparent personID or familyID.
  2. TNGv12 also introduced Citation Medialinks in the Person Profile's Source Citation list. In many cases, this is a nice new feature, but
    • The compiled list of Source Citations in the Person Profile normally merges identical Citations, and
    • The Citation Medialink that has been added to the Person Profile Source Citations is represented by the unique medialinkID, not the MediaID. This makes all citations that have Medialinks unique, resulting in duplicate Source Citations.
  3. For Media items that link to a Citation, the "Link" in Media search results is now dominated by Citation links. These search results represent each Medialink with the linked recordID. For Sources, People, Families, and Places, the recordID is generally meaningful to the user. But citationIDs do not carry meaning on their own, and cannot be looked up, so there is really no way to know what a citationID represents.
    • This applies to the end-user media search program, browsemedia.php, and to the Admin search program admin_media.php.
  4. When Media items are displayed (by showmedia.php), the web page tries to list all records that link to that media item - that is, as with Media search results, it lists the media item's medialinks. But unlike the search programs, showmedia.php turns the recordIDs into hyperlinks to that record. Since there is no way to link to a Citation, citationIDs sometimes (but evidently not always) break that list of links.
    • (I have not figured out exactly why this occurs with some media items and not others.)
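
The duplicate Source Citations described in item 2 come down to which ID takes part in the "are these citations identical?" comparison. The sketch below is not the getCitations function from personlib.php; it is a hypothetical stand-in that merges citations on a key, to show how the choice between medialinkID and mediaID changes the outcome.

```python
def merge_citations(citations, media_key):
    """Merge citations that agree on source, page, and the chosen media ID."""
    merged = {}
    for c in citations:
        key = (c["sourceID"], c["page"], c[media_key])
        merged.setdefault(key, c)   # keep the first citation seen for each key
    return list(merged.values())

# Three otherwise identical citations, each with its own Medialink
# to the same Media item (all IDs are made-up examples).
citations = [
    {"sourceID": "S1", "page": "p. 12", "medialinkID": 101, "mediaID": 42},
    {"sourceID": "S1", "page": "p. 12", "medialinkID": 102, "mediaID": 42},
    {"sourceID": "S1", "page": "p. 12", "medialinkID": 103, "mediaID": 42},
]

by_medialink = merge_citations(citations, "medialinkID")
by_media = merge_citations(citations, "mediaID")
```

Keyed on medialinkID, all three citations stay separate; keyed on mediaID, they collapse into one, which is the behavior the Person Profile had before Citation Medialinks were added.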

Possible solutions:

Partial solutions include:

  • Manually purge Citation Medialinks before a Gedcom Import with the simple SQL query
    DELETE FROM tng_medialinks WHERE linktype=''
  • This mod could purge all Citation Medialinks before each Gedcom Import.
  • Patch the getCitations function in personlib.php to remove linked media items from citations, or perhaps to change the link's ID from the medialinkID to the mediaID.
  • Patch showmedia.php to handle Citation Medialinks
  • Patch browsemedia.php and admin_media.php to sort Citation Medialinks to the end of the links, or perhaps to omit Citation Medialinks.
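
Because the manual purge is a destructive query, it may be worth counting the affected rows before deleting them. The sketch below uses Python's sqlite3 rather than TNG's MySQL database, and both linktype values (`CITATION_LINKTYPE` and `"PERSON"`) are placeholders, not values confirmed by this page; check what your own tng_medialinks table actually stores for Citation Medialinks before running anything similar.

```python
import sqlite3

CITATION_LINKTYPE = "CITATION"  # placeholder; verify the real value in your table

def purge_citation_links(conn, linktype, dry_run=True):
    """Count the Medialinks with the given linktype; delete them unless dry_run."""
    (matching,) = conn.execute(
        "SELECT COUNT(*) FROM tng_medialinks WHERE linktype = ?", (linktype,)
    ).fetchone()
    if not dry_run:
        with conn:  # commits on success, rolls back on error
            conn.execute("DELETE FROM tng_medialinks WHERE linktype = ?", (linktype,))
    return matching

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tng_medialinks ("
    "medialinkID INTEGER PRIMARY KEY, mediaID INTEGER, personID TEXT, linktype TEXT)"
)
conn.executemany(
    "INSERT INTO tng_medialinks (mediaID, personID, linktype) VALUES (?, ?, ?)",
    [(42, "I1", "PERSON"), (42, "17", CITATION_LINKTYPE), (42, "18", CITATION_LINKTYPE)],
)

would_delete = purge_citation_links(conn, CITATION_LINKTYPE, dry_run=True)
deleted = purge_citation_links(conn, CITATION_LINKTYPE, dry_run=False)
left = conn.execute("SELECT COUNT(*) FROM tng_medialinks").fetchone()[0]
```

The dry run reports how many rows would go without touching the table; only the second call removes them, leaving the non-Citation link in place.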

More comprehensive workarounds:

  • After a Gedcom Import, remove all Citation Medialinks with the SQL query shown above
  • Modify the Gedcom Converter mod to remove Citation Media references when the parent or grandparent events have the same Media reference, or to remove Citation Media altogether.

The ultimate solution, perhaps:

  • Remove the new TNG Gedcom Import code that creates Medialinks for Citation Media.

Conflicts And Dependencies

No known conflicts. This mod counts on the presence of the Blue Info Button mod to format small help icons on the Gedcom Import kickoff screen, but Blue Info Button is not strictly required. (See the visualizations below.)

Related Mods

  • Blue Info Button is not strictly required, but it is certainly recommended and has a very small footprint.
  • If the optional Show Mod Names mod is installed, this mod will utilize its functionality.
  • Gedcom Import Mediatype and Gedcom Import Monitor are related only in that they also affect the Gedcom Import kickoff form and the Gedcom Import process. Aspects of those mods are coordinated with this mod (Gedcom Import Purge), but there are no dependencies between them and this mod.

Requirements

  • A working TNG installation.
  • An installed current version of the Mod Manager.
  • You should back up the files listed in the panel on the right.

Installation

  1. Remove and delete the previous version of this mod.
  2. Back up the files updated by this mod. They are listed in the panel at the upper right.
  3. Download the .zip file, and extract its .cfg file and subfolder to the mods folder.
  4. Follow the normal automated installation for Mod Manager, as shown in the example Mod Manager - Installing Config Files.
  5. You must also run the database setup program that is installed by this mod. The only link to the database setup program is in this mod's Description field in the Mod Manager. The database setup program creates the database field that is used to flag Medialinks that are created by the Gedcom Import process. If you fail to run that setup program, your next Gedcom Import will encounter a fatal database error.
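
What a setup step of this kind has to do can be sketched as an "add the column only if it is missing" check. This is not the code of rrgedcomimportpurge_dbsetup.php: it uses Python's sqlite3 (so `PRAGMA table_info` replaces whatever MySQL lookup the real program uses), and the column name `from_gedcom` is hypothetical.

```python
import sqlite3

FLAG_COLUMN = "from_gedcom"  # hypothetical name for the mod's flag field

def ensure_flag_column(conn):
    """Add the flag column if it is missing; return True if it was added."""
    columns = [row[1] for row in conn.execute("PRAGMA table_info(tng_medialinks)")]
    if FLAG_COLUMN in columns:
        return False
    conn.execute(
        f"ALTER TABLE tng_medialinks ADD COLUMN {FLAG_COLUMN} INTEGER NOT NULL DEFAULT 0"
    )
    return True

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tng_medialinks ("
    "medialinkID INTEGER PRIMARY KEY, mediaID INTEGER, personID TEXT)"
)
conn.execute("INSERT INTO tng_medialinks (mediaID, personID) VALUES (1, 'I1')")

added_first_time = ensure_flag_column(conn)
added_second_time = ensure_flag_column(conn)
existing_flags = [row[0] for row in conn.execute(f"SELECT {FLAG_COLUMN} FROM tng_medialinks")]
```

Running the check twice is harmless, and rows that existed before the column was added start out unflagged.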

In the event of a problem

  1. Try using the Mod Manager Remove capability
  2. Contact me through My Mod Support form.

Visualization of this Mod

In the Mod Manager, After Installation, showing the database setup hyperlink.
Gedcom import purge-modman.png
You must follow the hyperlink shown above to run the setup program that creates the database field through which TNG remembers which media links were created by a Gedcom Import. The program that installs the new database field looks like this:
Gedcom import purge-createdb.png
Admin >> Import/Export >> Import screen AFTER INSTALLATION:
Gedcom import mods-after.png
The visualization above shows 5 (well, really 6) installed mods:
  1. Changes from Gedcom Import Purge (this mod) are outlined in red.
  2. Changes from Gedcom Import Mediatype (which is independent of this mod) are outlined in orange.
  3. Changes from Gedcom Import Monitor (which is independent of this mod) are outlined in green.
  4. The "Show Mod Names" button installed by the optional Show Mod Names mod is outlined in brown. Most of my mods "register" themselves so that Show Mod Names can tell you (or, potentially, me) which of my mods affect the current program, and which version of each mod is installed. (If you choose to install Show Mod Names, and use its default mod parameters, that button is visible only in Admin programs, not in any end-user programs.)
  5. The optional but recommended Blue Info Button mod formats help links as shown in the visualization. Without Blue Info Button, the help links would just be underlined question marks.
  6. The visualization also is affected by the Gedcom Converter mod, which creates the "Gedcom Converter" tab up in the TNG Tab menu. But Gedcom Converter is otherwise unrelated to this mod (Gedcom Import Purge).
As the Gedcom Import starts:
Gedcom import purge-purge.png

Note that the Gedcom Import process will not purge any medialinks the first time you run it after installation, since the new database field hasn't yet been populated by flags that say that the Medialink was created by a Gedcom Import.

Revision History

*** The latest version of the mod is at the top of this table ***
Mod Version | TNG Version | Date | Note
12.0.0.4 | 12.0+ | 15 May 2018 | No functional changes; made compatible with TNGv12.
10.1.0.3p | 10.1-11.1 | 26 Mar 2017 | Removed the second line from the cust_text.php target location search string.
10.1.0.3 | 10.1-11.1 | 26 Mar 2017 | A technical update that, mostly, just makes Show Mod Names optional, and avoids an installation conflict introduced by TNGv11.0.1. That is, this version omits the adminlib.php patch that was part of v2 of this mod, and changes this mod's database setup program so that it works independently of that adminlib.php patch (which was fixed in TNGv11.0.1).
10.1.0.2 | 10.1-11.0.1 | 25 May 2016 | Removed an unneeded <script> element that incorrectly referenced an external file, and that - under rare circumstances I still don't understand - could cause the database setup program to log you out from your TNG session. Also fixes a similarly incorrect <script> element in adminlib.php.
10.1.0.1b | 10.1-11.0 | 25 Mar 2016 | Fixed an error in a JavaScript warning. Cleaned up the code. Changed the Mod Parameters to strings rather than boolean values to be more tolerant of data entry errors.
10.1.0.1a | 10.1-10.1.3 | 21 Feb 2016 | No new end-user functionality. Now depends on Show Mod Names v2+.
10.1.0.1 | 10.1-10.1.3 | 6 Feb 2016 | New mod.

Sites using this mod

If you download and install this mod, please add your site to the table below.

URL | User | Note | Mod-Version | TNG-Version | User-language
Robin Richmond's Genealogy Database - admin function; not visible. | Robin Richmond | Mod developer | 10.1.0.3 | 11.1.0 | English
Hooley Family Links | Rick Hooley | Public/Private | See Here | See Here | EN