Gedcom Data Model

From TNG_Wiki
This page is not yet complete.


Gedcom (GEnealogical Data COMmunication) is a widely used and reasonably thorough text file format for sharing data among genealogy applications. It was originally meant to be nothing more than a file format for communication, not a data model. But it is so specific about the structure of data and the meaning of all data elements that it unquestionably defines a (hierarchical) data model. All of its record structures are tagged, and those tags have specific meanings (e.g. SOUR identifies a data source). Gedcom doesn't just support the use of tags; it specifies which tags represent which specific sets of data. So in practice it does serve as a data model, since certain data formats, values, and relationships must be assumed for data to be shared effectively.

I will use the terms “object” and “record” interchangeably in this article, as well as “object type” and “record type”.

Object Types

Object types are Gedcom's level 0 records:

People (which Gedcom calls "Individuals", INDI), Families (FAM), Sources (SOUR), Repositories (REPO), Notes (NOTE), and Media Items (OBJE).
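As a rough illustration of what "level 0 record" means in practice, the sketch below counts the records of each object type in a small Gedcom file. It assumes the usual layout of a level 0 data record line, `0 @ID@ TAG`, and includes OBJE (the media item tag, which GEDCOM 5.5.1 also allows as a level 0 record) among the object types. `count_object_types` is a hypothetical helper, not part of any library, and the sample file is made up.

```python
from collections import Counter

# The six "fundamental" level 0 object types, keyed by their Gedcom tags.
# HEAD, SUBM and TRLR are also level 0 lines, but they describe the file itself.
OBJECT_TYPES = {
    "INDI": "Person (Individual)",
    "FAM": "Family",
    "SOUR": "Source",
    "REPO": "Repository",
    "NOTE": "Note",
    "OBJE": "Media Item",
}


def count_object_types(gedcom_text: str) -> Counter:
    """Count the level 0 records of each object type in a Gedcom file."""
    counts: Counter = Counter()
    for line in gedcom_text.splitlines():
        parts = line.split()
        # A level 0 data record line looks like "0 @I1@ INDI": level, ID, tag.
        if len(parts) >= 3 and parts[0] == "0" and parts[1].startswith("@"):
            if parts[2] in OBJECT_TYPES:
                counts[parts[2]] += 1
    return counts


sample = """0 HEAD
1 CHAR UTF-8
0 @U1@ SUBM
1 NAME A. Researcher
0 @I1@ INDI
1 NAME John /Smith/
0 @I2@ INDI
1 NAME Mary /Jones/
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
0 @S1@ SOUR
1 TITL 1900 US Census
0 TRLR"""

print(dict(count_object_types(sample)))  # {'INDI': 2, 'FAM': 1, 'SOUR': 1}
```

Lines at level 1 and deeper (NAME, HUSB, TITL, ...) belong to the record above them, so they are skipped; only level 0 lines start a new object.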

The File Format (Briefly)

  1. The Gedcom file format and data model are hierarchical, with six “fundamental” Gedcom object types, and several derived object types and data elements. See the article on GEDCOM for a description of the Gedcom file structure.
  2. All entities (records and fields) in a Gedcom file are identified by “tags” (similar to XML or HTML tags), which are short upper-case identifiers such as INDI, FAM, and SOUR.
  3. Each line of a Gedcom file consists of
    1. An integer that represents the level of the line, i.e. its depth in the hierarchy,
    2. For a level 0 record, an optional record ID such as @I1@,
    3. A tag, and
    4. Optionally, either a single value or a record ID that points to a level 0 record.
  4. Here's an example of a Gedcom structure that reaches level 5:
    1. A Level 0 person record (INDI tag)
      1. Level 1 – Multiple Media Items (OBJE tags)
      2. Level 1 – Multiple Personal Facts and Events (Name, Sex, Birth, Death, etc.)
        Each Event can contain
        1. Level 2 – A Place structure (a PLAC tag, with the Place Name)
          1. Level 3 – MAP tag
            1. Level 4 – LATI tag, with latitude value
            2. Level 4 – LONG tag, with longitude value
        2. Level 2 – A Date (DATE tag) with the Date value
        3. Level 2 – Multiple Source Citations (SOUR tags)
          1. Level 3 – Lines that define the Page detail and Quay (reliability) values (PAGE and QUAY tags)
          2. Level 3 – A DATA record (DATA tag)
            1. Level 4 – A TEXT line (TEXT tag) that defines one line of text
              1. Level 5 – Additional lines of text (CONT tags)
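The line layout and the example hierarchy above can be sketched as a small parser that uses each line's level number to rebuild the tree: a line at level n becomes a child of the most recent line at level n-1. This is a minimal Python sketch under simplifying assumptions. It does not join CONC/CONT text, handle character-set declarations, or validate tags, and `Node`, `parse_gedcom`, and `find` are hypothetical names. The sample record follows the structure listed above; its values are invented for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# level, optional record ID (used on level 0 records), tag, optional value
LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\w+)(?: (.*))?$")


@dataclass
class Node:
    level: int
    tag: str
    value: Optional[str] = None
    xref: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def parse_gedcom(text: str) -> List[Node]:
    """Rebuild the Gedcom hierarchy from each line's level number."""
    roots: List[Node] = []
    stack: List[Node] = []  # stack[i] holds the most recent node at level i
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError(f"not a Gedcom line: {raw!r}")
        level = int(m.group(1))
        if level > len(stack):
            raise ValueError(f"unexpected level {level}: {raw!r}")
        node = Node(level, m.group(3), m.group(4), m.group(2))
        del stack[level:]  # drop deeper nodes; stack[-1] is now the parent
        if level == 0:
            roots.append(node)
        else:
            stack[-1].children.append(node)
        stack.append(node)
    return roots


def find(node: Node, *path: str) -> Optional[Node]:
    """Follow a path of tags downward, taking the first matching child each time."""
    current: Optional[Node] = node
    for tag in path:
        current = next((c for c in current.children if c.tag == tag), None)
        if current is None:
            return None
    return current


sample = """\
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 12 MAR 1850
2 PLAC Houston, Harris County, Texas, USA
3 MAP
4 LATI N29.7604
4 LONG W95.3698
2 SOUR @S1@
3 PAGE Page 12
3 QUAY 2
3 DATA
4 TEXT Born to John and Mary Smith,
5 CONT in the city of Houston.
"""

indi = parse_gedcom(sample)[0]
print(indi.xref, indi.tag)                                       # @I1@ INDI
print(find(indi, "BIRT", "PLAC", "MAP", "LATI").value)           # N29.7604
print(find(indi, "BIRT", "SOUR", "DATA", "TEXT", "CONT").value)  # in the city of Houston.
```

A stack indexed by level is enough because Gedcom levels can only go down one step at a time, while they may jump back up any number of levels.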

    Some Derived Object Types

    Events

    1. As noted in the previous section, the term "Event" is really a common alias for the Gedcom concepts "Event", "Attribute", "Ordinance", and "Fact", which are just classifications of tags that always occur at Gedcom level 1, immediately below level 0 Person and Family records. Their substructures are so similar that they are almost always treated as the same thing. An Event is not a level 0 object type, though it is an important Gedcom concept.
    2. An Event has an Eventtype such as ‘Birth’, ‘Death’, ‘Name’, ‘Sex’, ‘Marriage’, etc., each of which is represented by a tag derived from its English name. For example, the five Eventtypes just listed are represented by the tags 'BIRT', 'DEAT', 'NAME', 'SEX', and 'MARR'. Beyond that:
      • Events can occur multiple times per record (even Birth & Death for People and Marriage for Families)
      • Can have a Date and a Place (at Gedcom level 2)
      • Also at level 2, can be linked to one or more Media Items via the Media Item's Gedcom ID.
      • Can have these subordinate records (among others):
        1. AGNC (Agency) which may be, for example
          • A facility such as a hospital where a birth or death occurred or
          • An organization through which an event occurred, perhaps an adoption agency or a branch of the military (army, navy, etc.)
        2. AGE of the person when the event occurred
        3. CAUS (cause), typically used only with Death events, to record the cause of death.
        4. SOUR, a link to a source of data, which is possible because Sources are level 0 records.
        5. OBJE (for Object), a media item. (Of course lots of things in Gedcom would ordinarily be thought of as objects, but that's the Gedcom-defined tag for a media item.)
      • Some eventtypes can have a descriptive value (e.g. Occupation, Religious affiliation), which occurs directly on the line that defines the event; for example, "1 OCCU Carpenter".
    3. All Eventtypes can occur multiple times in a person or family, for a couple of reasons:
      • Many eventtypes truly occur more than once in a person's life. Examples include occupation, place of residence, graduation, and military service.
      • For eventtypes such as Birth and Death, multiple occurrences represent different opinions of the date and/or time, which can exist because different sources may disagree.
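Because any eventtype may repeat, a program reading Gedcom is better off storing events as lists keyed by tag than as single fields such as "the birth date". The following is a minimal Python sketch of that idea for one person record. It assumes the layout described above (event at level 1, DATE, PLAC, and SOUR at level 2), treats every level 1 line as an event for brevity, and uses the hypothetical names `Event` and `events_of`; the sample data (two conflicting birth dates from two sources) is invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Event:
    tag: str                        # e.g. "BIRT", "OCCU"
    value: Optional[str] = None     # descriptive value on the event line itself
    date: Optional[str] = None      # level 2 DATE
    place: Optional[str] = None     # level 2 PLAC
    sources: List[str] = field(default_factory=list)  # level 2 SOUR pointers


def events_of(record_lines: List[str]) -> Dict[str, List[Event]]:
    """Group the level 1 lines of one person or family record by tag.

    Every occurrence is kept, because any eventtype may repeat.
    Levels 3 and deeper are ignored in this sketch.
    """
    events: Dict[str, List[Event]] = {}
    current: Optional[Event] = None
    for line in record_lines:
        if not line.strip():
            continue
        level_text, _, rest = line.strip().partition(" ")
        tag, _, value = rest.partition(" ")
        level = int(level_text)
        if level == 0:
            current = None          # the record line itself, e.g. "0 @I1@ INDI"
        elif level == 1:
            current = Event(tag, value or None)
            events.setdefault(tag, []).append(current)
        elif level == 2 and current is not None:
            if tag == "DATE":
                current.date = value
            elif tag == "PLAC":
                current.place = value
            elif tag == "SOUR":
                current.sources.append(value)
    return events


person = """0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 12 MAR 1850
2 SOUR @S1@
1 BIRT
2 DATE 1851
2 SOUR @S2@
1 OCCU Carpenter
2 DATE 1880
1 OCCU Farmer
2 DATE 1900""".splitlines()

by_type = events_of(person)
for birth in by_type["BIRT"]:
    print(birth.date, birth.sources)        # one line per competing opinion
print([o.value for o in by_type["OCCU"]])   # ['Carpenter', 'Farmer']
```

Keeping each Birth with its own SOUR pointers preserves which source supports which date, so a user can later decide which opinion to prefer.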

    Places

    1. To Gedcom, Places are just Place Name values that are subordinate to Events.
    2. Each value is independent: two events that happened in the same place each carry their own copy of the Place Name.
    3. A Place name can be spelled or qualified differently in different events; e.g. “Houston, Texas” could be represented in different events as “Huston, TX”, “Houston, Harris County, Texas”, “Houston, Harris, Texas, USA”, etc.
    4. Place Names are almost always recorded as a comma separated sequence of jurisdiction names, for example "Texas, USA", "Houston, Harris County, Texas, USA".
      1. A researcher focused on the USA might omit "USA".
    5. Places may have a subordinate structure that defines Latitude and Longitude values for putting the Place Name on a map.
    6. Most applications, however, do treat Places as objects, and store a list of the Places used in the database, so that place name spelling, jurisdictional qualifications, geocodes, and so on are stored only once.
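Because Gedcom stores each Place Name independently, an application that wants a shared list of Places has to build one when it imports a file. Here is a minimal Python sketch of that step. `PlaceTable` and `jurisdictions` are hypothetical names, and the matching rule (same comma-separated jurisdictions after trimming spaces and ignoring letter case) is an assumption chosen for simplicity. It deliberately does not merge “Huston, TX” with “Houston, Harris County, Texas, USA”; deciding that those are the same place usually takes a human.

```python
from typing import Dict, List, Tuple


def jurisdictions(place_name: str) -> Tuple[str, ...]:
    """Split a PLAC value into its comma-separated jurisdictions, smallest first."""
    return tuple(part.strip() for part in place_name.split(",") if part.strip())


class PlaceTable:
    """Store each distinct place once and hand out a shared ID for it.

    Assumption for this sketch: two Place Names denote the same place only if
    their jurisdictions match exactly, ignoring spacing and letter case.
    """

    def __init__(self) -> None:
        self._ids: Dict[Tuple[str, ...], int] = {}
        self.names: List[str] = []  # canonical spelling, indexed by place ID

    def place_id(self, place_name: str) -> int:
        parts = jurisdictions(place_name)
        key = tuple(p.lower() for p in parts)
        if key not in self._ids:
            self._ids[key] = len(self.names)
            self.names.append(", ".join(parts))  # first spelling seen wins
        return self._ids[key]


table = PlaceTable()
a = table.place_id("Houston, Harris County, Texas, USA")
b = table.place_id("Houston,Harris County,  Texas, USA")
c = table.place_id("Huston, TX")
print(a == b, a == c)   # True False
print(table.names)      # ['Houston, Harris County, Texas, USA', 'Huston, TX']
```

Once events refer to a place ID instead of a copy of the text, a geocode or a spelling correction only has to be stored in one entry of the table.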

    Source Citations

    1. Source Citations (or just “Citations”) connect data (stored in an Event) with a Source.
    2. In Gedcom, Source Citations are subordinate to an event, and do not have distinctive external identifiers.
    3. They begin with a Gedcom level 2 SOUR line, which contains a pointer to (the Gedcom record ID of) a level 0 Source record.
    4. A Citation can contain the following level 3 or 4 records:
      1. The location of the supporting data within the source (PAGE), such as a book chapter, page number, or journal article.
      2. A description or transcription of the data that supports the parent data (TEXT, within DATA).
      3. A rating of the reliability of the source (QUAY).
      4. The date when the data was found (when the source was read). Note that the date of publication of the source is in the Source record.
      5. Media items (OBJE), typically scanned pages from a written source, or perhaps photos of a place that is important to the citation.
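To make the layout of a citation concrete, the sketch below writes one out as Gedcom lines, using the level 2 SOUR line and the level 3 to 5 PAGE, DATA, TEXT, CONT, and QUAY lines from the structure example earlier in this article. It is a minimal Python sketch: `Citation` and `citation_lines` are hypothetical names, the 0 to 3 QUAY scale is the one defined in GEDCOM 5.5.1, and the citation date and media items are left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Citation:
    source_id: str                  # record ID of the level 0 Source, e.g. "@S1@"
    page: Optional[str] = None      # where in the source the data is (PAGE)
    quay: Optional[int] = None      # reliability rating (QUAY), 0-3 in GEDCOM 5.5.1
    text: List[str] = field(default_factory=list)  # transcription, one item per line


def citation_lines(c: Citation, level: int = 2) -> List[str]:
    """Write a citation as Gedcom lines, starting at `level` (2 under an event)."""
    lines = [f"{level} SOUR {c.source_id}"]
    if c.page:
        lines.append(f"{level + 1} PAGE {c.page}")
    if c.text:
        lines.append(f"{level + 1} DATA")
        lines.append(f"{level + 2} TEXT {c.text[0]}")
        # Each further line of the transcription becomes a CONT line.
        lines.extend(f"{level + 3} CONT {t}" for t in c.text[1:])
    if c.quay is not None:
        lines.append(f"{level + 1} QUAY {c.quay}")
    return lines


c = Citation("@S1@", page="Page 12, line 4", quay=2,
             text=["Born to John and Mary Smith,", "in the city of Houston."])
print("\n".join(citation_lines(c)))
```

The citation has no ID of its own; it exists only as this block of lines under its event, and its link to the Source record is the pointer on the SOUR line.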

    Associations

    1. Associations occur within a Person record, and associate the person with another person.
    2. Some applications allow People to be associated with Families, as well.
    3. Associations carry a value such as “Godparent”, “Maid of Honor”, “Friend”, etc.
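In GEDCOM 5.5.1, an association is written as a level 1 ASSO line that points to the other person's record ID, with the relationship value (“Godparent”, “Friend”, etc.) on a subordinate level 2 RELA line. The sketch below reads those pairs out of one person record. `associations` is a hypothetical helper, it handles only the person-to-person case, and the sample IDs are made up.

```python
from typing import List, Optional, Tuple


def associations(record_lines: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Return (associated person's ID, relationship) pairs from one INDI record.

    Assumes the GEDCOM 5.5.1 layout: "1 ASSO @ID@" followed by "2 RELA <text>".
    """
    found: List[Tuple[str, Optional[str]]] = []
    in_asso = False
    for line in record_lines:
        # Pad so that lines with no value still unpack into three fields.
        level, tag, value = (line.strip().split(" ", 2) + ["", ""])[:3]
        if level == "1":
            in_asso = tag == "ASSO"
            if in_asso:
                found.append((value, None))
        elif level == "2" and tag == "RELA" and in_asso:
            found[-1] = (found[-1][0], value)
    return found


record = """0 @I1@ INDI
1 NAME John /Smith/
1 ASSO @I7@
2 RELA Godparent
1 ASSO @I9@
2 RELA Friend""".splitlines()

print(associations(record))  # [('@I7@', 'Godparent'), ('@I9@', 'Friend')]
```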

    etc., etc.