GDMUML is a representation of the GENTECH Genealogical Data Model using the UML. It takes the GENTECH model (see GENTECH, 2000) as a starting point and aims to preserve the semantics of the original model. The GENTECH model is expressed as an Entity-Relationship model, a well-established database design tool developed by Peter Chen (see Chen, 1976). Entity-Relationship modeling is an appropriate method for designing an RDBMS to store genealogical data.
The UML can also be used to model logical database designs. However, GDMUML does not do that. Instead, it focuses on system object modeling. An example will help differentiate the perspectives. The RESEARCH-OBJECTIVE entity in the GENTECH model specifies a primary and foreign key, indicating that it is represented as records in a database table, each with a matching record in a different (foreign) database table, the PROJECT table.
By contrast, GDMUML defines two classes, ResearchObjective and Project. It does not specify how these might be implemented and thus avoids introducing database terminology. However, it does preserve the relationship between the objects by indicating that the classes are associated and that one Project may have one or more ResearchObjectives. This approach defers the decision about how to make the objects persistent to the implementation stage. Instead, it focuses on creating clearly defined classes that represent the objects in the system.
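As a minimal sketch of this perspective, the two classes and their association could be written as plain objects with no keys or tables. The attribute names (`name`, `description`, `objectives`) and the example values are assumptions made for illustration; GDMUML itself does not prescribe them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResearchObjective:
    """A goal pursued within a Project (attribute name is illustrative)."""
    description: str


@dataclass
class Project:
    """A Project is associated with one or more ResearchObjectives."""
    name: str
    objectives: List[ResearchObjective] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the one-or-more multiplicity of the association.
        if not self.objectives:
            raise ValueError("a Project needs at least one ResearchObjective")


project = Project(
    name="Sharp household, Livingston County",
    objectives=[ResearchObjective("Determine the age of Delila in 1870")],
)
```

Nothing in the sketch says how a Project or ResearchObjective is stored; that choice is left to an implementation model.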
The first step in developing a model is to discover the vocabulary of the domain (see Booch et al., 1999). The entities in GENTECH's model map almost one-to-one to classes in GDMUML. The main exceptions are the "associative entities", which serve to tie together several database tables in many-to-many relationships. Some of these relationships imply the existence of additional classes.
Once the classes have been defined, identifying the relationships between them is important. This further helps to understand the role of each class in the system and what its responsibilities are. These relationships are best conveyed with a static class diagram. The class diagrams created here are from the conceptual perspective (see Fowler, 2000). Thus, as with the GENTECH model, the GDMUML model is not meant to be used to implement a desktop application, a database, or an XML schema. However, it is hoped that GDMUML might serve as the basis for these kinds of implementation models.
A GDMUML class is displayed in bold italics, the first letter of each noun capitalized, e.g. Source and SourceInstance. A GENTECH entity is displayed with all characters capitalized and bold, e.g. SOURCE or REPOSITORY-SOURCE.
GDMUML follows GENTECH's subdivision of the model into submodels. These include the Evidence submodel, the Administrative submodel, and the Conclusional submodel. The submodels are not independent of one another but they do serve to group together classes that describe major portions of the system. The Utility submodel has also been added to group those classes that provide a supporting or utility role.
The Evidence submodel is concerned with the classes that describe the acquisition of data. Data may come into the system from quite varied sources. The system needs to record where this data originated. This data also feeds into the Conclusional submodel to provide the basis for creating deductions.
The Administrative submodel is concerned with the classes that describe the conduct of research. What projects are defined and what are their objectives? Which researchers are assigned to which projects? What research tasks are scheduled to be completed? Which sources require searches? These are the issues this submodel deals with.
The Conclusional submodel is concerned with the classes that describe the assertions made from the acquired data. Assertions can be made from direct evidence or from other assertions. An important aspect of the GENTECH model is its support for creating an audit trail of the deductions which are reached.
The Utility submodel contains utility classes such as Place and Date, which are used by the other submodels. It also has a class stereotype that models a generic collection.
A SourceInstance represents the origin of an item of data. An instance of this class records the specific location of a piece of data. The location is recorded in a SourceLevels collection. Each SourceLevel item is logically equivalent to a component of a bibliographic entry or citation.
The current convention (see Mills, 1997) is to document the origin of a fact using a bibliography and a source note. A bibliographic entry is used as a roadmap to locate a source. The bibliographic entry doesn't refer to a specific fact. It is a summary representation of the source for the reader's convenience. Each SourceInstance will have this information recorded in the SourceLevels collection that it contains. A bibliographic entry can be created from a subset of the SourceLevels collection, but the object model does not define a class to represent it.
Similarly, the Mills style guide (see Mills, 1997) for source notes has recommended formats for full citations and short citations. The citation points the reader to the location of a particular fact in the source. As such, it is more detailed than its bibliographic counterpart and it uses more of the SourceLevels collection to record this information. Various citation formats can be created from a subset of the SourceLevels collection, but the object model does not define classes to represent them.
More than one copy may exist for a given source, and all of the potential copies are instances of the SourceInstance class. For example, many microfilm copies may exist for a given census, and some may be of better quality than others. In some cases, each visit to a source may represent a separate SourceInstance. For example, a tombstone examined 10 years ago may have revealed details that are no longer visible today.
It may be convenient to create a SourceInstance that represents the physical source which is examined in a Repository, for example a copy of a book or a reel of microfilm. Since it is not used to record a specific fact, its SourceLevels collection would contain a set of elements similar to those needed for a bibliographic entry. Such a SourceInstance would be useful for planning Searches, recording the results of a failed Search, or for recording a source for which a Repository is sought. Such a SourceInstance could serve as a generalization (superclass) of all references to this physical copy.
The GENTECH model's REPOSITORY-SOURCE entity has semantics similar to SourceInstance. Like the GENTECH entity, SourceInstance holds information that describes a particular copy of the data source, e.g. its call number and a description of the condition of the copy that was examined.
A SourceLevel is logically equivalent to a component of a bibliographic entry or citation.
An example will make this clearer. The 1870 U.S. census for Livingston County, Kentucky, has an entry for the William Sharp household. The particular piece of data to be represented is the age of his spouse, Delila, at the date of the census. To record this completely, a hierarchy of SourceLevels such as the following might be used:
Each of the items in the above list is an instance of class SourceLevel. SourceLevels form an ordered list, ranging from general to specific. The most specific SourceLevel is a subclass called LowestSourceLevel. It is the SourceLevel from which assertions are commonly generated.
Each SourceLevel may have references to these classes:
The GENTECH SOURCE entity is represented by GDMUML's SourceLevel class.
This class represents the collection of SourceLevels. In general, it could be one of several container classes, although in this case the hierarchical and ordered nature of the collection suggests a tree structure. The SourceLevels collection divides the Evidence submodel into classes which deal with SourceInstances as a whole versus classes which interface with SourceLevel items, the component parts of SourceInstances.
The SourceLevels collection is a continuum of progressively more detailed items, ending at the specific datum being referenced. This last instance is the LowestSourceLevel.
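The ordering rule can be sketched as follows. This is an illustration only: it uses a flat list rather than the tree structure suggested above, the `label` attribute and the method names are hypothetical, and the four example levels are drawn from the census example rather than reproducing its full hierarchy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SourceLevel:
    """One component of a citation, e.g. a census, a county, or a page."""
    label: str


@dataclass
class LowestSourceLevel(SourceLevel):
    """The most specific level: the datum that assertions are built from."""


@dataclass
class SourceLevels:
    """Ordered collection of SourceLevel items, from general to specific."""
    items: List[SourceLevel] = field(default_factory=list)

    def append(self, level: SourceLevel) -> None:
        # The LowestSourceLevel ends the continuum, so nothing may follow it.
        if self.items and isinstance(self.items[-1], LowestSourceLevel):
            raise ValueError("cannot add a level below the LowestSourceLevel")
        self.items.append(level)

    def lowest(self) -> LowestSourceLevel:
        last = self.items[-1] if self.items else None
        if not isinstance(last, LowestSourceLevel):
            raise ValueError("collection does not end in a LowestSourceLevel")
        return last


levels = SourceLevels()
for label in ("1870 U.S. census", "Livingston County, Kentucky", "page 175A"):
    levels.append(SourceLevel(label))
levels.append(LowestSourceLevel("age of Delila in the William Sharp household"))
```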
Each SourceLevel may have an instance of CitationPart associated with it. This class simply holds a string and type identifier for a component of a citation or bibliographic entry.
This class corresponds to the GENTECH CITATION-PART and CITATION-PART-TYPE entities.
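One way to picture how a bibliographic entry or citation could be assembled from a subset of these parts is sketched below. The part types, the `format_parts` helper, and the comma-joined output are all assumptions for illustration; the output is not one of the Mills formats.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class CitationPartType(Enum):
    # Hypothetical part types chosen for this example.
    TITLE = "title"
    JURISDICTION = "jurisdiction"
    PAGE = "page"


@dataclass(frozen=True)
class CitationPart:
    """A string plus a type identifier for one citation component."""
    part_type: CitationPartType
    text: str


def format_parts(parts: Iterable[CitationPart],
                 wanted: Iterable[CitationPartType]) -> str:
    """Join the text of the parts whose type is wanted, keeping their order."""
    wanted_set = set(wanted)
    return ", ".join(p.text for p in parts if p.part_type in wanted_set)


parts = [
    CitationPart(CitationPartType.TITLE, "1870 U.S. census"),
    CitationPart(CitationPartType.JURISDICTION, "Livingston County, Kentucky"),
    CitationPart(CitationPartType.PAGE, "page 175A"),
]

# A bibliographic-style entry uses fewer parts than a full citation.
bibliographic = format_parts(
    parts, [CitationPartType.TITLE, CitationPartType.JURISDICTION]
)
full = format_parts(parts, list(CitationPartType))
```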
An instance of the Representation class corresponds to a physical or electronic copy of a SourceInstance. Examples include a disk file containing a digital camera image of a tombstone, the text of a transcription of a tombstone inscription, a reference to a physical file number containing an original photograph, and a xerographic copy of a census page.
Although this meaning of Representation corresponds to the SourceInstance as a whole, it is possible to have many Representations that correspond to different SourceLevels. For example, in the case of the census example used above, if a photocopy of page 175A was a Representation, then the SourceLevel corresponding to the page reference would have this Representation instance. However, a text extract of the census information for the age column could also be a Representation for the LowestSourceLevel.
This class corresponds to the GENTECH REPRESENTATION and REPRESENTATION-TYPE entities.
This class represents the place where an instance of SourceInstance was found. Some examples include: a library, a website with online census images, and a cemetery. This class corresponds to the GENTECH REPOSITORY entity.
This class represents a collection of SourceInstances. The criteria for grouping the SourceInstances in the collection are defined by the Researcher. A SourceInstance may belong to more than one SourceGroup. This class is closely tied to the Administrative submodel, since it is an organizational aid. This class corresponds to the GENTECH SOURCE-GROUP and SOURCE-GROUP-SOURCE entities.
The class diagram illustrates the following class relationships:
An object diagram shows the relationships between real-world instances of each of the classes. This example portrays a Researcher's visit to the Pacific Region's NARA (a Repository) located in San Bruno, CA (a Place). A reel of microfilm was located for the 1870 U.S. census for Livingston County, Kentucky (a SourceInstance). Using a microfilm reader, page 175A was found, and a photocopy of the page image was made and stored as a hardcopy in file number 11052000 (a Representation).
The Delila_Age_1870_Census instance of SourceInstance contains a SourceLevels collection which contains SourceLevel items. Note that in the diagram, only 7 of the 10 SourceLevel items are shown.
The SourceLevels are mapped to the levels of the bibliographic entry, citation, and excerpt. Each SourceLevel has a Researcher associated with it. Also, each SourceLevel has a CitationPart associated with it corresponding to one of the citation components. Some SourceLevels have a Place associated with them, e.g. the jurisdiction location of the census (Livingston County, Kentucky) and the location of the publisher of the microfilm (Washington). Finally, two SourceLevels have Representations. The photocopy corresponds to the "page" SourceLevel, and a text excerpt would correspond to the LowestSourceLevel.
This class represents a person who is conducting research. Its main use is to assign responsibility for data that enters the system. This class corresponds to the GENTECH RESEARCHER entity.
This class represents a project which a researcher is working on. The GENTECH GDM describes it this way (page 64):
One project might consist of all information about a person's ancestors, both on the researcher's father side, and on the researcher's mother's side. Another project is all ancestors on only one side of the researcher's family, such as the mother's side; this researcher might have another project for the father's side. Another project is a one-name study. Other types of genealogical projects include a study of the descendants of a particular person or couple, and the descendants of a particular group of people. Finally, a project can be undertaken for another person, in which case there is a client associated with the project.
This class corresponds to the GENTECH PROJECT entity.
This class represents a scale for measuring certainty. It is used by a project to assign a surety level to assertions. It does not define a single scale. Instead, it captures the scale and stores it as part of a project. This class corresponds to the GENTECH SURETY-SCHEME entity.
Each SuretyScheme defines a set of certainty levels that can be assigned to assertions. This collection represents all possible levels defined for a particular scheme.
A SuretySchemePart represents a single certainty level from the SuretySchemeParts collection. An Assertion is assigned a SuretySchemePart to indicate the Researcher's certainty in the conclusion. This class corresponds to the GENTECH SURETY-SCHEME-PART entity.
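The relationship between a scheme and its levels might be sketched like this. The level names, the `rank` attribute, and the `part` lookup method are hypothetical; the model only requires that a scheme own a collection of levels and that an Assertion be given one of them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SuretySchemePart:
    """A single certainty level within a scheme."""
    name: str
    rank: int  # hypothetical ordering attribute; higher means more certain


@dataclass
class SuretyScheme:
    """A project-specific scale; its levels form the SuretySchemeParts."""
    name: str
    parts: List[SuretySchemePart] = field(default_factory=list)

    def part(self, name: str) -> SuretySchemePart:
        # Look up a level by name so it can be assigned to an Assertion.
        for candidate in self.parts:
            if candidate.name == name:
                return candidate
        raise KeyError(name)


scheme = SuretyScheme(
    name="three-level scale (illustrative)",
    parts=[
        SuretySchemePart("possible", 1),
        SuretySchemePart("probable", 2),
        SuretySchemePart("certain", 3),
    ],
)
assigned = scheme.part("probable")
```

Because the scale is stored with the project, two projects can use different schemes without either being built into the model.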
This class represents the goals which a researcher is trying to reach in a particular project. It corresponds to the GENTECH RESEARCH-OBJECTIVE entity.
This class represents the activities which a researcher has completed or is planning to complete, in order to meet a ResearchObjective. It is an abstract class, i.e. it is a generalization of all kinds of project activities that a researcher might perform. The subclasses, Search, Analysis, DataEntry, Import, Export, Report, Archive, and Restore are some representative activities. This class corresponds to the GENTECH ACTIVITY entity.
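A minimal sketch of this generalization, assuming Python's `abc` module as the mechanism for an abstract class. Only two of the subclasses are shown; the `describe` method and the use of plain strings for the ResearchObjective and SourceInstance are simplifications made for the example.

```python
from abc import ABC, abstractmethod


class Activity(ABC):
    """Abstract generalization of the work a Researcher performs."""

    def __init__(self, objective: str) -> None:
        self.objective = objective  # a string stands in for a ResearchObjective
        self.completed = False

    @abstractmethod
    def describe(self) -> str:
        """Return a one-line description of the activity."""


class Search(Activity):
    def __init__(self, objective: str, source_instance: str) -> None:
        super().__init__(objective)
        self.source_instance = source_instance

    def describe(self) -> str:
        return f"Search {self.source_instance} for: {self.objective}"


class Analysis(Activity):
    def describe(self) -> str:
        return f"Analyze entered data for: {self.objective}"


activities = [
    Search("age of Delila in 1870", "1870 U.S. census, Livingston County"),
    Analysis("age of Delila in 1870"),
]
descriptions = [activity.describe() for activity in activities]
```

Attempting to instantiate `Activity` directly raises a `TypeError`, which mirrors the statement that only its subclasses describe real tasks.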
This class represents examination of a SourceInstance to find information, usually related to the ResearchObjective. A particular Search can take place at a known Repository for SourceInstances which are relevant to the ResearchObjective - in this case the Search is for SourceInstances. The Search could also be for a specific known SourceInstance - in this case the Search is for specific data or to determine if the data in a known SourceInstance is relevant. Another type of Search might try to locate a Repository which holds a known SourceInstance. This class corresponds to the GENTECH SEARCH entity.
This class represents the input of data into the system as a result of a Researcher examining a SourceInstance that revealed data relevant to the ResearchObjective. This will include recording information that corresponds to the classes: SourceInstance, SourceLevels, CitationParts, and Representations. (There is no corresponding entity in the GENTECH-GDM.)
This class represents the analysis of the data which has been entered into the system. Analysis can lead to the generation of Assertions based upon the raw data or upon other Assertions. Analysis can also direct future research to seek out new SourceInstances. (There is no corresponding entity in the GENTECH-GDM.)
This class represents the input of data into the system from a file. The file might be a GEDCOM file, an XML file, or other file type supported by the system. If the file represents a full implementation of GDMUML, then its content can be combined with existing data. If it has a different structure, the import process may need to make appropriate transformations or isolate the data. (There is no corresponding entity in the GENTECH-GDM.)
This class represents the output of a file to represent the data collection for a project. This process is the opposite of Import. It should support standard data exchange formats, such as GEDCOM and XML variants. It should also support export of a full implementation of GDMUML. This provides Researchers working on the same Project with another method of data sharing. (There is no corresponding entity in the GENTECH-GDM.)
This class represents the generation of a summary report based upon the data collection. There are many standard report types which could be generated such as "family group sheet", descendant tree, ancestor tree, Ahnentafel Report, NGS Quarterly Report, etc. (There is no corresponding entity in the GENTECH-GDM.)
This class represents the output of a file that is a snapshot of the system data collection and configuration. An Archive differs from an Export in that it is used as a complete backup for the local system and not as a method of exchanging data with other Researchers. (There is no corresponding entity in the GENTECH-GDM.)
This class represents the restoration of a system to a snapshot taken by a previous Archive. (There is no corresponding entity in the GENTECH-GDM.)
This class represents other tasks that do not have a defined class. This class corresponds to the GENTECH ADMINISTRATIVE-TASK entity.
The class diagram illustrates the following class relationships:
An Assertion represents a conclusion reached. The most primitive Assertion is built directly from a source fragment in a LowestSourceLevel. For example, a census record may be used to make the assertion that "Delila Neal was 45 years of age in 1870".
An Assertion may also be derived from other Assertions by using the Assertions collection. In this way, a hierarchy of assertions can be built from primitive to more complex.
In the GDM, assertions take the form of simple statements. The Unified Theory of Genealogical Data, proposed in the GDM (GENTECH, 2000, page 12), classifies assertion statements into three basic types:
In general, an assertion statement will have two Subjects. A Subject can have a Place and Date associated with it. Depending on the type of statement, the Assertion may also have a "value". For example, in the case of a statement describing a marriage, the "value" may be "groom" to indicate the role one subject played in the event.
The two Subjects of an assertion statement are selected from four subclasses: Event, Characteristic, Group, and Persona. The GENTECH-GDM suggests that only a small subset of the possible Subject combinations make sense. Examples of the following subject combinations can be found in the GDM specification:
| Subject1 | Subject2 | Assertion "Value" | GDM page reference |
The GDM only associates one SuretySchemePart with an Assertion, to allow a Researcher to indicate her certainty in the statement. It would be more useful to have SuretySchemeParts tied to each of the elements of the Assertion, e.g. the date, place, and subjects.
This class corresponds to the GENTECH ASSERTION entity.
An AtomicAssertion is a subclass of Assertion. It represents an Assertion derived from a LowestSourceLevel datum. (There is no corresponding entity in the GENTECH-GDM.)
A higher level Assertion will be based upon one or more lower level Assertions. An Assertions collection represents this grouping of lower level Assertions. The derived Assertion will have a reference to such a collection. This is in contrast to an AtomicAssertion which is derived from a LowestSourceLevel.
This class corresponds to the GENTECH ASSERTION-ASSERTION entity.
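The audit trail that results from this arrangement can be sketched as a simple recursive structure. The attribute names (`statement`, `based_on`, `source_datum`) and the `audit_trail` method are hypothetical, and the derived birth-year statement is an invented example built on the census assertion used earlier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assertion:
    statement: str
    # Stands in for the Assertions collection of lower level Assertions.
    based_on: List["Assertion"] = field(default_factory=list)

    def audit_trail(self, depth: int = 0) -> List[str]:
        """List this statement and, indented beneath it, its supporting ones."""
        lines = ["  " * depth + self.statement]
        for supporting in self.based_on:
            lines.extend(supporting.audit_trail(depth + 1))
        return lines


@dataclass
class AtomicAssertion(Assertion):
    # A string stands in for the LowestSourceLevel the assertion came from.
    source_datum: str = ""


atomic = AtomicAssertion(
    "Delila Neal was 45 years of age in 1870",
    source_datum="1870 U.S. census, Livingston County, Kentucky, page 175A",
)
derived = Assertion("Delila Neal was born about 1825", based_on=[atomic])
trail = derived.audit_trail()
```

Walking `based_on` from any conclusion eventually reaches AtomicAssertions, and through them the source data.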
This class serves as a base class for the GENTECH-GDM subject types: Characteristic, Event, Group, and Persona. It contains references to the Subject's Date and Place. (There is no corresponding entity in the GENTECH-GDM.)
This Subject type represents any data that uniquely characterizes an individual and helps to distinguish them from someone else. This class contains a CharacteristicParts collection which holds the "characteristic" data. For example, the person's name could be stored as several parts in this collection. This Subject subclass will have a Date, indicating when the characteristic was noted, and the Place where it was noted. This class corresponds to the GENTECH CHARACTERISTIC entity.
This class contains a collection of CharacteristicPart items which describe some characteristic of an individual. (There is no corresponding entity in the GENTECH-GDM.)
A CharacteristicPart contains an element of data and an enumerated part type. For example, three CharacteristicParts might be strings with the corresponding enumerated part types: FirstNamePartType, MiddleNamePartType, and LastNamePartType. This class corresponds to the GENTECH CHARACTERISTIC-PART and CHARACTERISTIC-PART-TYPE entities.
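For illustration, a name Characteristic might be held as typed parts along these lines. The enumeration members follow the part types named above; the class layout and the `text` method are assumptions.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List


class CharacteristicPartType(Enum):
    FIRST_NAME = auto()
    MIDDLE_NAME = auto()
    LAST_NAME = auto()


@dataclass(frozen=True)
class CharacteristicPart:
    """An element of data together with its enumerated part type."""
    part_type: CharacteristicPartType
    value: str


@dataclass
class Characteristic:
    """Holds the CharacteristicParts collection as an ordered list."""
    parts: List[CharacteristicPart]

    def text(self) -> str:
        # Join the part values in the order they were recorded.
        return " ".join(part.value for part in self.parts)


name = Characteristic(
    parts=[
        CharacteristicPart(CharacteristicPartType.FIRST_NAME, "William"),
        CharacteristicPart(CharacteristicPartType.LAST_NAME, "Sharp"),
    ]
)
full_name = name.text()
```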
This Subject type represents a happening to the associated Subject in the assertion statement. This Subject subclass will have a Date and Place of occurrence. A member variable of this class will hold a value of an EventType enumeration. Each unique identifier of the enumeration specifies a possible event type, such as: Marriage, Birth, or Burial. This class corresponds to the GENTECH EVENT entity.
This Subject type indicates that the subject is a member of some collection of related subjects. For example, the children of the union of a man and a woman would constitute a group. So would characteristics of an individual, where each characteristic is a name that the individual was known by. In this case, the group consists of all known names for the individual. The Group class may also have a Date and Place associated with it. A member variable of this class will hold a GroupType value. The GroupType may be a generic type, like "Children of a Union" or a type specific to a project such as: "Children of William Sharp and Delila Neal", or "Neighbors of William Sharp in Livingston Co., Ky". This class corresponds to the GENTECH GROUP entity.
This class represents the identity of an individual. When data is entered into the system it is attributed to a Persona. Each SourceInstance will generate new sets of Personas for the individuals for which data is collected. After examining several SourceInstances, an Assertion is made that some of the Personas referenced in these SourceInstances refer to the same individual. This Assertion would be a statement that a Persona is a member of a particular Group. In this case, the Group represents a collection of Personas that are believed to refer to the same person. One can also use a "Group-Persona" statement to assign a unique Persona to a Group of Personas. The Persona class does not have a Date or Place associated with it. This class corresponds to the GENTECH PERSONA entity.
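The way Personas from separate sources are tied to one individual can be sketched as below. Everything here is a simplification: strings stand in for Date and Place, the subject order within an Assertion (Persona first, Group second) and the `members` helper are assumptions, and the second Persona is an invented example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subject:
    label: str
    date: Optional[str] = None   # a plain string stands in for the Date class
    place: Optional[str] = None  # likewise for the Place class


@dataclass
class Persona(Subject):
    def __post_init__(self) -> None:
        # Unlike the other Subject types, a Persona has no Date or Place.
        if self.date is not None or self.place is not None:
            raise ValueError("a Persona carries no Date or Place")


@dataclass
class Group(Subject):
    group_type: str = "unspecified"


@dataclass
class Assertion:
    subject1: Subject
    subject2: Subject
    value: Optional[str] = None


def members(group: Group, assertions: List[Assertion]) -> List[Subject]:
    """Collect the subjects asserted to belong to the given Group."""
    return [a.subject1 for a in assertions if a.subject2 is group]


census_delila = Persona("Delila (1870 census)")
other_delila = Persona("Delila Neal (hypothetical second source)")
same_person = Group("Delila Neal", group_type="Personas for one individual")
assertions = [
    Assertion(census_delila, same_person, value="member"),
    Assertion(other_delila, same_person, value="member"),
]
linked = [subject.label for subject in members(same_person, assertions)]
```

The Personas themselves are never merged or altered; the link exists only as Assertions, so it can be revisited if later evidence contradicts it.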
The class diagram illustrates the following class relationships:
This class is used to define a specific geographic location. It corresponds to the GENTECH PLACE, PLACE-PART, and PLACE-PART-TYPE entities.
This class is used to define a date or a date-range. It allows multiple calendar systems and has methods for conversion between them. The GENTECH model refers to this as a "Date Expert System".
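As one possible illustration of calendar conversion, the sketch below converts a Julian calendar date to the Gregorian calendar by way of the Julian Day Number. Using the Julian Day Number as a pivot is a common technique, not something GDMUML or GENTECH specifies, and the function names are hypothetical. The integer formulas are the standard ones and are intended for common-era dates.

```python
from typing import Tuple


def gregorian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a (proleptic) Gregorian calendar date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a Julian calendar date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def jdn_to_gregorian(jdn: int) -> Tuple[int, int, int]:
    """Gregorian (year, month, day) for a Julian Day Number."""
    f = jdn + 1401 + (((4 * jdn + 274277) // 146097) * 3) // 4 - 38
    e = 4 * f + 3
    g = (e % 1461) // 4
    h = 5 * g + 2
    day = (h % 153) // 5 + 1
    month = (h // 153 + 2) % 12 + 1
    year = e // 1461 - 4716 + (14 - month) // 12
    return year, month, day


def julian_to_gregorian(year: int, month: int, day: int) -> Tuple[int, int, int]:
    """Convert a Julian calendar date to the Gregorian calendar."""
    return jdn_to_gregorian(julian_to_jdn(year, month, day))


# The day after Julian 4 October 1582 was Gregorian 15 October 1582.
converted = julian_to_gregorian(1582, 10, 5)  # (1582, 10, 15)
```

A fuller Date class would also need to handle date ranges, partial dates, and the differing adoption dates of the Gregorian calendar, none of which are attempted here.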
This class represents a generic collection. It has methods for adding and removing items from the collection, for iterating over the items in the collection and for retrieving the count of items. The items may be of any type.
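A generic collection with these operations might look like the following sketch. The method names `add` and `remove` are assumptions, and Python's iteration and length protocols stand in for the iteration and count methods.

```python
from typing import Generic, Iterator, List, TypeVar

T = TypeVar("T")


class Collection(Generic[T]):
    """A generic container; items may be of any type."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def add(self, item: T) -> None:
        self._items.append(item)

    def remove(self, item: T) -> None:
        self._items.remove(item)  # raises ValueError if the item is absent

    def __iter__(self) -> Iterator[T]:
        return iter(self._items)

    def __len__(self) -> int:
        return len(self._items)


places: Collection[str] = Collection()
places.add("Livingston County, Kentucky")
places.add("San Bruno, CA")
places.remove("San Bruno, CA")
remaining = list(places)
```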
The class diagrams in this document use the collection stereotype to define various types of collection classes. In Figure 5, the "collection" stereotype is a shorthand for the more complete representation shown in the top diagram. It represents a class that has several interfaces for access, iteration, modification, etc. of the elements of a collection. The class which "owns" the collection, or which "has-a" collection, is on the left side of the two diagrams. The individual elements which make up the collection are represented by the "item" class on the right side. The relationship between the items and the collection class is one of composition. When the collection goes out of existence so do the items in it.
Figure 5. Collection Stereotype Class Diagram.
Chen, P.P. The Entity-Relationship Model: Toward a Unified View of Data. ACM Transactions on Database Systems. Vol. 1, No. 1, 1976, pp. 9-36.
Grady Booch, James Rumbaugh, and Ivar Jacobson. The Unified Modeling Language User Guide. Addison-Wesley: 1999.
Martin Fowler with Kendall Scott. UML Distilled: A Brief Guide to the Standard Object Modeling Language. Second Edition. Addison-Wesley: 2000.
GENTECH. Genealogical Data Model, Phase 1. A Comprehensive Data Model for Genealogical Research and Analysis. May 29, 2000.
Mills, Elizabeth Shown. Evidence! Citation and Analysis for the Family Historian. Genealogical Publishing Company: Baltimore, Maryland. 1997.