GDMUML Specification

Revision 0.4, 31-December-2002

Stan Mitchell


  1. What is GDMUML?
    1. Conventions Used in This Document
  2. GDMUML Submodels
    1. Evidence Submodel
    2. Administrative Submodel
    3. Conclusional Submodel
    4. Utility Submodel
  3. Classes of the Evidence Submodel
    1. SourceInstance
    2. SourceLevel
    3. SourceLevels collection
    4. LowestSourceLevel
    5. CitationPart
    6. Representation
    7. Repository
    8. SourceGroup
    9. Evidence Submodel Class Diagram
    10. Evidence Submodel Object Diagram
  4. Classes of the Administrative Submodel
    1. Researcher
    2. Project
    3. SuretyScheme
    4. SuretySchemeParts collection
    5. SuretySchemePart
    6. ResearchObjective
    7. Activity
    8. Search
    9. DataEntry
    10. Analysis
    11. Import
    12. Export
    13. Report
    14. Archive
    15. Restore
    16. AdministrativeTask
    17. Administrative Submodel Class Diagram
  5. Classes of the Conclusional Submodel
    1. Assertion
    2. AtomicAssertion
    3. Assertions collection
    4. Subject
    5. Characteristic
    6. CharacteristicParts collection
    7. CharacteristicPart
    8. Event
    9. Group
    10. Persona
    11. Conclusional Submodel Class Diagram
  6. Classes of the Utility Submodel
    1. Place
    2. Date
    3. Collection
    4. Collection Stereotype Class Diagram
  7. Glossary
  8. References

1. What is GDMUML?

GDMUML is a representation of the GENTECH Genealogical Data Model using the UML. It takes the GENTECH model (see GENTECH, 2000) as a starting point and aims to preserve the semantics of the original model. The GENTECH model is expressed as an Entity-Relationship model, a well-established database design technique developed by Peter Chen (see Chen, 1976). Entity-Relationship modeling is an appropriate method for designing an RDBMS to store genealogical data.

The UML can also be used to model logical database designs. However, GDMUML does not do that. Instead, it focuses on system object modeling. An example will help differentiate the perspectives. The RESEARCH-OBJECTIVE entity in the GENTECH model specifies a primary and foreign key, indicating that it is represented as records in a database table, each with a matching record in a different (foreign) database table, the PROJECT table.

By contrast, GDMUML defines two classes, ResearchObjective and Project. It does not specify how these might be implemented and thus avoids introducing database terminology. However, it does preserve the relationship between the objects by indicating that the classes are associated and that one Project may have one-or-more ResearchObjectives. This approach defers the decision about how to make the objects persistent to the implementation stage. Instead, it focuses on creating clearly defined classes that represent the objects in the system.
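
The association between Project and ResearchObjective can be illustrated with a minimal Python sketch. It is only one possible reading of the one-or-more multiplicity, not part of GDMUML itself; the attribute names (name, description, objectives) are assumptions made for the example, and nothing here says how the objects would be made persistent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResearchObjective:
    # 'description' is a hypothetical attribute, not defined by GDMUML.
    description: str


@dataclass
class Project:
    # 'name' and 'objectives' are hypothetical attribute names.
    name: str
    objectives: List[ResearchObjective] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the one-or-more multiplicity of the association.
        if not self.objectives:
            raise ValueError("a Project needs at least one ResearchObjective")


project = Project(
    name="Sharp household, 1870 census",
    objectives=[ResearchObjective("Determine the age of Delila Sharp in 1870")],
)
```

The classes hold direct object references rather than primary or foreign keys, which is the difference in perspective described above.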

The first step in developing a model is to discover the vocabulary of the domain (see Booch et al., 1999). The entities in GENTECH's model map almost one-to-one to classes in GDMUML. The main exceptions are the "associative entities", which serve to tie together several database tables in many-to-many relationships. Some of these relationships imply the existence of additional classes.

Once the classes have been defined, the next step is to identify the relationships between them. Doing so clarifies the role of each class in the system and what its responsibilities are. These relationships are best conveyed with a static class diagram. The class diagrams created here are from the conceptual perspective (see Fowler, 2000). Thus, as with the GENTECH model, the GDMUML model is not meant to be used to implement a desktop application, a database, or an XML schema. However, it is hoped that GDMUML might serve as the basis for these kinds of implementation models.

1.1 Conventions Used in This Document

A GDMUML class is displayed in bold italics, the first letter of each noun capitalized, e.g. Source and SourceInstance. A GENTECH entity is displayed with all characters capitalized and bold, e.g. SOURCE or REPOSITORY-SOURCE.

2. GDMUML Submodels

GDMUML follows GENTECH's subdivision of the model into submodels. These include the Evidence submodel, the Administrative submodel, and the Conclusional submodel. The submodels are not independent of one another but they do serve to group together classes that describe major portions of the system. The Utility submodel has also been added to group those classes that provide a supporting or utility role.

2.1 Evidence Submodel

The Evidence submodel is concerned with the classes that describe the acquisition of data. Data may come into the system from quite varied sources. The system needs to record where this data originated. This data also feeds into the Conclusional submodel to provide the basis for creating deductions.

2.2 Administrative Submodel

The Administrative submodel is concerned with the classes that describe the conduct of research. What projects are defined and what are their objectives? Which researchers are assigned to which projects? What research tasks are scheduled to be completed? Which sources require searches? These are the issues this submodel deals with.

2.3 Conclusional Submodel

The Conclusional submodel is concerned with the classes that describe the assertions made from the acquired data. Assertions can be made from direct evidence or from other assertions. An important aspect of the GENTECH model is its support for creating an audit trail of the deductions which are reached.

2.4 Utility Submodel

The Utility submodel contains utility classes such as Place and Date, which are used by the other submodels. It also has a class stereotype that models a generic collection.

3. Classes of the Evidence Submodel

3.1 SourceInstance

A SourceInstance represents the origin of an item of data. An instance of this class records the specific location of a piece of data. The location is recorded in a SourceLevels collection. Each SourceLevel item is logically equivalent to a component of a bibliographic entry or citation.

The current convention (see Mills, 1997) is to document the origin of a fact using a bibliography and a source note. A bibliographic entry is used as a roadmap to locate a source. The bibliographic entry doesn't refer to a specific fact. It is a summary representation of the source for the reader's convenience. Each SourceInstance will have this information recorded in the SourceLevels collection that it contains. A bibliographic entry can be created from a subset of the SourceLevels collection, but the object model does not define a class to represent it.

Similarly, the Mills style guide (see Mills, 1997) for source notes has recommended formats for full citations and short citations. The citation points the reader to the location of a particular fact in the source. As such, it is more detailed than its bibliographic counterpart and it uses more of the SourceLevels collection to record this information. Various citation formats can be created from a subset of the SourceLevels collection, but the object model does not define classes to represent them.

More than one copy may exist for a given source, and all of the potential copies are instances of the SourceInstance class. For example, many microfilm copies may exist for a given census, and some may be of better quality than others. In some cases, each visit to a source may represent a separate SourceInstance. For example, a tombstone examined 10 years ago may have revealed details that are no longer visible today.

It may be convenient to create a SourceInstance that represents the physical source which is examined in a Repository, for example, a copy of a book or a reel of microfilm. Since it is not used to record a specific fact, the SourceLevels collection would contain a set of elements similar to those needed for a bibliographic entry. Such a SourceInstance would be useful for planning Searches, recording the results of a failed Search, or for recording a source for which a Repository is sought. Such a SourceInstance could serve as a generalization (superclass) of all references to this physical copy.

The GENTECH model's REPOSITORY-SOURCE entity has semantics similar to SourceInstance. Like the GENTECH entity, SourceInstance holds information which describes this particular copy of the data source, e.g. its call number and a description of the condition of the copy that was examined.
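
As a rough illustration, the following Python sketch treats two copies of the same census as separate SourceInstances. The attribute names (description, repository, call_number, condition) are assumptions chosen for the example; GDMUML does not prescribe them. The SourceLevels collection is omitted here for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Repository:
    name: str  # hypothetical attribute


@dataclass
class SourceInstance:
    # All attribute names below are hypothetical.
    description: str
    repository: Optional[Repository] = None  # None while a Repository is sought
    call_number: Optional[str] = None
    condition: Optional[str] = None


nara = Repository("NARA Pacific Region, San Bruno, CA")
census = "1870 U.S. census, Livingston County, Kentucky"

# Two microfilm copies of the same source are two SourceInstances.
copy_a = SourceInstance(census, repository=nara, condition="clear image")
copy_b = SourceInstance(census, repository=nara, condition="faded, partly illegible")

# A source that has been identified but not yet located in any Repository.
sought = SourceInstance(census)
```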

3.2 SourceLevel

A SourceLevel is logically equivalent to a component of a bibliographic entry or citation.

An example will make this clearer. The 1870 U.S. census for Livingston County, Kentucky, has an entry for the William Sharp household. The particular piece of data that is to be represented is the age of his spouse, Delila, at the date of the census. To completely record this, a hierarchy of SourceLevels such as the following might be used:

  1. Kentucky, Livingston County.
  2. 1870. U.S. census population schedule.
  3. Micropublication M593, roll 482
  4. Washington: National Archives
  5. Carrsville Precinct No. 5
  6. Page 175A
  7. Dwelling 313, Family 321 (William Sharp household)
  8. line 13 (Delila Sharp)
  9. column 4, (Age at last birthday)
  10. 45

Each of the items in the above list is an instance of class SourceLevel. SourceLevels form an ordered list, ranging from general to specific. The most specific SourceLevel is an instance of a subclass called LowestSourceLevel. It is the SourceLevel from which assertions are commonly generated.
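
The census example above can be written out as a short Python sketch. It uses a plain list for the ordered collection and a single hypothetical attribute, text, to hold each component; both are assumptions made for illustration rather than part of the model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceLevel:
    text: str  # hypothetical attribute holding one citation component


class LowestSourceLevel(SourceLevel):
    """The most specific SourceLevel: the datum being referenced."""


# Ordered from general to specific, following the list above.
source_levels: List[SourceLevel] = [
    SourceLevel("Kentucky, Livingston County."),
    SourceLevel("1870. U.S. census population schedule."),
    SourceLevel("Micropublication M593, roll 482"),
    SourceLevel("Washington: National Archives"),
    SourceLevel("Carrsville Precinct No. 5"),
    SourceLevel("Page 175A"),
    SourceLevel("Dwelling 313, Family 321 (William Sharp household)"),
    SourceLevel("line 13 (Delila Sharp)"),
    SourceLevel("column 4, (Age at last birthday)"),
    LowestSourceLevel("45"),
]


def lowest(levels: List[SourceLevel]) -> SourceLevel:
    """Return the last, most specific item of an ordered collection."""
    return levels[-1]
```

A bibliographic entry or a citation could then be produced by formatting a subset of this list, which is why the model defines no separate classes for them.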

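Each SourceLevel may have references to other classes, including CitationPart and Representation (described below) and, as the object diagram in section 3.10 shows, Researcher and Place.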

The GENTECH SOURCE entity is represented by GDMUML's SourceLevel class.

3.3 SourceLevels collection

This class represents the collection of SourceLevels. In general, it could be one of several container classes, although in this case the hierarchical and ordered nature of the collection suggests a tree structure. The SourceLevels collection divides the Evidence submodel into classes which deal with SourceInstances as a whole versus classes which interface with SourceLevel items, the component parts of SourceInstances.

3.4 LowestSourceLevel

The SourceLevels collection is a continuum of progressively more detailed items, ending at the specific datum being referenced. This last instance is the LowestSourceLevel.

3.5 CitationPart

Each SourceLevel may have an instance of CitationPart associated with it. This class simply holds a string and type identifier for a component of a citation or bibliographic entry.
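
A minimal Python sketch of this pairing follows. The attribute names (part_type, value, citation_part) and the type identifier "page" are assumptions made for the example; GDMUML states only that a CitationPart holds a string and a type identifier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CitationPart:
    part_type: str  # hypothetical name for the type identifier
    value: str      # hypothetical name for the string


@dataclass
class SourceLevel:
    # A SourceLevel may, but need not, have a CitationPart.
    citation_part: Optional[CitationPart] = None


page_level = SourceLevel(CitationPart(part_type="page", value="175A"))
bare_level = SourceLevel()
```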

This class corresponds to the GENTECH CITATION-PART and CITATION-PART-TYPE entities.

3.6 Representation

An instance of the Representation class corresponds to a physical or electronic copy of a SourceInstance. Examples include a disk file which is a digital camera image of a tombstone, the text of a transcription of a tombstone inscription, a reference to a physical file number containing an original photograph, and a xerographic copy of a census page.

Although this meaning of Representation corresponds to the SourceInstance as a whole, it is possible to have many Representations that correspond to different SourceLevels. For example, in the case of the census example used above, if a photocopy of page 175A was a Representation, then the SourceLevel corresponding to the page reference would have this Representation instance. However, a text extract of the census information for the age column could also be a Representation for the LowestSourceLevel.
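
The two Representations in this census example might be attached as in the following Python sketch. The attribute names (kind, content, text, representations) are hypothetical and serve only to show that different SourceLevels can each carry their own Representation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Representation:
    kind: str     # hypothetical, e.g. "photocopy" or "text extract"
    content: str  # hypothetical: a file reference or the text itself


@dataclass
class SourceLevel:
    text: str  # hypothetical attribute
    representations: List[Representation] = field(default_factory=list)


page = SourceLevel("Page 175A")
page.representations.append(Representation("photocopy", "copy of page 175A"))

lowest = SourceLevel("45")
lowest.representations.append(
    Representation("text extract", "Age at last birthday: 45")
)
```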

This class corresponds to the GENTECH REPRESENTATION and REPRESENTATION-TYPE entities.

3.7 Repository

This class represents the place where an instance of SourceInstance was found. Some examples include: a library, a website with online census images, and a cemetery. This class corresponds to the GENTECH REPOSITORY entity.

3.8 SourceGroup

This class represents a collection of SourceInstances. The criteria for grouping the SourceInstances in the collection are defined by the Researcher. A SourceInstance may belong to more than one SourceGroup. This class is closely tied to the Administrative submodel, since it is an organizational aid. This class corresponds to the GENTECH SOURCE-GROUP and SOURCE-GROUP-SOURCE entities.
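
The many-to-many relationship can be sketched in Python as follows. The group names, the attribute names, and the helper groups_containing are all hypothetical; the sketch simply shows one SourceInstance belonging to two SourceGroups without any associative table.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SourceInstance:
    description: str  # hypothetical attribute


@dataclass
class SourceGroup:
    name: str  # hypothetical attribute, chosen by the Researcher
    members: List[SourceInstance] = field(default_factory=list)


def groups_containing(
    instance: SourceInstance, groups: List[SourceGroup]
) -> List[str]:
    """Return the names of every SourceGroup that holds this SourceInstance."""
    return [g.name for g in groups if any(m is instance for m in g.members)]


census = SourceInstance("1870 U.S. census, Livingston County, Kentucky")
tombstone = SourceInstance("Tombstone inscription")

census_records = SourceGroup("Census records", [census])
to_revisit = SourceGroup("Sources to revisit", [census, tombstone])

print(groups_containing(census, [census_records, to_revisit]))
# prints ['Census records', 'Sources to revisit']
```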

3.9 Evidence Submodel Class Diagram

The class diagram illustrates the following class relationships:

Figure 1. Evidence Submodel Class Diagram.
evidence3.jpg (118K)

3.10 Evidence Submodel Object Diagram

An object diagram shows the relationships between real-world instances of each of the classes. This example portrays a Researcher's visit to the Pacific Region's NARA (a Repository) located in San Bruno, CA (a Place). A reel of microfilm was located for the 1870 U.S. census for Livingston County, Kentucky (a SourceInstance). Using a microfilm reader, page 175A was found. A photocopy of the image of the page was made and stored as a hardcopy in file number 11052000 (a Representation).

The Delila_Age_1870_Census instance of SourceInstance contains a SourceLevels collection which contains SourceLevel items. Note that in the diagram, only 7 of the 10 SourceLevel items are shown.

The SourceLevels are mapped to the levels of the bibliographic entry, citation, and excerpt. Each SourceLevel has a Researcher associated with it. Also, each SourceLevel has a CitationPart associated with it, corresponding to one of the citation components. Some SourceLevels have a Place associated with them, e.g. the jurisdiction location of the census (Livingston County, Kentucky) and the location of the publisher of the microfilm (Washington). Finally, two SourceLevels have Representations. The photocopy corresponds to the "page" SourceLevel, and a text excerpt would correspond to the LowestSourceLevel.

Figure 2. Evidence Submodel Object Diagram.
evidence3_obj.jpg (139K)