Initial Data Model

Version 0.1, by Zach Tomaszewski

for ICS 691-3, Fall 2001, taught by Dr. Joan Nordbotten


The following is a (very) initial data model for an animal classification system. This system contains multimedia records for many types of animal. The system allows for the construction of a variety of classifications by using Node objects. By adding new classifications or exploring existing ones, users can examine a variety of relationships between animals. They can alternate classifications depending on their current information need. Users can be anyone with an interest in animal classification, from junior high students and amateur enthusiasts to researchers and zoological taxonomists.

SSM Data Model

An SSM model

Description of Entities

Classification System Entity
id<dec(10)> -- a unique system id for each entity
name<vchar(60)> -- the name of this entity (see below for more)

These attributes (as well as the relationship to Descriptions, discussed later) are common to CS Root, CS Node, and Animal, so it seemed wise to make one superclass for them all. However, it may turn out more advantageous to just define id and name within these entities, rather than inheriting them.
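A minimal sketch of the inheritance option, assuming Python dataclasses stand in for the SSM entities. The class names (CSEntity, CSRoot, CSNode, Animal) and the use of bytes for the image are illustrative assumptions, not part of the model itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CSEntity:
    """Shared superclass: every entity carries a system id and a name."""
    id: int     # id<dec(10)>
    name: str   # name<vchar(60)>


@dataclass
class CSRoot(CSEntity):
    """Top of one classification system, e.g. name="Linnaean System"."""


@dataclass
class CSNode(CSEntity):
    type: str = ""   # type<vchar(40)>, e.g. "Genus"


@dataclass
class Animal(CSEntity):
    image: Optional[bytes] = None   # image<image>; bytes is an assumed stand-in


root = CSRoot(id=1, name="Linnaean System")
genus = CSNode(id=2, name="Ursus", type="Genus")
bear = Animal(id=3, name="polar bear")
```

Defining id and name separately in each entity would simply mean repeating those two fields in three classes instead of inheriting them.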

Classification System Root

CS Root serves as a placeholder for the top of a classification system. For example, the well-known Kingdom-Phylum-Class-Order-Family-Genus-Species classification is known as the Linnaean System. If this system were being modeled, the Root's name would likely equal "Linnaean System".

Classification System Node
type<vchar(40)> -- describes the taxon or classification level

The Nodes will form the bulk of the classification structure, and possibly be the most common record type in the database. Type serves to describe the level or taxon of the node. For example, if this were a node somewhere between a "Linnaean System" Root and a "polar bear" Animal record, the name may equal "Ursus" and the type equal "Genus." (It would then Contain another node, type="Species" and name="maritimus", which would then Contain the "polar bear" record.)

Nodes are weak entities so that they will disappear when they cease to have a parent (are no longer "Contained" by another Entity).
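The polar bear example can be sketched as a small chain of records. This is only an illustration: the Entry class and the path_to helper are hypothetical names, and the Genus node is hung directly under the Root for brevity, skipping the levels that would sit between them in a full Linnaean system.

```python
class Entry:
    """Hypothetical stand-in for a Root, Node, or Animal record."""

    def __init__(self, name, type=None):
        self.name = name
        self.type = type       # taxon level for Nodes; None for Roots and Animals
        self.children = []     # the entities this one Contains

    def contain(self, child):
        """Record that this entry Contains child, and return child."""
        self.children.append(child)
        return child


def path_to(entry, target_name, trail=()):
    """Return the names from entry down to target_name, or None if absent."""
    trail = trail + (entry.name,)
    if entry.name == target_name:
        return list(trail)
    for child in entry.children:
        found = path_to(child, target_name, trail)
        if found:
            return found
    return None


root = Entry("Linnaean System")
genus = root.contain(Entry("Ursus", type="Genus"))
species = genus.contain(Entry("maritimus", type="Species"))
species.contain(Entry("polar bear"))

print(path_to(root, "polar bear"))
# ['Linnaean System', 'Ursus', 'maritimus', 'polar bear']
```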

Animal
image<image> -- a picture of the animal type

I have called this entity Animal, but there is no reason that plants or fungi or other types of life (or even non-life!) could not use this record in the same way. Currently, I'm keeping things simple by only having an image. Other options (in future versions of the database) include a video of the animal or an audio file of the call or sound it makes (this would be especially useful for birds).

One known problem is what the name field should contain. I don't care to use the species because that ties the records to a Linnaean classification system and would be redundant. However, some sort of granularity really needs to be decided. In order for all systems to use the same records, this granularity should be quite fine. I think I will go with the animal's common name that corresponds to species granularity. (So "polar bear" rather than just "bear", "sea otter" rather than just "otter", etc.)

Description
Entity Description<vchar(1000)> -- a description of the entity; this may be better served as a link to a document or text object, rather than as a variable character field.
Entity Author<vchar(50)> -- the name of the person who designated or created this entity
Entity Date<date> -- the date the entity was discovered or created
[Entity Resources][reference to bibliographic record] -- resources to substantiate claims made under Entity Author and Entity Date
Description Author<vchar(50)> -- the name of the person who wrote this description record
Description Date<date> -- the date this record was added to the system.
[Description Resources][reference to bibliographic record] -- resources used in writing the Entity description

Description objects currently need much more thought! First off, these fields do not apply to all CS Entities or the information may not be available, so they may have to be optional. (As discussed in Relationships, the Description entity itself is already optional.) Secondly, the fact that I feel I should list the references here to demonstrate different relationships is likely a sign I need to redesign this whole side of the chart.

The idea behind Entity Author and Entity Date is for recording when a Species or Genus was first published. (This is important information in zoological taxonomy.) The Entity Resources would be links to those works. Also, the system may eventually allow users to create their own classifications. In this case, they may like to record their names and a description of each node or the system as a whole, as well as provide a resource for further information. However, none of this applies very well to Animals.

The Description Author and Date serve to tell when this information was added to the system. This is a way to know how current the information is or who to blame if it is incorrect. Description Resources would be the resources they used in compiling that information (so authors can show they aren't really the ones to blame after all!) or to cite the source of the image or other materials.

Though there is some overlap, after examining these problems here, a better implementation may be to split this object in two. An Entity Description could describe Roots and Nodes and an Animal Description could describe Animals. I'll have to consider the ramifications of that, but on the surface it seems like a better idea.

Descriptions will likely continue to be weak entities since it makes no sense to have a description when what it describes no longer exists.

Bibliographic Record
Place of Publication/URL<vchar(50)>
Date of Publication<date>
Comments<vchar(250)> -- this would perhaps be better called Notes.

I think this is pretty clear. It's just your basic, simple bibliographic record.

Description of Relationships

E1: CS Entity(0,1) -- An Entity has zero or one Description.
E2: Description(1,1) -- A Description describes 1 and only 1 Entity.

E1: Description(0,n) -- A Description can have any number of source BRs.
E2: Bibliographic Record(0,n) -- A BR can be a source for any number of Descriptions.

Contains (Root to Node)
E1: CS Root(1,n) -- A Root must have at least one Node under it.
(Otherwise, there would be no point to the Root because the Root describes a classification system, which must consist of nodes.) Ideally, there would always be at least 2 Nodes directly under a root, but I'm not certain I want to restrict the system by making that a requirement.
E2: CS Node(0,1) -- A Node can have zero or one Root as its "parent." (See note below)

I don't know how to state that a Node can have only one parent (something that is "Contain"-ing it). At least, since it is a weak entity, it will disappear if it has no parent at all. However, it can have either a Root or a Node as a parent, but not both at the same time. I need an XOR operator of some kind.
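SSM notation aside, one way the XOR rule might be enforced in an implementation is as a check on two nullable parent references. This is a sketch under that assumption; the field names parent_root_id and parent_node_id and the function has_exactly_one_parent are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    type: str
    parent_root_id: Optional[int] = None   # set when a Root Contains this Node
    parent_node_id: Optional[int] = None   # set when another Node Contains it


def has_exactly_one_parent(node: Node) -> bool:
    """True when exactly one of the two parent references is set (XOR)."""
    return (node.parent_root_id is None) != (node.parent_node_id is None)
```

A Node failing this check has either no parent, in which case the weak-entity rule says it should disappear, or both kinds of parent, which the model is meant to forbid.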

Contains (Node to Node)
E1: CS Node(0,n) -- A Node can Contain any number of other Nodes. (It can't contain itself, but I'm not sure how to note that in SSM notation.)
E2: CS Node(0,1) -- A Node can have zero or one other Nodes as its "parent."
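The rule that a Node can't contain itself could be checked outside the diagram by walking up the parent links before a new Contains link is made. The following is a sketch with hypothetical names (Node, would_create_cycle, contain); it also rejects the indirect case where a Node would end up beneath one of its own descendants.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None   # at most one parent, matching the (0,1) side


def would_create_cycle(parent, child):
    """True if placing child under parent would make child its own ancestor."""
    current = parent
    while current is not None:
        if current is child:
            return True
        current = current.parent
    return False


def contain(parent, child):
    """Link child under parent, refusing self-containment and loops."""
    if would_create_cycle(parent, child):
        raise ValueError(f"{child.name} cannot be contained by {parent.name}")
    child.parent = parent
```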

Contains (Node to Animal)
E1: CS Node(0,n) -- A Node can Contain any number of Animals.
E2: Animal(0,n) -- An Animal can have any number of Nodes containing it.

Again, there may be an XOR problem here. I haven't decided whether I want to allow a Node to Contain both Animals and other Nodes at the same time. It would mean that some Animals are not located at the "leaves" or finest points of the system. This could prove to be confusing. On the other hand, it may be a way to note that the Animals go under this Node somewhere, but it hasn't been determined yet just where. I could go either way with this.


Because of the redundant/recursive use of Nodes, this system can be rather hard to imagine. Here's an example:

An example of two classification systems -- Linnaean and Climate -- using the same animal record.

As you can see, the root/node architecture allows for the construction of practically any number of classification systems. There can be any number of Node levels; indeed, the Climate Zone classification only uses one such level. (Other likely Nodes may be "desert", "rain forest", "tundra", "marine", etc.) Also, I have only shown one branch here of a tree-like structure. At each level of the Linnaean System there are a number of other Nodes--many Families under the same Order, many Species under the same Genus, etc. These systems all use the same animal records. (Not every classification system will use all of them, however; for example, perhaps someone wishes only to classify snakes on the basis of their venom and how they interact with known anti-venoms.)

Ideally, queries will be able to explore multiple classifications and join the results. For example, imagine someone compiling an identification guide for marine mammals. Using only the Linnaean System, they would have to sift through all mammals in order to find groups such as whales and dolphins (Order="Cetacea"). Using only a Climate Zone classification that includes a "Marine" zone, they would be swamped with fish, sponge, coral, and other irrelevant animal records. However, if they could intersect these results, they would find whales and dolphins, as well as sea otters, dugongs, and other marine mammals not easily found traversing only the Linnaean system.
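The marine mammal query can be sketched as a set intersection over two tiny, made-up trees. Everything here is an assumption for illustration: the dictionaries, the animals_under helper, the choice of animals, and the heavily abbreviated Linnaean levels (most ranks are skipped).

```python
def animals_under(tree, node):
    """Collect every animal name found at or below node.

    tree maps a node name to a list of its children; a child that is not
    itself a key in tree is treated as an Animal record.
    """
    found = set()
    for child in tree.get(node, []):
        if child in tree:
            found |= animals_under(tree, child)
        else:
            found.add(child)
    return found


linnaean = {
    "Linnaean System": ["Mammalia", "Actinopterygii"],
    "Mammalia": ["Cetacea", "Carnivora"],
    "Cetacea": ["blue whale", "bottlenose dolphin"],
    "Carnivora": ["sea otter", "polar bear"],
    "Actinopterygii": ["Atlantic cod"],
}
climate = {
    "Climate Zone": ["Marine", "Tundra"],
    "Marine": ["blue whale", "bottlenose dolphin", "sea otter", "Atlantic cod"],
    "Tundra": ["polar bear"],
}

marine_mammals = animals_under(linnaean, "Mammalia") & animals_under(climate, "Marine")
print(sorted(marine_mammals))
# ['blue whale', 'bottlenose dolphin', 'sea otter']
```

The intersection only works because both systems point at the same animal records, which is the reason for keeping Animals independent of any one classification.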

As you can see, Nodes really serve only as structural placeholders between the Root and the Animal (though each Node can have its own description as well, perhaps describing the common characteristics of the animals below it, or the nature of the climate the Node represents). If the Root or a parent Node is removed, all inferior Nodes should also vanish.
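The vanishing of inferior Nodes amounts to a cascading delete. Here is a rough sketch, assuming the Contains links between Roots and Nodes are held in a simple dictionary; the function name remove_with_descendants and the dictionary layout are hypothetical. Animal records are left out of the map on the assumption that they are shared between systems and should survive.

```python
def remove_with_descendants(children_of, node_id):
    """Remove node_id and every Node beneath it; return the removed names."""
    # Detach the node from whichever parent currently Contains it.
    for kids in children_of.values():
        if node_id in kids:
            kids.remove(node_id)
    removed = []
    stack = [node_id]
    while stack:
        current = stack.pop()
        removed.append(current)
        # Popping the entry deletes the node and queues its children.
        stack.extend(children_of.pop(current, []))
    return removed


children_of = {
    "Linnaean System": ["Ursidae"],
    "Ursidae": ["Ursus"],
    "Ursus": ["maritimus"],
    "maritimus": [],
}
removed = remove_with_descendants(children_of, "Ursidae")
print(removed)
# ['Ursidae', 'Ursus', 'maritimus']
```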

I considered the possibility of allowing Nodes to have more than one parent. This would be especially advantageous for cross-classification linking. For example, when populating a "Marine" climate zone with Animals, a creator could just link directly to the Order="Cetacea" Node in the Linnaean system and have access to everything beneath that Node. However, this could lead to very messy dependencies and a blurring of different classifications. It would be better to query Order="Cetacea" and add all the Animal records so returned directly to the new "Marine" climate node.
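The query-and-add alternative might look like the following sketch, which treats the Node-to-Animal Contains relationship as a set of (node, animal) pairs. The names contains, animals_in, and copy_animals are hypothetical, and the animals are attached directly to the Order node here only to keep the example short.

```python
# (node name, animal name) pairs standing in for the Node-to-Animal relationship
contains = {
    ("Cetacea", "blue whale"),
    ("Cetacea", "bottlenose dolphin"),
    ("Carnivora", "sea otter"),
}


def animals_in(node):
    """Return the animals linked directly to node."""
    return {animal for n, animal in contains if n == node}


def copy_animals(source_node, target_node):
    """Link every animal under source_node directly to target_node as well."""
    for animal in animals_in(source_node):
        contains.add((target_node, animal))


copy_animals("Cetacea", "Marine")
print(sorted(animals_in("Marine")))
# ['blue whale', 'bottlenose dolphin']
```

Because the links are copied rather than shared, later changes under "Cetacea" do not touch the "Marine" node, so the two classifications stay independent.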

I hope these examples shed some light on the nature of this system.


Some revision is certainly necessary, but I think I'm off to a good start so far. Comments or questions are appreciated!