Project Implementation Proposal

Data Model v0.4, by Zach Tomaszewski

for ICS 691-3, Fall 2001, taught by Dr. Joan Nordbotten


Table of Contents


Introduction

The following in a data model and beginning implementation plan for an animal classification system. This system contains multimedia records for many types of animal. The system allows for the construction of a variety of classifications by using Node objects. By adding new classifications or exploring existing ones, users can examine a variety of relationships between animals. They can use alternate classifications, or even intersect different classifications, depending on their current information need.

Context

I imagine this system as a web-based, open-access system. It will be an animal encyclopedia with text and images of specific animal species. It can serve as a partial index to related works, especially since cited resources can be URLs as well as print materials. It will be a taxonomy modeling tool. Most importantly, the combination of multiple user-created classifications will make it a powerful searching and metadata system.

The system would probably be setup and maintained by either a volunteer group, non-profit organization, or university. There should be a graphical Web interface so that users can browse, see classification structures, and get the full benefit of the images and URL links to other sources. The system will be implemented using PostgreSQL.

Users

Possible users include:

Users such as high school students and some amateur enthusiasts will likely use the system mainly as an encyclopedia to find more information on certain animals. Such uses do not involve building new classifications or adding animal records. These users will be Users in terms of the SSM below.

Other amatuer enthusiasts may compile information from various sources and create new animal records or edit current ones. Taxonomists may use the system to build models of their work. Biology researchers may add their findings to animal records. They may also use the system to examine possible relationships made visible through intersecting classficiations, such as effect of climate on gestation period. These users that create new records will be Creators in terms of the SSM below.

Sample Queries

Here are some example information needs that illustrate the system.

1. Are most of the animals that live in the arctic white in color?
User: a high school student
Required classifications: Climate (one zone of which is "arctic")
Comments: This search would simply involve examining the images of the animals contained by the Arctic node of the Climate classfication.
2. Give me a list of marine mammals.
User: biology author researching for an identification guide
Required classifications: Linnaean; Climate (one zone of which is "marine")
Comments: This search would involve intersecting the set of the animals contained by all nodes under the Mammalia node in the Linnaean classification with the set of the animals under the Marine node in the Climate classification.
3. Do state birds predominantly come from one certain family or order?
User: a bored amateur birdwatcher
Required classifications: Linnaean; State Birds (simply 50 nodes, each containing a bird)
Comments: This search would involve intersecting the set of the animals contained by all nodes under the Aves node in the Linnaean classification with the set of the animals under the State Birds root. (Ideally, when results are displayed graphically, the tree structure of Orders, Families, and Genuses can be maintained.)
4. What is the difference between the arctic and the tundra climate zones? And where can I find more information?
User: a generic user confused by the Climate classification
Required classifications: Climate classification.
Comments: This search would involve pulling up the description fields for the Arctic and Tundra nodes. The keywords fields may also provide some help. If this isn't enough information, hopefully the creator of these nodes cited some helpful, related resources that the user could examine.
5. Hey, who screwed up my Ursus maritimus animal record?!
User/Creator: an irrate Arctic biologist who granted access to her record to all other creators
Required classifications: None.
Comments: This search would need only pull up all creator records that share an edited relationship with this animal record.

SSM Data Model

Complete SSM model

-- SSM model --

Description of Entities

Animal
id<integer> -- a TreeEntitiy foreign key
species_name<char(60)> -- the Latin Genus-species name for this animal
common_name<char(60)> -- the common English name for this animal.
keywords<vchar(200)> -- keywords describing this animal to aid in searching
Description<text> text icon -- a description of the animal, perhaps including such things as physical characteristics, habitat, main diet, etc.
Image filename: <char(30)> -- the name of the image file
pixel-width: <integer> -- the width of the image in pixels
pixel-height: <integer> -- the height of the image in pixels
format: <char(4)> -- {JPEG | GIF | PNG | BMP}
comments: <vchar(500)> -- any comments on the image
image: <image> image icon -- a picture of the animal type

Root
id<integer> -- a TreeEntitiy foreign key
classif_name<char(60)> -- the name of the classification system stemming from this root
keywords<vchar(200)> -- keywords describing this classification to aid in searching
scope_all<boolean> -- whether this classification aims to include all animals (as opposed to just a subset)
description<vchar(1000)> -- a description of this classification system

Node
id<integer> -- a TreeEntitiy foreign key
taxon_name<char(60)> -- the name of the particular taxon or group or classification level represented by this node
type<char(40)> -- describes the type of taxon or the level within the classification system
keywords<vchar(200)> -- keywords describing this node to aid in searching
Description<vchar(1000)> -- a description of this entity

TreeEntity
id<sequence> -- a unique system id for this entity

Resource
id<dec(12)>
author<char(50)>
title<char(200)>
publisher<char(60)>
location_or_URL<char(100)>
pub_date<date>
notes<vchar(250)> -- notes or comments about this resource

User
username<char(20)> -- a unique username
age<integer> -- assumes that centenarians are not web-surfing taxonomists
education<char(11)> {ELEMENTARY | MIDDLE | HIGH SCHOOL | COLLEGE | GRAD | TEACHER | PROFESSOR}
nationality<char(30)>
email<char(30)>
created<date>
last_accessed<date>
time_inactive<derived> -- how long it has been since this user logged in

Creator
password<char(12)>
name<char(50)>
address<char(120)>
release_prefs<char(8)>{USERNAME | NAME | EMAIL | ADDRESS} -- how much information to release to other users; each choice includes those before it.
greeting<vchar(750)> -- an introductory message to other users of the system

Description of Relationships

RcontainsN
E1: Root(1,n) -- A Root must contain some Nodes.
E2: Node(0,1) -- A Node may or may not be contained by a Root

NcontainsN
E1: Node(0,n) -- A Node may or may not contain other Nodes.
E2: Node(0,1) -- A Node may be contained by another Node.

NcontainsA
E1: Node(0,n) -- A Node may or may not contain Animals
E2: Animal(0,n) -- An Animal may be contained by any number of Nodes, or none at all.

cite
E1: TreeEntity(0,n) -- An entity may cite any number of Resources.
E2: Resource(1,n) -- A Resource must be cited at least once.

edited
E1: Creator(0,n) -- A Creator may edit any number of entities.
E2: TreeEntity(0,n) -- An entity can be edited any number of times.
time<timestamp> -- The time and date the new TreeEntity was edited.

opened
E1: Creator(0,n) -- A creator may not have created an entity yet; if he has, he can create any number of them
E2: TreeEntity(1,1) -- Each set of access permissions is related to only a single Entity.
E3: Creators(0,n) -- A Creator grants access to any number of Creators, 0 being all.
time<timestamp> -- The time and date the new TreeEntity was added.

For more information on the above entities and relationships, see v0.3. The examples there may also make the system easier to understand.

Implementation

Tables

I forsee the need to create a table for each SSM entity and relationship above, plus a table for Image objects. Thus, I will have the following tables:

Here are some example CREATE statements from this system.

CREATE SEQUENCE teid_counter;

CREATE TABLE tree_entities (
   teid INTEGER DEFAULT nextval('teid_counter') PRIMARY KEY,
   entity_oid OID
);

CREATE TABLE roots (
   id REFERENCES tree_entities PRIMARY KEY,
   classif_name char(60),
   keywords varchar(200),
   scope_all boolean,
   description varchar(1000)
);

CREATE TABLE r_contains_n (
   root_id REFERENCES roots,
   node_id REFERENCES nodes UNIQUE
);

Indices

I plan to index the following columns for faster retrieval (in order of imporance):

If possible, it would be nice to also form indices of keywords from the the content of all the keywords and descriptions fields, as well as from animal text objects.

Functions

Certain functions will also be required. They should do the following:

get_contained_animals(TEid)
recursively return all animals under this node (or root)
get_contained_nodes(TEid)
recursively return all nodes under this node (or root)

Conclusion

Comments or questions are appreciated!




Appendices

Thank you for reading this far! The following is a working list of notes, thoughts, and running commentary concerning current problems and future developments. Though not officially part of this report, the content of these appendices delimit the known problems . Thus comments on these points would be most helpful.

Known Design Issues and Future Ideas

New Issues

Implementation of Creator:

After reading more on SQL3, I have learned the GRANT, REVOKE NOTIFY, and LISTEN commands might be possible replacements to the opened and edited relationships. GRANT does require a sequence, but this could be (relatively) easily incorporated as the TreeEntity ID. However, I do not know if this access would be inherited from this Entity to Node, Root, and Animal, which is where it is actually needed.

Benefits of switching:

Drawbacks of switching:

This needs more study.

Update: After further study, I realized that GRANT applies to the sequence relation (a counter basically), and not to the rows numbered using that sequence. Thus, after all that, I realized I can not use GRANT to control access to the entity objects of the system because they will correspond to table rows and not distinct tables.

Arrays rather than contains relationship tables?

At first this seemed like a good idea, but after further consideration I realized it would make determining parents impossible. Admittedly, working up a tree will be much more rare, but it still seems a necessary feature.

OIDs not the answer.

Just as this page was going to "press" (this project is ongoing and doesn't seem to stop for progress reports) I realized that, while OIDs are unique identifiers for rows, I can't seem to use them without also knowing the table they're in. This was the reason I started using them in the first place; I do not yet know how to determine which table (roots, nodes, or animals) a certain entity (row) is in. I suppose could just SELECT FROM all three tables; because the teid is unique, the id will occur in only one of the tables. The other alternative is to include a table-identifying column in tree_entities.

From v0.3's Known Design Issues:

Entities

Animal:

Root:

Resource:

User:

Relationships

contains:

opened:

Wishlist for Version 2.0

See v0.3's Wishlist.