[IMAGE of a bull's head watermark]

The WWW Watermark Archive Initiative

© 1997 Robert W. Allison and James A. Hart. All rights reserved.




Diagram of a Relational Database for a WWW Archive of Paper Types and Watermarks

These diagrams represent entity-relationship and relational database designs for the individual archives.



[entity-relationship design for watermarks and paper]


[diagram of a relational database for watermarks and paper]



What do the arrows mean?

These diagrams may seem complex at first. Since different disciplines approach the study of paper and watermarks from different points of view, we realized the system had to be not only complete but also flexible, allowing different approaches to the data. A so-called "flat file" (a database in which all the data is in a single file, appearing as a single page on the WWW) is a cumbersome vehicle for any set of data as extensive and complex as that envisaged by the IPH Standard. A relational database design allows a medievalist, for example, to view information about scribes and illuminators apart from watermark descriptions. Likewise, a conservationist or forensics expert can go directly to the Paper Description File for information on the physical constitution of the paper, such as ash content.
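The contrast between a flat file and a relational layout can be sketched in a few lines of Python. This is only an illustration: the field names (`ash_content_pct`, `motif`, `scribe`) and all values are hypothetical and are not taken from the IPH Standard or from the archive's actual files.

```python
# Flat file: one wide row per watermark, so the paper and scribe details
# are repeated on every row (hypothetical field names and values).
flat_rows = [
    {"paper_id": 1, "ash_content_pct": 1.2, "scribe": "Scribe A", "motif": "bull's head"},
    {"paper_id": 1, "ash_content_pct": 1.2, "scribe": "Scribe A", "motif": "countermark"},
]

# Relational layout: each kind of fact lives in its own "file" and is
# stored once; records are tied together by paper_id.
papers = {1: {"ash_content_pct": 1.2}}
scribes = [{"paper_id": 1, "name": "Scribe A"}]
watermarks = [
    {"paper_id": 1, "motif": "bull's head"},
    {"paper_id": 1, "motif": "countermark"},
]


def paper_description(paper_id):
    """What a conservationist might consult: physical data only."""
    return papers[paper_id]


def scribes_of(paper_id):
    """What a medievalist might consult: people, without watermark data."""
    return [s["name"] for s in scribes if s["paper_id"] == paper_id]


def watermarks_on(paper_id):
    """Watermark descriptions, without the physical or personal data."""
    return [w["motif"] for w in watermarks if w["paper_id"] == paper_id]
```

Each reader asks only for the file of interest, for example `paper_description(1)` or `scribes_of(1)`, and a correction to the ash content is made in one place rather than on every flat row.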

Readers unfamiliar with database diagrams may want to study some straightforward explanations of the basic concepts of database design.

The following statements correspond, by letter, to the relationships depicted in the second diagram, above.

  A. An Organization can employ or house more than one Person, or the same Person in more than one Job (or role).
  B. An Organization can apply the physical content to more than one piece of Paper. For example, a printer can print more than one book. Since a piece of Paper can have more than one type of content or treatment, more than one Organization may be involved in the content of a single piece.
  C. Recalling from the Entity-Relationship diagram that there are multiple kinds of relationships between Organization and Source, i.e. binding, printing, and distributing, an Organization may be involved in producing more than one paper-bearing Source. For example, a publisher may publish more than one book, but a book may have only one publisher. Note that links B and C may both be involved in a single piece of Paper, because the content or treatment of a particular piece may be different from that of the majority of the other pieces in the paper-bearing Source.
  D. An Organization may keep more than one Facsimile in its collection, but a particular Facsimile may only be kept in one organization at a time. As designed, the database keeps no history of any movement of a Facsimile, but only its current location.
  E. An Organization may keep more than one Mould in its collection, but a particular Mould may only be kept in one organization at a time. As designed, the database keeps no history of any movement of a Mould, but only its current location.
  F. A Person may have had more than one Job (or role), either in her/his lifetime, or concurrently, with the same Organization or with more than one.
  G. A Person may have been responsible for the Artistic content (taken to mean any literary or artistic creation that is applied to paper) of one or more Sources.
  H. A Person may have applied the Physical content to one or more pieces of Paper, or, perhaps, more than one type of content to the same piece. Note that, with this design, an "autograph" is identified by having the same Person responsible for both the Artistic and Physical content.
  I. A Watermark may have been reproduced one or more times in a number of different forms, e.g. a tracing and a dylux.
  J. An existing Mould may contain one or more Watermarks. As designed, this database assumes that the same information will be kept about paper watermarks and about the watermark designs in actual moulds. Should we want to keep other information about the mould watermark designs, a separate file, perhaps called Filigree, may be called for.
  K. An Organization may be responsible for the Artistic content ("intellectual property", perhaps) of more than one Source.
  L. A piece of Paper may have more than one kind of Physical content or treatment, for example, lettering and illustration.
  M. A piece of Paper may contain more than one Watermark.
  N. A Source may contain one or more pieces of Paper, but a piece may be part of only one Source, i.e. no compound sources.
  O. A Source may contain one or more sets of Artistic content. Examples of multiples: anthologies, hymnals, texts written by copyists and illustrated by artists.
  P. A Person may be associated with more than one piece of Paper. For example, a Papermaker may have made more than one piece of paper; a Binder may have bound a book with many types of paper in it.
  Q. An Organization (Papermill) may have made more than one piece of Paper.
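A few of the relationships listed above can be sketched as tables, using Python's standard `sqlite3` module. This is a minimal illustration under stated assumptions, not the authors' actual file layout: the table and column names are hypothetical, and only three relationships are modelled (a Person holding more than one Job with an Organization, a piece of Paper belonging to exactly one Source, and a piece of Paper carrying more than one Watermark).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces links only when asked
conn.executescript("""
    CREATE TABLE organization (org_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Job sits between Person and Organization, so one Person can hold
    -- several roles with the same Organization or with several.
    CREATE TABLE job (
        job_id INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES person(person_id),
        org_id INTEGER NOT NULL REFERENCES organization(org_id),
        role TEXT NOT NULL
    );
    CREATE TABLE source (source_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- A piece of Paper is part of exactly one Source (no compound sources).
    CREATE TABLE paper (
        paper_id INTEGER PRIMARY KEY,
        source_id INTEGER NOT NULL REFERENCES source(source_id)
    );
    -- A piece of Paper may contain more than one Watermark.
    CREATE TABLE watermark (
        watermark_id INTEGER PRIMARY KEY,
        paper_id INTEGER NOT NULL REFERENCES paper(paper_id),
        motif TEXT
    );
""")

# Hypothetical sample data.
conn.execute("INSERT INTO organization VALUES (1, 'Example Scriptorium')")
conn.execute("INSERT INTO person VALUES (1, 'Example Person')")
conn.executemany("INSERT INTO job VALUES (?, 1, 1, ?)",
                 [(1, "scribe"), (2, "illuminator")])
conn.execute("INSERT INTO source VALUES (1, 'Example Codex')")
conn.execute("INSERT INTO paper VALUES (1, 1)")
conn.executemany("INSERT INTO watermark VALUES (?, 1, ?)",
                 [(1, "bull's head"), (2, "countermark")])

# One Person, two concurrent roles with the same Organization.
roles = [row[0] for row in conn.execute(
    "SELECT role FROM job WHERE person_id = 1 AND org_id = 1 ORDER BY job_id")]

# One piece of Paper, two Watermarks.
watermark_count = conn.execute(
    "SELECT COUNT(*) FROM watermark WHERE paper_id = 1").fetchone()[0]
```

Because `paper.source_id` is a single required column, a piece of Paper cannot be attached to two Sources, and an attempt to insert a piece of Paper that points at a nonexistent Source is rejected by the database.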


What Constitutes An Archive?

We wrestled with a number of issues that led us to ask this question, including:

  1. Records in this database include both raw data and analytical judgments. They thus constitute a publication about which different analyses might exist, and over time surely will exist.
  2. Security and data integrity become a problem if we permit multi-user update of all records.
  3. Because some fields call for judgments, we need to provide for recording of arguments, criteria and bases of judgment so that multiple people can submit differing analyses or theories for the same object. In order not to violate a fundamental rule of relational design (by having multiple occurrences of the same field in a single record), we need to have a separate "analysis" field included in the record to accommodate this need.
  4. Given the mixed nature of the data (raw data and data resulting from analysis and judgment), we cannot presume that, once mounted, a particular archive will never change or be updated. We need to provide for regular update of data by the original submitter (institutional or individual). We envisage a system in which institutions maintaining archives will, behind the scenes, make corrections and changes and, ultimately, release a revised "edition."
  5. Since revisions might involve changing judgments, and since the record of such changes is itself important to those who want to review the history of past judgments in order to make their own in an informed way, some method of keeping a record of successive "editions" of the archive is desirable.
  6. The above observations all constitute a complex question of how the Archive should function as a publication medium. In addition to the above considerations, then, our design -- a single system -- must both meet the needs of researchers and guard publishers' credits and rights.
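One possible way to record judgments together with their bases, and to keep the history of successive "editions", is sketched below with Python's standard `sqlite3` module. This is an assumption for illustration, not the design described above: it keeps judgments in a separate, hypothetical `analysis` table with an `edition` number, so that a revised judgment is added as a new row rather than overwriting the old one. All names and values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE watermark (watermark_id INTEGER PRIMARY KEY, motif TEXT);
    -- Hypothetical table: one row per judgment, per analyst, per edition.
    CREATE TABLE analysis (
        analysis_id INTEGER PRIMARY KEY,
        watermark_id INTEGER NOT NULL REFERENCES watermark(watermark_id),
        analyst TEXT NOT NULL,
        edition INTEGER NOT NULL,
        judgment TEXT NOT NULL,
        basis TEXT
    );
""")

conn.execute("INSERT INTO watermark VALUES (1, 'bull''s head')")
conn.executemany(
    "INSERT INTO analysis (watermark_id, analyst, edition, judgment, basis) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Analyst A", 1, "dated c. 1450", "comparison with dated documents"),
        (1, "Analyst A", 2, "dated c. 1460", "revised after new evidence"),
        (1, "Analyst B", 1, "dated c. 1470", "chain-line spacing"),
    ],
)


def current_judgments(conn, watermark_id):
    """Return each analyst's judgment from that analyst's latest edition."""
    query = """
        SELECT a.analyst, a.judgment
        FROM analysis AS a
        WHERE a.watermark_id = ?
          AND a.edition = (SELECT MAX(b.edition) FROM analysis AS b
                           WHERE b.watermark_id = a.watermark_id
                             AND b.analyst = a.analyst)
        ORDER BY a.analyst
    """
    return conn.execute(query, (watermark_id,)).fetchall()


def judgment_history(conn, watermark_id):
    """Return every recorded judgment, oldest edition first."""
    query = """
        SELECT analyst, edition, judgment FROM analysis
        WHERE watermark_id = ? ORDER BY edition, analyst
    """
    return conn.execute(query, (watermark_id,)).fetchall()
```

In this sketch, differing analyses by different people coexist as separate rows, and earlier editions remain available to anyone reviewing how a judgment changed.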

We concluded that the best solution is a system which provides for a completely separate "instance" of the database for each "submitter". In plain terms, no matter whether an archive is stored with others on the same machine, or whether the group of archives is maintained by a single institution or on different machines half a world away from each other, each will have its own, separate but similarly organized set of files. Since the search engine has to cope with databases on separate machines anyway, this requirement only changes the quantity of databases, not the nature of the searching/indexing problem.
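The idea of separate but similarly organized instances can be sketched as follows, again with Python's standard `sqlite3` module. This is a simplified illustration under assumptions: each archive is an in-memory database standing in for a database on its own machine, the archive names, the `watermark` table and the `search` function are hypothetical, and a real search engine would query remote servers rather than local connections.

```python
import sqlite3


def make_archive(motifs):
    """Create one submitter's instance with the shared (hypothetical) layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE watermark (watermark_id INTEGER PRIMARY KEY, motif TEXT)")
    conn.executemany(
        "INSERT INTO watermark (motif) VALUES (?)", [(m,) for m in motifs])
    return conn


# Two independent instances; neither knows about the other.
archives = {
    "Archive A": make_archive(["bull's head", "anchor"]),
    "Archive B": make_archive(["bull's head with star"]),
}


def search(archives, term):
    """Run the same query against every instance and label each hit."""
    query = ("SELECT watermark_id, motif FROM watermark "
             "WHERE motif LIKE ? ORDER BY watermark_id")
    hits = []
    for name, conn in archives.items():
        for watermark_id, motif in conn.execute(query, (f"%{term}%",)):
            hits.append((name, watermark_id, motif))
    return hits
```

Because every instance shares the same organization, adding a further archive only adds one more entry to `archives`; the query itself does not change, and each hit stays attributed to the archive that published it.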

NOTE: This approach does impose one additional, non-relational, but very "webby", requirement. Databases will have to be able to point to each other when two records contain information about the same object ("entity"). So, for example, if two scholars study the same piece of paper and enter data about it in their separate databases, the two records should be crosslinked, once the relationship is discovered. That way, a researcher, upon finding one record, will have immediate access to the other. So, we have added a "Same as..." section to the appropriate files for storing URLs to records about the same item.
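The "Same as..." crosslink can be sketched as a small table of URLs attached to a record. This is a minimal illustration using Python's standard `sqlite3` module; the table name `paper_same_as`, the helper functions and the `example.org` URLs are all hypothetical.

```python
import sqlite3


def make_archive():
    """Create one archive instance with a hypothetical 'Same as...' table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE paper (paper_id INTEGER PRIMARY KEY, description TEXT);
        CREATE TABLE paper_same_as (
            paper_id INTEGER NOT NULL REFERENCES paper(paper_id),
            url TEXT NOT NULL,
            PRIMARY KEY (paper_id, url)  -- the same link is stored only once
        );
    """)
    return conn


def add_same_as(conn, paper_id, url):
    """Record that another archive's record describes the same object."""
    conn.execute(
        "INSERT OR IGNORE INTO paper_same_as VALUES (?, ?)", (paper_id, url))


def same_as(conn, paper_id):
    """Return the URLs of records elsewhere that describe the same object."""
    rows = conn.execute(
        "SELECT url FROM paper_same_as WHERE paper_id = ? ORDER BY url",
        (paper_id,))
    return [url for (url,) in rows]


# Hypothetical record URLs in two separate archives.
URL_A = "https://archive-a.example.org/paper/7"
URL_B = "https://archive-b.example.org/paper/12"

archive_a, archive_b = make_archive(), make_archive()
archive_a.execute("INSERT INTO paper VALUES (7, 'leaf described by scholar A')")
archive_b.execute("INSERT INTO paper VALUES (12, 'same leaf, described by scholar B')")

# Once the relationship is discovered, each record points at the other.
add_same_as(archive_a, 7, URL_B)
add_same_as(archive_b, 12, URL_A)
add_same_as(archive_a, 7, URL_B)  # a repeated discovery adds nothing new
```

A researcher who finds record 7 in the first archive can then call `same_as(archive_a, 7)` to obtain the URL of the second scholar's record, and the reverse lookup works from the other archive.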

We envisage institutions the world over maintaining what might be considered composite archives -- each consisting of individual archives prepared by various staff members, or by independent scholars who publish the results of their work through an affiliated institution. The records mounted in any archive will appear to the public under the logo and name of the hosting institution (or individual). The images and descriptions prepared by any institution or individual will "belong" to that agent according to international principles of intellectual ownership, and the institutions and creators of the data and images will have the right to receive credit for their publication.

We envisage a central registry of participating institutions, perhaps maintained by a board of overseers chosen from among the participating institutions and individuals. Such a board might also bear responsibility for maintenance of the standard.

Rationale

Some of the decisions that went into this design may not be obvious. Here, for both understanding and historical purposes, we document some of the discussion, in no particular order.




Robert W. Allison, Requirements and presentation designer
Dept. of Philosophy & Religion, Bates College

and

James Hart, Database design, technical concepts, and programming
Information Services, Bates College, Lewiston, Maine 04240


Many thanks to Dexter Edge, University of London, for sharing his design work.
