The WWW Watermark Archive Initiative
© 1997 Robert W. Allison and James A. Hart. All rights reserved.
Diagram of a Relational Database for a
WWW Archive of Paper Types and Watermarks
These diagrams represent entity-relationship and relational database
designs for the individual archives.
What do the arrows mean?
These diagrams may seem complex at first. Since different disciplines approach the study of paper and watermarks from different points of view, we realized the system had to be not only complete but also flexible, allowing different approaches to the data. A so-called "flat file" (a database in which all the data is in a single file, appearing as a single page on the WWW) is a cumbersome vehicle for any set of data as extensive and complex as that envisaged by the IPH Standard. A relational database design allows a medievalist, for example, to view information about scribes and illuminators apart from watermark descriptions. Likewise, a conservationist or forensics expert can go directly to the Paper Description File for information on the physical constitution of the paper, such as ash content.
Those unfamiliar with database diagrams may want to study some
straightforward explanations of the basic concepts of database
design.
The following statements correspond, by letter, to the relationships depicted in the second diagram, above.
- An Organization can employ or house more than one Person, or the same Person in more than one Job (or role).
- An Organization can apply the physical content to more than one piece
of Paper. For example, a printer can print more than one book. Since a
piece of Paper can have more than one type of content or treatment, more
than one Organization may be involved in the content of a single piece.
- Recalling from the Entity-Relationship diagram that there are multiple
kinds of relationships between Organization and Source, i.e. binding,
printing, and distributing, an Organization may be involved in producing
more than one paper-bearing Source. For example, a publisher may publish
more than one book, but a book may have only one publisher. Note that
links B and C may both be involved in a single piece of Paper, because the
content or treatment of a particular piece may be different from that of
the majority of the other pieces in the paper-bearing Source.
- An Organization may keep more than one Facsimile in its collection,
but a particular Facsimile may only be kept in one organization at a
time. As designed, the database keeps no history of any movement of a
Facsimile, but only its current location.
- An Organization may keep more than one Mould in its collection,
but a particular Mould may only be kept in one organization at a time.
As designed, the database keeps no history of any movement of a Mould;
only its current location.
- A Person may have had more than one Job (or role), either in her/his
lifetime, or concurrently, with the same Organization or with more than
one.
- A Person may have been responsible for the Artistic content (taken to
mean any literary or artistic creation that is applied to paper) of one
or more Sources.
- A Person may have applied the Physical content to one or more pieces
of Paper, or, perhaps, more than one type of content to the same piece.
Note that, with this design, an "autograph" is identified by having the
same Person responsible for both the Artistic and Physical content.
- A Watermark may have been reproduced one or more times in a number of
different forms, e.g. a tracing and a dylux.
- An existing Mould may contain one or more Watermarks. As designed,
this database assumes that the same information will be kept about paper
watermarks and about the watermark designs in actual moulds. Should we
want to keep other information about the mould watermark designs, a
separate file, perhaps called Filigree, may be called for.
- An Organization may be responsible for the Artistic content
("intellectual property", perhaps) of more than one Source.
- A piece of Paper may have more than one kind of Physical content or
treatment, for example, lettering and illustration.
- A piece of Paper may contain more than one Watermark.
- A Source may contain one or more pieces of Paper, but a piece may be
part of only one Source, i.e. no compound sources.
- A Source may contain one or more sets of Artistic content. Examples
of multiples: anthologies, hymnals, texts written by copyists and
illustrated by artists.
- A Person may be associated with more than one piece of Paper. E.g., a
Papermaker may have made more than one piece of paper; a Binder may have
bound a book with many types of paper in it.
- An Organization (Papermill) may have made more than one piece of Paper.
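A few of the relationships listed above can be sketched as tables. This is only an illustration, not the schema from the diagrams: the table and column names are assumptions, only a handful of the relationships are shown, and SQLite is used purely for convenience. The sketch covers the Job link between Person and Organization, the rule that a piece of Paper belongs to only one Source, and the way an "autograph" falls out of the design as a query rather than a stored flag.

```python
import sqlite3

# Hypothetical table and column names; the diagrams define the real ones.
SCHEMA = """
CREATE TABLE organization (org_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Job resolves the many-to-many link between Person and Organization.
CREATE TABLE job (
    job_id    INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    org_id    INTEGER NOT NULL REFERENCES organization(org_id),
    role      TEXT NOT NULL
);
CREATE TABLE source (source_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- A piece of Paper is part of exactly one Source.
CREATE TABLE paper (
    paper_id  INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL REFERENCES source(source_id)
);
-- A Person may be responsible for the Artistic content of many Sources.
CREATE TABLE artistic_content (
    source_id INTEGER NOT NULL REFERENCES source(source_id),
    person_id INTEGER NOT NULL REFERENCES person(person_id)
);
-- A Person may apply several kinds of Physical content to a piece of Paper.
CREATE TABLE physical_content (
    paper_id  INTEGER NOT NULL REFERENCES paper(paper_id),
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    kind      TEXT NOT NULL
);
"""

# An "autograph": the same Person is responsible for both the Artistic
# content of the Source and the Physical content of the piece of Paper.
AUTOGRAPH_QUERY = """
SELECT DISTINCT p.paper_id
FROM paper p
JOIN physical_content pc ON pc.paper_id = p.paper_id
JOIN artistic_content ac ON ac.source_id = p.source_id
                        AND ac.person_id = pc.person_id
ORDER BY p.paper_id
"""


def build_demo():
    """Create an in-memory database with two invented sample sources."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO person VALUES (?, ?)",
                     [(1, "Author"), (2, "Copyist")])
    conn.executemany("INSERT INTO source VALUES (?, ?)",
                     [(1, "Letter"), (2, "Copied text")])
    conn.executemany("INSERT INTO paper VALUES (?, ?)", [(10, 1), (20, 2)])
    # Person 1 composed both texts ...
    conn.executemany("INSERT INTO artistic_content VALUES (?, ?)",
                     [(1, 1), (2, 1)])
    # ... but wrote out only the first; a copyist wrote out the second.
    conn.executemany("INSERT INTO physical_content VALUES (?, ?, ?)",
                     [(10, 1, "lettering"), (20, 2, "lettering")])
    return conn


def autograph_papers(conn):
    """Return the ids of pieces of Paper that qualify as autographs."""
    return [row[0] for row in conn.execute(AUTOGRAPH_QUERY)]
```

With the sample rows, `autograph_papers(build_demo())` returns only paper 10, the sheet written out by the person who also composed its text.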
What Constitutes An Archive?
We wrestled with a number of issues that led us to ask this question,
including:
- Records in this database include both raw data and analytical
judgments. They thus constitute a publication about which different
analyses might exist, and over time surely will exist.
- Security and data integrity become a problem if we permit multi-user
update of all records.
- Because some fields call for judgments, we need to provide for
recording of arguments, criteria and bases of judgment so that multiple
people can submit differing analyses or theories for the same object. In
order not to violate fundamental rules of relational design (by having
multiple occurrences of the same field in a single record), we need to
have a separate "analysis" field included in the record to accommodate
this need.
- Given the mixed nature of the data (raw data and data resulting from
analysis and judgment), we cannot presume that once mounted, a particular
archive will never change or be updated. We need to provide for regular
update of data by the original submitter (institutional or individual). We
envisage a system in which institutions maintaining archives will, behind
the scenes, make corrections and changes and, ultimately, release a revised
"edition."
- Since revisions might involve changing judgments, and since the
record of such changes is itself important to those who want to review the
history of past judgments in order to make their own in an informed way,
some method of keeping a record of successive "editions" of the archive
is desirable.
- The above observations all constitute a complex question of how the
Archive should function as a publication medium. In addition to the above
considerations, then, our design -- a single system -- must both meet the
needs of researchers and guard publishers' credits and rights.
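One possible reading of the points above about differing analyses and successive "editions" is sketched below. It is an assumption made for illustration, not the design in the diagrams: judgments are kept as separate rows attached to a record, each tagged with the edition in which it first appeared, so that several people can record differing views without repeating fields in the main record. All table, column, and function names are hypothetical.

```python
import sqlite3

# Hypothetical layout: one row per judgment, linked to the described object.
SCHEMA = """
CREATE TABLE watermark (
    watermark_id INTEGER PRIMARY KEY,
    description  TEXT NOT NULL
);
CREATE TABLE analysis (
    analysis_id  INTEGER PRIMARY KEY,
    watermark_id INTEGER NOT NULL REFERENCES watermark(watermark_id),
    analyst      TEXT NOT NULL,
    judgment     TEXT NOT NULL,
    basis        TEXT NOT NULL,    -- arguments and criteria behind the judgment
    edition      INTEGER NOT NULL  -- edition in which the judgment first appeared
);
"""


def open_archive():
    """Create an empty in-memory archive using the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_analysis(conn, watermark_id, analyst, judgment, basis, edition):
    """Record one analyst's judgment without touching earlier ones."""
    conn.execute(
        "INSERT INTO analysis (watermark_id, analyst, judgment, basis, edition) "
        "VALUES (?, ?, ?, ?, ?)",
        (watermark_id, analyst, judgment, basis, edition),
    )


def judgments_as_of(conn, watermark_id, edition):
    """Return (analyst, judgment) pairs visible in the given edition."""
    rows = conn.execute(
        "SELECT analyst, judgment FROM analysis "
        "WHERE watermark_id = ? AND edition <= ? "
        "ORDER BY edition, analysis_id",
        (watermark_id, edition),
    )
    return rows.fetchall()
```

Because earlier rows are never overwritten, a reader can ask what the archive said in any past edition, which is the kind of history of judgments the list above calls desirable.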
We concluded that the best solution is a system which provides for a
completely separate "instance" of the database for each "submitter". In
plain terms, no matter whether an archive is stored with others on the
same machine, or whether the group of archives is maintained by a single
institution or on different machines half a world away from each other,
each will have its own, separate but similarly organized set of
files. Since the search engine has to cope with databases on
separate machines anyway, this requirement only changes the quantity of
databases, not the nature of the searching/indexing problem.
- NOTE: This approach does impose one
additional, non-relational, but very "webby", requirement. Databases will
have to be able to point to each other when two records contain
information about the same object ("entity"). So, for example, if two
scholars study
the same piece of paper and enter data about it in their separate
databases, the two records should be crosslinked, once the relationship is
discovered. That way, a researcher, upon finding one record, will have
immediate access to the other. So, we have added a "Same as..." section
to the appropriate files for storing URLs to records about the same item.
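The "Same as..." idea can be sketched as a symmetric set of links between record URLs. This is a minimal illustration under stated assumptions: the class name, method names, and example URLs are hypothetical, and in practice each database would store its own half of the link in its own files rather than sharing one in-memory structure.

```python
from collections import defaultdict


class SameAsIndex:
    """Toy model of "Same as..." sections across separate databases."""

    def __init__(self):
        # Maps a record URL to the URLs of records about the same object.
        self._links = defaultdict(set)

    def link(self, url_a, url_b):
        """Crosslink two records once they are found to describe one object."""
        if url_a == url_b:
            return  # a record is trivially the same as itself
        self._links[url_a].add(url_b)
        self._links[url_b].add(url_a)

    def same_as(self, url):
        """Return the URLs stored in the "Same as..." section for a record."""
        return sorted(self._links.get(url, set()))


# Usage with invented URLs for two scholars' records of one sheet.
index = SameAsIndex()
index.link("http://archive-a.example.org/paper/17.html",
           "http://archive-b.example.org/paper/203.html")
```

Recording the link in both directions is what gives a researcher who finds either record immediate access to the other.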
We envisage institutions the world over maintaining what might be
considered composite archives -- each consisting of individual archives
prepared by various staff members, or by independent scholars who publish
the results of their work through an affiliated institution. The records
mounted in any archive will appear to the public under the logo and name
of the hosting institution (or individual). The images and descriptions
prepared by any institution or individual will "belong" to that agent
according to international principles of intellectual ownership, and the
institutions and creators of the data and images will have the right to
receive credit for their publication.
We envisage a central registry of participating institutions, perhaps
maintained by a board of overseers chosen from among the participating
institutions and individuals. Such a board might also bear responsibility
for maintenance of the standard.
Rationale
Some of the decisions that went into this design may not be obvious.
Here, for both understanding and historical purposes, we document some of
the discussion, in no particular order.
- Collections may contain paper bearing objects, facsimiles of
watermarks, actual paper moulds or any combination of the three. There is
no separate Collection file, because the combination of the Organization
data, and a Collection field (for the name of a special collection) in the
Source, Facsimile, and Mould files is sufficient for complete
identification.
- Mould parameters are descriptors which apply both to paper and to
existing moulds. They aren't entities in and of themselves. So, these
fields ("mould parameters") are included in both the Paper and Mould
files. The Mould Parameter file was eliminated.
- The nature of the relationship between watermark and paper depends on
whether one considers watermarks to be uniquely associated with a sheet,
or common to many sheets in many sources. We have depicted it as if they
are considered unique. This unique relationship is also important because
of the connection to facsimiles of the watermark.
- By our current interpretation, Bibliographical references are
collected on a single, growing page maintained by the person(s)
responsible for the particular archive. We did not feel that we wanted to
get into the business of developing a standard, centralized bibliography
with all the complexities of formatting and data entry that such a beast
would entail.
- NOTE: Since an Archive is defined as one discrete set of records created
by any one person or team of persons responsible for it (see What Constitutes an Archive, above), all bibliographical
references (links) within a particular Archive will be to a single
Bibliography page maintained within that archive. We leave matters of
format to the individual Institutions or project directors.
- Since a collection of watermark facsimiles may be electronic (isn't
that what this is all about? :-), the location may be a URL.
- URLs, as database addresses, pose a severe problem if an entire
database needs to be moved to a new location. All references to records
in that database will, immediately, become invalid. To deal with this
problem, we will be proposing a world-wide registry for watermark
databases which is both human readable, and machine addressable. This
registry would contain the names and the URL base addresses for the
databases and have a program that could be called as a CGI which would
return the appropriate web page given the name of the database and the
partial path name of the file, rather than the full URL.
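The lookup that such a registry program would perform can be sketched as follows. This is an illustration only: the registry contents, the database name, and the function name are hypothetical, and a plain dictionary stands in for whatever storage the registry would really use.

```python
from urllib.parse import urljoin

# Hypothetical registry: database name -> current base URL.
REGISTRY = {
    "example-archive": "http://archive.example.org/watermarks/",
}


def resolve(database_name, partial_path):
    """Build the full URL of a record from a database name and partial path.

    Raises KeyError if the database name is not registered.
    """
    base = REGISTRY[database_name]
    if not base.endswith("/"):
        base += "/"  # urljoin would otherwise drop the last path segment
    return urljoin(base, partial_path.lstrip("/"))
```

Records would then refer to each other by name and partial path, for example `resolve("example-archive", "paper/0042.html")`. If the database moves, only its registry entry changes and the stored references remain valid.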
Robert W. Allison,
Requirements and presentation designer
Dept. of Philosophy & Religion, Bates College
and
James Hart, Database design, technical concepts, and programming
Information Services, Bates College, Lewiston, Maine, 04240
Many thanks to Dexter Edge, University of London, for sharing his design
work.