Semistructured Data Model

For many years, the relational data model has dominated database design. In relational databases, information is represented in terms of mathematical relations between items and their properties (which are often called "entities" and "attributes"). A relation is normally shown in the form of a table with rows and columns. In most relational database systems, each table column represents a different property and each row represents a different item. Thus, a table cell at the intersection of a row and a column contains the value of a property for a given item, as shown in this simple example:

Item Number   Type     Color    Length   Weight
1             Type C   Red      6 cm     1.35 kg
2             Type A   Green    9 cm     2.12 kg
3             Type A   Blue     2 cm     0.5 kg
4             Type D   Red      5 cm     2.1 kg
5             Type B   Yellow   4 cm     1.3 kg
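
The same table can be sketched in code. The following minimal example uses Python's built-in sqlite3 module and is only an illustration: the table name and column names (item, item_number, length_cm, weight_kg, and so on) are assumptions made here, since the table above gives only the column headings.

```python
import sqlite3

# Rows copied from the example table; units are moved into the
# (assumed) column names length_cm and weight_kg.
ROWS = [
    (1, "Type C", "Red", 6.0, 1.35),
    (2, "Type A", "Green", 9.0, 2.12),
    (3, "Type A", "Blue", 2.0, 0.5),
    (4, "Type D", "Red", 5.0, 2.1),
    (5, "Type B", "Yellow", 4.0, 1.3),
]

# Columns a caller may ask for; column names cannot be bound as SQL
# parameters, so they are checked against this fixed set instead.
ALLOWED_COLUMNS = {"type", "color", "length_cm", "weight_kg"}


def build_table():
    """Create an in-memory table: one column per property, one row per item."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item ("
        "item_number INTEGER PRIMARY KEY, type TEXT, color TEXT, "
        "length_cm REAL, weight_kg REAL)"
    )
    conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def cell(conn, item_number, column):
    """Return the value where one row (item) meets one column (property)."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown column: {column}")
    row = conn.execute(
        f"SELECT {column} FROM item WHERE item_number = ?", (item_number,)
    ).fetchone()
    return row[0] if row else None
```

Under these assumptions, cell(build_table(), 2, "color") returns "Green", the value at the intersection of the row for item 2 and the Color column.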

The relational data model does not actually require tables to be used in this way; tabular relations can also represent more abstract concepts and relationships. Nevertheless, the relational model lends itself to this highly structured way of organizing information in terms of predefined classes of items that possess common properties. Most relational systems therefore adopt a "class-based" approach to organizing information, as opposed to an "item-based" approach in which each item of interest and each property of an item is an independent structural unit within the database. Class-based tables are not well suited to representing relatively unstructured or idiosyncratic information, which is often best represented by flexible hierarchies of independent items with varying sets of properties. Many branches of scholarship are rife with complicated spatial, temporal, and logical hierarchies and with loosely structured texts, all of which are hard to represent in tabular fashion as predefined classes of items. They call for an item-based approach rather than a class-based approach.
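
One way to see the difficulty is to force items with differing property sets into a single predefined table and count the cells left empty. The sketch below does this in Python. The item names, properties, and helper functions are hypothetical and chosen only for illustration.

```python
# Hypothetical items, each an independent unit with its own properties.
ITEMS = {
    "Object 1": {"color": "Red", "length_cm": 6},
    "Object 2": {"color": "Blue", "material": "glass"},
    "Object 3": {"inscription": "yes", "language": "unknown"},
}


def as_class_table(items):
    """Force every item into one table that has a column for every property."""
    columns = sorted({prop for props in items.values() for prop in props})
    rows = {
        name: [props.get(col) for col in columns]  # None marks an empty cell
        for name, props in items.items()
    }
    return columns, rows


def empty_cell_ratio(items):
    """Fraction of table cells that hold no value for their item."""
    columns, rows = as_class_table(items)
    total = len(columns) * len(rows)
    empty = sum(value is None for row in rows.values() for value in row)
    return empty / total
```

With this sample data the single table needs five columns for three items, and nine of its fifteen cells are empty. The item-based form (the ITEMS dictionary itself) stores only the properties each item actually has.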

For this reason, OCHRE’s database structure is fundamentally item-based rather than class-based. It does not rely on the relational data model but instead makes use of the "semistructured data model", which is much better suited to a hierarchical, item-based design (although, in principle, an item-based design could be implemented in a relational system). Semistructured database systems such as OCHRE employ the Extensible Markup Language (XML) and the XML Query Language (XQuery), which can easily accommodate loosely structured hierarchies while also being able to accommodate highly structured tables.
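
As a rough illustration of how XML accommodates a loosely structured hierarchy, the following Python sketch parses a small document in which items nest inside other items and carry differing properties. The element and attribute names (item, property, name, unit) and the data are hypothetical and do not represent OCHRE's actual schema. Python's standard library has no XQuery engine, so a simple tree traversal stands in for a query here; in XPath terms it is roughly comparable to //item[property[@name="color"]]/@name.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: items nest freely and carry different properties.
DOC = """
<item name="Area A">
  <property name="period">Iron Age</property>
  <item name="Unit 1">
    <item name="Object 1">
      <property name="color">Red</property>
      <property name="length" unit="cm">6</property>
    </item>
    <item name="Object 2">
      <property name="color">Blue</property>
      <property name="material">glass</property>
    </item>
  </item>
</item>
"""


def items_with_property(root, prop, value=None):
    """Names of items, at any depth, that directly carry the given property.

    If value is given, only items whose property text equals it are returned.
    """
    names = []
    for item in root.iter("item"):  # document order, includes the root item
        for p in item.findall("property"):  # direct children only
            if p.get("name") == prop and (value is None or p.text == value):
                names.append(item.get("name"))
                break
    return names


root = ET.fromstring(DOC.strip())
```

Here items_with_property(root, "color") returns ["Object 1", "Object 2"], while items_with_property(root, "color", "Red") returns ["Object 1"]. No table schema had to be declared in advance for either item.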

The growing importance of the semistructured data model, and of XML as a means to implement it, is apparent in the incorporation of the XML data format and XML querying mechanisms into leading database platforms such as Oracle and IBM’s DB2, as an alternative to the relational data model and SQL. These database platforms are no longer purely relational but also support the semistructured data model. Large numbers of XML data objects ("documents") can now be stored, indexed, and queried efficiently within sophisticated and highly scalable database systems. OCHRE is one such system: it is built upon a widely available database platform that implements the XML standards published by the World Wide Web Consortium, namely XML Schema and XQuery.