Fully Communication Oriented Information Modeling

G. P. Bakema, J. P. C. Zwart, H. van der Lek

The basic NIAM philosophy is: information analysis intends to model the communication about a certain Universe of Discourse (UoD) but does not intend to model the UoD itself. Consequently all existing elements in NIAM must be consistent with this philosophy. If they are not, they must be redefined. Furthermore NIAM must be extended with new elements, such as the complete redundancy free recording of the structure of the declarative sentences spoken by the user expressing (elementary) facts about the UoD. This also implies the need to be able to model complex identification structures occurring in this communication. In this paper we present Fully Communication Oriented NIAM (FCO-NIAM) as a possible solution covering all the above mentioned desiderata. We use a generic FCO-NIAM metagrammar (GenMG) which not only enables us to treat both FCO-NIAM and the Relational Model from a single point of view but also to design a simple CASE-tool architecture. This architecture allows the registration of information grammars (IG's) in these Data Models and the transformation of NIAM IG's into relational schemata including the sentence structures via simple updates in the population of the GenMG. Presently FCO-NIAM is both taught in college-level education and used in practice on a broad scale in the Dutch NIAM-scene.

1 Introduction

In a previous paper (Van der Lek, Bakema & Zwart, 1992) we presented Communication Oriented NIAM (CO-NIAM) as the most interesting representative of Communication Oriented Information Modeling. An English translation of this paper, titled Unifying object types and fact types: a practically and didactically productive theory, was handed out at the NIAM gUIde working conference 1993 held in Utrecht in the Netherlands. It is available from the authors on demand. The present paper deals with Fully Communication Oriented NIAM (FCO-NIAM).
Like the previous paper it is written in the heuristic and example based style typical of didactical materials at the college level. The ideas developed in both papers were given a formal basis in (Van der Lek, 1993).

1.1 Fundamental principles

The following set of structural principles was the starting point for our work (besides the basic set of methodological NIAM-principles such as using natural language, concrete examples, user interviews and prescription based information modeling):

P1: Communication principle: The purpose of information analysis is not to model the structure of the Universe of Discourse (UoD) itself but to model the structure of the communication about the UoD by the users. The product of an information analysis is an Information Grammar (IG), which formally describes on a type level the structure of the relevant fact stating sentences in the user communication about the UoD.
P2': Conceptualization principle: An IG exclusively models aspects of the user communication about the UoD.
P2'': 100 percent principle: An IG deals with all the aspects of the user communication about the UoD.

The formulations of principles P2' and P2'' are rephrasings in view of P1 of principles with the same name in (Griethuyzen, 1982). We can summarize P2' and P2'' in one principle P2:

P2: 100 percent conceptualization principle: An IG models all the aspects of the user communication about the UoD and nothing but the user communication about the UoD. See (Nijssen & Halpin, 1989, p. 11).
P3: Redundancy free modeling principle: An IG models the user communication about the UoD in a redundancy free way. Each aspect of the communication about the UoD may appear only once in an IG.

These principles imply that an information grammar (IG) should also model the structure of the fact stating user sentence types (at least for elementary facts), and should do so in a redundancy free way.
The main aspect of our work is an attempt to incorporate these principles in NIAM in a consistent way, because we felt strongly that in N-ary nested NIAM (Nijssen & Halpin, 1989) these principles had not been carried through completely. The way in which we did this in our previous paper was guided by three more principles:

P4: Unification principle: All non-lexical object types are nominalizations of fact types. This principle solves the most serious violation of the communication principle. It ensures essentially that all non-lexical object types are populatable constructs, unifying fact types and non-lexical object types (including subtypes) into a single concept.
P5: Substitution principle: Elementary user sentences can be regenerated from the IG plus its label population (LP) by substituting either object type expressions (OTE's) or labels in the roles of fact type expressions (FTE's). We imposed this principle on ourselves in order to enable the user to verify the correctness of the modeled declarative sentences (i.e. fact expressions of elementary facts). The separate treatment of OTE's apart from FTE's is a direct consequence of the redundancy free modeling principle (P3).
P6: Generic principle: A Generic Meta Grammar (GenMG) is used which can contain IG's in various data models (NIAM, Relational, ...) as its population. We adopted this principle for theoretical, practical and didactical reasons. Theoretical: NIAM and the Relational Model use different terminologies, yet have a lot in common. Practical: improvement of CASE-tool architecture. Didactical: teaching information systems methodologies in a generic way.

In our previous paper we showed how a relational representation of the GenMG allows us to generate a relational schema from a CO-NIAM IG via ordinary updates on the GenMG population. This is accomplished by the Group, Lexicalize and Reduce (GLR) algorithm.
In paragraph 3 we will treat the Group, Lexicalize and Reduce algorithm (GLR algorithm) and illustrate the behavior of FTE's, OTE's and LP's under this algorithm. In paragraph 4 we will discuss some extensions to NIAM concerning complex identification structures encountered in practice (generalization, recursive identification structures and set and sequence types). These extensions have been proposed by others as well. We show how these identification structures can be modeled in FCO-NIAM and we indicate their treatment in relational schema derivation. In paragraph 5 we will consider specialization (subtyping). We illustrate FCO-NIAM subtype modeling and introduce a novel way of subtype identification, using an example which clarifies part of the structure of our GenMG. Unless stated otherwise all features in paragraphs 4 and 5 are presently supported in the prototypical CASE-tool built by our students. In paragraph 6 we comment on the educational and
practical impact of FCO-NIAM.
In the dialogue, analyst and user group sentences
of the same type together. FE2.1 and FE2.2 clearly express two facts of
the same fact type. FE3, FE4 and FE5 all belong to different fact
types. These fact types must be given names. This is standard NIAM practice
as well. In the example: FE1.1 and FE1.2 belong to fact type FLOOR, FE2.1
and FE2.2 belong to fact type EMERGENCY EXITS, FE3 belongs to ROOM, FE4
belongs to CAPACITY, FE5 belongs to EQUIPMENT and FE6.1 and FE6.2 belong
to FACILITY. In Figure 2.2 these fact types are listed together with one
of their sentences. Analyst and user next classify sentence parts as
either labels or object expressions (OE's). If a sentence
part is classified as a label then the name of the lexical object type
(LOT) the label belongs to must be given. If a sentence part is classified
as an OE then it is a nominalized fact and so the name of the corresponding
non-lexical object type (NOLOT) (or nominalized fact type) must
be given (qualification step). The OE must then be analyzed further: it
may contain other OE's or labels. OE's are grouped into types called object
type expressions (OTE's) the same way as FE's are. The remaining unclassified
part of the sentence is called the predicate. The predicate together
with place holders (indicators for the places where labels or OTE's
are to be filled in) is called a fact type expression (FTE)
or sentence type (ST). A fact type then is the collection of all
FTE's with the same meaning (according to the user) and the same place
holder pattern (Van der Lek, Bakema & Zwart, 1992). In fact types
the place holders are traditionally called roles.
We will not go into the methodology of the classification/qualification procedure here. We only remark that the procedure is different from the classification/qualification procedure in N-ary nested NIAM, yet not unlike it in an algorithmic sense. The results of this classification/qualification for each fact type of the example UoD are given in figure 2.2.
A remark concerning the terminology used here: in previous papers (Van der Lek, Bakema & Zwart, 1992; Van der Lek, 1993 A) we used the word 'formulation' where we use 'expression' here, because we consider it better English and because it stresses the point that sentences putting facts into words are the means by which we communicate (express) those facts. This is also why we prefer the term 'fact type expression' to 'sentence type' here, in contrast to our previous paper, although we already noted the synonymy (hyponymy, really, since not all sentences relevant to the UoD express facts) of these terms. Here and there we probably won't be able to avoid mixing them.
Figure 2.3: FCO-NIAM IG + LP

After the classification/qualification a NIAM diagram can be drawn (Figure 2.3). These diagrams are quite like N-ary nested NIAM diagrams, differing from these in the following aspects:
To the diagram the standard constraints, such as uniqueness constraints, mandatory role constraints etc. must be added. These do not concern us here, as the standard NIAM procedure can be employed. Figure 2.3 has them drawn in. As for subtype constraints: see § 5. A few remarks must be made, however:
2.2 From an information grammar to sentences

It is very easy to (re)generate the original user sentences from the information grammar (IG) and its label population (LP). We use the algorithm of the substitution principle in order to do that:
a) Start with a fact type expression (FTE), and one of the tupels of the population of the corresponding fact type.
b) Its place holders must be filled in.
b1) If the corresponding role is played by a lexical object type (LOT), then fill the label in directly.
b2) If the corresponding role is played by a non-lexical object type (NOLOT), then fill in one of the object type expressions (OTE's) that may be used by that role. As each OTE will contain roles itself, continue with step b for the OTE.
a) The fact type expression is F4: "<6> has <7> seats." The tupel has '2, 1' as role value of role 6 (and '20' as role value of role 7).
b1) <7> is played by a LOT. '20' may be directly substituted.
b2) <6> is played by a NOLOT. We use O3: 'room <4>.<5>'.
b1) <5> is played by a LOT. We substitute label '1' directly.
b2) <4> is played by a NOLOT. We use O1.2: '<1>'.
b1) <1> is played by a LOT. We substitute label '2' directly.
When we capitalize the first letter we have indeed a user sentence: FE4: "Room 2.1 has 20 seats". This process can be, and has been, completely automated.
Our CASE-tool generates these sentences from the IG and LP instantly.
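The substitution steps a), b1) and b2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' CASE-tool: the dictionary layout, the nested-dict encoding of the tupel and the names `EXPRESSIONS`, `OTE_FOR_ROLE`, `substitute` and `regenerate` are hypothetical. Only the expressions F4, O3 and O1.2 and the labels '2', '1' and '20' come from the worked example, with the spacing inside place holders normalized.

```python
import re

# Expressions from the worked example; <n> is the place holder of role n.
EXPRESSIONS = {
    "F4": "<6> has <7> seats.",
    "O3": "room <4>.<5>",
    "O1.2": "<1>",
}

# Roles played by a NOLOT and the OTE chosen for them (assumption: one OTE
# per role; in general a role may allow several). Other roles are LOT-played.
OTE_FOR_ROLE = {6: "O3", 4: "O1.2"}

PLACEHOLDER = re.compile(r"<(\d+)>")


def substitute(expr_code, values):
    """Fill every place holder of one expression (step b)."""
    def fill(match):
        role = int(match.group(1))
        value = values[role]
        if role in OTE_FOR_ROLE:
            # Step b2: NOLOT-played role, continue inside its OTE.
            return substitute(OTE_FOR_ROLE[role], value)
        # Step b1: LOT-played role, the label is filled in directly.
        return value
    return PLACEHOLDER.sub(fill, EXPRESSIONS[expr_code])


def regenerate(expr_code, values):
    """Step a) plus capitalization of the first letter."""
    sentence = substitute(expr_code, values)
    return sentence[0].upper() + sentence[1:]


# Role 6 holds the room (floor '2', room '1'); role 7 holds the label '20'.
tupel = {6: {4: {1: "2"}, 5: "1"}, 7: "20"}
print(regenerate("F4", tupel))  # Room 2.1 has 20 seats.
```

The recursion mirrors the instruction "continue with step b for the OTE": nesting depth in the tupel follows the nesting of NOLOT's in the grammar.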
A standard way of showing all sentences modeled
in an IG plus LP is depicted in figure 2.4. Here all OTE's have been substituted
until only roles played by LOT's are left. Instead of the role numbers
the names of the LOT's are shown (as in figure 2.2). To each of these
sentence types the label population is added. This form provides a convenient
shorthand for the user, who will not be familiar with the NIAM
diagram.
3 The GLR-algorithm for relational schema generation

The purpose of this paragraph is to illustrate how an elementary FCO-NIAM information grammar (including FTE's, OTE's and LP's) can be transformed into a relational schema by a series of updates on the population of the generic metagrammar.

3.1 Tupel numbers and tupel references

In the generic metagrammar (GenMG) we use tupel numbers. Each tupel can be identified by giving the name of the fact type it belongs to together with its tupel number (unique per fact type). This facilitates the recording and handling of the tupels. In an FCO-NIAM IG we write tupel numbers followed by a colon in front of the tupels (figure 3.1). In addition to this we use so-called tupel references. The role values of every role played by a non-lexical object type (NOLOT) are not (combinations of) labels, but are the tupel numbers of the tupels in the population of the NOLOT playing the role. For example: in figure 3.1 in fact type ROOM the role value of role 4 in tupel 2 is [3]. This refers to tupel number 3 from fact type FLOOR, which has as role value '1', a label identifying the first floor. (Compare this to the corresponding tupels in figure 2.3.) In an FCO-NIAM IG we enclose the tupel references in square brackets (Figure 3.1). The role values of all roles played by lexical object types (LOT's) are simply the labels themselves. Please note this does not affect in any way the substitution algorithm (see § 2.2). It is just another way of writing down a population of an FCO-NIAM IG: more compact and redundancy-free.

Figure 3.1: FCO-NIAM IG with tupel references

3.2 The generic metagrammar (GenMG) in relational form

Here we present the main part of our generic metagrammar (GenMG) in relational form. We have the complete GenMG in FCO-NIAM form of course, but we want to concentrate on the main parts of it in its most concise and conveniently arranged manifestation.
Please note that because the GenMG is itself also an FCO-NIAM information grammar (IG) it can be populated with itself. This autopopulation can then be subjected to the GLR algorithm in order to transform it into its relational form. This again stresses the point that the GenMG can contain as population IG's in various data models, FCO-NIAM and the Relational Model in particular. If an MG is to be generic for a set of Data Models, it must contain the union of all the model-specific fact/object types but only the intersection of all the model-specific constraints. In order to ensure that a certain population will represent a correct IG in a certain Data Model, we temporarily 'turn on' the model-specific constraints which do not belong to the intersection. After this validation we 'turn' those specific constraints 'off' again. The GLR-algorithm actually works in this way, turning off the FCO-NIAM specific constraints and turning on those specific for the Relational Model.
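The idea of validating one population against the generic constraints plus a temporarily activated model-specific set can be sketched as follows. Everything here is an assumption made for illustration: the `Role` record, the three check functions and the choice of which constraint counts as generic or as relational-only are hypothetical and are not taken from the GenMG itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    number: int
    fact_type: str
    played_by_lot: bool  # True when a lexical object type plays the role


# Hypothetical generic constraint (in the intersection): always enforced.
def role_numbers_unique(roles, primary_keys):
    numbers = [r.number for r in roles]
    return len(numbers) == len(set(numbers))


# Hypothetical constraints specific to the Relational Model.
def all_roles_played_by_lots(roles, primary_keys):
    return all(r.played_by_lot for r in roles)


def every_fact_type_has_primary_key(roles, primary_keys):
    return {r.fact_type for r in roles} <= set(primary_keys)


GENERIC = [role_numbers_unique]
MODEL_SPECIFIC = {
    "FCO-NIAM": [],  # none modelled in this sketch
    "Relational": [all_roles_played_by_lots, every_fact_type_has_primary_key],
}


def is_valid(model, roles, primary_keys):
    """Check the generic constraints plus those 'turned on' for one model."""
    active = GENERIC + MODEL_SPECIFIC[model]
    return all(check(roles, primary_keys) for check in active)


# Fact type ROOM before lexicalization: role 4 is played by NOLOT FLOOR.
before = [Role(4, "ROOM", False), Role(5, "ROOM", True)]
# After lexicalization every role is played by a LOT.
after = [Role(4, "ROOM", True), Role(5, "ROOM", True)]
keys = {"ROOM"}

print(is_valid("FCO-NIAM", before, keys))    # True
print(is_valid("Relational", before, keys))  # False
print(is_valid("Relational", after, keys))   # True
```

Because the model-specific checks are looked up per call, nothing has to be switched off again afterwards; the same population can be validated against either data model.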
In figure 3.2 we show the main part of the GenMG in relational form, populated with the FCO-NIAM IG of figure 3.1. We will now comment briefly on all the tables. Columns with an obvious meaning will not be discussed.
Table OBJECTTYPE/FACTTYPE. In FCO-NIAM the three concepts LOT, fact type and NOLOT are unified in the single concept objecttype/facttype. In the FCO-NIAM form of the GenMG the three 'old' concepts are subtypes of OBJECTTYPE/FACTTYPE (see § 5).
Table ROLE.
Table UNIQUENESS CONSTRAINT.
A verbalization of two tupels about the same constraint:
"Uniqueness constraint 3 concerns (among others) role 4." "Uniqueness
constraint 3 concerns (among others) role 5."
Table PRIMARY KEY. A fact type can have more than one uniqueness constraint. In such cases, the GLR-algorithm requires one of them to be assigned as the primary key. If that is not possible, the FCO-NIAM IG cannot be transformed to a relational schema which meets the entity integrity requirement. This can happen because FCO-NIAM is more powerful than the Relational Model. Standard transformations are being worked out to arrive at a relational schema by changing the communication (see § 4). In the example of figure 3.1 this does not occur, so all uniqueness constraints are marked as primary.
Table MANDATORY ROLE CONSTRAINT. This table is analogous to UNIQUENESS CONSTRAINT.
Table SUBSET CONSTRAINT.
In this paper equality constraints are treated as
pairs of subset constraints in opposite directions. (Exclusion constraints
lead to a similar table not discussed in this paper.) Table EXPRESSION.
Table OTE IN ROLE.
An example verbalization: "Object type expression
O1.1 may be used by role 2. " Table EXPRESSION PART. EXP_CODE, EXP_NO and PART_NO together identify part
of an expression. An expression part can either be a ROLE or a piece of
TEXT. An expression is made up of a sequence of such parts. For instance:
the three tupels with EXP_CODE 'F' and EXP_NO '1' correspond to F1: "floor
<1> exists". Table POPULATION.
Example verbalizations:
3.3 The example IG as population of the GenMG under the GLR-algorithm

The example IG as population of the GenMG

Figure 3.2 shows the GenMG populated with the IG from figure 3.1. Datatypes have been added. Please note that it is possible to regenerate the entire figure 3.1 (population included) from the tupels in these tables, excluding only the layout of the diagram.

The example IG as population of the GenMG after grouping

Basically, the grouping part of the GLR-algorithm does the following:
Figure 3.3: IG after grouping

The result of the grouping operation in diagram form is shown in figure 3.3, and the corresponding population of the GenMG in figure 3.4. Please note the transformation is done only on the population of the GenMG; figure 3.3 is drawn as an illustration afterwards. We draw your attention to the following points:
The example IG as population of the GenMG after lexicalization

In the lexicalization step of the GLR-algorithm all roles played by NOLOT's are treated. After lexicalization all roles are played by LOT's. This is accomplished by 'diverting' the line connecting the role to the NOLOT to the LOT identifying the NOLOT. For instance: the line from role 9 to NOLOT FACILITY will be diverted to LOT FACILITY CODE. Sometimes two or more 'diversions' must be made before reaching a LOT. This process also involves the splitting of each role played by a NOLOT into as many roles as are contained inside the NOLOT. Whenever a 'diversion' is made, it generates one subset constraint in case no mandatory role constraint is present or two subset constraints (equivalent to one equality constraint) if a mandatory role constraint is involved. For instance: the diversion of role 9 creates one subset constraint (with SC_NO 4 in table SUBSET CONSTRAINT in figure 3.6). The consequences must be processed in FTE's, population etc. Figure 3.5 shows the result of this step in diagram form. Figure 3.6 shows the population of the GenMG. We draw your attention to the following points:
Figure 3.5: IG after grouping and lexicalizing

The example IG as population of the GenMG after reducing

In this example no reducing can be carried out.
In general, reducing means removing redundant fact types. After grouping
fact types may exist having both a primary key concerning all roles in
the fact type and an equality constraint concerning all roles in the fact
type (and as many roles in another fact type). In such a case the fact
type can be removed (after moving its FTE to the other fact type). If
for instance role 9 would have had a mandatory role constraint in figures
3.1 and 3.3, then in figure 3.5 an equality constraint between roles 9
and 10 would have been generated and fact type FACILITY would be removed,
transporting fact type expression F6 to EQUIPMENT.

Alternative diagram for the example IG after the GLR-algorithm

Figure 3.5 actually shows a relational schema with domains (LOT's), tables (fact types), primary keys (equal to certain uniqueness constraints), table columns (roles), NOT NULL indicators (roles without an OP-indicator), references, including foreign keys (a kind of subset constraints), and tupels (tupels). In addition it shows the user sentence types.
A perhaps more familiar way of drawing tables is shown in figure 3.7,
which is completely equivalent to figure 3.5. We emphasize that FCO-NIAM,
the Relational Model and other Data Models can best be seen as special
cases of the Generic Data Model we employ here.
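To make the correspondence listed above concrete, the following Python sketch turns a description of one grouped and lexicalized fact type into SQL text. It is an assumption-laden illustration, not output of the GLR-algorithm: the function `create_table`, the column names `floor_no`, `room_no` and `facility_code`, and the datatypes are hypothetical, since the actual tables of figure 3.7 are not reproduced here.

```python
def create_table(name, columns, primary_key, foreign_keys=()):
    """Render one fact type as a CREATE TABLE statement.

    columns:      (column, domain, optional) per role; optional stands
                  for a role carrying an OP-indicator.
    primary_key:  columns of the uniqueness constraint marked as primary.
    foreign_keys: (columns, referenced table, referenced columns), one
                  entry per subset constraint that acts as a reference.
    """
    parts = []
    for column, domain, optional in columns:
        # Roles without an OP-indicator become NOT NULL columns.
        parts.append(f"  {column} {domain}" + ("" if optional else " NOT NULL"))
    parts.append(f"  PRIMARY KEY ({', '.join(primary_key)})")
    for cols, ref_table, ref_cols in foreign_keys:
        parts.append(
            f"  FOREIGN KEY ({', '.join(cols)}) "
            f"REFERENCES {ref_table} ({', '.join(ref_cols)})"
        )
    return f"CREATE TABLE {name} (\n" + ",\n".join(parts) + "\n);"


# Hypothetical rendering of fact type EQUIPMENT after lexicalization.
ddl = create_table(
    "EQUIPMENT",
    columns=[
        ("floor_no", "INTEGER", False),
        ("room_no", "INTEGER", False),
        ("facility_code", "CHAR(2)", False),
    ],
    primary_key=["floor_no", "room_no", "facility_code"],
    foreign_keys=[(["facility_code"], "FACILITY", ["facility_code"])],
)
print(ddl)
```

The single foreign key stands for the one subset constraint generated when role 9 was diverted from NOLOT FACILITY to LOT FACILITY CODE.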
Figure 3.7: Relational schema

A few final remarks:
4 The need for complex identification structures

Classic NIAM textbooks like (Nijssen & Halpin, 1989) do not treat the modeling of multiple identification (generalization), variable length identification by ordered or unordered enumeration (sequence and set based identification) or recursive identification. We introduce the general term complex identification for these identification schemes. The 100 percent conceptualization principle (P2 in § 1.1) requires us to be able to model complex identification without introducing extra labels (identifying codes or numbers), even though this is a common device in practice. The substitution principle requires us to be able to regenerate user fact expressions containing complex identification by substitution. In § 4.1 we discuss a typical example of generalization. In § 4.2 we illustrate the need for recursive identification with an example from a real life case. In § 4.3 we treat set and sequence based identification.

4.1 Generalization

The necessity of incorporating generalization alongside specialization is hardly disputed nowadays. Different treatments exist (for example: Ter Hofstede, 1993, pp. 37-39). The FCO-NIAM way to deal with generalization is the subject of this paragraph. It turns out we don't need to introduce new meta-concepts: our generic metagrammar (GenMG) presented in § 3 can manage this. The classification / qualification process, described in § 2 (figure 2.2), automatically leads to multiple identification structures if they occur in the user sentences. We will not elaborate on the details here, but present a simple example to illustrate the way generalization is modeled in FCO-NIAM instead.

A room rental company identifies rooms by room numbers
(room 1, room 2, etc.) or by room names (auditorium, green room, blue
room, etc.). Some rooms have both a room number and a room name (room
1 = the green room, room 2 = the blue room), other rooms have only a room
number (room 17, etc.) and the auditorium only has a name: auditorium.
We overhear parts of dialogues (i.e. communication) between employees
and different visitors:
The fact expressions are identified by a number, possibly followed by a letter a, b or c if there are different expressions of the same fact. We have underlined the object expressions already. The classification / qualification process (compare with figure 2.2) leads to the following fact types, fact type expressions, object type expressions and label types:
ORGANIZATION: FTE1: "there is an organization named <ORGANIZATION NAME>."
ROOM: FTE2a: "the room <ROOM NUMBER> exists."
MEETING: FTE3: "the meeting of <ORGANIZATION> takes place in <ROOM>."
From this the FCO-NIAM information grammar presented in figure 4.1 follows. The sentences 1a, 1b, 1c, 2 and 3 can easily be regenerated.
Figure 4.1: IG with generalization

In the nominalized fact type ROOM roles 2 and 3 are optional (non-mandatory), which is marked with 'OP' in the IG (see also § 3). The hyphens represent null values in the sense of (De Troyer, 1993, pp. 18-22), that is in the 'no information' interpretation.

The nominalized fact type ROOM violates the N-ary nested NIAM rule stating that a nominalized fact type must have exactly one uniqueness constraint covering all its roles (the so-called n-rule for nominalization). In FCO-NIAM this well-formedness requirement (a meta-grammatical constraint) on IG's is attenuated, reading: Every role in a non-lexical object type (i.e. a nominalized fact type) must be covered by at least one uniqueness constraint; every combination of roles that is covered by a certain uniqueness constraint must have its own existence postulating fact type expression and its own object type expression. Of course, for a nominalized fact type only those tupel populations are allowed for which an object type expression is recorded. This is implied by the communication and substitution principles (P1 and P5 in § 1.1).

A less conceptually (or if you wish: more practically) inclined information analyst would probably have introduced a unique room number for each room. In that case the IG shown in figure 4.2 would have originated. We challenge the reader to try to add FTE's, OTE's and LP's to this IG in such a way that the original fact expressions can be regenerated from the IG plus its LP.
Figure 4.2: IG with artificial room number

A final remark:
4.2 The need for recursive identification structures

The example discussed in this paragraph is taken from a project carried out by the engineering bureau of the Dutch Railway Company. Co-author Van der Lek was involved as a senior information analyst. Among other things all information on so-called OS-sheets had to be modeled. OS stands for Overview Signals. Figure 4.3 shows a fragment of such an OS-sheet, a bit stylized, together with a legend. We have emphasized a few lines by drawing them bold. Using the legend it is not difficult to verbalize a few facts: "Signal 54 exists." "Signal 46 exists." ... "A route can be set up between signal 54 and signal 46." "A route can be set up between signal 46 and signal 20."
Figure 4.3: Fragment of OS-sheet

The corresponding fact types SIGNAL and ROUTE are drawn in figure 4.4. From figure 4.3 it is not quite clear which views a signal might possibly show. On being asked the user declared that every signal may show every possible view in principle. So: "A signal can show R." "A signal can show Y." ... "A signal can show GB." "A signal can show Y40." See fact type VIEW in figure 4.4. Obviously not all these possible views are relevant for a certain signal on the OS-sheet. Only the views actually used by a signal are drawn in. This became quite clear when we investigated the meaning of the connection lines between signals. The two bold lines between signals 20 and 46 and between signals 22 and 46 were verbalized as follows:
An experienced FCO-NIAM analyst knows of course that sentences in the form of "If ... then ... ." rules can, in good NIAM fashion, simply be regarded as fact expressions. The resulting fact type is called VIEW CONNECTIONS. It has FTE: "If < ROUTE > and the final signal shows <VIEW> then the first signal must show < VIEW >." It is however not drawn as a ternary fact type in figure 4.4 as might be expected, because later on another fact type had to be added as well: OS_CODE_DELIVERY, resulting from sentences such as:
Figure 4.4: View connections, wrong modeling

Sentences belonging to VIEW CONNECTIONS and OS_CODE_DELIVERY have the sentence part between 'If' ... 'then' in common. The NIAM procedure then requires the creation of a separate nominalized fact type SIGNALED ROUTE providing a common object type expression in order to avoid redundancy. After further classification and qualification the information grammar in figure 4.4 was completed and populated with the facts put into words so far. (For easy readability we use labels instead of tupel references.)
At this point a problem arises. Fact type expression F5 in figure 4.4: "If <8> then the first signal must show <9>." strongly suggests a functional dependency between objects of object type SIGNALED ROUTE and objects of object type VIEW. Therefore in fact type VIEW CONNECTIONS uniqueness constraint 6 was imposed on role 8 only. But after we added all the facts concerning the other two bold lines in figure 4.3 (a tupel in ROUTE, a tupel in SIGNALED ROUTE with role values '(54,46)' and 'Y', and tupels 3 and 4 in VIEW CONNECTIONS), uniqueness constraint 6 had to be replaced by uniqueness constraint 7 because of the common role value of role 8 in tupels 3 and 4. As a check we regenerated the fact expressions for tupels 3 and 4, yielding the following two sentences:
Obviously this is wrong. Uniqueness constraint 6 was the right one of course, and in order to ensure that it would not be violated the view Y had to be replaced by two new virtual views Y-1 and Y-2. See figures 4.5 and 4.6.
Figure 4.5: OS-sheet with virtual views

Figure 4.6: IG with virtual views

This solves the problem. Tupels 1 to 4 now represent the following sentences:
In a relational database the table SIGNALED ROUTE (figure 4.7) was created. This table is operational today for benchmarking purposes.
Figure 4.7: Relational table with virtual views

Although the database manager was quite satisfied with this solution, the designers of the OS-sheets didn't like it at all. Why were they forced to introduce these, in their opinion irrelevant, view codes Y-1 and Y-2 (and so on) in their OS-sheets? They had never needed them before. From the point of view of the 100 percent conceptualization principle (P2 in § 1.1) they were absolutely right! After a new interview session with the designers of the OS-sheets we rephrased the four bold connections in figure 4.3 in the following way:
Although in figure 4.3 only one-step and two-step view connections occur, it is quite obvious that potentially no limitation exists on the number of routes chained into one signaled route. We conclude: object type SIGNALED ROUTE has a recursive identification structure. The way we model this in FCO-NIAM satisfies the conceptualization principle and is shown in figure 4.8. It can easily be checked that for object type SIGNALED ROUTE the n-rule for nominalization (see § 4.1) holds.
Now we can regenerate the fact expressions 1 to
4 from the populated IG in figure 4.8 via the substitution principle.
In order to be able to regenerate recursive fact expressions we incorporated
recursive identification in FCO-NIAM. In fact, the example discussed in
this paragraph was one of the first real life cases where we really needed
it. For another striking example see (Van der Lek, 1993 A; 1993 B). Other
researchers treat recursion as well (Ter Hofstede, 1993), however without
incorporating the substitution principle. A few final remarks:
Figure 4.8: IG with recursive identification structure

4.3 The need for sequence and set based identification

In this paragraph we describe an example taken from (Van der Lek, 1993 B), which played an important part in discussions held in The Netherlands.

Quamsplashing is a (fictitious) game. Two teams having an arbitrary number of members are pitted against each other. The rules of this enjoyable game are irrelevant for our purposes. An example schedule for a quamsplash contest is shown in figure 4.9.
A verbalization of these facts (all of fact type GAME SCHEDULE) by the contest officials (object expressions have been designated by double underlining already):
We verified that it is also possible to phrase the second sentence as: "The team consisting of Ted, Marty and Cindy plays against the team consisting of Steve and Colin." The information analyst concludes from these fact expressions that the fact type GAME SCHEDULE is a homogeneous binary fact type, in which both roles are played by the non-lexical object type TEAM. The fact type expression for GAME SCHEDULE is: "<TEAM> plays against <TEAM>." The object type expression for object type TEAM is: 'the team consisting of <PERSON> [[, <PERSON>... ] and <PERSON>]'. We use the Backus Naur Form (BNF) notation here. Square brackets ('[' and ']') denote optional parts; the three dots ('...') mean that the part ', <PERSON>' may be repeated ad libitum. The number of repetitions is of course always finite at the instance level, although it is potentially infinite at the type level.

Further analysis (again accomplished by classification and qualification) of object expressions like 'the team consisting of Macy, Martin and John' yields an object type expression '<NAME>' for the non-lexical objects of the object type PERSON, in which NAME is an identifying label type for object type PERSON.

Most probably a practically inclined information analyst would solve the identification problem for the non-lexical object type TEAM by introducing a new label type TEAM NUMBER. He would then assign a different team number to each of the 5 (not: 6!) different teams and add some appropriate fact type expressions and object type expressions. The populated information grammar would then be the one depicted in figure 4.10.
Figure 4.10: IG with artificial label type TEAM NUMBER

But if the contest officials do not need these
artificial team numbers at all ("We always did it this way; we just write
down the names of the team members and everybody knows what to do."),
then something is wrong in this approach. Indeed, the 100 percent conceptualization
principle is violated again. The information analyst might carry this
through anyway (and would probably introduce other artificial numbers
identifying persons or games), but that does not concern us now.

If we want to comply with the 100 percent conceptualization principle (P2, § 1.1) then we must at least be capable of modeling such identification structures. This was accomplished by the Dutch researcher Ter Hofstede by introducing a set type (Ter Hofstede, 1993, pp. 25-31). However, his solution does not cover the modeling of sentence type and object type expressions. Figure 4.11 shows an FCO-NIAM IG modeling game schedules for quamsplash contests, in which object type TEAM is modeled as a set type.
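The idea of identifying a team by the set of its members, without an artificial team number, can be sketched in Python with `frozenset`. This is only an illustration under stated assumptions: `make_team` is a hypothetical helper, and the population below is invented (the names Anna, Bob and Eve do not occur in the original example); it is chosen so that three games yield six team slots but only five distinct teams.

```python
def make_team(member_names):
    """Build a team that is identified purely by the set of its members."""
    names = list(member_names)
    if not names:
        raise ValueError("a team needs at least one member")
    team = frozenset(names)
    if len(team) != len(names):
        # An element may occur in a set only once.
        raise ValueError("a person may occur in a team only once")
    return team


# Hypothetical population of GAME SCHEDULE: three games, six team slots.
games = [
    (make_team(["Ted", "Marty", "Cindy"]), make_team(["Steve", "Colin"])),
    (make_team(["Macy", "Martin", "John"]), make_team(["Anna"])),
    (make_team(["Colin", "Steve"]), make_team(["Bob", "Eve"])),
]

# Two teams with the same members are the same team, whatever the order.
distinct_teams = {team for game in games for team in game}
print(len(distinct_teams))
```

Because the member set itself is the identifier, the team of Steve and Colin is counted once although it appears in two games, and no label type such as TEAM NUMBER is needed.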
Figure 4.11: IG with set type

The braces '{' and '}' around role 2 designate a set
type. An element of a set may occur in it only once. Uniqueness constraint
2 implies this. Uniqueness constraint 3 enforces that each set uniquely
defines a team. It is not difficult to regenerate the original verbalizations
from the IG plus the three tuples of GAME SCHEDULE via the substitution
process. A few final remarks:
5 Specialization

Like generalization, we can incorporate specialization in FCO-NIAM without introducing new concepts at the meta-grammar level, whereas most others do introduce such concepts (Nijssen & Halpin, 1989; Ter Hofstede, 1993, pp. 34-36). At least in the structural sense the whole thing can be done in terms of the concepts introduced in the preceding paragraphs. We illustrate this with an example in the context of meta-modeling. This paragraph therefore has two purposes:
5.1 Specialization or subtyping

In (fully) communication oriented information modeling we unified non-lexical object types and fact types into a single concept, which incidentally is the only populatable construct. Besides this there was of course the concept lexical object type (label type). Label types are the ultimate sources of role values with which the populatable constructs are actually populated. In (Van der Lek, Bakema & Zwart, 1992; Van der Lek, 1993 A) we used the term 'Objecttype' as a generic term for all these concepts, but we now prefer to use the term 'objecttype/facttype', because this is a better reflection of its function of unifying all constructs which can either play roles (i.e. lexical and non-lexical object types) or which can contain roles (fact types, nominalized or not). The subtype structure of the role containing constructs and the role playing constructs in our GenMG is shown in figure 5.1. This can of course be checked by the familiar subtype matrix method for determining subtype hierarchies. The corresponding part of the GenMG in FCO-NIAM form is depicted in figure 5.2.
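The classification just described (constructs that play roles versus constructs that contain roles) can be checked on a small population. The following Python sketch is not the subtype matrix method itself; it only tests whether a layer of subtypes covers a supertype (total) and whether its members overlap (exclusive). The helper names `is_total` and `is_exclusive` and the four-element population taken from the quamsplash example are assumptions made for illustration.

```python
def is_total(supertype, subtypes):
    """True if every instance of the supertype is in at least one subtype."""
    return set().union(*subtypes) == set(supertype)


def is_exclusive(subtypes):
    """True if no instance belongs to more than one subtype."""
    seen = set()
    for population in subtypes:
        if seen & set(population):
            return False
        seen |= set(population)
    return True


# Hypothetical metagrammar population for the quamsplash IG.
lexical = {"NAME"}                   # label types
nominalized = {"PERSON", "TEAM"}     # non-lexical object types
not_nominalized = {"GAME SCHEDULE"}  # fact types that play no roles

role_playing = lexical | nominalized             # object types
role_containing = nominalized | not_nominalized  # fact types
everything = role_playing | role_containing

print(is_total(everything, [role_playing, role_containing]))
print(is_exclusive([role_playing, role_containing]))
print(is_exclusive([lexical, nominalized, not_nominalized]))
```

In this population the role playing and role containing constructs together cover everything but overlap in the non-lexical object types, while the three finer classes do not overlap.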
Figure 5.1: Subtype structure of OBJECTTYPE/FACTTYPE

The non-lexical object type OBJECTTYPE/FACTTYPE is the pater familias of the subtype hierarchy; it is a nominalization of its existence postulating unary fact type with the same name together with its fact type expression and object type expressions (Remark: NAME is a label type prefixed here by 'OT/FT_'; 'or' is used in the non-exclusive sense.):
Figure 5.2: Subtype structure in GenMG of OBJECTTYPE/FACTTYPE

In the first sublayer we have the non-disjoint subtypes
OBJECT TYPE and FACT TYPE of the pater familias. In the second layer we
have the disjoint subtypes LEXICAL OBJECT TYPE and NON-LEXICAL OBJECTTYPE
(NOMINALIZED FACT TYPE) of the supertype OBJECTTYPE and the subtypes NOMINALIZED
FACT TYPE (NON-LEXICAL OBJECT TYPE) and NOT-NOMINALIZED FACT TYPE of
the supertype FACT TYPE. The second layer is a partition (total and exclusive)
of the supertype OBJECTTYPE/FACTTYPE, whereas the first layer is a total
but non-exclusive division of the pater familias. All object instances of the object type OBJECT TYPE
play a role in the binary fact type ROLE_PLAYING between OBJECT TYPE and
ROLE (not shown) having as FTE: "<OBJECT TYPE> plays <ROLE>."
All object instances of object type FACTTYPE play roles in other fact
types as well, such as the binary facttype ROLE_CONTAINING between FACTTYPE
and ROLE (not shown) having as FTE: "<FACTTYPE> contains <ROLE>."
All subtypes in the second layer as well as the pater familias OBJECTTYPE/FACTTYPE
also play roles in other fact types (not shown either). Please note that subtype OBJECT TYPE in the first
layer is modeled as a nominalization of its existence postulating unary
fact type OBJECT TYPE in which the pater familias OBJECTTYPE/FACTTYPE
plays the single role. The same holds for subtype FACT TYPE. The identification
structure for the common subtype NON-LEXICAL OBJECT TYPE (i.e. NOMINALIZED
FACT TYPE) of OBJECT TYPE and FACT TYPE is perhaps somewhat surprising,
but its multiple identification is already familiar from § 4.1. In
fact this illustrates a beautiful interrelationship between generalization
and specialization: we can regard NON-LEXICAL OBJECT TYPE both as a subtype
in the whole hierarchical subtype structure and as a generalization of
its direct parents OBJECT TYPE and FACT TYPE. The substitution principle
holds of course as can be checked easily. A few final remarks:
5.2 Derivable and declarative subtypes

There are two ways of specifying which subpopulations of a supertype are also populations of its various subtypes. We call this subtype specification. In NIAM the classical way to specify subtypes is to require that for each subtype a so-called 'subtype defining rule' must be given, in which: A subtype specification is formulated in terms of fact types having roles played by supertype(s) of the subtype being specified. In FCO-NIAM this way of subtype specification may
be used as well as can be seen in figure 5.3. However, in FCO-NIAM subtype
specification rules are ordinary derivation rules. This means that the
subtypes in figure 5.2 must be marked with an asterisk (denoting derivable
fact types) if we use this kind of subtype specification. A slightly different version of derivable subtype
specification is shown in figure 5.4, in which we use a more sophisticated
existence postulating fact type for the pater familias OBJECTTYPE/FACTTYPE.
A new way to specify subtypes is that any subtype may also have a definition,
in which: A subtype specification is formulated in terms of existence postulating fact type(s) of supertypes. So: in FCO-NIAM a subtype hierarchy is not necessarily modeled using only derivable fact types. It can also be done in a purely declarative style. In that case the existence postulating fact types of the different subtypes are not derivable from other fact types, but bear the burden of the subtype specification themselves. In this case figure 5.2 doesn't need any derivation rules added because it already contains all the necessary information.
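The difference between the two styles can be made concrete with a minimal Python sketch. It rests on several assumptions that are not taken from figures 5.3 and 5.4: the derivation rule used here ("a construct is an OBJECT TYPE if it occurs in ROLE_PLAYING, a FACT TYPE if it occurs in ROLE_CONTAINING") is only an illustrative rule, the role names r1 to r4 are invented, and `derive_subtype` and `declare_subtype` are hypothetical helpers.

```python
def derive_subtype(supertype_population, fact_population):
    """Derivable style: a construct belongs to the subtype if it occurs as
    the first component of at least one tuple of the given fact type."""
    occurring = {construct for construct, _role in fact_population}
    return {c for c in supertype_population if c in occurring}


def declare_subtype(supertype_population, declared_population):
    """Declarative style: the subtype population is stated directly and is
    only checked to lie within the supertype population."""
    declared = set(declared_population)
    stray = declared - set(supertype_population)
    if stray:
        raise ValueError(f"not in the supertype population: {sorted(stray)}")
    return declared


# Hypothetical population of OBJECTTYPE/FACTTYPE for the quamsplash IG.
ot_ft = {"NAME", "PERSON", "TEAM", "GAME SCHEDULE"}

# Hypothetical tuples of ROLE_PLAYING and ROLE_CONTAINING.
role_playing = {("TEAM", "r1"), ("TEAM", "r2"), ("PERSON", "r3"), ("NAME", "r4")}
role_containing = {("GAME SCHEDULE", "r1"), ("GAME SCHEDULE", "r2"),
                   ("TEAM", "r3"), ("PERSON", "r4")}

derived_object_types = derive_subtype(ot_ft, role_playing)
derived_fact_types = derive_subtype(ot_ft, role_containing)

# The same subtype population, now asserted instead of computed.
declared_object_types = declare_subtype(ot_ft, {"NAME", "PERSON", "TEAM"})
print(derived_object_types == declared_object_types)
```

In the derivable style the subtype population follows from other fact types and changes when they change; in the declarative style it is stored as such, and the only check is that every declared instance also exists in the supertype.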
Figure 5.3: Subtype specification: derivable

Figure 5.4: Subtype specification: declarative

Final remarks:
6 The educational and practical impact of FCO-NIAM

The extension of NIAM presented in this paper, yielding Fully Communication Oriented NIAM, is widely accepted today in the Dutch NIAM-world. It is being used by several software houses for information modeling. The aspects of FCO-NIAM based on principles P1 to P6 as presented in our previous paper (Van der Lek, Bakema & Zwart, 1992) were adopted in (Nijssen & Schouten, 1993; Nijssen, 1993). Several college- and university-level institutes in The Netherlands use these course books and our own course materials (in which the main accent lies on principles P5 and P6 and the consequences thereof for CASE-architecture (Bakema, Zwart & students, 1993)) in programs of Computer Science, Business Informatics and Information System Development. Other course materials developed very recently (Willemsen, 1994) will be used next course year by several college-level institutes.
These features, the last certainly not being the
least, turned out to be quite interesting for commercial companies as
well. The meta grammar used in our CASE-tool has been nominated as the
internal standard by software house CVI (Centre For Information, Utrecht,
The Netherlands). This year our students built new versions of our
CASE-tool for the Dutch software house BSO/Management Support Baarn and
for several college-level institutes. In these new versions they implemented
a sentence interface as illustrated in figure 2.2, which works for multiple
identification structures (see § 4.1) as well. In that case generalized
object types are created automatically. We thank Susanne Willemsen for reading the text
of this paper, suggesting some improvements to the first draft version
and designing all the figures. She is one of our students and will graduate
in July 1994 on a project developing course materials about Fully Communication
Oriented information modeling (Willemsen, 1994). Dr. Harm van der Lek works as a senior advisor for
the Dutch software house BSO/Management Support in Baarn and is specialized
in information analysis in complex domains and conceptual problems. He
introduced NIAM as the methodological standard in the Dutch Railway Company
and several other companies. He is chairman of 'the NIAM Group'. He advised
the Hogeschool Gelderland and several other educational institutes to
base their teaching programs about information system development on FCO-NIAM
and he inspired Hogeschool Gelderland to build an FCO-NIAM oriented CASE-tool
for didactical purposes.
|