A database designer’s toolbox holds more techniques than the Entity-Relationship Diagram [ERD], a fact sometimes forgotten or simply overlooked. An ERD serves well as one of the deliverables expressing a logical design to system stakeholders and developers. However, as a data modeler begins drafting a database design, several other diagramming approaches can assist in exploring and understanding the solution space, or universe-of-discourse [UOD]. These additional techniques make it possible to flesh out designs more completely. When employed judiciously, these tools minimize the “holes” occasionally stumbled upon during initial development or later. Even allowing for the added time spent creating these supplementary diagrams, resolving problems and issues before code is written is generally the lower total-cost approach for a project. Foremost among these additional methods is the Data Flow Diagram [DFD].
The DFD is a simple diagramming construct consisting of four component elements: a process, a data store, a data flow, and an external agent. External Agents are people, services, or other functions that are not part of the application-at-hand but either provide data to or receive data from the solution-in-progress; they are represented by a square box. Processes, as might be expected, are the functions, services, programs, and the like that receive, transform, or yield data items (represented by a rectangle with rounded corners, or by a circle, depending on the chosen data flow diagramming method). Data Stores are any actual or virtual persistence of data content, meaning files, tables, message queues, etc. And lastly, Data Flows are the virtual reads, writes, sends, and receives occurring throughout the system (drawn as a line with an arrow on the end denoting the direction of the flow). A proper analysis of a solution from a DFD perspective involves a series of diagrams describing the system initially at a very high level, and then drilling down into finer levels of process grain. The initial diagram in the set is referred to as the Context Diagram.
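The four elements described above can be sketched as a small data structure. This is only an illustrative sketch: the names Kind, Node, DataFlow, and flows_without_process, and the order-entry example, are assumptions made here rather than part of any standard notation or tool. The check it performs reflects a common DFD convention (data moves between stores and agents only by way of a process), which is also an assumption layered on top of the definitions above.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    """The three node types of a DFD; the fourth element, the flow, connects them."""
    EXTERNAL_AGENT = "external agent"
    PROCESS = "process"
    DATA_STORE = "data store"


@dataclass(frozen=True)
class Node:
    name: str
    kind: Kind


@dataclass(frozen=True)
class DataFlow:
    source: Node   # where the data comes from
    target: Node   # where the arrow points
    label: str     # the data content being moved


def flows_without_process(flows):
    """Return flows in which neither end is a process.

    Assumed convention: such flows usually signal a process that has
    not been identified yet.
    """
    return [f for f in flows
            if Kind.PROCESS not in (f.source.kind, f.target.kind)]


# Hypothetical order-entry fragment.
customer = Node("Customer", Kind.EXTERNAL_AGENT)
take_order = Node("Take Order", Kind.PROCESS)
orders = Node("Orders", Kind.DATA_STORE)

flows = [
    DataFlow(customer, take_order, "order request"),
    DataFlow(take_order, orders, "new order"),
    DataFlow(customer, orders, "direct write"),  # no process on either end
]

suspect = flows_without_process(flows)
for flow in suspect:
    print(f"Check flow '{flow.label}': {flow.source.name} -> {flow.target.name}")
```

Keeping the diagram in a form like this is optional; the point is simply that each flow names its source, its target, and the data it carries.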
The Context Diagram has but a single process within it, and that single process represents the entire solution under discussion. Surrounding the single process are the external agents representing those systems, processes, and workgroups that interact with the proposed solution. Connecting the external agents and the process are data flows signifying the major data interfaces between the external agents and the system. The next diagram, referred to as the “Level 0” diagram, is an explosion of the Context Diagram that includes processes indicating the major components of the solution, however many there might be. Each one of those major processes may also be expanded into a more detailed DFD, and each of the processes in those expansions may be further exploded to provide even more detail. Obviously the database designer need not completely understand the specific internal workings of every single process running against the resulting database. But minimally, the designer should understand at a high level every process supported by the resulting database, and consequently the data needs of those supported processes. This goal of comprehending process data needs helps guide the data modeler in deciding how far down to drive the detail of their DFDs.
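The leveling described above amounts to a tree: the Context Diagram's single process is the root, and each explosion adds a layer of child processes. A minimal sketch follows, assuming a hypothetical Process class and invented process names; it is one way to keep track of how far the diagrams have been driven, not a prescribed method.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Process:
    """A process and, if it has been exploded, the processes of its child diagram."""
    name: str
    children: List["Process"] = field(default_factory=list)


def leaves(process):
    """Names of the processes that have not been exploded any further."""
    if not process.children:
        return [process.name]
    result = []
    for child in process.children:
        result.extend(leaves(child))
    return result


def depth(process):
    """Number of explosion levels beneath this process (0 if never exploded)."""
    if not process.children:
        return 0
    return 1 + max(depth(child) for child in process.children)


# Hypothetical example: the root is the single process of the Context Diagram.
context = Process("Order System", [
    Process("Take Order", [
        Process("Validate Order"),
        Process("Price Order"),
    ]),
    Process("Ship Order"),
    Process("Bill Customer"),
])

print(depth(context))
print(leaves(context))
```

The leaf list is the set of processes whose data needs the designer should be able to state; a leaf whose data needs are still unclear is a candidate for one more level of explosion.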
When database designers incorporate DFDs into their requirements analysis as a regular part of working on new database models, they will often find that this process-based yet still data-driven analysis uncovers needs for extra tables that may have been missed. These missed entities could be simple reference tables or more subtle interrelationships that certain specific functions force into being. Further evaluation of the DFDs can be leveraged to create CRUD (create, read, update, delete) matrices, which may help identify processes still missing for maintaining the data within system scope. Beyond the objects of the UOD, DFDs may reveal access patterns that help in index identification, or may clarify object interrelationships that otherwise appeared fuzzy or uncertain. Having a set of DFDs as an artifact of the database modeling process gives designers a greater level of comfort that a given design is complete. And lastly, DFDs also function as a good tool for explaining system functionality to business stakeholders.
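As a rough illustration of the CRUD matrix idea, the sketch below builds a matrix from a list of process-to-store interactions and reports which operations no process performs on each store. The flow list, the process and store names, and the functions build_crud_matrix and missing_operations are all hypothetical. A DFD arrow shows only direction, so tagging each interaction with C, R, U, or D letters is an assumed extra annotation step.

```python
from collections import defaultdict

# Hypothetical interactions read off a set of DFDs:
# (process, data store, operations the process performs on that store).
flows = [
    ("Take Order", "Orders", "C"),
    ("Ship Order", "Orders", "RU"),
    ("Take Order", "Customers", "R"),
    ("Price Order", "Products", "R"),
]


def build_crud_matrix(flows):
    """Return {store: {process: set of CRUD letters}}."""
    matrix = defaultdict(lambda: defaultdict(set))
    for process, store, ops in flows:
        matrix[store][process].update(ops)
    return matrix


def missing_operations(matrix):
    """For each store, list the CRUD letters that no process covers."""
    gaps = {}
    for store, by_process in matrix.items():
        covered = set().union(*by_process.values())
        missing = "".join(op for op in "CRUD" if op not in covered)
        if missing:
            gaps[store] = missing
    return gaps


matrix = build_crud_matrix(flows)
gaps = missing_operations(matrix)
for store, ops in gaps.items():
    print(f"{store}: no process performs {ops}")
```

In this made-up example nothing creates Customers or Products, which would suggest maintenance processes that have not yet been drawn. Whether a missing letter is a real gap remains a judgment call; a store that is never deleted from may be exactly what the business intends.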
About the Author
Todd Schraml is senior data architect and manager of ETL at Innovative Health Strategies, Inc. He can be reached at tschraml@ihsiq.com.