Data-Flow Diagramming Mechanics
Data-flow diagrams are versatile diagramming tools. With only four symbols, data-flow diagrams can represent both physical and logical information systems. The four symbols used in DFDs represent data flows, data stores, processes, and sources/sinks (or external entities). The set of four symbols we use in this book was developed by Gane and Sarson (1979) and is illustrated in Figure 6-2.
A data flow is data that are in motion and moving as a unit from one place in a system to another. A data flow could represent data on a customer order form or a payroll check. It could also represent the results of a query to a database, the contents of a printed report, or data on a data-entry computer display form. A data flow can be composed of many individual pieces of data that are generated at the same time and that flow together to common destinations.
FIGURE 6-2 Gane and Sarson identified four symbols to use in data-flow diagrams to represent the flow of data: data-flow symbol, data-store symbol, process symbol, and source/sink symbol. We use the Gane and Sarson symbols in this book.
A data store is data at rest. A data store may represent one of many different physical locations for data, including a file folder, one or more computer-based file(s), or a notebook. To understand data movement and handling in a system, the physical configuration is not really important. A data store might contain data about customers, students, customer orders, or supplier invoices.
A process is the work or actions performed on data so that they are transformed, stored, or distributed. When modeling the data processing of a system, it doesn’t matter whether a process is performed manually or by a computer.
Finally, a source/sink is the origin and/or destination of the data. Source/sinks are sometimes referred to as external entities because they are outside the system. Once processed, data or information leave the system and go to some other place. Because sources and sinks are outside the system we are studying, many of their characteristics are of no interest to us. In particular, we do not consider the following:
- Interactions that occur between sources and sinks
- What a source or sink does with information or how it operates (i.e., a source or sink is a “black box”)
- How to control or redesign a source or sink because, from the perspective of the system we are studying, the data a sink receives and often what data a source provides are fixed
- How to provide sources and sinks direct access to stored data because, as external agents, they cannot directly access or manipulate data stored within the system; that is, processes within the system must receive or distribute data between the system and its environment
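The restrictions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the Gane and Sarson notation: the names `Kind`, `Element`, and `flow_is_modeled`, and the sample elements, are hypothetical. The sketch uses one simplified rule, that at least one end of every data flow must be a process. That rule excludes source-to-sink interactions and direct access by a source or sink to a data store. It also rejects store-to-store flows, which this passage does not discuss.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    PROCESS = "process"
    DATA_STORE = "data store"
    SOURCE_SINK = "source/sink"


@dataclass(frozen=True)
class Element:
    """One symbol on a DFD other than the data-flow arrow itself."""
    name: str
    kind: Kind


def flow_is_modeled(origin: Element, destination: Element) -> bool:
    """Return True if a data flow between the two elements belongs on the DFD.

    Simplified rule assumed for this sketch: at least one end of the
    flow must be a process.
    """
    return Kind.PROCESS in (origin.kind, destination.kind)


# Hypothetical elements, used only to exercise the rule.
customer = Element("Customer", Kind.SOURCE_SINK)
supplier = Element("Supplier", Kind.SOURCE_SINK)
orders = Element("Customer Orders", Kind.DATA_STORE)
take_order = Element("Record Customer Order", Kind.PROCESS)

print(flow_is_modeled(customer, take_order))  # source to process
print(flow_is_modeled(customer, orders))      # source straight to a data store
print(flow_is_modeled(customer, supplier))    # source to sink, outside our interest
```

Only the first call reports a flow we would draw. In the other two cases a process inside the system would have to stand between the two elements before the flow could appear on the diagram.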
Definitions and Symbols
Among the DFD symbols presented in Figure 6-2, a data flow is depicted as an arrow. The arrow is labeled with a meaningful name for the data in motion; for example, customer order, sales receipt, or paycheck. The name represents the aggregation of all the individual elements of data moving as part of one packet, that is, all the data moving together at the same time. A rectangle or square is used for sources/sinks, and its name states what the external agent is, such as customer, teller, Environmental Protection Agency (EPA) office, or inventory control system. The symbol for a process is a rectangle with rounded corners. Inside the rectangle are written both the number of the process and a name, which indicates what the process does. For example, the process may generate paychecks, calculate overtime pay, or compute grade-point average. The symbol for a data store is a rectangle with the right vertical line missing. Its label includes the number of the data store (e.g., D1 or D2) and a meaningful label, such as student file, transcripts, or roster of classes.
As stated earlier, sources/sinks are always outside the information system and define the system’s boundaries. Data must originate outside a system from one or more sources, and the system must produce information to one or more sinks. (These principles of open systems describe almost every information system.) If any data processing takes place inside the source/sink, we are not interested in it, because this processing takes place outside of the system we are diagramming. A source/sink might consist of the following:
- Another organization or organizational unit that sends data to or receives information from the system you are analyzing (e.g., a supplier or an academic department—in either case, this organization is external to the system you are studying)
- A person inside or outside the business unit supported by the system you are analyzing and who interacts with the system (e.g., a customer or a loan officer)
- Another information system with which the system you are analyzing exchanges information
Many times, students learning how to use DFDs become confused about whether a person or activity is a source/sink or a process within a system. This dilemma occurs most often when a system’s data flow across office or departmental boundaries. In such a case, some processing occurs in one office, and the processed data are moved to another office, where additional processing occurs. Students are tempted to identify the second office as a source/sink to emphasize that the data have been moved from one physical location to another. Figure 6-3A illustrates an incorrectly drawn DFD showing a process, 3.0 Update Customer Master, as a source/sink, Accounting Department. The reference numbers “1.0” and “2.0” uniquely identify each process. D1 identifies the first data store in the diagram. However, we are not concerned with where the data are physically located. We are more interested in how they are moving through the system and how they are being processed. If the processing of data in the other office is part of your system, then you should represent the second office as one or more processes on your DFD. Similarly, if the work done in the second office might be redesigned to become part of the system you are analyzing, then that work should be represented as one or more processes on your DFD. However, if the processing that occurs in the other office takes place outside the system you are working on, then it should be a source/sink on your DFD. Figure 6-3B is a DFD showing proper use of a process.
FIGURE 6-3 (A) An incorrectly drawn DFD showing a process as a source/sink, (B) A DFD showing proper use of a process.
Developing DFDs: An Example
Let’s work through an example to see how DFDs are used to model the logic of data flows in information systems. Consider Hoosier Burger, a fictional fast-food restaurant in Bloomington, Indiana. Hoosier Burger is owned by Bob and Thelma Mellankamp and is a favorite of students at nearby Indiana University. Hoosier Burger uses an automated food-ordering system. The boundary or scope of this system, and the system’s relationship to its environment, is represented by a data-flow diagram called a context diagram. A context diagram is shown in Figure 6-4. Notice that this context diagram contains only one process, no data stores, four data flows, and three sources/sinks. The single process, labeled “0,” represents the entire system; all context diagrams have only one process labeled “0.” The sources/sinks represent its environmental boundaries. Because the data stores of the system are conceptually inside the one process, no data stores appear on a context diagram. After drawing the context diagram, the next step for the analyst is to think about which processes are represented by the single process. As you can see in Figure 6-5, we have identified four separate processes, providing more detail of the Hoosier Burger food-ordering system. The main processes in the DFD represent the major functions of the system, and these major functions correspond to such actions as the following:
1. Capturing data from different sources (Process 1.0)
2. Maintaining data stores (Processes 2.0 and 3.0)
3. Producing and distributing data to different sinks (Process 4.0)
4. High-level descriptions of data transformation operations (Process 1.0)
We see that the system in Figure 6-5 begins with an order from a customer, as was the case with the context diagram. In the first process, labeled “1.0,” we see that the customer order is processed. The results are four streams or flows of data: (1) the food order is transmitted to the kitchen, (2) the customer order is transformed into a list of goods sold, (3) the customer order is transformed into inventory data, and (4) the process generates a receipt for the customer. Notice that the sources/sinks are the same in the context diagram (Figure 6-4) and in this diagram: the customer, the kitchen, and the restaurant’s manager. A context diagram is a DFD that provides a general overview of a system. Other DFDs can be used to focus on the details of a context diagram. A level-0 diagram, illustrated in Figure 6-5, is an example of such a DFD. Compare the level of detail in Figure 6-5 with that of Figure 6-4. A level-0 diagram represents the primary individual processes in the system at the highest possible level. Each process has a number that ends in .0 (corresponding to the level number of the DFD).
FIGURE 6-4 A context diagram of Hoosier Burger’s food-ordering system. The system includes one process (food-ordering system), four data flows (customer order, receipt, food order, management reports), and three sources/sinks (customer, kitchen, and restaurant manager).
FIGURE 6-5 Four separate processes of the Hoosier Burger food-ordering system.
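To make the comparison between the context diagram (Figure 6-4) and the four-process diagram (Figure 6-5) concrete, the following Python sketch records each diagram as a list of data flows and checks that both show the same sources/sinks and the same flows crossing the system boundary. The flow names come from the text and the Figure 6-4 caption. The tuple layout, the function name `external_flows`, and the variable names are assumptions made for this illustration. The flow from Process 3.0 into the Inventory File is left out because the text does not name it, and it does not cross the boundary in any case.

```python
# Each flow is (name, origin, destination).
CONTEXT_FLOWS = [
    ("Customer Order", "Customer", "0"),
    ("Receipt", "0", "Customer"),
    ("Food Order", "0", "Kitchen"),
    ("Management Reports", "0", "Restaurant Manager"),
]

LEVEL0_PROCESSES = {"1.0", "2.0", "3.0", "4.0"}
LEVEL0_STORES = {"Goods Sold File", "Inventory File"}
LEVEL0_FLOWS = [
    ("Customer Order", "Customer", "1.0"),
    ("Receipt", "1.0", "Customer"),
    ("Food Order", "1.0", "Kitchen"),
    ("Goods Sold", "1.0", "2.0"),
    ("Inventory Data", "1.0", "3.0"),
    ("Formatted Goods Sold Data", "2.0", "Goods Sold File"),
    ("Daily Goods Sold Amounts", "Goods Sold File", "4.0"),
    ("Daily Inventory Depletion Amounts", "Inventory File", "4.0"),
    ("Management Reports", "4.0", "Restaurant Manager"),
]


def external_flows(flows, internal):
    """Return {(flow name, source/sink, direction)} for boundary-crossing flows."""
    crossing = set()
    for name, origin, destination in flows:
        if origin not in internal and destination in internal:
            crossing.add((name, origin, "in"))
        elif origin in internal and destination not in internal:
            crossing.add((name, destination, "out"))
    return crossing


context = external_flows(CONTEXT_FLOWS, {"0"})
level0 = external_flows(LEVEL0_FLOWS, LEVEL0_PROCESSES | LEVEL0_STORES)

# The sources/sinks on each diagram are the entities named in those flows.
print(sorted({entity for _, entity, _ in level0}))
print(context == level0)
```

If the two sets differed, one diagram would show a source/sink or an external data flow that the other does not, and one of them would need to be corrected.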
Two of the data flows generated by the first process, Receive and Transform Customer Food Order, go to external entities (Customer and Kitchen), so we no longer have to worry about them. We are not concerned about what happens outside of our system. Let’s trace the flow of the data represented in the other two data flows. First, the data labeled Goods Sold go to Process 2.0, Update Goods Sold File. The output for this process is labeled Formatted Goods Sold Data. This output updates a data store labeled Goods Sold File. If the customer order were for two cheeseburgers, one order of fries, and a large soft drink, each of these categories of goods sold in the data store would be incremented appropriately. The Daily Goods Sold Amounts are then used as input to Process 4.0, Produce Management Reports. Similarly, the remaining data flow generated by Process 1.0, called Inventory Data, serves as input for Process 3.0, Update Inventory File. This process updates the Inventory File data store, based on the inventory that would have been used to create the customer order. For example, an order of two cheeseburgers would mean that Hoosier Burger now has two fewer hamburger patties, two fewer burger buns, and four fewer slices of American cheese. The Daily Inventory Depletion Amounts are then used as input to Process 4.0. The data flow leaving Process 4.0, Management Reports, goes to the sink Restaurant Manager.
Figure 6-5 illustrates several important concepts about information movement. Consider the data flow Inventory Data moving from Process 1.0 to Process 3.0. We know from this diagram that Process 1.0 produces this data flow and that Process 3.0 receives it. However, we do not know the timing of when this data flow is produced, how frequently it is produced, or what volume of data is sent. Thus, this DFD hides many physical characteristics of the system it describes. We do know, however, that this data flow is needed by Process 3.0 and that Process 1.0 provides this needed data. Also, implied by the Inventory Data data flow is that whenever Process 1.0 produces this flow, Process 3.0 must be ready to accept it. Thus, Processes 1.0 and 3.0 are coupled to each other. In contrast, consider the link between Process 2.0 and Process 4.0. The output from Process 2.0, Formatted Goods Sold Data, is placed in a data store and, later, when Process 4.0 needs such data, it reads Daily Goods Sold Amounts from this data store. In this case, Processes 2.0 and 4.0 are decoupled by placing a buffer, a data store (Goods Sold File), between them. Now, each of these processes can work at its own pace, and Process 4.0 does not have to be vigilant by being able to accept input at any time. Further, the Goods Sold File becomes a data resource that other processes could potentially draw upon for data.
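The distinction between coupled and decoupled processes can be read mechanically from a list of data flows. The Python sketch below is one way to do so, under stated assumptions: the function names `coupled_pairs` and `decoupled_pairs` are hypothetical, and the label of the flow from Process 3.0 into the Inventory File is a placeholder because the text does not name that flow. A direct process-to-process flow marks a coupled pair. A process that writes to a data store and another that reads from the same store form a decoupled pair.

```python
PROCESSES = {"1.0", "2.0", "3.0", "4.0"}
STORES = {"Goods Sold File", "Inventory File"}

# Internal flows of the Hoosier Burger system as (name, origin, destination).
FLOWS = [
    ("Goods Sold", "1.0", "2.0"),
    ("Inventory Data", "1.0", "3.0"),
    ("Formatted Goods Sold Data", "2.0", "Goods Sold File"),
    ("Daily Goods Sold Amounts", "Goods Sold File", "4.0"),
    ("Inventory update (label assumed)", "3.0", "Inventory File"),
    ("Daily Inventory Depletion Amounts", "Inventory File", "4.0"),
]


def coupled_pairs(flows):
    """Pairs of processes joined by a direct data flow."""
    return sorted(
        (origin, destination)
        for _, origin, destination in flows
        if origin in PROCESSES and destination in PROCESSES
    )


def decoupled_pairs(flows):
    """(writer, data store, reader) triples buffered by a data store."""
    writers, readers = {}, {}
    for _, origin, destination in flows:
        if origin in PROCESSES and destination in STORES:
            writers.setdefault(destination, set()).add(origin)
        elif origin in STORES and destination in PROCESSES:
            readers.setdefault(origin, set()).add(destination)
    return [
        (writer, store, reader)
        for store in sorted(writers)
        for writer in sorted(writers[store])
        for reader in sorted(readers.get(store, ()))
    ]


print(coupled_pairs(FLOWS))
print(decoupled_pairs(FLOWS))
```

The first list contains the pair (1.0, 3.0) discussed above, along with (1.0, 2.0), which is linked by a direct flow in the same way. The second list contains Processes 2.0 and 4.0 buffered by the Goods Sold File and, by the same reasoning, Processes 3.0 and 4.0 buffered by the Inventory File.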
TABLE 6-2: Rules Governing Data-Flow Diagramming