Data Dictionary

When I took the first survey of my undertaking, I found our speech copious without order, and energetick without rules: wherever I turned my view, there was perplexity to be disentangled, and confusion to be regulated; choice was to be made out of boundless variety, without any established principle of selection; adulterations were to be detected, without a settled test of purity; and modes of expression to be rejected or received, without the suffrages of any writers of classical reputation of acknowledged authority.

‐ Preface to Dictionary of the English Language, Samuel Johnson, 1755 [9]

What is it?

A data dictionary is defined in the IBM Dictionary of Computing as a centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format.[1] However, Business Analysts should be aware that the term data dictionary can have several meanings depending on the person one is speaking with. A data dictionary can be:

  1. A read-only set of tables that provides information about an Oracle database. [2]
  2. An automated tool for collecting and organizing the detailed information about system components. [3]
  3. A reference that includes standard definitions of data elements, their meanings, and allowable values. [4]
  4. A collection of data about the data, with the purpose of rigorously defining each and every data element, data structure, and data transform.[5]

As you can see from the list above, the term data dictionary may be interpreted differently by the person you are speaking to if they are a Oracle database administrator (#1 above), database architect (#4 above), system analyst (#2 above), or business analyst (#3 above). And if you are talking a non-technical business person, they may conflate a data dictionary with a glossary of business terms.

Depending on the context, a data dictionary can be a component part of database management system (DBMS) or a separate data store in the form of a database, spreadsheet, or even an index card file (in the old days).

Data Dictionary Types

Generally, there are two data dictionary types that Business Analysts deal with. These are logical data dictionaries and physical data dictionaries. And then some data dictionaries combine the metadata of both a logical and physical dictionary.

A logical data dictionary typically describes information in business terms and focuses on the meaning of terms and their relationship with other terms. For example, from the business perspective there may be a data entity called Client, which includes Client Name, Client ID, and Order History attributes. Once defined, a logical data dictionary rarely needs to change.

In contrast, a physical data dictionary describes the physical attributes of a data element. What is the data type (string, boolean, integer, etc)? What is the maximum length? What is the format? What is the data encoding standard (ANSI, UTF-8, etc)? Where is the data physically stored (database name, column, row, etc)?

While a the attributes of a logical data dictionary are unlikely to change unless there are significant business changes, the physical attributes can change much more frequently. And because the data element could be stored in multiple databases with different physical structures, there may be different physical data dictionaries that include the same logical data entity.

Most business analysts will be creating the logical data dictionary as part of their requirements work, which would then be converted to physical data dictionary as part of the solution definition and design phases of an effort.

 

Why do it?

According to the International Standards Organization (ISO): The increased use of data processing and electronic data interchange heavily relies on accurate, reliable, controllable, and verifiable data recorded in databases. One of the prerequisites for a correct and proper use and interpretation of data is that both users and owners of data have a common understanding of the meaning and descriptive characteristics (e.g., representation) of that data. To guarantee this shared view, a number of basic attributes has to be defined. [9]

This is done through data dictionaries. Or put another way:

Data dictionaries assist with gathering requirements, standardize data element definitions, reduce redundancy, and enhance consistency between systems. What may seem like administrative busy work when you're launching a project can become a time and money-saving resource by revealing complexities that were overlooked in initial specifications and spotting costly inconsistencies before they are deeply embedded in the organization's systems.[8]

Among the many benefits of a data dictionary are:

The data dictionary can be one of the most important requirement artifacts that a BA creates when dealing with software solutions. It ensures that all stakeholders are in agreement on the definition and structure of data that the solution will use. The saying Garbage In, Garbage Out is never more true than when dealing with system data, so spend time ensuring that you have carefully defined the data needs of your solution.

A data dictionary (and glossary) is often one of the first artifacts that gets created when the requirements elicitation process starts. One of the key needs in any project is to make sure everyone is speaking the same language. A glossary helps with this. But unless you dive into the level needed to define terms that goes into a data dictionary, you are unlikely to determine if there are slightly different terms that are used within the business units you are working with.

An Example

An example of a situation where a data dictionary can help identify the usage of a seemingly common term with different meanings comes from the financial services field, where a seemingly standard term such as "Cash" can have different meanings depending on who you talk to. For one person Cash might mean literally cash in hand or immediately accessible money such as that in a bank account. But from various investment perspectives Cash might include short-term Treasury Bills, Certificates of Deposit, certain types of promissory notes, and other types relatively liquid investments. Now imagine a situation where the formula you need to calculate a figure for a regulatory report requires one definition of Cash but your internal systems are all built around a different definition of Cash. Unless you have used a data dictionary to very precisely define what you mean by Cash, your stakeholders and developers may not even realize there is a difference.

 

How do I do it?

Step 1 - Define the Location and Structure

Choose your data dictionary location and structure. Are you going to use Excel? Does your requirements management tool specifically support a data dictionary?

Step 2 - Define the Attributes

Decide what attributes you are going to capture in data dictionary. Logical and Physical data dictionaries frequently have different attributes that should be captures. However, some data dictionaries are set up to capture both types of information. In those cases the business analyst might start the data dictionary by working with the business to define the logical data attributes while the technical team would then expand the data with the physical attributes for the specified data elements.

Logical Attributes

For each data element, the following logical attributes may be used in a data dictionary:

Attribute Name Description
ID Code Every data element in the dictionary should have a unique identification outside of its name. This might be a number, or an alphanumeric combination.
Name Every data element in the dictionary should have a unique name.
Aliases If the data element is potentially referred to by more than one name, the additional names can be recorded as aliases (or synonyms). For example, an element named Price may also be referred to as Cost or Value. Thus Cost and Value would be recorded as Aliases of Price
Definition Every attribute should include a definition that precisely defines the element associated with the Name, and which distinguishes it from other elements that may be similar.
Formula This specifies if the value being defined should be calculated using a specific formula.
Unit If the data element should be specified in a specific unit of measurement (such a miles, centimeters, or a currency)
Update Frequency This attribute how often the business expects the element value to be updated. This may be daily, weekly, monthly, point in time (if updates occur on a non-scheduled basis such as when a client changes their mailing address), or other values.
Definition Source It is often a good idea to keep track of who provided a specific element definition.
Range of Values This is the possible range of values the element needs to support. This could be a range of numbers, a set of letters, or one of a specific set of values (such as State abbreviations if part of an address)
Time Dimension Frequently, a business can want the same value but over a different period of time. For example, a business want to know the total amount of sales to a customer over a 1-month, 6-month, 1-year, and lifetime period. This attribute can also interact with the Update Frequency attribute to provide different results, so there could be a 1-year sales total that is updated on once a year on January 1st to provide a calendar year value, and another 1-year sales total that is updated daily to provide a rolling value.
History Required For some elements, the business may want to maintain not just a current value, but past values as well. For example, if the business was tracking a daily sales total, they may want to maintain specific daily values for the prior 5 years.
Notes This attribute field can be used to provide specific examples, background, or other points of information that the BA feels appropriate.
Components If the element is a compound element, this attribute can be used to indicate the component elements that are incorporated into this element. This can also be used to indicate the relationships with the component elements (mandatory, optional, 1 to 1, 1 to many, etc)
Owner This is the owner of the data value that is being defined. This is usually the business unit or person who is responsible for the data in question. This might mean that they are responsible for the system that generates the data, or are the subject-matter expert for the ensuring the quality and accuracy of the data in question.
Domain This is the context in which the data element exists. For example, data elements within a client domain may include client ID, first name, last name, address, city, state, zip code, order history, preferred payment method, and other data elements that are related to the client. Sometimes a data element can be part of more than one domain.
Default Value This specifies if the data element has a default value, and if so what that value is.
Security Classification If the data element has a particular security classification that should control its use or access to the data, it might be noted here. An example might be sensitive client data (such as Social Security Numbers in the U.S.) that a firm might have to store for regulatory purposes, but for which access should be restricted and which should not be made public.

* This is not intended to be a complete list of possible logical attributes. Please add any new attributes you may use or be aware of.

Physical Attributes

For each data element, the following attributes may be used in a data dictionary:

Attribute Name Description
Size If the element has a specific size (25 characters), or a maximum size (100kb), it should be documented here
Database Name The name of the database the element is stored in
Database location The specific location (column, row, record, etc) in a database that the element is stored in.
Data Type The type of data the elements is stored as. This can include a Boolean value, integer, number, record, or a number of other specific data types.

* This is not intended to be a complete list of possible physical attributes. Please add any new attributes you may use or be aware of.

Step 3 - Define the Process

Decide what your version control system and change process for the data dictionary is. The data dictionary most frequently the document that is worked on by multiple parties simultaneously (business analysts, systems analysts, data analysts, data architects, etc.) and is thus one of the documents most in need of a solid version control and update process.

Step 4 - Adding Elements

Start defining your data elements.

Make sure that each data dictionary entry defines only one data element. When working to define a new element, make sure to figure out if the element under discussion is a primitive or composite data element. Primitive data elements are those that cannot be deconstructed any further. Composite data elements are those that are made up of or incorporate primitive data elements.

For example, a name element is generally a composite data element that is made up of the following primitive data elements: first name, middle name, family name, and name suffix. There may also be a need to specify a name prefix element that would support titles such as Mr., Dr., Ms., and other similar titles.

When defining a composite element you should specify the primitive elements first and then the composite element, making sure that the definition of the composite element explicitly references the primitive elements that it is made up of.

Step 5 - Verify Attributes

The attributes for each data element in the dictionary should be verified with all stakeholder groups in order to reach agreement on naming, definitions, attributes, formulas, and other critical attributes of the data.

 

What Should the Results be?

The end result of a data dictionary effort should be a single artifact that:

 

Advantages

 

Disadvantages

 

Tips

 

References

  1. IBM Dictionary of Computing, 10th edition, 1993
  2. Oracle Database Concepts 11g Release, Data Dictionary
  3. Book Chapter: The Data Dictionary, by Marty Modell in the online version of his A Professional's Guide to Systems Analysis, Second Edition
  4. BABOK Guide, section 9.5.3
  5. The Information System Consultant's Handbook: System Analysis and Design, by William S. Davis and David C. Yen.
  6. Wikipedia Entry - Data Dictionary
  7. Structured Analysis Wiki - Data Dictionaries
  8. Business Data Dictionary Template - Project Connections web site.
  9. Best Practices for Data Dictionary Definitions and Usage, Version 1.1 2006-11-14. Northwest Environmental Data Network.
 

Related Resources