Index - Major Sections
Home

**InHCc HMIS**

Site Map
Health Economic and Reform

Benefits

Discussion

Data and Data Analysis

Health Management

Product and Services
References
Team

_______________

Index - Same Level Subject


 

Index - Child Subjects

Introduction 

The total quantity of data being collected today has been estimated to be doubling every two to eight months. How is this data collected, how is this data stored, and how is this data used is the subject of databases is the subject of this section. 

There are different database designs that serve different purposes. 

The failure to understand that there are difference database systems that serve different purposes is one of the major reasons for the failure to realized the full benefits from a database system. 

Problems with legacy database systems

With traditional file systems, up to 80 percent of a typical application program is taken up with code related to file access (including file definition, record description and valid values. The programmer has to code the field starting positions, lengths, and data types. Every time that any data layout changes, every application program that access that file had to be rewritten.

This means that it was almost impossible for organizations to share data. One organization would not necessarily have data formats as other organizations.

Redundant data.

As you can see in the example table, several names and descriptions are continuously repeated. This increases the amount of physical storage required and more importantly the time it takes to take and input the data

 

Repeating groups (limited data tracking)

In any design of a flat file while a question may have more than one answer repeating files must be added into the table design. This is hard to code in the application and waste space when the files are not all filed out. At the worst, if enough repeating groups are not designed into the system, information may be left out

 

Inconsistent data

When taking data from the same source multiply times, it is very likely that data will be misspelled when a new record is added. As more records are added to the file, the potential for inconsistent data from typing errors increases

Update anomalies

Suppose that you realize that you need to change a class or category that has been defined in your data collection. In a flat file you must go through every record individually in order to insure that all the records are updated correctly

 

Delete Anomalies

Suppose that you are no longer interested in a class or category that has been defined in your data collection. If all records with this class or category are deleted other important information may be deleted. 

Insert Anomalies

Suppose that you want to add a new field to your data. Will you have to go back and recode all of your data or do you insert a special code such as unknown for the field in the older records.

 

Losing the temporal nature of data

 

Storage of the data

 

Ability to reuse the data

 

Determining the relationship between data attributes

More research data is collected without the ability to use the relationship between data.

Relational Databases

The Relational Database solved the problems of the legacy systems.

Types of Database Processing

The early databases were designed around one database such that the whole organization was able to use it for all their processing needs. Many of the Enterprise Resource Planning (ERP) systems still use this system. 

Today it is recognized that a more efficient system can be created by creating one database that serves operational needs and another that serves informational or decision support needs. 

Operational (Transactional)

Operational databases are used to created applications used in the day-to-day operation of the organization. These include scheduling, invoicing, inventory control and other applications that are considered mission critical. They generally require a 1-3 second response time 100% of the time. 

Operational databases generally use the relational database management system, although object orientated databases are arriving on the scene. 

Characteristics 

  • Organized by application

  • Optimized for quick responses

  • Access a small number of records per transaction

  • Require current data

  • Updated in real time

  • Support a large number of concurrent users

  • Requires non-redundant, highly normalized data structure

  • Systems are generally non-standardized and inconsistent across applications

The biggest mistake that can be made in database design is the use of an operational database to perform information gathering. 

Informational (Data Warehouse) 

Informational databases 

Characteristics

The Informational Databases has these characteristics:

  •  Data Access tends to be ad hoc

  • May be non-normalization in design

  • May contain redundant data and be aggregated.

  • Timeliness is not generally an issue.  

  • Access large volumes of data records

  • Current data is not as important as in operational databases.

  • Updated generally as batches

  • Have fewer more sophisticated users

Design of the Data Warehouse

Denormalization means that you are purposely designing your database so that it is not normalized. This is usually done to maximize performance or to simplify end-user reporting. In a normalized database a query may require many joins to process the request. These additional table joins can be expensive in terms of computer time. More will be said about denormalization in the topic on Data warehousing.

Renormalization Techniques

  • Duplicate data: Duplicate data can reduce the number of joins required to process a query.

  • Summary data: Summary data where used often in queries.

  • Horizontal partitioning: split the table into two or more separate tables at the record level, such as by category, thus reducing the number of rows per table accessed. …example – by department

  • Vertical partitioning: Splitting the table into two separate tables at the column level, thus reducing the number of columns per table….example – salary.

Design

Granularity:

Granularity is the major design issue in the data warehouse environment. It profoundly affects the volume of data that resides in the data warehouse and at the same time affects the type of query that can be answered. The volume of data in a warehouse is traded off against the level of detail of a query. (I-45)

 

Partitioning:

Refers to the breakup of data into separate physical units the can be handled independently (I-55).  Partitioning gives the user finer detail (I-58)

            Date

            Type of symptoms

            Type of results

            By geography

            By age

            etc

The separation of operational and informational databases occurs for many reasons: (Inmon, 1996)

  • The data serving operational needs is physically different data from that serving informational or analytical needs.

  • The supporting technology for operational processing is fundamentally different from the technology used to support informational or analytical needs.

  • The user community for operational data is different from the one served by informational or analytical data. 

  • The processing characteristics for the operational environment and the informational environment are fundamentally different.

Research Issues

In general, levels and classifications are not important. Since in the Relation Data, data is collected at the lowest possible level, classifications can be obtained at any time and at any level by "mapping."

Back to Top