**InHCc HMIS**

Introduction

One way to transform data into a standard format is to use “middleware.” A better solution is to use a data transformation program that is built into the data warehouse.

Middleware is defined as a set of services that move the data from the source to the database. It includes the following:

  • Data Transfer

  • Data Extraction

  • Data Quality

  • Staging

  • Data Transformation

  • Data Load

  • Directory Services

  • Security

Data Transfer

 

It is highly probable that the data sources will be running on a different platform than the data warehouse. 

Consider the following real-world example to better illustrate this function: A home health care agency uses an electronic medical record database to keep track of all patient information. The nurse brings a laptop into the field each day, entering each patient's information into a customized Access database. At the end of the day, the nurse dials in to the corporate office and uploads the MDB file into a common directory. The SQL Server administrator has created a DTS package that automatically picks up the nurse's file and rolls it into a master SQL database at midnight each night. Using the export functionality of DTS, the nurse is also able to download a new Access database while dialed into the corporate network. This database contains the records for the patients who are scheduled to be seen the following day.
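The nightly pick-up step in this example can be sketched as follows. This is only an illustration in Python, not the DTS package itself: SQLite files stand in for both the Access field databases and the master SQL Server database, and the `visits` table, its columns, the `*.db` file pattern and the `archive` folder are all assumptions made for the sketch.

```python
import shutil
import sqlite3
from pathlib import Path


def load_uploads(upload_dir: Path, master_path: Path) -> int:
    """Copy rows from every uploaded field file into the master database.

    Returns the number of rows loaded. Processed files are moved into an
    'archive' subdirectory so that they are not loaded a second time.
    """
    archive = upload_dir / "archive"
    archive.mkdir(exist_ok=True)
    loaded = 0
    master = sqlite3.connect(master_path)
    try:
        # Hypothetical master table; a real schema would be far richer.
        master.execute(
            "CREATE TABLE IF NOT EXISTS visits ("
            "patient_id TEXT, visit_date TEXT, note TEXT, source_file TEXT)"
        )
        for field_file in sorted(upload_dir.glob("*.db")):
            src = sqlite3.connect(field_file)
            try:
                rows = src.execute(
                    "SELECT patient_id, visit_date, note FROM visits"
                ).fetchall()
            finally:
                src.close()
            with master:  # one transaction per uploaded file
                master.executemany(
                    "INSERT INTO visits VALUES (?, ?, ?, ?)",
                    [(*row, field_file.name) for row in rows],
                )
            # Archive only after the rows have been committed.
            shutil.move(str(field_file), str(archive / field_file.name))
            loaded += len(rows)
    finally:
        master.close()
    return loaded
```

Recording the source file name with each row keeps a trace of which upload a record came from, and archiving after the commit means a failed load leaves the file in place for the next run.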

Data Extraction

  • Data – description mapped to a standard table

  • Introduction of the international standard Unicode (16 bit) character set

  • Relationships between attributes in a record

  • Null values:

  • Ability to determine where and how data was transformed
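Several of the points above (mapping to a standard table, handling null values, and recording how data was transformed) can be illustrated with a short Python sketch. The source column names, the standard column names and the list of null markers are hypothetical and chosen only for the example.

```python
from typing import Any

# Hypothetical mapping from one source system's column names to the
# warehouse's standard column names.
FIELD_MAP = {"pt_no": "patient_id", "dob": "birth_date", "sx": "sex"}

# Hypothetical strings that a source system might use to mean "no value".
NULL_MARKERS = {"", "N/A", "NULL", "?"}


def to_standard(row: dict[str, Any], field_map: dict[str, str], source: str) -> dict[str, Any]:
    """Map one source row onto the standard layout and record its lineage."""
    record: dict[str, Any] = {}
    lineage = []
    for src_col, std_col in field_map.items():
        value = row.get(src_col)  # a missing column becomes None
        if isinstance(value, str):
            value = value.strip()
            if value in NULL_MARKERS:
                value = None  # normalize all null markers to one value
        record[std_col] = value
        lineage.append(f"{source}.{src_col} -> {std_col}")
    # Keep a note of where each standard field came from.
    record["_lineage"] = lineage
    return record
```

For instance, `to_standard({"pt_no": " 123 ", "dob": "N/A"}, FIELD_MAP, "clinic_a")` yields a record whose `patient_id` is `"123"` and whose `birth_date` and `sex` are both `None`.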

 

The purpose of the data warehouse is to store data that is needed. The questions now become: what is needed, where do we get it, and how much of it do we get?

 

Two main theories on sources of data are:

  • Get everything that you can find

  • Get only what you think you need

 

In the old days, computer storage was very much the driving concern of computer designers. On the physical storage side, data variables were kept to a minimum on the grounds that you collected only what was needed and did not want to confuse the users with more data than they needed (as determined by the designers, not the users!).

Controlling redundancies from different data sources.

However, the price of data storage continues to drop.

The effect that this has on network traffic is very great….

 

Data Quality

Data quality is one of the cornerstones of quality research. Without data quality assurance, respect for the data warehouse will be lost and a lot of time will have been wasted.

Data quality should be checked at every stage of the process beginning with the collection (input) and going until data analysis.  

Fortunately, many error correction methods can be built into a data entry program. Other errors can be corrected in a “staging area.” Unfortunately, because of the amount of data that can flow into a data warehouse, almost all “checks” must be done by the computer program.

Some of these are the following:

  • Null fields

  • No data

  • Ranges of values

  • Look up list of values

  • Cross-table inconsistencies

  • Cross-field inconsistencies

  •  Rules for valid entries

  • Accounting balance inconsistencies

  • Inventory checks

  • Personnel checks

  • Spelling

  • Comparison with last entries

  • Comparison with “average” values

  • Time stamps

  • Input personnel
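A few of the checks listed above can be sketched as follows. This is a minimal illustration in Python, not part of the original system: the field names (`patient_id`, `visit_date`, `birth_date`, `age`, `sex`), the age limit of 120 and the look-up list are all assumptions chosen for the example.

```python
from datetime import date

VALID_SEX = {"F", "M"}  # hypothetical look-up list of allowed values


def check_record(rec: dict) -> list[str]:
    """Return a list of problems found in one record (empty if clean)."""
    errors = []
    # Null fields / no data: required fields must be present.
    for field in ("patient_id", "visit_date"):
        if rec.get(field) in (None, ""):
            errors.append(f"{field}: missing")
    # Range of values.
    age = rec.get("age")
    if age is not None and not 0 <= age <= 120:
        errors.append("age: out of range 0-120")
    # Look-up list of values.
    sex = rec.get("sex")
    if sex is not None and sex not in VALID_SEX:
        errors.append("sex: not in look-up list")
    # Cross-field inconsistency: nobody is seen before being born.
    birth, visit = rec.get("birth_date"), rec.get("visit_date")
    if birth and visit and birth > visit:
        errors.append("birth_date: later than visit_date")
    return errors
```

Returning a list of messages rather than stopping at the first failure lets the input personnel see every problem with a record in one pass.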

What do you do about the problems that you find?

  • Have the input personnel correct the data before they enter it

  • Set it aside and look up the correct information

It is best to correct the data at the source before any other services are performed.
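When a problem cannot be fixed at the source, the "set it aside" option can be sketched as a small staging step. The following Python is a hedged illustration only: `stage`, `require_patient_id` and the record layout are hypothetical names invented for the example.

```python
def stage(records, validate):
    """Split records into those ready to load and those held for correction.

    `validate` is any function that returns a list of problems for a record;
    an empty list means the record is clean.
    """
    ready, held = [], []
    for rec in records:
        problems = validate(rec)
        if problems:
            # Keep the record together with the reasons it was held back.
            held.append({"record": rec, "problems": problems})
        else:
            ready.append(rec)
    return ready, held


def require_patient_id(rec):
    # Hypothetical rule used only to demonstrate the staging function.
    return [] if rec.get("patient_id") else ["patient_id: missing"]
```

Held records stay out of the warehouse until someone looks up the correct information, while clean records continue through the remaining services.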

Standards

  • Patient Identification Number

  • Staff Job Description

  • Terminology

 

Replication

In almost all cases where communication is not guaranteed, data replication is a better solution than distributed transactions because it is more reliable and cheaper. Distributed transactions, which rely on the two-phase commit protocol, become very difficult as the number of participating sites increases. The probability that a local part of a distributed transaction will fail is very high because of the unreliable communication.

  • Distribute the Workload Across Multiple Servers

  • Improve Data Availability

  • To move specific subsets of data from a central database server to other databases

  • Tuning and Optimization Through Separation of Online Transaction Processing (OLTP) and Decision Support Systems (DSS)

  • Reduced Network Traffic

  •  Optimized support for changing organizational models provided by flexible distributed data model
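The earlier claim that distributed transactions become harder as sites are added can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that every participating site is reachable independently with the same probability; real links are rarely that simple.

```python
def commit_success_probability(link_availability: float, sites: int) -> float:
    """Chance that every site in a distributed transaction is reachable.

    Assumes (hypothetically) that sites fail independently and share the
    same availability, so the probabilities simply multiply.
    """
    return link_availability ** sites
```

Under these assumptions, links that are up 90% of the time give a five-site transaction only about a 59% chance of reaching every participant (0.9 to the fifth power is 0.59049), whereas replication lets each site proceed and synchronize later.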

 

The field researcher is a good candidate for the use of merge replication. 

 

Example:

During the day, the researcher can use a laptop to query all necessary information from the database (family history, medical history, etc.) in order to make better decisions concerning clients. Afterwards, in the office or at home, he or she again uses the laptop to transmit data to the main database.
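The idea of merging the laptop's changes back into the main database can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the algorithm used by any particular database product: it assumes each record carries a modification timestamp and resolves conflicts by keeping the most recent version ("last writer wins").

```python
def merge_last_writer_wins(main: dict, laptop: dict) -> dict:
    """Merge two record stores, keeping the newer version of each record.

    Each store maps a record key to a (modified_at, data) pair. The inputs
    are left unchanged; a new merged store is returned.
    """
    merged = dict(main)
    for key, (modified_at, data) in laptop.items():
        # Take the laptop's version if the record is new or more recent.
        if key not in merged or modified_at > merged[key][0]:
            merged[key] = (modified_at, data)
    return merged
```

A production system would need a more careful conflict policy, since clocks on field laptops may disagree and two edits to the same record may both deserve to be kept.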

 

Problems specific to Replication of data:

Communication lines in developing countries

 

Backup and Restore

Using a computer different from that used as the operational computer provides backup to the operational data.

  • Standby Server

  • Clustering

Concurrency

By the very nature of the warehouse, the data in the warehouse is not current. How current the data must be will determine the methods for transferring the data from the source.  

            

Immediate Guaranteed consistency (two-phase commit)

Distributed transactions:

Not appropriate in developing countries because of the requirement that all database servers must be up and running at all times.

Low latency (short time between updates)

  • Surveillance systems

  • Inventory of medical supplies

  • Client load

           

High Latency (long time between updates) 

 

Updating

  • Snapshot
  • Transactional monitoring
  • Replication
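The difference between a snapshot update and a change-based update can be sketched as follows. This Python is an illustration under stated assumptions: the stores are plain dictionaries, and the change log format (sequence number, operation, key, value) is hypothetical.

```python
def snapshot_update(source: dict) -> dict:
    """Snapshot: replace the whole target with a fresh copy of the source."""
    return dict(source)


def apply_changes(target: dict, change_log: list, last_applied: int) -> int:
    """Apply logged changes newer than `last_applied`, in sequence order.

    Each log entry is (sequence_number, operation, key, value), where
    operation is "put" or "delete". Returns the new high-water mark.
    """
    for seq, op, key, value in sorted(change_log, key=lambda entry: entry[0]):
        if seq <= last_applied:
            continue  # already applied during an earlier update
        if op == "put":
            target[key] = value
        elif op == "delete":
            target.pop(key, None)
        last_applied = seq
    return last_applied
```

A snapshot is simple but moves all the data every time, which suits high-latency needs. Applying only the logged changes moves far less data per update and therefore fits low-latency needs over slow lines.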

Language translation:

 
