|
|
|
Index - Major Sections
Site Map
Product and Services _______________ Index - Same Level Subject
Index - Child Subjects |
IntroductionThe total quantity of data being collected today has been estimated to be doubling every two to eight months. How is this data collected, how is this data stored, and how is this data used is the subject of databases is the subject of this section.
The failure to understand that there are difference database systems that serve different purposes is one of the major reasons for the failure to realized the full benefits from a database system. Problems with legacy database systems
With traditional
file systems, up to 80 percent of a typical application program is taken up
with code related to file access (including file definition, record
description and valid values. The programmer has to code the field starting
positions, lengths, and data types. Every time that any data layout changes,
every application program that access that file had to be rewritten.
This means that
it was almost impossible for organizations to share data. One organization
would not necessarily have data formats as other organizations.
Redundant data.
As you can see in the example table, several names and descriptions are continuously repeated. This increases the amount of physical storage required and more importantly the time it takes to take and input the data Repeating groups (limited data tracking) In any design of a flat file while a question may have more than one answer repeating files must be added into the table design. This is hard to code in the application and waste space when the files are not all filed out. At the worst, if enough repeating groups are not designed into the system, information may be left out Inconsistent data When taking data from the same source multiply times, it is very likely that data will be misspelled when a new record is added. As more records are added to the file, the potential for inconsistent data from typing errors increases
Update anomalies Suppose that you realize that you need to change a class or category that has been defined in your data collection. In a flat file you must go through every record individually in order to insure that all the records are updated correctly Delete Anomalies Suppose that you are no longer interested in a class or category that has been defined in your data collection. If all records with this class or category are deleted other important information may be deleted.
Insert Anomalies Suppose that you want to add a new field to your data. Will you have to go back and recode all of your data or do you insert a special code such as unknown for the field in the older records.
Losing the temporal nature of data
Storage of the data
Ability to reuse the data
Determining the relationship between data attributes More research data is collected
without the ability to use the relationship between data. Relational DatabasesThe Relational Database solved the problems of the legacy systems. Types of Database ProcessingThe early databases were designed around one database such that the whole organization was able to use it for all their processing needs. Many of the Enterprise Resource Planning (ERP) systems still use this system. Today it is recognized that a more efficient system can be created by creating one database that serves operational needs and another that serves informational or decision support needs. Operational (Transactional) Operational databases are used to created applications used in the day-to-day operation of the organization. These include scheduling, invoicing, inventory control and other applications that are considered mission critical. They generally require a 1-3 second response time 100% of the time. Operational databases generally use the relational database management system, although object orientated databases are arriving on the scene. Characteristics
Informational (Data Warehouse)
CharacteristicsThe Informational Databases has these characteristics:
Design of the Data WarehouseDenormalization means that you are purposely designing your database so that it is not normalized. This is usually done to maximize performance or to simplify end-user reporting. In a normalized database a query may require many joins to process the request. These additional table joins can be expensive in terms of computer time. More will be said about denormalization in the topic on Data warehousing. Renormalization Techniques
DesignGranularity:
Granularity is the major design issue in the data warehouse environment. It profoundly affects the volume of data that resides in the data warehouse and at the same time affects the type of query that can be answered. The volume of data in a warehouse is traded off against the level of detail of a query. (I-45) Partitioning:Refers to the breakup of data into separate physical units the can be handled independently (I-55). Partitioning gives the user finer detail (I-58) Date Type of symptoms Type of results By geography By age etc The separation of operational and informational databases occurs for many reasons: (Inmon, 1996)
Research IssuesIn general, levels and classifications are not important. Since in the Relation Data, data is collected at the lowest possible level, classifications can be obtained at any time and at any level by "mapping." |
|
|