<< 如何选择主键:业务主键还是代理主键? | 首页 | MySQL性能调优:最佳实践,FAQs,How-To,文章,博客 >>

数据库模型Schema设计的必要考虑

Considerations for schema design

The schema design for a database affects its usability and performance in many ways, so it is important to make the initial investment in time and research to design a database that meets the needs of its users. This section is not intended to provide a detailed guide to database design, but only to present some ideas to consider in designing a database.

A well-designed schema takes into account the following considerations:

  • What are the processes of the business?

    Identify the main processes of the business; for example, taking orders for the product, filling out insurance claims, or tracking promotions. These processes are different for every business, but they must be clearly identified and defined in order to create a useful database. The people who know the processes are the people who work in the business, and interviews are essential to determine these processes.

  • What do the users want to accomplish with the database?

    The database should reflect the business, both in what it measures and tracks and in the terminology used to describe the facts and dimensions of the business. Interviews with managers and users will reveal what they want to know, how they measure the business, what criteria they use to make decisions, and what words they use to describe these things. This information helps determine the contents of the fact and dimension tables.

  • Where will the data come from?

    The data to populate the tables in the database must be complete enough to be useful and must be valid, consistent data. An analysis of the proposed input data and its sources will reveal whether the available data can support the proposed schema.

  • What are the dimensions of the business and their attributes that will be reflected by the dimension tables?

    Independent dimensions should be represented by separate tables. If dimensions are not independent, they can be combined in a single table. Attributes are usually textual and discrete values; for example, product descriptions or geographic locations. They are used to form query constraints and to determine report breaks. The interviews and data analysis will provide guidance in setting up these tables.

  • Are the dimensions going to change over time?

    If a dimension changes frequently, it probably should be measured as a fact, not stored as a dimension.

  • What facts should be measured?

    Facts are usually numerical and continuous values; for example, revenue or inventory. Facts that are additive can be summed to produce valid measures in reports. For example, sales for each month are additive and can be summed to produce year-to-date totals. Month-end inventory balances, however, are not additive in the sense that a yearly total of month-end inventory balances is of dubious value, but a monthly average might be meaningful.

  • Is a family of fact tables needed?

    Facts that are measured with different dimensions or use different timing should be stored in separate tables. For example, a single database can be used for orders, shipments, and manufacturing. Although the facts measured in each area of the business are different, they share some but not all of the same dimensions.

  • What is the granularity of the facts?

    Granularity refers to the level of detail of the information stored in each row of the fact table. Each row should hold the same type of data. For example, each row could contain daily sales by store by product or daily line items by store.

    Differing data granularities can be handled by using multiple fact tables (daily, monthly, and yearly tables) or by modifying a single table so that a granularity flag (a column to indicate whether the data is a daily, monthly, or yearly amount) can be stored along with the data. Also consider the amounts of data, space, and performance requirements in deciding how to handle different granularities.

  • How will changes be handled, and how important is historical information?

    If change occurs infrequently and if historical information is not very important, dimension tables can be modified to reflect only the new reality without any loss of useful data. However, if previous history is important, dimension tables can be modified to reflect both the old and new conditions. If a dimension changes frequently, perhaps it should be considered time dependent and include a time-based attribute; for example, month, quarter, or year.

Schema building blocks

The following figures illustrate some common schema examples. Tables named Fact or Factx represent fact (referencing) tables. The other tables represent dimension (referenced) tables. The following figures apply to both single-star and multistar schemas.

A schema can consist of a single dimension table.

Figure 19. Single dimension table
The Market table

A schema can be a star schema with one fact table and one dimension table.

Figure 20. Star schema with one dimension table
The Market table connects to the Fact table.

A schema can be a star schema with one fact table and several dimension tables.

Figure 21. Star schema with several dimension tables
The Period, Product, and Market tables connect to the Fact table.

A schema can be a multiple star schema, with a family of fact tables that share some, but not necessarily all, dimension tables.

Figure 22. Star schema with several fact tables
The Period table connects to the Fact1 and Fact1a tables. The Product table connects to the Fact1, Fact1a, and Fact2 tables. The Market table connects to the Fact1, Fact1a, and Fact2 tables. The SalesPerson table connects to Fact2.

A schema can be an extended star schema with dimension tables that reference other dimension tables (outboard tables).

Figure 23. Star schema with outboard tables
Period, Product, and Market are connected to Fact1. Country and Type both connect to Market.

A schema can be a star schema with a fact table that contains multiple foreign keys that reference single dimension tables.

Figure 24. Star schema with multiple foreign keys
Period has two connections to Fact 1. Product has three connections to Fact 1. Market has a single connection to Fact1.

Example: Salad dressing database

This example illustrates how the schema design affects both usability and usefulness of the database.

This database tracks the sales of salad dressing products in supermarkets at weekly intervals over a four-year period and is a typical consumer-goods marketing database. The salad dressing product category contains 14,000 items at the universal product code (UPC) level. Data is summarized for each of 120 geographic areas (markets) in the U.S. and for each of 208 weekly time periods spanning four years.

The salad dressing database has one fact table, Sales, and three dimension tables: Product, Week, and Market, as illustrated in the following figure.

Figure 25. Salad dressing database example
Product table (14,000 records) connects to Sales table (3,800,000 records). Period table (208 records) and Market table (120 records) connect to Sales table.

Each record in the Sales fact table contains a field for each of the three dimensions: Product, Period, and Market. The columns in the Sales table containing these fields are the foreign keys whose concatenated values give each row in the Sales table a unique identifier. Sales also contains seven additional fields that contain values for measures of interest to market analysts.

Each dimension table describes a business dimension and contains one primary key and some attribute columns for that dimension.

标签 :



发表评论 发送引用通报