SQL Database Design and Performance
A normalized database prevents functional dependencies in the data so that updating the database is easy and efficient. However, querying the database might require many joins of tables to combine information. As the number of join tables increases, the query running time increases significantly. Therefore, a normalized database might not always be the best choice. A database with the appropriate amount of denormalization reduces the number of tables that have to join together, without adding too much complication to the updating process. This is frequently a good compromise.
There are other kinds of database denormalization, also. For example, suppose you have two tables in a database: Orders and Order Details. The Orders table contains information about a customer's order. The individual products in each order are contained in the Order Details table. Suppose you want to query the total dollar amount for each order. First you have to determine the dollar amount for each product (units * unit price – applicable discount). Then you have to group the amounts by order. Here is what the query looks like:
SELECT "Order ID", SUM("Unit Price" * Quantity * (1.0 - Discount))
AS Total FROM "Order Details"
GROUP BY "Order ID"
Order ID Total
----------------------------------------
10000 108
10001 1363.15000915527
10002 731.800003051758
10003 498.180023193359
10004 3194.19999694824
10005 173.400009155273
10006 87.2000007629395
The calculation for this query is not trivial. For a large set of orders, the query can take a long time to run. The alternative is to calculate the dollar amount of the order at the time it is placed, and then store that amount in a column within the Orders table. With this approach, you have to query only the pre-computed column to return the information that you need:
SELECT "Order ID", "Order Total" AS Total FROM Orders
When you design your tables, it helps to understand the tradeoffs of using variable length columns versus fixed length columns. Variable length columns reduce database size because they take only what is required to store the actual value. Fixed length columns always take maximum space defined by the schema, even when the actual value is empty. The downside for variable length columns is that some operations are not as efficient as those on fixed length columns. For example, if a variable length column starts small and an UPDATE causes it to grow significantly, the record might have to be relocated. Additionally, frequent updates cause data pages to become more fragmented over time. Therefore, we recommend that you use fixed length columns when data lengths do not vary too much and when frequent updates are performed.
The number of rows that a page can hold depends on how compact each row is. A page can hold more rows if the rows are smaller. Therefore, a single disk operation on a table with compact rows can retrieve more rows, making the operation more effective. Additionally, more rows fit in the storage engine cache, potentially improving the hit ratio. Compact rows also prevent wasted space on data pages. This is common with larger rows.
Consider this extreme example: If the record size is somewhat bigger than half of a data page, almost half the space on each data page is wasted.Some database designers opt for wide table design and port their mainframe database schema down to the device. This might not be an efficient design. One possible approach is to break up the most critical tables. Suppose you have a table that has some columns with very stable values and others that change frequently. It makes sense to split the table into two: one with the frequently-referenced columns, and the other with the stable columns. By creating two tables, you have all the benefits of smaller row length. The tradeoff is that a join is required to combine the information.
An index is an ordered subset of the table on which it is created. It permits fast range lookup and sort order. Smaller index keys take less space and are more effective that larger keys. It is a particularly good practice to make the primary key compact because it is frequently referenced as a foreign key in other tables. If there is no natural compact primary key, you can use an identity column implemented as an integer instead.
An index with one or only a few key columns is called a narrow index. An index with many key columns is called a wide index. A wide index is frequently associated with a large key length. An extreme example is an index that includes every column in the table. By creating such an index, you effectively make a duplicate of the original table. This is inefficient, both in terms of database size and query performance.
When you create a publication in SQL Server 2008, you can choose two options for the articles in your publication. These options, which control filtering and the flow of data between the Publisher and the Subscriber, are:
- Download Only (Read only)
There might be situations where the data you want to have on your smart device is stored in lookup tables only. For example, your users might have to browse the company directory on their smart device, but should not be able to edit and change this information. The Download-only article type is suited for this usage. It is smaller, because no metadata is stored on the device, and reduces network traffic during synchronization. - Well-partitioned
With well-partitioned articles in SQL Server 2008, the changes that are uploaded are mapped to only the single partition ID. This is faster, but has several limitations:- Each row in the article must belong to only one partition ID.
- Each article can only be published to a single publication.
- The subscriber cannot insert rows that do not belong to its partition ID.
- Filtering is also affected when using well-partitioned articles. The following limitations apply to filtering:
- A subscriber cannot update columns that are involved in filtering.
- In a join filter hierarchy, a regular article cannot be the parent of a well-partitioned article.
- The join filter in which a well-partitioned article is the child must have the join_unique_key set to a value of 1.
- Each article can have only one subset or join filter. The article can have a subset filter and be the parent of a join filter, but cannot have a subset filter and be the child of a join filter.
0 comments:
Post a Comment