Thursday, 20 June 2013

SQL Server Database Design Practices and Checklist

Talking with customers and solving their real world problems is something I love doing. This is my day job and I never get tired of this. As we meet customers almost every single day we also have our own feel good factors of suggestions. I am not calling these as best practices explicitly because these are general guidelines and there can be exception to these in your environment for the data set and project that you are working. So read this blog post keeping them in mind. Don’t use them something written on stone but as a mini-checklist used to meet your needs.

This will be Part I of many more parts to come in the future as I gather my thoughts on various suggestions I give customers. Do feel free to drop me a line if you want something covered here.

* These are my perspectives of various DO’s and DONT’s. These can be different and you might have your very own standards. These are *general* tips and can be overridden based on the application under question.
Database Design Tips
This is a high level summary of a number of tips discussed later in this blog.
  1. Do create tables with a primary key.
  2. Do Not create tables without clustered index (typically we will have clustered primary keys).
  3. Do Not enforce foreign key relationships in the database with code. Always use Referential Integrity and foreign key declarations.
  4. Do Not develop a system by writing any code before the database schema is modeled and approved by design team.
  5. Do Not reverse-engineer the database schema after the system is built.
  6. Do have consistent data types between columns that are in a primary/foreign key relationship.
  7. Do make database objects descriptive.
Physical DB Model
A physical data model graphically represents the implementation of a database. It is constrained both by the logical data model it represents and the features of the underlying DB. The process of translating a logical model into a physical model includes the following tasks:
  1. Do create “legal” DB names for entities and attributes.
  2. Do assign appropriate data types and null-ability for each attribute.
  3. Do define primary keys, foreign keys and indexes.
  4. Do define rules and defaults.
  5. Do de-normalize if necessary for performance (typically in a Datamart or data Wharehouse database).
Design for Performance
As mentioned earlier, know what is the type of application we are designing. Is it an OLTP, OLAP or Reporting application. Based on this, designs can vary by miles.
  1. Do define the Primary Key on a single integer column which is typically implemented with an identity property. This keeps the dependent tables with a narrower row width and therefore, more rows per page which results in fewer scans.
  2. Do define the Primary Key as clustered. This keeps the secondary indexes much smaller.
  3. Do create non-clustered indexes on Foreign Key columns.
  4. Do make sure the tables have statistics. Keeping “Auto Update Statistics” and “Auto Create Statistics” options ON in the database is the default. For read only databases, it may be necessary to manually create statistics.
  5. Do select the correct “Recovery Model” for a database.
  6. Do use 100% Fill Factor on read-only or seldom updated tables.
  7. Do Not over-index, make sure you know which indexes will be used.
Naming Conventions
A database consists of a collection of tables that contain data and other objects, such as views, indexes, stored procedures, and triggers, defined to support activities performed with the data. The data stored in a database is usually related to a particular subject or process.
Database objects should be named clearly and unambiguously. The name should provide as much information as possible about the object. A name is used to uniquely identifies the object. Make sure this naming is consistent wherever the object is used throughout the database. Saving keystrokes should never be used as a criterion in selecting a name.
The following rules should be followed for all database object names:
  1. Do Not abbreviate unless absolutely necessary due to length restrictions.  If abbreviations are used, it should remain consistent throughout a project and be documented.
  2. Do Not use special characters when creating database objects.
  3. Do restrict names to alphanumeric characters.
  4. Do make the database objects descriptive.
  5. Do use Pascal Naming by forming names by capitalizing the first letter of each word without embedded spaces (e.g., InvoiceDetail).
Abbreviation Standards
  1. Do avoid using abbreviations.
  2. Do Not use a different letter to start an abbreviation then the first letter of the word being abbreviated (i.e., XFER is not an acceptable abbreviation for Transfer).
  3. Do eliminate vowels first, followed by the least significant consonants of the word.
  4. Double letters can be eliminated unless they are required for clarity.
  5. Do Not use an abbreviation that strongly suggests another word. (Example: the word “parent” could be abbreviated PRNT but this would strongly suggest the word “print.”. A better abbreviation for parent is PARNT.)
  6. Do use consistent abbreviations of similar words. If Charge is CHRG then Change should be CHNG.
  7. Do Not use acronyms if the acronym is not generally known or is subject to multiple interpretations.
  8. Do use abbreviations consistently in the system.
  9. Do Not try to deviate from consistency. I have seen systems mark as _dt for some date field while the datatype is of Varchar. These are great confusions and a recipe for disaster.
I have seen a number of database designs and some of the abbreviations are pathetic and an eye sore. Sometimes when I read them and turn towards the developer – they carry a smile with a look that says: “I didn’t design or name the system :)”.
**** Below are a bunch of naming conventions I have seen. You are free to use them or even ignore them if they don’t fit the bill of your organization / project conventions. ****
Database Names
Database names should use the same convention as described above for database objects: no special characters, alphanumeric characters only and initial capitals for delimiters between words.
  1. Do Not use DB as a suffix for Database name.
  2. Do Not make the database name plural, Example: Use LegalPartner not LegalPartners.
  3. Do use qualifier suffixes for database names like ODS, DataMart, Factory, Staging, Extract and Archive for special type of databases.
  4. Do make the database name mnemonic and/or descriptive.
  5. Do use Pascal Naming by forming names by capitalizing the first letter of each word without embedded spaces or underscores (e.g., LegalPartner).
Table Names
  1. Do Not make the table name plural, Example: Use Organization not Organizations.
  2. Do make the table name mnemonic and/or descriptive.
  3. Do use Pascal Naming by forming names by capitalizing the first letter of each word without embedded spaces or underscores (e.g., BillingDetail).
  4. Do name Associative tables, tables created to resolve a one to many or many to many cardinality, a concatenation of the two referenced table names.
  5. Example: PersonAddress resolves many to many between the Person and Address tables.
Personally I am not a big fan of prefix of tbl to all the tables as it can be easily got from meta data of sys.tables in one shot easily.
The same principle applies to Views. But it is worthwhile to prefix the view based on where it is being used. This can be standard view (vw), Reporting View (rv), ETL view (etl) etc. So feel free to use them appropriately.
Column Names
This can get little complicated and specific to how people design systems. I like a three part name: [Primary][Qualifier][Classifier]. As the names suggests the Primary describes the column a bit more like in EnglishProductName or AlternateProductID. The qualifier can be the descriptive word like in: InventoryInHandQty. The final classifier can give me an idea of what datatype we might be using Qty = Quantity = Integer, ID = Identifier = BigInt, Dt = Data datatype etc.
  1. Do Not use spaces or special characters in column names.
  2. Do use underscores ONLY when making Foreign Key columns that require a Modifier to make the column name more understandable.
  3. Do Not abbreviate the column name.
  4. Do use the table name primary word when creating a primary key.
  5. Do use “ID” as a column name suffix, if it is an INTEGER identifier. If it’s a VARCHAR column it should end with “Code”.
The following attributes can be used where tracking when/who changed a row of data is required.
RowStatus_OptionID – (Optional) Used to mark row as Active/InActive.
Other General recommendation
  1. Do Not use spaces or special characters, underscores are the exception for stored procedures.
  2. Do Not name stored procedure with numbers (e.g. spc_myProc666).
  3. Do spell out the whole column name when defining Indexes.
  4. Do Not number the indexes.
  5. Do use the word “All” for an index that covers all columns in the table.
  6. Do use underscores between the column names help readability.
  7. Do name Primary Keys with PK_ followed by the table name, only.
  8. Do name Foreign Keys with FK_ followed by the child table name, then the column name.
  9. Do Not name Foreign Keys with number suffixes, as in FK_Table_Column01.
  10. Do Not use Double Underscores.
  11. Do name Check Constraints with CK_ followed by the table name, then a colum name.
Final Words
I know this is just a subset of what is available in your development checklist. These are just design list that I have made. The next will be add a security list, development list, deployment list, high-availability practices list etc. I think those will follow in the future.
Post a Comment