Feel Tech Evolution: 04/01/2012

Friday, 27 April 2012

Introduction to Transact SQL User-Defined Functions

Introduction

SQL Server 2000 supports User-Defined Functions (UDFs) - one or more Transact-SQL statements that can be used to encapsulate code for reuse. User-defined functions cannot make a permanent changes to the data or modify database tables. UDF can change only local objects for this UDF, such as local cursors or variables.

There are three types of UDF in SQL Server 2000:

Scalar functions

Inline table-valued functions

Multistatement table-valued functions

Scalar functions return a single data value (not a table) with RETURNS clause. Scalar functions can use all scalar data types, with exception of timestamp and user-defined data types.

Inline table-valued functions return the result set of a single SELECT statement.

Multistatement table-valued functions return a table, that was built with many TRANSACT-SQL statements.
User-defined functions can be invoked from a query like built-in functions such as OBJECT_ID, LEN, DATEDIFF, or can be executed through an EXECUTE statement like stored procedures.

1 . Scalar Functions

Scalar functions return a data type such as int, money, varchar, real, etc. They can be used anywhere a built-in SQL function is allowed. The syntax for a scalar function is the following:




CREATE FUNCTION [owner_name.] function_name
 ( [{ @parameter_name  scalar_parameter_type [ = default]} [,..n]])
RETURNS scalar_return_type
[WITH <function_option> >::={SCHEMABINDING | ENCRYPTION]
[AS]
BEGIN
 function_body
 RETURN scalar_expression
END


A simple scalar function to cube a number would look like this:


CREATE FUNCTION dbo.Cube( @fNumber float)
 RETURNS float
AS
BEGIN
 RETURN(@fNumber * @fNumber * @fNumber)
END


Surprisingly, user-defined functions (UDFs) support recursion.  Here is an
SQL Server 2000 UDF using the standard factorial example:


CREATE FUNCTION dbo.Factorial ( @iNumber int )
RETURNS INT
AS
BEGIN
DECLARE @i int


 IF @iNumber <= 1
  SET @i = 1
 ELSE
  SET @i = @iNumber * dbo.Factorial( @iNumber - 1 )
RETURN (@i)
END

On Other Example

The following scalar function returns a maximum amount of books sold for a specified title. This function should be executed in pubs database. If the title has no sales, the UDF will return zero.
/* scalar function to return max amount of books sold ** for the specified title_id */ CREATE FUNCTION dbo.udf_max_copies_sold_for_title (@title_id CHAR(6)) RETURNS INT AS BEGIN

DECLARE @qty INT
-- initialize the variable at 0:
SELECT @qty = 0

SELECT 
    @qty = MAX(qty) 
FROM sales 
WHERE
title_id = @title_id

/* If there are no books sold for title_id specified 
** then return 0:
*/
RETURN ISNULL(@qty, 0)

END

Now we can execute this function as follows:

SELECT dbo.udf_max_copies_sold_for_title ('bu2075')

Results:

----------- 
55

2. In-Line Table Functions

In-line table functions are functions that return the output of a single SELECT statement as a table data type. Since this type of function returns a table, the output can be used in joins of queries as if it was a standard table. The syntax for an in-line table function is as follows:






CREATE FUNCTION [owner_name.] function_name
 ( [{ @parameter_name  scalar_parameter_type [ = default]} [,..n]])
RETURNS TABLE
[WITH <function_option>::={SCHEMABINDING | ENCRYPTION}]
RETURN [(] select_statement [)]


An in-line function to return the authors from a particular state would
look like this:


CREATE FUNCTION dbo.AuthorsForState(@cState char(2) )
RETURNS TABLE
AS
RETURN (SELECT * FROM Authors WHERE state = @cState)

Other Example

The following in-line function returns the product name, quantity ordered, and supplier for the top five best sellers in a particular year in the Northwind database:

/* in-line function to return quantity, product name and supplier
** for the top 5 best-selling products within the specified year
*/

CREATE FUNCTION dbo.udf_top_5_best_sellers (@year INT)
RETURNS TABLE
AS
RETURN

SELECT TOP 5 
     SUM(quantity) AS quantity_ordered, 
     a.ProductID, 
     ProductName, 
     CompanyName AS Supplier
FROM [order details] a INNER JOIN products b
   ON a.productid = b.productid
   INNER JOIN suppliers c ON c.supplierid = b.supplierid
   INNER JOIN orders d ON d.orderid = a.orderid
      AND DATEPART(YEAR, OrderDate) = @year
GROUP BY a.productid, productname, CompanyName
ORDER BY 1 DESC

We can execute this function as follows:

SELECT * FROM dbo.udf_top_5_best_sellers (1998)

Results:

quantity_ordered	ProductID	ProductName	Supplier
659	13	Konbu	Mayumi's
546	24	Guaraná Fantástica	Refrescos Americanas LTDA
542	60	Camembert Pierrot	Gai pturage
513	59	Raclette Courdavault	Gai pturage
414	7	Uncle Bob's Organic Dried Pears	Grandma Kelly's Homestead

Alternatively, we can SELECT only the desired columns from this function:

SELECT ProductID, 
        ProductName 
FROM dbo.udf_top_5_best_sellers (1998)

Results:

ProductID	ProductName
13	Konbu
24	Guaraná Fantástica
60	Camembert Pierrot
59	Raclette Courdavault
7	Uncle Bob's Organic Dried Pears

3. Multistatement Table Functions
Multistatement table functions are similar to stored procedures except that they return a table. This type of function is suited to address situations where more logic is required than can be expressed in a single query. The following is the syntax for a multistatement table function:




CREATE FUNCTION [owner_name.] function_name
 ( [{ @parameter_name  scalar_parameter_type [ = default]} [,..n]])
RETURNS TABLE
[WITH <function_option> >::={SCHEMABINDING | ENCRYPTION]
[AS]
BEGIN
 function_body
 RETURN
END

Hierarchical data, such as an organizational structure, is an example of data that cannot be gathered in a single query. The Northwind Company database's Employees table contains a field called ReportsTo that contains the EmployeeID of the employee's manager. GetManagerReports is a multistatement table function that returns a list of the employees who report to a specific employee, either directly or indirectly.




CREATE FUNCTION dbo.GetManagerReports ( @iEmployeeID int )
RETURNS @ManagerReports TABLE
   (
    EmployeeID   int,
    EmployeeFirstName       nvarchar(10),
    EmployeeLastName   nvarchar(20),
 Title    nvarchar(30),
 TitleOfCourtesy   nvarchar(25),
 Extension   nvarchar(4),
    ManagerID   int
   )
AS
BEGIN


 DECLARE


@iRowsAdded int,   -- Counts rows added to
-- table with each iteration
 @PREPROCESSED   tinyint,  -- Constant
for record prior
-- to processing
 @PROCESSING   tinyint,  -- Constant
for record
-- being processed
 @POSTPROCESSED   tinyint  -- Constant for
records that
-- have been processed


 SET @PREPROCESSED   = 0
 SET @PROCESSING   = 1
 SET @POSTPROCESSED   = 2


 DECLARE @tblReports TABLE (
-- Holds employees added with each pass thru source employees table
  EmployeeID    int,
     EmployeeFirstName        nvarchar(10),
     EmployeeLastName    nvarchar(20),
  Title     nvarchar(30),
  TitleOfCourtesy    nvarchar(25),
  Extension    nvarchar(4),
     ManagerID    int,
  ProcessedState    tinyint
 DEFAULT 0
  )




 --Begin by adding employees who report to the Manager directly.
 INSERT INTO @tblReports
 SELECT EmployeeID, FirstName, LastName, Title, TitleOfCourtesy,
Extension, ReportsTo, @PREPROCESSED
  FROM Employees
 WHERE ReportsTo = @iEmployeeID


 --Save number of direct reports
 SET @iRowsAdded = @@ROWCOUNT


 -- Loop through Employees table until no more iterations are necessary
 -- (e.g., no more rows added) to add all indirect reports.
 WHILE @iRowsAdded > 0
 BEGIN
  --Set just added employees ProcessedState to PROCESSING
-- (for first pass)
  UPDATE @tblReports
   SET ProcessedState = @PROCESSING
  WHERE ProcessedState = @PREPROCESSED


  --Add employees who report to Managers in
-- ProcessedState = PROCESSING
  INSERT INTO @tblReports
SELECT e.EmployeeID, e.FirstName, e.LastName, e.Title,
e.TitleOfCourtesy, e.Extension, e.ReportsTo, @PREPROCESSED
   FROM Employees e
INNER JOIN @tblReports r ON e.ReportsTo = r.EmployeeID
  WHERE r.ProcessedState = @PROCESSING
   AND e.ReportsTo <> @iEmployeeID




  --Save number of rows added for this iteration
  SET @iRowsAdded = @@ROWCOUNT


--Set ProcessedState to POSTPROCESSED for Managers whose
--reports were added in this iteration
  UPDATE @tblReports
   SET ProcessedState = @POSTPROCESSED
  WHERE ProcessedState = @PROCESSING
 END


 --Save all data to output table
 INSERT INTO @ManagerReports
SELECT EmployeeID, EmployeeFirstName, EmployeeLastName, Title,
TitleOfCourtesy, Extension, ManagerID
  FROM @tblReports
    RETURN
END

The output of this function would be used in the same manner as a standard table. Figure 1 demonstrates JOINING the output of GetManagerReports with the Employees table to produce a listing of the organizational structure of the Northwind Company:
User function used in JOIN query

Figure 1: User function used in JOIN query.

Other Example

The following multi-statement function accepts a delimited list as a parameter parses the string and returns the table containing each of the values in the parameter.

/* multi-statement function to parse
** a delimited list and turn it into a rowset
** DEFAULT delimiter is comma
*/

CREATE FUNCTION dbo.parse_comma_delimited_integer(
    @list    VARCHAR(8000), 
    @delimiter VARCHAR(10) = ',')

-- table variable that will contain values
RETURNS @tablevalues TABLE (
    item INT)
AS
BEGIN 
  DECLARE @item VARCHAR(255)
 
  /* Loop over the commadelimited list */
  WHILE (DATALENGTH(@list) > 0)
    BEGIN 
      IF CHARINDEX(@delimiter,@list) > 0
        BEGIN
          SELECT @item =
             SUBSTRING(@list,1,(CHARINDEX(@delimiter, @list)-1))
          SELECT @list =
                SUBSTRING(@list,(CHARINDEX(@delimiter, @list) +
                DATALENGTH(@delimiter)),DATALENGTH(@list))
        END

     ELSE
        BEGIN
          SELECT @item = @list
          SELECT @list = NULL
        END
 
   -- Insert each item into temp table
    INSERT @tablevalues (
        item)
    SELECT item = CONVERT(INT, @item) 
   END

 RETURN
END

Now we can execute this function as follows:

SELECT * FROM dbo.parse_comma_delimited_integer 
('39, 549, 324, 3556, 24, 2132, 345', ',')

Results:

item
39
549
324
3556
24
2132
345

The following multi-statement UDF written for the Northwind database finds customers who have bought products in large quantities during a given year. The UDF accepts the year and amount of total sales and returns the customer name, the product that they have bought in large quantities, and all sales representatives who interfaced with these customers within the given year:

CREATE FUNCTION dbo.udf_top_customers_and_reps (
  @year INT, 
  @amount INT)

RETURNS 
  @temp TABLE(
    ProductName VARCHAR(200), 
    CategoryName VARCHAR(200),
    CustomerID CHAR(5),
    CompanyName VARCHAR(200),
    TotalSales INT, 
    EmployeeNames VARCHAR(2000))

AS 
BEGIN

/* populate the temp table with customers that have purchased any product 
* with total sales greater than the specified amount, within the given year
*/

INSERT @temp (
     ProductName ,
     CategoryName ,
     CompanyName ,
     CustomerID ,
     TotalSales 
)


SELECT 
    ProductName, 
    CategoryName, 
    CompanyName, 
    b.CustomerID, 
    SUM(a.UnitPrice * quantity) AS total_sales 
FROM [order details] a 
INNER JOIN orders b ON a.orderid = b.orderid
INNER JOIN products c ON c.productid = a.productid
INNER JOIN customers d ON d.customerid = b.customerid
INNER JOIN categories e ON e.CategoryID = c.CategoryID
WHERE DATEPART(YEAR, OrderDate) = @year
GROUP BY c.ProductName, e.CategoryName, b.CustomerID, 
d.CompanyName, DATEPART(YEAR, OrderDate) 
HAVING SUM(a.UnitPrice * quantity) > @amount
ORDER BY ProductName

/* now get all the employees that have been involved with the customers
* in the given year and return them in a comma-delimited list
*/


DECLARE @CustomerID CHAR(5), 
    @EmployeeName VARCHAR(200)

DECLARE @Employees TABLE (
 EmployeeName VARCHAR(80))

DECLARE CustomerCursor CURSOR FOR
SELECT CustomerID
FROM @temp a 

OPEN CustomerCursor
FETCH NEXT FROM CustomerCursor INTO @CustomerID

WHILE @@FETCH_STATUS = 0
BEGIN

INSERT @employees
SELECT DISTINCT FirstName + ' ' + LastName 
 FROM Employees a INNER JOIN Orders b ON a.EmployeeID = b.EmployeeID
 AND DATEPART(YEAR,OrderDate) = @year
 WHERE b.CustomerID = @CustomerID


/* create a comma-delimited list of employees */

SELECT @EmployeeName = ''
 SELECT @EmployeeName = @EmployeeName + ', ' + EmployeeName
 FROM @Employees 

SELECT @EmployeeName = SUBSTRING(@EmployeeName, 3, LEN(@EmployeeName)-2)



UPDATE @temp
SET EmployeeNames = @EmployeeName
WHERE CustomerID = @CustomerID

DELETE @Employees
FETCH NEXT FROM CustomerCursor INTO @CustomerID
END
CLOSE CustomerCursor
DEALLOCATE CustomerCursor


RETURN
END

The following execution of the function returns customers who have bought any product in amounts of over $10,000 during 1997:

SELECT * FROM dbo.udf_top_customers_and_reps (1997, 10000)

Results:

ProductName	CategoryName	CustomerID	Companyname	TotalSales	EmployeeNames
Côte de Blaye	Beverages	MEREP	Mère Paillarde	10329	Janet Leverling, Laura Callahan, Margaret Peacock, Michael Suyama, Nancy Davolio, Robert King
Côte de Blaye	Beverages	SIMOB	Simons bistro	10540	Andrew Fuller, Margaret Peacock, Robert King

Invoking Functions

There are a few syntax idiosyncrasies to observe when invoking user-defined functions. SQL Server 2000 provides some system-level user-defined functions in the Master database. These system functions are invoked with a slightly different syntax than ones that you would create. System functions that return a table have the following syntax:


 




::function_name ([argument_expr], [,...])


System functions that return a scalar value use this syntax:


function_name ([argument_expr], [,...])


User-created scalar and rowset functions are invoked in exactly the same
manner.  The syntax for invoking a user-created function looks like this:


[database_name] owner_name. function_name ([argument_expr], [,...])

Limitations

User-defined functions do have some restrictions placed upon them. Not every SQL statement or operation is valid within a function. The following lists enumerate the valid and invalid function operations: Valid:

Assignment statements
Control-flow statements
Variable declarations
SELECT statements that modify local variables
Cursor operations that fetch into local variables
INSERT, UPDATE, DELETE statement that act upon local table variables

Invalid:

Built-in, nondeterministic functions such as GetDate()
Statements that update, insert, or delete tables or views
Cursor fetch operations that return data to the client

Performance Implications

Using UDFs will impact the performance of queries. The extent of the performance impact depends upon how and where you use a user-defined function. This is also true of built-in functions. However, UDFs have the potential for more dramatic performance hits than built-in functions. You should exercise caution when implementing functions in your queries and perform benchmarking tests to insure that the benefits of using your functions exceed the performance costs of using them.

Uses for Functions

Check constraints

Scalar user-defined functions can be used as check constraints for columns in table definitions. As long as an argument to the function is a constant or built-in function or an argument is the column being checked, the function may be used to validate the column's value. These UDF check constraints provide the ability to use more complex logic for determining acceptable column values than Boolean expressions or LIKE patterns would allow.
The following function validates that a serial number follows a specific pattern and portions of the serial number match a specific algorithm for a product type.


CREATE FUNCTION dbo.ValidSerialNumber( @nvcSerialNumber nvarchar(50))
RETURNS BIT
AS
BEGIN

DECLARE
@bValid  BIT,
 @iNumber  INT

 --default to invalid serial number
 SET @bValid = 0

 --Home Office Product
 IF @nvcSerialNumber LIKE
'[0-9][A-Z][0-9][A-Z][0-9][0-9][0-9][0-9]'
 BEGIN
  SET @iNumber = CONVERT(int,RIGHT(@nvcSerialNumber,4))
  IF @iNumber % 7 = 2
  BEGIN
   SET @bValid = 1
  END
 END

 -- Video Game
 IF @nvcSerialNumber LIKE '[0-9][0-9][0-9][A-Z][0-9]5[A-Z]'

 BEGIN
  SET @iNumber = CONVERT(int,LEFT(@nvcSerialNumber, 3))
  IF @iNumber % 2 = 0
  BEGIN
   SET @bValid = 1
  END
 END

RETURN ( @bValid)
END

CREATE TABLE dbo.CustomerProduct
(
 CustomerID  int  NOT NULL PRIMARY KEY,
 ProductID  int  NOT NULL,
 SerialNumber  nvarchar(20) NOT NULL
 CHECK(dbo.ValidSerialNumber(SerialNumber) = 1)
)

Computed columns

Scalar functions can be used to compute column values in table definitions. Arguments to computed column functions must be table columns, constants, or built-in functions. This example shows a table that uses a Volume function to compute the volume of a container:



CREATE FUNCTION dbo.Volume (   @dHeight  decimal(5,2),
@dLength  decimal(5,2),
@dWidth decimal(5,2) )
RETURNS decimal (15,4)
AS
BEGIN
 RETURN (@dHeight * @dLength * @dWidth )
END


CREATE TABLE dbo.Container
(
 ContainerID  int  NOT NULL
 PRIMARY KEY,
 MaterialID  int  NOT NULL
REFERENCES Material(MaterialID),
 ManufacturerID  int  NOT NULL
      REFERENCES
Manufacturer(ManufacturerID)
 Height   decimal(5,2) NOT NULL,
 Length   decimal(5,2) NOT NULL,
 Width   decimal(5,2)  NOT NULL,
 Volume AS
  (
   dbo.Volume( Height, Length, Width )
  )
)

You should note that computed columns might be excluded from being indexed if user-defined functions determine their value. An index can be created on the computed column if the user-defined function is deterministic (e.g., always returns the same value given the same input).

Default constraints

Default column values can be set with user-defined functions. UDFs can be very useful when a hard-coded value or built-in function does not suffice. For example, if a doctor's office wished to save a patient's appointment preference, a user-defined function could calculate the default day and time in a function by using the current date/time when the patient's record was created. If the patient's record were created on a Friday at 10:34 AM the AppointmentPref column would default to "Friday at 10:00" using the following function:


CREATE FUNCTION dbo.AppointmentPreference ( @dtDefaultDateTime datetime )
RETURNS nvarchar(50)
AS
BEGIN

DECLARE @nDay   nvarchar(10),
 @nHour  nvarchar(6),
 @nPreference  nvarchar(50),
 @tiHour  tinyint

 --Get date description
 SET @nDay = DATENAME(dw, @dtDefaultDateTime )

 --Find current hour
 SET @tiHour = DATEPART(hh,@dtDefaultDateTime)

 --Use only 12-hour times
 IF @tiHour > 12
 BEGIN
  SET @tiHour = @tiHour - 12
 END

 --Don't allow appointments during lunch
 IF @tiHour = 12
 BEGIN
  SET @tiHour = 1
 END

 -- These are invalid hours
 IF @tiHour IN(5,6,7,8)
 BEGIN
  SET @tiHour = 4
 END

 --Create preference text
 SET @nPreference = RTRIM(@nDay) + '''s at ' +
CONVERT(varchar(2),@tiHour) + ':00'
 RETURN ( @nPreference)
END


CREATE TABLE dbo.Patient
(
 PatientID  int  NOT NULL PRIMARY KEY
      IDENTITY,
 FirstName  nvarchar(20) NOT NULL,
 LastName  nvarchar(20) NOT NULL,
 Addr1   nvarchar(50),
 Addr2   nvarchar(50),
 City   nvarchar(50),
 State   nvarchar(2),
 ZipCode  nvarchar(20),
 HomePhone  nvarchar(20),
 WorkPhone  nvarchar(20),
AppointmentPref nvarchar(50)
DEFAULT (dbo.AppointmentPreference(GETDATE()))

)

Assignments

Scalar user-defined functions can be used to assign values to scalar variables. They may be used in any situation where a scalar built-in function may be used.



DECLARE @fCube  float

SET @fCube = dbo.Cube( 4.5 )

Control flow
Scalar user-defined functions may be used to control program flow when used in Boolean expressions.



IF dbo.ValidSerialNumber('002A15A') = 1
 PRINT 'Yes'

ELSE
 PRINT 'No'

Case expressions
User-defined functions that return a scalar value can be used in any of the cases of CASE expressions. The following example uses the DailySpecial function in a case function to determine what to display for a given day:



CREATE FUNCTION dbo.DailySpecial( @nvcDay nvarchar(10))
RETURNS NVARCHAR(100)
AS
BEGIN

DECLARE @nvcSpecial nvarchar(100)

 SET @nvcDay = UPPER(@nvcDay)

 IF @nvcDay = 'SUNDAY'
  SET @nvcSpecial = 'Roast beef with green beans and
baked potato'

 IF @nvcDay = 'MONDAY'
  SET @nvcSpecial = 'Chopped beef with green bean
casserole'

 IF @nvcDay = 'TUESDAY'
  SET @nvcSpecial = 'Beef stew'

 IF @nvcDay = 'WEDNESDAY'
  SET @nvcSpecial = 'Beef pot pie'

 IF @nvcDay = 'THURSDAY' OR @nvcDay = 'FRIDAY'
  OR @nvcDay = 'SATURDAY'
  SET @nvcSpecial = 'Beef surprise'

RETURN ( @nvcSpecial )
END


--Use output of DailySpecial function
SELECT   Special =
    CASE DateName(dw, getdate())
WHEN 'Sunday' THEN dbo.DailySpecial('Sunday')
         WHEN 'Monday' THEN
dbo.DailySpecial('Monday')
          WHEN 'Tuesday' THEN
dbo.DailySpecial('Tuesday')
 WHEN 'Wednesday' THEN dbo.DailySpecial('Wednesday')
         ELSE 'It's a mystery!'

      END

Alternative to views
Rowset functions, functions that return tables, can be used as alternatives to read-only views. Since views are limited to a single select statement, user-defined functions can provide greater functionality than a view. Powerful logic can be used when determining the records returned, which is not possible within a view. Also, views cannot accept parameters so a separate view must be created if the WHERE clause must change for different search c riteria.
Alternative to temporary tables Rowset functions can be used as alternatives to temporary tables. For example, if you wished to find authors in the Pubs database who sold no books in a particular state, you could create a couple of functions that would generate the desired resultset.
To find the quantity of books sold for a particular author in a given state you could write the following function:



CREATE FUNCTION dbo.AuthorPoularityForState ( @cState Char(2))
RETURNS  TABLE

AS
RETURN (
        SELECT a.au_id, a.au_fname, a.au_lname,
               SUM(s.qty) AS QTY
        FROM Authors a
  INNER JOIN TitleAuthor ta ON a.au_id = ta.au_id
  INNER JOIN Titles t ON t.title_id = ta.title_id
  INNER JOIN Sales s ON s.title_id = t.title_id
  INNER JOIN Stores st ON st.Stor_ID = s.stor_id
 WHERE  st.state = @cState
 GROUP BY a.au_id, a.au_fname, a.au_lname
 ORDER BY QTY DESC, a.au_lname, a.au_fname
   )
END

You could then create another function that would use the output from the first function to find the authors in a particular state that have not had any sales:



CREATE FUNCTION dbo.ReallyBoringAuthorsForState ( @cState Char(2) )
RETURNS TABLE

RETURN(
SELECT a.au_id as AuthorID, a.au_fname AS AuthorFirstName,
 a.au_lname AS AuthorLastName, @cState AS State
 FROM AuthorPopularityForState(@cState ) pa
 RIGHT JOIN authors a ON pa.AuthorID = a.au_id
 WHERE IsNull(pa.UnitsSold, 0) = 0

)

The following SQL statement would list the California authors who had not sold any books:



SELECT AuthorLastName, AuthorFirstName, AuthorID
 FROM ReallyBoringAuthorsForState('CA')
ORDER BY AuthorLastName, AuthorFirstName

List of authors without book sales
Before the release of SQL Server 2000, temporary tables would likely have been used to generate the interim data to be used for the final query output. By using functions instead of temporary tables, potential table name concurrency problems are avoided. Functions also offer greater code reuse than temporary tables.

Tuesday, 24 April 2012

Progamiing C# Coding Standards | Project standered Code

1. Introduction

Anybody can write code. With a few months of programming experience, you can write 'working applications'. Making it work is easy, but doing it the right way requires more work, than just making it work.

Believe it, majority of the programmers write 'working code', but not ‘good code'. Writing 'good code' is an art and you must learn and practice it.

Everyone may have different definitions for the term ‘good code’. In my definition, the following are the characteristics of good code.

· Reliable

· Maintainable

· Efficient

Most of the developers are inclined towards writing code for higher performance, compromising reliability and maintainability. But considering the long term ROI (Return On Investment), efficiency and performance comes below reliability and maintainability. If your code is not reliable and maintainable, you (and your company) will be spending lot of time to identify issues, trying to understand code etc throughout the life of your application.

2. Purpose of coding standards and best practices

To develop reliable and maintainable applications, you must follow coding standards and best practices.

The naming conventions, coding standards and best practices described in this document are compiled from our own experience and by referring to various Microsoft and non Microsoft guidelines.

There are several standards exists in the programming industry. None of them are wrong or bad and you may follow any of them. What is more important is, selecting one standard approach and ensuring that everyone is following it.

3. How to follow the standards across the team

If you have a team of different skills and tastes, you are going to have a tough time convincing everyone to follow the same standards. The best approach is to have a team meeting and developing your own standards document. You may use this document as a template to prepare your own document.

Distribute a copy of this document (or your own coding standard document) well ahead of the coding standards meeting. All members should come to the meeting prepared to discuss pros and cons of the various points in the document. Make sure you have a manager present in the meeting to resolve conflicts.

Discuss all points in the document. Everyone may have a different opinion about each point, but at the end of the discussion, all members must agree upon the standard you are going to follow. Prepare a new standards document with appropriate changes based on the suggestions from all of the team members. Print copies of it and post it in all workstations.

After you start the development, you must schedule code review meetings to ensure that everyone is following the rules. 3 types of code reviews are recommended:

Peer review – another team member review the code to ensure that the code follows the coding standards and meets requirements. This level of review can include some unit testing also. Every file in the project must go through this process.
Architect review – the architect of the team must review the core modules of the project to ensure that they adhere to the design and there is no “big” mistakes that can affect the project in the long run.
Group review – randomly select one or more files and conduct a group review once in a week. Distribute a printed copy of the files to all team members 30 minutes before the meeting. Let them read and come up with points for discussion. In the group review meeting, use a projector to display the file content in the screen. Go through every sections of the code and let every member give their suggestions on how could that piece of code can be written in a better way. (Don’t forget to appreciate the developer for the good work and also make sure he does not get offended by the “group attack”!)

4. Naming Conventions and Standards

Note :

The terms Pascal Casing and Camel Casing are used throughout this document.

Pascal Casing - First character of all words are Upper Case and other characters are lower case.

Example: BackColor

Camel Casing - First character of all words, except the first word are Upper Case and other characters are lower case.

Example: backColor

1. Use Pascal casing for Class names

public class HelloWorld

{

...

}

2. Use Pascal casing for Method names

void SayHello(string name)

{

...

}

3. Use Camel casing for variables and method parameters

int totalCount = 0;

void SayHello(string name)

{

       string fullMessage
= "Hello " + name;

...

}

4. Use the prefix “I” with Camel Casing for interfaces ( Example: IEntity )

5. Do not use Hungarian notation to name variables.

In earlier days most of the programmers liked it - having the data type as a prefix for the variable name and using m_ as prefix for member variables. Eg:

string m_sName;

int nAge;

However, in .NET coding standards, this is not recommended. Usage of data type and m_ to represent member variables should not be used. All variables should use camel casing.

Some programmers still prefer to use the prefix m_ to represent member variables, since there is no other easy way to identify a member variable.

6. Use Meaningful, descriptive words to name variables. Do not use abbreviations.

Good:

string address

int salary

Not Good:

string nam

string addr

int sal

7. Do not use single character variable names like i, n, s etc. Use names like index, temp

One exception in this case would be variables used for iterations in loops:

for ( int i = 0; i < count; i++ )

{

...

}

If the variable is used only as a counter for iteration and is not used anywhere else in the loop, many people still like to use a single char variable (i) instead of inventing a different suitable name.

8. Do not use underscores (_) for local variable names.

9. All member variables must be prefixed with underscore (_) so that they can be identified from other local variables.

10. Do not use variable names that resemble keywords.

11. Prefix boolean variables, properties and methods with “is” or similar prefixes.

Ex: private bool _isFinished

12. Namespace names should follow the standard pattern

<company name>.<product name>.<top level
module>.<bottom level module>

13. Use appropriate prefix for the UI elements so that you can identify them from the rest of the variables.

There are 2 different approaches recommended here.

a. Use a common prefix ( ui_ ) for all UI elements. This will help you group all of the UI elements together and easy to access all of them from the intellisense.

b. Use appropriate prefix for each of the ui element. A brief list is given below. Since .NET has given several controls, you may have to arrive at a complete list of standard prefixes for each of the controls (including third party controls) you are using.

Control	Prefix
Label	lbl
TextBox	txt
DataGrid	dtg
Button	btn
ImageButton	imb
Hyperlink	hlk
DropDownList	ddl
ListBox	lst
DataList	dtl
Repeater	rep
Checkbox	chk
CheckBoxList	cbl
RadioButton	rdo
RadioButtonList	rbl
Image	img
Panel	pnl
PlaceHolder	phd
Table	tbl
Validators	val

14. File name should match with class name.

For example, for the class HelloWorld, the file name should be helloworld.cs (or, helloworld.vb)

15. Use Pascal Case for file names.

5. Indentation and Spacing

1. Use TAB for indentation. Do not use SPACES. Define the Tab size as 4.

2. Comments should be in the same level as the code (use the same level of indentation).

Good:

// Format a message and
display

string fullMessage =
"Hello " + name;

DateTime currentTime =
DateTime.Now;

string message =
fullMessage + ", the time is : " + currentTime.ToShortTimeString();

MessageBox.Show ( message
);

Not Good:

// Format a message and
display

string
fullMessage = "Hello " + name;

DateTime
currentTime = DateTime.Now;

string message =
fullMessage + ", the time is : " + currentTime.ToShortTimeString();

MessageBox.Show
( message );

3. Curly braces ( {} ) should be in the same level as the code outside the braces.

4. Use one blank line to separate logical groups of code.

Good:

bool SayHello ( string name )

{

string fullMessage = "Hello " + name;

DateTime currentTime = DateTime.Now;

              string message = fullMessage + ", the time is :
" + currentTime.ToShortTimeString();

MessageBox.Show ( message );

if ( ... )

{

// Do something

// ...

return false;

}

return true;

}

Not Good:

bool SayHello (string name)

{

string fullMessage = "Hello " + name;

DateTime currentTime = DateTime.Now;

              string message = fullMessage + ", the time is :
" + currentTime.ToShortTimeString();

MessageBox.Show ( message );

if ( ... )

{

// Do something

// ...

return false;

}

return true;

}

5. There should be one and only one single blank line between each method inside the class.

6. The curly braces should be on a separate line and not in the same line as if, for etc.

Good:

if ( ... )

{

// Do something

}

Not Good:

if ( ... ) {

// Do something

}

7. Use a single space before and after each operator and brackets.

Good:

if ( showResult == true )

{

for ( int i = 0; i < 10; i++ )

{

}

Not Good:

if(showResult==true)

{

for(int i= 0;i<10;i++)

{

}

8. Use #region to group related pieces of code together. If you use proper grouping using #region, the page should like this when all definitions are collapsed.

9. Keep private member variables, properties and methods in the top of the file and public members in the bottom.

6. Good Programming practices

1. Avoid writing very long methods. A method should typically have 1~25 lines of code. If a method has more than 25 lines of code, you must consider re factoring into separate methods.

2. Method name should tell what it does. Do not use mis-leading names. If the method name is obvious, there is no need of documentation explaining what the method does.

Good:

void SavePhoneNumber ( string phoneNumber )

{

// Save the phone number.

}

Not Good:

// This method will save the phone number.

void SaveDetails ( string phoneNumber )

{

// Save the phone number.

}

3. A method should do only 'one job'. Do not combine more than one job in a single method, even if those jobs are very small.

Good:

// Save the address.

       SaveAddress (  address
);

       // Send an email to the supervisor to inform that the address
is updated.

SendEmail ( address, email );

void SaveAddress ( string address )

{

// Save the address.

// ...

}

void SendEmail ( string address, string email )

{

              // Send an email to inform the supervisor that the address
is changed.

// ...

}

Not Good:

       //
Save address and send an email to the supervisor to inform that 

// the address is
updated.

       SaveAddress
( address, email );

       void
SaveAddress ( string address, string email )

{

              //
Job 1.

              //
Save the address.

              //
...

              //
Job 2.

              //
Send an email to inform the supervisor that the address is changed.

              //
...

}

4. Use the c# or VB.NET specific types (aliases), rather than the types defined in System namespace.

int age; (not Int16)

string name; (not String)

object contactInfo; (not Object)

Some developers prefer to use types in Common Type System than language specific aliases.

5. Always watch for unexpected values. For example, if you are using a parameter with 2 possible values, never assume that if one is not matching then the only possibility is the other value.

Good:

If ( memberType ==
eMemberTypes.Registered )

{

// Registered user… do something…

}

else if ( memberType ==
eMemberTypes.Guest )

{

// Guest user... do something…

}

else

{

              // Un expected
user type. Throw an exception

              throw new
Exception (“Un expected value “ + memberType.ToString() + “’.”)

              // If we introduce
a new user type in future, we can easily find 

// the
problem here.

}

Not Good:

If ( memberType ==
eMemberTypes.Registered )

{

              // Registered
user… do something…

}

else

{

              // Guest user...
do something…

// If we introduce another
user type in future, this code will 

// fail and will not be
noticed.

}

6. Do not hardcode numbers. Use constants instead. Declare constant in the top of the file and use it in your code.

However, using constants are also not recommended. You should use the constants in the config file or database so that you can change it later. Declare them as constants only if you are sure this value will never need to be changed.

7. Do not hardcode strings. Use resource files.

8. Convert strings to lowercase or upper case before comparing. This will ensure the string will match even if the string being compared has a different case.

if ( name.ToLower() == “john” )

{

//…

}

9. Use String.Empty instead of “”

Good:

If ( name == String.Empty
)

{

// do something

}

Not Good:

If ( name == “” )

{

// do something

}

10. Avoid using member variables. Declare local variables wherever necessary and pass it to other methods instead of sharing a member variable between methods. If you share a member variable between methods, it will be difficult to track which method changed the value and when.

11. Use enum wherever required. Do not use numbers or strings to indicate discrete values.

Good:

enum MailType

{

Html,

PlainText,

Attachment

}

void SendMail (string message, MailType mailType)

{

switch ( mailType )

{

case MailType.Html:

// Do something

break;

case MailType.PlainText:

// Do something

break;

case MailType.Attachment:

// Do something

break;

default:

// Do something

break;

}

Not Good:

void SendMail (string message, string mailType)

{

switch ( mailType )

{

case "Html":

// Do something

break;

case "PlainText":

// Do something

break;

case "Attachment":

// Do something

break;

default:

// Do something

break;

}

12. Do not make the member variables public or protected. Keep them private and expose public/protected Properties.

13. The event handler should not contain the code to perform the required action. Rather call another method from the event handler.

14. Do not programmatically click a button to execute the same action you have written in the button click event. Rather, call the same method which is called by the button click event handler.

15. Never hardcode a path or drive name in code. Get the application path programmatically and use relative path.

16. Never assume that your code will run from drive "C:". You may never know, some users may run it from network or from a "Z:".

17. In the application start up, do some kind of "self check" and ensure all required files and dependancies are available in the expected locations. Check for database connection in start up, if required. Give a friendly message to the user in case of any problems.

18. If the required configuration file is not found, application should be able to create one with default values.

19. If a wrong value found in the configuration file, application should throw an error or give a message and also should tell the user what are the correct values.

20. Error messages should help the user to solve the problem. Never give error messages like "Error in Application", "There is an error" etc. Instead give specific messages like "Failed to update database. Please make sure the login id and password are correct."

21. When displaying error messages, in addition to telling what is wrong, the message should also tell what should the user do to solve the problem. Instead of message like "Failed to update database.", suggest what should the user do: "Failed to update database. Please make sure the login id and password are correct."

22. Show short and friendly message to the user. But log the actual error with all possible information. This will help a lot in diagnosing problems.

23. Do not have more than one class in a single file.

24. Have your own templates for each of the file types in Visual Studio. You can include your company name, copy right information etc in the template. You can view or edit the Visual Studio file templates in the folder C:\Program Files\Microsoft Visual Studio 8\Common7\IDE\ItemTemplatesCache\CSharp\1033. (This folder has the templates for C#, but you can easily find the corresponding folders or any other language)

25. Avoid having very large files. If a single file has more than 1000 lines of code, it is a good candidate for refactoring. Split them logically into two or more classes.

26. Avoid public methods and properties, unless they really need to be accessed from outside the class. Use “internal” if they are accessed only within the same assembly.

27. Avoid passing too many parameters to a method. If you have more than 4~5 parameters, it is a good candidate to define a class or structure.

28. If you have a method returning a collection, return an empty collection instead of null, if you have no data to return. For example, if you have a method returning an ArrayList, always return a valid ArrayList. If you have no items to return, then return a valid ArrayList with 0 items. This will make it easy for the calling application to just check for the “count” rather than doing an additional check for “null”.

29. Use the AssemblyInfo file to fill information like version number, description, company name, copyright notice etc.

30. Logically organize all your files within appropriate folders. Use 2 level folder hierarchies. You can have up to 10 folders in the root folder and each folder can have up to 5 sub folders. If you have too many folders than cannot be accommodated with the above mentioned 2 level hierarchy, you may need re factoring into multiple assemblies.

16. Make sure you have a good logging class which can be configured to log errors, warning or traces. If you configure to log errors, it should only log errors. But if you configure to log traces, it should record all (errors, warnings and trace). Your log class should be written such a way that in future you can change it easily to log to Windows Event Log, SQL Server, or Email to administrator or to a File etc without any change in any other part of the application. Use the log class extensively throughout the code to record errors, warning and even trace messages that can help you trouble shoot a problem.

17. If you are opening database connections, sockets, file stream etc, always close them in the finally block. This will ensure that even if an exception occurs after opening the connection, it will be safely closed in the finally block.

18. Declare variables as close as possible to where it is first used. Use one variable declaration per line.

19. Use StringBuilder class instead of String when you have to manipulate string objects in a loop. The String object works in weird way in .NET. Each time you append a string, it is actually discarding the old string object and recreating a new object, which is a relatively expensive operations.

Consider the following example:

public string
ComposeMessage (string[] lines)

{

   string
message = String.Empty; 

   for (int
i = 0; i < lines.Length; i++)

{

      message
+= lines [i]; 

}

   return
message;

}

In the above example, it may look like we are just appending to the string object ‘message’. But what is happening in reality is, the string object is discarded in each iteration and recreated and appending the line to it.

If your loop has several iterations, then it is a good idea to use StringBuilder class instead of String object.

See the example where the String object is replaced with StringBuilder.

public string
ComposeMessage (string[] lines)

{

    StringBuilder
message = new StringBuilder();

    for
(int i = 0; i < lines.Length; i++)

{

       message.Append(
lines[i] ); 

}

    return
message.ToString();

}

7. Architecture

1. Always use multi layer (N-Tier) architecture.

2. Never access database from the UI pages. Always have a data layer class which performs all the database related tasks. This will help you support or migrate to another database back end easily.

3. Use try-catch in your data layer to catch all database exceptions. This exception handler should record all exceptions from the database. The details recorded should include the name of the command being executed, stored proc name, parameters, connection string used etc. After recording the exception, it could be re thrown so that another layer in the application can catch it and take appropriate action.

4. Separate your application into multiple assemblies. Group all independent utility classes into a separate class library. All your database related files can be in another class library.

8. ASP.NET

1. Do not use session variables throughout the code. Use session variables only within the classes and expose methods to access the value stored in the session variables. A class can access the session using System.Web.HttpCOntext.Current.Session

2. Do not store large objects in session. Storing large objects in session may consume lot of server memory depending on the number of users.

3. Always use style sheet to control the look and feel of the pages. Never specify font name and font size in any of the pages. Use appropriate style class. This will help you to change the UI of your application easily in future. Also, if you like to support customizing the UI for each customer, it is just a matter of developing another style sheet for them

9. Comments

Good and meaningful comments make code more maintainable. However,

1. Do not write comments for every line of code and every variable declared.

2. Use // or /// for comments. Avoid using /* … */

3. Write comments wherever required. But good readable code will require very less comments. If all variables and method names are meaningful, that would make the code very readable and will not need many comments.

4. Do not write comments if the code is easily understandable without comment. The drawback of having lot of comments is, if you change the code and forget to change the comment, it will lead to more confusion.

5. Fewer lines of comments will make the code more elegant. But if the code is not clean/readable and there are less comments, that is worse.

6. If you have to use some complex or weird logic for any reason, document it very well with sufficient comments.

7. If you initialize a numeric variable to a special number other than 0, -1 etc, document the reason for choosing that value.

8. The bottom line is, write clean, readable code such a way that it doesn't need any comments to understand.

9. Perform spelling check on comments and also make sure proper grammar and punctuation is used.

10. Exception Handling

1. Never do a 'catch exception and do nothing'. If you hide an exception, you will never know if the exception happened or not. Lot of developers uses this handy method to ignore non significant errors. You should always try to avoid exceptions by checking all the error conditions programmatically. In any case, catching an exception and doing nothing is not allowed. In the worst case, you should log the exception and proceed.

2. In case of exceptions, give a friendly message to the user, but log the actual error with all possible details about the error, including the time it occurred, method and class name etc.

3. Always catch only the specific exception, not generic exception.

Good:

void ReadFromFile ( string fileName )

{

try

{

// read from file.

}

catch (FileIOException ex)

{

// log error.

                     // 
re-throw exception depending on your case.

throw;

}

Not Good:

void ReadFromFile ( string
fileName )

{

try

{

// read from file.

}

catch (Exception ex)

{

      // Catching general exception is bad...
we will never know whether 

      // it was a file error or some other
error.             

// Here you are hiding an exception.

      // In this case no one will ever know
that an exception happened.

return "";

}

4. No need to catch the general exception in all your methods. Leave it open and let the application crash. This will help you find most of the errors during development cycle. You can have an application level (thread level) error handler where you can handle all general exceptions. In case of an 'unexpected general error', this error handler should catch the exception and should log the error in addition to giving a friendly message to the user before closing the application, or allowing the user to 'ignore and proceed'.

5. When you re throw an exception, use the throw statement without specifying the original exception. This way, the original call stack is preserved.

Good:

catch

{

       // do
whatever you want to handle the exception 

throw;

}

Not Good:

catch (Exception ex)

{

       // do
whatever you want to handle the exception 

throw ex;

}

6. Do not write try-catch in all your methods. Use it only if there is a possibility that a specific exception may occur and it cannot be prevented by any other means. For example, if you want to insert a record if it does not already exists in database, you should try to select record using the key. Some developers try to insert a record without checking if it already exists. If an exception occurs, they will assume that the record already exists. This is strictly not allowed. You should always explicitly check for errors rather than waiting for exceptions to occur. On the other hand, you should always use exception handlers while you communicate with external systems like network, hardware devices etc. Such systems are subject to failure anytime and error checking is not usually reliable. In those cases, you should use exception handlers and try to recover from error.

7. Do not write very large try-catch blocks. If required, write separate try-catch for each task you perform and enclose only the specific piece of code inside the try-catch. This will help you find which piece of code generated the exception and you can give specific error message to the user.

8. Write your own custom exception classes if required in your application. Do not derive your custom exceptions from the base class SystemException. Instead, inherit from ApplicationException.

Subscribe to: Posts ( Atom )