Tuesday, February 21, 2012

SQL Server INDEXES

INDEXES:
    • Clustered indexes are usually faster than non-clustered indexes (one could come very close to saying always, but there are exceptions).
    • Only place non-clustered indexes on columns where you are going to get a high level of selectivity(that is, 95 percent or more of the rows are unique)
    • All DML statements (INSERT, DELETE, UPDATE) and SELECT statement can benefit from indexes, but insert, deletes, and updates (remember they use a delete and insert approach) are slowed by indexes. The lookup part of a query is helped by the index, but any thing modifies data will have extra work to do (to maintain the index in addition to the actual data).
    • Indexes take up space.
    • Indexes are used only if the first column in the index is relevant to your query.
    • Indexes can hurt as they help – know why you’re building the index, and don’t build indexes you don’t need.
    • Indexes can provide structured data performance to your unstructured XML data, but keep in mind that, like other indexes, there is overhead involved.
    • When you’re thinking about indexes, ask yourself, these questions:

Questions
Response
Are there a lot of inserts or modifications to this table?
If yes, keep indexes to a minimum. This kind of table usually has modifications done through single record lookups of the primary key – usually, this is the only index you want on the table. If the inserts are non-sequential, think about not having a clustered index.
Is this a reporting table? That is, not many insert, but reports run lots of different ways?
More indexes are fine. Target the clustered index to frequently used information that is likely to be extracted in ranges. OLAP installations will often have many times the number of indexes seen in an OLTP environment.
Is there a high level of selectivity on the data?
If Yes, and it is frequently the target of a WHERE clause, then add that index.
Have I dropped the indexes I no longer need?
If not, why not?
Do I have a maintenance strategy established?
If not, why not?


References sites:

http://searchsqlserver.techtarget.com/generic/0,295582,sid87_gci1239952,00.html

http://www.microsoft.com/technet/prodtechnol/sql/2005/impprfiv.mspx

http://tonesdotnetblog.wordpress.com/2008/05/26/twelve-tips-for-optimising-sql-server-2005-queries/

http://blog.sqlauthority.com/2009/02/07/sql-server-introduction-to-force-index-query-hints-index-hint/ -- Forced indexes.

http://blog.decaresystems.ie/index.php/2007/07/11/index-usage-in-sql-server-2005/ -- Index usage

http://databases.about.com/od/sqlserver/a/indextuning.htm

http://www.sql-server-performance.com/tips/rebuilding_indexes_p1.aspx

http://www.sqlteam.com/article/sql-server-indexes-the-basics



Disabling Indexes:

http://msdn.microsoft.com/en-us/library/ms177406.aspx

SQL SERVER 2005: Points to Ponder

SQL SERVER 2005: Points to Ponder

Built in system databases:
master
Purpose: This db holds a special set of tables (system tables) that keeps track of the system as a whole.
Example:
When you create a new db on the server, an entry is placed in the sysdatabases table in the master db. Note: This db is critical to your system and cannot be deleted.
model
Purpose: As its name implies, it forms a Template for any new db that you create.
Example:
You could add a set of audit tables that you include in every db you build.
You could also create a user groups that would be cloned into every db that was created on the system.
Note: Since this db servers as the template for any other db’s, its required db and must be left on the system; you cannot delete it.
msdb
Purpose: A process named SQL Agent Process stores any system tasks in this db.
Example:
Schedule a stored procedure for one time execution, and yes, it has an entry in msdb. Note: you cannot delete it.
tempdb
Purpose:
Whenever you issue a complex or large query that SQL Server needs to build interim tables to solve, it does so in tempdb.
Whenever you create a temporary table of your own, it is created in tempdb.
Whenever there is a need for data to be stored in temporarily, it’s probably stored in tempdb.
tempdb is very different from any other db. Not only are the objects within it temporary; the database itself is temporary. It has the distinction of being the only db in your system that is completely rebuilt from scratch every time you start your SQL Server.
Note: Creating objects directly in tempdb is different than creating from your own db.
There is no way to add a column to specific location in SQL Server. If you want to move a column to the middle, you need to create a completely new table (with different name), copy the data over to the new table, DROP the existing table, and then rename the new one.
This issue of moving columns around get very sticky indeed. Even some of the tools that supposed to automate this often have problems with it. Why? Well, any foreign key constraints you have that reference this table must first be dropped before you are allowed to delete the current version of the table. That means that you have to drop all your foreign keys, make the changes, and then add all your foreign keys back. It doesn’t end there, however, any indexes you have defined on the old table are automatically dropped when you drop the existing table – that means that you must remember to re-create your indexes as part of the build script to create your new version of the table – yuck!
But wait! There’s more! While we haven’t really looked at views yet, I feel compelled to make a reference here to what happens to your views when you add a column. You should be aware that, even if your view is built using a SELECT * as its base statement, your new column will not appear in your view until you rebuild the view. Column names in views are resolved at the time the view is created for performance reasons. That means any views that have already been created when you add your columns have already resolved using the previous columns list – you must either DROP and re-create the view or use an ALTER VIEW statement to rebuild it.

If you want to add a NOT NULL column after the fact, you have the issue of
what to do with rows that already have NULL values. We have shown the solution to that here by providing a default value. The default is then used to populate the new column for any row that is already in our table.
Before CREATE TABLE Employee(

LastRaiseDate datetime NULL,

)
GO
INSERT INTO Employee…
After
ALTER TABLE Employee
ALTER COLUMN LastRaiseDate datetime NOT NULL DEFAULT ‘2005-01-01’
GO
Comparison operators – these work as they do in pretty much any programming language with couple of notable points:
What constitutes "greater than," "less than,"
and "equal to" can change depending on the collation order you have selected. For example, "ROMEY" = "romey" in places where case-insensitive sort order has been selected, but "ROMEY" <> "romey" in a case-sensitive situation.
!= and <> both mean, "not equal". "!<" and "!>" means "not less than" and "not greater than" respectively.
char and varchar: When n(i.e., size) is not specified in a data definition or variable declaration statement, the default length is 1. When n is not specified when using the CAST and CONVERT functions, the default length is 30.

Logical Operators – You can use these to combine multiple conditions into one WHERE clause. NOT is evaluated first, then AND, then OR. If you need to change the evaluation order, you can use parentheses. Note that XOR is not supported.
Note that, once we use a GROUP BY, every column in the SELECT list has to either be part of the GROUP BY or it must be an aggregate.

Actually, all aggregate functions ignore NULLs except for COUNT (*). Think about this for a minute – it can have a very significant impact on your results. Many users expect NULL values in numeric fields to be treated as zero when performing averages, but a NULL does not equal zero, and as such, shouldn’t be used as one. If you perform an AVG or other aggregating unless you manipulating them into a non-NULL value inside the function (using COALESE () or ISNULL for example). But beware of this when coding in T-SQL and when designing your database.

Why does it matter in your database design? Well, it can have a bearing on whether you decide to allow NULL values in a field or not by thinking about the way that queries are likely to be run against the database and how you want your aggregates to work.
Note that you can use DISTINCT with any aggregate function, although I question whether many of the functions have any practical use for it. For example, I can’t imagine why you would want an average of just the DISTINCT rows.

ALL is perhaps best understood as being the opposite of DISTINCT. Where DISTINCT is used to filter out duplicate rows, ALL says to include every row. ALL is the default FOR any SELECT statement except for situations where there is a UNION. We can discuss the differences between ALL and UNION in further points.

On the issue of, "Should I specifically state a value for all columns or not?" I really recommend naming every column every time – even if you just use the DEFAULT keyword or explicitly state NULL. DEFAULT will tell SQL Server to use whatever the default value is for that column (if there isn’t one, you’ll get an error).

What’s nice about this is the readability of code – this way it’s really clear what you are doing. In addition, I find that explicitly addressing every column leads to fewer bugs.
A normalized database is one where the data has been broken out from larger tables into many smaller tables for the purpose of eliminating repeating data, saving space, improving performance, and increasing data integrity. It’s great stuff and vital to relational database; however it also means that you wind up getting your data from here, there, and everywhere.

Note: In joining two or more tables, one good principle to adopt on is this to select what you need, and need what you select. What I’m getting at here is that every additional record or column that you return takes up additional network bandwidth, and often, additional query processing on your SQL Server. The upshot is that selecting unnecessary information hurts performance not only for the current user, but also for every other user of the system and for user of the network on which the SQL Server resides.

Select only the columns that you are going to be using and make your WHERE clause as restrictive as possible.

Keep in mind that, however that an IDENTITY and a PRIMARY KEY are completely separate notions – that is, just because you have an IDENTITY column doesn’t mean that the value is unique(for example, you can reset the seed value and count back up through values you’ve used before). IDENTITY values are usually used as the PRIMARY KEY column, but they don’t have to be used that way.

It’s worth nothing that an identity column works sequentially. That is, once you’ve set a seed (the starting point) and the increment, your values only go up (or down if you set the increment to a negative number). There is no automatic mechanism to go back and fill in the numbers for any rows you may have deleted. If you want to fill in blank spaces like that, you need to use SET IDENTITY_INSERT {ON|OFF}, which allows you to turn off (yes, turning it "on" turns it off – that is, you are turning on the ability to insert your own values, which has the effect of turning off the automatic value) the identity process for inserts from the current connection. This can, however, create havoc if you’re not careful or if people are still trying to use the system as you do this, so tread carefully.

I highly recommend explicitly stating the NULL option for every column in every table you ever build. Why? As I mentioned before, there are a large number of different settings that can affect what the system uses a default for the nullability of a column. If you rely on these defaults, then you may find later that your scripts don’t seem to work right (because you or someone else has changed a relevant setting without realizing its full effect).

Note on "Computed Columns":
You cannot use a sub query, and the values cannot come from a different table.
Prior to SQL Server 2000, you could not use a computed column as any part of any key (primary, foreign, or unique) or with a default constraint. For SQL Server 2005, you can now use a computed column in constraints (you must flag the computed column as persisted if you do this however).

Another problem for previous versions (but added back in SQL Server 2000) is the ability to create indexes on computed columns. You can create indexes on computed columns, but there are special steps you must take to do so.
Technically speaking, you can execute a sproc by simply calling it(without using the EXEC keyword). The problem is that this only works consistently if the sproc being called is the first statement of any kind in the batch. Just having , for example, sp_help would have worked in the place of the code, but if you tried to run a SELECT statement before it – it would blow up on you. Not using EXEC leads to very unpredictable behavior and should be avoided.
For the code that you write directly in T-SQL, SQL Server will automatically adjust to the padded spaces issue – that is, an ‘xx’ placed in a char(5) will be treated as being equal (if compared) to an ‘xx’ placed in a varchar (5) – this is not, however, true in your client APIs such as ADO and ADO.NET. If you connect to a char (5) in ADO, then an ‘xx’ will evaluate to ‘xx ‘ with three spaces after it – If you compare it to ‘xx’, it will evaluate to False. An ‘xx’ placed in a varchar (5), however, will automatically have any trailing spaces trimmed, and comparing it to ‘xx’ in ADO will evaluate to true.
Note: If SET ANSI_PADDING is OFF when either CREATE TABLE or ALTER TABLE is executed, a char column that is defined as NULL is handled as varchar.
Note that the actual column being referenced must have either a PRIMARY KEY or a UNIQUE constraint defined on it.
Unlike primary keys, we are not limited to just one foreign key on a table. We can have between 0 and 253 foreign keys in each table. The only limitation is that a given column can reference only one foreign key. However, you can have more than one column participate in a single foreign key. A given column that is the target of a reference by a foreign key can also be referenced by many tables.
By default, you cannot delete a record or update the referenced column in a referenced table if that record is referenced from the dependent table. If you want to be able to delete or update such a record, then you need to set up a CASCADE action for the delete and/or update.

There is something of a gotcha when creating foreign keys that reference the same table the foreign key is being defined on. Foreign keys of this nature are not allowed to have declarative CASCADE actions. The reason for this restriction is to avoid cyclical updates or deletes – that is, situations where the first update causes another, which in turn tries to update the first. The result could be a never-ending loop.
Don’t confuse the space that an extent is taking up with the space that a database takes up. Whatever space is allocated to the database is what you’ll see disappear from your disk drive’s available space number. An extent is merely how things are, in turn, allocated within the total space reserved by the database.
Unlike a primary key, a UNIQUE constraint does not automatically prevent you from having a NULL value. Whether NULLs are allowed or not depends on how you set the NULL option for that column in the table. Keep in mind, however, that, if you do allow NULLs, you will be to insert only one of them(although a NULL doesn’t equal to another NULL, they are still considered to be duplicate from the perspective of a UNIQUE constraint)

A table can have maximum of 1024 columns.
In SQL Server, we have a number of different collation options available to us. Among these options are:
Binary: Sorts by the numeric representation of the character(for example, in ASCII, a space is represented by the number 32, the letter "D" is 68, but the letter "d" is 100). Because everything is numeric, this is the fastest option – unfortunately, it’s also not all the way in which people think, and can also really wreak havoc with comparisons in your WHERE clause.
Dictionary order: This sorts things just as you would expect to see in a dictionary, with a twist you can set a number of different additional options to determine sensitivity to case, accent, and character set.
The point here is that what happens in your indexes depends on the collation information you have established for your data. Collation can be set at the database and column level, so you have a fairly fine granularity in your level of control.
Once the collation order has been set, changing it is very non-trivial (but possible), so be certain of the collation order you want before you set it.
See if you can spot why the use of EXISTS in the WHERE clause of your queries has so much to offer performance-wise where it fits the problem. When you use the EXISTS operator, SQL Server stops as soon as it finds one record that matches the criteria. If you had a million record table, and it found a matching record on the third record, then use of EXISTS option would have saved you the reading of 999,997 records! "NOT EXISTS" works in much the same way.

Derived Tables: A derived table is made up of the columns and rows of a result set from a query. Keep in mind that derived tables aren’t the solutions for everything. For example, if the result set is going to be fairly large and you’re going to have lots of joined records, then you may want to look at using temporary table and building an index on it (derived tables have no indexes). Every situation is different, but now you have one more tool in your arsenal.
Needless to say, page splits can have a very negative impact on system performance, and are characterized by behavior where your process on the server seems to just for a new second (while the pages are being split and re-written).
The main things to understand about a DEFAULT constraint are that:
Defaults are only used in INSERT statements – they are ignored for UPDATE and DELETE statements.
If any value is supplied in the INSERT, then the default is not used.
If no value is supplied, the default will always be used.
You can mix and match any and all constraints as you choose – just be careful not to create constraints that have mutually exclusive conditions. For example, don’t have one constraint that says that col1 > col2 and another one that says that col2 > col1. SQL Server will let you do this, and you wouldn’t see the issues with it until run time.
You cannot disable PRIMARY KEY or UNIQUE constraints.
Temporally disabling an existing constraint: We can run an ALTER statement with an option called NOCHECK that turns off the constraint in question:
Eg: ALTER TABLE Customers NOCHECK CONSTRAINT CN_CustomerPhoneNo To enable the above constraint again, use CHECK option rather than NOCHECK:
Eg: ALTER TABLE Customers CHECK CONSTRAINT CN_CustomerPhoneNo
Choosing what to Use:

Restriction Pros Cons
Constraints Fast.
Can reference other columns. Happens before the command occurs.
ANSI-complaint.
Must be refined for each table.
Can’t reference other tables.
Can’t be bound to data types.
Rules, Defaults Independent objects.
Reusable.
Can be bound to data types.
Happens before the command occurs.
Slightly slower.
Can’t reference across columns.
Can’t reference other tables.
Really meant for backward compatibility only!!!
Triggers Ultimate flexibility.
Can reference other columns and other tables. Can even use .NET to reference information that is external to your SQL Server.
Happens after the command occurs.
High overhead.

Auditing: Displaying Existing Code:
sp_helptext
The syscomments system table
Using sp_helptext is highly preferable, as when new release come out, it will automatically be updated for changes to the system tables.
Not that the restriction on using the ORDER BY clause applies only to the code within the view. Once the view is created, you can still use an ORDER BY clause when you reference the view in a query.
Batches: A batch is a grouping of T-SQL statements into one logical unit. All of the statements within a batch are combined into one execution plan, so all statements are parsed together and must pass a validation of the syntax or none of the statements will execute. Note, however, that this does not prevent runtime errors, from happening. In the event of a runtime error, any statement that has been executed prior to the runtime error will still be in effect. To summarize, if a statement fails at parse-time, then nothing runs. If a statement fails at runtime, then all statements until the statement that generated the error have already run.

For ex:

DECLARE @MyVariable varchar(50)

SELECT @MyVariable = 'hi this is a sample test on batches'

PRINT 'Done with the first Batch....'

GO

PRINT @MyVariable -- This generates an error since @MyVariable isn't declared in this batch

PRINT 'Done with the second Batch'

GO

PRINT 'Done with third Batch' -- Notice that this still gets executed even after the error.

GO

Result:

Done with the first Batch....

Msg 137, Level 15, State 2, Line 2

Must declare the scalar variable "@MyVariable".

Done with third Batch
To separate a script into multiple batches, we make use of the GO statement. The GO statement:
Must be on its own line (nothing other than a comment can be on the same like). For ex: SELECT * FROM Customers WHERE CustomerID = ‘ALFKI’ GO
All statements combined into one execution plan and sent to server independently of any other batches.
Is not a T-SQL command, but, rather, a command recognized by the various SQL Server command utilities(OSQL, ISQL, and the Query Analyzer)
Is tried to execute in a pass-through query using ODBC, OLE DB, ADO, ADO.NET or any other access method, you’ll get an error message back from the server. The GO is merely an indicator to the tool that it is time to end the current batch, and time, if appropriate, to start a new one.
INDEXES:
Clustered indexes are usually faster than non-clustered indexes (one could come very close to saying always, but there are exceptions).
Only place non-clustered indexes on columns where you are going to get a high level of selectivity(that is, 95 percent or more of the rows are unique)
All DML statements (INSERT, DELETE, UPDATE) and SELECT statement can benefit from indexes, but insert, deletes, and updates (remember they use a delete and insert approach) are slowed by indexes. The lookup part of a query is helped by the index, but any thing modifies data will have extra work to do (to maintain the index in addition to the actual data).
Indexes take up space.
Indexes are used only if the first column in the index is relevant to your query.
Indexes can hurt as they help – know why you’re building the index, and don’t build indexes you don’t need.
Indexes can provide structured data performance to your unstructured XML data, but keep in mind that, like other indexes, there is overhead involved.
When you’re thinking about indexes, ask yourself, these questions:

Questions Response
Are there a lot of inserts or modifications to this table? If yes, keep indexes to a minimum. This kind of table usually has modifications done through single record lookups of the primary key – usually, this is the only index you want on the table. If the inserts are non-sequential, think about not having a clustered index.
Is this a reporting table? That is, not many insert, but reports run lots of different ways? More indexes are fine. Target the clustered index to frequently used information that is likely to be extracted in ranges. OLAP installations will often have many times the number of indexes seen in an OLTP environment.
Is there a high level of selectivity on the data? If Yes, and it is frequently the target of a WHERE clause, then add that index.
Have I dropped the indexes I no longer need? If not, why not?
Do I have a maintenance strategy established? If not, why not?

SQLCMD
Keep in mind that the options/flags provided by this command utility are case sensitive.