SQL SERVER 2005: Points to Ponder
Built-in system databases:
You could also create user groups that would be cloned into every database created on the system.
Whenever you create a temporary table of your own, it is created in tempdb.
Whenever there is a need for data to be stored temporarily, it's probably stored in tempdb.
tempdb is very different from any other db. Not only are the objects within it temporary; the database itself is temporary. It has the distinction of being the only db in your system that is completely rebuilt from scratch every time you start your SQL Server.
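As a quick illustration (the table and column names below are hypothetical), a local temporary table lives in tempdb regardless of which database you create it from:

```sql
-- #MyTemp is created in tempdb, not in the current database
CREATE TABLE #MyTemp
(
    OrderID   int      NOT NULL,
    OrderDate datetime NOT NULL
)

INSERT INTO #MyTemp (OrderID, OrderDate)
VALUES (1, '2005-01-01')

-- The table is dropped automatically when your connection closes,
-- or you can drop it explicitly:
DROP TABLE #MyTemp
```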
This issue of moving columns around gets very sticky indeed. Even some of the tools that are supposed to automate this often have problems with it. Why? Well, any foreign key constraints that reference this table must first be dropped before you are allowed to delete the current version of the table. That means you have to drop all your foreign keys, make the changes, and then add all your foreign keys back. It doesn't end there, however: any indexes you have defined on the old table are automatically dropped when you drop the existing table, which means you must remember to re-create your indexes as part of the build script for the new version of the table. Yuck!
But wait! There's more! While we haven't really looked at views yet, I feel compelled to mention here what happens to your views when you add a column. Be aware that, even if your view is built using SELECT * as its base statement, your new column will not appear in the view until you rebuild it. Column names in views are resolved at the time the view is created, for performance reasons. That means any views that already existed when you added your columns have already resolved against the previous column list; you must either DROP and re-create the view or use an ALTER VIEW statement to rebuild it.
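As a sketch (the view and table names are hypothetical), you can rebuild the view's column list with ALTER VIEW, or re-resolve its metadata with the sp_refreshview system procedure:

```sql
-- Re-issue the view definition so the column list is resolved again;
-- a column added to Employee after the view was created now appears
ALTER VIEW vEmployee
AS
SELECT *
FROM Employee
GO

-- Alternatively, sp_refreshview re-resolves the view's metadata
EXEC sp_refreshview 'vEmployee'
```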
If you want to add a NOT NULL column after the fact, you have the issue of what to do with rows that already exist and have no value for the new column. One approach is to create the column as NULL, fill in a value for the existing rows, and then switch the column to NOT NULL (note that a DEFAULT cannot be attached within the ALTER COLUMN statement itself; it has to be added as a separate constraint):
…
LastRaiseDate datetime NULL,
…
)
GO
INSERT INTO Employee…
UPDATE Employee
SET LastRaiseDate = '2005-01-01'
WHERE LastRaiseDate IS NULL
GO
ALTER TABLE Employee
ALTER COLUMN LastRaiseDate datetime NOT NULL
GO
and "equal to" can change depending on the collation order you have selected. For example, "ROMEY" = "romey" in places where case-insensitive sort order has been selected, but "ROMEY" <> "romey" in a case-sensitive situation.
!= and <> both mean "not equal". "!<" and "!>" mean "not less than" and "not greater than", respectively.
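You can see the collation effect directly by forcing a collation in the comparison (the COLLATE clause below names two standard SQL Server collations):

```sql
-- Case-insensitive collation: the two strings compare as equal
IF 'ROMEY' = 'romey' COLLATE SQL_Latin1_General_CP1_CI_AS
    PRINT 'Equal under a case-insensitive collation'

-- Case-sensitive collation: the same comparison fails
IF 'ROMEY' <> 'romey' COLLATE SQL_Latin1_General_CP1_CS_AS
    PRINT 'Not equal under a case-sensitive collation'
```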
Logical Operators – You can use these to combine multiple conditions into one WHERE clause. NOT is evaluated first, then AND, then OR. If you need to change the evaluation order, you can use parentheses. Note that XOR is not supported.
Note that, once we use a GROUP BY, every column in the SELECT list has to either be part of the GROUP BY or it must be an aggregate.
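For example (hypothetical Orders table), every non-aggregate column in the SELECT list must appear in the GROUP BY:

```sql
-- Valid: CustomerID is in the GROUP BY; everything else is aggregated
SELECT CustomerID, COUNT(*) AS OrderCount, SUM(Amount) AS Total
FROM Orders
GROUP BY CustomerID

-- Invalid: OrderDate is neither grouped nor aggregated, so this errors
-- SELECT CustomerID, OrderDate, COUNT(*)
-- FROM Orders
-- GROUP BY CustomerID
```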
Actually, all aggregate functions ignore NULLs except for COUNT(*). Think about this for a minute; it can have a very significant impact on your results. Many users expect NULL values in numeric fields to be treated as zero when performing averages, but a NULL does not equal zero and, as such, shouldn't be used as one. If you perform an AVG or other aggregate on a column that contains NULLs, the NULLs are simply ignored, unless you manipulate them into non-NULL values inside the function (using COALESCE() or ISNULL(), for example). Beware of this both when coding in T-SQL and when designing your database.
Why does it matter in your database design? Well, it can have a bearing on whether you decide to allow NULL values in a field or not by thinking about the way that queries are likely to be run against the database and how you want your aggregates to work.
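A sketch of the difference (hypothetical Employee table with a nullable Salary column): AVG simply skips NULLs, while wrapping the column in ISNULL turns them into zeros first:

```sql
-- Suppose Salary contains 100, 200, and NULL.
-- AVG ignores the NULL: (100 + 200) / 2 = 150
SELECT AVG(Salary) AS AvgIgnoringNulls
FROM Employee

-- ISNULL converts the NULL to 0 first: (100 + 200 + 0) / 3 = 100
SELECT AVG(ISNULL(Salary, 0)) AS AvgTreatingNullsAsZero
FROM Employee
```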
Note that you can use DISTINCT with any aggregate function, although I question whether many of the functions have any practical use for it. For example, I can’t imagine why you would want an average of just the DISTINCT rows.
ALL is perhaps best understood as being the opposite of DISTINCT. Where DISTINCT is used to filter out duplicate rows, ALL says to include every row. ALL is the default for any SELECT statement, except in situations where there is a UNION. We can discuss the differences between ALL and UNION in further points.
On the issue of "Should I specifically state a value for all columns or not?", I really recommend naming every column every time, even if you just use the DEFAULT keyword or explicitly state NULL. DEFAULT tells SQL Server to use whatever the default value is for that column (if there isn't one, you'll get an error).
What’s nice about this is the readability of code – this way it’s really clear what you are doing. In addition, I find that explicitly addressing every column leads to fewer bugs.
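A minimal sketch of this style (hypothetical Employee table), naming every column and using DEFAULT and NULL explicitly:

```sql
INSERT INTO Employee
    (LastName, FirstName, HireDate, LastRaiseDate)
VALUES
    ('Smith', 'Anna', DEFAULT, NULL)
-- DEFAULT uses the column's default value (an error if none is defined);
-- NULL explicitly stores a NULL
```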
A normalized database is one where the data has been broken out from larger tables into many smaller tables for the purpose of eliminating repeating data, saving space, improving performance, and increasing data integrity. It's great stuff and vital to relational databases; however, it also means that you wind up getting your data from here, there, and everywhere.
Note: In joining two or more tables, one good principle to adopt is this: select what you need, and need what you select. What I'm getting at is that every additional record or column you return takes up additional network bandwidth and, often, additional query processing on your SQL Server. The upshot is that selecting unnecessary information hurts performance not only for the current user, but also for every other user of the system and for users of the network on which the SQL Server resides.
Select only the columns that you are going to be using and make your WHERE clause as restrictive as possible.
Keep in mind, however, that an IDENTITY column and a PRIMARY KEY are completely separate notions; that is, just because you have an IDENTITY column doesn't mean that the value is unique (for example, you can reset the seed value and count back up through values you've used before). IDENTITY values are usually used as the PRIMARY KEY column, but they don't have to be used that way.
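You can see this by reseeding an IDENTITY column (hypothetical table below); without a PRIMARY KEY or UNIQUE constraint, nothing prevents the repeated values:

```sql
CREATE TABLE IdentityDemo
(
    ID   int IDENTITY(1,1),   -- note: deliberately no PRIMARY KEY
    Name varchar(50) NOT NULL
)

INSERT INTO IdentityDemo (Name) VALUES ('first')   -- gets ID = 1
INSERT INTO IdentityDemo (Name) VALUES ('second')  -- gets ID = 2

-- Reset the seed back to 0; the next insert re-uses ID = 1
DBCC CHECKIDENT ('IdentityDemo', RESEED, 0)
INSERT INTO IdentityDemo (Name) VALUES ('third')   -- gets ID = 1 again
```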
Note on "Computed Columns":
Prior to SQL Server 2000, you could not use a computed column as any part of any key (primary, foreign, or unique) or with a default constraint. In SQL Server 2005, you can now use a computed column in constraints (you must flag the computed column as PERSISTED if you do this, however).
Another limitation in earlier versions (remedied in SQL Server 2000) was the inability to create indexes on computed columns. You can create indexes on computed columns, but there are special steps you must take to do so.
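A sketch (hypothetical OrderDetail table): marking the computed column PERSISTED stores the value on disk and lets you use it in constraints and indexes:

```sql
CREATE TABLE OrderDetail
(
    OrderID   int   NOT NULL,
    Quantity  int   NOT NULL,
    UnitPrice money NOT NULL,
    -- PERSISTED stores the computed value rather than
    -- recalculating it on every read
    LineTotal AS (Quantity * UnitPrice) PERSISTED
)

-- The persisted, deterministic computed column can be indexed
CREATE INDEX IX_OrderDetail_LineTotal ON OrderDetail (LineTotal)
```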
For the code that you write directly in T-SQL, SQL Server automatically adjusts for the padded-spaces issue; that is, an 'xx' placed in a char(5) will be treated as equal (if compared) to an 'xx' placed in a varchar(5). This is not, however, true in client APIs such as ADO and ADO.NET. If you connect to a char(5) in ADO, then an 'xx' will come back as 'xx   ' with three spaces after it; if you compare it to 'xx', it will evaluate to False. An 'xx' placed in a varchar(5), however, will automatically have any trailing spaces trimmed, and comparing it to 'xx' in ADO will evaluate to True.
Unlike primary keys, we are not limited to just one foreign key on a table. We can have between 0 and 253 foreign keys in each table. The only limitation is that a given column can reference only one foreign key. However, you can have more than one column participate in a single foreign key. A given column that is the target of a reference by a foreign key can also be referenced by many tables.
By default, you cannot delete a record or update the referenced column in a referenced table if that record is referenced from the dependent table. If you want to be able to delete or update such a record, then you need to set up a CASCADE action for the delete and/or update.
There is something of a gotcha when creating foreign keys that reference the same table the foreign key is being defined on. Foreign keys of this nature are not allowed to have declarative CASCADE actions. The reason for this restriction is to avoid cyclical updates or deletes – that is, situations where the first update causes another, which in turn tries to update the first. The result could be a never-ending loop.
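A sketch of a CASCADE action (hypothetical Customers and Orders tables): deleting a customer also deletes that customer's orders, and updating the key propagates to the dependent rows:

```sql
CREATE TABLE Customers
(
    CustomerID int NOT NULL PRIMARY KEY,
    Name       varchar(50) NOT NULL
)

CREATE TABLE Orders
(
    OrderID    int NOT NULL PRIMARY KEY,
    CustomerID int NOT NULL
        FOREIGN KEY REFERENCES Customers (CustomerID)
        ON DELETE CASCADE   -- deleting a customer deletes its orders
        ON UPDATE CASCADE   -- changing the key updates the orders too
)
```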
Don’t confuse the space that an extent is taking up with the space that a database takes up. Whatever space is allocated to the database is what you’ll see disappear from your disk drive’s available space number. An extent is merely how things are, in turn, allocated within the total space reserved by the database.
Unlike a primary key, a UNIQUE constraint does not automatically prevent you from having a NULL value. Whether NULLs are allowed depends on how you set the NULL option for that column in the table. Keep in mind, however, that if you do allow NULLs, you will be able to insert only one of them (although a NULL doesn't equal another NULL, two NULLs are still considered duplicates from the perspective of a UNIQUE constraint).
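A quick sketch of the single-NULL behavior (hypothetical Shippers table):

```sql
CREATE TABLE Shippers
(
    ShipperID int NOT NULL PRIMARY KEY,
    Phone     varchar(20) NULL UNIQUE   -- NULLable, but unique
)

INSERT INTO Shippers (ShipperID, Phone) VALUES (1, NULL)  -- succeeds
INSERT INTO Shippers (ShipperID, Phone) VALUES (2, NULL)  -- fails:
-- the second NULL counts as a duplicate under the UNIQUE constraint
```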
A table can have a maximum of 1,024 columns.
In SQL Server, we have a number of different collation options available to us. Among these options are:
Dictionary order: This sorts things just as you would expect to see in a dictionary, with a twist: you can set a number of additional options to determine sensitivity to case, accent, and character set.
Once the collation order has been set, changing it is very non-trivial (but possible), so be certain of the collation order you want before you set it.
See if you can spot why the use of EXISTS in the WHERE clause of your queries has so much to offer performance-wise where it fits the problem. When you use the EXISTS operator, SQL Server stops as soon as it finds one record that matches the criteria. If you had a million-record table and it found a matching record on the third record, then the use of the EXISTS option would have saved you the reading of 999,997 records! NOT EXISTS works in much the same way.
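A sketch of the pattern (hypothetical Customers and Orders tables): SQL Server can stop scanning Orders for a given customer as soon as the first matching row is found:

```sql
-- Customers that have at least one order
SELECT c.CustomerID, c.Name
FROM Customers c
WHERE EXISTS
(
    SELECT 1
    FROM Orders o
    WHERE o.CustomerID = c.CustomerID
)
```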
Derived Tables: A derived table is made up of the columns and rows of a result set from a query. Keep in mind that derived tables aren't the solution for everything. For example, if the result set is going to be fairly large and you're going to have lots of joined records, then you may want to look at using a temporary table and building an index on it (derived tables have no indexes). Every situation is different, but now you have one more tool in your arsenal.
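A minimal derived-table sketch (hypothetical Orders table): the inner query's result set is treated as a table and must be given an alias:

```sql
SELECT ct.CustomerID, ct.OrderCount
FROM
(
    -- This result set is the derived table
    SELECT CustomerID, COUNT(*) AS OrderCount
    FROM Orders
    GROUP BY CustomerID
) AS ct               -- a derived table must be aliased
WHERE ct.OrderCount > 5
```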
Needless to say, page splits can have a very negative impact on system performance, and are characterized by behavior where your process on the server seems to just pause for a few seconds (while the pages are being split and rewritten).
The main things to understand about a DEFAULT constraint are that:
If any value is supplied in the INSERT, then the default is not used.
If no value is supplied, the default will always be used.
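For example (hypothetical Widgets table):

```sql
CREATE TABLE Widgets
(
    WidgetID int NOT NULL PRIMARY KEY,
    Status   varchar(10) NOT NULL DEFAULT 'Active'
)

-- No value supplied for Status, so the default is used
INSERT INTO Widgets (WidgetID) VALUES (1)         -- Status = 'Active'

-- A value is supplied, so the default is not used
INSERT INTO Widgets (WidgetID, Status) VALUES (2, 'Retired')
```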
You cannot disable PRIMARY KEY or UNIQUE constraints.
Temporarily disabling an existing constraint: We can run an ALTER statement with an option called NOCHECK that turns off the constraint in question:
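A sketch of the syntax (the table and constraint names are hypothetical):

```sql
-- Turn the constraint off
ALTER TABLE Orders
NOCHECK CONSTRAINT FK_Orders_Customers

-- ...load the data that would otherwise violate the constraint...

-- Turn it back on (WITH CHECK validates the existing rows as well)
ALTER TABLE Orders
WITH CHECK CHECK CONSTRAINT FK_Orders_Customers
```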
Restriction | Pros | Cons
Constraints | Fast. Can reference other columns. Happen before the command occurs. ANSI-compliant. | Must be redefined for each table. Can't reference other tables. Can't be bound to data types.
Rules, Defaults | Independent objects. Reusable. Can be bound to data types. Happen before the command occurs. | Slightly slower. Can't reference across columns. Can't reference other tables. Really meant for backward compatibility only!
Triggers | Ultimate flexibility. Can reference other columns and other tables. Can even use .NET to reference information that is external to your SQL Server. | Happen after the command occurs. High overhead.
The syscomments system table
Batches: A batch is a grouping of T-SQL statements into one logical unit. All of the statements within a batch are combined into one execution plan, so all statements are parsed together and must pass syntax validation or none of the statements will execute. Note, however, that this does not prevent runtime errors from happening. In the event of a runtime error, any statement executed prior to the runtime error will still be in effect. To summarize: if a statement fails at parse time, then nothing runs; if a statement fails at runtime, then all statements up to the statement that generated the error have already run.
For example:
DECLARE @MyVariable varchar(50)
SELECT @MyVariable = 'hi this is a sample test on batches'
PRINT 'Done with the first Batch....'
GO
PRINT @MyVariable -- This generates an error since @MyVariable isn't declared in this batch
PRINT 'Done with the second Batch'
GO
PRINT 'Done with third Batch' -- Notice that this still gets executed even after the error.
GO
Result:
Done with the first Batch....
Msg 137, Level 15, State 2, Line 2
Must declare the scalar variable "@MyVariable".
Done with third Batch
Each batch is combined into one execution plan and sent to the server independently of any other batches.
GO is not a T-SQL command but, rather, a command recognized by the various SQL Server command utilities (OSQL, ISQL, and the Query Analyzer).
If GO is tried in a pass-through query using ODBC, OLE DB, ADO, ADO.NET, or any other access method, you'll get an error message back from the server. GO is merely an indicator to the tool that it is time to end the current batch and, if appropriate, start a new one.
Only place non-clustered indexes on columns where you are going to get a high level of selectivity (that is, 95 percent or more of the rows are unique).
All DML statements (INSERT, DELETE, UPDATE) and SELECT statements can benefit from indexes, but inserts, deletes, and updates (remember, they use a delete-and-insert approach) are slowed by them. The lookup part of a query is helped by the index, but anything that modifies data has extra work to do (maintaining the index in addition to the actual data).
Indexes take up space.
Indexes are used only if the first column in the index is relevant to your query.
Indexes can hurt as much as they help; know why you're building the index, and don't build indexes you don't need.
Indexes can provide structured data performance to your unstructured XML data, but keep in mind that, like other indexes, there is overhead involved.
When you're thinking about indexes, ask yourself these questions:
Question | Response
Are there a lot of inserts or modifications to this table? | If yes, keep indexes to a minimum. This kind of table usually has modifications done through single-record lookups of the primary key; usually, this is the only index you want on the table. If the inserts are non-sequential, think about not having a clustered index.
Is this a reporting table? That is, not many inserts, but reports run lots of different ways? | More indexes are fine. Target the clustered index to frequently used information that is likely to be extracted in ranges. OLAP installations will often have many times the number of indexes seen in an OLTP environment.
Is there a high level of selectivity on the data? | If yes, and it is frequently the target of a WHERE clause, then add that index.
Have I dropped the indexes I no longer need? | If not, why not?
Do I have a maintenance strategy established? | If not, why not?