
SQL Server I/O Basics (Part 2)


Microsoft SQL Server 2005 I/O Error Message Changes and Additions

SQL Server 2005 has more error and message context information than did previous versions. This section outlines the significant I/O error message changes and additions.
Error 823

Error message 823 has been split into different error messages in order to provide improved context. Error message 823 in SQL Server 2005 represents an I/O transfer problem and error message 824 represents logical consistency problems. The 823 error message indicates a serious system error condition requiring the operating system issue to be resolved in order to correct the problem.

The message example shown here is the improved 823 error message text.

The operating system returned error <<OS ERROR>> to SQL Server during a <<Read/Write>> at offset <<PHYSICAL OFFSET>> in file <<FILE NAME>>. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

SQL Server error 823 occurs when any one of the following API calls returns an operating system error.

· ReadFile
· WriteFile
· ReadFileScatter
· WriteFileGather
· GetOverlappedResult

For extended details on the 823 error, see Error message 823 may indicate hardware problems or system problems (http://support.microsoft.com/default.aspx?scid=kb;en-us;828339) on the Microsoft Web site.

During read operations, SQL Server 2005 may perform read retries before recording that an 823 error condition has occurred. For details, see Read Retry later in this paper.
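Both the 823 and 824 message texts direct you to complete a full database consistency check. As a minimal illustration (the database name here is hypothetical), such a check could be run as follows:

DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;
-- NO_INFOMSGS suppresses informational output; ALL_ERRORMSGS reports every error found.

Run the check during a maintenance window when possible, and review the findings together with the SQL Server error log and system event log entries referenced by the 823/824 messages.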
Error 824

The 824 error indicates that a logical consistency error was detected during a read. A logical consistency error is a clear indication of actual damage and frequently indicates data corruption caused by a faulty I/O subsystem component.

The example text of the 824 message is shown here.

SQL Server detected a logical consistency-based I/O error: <<ERROR TYPE DESCRIPTION>>. It occurred during a <<Read/Write>> of page <<PAGE ID>> in database ID <<DBID>> at offset <<PHYSICAL OFFSET>> in file <<FILE NAME>>. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
Error types
The 824 message contains extended details about each specific logical error as outlined in the following table.
Note: An 824 error indicates a serious I/O subsystem stability problem and should be corrected immediately.

Error Type
Description
Checksum
The read resulted in a checksum failure. The checksum stored on the data page does not match the checksum as calculated after the read operation. Data on the page has been damaged and will require a restore to correct it.
Extended Data: “incorrect checksum (expected: ##; actual: ##)”
Contact your hardware manufacturer for assistance.
Torn Page
The read resulted in a torn bits failure. The torn bits stored in the data page header do not match the torn bits stored in the individual sectors following the read operation. Data on the page has been damaged and will require a restore to correct it.
Extended Data: “torn page (expected signature: ##; actual signature: ##)”
Contact your hardware manufacturer for assistance.
Short Transfer
The requested number of bytes was not read. For example, if the read request was for 8 KB but the returned data was only 4 KB, the condition is flagged as a short transfer error. This indicates that the file is damaged or that the I/O subsystem has a severe problem transferring data to and from media.
Extended Data: “insufficient bytes transferred”
Bad Page Id
The page header does not contain the correct value for the expected page ID member. The expected page ID can be calculated using the following formula: (page id = physical offset in file / 8192 bytes). When the expected page is not returned, the bad page ID error is indicated.
Extended Data: “incorrect pageid (expected ##:##; actual ##:##)”
This is frequently a condition where the I/O subsystem returns incorrect data during the read. Microsoft SQL Server Support investigations of these cases typically reveal that the I/O subsystem is returning data from the wrong offset in the file or that the page contains all zeros. Contact your hardware manufacturer for assistance.
Restore Pending
By using SQL Server 2005 Enterprise Edition, a single page restore can be performed to correct a corrupt page. If a page is damaged, it is marked as a bad page and any attempt to access it returns an 824 error. This indicates that a restore is required to correct the damaged page before it can be accessed.
Extended Data: “Database ID <<DBID>>, Page <<PAGEID>> is marked RestorePending. This may indicate disk corruption. To recover from this state, perform a restore.”
Stale Read
For details about stale read errors, see Stale Read Protection later in this paper. The behavior is controlled with trace flag –T818.
Briefly, if a page has been recently written to disk and is still stored in the stale read hash table, the Log Sequence Number (LSN) stored in the hash table is compared to the LSN in the page header. If they do not match then the page is flagged as incorrect.
Example message: “stale page (a page read returned a log sequence number (LSN) (##:##:##) that is older than the last one that was written (##:##:##))”
Page Audit Failure
When trace flag –T806 is enabled, a DBCC audit is performed on the page to test for logical consistency problems. If the audit fails, the read is considered to have experienced an error.
Extended Data: “audit failure (a page read from disk failed to pass basic integrity checks)”
Page auditing can affect performance and should only be used in systems where data stability is in question.
Error 832

Error message 832 is returned when the in-memory checksum audit fails. For details about the in-memory checksum design, see Checksum in the Microsoft SQL Server 2005 Enhancements section in this document.

Following is an example of the text of the 832 error.

A page that should have been constant has changed (expected checksum: <<VALUE>>, actual checksum: <<VALUE>>, database <<DBID>>, file <<FILE>>, page <<PAGE>>). This usually indicates a memory failure or other hardware or OS corruption.

The 832 message indicates a serious process stability problem, such as a scribbler, that could lead to data corruption and loss.
Error 833

SQL Server 2000 SP4 and SQL Server 2005 include stalled I/O warnings as described later in this document. The following is an example of the 833 text which is written in the SQL Server error log.

SQL Server has encountered <<##>> occurrence(s) of I/O requests taking longer than <<##>> seconds to complete on file [<<FILE>>] in database [<<DB NAME>>] (<<DBID>>). The OS file handle is <<HANDLE>>. The offset of the latest long I/O is: <<PHYSICAL OFFSET>>

The 833 message indicates an I/O is hung, or is just taking a long time. This is likely an I/O subsystem problem. The information in the message can be used by Microsoft Platforms Support or your I/O subsystem vendor to trace the specific IRP and determine the root cause.

The following are a few reasons this error may be encountered.

· Malfunctioning virus protection
· Heavy use of compression
· Network unresponsiveness
· Dual I/O path software malfunctions

Microsoft SQL Server 2005 Enhancements

The following section outlines the core I/O enhancements made in SQL Server 2005.

Checksum

SQL Server 2005 introduces the ability to checksum data pages, log blocks, and backups. For details on checksum capabilities and usage, see the ALTER DATABASE topic in the PAGE_VERIFY section in SQL Server 2005 Books Online.

The expansion of hardware capabilities along with the increased use of virus protection, caching mechanisms, and other advanced filter drivers increases the complexity of the I/O subsystem and expands the point-of-failure possibilities. Microsoft SQL Server 2005 and Microsoft Exchange Server products provide checksum capabilities to enhance data protection.

The checksum algorithm used by SQL Server 2005 is the same algorithm used by Microsoft Exchange Server. The SQL Server algorithm has an additional rotation to detect sector swaps.

Microsoft Exchange Server introduced checksum capabilities several years ago with great success. Search the Microsoft Knowledge Base for more information about error message -1018, which indicates a checksum failure for the Exchange Server product. The following is an excerpt from the Exchange Server Knowledge Base article KB151789.

"When you perform a transaction with the Jet database, the information store or the directory store writes the transaction to a transaction log file (Edb*.log in Mdbdata or Dsadata). The transaction is then committed to the Jet database. During this process, the Jet engine calculates the page's checksum value to be written, records it in the page header, and then requests that the file system writes the 4-KB page of data to the database on disk.

Even after you restore from a known good backup, however, the -1018 errors may appear again unless the root causes of the physical data write problems are resolved."

The checksum algorithm is not an ECC or CRC32 implementation but a much less CPU-intensive calculation that avoids affecting database throughput.

The data page and log throughput effects are limited by the buffer pool caching and read-ahead designs. This enables the writes and reads to be done out-of-critical-band when it is possible.

Writes

SQL Server data pages are typically written to disk by the checkpoint or lazy writer processing.

· SQL Server determines when to run checkpoint activity based on the sp_configure 'recovery interval' goal and the amount of log space currently being used.
· SQL Server 2005 determines when to write dirty pages from the buffer pool cache based on memory pressure and the time of last access of the page.

Checksums are calculated immediately before the data page or log block is written to disk. SQL Server tries to perform writes in groups and in a background manner whenever possible to avoid directly affecting user queries. The caching of data pages and grouping of log records helps remove much, if not all, of the command latency associated with a write operation. As described, the checksum calculation activity can frequently be done out-of-band from the original request, thereby reducing any direct effect checksum may add to the write.

Note: The model database is checksum (page audit) enabled. Therefore, all new databases created in SQL Server 2005 are checksum enabled to maximize data protection.
Reads

When a page or log block is read from disk, the checksum (page audit) value is calculated and compared to the checksum value that was stored on the page or log block. If the values do not match, the data is considered to be damaged and an error message is generated.

SQL Server uses read-ahead logic to avoid query stalls caused by I/O waits. The read-ahead design tries to keep the physical reads and checksum comparisons out of the critical path of the active query, decreasing the performance effects of checksum activity.

Damage

The checksum is designed to detect whether one or more bits of the data unexpectedly changed; it was not designed to correct problems.

The checksum is calculated immediately before the write to disk and verified immediately after the physical read from disk. If damage is detected, this indicates a serious I/O subsystem data integrity problem and the I/O subsystem should be thoroughly checked for problems. A failure indicates that data being written to and retrieved from stable media did not maintain its integrity.

Disk drives, caches, filter drivers, memory, CPUs, and other components should be reviewed in complete detail if the system reports checksum failures. Be cautious of power outages as well.
PAGE_VERIFY usage

The ALTER DATABASE command is used to change the database's PAGE_VERIFY protection setting. There are three possible settings: NONE, CHECKSUM, and TORN_PAGE_DETECTION. The database maintains the verification setting. A status value in each page header indicates the type of protection and the verification values stored when the data was written to stable media.
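As a minimal sketch (the database name is hypothetical), the option can be set and then confirmed from the sys.databases catalog view:

ALTER DATABASE YourDatabase SET PAGE_VERIFY CHECKSUM;

SELECT name, page_verify_option_desc
FROM sys.databases
WHERE name = N'YourDatabase';   -- expect CHECKSUM after the change

Remember that existing pages keep the protection they were written with; only pages written after the change carry the new verification value.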
Similar checksumming activity occurs for log block writes and reads when CHECKSUM protection is enabled. Log writes and reads always use a parity bit design (torn protection) to mark the valid blocks in the log. An additional checksum of the log block is new and is applied only when the database checksum verification option is enabled.

The following table outlines the verification actions SQL Server 2005 performs based on the database's PAGE_VERIFY option and the page's status value, which is located in the page header. Some of the actions in this table might not seem correct because the page's status value on a read appears to override the database's current setting. However, on a read the possible verify action is determined from the page header status and not from the current database setting.

For example, a checksum cannot be checked on the read if the checksum wasn't calculated and stored during the write of the page.

Page Header Setting: NONE

Actions Before Write:
The status of the page header is set to NONE for page verify protection. This maximizes performance but provides NO physical integrity protection beyond that provided by the I/O subsystem itself. This is not a recommended setting and should be used with caution. Backup plans are especially important for databases that are set to the page verify option of NONE.

Actions After Read:
The page was not stored with any protection values, so no verification occurs during a read.

Page Header Status = NONE
Database's PAGE_VERIFY Setting    Protection Check
NONE                              NONE
TORN                              NONE
CHECKSUM                          NONE

Page Header Setting: CHECKSUM

Actions Before Write:
The checksum formula is applied to the 8-KB data page. The page header checksum value is updated and the page header status is set to CHECKSUM. The page is then written to stable media. Checksum protection uses the most CPU cycles of the three options because it must examine all bits on the page. However, the algorithm has been tuned and the resulting effect is minimal. Checksum is the default database setting in SQL Server 2005.

Actions After Read:
If a page is read that was written with either checksum or torn page protection, verification occurs for the type of protection indicated in the page header.

Page Header Status = CHECKSUM
Database's PAGE_VERIFY Setting    Protection Check
NONE                              NONE
TORN                              CHECKSUM
CHECKSUM                          CHECKSUM

Page Header Setting: TORN

Actions Before Write:
The TORN page protection is established by writing a 2-bit value in the lowest order 2 bits of each 512-byte sector of the page. The page header is updated with the torn bit tracking information and the page header's verify status is set to TORN. The page is then written to disk. Because the TORN protection uses only 2 bits in each sector of the 8-KB page, it requires fewer CPU cycles but provides far less protection than checksum.

Actions After Read:
If a page is read that was written with either checksum or torn page protection, verification occurs for the type of protection indicated in the page header.

Page Header Status = TORN
Database's PAGE_VERIFY Setting    Protection Check
NONE                              NONE
TORN                              TORN
CHECKSUM                          TORN

Note: SQL Server does not rewrite all database pages in response to an ALTER DATABASE PAGE_VERIFY change. The PAGE_VERIFY option can be changed over time, and pages written to disk will reflect the option that was in effect at the time they were written. Therefore, the database can have pages in any one of the three available verification states.

There is no single command that establishes a PAGE_VERIFY option and applies it to all pages of the database. This includes backup and restore.

· Backup and restore operations maintain the same physical data integrity as the original database. Backup and restore operations do provide a checksum option, but this is different from the PAGE_VERIFY option.
· Rebuilding clustered indexes in the database can dirty most of the data and index pages and achieve broad page protection establishment. However, heaps, text/image, stored procedures, stored assemblies, and others are not dirtied by a clustered index rebuild operation.
· Only reuse of the transaction log blocks with the appropriate protection can apply the specified protection to the log blocks.

The only way to make sure that all user data pages contain the desired page verification protection is to copy all data, at the row level, to a new database that was created by using the appropriate page verification option.
In-memory checksums

SQL Server 2005 extends the protection of data pages by extending PAGE_VERIFY CHECKSUM to allow for in-memory checksumming. There are limited situations for which this is helpful, such as in-memory scribblers, uncertainty about page file stability, and uncertainty about RAM memory stability.

The same checksum algorithm used by PAGE_VERIFY CHECKSUM is used for the in-memory data page checksum activity. Those pages that have been written to stable media with a CHECKSUM status are eligible for in-memory checksumming if the dynamic trace flag –T831 is enabled. The data page must have received the initial checksum value during a write to stable media to participate in the in-memory checksum operations.
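As a minimal sketch, the dynamic trace flag could be enabled server-wide with DBCC TRACEON (the -1 argument applies the flag to all sessions); this assumes the database is already using PAGE_VERIFY CHECKSUM:

DBCC TRACEON (831, -1);      -- enable in-memory checksum auditing
DBCC TRACESTATUS (831);      -- confirm the flag is active
-- DBCC TRACEOFF (831, -1);  -- disable it again when the investigation is complete

As with the other diagnostic trace flags described in this paper, enable it only while investigating suspected memory or scribbler damage, because the additional auditing has a performance cost.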
To reduce the performance effect, the in-memory checksum is only audited during certain page state transitions. The key page states that trigger in-memory checksumming are outlined after the table. The following table describes the states that a page can be in.

Page State
State Description
Dirty
The data page is considered to be dirty when the page has been modified and has not been written to stable media. As soon as a dirty page is saved (written) to stable media, it is considered to be clean.
Clean
The data page is considered to be clean or a constant page when it is the same image as that stored on stable media.
In-memory checksum auditing occurs on a data page when the following conditions are true:

· The page was written to disk when PAGE_VERIFY CHECKSUM was enabled.
· The PAGE_VERIFY CHECKSUM option is enabled for the database.
· Trace flag –T831 is enabled.

The PAGE_VERIFY actions continue to occur during the read and write of data pages. The in-memory checksumming occurs when –T831 is enabled. The following table outlines the checksum actions that occur when the database is correctly enabled for CHECKSUM protection, the page contains a valid checksum, and trace flag –T831 is enabled.

Action: Page Read

· Physical Read: Checksum is validated as soon as the read finishes; the checksum is retained for in-memory validation.
· Logical Read: No in-memory checksum auditing occurs.

Action: Page Modification Request

· Dirty: As soon as a page has been dirtied, the checksum is no longer maintained. The checksum will be recalculated when the page is written to stable media. No in-memory checksum auditing occurs.
· Clean: The transition from clean to dirty triggers in-memory checksum validation. A failure during this transition indicates that the page was damaged during a period in which it was considered to be read-only.

Action: Discard
A page is termed 'discarded' when it is returned to the free list.

· Dirty: A dirty page cannot be discarded. The page must first be written to stable media and returned to a clean state before it can be discarded.
· Clean: The act of discarding a clean page triggers in-memory checksum validation. A failure during this transition indicates that the page was damaged during a period in which it was considered to be read-only.
Note: Pages that are never modified (never dirtied) remain in the clean state until they are discarded at which time the checksum is validated for the constant page.
For added, always-on protection, the lazy writer performs clean (constant) buffer checksum validations. This is always on and does not require that –T831 be enabled. Every second the lazy writer updates the buffer pool performance counters and performs various housekeeping activities. During this housekeeping, the lazy writer sweeps over 16 buffers. When the lazy writer finds a clean buffer with a valid checksum, it validates the checksum. If a failure is detected, an 832 error message is logged. This is used as a low-impact, background, in-memory checksum audit activity. Pages that remain in a clean (constant) state for lengthy periods enable the lazy writer audit to catch unexpected damage before the page is discarded.

If the audit check fails, SQL Server error message 832 is reported to indicate that the error condition was detected. If you are encountering in-memory checksum failures, perform the following diagnostics.

· Test backups to make sure that the restore strategy remains correctly intact.
· Perform full hardware testing, focusing specifically on memory components.
· Review any third-party products installed on the system or that are used in the SQL Server process space. Third-party components could scribble and cause problems. Such components could be COM objects, extended stored procedures, Linked Servers, or other entities.
· Make sure that all operating system fixes are applied to the server.
· Make sure that any virus protection is up to date and the system is free of viruses.
· Review the location of the page file for SQL Server I/O compliance requirements.
· Enable latch enforcement as described later in this document to help isolate the source of the damage.
· Try to use the same input buffers or replay a SQL Server Profiler trace to reproduce the problem. If a reproduction is obtained, can it be reproduced on another computer? If it can be reproduced, contact Microsoft SQL Server Support for additional assistance.

Latch enforcement

SQL Server 2000 and SQL Server 2005 can perform latch enforcement for database pages located in the buffer pool cache. Latch enforcement changes the virtual memory protection (VirtualProtect) as the database pages are transitioned between the clean and dirty states. The following table outlines the virtual protection states.

Page State
Virtual Protection State
Dirty
Read Write during the modification.
Clean
Read Only; any attempt to modify the page when this protection is set (termed a scribbler) causes a handled exception, generating a mini-dump for additional investigation.
The database page remains in the virtual protection state of Read Only until the modification latch is acquired. When the modification latch is acquired, the page protection is changed to Read Write. As soon as the modification latch is released, the page protection is returned to Read Only.

Note: The default latch enforcement protection setting is disabled. Latch enforcement may be enabled with trace flag –T815. SQL Server 2000 SP4 and SQL Server 2005 allow the trace flag to be enabled and disabled without a restart of the SQL Server process by using the DBCC traceon(815, -1) and DBCC traceoff(815, -1) commands. Earlier versions of SQL Server require the trace flag as a startup parameter.

Note: The trace flag should only be used under the direction of Microsoft SQL Server Support as it can have significant performance ramifications, and virtual protection changes may not be supported on certain operating system versions when you are using PAE/AWE.

Note: Windows extended support for VirtualProtect in Windows Server 2003 SP1 and Windows XP SP2 to allow virtual protection of AWE-allocated memory. This is a very powerful change but could affect the performance of SQL Server if it is configured to use AWE or locked pages memory, due to the extended protection capabilities.

Latch enforcement applies only to database pages. Other memory regions remain unchanged and are not protected by latch enforcement actions. For example, a TDS output buffer, a query plan, and any other memory structures remain unprotected by latch enforcement.

To perform a modification, SQL Server must update the page protection of a database page to Read Write. The latch is used to maintain physical stability of the database page, so the modification latch is only held long enough to make the physical change on the page. If the page is damaged during this window (scribbled on), latch enforcement will not trigger an exception.

In versions earlier than SQL Server 2000 SP4, SQL Server latch enforcement protection involved more protection transitions. The following table outlines the protection transitions performed by SQL Server versions earlier than SQL Server 2000 SP4.

Page State
Virtual Protection State
Dirty
Read Write during the modification.
Clean No References
No Access; any attempt to read or write from the page causes an exception.
Clean With References
Read Only; any attempt to modify the page when this protection is set (termed a ‘scribbler’) causes a handled exception, generating a mini-dump for additional investigation.
Because virtual protection transitions are expensive, SQL Server 2000 SP4 and SQL Server 2005 no longer transition the page protection to No Access, thereby reducing the number of transitions significantly. The older implementation could raise an exception for an invalid read try where the newer implementations cannot. The overhead of No Access protection transitions frequently made latch enforcement too heavy for use in a production environment. Leaving the page with Read Only access reduces the number of protection changes significantly and still helps in the identification of a scribbler.

SQL Server does not return all data pages to Read Write protection as soon as the trace flag is disabled. The pages are returned to Read Write protection as they are modified, so it may take some time to return to a fully non-latch-enforced buffer pool.

Checksum on backup and restore

SQL Server 2005 BACKUP and RESTORE statements provide the CHECKSUM option to include checksum protection on the backup stream and trigger the matching validation operations during restore. To achieve a checksum-enabled backup, the BACKUP command must include the CHECKSUM option.

The backup and restore processes try to work with large blocks of data whenever possible. For example, the backup operation examines the allocation bitmaps in the database to determine what data pages to stream to the backup media. As soon as a block of data is identified, the backup operation issues a large 64 KB to 1 MB read from the data file and a matching write operation to the backup stream. The backup operation avoids touching individual bytes of the data pages or log blocks to maximize its throughput as a high-speed copy implementation.

Backup and restore operations that use checksum capabilities increase data integrity protection and also increase CPU usage requirements. A backup or restore with the checksum option requires that each byte be interrogated as it is streamed, thereby increasing CPU usage. The checksum that is used for backup and restore uses the same algorithm to calculate the checksum value for the backup media as is used for data pages and log blocks.

The following rules apply to the BACKUP and RESTORE command CHECKSUM operations. An example follows the list.

· By default, SQL Server 2005 BACKUP and RESTORE operations maintain backward compatibility (NO_CHECKSUM is the default).
· The database's PAGE_VERIFY setting has no effect on backup and restore operations; only the CHECKSUM setting on the backup or restore command is relevant.
· The backup and restore checksum is a single value representing the checksum of the complete stream; it does not represent individual pages or log blocks located in the backup stream. The value is calculated during the backup and stored with the backup. The value is recalculated during the restore and checked against the stored value.
· Backup with the CHECKSUM option will not change the pages as it saves them to the backup media; a page's protection state (NONE, CHECKSUM, or TORN) is maintained as read from the database file. If a checksum was already stored on the data page, it is verified before the page is written to the backup stream.
· Restore and Verify commands can be used to validate the CHECKSUM if the backup was created by using the CHECKSUM option. Trying to restore with the CHECKSUM option on a backup without a checksum returns an error.

For more information on backup and restore, see SQL Server 2005 Books Online.
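As a minimal sketch (database name and file path are hypothetical), a checksum-enabled backup and a checksum-validated verification could look like this:

BACKUP DATABASE YourDatabase
TO DISK = N'D:\Backups\YourDatabase.bak'
WITH CHECKSUM;    -- calculate and store the backup stream checksum, verifying existing page checksums as they are read

RESTORE VERIFYONLY
FROM DISK = N'D:\Backups\YourDatabase.bak'
WITH CHECKSUM;    -- recalculate the stream checksum and compare it to the stored value

RESTORE ... WITH CHECKSUM on a backup that was taken without the CHECKSUM option returns an error, as noted in the list above.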
Page-level restore

SQL Server 2005 Enterprise Edition introduces page-level restore to repair damaged pages. The database can restore a single page from backup instead of requiring a full database, filegroup, or file restore. For complete details, see SQL Server 2005 Books Online.
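A hedged sketch of the syntax, with hypothetical database name, page address (file 1, page 57), and backup path:

RESTORE DATABASE YourDatabase
PAGE = '1:57'                               -- fileid:pageid of the damaged page, for example from an 824 error
FROM DISK = N'D:\Backups\YourDatabase.bak'
WITH NORECOVERY;

-- After the page restore, subsequent log backups are restored and the database is recovered;
-- see the page restore topic in SQL Server 2005 Books Online for the full sequence.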
Database available during Undo phase

SQL Server 2005 Enterprise Edition enables access to the database as soon as the Redo phase of recovery is finished. Locking mechanisms are used to protect the rollback operations during the Undo phase. To reduce downtime, page-level restore can be combined with the crash recovery capability of enabling access to the database during the Undo phase of recovery.

Torn page protection

Torn page protection has not significantly changed from SQL Server 7.0 and SQL Server 2000. This section provides details on torn page protection and how it works to help you compare TORN protection and CHECKSUM protection. A torn page commonly indicates that one or more sectors have been damaged.

Common reasons

Following are some common problems found by Microsoft SQL Server Support that cause TORN page error conditions.

· The subsystem or hardware does not handle the data correctly and returns a mix of sector versions. This has been reported on various controllers and firmware because of hardware read-ahead cache issues.
· Power outages occur.
· Bit flips or other damage occurs on the page header. This can make it appear that TORN page detection was enabled when it really was not.

Implementation

Torn page protection toggles a two-bit pattern between 01 and 10 every time the page is written to disk. Write A obtains bit protection of 01 and write B obtains bit protection of 10. Then write C obtains 01 and so on. The low order (last) two bits of each 512-byte sector are stored in the page header and replaced with the torn bit pattern of 01 or 10. The relevant members of the SQL Server data page header are shown in the following list together with a TORN bit layout diagram.

Member
Description
m_flagBits
Bit field where TORN, CHECKSUM or NONE is indicated.
m_tornBits
Contains the TORN or CHECKSUM validation value(s).


Figure 1: TORN bit layout diagram
The torn page toggle bits are established as 01 or 10 and positioned in the low order 2 bits of the m_tornBits value. For the remaining 15 sectors (512 bytes each), the low order 2 bits of each sector are positioned in incrementing bit positions of the m_tornBits and the established bit pattern is stored in their location. Following are the steps shown in the previous diagram.

Step #1: The original sector bit values are stored in the m_tornBits from low order to high order (like a bit array), incrementing the bit storage positions as sector values are stored.
Step #2: The established torn bit pattern is stored in the low order two bits of each sector, replacing the original values.

When the page is read from disk and PAGE_VERIFY protection is enabled for the database, the torn bits are audited.

Step #1: The low order bits of the m_tornBits are checked for the pattern of either 10 or 01 to make sure that the header is not damaged.
Step #2: The low order two bits in each sector are checked for the matching torn bit pattern as stored in the low order two bits of m_tornBits. If either of these checks fails, the page is considered TORN. In SQL Server 2000 this returns an 823 error and in SQL Server 2005 it returns an 824 error.
Step #3:
SQL Server 2000: Replaces the original values as each sector is checked, even if an error was detected. This makes it difficult to investigate which sector was torn.
SQL Server 2005: Enhances troubleshooting by leaving the bits unchanged when an error is detected. Investigate the page data to better determine the torn footprint condition.

Stale read protection

Stale reads have become a problem that is frequently reported to Microsoft SQL Server Support. A stale read occurs when a physical read returns an old, fully point-in-time consistent page so that it does not trigger TORN or CHECKSUM audit failures; instead, the read operation returns a previous data image of the page. This is also called a lost write because the most recent data written to stable media is not presented to the read operation.

A common cause of stale reads and lost writes is a component such as a hardware read-ahead cache that is incorrectly returning older cached data instead of the last write information.

This condition indicates serious I/O subsystem problems leading to page linkage corruption, page allocation corruption, logical or physical data loss, crash recovery failures, log restore failures, and a variety of other data integrity and stability issues.

In a SQL Server 2000 SP3-based hot fix (build 8.00.0847), stale read detection was added. This addition is outlined in the Microsoft Knowledge Base article, PRB: Additional SQL Server Diagnostics Added to Detect Unreported I/O Problems (http://support.microsoft.com/default.aspx?scid=kb;en-us;826433).
Enhancements
By changing from a ring buffer to a hash table design, SQL Server 2000 SP4 and SQL Server 2005 provide enhanced, low-overhead stale read checking. The original SQL Server 2000 SP3 implementation only checks for a stale condition if another error was found first (605, 823, and so forth). The hash table design, used in newer builds, allows the page read sanity checking to include a stale read check for any page that is read when trace flag –T818 is enabled, without a noticeable performance effect.

For SQL Server 2005, every time a page is written to disk, a hash table entry is inserted or updated with the DBID, PAGEID, RECOVERY UNIT, and LSN that is being flushed to stable media. When a read is complete, the hash table is searched for a matching entry. The matching DBID and PAGEID entry is located. The hash table LSN value is compared to the LSN value that is stored in the page header. The LSN values must match or the page is considered damaged. Thus, if the most recent LSN that was written was not returned during the subsequent read operation, the page is considered damaged.

To maintain a high level of I/O performance and limit the memory footprint, the hash table size is bounded. It tracks only the recent window of data page writes. The number of I/Os tracked varies between the 32-bit and 64-bit versions of SQL Server 2000 SP4 and SQL Server 2005. To optimize speed, each hash bucket and its associated entries are designed to fit in a single CPU cache line, thereby limiting the hash chain length to five entries for each bucket. In 32-bit installations, the total size of the hash table is limited to 64 KB (equating to 2,560 total entries = a 20-MB window of data) and on 64-bit installations to 1 MB (equating to 40,960 total entries = a 320-MB window of data).

The size restriction is based on the testing of known bugs that caused stale reads or lost writes. The bug characteristics typically involve a hardware memory cache that held the older page data and a read operation that immediately followed or overlapped the write operation.

Stalled I/O detection

Database engine performance can be highly affected by the underlying I/O subsystem performance. Stalls or delays in the I/O subsystem can cause reduced concurrency of your SQL Server applications. Microsoft SQL Server Support has experienced an increase in I/O subsystem delays and stall conditions resulting in decreased SQL Server performance capabilities.

For a SQL Server 2000 installation, an I/O stall or delay is frequently detected by watching sysprocesses for I/O-based log and/or buffer (data page) wait conditions. Whereas small waits might be expected, some filter drivers or hardware issues have caused waits of 30+ seconds to occur, causing severe performance problems for SQL Server-based applications.

Starting with SQL Server 2000 SP4 and SQL Server 2005, SQL Server monitors and detects stalled I/O conditions that exceed 15 seconds in duration for data page and log operations. The following Microsoft Knowledge Base article describes the SQL Server 2000 SP4 implementation: SQL Server 2000 SP4 diagnostics help detect stalled and stuck I/O operations (http://support.microsoft.com/default.aspx?scid=kb;en-us;897284).

SQL Server 2000 SP4 and SQL Server 2005 also increase the visibility of latch operations. A latch is used to guarantee the physical stability of the data page when a read from or a write to stable media is in progress. With the increased latch visibility change, customers are frequently surprised after they apply SQL Server 2000 SP4 when a SPID appears to block itself. The following article describes how the latch information displayed in sysprocesses can be used to determine I/O stall conditions as well as how a SPID can appear to block itself: The blocked column in the sysprocesses table is populated for latch waits after you install SQL Server 2000 SP4 (http://support.microsoft.com/kb/906344/en-us).

SQL Server 2005 contains the stalled I/O monitoring and detection. The stalled I/O warning activity is logged when a stall of 15 seconds or longer is detected. Additionally, latch time-out error messages have been extended to clearly indicate that the buffer is in I/O. A latch time-out indicates that the I/O has been stalled for 300 seconds (five minutes) or more.

There is a clear difference between reporting and recording. Reporting only occurs in intervals of five minutes or longer when a new I/O action occurs on the file. Any worker posting an I/O examines the specific file for reporting needs. If an I/O has been recorded as stalled and five minutes has elapsed from the last report, a new report is logged to the SQL Server error log.

Recording occurs in the I/O completion routines, and the lazy writer checks all pending I/Os for stall conditions. Recording occurs when an I/O request is pending (FALSE == HasOverlappedIoCompleted) and 15 seconds or longer has elapsed.

Note: A FALSE return value from a HasOverlappedIoCompleted call indicates that the operating system or I/O subsystem has not completed the I/O request.

sys.dm_io_pending_io_requests (DMV)

SQL Server 2005 provides dynamic access to pending I/O information so that a database administrator can determine the specific database, file, and offset leading to a stall. The dynamic management view (DMV) sys.dm_io_pending_io_requests contains details about the offset and status of each outstanding I/O request. This information can be used by Microsoft Platforms Support and various utilities to track down the root cause. For more information, go to http://support.microsoft.com and search for information related to IRP and ETW event tracing.

The io_pending column is a specific key for evaluating the result set. The io_pending column indicates whether the I/O request is still pending or whether the operating system and I/O subsystem have completed it. The value is determined by using a call to HasOverlappedIoCompleted to determine the status of the I/O request. The following table outlines the returned value possibilities for io_pending.

io_pending Value
Description
TRUE
Indicates the asynchronous I/O request is not finished. SQL Server is unable to perform additional actions against the data range until the operating system and I/O subsystem complete the I/O request.
To learn more about pending I/O conditions, see HasOverlappedIoCompleted in the SDK documentation.
Lengthy asynchronous I/O conditions typically indicate a core I/O subsystem problem that should be addressed to return SQL Server to ordinary operating conditions.
FALSE
Indicates the I/O request is ready for additional processing actions by SQL Server.
If the pending time of the I/O continues to climb, the issue may be a SQL Server scheduling problem. For a discussion of SQL Server scheduler health and associated troubleshooting, see the following white paper.
How to Diagnose and Correct Errors 17883, 17884, 17887, and 17888 (http://download.microsoft.com/download/4/f/8/4f8f2dc9-a9a7-4b68-98cb-163482c95e0b/DiagandCorrectErrs.doc)

The io_pending_ms_ticks column is the elapsed time, in milliseconds (ms), since the I/O request was posted to the operating system.

The io_handle column is the file HANDLE that the I/O request is associated with. This column can be joined to the dynamic management function (DMF) sys.dm_io_virtual_file_stats column file_handle to obtain the specific file and database association for the I/O. The following is an example of a query to obtain this information.

SELECT fileInfo.*, pending.*
FROM sys.dm_io_pending_io_requests AS pending
INNER JOIN (SELECT * FROM sys.dm_io_virtual_file_stats(-1, -1)) AS fileInfo
    ON fileInfo.file_handle = pending.io_handle;

This join can be additionally enhanced by adding information such as the database name or by using the offset to calculate the actual PAGEID (PAGEID = offset / 8192). A sketch of such an enhanced query follows.
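The following variation is a sketch only; it assumes the io_offset column of sys.dm_io_pending_io_requests and uses DB_NAME() to resolve the database name:

SELECT DB_NAME(fileInfo.database_id) AS database_name,
       fileInfo.file_id,
       pending.io_pending,
       pending.io_pending_ms_ticks,
       pending.io_offset / 8192 AS page_id        -- PAGEID = offset / 8192
FROM sys.dm_io_pending_io_requests AS pending
INNER JOIN (SELECT * FROM sys.dm_io_virtual_file_stats(-1, -1)) AS fileInfo
    ON fileInfo.file_handle = pending.io_handle;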
WARNING: DMVs and DMFs access core system structures to produce the result set. Internal structures must be accessed with thread safety, which may have performance ramifications. The use of DMVs that access core SQL Server components should be limited to avoid possible performance effects.
Read retry

SQL Server 2005 extends the use of read retry logic for data pages to increase read consistency possibilities. A read retry involves performing exactly the same read operation immediately following a read failure in an attempt to successfully complete the read.

Microsoft has successfully used read retries to compensate for intermittent failures of an I/O subsystem. Read retries can mask data corruption issues in the I/O subsystem and should be investigated carefully to determine their root cause. For example, a disk drive that is going bad may intermittently return invalid data. Re-reading the same data may succeed and the read retry has provided runtime consistency. However, this is a clear indicator that the drive is under duress and should be examined carefully to avoid a critical data loss condition.

The Microsoft Exchange Server product added read retry logic and has experienced improved read consistency. This section outlines how and when SQL Server 2000 and SQL Server 2005 perform read retry operations.

Resource-based retries

SQL Server 2000 performs read retries only when starting the read operation fails with an operating system error of ERROR_WORKING_SET_QUOTA (1453) or ERROR_NO_SYSTEM_RESOURCES (1450). Unlike the SQL Server 2005 enhancements, SQL Server 2000 does not attempt any other form of read retry other than sort retries.

When the error occurs, the SQL Server worker yields for 100ms and tries the read operation again. This loop continues until the I/O is successfully issued. A simplified example demonstrates this behavior.

// Simplified retry loop: keep retrying the read while it fails to start
// with a resource error. 1450 = ERROR_NO_SYSTEM_RESOURCES,
// 1453 = ERROR_WORKING_SET_QUOTA.
WHILE (   FALSE == ReadFile()
       && (1450 == GetLastError() || 1453 == GetLastError()))
{
    Yield(100);   // yield the worker for 100ms, then retry
}

SQL Server 2005 maintains the same logic when it is trying to start a read operation.

Sort retries

SQL Server 7.0, 2000, and 2005 have sort-based retry logic. These frequently appear as 'BobMgr' entries in the SQL Server error log. When a read of a spooled sort buffer from tempdb fails, SQL Server tries the read again. The retries are only attempted several times before they are considered to be fatal to the sort. Sort retries should be considered a serious I/O stability problem. To correct the problem, try moving tempdb to a different location.

Other read failure retries

SQL Server 2005 has extended the read retry logic to read failures that occur after the read was successfully started. When ReadFile returns TRUE, this indicates that the operating system accepted the application's request to read from the file. If a subsequent failure is experienced, the result is a SQL Server error such as an 823 or 824.

SQL Server 2005 allows read retry operations to continuously occur when the read finishes with an error caused by a resource shortage. For all non-resource shortage conditions, four (4) more retries may be attempted.

Each successive retry yields before it tries the read operation again. The yield is based on the following formula: (yield time = retry attempt * 250ms). If the error condition cannot be resolved within four retries (five total times: one initial read and four retries), an 823 or 824 error is reported. SQL Server 2005 saves the original error condition details, such as a checksum failure. It also includes the original details with the error report in the SQL Server error log.

If the retry succeeds, an informational message is added to the SQL Server error log. This indicates that a retry occurred. The following is an example of the message.

"A read of the file <<FILE NAME>> at offset <<PHYSICAL OFFSET>> succeeded after failing <<RETRY COUNT>> time(s) with error: <<DETAILED ERROR INFORMATION>>. Additional messages in the SQL Server error log and system event log may provide more detail. This error condition threatens database integrity and must be corrected. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online."

Read retry problems are a serious data stability problem, as the I/O subsystem is returning incorrect data to SQL Server. This condition is likely to cause a fatal SQL Server error or even a system-wide failure. Additionally, the retry activity has a performance effect on SQL Server operations, because as soon as a read error is detected, the worker performs the retries until it succeeds or the retry limit is exhausted.

Page audit

SQL Server 2000 SP4 and SQL Server 2005 include additional page audit capabilities. Enabling the dynamic trace flag -T806 causes all physical page reads to run the fundamental DBCC page audit against the page as soon as the read is complete. This check is performed at the same point as the PAGE_AUDIT and other logical page checks are performed.

This is another way to locate data page corruption in areas of the page other than the basic page header fields, which are always checked during the physical read. For example, when trace flag –T806 is enabled, the row layout is audited for appropriate consistency.

Page audit was first added in SQL Server 2000 SP3, hot fix build 8.00.0937. For more information, see the following Microsoft Knowledge Base article: FIX: Additional diagnostics have been added to SQL Server 2000 to detect unreported read operation failures (http://support.microsoft.com/kb/841776/en-us).

Note: Enabling page audit capabilities can increase the CPU load on the server and decrease overall SQL Server performance.

SQL Server 2005 introduces checksum protection, which generally supersedes page audit capabilities. Checksum protection ensures that every bit on the page is the same as that written to stable media.

For SQL Server 2005 installations, checksum is often a better solution than constant data integrity auditing. However, page audit can help catch corruption that was stored to stable media even though physical page consistency was not compromised. Microsoft SQL Server Support has encountered an example of this. In that instance, a third-party extended stored procedure scribbled on a data page that was already marked dirty. The checksum was calculated on the damaged page and the page was written. The reading in of this page, with page audit enabled, could indicate the error condition when checksum would not detect the failure. If you believe the server may be experiencing a problem that checksum cannot detect but page audit is detecting, consider in-memory checksumming and latch enforcement to help locate the scribbler.

Log audit

SQL Server 2000 and 2005 include trace flag –T3422, which enables log record auditing. Troubleshooting a system that is experiencing problems with log file corruption may be easier using the additional log record audits this trace flag provides. Use this trace flag with caution as it introduces overhead to each transaction log record.
Checkpoint

SQL Server 2005 implemented user-controlled I/O target behavior for the manual CHECKPOINT command and improved the I/O load levels during automatic checkpointing. For more information on how to issue a manual CHECKPOINT command and specify a target value (in seconds), see SQL Server 2005 Books Online.
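As a minimal sketch, a manual checkpoint with a target duration is issued by passing the number of seconds as an argument (30 here is an arbitrary example):

CHECKPOINT;      -- manual checkpoint, no target specified
CHECKPOINT 30;   -- manual checkpoint with a 30-second target (checkpoint_duration)

SQL Server treats the target as a goal, not a hard limit; the tables later in this section describe how the outstanding I/O level is adjusted against it.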
Microsoft has received requests to implement a more dynamic checkpoint algorithm. For example, this would be useful during a SQL Server shutdown. Especially for highly available databases, a more aggressive checkpoint can reduce the amount of work and time that crash recovery needs during a restart.

The amount of transaction log activity determines when to trigger a checkpoint of a database. Transaction log records have a recovery cost value calculated in milliseconds. Each time a new transaction log record is produced, the accumulated cost is used to determine the estimated recovery time required since the last checkpoint. When the recovery time goal is exceeded, a checkpoint is triggered. This keeps the crash recovery runtime within the specified recovery interval goal.
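The recovery interval goal referenced here is the server-wide sp_configure 'recovery interval' setting mentioned earlier. A hedged sketch of changing it (the value of 1 minute is only an example):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'recovery interval', 1;   -- target crash recovery time, in minutes
RECONFIGURE;

Raising the value reduces checkpoint frequency at the cost of longer crash recovery; most systems are left at the default of 0 (automatic).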
The following base rules apply to checkpoint. The term latency as it is used here indicates the elapsed time from when the write was issued until the write is considered complete by the checkpoint process.
Action
Description
Manual Checkpoint – Target Specified
· I/O latency target set to the default of 20ms. The target is set to 100ms if shutdown is in progress.
· Maximum number of outstanding I/Os is capped at the larger of the following calculations:
  · Committed Buffer Count / 3750
  · 80 * Number of Schedulers
· Number of outstanding I/Os is constantly adjusted so that progress through the buffer pool is commensurate with elapsed time and target time.
Manual Checkpoint – No target specified
- or -
Automatic Checkpoint in response to database activity
· I/O latency target set to the default of 20ms. The target is set to 100ms if shutdown is in progress.
· Maximum number of outstanding I/Os is capped at the larger of the following calculations:
  · Committed Buffer Count / 3750
  · 80 * Number of Schedulers
· Minimum number of outstanding I/Os required is 2.
· Number of outstanding I/Os is adjusted to keep write response time near the latency target.
For any checkpoint invocation
When checkpoint reaches its outstanding I/O target, it yields until one of the outstanding I/Os is finished.
For no target specified or automatic checkpointing
The checkpoint process tracks checkpoint-specific I/O response times. It can adjust the number of outstanding I/O requests if the I/O latency of checkpoint writes exceeds the latency target. As checkpoint processing continues, the goal for the outstanding number of I/Os is adjusted in order to maintain response times that do not exceed the established latency goal. If the outstanding I/O levels begin to exceed the tolerable latency goals, checkpoint adjusts its activity to avoid possible effects on the overall system.
For a manual checkpoint, target specified
When checkpoint processing detects that it is ahead of the specified target, it yields until activities fall within the goal, as outlined in the previous table, or until all the outstanding checkpoint I/Os finish. If all outstanding I/Os are complete, checkpoint issues another I/O and again tries to meet the target goal.

SQL Server 2005 SP1 allows for continuous checkpointing
SQL Server 2005 SP1 alters the checkpoint algorithm slightly. The original SQL Server 2005 release does not add time between I/Os, so the checkpoint process may finish ahead of the target schedule. Service Pack 1 introduces the appropriate delays to honor the target as closely as possible. We do not recommend this, but administrators can disable automatic checkpointing and use manual checkpointing with a specified target. Putting the manual, targeted checkpoint in a continuous loop provides a continuous checkpoint operation, as sketched below. Do this with extreme caution because checkpoints are serialized and this could affect other databases and backup operations. It also requires that a checkpoint process be established for all databases.
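A hedged sketch of such a loop (target and delay values are arbitrary examples); it must be run in its own session, one per database, and stopped manually:

WHILE 1 = 1
BEGIN
    CHECKPOINT 30;                -- manual checkpoint with a 30-second target
    WAITFOR DELAY '00:00:05';     -- brief pause before the next checkpoint
END;

Again, this is the cautionary pattern the paragraph above describes, not a general recommendation.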
Notice that SQL Server 2005 Service Pack 1 also contains a fix for a very rare checkpoint bug. The fix is for a very small window where checkpoint could miss flushing a buffer. This could lead to unexpected data damage. To avoid this problem, apply SQL Server 2005 SP1.

WriteMultiple extended

SQL Server 7.0 introduced an internal routine named WriteMultiple. The WriteMultiple routine writes data pages to stable media. For more information, see "Flushing a Data Page To Disk" in SQL Server I/O Basics on MSDN.

SQL Server 7.0 and 2000 could issue a WriteMultiple operation for up to 16 pages (128 KB). SQL Server 2005 extends the capability of WriteMultiple up to 32 pages (256 KB). This may change the block size configurations for your performance goals. For more information about physical database layout, see the article "Physical Database Storage Design" (http://download.microsoft.com/download/4/f/8/4f8f2dc9-a9a7-4b68-98cb-163482c95e0b/PhysDBStor.doc).

SQL Server 2005 varies the WriteMultiple logic. In SQL Server 7.0 and 2000, the function accepts a starting page ID. The starting page and up to 16 subsequent, contiguous dirty pages for the same database are bundled in a single write request. SQL Server does this by using hash table lookups for subsequent contiguous pages. When a page is not found or a page is found that is clean, the I/O request is considered finished.

SQL Server 2005 adds additional lookup and safety steps to WriteMultiple. SQL Server 2005 does the forward page search in the same way as SQL Server 7.0 and 2000. When the forward search is finished, SQL Server 2005 can do a backward search if all 32 pages of the I/O request are not yet filled. The same hash table lookup activity occurs when SQL Server searches for more pages. For example, if WriteMultiple was passed page 1:20, the search would examine 1:19, 1:18, and so on. The search continues until:

· A page is not found.
· A page is found to be clean.
· All 32 pages for the I/O have been identified.

SQL Server 2005 adds additional page header checks. One such additional check is the actual page ID check. The expected page ID is compared to that of the actual page header. This prevents writing a scribbled or incorrect page to disk and causing permanent database damage.

Read-ahead enhanced

In SQL Server 2005, the read-ahead design is enhanced so that it reduces physical data transfer requirements by trimming the leading and trailing pages from the request if the data page(s) are already in the buffer pool.

For more information on SQL Server read-ahead logic, see SQL Server I/O Basics (http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlIObasics.mspx).

For example, suppose a read-ahead request is to be issued for pages 1 through 128 but pages 1 and 128 are already located in the SQL Server buffer pool. The read-ahead request would be for pages 2 through 127 in SQL Server 2005. In comparison, SQL Server 2000 requests pages 1 through 128 and ignores the data that is returned for pages 1 and 128.

Sparse files / Copy on write / Streams

NTFS sparse file technology is used for database snapshots and online DBCC CHECK* operations. This section provides more detailed information about this technology in SQL Server.

Note: At the time of publication, manufacturer-specific "thin provisioning" implementations have not yet been tested. See the SQL Server Always On Storage Solution Review program (http://www.microsoft.com/sql/AlwaysOn) for newer information about this topic.

Streams

Online DBCC CHECK* uses a transient, sparse file stream for each data file that is checked. The streams are named using the following template: "<<ORIGINAL FILE>>:MSSQL_DBCC<<DBID>>". The stream is a secondary data area associated with the original file provided by the file system. Online DBCC uses the stream to create a transient snapshot of the database while it performs checks. This snapshot is unavailable to database users. The snapshot stream enables online DBCC to create and test all facts against an exact point-in-time replica of the database. It requires only limited physical storage to do this. During online DBCC, only those pages that are modified after DBCC started are stored in the stream. When online DBCC is finished, the stream is deleted.

It is worth noting that the stream enables DBCC to reduce its locking granularity. If the stream cannot be created or runs out of space, DBCC reverts to the older, TABLE LOCK behavior. Administrators should review the free space available on each volume to make sure that high database concurrency is maintained.
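The table-lock fallback can also be requested explicitly; a minimal sketch (database name hypothetical) that skips the internal snapshot and takes the older locking approach:

DBCC CHECKDB (N'YourDatabase') WITH TABLOCK;   -- bypass the internal snapshot; uses locks instead

This trades concurrency for avoiding the sparse-stream space requirement, so it is usually reserved for volumes that cannot hold the transient snapshot.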
For more information about online DBCC, see DBCC Internal Database Snapshot Usage in SQL Server 2005 Books Online.

Copy-on-write and sparse files

Snapshot databases contain images of data pages that have been modified after the snapshot was created. Establishing a database snapshot includes performing an immediate, secondary rollback of active transactions at the time of creation. Active transactions in the primary database must be rolled back in the new snapshot to obtain the correct point in time. Transaction states in the primary database remain unaffected.

To conserve physical disk space, snapshot databases are stored on sparse files. This limits the physical disk space requirement of the snapshot database to that of the modified images. As more data pages are modified in the parent database, the physical storage requirement of the snapshot database increases.

Snapshot database files are named as specified by the CREATE DATABASE command. For example, the snapshot parent database can contain the main.mdf file and the snapshot may use snapshot_main.mdf.
SQLServer 2005 implements copy-on-write for the data pages of a snapshotdatabase. Before the primary database page can be modified, the original datapage is written to any associated database snapshot. It is important for administratorsto monitor the physical size of the snapshot databases to determine and predictthe physical storage requirements. Notice that the smallest allocation unit ina sparse file is 64 KB. Therefore, the space requirement may grow fasterthan you expect.
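The 64-KB granularity is easy to observe with a small experiment. The following sketch (not SQL Server code; the path is a hypothetical scratch file) marks a file sparse, writes a single 8-KB page, and then reads back the physical "size on disk":

#include <windows.h>
#include <winioctl.h>
#include <cstdio>

int main()
{
    const wchar_t* path = L"C:\\Temp\\sparse_demo.dat";
    HANDLE h = CreateFileW(path, GENERIC_READ | GENERIC_WRITE, 0, nullptr,
                           CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE) return 1;

    DWORD bytes = 0;
    DeviceIoControl(h, FSCTL_SET_SPARSE, nullptr, 0, nullptr, 0, &bytes, nullptr);

    LARGE_INTEGER eof; eof.QuadPart = 10 * 1024 * 1024;   // 10-MB logical size, no allocation yet
    SetFilePointerEx(h, eof, nullptr, FILE_BEGIN);
    SetEndOfFile(h);

    LARGE_INTEGER pos; pos.QuadPart = 5 * 1024 * 1024;    // write one 8-KB "page" mid-file
    SetFilePointerEx(h, pos, nullptr, FILE_BEGIN);
    char page[8192] = {1};
    WriteFile(h, page, sizeof(page), &bytes, nullptr);
    CloseHandle(h);

    DWORD high = 0;
    DWORD low = GetCompressedFileSizeW(path, &high);      // physical storage ("size on disk")
    printf("Size on disk: %llu bytes\n",
           (static_cast<unsigned long long>(high) << 32) | low);  // expect a 64-KB multiple
    return 0;
}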
For more information about snapshot databases, see How Database Snapshots Work in SQL Server 2005 Books Online.
Recovery redo and undo operations use the transaction log to return a database to a consistent transactional state. However, this logic does not apply to a snapshot database. Notice that snapshot databases do not have transaction logs. Therefore, the copy-on-write activity must be complete before the primary database transaction can continue.
The snapshot uses a combination of API calls to determine the 64-KB allocated regions in the sparse file. It must also implement an in-memory allocation bitmap to track the individual pages that have been stored in the snapshot. During the creation or expansion of the snapshot file, SQL Server sets the appropriate file system attributes so that all unwritten regions return complete zero images for any read request.
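The file-system facility for finding those allocated regions is the allocated-ranges query. The sketch below (minimal error handling, hypothetical path) asks NTFS which byte ranges of a sparse file are physically backed:

#include <windows.h>
#include <winioctl.h>
#include <cstdio>

int main()
{
    HANDLE h = CreateFileW(L"C:\\Temp\\sparse_demo.dat", GENERIC_READ,
                           FILE_SHARE_READ, nullptr, OPEN_EXISTING,
                           FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE) return 1;

    FILE_ALLOCATED_RANGE_BUFFER query = {};
    query.FileOffset.QuadPart = 0;
    query.Length.QuadPart = 10 * 1024 * 1024;             // scan the first 10 MB

    FILE_ALLOCATED_RANGE_BUFFER ranges[64];
    DWORD bytesReturned = 0;
    if (DeviceIoControl(h, FSCTL_QUERY_ALLOCATED_RANGES,
                        &query, sizeof(query),
                        ranges, sizeof(ranges), &bytesReturned, nullptr)) {
        DWORD count = bytesReturned / sizeof(FILE_ALLOCATED_RANGE_BUFFER);
        for (DWORD i = 0; i < count; ++i)
            printf("Allocated region at offset %lld, length %lld\n",
                   ranges[i].FileOffset.QuadPart, ranges[i].Length.QuadPart);
    }
    CloseHandle(h);
    return 0;
}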
When a modification request is started in the primary database, the data page may be written to any snapshot database and the in-memory allocation bitmap is maintained appropriately. Because making the copy-on-write decision can itself introduce I/O on the snapshot database, it is important to consider the additional overhead. Determining when a copy of the database page is necessary involves a combination of API calls and possible read requests. Therefore, read requests may be issued against the snapshot database simply to determine whether the data page has already been copied to it.
When SQL Server performs the copy-on-write operation into a sparse file, the operating system may perform the write in a synchronous manner as it acquires new physical storage space. To prevent the synchronous nature of these writes from affecting the SQLOS scheduler, the copy-on-write writes may be performed by secondary workers from the worker pool. In fact, if multiple snapshots exist for the same primary database, the writes can be performed by multiple workers in parallel. The initiating worker waits for the secondary workers and any writes to complete before continuing with the modification of the original data page. While SQL Server waits for the writes to complete, the originating worker may show a wait status such as a replica write or a latch wait for the replica data page that is in I/O.
Even if the transaction is rolled back, the data page has been written to the snapshot. As soon as a write occurs, the sparse file obtains the physical storage. It is no longer possible to fully roll back this operation and reclaim the physical disk space.
Note: If a copy-on-write operation fails, the snapshot database is marked as suspect. SQL Server generates the appropriate error.
Realize that the increased copy-on-write operations (actual writes, or reads to determine whether a copy-on-write is necessary) can change the performance dynamics of some queries. For example, the I/O speed of a snapshot database could limit the scalability of certain queries. Therefore, you should locate the snapshot database on a high-speed I/O subsystem.
Note: Utilities such as WinZip, WinRAR, copy, and others do not maintain the actual integrity of a sparse file. When a file is copied by using these utilities, all unallocated bytes are read and restored as zeros. This requires actual allocations, and the restored file loses the sparse file attributes. To copy a sparse file, you must use a utility that understands how to capture and restore the metadata structure of the file.
Stream and sparse file visibility
Stream and sparse file sizes are easier to monitor if you are aware of some key visibility limitations.
Secondary file streams are frequently invisible to commands such as ‘dir’. This can make it difficult to determine how much copy-on-write activity has occurred during an online DBCC CHECK*. However, various third-party utilities are available that can be used to view size information about file streams.
Similarly, sparse file sizes that are reported by commands such as ‘dir’ indicate the complete file size as established by the End Of File (EOF) setting, not the actual physical storage on disk. Windows Explorer shows the logical size as ‘Size’ and the physical size as ‘Size on Disk.’
When SQL Server writes to a sparse database file, it can acquire space in database extent sizes of 64 KB (8 pages * 8 KB each = 64 KB). SQL Server 2005 detects when a segment of the file has not been allocated in the sparse file. It not only copies the page during the copy-on-write operation but also establishes the next seven pages (one extent) with all-zero images. This reduces operating system-level fragmentation by working on the common operating system file system cluster boundaries. It also enables future copy-on-write requests for any of the other seven pages to finish quickly because the physical space and file system tracking have already been established. In addition, it enables SQL Server to use the file-level allocation information to determine which extents are physically allocated in the sparse file. The zeroed page images can be used to determine which of the pages enclosed in an allocated region have been copied from the primary database.
Snapshot reads
SQL Server 2005 is designed to read data from both the snapshot file and the matching primary database file when data is queried in the snapshot database. By using the in-memory sparse database allocation bitmaps and the zero page images, SQL Server can quickly determine the actual location of the data pages that are required by a query.
For example, during query execution SQL Server can elect to perform a large read, reading in eight or more data pages with a single read request such as a read-ahead. Consider a specific example.
A read-ahead operation in the primary database, for the same region, creates a single read of 64 KB to retrieve eight contiguous pages. However, if the third and sixth pages have been modified and copied (with copy-on-write) to the snapshot database, the read is split. This example requires five separate read requests.
·Pages 1 and 2 can be read from the primary
·Page 3 from the snapshot
·Pages 4 and 5 from the primary
·Page 6 from the snapshot
·Pages 7 and 8 from the primary
SQL Server always tries to optimize physical data access requests, but a query against the snapshot database may have to perform more I/Os than the identical query executed in the primary database.
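The split-read decision can be sketched from the in-memory allocation bitmap described earlier. This is a conceptual stand-in, not SQL Server code:

#include <cstdio>
#include <vector>

struct ReadRun { bool fromSnapshot; int firstPage; int lastPage; };

// Group a contiguous page range into per-file read runs based on which pages
// have already been copied into the snapshot.
std::vector<ReadRun> PlanSnapshotReads(const std::vector<bool>& inSnapshot,
                                       int firstPage, int lastPage)
{
    std::vector<ReadRun> runs;
    for (int p = firstPage; p <= lastPage; ++p) {
        bool snap = inSnapshot[p];
        if (runs.empty() || runs.back().fromSnapshot != snap)
            runs.push_back({snap, p, p});   // the source changed, start a new run
        else
            runs.back().lastPage = p;       // extend the current run
    }
    return runs;
}

int main()
{
    std::vector<bool> inSnapshot(9, false); // pages 1..8; pages 3 and 6 were copied
    inSnapshot[3] = inSnapshot[6] = true;

    for (const auto& r : PlanSnapshotReads(inSnapshot, 1, 8))
        printf("Pages %d-%d from %s\n", r.firstPage, r.lastPage,
               r.fromSnapshot ? "snapshot" : "primary");   // prints the five runs listed above
    return 0;
}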
WARNING: If you use SQL Server database snapshots or online DBCC CHECK* operations, apply the operating system fix for system bug 1320579 to avoid corruption of the snapshot. This is covered in the following article: Error message when you run the DBCC check command in SQL Server 2005: "8909 16 1 Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type unknown)" (909003)
Instant file initialization
Newer operating system versions, including Windows XP and Windows Server 2003, implement instant file initialization capabilities by supplying the SetFileValidData API. This API enables SQL Server to acquire physical disk space without physically zeroing the contents, which lets SQL Server consume the physical disk space quickly. All versions of SQL Server use the database allocation structures to determine the valid pages in the database; every time a new data page is allocated, it is formatted and written to stable media.
SQL Server 2000 creates and expands database log and data files by stamping the new section of the file with all zero values. A SQL Server 2005 instance with incorrect account privileges reverts to the SQL Server 2000 behavior. The algorithm used by SQL Server is more aggressive than the NTFS zero initialization (DeviceIoControl, FSCTL_SET_ZERO_DATA), thereby elevating the NTFS file lock behavior and enabling concurrent access to other sections of the file. However, zero initialization is limited by physical I/O capabilities and can be a lengthy operation.
SQL Server 2005 uses instant file initialization only for data files. Instant file initialization removes the zero stamping during the creation or growth of the data file. This means that SQL Server 2005 can create very large data files in seconds.
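At the API level, instant initialization amounts to claiming space without zero stamping it. The following sketch grows a scratch file and marks the space valid with SetFileValidData; the path and size are hypothetical, the privilege must already be assigned to the account, and this illustrates only the Windows API, not SQL Server's internal file manager.

#include <windows.h>
#include <cstdio>

static bool EnableManageVolumePrivilege()
{
    HANDLE token;
    if (!OpenProcessToken(GetCurrentProcess(),
                          TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token))
        return false;
    TOKEN_PRIVILEGES tp = {};
    tp.PrivilegeCount = 1;
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    LookupPrivilegeValueW(nullptr, L"SeManageVolumePrivilege", &tp.Privileges[0].Luid);
    BOOL ok = AdjustTokenPrivileges(token, FALSE, &tp, 0, nullptr, nullptr);
    CloseHandle(token);
    return ok && GetLastError() == ERROR_SUCCESS;   // fails if the privilege is not held
}

int main()
{
    if (!EnableManageVolumePrivilege()) {
        printf("SE_MANAGE_VOLUME_NAME not held; instant initialization is unavailable\n");
        return 1;
    }

    HANDLE h = CreateFileW(L"C:\\Temp\\instant_init_demo.dat",
                           GENERIC_READ | GENERIC_WRITE, 0, nullptr,
                           CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE) return 1;

    LARGE_INTEGER size; size.QuadPart = 1024LL * 1024 * 1024;   // 1 GB
    SetFilePointerEx(h, size, nullptr, FILE_BEGIN);
    SetEndOfFile(h);                                            // allocate the space

    // Without this call, the file system zero-fills the gap below the valid data
    // length on first use; with it, the space is usable immediately.
    if (!SetFileValidData(h, size.QuadPart))
        printf("SetFileValidData failed: %lu\n", GetLastError());

    CloseHandle(h);
    return 0;
}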
The following rules apply to SQL Server 2005 and instant file initialization.
·The operating system and file system must support instant file initialization.
·The SQL Server startup account must possess the SE_MANAGE_VOLUME_NAME privilege. This privilege is required to successfully run SetFileValidData.
·The file must be a SQL Server database data file. SQL Server transaction log files are not eligible for instant file initialization.
·If trace flag –T1806 is enabled, SQL Server 2005 behavior reverts to SQL Server 2000 behavior.
SetFileValidData does allow for fast allocation of the physical disk space. High-level permissions could enable data that already exists on the physical disk to be seen, but only internally, during a SQL Server read operation. Because SQL Server knows which pages have been allocated, this data is not exposed to a user or administrator.
To guarantee transaction log integrity, SQL Server must zero initialize the transaction log files. However, for data files, SQL Server formats the data pages as they are allocated, so the existing data does not pose a problem for database data files.
Note: To guarantee the physical data file space acquisition during data file creation or expansion on a thin-provisioned subsystem, use trace flag –T1806.
Trace flag –T1806 provides backward compatibility by zero initializing database data files instead of using instant file initialization. You may also remove the SE_MANAGE_VOLUME_NAME privilege to force SQL Server to use zero file initialization. For more information on the SE_MANAGE_VOLUME_NAME privilege, see Database File Initialization in SQL Server 2005 Books Online.
Note: Zero file initialization does not protect against discovery of previously stored data. To fully prevent discovery of previous data, the physical media must receive a Department of Defense (DoD)-level series of writes, generally seven unique write patterns. Zero file initialization performs only a single, all-zero write to the data pages.
For more information on previously stored data security, see Protect and purge your personal files (http://www.microsoft.com/athome/moredone/protectpurgepersonalfiles.mspx).
For more information about DoD 5015.2-STD, see Design Criteria For Electronic Records Management Software Applications (http://www.dtic.mil/whs/directives/corres/pdf/50152std_061902/p50152s.pdf).

I/O affinity and snapshot backups
I/O affinity dedicates special, hidden schedulers to performing core buffer pool I/O and log writer activities. The I/O affinity configuration option was introduced in SQL Server 2000 SP1. The INF: Understanding How to Set the SQL Server I/O Affinity Option (http://support.microsoft.com/default.aspx?scid=kb;en-us;298402) Microsoft Knowledge Base article outlines the I/O affinity implementation.
Vendors can use Virtual Device Interface (VDI)-based snapshot backups to perform actions such as splitting a mirror. To accomplish this, SQL Server must guarantee point-in-time data stability and avoid torn writes by freezing all new I/O operations and completing all outstanding I/Os for the database. The Windows Volume Shadow Copy Service (VSS) backup is one such VDI application.
For more information on VDI snapshot backups, see Snapshot Backups in SQL Server Books Online.
For more information on the VDI specification, see SQL Server 2005 Virtual Backup Device Interface (VDI) Specification (http://www.microsoft.com/downloads/details.aspx?FamilyID=416f8a51-65a3-4e8e-a4c8-adfe15e850fc&DisplayLang=en).
The SQL Server 2000 design does not account for frozen I/O in combination with I/O affinity. A single database snapshot backup freezes I/O requests (reads and writes) for all databases. This makes I/O affinity and snapshot backups an unlikely combination for SQL Server 2000.
SQL Server 2005 supports I/O affinity as outlined in the article mentioned earlier (INF: Understanding How to Set the SQL Server I/O Affinity Option). It corrects the condition that could freeze all I/O requests when I/O affinity is enabled.
SQL Server I/O affinity is a very specialized, high-end server configuration option. It should be used only after significant testing indicates that it will result in performance improvements. A successful I/O affinity implementation increases overall throughput: the I/O affinity schedulers maintain reasonable CPU usage levels and effectively allow other schedulers access to increased CPU resources.
Locked memory pages
Both 32-bit and 64-bit versions of SQL Server use the AWE API set to lock pages in memory, because VirtualLock is not guaranteed to keep all locked pages in memory in the way that AllocateUserPhysicalPages is. This can sometimes be confusing because the AWE sp_configure settings exist but are not used on 64-bit SQL Server. Instead, the lock pages privilege determines when to use the AWE API set. The Windows “Lock Pages In Memory” privilege is necessary for SQL Server to make AWE allocations.
Lock Pages In Memory can have a positive effect on I/O performance and can prevent the trimming of the working set of SQL Server.
Warning: Do not use AWE or locked pages without testing. Forcing strict physical memory usage can result in undesired physical memory pressure on the system. System-level or hardware components may not perform well when physical memory pressure is present. Use this option with caution.
During an I/O operation, the memory for the I/O should not cause page faults. Therefore, if the memory is not already locked, it will be. Because most applications do not lock I/O memory, the operating system transitions the memory to a locked state. When the I/O finishes, the memory is transitioned back to an unlocked state.
SQL Server 2005 64-bit Enterprise Edition detects whether the Lock Pages In Memory privilege is present and establishes a locked pages data cache. SQL Server 32-bit installations require enabling the AWE memory options to establish the locked memory behavior. This avoids the lock and unlock transitions during I/O, thereby providing improved I/O performance. To disable this behavior, remove the privilege or, for 64-bit installations, enable startup trace flag –T835.
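The AWE-style allocation path referred to above can be sketched as follows: physical pages are allocated, then mapped into a reserved virtual region, and they cannot be paged out while held. This illustrates the Windows APIs only, not the SQL Server buffer pool; it assumes the account holds the Lock Pages in Memory right, and the sizes are illustrative.

#include <windows.h>
#include <cstdio>
#include <vector>

static bool EnableLockMemoryPrivilege()
{
    HANDLE token;
    if (!OpenProcessToken(GetCurrentProcess(),
                          TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token))
        return false;
    TOKEN_PRIVILEGES tp = {};
    tp.PrivilegeCount = 1;
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    LookupPrivilegeValueW(nullptr, L"SeLockMemoryPrivilege", &tp.Privileges[0].Luid);
    BOOL ok = AdjustTokenPrivileges(token, FALSE, &tp, 0, nullptr, nullptr);
    CloseHandle(token);
    return ok && GetLastError() == ERROR_SUCCESS;
}

int main()
{
    if (!EnableLockMemoryPrivilege()) {
        printf("Lock Pages in Memory privilege not held\n");
        return 1;
    }

    SYSTEM_INFO si; GetSystemInfo(&si);
    ULONG_PTR pageCount = (64 * 1024 * 1024) / si.dwPageSize;      // 64 MB worth of pages

    std::vector<ULONG_PTR> pfns(pageCount);
    if (!AllocateUserPhysicalPages(GetCurrentProcess(), &pageCount, pfns.data()))
        return 1;                                // physical pages acquired; they stay resident

    void* region = VirtualAlloc(nullptr, pageCount * si.dwPageSize,
                                MEM_RESERVE | MEM_PHYSICAL, PAGE_READWRITE);
    if (!region || !MapUserPhysicalPages(region, pageCount, pfns.data()))
        return 1;

    printf("Mapped %llu locked pages at %p\n",
           static_cast<unsigned long long>(pageCount), region);

    MapUserPhysicalPages(region, pageCount, nullptr);              // unmap
    FreeUserPhysicalPages(GetCurrentProcess(), &pageCount, pfns.data());
    VirtualFree(region, 0, MEM_RELEASE);
    return 0;
}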
Important: SQL Server always tries to avoid paging by releasing memory back to the system when possible. However, certain conditions can make this difficult. When pages are locked, they cannot be paged. We recommend careful consideration of locked page usage.
It is not always possible for SQL Server to avoid paging operations. Locked pages can also be used to avoid paging as a troubleshooting technique if you have reason to believe that memory damage may have occurred during a paging operation.
Idle server
SQL Server 2005 can provide an idle SQL Server process (sqlservr.exe), similar to the suspend/resume operations that are provided by the operating system. SQL Server 2005 introduces a per-node system task called Resource Monitor. Resource Monitor’s primary responsibility is to watch core memory levels and then trigger cache adjustments accordingly. When the SQL Server 2005 Resource Monitor detects that there are no user-initiated requests remaining to process, it can enter the idle state. When a new user request arrives or a critical event must run, SQL Server wakes and handles the activity. The following table outlines some of the key event/request types and associated actions as they relate to the idle SQL Server process.

Event/Request Type
Description
User Request
A user request includes those actions initiated by a client application that require SQL Server to perform a service.
The following are common examples of user requests.
·Batch request
·SQL RPC request
·DTC request
·Connection request
·Disconnect request
·Attention request
These actions are also called external requests.
Active user requests prevent SQL Server from entering an idle state.
When SQL Server is in an idle state, a user request wakes the SQL Server process.
Internal Tasks
An internal request includes those actions that a user cannot issue a specific client request to trigger or control. For example:
·Lazy writer
·Log writer
·Automatic checkpointing
·Lock monitor
Internal tasks do not prevent SQL Server from entering an idle state. Only critical internal tasks can wake SQL Server from an idle state.
Critical Event
Some SKUs of SQL Server respond to events such as memory notifications in order to wake the SQL Server process from an idle state and honor the event.
There are some basic rules that the SQL Server process uses to determine whether it can enter an idle state.
The SQL Server idle state is not entered until:
·No user requests have been active in 15 minutes.
·The instance is not participating in database mirroring.
·Service Broker is in an idle state.
SQL Server wakes in response to:
·Memory pressure (except on SQL Server Express).
·A critical internal task or event.
·An external request such as a Tabular Data Stream (TDS) packet arrival.
The idle activity can change the way SQL Server interacts with the system. The following table outlines specific behavior related to the individual SKUs.

SKU
Behavior
SQL Server Express Service
·By default can declare the server idle.
·Triggers the operating system to aggressively trim the SQL Server working set memory by using the SetProcessWorkingSetSize(…, -1, -1) API (see the sketch after this table)
·Does not wake in response to operating system memory pressure notifications.
·Tries to enter an idle state immediately after service startup.
SQL Express Individual User Instance
·By default can declare the server idle.
·May be configured to allow for aggressive working set trimming.
·Wakes in response to operating system memory pressure notifications.
Workgroup
·By default can declare the server idle.
·May be configured to allow for aggressive working set trimming.
·Wakes in response to operating system memory pressure notifications
Standard
Enterprise
Developer

·Idle server behavior requires a trace flag.
·Wakes in response to operating system memory notification.
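The working-set trim referenced in the SQL Server Express row above maps to the following Win32 call; this is an illustration of the API only, not SQL Server code:

#include <windows.h>
#include <cstdio>

int main()
{
    // Passing -1 for both the minimum and maximum asks the memory manager to
    // remove as many pages as possible from the process working set.
    if (SetProcessWorkingSetSize(GetCurrentProcess(),
                                 static_cast<SIZE_T>(-1),
                                 static_cast<SIZE_T>(-1)))
        printf("Working set trimmed; pages are soft-faulted back in as they are touched\n");
    else
        printf("SetProcessWorkingSetSize failed: %lu\n", GetLastError());
    return 0;
}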
SQL Server always tries to avoid using the page file whenever possible. However, it is not always possible to avoid all page file activity. An idle SQL Server process may introduce aggressive use of the page file and increased I/O path usage, even though SQL Server tries to avoid these. The paging activities require that memory regions be stored and retrieved on disk. This opens the possibility of memory corruption if the subsystem does not handle the paging activity correctly.
Note: The Windows operating systems use storage and read-ahead design techniques for the page file that are similar to those that SQL Server uses for data and log files. This means that an idle SQL Server can experience I/O patterns during a paging operation similar to those of its own data file reads and writes. To guarantee data integrity, the page file should uphold the I/O specifications suited for SQL Server database and log files.
SQL Server idle server activity can be controlled with the following trace flags.

Trace Flag
Description
8009
Enable idle server actions
8010
Disable idle server actions
Database mirroring (DBM)
SQL Server 2005 SP1 introduces the database mirroring (DBM) feature set. For more information about database mirroring solutions, see Database Mirroring in SQL Server 2005 Books Online.
Database mirroring is not specifically targeted as a remote mirroring solution. However, database mirroring addresses the properties necessary to maintain the primary and mirror relationship, guarantee data integrity, and allow for both failover and failback to occur, as outlined in Remote Mirroring earlier in this paper.
Database mirroring uses CRC checks to guarantee data transmissions between the primary and mirror and to maintain the correct data integrity. For new databases, SQL Server 2005 provides checksum capabilities for data pages and log blocks to strengthen data integrity maintenance. To enhance your ‘Always On’ solution, we recommend using the CRC transmission checks in combination with the checksum database capabilities to provide the highest level of protection against data corruption.
Multiple instance access to read-only databases
SQL Server 2005 introduces Scalable Shared Database (SSD) support, enabling multiple SQL Server instances to use the same read-only database from a read-only volume. The setup details and other detailed information about SSD are outlined in the SQL Server 2005 Books Online Web Refresh and in the following Microsoft Knowledge Base article: Scalable Shared Databases are supported by SQL Server 2005 (http://support.microsoft.com/?kbid=910378).
SSD provides various scale-out possibilities, including but not limited to the following:
·Multiple server access
·Separate server resources
·Separate tempdb resources
Note: Multiple-server database file access is never supported for SQL Server databases in read/write mode. SQL Server SSD is never supported against writable database files.
Some I/O subsystems support features such as volume-level, copy-on-write (COW) snapshots. These I/O subsystem solutions monitor write operations on the primary (read/write) volume and save the initial data to the volume snapshot. The volume snapshot is presented as a read-only copy when mounted. The read-only snapshot volume is a supported configuration for SQL Server SSD databases. Such implementations can provide a powerful tool for accessing live production data, read only, from many servers.
Some subsystems use distributed locking mechanisms instead of read-only volume snapshot capabilities. This allows multiple servers to access the same read/write volume. The distributed locking environments are very powerful but do not present a point-in-time image of the data to secondary servers. The live data is presented to all servers connected to the volume. SQL Server SSD is not supported against writable database files because the SQL Server buffer pool cannot maintain point-in-time synchronization on secondary servers. This would lead to various problems because of the resulting dirty read activities.
Ramp up of local cache
The Enterprise Edition of SQL Server 2005 tries to ramp up a node’s local data cache during the node initialization phase. The initial memory growth committal of a node is considered to be the ramp-up period. During the ramp-up period, whenever a single-page read is requested, the request is turned into an eight-page (64-KB) request by the buffer pool. The ramp-up helps reduce waits for subsequent single-page reads after a restart. This enables the data cache to be populated more quickly and SQL Server to return to cached behavior sooner.
A non-NUMA computer is considered to have a single-node implementation. On larger memory installations, this can be a significant advantage because it enables quicker data cache repopulation. Each node is assigned to and monitored by a separate Resource Monitor task. SQL Server is aware of the different nodes and the different memory requirements that may exist within those nodes, including in the buffer pool. This makes SQL Server fully NUMA aware.
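The ramp-up widening described above can be sketched conceptually. The page and extent sizes follow the paper; the rest is a stand-in, not SQL Server code:

#include <cstdio>

constexpr int kPagesPerExtent = 8;                   // 8 pages * 8 KB = 64 KB

struct ReadRequest { int firstPage; int pageCount; };

// While ramp-up is in progress, widen a single-page read to the whole extent
// that contains the page; otherwise issue the normal single-page read.
ReadRequest BuildReadRequest(int pageId, bool rampUpInProgress)
{
    if (!rampUpInProgress)
        return {pageId, 1};
    int extentStart = (pageId / kPagesPerExtent) * kPagesPerExtent;
    return {extentStart, kPagesPerExtent};
}

int main()
{
    ReadRequest r = BuildReadRequest(21, true);
    printf("Read pages %d-%d\n", r.firstPage, r.firstPage + r.pageCount - 1);   // 16-23
    return 0;
}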
Encrypted file systems (EFS)
SQL Server databases can be stored in NTFS encrypted files. However, use this feature with caution because encryption disables asynchronous I/O capabilities and leads to performance issues. When performing I/O on an EFS-enabled file, the SQL Server scheduler becomes stalled until the I/O request completes. Actions such as SQL Server read-ahead become disabled for EFS files. We recommend using the built-in SQL Server 2005 encryption capabilities instead of EFS when possible.
Use of EFS for SQL Server should be limited to physical security situations. Use it in laptop installations or other installations where physical data security could be at risk.
If EFS must be deployed in a server environment, consider the following.
·Use a dedicated SQL Server instance.
·Test throughput limitations well.
·Test with and without the I/O affinity mask. Using I/O affinity may provide a pseudo-async capability to SQL Server.
DiskPar.exe
The diskpar utility provides alignment capabilities to prevent misalignment performance problems. Systems running SQL Server should be properly aligned on boundaries to help optimize performance. I/O subsystem manufacturers have established recommendations for proper alignment in a SQL Server environment.
This is the same recommendation as for Microsoft Exchange Server. The following is an excerpt from the Microsoft Exchange documentation and is applicable to SQL Server.
“Even though some storage obfuscates sectors & tracks, using diskpar will still help by preventing misalignment in cache. If the disk is not aligned, every Nth (usually 8th) read or write crosses a boundary, and the physical disk must perform two operations.
At the beginning of all disks is a section reserved for the master boot record (MBR) consuming 63 sectors. This means that your partition will start on the 64th sector, misaligning the entire partition. Most vendors suggest a 64-sector starting offset.
Check with your particular vendor before standardizing this setting on a particular storage array.”
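One way to check an existing volume against vendor guidance is to read the partition starting offset and compare it with the recommended alignment. The sketch below uses the documented disk IOCTL; the volume name is an example, the 512-byte sector size is an assumption, and running it may require administrative rights:

#include <windows.h>
#include <winioctl.h>
#include <cstdio>

int main()
{
    HANDLE h = CreateFileW(L"\\\\.\\C:", 0,
                           FILE_SHARE_READ | FILE_SHARE_WRITE,
                           nullptr, OPEN_EXISTING, 0, nullptr);
    if (h == INVALID_HANDLE_VALUE) {
        printf("CreateFileW failed: %lu\n", GetLastError());
        return 1;
    }

    PARTITION_INFORMATION_EX info = {};
    DWORD bytes = 0;
    if (DeviceIoControl(h, IOCTL_DISK_GET_PARTITION_INFO_EX,
                        nullptr, 0, &info, sizeof(info), &bytes, nullptr)) {
        long long offset = info.StartingOffset.QuadPart;
        printf("Partition starting offset: %lld bytes (sector %lld, assuming 512-byte sectors)\n",
               offset, offset / 512);   // compare with the vendor's recommended starting offset
    } else {
        printf("DeviceIoControl failed: %lu\n", GetLastError());
    }
    CloseHandle(h);
    return 0;
}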
Always On high-availability data storage
SQL Server 2005 introduces the Always On storage review program for high-availability solutions. Various manufacturers have reviewed their solutions against Microsoft SQL Server requirements and have published detailed white papers about how their solutions can be used with SQL Server to maintain the Always On goal. For more information, see SQL Server Always On Storage Solution Review Program (www.microsoft.com/sql/AlwaysOn).
SQLIOSim
SQLIOSim replaces SQLIOStress. It is used to test SQL Server I/O patterns without requiring that SQL Server be installed. The tests do not use actual SQL Server database and log files but simulate them instead. This utility greatly extends testing capabilities by enabling control over the memory footprint, multiple files per database, shrink and grow actions, files larger than 4 GB, and other options.
We recommend using SQLIOSim to test the system before you install SQL Server. This helps to improve your data safety.
Note: If SQL Server is reporting corruption or other I/O subsystem error conditions, back up your data and then run the SQLIOSim testing utility to test the I/O subsystem, in addition to running other hardware check utilities provided by your hardware manufacturer.
Important: SQLIOSim and SQLIOStress are also used by the Microsoft Hardware Compatibility Labs and by various vendors to make sure that the I/O subsystem is SQL Server I/O pattern compliant. If SQLIOSim or SQLIOStress returns errors, this clearly indicates that the I/O subsystem is not performing at an HCL-compliant level. This could lead to severe data damage or loss.

Conclusion

For more information:
http://www.microsoft.com/technet/prodtechnol/sql/default.mspx


References

There are many aspects to consider when you are settingup the I/O subsystem. This section provides a list of documents that you might considerreading.
For updated I/O details, you can also visit the http://www.microsoft.com/sql/support Web page.
SQL Server Always On Storage Solution Review Program
·http://www.microsoft.com/sql/AlwaysOn
Certification Policy
·KB913945 - Microsoft does not certify that third-party products will work with Microsoft SQL Server
·KB841696 - Overview of the Microsoft third-party storage software solutions support policy
·KB231619 - How to use the SQLIOStress utility to stress a disk subsystem such as SQL Server
Fundamentals and Requirements
·White paper - SQL Server 2000 I/O Basics (applies to SQL Server versions 7.0, 2000, and 2005)
·KB230785 - SQL Server 7.0, SQL Server 2000 and SQL Server 2005 logging and data storage algorithms extend data reliability
·KB917047 - Microsoft SQL Server I/O subsystem requirements for the tempdb database
·KB231347 - SQL Server databases not supported on compressed volumes (except 2005 read-only files)
Subsystems
·KB917043 - Key factors to consider when evaluating third-party file cache systems with SQL Server
·KB234656 - Using disk drive caching with SQL Server
·KB46091 - Using hard disk controller caching with SQL Server
·KB86903 - Description of caching disk controls in SQL Server
·KB304261 - Description of support for network database files in SQL Server
·KB910716 (in progress) - Support for third-party Remote Mirroring solutions used with SQL Server 2000 and 2005
·KB833770 - Support for SQL Server 2000 on iSCSI technology components (applies to SQL Server 2005)
Design and Configuration
·White paper - Physical Database Layout and Design
·KB298402 - Understanding How to Set the SQL Server I/O Affinity Option
·KB78363 - When Dirty Cache Pages are Flushed to Disk
·White paper - Database Mirroring in SQL Server 2005
·White paper - Database Mirroring Best Practices and Performance Considerations
·KB910378 - Scalable shared databases are supported by SQL Server 2005
·MSDN article - Read-Only Filegroups
·KB156932 - Asynchronous Disk I/O Appears as Synchronous on Windows NT, Windows 2000, and Windows XP
Diagnostics
·KB826433 - Additional SQL Server Diagnostics Added to Detect Unreported I/O Problems
·KB897284 - SQL Server 2000 SP4 diagnostics help detect stalled and stuck I/O operations (applies to SQL Server 2005)
·KB828339 - Error message 823 may indicate hardware problems or system problems in SQL Server
·KB167711 - Understanding Bufwait and Writelog Timeout Messages
·KB815436 - Use Trace Flag 3505 to Control SQL Server Checkpoint Behavior
·KB906121 - Checkpoint resumes behavior that it exhibited before you installed SQL Server 2000 SP3 when you enable trace flag 828
·WebCast - Data Recovery in SQL Server 2005
Known Issues
·KB909369 - Automatic checkpoints on some SQL Server 2000 databases do not run as expected
·KB315447 - SQL Server 2000 may be more aggressive with Lazy Writers than SQL Server 7.0
·KB818767 - Improved CPU Usage for Database Logging When Transaction Log Stalls Occur
·KB815056 - You receive an "Error: 17883" error message when the checkpoint process executes
·KB915385 - A snapshot-based database backup restore process may fail, and you may receive an error message in SQL Server 2005
·Support Assistance (http://www.microsoft.com/sql/support)
Utilities
·Download - SQLIO Disk Subsystem Benchmark Tool
·Download - SQLIOStress utility to stress disk subsystems (applies to SQL Server 7.0, 2000, and 2005 - replaced by SQLIOSim)
Source: http://technet.microsoft.com/zh-cn/library/cc917726.aspx