Identical query plan with radically different performance based on one improbable parameter?

I have an ad-hoc query that should be pretty zippy (I have a DBA background and I'm pretty good at optimization), and in almost all cases it is. HOWEVER, when I supply one specific parameter value (a fairly selective one, with a smaller-than-average expected result set), tempdb starts growing and the query dies when the disk fills up.

This is a one-time report against a Lawson AP system on a SQL Server 2005 system, if anyone cares.

I have examined the query plans against a performant run and a non-performant run, not just the estimated query plans but the actual query plans, and they are exactly identical. I have updated the statistics and the query plans remain identical. The only thing that should be different is the actual data.
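When two runs share a plan shape, the difference usually shows up in how many rows actually flow through it and in how much work-table space those rows need. As a minimal sketch, assuming a second connection with VIEW SERVER STATE permission, something like the following could be run while the report executes to see which session is allocating tempdb pages. The `session_id > 50` filter is only a common convention for skipping system sessions, and the column aliases are made up for illustration.

```sql
-- Run from a second connection while the report is executing.
-- sys.dm_db_task_space_usage reports tempdb page allocations per task.
-- Pages are 8 KB, so pages * 8 / 1024 gives megabytes.
select
    tsu.session_id,
    sum(tsu.internal_objects_alloc_page_count) * 8 / 1024 as InternalAllocMB,
    sum(tsu.user_objects_alloc_page_count) * 8 / 1024 as UserAllocMB
from
    sys.dm_db_task_space_usage tsu
where
    tsu.session_id > 50   -- assumption: skips system sessions by convention
group by
    tsu.session_id
having
    sum(tsu.internal_objects_alloc_page_count) > 0
order by
    InternalAllocMB desc
```

Internal objects are the work tables behind sorts, hash joins, and spools, so a steadily climbing InternalAllocMB for the report's session points at an operator processing far more rows than expected rather than at the insert itself.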

The data for the nonperformant group does look strange... ID columns are char(9) and char(22), left-padded with spaces and then zero-padded for good measure. A typical ID value would be something like

'    00112'

...which is strange, but it is a valid char(9) and therefore legit. The columns are well indexed: in every case except this one, the query works quite well.

I'm thinking the problem has to do with how the indexes interact with this data, although I don't see how. The number of result records for the nonperformant query is less than half the number for the largest vendor groups, which return results in seconds.

The query I'm using is as follows (this is against a Lawson v9 database schema):

insert into dbo.rptPayments with (tablock)
    (
    VendorGroup,
    VendorID,
    VendorName,
    ChargedToCompany,
    ChargedToAccount,
    ChargedToSubAccount,
    InvoiceID,
    PaymentAmount
    )
select top 100 percent
    vm.VENDOR_GROUP as VendorGroup,
    rtrim(ltrim(vm.VENDOR)) as VendorID,
    vm.VENDOR_VNAME as VendorName,
    gm.NAME as ChargedToCompany,
    dist.DIS_ACCOUNT as ChargedToAccount,
    dist.DIS_SUB_ACCT as ChargedToSubAccount,
    inv.INVOICE as InvoiceID,
    dist.ORIG_TRAN_AMT as PaymentAmount
from
    dbo.APVENMAST vm with (nolock)
inner join
    dbo.APINVOICE inv with (nolock)
    on inv.VENDOR = vm.VENDOR
    and inv.VENDOR_GROUP = vm.VENDOR_GROUP
inner join
    dbo.APDISTRIB dist with (nolock)
    on dist.COMPANY = inv.COMPANY
    and dist.VENDOR = inv.VENDOR
    and dist.INVOICE = inv.INVOICE
inner join
    dbo.APPAYMENT pay with (nolock)
    on pay.COMPANY = inv.COMPANY
    and pay.VENDOR = vm.VENDOR
    and pay.INVOICE = inv.INVOICE
    and pay.VENDOR_GROUP = inv.VENDOR_GROUP
inner loop join
    dbo.GLSYSTEM gm with (nolock, index(GLSSET1))
    on gm.COMPANY = dist.DIST_COMPANY
where
    vm.VENDOR_GROUP = @VendorGroup
    and vm.VENDOR_STATUS = 'A'
    and inv.INVOICE_DTE between '2009-09-01' and '2011-08-31'
    and pay.VOID_SEQ = 0
    and pay.CANCEL_SEQ = 0

...as you can see, I'm using some hints to force the query down the right path... the optimizer was making some truly poor choices. I'm expecting a result set of 20000 - 30000 records.

Schemas for the referenced tables may be found here.

Definitions for the custom indexes I've built for this are as follows:

CREATE NONCLUSTERED INDEX [tmpAPDISTRIB] ON [dbo].[APDISTRIB] 
(
    [COMPANY] ASC,
    [VENDOR] ASC,
    [INVOICE] ASC
)
INCLUDE ( [DIST_COMPANY],
[ORIG_TRAN_AMT],
[DIS_ACCOUNT],
[DIS_SUB_ACCT]) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [tmpAPPAYMENT] ON [dbo].[APPAYMENT] 
(
    [COMPANY] ASC,
    [VENDOR] ASC,
    [INVOICE] ASC,
    [VENDOR_GROUP] ASC
)
INCLUDE ( [VOID_SEQ],
[CANCEL_SEQ]) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [tmpAPINVOICE] ON [dbo].[APINVOICE] 
(
    [VENDOR] ASC,
    [VENDOR_GROUP] ASC,
    [INVOICE_DTE] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

Any help or suggestion would be enormously helpful.

UPDATE: Upon deeper investigation, I discovered that the records for the problem @VendorGroup were stored differently... apparently there is functionality in AP Distributions (represented by the dbo.APDISTRIB table) to support 'recurring' invoices that share the same invoice number. Because I had not joined on the column that tells those rows apart, my joins produced a Cartesian product. No other vendor group was using this feature.
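A check along these lines would have surfaced the row multiplication without running the full report. It is only a sketch: the type and value of @VendorGroup are placeholders, the aliases DistRows, PayRows, and JoinedRows are made up for illustration, and it ignores both the WHERE filters and any duplicate keys in dbo.APINVOICE, so it estimates the fan-out rather than reproducing the exact result size.

```sql
-- Assumption: type and value are placeholders; substitute the
-- declaration and vendor group actually used by the report.
declare @VendorGroup varchar(10)
set @VendorGroup = 'XXXX'

-- Count rows per (COMPANY, VENDOR, INVOICE) in each child table.
-- Joining both tables on only these columns yields
-- DistRows * PayRows rows for every matching invoice row.
select top 20
    d.COMPANY,
    d.VENDOR,
    d.INVOICE,
    d.DistRows,
    p.PayRows,
    d.DistRows * p.PayRows as JoinedRows
from
    (
    select COMPANY, VENDOR, INVOICE, count_big(*) as DistRows
    from dbo.APDISTRIB with (nolock)
    group by COMPANY, VENDOR, INVOICE
    ) d
inner join
    (
    select COMPANY, VENDOR, INVOICE, count_big(*) as PayRows
    from dbo.APPAYMENT with (nolock)
    where VENDOR_GROUP = @VendorGroup
    group by COMPANY, VENDOR, INVOICE
    ) p
    on p.COMPANY = d.COMPANY
    and p.VENDOR = d.VENDOR
    and p.INVOICE = d.INVOICE
order by
    JoinedRows desc
```

For a well-behaved vendor group the top JoinedRows values should stay small. Keys where both counts are large are the ones multiplying the intermediate result, and they indicate that the join needs an additional column to pair each distribution with its own payment.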

As per my update, the reason the query behaved so differently was that one company was using a table considerably more than any other.