[译]避免在Joins中出现OR

避免在Joins中出现OR

为什么要避免在joins中出现OR条件,一个优化例子

OR条件和join语句—这两样东西不适合放在一起.类似于or能大大影响带有子查询的查询,他们能给joins语带带来非常坏的影响.

举个例子, 看下面的查询(也许不太符合实际). 这里假设一个数字型表(numbers)已经创建好了 (详见使用笛卡尔积查询方式生成的数据). 假设现在的表里有10,000行记录.

SELECT n1.num, n2.num, n3.num FROM numbers n1
INNER JOIN numbers n2 ON ( (n1.num = n2.num - 400) OR (n1.num = n2.num + 400))
INNER JOIN numbers n3 ON n2.num = n3.num - 300
ORDER BY n1.num, n2.num, n3.num

这个查询看起来非常简单. 然而即使源表只有10,000行记录以及选出的结果只有区区18,900行记录, 却花费了5秒(译者PS:我承认我的机器很差劲, 用了20秒), 而且Management Studio(其实是我的整个机器)在执行的时候失去响应. 这个查询使CPU的使用率飙升到100%. 这个不是个好现象—joins和OR条件根本就不能放在一起(至少是在sql server中是这样).

我遇到这个情况是在BPS(基础程序设计支持).我们有一个学校完成的DTS包用来把学生数据转移到另外一个数据库(采用不同的构架). 其中的一部花费了20分钟, 我只是用它跑了几次. 但因为我最好找更好的方法, 我决定研究一下其中的步骤. 在20分的运行过程中, I fully expected to see some highly complex tangled web of SQL with 30 joins and 150 lines.我不能再犯错误了, 毕竟这只是一个简单的查询(带有or条件的join). 在五分钟内, 我把这条查询拆分成独立的两块, 这样以前需要20分钟的现在只需要2秒.

在上面的查询中, 这只是一中解决方案(使用union)来拆分OR条件. 肯定好友其他实现的方式, 我希望你们在评论中提出其他的解决方案来处理OR条件.

SELECT n1.num, n2.num, n3.num FROM numbers n1
INNER JOIN numbers n2 ON n1.num = n2.num - 400
INNER JOIN numbers n3 ON n2.num = n3.num - 300
UNION
SELECT n1.num, n2.num, n3.num FROM numbers n1
INNER JOIN numbers n2 ON n1.num = n2.num + 400
INNER JOIN numbers n3 ON n2.num = n3.num - 300
ORDER BY n1.num, n2.num, n3.num

使用上面的10,000行记录的表, 这个查询在瞬间完成. 处理800,000行数据的表只需要8秒. 如果我使用第一种查询查询处理那个大的表, 我想的机器不得不重启了.

声明

这篇文章, 包括其中的任何相关的源码和文件,都在Code Project Open License (CPOL) 的许可一下.

原文地址 codeproject

http://www.codeproject.com/Articles/206408/Avoiding-OR-Conditions-with-Joins

 

译者补充: 次日上午, 根据分析, 如果在num加上索引, 速度会提升很多, 果不其然, 完成也是做瞬间.

CREATE CLUSTERED INDEX [_dta_index_numbers_c_9_2041058307__K1] ON [dbo].[numbers] 
(
	[num] ASC
)

原文


Avoiding OR Conditions with Joins

Why OR conditions in join statements should be avoided and an example fix

OR conditions and join statements – some things just don’t go well together. Similar to how OR conditions can significantly impair queries having subqueries, they can wreak havoc on joins as well.

Take, for example, the following [admittedly unrealistic] query. This assumes a numbers table [numbers] has been set up. [See myCartesian join postfor a query to set up a numbers
table.] Assume the numbers table only has 10,000 records.

Collapse

SELECT n1.num, n2.num, n3.num FROM numbers n1
INNER JOIN numbers n2 ON ( (n1.num = n2.num - 400) OR (n1.num = n2.num + 400))
INNER JOIN numbers n3 ON n2.num = n3.num - 300
ORDER BY n1.num, n2.num, n3.num

This query looks simple enough. However, even though the source table only has 10,000 rows and the result only has 18,900 records, it takes 5 seconds to complete, and Management Studio [and
frankly my entire computer] freezes while it is running. This query spiked CPU usage to 100%. That’s not good – joins and OR conditions simply do not mix [at
least on SQL Server].

I encountered this exact type of scenario at BPS. We have a DTS package for bringing over student data from the source location to another database [with a different schema] where the school assignments are done. One of the steps took 20 minutes, and I simply lived with it the first few times it ran. But because it’s best to ask if there’s a better way, I decided to investigate the step. With a 20 minute runtime, I fully expected to see some highly complex tangled web of SQL with 30 joins and 150 lines. I couldn’t have been more wrong – it was a simple query resembling the one above [with an OR
condition in the join]. In under 5 minutes, I’d broken apart the query into two separate steps and what previously took 20 minutes to run only took 2 seconds.

For the query above, here’s one solution [using aUNION] for breaking apart the OR. There are likely other ways and I invite comments for other approaches to removing the OR condition.

Collapse

SELECT n1.num, n2.num, n3.num FROM numbers n1
INNER JOIN numbers n2 ON n1.num = n2.num - 400
INNER JOIN numbers n3 ON n2.num = n3.num - 300
UNION
SELECT n1.num, n2.num, n3.num FROM numbers n1
INNER JOIN numbers n2 ON n1.num = n2.num + 400
INNER JOIN numbers n3 ON n2.num = n3.num - 300
ORDER BY n1.num, n2.num, n3.num

This query runs instantly with a numbers table having 10,000 rows. It finishes in 8 seconds for a table having 800,000 rows. If I had used the larger table with the first query, I’d probably have to reboot.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

debug trigger

Step one. create a new stored procedure using the code which will trigger the trigger.

Step two. debug the procedure.

Step three. Step through the trigger code using the F11 key or the Step Into button at the statement which could trigger the trigger, such as UPDATE, DELETE or UPDATE.

Step through the trigger until you exit back to the stored procedure, and continue to the end.

PS:  Suppose you can debug a procedure in sql server. 

reference: http://msdn.microsoft.com/en-us/library/ms165046(VS.80).aspx