开发者

Large SQL dataset query using java

开发者 https://www.devze.com 2023-04-11 12:32 出处:网络
I have the following configuration: SQL Server 2008 Java as backend technology - Spring + Hibernate Basically what I want to do is a select with a where clause on a table. The problem is the tabl

I have the following configuration:

  • SQL Server 2008
  • Java as backend technology - Spring + Hibernate

Basically what I want to do is a select with a where clause on a table. The problem is the table has about 700M entries and the query takes a really long time.

Can you please indicate some pointers on where to optimize the query or what sort of techniques are can I use in order to get an improvement in perfor开发者_开发百科mance?

Thanks.


Using indexes is the standard technique used to deal with this problem. As requested, here are some pointers that should get you started:

  • http://odetocode.com/articles/70.aspx
  • http://www.simple-talk.com/sql/learn-sql-server/sql-server-index-basics/
  • http://www.petri.co.il/introduction-to-sql-server-indexes.htm


The first thing I do in this case is isolate whether it is the amount of data I am returning that is the problem or not (an i/o issue). A simple non-scientific way to do this is change your query to just return the count:

select count(*) --just return a count, no data!
from MyTable
inner join MyOtherTable on ...
where ...

If this runs very quickly, it tells you your indexes are in order (assuming no sub-selects in your WHERE clause). If not, then you need to work on indexes, the WHERE clause, or your query construction itself (JOINs being done, etc).

Once that is satisfactory, add back in your SELECT clause. If it is slow, you are going to have to look at your data access pattern:

  • Can you return fewer columns?
  • Can you return fewer rows at once?
  • Is there caching you can do in the application layer?
  • Is this query a candidate for partitioned/materialized views (if your database supports those)?


I would run Profiler to find the exact query that is being generated. ORMs can create less than optimal queries. Once you know the query, you can run it in SSMS and see the execution plan. This will give you clues as to where you have performance problems.

Several things that can cause performance problems:

  • Lack of correct indexing (Foreign keys should be indexed if you have joins as well as the criteria in the where clause)
  • Lack of sargability in the where clause forcing the query to not use existing indexes
  • Returning more columns than are needed
  • Correlated subqueries and scalar functions that cause row-by-agonzing-row operations
  • Returning too much data (will anybody really be looking at 1 million records returned? You only want to return the amount you show on page not the whole possible recordset)
  • Locking and blocking

There's more (After all whole very long books are written o nthis subject) but that should be enough to get you started at where to look.


You should provide some indexes for those column you often use to restrict the result. Other thing is the pagination of the result set.


Regardless of the specific DB, I would do the following:

  1. run an explain analyze
  2. make sure you have an index for the columns that are part of your where clause
  3. If indexes are ok, it's very likely that you are fetching a lot of records from disk, which is very slow: if you really cannot refine your query so that you fetch fewer records, consider clustering your table, to improve disk locality of your records.
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号