How to improve the SPARQL query performance of SDB?

Developer https://www.devze.com 2023-04-05 14:49 Source: Internet

In my application I use Jena's SDB as the SPARQL store, with DB2 as the database server, but I find that SPARQL query performance is very poor.

Can anyone help me solve this problem? How can I improve SPARQL query performance, especially the query performance of SDB?

Below are my test case data and the SPARQL query:

Test case:

The store holds 13294 RDF triples in total. The query returns 420 results and takes 42 seconds.

The SPARQL query is:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?s ?name ?ownerId ?status ?time
  ?value ?startTime ?endTime ?description
WHERE 
{
  ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "http://www.w3c.com/schemas/cp#Event" .
  ?s <http://www.w3c.com/schemas/cp#time> ?time .
  ?s <http://www.w3c.com/schemas/cp#ownerId> ?ownerId .
  ?s <http://www.w3c.com/schemas/cp#name>  ?name .
  ?s <http://www.w3c.com/schemas/cp#value> ?value .
  ?s <http://www.w3c.com/schemas/cp#_status> ?status .
  ?s <http://www.w3c.com/schemas/cp#start_Time> ?startTime .
  ?s <http://www.w3c.com/schemas/cp#end_Time> ?endTime .
  ?s <http://www.w3c.com/schemas/cp#description> ?description .
  FILTER(xsd:dateTime(?time) >= "2011-08-12T00:00:00"^^xsd:dateTime  
    && xsd:dateTime(?time) <= "2011-09-18T23:59:59"^^xsd:dateTime) 
}

The query performance of any SQL-backed triplestore like SDB will generally be worse than that of a native triplestore, because it has to compile SPARQL down into SQL, which often produces horrendously complex SQL queries.

So, taking your example, you've asked for 9 triple patterns to be matched, which will generate an SQL SELECT with roughly one INNER JOIN per pattern. That takes a lot of time before any filtering has even started.
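
To make the join cost concrete, here is a minimal sketch in Python of how a group of triple patterns sharing one subject might be lowered to SQL over a single three-column table. The `triples(s, p, o)` layout, the shortened predicate names and the helpers `patterns_to_sql`, `run_patterns` and `demo_connection` are all assumptions for illustration. SDB's real schema and generated SQL differ, so treat this only as the general shape of the problem.

```python
import sqlite3


def patterns_to_sql(predicates):
    """Build one SELECT that self-joins the triples table per predicate.

    Hypothetical helper: every pattern is assumed to be `?s <p> ?var`
    with the same subject, as in the question's query.
    """
    if not predicates:
        raise ValueError("need at least one predicate")
    cols = ["t0.s"] + [f"t{i}.o" for i in range(len(predicates))]
    sql = f"SELECT {', '.join(cols)} FROM triples t0"
    for i in range(1, len(predicates)):
        # Each further triple pattern adds another join on the shared subject.
        sql += f" INNER JOIN triples t{i} ON t{i}.s = t0.s"
    where = " AND ".join(f"t{i}.p = ?" for i in range(len(predicates)))
    return sql + " WHERE " + where


def run_patterns(conn, predicates):
    """Execute the generated SQL, binding the predicates as parameters."""
    return conn.execute(patterns_to_sql(predicates), predicates).fetchall()


def demo_connection():
    """In-memory toy data: two events, only one of which has a time."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.executemany(
        "INSERT INTO triples VALUES (?, ?, ?)",
        [
            ("e1", "cp#name", "Backup"),
            ("e1", "cp#time", "2011-08-15T10:00:00"),
            ("e2", "cp#name", "Restore"),
        ],
    )
    return conn
```

Calling `print(patterns_to_sql([...]))` with the predicates from the question shows one table alias per pattern, all joined on the subject. That is the kind of statement the relational database then has to plan and execute.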

Then you are applying a FILTER to those triple patterns. The problem is that unless the filter expression is very simple, or close enough to SQL to be translated into it, the FILTER has to be evaluated in memory in Java code. In practice this means you are selecting out all the possible events in the triplestore and then filtering for the date range in memory in Java, which is always going to make your query slower.
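
The difference between a filter the database can evaluate and one applied afterwards in application code can be sketched as follows. This is a Python and SQLite illustration, not SDB's actual behaviour: the `events` table, its column names and the helper functions are hypothetical. It relies on the fact that ISO 8601 timestamps written in one fixed format sort lexicographically in chronological order, so the range test can be expressed as a plain comparison in SQL.

```python
import sqlite3
from datetime import datetime

START = "2011-08-12T00:00:00"
END = "2011-09-18T23:59:59"


def make_events():
    """Create a toy table of (subject, time) rows; names are illustrative."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (s TEXT, time TEXT)")
    # An index on the filtered column lets the database avoid a full scan.
    conn.execute("CREATE INDEX idx_events_time ON events (time)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [
            ("e1", "2011-08-01T09:00:00"),
            ("e2", "2011-08-15T10:00:00"),
            ("e3", "2011-09-18T23:59:59"),
            ("e4", "2011-10-02T12:30:00"),
        ],
    )
    return conn


def filter_in_memory(conn):
    """Fetch every row, then cast and compare in application code."""
    lo, hi = datetime.fromisoformat(START), datetime.fromisoformat(END)
    rows = conn.execute("SELECT s, time FROM events").fetchall()
    return sorted(s for s, t in rows if lo <= datetime.fromisoformat(t) <= hi)


def filter_in_database(conn):
    """Let the database apply the range test, so only matches come back."""
    rows = conn.execute(
        "SELECT s FROM events WHERE time BETWEEN ? AND ?", (START, END)
    ).fetchall()
    return sorted(s for (s,) in rows)
```

Both functions return the same subjects, but `filter_in_memory` transfers and parses every row first, which is the pattern the answer describes for a FILTER that cannot be translated to SQL.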

Unless there is a specific reason you want to use SDB, I'd really suggest looking at Jena's native triplestore, TDB or TDB2. It is designed to do the kinds of joins that SPARQL queries require much more efficiently, and the way it stores the data lets it evaluate more complicated filters, like your date range one, much faster.