开发者

How to join two tables on three conditions?

开发者 https://www.devze.com 2023-03-21 07:31 出处:网络
hope someone can figure out what I\'m doing wrong here. The task seems quite simple, but is apparently beyond me.

hope someone can figure out what I'm doing wrong here. The task seems quite simple, but is apparently beyond me.

I have two tables and I am trying to append precipitation data. Three conditions need to match in the two tables to get the correct precipitation data for each field site, e.g. year, latitude, and longitude. I tried by using the following queries (among other failed queries that I can't even remember any more):

SELECT f.*, g.* 
  FROM fieldSites f  LEFT OUTER JOIN gpcp_precipitation2 g
    ON f.date = g.year 
   AND f.d_lat = g.lat
 WHERE f.d_lon = g.lon; 

this one timed out

and:

SELECT *
FROM fieldSites
INNER JOIN gpcp_precipitation2 
  ON (fieldSites.d_lon = gpcp_precipitation2.lon 
  AND fieldSites.d_lat = gpcp_precipitation2.lat 
  AND fieldSites.date = gpcp_precipitation2.year);

this one also timed out on me.

I'd like to dump to a .csv file, but right now I just want to execute a successful query.

Here are my tables:

left table: fieldSites

siteId  d_lat  d_lon   year  data1  data2  country      
  1     -13.75  18.75   2009  0.598  0.351  Angola       
  1     -13.75  18.75   2008  0.654  0.330  Angola       
  1     -13.75  18.75   2007  0.489  0.381  Angola       
  1     -13.75  18.75   2006  0.554  0.389  Angola       
  1     -13.75  18.75   2005  0.321  0.321  Angola       
  1     -13.75  18.75   2004  0.598  0.351  Angola       
  1     -13.75  18.75   2003  0.654  0.330  Angola       
  1     -13.75  18.75   2002  0.489  0.381  Angola       
  1     -13.75  18.75   2001  0.554  0.389  Angola开发者_运维技巧       
  2     -78.75  163.75  2009  0.285  0.155  Antarctica   
  2     -78.75  163.75  2008  0.285  0.155  Antarctica   
  2     -78.75  163.75  2007  0.285  0.155  Antarctica   
  2     -78.75  163.75  2006  0.285  0.155  Antarctica   
  2     -78.75  163.75  2005  0.285  0.155  Antarctica   
...1052 sites, 11 years, 11496 rows

right table: gpcp_precipitation2

siteId   lat    lon   year  precipitation
1        81.5   1.25  2009  93.36571912   
1        81.5   1.25  2008  93.36571912   
1        81.5   1.25  2007  93.36571912   
1        81.5   1.25  2006  93.36571912   
1        81.5   1.25  2005  93.36571912   
1        81.5   1.25  2004  93.36571912   
1        81.5   1.25  2003  93.36571912   
1        81.5   1.25  2002  93.36571912   
1        81.5   1.25  2001  93.36571912   
1        81.5   1.25  2000  93.36571912   
1        81.5   3.75  2009  93.36571912 
1        81.5   3.75  2008  93.36571912   
1        81.5   3.75  2007  93.36571912

... 92300 rows  

What I want is this:

siteId  d_lat  d_lon   year  data1  data2  country      precipitation  
  1     13.75  18.75   2009  0.598  0.351  Angola       144.286
  1     13.75  18.75   2008  0.654  0.330  Angola       114.970
  1     13.75  18.75   2007  0.489  0.381  Angola       70.000
  1     13.75  18.75   2006  0.554  0.389  Angola       174.179
  1     13.75  18.75   2005  0.321  0.321  Angola       174.743
  1     13.75  18.75   2004  0.598  0.351  Angola       70.506
  1     13.75  18.75   2003  0.654  0.330  Angola       173.716
  1     13.75  18.75   2002  0.489  0.381  Angola       74.162
  1     13.75  18.75   2001  0.554  0.389  Angola       139.445
  2     78.75  163.75  2009  0.285  0.155  Antarctica   0
  2     78.75  163.75  2008  0.285  0.155  Antarctica   0
  2     78.75  163.75  2007  0.285  0.155  Antarctica   0
  2     78.75  163.75  2006  0.285  0.155  Antarctica   0

Am I doing something completely stupid? I am stumped. Thanks so much for any advice.


Select fieldSites.*, precipitation.*
From fieldSites
Inner Join gpcp_precipitation2 As precipitation On precipitation.siteId = fieldSites.siteId
Where
    fieldSites.d_year = precipitation.year And
    fieldSites.d_lat = precipitation.lat And
    fieldSites.d_lon = precipitation.lon

If that query is timing out, you have an indexing problem, not necessarily a query problem. This gives you several predicates in the where clause to filter on, so it should reduce your joins quite a bit, but you may need an index that includes siteId, year, lat, and lon on both tables.


Add an index on (date, d_lat, d_lon) to the first table and a (year, lat, lon) index to the second table. Then, try the joins.

From your comments, I suggest you use the second query:

SELECT *
FROM fieldSites
INNER JOIN gpcp_precipitation2 
  ON  fieldSites.d_lon = gpcp_precipitation2.lon 
  AND fieldSites.d_lat = gpcp_precipitation2.lat 
  AND fieldSites.date = gpcp_precipitation2.year ;

Can you also post the query plan for the above, now that you have added some indexes? (use EXPLAIN SELECT ... )


SELECT *
FROM fieldSites
INNER JOIN gpcp_precipitation2 
  ON (fieldSites.d_lon = gpcp_precipitation2.lon 
  AND fieldSites.d_lat = gpcp_precipitation2.lat 
  AND fieldSites.date = gpcp_precipitation2.year);

The last line:

AND fieldSites.date = gpcp_precipitation2.year);

According to your tables this should be fieldSites.year Is that a typo or the error?

0

精彩评论

暂无评论...
验证码 换一张
取 消