I've got an anonymous table with two columns: UserId and PhoneNumber.
It was selected from a Call Detail Records table. Now I would like to create a network based on similarity between users. There should be a connection between two users if they called at least 3 of the same numbers.
There are more than 20 million rows. A simple program written in C# would take more than 4 days to accomplish this task. I wonder whether it is possible to write an SQL query that gives the same result and, where there is a similarity, simply inserts a row into a new table with two columns, user1 and user2, or just returns it as output?
Maybe there is some other good solution to accomplish this task?
Assuming your table is called CallingList, then you should be able to use a query like this:
SELECT C1.UserID AS User1, C2.UserID AS User2
  FROM CallingList AS C1
  JOIN CallingList AS C2 ON C1.PhoneNumber = C2.PhoneNumber
 WHERE C1.UserID < C2.UserID
 GROUP BY C1.UserID, C2.UserID
HAVING COUNT(DISTINCT C1.PhoneNumber) >= 3  -- distinct, so repeated calls to one number count once
Whether that will be faster than the C# program remains to be seen.
Make sure you have an index on CallingList(PhoneNumber), unless your optimizer creates one automatically behind the scenes.
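To store the pairs in a new table, as the question asks, the same query can feed an INSERT ... SELECT. Below is a minimal sketch of that idea, run against an in-memory SQLite database from Python so it can be tried on a toy data set first. The sample rows, the index name idx_callinglist_phone and the result table UserLinks are made up for illustration. The query counts distinct shared numbers, on the assumption that the table may contain repeated (user, number) rows. Your own database may need slightly different syntax, and this says nothing about how it will perform on 20 million rows.

```python
import sqlite3

# Hypothetical sample data: (UserID, PhoneNumber) pairs.
# User 1 called "102" twice to show that repeats are not double-counted.
rows = [
    (1, "100"), (1, "101"), (1, "102"), (1, "102"),
    (2, "100"), (2, "101"), (2, "102"),
    (3, "100"), (3, "101"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CallingList (UserID INTEGER, PhoneNumber TEXT)")
conn.executemany("INSERT INTO CallingList VALUES (?, ?)", rows)

# Index on the join column, as suggested in the answer.
conn.execute("CREATE INDEX idx_callinglist_phone ON CallingList (PhoneNumber)")

# Hypothetical result table holding one row per connected pair of users.
conn.execute("CREATE TABLE UserLinks (User1 INTEGER, User2 INTEGER)")
conn.execute("""
    INSERT INTO UserLinks (User1, User2)
    SELECT C1.UserID, C2.UserID
      FROM CallingList AS C1
      JOIN CallingList AS C2 ON C1.PhoneNumber = C2.PhoneNumber
     WHERE C1.UserID < C2.UserID
     GROUP BY C1.UserID, C2.UserID
    HAVING COUNT(DISTINCT C1.PhoneNumber) >= 3
""")
conn.commit()

links = conn.execute(
    "SELECT User1, User2 FROM UserLinks ORDER BY User1, User2"
).fetchall()
print(links)
```

On this toy data only users 1 and 2 share three distinct numbers, so UserLinks ends up with a single row for that pair. User 3 shares only two numbers with each of the others and gets no link.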