What is the fastest way to return x,y coordinates that are present in both list A and list B?_问答_开发者

I have two lists (list A and list B) of x,y coordinates where 0 < x < 4000, 0 < y < 4000, and they will always be integers. I need to know what coordinates are in both lists. What would be your suggestion for how to approach this?

I have been thinking about representing the lists as two grids of 开发者_JAVA技巧bits and doing bitwise & possibly?

List A has about 1000 entries and changes maybe once every 10,000 requests. List B will vary wildly in length and will be different on every run through.

EDIT: I should mention that no coordinate will be in lists twice; 1,1 cannot be in list A more than once for example.

Represent (x,y) as a single 24 bit number as described in the comments.

Maintain A in numerical order (you said it doesn't vary much, so this should be hardly any cost).

For each B do a binary search on the list. Since A is about 1000 items big, you'll need at most 10 integer comparisons (in the worst case) to check for membership.

If you have a bit more memory (about 2MB) to play with you could create a bit-vector to support all possible 24 bit numbers then then perform a single bit operation per item to test for membership. So A would be represented by a single 2^24 bit number with a bit-set if the value is there (otherwise 0). To test for membership you would just use an appropriate bit and operation.

Put the coordinates of list A into some kind of a set (probably a hash, bst, or heap), then you can quickly see if the coordinate from list B is present.

Depending on whether you're expecting the list to be present or not present in the list would determine what underlying data structure you use.

Hashes are good at telling you if something is in it, though depending on how it's implemented, could behave poorly when trying to find something that isn't in it.

bst and heaps are equally good at telling you if something is in it or not, but don't perform theoretically as well as hashes when something is in it.

Since A is rather static you may consider building a query structure and check of all elements in B whether they occur in A. One example would be an std::set > A and you can query like A.find(element_from_b) != A.end() ...

So the running time in total is worst case O(b log a) (where b is the number of elements in B, and a respectively). Note also that since a is always about 10000, log a basically is constant.

Define an ordering based on their lexicographic order (sort first on x then on y). Sort both lists based on that ordering in O(n log n) time where n is the larger of the number of elements of each list. Set a pointer to the first elment of each list and advance the one that points to the lesser element; when the pointers reference to elements with the same value, put them into a set (to avoid multiplicities within each list). This last part can be done in O(n) time (or O(m log m) where m is the number of elements common to both lists).

Update (based on comment below and edit above): Since no point appears more than once in each list, then you can use a list or vector or dequeue to hold the points common to both or some other (amortized) constant time insertion realizing the O(n) time performance regardless of the number of common elements.

This is easy if you implement an STL predicate which orders two pairs (i.e. return (R.x < L.x || (R.x==L.x && R.y < L.y). You can then call std::list::sort to order them, and std::set_intersection to find the common elements. No need to write the algoritms

This is the kind of problem that just screams "Bloom Filter" at me.

If I understand correctly, you want the common coordinates in X and Y -- the intersection of (sets) Listing A and B? If you are using STL:

#include <vector>
#include <std>
using namespace std;
// ...
set<int> a; // x coord (assumed populated elsewhere)
set<int> b; // y coord (assumed populated elsewhere)
set<int> in; // intersection
// ...
set_intersection(a.begin(), a.end(), b.begin(), b.end(), insert_iterator<set<int> >(in,in.begin()));

I think hashing is your best bet.

//Psuedocode: