开发者

minimal checks to find repeats in a list

开发者 https://www.devze.com 2023-04-07 13:33 出处:网络
Given a sequence (d1, d2, ..., dn), I want to evaluate the product (1 - Dij) for all i != j where Dij = 1 if di = dj and 0 otherwise.

Given a sequence (d1, d2, ..., dn), I want to evaluate the product (1 - Dij) for all i != j where Dij = 1 if di = dj and 0 otherwise.

The code I have only checks Dij when i

prod = 1;
for (int i=1; i<n; ++i) {
    for (int j=i; j<=n; ++j) {
        prod *= (1 - Dij);
    }
}

I know I can stop when I get Dij=1, but what I'm trying to do is get a minimal expression of the Dij's to check. This way I have one expression and then I can use difference sequences and evaluate it. So I know that I can do i<j instead of i != j. So I want to expand out this product and get something like this for n=3:

(1 - D12) (1 - D13) (1 - D23) = 1 - D12 - D13 - D23 + D12*D13 + D12*D23 + D13*D23 - D12*D13*D23

But there is more that I can do. This expression is actually always equal to

1 - D12 - D13 - D23 + 3 * D12*D13 - D12*D13*D23

My questions are:

  1. Why is D12 * D13 = D12 * D23? This is always true (meaning it doesn't matter what the d sequence is) but I don't really get why because it seems to me that this means D13 = D23 which isn't always true (it depends on the d sequence). This is the relation that helps make the expression smaller.

  2. How can I find all the relations like this 开发者_StackOverflow中文版and get a minimal expression? Is the expression above minimal? I don't even know.


You are trying to determine if D contains any duplicates or not. Ultimately, that requires you to compare every entry with each other, which is simply enumerating all unique combinations of two elements. That ends up being N*(N-1)/2. You can do a little better by sorting D first and then searching for duplicate adjacent pairs (O(N*log(N)), or, assuming you are sticking to a bounded range of integers, you can reduce it to linear time with a bit vector, or if you are feeling adventurous, a radix sort.


I can answer 1 for you. Consider these two cases:

Case 1: D13 = D23

Here you can just multiply by D12 on both sides to get D12 * D13 = D12 * D23.

Case 2: D13 != D23

This means that either d1 = d3 XOR d2 = d3 but not both. Therefore we know that d1 != d2. This implies that D12 = 0. Therefore

D12 * D13 = 0 * D13 = 0 = 0 * D23 = D12 * D23

The problem with your logic when you think this implies D13 = D23 is that you cannot divide by 0, and D12 might be 0 (as always happens in the second case).


Your second question is interesting, and I do not know the answer off the top of my head, but here are some observations that may be helpful.

Draw the numbers 1, 2, ..., n in a row:

1  2  3 ... n

Given an expression D_(i1,j1) * D_(i2,j2) * ... * D_(ik,jk), make an arc from i1 to j1 and i2 to j2 and so on. This turns that row into a graph (vertices are numbers, edges are these arcs).

Each connected component of that graph represents a subset of the number 1, 2, ..., n, and taken as a whole this gives us a set partition of {1, 2, ..., n}.

Fact: Any two terms that have the same corresponding set partition are equal.

Example:

D12 * D23 = D12 * D13

               ---------
              |         |
1 -- 2 -- 3 = 1 -- 2    3

Sometimes this fact will mean the degree is the same, as in the case above, and sometimes the degree will decrease as in

D12 * D13 * D23

 ---------
|         |
1 -- 2 -- 3

The upshot is that now you can express the product (1 - Dij) as a sum over set partitions:

\prod_{i<j} ( 1 - Dij ) = \sum_{P set partition of \{1,2,...,n\}} c_P * m_P

where the monomial term is given by

mP = mP1 * mP2 * ... * mPk

when P = P1 union P2 union ... union Pk and if Pi = { a < b < c < ... < z } then

m_Pi = Dab * Dac * ... * Daz

Finally, the coefficient term is just

c_P = \prod (#P1)! (#P2)! ... (#Pn)!

Having worked this out, I'm now certain this belongs on http://math.stackexchange.com rather than here.


I haven't followed the math, but isn't this something you would code using a hashtable, or possibly even a sparse bit-array if you know the size of the di are bounded? Just iterate over the list, filling in your data structure with a "1" at the position corresponding to the value of di - if it's already 1, return 0. If you complete (in n steps), return 1. It should be O(n).

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号