开发者

How to generate a random partition from an iterator in Python

开发者 https://www.devze.com 2023-01-17 12:13 出处:网络
Given the desired number of partitions, the partitions should be nearly equal in size. This question handles the problem for a list. They do not have the random property, but that is easily added. My

Given the desired number of partitions, the partitions should be nearly equal in size. This question handles the problem for a list. They do not have the random property, but that is easily added. My problem is, that I have an iterator as input, so shuffle does not apply. The reason for that is开发者_StackOverflow社区 that I want to randomly partition the nodes of graph. The graph can be very large, so I am looking for a solution that does not just create an intermediate list.

My first idea was to use compress() with a random number function as selector. But that only works for two partitions.


You could just create k list. When you receive a value, pick a random integer x between 0 and k-1, and put that value into the x-th list.

On average each list will contain N/k elements, but with a standard deviation of √(N * 1/k * (1-1/k)).

def random_partition(k, iterable):
  results = [[] for i in range(k)]
  for value in iterable:
    x = random.randrange(k)
    results[x].append(value)
  return results


You're just dealing to various partitions, right?

def dealer( iterator, size ):
    for item in iterator
        yield random.randrange( size ), item

Won't that get you started by assigning each item to a partition?

Then you can do something like this to make lists. Maybe not a good thing, but it shows how to use the function.

def make_lists( iterator, size ):
    the_lists = []*size
    for partition, item in dealer( iterator, size ):
        the_lists[partition].append(item)
    return the_lists


You can make the length of lists more uniform by adjusting the weights depending on the number of nodes generated so far in each partition. They'll be roughly equal-length if you pick a function so that the weight is 0 when (number of nodes in partition n) > (number of nodes)/(number of partions), i.e.

weight[i] = max(numNodes/numPartitions - nodesSoFar[i],0)

(The max() is to stop negative weights, which might happen if you have 4 nodes and 3 partitions.)

Then pick a random number from 1 to sum(weights) (or 0 to sum(weights)-1) and pick the partition appropriately.

compress() works provided you use a different selector per partition; something like (x == n for x in random_partition_numbers) where random_partition_numbers is a generator. You'll need to copy random_partition_numbers for each partition, of course. This design is inherently slower, since it needs to iterate through the list of nodes for each partition.

0

精彩评论

暂无评论...
验证码 换一张
取 消