How to ensure that randomly generated numbers are not being repeated? [duplicate]

Developer https://www.devze.com 2023-01-04 03:19 Source: Internet
This question already has answers here: Closed 12 years ago.

Possible Duplicates:

Unique (non-repeating) random numbers in O(1)?

How do you efficiently generate a list of K non-repeating integers between 0 and an upper bound N

I want to generate random numbers in a certain range, and I must be sure that each new number is not a duplicate of an earlier one. One solution is to store the previously generated numbers in a container and check each new number against it. If the number is already in the container, we generate again; otherwise we use it and add it to the container. But this operation gets slower and slower with each new number. Is there a better approach, or a rand function that works faster and guarantees unique output?

EDIT: Yes, there is a limit (for example from 0 to 1.000.000.000). But I want to generate 100.000 unique numbers! (Would be great if the solution will be by using Qt features.)


Is there a range for the random numbers? If you have a limit for random numbers and you keep generating unique random numbers, then you'll end up with a list of all numbers from x..y in random order, where x-y is the valid range of your random numbers. If this is the case, you might improve speed greatly by simply generating the list of all numbers x..y and shuffling it, instead of generating the numbers.


I think there are 3 possible approaches; which one fits depends on the size of the range and the performance pattern you need.

  1. Create a random number, see if it is in (a sorted) list. If not add and return, else try another.
    • Your list will grow and consume memory with every number you need. If every number is 32 bit, it will grow with at least 32 bits every time.
    • Every new random number increases the hit-ratio and this will make it slower.
    • O(n^2) - I think
  2. Create a bit array with one bit for every number in the range. Mark a bit with 1/True once that number has been returned.
    • Every number now takes only 1 bit, but the array covers the whole range, so this can still be a problem if the range is big.
    • Every new random number increases the hit-ratio and this will make it slower.
    • O(n*2)
  3. Pre-populate a list with all the numbers, shuffle it, and return the Nth number.
    • The list will not grow, returning numbers will not get slower,
    • but generating the list might take a long time, and a lot of memory.
    • O(1)

Depending on needed speed, you could store all lists in a database. There's no need for them to be in memory except speed.


Fill out a list with the numbers you need, then shuffle the list and pick your numbers from one end.


If you use a simple 32-bit linear congruential RNG (such as the so-called "Minimal Standard"), all you have to do is store the seed value you use and compare each generated number to it. If you ever reach that value again, your sequence is starting to repeat itself and you're out of values. This is O(1), but of course limited to 2^32-1 values (though I suppose you could use a 64-bit version as well).


There is a class of pseudo-random number generators that, I believe, has the properties you want: the Linear congruential generator. If defined properly, it will produce a list of integers from 0 to N-1, with no two numbers repeating until you've used all of the numbers in the list once.

#include <stdint.h>

/*
 * Choose these values as follows:
 *
 * The MODULUS and INCREMENT must be relatively prime.
 * The MULTIPLIER-1 must be divisible by all prime factors of the MODULUS.
 * The MULTIPLIER-1 must be divisible by 4, if the MODULUS is divisible by 4.
 *
 * In addition, modulus must be <= 2**32 (0x0000000100000000ULL).
 *
 * A small example would be 8, 5, 3.
 * A larger example would be 256, 129, 251.
 * A useful example would be 0x0000000100000000ULL, 1664525, 1013904223.
 */

#define MODULUS    (0x0000000100000000ULL)
#define MULTIPLIER (1664525)
#define INCREMENT  (1013904223)

static uint64_t seed;

uint32_t lcg( void ) {
    uint64_t temp;

    temp = seed * MULTIPLIER + INCREMENT;   // 64-bit intermediate product
    seed = temp % MODULUS;                  // 32-bit end-result

    return (uint32_t) seed;
}

All you have to do is choose a MODULUS such that it is larger than the number of numbers you'll need in a given run.
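For the numbers in the question (100,000 unique values from 0 to 1,000,000,000), one possible way to apply this is to use the smallest power-of-two modulus that covers the range and skip any output above the upper bound. The sketch below assumes the MULTIPLIER and INCREMENT from the listing above; the names `UniqueLcg` and `drawUnique` are made up for illustration. Note that the low bits of a power-of-two LCG cycle quickly, so the output is unique but not of high statistical quality.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the answer above): a full-period LCG whose
// modulus is the smallest power of two above maxValue. Outputs larger than
// maxValue are skipped, so each value in [0, maxValue] appears exactly once
// per period. Assumes maxValue < 2^63.
class UniqueLcg {
public:
    UniqueLcg(std::uint64_t maxValue, std::uint64_t seed)
        : maxValue_(maxValue), mask_(1) {
        while (mask_ <= maxValue_) mask_ <<= 1;  // smallest power of two > maxValue
        mask_ -= 1;                              // modulus - 1, used as a bit mask
        state_ = seed & mask_;
    }

    std::uint64_t next() {
        do {
            // a = 1664525 (a - 1 divisible by 4) and c = 1013904223 (odd)
            // meet the full-period conditions for any power-of-two modulus.
            state_ = (state_ * 1664525ULL + 1013904223ULL) & mask_;
        } while (state_ > maxValue_);
        return state_;
    }

private:
    std::uint64_t maxValue_;
    std::uint64_t mask_;
    std::uint64_t state_;
};

// Draw `count` distinct values from [0, maxValue].
std::vector<std::uint64_t> drawUnique(std::uint64_t maxValue, std::size_t count,
                                      std::uint64_t seed) {
    assert(count <= maxValue + 1);  // cannot draw more values than the range holds
    UniqueLcg gen(maxValue, seed);
    std::vector<std::uint64_t> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) out.push_back(gen.next());
    return out;
}
```

Calling `drawUnique(1000000000ULL, 100000, seed)` would cover the case from the question. Since the modulus is 2^30 there, fewer than 7% of the raw outputs are skipped.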


It wouldn't be random if there is such a pattern?

As far as I know you would have to store and filter all unwanted numbers...


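#include <algorithm>
#include <random>
#include <vector>

int main() {
    unsigned int N = 1000;
    std::vector<unsigned int> vals(N);
    for (unsigned int i = 0; i < vals.size(); ++i)
        vals[i] = i;   // fill with 0..N-1
    // std::random_shuffle was removed in C++17; std::shuffle takes an explicit engine.
    std::mt19937 rng(std::random_device{}());
    std::shuffle(vals.begin(), vals.end(), rng);

    // Consecutive elements are now distinct random numbers.
    unsigned int random_number_1 = vals[0];
    unsigned int random_number_2 = vals[1];
    unsigned int random_number_3 = vals[2];
    //etc
}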


You could store the numbers in a vector and pick them by a random index (0..n-1). After each pick, remove the indexed number from the vector, then generate the next index in the interval 0..n-2, and so on.
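A minimal sketch of this pick-and-remove idea, assuming the values are the integers 0..n-1. The function name `drawByRemoval` is hypothetical. Instead of erasing from the middle of the vector, which costs O(n) per removal, this version swaps the picked element with the last one and pops it. That is a swapped-in detail, not something the answer specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Draw `count` distinct values from [0, n) by picking a random index into a
// pool of unused values and removing the picked element from the pool.
std::vector<unsigned> drawByRemoval(unsigned n, std::size_t count, std::mt19937& rng) {
    assert(count <= n);
    std::vector<unsigned> pool(n);
    for (unsigned i = 0; i < n; ++i) pool[i] = i;

    std::vector<unsigned> out;
    out.reserve(count);
    for (std::size_t k = 0; k < count; ++k) {
        // The index range shrinks by one after every pick.
        std::uniform_int_distribution<std::size_t> pick(0, pool.size() - 1);
        std::size_t idx = pick(rng);
        out.push_back(pool[idx]);
        std::swap(pool[idx], pool.back());  // swap-and-pop: O(1) removal
        pool.pop_back();
    }
    return out;
}
```

Like the full shuffle, this still needs memory for the whole range up front, so it suits small ranges better than 0..1,000,000,000.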


If they can't be repeated, they aren't random.

EDIT:

Furthermore..

if they can't be repeated, they don't fit in a finite computer


How many random numbers do you need? Maybe you can apply a shuffle algorithm to a precalculated array of random numbers?


There is no way a random generator will output values depending on previously outputted values, because they wouldn't be random. However, you can improve performance by using different pools of random values each with values combined by a different salt value, which will divide the quantity of numbers to check by the quantity of pools you have.


If the range of the random number doesn't matter, you could use a really large range of random numbers and hope you don't get any collisions. If your range is billions of times larger than the number of elements you expect to create, your chances of a collision are small but still there. If the numbers don't have to have an actual random distribution, you could use a two-part number {counter}{random x digits}, which would ensure a unique number, but it wouldn't be randomly distributed.
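The two-part {counter}{random x digits} number could look roughly like this. It is only a sketch: the class name `CounterPrefixedId` and the choice of six random decimal digits are assumptions made for illustration.

```cpp
#include <cstdint>
#include <random>

// Hypothetical generator: the high part is a running counter, the low part is
// six random decimal digits. Two results can never be equal because their
// counter parts differ, but the values are clearly not uniformly distributed.
class CounterPrefixedId {
public:
    explicit CounterPrefixedId(std::uint32_t seed)
        : rng_(seed), suffix_(0, 999999), counter_(0) {}

    std::uint64_t next() {
        std::uint64_t id = counter_ * 1000000ULL + suffix_(rng_);
        ++counter_;
        return id;
    }

private:
    std::mt19937 rng_;
    std::uniform_int_distribution<std::uint32_t> suffix_;
    std::uint64_t counter_;
};
```

The results are strictly increasing, which is exactly the "not randomly distributed" drawback the answer mentions.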


There's not going to be a pure functional approach that isn't O(n^2) on the number of results returned so far - every time a number is generated you will need to check against every result so far. Additionally, think about what happens when you're returning e.g. the 1000th number out of 1000 - you will require on average 1000 tries until the random algorithm comes up with the last unused number, with each attempt requiring an average of 499.5 comparisons with the already-generated numbers.

It should be clear from this that your description as posted is not quite exactly what you want. The better approach, as others have said, is to take a list of e.g. 1000 numbers upfront, shuffle it, and then return numbers from that list incrementally. This will guarantee you're not returning any duplicates, and return the numbers in O(1) time after the initial setup.


You can allocate an array of bits with 1 bit for each possible number, then check and set the bit for every generated number. For example, for numbers from 0 to 65535 you need only 8192 bytes (8 KB) of memory.
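A sketch of the bit-array check, assuming `std::vector<bool>` as the bit array (it is specified as a space-efficient specialization and typically stores one bit per element). The function name `drawWithBitmap` is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Draw `count` distinct values from [0, rangeSize), retrying whenever the
// bit for a candidate is already set.
std::vector<std::uint32_t> drawWithBitmap(std::uint32_t rangeSize, std::size_t count,
                                          std::mt19937& rng) {
    assert(rangeSize > 0 && count <= rangeSize);
    std::vector<bool> used(rangeSize, false);  // one flag per possible number
    std::uniform_int_distribution<std::uint32_t> dist(0, rangeSize - 1);

    std::vector<std::uint32_t> out;
    out.reserve(count);
    while (out.size() < count) {
        std::uint32_t x = dist(rng);
        if (!used[x]) {        // O(1) membership test
            used[x] = true;
            out.push_back(x);
        }
    }
    return out;
}
```

Each lookup is constant time, but retries become frequent once most of the range has been used, so this works best when `count` is well below `rangeSize`.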


Here's an interesting solution I came up with:

Assume you have numbers 1 to 1000 - and you don't have enough memory.

You could put all 1000 numbers into an array, and remove them one by one, but you'll get memory overflow error.

You could split the array in two, so you have an array of 1-500 and one empty array

You could then check if the number exists in array 1, or doesn't exist in the second array.

So assuming you have 1000 numbers, you can get a random number from 1-1000. If it's less than 500, check array 1 and remove it if present. If it's NOT in array 2, you can add it.

This halves your memory usage.

If you propagate this using recursion, you can split your 500 array into a 250 and empty array.

Assuming empty arrays use no space, you can decrease your memory usage quite a bit.

Searching will be massively faster too, because if you break it down a lot, you generate a number such as 29. It's less than 500, less than 250, less than 125, less than 62, less than 31, greater than 15, so you do those 6 calculations, then check the array containing an average of 16/2 items - 8 in total.

I should patent this search, although I bet it already exists!


Especially given the desired number of values, you want a Linear Feedback Shift Register.

Why?

No shuffle step, nor a need to keep track of values you've already hit. As long as you go less than the full period, you should be fine.

It turns out that the Wikipedia article has some C++ code examples which are more tested than anything I would give you off the top of my head. Note that you'll want to be pulling values from inside the loops -- the loops just iterate the shift register through. You can see this in the snippet here.
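For reference, a 16-bit Fibonacci LFSR with taps at bits 16, 14, 13 and 11 can be sketched as follows, pulling a value out on every step. The class name `Lfsr16` and the helper `firstValues` are made up for illustration. A 16-bit register only yields 65535 distinct values, so the 100,000 values from the question would need a wider register with a suitable maximal-length tap set, which is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 16-bit Fibonacci LFSR, feedback polynomial x^16 + x^14 + x^13 + x^11 + 1.
// It visits every non-zero 16-bit state once before repeating (period 65535).
class Lfsr16 {
public:
    // State 0 is a fixed point of the register, so it is replaced by a non-zero default.
    explicit Lfsr16(std::uint16_t seed) : state_(seed != 0 ? seed : 0xACE1u) {}

    std::uint16_t next() {
        unsigned bit = ((state_ >> 0) ^ (state_ >> 2) ^ (state_ >> 3) ^ (state_ >> 5)) & 1u;
        state_ = static_cast<std::uint16_t>((state_ >> 1) | (bit << 15));
        return state_;
    }

private:
    std::uint16_t state_;
};

// Pull `count` distinct values out of the register, one per shift step.
std::vector<std::uint16_t> firstValues(std::uint16_t seed, std::size_t count) {
    assert(count <= 65535);  // staying within one period guarantees uniqueness
    Lfsr16 reg(seed);
    std::vector<std::uint16_t> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) out.push_back(reg.next());
    return out;
}
```

The value 0 never appears, and consecutive outputs are visibly related (each is the previous one shifted by one bit), which may or may not matter for a given use.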

(Yes, I know this was mentioned, briefly in the dupe -- saw it as I was revising. Given it hasn't been brought up here and is the best way to solve the poster's question, I think it should be brought up again.)


Let's say size = 100.000; create an array of that size. Generate random numbers and put them into the array. The problem is which index a number should go to: randomNumber % size gives you the index.

When you add the next number, compute its index with the same function and check whether a value already exists there. If not, store the number; if one exists, generate a new number and try again. This is a very fast way to generate the numbers. The disadvantage is that you will never get two numbers whose last section (the remainder modulo size) is the same.

For example, two numbers with the same last section: 1231232444556 3458923444556

You will never have both of these numbers in your list: they are totally different, but their last sections are the same.


First off, there's a huge difference between random and pseudorandom. There's no way to generate perfectly random numbers from a deterministic process (such as a computer) without bringing in some physical process like latency between keystrokes or another entropy source.

The approach of saving all the numbers generated will slow down the computation rather quickly; the more numbers you have, the larger your storage needs, until you've filled up all available memory. A better method would be (as someone's already suggested) using a well known pseudorandom number generator such as the Linear Congruential Generator; it's super fast, requiring only modular multiplication and addition, and the theory behind it gets a lot of mention in Vol. 2 of Knuth's TAOCP. That way, the theory involved guarantees a rather large period before repetition, and the only storage needed are the parameters and seed used.


If it is not a problem that each value can be calculated from the previous one, LFSR and LCG are fine. If you don't want one output value to be computable from another, you can use a block cipher in counter mode to generate the output sequence, provided that the cipher block length equals the output length.
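To illustrate why encrypting a counter gives unique outputs, here is a toy 4-round Feistel network on 32-bit blocks. It is a sketch only: the round function, the key values and all function names are made up, and it is NOT cryptographically secure. A real implementation would use a vetted cipher. The point is that a block cipher is a bijection on its block space, so distinct counter values map to distinct outputs.

```cpp
#include <cstdint>

// Made-up round function. Any function works here, because a Feistel network
// is invertible no matter what the round function computes.
static std::uint16_t roundFn(std::uint16_t half, std::uint32_t key) {
    std::uint32_t mixed = (static_cast<std::uint32_t>(half) ^ key) * 0x9E3779B1u;
    return static_cast<std::uint16_t>(mixed >> 16);
}

// Toy block cipher: maps each 32-bit block to a distinct 32-bit block.
std::uint32_t toyEncrypt32(std::uint32_t block, const std::uint32_t (&keys)[4]) {
    std::uint16_t left = static_cast<std::uint16_t>(block >> 16);
    std::uint16_t right = static_cast<std::uint16_t>(block & 0xFFFFu);
    for (int i = 0; i < 4; ++i) {
        std::uint16_t next = static_cast<std::uint16_t>(left ^ roundFn(right, keys[i]));
        left = right;
        right = next;
    }
    return (static_cast<std::uint32_t>(left) << 16) | right;
}

// Inverse of toyEncrypt32; its existence is what proves there are no collisions.
std::uint32_t toyDecrypt32(std::uint32_t block, const std::uint32_t (&keys)[4]) {
    std::uint16_t left = static_cast<std::uint16_t>(block >> 16);
    std::uint16_t right = static_cast<std::uint16_t>(block & 0xFFFFu);
    for (int i = 3; i >= 0; --i) {
        std::uint16_t prev = static_cast<std::uint16_t>(right ^ roundFn(left, keys[i]));
        right = left;
        left = prev;
    }
    return (static_cast<std::uint32_t>(left) << 16) | right;
}
```

Calling `toyEncrypt32(0, keys)`, `toyEncrypt32(1, keys)`, `toyEncrypt32(2, keys)` and so on yields a non-repeating sequence of 32-bit values. To stay inside a smaller range such as 0..1,000,000,000, outputs above the bound can simply be skipped.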


Use the generic HashSet class. A HashSet never contains the same value twice. You can put all of your generated numbers into it and then use them from the HashSet. You can also check whether a value already exists. A HashSet determines the existence of an item very quickly, and it does not slow down as the collection grows, which is its biggest advantage.

For example :

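using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        HashSet<int> numbers = new HashSet<int>();
        numbers.Add(1);
        numbers.Add(2);
        numbers.Add(1);   // duplicate: Add returns false and the set is unchanged
        foreach (var item in numbers)
            Console.WriteLine(item);   // each distinct value is printed once
        Console.ReadKey();
    }
}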
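The same idea in C++, applied to the question's task, might look like the sketch below. It assumes `std::unordered_set` as the counterpart of HashSet, and the function name `drawWithHashSet` is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <unordered_set>
#include <vector>

// Draw `count` distinct values from [0, maxValue], using a hash set to
// reject candidates that were already produced.
std::vector<std::uint32_t> drawWithHashSet(std::uint32_t maxValue, std::size_t count,
                                           std::mt19937& rng) {
    assert(count <= static_cast<std::uint64_t>(maxValue) + 1);
    std::unordered_set<std::uint32_t> seen;
    seen.reserve(count);
    std::uniform_int_distribution<std::uint32_t> dist(0, maxValue);

    std::vector<std::uint32_t> out;
    out.reserve(count);
    while (out.size() < count) {
        std::uint32_t x = dist(rng);
        // insert() returns a pair; .second is false if x was already present.
        if (seen.insert(x).second) out.push_back(x);
    }
    return out;
}
```

With 100,000 draws from a range of about a billion values, duplicates are rare, so nearly every candidate is accepted on the first try. Memory grows with the number of values drawn, not with the size of the range.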