Set allowing quick insert/deletion and random selection in C#_问答_开发者

What data structure could I use in C# to allow quick insertion/deletion as well as uniform random selection? A List has slow deletion by element (since it开发者_开发百科 needs to find the index of the element each time), while a HashSet does not seem to allow random selection of an element (without copying to a list.)

The data structure will be updated continuously, so insertion and deletion need to be online procedures. It seems as if there should be a way to make insertion, deletion, and random selection all O(log n).

A binary search tree with arbitrary integer keys assigned to the objects would solve all of these problems, but I can't find the appropriate class in the C# standard library. Is there a canonical way to solve this without writing a custom binary search tree?

There is already a BST in the C# BCL, it's called a SortedDictionary<TKey, TValue>, if you don't want Key Value Pairs, but instead want single items, you can use the SortedSet<T> (SortedSet is in .NET 4.0).

It sounds like from your example you'd want a SortedDictionary<int, WhateverValueType>. Though I'm not sure exactly what you are after when you say "uniform random selection".

Of course, the Dictionary<TKey, TValue> is O(1) which is much faster. So unless you have a need for sorted order of the keys, I'd use that.

UPDATE: From the sounds of your needs, you're going to have a catch-22 on efficiency. To be able to jump into a random contiguous index in the data structure, how often will you be inserting/deleting? If not often, you could use an array and just Sort() after (O(n log n)), or always insert/delete in order (O(n)).

Or, you could wrap a Dictionary<int, YourType> and keep a parallel List<int> and update it after every Add/Delete:

_dictionary.Add(newIndex, newValue);
_indexes.Add(newIndex);

And then just access a random index from the list on lookups. The nice thing is that in this method really the Add() will be ~ O(1) (unless List resizes, but you can set an initial capacity to avoid some of that) but you would incurr a O(n) cost on removes.

I'm afraid the problem is you'll either sacrifice times on the lookups, or on the deletes/inserts. The problem is all the best access-time containers are non-contiguous. With the dual List<int>/Dictionary<int, YourValue> combo, though, you'd have a pretty good mix.

UPDATE 2: It sounds like from our continued discussion that if that absolute performance is your requirement you may have better luck rolling your own. Was fun to think about though, I'll update if I think of anything else.

Binary search trees and derived structures, like SortedDictionary or SortedSet, operate by comparing keys.

Your objects are not comparable by itself, but they offer object identity and a hash value. Therefore, a HashSet is the right data structure. Note: A Dictionary<int,YourType> is not appropriate because removal becomes a linear search (O(n)), and doesn't solve the random problem after removals.

Insert is O(1)
Remove is O(1)
RandomElement is O(n). It can easily be implemented, e.g.
```
set.ElementAt(random.Next(set.Count))
```
No copying to an intermediate list is necessary.

I realize that this question is over 3 years old, but just for people who come across this page:

If you don't need to keep the items in the data set sorted, you can just use a List<ItemType>.

Insertion and random selection are O(1). You can do deletion in O(1) by just moving the last item to the position of the item you want to delete and removing it from the end.

Code:

using System; // For the Random
using System.Collections.Generic; // The List

// List:
List<ItemType> list = new List<ItemType>();

// Add x:
ItemType x = ...; // The item to insert into the list
list.Add( x );

// Random selection
Random r = ...; // Probably get this from somewhere else
int index = r.Next( list.Count );
ItemType y = list[index];

// Remove item at index
list[index] = list[list.Count - 1]; // Copy last item to index
list.RemoveAt( list.Count - 1 ); // Remove from end of list

EDIT: Of course, to remove an element from the List<ItemType> you'll need to know its index. If you want to remove a random element, you can use a random index (as done in the example above). If you want to remove a given item, you can keep a Dictionary<ItemType,int> which maps the items to their indices. Adding, removing and updating these indices can all be done in O(1) (amortized).

Together this results in a complexity of O(1) (amortized) for all operations.