Advanced: How to optimize my complex O(n²) algorithm

I have people and places data as:

  • Person entity has
    • IList<DateRangePlaces> each having
      • IList<Place> of possible places
    • Schedule day pattern, e.g. 10 days available followed by 4 unavailable

Within a particular DateRangePlaces date range, the Schedule pattern governs whether the person can go to a particular place on a given day.
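
For illustration, a minimal sketch of such a pattern check, assuming the pattern repeats from the start of the date range (the class, method, and parameter names here are made up):

using System;

static class SchedulePattern
{
    // Hypothetical check: the pattern repeats from the start of the range,
    // e.g. 10 days available followed by 4 unavailable.
    // Assumes day >= rangeStart.
    public static bool IsAvailable(DateTime rangeStart, DateTime day,
                                   int availableDays, int unavailableDays)
    {
        int dayIndex = (day - rangeStart).Days % (availableDays + unavailableDays);
        return dayIndex < availableDays;
    }
}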

  • Place entity has
    • IList<DateRangeTiming> each defining opening/closing times within each date range

Overlapping date ranges work as LIFO: for any day that was already covered by an earlier definition, the newer timing definition takes precedence.
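
A minimal sketch of that LIFO rule, assuming each DateRangeTiming carries its own From/To dates (the member names are hypothetical):

using System;
using System.Collections.Generic;

// Hypothetical shape of one timing definition.
class DateRangeTiming
{
    public DateTime From, To;
    public TimeSpan Opens, Closes;
    public bool Covers(DateTime day) => day.Date >= From && day.Date <= To;
}

static class TimingResolver
{
    // LIFO resolution: scan newest-to-oldest; the most recently added
    // range covering the day supplies the effective opening times.
    public static DateRangeTiming EffectiveTiming(IList<DateRangeTiming> timings, DateTime day)
    {
        for (int i = timings.Count - 1; i >= 0; i--)
            if (timings[i].Covers(day))
                return timings[i];
        return null; // no definition covers this day
    }
}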

The problem

Now I need to do something like this (in pseudo code, sketched in C#; EachDay, DateRangeTimings and GetApplicablePeople are placeholder names):

foreach (Place place in places)
{
    // EachDay enumerates every day between the minimum and maximum
    // date found in place.DateRangeTimings (placeholder helper).
    foreach (DateTime day in EachDay(place.DateRangeTimings))
    {
        // get the set of People applicable for this Place on this Day
        var applicablePeople = GetApplicablePeople(place, day);
    }
}

This means that the number of steps to execute my task is approximately:

(number of places) × (number of days) × (number of people)

This, to my understanding, is

O(x × y × z)

(with x = places, y = days, z = people) and likely approximates to this algorithm complexity:

O(n³)

I'm not an expert in theory, so feel free to correct my assumptions. What is true is that this kind of complexity is definitely not acceptable, especially given that I will be operating over long date ranges with many places and people.

From the formula approximation we can see that the people set would be iterated many times. Hence I would like to optimize at least this part. To ease things a bit, I changed

Person.IList<DateRangePlaces>.IList<Place>

to

Person.IList<DateRangePlaces>.IDictionary<int, Place>

which gives a faster answer to whether a person can go to some place on a particular date, because I only need to check whether Place.Id is present in the dictionary, instead of an IList.Where() LINQ query that would have to scan the whole list every single time.
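
A minimal sketch of that difference, using simplified, hypothetical shapes for the entities involved:

using System;
using System.Collections.Generic;
using System.Linq;

class Place { public int Id; }

class DateRangePlaces
{
    // List variant: every membership test scans the list, O(n) per check.
    public IList<Place> Places = new List<Place>();
    public bool CanGoList(int placeId) => Places.Any(p => p.Id == placeId);

    // Dictionary variant keyed by Place.Id: a single O(1) hash lookup.
    public IDictionary<int, Place> PlacesById = new Dictionary<int, Place>();
    public bool CanGoDict(int placeId) => PlacesById.ContainsKey(placeId);
}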

Question

  1. Can you suggest any additional optimizations I could implement in my algorithm to make it faster, or even make it less complex in terms of big-O notation?

  2. Which memory structure types would you use where and why (lists, dictionaries, stacks, queues...) to improve performance?

Addendum: The whole problem is even more complex

There are additional complexities that I didn't mention, since I wanted to simplify my question and make it clearer. There's also:

Place.IList<Permission>
Person.IList<DateRangePermission>

So places require particular permissions, and people have limited-time permission grants that expire.
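
A hedged sketch of such a check, with hypothetical shapes for Permission and DateRangePermission:

using System;
using System.Collections.Generic;
using System.Linq;

class Permission { public int Id; }

// Hypothetical: a permission grant valid only inside a date range.
class DateRangePermission
{
    public int PermissionId;
    public DateTime ValidFrom, ValidTo;
    public bool IsValidOn(DateTime day) => day >= ValidFrom && day <= ValidTo;
}

static class PermissionCheck
{
    // A person qualifies for a place on a given day only if every permission
    // the place requires has a grant that has not expired on that day.
    public static bool HasAccess(IList<Permission> required,
                                 IList<DateRangePermission> grants, DateTime day)
    {
        return required.All(req =>
            grants.Any(g => g.PermissionId == req.Id && g.IsValidOn(day)));
    }
}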

Additional to that, there's also

Person.IList<DateRangeTimingRestriction>

which specifies the only times a person can go anywhere during a particular date range. And

Person.IList<DateRangePlacePriorities>

which defines place prioritization for a particular date range.

And during this process of getting applicable people, I also have to calculate a certain factor for every person and every place, related to the:

  • number of places that a person can visit on a particular day
  • person's place priority factor on that particular day

All these are the reasons why I decided to manipulate this data in memory rather than use a very complex stored procedure that would also be doing multiple table scans to get the factors per person, place, and day.

I think such a stored procedure would be way too complex to handle and maintain. So I'd rather get all the data first (put it in appropriate memory structures to aid performance) and then work with it in memory.


I suggest using a relational database and writing a stored procedure to retrieve the "set of People applicable for Place and on Day".

The stored procedure approach would be neither complex nor difficult to maintain if the model is architected properly. Additionally, relational databases have primary keys and indexing to avoid table scans.

The only way to speed things up using collections would be:

  1. change the collection type. You could use a KeyedCollection, IDictionary<>, or even a disconnected recordset. Disconnected recordsets also give you the ability to set foreign keys to child recordsets; however, I think this would be a fairly complex pattern to use.

  2. maintain a collection within a collection - basically the same concept as a parent / child relationship with a foreign key. The object references will only be pointers to the original object's memory space or, if you're using a keyed collection you could simply store the index of the other collection.

  3. maintain boolean properties that allow you to skip iterations. For example, as you build your entities, set a boolean such as "HasPlaceXPermission". If the value is false, you know not to retrieve information related to place X.

  4. maintain flags - flags can be a very good optimization technique when used properly. Similar to #3, flags can be used to determine permissions very quickly, for example: if ((person.PlacePermissions & (Place.Colorado | Place.Florida)) != 0), do the date/time scan on Colorado and Florida; otherwise skip them (see the sketch after this list).
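
For illustration, a minimal sketch of the flags technique, assuming place permissions fit in a [Flags] enum (the enum members and property names here are made up):

using System;

// Hypothetical [Flags] enum: one bit per place.
[Flags]
enum PlaceFlags
{
    None     = 0,
    Colorado = 1 << 0,
    Florida  = 1 << 1,
    Texas    = 1 << 2,
}

class Person
{
    // A single integer summarizes all of this person's place permissions.
    public PlaceFlags PlacePermissions;
}

static class FlagsDemo
{
    static void Main()
    {
        var person = new Person { PlacePermissions = PlaceFlags.Colorado | PlaceFlags.Texas };

        // One bitwise AND replaces scanning a permission collection.
        if ((person.PlacePermissions & (PlaceFlags.Colorado | PlaceFlags.Florida)) != PlaceFlags.None)
            Console.WriteLine("Do the date/time scan for Colorado and/or Florida.");
    }
}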

It's difficult to know which collection types I would use based upon the information you have provided; I would need a larger scope of the application to determine that architecturally. For example, where is the data stored, how is it retrieved, how is it prepared, and how is it presented? Knowing how the application is architected would help to determine its optimization points.


You can't avoid O(n²), since the minimum work required is to visit every Place and every Date element to find a match for a given Person.

I think the best way is to use a database such as SQL Server and run your query as a stored procedure.


The date range is presumably fairly limited, perhaps never more than a few years; call it constant. When you say that, for each of those combinations, you need to "get a set of people applicable", then it's pretty clear: if you really do need all that data, you can't improve the complexity of your solution, because you need to return a result for each combination.

Don't worry about complexity unless you're having trouble scaling with large numbers of people. Ordinary profiling is the place to start if you're having performance problems. O(#locations * #people) is not so bad.
