(while doing weighted sorting) How to allocate different coefficients to fields per request?

Developer https://www.devze.com 2023-04-11 05:40 Source: web
(I'm attaching both Solr and SQL as tags because I don't know what to use in such a situation. Maybe even something else.)

Example:

A web application must sort tasks based on Time and Price. The user has a slider that determines which is more important (Time or Price).

It has to do a weighted sort where each result's score depends on both Price and Time, but it must be possible to change the coefficients as the user slides toward Time or toward Price.

Example 2:

A user is trying to find the right socks and is torn between how green and how long they should be. Again, there's a slider between these two properties. At the slider's 50% mark, the user cares about how green the socks are as much as how long they are. If the slider is closer to the green end, the user is more interested in how green the socks are but still wants them to be long.

I don't know what software to use or how to achieve this.


MS SQL SERVER answer...

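DECLARE
  @slider      FLOAT = 0.5,  -- example value; anything between 0 and 1 (1 = only time matters, 0 = only price matters)
  @min_time    DATETIME,
  @max_time    DATETIME,
  @min_price   MONEY,
  @max_price   MONEY
SELECT
  @min_time    = MIN(timestamp),
  @max_time    = MAX(timestamp),
  @min_price   = MIN(price),
  @max_price   = MAX(price)
FROM
  yourTable;

SELECT
  *
FROM
  yourTable
ORDER BY
  -- time's positional weight (0 to 1), scaled by the slider;
  -- NULLIF/ISNULL avoid a divide-by-zero when every row has the same value
  ISNULL(CAST(DATEDIFF(second, @min_time, timestamp) AS FLOAT) / NULLIF(CAST(DATEDIFF(second, @min_time, @max_time) AS FLOAT), 0), 0) * @slider
  +
  -- price's positional weight (0 to 1), scaled by the rest of the slider
  ISNULL(CAST(price - @min_price AS FLOAT) / NULLIF(CAST(@max_price - @min_price AS FLOAT), 0), 0) * (1 - @slider);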

To make this work, I do the same calculation for both Time and Price: I convert each to a value from 0 to 1, which I'll call its positional weight.
- 0.0 = equal to the minimum value for that field
- 0.5 = exactly halfway between the min and max of that field
- 1.0 = equal to the maximum value for that field

I then multiply the time weight by the slider's value and the price weight by (1 - value), and add the two results together.

When the slider is at 0 or 1, it's simple: one positional weight is multiplied by one and the other by zero. In other words, one positional weight is used unchanged and the other is ignored.

When the slider is at 0.5, half of each positional weight is added together.
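The min-max calculation and slider blend described above can be sketched outside the database. This is a minimal Python sketch, assuming each task is a dict with hypothetical `id`, `time` (e.g. seconds) and `price` keys, and a slider between 0 and 1; the helper names are hypothetical. Like the ORDER BY expression, it sorts ascending and treats a zero-width range as weight 0.

```python
from typing import Dict, List


def positional_weight(value: float, lo: float, hi: float) -> float:
    """Map value onto 0..1 relative to the min (lo) and max (hi) of its field."""
    if hi == lo:
        return 0.0  # every row has the same value, so the field cannot discriminate
    return (value - lo) / (hi - lo)


def weighted_sort(tasks: List[Dict[str, float]], slider: float) -> List[Dict[str, float]]:
    """Sort ascending by slider * time_weight + (1 - slider) * price_weight."""
    times = [t["time"] for t in tasks]
    prices = [t["price"] for t in tasks]
    t_lo, t_hi = min(times), max(times)
    p_lo, p_hi = min(prices), max(prices)

    def score(task: Dict[str, float]) -> float:
        return (positional_weight(task["time"], t_lo, t_hi) * slider
                + positional_weight(task["price"], p_lo, p_hi) * (1 - slider))

    return sorted(tasks, key=score)


tasks = [
    {"id": 1, "time": 10, "price": 90},
    {"id": 2, "time": 50, "price": 10},
    {"id": 3, "time": 30, "price": 50},
]
print([t["id"] for t in weighted_sort(tasks, slider=1.0)])  # time only  → [1, 3, 2]
print([t["id"] for t in weighted_sort(tasks, slider=0.0)])  # price only → [2, 3, 1]
```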


In the case where 99.999% of values are close together and there is one extreme outlier, this can make that field unusually dominant, or the opposite, because most positional weights end up very close to either 0 or 1.
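A small illustration with made-up prices (four close together, one extreme outlier): the min-max weights of the ordinary rows are squeezed toward 0, so price barely separates them. `min_max_weights` is a hypothetical helper implementing the same 0-to-1 formula as the first query.

```python
def min_max_weights(values):
    """Min-max positional weights (0..1) for a list of values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical prices: four close together, one extreme outlier.
prices = [10, 11, 12, 13, 1000]
weights = min_max_weights(prices)
print([round(w, 4) for w in weights])  # → [0.0, 0.001, 0.002, 0.003, 1.0]
```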

As such, one option is to base the positional weight on the data's order only. So, where many values are close together but there is one extreme outlier, the value in the middle of the list is still given roughly 0.5 as its positional weight. In short, its position in the sequence matters, not its actual value.

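DECLARE
  @slider      FLOAT = 0.5,  -- example value between 0 and 1, as in the first query
  @count       FLOAT
SELECT
  @count       = CAST(COUNT(*) AS FLOAT)
FROM
  yourTable;  -- the statement before a WITH (CTE) must end with a semicolon

WITH
  ordered_data
AS
(
SELECT
  ROW_NUMBER() OVER (ORDER BY timestamp) AS time_id,   -- rank by time, 1..count
  ROW_NUMBER() OVER (ORDER BY price)     AS price_id,  -- rank by price, 1..count
  *
FROM
  yourTable
)
SELECT
  *
FROM
  ordered_data
ORDER BY
  (CAST(time_id AS FLOAT) / @count) * @slider
  +
  (CAST(price_id AS FLOAT) / @count) * (1 - @slider);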
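The rank-based variant can also be sketched in Python. This is a minimal sketch, assuming each task is a dict with hypothetical `id`, `time` and `price` keys; the helper names are hypothetical. Mirroring the ROW_NUMBER() query, each weight is rank / count (so it runs from 1/count to 1), and ties are broken by input order because Python's sort is stable.

```python
from typing import Dict, List


def rank_weights(values: List[float]) -> List[float]:
    """Weight each value by its position in sorted order (rank / count), ignoring magnitude."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices in ascending value order
    weights = [0.0] * n
    for rank, i in enumerate(order, start=1):  # ranks start at 1, like ROW_NUMBER()
        weights[i] = rank / n
    return weights


def rank_weighted_sort(tasks: List[Dict[str, float]], slider: float) -> List[Dict[str, float]]:
    """Sort ascending by slider * time_rank_weight + (1 - slider) * price_rank_weight."""
    tw = rank_weights([t["time"] for t in tasks])
    pw = rank_weights([t["price"] for t in tasks])
    scores = [tw[i] * slider + pw[i] * (1 - slider) for i in range(len(tasks))]
    order = sorted(range(len(tasks)), key=lambda i: scores[i])
    return [tasks[i] for i in order]


# Evenly spaced weights, regardless of the outlier.
print(rank_weights([10, 11, 12, 13, 1000]))  # → [0.2, 0.4, 0.6, 0.8, 1.0]
```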


Which approach is best, and why, starts to get statistical and depends on exactly what you are trying to achieve. Maybe you could take the average of the two different positional weights and use that? Hopefully this gives you something to work with, though.
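Averaging the two positional weights for a field could look like this. It is a minimal sketch, assuming an equal 50/50 mix of the min-max weight and the rank / count weight; the helper name and the mix are illustrative assumptions, not a recommendation.

```python
def blended_weights(values):
    """Average of the min-max weight and the rank-based weight for each value (assumed 50/50 mix)."""
    n = len(values)
    lo, hi = min(values), max(values)
    # Min-max weight: depends on the actual values.
    minmax = [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]
    # Rank weight: depends only on order (rank / count, as with ROW_NUMBER()).
    order = sorted(range(n), key=lambda i: values[i])
    rank = [0.0] * n
    for r, i in enumerate(order, start=1):
        rank[i] = r / n
    return [(a + b) / 2 for a, b in zip(minmax, rank)]


# The outlier still stands out, but the ordinary rows are no longer squeezed to ~0.
print([round(w, 3) for w in blended_weights([10, 11, 12, 13, 1000])])
```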


Both answers force the positional weight to be a percentage. This is because TIME and PRICE can have vastly different scales; making them percentages (0 to 1) puts them on the same scale. You may want to consider alternative mechanisms for choosing suitable scales, and these may differ for each field.

Each answer works out the positional weight relative to a fixed point: the lowest item in the list. You may wish to choose another reference point, such as the MEAN, MODE or MEDIAN. In doing so, you will get a range of positional weights from -x to +y, with x and y potentially being very different values. You may then choose to rescale these to -1 to +1. This will require scaling them along a curve, and you'll need to decide how to determine that curve.
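A minimal sketch of a MEDIAN reference point, assuming the simplest possible curve: each side of the median is stretched linearly so the minimum maps to -1 and the maximum to +1. That piecewise-linear choice and the helper name are assumptions; with an outlier, the two sides get very different stretch factors, which is the "x and y very different" situation.

```python
from statistics import median


def median_centered_weights(values):
    """Weights in -1..+1 relative to the median.

    Values below the median are divided by (median - min), values above by
    (max - median), so each side is stretched independently to reach -1 or +1.
    This piecewise-linear stretch is only one possible choice of curve.
    """
    m = median(values)
    lo, hi = min(values), max(values)
    out = []
    for v in values:
        if v < m:
            out.append((v - m) / (m - lo))  # here lo <= v < m, so m - lo > 0
        elif v > m:
            out.append((v - m) / (hi - m))  # here m < v <= hi, so hi - m > 0
        else:
            out.append(0.0)  # the median itself
    return out


print(median_centered_weights([10, 11, 12, 13, 1000]))
```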

Each answer works out a 'distance' from a fixed reference point as being 0 to 1 (or, in the previous paragraph, -1 to +1). This assumes that both TIME and PRICE are always equally important. But what if you've selected only the expensive items, so that their positional weights should all be close to 1? You'd need a mechanism to scale against "all possible values" rather than "all present values".
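Scaling against "all possible values" could mean fixing the bounds instead of taking them from the current rows. A minimal sketch, assuming hypothetical catalogue-wide bounds `PRICE_FLOOR` and `PRICE_CEILING` (in practice they might come from business rules or the whole table); weights outside the bounds are clamped to 0..1.

```python
# Hypothetical catalogue-wide bounds, NOT taken from the rows in the current result set.
PRICE_FLOOR = 0.0
PRICE_CEILING = 1000.0


def fixed_range_weight(value, floor, ceiling):
    """Positional weight against fixed bounds, clamped to 0..1."""
    w = (value - floor) / (ceiling - floor)
    return min(1.0, max(0.0, w))


# Only expensive items were selected: their weights stay near 1
# instead of being stretched across the whole 0..1 range.
expensive_only = [800.0, 900.0, 950.0]
print([fixed_range_weight(p, PRICE_FLOOR, PRICE_CEILING) for p in expensive_only])  # → [0.8, 0.9, 0.95]
```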

You have a lot of choice here, and which choice is right or wrong depends on the functional requirements you set out. I do not believe there is a universal truth to find. Perhaps you need to create some examples, work out what you WANT to happen, and then work out HOW?


In Solr, you could make a query like:

time:[t]^[wt] price:[p]^[wp]

where [t] = the requested time, [p] = the requested price, and [wt] and [wp] are weights; a higher weight gives more importance to the term it's applied to.
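One way to tie the slider to those boosts is to derive [wt] and [wp] from it when building the query string. A minimal Python sketch, assuming a slider between 0 and 1 and the field names `time` and `price` from the query above; the helper name is hypothetical, and the small `floor` on each boost is an assumption added so neither boost is ever exactly zero. Values are inserted as given, so real date or price values may need Solr-specific formatting or escaping.

```python
def build_boosted_query(time_value: str, price_value: str, slider: float, floor: float = 0.01) -> str:
    """Build 'time:<t>^<wt> price:<p>^<wp>' with boosts taken from the slider position."""
    if not 0.0 <= slider <= 1.0:
        raise ValueError("slider must be between 0 and 1")
    wt = max(slider, floor)        # boost for the time term
    wp = max(1.0 - slider, floor)  # boost for the price term
    return f"time:{time_value}^{wt:g} price:{price_value}^{wp:g}"


print(build_boosted_query("5", "20", 0.75))  # → time:5^0.75 price:20^0.25
```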

A query like that will weight exact matches, but you can also compute continuous functions of the difference between the requested value and the document's actual value using a FunctionQuery;

see http://wiki.apache.org/solr/FunctionQuery for more info on that
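To see what such a continuous function does, here is a minimal Python sketch that runs outside Solr. For each field it computes a closeness of 1 / (1 + |value - requested|), which has the same shape as Solr's recip(x, m, a, b) = a / (m*x + b) with m = a = b = 1, and mixes the two closeness values with the slider. The field names, data and helper names are hypothetical. Because time and price distances are in different units, you would probably still want to normalize them, as discussed in the earlier answer.

```python
from typing import Dict, List


def closeness(value: float, requested: float) -> float:
    """1 / (1 + |value - requested|): 1.0 for an exact match, decaying smoothly with distance."""
    return 1.0 / (1.0 + abs(value - requested))


def rank_by_closeness(docs: List[Dict[str, float]], t: float, p: float, slider: float) -> List[Dict[str, float]]:
    """Highest combined closeness first; slider = 1 means only time matters."""
    def score(d: Dict[str, float]) -> float:
        return slider * closeness(d["time"], t) + (1 - slider) * closeness(d["price"], p)
    return sorted(docs, key=score, reverse=True)


docs = [
    {"id": 1, "time": 5, "price": 40},
    {"id": 2, "time": 9, "price": 20},
    {"id": 3, "time": 6, "price": 21},
]
print([d["id"] for d in rank_by_closeness(docs, t=5, p=20, slider=0.5)])  # → [2, 1, 3]
```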
