开发者

Java algorithm to track parts of aggregated values

开发者 https://www.devze.com 2023-04-10 21:55 出处:网络
My program have evaluate hundreds of millions of records. So the que开发者_运维百科stion of memory and performance are important.

My program have evaluate hundreds of millions of records. So the que开发者_运维百科stion of memory and performance are important. Lets each record has key - ticketID. Also record has field value and field source_name. In source ticketID have from 1 to many (neary 100) source_name. I need aggregate only by ticketID - receive nearly 1 million of record, but also must have possibility subtract values for specified source_name - so I have track contributes.

Do exist some algorithms or data structures that allow resolve this problem?


I can't quite parse the question fully so I'll assume:

  • "nearly 1 million of record" means that there is nearly 1 million unique ticketID fields.
  • "nearly 100" different source_names in the system.
  • not all ticketIds have source_names. We don't have 100 million ticketID x source_name combinations.
  • You want to be able to total all of the ticketIds but also total by source_name.

With these assumptions I would use a Map of maps. The outer Map has a key of source_name and the value of the inner Map. The inner Map has a key of the ticketId and a cumulative value.

So the pseudo code would look something like:

Map<String, Map<Integer,Double>> valueMap =
    new HashMap<String, Map<Integer,Double>>();

while (...reading in and processing data...) {
    int ticketId = ...;
    String sourceName = ...;
    double entryValue = ...;

    Map<Integer,Double> sourceNameMap = valueMap.get(sourceName);
    Double value = sourceNameMap.get(ticketId);
    if (oldValue == null) {
        value = entryValue;
    } else {
        value += entryValue;
    }
    sourceNameMap.put(ticketId, value);
}

You can easily get the total by adding up each of the source_name maps. You can also keep a running total for each source_name of course if that helps. If your system can allocate a gigabyte to the JVM then it should be able to handle a good number of ticketID x source_name pairs.

You might consider creating a mutable internal value class to save on GC cycles:

private static class MutableValue {
    double value;
    public MutableValue(double value) {
        this.value = value;
    }
    public void add(double value) {
        this.value += value;
    }
}

So then you can say:

MutableValue value = sourceNameMap.get(ticketId);
if (oldValue == null) {
    sourceNameMap.put(new MutableValue(entryValue));
} else {
    value.add(entryValue);
}

If you edit your question, I'll edit my answer in case I've made some improper assumptions.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号