开发者

Collection of an own datatype or HashMap

开发者 https://www.devze.com 2023-04-13 08:47 出处:网络
I have to store information to objects of type \"Person\" in a data structure. The information could for example be a simple integer value. The integer values will change frequently, and also the pers

I have to store information to objects of type "Person" in a data structure. The information could for example be a simple integer value. The integer values will change frequently, and also the persons for whom information is stored can change. Two things are important:

  1. It should be possible to quickly look up if there is information stored for a specific person.
  2. I will have to have a huge number of such data structures, so memory is important.

There are two different ways I can think of. First of course I could create an own data type which has both a refer开发者_如何学Goence to a person as well as an integer as fields. Problem: I guess every time I want to know if there is information for a specific person I'll have to look through all the objects and call the getter-method for the person. Second I could use a HashMap with Person as key and Integer as Value. From an object oriented point of view this might not be as elegant as the first possibility. Furthermore and even worse, HashMaps seem to consume more memory than simpler collections (apart from this I like them really well, it seems one often has to link two different objects). If each takes for example a KB than it would be already problematic for me (I'll might need the described data structure about a million times).

Which variant would you suggest? Or can you think of a third, better possibility?

Thanks and kind regards

Patrick


A 1KB person object seems a bit steep to me. You would need some 256 int fields to get to that size.

As for the HashMap, I think it's a perfectly good solution, although I would use Person as a value and some integer or string identifier as key.

From a quick look at the source (disregarding the size of the map object itself) each map Entry object has an int and 3 references, so that would be 16 bytes on a 32bit VM; if you have 20 int fields (80 bytes) in the Person object and an int as key, the total memory for you Entry + Person + int key would go around 100 bytes. Under these conditions you would need about 100Mb for a million persons with 20 int fields (is it too much?)

As for the information in itself, there are a few optimizations you can do:

  • Perhaps a byte will suffice for a person's age (sadly we don't get past 127 years much :P). Think about the values of the data you need and if a byte or a short are enough.
  • If you are going to need a person's name, rather than keeping it as a single String, consider a String[] with the various names, this way you take advantage of the String constant pool and any single names that repeat themselves will only have one instance in the jvm.
  • Although this is not always the case (it depends on the jvm implementation), most of the times a boolean is 32 bits, so if you are really pressed for memory and you have a lot of boolean fields, use a single byte or short field and bit mask it. You can get 8 "booleans" of a byte and 16 of a short.

However, note that perhaps these optimizations won't even be necessary and they will definitly impact the readability of your code. In the end the best approach would probably be to run a few tests and optimize as necessary.


I'm not sure why you think HashMap is less elegant then what you'd write (but then I don't know how amazing your programming skills are). Maybe you mean that a HashMap has more methods than you'll need. HashMaps are designed for speedy lookups and for memory efficiency (as opposed to TreeMaps which prioritize sortability). With regards to memory efficiency, have you checked how memory grows? It could be that HashMaps look like memory pigs at low numbers of items compared to other non-hashed maps, but then that memory usage grows very slowly compared to others (which start small but grow big and fast). A KB per entry seems a bit large (which is why I think you'll find the true size is much smaller when you measure with enough samples), but then again, maybe not, and a million entries would mean 1 MB -- really, does that worry you?

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号