Loading Facebook's big text file (39MB) into memory for autocompletion

Developer (开发者) https://www.devze.com 2023-04-02 20:44 Source: web
I'm trying to implement part of the Facebook Ads API: the autocomplete function ads.getAutoCompleteData.

Basically, Facebook supplies a 39MB file, updated weekly, that contains ad-targeting data including colleges, college majors, workplaces, locales, countries, regions and cities.

Our application needs to access all of those objects and supply auto completion using this file's data.

I'm looking for the best way to solve this. I was thinking about one of the following options:

  1. Loading it into memory as a trie (e.g. a Patricia trie). The disadvantage, of course, is that it may take too much memory on the server.
  2. Using a dedicated search platform such as Solr on a separate machine. The disadvantage is that this may be over-engineering (though the file will probably grow considerably in the future).
  3. (Fill in a cool, easy, speed-of-light option here)?
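For reference, option 1 could look roughly like the sketch below. It is a plain character trie, not a compressed Patricia trie, and the class names, the `complete` method, and the sample entries are all made up for illustration; the real format of Facebook's file is not assumed here.

```python
class TrieNode:
    __slots__ = ("children", "terminal")

    def __init__(self):
        self.children = {}    # next character -> TrieNode
        self.terminal = None  # original string if an entry ends at this node


class Trie:
    """Plain (uncompressed) trie with case-insensitive prefix lookup."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, name):
        node = self.root
        for ch in name.lower():
            nxt = node.children.get(ch)
            if nxt is None:
                nxt = node.children[ch] = TrieNode()
            node = nxt
        # Entries that differ only by case overwrite each other in this sketch.
        node.terminal = name

    def complete(self, prefix, limit=10):
        # Walk down to the node that represents the prefix.
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        # Depth-first walk; children are pushed in reverse sorted order
        # so that results come out in lexicographic order.
        results = []
        stack = [node]
        while stack and len(results) < limit:
            current = stack.pop()
            if current.terminal is not None:
                results.append(current.terminal)
            for ch in sorted(current.children, reverse=True):
                stack.append(current.children[ch])
        return results


# Hypothetical sample data standing in for entries parsed from the file.
trie = Trie()
for entry in ["Boston", "Boston University", "Berlin", "Brazil"]:
    trie.insert(entry)
```

With this data, `trie.complete("bo")` would return the two Boston entries. One Python object per character is memory-hungry, which is exactly the concern in option 1; a compressed trie or a sorted array with binary search would likely use much less.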

Well, what do you think?


I would stick with a service-oriented architecture (especially if the product is supposed to handle high volumes) and go with Solr. That said, 39 MB is not a lot to hold in memory if it's going to be a singleton. With indexes and all, this will get up to what? 400MB? This of course depends on what your product does and what kind of hardware you wish to run it on.

I would go with Solr, or write your own service that reads the file into a fast DB such as a MySQL MyISAM table (or even an in-memory table) and uses MySQL's text search feature to serve up results. Barring that, I would try to use Solr as a service.
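To give a feel for the database route, here is a minimal sketch. It swaps in Python's built-in SQLite (in-memory) for MySQL so that it runs without a server, and it uses an indexed prefix range query rather than MySQL's full-text search. The table name `targets`, the column names, and the sample rows are assumptions for illustration only.

```python
import sqlite3


def build_index(rows):
    """Load (kind, name) pairs into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE targets (kind TEXT, name TEXT, name_lc TEXT)")
    conn.executemany(
        "INSERT INTO targets VALUES (?, ?, ?)",
        [(kind, name, name.lower()) for kind, name in rows],
    )
    # Index on the lowercased name so prefix lookups avoid a full table scan.
    conn.execute("CREATE INDEX idx_targets_name_lc ON targets (name_lc)")
    conn.commit()
    return conn


def suggest(conn, prefix, limit=10):
    """Return up to `limit` (kind, name) rows whose name starts with prefix."""
    low = prefix.lower()
    # Upper bound: the prefix followed by the highest Unicode code point.
    high = low + "\U0010ffff"
    cur = conn.execute(
        "SELECT kind, name FROM targets "
        "WHERE name_lc >= ? AND name_lc < ? "
        "ORDER BY name_lc LIMIT ?",
        (low, high, limit),
    )
    return cur.fetchall()


# Hypothetical rows standing in for data parsed from the weekly file.
conn = build_index([
    ("city", "Boston"),
    ("college", "Boston University"),
    ("city", "Berlin"),
    ("country", "Brazil"),
])
```

Calling `suggest(conn, "bos")` would return the Boston city and college rows. The same range-query idea should carry over to an indexed MySQL column; reloading the table each week when the file changes is left out of the sketch.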

The benefit of writing my own service is that I know what is going on; the downside is that it will be nowhere near as powerful as Solr. However, I suspect writing my own service will take less time to implement.

Consider writing your own service that serves requests asynchronously (via Ajax if your product is a website). The trouble with Solr or Lucene is that if you get stuck, there is not a lot of help out there.

Just my 2 cents.
