
How to fetch a continuous list with PostgreSQL on the web

I am making an API over HTTP that fetches many rows from PostgreSQL with pagination. In ordinary cases, I usually implement such pagination through a naive OFFSET/LIMIT clause. However, there are some special requirements in this case:

  • There are so many rows that I believe users will never reach the end (imagine a Twitter timeline).
  • Pages do not have to be randomly accessible, only sequentially.
  • The API would return a URL containing a cursor token that points to the next chunk of the continuous list.
  • Cursor tokens do not have to exist permanently, only for some period of time.
  • The ordering fluctuates frequently (like Reddit rankings); however, an already-issued cursor should keep a consistent ordering.

How can I achieve this? I am ready to change my whole database schema for it!


Assuming it's only the ordering of the results that fluctuates and not the data in the rows, Fredrik's answer makes sense. However, I'd suggest the following additions:

  • store the id list in a PostgreSQL table using the array type rather than in memory. Doing it in memory, unless you carefully use something like Redis with auto-expiry and memory limits, is setting yourself up for a memory-consumption DoS attack. I imagine it would look something like this:

    create table foo_paging_cursor (
      cursor_token uuid primary key, -- a uuid is probably best, or a timestamp (see below)
      result_ids   integer[],        -- or text[] if you have non-integer ids
      expiry_time  timestamp
    );
    
  • You need to decide whether the cursor_token and result_ids can be shared between users to reduce your storage needs and the time needed to run the initial query per user. If they can be shared, choose a cache window, say 1 or 5 minutes, and then upon a new request create the cursor_token for that time period and check whether the result ids have already been calculated for that token. If not, add a new row for that token. You should probably add a lock around the check/insert code to handle concurrent requests for a new token. (A SQL sketch of these steps follows this list.)

  • Have a scheduled background job that purges old tokens/results and make sure your client code can handle any errors related to expired/invalid tokens.
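
A minimal sketch of these queries, assuming a hypothetical foo table ordered by a hypothetical ranking column, a 50-row page size, and the uuid form of cursor_token (gen_random_uuid() is built in from PostgreSQL 13; earlier versions get it from the pgcrypto extension):

    -- 1) Run the expensive ordering query once and cache the ids under a new token.
    insert into foo_paging_cursor (cursor_token, result_ids, expiry_time)
    values (gen_random_uuid(),
            array(select id from foo order by ranking desc limit 500),
            now() + interval '30 minutes')
    returning cursor_token;

    -- 2) Fetch page 2 (positions 51-100) for a given token ($1), preserving the cached order.
    select f.*
    from foo_paging_cursor c
    cross join lateral unnest(c.result_ids[51:100]) with ordinality as ids(id, ord)
    join foo f on f.id = ids.id
    where c.cursor_token = $1
      and c.expiry_time > now()
    order by ids.ord;

    -- 3) The scheduled purge job.
    delete from foo_paging_cursor where expiry_time < now();

For the shared, time-windowed variant, the token would be derived from the window instead of gen_random_uuid(), and the insert guarded with on conflict (cursor_token) do nothing or an explicit lock.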

Don't even consider using real db cursors for this.

Keeping the result ids in Redis lists is another way to handle this (see the LRANGE command), but be careful with expiry and memory usage if you go down that path. Your Redis key would be the cursor_token and the ids would be the members of the list.


I know absolutely nothing about PostgreSQL, but I'm a pretty decent SQL Server developer, so I'd like to take a shot at this anyway :)

How many rows/pages do you expect a user to browse through per session at most? For instance, if you expect a user to page through a maximum of 10 pages per session [each page containing 50 rows], you could take that maximum and set up the webservice so that when the user requests the first page, you cache 10*50 rows (or just the ids of the rows, depending on how much memory and how many simultaneous users you have).

This would certainly help speed up your webservice, in more ways than one. And it's quite easy to implement, too. So:

  • When a user requests data for page #1, run the query (complete with ORDER BY, joins, etc.), store all the ids in an array (but a maximum of 500 ids), and return the data rows corresponding to the ids at positions 0-49.
  • When the user requests pages #2-10, return the data rows corresponding to the ids at positions (page-1)*50 through (page*50)-1 (see the query sketch below).
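
A sketch of the two queries behind this scheme, assuming a hypothetical foo table ordered by a hypothetical ranking column; the id array itself lives in the webservice's memory or cache, not in the database:

    -- Page #1: run the full ordering query once, but fetch only the ids (max 500)
    -- and keep them in the webservice's per-session cache.
    select id
    from foo
    order by ranking desc
    limit 500;

    -- Any page #1-10: pass the 50 cached ids for the requested slice back in as a
    -- parameter ($1) and fetch the rows, preserving the cached order via
    -- array_position() (available since PostgreSQL 9.5).
    select *
    from foo
    where id = any ($1::integer[])
    order by array_position($1::integer[], id);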

You could also bump up the numbers; an array of 500 ints only occupies about 2 KB of memory, but it also depends on how fast you want your initial query/response to be.

I've used a similar technique on a live website, and when the user continued past page 10, I just switched back to ordinary queries. I guess another solution would be to keep expanding/filling the array (running the query again, but excluding the already-included ids; see the sketch below).
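
A sketch of that refill query, under the same assumptions (hypothetical foo table and ranking column), with the already-cached ids passed in as a parameter ($1):

    -- Fetch the next batch of ids, skipping the ones already cached for this session.
    select id
    from foo
    where id <> all ($1::integer[])
    order by ranking desc
    limit 500;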

Anyway, hope this helps!
