PHP - *fast* serialize/unserialize?

Developer (devze.com) https://www.devze.com 2022-12-25 06:26 Source: Internet
I have a PHP script that builds a binary search tree over a rather large CSV file (5MB+). This is nice and all, but it takes about 3 seconds to read/parse/index the file.

Now I thought I could use serialize() and unserialize() to quicken the process. When the CSV file has not changed in the meantime, there is no point in parsing it again.
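A minimal sketch of that caching idea, assuming the cache is considered fresh when its modification time is not older than the CSV's. The helper name load_index_cached() and the $build callback are hypothetical, not part of the original script:

```php
<?php
// Hypothetical helper: rebuild the index only when the CSV is newer than the cache.
function load_index_cached(string $csvPath, string $cachePath, callable $build)
{
    // The cache is fresh if it exists and is at least as new as the CSV.
    if (is_file($cachePath) && filemtime($cachePath) >= filemtime($csvPath)) {
        return unserialize(file_get_contents($cachePath));
    }

    $index = $build($csvPath);                        // the slow read/parse/index step
    file_put_contents($cachePath, serialize($index)); // store for the next request
    return $index;
}
```

The serialize()/unserialize() pair here is only a placeholder; any of the storage formats discussed in the answers could be swapped in.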

To my horror I find that calling serialize() on my index object takes 5 seconds and produces a huge (19MB) text file, whereas unserialize() takes an unbearable 27 seconds to read it back. Improvements look a bit different. ;-)

So - is there a faster mechanism to store/restore large object graphs to/from disk in PHP?

(To clarify: I'm looking for something that takes significantly less than the aforementioned 3 seconds to do the de-serialization job.)


var_export() should be a lot faster, because the result is plain PHP source that is loaded with include rather than run through unserialize():

// export the processed CSV to export.php
$php_array = read_parse_and_index_csv($csv); // takes 3 seconds
$export = var_export($php_array, true);
file_put_contents('export.php', '<?php $php_array = ' . $export . '; ?>');

Then include export.php when you need it:

include 'export.php';

Depending on your web server setup, you may have to chmod export.php so that the web server user can read it.
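A variation on the same idea, as a sketch: have the exported file return the array, so the including script chooses the variable name. The helper export_to_php_file() and the sample data are made up for illustration. Note that var_export() on objects emits __set_state() calls, so this works most smoothly for plain arrays:

```php
<?php
// Write $data to $file as PHP source that returns the value when included.
function export_to_php_file(string $file, array $data): void
{
    file_put_contents($file, '<?php return ' . var_export($data, true) . ';');
}

$file  = sys_get_temp_dir() . '/export_' . uniqid() . '.php';
$index = ['10.0.0.0' => 'lan', '192.168.0.0' => ['home', 24]]; // sample data only

export_to_php_file($file, $index);
$restored = include $file;   // include evaluates to the file's return value
unlink($file);
```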


Try igbinary; it did wonders for me:

http://pecl.php.net/package/igbinary
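A rough sketch of how it could be dropped in, assuming the igbinary extension may or may not be installed. The wrapper names fast_serialize() and fast_unserialize() are hypothetical; igbinary_serialize() and igbinary_unserialize() are the functions the extension provides:

```php
<?php
// Use igbinary when the extension is loaded, otherwise fall back to serialize().
function fast_serialize($value): string
{
    return function_exists('igbinary_serialize')
        ? igbinary_serialize($value)
        : serialize($value);
}

// Must be paired with fast_serialize() in the same environment,
// because the two formats are not interchangeable.
function fast_unserialize(string $data)
{
    return function_exists('igbinary_unserialize')
        ? igbinary_unserialize($data)
        : unserialize($data);
}

$index = ['keys' => [10, 20, 30], 'rows' => ['ten', 'twenty', 'thirty']];
$copy  = fast_unserialize(fast_serialize($index));
```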


First, you have to change the way your program works: divide the CSV file into smaller chunks. I assume this is an IP datastore.

Convert all IP addresses to integers (longs).

That way, when a query comes in, you know which chunk to look in. PHP has the ip2long() and long2ip() functions for this conversion. Split the integer range from 0 to 2^32 so that the 5000K file becomes about 100 smaller files of roughly 50K each. Smaller files serialize and unserialize much faster.
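Something like the following could pick the chunk for a lookup. This is only a sketch: it assumes 64-bit PHP, 100 equally sized address ranges, and a hypothetical helper name chunk_for_ip(). Real IP data is not evenly distributed, so the resulting files would not all be the same size:

```php
<?php
const CHUNK_COUNT = 100; // assumed number of chunk files

// Map an IPv4 address to the number (0..99) of the chunk file that would hold it.
function chunk_for_ip(string $ip): int
{
    $long = ip2long($ip);
    if ($long === false) {
        throw new InvalidArgumentException("Not an IPv4 address: $ip");
    }
    // Split the 2^32 address space into CHUNK_COUNT ranges; the +1 rounds the
    // range size up so the highest address still lands in the last chunk.
    $chunkSize = intdiv(2 ** 32, CHUNK_COUNT) + 1;
    return intdiv($long, $chunkSize);
}
```

Each request then only has to load and unserialize the single chunk returned by chunk_for_ip().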

Think smart, code tidy ;)


It seems that the answer to your question is no.

Even if you discover a "binary serialization format" option, most likely even that would be too slow for what you envisage.

So, what you may have to look into using (as others have mentioned) is a database, memcached, or an online web service.

I'd like to add the following ideas as well:

  • caching of requests/responses
  • your PHP script does not shut down but becomes a network server to answer queries
  • or, dare I say it, change the data structure and method of query you are currently using


I see two options here:

string serialization, in the simplest form something like

  write => implode("\x01", (array) $node);
  read  => explode() + $node->payload = $a[0]; $node->value = $a[1] etc

binary serialization with pack()

  write => pack("fnna*", $node->value, $node->le, $node->ri, $node->payload);
  read  => $node = (object) unpack("fvalue/nle/nri/a*payload", $data);

It would be interesting to benchmark both options and compare the results.
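As a starting point for such a benchmark, here is a runnable round trip of the pack() variant. The node layout (a float value, two 16-bit child indexes le and ri, and a variable-length payload) is assumed from the format string above, and the function names are made up:

```php
<?php
// Encode one node: 'f' = float, 'n' = unsigned 16-bit big-endian, 'a*' = rest as string.
function pack_node(object $node): string
{
    return pack('fnna*', $node->value, $node->le, $node->ri, $node->payload);
}

// Decode one node; the names after each format code become the property names.
function unpack_node(string $data): object
{
    return (object) unpack('fvalue/nle/nri/a*payload', $data);
}

$node = (object) ['value' => 1.5, 'le' => 2, 'ri' => 3, 'payload' => 'row 42'];
$copy = unpack_node(pack_node($node));
```

Note that 'n' caps child indexes at 65535 and 'f' stores only single precision; 'N' (32-bit) and 'd' (double) are the wider alternatives if the tree needs them.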


If you want speed, writing to or reading from the file system is less than optimal.

In most cases, a database server will be able to store and retrieve data much more efficiently than a PHP script that is reading/writing files.

Another possibility would be something like Memcached.

Object serialization is not known for its performance but for its ease of use and it's definitely not suited to handle large amounts of data.


SQLite comes with PHP, so you could use that as your database. Otherwise you could try using sessions: then you don't have to call serialize() yourself, you just store the PHP object in $_SESSION (PHP still serializes session data behind the scenes).
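A small sketch of the SQLite route, assuming the pdo_sqlite extension is enabled (it usually is, but that depends on the build). The table name csv_index and its columns are invented for the example; an in-memory database is used here only to keep the snippet self-contained:

```php
<?php
// Use a file path such as 'sqlite:/path/to/index.db' instead to persist the data.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE csv_index (id INTEGER PRIMARY KEY, payload TEXT)');

// Insert all rows inside one transaction; this is much faster than autocommit.
$insert = $db->prepare('INSERT INTO csv_index (id, payload) VALUES (?, ?)');
$db->beginTransaction();
foreach ([10 => 'ten', 20 => 'twenty', 30 => 'thirty'] as $id => $payload) {
    $insert->execute([$id, $payload]);
}
$db->commit();

// The primary key lookup replaces the search in the hand-built tree.
$select = $db->prepare('SELECT payload FROM csv_index WHERE id = ?');
$select->execute([20]);
$found = $select->fetchColumn();
```

With this approach nothing has to be deserialized up front; each request only reads the rows it actually needs.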


What about using something like JSON for a format for storing/loading the data? I have no idea how fast the JSON parser is in PHP, but it's usually a fast operation in most languages and it's a lightweight format.

http://php.net/manual/en/book.json.php
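For reference, a JSON round trip could look like the sketch below. The sample data and file name are made up. Keep in mind that JSON does not preserve PHP class information, so an index object would have to be rebuilt from arrays after loading:

```php
<?php
$index = ['keys' => [10, 20, 30], 'rows' => ['ten', 'twenty', 'thirty']]; // sample data only
$file  = sys_get_temp_dir() . '/index_' . uniqid() . '.json';

file_put_contents($file, json_encode($index));

// Passing true decodes JSON objects as associative arrays instead of stdClass.
$restored = json_decode(file_get_contents($file), true);
unlink($file);
```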
