How to check image integrity?

开发者 (https://www.devze.com), 2023-02-19 17:54, source: web
I am building a web crawler, and one of its functions is to download images.

The problem is that sometimes, for some reason, images are downloaded with errors in them, e.g. half of the image is plain gray or white, as if the download stopped at some point and the remainder was filled with gray. The files are still considered valid images, because getimagesize works on them and I can open and view them. But they are not like the originals.

Any ideas?


Compare the Content-Length response header with the actual number of bytes you received. There could be other causes, but I can't say more without seeing the code that downloads the image.
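A minimal sketch of that comparison in PHP, assuming you have the raw response header lines (for example from $http_response_header after file_get_contents) and the downloaded body as a string. The function name bodyMatchesContentLength is hypothetical. If the response was redirected, several header sets may be present, so the sketch keeps the last Content-Length it sees; if the header is missing (e.g. chunked transfer), there is nothing to compare and it returns null.

```php
<?php
// Hypothetical helper, not from the original answer.
// Returns true if the body length equals the Content-Length header,
// false if they differ, and null if no Content-Length header was sent.
function bodyMatchesContentLength(array $headerLines, string $body): ?bool
{
    $expected = null;
    foreach ($headerLines as $line) {
        // Keep the last match so that redirect responses do not win.
        if (preg_match('/^Content-Length:\s*(\d+)\s*$/i', $line, $m)) {
            $expected = (int) $m[1];
        }
    }
    if ($expected === null) {
        return null; // nothing to compare against
    }
    return strlen($body) === $expected; // strlen counts bytes
}
```

A false result suggests the download was cut short and should be retried; a null result means this check cannot tell you anything for that response.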


I think this is a transmission interruption.

I see several cases: either your connection was reset, in which case checking the socket status should let you diagnose the problem and restart the download.

Or there was an undetected error during transmission (though TCP/IP should normally handle this), and/or you are not writing all of the downloaded data correctly: you think you have read all the data from the socket, but read returned fewer bytes than requested and you did not check its return value against the expected size, so the image is incomplete.

Half-gray images (especially JPEG) are usually a sign of an incomplete file: the headers are OK, so getimagesize has no problem, but the JPEG does not end with 0xFF 0xD9. So check that you read all the data by comparing against the size you were supposed to read. You could also write a format-dependent function to check file integrity, for example by checking the markers within the JPEG, but that could be resource-intensive.
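The end-of-file check for JPEG can be sketched as follows. This is only a cheap heuristic under stated assumptions: it tests that the data starts with the start-of-image marker (0xFF 0xD8) and ends with the end-of-image marker (0xFF 0xD9). The function name looksLikeCompleteJpeg is hypothetical, and a valid file with extra bytes appended after the end marker would be reported as incomplete.

```php
<?php
// Hypothetical helper: cheap truncation check for JPEG data.
// It does not decode the image; it only inspects the first and last two bytes.
function looksLikeCompleteJpeg(string $data): bool
{
    if (strlen($data) < 4) {
        return false; // too short to hold both markers
    }
    $startsWithSoi = substr($data, 0, 2) === "\xFF\xD8"; // start of image
    $endsWithEoi   = substr($data, -2) === "\xFF\xD9";   // end of image
    return $startsWithSoi && $endsWithEoi;
}
```

Because it reads only four bytes, this check is cheap enough to run on every download before falling back to a more expensive full decode.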


Just call imagecreatefromstring() on the downloaded data and check whether it fails to return an image (it returns false on failure).
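A minimal sketch of that check, assuming the GD extension is loaded. The function name decodesAsImage is hypothetical. One caveat worth verifying in your environment: GD may be configured to ignore libjpeg warnings (the gd.jpeg_ignore_warning setting), in which case a truncated JPEG can still decode successfully, so this check is best combined with a size or end-marker check rather than used alone.

```php
<?php
// Hypothetical helper: true if GD can decode the data as an image.
// Requires the GD extension.
function decodesAsImage(string $data): bool
{
    if ($data === '') {
        return false; // nothing to decode
    }
    // imagecreatefromstring() returns false when the data is not a
    // recognized or decodable image; @ suppresses the accompanying warning.
    $image = @imagecreatefromstring($data);
    return $image !== false;
}
```

This catches files that are not images at all (for example an HTML error page saved with a .jpg name), which the byte-count comparison would not flag.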
