
Pickup directories: How not to pick up files that are still being written?

I have a Python script that checks on a pickup directory and processes any files that it finds, and then deletes them.

How can I make sure not to pick up a file that is still being written by the process that drops files in that directory?

My test case is pretty simple. I copy-paste 300MB of files into the pickup directory, and frequently the script will grab a file that's still being written. It operates on only the partial file, then deletes it. This fires off a file operation error in the OS, because the file being written to has disappeared.

  • I've tried acquiring a lock on the file (using the FileLock module) before I open/process/delete it. But that hasn't helped.

  • I've considered checking the modification time on the file to avoid anything within X seconds of now. But that seems clunky.

My test is on OSX, but I'm trying to find a solution that will work across the major platforms.

I see a similar question here (How to check if a file is still being written?), but there was no clear solution.

Thank you


As a workaround, you could listen to file modified events (watchdog is cross-platform). The modified event (on OS X at least) isn't fired for each write, it's only fired on close. So when you detect a modified event you can assume all writes are complete.

Of course, if the file is being written in chunks, and being saved after each chunk this won't work.
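A minimal sketch of that idea with watchdog follows; PICKUP_DIR and process_and_delete are placeholders for your own path and processing code:

import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

PICKUP_DIR = "/path/to/pickup"  # placeholder for your pickup directory

class PickupHandler(FileSystemEventHandler):
    def on_modified(self, event):
        # On OS X the modified event typically arrives once the writer has
        # finished, so treat it as a signal that the file is complete.
        if not event.is_directory:
            process_and_delete(event.src_path)  # your existing process/delete code

observer = Observer()
observer.schedule(PickupHandler(), PICKUP_DIR, recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)
finally:
    observer.stop()
    observer.join()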


One solution to this problem would be to change the program writing the files to write the files to a temporary file first, and then move that temporary file to the destination when it is done. On most operating systems, when the source and destination are on the same file system, move is atomic.
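If you do control the writer, here is a rough sketch of that pattern; write_atomically is a hypothetical helper, and the pickup script only has to ignore the temporary names (mkstemp prefixes them with "tmp"):

import os
import tempfile

def write_atomically(dest_path, data):
    # Create the temporary file in the destination directory so it lives on
    # the same file system, then rename it into place; os.replace() makes the
    # finished file appear atomically on POSIX and modern Windows.
    dest_dir = os.path.dirname(dest_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_path, dest_path)
    except Exception:
        os.unlink(tmp_path)
        raise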


If you have no control over the writing portion, about all you can do is watch the file yourself, and when it stops growing for a certain amount of time, call it good. I have to use that method myself, and found 40 seconds is safe for my conditions.
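A rough sketch of that polling approach; the 40-second window comes from the answer above and would need tuning for your own workload:

import os
import time

def wait_until_stable(path, quiet_seconds=40, poll_interval=1):
    # Block until the file's size has not changed for quiet_seconds.
    last_size = -1
    last_change = time.monotonic()
    while True:
        size = os.path.getsize(path)
        if size != last_size:
            last_size = size
            last_change = time.monotonic()
        elif time.monotonic() - last_change >= quiet_seconds:
            return
        time.sleep(poll_interval)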


Each OS will have a different solution, because file locking mechanisms are not portable.

  • On Windows, you can use OS locking.
  • On Linux you can have a peek at open files (similarly to how lsof does) and if the file is open, leave it alone (see the sketch below).
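
For the Linux side, one way to approximate what lsof does from Python is to ask each process for its open files, for example with the psutil package (a sketch only; scanning every process is slow, so it is only reasonable for a handful of candidate files):

import psutil

def is_open_by_any_process(path):
    # Walk all processes and check their open file handles, similar in spirit
    # to running `lsof <path>`. Processes we are not allowed to inspect are skipped.
    for proc in psutil.process_iter():
        try:
            if any(f.path == path for f in proc.open_files()):
                return True
        except (psutil.AccessDenied, psutil.NoSuchProcess):
            continue
    return False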


Have you tried opening the file before copying it? If the file is still in use, then open() should throw an exception.

try:
    with open(filename, "rb") as fp:
        pass
    # Copy the file
except IOError:
    # Don't copy
    pass
