How to open and read 1000s of files very quickly

My problem is that application takes too long to load thousands of files. Yes, I know it's going to take a long time, but I would like to make it faster by any amount of time. What I mean by "load" is open the file to get its descriptor and then read the first 100 bytes or so of it.
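For concreteness, here's a minimal sketch of the per-file work being described, using POSIX open/read (on Windows the CRT wrappers _open/_read, or CreateFile/ReadFile, are the equivalents); the 100-byte figure is taken from the question, and the function name is just illustrative:

```cpp
#include <fcntl.h>
#include <unistd.h>

// Baseline per-file work: open the file, read the first ~100 bytes,
// and hand the descriptor back for later use.
// Returns the descriptor, or -1 on failure.
int load_header(const char* path, char* buf, size_t n) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (read(fd, buf, n) < 0) {   // a short read is fine for a header peek
        close(fd);
        return -1;
    }
    return fd;                    // caller parses buf and close()s fd later
}
```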

So, my main strategy has been to create a second thread that will open and close (without reading any contents) all the files. This seems to help because the thread runs ahead of the main thread and I'm guessing the OS is caching these file descriptors ahead of time so that when my main thread opens them it's a quick open. This has actually helped because the thread can start caching these file descriptors while my main thread is parsing the data read in from these files.
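To make that strategy concrete, here's a rough sketch of what such a lookahead thread can look like (the name warm_cache is mine, not from the question). One tweak over a pure open/close loop: reading a small amount actually pulls the first block of file data into the page cache, not just the directory entries and metadata:

```cpp
#include <thread>
#include <vector>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// A sketch of the lookahead idea: a helper thread touches each file so the
// OS has the directory entries and first blocks cached before the main
// thread gets there. "files" is assumed to be the same list, in the same
// order, that the main thread will process.
void warm_cache(const std::vector<std::string>& files) {
    char buf[128];
    for (const std::string& path : files) {
        int fd = open(path.c_str(), O_RDONLY);
        if (fd >= 0) {
            (void)read(fd, buf, sizeof buf);  // pull the first block into the page cache
            close(fd);
        }
    }
}

// Usage: std::thread prefetch(warm_cache, std::cref(files)); ... prefetch.join();
```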

So my real question is...what else can I do to make this faster? What approaches are there? Has anyone had success doing this?

I've heard of OS prefetching calls, but they were for virtual memory pages. Is there a way to tell the OS, hey, I'm going to be needing all these files pretty soon - I suggest that you start gathering them for me ahead of time? My lookahead thread is pretty crude.
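On the Unix side there is a call that does almost exactly this: posix_fadvise() with POSIX_FADV_WILLNEED asks the kernel to start readahead on a descriptor in the background. A minimal sketch (Unix-only; on Windows the closest built-in hint I know of is passing FILE_FLAG_SEQUENTIAL_SCAN to CreateFile when opening):

```cpp
#include <fcntl.h>

// Unix sketch: tell the kernel we'll want the start of this file soon,
// so it can begin readahead before we actually call read().
// Issue hints for the whole batch first, then go back and read each file.
void hint_will_need(int fd, off_t nbytes) {
    posix_fadvise(fd, 0, nbytes, POSIX_FADV_WILLNEED);
}
```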

Are there low-level disk techniques I could use? Is there possibly a pattern of file access that would help? Right now, the files that are loaded all come from the same folder. I suppose there is no way to determine where exactly on disk they lie and which ordering of file opens would be fastest for the disk. I'm also guessing that the disk has some hardware to make this as efficient as possible too.
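One Unix-side heuristic worth trying: sort the file list by inode number before opening. On many filesystems inode order roughly tracks on-disk placement, so opening in that order can turn scattered seeks into something closer to a sequential sweep. A sketch (purely a heuristic, no guarantees; on Windows the analogous layout information would have to come from something heavier like FSCTL_GET_RETRIEVAL_POINTERS):

```cpp
#include <sys/stat.h>
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Sort the file list by inode number before opening. On many filesystems
// inode order roughly tracks on-disk layout, so opening in this order can
// reduce seeking. Files whose stat() fails sort to the front with key 0.
std::vector<std::string> sort_by_inode(std::vector<std::string> files) {
    std::vector<std::pair<ino_t, std::string>> keyed;
    keyed.reserve(files.size());
    for (std::string& path : files) {
        struct stat st;
        ino_t key = (stat(path.c_str(), &st) == 0) ? st.st_ino : 0;
        keyed.emplace_back(key, std::move(path));
    }
    std::sort(keyed.begin(), keyed.end());
    files.clear();
    for (auto& kv : keyed)
        files.push_back(std::move(kv.second));
    return files;
}
```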

My application is mainly for windows, but unix suggestions would help as well.

I am programming in C++ if that makes a difference.

Thanks, -julian


My first thought is that this is going to be hard to work around at the programmatic level.

You'll find Linux and OSX can access thousands of files like this in a fraction of the time it takes Windows. I don't know how much control you have over the machine. If you can keep the thousands of files on a FAT partition, you should see better results than with NTFS.

How often are you scanning these files, and how often are they changing? If the ratio is heavily on the reading side, it would make sense to copy the start of each file into a cache. The cache could store the filename, modification time, and the first 100 bytes of each of the thousands of files.
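A sketch of what such a cache might look like, keyed on filename with the modification time used as the staleness check (the struct and function names here are illustrative, not from the answer):

```cpp
#include <sys/stat.h>
#include <array>
#include <ctime>
#include <string>
#include <unordered_map>

// Cache entry as the answer describes it: modification time plus the
// first 100 bytes of the file.
struct CachedHeader {
    time_t mtime;
    std::array<char, 100> head;
};

using HeaderCache = std::unordered_map<std::string, CachedHeader>;

// A cached entry is still fresh if the file exists and its on-disk
// mtime matches the one we recorded; only stale files get re-read.
bool is_fresh(const HeaderCache& cache, const std::string& path) {
    auto it = cache.find(path);
    if (it == cache.end())
        return false;
    struct stat st;
    return stat(path.c_str(), &st) == 0 && st.st_mtime == it->second.mtime;
}
```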
