How to provide extend-on-write functionality for memory mapped files in Linux?

Developer https://www.devze.com 2023-03-24 22:34 Source: Internet
I'm working on porting some code from AIX to Linux. Parts of the code use the shmat() system call to create new files. When used with SHM_MAP in a writable mode, one can extend the file beyond its original length (of zero, in my case):

When a file is mapped onto a segment, the file is referenced by accessing the segment. The memory paging system automatically takes care of the physical I/O. References beyond the end of the file cause the file to be extended in page-sized increments. The file cannot be extended beyond the next segment boundary.

(A "segment" in AIX is a 256 MB chunk of address space, and a "page" is usually 4 KB.)

What I would like to do on Linux is the following:

  • Reserve a large-ish chunk of address space (it doesn't have to be as big as 256 MB, these aren't such large files)
  • Set up the page protection bits so that a segfault is generated on the first access to a page that hasn't been touched before
  • On a page fault, clear the "cause a page fault" bit and allocate committed memory for the page, allowing the write (or read) that caused the page fault to proceed
  • Upon closing the shared memory area, write the modified pages to a file

I know I can do this on Windows with the VirtualProtect function, the PAGE_GUARD memory protection bit, and a structured exception handler. What is the corresponding method on Linux to do the same? Is there perhaps a better way to implement this extend-on-write functionality on Linux?

I've already considered:

  • using mmap() with some fixed large-ish size, but I can't tell how much of the file was written to by the application code
  • allocating an anonymous shared memory area of large-ish size, but again I can't tell how much of the area has been written
  • mmap() by itself does not seem to provide any facility to extend the length of the backing file

Naturally I would like to do this with only minimal changes to the application code.


This is very similar to a homework assignment I once did. Basically, I had a list of "pages" and a list of "frames", with associated information. I used a SIGSEGV handler to catch faults and alter the memory protection bits as necessary. I'll include the parts that you may find useful.

Create mapping. Initially it has no permissions.

int w_create_mapping(size_t size, void **addr)
{

    *addr = mmap(NULL,
            size * w_get_page_size(),
            PROT_NONE,
            MAP_ANONYMOUS | MAP_PRIVATE,
            -1,
            0
    );

    if (*addr == MAP_FAILED) {
        perror("mmap");
        return FALSE;
    }

    return TRUE;
}

Install signal handler

int w_set_exception_handler(w_exception_handler_t handler)
{
    static struct sigaction sa;
    sa.sa_sigaction = handler;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGSEGV);
    sa.sa_flags = SA_SIGINFO;

    if (sigaction(SIGSEGV, &sa, &previous_action) < 0)
        return FALSE;

    return TRUE;
}

Exception handler

static void fault_handler(int signum, siginfo_t *info, void *context)
{
    void *address;      /* the address that faulted */

    /* Memory location which caused fault */
    address = info->si_addr;

    if (FALSE == page_fault(address)) {
        _exit(1);
    }
}

Changing protection

int w_protect_mapping(void *addr, size_t num_pages, w_prot_t protection)
{
    int prot;

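    switch (protection) {
    case PROTECTION_NONE:
        prot = PROT_NONE;
        break;
    case PROTECTION_READ:
        prot = PROT_READ;
        break;
    case PROTECTION_WRITE:
        prot = PROT_READ | PROT_WRITE;
        break;
    default:
        /* Unknown value: fail rather than pass an uninitialized prot to mprotect() */
        return FALSE;
    }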

    if (mprotect(addr, num_pages * w_get_page_size(), prot) < 0)
        return FALSE;

    return TRUE;
}

I can't publicly make it all available since the team is likely to use that same homework again.
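The excerpts above call a page_fault() helper that is not shown. As a rough sketch of what such a helper might do, the code below maps the faulting address to a page index, unprotects just that page, and remembers the highest page touched. The struct region type and the region_* function names are hypothetical and not part of the original homework code.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one reserved region (not from the original answer). */
struct region {
    char  *base;       /* start of the mmap()ed reservation */
    size_t num_pages;  /* number of pages reserved */
    size_t page_size;  /* as reported by sysconf(_SC_PAGESIZE) */
    size_t high_page;  /* one past the highest page touched so far */
};

/* Return the index of the page containing addr, or -1 if addr lies outside the region. */
long region_page_index(const struct region *r, const void *addr)
{
    assert(r->page_size > 0);
    uintptr_t a = (uintptr_t)addr;
    uintptr_t b = (uintptr_t)r->base;

    if (a < b || a >= b + r->num_pages * r->page_size)
        return -1;
    return (long)((a - b) / r->page_size);
}

/* Make the faulting page accessible and record how far the region has grown.
   Returns 1 on success, 0 if the address is not ours or mprotect() failed. */
int region_page_fault(struct region *r, void *addr)
{
    long idx = region_page_index(r, addr);
    if (idx < 0)
        return 0;

    if (mprotect(r->base + (size_t)idx * r->page_size, r->page_size,
                 PROT_READ | PROT_WRITE) < 0)
        return 0;

    if ((size_t)idx + 1 > r->high_page)
        r->high_page = (size_t)idx + 1;
    return 1;
}
```

When the region is closed, high_page * page_size gives the number of bytes worth writing out, in page-sized increments like the AIX behaviour described in the question.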


Allocate a big buffer however you like, then use the mprotect()* system call to make the tail of the buffer read-only. Register a signal handler for SIGSEGV that notes where in the buffer writes have been made, and have it call mprotect() again to enable writes.

  • http://linux.die.net/man/2/mprotect
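A minimal sketch of this idea follows, under a few assumptions: the whole buffer starts read-only (the file in the question starts at length zero), the state lives in globals because a signal handler cannot take extra arguments, and the tracked_* names are made up for illustration. Note that mprotect() is not on the POSIX list of async-signal-safe functions; calling it from a SIGSEGV handler is common practice on Linux, but treat it as an assumption rather than a guarantee.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char  *g_base;              /* start of the buffer */
static size_t g_size;              /* reserved bytes, a multiple of the page size */
static size_t g_page;              /* page size */
static volatile size_t g_written;  /* high-water mark: bytes covered by written pages */

static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    uintptr_t a = (uintptr_t)info->si_addr, b = (uintptr_t)g_base;
    if (a < b || a >= b + g_size)
        _exit(1);                            /* a genuine crash, not one of our pages */
    size_t off = (a - b) / g_page * g_page;  /* round down to the start of the page */
    if (mprotect(g_base + off, g_page, PROT_READ | PROT_WRITE) < 0)
        _exit(1);
    if (off + g_page > g_written)
        g_written = off + g_page;
    /* Returning restarts the faulting store, which now succeeds. */
}

/* Reserve size bytes (rounded up to whole pages) of read-only, zero-filled memory. */
char *tracked_alloc(size_t size)
{
    g_page = (size_t)sysconf(_SC_PAGESIZE);
    assert(g_page > 0);
    g_size = (size + g_page - 1) / g_page * g_page;
    g_written = 0;
    g_base = mmap(NULL, g_size, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (g_base == MAP_FAILED)
        return NULL;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    if (sigaction(SIGSEGV, &sa, NULL) < 0)
        return NULL;
    return g_base;
}

size_t tracked_written(void) { return g_written; }

/* Write everything below the high-water mark to path; returns 0 on success. */
int tracked_flush(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    size_t done = 0, total = g_written;
    while (done < total) {
        ssize_t n = write(fd, g_base + done, total - done);
        if (n < 0) { close(fd); return -1; }
        done += (size_t)n;
    }
    return close(fd);
}
```

Because the pages are read-only rather than inaccessible, reads never fault and only writes advance the mark. The resulting file length is rounded up to a whole page, which matches the page-sized extension the AIX documentation describes.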


I've contemplated similar things myself, and haven't found any way for mmap() to extend the backing file either.

Currently, I plan on trying two alternatives:

  • manually manage filesize, extending it myself and mremap()'ing afterwards
  • create a sparse file and hope that the VM would allocate needed sectors when flushing dirty pages.

Honestly, I don't think sparse files would work, but it's worth a try.
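The first alternative could be sketched roughly as below: grow the file with ftruncate(), then grow the mapping with the Linux-specific mremap(). The struct growable type and the growable_* names are hypothetical, and the caller is assumed to decide when to grow and what the final length should be.

```c
#define _GNU_SOURCE   /* mremap() and MREMAP_MAYMOVE are Linux-specific */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

struct growable {
    int    fd;
    char  *addr;   /* may change after growable_resize() */
    size_t len;    /* current file length, equal to the mapping length */
};

/* Create path, size it to initial_len bytes and map it shared; 0 on success. */
int growable_open(struct growable *g, const char *path, size_t initial_len)
{
    assert(initial_len > 0);   /* mmap() rejects a zero-length mapping */
    g->fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (g->fd < 0)
        return -1;
    if (ftruncate(g->fd, (off_t)initial_len) < 0) {
        close(g->fd);
        return -1;
    }
    g->addr = mmap(NULL, initial_len, PROT_READ | PROT_WRITE, MAP_SHARED, g->fd, 0);
    if (g->addr == MAP_FAILED) {
        close(g->fd);
        return -1;
    }
    g->len = initial_len;
    return 0;
}

/* Extend the file first, then the mapping, so no mapped page lies past EOF. */
int growable_resize(struct growable *g, size_t new_len)
{
    if (ftruncate(g->fd, (off_t)new_len) < 0)
        return -1;
    void *p = mremap(g->addr, g->len, new_len, MREMAP_MAYMOVE);
    if (p == MAP_FAILED)
        return -1;
    g->addr = p;               /* old pointers into the mapping are now stale */
    g->len = new_len;
    return 0;
}

/* Flush, unmap, trim the file to the length actually used, and close. */
int growable_close(struct growable *g, size_t final_len)
{
    int rc = 0;
    if (msync(g->addr, g->len, MS_SYNC) < 0) rc = -1;
    if (munmap(g->addr, g->len) < 0) rc = -1;
    if (ftruncate(g->fd, (off_t)final_len) < 0) rc = -1;
    if (close(g->fd) < 0) rc = -1;
    return rc;
}
```

With MREMAP_MAYMOVE the mapping may move, so application code has to re-read g->addr after every resize. Also note that extending a file with ftruncate() leaves a hole on typical Linux filesystems, so this sketch already relies on blocks being allocated when dirty shared pages are written back, which is essentially the sparse-file alternative.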
