Mapping Files into Memory
2015-12-11 10:53
Reposted from:
https://www.safaribooksonline.com/library/view/linux-system-programming/0596009585/ch04s03.html
Mapping Files into Memory
As an alternative to standard file I/O, the kernel provides an interface that allows an application to map a file into memory, meaning that there is a one-to-one correspondence between a memory address and a word in the file. The programmer can then access the file directly through memory, identically to any other chunk of memory-resident data; it is even possible to allow writes to the memory region to transparently map back to the file on disk.
POSIX.1 standardizes, and Linux implements, the mmap( ) system call for mapping objects into memory. This section will discuss mmap( ) as it pertains to mapping files into memory to perform I/O; in Chapter 8, we will visit other applications of mmap( ).
mmap( )
A call to mmap( ) asks the kernel to map len bytes of the object represented by the file descriptor fd, starting at offset bytes into the file, into memory. If addr is included, it indicates a preference to use that starting address in memory. The access permissions are dictated by prot, and additional behavior can be given by flags:
    #include <sys/mman.h>

    void * mmap (void *addr,
                 size_t len,
                 int prot,
                 int flags,
                 int fd,
                 off_t offset);
The addr parameter offers a suggestion to the kernel of where best to map the file. It is only a hint; most users pass 0. The call returns the actual address in memory where the mapping begins.
The prot parameter describes the desired memory protection of the mapping. It may be either PROT_NONE, in which case the pages in this mapping may not be accessed (making little sense!), or a bitwise OR of one or more of the following flags:
PROT_READ
The pages may be read.
PROT_WRITE
The pages may be written.
PROT_EXEC
The pages may be executed.
The desired memory protection must not conflict with the open mode of the file. For example, if the program opens the file read-only, prot must not specify PROT_WRITE.
Protection Flags, Architectures, and Security
While POSIX defines four protection bits (read, write, execute, and stay the heck away), some architectures support only a subset of these. It is common, for example, for a processor to not differentiate between the actions of reading and executing. In that case, the processor may have only a single "read" flag. On those systems, PROT_READ implies PROT_EXEC. Until recently, the x86 architecture was one such system.
Of course, relying on such behavior is not portable. Portable programs should always set PROT_EXEC if they intend to execute code in the mapping.
The reverse situation is one reason for the prevalence of buffer overflow attacks: even if a given mapping does not specify execution permission, the processor may allow execution anyway.
Recent x86 processors have introduced the NX (no-execute) bit, which allows for readable, but not executable, mappings. On these newer systems, PROT_READ no longer implies PROT_EXEC.
The flags argument describes the type of mapping, and some elements of its behavior. It is a bitwise OR of the following values:
MAP_FIXED
Instructs mmap( ) to treat addr as a requirement, not a hint. If the kernel is unable to place the mapping at the given address, the call fails. If the address and length parameters overlap an existing mapping, the overlapped pages are discarded and replaced by the new mapping. As this option requires intimate knowledge of the process address space, it is nonportable, and its use is discouraged.
MAP_PRIVATE
States that the mapping is not shared. The file is mapped copy-on-write, and any changes made in memory by this process are not reflected in the actual file, or in the mappings of other processes.
MAP_SHARED
Shares the mapping with all other processes that map this same file. Writing into the mapping is equivalent to writing to the file. Reads from the mapping will reflect the writes of other processes.
Either MAP_SHARED or MAP_PRIVATE must be specified, but not both. Other, more advanced flags are discussed in Chapter 8.
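The difference between MAP_PRIVATE and MAP_SHARED is easiest to see by writing through a mapping and then reading the file back with read( ). The following is a minimal sketch, not code from the original text; poke_first_byte( ) is a hypothetical helper name, and the sketch assumes the file is at least one byte long and that the process may open it read-write.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of any library: store `value` in the first
 * byte of a mapping of `path` created with `flags` (MAP_SHARED or
 * MAP_PRIVATE), then return the first byte of the file as seen by read().
 * Returns -1 on error. Assumes the file is at least one byte long.
 */
int poke_first_byte (const char *path, int flags, char value)
{
        char in_file;
        char *p;
        int fd, ret = -1;

        fd = open (path, O_RDWR);
        if (fd == -1)
                return -1;

        p = mmap (0, 1, PROT_READ | PROT_WRITE, flags, fd, 0);
        if (p != MAP_FAILED) {
                p[0] = value;   /* write through the mapping only */

                /* mmap() does not move the file offset, so read() starts at 0 */
                if (munmap (p, 1) == 0 && read (fd, &in_file, 1) == 1)
                        ret = (unsigned char) in_file;
        }

        close (fd);
        return ret;
}
```

With MAP_PRIVATE the helper should return the file's original first byte, because the store lands in a private copy-on-write page. With MAP_SHARED it should return the new value, because the store modifies the file's own pages.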
When you map a file descriptor, the file's reference count is incremented. Therefore, you can close the file descriptor after mapping the file, and your process will still have access to it. The corresponding decrement of the file's reference count will occur when you unmap the file, or when the process terminates.
As an example, the following snippet maps the file backed by fd, beginning with its first byte, and extending for len bytes, into a read-only mapping:
    void *p;

    p = mmap (0, len, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
            perror ("mmap");
Figure 4-1 shows the effects of parameters supplied with mmap( ) on the mapping between a file and a process' address space.
Figure 4-1. Mapping a file into a process' address space
The page size
The page is the smallest unit of memory that can have distinct permissions and behavior. Consequently, the page is the building block of memory mappings, which in turn are the building blocks of the process address space.
The mmap( ) system call operates on pages. Both the addr and offset parameters must be aligned on a page-sized boundary. That is, they must be integer multiples of the page size.
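Because offset must be a multiple of the page size, mapping data that begins at an arbitrary file position takes a small adjustment: round the offset down to a page boundary, map from there, and skip the extra leading bytes. The sketch below illustrates one way to do this. The struct file_window type and the map_window( ) function are hypothetical names introduced here, and the caller is assumed to request only bytes that lie within the file.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record describing a mapping whose offset was rounded down. */
struct file_window {
        void *base;             /* page-aligned address returned by mmap() */
        size_t map_len;         /* length to hand to munmap() later */
        const char *data;       /* address of the byte at the requested offset */
};

/*
 * Map `len` bytes of `path` starting at an arbitrary `offset`.
 * Returns 0 on success and -1 on error. Assumes offset + len does not
 * extend past the end of the file.
 */
int map_window (const char *path, off_t offset, size_t len,
                struct file_window *w)
{
        long page_size = sysconf (_SC_PAGESIZE);
        off_t aligned;
        size_t delta;
        int fd;

        if (page_size == -1 || offset < 0 || len == 0)
                return -1;

        /* round the file offset down to the nearest page boundary */
        aligned = offset - (offset % page_size);
        delta = (size_t) (offset - aligned);

        fd = open (path, O_RDONLY);
        if (fd == -1)
                return -1;

        w->map_len = len + delta;
        w->base = mmap (0, w->map_len, PROT_READ, MAP_SHARED, fd, aligned);
        close (fd);     /* the mapping keeps its own reference to the file */
        if (w->base == MAP_FAILED)
                return -1;

        w->data = (const char *) w->base + delta;
        return 0;
}
```

The caller reads through w->data and later passes w->base and w->map_len, not w->data and len, to munmap( ).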
Mappings are, therefore, integer multiples of pages. If the len parameter provided by the caller is not aligned on a page boundary (perhaps because the underlying file's size is not a multiple of the page size), the mapping is rounded up to the next full page. The bytes inside this added memory, between the last valid byte and the end of the mapping, are zero-filled. Any read from that region will return zeros. Any writes to that memory will not affect the backing file, even if it is mapped as MAP_SHARED. Only the original len bytes are ever written back to the file.
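The zero-filled tail described above can be observed directly. The following sketch maps a whole file and counts how many bytes between the end of the file and the end of its last page read as zero. zero_slack_bytes( ) is a hypothetical name, and the sketch assumes a nonempty regular file.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: map all of `path` read-only and count the zero bytes
 * that follow the file's last byte within the final page of the mapping.
 * Returns the count, or -1 on error or if the file is empty.
 */
long zero_slack_bytes (const char *path)
{
        long page_size = sysconf (_SC_PAGESIZE);
        struct stat sb;
        size_t len, mapped, i;
        long zeros = 0;
        const char *p;
        int fd;

        fd = open (path, O_RDONLY);
        if (fd == -1)
                return -1;

        if (page_size == -1 || fstat (fd, &sb) == -1 || sb.st_size == 0) {
                close (fd);
                return -1;
        }

        len = (size_t) sb.st_size;
        p = mmap (0, len, PROT_READ, MAP_SHARED, fd, 0);
        close (fd);
        if (p == MAP_FAILED)
                return -1;

        /* the mapping extends to the end of the page holding the last byte */
        mapped = (len + page_size - 1) / page_size * page_size;
        for (i = len; i < mapped; i++)
                if (p[i] == 0)
                        zeros++;

        munmap ((void *) p, len);
        return zeros;
}
```

For a 7-byte file, the result should equal the page size minus 7.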
sysconf( )
The standard POSIX method of obtaining the page size is with sysconf( ), which can retrieve a variety of system-specific information:
    #include <unistd.h>

    long sysconf (int name);
A call to sysconf( ) returns the value of the configuration item name, or -1 if name is invalid. On error, the call sets errno to EINVAL. Because -1 may be a valid value for some items (e.g., limits, where -1 means no limit), it may be wise to clear errno before invocation, and check its value after.
POSIX defines _SC_PAGESIZE (and a synonym, _SC_PAGE_SIZE) to be the size of a page, in bytes. Therefore, getting the page size is simple:
    long page_size = sysconf (_SC_PAGESIZE);
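The advice to clear errno before calling sysconf( ) can be wrapped in a small function, so that a legitimate value of -1 is not mistaken for an error. This is only a sketch; checked_sysconf( ) is a hypothetical name, not a standard interface.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical wrapper around sysconf(). Returns 0 and stores the item's
 * value in *value on success, even if that value is -1 ("no limit").
 * Returns -1 if sysconf() reported an error through errno.
 */
int checked_sysconf (int name, long *value)
{
        long ret;

        errno = 0;              /* so a stale errno cannot be misread */
        ret = sysconf (name);
        if (ret == -1 && errno != 0)
                return -1;      /* genuine failure, e.g. EINVAL */

        *value = ret;
        return 0;
}
```

A caller would pass _SC_PAGESIZE as name and use the stored value only when the function returns 0.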
getpagesize( )
Linux also provides the getpagesize( ) function:
    #include <unistd.h>

    int getpagesize (void);
A call to getpagesize( ) will likewise return the size of a page, in bytes. Usage is even simpler than sysconf( ):
    int page_size = getpagesize ( );
Not all Unix systems support this function; it's been dropped from the 1003.1-2001 revision of the POSIX standard. It is included here for completeness.
PAGE_SIZE
The page size is also stored statically in the macro PAGE_SIZE, which is defined in <asm/page.h>. Thus, a third possible way to retrieve the page size is:
    int page_size = PAGE_SIZE;
Unlike the first two options, however, this approach retrieves the system page size at compile-time, and not runtime. Some architectures support multiple machine types with different page sizes, and some machine types even support multiple page sizes themselves! A single binary should be able to run on all machine types in a given architecture; that is, you should be able to build it once and run it everywhere. Hard-coding the page size would nullify that possibility. Consequently, you should determine the page size at runtime. Because addr and offset are usually 0, this requirement is not overly difficult to meet.
Moreover, future kernel versions will likely not export this macro to user space. We cover it in this chapter due to its frequent presence in Unix code, but you should not use it in your own programs. The sysconf( ) approach is your best bet.
Return values and error codes
On success, a call to mmap( ) returns the location of the mapping. On failure, the call returns MAP_FAILED, and sets errno appropriately. A call to mmap( ) never returns 0.
Possible errno values include:
EACCES
The given file descriptor is not a regular file, or the mode with which it was opened conflicts with prot or flags.
EAGAIN
The file has been locked via a file lock.
EBADF
The given file descriptor is not valid.
EINVAL
One or more of the parameters addr, len, or offset are invalid.
ENFILE
The system-wide limit on open files has been reached.
ENODEV
The filesystem on which the file to map resides does not support memory mapping.
ENOMEM
The process does not have enough memory.
EOVERFLOW
The result of addr+len exceeds the size of the address space.
EPERM
PROT_EXEC was given, but the filesystem is mounted noexec.
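A few of these error codes are easy to provoke on purpose, which makes the list above more concrete. The sketch below opens a file, attempts a mapping, and reports the errno value that resulted. mmap_errno( ) is a hypothetical helper name introduced for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` with `open_flags`, try to map its first
 * `len` bytes with the given protection and mapping flags, and return 0 on
 * success or the errno value from the call that failed.
 */
int mmap_errno (const char *path, int open_flags, size_t len,
                int prot, int flags)
{
        void *p;
        int fd, err = 0;

        fd = open (path, open_flags);
        if (fd == -1)
                return errno;

        p = mmap (0, len, prot, flags, fd, 0);
        if (p == MAP_FAILED)
                err = errno;    /* save it before close() can change it */
        else
                munmap (p, len);

        close (fd);
        return err;
}
```

On Linux, asking for PROT_WRITE with MAP_SHARED on a file opened O_RDONLY should yield EACCES, and a len of 0 should yield EINVAL. The same read-only file can still be mapped PROT_READ | PROT_WRITE with MAP_PRIVATE, because private changes never reach the file.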
Associated signals
Two signals are associated with mapped regions:
SIGBUS
This signal is generated when a process attempts to access a region of a mapping that is no longer valid; for example, because the file was truncated after it was mapped.
SIGSEGV
This signal is generated when a process attempts to write to a region that is mapped read-only.
munmap( )
Linux provides the munmap( ) system call for removing a mapping created with mmap( ):
    #include <sys/mman.h>

    int munmap (void *addr, size_t len);
A call to munmap( ) removes any mappings that contain pages located anywhere in the process address space starting at addr, which must be page-aligned, and continuing for len bytes. Once the mapping has been removed, the previously associated memory region is no longer valid, and further access attempts result in a SIGSEGV signal.
Normally, munmap( ) is passed the return value and the len parameter from a previous invocation of mmap( ).
On success, munmap( ) returns 0; on failure, it returns -1, and errno is set appropriately. The only standard errno value is EINVAL, which specifies that one or more parameters were invalid.
As an example, the following snippet unmaps any memory regions with pages contained in the interval [addr,addr+len):
    if (munmap (addr, len) == -1)
            perror ("munmap");
Mapping Example
Let's consider a simple example program that uses mmap( ) to print a file chosen by the user to standard out:
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main (int argc, char *argv[])
    {
            struct stat sb;
            off_t len;
            char *p;
            int fd;

            if (argc < 2) {
                    fprintf (stderr, "usage: %s <file>\n", argv[0]);
                    return 1;
            }

            fd = open (argv[1], O_RDONLY);
            if (fd == -1) {
                    perror ("open");
                    return 1;
            }

            if (fstat (fd, &sb) == -1) {
                    perror ("fstat");
                    return 1;
            }

            if (!S_ISREG (sb.st_mode)) {
                    fprintf (stderr, "%s is not a file\n", argv[1]);
                    return 1;
            }

            p = mmap (0, sb.st_size, PROT_READ, MAP_SHARED, fd, 0);
            if (p == MAP_FAILED) {
                    perror ("mmap");
                    return 1;
            }

            if (close (fd) == -1) {
                    perror ("close");
                    return 1;
            }

            for (len = 0; len < sb.st_size; len++)
                    putchar (p[len]);

            if (munmap (p, sb.st_size) == -1) {
                    perror ("munmap");
                    return 1;
            }

            return 0;
    }
The only unfamiliar system call in this example should be fstat( ), which we will cover in Chapter 7. All you need to know at this point is that fstat( ) returns information about a given file. The S_ISREG( ) macro can check some of this information, so that we can ensure that the given file is a regular file (as opposed to a device file or a directory) before we map it. The behavior of nonregular files when mapped depends on the backing device. Some device files are mmap-able; other nonregular files are not mmap-able, and will set errno to EACCES.
The rest of the example should be straightforward. The program is passed a filename as an argument. It opens the file, ensures it is a regular file, maps it, closes it, prints the file byte-by-byte to standard out, and then unmaps the file from memory.
Advantages of mmap( )
Manipulating files via mmap( ) has a handful of advantages over the standard read( ) and write( ) system calls. Among them are:
Reading from and writing to a memory-mapped file avoids the extraneous copy that occurs when using the read( ) or write( ) system calls, where the data must be copied to and from a user-space buffer.
Aside from any potential page faults, reading from and writing to a memory-mapped file does not incur any system call or context switch overhead. It is as simple as accessing memory.
When multiple processes map the same object into memory, the data is shared among all the processes. Read-only and shared writable mappings are shared in their entirety; private writable mappings have their not-yet-COW (copy-on-write) pages shared.
Seeking around the mapping involves trivial pointer manipulations. There is no need for the lseek( ) system call.
For these reasons, mmap( ) is a smart choice for many applications.
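To illustrate the point about pointer manipulation, the sketch below scans a mapped file with memchr( ) and ordinary pointer arithmetic. Once the mapping exists, no read( ) or lseek( ) calls are needed. count_byte( ) is a hypothetical helper name, and the sketch makes no claim about how its speed compares with a read( )-based loop for any particular file.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: count how many times byte `c` occurs in `path` by
 * scanning a read-only mapping of the whole file. Returns -1 on error.
 */
long count_byte (const char *path, char c)
{
        const char *p, *cur, *end;
        struct stat sb;
        long n = 0;
        int fd;

        fd = open (path, O_RDONLY);
        if (fd == -1)
                return -1;

        if (fstat (fd, &sb) == -1) {
                close (fd);
                return -1;
        }

        if (sb.st_size == 0) {  /* a zero-length mapping is not allowed */
                close (fd);
                return 0;
        }

        p = mmap (0, sb.st_size, PROT_READ, MAP_SHARED, fd, 0);
        close (fd);
        if (p == MAP_FAILED)
                return -1;

        /* "seeking" is just moving a pointer within [p, end) */
        end = p + sb.st_size;
        for (cur = p; (cur = memchr (cur, c, end - cur)) != NULL; cur++)
                n++;

        munmap ((void *) p, sb.st_size);
        return n;
}
```

Calling count_byte (path, '\n') gives the number of newline characters in the file.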
Disadvantages of mmap( )
There are a few points to keep in mind when using mmap( ):
Memory mappings are always an integer number of pages in size. Thus, the difference between the size of the backing file and an integer number of pages is "wasted" as slack space. For small files, a significant percentage of the mapping may be wasted. For example, with 4 KB pages, a 7-byte mapping wastes 4,089 bytes.
The memory mappings must fit into the process' address space. With a 32-bit address space, a very large number of various-sized mappings can result in fragmentation of the address space, making it hard to find large free contiguous regions. This problem, of course, is much less apparent with a 64-bit address space.
There is overhead in creating and maintaining the memory mappings and associated data structures inside the kernel. This overhead is generally obviated by the elimination of the double copy mentioned in the previous section, particularly for larger and frequently accessed files.
For these reasons, the benefits of mmap( ) are most greatly realized when the mapped file is large (and thus any wasted space is a small percentage of the total mapping), or when the total size of the mapped file is evenly divisible by the page size (and thus there is no wasted space).
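The slack-space arithmetic is simple enough to write down. In this sketch, mapping_size( ) and slack_bytes( ) are hypothetical helper names, and the page size is passed in as a parameter so the 4 KB figure from the text can be reproduced on any machine.

```c
#include <stddef.h>

/* Hypothetical helper: size of a mapping covering file_size bytes,
 * rounded up to a whole number of pages. */
size_t mapping_size (size_t file_size, size_t page_size)
{
        return (file_size + page_size - 1) / page_size * page_size;
}

/* Hypothetical helper: bytes of the mapping not backed by file data. */
size_t slack_bytes (size_t file_size, size_t page_size)
{
        return mapping_size (file_size, page_size) - file_size;
}
```

With a 4,096-byte page, slack_bytes (7, 4096) evaluates to 4,089, matching the example above, while a file of exactly 8,192 bytes has no slack at all.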
Resizing a Mapping
Linux provides the mremap( ) system call for expanding or shrinking the size of a given mapping. This function is Linux-specific:
    #define _GNU_SOURCE

    #include <unistd.h>
    #include <sys/mman.h>

    void * mremap (void *addr, size_t old_size,
                   size_t new_size, unsigned long flags);
A call to mremap( ) expands or shrinks the mapping in the region [addr,addr+old_size) to the new size new_size. The kernel can potentially move the mapping at the same time, depending on the availability of space in the process' address space and the value of flags.
Tip
The opening [ in [addr,addr+old_size) indicates that the region starts with (and includes) the low address, whereas the closing ) indicates that the region stops just before (does not include) the high address. This convention is known as interval notation.
The flags parameter can be either 0 or MREMAP_MAYMOVE, which specifies that the kernel is free to move the mapping, if required, in order to perform the requested resizing. A large resizing is more likely to succeed if the kernel can move the mapping.
Return values and error codes
On success, mremap( ) returns a pointer to the newly resized memory mapping. On failure, it returns MAP_FAILED, and sets errno to one of the following:
EAGAIN
The memory region is locked, and cannot be resized.
EFAULT
Some pages in the given range are not valid pages in the process' address space, or there was a problem remapping the given pages.
EINVAL
An argument was invalid.
ENOMEM
The given range cannot be expanded without moving (and MREMAP_MAYMOVE was not given), or there is not enough free space in the process' address space.
Libraries such as glibc often use mremap( ) to implement an efficient realloc( ), which is an interface for resizing a block of memory originally obtained via malloc( ). For example:
    void * realloc (void *addr, size_t len)
    {
            size_t old_size = look_up_mapping_size (addr);
            void *p;

            p = mremap (addr, old_size, len, MREMAP_MAYMOVE);
            if (p == MAP_FAILED)
                    return NULL;
            return p;
    }
This would only work if all malloc( ) allocations were unique anonymous mappings; nonetheless, it stands as a useful example of the performance gains to be had. The example assumes the programmer has written a look_up_mapping_size( ) function.
The GNU C library does use mmap( ) and family for performing some memory allocations. We will look at that topic in depth in Chapter 8.
Changing the Protection of a Mapping
POSIX defines the mprotect( ) interface to allow programs to change the permissions of existing regions of memory:
    #include <sys/mman.h>

    int mprotect (void *addr, size_t len, int prot);
A call to mprotect( ) will change the protection mode for the memory pages contained in [addr,addr+len), where addr is page-aligned. The prot parameter accepts the same values as the prot given to mmap( ): PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC. These values are not additive; if a region of memory is readable, and prot is set to only PROT_WRITE, the call will make the region only writable.
On some systems, mprotect( ) may operate only on memory mappings previously created via mmap( ). On Linux, mprotect( ) can operate on any region of memory.
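One common use of mprotect( ) is to prepare data while a mapping is writable and then drop write permission, so that later stray writes raise SIGSEGV instead of silently corrupting the data. The following is a minimal sketch of that pattern. map_then_freeze( ) is a hypothetical name, the single-byte modification stands in for whatever setup a real program would do, and len is assumed to be nonzero and no larger than the file.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the first `len` bytes of `path` as a private,
 * writable mapping, overwrite its first byte with `first`, then make the
 * whole mapping read-only. Returns the mapping, or MAP_FAILED on error.
 */
char * map_then_freeze (const char *path, size_t len, char first)
{
        char *p;
        int fd;

        /* a private mapping may be writable even though the file is not */
        fd = open (path, O_RDONLY);
        if (fd == -1)
                return MAP_FAILED;

        p = mmap (0, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        close (fd);
        if (p == MAP_FAILED)
                return MAP_FAILED;

        p[0] = first;   /* copy-on-write change; the file is untouched */

        /* protections are not additive: PROT_READ alone removes write access */
        if (mprotect (p, len, PROT_READ) == -1) {
                munmap (p, len);
                return MAP_FAILED;
        }

        return p;
}
```

The caller can read through the returned pointer and should eventually release it with munmap( ), passing the same len.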
Return values and error codes
On success, mprotect( ) returns 0. On failure, it returns -1, and sets errno to one of the following:
EACCES
The memory cannot be given the permissions requested by prot. This can happen, for example, if you attempt to set the mapping of a file opened read-only to writable.
EINVAL
The parameter addr is invalid or not page-aligned.
ENOMEM
Insufficient kernel memory is available to satisfy the request, or one or more pages in the given memory region are not a valid part of the process' address space.
Synchronizing a File with a Mapping
POSIX provides a memory-mapped equivalent of the fsync( )system call that we discussed in Chapter 2:
#include <sys/mman.h> int msync (void *addr, size_t len, int flags);
A call to msync( ) flushes back to disk any changes made to a file mapped via mmap( ), synchronizing the mapped file with the mapping. Specifically, the file or subset of a file associated with the mapping starting at memory address addr and continuing for len bytes is synchronized to disk. The addr argument must be page-aligned; it is generally the return value from a previous mmap( ) invocation.
Without invocation of msync( ), there is no guarantee that a dirty mapping will be written back to disk until the file is unmapped. This is different from the behavior of write( ), where a buffer is dirtied as part of the writing process, and queued for writeback to disk. When writing into a memory mapping, the process directly modifies the file's pages in the kernel's page cache, without kernel involvement. The kernel may not synchronize the page cache and the disk anytime soon.
The flags parameter controls the behavior of the synchronizing operation. It is a bitwise OR of the following values:
MS_ASYNC
Specifies that synchronization should occur asynchronously. The update is scheduled, but the msync( ) call returns immediately without waiting for the writes to take place.
MS_INVALIDATE
Specifies that all other cached copies of the mapping be invalidated. Any future access to any mappings of this file will reflect the newly synchronized on-disk contents.
MS_SYNC
Specifies that synchronization should occur synchronously. The msync( ) call will not return until all pages are written back to disk.
Either MS_ASYNC or MS_SYNC must be specified, but not both.
Usage is simple:
if (msync (addr, len, MS_ASYNC) == -1)
        perror ("msync");
This example asynchronously synchronizes (say that 10 times fast) to disk the file mapped in the region [addr,addr+len).
Return values and error codes
On success, msync( ) returns 0. On failure, the call returns -1, and sets errno appropriately. The following are valid errno values:
EINVAL
The flags parameter has both MS_SYNC and MS_ASYNC set, a bit other than one of the three valid flags is set, or addr is not page-aligned.
ENOMEM
The given memory region (or part of it) is not mapped. Note that Linux will return ENOMEM, as POSIX dictates, when asked to synchronize a region that is only partly unmapped, but it will still synchronize any valid mappings in the region.
Before version 2.4.19 of the Linux kernel, msync( ) returned EFAULT in place of ENOMEM.
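To put msync( ) in context, the following is a minimal sketch of a complete write through a shared mapping. It assumes Linux with glibc and a writable /tmp; the function name write_through_mapping( ) and the temporary file name are hypothetical and exist only for this illustration. Note that reading the data back through the file descriptor confirms only that the write reached the file's pages; it does not by itself prove that the data has reached the disk, which is the guarantee MS_SYNC is meant to provide.

```c
#define _DEFAULT_SOURCE         /* exposes mkstemp( ) and pread( ) on glibc */
#include <assert.h>             /* for the assert( ) check in the usage note */
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: store msg at the start of a temporary file by
 * writing into a shared mapping, flush it with msync( ), and read it
 * back through the file descriptor. Returns 0 if the file holds msg,
 * -1 on any failure.
 */
int write_through_mapping (const char *msg)
{
        char path[] = "/tmp/msync-demo-XXXXXX";
        char buf[64];
        size_t n = strlen (msg);
        long page = sysconf (_SC_PAGESIZE);
        size_t len;
        char *p;
        int fd, ret = -1;

        if (page <= 0 || n == 0 || n > sizeof (buf) || n > (size_t) page)
                return -1;
        len = (size_t) page;

        fd = mkstemp (path);
        if (fd == -1)
                return -1;
        unlink (path);          /* the file disappears once fd is closed */

        /* the file must be at least as long as the region we write to */
        if (ftruncate (fd, (off_t) len) == -1)
                goto out;

        /* MAP_SHARED so that stores are carried through to the file */
        p = mmap (NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
                goto out;

        memcpy (p, msg, n);

        /* block until the dirty page has been written back */
        if (msync (p, len, MS_SYNC) == -1) {
                munmap (p, len);
                goto out;
        }
        munmap (p, len);

        if (pread (fd, buf, n, 0) == (ssize_t) n && memcmp (buf, msg, n) == 0)
                ret = 0;
out:
        close (fd);
        return ret;
}
```

A caller could verify the round trip with assert (write_through_mapping ("hello") == 0).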
Giving Advice on a Mapping
Linux provides a system call named madvise( ) to let processes give the kernel advice and hints on how they intend to use a mapping. The kernel can then optimize its behavior to take advantage of the mapping's intended use. While the Linux kernel dynamically tunes its behavior, and generally provides optimal performance without explicit advice, providing such advice can ensure the desired caching and readahead behavior for some workloads.
A call to madvise( ) advises the kernel on how to behave with respect to the pages in the memory map starting at addr, and extending for len bytes:
#include <sys/mman.h>

int madvise (void *addr, size_t len, int advice);
If len is 0, the kernel will apply the advice to the entire mapping that starts at addr. The parameter advice delineates the advice, which can be one of:
MADV_NORMAL
The application has no specific advice to give on this range of memory. It should be treated as normal.
MADV_RANDOM
The application intends to access the pages in the specified range in a random (nonsequential) order.
MADV_SEQUENTIAL
The application intends to access the pages in the specified range sequentially, from lower to higher addresses.
MADV_WILLNEED
The application intends to access the pages in the specified range in the near future.
MADV_DONTNEED
The application does not intend to access the pages in the specified range in the near future.
The actual behavior modifications that the kernel takes in response to this advice are implementation-specific: POSIX dictates only the meaning of the advice, not any potential consequences. The current 2.6 kernel behaves as follows in response to the advice values:
MADV_NORMAL
The kernel behaves as usual, performing a moderate amount of readahead.
MADV_RANDOM
The kernel disables readahead, reading only the minimal amount of data on each physical read operation.
MADV_SEQUENTIAL
The kernel performs aggressive readahead.
MADV_WILLNEED
The kernel initiates readahead, reading the given pages into memory.
MADV_DONTNEED
The kernel frees any resources associated with the given pages, and discards any dirty and not-yet-synchronized pages. Subsequent accesses to the mapped data will cause the data to be paged in from the backing file.
Typical usage is:
int ret;

ret = madvise (addr, len, MADV_SEQUENTIAL);
if (ret < 0)
        perror ("madvise");
This call instructs the kernel that the process intends to access the memory region [addr,addr+len) sequentially.
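A fuller sketch of the same idea follows, assuming Linux with glibc and a writable /tmp. The names sum_mapped_file( ) and demo_sum( ) are hypothetical helpers written for this illustration. Because the advice is only a hint, the sketch reports a madvise( ) failure but still goes on to read the mapping.

```c
#define _DEFAULT_SOURCE         /* exposes madvise( ) and mkstemp( ) on glibc */
#include <assert.h>             /* for the assert( ) check in the usage note */
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the first len bytes of fd, tell the kernel
 * the range will be read front to back, and add up the bytes.
 * Returns the sum, or -1 if the file cannot be mapped.
 */
long sum_mapped_file (int fd, size_t len)
{
        unsigned char *p;
        long sum = 0;
        size_t i;

        if (len == 0)
                return -1;

        p = mmap (NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED)
                return -1;

        /* the advice is only a hint: report a failure but carry on */
        if (madvise (p, len, MADV_SEQUENTIAL) == -1)
                perror ("madvise");

        /* sequential pass, from lower to higher addresses */
        for (i = 0; i < len; i++)
                sum += p[i];

        munmap (p, len);
        return sum;
}

/* Hypothetical driver: build a three-byte temporary file and sum it. */
long demo_sum (void)
{
        char path[] = "/tmp/madvise-demo-XXXXXX";
        const unsigned char data[] = { 1, 2, 3 };
        long sum = -1;
        int fd = mkstemp (path);

        if (fd == -1)
                return -1;
        unlink (path);          /* the file disappears once fd is closed */

        if (write (fd, data, sizeof (data)) == (ssize_t) sizeof (data))
                sum = sum_mapped_file (fd, sizeof (data));

        close (fd);
        return sum;
}
```

A caller could check the result with assert (demo_sum () == 6), since the temporary file holds the bytes 1, 2, and 3.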
Readahead
When the Linux kernel reads files off the disk, it performs an optimization known as readahead. That is, when a request is made for a given chunk of a file, the kernel also reads the following chunk of the file. If a request is subsequently made for that chunk, as is the case when reading a file sequentially, the kernel can return the requested data immediately. Because disks have track buffers (basically, hard disks perform their own readahead internally), and because files are generally laid out sequentially on disk, this optimization is low-cost.
Some readahead is usually advantageous, but optimal results depend on the question of how much readahead to perform. A sequentially accessed file may benefit from a larger readahead window, while a randomly accessed file may find readahead to be worthless overhead.
As discussed in "Kernel Internals" in Chapter 2, the kernel dynamically tunes the size of the readahead window in response to the hit rate inside that window. More hits imply that a larger window would be advantageous; fewer hits suggest a smaller window. The madvise( ) system call allows applications to influence the window size right off the bat.
Return values and error codes
On success, madvise( ) returns 0. On failure, it returns -1, and errno is set appropriately. The following are valid errors:
EAGAIN
An internal kernel resource (probably memory) was unavailable. The process can try again.
EBADF
The region exists, but does not map a file.
EINVAL
The parameter len is negative, addr is not page-aligned, the advice parameter is invalid, or the pages were locked or shared with MADV_DONTNEED.
EIO
An internal I/O error occurred with MADV_WILLNEED.
ENOMEM
The given region is not a valid mapping in this process' address space, or MADV_WILLNEED was given, but there is insufficient memory to page in the given regions.
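The discarding behavior of MADV_DONTNEED and the EINVAL alignment check can both be observed with a short sketch. It assumes Linux with glibc and relies on Linux-specific behavior: a private anonymous mapping has no backing file, so after MADV_DONTNEED the kernel supplies zero-filled pages on the next access. The names byte_after_dontneed( ) and madvise_misaligned_errno( ) are hypothetical helpers for this illustration only.

```c
#define _DEFAULT_SOURCE         /* exposes madvise( ) and MAP_ANONYMOUS on glibc */
#include <assert.h>             /* for the assert( ) checks in the usage note */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: dirty one private anonymous page, discard it with
 * MADV_DONTNEED, and return the first byte as seen afterwards.
 * Returns -1 if the mapping or the madvise( ) call fails.
 */
int byte_after_dontneed (void)
{
        long page = sysconf (_SC_PAGESIZE);
        size_t len;
        unsigned char *p;
        int value = -1;

        if (page <= 0)
                return -1;
        len = (size_t) page;

        p = mmap (NULL, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
                return -1;

        p[0] = 0x5a;            /* the page is now dirty */

        /* Linux assumption: the dirty page is dropped, not written anywhere */
        if (madvise (p, len, MADV_DONTNEED) == 0)
                value = p[0];   /* the access faults in a fresh page */

        munmap (p, len);
        return value;
}

/*
 * Hypothetical helper: pass an address that is not page-aligned and
 * return the resulting errno (0 if the call succeeded, -1 if the test
 * mapping could not be created).
 */
int madvise_misaligned_errno (void)
{
        long page = sysconf (_SC_PAGESIZE);
        size_t len;
        unsigned char *p;
        int err = 0;

        if (page <= 0)
                return -1;
        len = (size_t) page;

        p = mmap (NULL, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
                return -1;

        if (madvise (p + 1, len - 1, MADV_NORMAL) == -1)
                err = errno;

        munmap (p, len);
        return err;
}
```

On a Linux system, a caller could check both helpers with assert (byte_after_dontneed () == 0) and assert (madvise_misaligned_errno () == EINVAL). Other systems may treat MADV_DONTNEED differently, so the first check should not be relied on in portable code.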