
I/O: An Analysis of Standard I/O (Part 1)

2018-01-24 20:51

1. Key Concepts

When we open or create a file with the standard I/O library, we say that we have associated a stream with the file.

Standard I/O file streams can be used with both single-byte and multibyte (‘‘wide’’) character sets. A stream’s orientation determines whether the characters that are read and written are single byte or multibyte.

This FILE is normally a structure that contains all the information required by the standard I/O library to manage the stream: the file descriptor used for actual I/O, a pointer to a buffer for the stream, the size of the buffer, a count of the number of characters currently in the buffer, an error flag, and the like.

The goal of the buffering provided by the standard I/O library is to use the minimum number of read and write calls.

Fully buffered : actual I/O takes place when the standard I/O buffer is filled. The term flush describes the writing of a standard I/O buffer. A buffer can be flushed automatically by the standard I/O routines, such as when a buffer fills, or we can call the function fflush to flush a stream.

Line buffered : In this case, the standard I/O library performs I/O when a newline character is encountered on input or output.

Unbuffered : The standard error stream, for example, is normally unbuffered so that any error messages are displayed as quickly as possible, regardless of whether they contain a newline.

Standard error is unbuffered, streams open to terminal devices are line buffered, and all other streams are fully buffered.

Note that we perform I/O on each stream before printing its buffering status, since the first I/O operation usually causes the buffers to be allocated for a stream.

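  Every sentence above is important. They are key concepts I excerpted, and the passages set in bold in the original matter most of all. I/O can be divided into low-level and high-level I/O: standard I/O is high-level, while the file I/O covered in the previous post is low-level. The first question worth asking is: what exactly is the relationship between a file descriptor and a file pointer? Thinking this through is the first step toward a deeper understanding.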

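  The concept of a stream is abstract, but every stream corresponds to a structure: the FILE structure. Its members depend on the implementation, but we know it contains at least a file descriptor and a buffer. That goes a long way toward answering the first question. Reading and writing are tracked separately: depending on the mode passed to fopen, a stream opened for both reading and writing keeps separate read and write buffer state (note that this amounts to separate pointers, and the exact layout depends on the implementation). The details need not concern us, but these are the most basic concepts.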
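The relationship between the file pointer and the file descriptor can be made visible with a small sketch. It assumes a POSIX system; the function name `size_before_and_after_flush` is made up for illustration. It writes through the stream, then looks at the file through the descriptor that `fileno` returns, once before and once after `fflush`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Writes 5 bytes through a stream and reports the file size seen through
 * the underlying descriptor before and after fflush.
 * Returns 0 on success, -1 on error. */
int size_before_and_after_flush(long *before, long *after)
{
    FILE *fp = tmpfile();      /* a regular file, so normally fully buffered */
    if (fp == NULL)
        return -1;

    int fd = fileno(fp);       /* the descriptor kept inside the FILE structure */
    struct stat st;
    int rc = -1;

    /* fputs only copies the bytes into the stdio buffer. */
    if (fputs("hello", fp) != EOF && fstat(fd, &st) == 0) {
        *before = (long)st.st_size;
        /* fflush hands the buffered bytes to the kernel via the descriptor. */
        if (fflush(fp) != EOF && fstat(fd, &st) == 0) {
            *after = (long)st.st_size;
            rc = 0;
        }
    }
    fclose(fp);
    return rc;
}
```

On a typical implementation the size is still 0 before the flush, because the five bytes are sitting in the stream's buffer, and 5 afterwards.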

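  Since a buffer is provided, there must be different kinds of buffering and an API for configuring it; those details are analyzed in the API section below. That raises the second question: how much does buffering actually improve performance?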



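  The screenshot above is taken from APUE. There is no need to repeat the experiment ourselves; analyzing its results is enough to understand these concepts better: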

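Whether you use fgets, getc, or fgetc, the system time is about the same. This is easy to understand: each version uses the default buffer, which the library allocates automatically, so the number of system calls is nearly identical and the system time is essentially the same.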

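The clearest contrast is between rows 3 or 4 and row 5. Both read one character at a time with the same number of iterations, yet the results differ markedly: the buffer greatly reduces the number of actual I/O operations.

Pay attention to the moment at which a fully buffered or line-buffered stream actually performs I/O. In general, though, we do not need to care about this; we only care about the behavior of each API itself.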

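  Now the third question: when is the buffer actually initialized? As the last quoted note above says, the fp returned by fopen only has its FILE structure initialized; the buffer itself is usually allocated the first time an I/O operation is performed on fp.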
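One way to observe this is to ask the library for the buffer size before and after the first I/O operation. The sketch below relies on `__fbufsize` from `<stdio_ext.h>`, which is a glibc/Solaris extension and not part of ISO C or POSIX, so treat its availability as an assumption about the platform. The function name `buffer_size_around_first_io` is hypothetical.

```c
#include <stdio.h>
#include <stdio_ext.h>   /* __fbufsize: glibc/Solaris extension (assumption) */

/* Stores the buffer size reported right after opening the stream and
 * again after the first write. Returns 0 on success, -1 on error. */
int buffer_size_around_first_io(size_t *at_open, size_t *after_io)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    *at_open = __fbufsize(fp);     /* no I/O has happened on fp yet */

    if (fputc('x', fp) == EOF) {   /* first I/O operation on the stream */
        fclose(fp);
        return -1;
    }
    *after_io = __fbufsize(fp);

    fclose(fp);
    return 0;
}
```

Whether the first value is 0 depends on the implementation; some libraries allocate lazily, others embed the buffer in the stream from the start.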

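  The fourth question: how does the timing of buffer flushes affect us? In other words, do we need to care that the buffer exists? The easiest answer to grasp is that, for a line-buffered file, the visible result of an I/O operation may differ somewhat from what we expect.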

2. Key APIs

Fwide:

#include <stdio.h>
#include <wchar.h>
int fwide(FILE *fp, int mode);
Returns: positive if stream is wide oriented, negative if stream is byte oriented, or 0 if stream has no orientation


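  Not every character is represented by a single byte. For those cases, and for cases that require conversion, it is necessary to state how characters are represented. fwide only sets whether the stream handles single-byte or multibyte (wide) characters.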
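A minimal sketch of how orientation behaves, using only standard C. The helper name `check_orientation` is made up. It queries a fresh stream, sets byte orientation, and then confirms that a later request for wide orientation does not change it.

```c
#include <stdio.h>
#include <wchar.h>

/* Returns 0 if the stream behaves as described in the text, -1 otherwise. */
int check_orientation(void)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    int ok = 1;
    if (fwide(fp, 0) != 0)     /* mode 0 only queries: a new stream has no orientation */
        ok = 0;
    if (fwide(fp, -1) >= 0)    /* negative mode requests byte orientation */
        ok = 0;
    if (fwide(fp, 1) >= 0)     /* an already oriented stream is not changed */
        ok = 0;

    fclose(fp);
    return ok ? 0 : -1;
}
```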

Setbuf:

#include <stdio.h>
void setbuf(FILE *restrict fp, char *restrict buf );
int setvbuf(FILE *restrict fp, char *restrict buf, int mode, size_t size);
Returns: 0 if OK, nonzero on error


These functions must be called after the stream has been opened (obviously, since each requires a valid file pointer as its first argument) but before any other operation is performed on the stream.

In general, we should let the system choose the buffer size and automatically allocate the buffer. When we do this, the standard I/O library automatically releases the buffer when we close the stream.

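  If we allocate the buffer ourselves, we are obviously also responsible for freeing it once the stream is closed. The recommended approach is to let standard I/O take care of all this. The first quoted paragraph above also makes clear that setbuf belongs between opening the stream and operating on it. The implication is that the whole family of write functions then works on the same write buffer, and the read functions on the same read buffer.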

  
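setbuf and setvbuf differ somewhat in behavior. A buffer passed to setbuf must be of the predefined size BUFSIZ (passing NULL turns buffering off), whereas setvbuf lets us specify the size. When the buf argument of setvbuf is NULL and a buffered mode is requested, the library allocates a suitable buffer automatically.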
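As an illustration of the recommended pattern, the following sketch selects line buffering with setvbuf and leaves the allocation to the library. The function name `open_line_buffered_tmp` is hypothetical, and a temporary file stands in for whatever stream a real program would use.

```c
#include <stdio.h>

/* Opens a temporary stream and switches it to line buffering, letting the
 * library allocate (and later free) the buffer. Returns NULL on error. */
FILE *open_line_buffered_tmp(void)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return NULL;

    /* setvbuf must be called before any other operation on the stream.
     * buf == NULL asks the library to allocate the buffer itself. */
    if (setvbuf(fp, NULL, _IOLBF, BUFSIZ) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

Because the buffer belongs to the library, fclose releases it; the caller has nothing to free.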

The getc family

#include <stdio.h>
int getc(FILE *fp);
int fgetc(FILE *fp);
int getchar(void);
All three return: next character if OK, EOF on end of file or error


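  These functions handle one character at a time, and the return value is converted to int. Note that they return EOF on end of file or error; to tell which of the two happened, we rely on the following functions: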

#include <stdio.h>
int ferror(FILE *fp);
int feof(FILE *fp);
Both return: nonzero (true) if condition is true, 0 (false) otherwise
void clearerr(FILE *fp);
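A typical read loop shows how these pieces fit together. This is a sketch with a made-up function name, `count_bytes`; it keeps the result of getc in an int so that EOF stays distinguishable from every valid byte, and it calls ferror afterwards to decide why the loop ended.

```c
#include <stdio.h>

/* Counts the bytes remaining in the stream.
 * Returns the count on end of file, or -1 if a read error occurred. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c;                  /* int, not char, so EOF cannot collide with a data byte */

    while ((c = getc(fp)) != EOF)
        n++;

    if (ferror(fp)) {       /* EOF was caused by an error */
        clearerr(fp);       /* reset the error and end-of-file flags */
        return -1;
    }
    return n;               /* EOF was caused by end of file; feof(fp) is nonzero */
}
```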


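  The FILE structure mentioned at the very beginning contains flag members, and this is where they come into play. As for ungetc, I do not find it commonly used, but the concept behind it is important and helps us understand the buffer better: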

When we push characters back with ungetc, they are not written back to the underlying file or device. Instead, they are kept incore in the standard I/O library’s buffer for the stream.
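The quoted behavior can be checked with a short sketch. The name `reread_after_pushback` is hypothetical. It reads one character, pushes back a different one, and reads again; the pushed-back character comes out of the stream's buffer while the file itself stays untouched.

```c
#include <stdio.h>

/* Reads one character, pushes 'X' back, and returns what the next read
 * yields. Returns EOF on error. The underlying file is never modified. */
int reread_after_pushback(FILE *fp)
{
    int c = getc(fp);
    if (c == EOF)
        return EOF;

    if (ungetc('X', fp) == EOF)   /* one character of pushback is always allowed */
        return EOF;

    return getc(fp);              /* served from the buffer, not from the file */
}
```

A later repositioning call such as rewind discards the pushed-back character, and the file still begins with its original first byte.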

The gets family

#include <stdio.h>
char *fgets(char *restrict buf, int n, FILE *restrict fp);
char *gets(char *buf );
Both return: buf if OK, NULL on end of file or error

int fputs(const char *restrict str, FILE *restrict fp);
int puts(const char *str);
Both return: non-negative value if OK, EOF on error


  
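fgets keeps the newline, and the buffer is always terminated with a null character; its other properties are easy to find in any blog post or reference. fputs writes a string without its terminating null character. Note the following statement: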

The function fputs writes the null-terminated string to the specified stream. The null byte at the end is not written. Note that this need not be line-at-a-time output, since the string need not contain a newline as the last non-null character.
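Because fputs does not require a trailing newline and fgets always null-terminates what it read, the pair copies data correctly even when the line buffer is shorter than the line. The sketch below uses a deliberately tiny MAXLINE of 4 (the value from the exercise at the end of this post); the function name `copy_lines` is made up.

```c
#include <stdio.h>

#define MAXLINE 4   /* deliberately tiny: fgets stores at most MAXLINE - 1 characters */

/* Copies in to out with fgets/fputs.
 * Returns the number of fgets calls that delivered data, or -1 on error. */
int copy_lines(FILE *in, FILE *out)
{
    char buf[MAXLINE];
    int calls = 0;

    while (fgets(buf, MAXLINE, in) != NULL) {
        calls++;
        /* buf is null-terminated even when it holds only part of a line. */
        if (fputs(buf, out) == EOF)
            return -1;
    }
    return ferror(in) ? -1 : calls;
}
```

A long line simply arrives in several pieces, each written out in turn, so the output matches the input; the only cost is more library calls.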

Fread,Printf

#include <stdio.h>
size_t fread(void *restrict ptr, size_t size, size_t nobj,
FILE *restrict fp);
size_t fwrite(const void *restrict ptr, size_t size, size_t nobj,
FILE *restrict fp);
Both return: number of objects read or written

int printf(const char *restrict format, ...);
int fprintf(FILE *restrict fp, const char *restrict format, ...);
int dprintf(int fd, const char *restrict format, ...);
All three return: number of characters output if OK, negative value if output error
int sprintf(char *restrict buf, const char *restrict format, ...);
Returns: number of characters stored in array if OK, negative value if encoding error
int snprintf(char *restrict buf, size_t n, const char *restrict format, ...);
Returns: number of characters that would have been stored in array if buffer was large enough, negative value if encoding error


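  I will not elaborate on the functions above; many tutorials and blog posts analyze them in detail. Formatted I/O in particular is tedious, so it is best to look up each detail as you run into it.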
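Two return-value conventions from the prototypes above are easy to get wrong, so here is a small sketch of both. The type `struct point` and the function names are made up for illustration. fread and fwrite count objects, not bytes, and snprintf reports the length the output would have had, which is how truncation is detected.

```c
#include <stdio.h>

struct point { int x; int y; };   /* hypothetical record type */

/* Writes n records to a temporary file and reads them back.
 * Returns the number of objects read back (n on success). */
size_t roundtrip_points(const struct point *src, struct point *dst, size_t n)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return 0;

    size_t got = 0;
    if (fwrite(src, sizeof *src, n, fp) == n) {   /* return value counts objects */
        rewind(fp);                               /* reposition before reading */
        got = fread(dst, sizeof *dst, n, fp);
    }
    fclose(fp);
    return got;
}

/* Formats "[text]" into buf.
 * Returns 1 if it fit, 0 if it was truncated, -1 on encoding error. */
int format_checked(char *buf, size_t size, const char *text)
{
    int n = snprintf(buf, size, "[%s]", text);
    if (n < 0)
        return -1;
    return (size_t)n < size;   /* n excludes the terminating null byte */
}
```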

FFlush :

#include <stdio.h>
int fflush(FILE *fp);
Returns: 0 if OK, EOF on error


The fflush function causes any unwritten data for the stream to be passed to the kernel. As a special case, if fp is NULL, fflush causes all output streams to be flushed.

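  Once you understand that standard I/O sits a layer above the system calls it ultimately invokes, the phrase "the stream to be passed to the kernel" makes sense. But fflush is not the same as sync: fflush guarantees that the system call is made, while sync and fsync concern getting the data from the kernel to the disk or other device (fsync waits until the writes are complete).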
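Putting the two layers together, a stream's data can be pushed all the way to the device in two steps. This is a sketch assuming a POSIX system; the name `flush_and_sync` is made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Step 1: move buffered data from the stdio buffer into the kernel.
 * Step 2: ask the kernel to commit that file's data to the device.
 * Returns 0 on success, -1 on error. */
int flush_and_sync(FILE *fp)
{
    if (fflush(fp) == EOF)
        return -1;
    if (fsync(fileno(fp)) == -1)   /* fsync works on the descriptor, not the stream */
        return -1;
    return 0;
}
```

The order matters: calling fsync first would commit only what the kernel already has, while the newest bytes would still be in the stdio buffer.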

Fopen freopen fdopen :

#include <stdio.h>
FILE *fopen(const char *restrict pathname, const char *restrict type);
FILE *freopen(const char *restrict pathname, const char *restrict type, FILE *restrict fp);
FILE *fdopen(int fd, const char *type);
All three return: file pointer if OK, NULL on error


The freopen function opens a specified file on a specified stream, closing the stream first if it is already open. If the stream previously had an orientation, freopen clears it. This function is typically used to open a specified file as one of the predefined streams: standard input, standard output, or standard error

The fdopen function takes an existing file descriptor, and associates a standard I/O stream with the descriptor.

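  The two points above reflect the basic uses of freopen and fdopen. freopen rebinds an fp to a file. Recall the operations on an fd with dup, dup2, and fcntl discussed earlier: they look somewhat similar, but there are essential differences. A few things need attention when using fdopen: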

With fdopen, the meanings of the type argument differ slightly. The descriptor has already been opened, so opening for writing does not truncate the file. (If the descriptor was created by the open function, for example, and the file already existed, the O_TRUNC flag would control whether the file was truncated. The fdopen function cannot simply truncate any file it opens for writing.) Also, the standard I/O append mode cannot create the file (since the file has to exist if a descriptor refers to it).
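The point about truncation can be checked directly. The sketch below assumes a POSIX system with a writable /tmp; the template path and the function name `size_after_fdopen_w` are hypothetical. It creates a 5-byte file, reopens it with open without O_TRUNC, wraps the descriptor with fdopen in "w" mode, and reports the size.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns the file size after fdopen(fd, "w"), or -1 on error. */
long size_after_fdopen_w(void)
{
    char path[] = "/tmp/fdopen_demo_XXXXXX";   /* hypothetical template */
    long size = -1;

    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    if (write(fd, "hello", 5) != 5) {
        close(fd);
        unlink(path);
        return -1;
    }
    close(fd);

    fd = open(path, O_WRONLY);            /* note: no O_TRUNC */
    if (fd != -1) {
        FILE *fp = fdopen(fd, "w");       /* "w" cannot truncate an open descriptor */
        if (fp != NULL) {
            struct stat st;
            if (fstat(fd, &st) == 0)
                size = (long)st.st_size;
            fclose(fp);                   /* also closes fd */
        } else {
            close(fd);
        }
    }
    unlink(path);
    return size;
}
```

With fopen(path, "w") the same file would have been cut to zero length; here the existing five bytes survive.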

  
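When an fd is passed in as the argument, the file it refers to already exists, so the truncating effect of w and the create effect of a+ no longer apply. Note the following statement: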

When a file is opened for reading and writing (the plus sign in the type), two restrictions apply.

Output cannot be directly followed by input without an intervening fflush, fseek, fsetpos, or rewind.

Input cannot be directly followed by output without an intervening fseek, fsetpos, or rewind, or an input operation that encounters an end of file.
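A minimal sketch of the first restriction, with a made-up function name `write_then_read`: the stream is opened for update, written to, and repositioned with rewind before the read.

```c
#include <stdio.h>

/* Writes text to an update-mode stream, repositions, and reads the first
 * line back into buf. Returns 0 on success, -1 on error. */
int write_then_read(const char *text, char *buf, int size)
{
    FILE *fp = tmpfile();       /* tmpfile opens the stream for update ("w+b") */
    if (fp == NULL)
        return -1;

    int rc = -1;
    if (fputs(text, fp) != EOF) {
        rewind(fp);             /* required between output and the following input */
        if (fgets(buf, size, fp) != NULL)
            rc = 0;
    }
    fclose(fp);
    return rc;
}
```

Any of fflush, fseek, fsetpos, or rewind satisfies the rule; rewind is used here because the read should start at the beginning.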

The ftell family :

#include <stdio.h>
long ftell(FILE *fp);
Returns: current file position indicator if OK, −1L on error
int fseek(FILE *fp, long offset, int whence);
Returns: 0 if OK, −1 on error
void rewind(FILE *fp);


  
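fseek is similar to lseek, except that fseek operates on a stream. ftell returns the current position as an offset from the beginning of the file, and rewind moves the stream back to the beginning. Note that the buffer has its own variable tracking the current offset, and the kernel's file table entry referred to by the stream's fd also records a current offset. The two are distinct, but an offset given to fseek means the same thing as one given to lseek: both denote a position in the file.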
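A common use of this pair is finding a file's size without disturbing the current position. This is a sketch with a hypothetical function name, `stream_size`, and it assumes a seekable stream whose size fits in a long.

```c
#include <stdio.h>

/* Returns the size of the file behind fp in bytes and leaves the stream
 * position where it was. Returns -1 on error. */
long stream_size(FILE *fp)
{
    long saved = ftell(fp);                  /* remember where we are */
    if (saved == -1L)
        return -1;

    if (fseek(fp, 0L, SEEK_END) != 0)        /* jump to the end */
        return -1;
    long size = ftell(fp);                   /* offset of the end equals the size */

    if (fseek(fp, saved, SEEK_SET) != 0)     /* restore the original position */
        return -1;
    return size;
}
```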

Fclose :

#include <stdio.h>
int fclose(FILE *fp);
Returns: 0 if OK, EOF on error


  
fclose flushes the stream, that is, it acts on whatever is left in the buffer: buffered input is discarded, while buffered output is passed to the kernel.

3. Other Concepts

3.1 Memory Streams

  Memory streams are particularly well suited to string manipulation; keep that in mind for what follows.

#include <stdio.h>
FILE *fmemopen(void *restrict buf, size_t size, const char *restrict type);
Returns: stream pointer if OK, NULL on error


  
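The buf argument gives the address of the buffer. If it is NULL, fmemopen allocates the buffer itself, but we have no way to learn its address and cannot access it directly, so opening such a stream only for reading or only for writing is pointless. The type argument differs slightly from the one for fopen. In append mode, fopen starts at the end of file, whereas fmemopen starts at the first null byte in the buffer; if the buffer contains no null byte, the position is one byte past the end of the buffer. In every mode other than append, the offset starts at the beginning of the buffer. A related statement from the book: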

a null byte is written at the current position in the stream whenever we increase the amount of data in the stream’s buffer and call fclose, fflush, fseek, fseeko, or fsetpos.

Memory streams are well suited for creating strings, because they prevent buffer overflows. They can also provide a performance boost for functions that take standard I/O stream arguments used for temporary files, because memory streams access only main memory instead of a file stored on disk.
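The append-mode rule and the automatic null byte can be seen together in a short sketch. It assumes a POSIX.1-2008 system that provides fmemopen; the function name `append_via_memstream` is made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Appends suffix to the string already stored in buf (capacity: size bytes)
 * by writing through a memory stream. Returns 0 on success, -1 on error. */
int append_via_memstream(char *buf, size_t size, const char *suffix)
{
    /* In append mode the initial position is the first null byte in buf. */
    FILE *fp = fmemopen(buf, size, "a");
    if (fp == NULL)
        return -1;

    int rc = (fputs(suffix, fp) == EOF) ? -1 : 0;

    /* Closing flushes the stream; a terminating null byte is written at the
     * current position when the data grew and there is room for it. */
    if (fclose(fp) == EOF)
        rc = -1;
    return rc;
}
```

Because the stream knows the size of the buffer, it never writes past the end of the array, which is the overflow protection the quoted passage refers to.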

Temporary Files

#include <stdio.h>
char *tmpnam(char *ptr);
Returns: pointer to unique pathname
FILE *tmpfile(void);
Returns: file pointer if OK, NULL on error
char *mkdtemp(char *template);
Returns: pointer to directory name if OK, NULL on error
int mkstemp(char *template);
Returns: file descriptor if OK, −1 on error


The standard technique often used by the tmpfile function is to create a unique pathname by calling tmpnam, then create the file, and immediately unlink it. Unlike tmpfile, the temporary file created by mkstemp is not removed automatically for us. If we want to remove it from the file system namespace, we need to unlink it ourselves. With tmpnam, a window exists between the time that the unique pathname is returned and the time that an application creates a file with that name. The tmpfile and mkstemp functions should be used instead, as they don’t suffer from this problem.

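  To create a temporary file or directory, use mkstemp or mkdtemp. Both are effectively atomic: obtaining a unique name (as tmpnam does) and then creating the file with fopen, open, or a similar function are carried out together inside mkstemp or mkdtemp. unlink does not actually delete a file; it only removes the directory entry that refers to it. With mk*temp we therefore have to call unlink ourselves (once the link count reaches 0, the file is removed automatically when it is closed or when the process terminates). tmpfile does all of this for us, but we cannot choose the name of the file it creates.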
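The manual version of what tmpfile does can be sketched as follows. It assumes a POSIX system with a writable /tmp; the template path and the function name `open_anonymous_temp` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Creates a temporary file, removes its name right away, and returns a
 * stream open for reading and writing. Returns NULL on error. */
FILE *open_anonymous_temp(void)
{
    /* Must be a writable array: mkstemp overwrites the XXXXXX part. */
    char path[] = "/tmp/demo_XXXXXX";

    int fd = mkstemp(path);     /* picks the name and creates the file in one step */
    if (fd == -1)
        return NULL;

    unlink(path);               /* the directory entry goes away, the open file stays */

    FILE *fp = fdopen(fd, "w+");
    if (fp == NULL)
        close(fd);
    return fp;
}
```

The file's storage is released when the stream is closed or the process exits, since by then no link and no open descriptor refers to it.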

4. Experiments

Experiment 1:

Type in the program that copies a file using line-at-a-time I/O (fgets and fputs) from Figure 5.5, but use a MAXLINE of 4. What happens if you copy lines that exceed this length? Explain what is happening.

Experiment 2:

How would you use the fsync function (Section 3.13) with a standard I/O stream?

Experiment 3:

In the programs in Figures 1.7 and 1.10, the prompt that is printed does not contain a newline, and we don’t call fflush. What causes the prompt to be output?

Experiment 4:

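A write operation followed directly by a read requires an intervening fflush; try it and observe what happens. Then follow a read operation directly with a write, after letting the read hit the end of the file, and observe what happens.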

Experiment 5:

In a memory stream, is a null byte still written at the current position after a write operation followed by fclose? Does fprintf write the terminating null byte of the string?