Import Text Data Files with Low-Level I/O

2016-12-01 00:03 (447 views)

Overview
Low-level file I/O functions allow the most control over reading or writing data to a file. However, these functions require that you specify more detailed information about your file than the easier-to-use high-level functions, such as importdata. For more information on the high-level functions that read text files, see Ways to Import Text Files.

If the high-level functions cannot import your data, use one of the following:

fscanf, which reads formatted data in a text or ASCII file; that is, a file you can view in a text editor. For more information, see Reading Data in a Formatted Pattern.

fgetl and fgets, which read one line of a file at a time, where a newline character separates each line. For more information, see Reading Data Line-by-Line.

fread, which reads a stream of data at the byte or bit level. For more information, see Import Binary Data with Low-Level I/O.

For additional information, see:

Testing for End of File (EOF)

Opening Files with Different Character Encodings

Note: The low-level file I/O functions are based on functions in the ANSI® Standard C Library. However, MATLAB® includes vectorized versions of the functions, to read and write data in an array with minimal control loops.

Reading Data in a Formatted Pattern

To import text files that importdata and textscan cannot read, consider using fscanf. The fscanf function requires that you describe the format of your file, but includes many options for this format description.

For example, create a text file mymeas.dat as shown. The data in mymeas.dat includes repeated sets of times, dates, and measurements. The header text includes the number of sets of measurements, N:

Measurement Data
N=3

12:00:00
01-Jan-1977
4.21  6.55  6.78  6.55
9.15  0.35  7.57  NaN
7.92  8.49  7.43  7.06
9.59  9.33  3.92  0.31
09:10:02
23-Aug-1990
2.76  6.94  4.38  1.86
0.46  3.17  NaN   4.89
0.97  9.50  7.65  4.45
8.23  0.34  7.95  6.46
15:03:40
15-Apr-2003
7.09  6.55  9.59  7.51
7.54  1.62  3.40  2.55
NaN   1.19  5.85  5.05
6.79  4.98  2.23  6.99

Opening the File
As with any of the low-level I/O functions, before reading, open the file with fopen, and obtain a file identifier. By default, fopen opens files for read access, with a permission of 'r'.

When you finish processing the file, close it with fclose(fid).
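The open/close pattern can be sketched as follows. This is a minimal illustration, not part of the original example: the scratch file created with tempname and the variable names firstLine and status are assumptions made only so that the snippet runs on its own.

```matlab
% Hypothetical scratch file, created only so the sketch is self-contained
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, 'Measurement Data\n');
fclose(fid);

% Open for reading; 'r' is the default permission, shown here explicitly
[fid, msg] = fopen(filename, 'r');
if fid == -1
    % fopen returns -1 when it cannot open the file;
    % msg holds a system-dependent explanation
    error('Could not open %s: %s', filename, msg);
end

firstLine = fgetl(fid);   % read something while the file is open

status = fclose(fid);     % fclose returns 0 when it succeeds
delete(filename);         % remove the scratch file
```

Checking the identifier returned by fopen before reading avoids a less informative error from the first read call.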

Describing the Data
Describe the data in the file with format specifiers, such as '%s' for text, '%d' for an integer, or '%f' for a floating-point number. (For a complete list of specifiers, see the fscanf reference page.)

To skip literal characters in the file, include them in the format description. To skip a data field, use an asterisk ('*') in the specifier.
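A quick way to try out a format description is sscanf, which applies the same kind of specifiers to a character vector instead of a file. The input strings below are made up for illustration.

```matlab
% Literal text 'N=' is matched and discarded; only the integer is returned
n = sscanf('N=3', 'N=%d');

% '%*s' reads a whitespace-delimited word and discards it, so only the
% numbers that follow each label are returned. The format is reapplied
% until the input is exhausted.
vals = sscanf('Temp 21.5 Pressure 1013', '%*s %f');
```

Here n is 3 and vals is the column vector [21.5; 1013].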

For example, consider the header lines of mymeas.dat:

Measurement Data   % skip the first 2 words, go to next line:  %*s %*s\n
N=3                % ignore 'N=', read integer:  N=%d\n
% go to next line:  \n
12:00:00
01-Jan-1977
4.21  6.55  6.78  6.55
...

To read the headers and return the single value for N:

N = fscanf(fid, '%*s %*s\nN=%d\n\n', 1);

Specifying the Number of Values to Read
By default, fscanf reapplies your format description until it cannot match the description to the data, or it reaches the end of the file.

Optionally, specify the number of values to read, so that fscanf does not attempt to read the entire file. For example, in mymeas.dat, each set of measurements includes a fixed number of rows and columns:

measrows = 4;
meascols = 4;
meas  = fscanf(fid, '%f', [measrows, meascols])';
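The effect of the size argument and the transpose is easier to see on a small non-square block. The sketch below uses sscanf on a made-up string standing in for two rows of three values; note that for a non-square block the size argument is [columns, rows] before transposing, a detail the 4-by-4 example does not expose.

```matlab
% Stand-in for a file that contains the rows "1 2 3" and "4 5 6"
str = '1 2 3 4 5 6';

% Values fill the output column by column, so request 3-by-2 ...
A = sscanf(str, '%f', [3, 2]);   % [1 4; 2 5; 3 6]

% ... and transpose to recover the row layout of the text
B = A';                          % [1 2 3; 4 5 6]

% A scalar size reads at most that many values
firstTwo = sscanf(str, '%f', 2); % [1; 2]
```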

Creating Variables in the Workspace
There are several ways to store mymeas.dat in the MATLAB workspace. In this case, read the values into a structure. Each element of the structure has three fields: mtime, mdate, and meas.

Note: fscanf fills arrays with numeric values in column order. To make the output array match the orientation of numeric data in a file, transpose the array.

filename = 'mymeas.dat';
measrows = 4;
meascols = 4;

% open the file
fid = fopen(filename);

% read the file headers, find N (one value)
N = fscanf(fid, '%*s %*s\nN=%d\n\n', 1);

% read each set of measurements
for n = 1:N
    mystruct(n).mtime = fscanf(fid, '%s', 1);
    mystruct(n).mdate = fscanf(fid, '%s', 1);

    % fscanf fills the array in column order,
    % so transpose the results
    mystruct(n).meas  = ...
        fscanf(fid, '%f', [measrows, meascols])';
end

% close the file
fclose(fid);

Reading Data Line-by-Line
MATLAB provides two functions that read lines from files and store them as character vectors: fgetl and fgets. The fgets function copies the line along with the newline character to the output, but fgetl does not.
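The difference between the two functions can be checked with a short sketch. The scratch file and its contents are assumptions made for illustration.

```matlab
% Hypothetical two-line scratch file
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, 'abc\ndef\n');
fclose(fid);

fid = fopen(filename);
a = fgetl(fid);   % 'abc', newline removed
b = fgets(fid);   % 'def' followed by the newline character
fclose(fid);
delete(filename);

lenA = length(a); % 3
lenB = length(b); % 4, because the newline is kept
```

Use fgets when the line terminators matter, for example when copying a file line by line, and fgetl when only the text is of interest.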

The following example uses fgetl to read an entire file one line at a time. The function litcount determines whether a given character sequence (literal) appears in each line. If it does, the function prints the entire line preceded by the number of times the literal appears on the line.

function y = litcount(filename, literal)
% Count the number of times a given literal appears in each line.

fid = fopen(filename);
y = 0;
tline = fgetl(fid);
while ischar(tline)
    matches = strfind(tline, literal);
    num = length(matches);
    if num > 0
        y = y + num;
        fprintf(1,'%d:%s\n',num,tline);
    end
    tline = fgetl(fid);
end
fclose(fid);

Create an input data file called badpoem:

Oranges and lemons,
Pineapples and tea.
Orangutans and monkeys,
Dragonflys or fleas.

To find out how many times 'an' appears in this file, call litcount:

litcount('badpoem','an')

This returns:

2:Oranges and lemons,
1:Pineapples and tea.
3:Orangutans and monkeys,
ans =
     6

Testing for End of File (EOF)

When you read a portion of your data at a time, you can use feof to check whether you have reached the end of the file. feof returns a value of 1 when the file pointer is at the end of the file. Otherwise, it returns 0.

Note: Opening an empty file does not move the file position indicator to the end of the file. Read operations, and the fseek and frewind functions, move the file position indicator.

Testing for EOF with feof
When you use textscan, fscanf, or fread to read portions of data at a time, use feof to check whether you have reached the end of the file.

For example, suppose that the hypothetical file mymeas.dat has the following form, with no information about the number of measurement sets. Read the data into a structure with fields for mtime, mdate, and meas:

12:00:00
01-Jan-1977
4.21  6.55  6.78  6.55
9.15  0.35  7.57  NaN
7.92  8.49  7.43  7.06
9.59  9.33  3.92  0.31
09:10:02
23-Aug-1990
2.76  6.94  4.38  1.86
0.46  3.17  NaN   4.89
0.97  9.50  7.65  4.45
8.23  0.34  7.95  6.46

To read the file:

filename = 'mymeas.dat';
measrows = 4;
meascols = 4;

% open the file
fid = fopen(filename);

% make sure the file is not empty
finfo = dir(filename);
fsize = finfo.bytes;

if fsize > 0

    % read the file
    block = 1;
    while ~feof(fid)
        mystruct(block).mtime = fscanf(fid, '%s', 1);
        mystruct(block).mdate = fscanf(fid, '%s', 1);

        % fscanf fills the array in column order,
        % so transpose the results
        mystruct(block).meas  = ...
            fscanf(fid, '%f', [measrows, meascols])';

        block = block + 1;
    end

end

% close the file
fclose(fid);
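If a file ends with extra blank lines or other trailing whitespace, a loop controlled only by feof may run once more than intended and store an empty block. The sketch below is one defensive variant, not the documented method: it reads the time field first and stops when nothing is returned. The scratch file, the 2-by-2 block size, and the variable name blocks are assumptions made for illustration.

```matlab
% Hypothetical scratch file: two sets of 2-by-2 data plus a trailing blank line
filename = [tempname '.dat'];
fid = fopen(filename, 'w');
fprintf(fid, '12:00:00\n01-Jan-1977\n1 2\n3 4\n');
fprintf(fid, '09:10:02\n23-Aug-1990\n5 6\n7 8\n\n');
fclose(fid);

fid = fopen(filename);

% Empty structure array with the three fields used in the example above
blocks = struct('mtime', {}, 'mdate', {}, 'meas', {});

while ~feof(fid)
    mtime = fscanf(fid, '%s', 1);
    if isempty(mtime)
        break;   % only whitespace was left, so there is no further set
    end
    k = numel(blocks) + 1;
    blocks(k).mtime = mtime;
    blocks(k).mdate = fscanf(fid, '%s', 1);
    % column-order fill, so transpose to match the file layout
    blocks(k).meas  = fscanf(fid, '%f', [2, 2])';
end

fclose(fid);
delete(filename);
```

Starting from an empty structure array also means an empty file needs no separate size check, because the loop body simply never stores anything.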

Testing for EOF with fgetl and fgets
If you use fgetl or fgets in a control loop, feof is not always the best way to test for end of file. As an alternative, consider checking whether the value that fgetl or fgets returns is a character vector.

For example, the function litcount described in Reading Data Line-by-Line includes the following while loop and fgetl calls:

y = 0;
tline = fgetl(fid);
while ischar(tline)
    matches = strfind(tline, literal);
    num = length(matches);
    if num > 0
        y = y + num;
        fprintf(1,'%d:%s\n',num,tline);
    end
    tline = fgetl(fid);
end

This approach is more robust than testing ~feof(fid) for two reasons:

If fgetl or fgets find data, they return a character vector. Otherwise, they return a number (-1).

After each read operation, fgetl and fgets check the next character in the file for the end-of-file marker. Therefore, these functions sometimes set the end-of-file indicator before they return a value of -1. For example, consider the following three-line text file. Each of the first two lines ends with a newline character, and the third line contains only the end-of-file marker:

123
456

Three sequential calls to fgetl yield the following results:

t1 = fgetl(fid);    % t1 = '123', feof(fid) = false
t2 = fgetl(fid);    % t2 = '456', feof(fid) = true
t3 = fgetl(fid);    % t3 = -1,    feof(fid) = true

This behavior does not conform to the ANSI specifications for the related C language functions.
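The ischar test can be sketched on the same two-line input. The scratch file and the cell array name lines are assumptions made for illustration; the loop never consults feof, so it does not depend on when the end-of-file indicator is set.

```matlab
% Hypothetical scratch file holding the two lines from the example above
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, '123\n456\n');
fclose(fid);

fid = fopen(filename);
lines = {};
tline = fgetl(fid);
while ischar(tline)            % fgetl returns -1 once no data is left
    lines{end+1} = tline;      % collect each line of text
    tline = fgetl(fid);
end
fclose(fid);
delete(filename);
```

After the loop, lines holds '123' and '456', and tline is the numeric value -1.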

Opening Files with Different Character Encodings
Encoding schemes support the characters required for particular alphabets, such as those for Japanese or European languages. Common encoding schemes include US-ASCII or UTF-8.

If you do not specify an encoding scheme, fopen opens files for processing using the default encoding for your system. To determine the default, open a file, and call fopen again with the syntax:

[filename, permission, machineformat, encoding] = fopen(fid);

If you specify an encoding scheme when you open a file, the following functions apply that scheme: fscanf, fprintf, fgetl, fgets, fread, and fwrite.

For a complete list of supported encoding schemes, and the syntax for specifying the encoding, see the fopen reference page.
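As a minimal sketch, the encoding is passed as the fourth input to fopen, after the permission and the machine format ('n' selects the native byte ordering). The scratch file and its contents are assumptions made for illustration.

```matlab
% Hypothetical scratch file written with an explicit encoding
filename = [tempname '.txt'];
fid = fopen(filename, 'w', 'n', 'UTF-8');
fprintf(fid, 'hello\n');
fclose(fid);

% Reopen with the same encoding and query the settings in use
fid = fopen(filename, 'r', 'n', 'UTF-8');
[name, permission, machinefmt, encoding] = fopen(fid);
tline = fgetl(fid);            % read through the UTF-8 decoder
fclose(fid);
delete(filename);
```

Here encoding reports the scheme requested when the file was opened, and tline is 'hello'.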
