Import Text Data Files with Low-Level I/O
2016-12-01 00:03
Overview
Low-level file I/O functions allow the most control over reading or writing data to a file. However, these functions require that you specify more detailed information about your file than the easier-to-use high-level functions, such as importdata. For more information on the high-level functions that read text files, see Ways to Import Text Files.
If the high-level functions cannot import your data, use one of the following:
- fscanf, which reads formatted data in a text or ASCII file; that is, a file you can view in a text editor. For more information, see Reading Data in a Formatted Pattern.
- fgetl and fgets, which read one line of a file at a time, where a newline character separates each line. For more information, see Reading Data Line-by-Line.
- fread, which reads a stream of data at the byte or bit level. For more information, see Import Binary Data with Low-Level I/O.
For additional information, see:
- Testing for End of File (EOF)
- Opening Files with Different Character Encodings
Note: The low-level file I/O functions are based on functions in the ANSI® Standard C Library. However, MATLAB® includes vectorized versions of the functions, to read and write data in an array with minimal control loops.
Reading Data in a Formatted Pattern
To import text files that importdata and textscan cannot read, consider using fscanf. The fscanf function requires that you describe the format of your file, but includes many options for this format description.
For example, create a text file mymeas.dat as shown. The data in mymeas.dat includes repeated sets of times, dates, and measurements. The header text includes the number of sets of measurements, N:
    Measurement Data
    N=3

    12:00:00
    01-Jan-1977
    4.21  6.55  6.78  6.55
    9.15  0.35  7.57  NaN
    7.92  8.49  7.43  7.06
    9.59  9.33  3.92  0.31
    09:10:02
    23-Aug-1990
    2.76  6.94  4.38  1.86
    0.46  3.17  NaN   4.89
    0.97  9.50  7.65  4.45
    8.23  0.34  7.95  6.46
    15:03:40
    15-Apr-2003
    7.09  6.55  9.59  7.51
    7.54  1.62  3.40  2.55
    NaN   1.19  5.85  5.05
    6.79  4.98  2.23  6.99
Opening the File
As with any of the low-level I/O functions, before reading, open the file with fopen, and obtain a file identifier. By default, fopen opens files for read access, with a permission of 'r'.
When you finish processing the file, close it with fclose(fid).
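The open/close pattern can be sketched as follows. This is a minimal illustration, not part of the original example: the scratch file name comes from tempname and is purely hypothetical, and the error check relies on fopen returning -1 when it cannot open a file.

```matlab
% Create a small scratch file so the sketch is self-contained.
% The file name is illustrative only (generated by tempname).
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, 'hello\n');
fclose(fid);

% Open for reading. 'r' is the default permission; it is spelled out here.
[fid, msg] = fopen(filename, 'r');
if fid == -1
    error('Could not open %s: %s', filename, msg);
end

firstLine = fgetl(fid);   % read one line, without the newline

% fclose returns 0 when the file closes successfully.
status = fclose(fid);
delete(filename);         % remove the scratch file
```

Checking the identifier before the first read gives a clear error message at the point of failure, rather than an invalid-identifier error from a later read call.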
Describing the Data
Describe the data in the file with format specifiers, such as '%s' for text, '%d' for an integer, or '%f' for a floating-point number. (For a complete list of specifiers, see the fscanf reference page.)
To skip literal characters in the file, include them in the format description. To skip a data field, use an asterisk ('*') in the specifier.
For example, consider the header lines of mymeas.dat:
    Measurement Data  % skip the first 2 words, go to next line:  %*s %*s\n
    N=3               % ignore 'N=', read integer:  N=%d\n
                      % go to next line:  \n
    12:00:00
    01-Jan-1977
    4.21  6.55  6.78  6.55
    ...
To read the headers and return the single value for N:
    N = fscanf(fid, '%*s %*s\nN=%d\n\n', 1);
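To try the header format in isolation, the following sketch writes only the first few lines of a file shaped like mymeas.dat to a scratch file and reads them back. The scratch file name (from tempname) and the variable nextField are illustrative assumptions, not part of the original example.

```matlab
% Write a header plus the first time stamp to a scratch file.
filename = [tempname '.dat'];
fid = fopen(filename, 'w');
fprintf(fid, 'Measurement Data\nN=3\n\n12:00:00\n');
fclose(fid);

fid = fopen(filename);

% %*s %*s skips the two header words, the literal 'N=' is matched and
% discarded, and %d returns the integer that follows it.
N = fscanf(fid, '%*s %*s\nN=%d\n\n', 1);

% The next read continues where the header read stopped.
nextField = fscanf(fid, '%s', 1);

fclose(fid);
delete(filename);
```

Here N is 3 and nextField is '12:00:00', which shows that the skipped fields and literals are consumed from the file even though they are not returned.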
Specifying the Number of Values to Read
By default, fscanf reapplies your format description until it cannot match the description to the data, or it reaches the end of the file.
Optionally, specify the number of values to read, so that fscanf does not attempt to read the entire file. For example, in mymeas.dat, each set of measurements includes a fixed number of rows and columns:
    measrows = 4;
    meascols = 4;
    meas = fscanf(fid, '%f', [measrows, meascols])';
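Because the measurement blocks in mymeas.dat are square (4-by-4), the order of the two dimensions in the size argument does not affect the result there. For a non-square block the order matters. The sketch below uses sscanf, which applies the same format and size rules to text already in memory, so that no file is needed; the variable names are illustrative.

```matlab
% Two rows of three values, as they might appear in a text file.
txt = sprintf('1 2 3\n4 5 6\n');

nrows = 2;   % rows in the text
ncols = 3;   % columns in the text

% Values fill the output column by column, so request [ncols, nrows] ...
raw = sscanf(txt, '%f', [ncols, nrows]);   % 3-by-2: [1 4; 2 5; 3 6]

% ... and transpose to recover the layout of the text.
meas = raw';                               % 2-by-3: [1 2 3; 4 5 6]
```

Requesting [nrows, ncols] and transposing would instead produce a 3-by-2 result whose rows do not match the rows of the text.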
Creating Variables in the Workspace
There are several ways to store mymeas.dat in the MATLAB workspace. In this case, read the values into a structure. Each element of the structure has three fields: mtime, mdate, and meas.
Note: fscanf fills arrays with numeric values in column order. To make the output array match the orientation of numeric data in a file, transpose the array.
    filename = 'mymeas.dat';
    measrows = 4;
    meascols = 4;

    % open the file
    fid = fopen(filename);

    % read the file headers, find N (one value)
    N = fscanf(fid, '%*s %*s\nN=%d\n\n', 1);

    % read each set of measurements
    for n = 1:N
        mystruct(n).mtime = fscanf(fid, '%s', 1);
        mystruct(n).mdate = fscanf(fid, '%s', 1);

        % fscanf fills the array in column order,
        % so transpose the results
        mystruct(n).meas = ...
            fscanf(fid, '%f', [measrows, meascols])';
    end

    % close the file
    fclose(fid);
Reading Data Line-by-Line
MATLAB provides two functions that read lines from files and store them as character vectors: fgetl and fgets. The fgets function copies the line along with the newline character to the output, but fgetl does not.
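The difference between the two functions can be seen by reading consecutive lines of the same file with each one. This is a minimal sketch that assumes a hypothetical two-line scratch file created with tempname.

```matlab
% Write two newline-terminated lines to a scratch file.
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, 'first\nsecond\n');
fclose(fid);

fid = fopen(filename);
a = fgetl(fid);   % 'first', newline removed
b = fgets(fid);   % 'second' followed by the newline character
fclose(fid);
delete(filename);

fprintf('fgetl returned %d characters\n', numel(a));
fprintf('fgets returned %d characters\n', numel(b));
```

The extra character returned by fgets is the line terminator, which matters when the lines are later compared with strcmp or written back out unchanged.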
The following example uses fgetl to read an entire file one line at a time. The function litcount determines whether a given character sequence (literal) appears in each line. If it does, the function prints the entire line preceded by the number of times the literal appears on the line.
    function y = litcount(filename, literal)
    % Count the number of times a given literal appears in each line.

    fid = fopen(filename);
    y = 0;
    tline = fgetl(fid);
    while ischar(tline)
        matches = strfind(tline, literal);
        num = length(matches);
        if num > 0
            y = y + num;
            fprintf(1,'%d:%s\n',num,tline);
        end
        tline = fgetl(fid);
    end
    fclose(fid);
Create an input data file called badpoem:
    Oranges and lemons,
    Pineapples and tea.
    Orangutans and monkeys,
    Dragonflys or fleas.
To find out how many times 'an' appears in this file, call litcount:
    litcount('badpoem','an')
This returns:
    2: Oranges and lemons,
    1: Pineapples and tea.
    3: Orangutans and monkeys,

    ans =
         6
Testing for End of File (EOF)
When you read a portion of your data at a time, you can use feof to check whether you have reached the end of the file. feof returns a value of 1 when the file pointer is at the end of the file. Otherwise, it returns 0.
Note: Opening an empty file does not move the file position indicator to the end of the file. Read operations, and the fseek and frewind functions, move the file position indicator.
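The consequence of this note for empty files can be sketched as follows: feof reports 0 immediately after opening, and reports 1 only once a read has been attempted. The scratch file and variable names are illustrative assumptions.

```matlab
% Create an empty scratch file.
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fclose(fid);

fid = fopen(filename);
atEndBefore = feof(fid);   % 0: nothing has been read yet
data = fread(fid, 1);      % attempt to read one byte; nothing is available
atEndAfter = feof(fid);    % 1: the read attempt reached the end of the file
fclose(fid);
delete(filename);
```

This is why a loop of the form while ~feof(fid) runs its body at least once on an empty file unless the file size is checked first.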
Testing for EOF with feof
When you use textscan, fscanf, or fread to read portions of data at a time, use feof to check whether you have reached the end of the file.
For example, suppose that the hypothetical file mymeas.dat has the following form, with no information about the number of measurement sets. Read the data into a structure with fields for mtime, mdate, and meas:
    12:00:00
    01-Jan-1977
    4.21  6.55  6.78  6.55
    9.15  0.35  7.57  NaN
    7.92  8.49  7.43  7.06
    9.59  9.33  3.92  0.31
    09:10:02
    23-Aug-1990
    2.76  6.94  4.38  1.86
    0.46  3.17  NaN   4.89
    0.97  9.50  7.65  4.45
    8.23  0.34  7.95  6.46
To read the file:
    filename = 'mymeas.dat';
    measrows = 4;
    meascols = 4;

    % open the file
    fid = fopen(filename);

    % make sure the file is not empty
    finfo = dir(filename);
    fsize = finfo.bytes;

    if fsize > 0

        % read the file
        block = 1;
        while ~feof(fid)
            mystruct(block).mtime = fscanf(fid, '%s', 1);
            mystruct(block).mdate = fscanf(fid, '%s', 1);

            % fscanf fills the array in column order,
            % so transpose the results
            mystruct(block).meas = ...
                fscanf(fid, '%f', [measrows, meascols])';

            block = block + 1;
        end

    end

    % close the file
    fclose(fid);
Testing for EOF with fgetl and fgets
If you use fgetl or fgets in a control loop, feof is not always the best way to test for end of file. As an alternative, consider checking whether the value that fgetl or fgets returns is a character vector.
For example, the function litcount described in Reading Data Line-by-Line includes the following while loop and fgetl calls:
    y = 0;
    tline = fgetl(fid);
    while ischar(tline)
        matches = strfind(tline, literal);
        num = length(matches);
        if num > 0
            y = y + num;
            fprintf(1,'%d:%s\n',num,tline);
        end
        tline = fgetl(fid);
    end
This approach is more robust than testing ~feof(fid) for two reasons:
- If fgetl or fgets find data, they return a character vector. Otherwise, they return a number (-1).
- After each read operation, fgetl and fgets check the next character in the file for the end-of-file marker. Therefore, these functions sometimes set the end-of-file indicator before they return a value of -1. For example, consider the following three-line text file. Each of the first two lines ends with a newline character, and the third line contains only the end-of-file marker:
    123
    456
Three sequential calls to fgetl yield the following results:
    t1 = fgetl(fid);    % t1 = '123', feof(fid) = false
    t2 = fgetl(fid);    % t2 = '456', feof(fid) = true
    t3 = fgetl(fid);    % t3 = -1,    feof(fid) = true
This behavior does not conform to the ANSI specifications for the related C language functions.
Opening Files with Different Character Encodings
Encoding schemes support the characters required for particular alphabets, such as those for Japanese or European languages. Common encoding schemes include US-ASCII and UTF-8.
If you do not specify an encoding scheme, fopen opens files for processing using the default encoding for your system. To determine the default, open a file, and call fopen again with the syntax:
    [filename, permission, machineformat, encoding] = fopen(fid);
If you specify an encoding scheme when you open a file, the following functions apply that scheme: fscanf, fprintf, fgetl, fgets, fread, and fwrite.
For a complete list of supported encoding schemes, and the syntax for specifying the encoding, see the fopen reference page.
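As a rough sketch of specifying an encoding, the following writes and reads a short string as UTF-8 and then queries the encoding of the open file. It assumes the four-argument form fopen(filename, permission, machinefmt, encoding), with 'n' selecting native byte ordering; the scratch file and the sample text are illustrative.

```matlab
% Sample text with one non-ASCII character: 'caf' followed by U+00E9.
% Built from code points so the sketch does not depend on how this
% script file itself is encoded.
text = char([99 97 102 233]);

filename = [tempname '.txt'];

% Write the text with an explicit encoding.
fid = fopen(filename, 'w', 'n', 'UTF-8');
fprintf(fid, '%s\n', text);
fclose(fid);

% Read it back with the same encoding and query the open file.
fid = fopen(filename, 'r', 'n', 'UTF-8');
[~, permission, ~, encoding] = fopen(fid);
tline = fgetl(fid);
fclose(fid);
delete(filename);

fprintf('Opened with permission ''%s'' and encoding %s\n', permission, encoding);
```

Reading with the same encoding that was used for writing returns the original characters; opening the file with a different encoding could return different characters for the non-ASCII part.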