
Audio recording and network transmission on Android: approach and development method

2013-07-29 18:40
This post describes how to record and play audio on Android with AudioRecord and AudioTrack, compress it with the Speex codec, and transmit it over TCP.

1 Introduction

In earlier posts I covered video compression, network transmission, and multicast, so for completeness this one turns to compressing and transmitting audio. Like those posts, it is largely a synthesis of material from around the web, with some thinking of my own mixed in. Special thanks to the open-source project android-recorder, whose code helped me directly. Much respect.

2 Background

This post touches on AudioRecord, AudioTrack, Thread, TCP, and Speex. If any of these are unfamiliar, each keyword is easy to search for individually.

3 Multithreaded audio capture

The two-thread design in the android-recorder project works very well. One thread starts recording and calls read() in a loop, pushing each buffer onto a queue owned by a second thread; in the second thread's run() method, entries are taken off the queue one at a time and written to a file. Example code follows.

Thread 1

[code lang="java"]
public void run() {
    PcmWriter pcmWriter = new PcmWriter();
    pcmWriter.init();
    Thread writerThread = new Thread(pcmWriter);
    pcmWriter.setRecording(true);
    writerThread.start();

    synchronized (mutex) {
        while (!this.isRecording) {
            try {
                mutex.wait();
            } catch (InterruptedException e) {
                throw new IllegalStateException("Wait() interrupted!", e);
            }
        }
    }

    android.os.Process.setThreadPriority(android.os.Process.THREAD_PRIORITY_URGENT_AUDIO);

    int bufferRead = 0;
    int bufferSize = AudioRecord.getMinBufferSize(frequency,
            AudioFormat.CHANNEL_IN_MONO, audioEncoding);
    short[] tempBuffer = new short[bufferSize];
    AudioRecord recordInstance = new AudioRecord(
            MediaRecorder.AudioSource.MIC, frequency,
            AudioFormat.CHANNEL_IN_MONO, audioEncoding, bufferSize);
    recordInstance.startRecording();

    while (this.isRecording) {
        bufferRead = recordInstance.read(tempBuffer, 0, bufferSize);
        if (bufferRead == AudioRecord.ERROR_INVALID_OPERATION) {
            throw new IllegalStateException(
                    "read() returned AudioRecord.ERROR_INVALID_OPERATION");
        } else if (bufferRead == AudioRecord.ERROR_BAD_VALUE) {
            throw new IllegalStateException(
                    "read() returned AudioRecord.ERROR_BAD_VALUE");
        }
        pcmWriter.putData(tempBuffer, bufferRead);
        log.debug("put data done!");
    }

    recordInstance.stop();
    pcmWriter.setRecording(false);
}
[/code]
Thread 2

[code lang="java"]
public void run() {
    log.error("pcmwriter thread running");
    while (this.isRecording()) {
        if (list.size() > 0) {
            rawData = list.remove(0);
            try {
                for (int i = 0; i < rawData.size; ++i) {
                    dataOutputStreamInstance.writeShort(rawData.pcm[i]);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        } else {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
    stop();
}
[/code]
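
The putData() call above is what hands each buffer to the writer thread's queue; it is not shown in the excerpt. Below is a minimal sketch of what it might look like, assuming the queue is the same `list` that thread 2 drains and that queue items carry the `pcm`/`size` fields seen above (the class name AudioData is my own, not from android-recorder):

[code lang="java"]
// Hypothetical sketch of PcmWriter.putData(); the real android-recorder
// implementation may differ. AudioData is a simple holder: short[] pcm; int size.
public void putData(short[] pcm, int size) {
    AudioData data = new AudioData();
    data.size = size;
    data.pcm = new short[size];
    System.arraycopy(pcm, 0, data.pcm, 0, size); // copy: the recorder reuses its buffer
    synchronized (list) {                        // guard the handoff to thread 2
        list.add(data);
    }
}
[/code]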
That looks like a lot of code, but the core of it is just:

[code lang="java"]
short[] tempBuffer = new short[bufferSize];
AudioRecord recordInstance = new AudioRecord(
        MediaRecorder.AudioSource.MIC, frequency,
        AudioFormat.CHANNEL_IN_MONO, audioEncoding, bufferSize);
recordInstance.startRecording();
while (this.isRecording) {
    bufferRead = recordInstance.read(tempBuffer, 0, bufferSize);
    pcmWriter.putData(tempBuffer, bufferRead);
}
recordInstance.stop();
[/code]
A few points here deserve explanation:

Class Overview

The AudioRecord class manages the audio resources for Java applications to record audio from the audio input hardware of the platform. This is achieved by “pulling” (reading) the data from the AudioRecord object. The application is responsible for polling the
AudioRecord object in time using one of the following three methods: read(byte[], int, int), read(short[], int, int) or read(ByteBuffer, int). The choice of which method to use will be based on the audio data storage format that is the most convenient for
the user of AudioRecord.

Upon creation, an AudioRecord object initializes its associated audio buffer that it will fill with the new audio data. The size of this buffer, specified during the construction, determines how long an AudioRecord can record before “over-running” data that
has not been read yet. Data should be read from the audio hardware in chunks of sizes inferior to the total recording buffer size.

In short: the AudioRecord class manages the platform's audio-input hardware on behalf of a Java application. You pull data out of it by calling one of the read() methods; which overload to use depends on the storage format you want for the samples. On creation, the AudioRecord allocates the audio buffer it will fill with incoming data; its size is specified in the constructor.

public AudioRecord (int audioSource, int sampleRateInHz, int channelConfig, int audioFormat, int bufferSizeInBytes)
Since: API Level 3. Class constructor.

Parameters

audioSource the recording source. See MediaRecorder.AudioSource for recording source definitions.

sampleRateInHz the sample rate expressed in Hertz. 44100Hz is currently the only rate that is guaranteed to work on all devices, but other rates such as 22050, 16000, and 11025 may work on some devices.

channelConfig describes the configuration of the audio channels. See CHANNEL_IN_MONO and CHANNEL_IN_STEREO. CHANNEL_IN_MONO is guaranteed to work on all devices.

audioFormat the format in which the audio data is represented. See ENCODING_PCM_16BIT and ENCODING_PCM_8BIT

bufferSizeInBytes the total size (in bytes) of the buffer where audio data is written to during the recording. New audio data can be read from this buffer in smaller chunks than this size. See getMinBufferSize(int, int, int) to determine the minimum required
buffer size for the successful creation of an AudioRecord instance. Using values smaller than getMinBufferSize() will result in an initialization failure.

Throws

IllegalArgumentException

Constructor parameters:

audioSource — the recording source; see MediaRecorder.AudioSource for the definitions:

int CAMCORDER Microphone audio source with same orientation as camera if available, the main device microphone otherwise

int DEFAULT

int MIC Microphone audio source

int VOICE_CALL Voice call uplink + downlink audio source

int VOICE_COMMUNICATION Microphone audio source tuned for voice communications such as VoIP.

int VOICE_DOWNLINK Voice call downlink (Rx) audio source

int VOICE_RECOGNITION Microphone audio source tuned for voice recognition if available, behaves like DEFAULT otherwise.

int VOICE_UPLINK Voice call uplink (Tx) audio source

sampleRateInHz — the sampling rate. 44100 Hz is currently the only rate guaranteed to work on all devices; others such as 22050, 16000, and 11025 may work on some devices.

audioFormat — the sample format, ENCODING_PCM_16BIT or ENCODING_PCM_8BIT; the former sounds clearer but produces more data.

bufferSizeInBytes — the size of the buffer that holds recorded audio data before it is read.
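
Putting the parameters together, a cautious way to construct an AudioRecord is to query getMinBufferSize() first and check both its return value and the resulting instance's state. A minimal sketch, where the 8000 Hz rate is just an example value suited to narrowband speech:

[code lang="java"]
int sampleRate = 8000;  // example value; narrowband speech works well with Speex
int minBufSize = AudioRecord.getMinBufferSize(sampleRate,
        AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
if (minBufSize == AudioRecord.ERROR || minBufSize == AudioRecord.ERROR_BAD_VALUE) {
    throw new IllegalStateException("unsupported recording parameters");
}
AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
        sampleRate, AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT, minBufSize);
if (recorder.getState() != AudioRecord.STATE_INITIALIZED) {
    throw new IllegalStateException("AudioRecord failed to initialize");
}
[/code]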

4 Playback

[code lang="java"]
AudioTrack mAudioTrack;
mAudioTrack = new AudioTrack(AudioManager.STREAM_MUSIC, mFrequency, mChannel,
        mSampBit, minBufSize, AudioTrack.MODE_STREAM);
// AudioTrack has two modes: MODE_STATIC and MODE_STREAM.
// STREAM means the application pushes data into the AudioTrack one write() at a
// time, much like sending data over a socket: the app obtains PCM data from
// somewhere (for example from a decoder) and then write()s it to the AudioTrack.
// The downside is the constant crossing between the Java layer and the native
// layer, which costs efficiency.
// STATIC means the audio data is placed into a fixed buffer once, at creation
// time, and handed to the AudioTrack; no further write() calls are needed, and
// the AudioTrack plays the buffer by itself.
// This suits short, latency-sensitive sounds such as ringtones that occupy
// little memory.
mAudioTrack.play();
[/code]
As the comments above note, the last parameter of the AudioTrack constructor selects between MODE_STATIC and MODE_STREAM: for playing a PCM file that fits in memory, MODE_STATIC is the usual choice; for a real-time buffer, use MODE_STREAM.

Class Overview

The AudioTrack class manages and plays a single audio resource for Java applications. It allows to stream PCM audio buffers to the audio hardware for playback. This is achieved by “pushing” the data to the AudioTrack object using one of the write(byte[], int,
int) and write(short[], int, int) methods.

In other words: the AudioTrack class manages and plays a single audio resource for a Java application. It supports streaming PCM buffers to the audio hardware; playback happens by pushing data into the AudioTrack object with write().

An AudioTrack instance can operate under two modes: static or streaming.

An AudioTrack instance can operate in one of two modes: static or streaming.

In Streaming mode, the application writes a continuous stream of data to the AudioTrack, using one of the write() methods. These are blocking and return when the data has been transferred from the Java layer to the native layer and queued for playback. The
streaming mode is most useful when playing blocks of audio data that for instance are:

In streaming mode, the application writes a continuous stream of data to the AudioTrack with one of the write() methods. Streaming mode is used when the audio is:

•too big to fit in memory because of the duration of the sound to play,

too long to fit in memory for the duration of the sound,

•too big to fit in memory because of the characteristics of the audio data (high sampling rate, bits per sample …)

too big to fit in memory because of the characteristics of the data (high sample rate, bits per sample, and so on), or

•received or generated while previously queued audio is playing.

received or generated in real time, for example audio arriving over the network.

The static mode is to be chosen when dealing with short sounds that fit in memory and that need to be played with the smallest latency possible. The static mode will therefore be preferred for UI and game sounds that are played often, and with the smallest
overhead possible.

Static mode is for short sounds that fit in memory and must play with the smallest possible latency; it is commonly used for UI and game sounds.

Upon creation, an AudioTrack object initializes its associated audio buffer. The size of this buffer, specified during the construction, determines how long an AudioTrack can play before running out of data.

For an AudioTrack using the static mode, this size is the maximum size of the sound that can be played from it.

For the streaming mode, data will be written to the hardware in chunks of sizes inferior to the total buffer size.

When an AudioTrack object is created, it initializes its associated audio buffer. The size given at construction determines, in static mode, the maximum sound that can be played from it, and in streaming mode, the granularity in which data is written to the hardware (each chunk smaller than the total buffer).
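
As a concrete illustration of streaming mode, the sketch below plays PCM shorts as they arrive. The 8000 Hz rate is an example value, and nextChunk() is a hypothetical source (a decoder or a network queue), not an Android API:

[code lang="java"]
int minBufSize = AudioTrack.getMinBufferSize(8000,
        AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);
AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, 8000,
        AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT,
        minBufSize, AudioTrack.MODE_STREAM);
track.play();
short[] chunk;
while ((chunk = nextChunk()) != null) {   // hypothetical PCM source
    track.write(chunk, 0, chunk.length);  // blocks until queued for playback
}
track.stop();
track.release();
[/code]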

5 Compressing audio with Speex via JNI

The official Speex documentation provides sample C code for compressing and decompressing audio.

Encoder code:

[code lang="c"]
#include <speex/speex.h>
#include <stdio.h>

/* The frame size is hardcoded for this sample code, but it doesn't have to be */
#define FRAME_SIZE 160

int main(int argc, char **argv)
{
    char *inFile;
    FILE *fin;
    short in[FRAME_SIZE];
    float input[FRAME_SIZE];
    char cbits[200];
    int nbBytes;
    /* Holds the state of the encoder */
    void *state;
    /* Holds bits so they can be read and written to by the Speex routines */
    SpeexBits bits;
    int i, tmp;

    /* Create a new encoder state in narrowband mode */
    state = speex_encoder_init(&speex_nb_mode);
    /* Set the quality to 8 (15 kbps) */
    tmp = 8;
    speex_encoder_ctl(state, SPEEX_SET_QUALITY, &tmp);

    inFile = argv[1];
    fin = fopen(inFile, "r");
    /* Initialization of the structure that holds the bits */
    speex_bits_init(&bits);

    while (1)
    {
        /* Read a 16 bits/sample audio frame */
        fread(in, sizeof(short), FRAME_SIZE, fin);
        if (feof(fin))
            break;
        /* Copy the 16-bit values to float so Speex can work on them */
        for (i = 0; i < FRAME_SIZE; i++)
            input[i] = in[i];
        /* Flush all the bits in the struct so we can encode a new frame */
        speex_bits_reset(&bits);
        /* Encode the frame */
        speex_encode(state, input, &bits);
        /* Copy the bits to an array of char that can be written */
        nbBytes = speex_bits_write(&bits, cbits, 200);
        /* Write the size of the frame first. This is what sampledec expects,
           but it's likely to be different in your own application */
        fwrite(&nbBytes, sizeof(int), 1, stdout);
        /* Write the compressed data */
        fwrite(cbits, 1, nbBytes, stdout);
    }

    /* Destroy the encoder state */
    speex_encoder_destroy(state);
    /* Destroy the bit-packing struct */
    speex_bits_destroy(&bits);
    fclose(fin);
    return 0;
}
[/code]
Decoder code:

[code lang="c"]
#include <speex/speex.h>
#include <stdio.h>

/* The frame size is hardcoded for this sample code, but it doesn't have to be */
#define FRAME_SIZE 160

int main(int argc, char **argv)
{
    char *outFile;
    FILE *fout;
    /* Holds the audio that will be written to file (16 bits per sample) */
    short out[FRAME_SIZE];
    /* Speex handles samples as float, so we need an array of floats */
    float output[FRAME_SIZE];
    char cbits[200];
    int nbBytes;
    /* Holds the state of the decoder */
    void *state;
    /* Holds bits so they can be read and written to by the Speex routines */
    SpeexBits bits;
    int i, tmp;

    /* Create a new decoder state in narrowband mode */
    state = speex_decoder_init(&speex_nb_mode);
    /* Set the perceptual enhancement on */
    tmp = 1;
    speex_decoder_ctl(state, SPEEX_SET_ENH, &tmp);

    outFile = argv[1];
    fout = fopen(outFile, "w");
    /* Initialization of the structure that holds the bits */
    speex_bits_init(&bits);

    while (1)
    {
        /* Read the size encoded by sampleenc; this part will likely be
           different in your application */
        fread(&nbBytes, sizeof(int), 1, stdin);
        fprintf(stderr, "nbBytes: %d\n", nbBytes);
        if (feof(stdin))
            break;
        /* Read the "packet" encoded by sampleenc */
        fread(cbits, 1, nbBytes, stdin);
        /* Copy the data into the bit-stream struct */
        speex_bits_read_from(&bits, cbits, nbBytes);
        /* Decode the data */
        speex_decode(state, &bits, output);
        /* Copy from float to short (16 bits) for output */
        for (i = 0; i < FRAME_SIZE; i++)
            out[i] = output[i];
        /* Write the decoded audio to file */
        fwrite(out, sizeof(short), FRAME_SIZE, fout);
    }

    /* Destroy the decoder state */
    speex_decoder_destroy(state);
    /* Destroy the bit-stream struct */
    speex_bits_destroy(&bits);
    fclose(fout);
    return 0;
}
[/code]
With these samples as a guide, we can bundle the Speex library via JNI, write the corresponding Android.mk, and then implement the JNI encode and decode functions.

The code below is taken from android-recorder-6:

[code lang="c"]
#include <jni.h>
#include <speex/speex.h>

static int codec_open = 0;
static int dec_frame_size;
static int enc_frame_size;
static SpeexBits ebits, dbits;
void *enc_state;
void *dec_state;
static JavaVM *gJavaVM;

// Open the codec
extern "C"
JNIEXPORT jint JNICALL Java_com_ryong21_encode_Speex_open
        (JNIEnv *env, jobject obj, jint compression) {
    int tmp;
    if (codec_open++ != 0)
        return (jint)0;
    speex_bits_init(&ebits);
    speex_bits_init(&dbits);
    enc_state = speex_encoder_init(&speex_nb_mode);
    dec_state = speex_decoder_init(&speex_nb_mode);
    tmp = compression;
    speex_encoder_ctl(enc_state, SPEEX_SET_QUALITY, &tmp);
    speex_encoder_ctl(enc_state, SPEEX_GET_FRAME_SIZE, &enc_frame_size);
    speex_decoder_ctl(dec_state, SPEEX_GET_FRAME_SIZE, &dec_frame_size);
    return (jint)0;
}

// Encode implementation
extern "C"
JNIEXPORT jint JNICALL Java_com_ryong21_encode_Speex_encode
        (JNIEnv *env, jobject obj, jshortArray lin, jint offset, jbyteArray encoded, jint size) {
    jshort buffer[enc_frame_size];
    jbyte output_buffer[enc_frame_size];
    int nsamples = (size - 1) / enc_frame_size + 1;
    int i, tot_bytes = 0;
    if (!codec_open)
        return 0;
    speex_bits_reset(&ebits);
    for (i = 0; i < nsamples; i++) {
        env->GetShortArrayRegion(lin, offset + i * enc_frame_size, enc_frame_size, buffer);
        speex_encode_int(enc_state, buffer, &ebits);
    }
    //env->GetShortArrayRegion(lin, offset, enc_frame_size, buffer);
    //speex_encode_int(enc_state, buffer, &ebits);
    tot_bytes = speex_bits_write(&ebits, (char *)output_buffer, enc_frame_size);
    env->SetByteArrayRegion(encoded, 0, tot_bytes, output_buffer);
    return (jint)tot_bytes;
}

// Decode implementation
extern "C"
JNIEXPORT jint JNICALL Java_com_ryong21_encode_Speex_decode
        (JNIEnv *env, jobject obj, jbyteArray encoded, jshortArray lin, jint size) {
    jbyte buffer[dec_frame_size];
    jshort output_buffer[dec_frame_size];
    jsize encoded_length = size;
    if (!codec_open)
        return 0;
    env->GetByteArrayRegion(encoded, 0, encoded_length, buffer);
    speex_bits_read_from(&dbits, (char *)buffer, encoded_length);
    speex_decode_int(dec_state, &dbits, output_buffer);
    env->SetShortArrayRegion(lin, 0, dec_frame_size, output_buffer);
    return (jint)dec_frame_size;
}

// Get the frame size
extern "C"
JNIEXPORT jint JNICALL Java_com_ryong21_encode_Speex_getFrameSize
        (JNIEnv *env, jobject obj) {
    if (!codec_open)
        return 0;
    return (jint)enc_frame_size;
}

// Close the codec
extern "C"
JNIEXPORT void JNICALL Java_com_ryong21_encode_Speex_close
        (JNIEnv *env, jobject obj) {
    if (--codec_open != 0)
        return;
    speex_bits_destroy(&ebits);
    speex_bits_destroy(&dbits);
    speex_decoder_destroy(dec_state);
    speex_encoder_destroy(enc_state);
}
[/code]

The Java-side JNI wrapper looks like this:

[code lang="java"]
class Speex {
    /* quality
     * 1 : 4kbps (very noticeable artifacts, usually intelligible)
     * 2 : 6kbps (very noticeable artifacts, good intelligibility)
     * 4 : 8kbps (noticeable artifacts sometimes)
     * 6 : 11kbps (artifacts usually only noticeable with headphones)
     * 8 : 15kbps (artifacts not usually noticeable)
     */
    private static final int DEFAULT_COMPRESSION = 8;
    private Logger log = LoggerFactory.getLogger(Speex.class);

    Speex() {
    }

    public void init() {
        load();
        open(DEFAULT_COMPRESSION);
        log.info("speex opened");
    }

    private void load() {
        try {
            System.loadLibrary("speex");
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }

    public native int open(int compression);
    public native int getFrameSize();
    public native int decode(byte encoded[], short lin[], int size);
    public native int encode(short lin[], int offset, byte encoded[], int size);
    public native void close();
}
[/code]
We can then use this class directly in application code.
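
For example, round-tripping one recorded frame through the wrapper might look like this (a sketch; the buffer sizes are illustrative):

[code lang="java"]
Speex speex = new Speex();
speex.init();                                  // loads libspeex.so and opens the codec
int frameSize = speex.getFrameSize();          // 160 samples in narrowband mode
short[] pcmFrame = new short[frameSize];       // filled from AudioRecord.read()
byte[] encoded = new byte[frameSize];          // holds one compressed frame
int encodedBytes = speex.encode(pcmFrame, 0, encoded, frameSize);
// ... send the first encodedBytes bytes of 'encoded' over the network ...
short[] decoded = new short[frameSize];
speex.decode(encoded, decoded, encodedBytes);  // decoded PCM, ready for AudioTrack
speex.close();
[/code]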

6 TCP transmission

For the TCP leg, you can reuse the server frameworks from earlier posts on this blog; see "Android多客户端TCP Server框架代码" (a multi-client TCP server framework for Android) or "Android中TCP Server的模型代码" (a model TCP server for Android).
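
Whatever server framework is used, encoded frames need a framing convention on the wire, since Speex frames vary in size. A common choice, mirroring the sampleenc example above, is a length prefix. A minimal sketch (FrameSender is my own name; host and port are placeholders):

[code lang="java"]
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;

class FrameSender {
    private final DataOutputStream out;

    FrameSender(String host, int port) throws IOException {
        // Placeholder address; replace with the real server.
        out = new DataOutputStream(new Socket(host, port).getOutputStream());
    }

    // Send one length-prefixed Speex frame.
    void sendFrame(byte[] encoded, int encodedBytes) throws IOException {
        out.writeInt(encodedBytes);           // 4-byte length prefix
        out.write(encoded, 0, encodedBytes);  // the compressed payload
        out.flush();
    }
}
[/code]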

7 Receiving and playback

On the receiving end, playback can use AudioTrack or another approach; see section 4 for the code.
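
The receiving steps chain together: read one length-prefixed frame from the socket, decode it with the Speex wrapper, and write the PCM to a MODE_STREAM AudioTrack. A sketch, assuming 'speex' and 'track' are initialized as in sections 5 and 4 (FrameReceiver is my own name):

[code lang="java"]
import java.io.DataInputStream;
import java.io.IOException;
import android.media.AudioTrack;

class FrameReceiver {
    // Sketch: read length-prefixed frames, decode, and queue for playback.
    void playLoop(DataInputStream in, Speex speex, AudioTrack track) throws IOException {
        short[] pcm = new short[speex.getFrameSize()];
        byte[] encoded = new byte[1024];      // generous upper bound for one frame
        track.play();
        while (true) {
            int frameLen = in.readInt();      // 4-byte length prefix from the sender
            in.readFully(encoded, 0, frameLen);
            int samples = speex.decode(encoded, pcm, frameLen);
            track.write(pcm, 0, samples);     // blocks until queued for playback
        }
    }
}
[/code]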

8 Closing remarks

The code in this post was learned from android-recorder, and parts of the implementation are excerpted from it; consider this a summary of my audio work, with thanks to the original open-source author. As for online voice chat, implementations generally relay through a server such as red5, over UDP in some places and TCP in others; which to use is a matter of judgment and of the specific business logic.

9 Code download

I'm attaching the latest code from Google Code; it is worth reading. Although it converts the audio to FLV for playback, that does not stop us from doing our own packaging.

android-recorder-5.1

10 References

1 http://www.eoeandroid.com/thread-101317-1-1.html
2 http://code.google.com/p/android-recorder/
3 http://code.google.com/p/spydroid-ipcamera/
4 https://github.com/pdwryst/SoundSwap/tree/c87e7a2d1c63589f928e653f467c6801bbbb1d20
5 https://github.com/RaabsIn513/RecordAudio
6 https://github.com/trashmaxx/AudioRecorder
7 http://hi.baidu.com/lzhts/blog/item/6773d3ca165b7a49f31fe744.html
8 http://blog.csdn.net/hellogv/article/details/6026455
9 http://www.speex.org/
10 http://www.eoeandroid.com/thread-70014-1-1.html
Reposted from: http://blog.jouhu.com/?p=2442