Driving Audio with Rust - Playback and Recording

rust-esp32 - This article is part of a series.

§ 17: This article

Audio formats

PCM: pulse-code modulation (PCM) is a method of digitizing analog signals. A PCM stream has two basic properties that determine the stream's fidelity to the original analog signal:

the sampling rate, which is the number of times per second that samples are taken;
and the bit depth, which determines the number of possible digital values that can be used to represent each sample.

The compact disc (CD) brought PCM to consumer audio applications with its introduction in 1982. The CD uses a 44,100 Hz sampling frequency and 16-bit resolution and stores up to 80 minutes of stereo audio per disc. Stereo audio is carried as two channels.

The audio contained in a CD-DA consists of two-channel signed 16-bit LPCM sampled at 44,100 Hz and written as a little-endian interleaved stream with the left channel coming first.

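About LPCM: linear pulse-code modulation (LPCM) is a way of representing signals digitally, used mainly for audio. It works by sampling the analog signal at regular intervals and quantizing each sample to one of a set of linearly spaced digital values. LPCM is a form of pulse-code modulation (PCM) in which the quantization is specifically linear: each sample of the analog signal is converted directly to the corresponding digital value, with no non-linear companding involved. The key steps of LPCM are sampling, quantization, and encoding: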

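**Sampling**: converting the continuous analog signal into a discrete one. According to the Nyquist theorem, the sampling frequency should be at least twice the highest frequency in the signal to avoid aliasing. CD audio, for example, is sampled at 44.1 kHz, so it can reproduce frequencies up to about 22.05 kHz, which covers the range of human hearing.
**Quantization**: approximating the amplitude of each sample with one of a finite set of values. In LPCM this step is linear, meaning the dynamic range of the analog signal is divided evenly among the quantization levels. Precision is usually stated in bits: CD-quality LPCM uses 16-bit quantization, giving 65536 (2^16) possible amplitude levels.
**Encoding**: finally, the quantized values are encoded as a digital signal that can be stored or transmitted. In LPCM these values represent the signal amplitude directly, with no further compression or coding applied.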
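The quantization step can be sketched in a few lines of C. This is only an illustration: the function names are made up for this example, and the input is assumed to be a floating-point sample already normalized to the range [-1.0, 1.0].

```c
#include <stdint.h>

/* Number of distinct amplitude levels for a bit depth (16 -> 65536).
 * Only valid for bit_depth < 32. */
static uint32_t quantization_levels(unsigned bit_depth)
{
    return (uint32_t)1u << bit_depth;
}

/* Linearly quantize a normalized sample to signed 16-bit LPCM.
 * Out-of-range input is clamped, then rounded to the nearest level. */
static int16_t quantize_s16(float sample)
{
    if (sample > 1.0f) {
        sample = 1.0f;
    }
    if (sample < -1.0f) {
        sample = -1.0f;
    }
    float scaled = sample * 32767.0f;
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}
```

Scaling by 32767 rather than 32768 keeps the positive and negative full-scale values symmetric; other conventions exist.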

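LPCM is a lossless audio format in the sense that no information is lost to compression (although sampling and quantization themselves approximate the original analog signal to some degree). For this reason LPCM is widely used where high audio quality is required, such as CD audio, DVD audio, Blu-ray audio, and some professional recording systems.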

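The main advantages of LPCM are a simple, direct, high-quality representation of audio; its drawback is a relatively high data rate. For example, uncompressed CD-quality audio (stereo LPCM at a 44.1 kHz sample rate and 16-bit depth) has a data rate of about 1.4 Mbps. By contrast, many modern audio compression schemes such as MP3 or AAC greatly reduce the data rate by discarding audio information that the human ear can hardly perceive, but that compression is lossy.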
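The 1.4 Mbps figure follows directly from sample rate × bit depth × channel count. A small sketch of that arithmetic, with helper names invented for this example:

```c
#include <stdint.h>

/* Bits per second of an uncompressed LPCM stream. */
static uint32_t lpcm_bit_rate(uint32_t sample_rate_hz,
                              uint16_t bits_per_sample,
                              uint16_t channels)
{
    return sample_rate_hz * bits_per_sample * channels;
}

/* Bytes needed to hold `seconds` of uncompressed LPCM audio. */
static uint64_t lpcm_bytes(uint32_t sample_rate_hz,
                           uint16_t bits_per_sample,
                           uint16_t channels,
                           uint32_t seconds)
{
    uint64_t bytes_per_second =
        lpcm_bit_rate(sample_rate_hz, bits_per_sample, channels) / 8u;
    return bytes_per_second * seconds;
}
```

For CD audio, `lpcm_bit_rate(44100, 16, 2)` is 1,411,200 bits per second, so one minute takes roughly 10 MB.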

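Stereo and multichannel LPCM:

In a stereo LPCM stream, the left and right channel samples are usually stored interleaved. A typical sequence is L1, R1, L2, R2, ..., Ln, Rn, where L and R are the left and right channel samples and n is the sample index.
In a multichannel LPCM stream the samples can be organized in different ways. The most common is interleaving, where the samples of every channel are stored one after another for each sampling instant, for example L1, C1, R1, LS1, RS1, L2, C2, R2, LS2, RS2, ..., where L, C, R, LS and RS are the front-left, center, front-right, rear-left and rear-right channels.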
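As a minimal sketch of the stereo layout described above, the two helpers below convert between separate per-channel buffers and the interleaved L1, R1, L2, R2, ... order. The function names are illustrative only, and 16-bit samples are assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Merge separate left/right buffers into interleaved L R L R ... order.
 * `out` must have room for 2 * frames samples. */
static void interleave_stereo(const int16_t *left, const int16_t *right,
                              int16_t *out, size_t frames)
{
    for (size_t i = 0; i < frames; ++i) {
        out[2 * i] = left[i];
        out[2 * i + 1] = right[i];
    }
}

/* Split an interleaved stereo buffer back into per-channel buffers. */
static void deinterleave_stereo(const int16_t *in, int16_t *left,
                                int16_t *right, size_t frames)
{
    for (size_t i = 0; i < frames; ++i) {
        left[i] = in[2 * i];
        right[i] = in[2 * i + 1];
    }
}
```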

Reference: https://planethifi.com/pcm-audio/

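WAV (Waveform Audio File Format) is an audio file format that is usually used to store uncompressed audio data, in most cases encoded as linear pulse-code modulation (LPCM). WAV was developed by Microsoft and IBM, originally for the Windows 3.1 operating system. Because it is lossless and widely compatible, WAV became a popular choice for storing high-quality audio.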

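Summary of how WAV files relate to LPCM:

**Stores LPCM data**: the WAV format is commonly used to store LPCM-encoded audio. A WAV file holds the audio exactly as it was sampled, quantized, and encoded, so no detail of the digitized signal is lost.
**Lossless audio format**: because LPCM is a lossless encoding, a WAV file that uses LPCM is lossless too. That makes WAV well suited to work that needs high-quality audio, such as professional music production, audio editing, and audio analysis.
**High data rate**: LPCM audio is uncompressed, so WAV files usually have a high data rate. Standard CD-quality audio (44.1 kHz sample rate, 16-bit depth, stereo) runs at about 1.4 Mbps, which means WAV files can become quite large, especially for long recordings.
**Widely used**: thanks to being simple, lossless, and high quality, WAV is used in many applications, particularly where the original audio quality matters, such as music production, film post-production, broadcasting, and scientific research.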

In short: a wav file needs no decoding. The LPCM data can be read directly and then sent over the I2S interface to the amplifier chip for playback.

// https://github.com/espressif/esp-box/blob/master/examples/watering_demo/main/app/app_audio.c

static void audio_beep_task(void *pvParam)
{
    while (true) {
        xSemaphoreTake(audio_sem, portMAX_DELAY);
        b_audio_playing = true;
        sr_echo_play("/spiffs/echo_en_wake.wav"); // play the wav file's audio data directly
        b_audio_playing = false;

        /* It's useful if wake audio didn't finish playing when next wake word detected */
        // xSemaphoreTake(audio_sem, 0);
    }
}

esp_err_t sr_echo_play(void *filepath)
{
    FILE *fp = NULL;
    struct stat file_stat;
    esp_err_t ret = ESP_OK;

    const size_t chunk_size = 4096;
    uint8_t *buffer = malloc(chunk_size);
    ESP_GOTO_ON_FALSE(NULL != buffer, ESP_FAIL, EXIT, TAG, "buffer malloc failed");

    ESP_GOTO_ON_FALSE(-1 != stat(filepath, &file_stat), ESP_FAIL, EXIT, TAG, "Failed to stat file");

    fp = fopen(filepath, "r");
    ESP_GOTO_ON_FALSE(NULL != fp, ESP_FAIL, EXIT, TAG, "Failed create record file");

    wav_header_t wav_head;
    int len = fread(&wav_head, 1, sizeof(wav_header_t), fp);
    ESP_GOTO_ON_FALSE(len > 0, ESP_FAIL, EXIT, TAG, "Read wav header failed");

    if (NULL == strstr((char *)wav_head.Subchunk1ID, "fmt") &&
            NULL == strstr((char *)wav_head.Subchunk2ID, "data")) {
        ESP_LOGI(TAG, "PCM format");
        fseek(fp, 0, SEEK_SET);
        wav_head.SampleRate = 16000;
        wav_head.NumChannels = 2;
        wav_head.BitsPerSample = 16;
    }

    ESP_LOGD(TAG, "frame_rate= %" PRIi32 ", ch=%d, width=%d", wav_head.SampleRate, wav_head.NumChannels, wav_head.BitsPerSample);
    bsp_codec_set_fs(wav_head.SampleRate, wav_head.BitsPerSample, I2S_SLOT_MODE_STEREO);

    bsp_codec_mute_set(true);
    bsp_codec_mute_set(false);
    bsp_codec_volume_set(100, NULL);

    size_t cnt, total_cnt = 0;
    do {
        /* Read file in chunks into the scratch buffer */
        len = fread(buffer, 1, chunk_size, fp);
        if (len <= 0) {
            break;
        } else if (len > 0) {
            bsp_i2s_write(buffer, len, &cnt, portMAX_DELAY);
            total_cnt += cnt;
        }
    } while (1);
    ESP_LOGI(TAG, "play end, %u K", (unsigned)(total_cnt / 1024)); /* total_cnt is a size_t, so cast it to match %u */

EXIT:
    if (fp) {
        fclose(fp);
    }
    if (buffer) {
        free(buffer);
    }
    return ret;
}
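The listing above reads the file header into a `wav_header_t` whose definition is not shown. As a rough sketch of what such a header contains, the code below parses the canonical 44-byte RIFF/WAVE layout from a byte buffer. The struct and function names are hypothetical, and the sketch assumes the `fmt ` and `data` chunks sit at their canonical offsets with no extra chunks in between, which real files do not always honor.

```c
#include <stdint.h>
#include <string.h>

/* Fields of interest from a canonical 44-byte WAV header (hypothetical type). */
typedef struct {
    uint16_t audio_format;     /* 1 = linear PCM */
    uint16_t num_channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
    uint32_t data_size;        /* bytes of sample data after the header */
} wav_info_t;

/* WAV stores multi-byte fields little-endian. */
static uint16_t rd_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is not a canonical WAV header. */
static int wav_parse_header(const uint8_t *hdr, wav_info_t *out)
{
    if (memcmp(hdr, "RIFF", 4) != 0 || memcmp(hdr + 8, "WAVE", 4) != 0 ||
        memcmp(hdr + 12, "fmt ", 4) != 0 || memcmp(hdr + 36, "data", 4) != 0) {
        return -1;
    }
    out->audio_format = rd_le16(hdr + 20);
    out->num_channels = rd_le16(hdr + 22);
    out->sample_rate = rd_le32(hdr + 24);
    out->bits_per_sample = rd_le16(hdr + 34);
    out->data_size = rd_le32(hdr + 40);
    return 0;
}
```

When the tags are missing, a caller can fall back to treating the file as header-less PCM with assumed parameters, much as `sr_echo_play` does.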

There are three major groups of audio file formats:

Uncompressed audio formats, such as WAV, AIFF, AU or raw header-less PCM. Note that WAV can also hold compressed audio.
Formats with lossless compression, such as FLAC, Monkey's Audio (filename extension .ape), WavPack (filename extension .wv), TTA, ATRAC Advanced Lossless, ALAC (filename extension .m4a, Apple Lossless), MPEG-4 SLS, MPEG-4 ALS, MPEG-4 DST, Windows Media Audio Lossless (WMA Lossless), and Shorten (SHN).
Formats with lossy compression, such as Opus, MP3, Vorbis, Musepack, AAC, ATRAC and Windows Media Audio Lossy (WMA lossy).

.m4a An audio-only MPEG-4 file, used by Apple for unprotected music downloaded from their iTunes Music Store. Audio within the m4a file is typically encoded with AAC, although lossless ALAC may also be used.

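Storing audio in files:

WAV (Waveform Audio File Format):
**Universal support**: WAV is one of the most widely supported audio file formats. It was developed by Microsoft and IBM and natively supports LPCM audio streams.
**Lossless quality**: a WAV file can store LPCM audio data losslessly, preserving the original audio quality.
**Metadata support**: the WAV format stores details about the audio stream, such as sample rate, bit depth, and channel count.
**File size**: because WAV files usually are not compressed, they can be very large, especially for audio with a high sample rate, high bit depth, or many channels.
AIFF (Audio Interchange File Format)
**Similar to WAV**: AIFF is an audio file format developed by Apple. It is very similar to WAV, offering lossless audio quality and broad metadata support.
**Cross-platform**: although AIFF was originally designed for Macintosh systems, it is now supported on many platforms.
**File size**: like WAV, AIFF files can be quite large, particularly when storing high-quality multichannel LPCM audio.
FLAC (Free Lossless Audio Codec)
**Lossless compression**: FLAC compresses losslessly, shrinking files without any loss of audio quality, and is well suited to LPCM audio data.
**Tags and metadata**: FLAC supports rich tags and metadata, which helps music management and player identification.
**Broad support**: although mostly used for stereo audio, FLAC supports up to 8 channels, which makes it suitable for storing multichannel LPCM streams.
Multichannel WAV/RF64
**Large files**: to get past the 4 GB size limit of WAV files, extended formats such as RF64 were designed to support larger files, which suits long, high-quality multichannel recordings.
**Broad compatibility**: these formats stay backward compatible with standard WAV while extending it to handle larger amounts of data.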

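Storage workflow: storing a multichannel LPCM audio stream usually involves the following steps:

**Choose a format**: pick an audio file format based on the number of channels to support, the audio quality required, and file size constraints.
**Prepare the audio data**: organize the LPCM audio data the way the chosen format requires (channel order, sample rate, bit depth, and so on).
**Write the file**: write the audio data to the file together with the required metadata (such as the format header).
**Verify**: check that the written file conforms to the chosen format's specification and can be read correctly by the target player or editing software.

With suitable audio editing or encoding software, multichannel LPCM streams can be saved to any of these formats, either through a graphical user interface or programmatically.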

I2S interface and audio playback

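Typically a voice prompt is about 5 KB as MP3, while the same prompt as uncompressed wav is around 60 KB. With 2 MB of flash set aside for voice prompts, far more MP3 files fit than WAV files.

wav stores uncompressed PCM data, which can be sent directly over the I2S interface to a digital audio chip for playback.

Other formats such as MP3 must first be decoded to PCM, in software or hardware, before they can be sent over the I2S digital audio interface to the amplifier chip.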

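Configure the WM8978 module over I2C
Initialize the ESP32's I2S interface
Allocate a data buffer larger than 4096 bytes
Read one sector (4096 bytes) from flash
Convert it into the bitstream form the decoder expects (for example, for the open-source mad MP3 decoding library)
Start MP3 decoding
Once the 4096 bytes are decoded, send the PCM data to the WM8978 module over I2S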
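The steps above boil down to a read, decode, write loop. The sketch below shows only that control flow on a host machine: the callback types, `play_stream`, and the demo plumbing are all invented for this example and do not belong to any real decoder or I2S API. The "decoder" is a pass-through, which is what WAV playback amounts to. A real MP3 decoder would also need a larger PCM buffer (compressed frames expand) and would carry undecoded bytes over between chunks.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 4096  /* one flash sector, as in the steps above */

/* Hypothetical callback types: none of these come from a real library. */
typedef size_t (*read_fn)(uint8_t *dst, size_t max, void *ctx);
typedef size_t (*decode_fn)(const uint8_t *in, size_t in_len,
                            uint8_t *pcm, size_t pcm_cap, void *ctx);
typedef void (*write_fn)(const uint8_t *pcm, size_t len, void *ctx);

/* Read a chunk, decode it to PCM, hand the PCM to the sink; repeat until
 * the source is empty. Returns the total number of PCM bytes written. */
static size_t play_stream(read_fn rd, decode_fn dec, write_fn wr, void *ctx)
{
    static uint8_t in_buf[CHUNK_SIZE];
    static uint8_t pcm_buf[CHUNK_SIZE];
    size_t total = 0;

    for (;;) {
        size_t n = rd(in_buf, sizeof in_buf, ctx);
        if (n == 0) {
            break;                      /* end of input */
        }
        size_t pcm_len = dec(in_buf, n, pcm_buf, sizeof pcm_buf, ctx);
        wr(pcm_buf, pcm_len, ctx);      /* on hardware: an I2S write */
        total += pcm_len;
    }
    return total;
}

/* Demo plumbing: in-memory source, pass-through decoder, counting sink. */
typedef struct {
    const uint8_t *data;
    size_t len;
    size_t pos;
    size_t written;
} demo_ctx_t;

static size_t demo_read(uint8_t *dst, size_t max, void *ctx)
{
    demo_ctx_t *c = (demo_ctx_t *)ctx;
    size_t n = c->len - c->pos;
    if (n > max) {
        n = max;
    }
    memcpy(dst, c->data + c->pos, n);
    c->pos += n;
    return n;
}

static size_t passthrough_decode(const uint8_t *in, size_t in_len,
                                 uint8_t *pcm, size_t pcm_cap, void *ctx)
{
    (void)ctx;
    size_t n = in_len < pcm_cap ? in_len : pcm_cap;
    memcpy(pcm, in, n);
    return n;
}

static void demo_write(const uint8_t *pcm, size_t len, void *ctx)
{
    (void)pcm;
    ((demo_ctx_t *)ctx)->written += len;
}
```

Keeping the three stages behind callbacks means the same loop works whether the source is flash or an SD card and whether the decoder is MP3, AAC, or none at all.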

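In summary:

Before an ESP32 can play an mp3 file, the file must be decoded; the decoder outputs PCM:
- The open-source MAD (MPEG Audio Decoder) MP3 decoding library: https://www.underbit.com/products/mad/
- The libhelix-mp3 decoding library used by esp-audio-player on the ESP32 Box S3: https://github.com/ultraembedded/libhelix-mp3/tree/master
- The open-source ESP32-audioI2S: https://github.com/schreibfaul1/ESP32-audioI2S
  - It can decode and play mp3, m4a and wav files from an SD card via I2S; HELIX mp3 and aac decoders are included. There is also an OPUS decoder for fullband, a VORBIS decoder (.ogg format) and a FLAC decoder.
The decoded PCM data is then sent over the I2S interface to the digital audio amplifier chip (codec chip);
The amplifier chip performs the D/A conversion and drives the speaker;
For codec chips with MIC input, the driver also reads the post-ADC PCM audio data over I2S and processes it further, for example saving it directly as an unencoded wav file, compressing and encoding it into another format such as mp3 or aac for storage on a TF card, or sending it back to the codec chip for playback;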

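Note: I2S is a transport protocol for digital audio signals (not necessarily a physical connector), while PCM is an encoding of digital audio that a DAC can convert directly into an analog signal.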

An all-in-one decode-and-play example with ESP32-audioI2S: https://github.com/schreibfaul1/ESP32-audioI2S

A real project: https://github.com/Makerfabs/Project_MakePython_Audio_Music

// https://github.com/schreibfaul1/ESP32-audioI2S
#include "Arduino.h"
#include "WiFi.h"
#include "Audio.h"
#include "SD.h"
#include "FS.h"

// Digital I/O used
#define SD_CS          5
#define SPI_MOSI      23
#define SPI_MISO      19
#define SPI_SCK       18
#define I2S_DOUT      25
#define I2S_BCLK      27
#define I2S_LRC       26

Audio audio;

String ssid =     "*******";
String password = "*******";

void setup() {
    pinMode(SD_CS, OUTPUT);      digitalWrite(SD_CS, HIGH);
    SPI.begin(SPI_SCK, SPI_MISO, SPI_MOSI);
    Serial.begin(115200);
    SD.begin(SD_CS);
    WiFi.disconnect();
    WiFi.mode(WIFI_STA);
    WiFi.begin(ssid.c_str(), password.c_str());
    while (WiFi.status() != WL_CONNECTED) delay(1500);
    audio.setPinout(I2S_BCLK, I2S_LRC, I2S_DOUT);
    audio.setVolume(21); // default 0...21
//  or alternative
//  audio.setVolumeSteps(64); // max 255
//  audio.setVolume(63);
//
//  *** radio streams ***
    audio.connecttohost("http://stream.antennethueringen.de/live/aac-64/stream.antennethueringen.de/"); // aac
//  audio.connecttohost("http://mcrscast.mcr.iol.pt/cidadefm");                                         // mp3
//  audio.connecttohost("http://www.wdr.de/wdrlive/media/einslive.m3u");                                // m3u
//  audio.connecttohost("https://stream.srg-ssr.ch/rsp/aacp_48.asx");                                   // asx
//  audio.connecttohost("http://tuner.classical102.com/listen.pls");                                    // pls
//  audio.connecttohost("http://stream.radioparadise.com/flac");                                        // flac
//  audio.connecttohost("http://stream.sing-sing-bis.org:8000/singsingFlac");                           // flac (ogg)
//  audio.connecttohost("http://s1.knixx.fm:5347/dein_webradio_vbr.opus");                              // opus (ogg)
//  audio.connecttohost("http://stream2.dancewave.online:8080/dance.ogg");                              // vorbis (ogg)
//  audio.connecttohost("http://26373.live.streamtheworld.com:3690/XHQQ_FMAAC/HLSTS/playlist.m3u8");    // HLS
//  audio.connecttohost("http://eldoradolive02.akamaized.net/hls/live/2043453/eldorado/master.m3u8");   // HLS (ts)
//  *** web files ***
//  audio.connecttohost("https://github.com/schreibfaul1/ESP32-audioI2S/raw/master/additional_info/Testfiles/Pink-Panther.wav");        // wav
//  audio.connecttohost("https://github.com/schreibfaul1/ESP32-audioI2S/raw/master/additional_info/Testfiles/Santiano-Wellerman.flac"); // flac
//  audio.connecttohost("https://github.com/schreibfaul1/ESP32-audioI2S/raw/master/additional_info/Testfiles/Olsen-Banden.mp3");        // mp3
//  audio.connecttohost("https://github.com/schreibfaul1/ESP32-audioI2S/raw/master/additional_info/Testfiles/Miss-Marple.m4a");         // m4a (aac)
//  audio.connecttohost("https://github.com/schreibfaul1/ESP32-audioI2S/raw/master/additional_info/Testfiles/Collide.ogg");             // vorbis
//  audio.connecttohost("https://github.com/schreibfaul1/ESP32-audioI2S/raw/master/additional_info/Testfiles/sample.opus");             // opus
//  *** local files ***
//  audio.connecttoFS(SD, "/test.wav");     // SD
//  audio.connecttoFS(SD_MMC, "/test.wav"); // SD_MMC
//  audio.connecttoFS(SPIFFS, "/test.wav"); // SPIFFS

//  audio.connecttospeech("Wenn die Hunde schlafen, kann der Wolf gut Schafe stehlen.", "de"); // Google TTS
}

void loop()
{
    audio.loop();
}

// optional
void audio_info(const char *info){
    Serial.print("info        "); Serial.println(info);
}
void audio_id3data(const char *info){  //id3 metadata
    Serial.print("id3data     ");Serial.println(info);
}
void audio_eof_mp3(const char *info){  //end of file
    Serial.print("eof_mp3     ");Serial.println(info);
}
void audio_showstation(const char *info){
    Serial.print("station     ");Serial.println(info);
}
void audio_showstreamtitle(const char *info){
    Serial.print("streamtitle ");Serial.println(info);
}
void audio_bitrate(const char *info){
    Serial.print("bitrate     ");Serial.println(info);
}
void audio_commercial(const char *info){  //duration in sec
    Serial.print("commercial  ");Serial.println(info);
}
void audio_icyurl(const char *info){  //homepage
    Serial.print("icyurl      ");Serial.println(info);
}
void audio_lasthost(const char *info){  //stream URL played
    Serial.print("lasthost    ");Serial.println(info);
}
void audio_eof_speech(const char *info){
    Serial.print("eof_speech  ");Serial.println(info);
}

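A digital audio amplifier chip is also commonly called a codec chip:

It decodes PCM digital audio and converts it through a DAC into an analog output signal;
It converts the analog sound signal picked up by the MIC through an ADC and encodes it as a PCM bitstream;
In both directions the driver sends and receives the PCM data over the I2S interface;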

The WM8960 is a low-power, high-quality stereo codec that provides both audio input and audio output. The ESP32 communicates with the WM8960 over I2S.

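Besides playing PCM-encoded digital audio, a typical I2S codec chip also offers control functions (mute, volume, and so on) and MIC input; the ES8374 is one example.

The codec chip digitizes the MIC signal with its ADC into PCM data. The driver reads that data over I2S for further processing, such as encoding it and saving it to a TF card, or playing it back.

Example: https://github.com/espressif/esp-box/blob/master/examples/usb_headset/main/src/usb_headset.c

If better audio quality and more interface options are needed, an external I2S codec can handle all analog input and output signal processing. Different codec chips offer different extra features, such as input preamplifiers, headphone output amplifiers, multiple analog inputs and outputs, and audio effects. I2S is the industry-standard interface for audio codec chips and is typically used for high-speed, continuous transfer of audio data. Optimizing audio processing performance may require additional memory; in that case consider the ESP32-WROVER-E module, which combines an ESP32 chip with 8 MB of PSRAM.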

https://docs.espressif.com/projects/esp-adf/en/latest/design-guide/project-design.html

Espressif provides the Espressif Audio Development Framework (ESP-ADF) for the ESP32, which supports common codec formats: https://docs.espressif.com/projects/esp-adf/en/latest/index.html

I (397) PLAY_FLASH_MP3_CONTROL: [ 1 ] Start audio codec chip
I (427) PLAY_FLASH_MP3_CONTROL: [ 2 ] Create audio pipeline, add all elements to pipeline, and subscribe pipeline event
I (427) PLAY_FLASH_MP3_CONTROL: [2.1] Create mp3 decoder to decode mp3 file and set custom read callback
I (437) PLAY_FLASH_MP3_CONTROL: [2.2] Create i2s stream to write data to codec chip
I (467) PLAY_FLASH_MP3_CONTROL: [2.3] Register all elements to audio pipeline
I (467) PLAY_FLASH_MP3_CONTROL: [2.4] Link it together [mp3_music_read_cb]-->mp3_decoder-->i2s_stream-->[codec_chip]
I (477) PLAY_FLASH_MP3_CONTROL: [ 3 ] Set up  event listener
I (477) PLAY_FLASH_MP3_CONTROL: [3.1] Listening event from all elements of pipeline
I (487) PLAY_FLASH_MP3_CONTROL: [ 4 ] Start audio_pipeline
I (507) PLAY_FLASH_MP3_CONTROL: [ * ] Receive music info from mp3 decoder, sample_rates=44100, bits=16, ch=2
I (7277) PLAY_FLASH_MP3_CONTROL: [ 5 ] Stop audio_pipeline

Examples: https://github.com/espressif/esp-adf/tree/master/examples

Recording audio

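A microphone module such as the INMP441 converts sound into a digital signal (a PCM-encoded bitstream), which the ESP32 driver then reads over the I2S interface.

The INMP441 module acts as the mic input, capturing mono 16-bit audio at 8000 samples per second.
Digital audio amplifier chips usually integrate a MIC input as well, and its PCM data is also read over I2S, which is why they are called codec chips.

For an analog MIC, an ESP32 ADC pin can convert the signal to LPCM, which is then saved to a wav file.

After PCM audio has been read from the MIC over I2S, it is stored on the SD card as a wav file:

wav file: metadata header + raw LPCM data;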

// https://www.makerfabs.com/blog/post/how-to-make-an-esp32-sound-recorder

// buff_size is passed explicitly: sizeof(buff) on a char* yields the pointer
// size, not the size of the buffer it points to.
void WM8960_Record(String filename, char *buff, size_t buff_size, int record_time)
{
    const int headerSize = 44;
    byte header[headerSize];
    // 16 kHz, 16-bit, 2 channels: bytes of audio for record_time seconds
    int waveDataSize = record_time * 16000 * 16 * 2 / 8;
    unsigned long record_start = millis();
    unsigned long part_time = record_start;

    File file = SD.open(filename, FILE_WRITE);
    if (!file)
        return;

    // Write the header first so it does not overwrite the start of the audio data.
    CreateWavHeader(header, waveDataSize);
    file.write(header, headerSize);

    Serial.println("Begin to record:");

    for (size_t j = 0; j < waveDataSize / buff_size; ++j)
    {
        I2S_Read(buff, buff_size);
        file.write((const byte *)buff, buff_size);
        // Print a progress dot roughly once per second
        if ((millis() - part_time) > 1000)
        {
            Serial.print(".");
            part_time = millis();
        }
    }

    Serial.println("");
    Serial.println("Finish");
    Serial.println(millis() - record_start);
    file.close();
}

Playing the wav file:

// https://www.makerfabs.com/blog/post/how-to-make-an-esp32-sound-recorder

// buff_size is passed explicitly: sizeof(buff) on a char* yields the pointer size.
void WM8960_Play(String filename, char *buff, size_t buff_size)
{
    File file = SD.open(filename);
    if (!file)
        return;
    Serial.println("Begin to play:");
    Serial.println(filename);
    file.seek(44);  // skip the wav header
    size_t n;
    while ((n = file.readBytes(buff, buff_size)) > 0)
    {
        I2S_Write(buff, n);  // write only the bytes actually read
    }
    Serial.println("Finish");
    file.close();
}

Another example that reads data from a MIC over I2S and stores it in a wav file: https://github.com/MhageGH/esp32_SoundRecorder/tree/master

#include "Arduino.h"
#include <FS.h>
#include "Wav.h"
#include "I2S.h"
#include <SD.h>


//comment the first line and uncomment the second if you use MAX9814
//#define I2S_MODE I2S_MODE_RX
#define I2S_MODE I2S_MODE_ADC_BUILT_IN

const int record_time = 10;  // second
const char filename[] = "/sound.wav";

const int headerSize = 44;
const int waveDataSize = record_time * 88000;
const int numCommunicationData = 8000;
const int numPartWavData = numCommunicationData/4;
byte header[headerSize];
char communicationData[numCommunicationData];
char partWavData[numPartWavData];
File file;

void setup() {
  Serial.begin(115200);
  if (!SD.begin()) Serial.println("SD begin failed");
  while(!SD.begin()){
    Serial.print(".");
    delay(500);
  }
  CreateWavHeader(header, waveDataSize);
  SD.remove(filename);
  file = SD.open(filename, FILE_WRITE);
  if (!file) return;
  file.write(header, headerSize);
  I2S_Init(I2S_MODE, I2S_BITS_PER_SAMPLE_32BIT);
  for (int j = 0; j < waveDataSize/numPartWavData; ++j) {
    I2S_Read(communicationData, numCommunicationData);
    for (int i = 0; i < numCommunicationData/8; ++i) {
      partWavData[2*i] = communicationData[8*i + 2];
      partWavData[2*i + 1] = communicationData[8*i + 3];
    }
    file.write((const byte*)partWavData, numPartWavData);
  }
  file.close();
  Serial.println("finish");
}

void loop() {
}
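The inner loop of `setup()` above copies bytes 2 and 3 out of every 8-byte group. With I2S configured for 32-bit samples, each 8-byte frame holds two 32-bit slots, and bytes 2 and 3 are the upper 16 bits of the first slot when the data is little-endian. The sketch below isolates that conversion; the function name is made up for this example, and which channel occupies the first slot is an assumption that depends on the I2S configuration.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduce 32-bit stereo I2S frames (8 bytes each) to 16-bit mono samples by
 * keeping the upper 16 bits of the first slot in every frame.
 * `out` must have room for in_len / 4 bytes. Returns bytes written. */
static size_t extract_mono16(const uint8_t *in, size_t in_len, uint8_t *out)
{
    size_t frames = in_len / 8;
    for (size_t i = 0; i < frames; ++i) {
        out[2 * i] = in[8 * i + 2];      /* low byte of the upper half-word */
        out[2 * i + 1] = in[8 * i + 3];  /* high byte of the upper half-word */
    }
    return frames * 2;
}
```

This is why `numPartWavData` is `numCommunicationData / 4`: every 8 input bytes produce 2 output bytes.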


// Building the wav header
#include "Wav.h"

void CreateWavHeader(byte* header, int waveDataSize){
  header[0] = 'R';
  header[1] = 'I';
  header[2] = 'F';
  header[3] = 'F';
  unsigned int fileSizeMinus8 = waveDataSize + 44 - 8;
  header[4] = (byte)(fileSizeMinus8 & 0xFF);
  header[5] = (byte)((fileSizeMinus8 >> 8) & 0xFF);
  header[6] = (byte)((fileSizeMinus8 >> 16) & 0xFF);
  header[7] = (byte)((fileSizeMinus8 >> 24) & 0xFF);
  header[8] = 'W';
  header[9] = 'A';
  header[10] = 'V';
  header[11] = 'E';
  header[12] = 'f';
  header[13] = 'm';
  header[14] = 't';
  header[15] = ' ';
  header[16] = 0x10;  // fmt chunk size = 16
  header[17] = 0x00;
  header[18] = 0x00;
  header[19] = 0x00;
  header[20] = 0x01;  // audio format 1 = linear PCM
  header[21] = 0x00;
  header[22] = 0x01;  // 1 channel (mono)
  header[23] = 0x00;
  header[24] = 0x44;  // sampling rate 44100 (0x0000AC44, little-endian)
  header[25] = 0xAC;
  header[26] = 0x00;
  header[27] = 0x00;
  header[28] = 0x88;  // byte rate = 44100 x 2 x 1 = 88200 (0x00015888)
  header[29] = 0x58;
  header[30] = 0x01;
  header[31] = 0x00;
  header[32] = 0x02;  // block align = 2 bytes per frame (16-bit mono)
  header[33] = 0x00;
  header[34] = 0x10;  // bits per sample = 16
  header[35] = 0x00;
  header[36] = 'd';
  header[37] = 'a';
  header[38] = 't';
  header[39] = 'a';
  header[40] = (byte)(waveDataSize & 0xFF);
  header[41] = (byte)((waveDataSize >> 8) & 0xFF);
  header[42] = (byte)((waveDataSize >> 16) & 0xFF);
  header[43] = (byte)((waveDataSize >> 24) & 0xFF);
}
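`CreateWavHeader` above hard-codes 44.1 kHz, 16-bit mono. A parameterized variant might look like the following sketch, where the derived fields (byte rate and block align) are computed from the inputs. The function and helper names are invented for this example; the layout is the same canonical 44-byte PCM header.

```c
#include <stdint.h>
#include <string.h>

/* WAV stores multi-byte fields little-endian. */
static void wr_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

static void wr_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Fill a canonical 44-byte linear-PCM WAV header. */
static void wav_build_header(uint8_t *hdr, uint32_t sample_rate,
                             uint16_t channels, uint16_t bits_per_sample,
                             uint32_t data_size)
{
    uint16_t block_align = (uint16_t)(channels * bits_per_sample / 8);

    memcpy(hdr, "RIFF", 4);
    wr_le32(hdr + 4, data_size + 36);            /* file size minus 8 */
    memcpy(hdr + 8, "WAVEfmt ", 8);
    wr_le32(hdr + 16, 16);                       /* fmt chunk size */
    wr_le16(hdr + 20, 1);                        /* 1 = linear PCM */
    wr_le16(hdr + 22, channels);
    wr_le32(hdr + 24, sample_rate);
    wr_le32(hdr + 28, sample_rate * block_align);/* byte rate */
    wr_le16(hdr + 32, block_align);
    wr_le16(hdr + 34, bits_per_sample);
    memcpy(hdr + 36, "data", 4);
    wr_le32(hdr + 40, data_size);
}
```

Calling `wav_build_header(hdr, 44100, 1, 16, size)` yields the same format fields as the hand-written version above.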

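Besides digital MICs with an I2S interface, MICs with analog output are also common. In that case an ESP32 ADC pin can do the analog-to-digital conversion, and the result is saved as an LPCM-encoded wav file: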

// https://github.com/AlirezaSalehy/WAVRecorder/blob/main/library/library.ino
#include <SD.h>
#include <SPI.h>

#include "src/WAVRecorder.h"
#include "src/AudioSystem.h"
#include "src/SoundActivityDetector.h"

#define SAMPLE_RATE 16000
#define SAMPLE_LEN 8

// Hardware SPI's CS pin which is different in each board
#ifdef ESP8266
  #define CS_PIN 16
#elif ARDUINO_SAM_DUE
  #define CS_PIN 4
#elif ESP32
  #define CS_PIN 5
#endif

// The analog pins (ADC inputs) which microphone outputs are connected to.
#define MIC_PIN_1 34
#define MIC_PIN_2 35

#define NUM_CHANNELS 1
channel_t channels[] = {{MIC_PIN_1}};

char file_name[] = "/sample.wav";
File dataFile;

#if defined(ESP32) || defined(ESP8266)
  AudioSystem* as;
#endif
WAVRecorder* wr;
//SoundActivityDetector* sadet;

void recordAndPlayBack();

void setup() {
  for (int i = 0; i < sizeof(channels)/sizeof(channel_t); i++)
    pinMode(channels[i].ADCPin, INPUT);
  //analogReadResolution(12); for ESP32

  pinMode(LED_BUILTIN, OUTPUT);
  Serial.begin(115200);
  Serial.println();

  // put your setup code here, to run once:
  if (!SD.begin(CS_PIN)) {
    Serial.println("Failed to initialize SD!");
  }
  else {
    Serial.println("SD opened successfully");
  }
  SPI.setClockDivider(SPI_CLOCK_DIV2); // This is because feeding the SD card with more than 40 MHz leads to unstable operation.
                                       // (Also depends on SD class) ESP8266 & ESP32 SPI clock with no division is 80 MHz.

  #if defined(ESP32) || defined(ESP8266)
     as = new AudioSystem(CS_PIN);
  #endif
  //sadet = new SoundActivityDetector(channels[0].ADCPin, 2000, 10 * 512, 6 * 512, &Serial);
  wr = new WAVRecorder(12, channels, NUM_CHANNELS, SAMPLE_RATE, SAMPLE_LEN, &Serial);

}

void loop() {
  // put your main code here, to run repeatedly:
  recordAndPlayBack();
}

void recordAndPlayBack() {
    if (SD.exists(file_name)) {
      SD.remove(file_name);
      Serial.println("File removed!");
    }

    dataFile = SD.open(file_name, FILE_WRITE);
    if (!dataFile) {
      Serial.println("Failed to open the file!");
      return;
    }

    // Setting file to store recording
    wr->setFile(&dataFile);

    Serial.println("Started");
    // With a SoundActivityDetector, recording starts when the sound power level exceeds one threshold and stops when it falls below another.
    //wr->startBlocking(sadet);

    // Recording for 3000 ms
    wr->startBlocking(3000);
    Serial.println("File Created");

    Serial.println("Playing file");

    #if defined(ESP32) || defined(ESP8266)
        as->playAudioBlocking(file_name);
    #endif
}
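The example above constructs `WAVRecorder` with a 12-bit ADC resolution. How that library maps ADC readings to WAV samples is not shown here, so the following is only a sketch of one common approach: treat mid-scale (2048) as silence and scale the 12-bit reading up to signed 16-bit LPCM, or down to the unsigned 8-bit samples that 8-bit WAV files use. Both function names are invented, and an ideal mid-rail bias on the microphone output is assumed.

```c
#include <stdint.h>

/* Convert an unsigned 12-bit ADC reading (0..4095) to signed 16-bit LPCM.
 * Mid-scale (2048) maps to 0; the result spans -32768..32752. */
static int16_t adc12_to_s16(uint16_t raw)
{
    int32_t centered = (int32_t)(raw & 0x0FFF) - 2048;
    return (int16_t)(centered * 16);
}

/* Convert an unsigned 12-bit ADC reading to an unsigned 8-bit WAV sample
 * by dropping the four least significant bits. */
static uint8_t adc12_to_u8(uint16_t raw)
{
    return (uint8_t)((raw & 0x0FFF) >> 4);
}
```

In practice the DC offset of an analog microphone drifts, so a real recorder would usually estimate and subtract the actual offset rather than assume 2048.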

Another example: Broadcasting Your Voice with ESP32-S3 & INMP441

The ESP32-S3’s I2S interface is set up to handle the audio data using Direct Memory Access (DMA) buffers. DMA allows for efficient data transfer without involving the main processor, offloading the task to a dedicated DMA controller. By configuring the DMA buffer in I2S, the captured audio samples can be stored and transmitted seamlessly.

https://github.com/0015/ThatProject/blob/master/ESP32_MICROPHONE/Broadcasting_Your_Voice/ESP32-S3_INMP441_WebSocket_Client/ESP32-S3_INMP441_WebSocket_Client.ino


void i2s_install() {
  // Set up I2S Processor configuration
  const i2s_config_t i2s_config = {
    .mode = i2s_mode_t(I2S_MODE_MASTER | I2S_MODE_RX),
    .sample_rate = 44100,
    //.sample_rate = 16000,
    .bits_per_sample = i2s_bits_per_sample_t(16),
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = i2s_comm_format_t(I2S_COMM_FORMAT_STAND_I2S),
    .intr_alloc_flags = 0,
    .dma_buf_count = bufferCnt,
    .dma_buf_len = bufferLen,
    .use_apll = false
  };

  i2s_driver_install(I2S_PORT, &i2s_config, 0, NULL);
}


void micTask(void* parameter) {

  i2s_install();
  i2s_setpin();
  i2s_start(I2S_PORT);

  size_t bytesIn = 0;
  while (1) {
    esp_err_t result = i2s_read(I2S_PORT, &sBuffer, bufferLen, &bytesIn, portMAX_DELAY);
    if (result == ESP_OK && isWebSocketConnected) {
      client.sendBinary((const char*)sBuffer, bytesIn);
    }
  }
}
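The values of `bufferCnt` and `bufferLen` are not shown in the excerpt above. To get a feel for how the DMA buffer settings trade memory against latency, the sketch below does the arithmetic, assuming that `dma_buf_len` counts frames (one sample per channel) as in the legacy ESP-IDF I2S driver. The helper names are made up for this example, and the figures used afterwards (10 buffers of 1024 frames) are illustrative only.

```c
#include <stdint.h>

/* Total RAM taken by all DMA buffers, in bytes. */
static uint32_t dma_total_bytes(uint32_t buf_count, uint32_t frames_per_buf,
                                uint32_t channels, uint32_t bits_per_sample)
{
    return buf_count * frames_per_buf * channels * (bits_per_sample / 8u);
}

/* Whole milliseconds of audio held by a single DMA buffer. */
static uint32_t dma_buf_ms(uint32_t frames_per_buf, uint32_t sample_rate_hz)
{
    return frames_per_buf * 1000u / sample_rate_hz;
}
```

With 10 buffers of 1024 mono 16-bit frames at 44.1 kHz, that is 20480 bytes of DMA memory, and each buffer holds about 23 ms of audio.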