第1C部分：单精度浮点FIR滤波器，引入向量操作

与第1B部分类似，第1C部分使用单精度浮点运算来实现FIR滤波器。然而，与直接使用普通的C循环实现内积不同，第1C部分调用了lib_xcore_math提供的库函数vect_f32_dot()。

vect_f32_dot()函数是手动优化的XS3汇编代码，旨在尽可能快地执行。我们期望在这里看到另一个显著的性能提升。

这种加速的主要原因之一是，C编译器默认情况下不会生成C函数的双发（dual-issue）实现。双发代码通常允许内部循环大大缩短，但通常以程序大小为代价。

使用 `lib_xcore_math` 加速

本阶段使用了lib_xcore_math中的以下操作：

vect_f32_dot()

`vect_f32_dot()`

C_API
float vect_f32_dot(
    const float b[],
    const float c[],
    const unsigned length);

vect_f32_dot()函数接受两个float向量 b[] 和 c[]（每个向量都有length个元素），将它们逐元素相乘，然后将乘积求和。其结果应与第1B部分中的filter_sample()的C实现相同，只是速度更快。

实现

第1C部分与第1B部分在filter_task()、rx_frame()和tx_frame()的实现上完全相同。唯一的变化是filter_sample()的实现。

src/part1C/part1C.c
//Apply the filter to produce a single output sample
float filter_sample(
    const float sample_history[TAP_COUNT])
{
  // Return the inner product of sample_history[] and filter_coef[]
  return vect_f32_dot(&sample_history[0], 
                      &filter_coef[0], 
                      TAP_COUNT);
}

与第1B部分通过循环遍历滤波器系数来实现filter_sample()不同，第1C部分直接调用了lib_xcore_math中的vect_f32_dot()函数。

结果

运行时间

时间类型	测量时间
每个滤波器系数	21.13 ns
每个输出样本	21641.67 ns
每个帧	5608909.00 ns

输出波形

第1C部分的输出

使用 lib_xcore_math 加速​

vect_f32_dot()​

实现​

结果​

运行时间​

输出波形​

使用 `lib_xcore_math` 加速

`vect_f32_dot()`

实现

结果

运行时间

输出波形