add midtern report and change data source
This commit is contained in:
76
docs/reports/performance_data.md
Normal file
76
docs/reports/performance_data.md
Normal file
@@ -0,0 +1,76 @@
|
||||
# 性能测试数据表格
|
||||
|
||||
## GPU性能测试结果 (NVIDIA A100, 2048×2048输入)
|
||||
|
||||
| 排名 | 骨干网络 | 注意力机制 | 单尺度推理(ms) | FPN推理(ms) | FPS | FPN开销 |
|
||||
|------|----------|------------|----------------|-------------|-----|---------|
|
||||
| 1 | ResNet34 | None | 18.10 ± 0.07 | 21.41 ± 0.07 | 55.3 | +18.3% |
|
||||
| 2 | ResNet34 | SE | 18.14 ± 0.05 | 21.53 ± 0.06 | 55.1 | +18.7% |
|
||||
| 3 | ResNet34 | CBAM | 18.23 ± 0.05 | 21.50 ± 0.07 | 54.9 | +17.9% |
|
||||
| 4 | EfficientNet-B0 | None | 21.40 ± 0.13 | 33.48 ± 0.42 | 46.7 | +56.5% |
|
||||
| 5 | EfficientNet-B0 | CBAM | 21.55 ± 0.05 | 33.33 ± 0.38 | 46.4 | +54.7% |
|
||||
| 6 | EfficientNet-B0 | SE | 21.67 ± 0.30 | 33.52 ± 0.33 | 46.1 | +54.6% |
|
||||
| 7 | VGG16 | None | 49.27 ± 0.23 | 102.08 ± 0.42 | 20.3 | +107.1% |
|
||||
| 8 | VGG16 | SE | 49.53 ± 0.14 | 101.71 ± 1.10 | 20.2 | +105.3% |
|
||||
| 9 | VGG16 | CBAM | 50.36 ± 0.42 | 102.47 ± 1.52 | 19.9 | +103.5% |
|
||||
|
||||
## CPU性能测试结果 (Intel Xeon 8558P, 2048×2048输入)
|
||||
|
||||
| 排名 | 骨干网络 | 注意力机制 | 单尺度推理(ms) | FPN推理(ms) | GPU加速比 |
|
||||
|------|----------|------------|----------------|-------------|-----------|
|
||||
| 1 | ResNet34 | None | 171.73 ± 39.34 | 169.73 ± 0.69 | 9.5× |
|
||||
| 2 | ResNet34 | CBAM | 406.07 ± 60.81 | 169.00 ± 4.38 | 22.3× |
|
||||
| 3 | ResNet34 | SE | 419.52 ± 94.59 | 209.50 ± 48.35 | 23.1× |
|
||||
| 4 | VGG16 | None | 514.94 ± 45.35 | 1038.59 ± 47.45 | 10.4× |
|
||||
| 5 | VGG16 | SE | 808.86 ± 47.21 | 1024.12 ± 53.97 | 16.3× |
|
||||
| 6 | VGG16 | CBAM | 809.15 ± 67.97 | 1025.60 ± 38.07 | 16.1× |
|
||||
| 7 | EfficientNet-B0 | SE | 1815.73 ± 99.77 | 1745.19 ± 47.73 | 83.8× |
|
||||
| 8 | EfficientNet-B0 | None | 1820.03 ± 101.29 | 1795.31 ± 148.91 | 85.1× |
|
||||
| 9 | EfficientNet-B0 | CBAM | 1954.59 ± 91.84 | 1793.15 ± 99.44 | 90.7× |
|
||||
|
||||
## 关键性能指标汇总
|
||||
|
||||
### 最佳配置推荐
|
||||
|
||||
| 应用场景 | 推荐配置 | 推理时间 | FPS | 内存占用 |
|
||||
|----------|----------|----------|-----|----------|
|
||||
| 实时处理 | ResNet34 + None | 18.1ms | 55.3 | ~2GB |
|
||||
| 高精度匹配 | ResNet34 + SE | 18.1ms | 55.1 | ~2.1GB |
|
||||
| 多尺度搜索 | 任意配置 + FPN | 21.4-102.5ms | 9.8-46.7 | ~2.5GB |
|
||||
| 资源受限 | ResNet34 + None | 18.1ms | 55.3 | ~2GB |
|
||||
|
||||
### 骨干网络对比分析
|
||||
|
||||
| 骨干网络 | 平均推理时间 | 平均FPS | 特点 |
|
||||
|----------|--------------|---------|------|
|
||||
| **ResNet34** | **18.16ms** | **55.1** | 速度最快,性能稳定 |
|
||||
| EfficientNet-B0 | 21.54ms | 46.4 | 平衡性能,效率较高 |
|
||||
| VGG16 | 49.72ms | 20.1 | 精度高,但速度慢 |
|
||||
|
||||
### 注意力机制影响
|
||||
|
||||
| 注意力机制 | 性能影响 | 推荐场景 |
|
||||
|------------|----------|----------|
|
||||
| None | 基准 | 实时应用,资源受限 |
|
||||
| SE | +0.5% | 高精度要求 |
|
||||
| CBAM | +2.2% | 复杂场景,可接受轻微性能损失 |
|
||||
|
||||
## 测试环境说明
|
||||
|
||||
- **GPU**: NVIDIA A100 (40GB HBM2)
|
||||
- **CPU**: Intel Xeon 8558P (32 cores)
|
||||
- **内存**: 512GB DDR4
|
||||
- **软件**: PyTorch 2.0+, CUDA 12.0
|
||||
- **输入尺寸**: 2048×2048像素
|
||||
- **测试次数**: 每个配置运行5次取平均值
|
||||
|
||||
## 性能优化建议
|
||||
|
||||
1. **实时应用**: 使用ResNet34 + 无注意力机制
|
||||
2. **批量处理**: 可同时处理2-4个并发请求
|
||||
3. **内存优化**: 使用梯度检查点和混合精度
|
||||
4. **部署建议**: A100 GPU可支持8-16并发推理
|
||||
|
||||
---
|
||||
|
||||
*注:以上数据基于未训练模型的前向推理测试,训练后性能可能有所变化。*
|
||||
Reference in New Issue
Block a user