DeepBlue Technology | Two Decades of Object Detection
(3) Fast RCNN
In 2015, R. Girshick proposed the Fast RCNN detector [19], a further improvement over R-CNN and SPPNet. Fast RCNN makes it possible to train the detector and the bounding-box regressor simultaneously under the same network configuration. On the VOC07 dataset, Fast RCNN raised the mAP from 58.5% (R-CNN) to 70.0%, with a detection speed more than 200 times that of R-CNN.
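How this joint training works can be seen from the loss: Fast RCNN optimizes a single multi-task objective that couples softmax classification of each region of interest (RoI) with smooth-L1 regression of its box. Below is a minimal PyTorch-style sketch of such a loss under assumed tensor shapes; it illustrates the idea from [19] rather than reproducing the authors' implementation.

```python
import torch
import torch.nn.functional as F

def fast_rcnn_loss(cls_scores, bbox_deltas, labels, reg_targets, lam=1.0):
    """Multi-task loss: classification + box regression in one backward pass.

    cls_scores  -- (N, K+1) class logits per RoI (index 0 = background)
    bbox_deltas -- (N, K, 4) per-class box regression outputs
    labels      -- (N,) ground-truth class indices, 0 for background
    reg_targets -- (N, 4) regression targets for each RoI's true class
    """
    cls_loss = F.cross_entropy(cls_scores, labels)
    fg = labels > 0  # the regression term is only applied to foreground RoIs
    if fg.any():
        deltas = bbox_deltas[fg, labels[fg] - 1]  # deltas of each RoI's true class
        reg_loss = F.smooth_l1_loss(deltas, reg_targets[fg])
    else:
        reg_loss = bbox_deltas.sum() * 0.0  # keeps the graph when no foreground
    return cls_loss + lam * reg_loss
```

Because both terms backpropagate through the same shared features, one optimization step updates the classifier and the box regressor together, which is what made R-CNN's separate training stages unnecessary.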
Although Fast RCNN successfully combines the advantages of R-CNN and SPPNet, its detection speed is still limited by proposal detection. A question then naturally arises: "Can we generate object proposals with a CNN model?" Faster R-CNN, which came later, answered this question.
(4) Faster RCNN
In 2015, shortly after Fast RCNN, S. Ren et al. proposed the Faster RCNN detector [20]. Faster RCNN is the first end-to-end and the first near-real-time deep learning detector (COCO mAP@.5=42.7%, COCO mAP@[.5,.95]=21.9%, VOC07 mAP=73.2%, VOC12 mAP=70.4%). Its main contribution is the introduction of the Region Proposal Network (RPN), which makes region proposals nearly cost-free. From RCNN to Faster RCNN, most of the individual blocks of an object detection system, such as proposal detection, feature extraction, and bounding-box regression, have gradually been integrated into one unified, end-to-end learning framework.
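The RPN itself is a small fully convolutional head: a 3x3 convolution slides over the shared feature map, and two sibling 1x1 convolutions predict an objectness score and four box offsets for each of k reference anchors at every location. The PyTorch sketch below follows that structure; the layer widths and the single-logit objectness are our own simplifying assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """Slides a small network over the feature map shared with the detector."""

    def __init__(self, in_channels=512, num_anchors=9):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 512, kernel_size=3, padding=1)
        self.objectness = nn.Conv2d(512, num_anchors, kernel_size=1)       # object vs. background
        self.bbox_deltas = nn.Conv2d(512, num_anchors * 4, kernel_size=1)  # box offsets per anchor

    def forward(self, features):
        t = torch.relu(self.conv(features))
        return self.objectness(t), self.bbox_deltas(t)

# For a (1, 512, 38, 50) feature map: scores (1, 9, 38, 50), deltas (1, 36, 38, 50).
scores, deltas = RPNHead()(torch.randn(1, 512, 38, 50))
```

Since the head reuses the backbone features that detection computes anyway, proposals cost only these few extra convolutions, which is why they are described as nearly free.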
Although Faster RCNN broke through the speed bottleneck of Fast RCNN, computational redundancy remains in the subsequent detection stage. A variety of improvements were later proposed, including R-FCN and Light-Head RCNN.
(5) Feature Pyramid Networks(FPN)
In 2017, T.-Y. Lin et al. proposed the Feature Pyramid Network (FPN) [21] on top of Faster RCNN. Before FPN, most deep-learning-based detectors ran detection only on the top layer of the network. Although features in a CNN's deeper layers are beneficial for category recognition, they are not conducive to localizing objects. To this end, a top-down architecture with lateral connections was developed to build high-level semantics at all scales. Since a CNN naturally forms a feature pyramid through its forward propagation, FPN shows great advances in detecting objects of various scales. Using an FPN backbone in a basic Faster RCNN system achieves state-of-the-art single-model results on the MS-COCO dataset without bells and whistles (COCO mAP@.5=59.1%, COCO mAP@[.5,.95]=36.2%). FPN has since become a basic building block of many of the latest detectors.
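The core of FPN is that top-down pathway with lateral connections: 1x1 convolutions project each backbone stage to a common channel width, coarser maps are upsampled and added to finer ones, and a 3x3 convolution smooths each merged map. A minimal PyTorch sketch follows; the channel sizes assume a ResNet-50-style backbone, and the code is illustrative rather than the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPN(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels])
        self.smooth = nn.ModuleList(
            [nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
             for _ in in_channels])

    def forward(self, feats):  # feats = [C2, C3, C4, C5], fine to coarse
        laterals = [conv(f) for conv, f in zip(self.lateral, feats)]
        # Top-down pathway: upsample the coarser map and add it to the finer one.
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return [conv(l) for conv, l in zip(self.smooth, laterals)]  # [P2..P5]
```

Each output level then carries high-level semantics at its own resolution, so a detection head can be run on every level instead of only on the topmost feature map.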
Single-Stage Detectors Based on Convolutional Neural Networks
Figure: the development of single-stage detection and the structures of the various detectors [2]
(1) You Only Look Once (YOLO)
YOLO was proposed by J. Redmon et al. in 2015 [22]. It is the first single-stage detector of the deep learning era. YOLO is extremely fast: a fast version of YOLO runs at 155 fps with VOC07 mAP=52.7%, while its enhanced version runs at 45 fps with VOC07 mAP=63.4% and VOC12 mAP=57.9%. YOLO stands for "You Only Look Once". As the name suggests, the authors completely abandoned the earlier "proposal detection + verification" paradigm. Instead, YOLO follows a totally different design: apply a single neural network to the full image. The network divides the image into regions and predicts bounding boxes and class probabilities for every region simultaneously. A series of improvements were later built on YOLO, including replacing FPN with a Path Aggregation Network (PAN) and defining new loss functions, leading to the v2 and v3 versions by J. Redmon and the v4 version by A. Bochkovskiy et al. (as of July 2020, when this article was written, Ultralytics had also released "YOLO v5", which was not officially recognized); these further improved detection accuracy while keeping a high detection speed.
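To make the "single look" concrete, the sketch below decodes a YOLO-style output tensor: each of the S x S grid cells predicts B boxes of the form (x, y, w, h, confidence) plus C class probabilities. The values S=7, B=2, C=20 follow the original VOC setup, but the decoding itself is a simplified illustration, not the released implementation.

```python
import torch

S, B, C = 7, 2, 20
raw = torch.randn(S * S * (B * 5 + C))       # stand-in for the network's final output
pred = raw.view(S, S, B * 5 + C)

boxes = pred[..., :B * 5].view(S, S, B, 5)   # per cell: B boxes of (x, y, w, h, conf)
class_probs = pred[..., B * 5:]              # per cell: C conditional class probabilities
# Class-specific confidence of each box = box confidence * class probability.
scores = boxes[..., 4:5] * class_probs.unsqueeze(2)   # shape (S, S, B, C)
```

Because all boxes come from one forward pass over the full image, there is no separate proposal stage to wait for, which is where YOLO's speed advantage comes from.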
It must be noted that although YOLO greatly improves detection speed over two-stage detectors, it sacrifices localization accuracy, especially for small objects. YOLO's later versions, and the SSD proposed after it, paid more attention to this problem.
(2) Single Shot MultiBox Detector (SSD)
SSD was proposed by W. Liu et al. in 2015 [23]. It is the second single-stage detector of the deep learning era. The main contribution of SSD is the introduction of multi-reference and multi-resolution detection techniques, which significantly improve the detection accuracy of single-stage detectors, especially for small objects. SSD has advantages in both detection speed and accuracy (VOC07 mAP=76.8%, VOC12 mAP=74.9%, COCO mAP@.5=46.5%, mAP@[.5,.95]=26.8%, with a fast version running at 59 fps). The main difference between SSD and earlier detectors is that the former detects objects of different scales on different layers of the network, while the latter run detection only on their top layer.
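The sketch below illustrates SSD's multi-resolution idea in PyTorch: a small 3x3 predictor is attached to several feature maps of decreasing resolution, so the large, shallow maps handle small objects while the small, deep maps handle large ones. The channel and anchor counts here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MultiScaleHeads(nn.Module):
    """One detection head per selected feature map, as in SSD-style detectors."""

    def __init__(self, channels=(512, 1024, 512, 256), num_anchors=4, num_classes=21):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Conv2d(c, num_anchors * (num_classes + 4), kernel_size=3, padding=1)
             for c in channels])

    def forward(self, feature_maps):
        # Each head predicts (num_classes + 4 box offsets) per anchor per location.
        return [head(f) for head, f in zip(self.heads, feature_maps)]
```

The per-map anchors (the "multi-reference" part) are matched to ground truth at training time, so each resolution specializes in the object sizes its receptive field covers.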
(3) RetinaNet
Single-stage detectors have the advantages of high speed and simple structure, but for years their accuracy lagged behind that of two-stage detectors. T.-Y. Lin et al. identified the reason behind this and proposed RetinaNet in 2017 [24]. In their view, the accuracy gap is caused by the extreme foreground-background class imbalance encountered during the training of dense detectors. To address this, they introduced a new loss function in RetinaNet called "focal loss", which reshapes the standard cross-entropy loss so that the detector puts more focus on hard, misclassified examples during training. Focal loss enables single-stage detectors to achieve accuracy comparable to two-stage detectors while maintaining a very high detection speed (COCO mAP@.5=59.1%, mAP@[.5,.95]=39.1%).
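Focal loss reshapes the standard cross entropy as FL(p_t) = -α_t (1 - p_t)^γ log(p_t), so that the many easy background examples are down-weighted and the rare hard ones dominate the gradient. A minimal PyTorch sketch is given below; it follows the formula in [24], but the function itself is our own illustrative code.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss.

    logits  -- raw predictions, shape (N,)
    targets -- binary labels as floats in {0.0, 1.0}, shape (N,)
    """
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)             # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # (1 - p_t) ** gamma is near 0 for easy examples and near 1 for hard ones.
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```

With γ=0 and α=0.5 this reduces to a scaled ordinary cross entropy; γ=2 is the paper's default and is what lets a dense detector train on all anchors without the background overwhelming the loss.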
References:
[1] Zhengxia Zou, Zhenwei Shi, Yuhong Guo, and Jieping Ye, "Object detection in 20 years: A survey," arXiv:1905.05055, 2019.
[2] Xiongwei Wu, Doyen Sahoo, and Steven C. H. Hoi, "Recent advances in deep learning for object detection," arXiv:1908.03673, 2019.
[3] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in CVPR, 2016.
[4] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in CVPR, 2014.
[5] K. He, G. Gkioxari, P. Dollar, and R. Girshick, "Mask R-CNN," in ICCV, 2017.
[6] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "Semantic image segmentation with deep convolutional nets and fully connected CRFs," arXiv:1412.7062, 2014.
[7] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, p. 436, 2015.
[8] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in CVPR, vol. 1, 2001, pp. I–I.
[9] P. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, 2004.
[10] C. Papageorgiou and T. Poggio, "A trainable system for object detection," International Journal of Computer Vision, vol. 38, no. 1, pp. 15–33, 2000.
[11] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in CVPR, vol. 1, 2005, pp. 886–893.
[12] P. Felzenszwalb, D. McAllester, and D. Ramanan, "A discriminatively trained, multiscale, deformable part model," in CVPR, 2008, pp. 1–8.
[13] P. F. Felzenszwalb, R. B. Girshick, and D. McAllester, "Cascade object detection with deformable part models," in CVPR, 2010, pp. 2241–2248.
[14] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, "Object detection with discriminatively trained part-based models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 9, pp. 1627–1645, 2010.
[15] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[16] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Region-based convolutional networks for accurate object detection and segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, pp. 142–158, 2016.
[17] K. E. van de Sande, J. R. Uijlings, T. Gevers, and A. W. Smeulders, "Segmentation as selective search for object recognition," in ICCV, 2011, pp. 1879–1886.
[18] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," in ECCV, 2014, pp. 346–361.
[19] R. Girshick, "Fast R-CNN," in ICCV, 2015, pp. 1440–1448.
[20] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Advances in Neural Information Processing Systems, 2015, pp. 91–99.
[21] T.-Y. Lin, P. Dollar, R. B. Girshick, K. He, B. Hariharan, and S. J. Belongie, "Feature pyramid networks for object detection," in CVPR, 2017.
[22] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in CVPR, 2016, pp. 779–788.
[23] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," in ECCV, 2016, pp. 21–37.
[24] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, "Focal loss for dense object detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
