21.3.16 : TCMR, the new Pose Estimation SOTA / BoTNet, a self-attention-based recognition network from Google and Berkeley


탈공대 2021. 3. 16. 20:17

www.youtube.com/watch?v=WB3nTnSQDII&ab_channel=%EC%B5%9C%ED%99%8D%EC%84%9D

TCMR: Beyond Static Features for Temporally Consistent 3D Human Pose and Shape from a Video (2020)
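- A paper scheduled for presentation at CVPR 2021; it is said to estimate pose more stably than VIBE, the previous 3D pose estimation SOTA
- It comes from the Seoul National University CV Lab, Korea's top computer vision lab, and the code has not been released yet (probably after the CVPR presentation..?)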

www.marktechpost.com/2021/03/14/researchers-from-google-research-and-uc-berkeley-introduce-botnet-a-simple-backbone-architecture-that-implements-self-attention-computer-vision-tasks/


Researchers from Google Research and UC Berkeley Introduce BoTNet: A Simple Backbone Architecture that Implements Self-Attention Computer Vision Tasks
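- Many papers that tackle vision tasks with self-attention have been coming out. This one is joint work between Google Research and Berkeley, and the network is described as a combination of CNN and attention
- The figure that organizes self-attention-based models into a tree is impressive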

- The structure is similar to ResNet's Bottleneck block, but notably the middle 3x3 conv filter is replaced with MHSA (Multi-head Self-Attention), and this is applied only in the last low-resolution stage

- In terms of accuracy relative to runtime, it is said to do better than EfficientNet
- I can't really see the paper's originality beyond using MHSA in a single stage
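
To make the Bottleneck-with-MHSA idea above concrete, here is a minimal NumPy sketch of my own, not the authors' code. It assumes a single (H, W, C) feature map, treats each 1x1 conv as a per-pixel matrix multiply, and leaves out batch normalization and position encodings. The names `mhsa_2d` and `bottleneck_mhsa_block`, the weight shapes, and the sizes (7x7 map, 64 -> 16 -> 64 channels, 4 heads) are all assumptions picked for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def relu(x):
    return np.maximum(x, 0.0)

def mhsa_2d(x, wq, wk, wv, heads):
    """Multi-head self-attention over every position of an (H, W, C) map."""
    h, w, c = x.shape
    d = c // heads                       # channels per head
    tokens = x.reshape(h * w, c)         # each spatial position becomes one token

    def split(t):                        # (N, C) -> (heads, N, d)
        return t.reshape(h * w, heads, d).transpose(1, 0, 2)

    q, k, v = split(tokens @ wq), split(tokens @ wk), split(tokens @ wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (heads, N, N)
    out = attn @ v                       # (heads, N, d)
    return out.transpose(1, 0, 2).reshape(h, w, c)

def bottleneck_mhsa_block(x, w_reduce, wq, wk, wv, w_expand, heads=4):
    """Bottleneck layout: 1x1 reduce -> MHSA (in place of 3x3 conv) -> 1x1 expand."""
    y = relu(x @ w_reduce)               # 1x1 conv as a per-pixel channel matmul
    y = relu(mhsa_2d(y, wq, wk, wv, heads))
    y = y @ w_expand                     # 1x1 conv back to the input width
    return relu(x + y)                   # residual connection

# Hypothetical sizes: a 7x7 low-resolution map with 64 channels, bottleneck width 16.
rng = np.random.default_rng(0)
c_in, c_mid = 64, 16
x = rng.standard_normal((7, 7, c_in))
w_reduce = rng.standard_normal((c_in, c_mid)) * 0.1
wq, wk, wv = (rng.standard_normal((c_mid, c_mid)) * 0.1 for _ in range(3))
w_expand = rng.standard_normal((c_mid, c_in)) * 0.1

out = bottleneck_mhsa_block(x, w_reduce, wq, wk, wv, w_expand)
print(out.shape)  # (7, 7, 64)
```

One thing the sketch makes visible: the attention matrix has shape (heads, N, N) with N = H*W, so a 7x7 map needs 49x49 entries per head, while a 56x56 map would need 3136x3136. That quadratic growth is a plausible reason for putting MHSA only in the last low-resolution stage, though this is my reading rather than something the article states.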