DepthFormer: A High-Resolution Depth-Wise Transformer for Animal Pose Estimation

Liu, Sicong; Fan, Qingcheng; Liu, Shanghao; Zhao, Chunjiang

Additional contact information
Sicong Liu: College of Information Engineering, Northwest A&F University, Yangling, Xianyang 712100, China
Qingcheng Fan: College of Information Engineering, Northwest A&F University, Yangling, Xianyang 712100, China
Shanghao Liu: College of Information Engineering, Northwest A&F University, Yangling, Xianyang 712100, China
Chunjiang Zhao: College of Information Engineering, Northwest A&F University, Yangling, Xianyang 712100, China

Agriculture, 2022, vol. 12, issue 8, 1-13

Abstract: Animal pose estimation is valuable in both theoretical research and practical applications such as zoology and wildlife conservation. Large-scale models for multi-animal pose estimation are difficult to use when computing resources are limited. To address this issue, this study proposes DepthFormer, a simple but effective high-resolution Transformer model for animal pose estimation. The model uses a multi-branch parallel design that maintains high-resolution representations throughout the process. Drawing on two similarities between self-attention and depthwise convolution, namely sparse connectivity and weight sharing, we combine the structure of the Transformer with representative batch normalization to design a new basic block that reduces the number of parameters and the amount of computation required. In addition, four PoolFormer blocks are introduced after the parallel network to maintain good performance. The model is benchmarked on the public AP-10K dataset, which contains 23 animal families and 54 species, and the results are compared with six other state-of-the-art pose estimation networks. The results show that DepthFormer outperforms other popular lightweight networks (e.g., Lite-HRNet and HRFormer-Tiny) on this task. This work can provide effective technical support for accurately estimating animal poses with limited computing resources.
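
The parameter savings that motivate replacing self-attention with depthwise convolution can be illustrated with a simple weight count. The sketch below is not taken from the paper: the function names, the channel width of 64, and the 3 x 3 kernel are hypothetical values chosen for illustration, and the self-attention count assumes the common layout of four C x C projections (query, key, value, output).

```python
def conv_params(c_in, c_out, k, bias=False):
    """Weights in a standard k x k convolution.

    Every output channel has its own k x k filter for every input channel.
    """
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_conv_params(channels, k, bias=False):
    """Weights in a depthwise k x k convolution.

    Each channel has a single k x k filter that is reused at every
    spatial position (weight sharing) and reads only a local window
    of its own channel (sparse connectivity).
    """
    return k * k * channels + (channels if bias else 0)


def self_attention_params(channels, bias=False):
    """Weights in the query, key, value and output projections.

    Assumes four C x C linear projections, a common layout that may
    differ from the block used in the paper.
    """
    return 4 * channels * channels + (4 * channels if bias else 0)


if __name__ == "__main__":
    c, k = 64, 3  # hypothetical width and kernel size, not from the paper
    print("standard conv :", conv_params(c, c, k))
    print("self-attention:", self_attention_params(c))
    print("depthwise conv:", depthwise_conv_params(c, k))
```

Under these assumptions the depthwise layer's weight count grows linearly with the channel width, while the other two grow quadratically. This is the general reason depthwise convolution is attractive for lightweight blocks.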

Keywords: animal pose estimation; DepthFormer; multi-resolution representations; depthwise convolution
JEL-codes: Q1 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18
Date: 2022

Downloads:
https://www.mdpi.com/2077-0472/12/8/1280/pdf (application/pdf)
https://www.mdpi.com/2077-0472/12/8/1280/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jagris:v:12:y:2022:i:8:p:1280-:d:894566

Agriculture is currently edited by Ms. Leda Xuan

Bibliographic data for series maintained by MDPI Indexing Manager.