Research article · DOI: 10.1145/3512527.3531362

Temporal-Consistent Visual Clue Attentive Network for Video-Based Person Re-Identification

Published: 27 June 2022

Abstract

Video-based person re-identification (ReID) aims to match video trajectories of pedestrians across multi-view cameras and has important applications in criminal investigation and intelligent surveillance. Compared with single-image re-identification, the abundant temporal information contained in video sequences allows pedestrian instances to be described more precisely and effectively. Most existing video-based person ReID algorithms exploit this temporal information by fusing the diverse visual contents captured in individual frames. However, these algorithms measure the salience of visual clues only within each single frame, inevitably introducing momentary interference caused by factors such as occlusion. In this work, we therefore introduce a Temporal-consistent Visual Clue Attentive Network (TVCAN), designed to capture pedestrian contents that are consistently salient across frames. TVCAN consists of two major modules, the TCSA module and the TCCA module, which capture and emphasize consistently salient visual contents along the spatial and channel dimensions, respectively. Extensive experiments verify the effectiveness of the designed modules, and TVCAN outperforms all compared state-of-the-art methods on three mainstream benchmarks.
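
The paper's full text is not reproduced on this page, so the following PyTorch sketch is an illustrative assumption rather than the authors' actual TCSA/TCCA design. It shows one simple way to make per-frame spatial and channel attention "temporal-consistent": min-pool the per-frame attention scores over the temporal dimension, so only contents salient in every frame are emphasized and a momentary occluder in one frame cannot dominate the clip. The class names, the 1x1-conv spatial scorer, the squeeze-and-excitation-style channel gate, and the min-pooling aggregation are all hypothetical choices made for illustration.

```python
import torch
import torch.nn as nn


class TemporalConsistentSpatialAttention(nn.Module):
    """Per-frame spatial saliency, made temporally consistent by min-pooling over frames."""

    def __init__(self, channels: int):
        super().__init__()
        # A 1x1 conv collapses channels into a single saliency score per location
        # (hypothetical scorer; the paper's design may differ).
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, height, width)
        b, t, c, h, w = x.shape
        maps = torch.sigmoid(self.score(x.reshape(b * t, c, h, w)))
        maps = maps.reshape(b, t, 1, h, w)
        # Keep only locations that are salient in EVERY frame.
        consistent = maps.min(dim=1, keepdim=True).values  # (b, 1, 1, h, w)
        return x * consistent  # broadcast over time and channels


class TemporalConsistentChannelAttention(nn.Module):
    """Per-frame channel gating (SE-style), min-pooled over frames."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, height, width)
        weights = self.fc(x.mean(dim=(3, 4)))                 # (b, t, c) per-frame gates
        consistent = weights.min(dim=1, keepdim=True).values  # (b, 1, c)
        return x * consistent[..., None, None]                # broadcast spatially


if __name__ == "__main__":
    clip = torch.randn(2, 8, 64, 16, 8)  # 2 clips, 8 frames, 64-channel feature maps
    out = TemporalConsistentSpatialAttention(64)(clip)
    out = TemporalConsistentChannelAttention(64)(out)
    print(out.shape)  # torch.Size([2, 8, 64, 16, 8])
```

Min-pooling is the strictest consistency criterion; averaging the per-frame scores over time would be a softer alternative that tolerates a few corrupted frames at the cost of admitting some momentary interference.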

Supplementary Material

MP4 File (ICMR22-fp53.mp4)

Published In

ICMR '22: Proceedings of the 2022 International Conference on Multimedia Retrieval
June 2022, 714 pages
ISBN: 9781450392389
DOI: 10.1145/3512527

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. attention
2. temporal-consistent information
3. video-based person re-identification

Conference

ICMR '22
Overall acceptance rate: 254 of 830 submissions, 31%
