Latest Research by Professor Fan Zhang’s Team Selected for Oral Presentation at CVPR

Editor: College of Information Science and Technology    Time: 2024-07-19

The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024 was held in Seattle, USA, from June 17 to 21. At the conference, the paper "LTGC: Long-Tail Recognition via Leveraging Generated Content" by Professor Fan Zhang’s group from the School of Information Science and Technology was selected for oral presentation.

 Figure 1. The Google Scholar Impact Ranking of CVPR

This year, CVPR received 11,532 submissions from around the world, of which 2,719 papers were accepted (about 23.6%). Only 90 were selected for oral presentation, an oral selection rate of about 0.78% of all submissions. This is our university's first oral presentation at CVPR. CVPR 2024 program chair Zeynep Akata congratulated the team, saying, “Only 90 papers out of about 11,500 submissions were selected as orals, so this is a rare honor.”

 Figure 2. Oral Presentation at the Conference

 Figure 3. Online Version of the Paper

Research Achievements

Image recognition in real-world scenarios is an active topic in computer vision. It faces two challenges: learning highly discriminative representations from scarce data that follows a long-tail distribution, and imbalanced learning caused by the widely varying number of samples across many categories. This paper proposes a novel generative fine-tuning framework called LTGC, which leverages generated content to address long-tail image recognition in real-world scenarios. Inspired by the rich knowledge inherent in large models, LTGC uses several large models in collaboration to parse and reason over the original data, generating diverse, otherwise scarce image content that is distinct from the original data. The method then introduces several novel modules to ensure the quality of the generated images and proposes an effective framework for fine-tuning on both synthetic and real data. The method aims to improve the recognition rate and generalization ability of vision models on long-tail data in real-world scenarios, and its effectiveness has been validated through comparative experiments with current vision models.

Figure 4. The Framework of LTGC
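
As a rough illustration of the class imbalance described in this section (not the LTGC implementation), the Python sketch below counts samples per class in a toy long-tailed dataset and computes how many generated images each class would need to reach a chosen target size. The function name `generation_budget`, the class names, the sample counts, and the target of 50 are all hypothetical and do not come from the paper.

```python
from collections import Counter


def generation_budget(labels, target_per_class):
    """Return, per class, how many extra samples are needed to reach the target.

    Hypothetical helper for illustration only; this is not LTGC's procedure.
    """
    counts = Counter(labels)
    return {cls: max(0, target_per_class - n) for cls, n in counts.items()}


# Toy long-tailed label list (made-up numbers): one head class, two tail classes.
labels = ["dog"] * 100 + ["lynx"] * 8 + ["okapi"] * 2

counts = Counter(labels)
# Imbalance ratio: size of the largest class divided by size of the smallest.
imbalance_ratio = max(counts.values()) / min(counts.values())

budget = generation_budget(labels, target_per_class=50)

print(f"imbalance ratio: {imbalance_ratio:.0f}")
for cls, extra in budget.items():
    print(f"{cls}: {counts[cls]} real samples, {extra} generated samples needed")
```

In this toy setting the head class needs no additional data, while the tail classes account for the entire generation budget, which is the gap that generated content is meant to fill.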

Conclusions

Unlike existing methods, which mainly focus on training strategies and dataset-specific features, this work is the first to introduce general-purpose language and vision models to solve the long-tail training problem. By leveraging the strong representation capabilities and rich accumulated knowledge of large models, the method effectively expands the training set and alleviates the negative impact of long-tail distributions. The work offers a new perspective on the long-tail problem and opens new research avenues in related directions. It has been highly regarded by reviewers and experts in the field.

Paper link: https://ltgccode.github.io

Author Information

The first author of the paper is Qihao Zhao, a Ph.D. student of the class of 2021 at the School of Information Science and Technology. The co-first author and the second author are Yalun Dai and Hao Li, respectively, both undergraduates of the class of 2018 at the same school. The research was supervised by Professor Fan Zhang and Associate Professor Wei Hu, in collaboration with Professor Jun Liu from the Singapore University of Technology and Design. Beijing University of Chemical Technology is the paper's primary affiliated institution.

Corresponding Author Profile

Fan Zhang is a professor at the School of Information Science and Technology/Artificial Intelligence Center, Beijing University of Chemical Technology. He is a member of the university’s degree committee and a senior member of both the Chinese Institute of Electronics and IEEE. He joined Beijing University of Chemical Technology in 2010 and has visited the University of Illinois at Urbana-Champaign and the Technical University of Dresden in Germany. His main research areas are remote sensing image processing and artificial intelligence. He has led over 30 projects, including projects funded by the National Natural Science Foundation of China, and has published over 100 academic papers in journals and conferences such as ISPRS P&RS, IEEE TGRS, CVPR, and ICCV, with more than 5,000 citations. He has received the Beijing Natural Science Second Prize and the Young Teaching Excellence Award from Beijing University of Chemical Technology.