Integrating Automatic Prompt Engineering and Vision-Language Model for Pad Defect Classification

Yi-Ting Shen1, Yan-Hsiu Liu2, Yi-Ting Li1, Wuqian Tang1, Yung-Chih Chen3, Hao-Chiang Shao4, Chia-Wen Lin1, Chun-Yao Wang1
1National Tsing Hua University, 2United Microelectronics Corporation, 3National Taiwan University of Science and Technology, 4National Chung Hsing University


Abstract

Most defect classification methods rely on deep learning, which requires substantial human effort for image annotation. However, acquiring large, high-quality labeled datasets is often impractical due to time and cost constraints. In this paper, we propose an approach that leverages a pretrained vision-language model (VLM) with automatic prompt engineering to reduce the dependence on labeled data. By optimizing prompts, our approach enables accurate classification even of previously unseen pad defects. Experimental results demonstrate that our approach achieves higher accuracy than both CNN-based models and existing VLM-based methods on a semiconductor company's dataset.
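
For readers unfamiliar with prompt-based VLM classification, the sketch below illustrates the general zero-shot scheme the abstract alludes to: each candidate defect class is described by a text prompt, and an image is assigned to the class whose prompt it matches most closely in the model's joint embedding space. This is a minimal illustration using CLIP via the HuggingFace `transformers` library with hypothetical pad-defect class names; the specific VLM, prompt set, and automatic prompt-optimization procedure used in this paper are described in later sections and are not reproduced here.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a pretrained vision-language model (CLIP used here for illustration).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical natural-language prompts, one per candidate pad-defect class.
# In a prompt-engineering setting, these strings are what gets optimized.
prompts = [
    "a photo of a bonding pad with a scratch defect",
    "a photo of a bonding pad with a discoloration defect",
    "a photo of a defect-free bonding pad",
]

image = Image.open("pad.png")  # placeholder path to a pad image

# Encode the image and all prompts, then score every image-prompt pair.
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Higher logit = closer match between the image and that prompt.
probs = outputs.logits_per_image.softmax(dim=-1)
predicted = prompts[probs.argmax().item()]
print(f"predicted class prompt: {predicted}")
```

Because classification is driven entirely by the prompt strings, no task-specific training images are required, and rewording or optimizing the prompts is a lightweight alternative to re-annotating and retraining a CNN.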