Most defect classification methods rely on deep learning, which demands substantial human effort for image annotation; acquiring large, high-quality labeled datasets is often impractical due to time and cost constraints. In this paper, we propose an approach that leverages a pretrained vision-language model (VLM) with automatic prompt engineering to reduce this dependence on labeled data. By optimizing prompts, our approach enables accurate classification even of previously unseen pad defects. Experimental results demonstrate that our approach achieves higher accuracy than CNN-based models and existing VLM-based methods on a semiconductor company's dataset.
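The core mechanism the abstract refers to — scoring an image against class-descriptive text prompts with a pretrained VLM, where several prompt phrasings per class are ensembled into a class prototype — can be sketched as follows. This is a minimal illustration, not the paper's method: the encoders here are deterministic pseudo-embeddings standing in for a real VLM's text and image encoders (e.g. CLIP's), and the defect class names and prompt variants are hypothetical.

```python
import numpy as np
import zlib

DIM = 128  # embedding width; real VLMs such as CLIP use 512 or more

def _unit(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

def embed_text(prompt: str) -> np.ndarray:
    # Stand-in for a VLM text encoder: a deterministic pseudo-embedding
    # seeded by a stable hash of the prompt string.
    rng = np.random.default_rng(zlib.crc32(prompt.encode()))
    return _unit(rng.normal(size=DIM))

def class_prototype(prompts: list[str]) -> np.ndarray:
    # Prompt ensembling: average the embeddings of several phrasings
    # of the same class, then renormalize.
    return _unit(np.mean([embed_text(p) for p in prompts], axis=0))

def classify(image_emb: np.ndarray, prototypes: dict[str, np.ndarray]) -> str:
    # Zero-shot prediction: the class whose prototype has the highest
    # cosine similarity to the image embedding wins.
    scores = {name: float(image_emb @ proto) for name, proto in prototypes.items()}
    return max(scores, key=scores.get)

# Hypothetical pad-defect classes, each with a few prompt variants.
PROMPTS = {
    "scratch": ["a pad with a scratch", "scratched bonding pad surface"],
    "discoloration": ["a discolored pad", "bonding pad with a stained surface"],
    "normal": ["a clean, defect-free pad", "a normal bonding pad"],
}
prototypes = {c: class_prototype(ps) for c, ps in PROMPTS.items()}

# A real image embedding would come from the VLM's image encoder; here we
# simulate one lying close to the "scratch" prototype.
image_emb = _unit(prototypes["scratch"]
                  + 0.05 * np.random.default_rng(1).normal(size=DIM))
print(classify(image_emb, prototypes))
```

Automatic prompt engineering, in this framing, amounts to searching over the prompt strings so that the resulting prototypes separate the defect classes well — which is why it can help even for defect types absent from any training set.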