Finite state machines (FSMs) are a core abstraction for modeling digital systems, control logic, and reactive behaviors, so automating the identification of FSM states and transitions is critical for computer-aided design (CAD) tools. Although multi-modal large language models (MLLMs) have shown promise in design automation, they often cannot interpret FSM diagrams directly, which leads to errors in control synthesis and HDL generation. This paper presents FSMVision, a multi-modal artificial intelligence (AI) framework that extracts complete FSM semantics directly from diagrams. By integrating visual parsing, spatial reasoning, and language grounding, FSMVision accurately identifies states, transitions, and conditions for design automation. Experiments demonstrate 99.00% state detection accuracy and 99.01% transition detection accuracy across diverse FSM diagrams of varying complexity. FSMVision surpasses existing MLLM-based baselines in structural and semantic fidelity and enables direct translation of FSM diagrams into executable implementations, advancing AI-driven design and control automation.
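To make the notion of extracted FSM semantics concrete, the following is a minimal sketch of how states, transitions, and conditions recovered from a diagram might be represented and executed. It is an illustration under stated assumptions, not FSMVision's actual output format: the names `Transition`, `ExtractedFSM`, `step`, and `run` are hypothetical, conditions are assumed to be single named boolean inputs, and the two-state example is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transition:
    src: str        # source state label
    dst: str        # destination state label
    condition: str  # name of the boolean input that enables the transition (assumed form)


@dataclass
class ExtractedFSM:
    states: set[str]
    initial: str
    transitions: list[Transition] = field(default_factory=list)

    def step(self, state: str, inputs: dict[str, bool]) -> str:
        """Return the next state; remain in the current state if no condition holds."""
        for t in self.transitions:
            if t.src == state and inputs.get(t.condition, False):
                return t.dst
        return state

    def run(self, input_sequence: list[dict[str, bool]]) -> list[str]:
        """Return the visited states, starting from the initial state."""
        trace = [self.initial]
        for inputs in input_sequence:
            trace.append(self.step(trace[-1], inputs))
        return trace


# Hypothetical extraction result for a two-state diagram (not taken from the paper).
fsm = ExtractedFSM(
    states={"IDLE", "RUN"},
    initial="IDLE",
    transitions=[
        Transition("IDLE", "RUN", "start"),
        Transition("RUN", "IDLE", "stop"),
    ],
)

trace = fsm.run([{"start": True}, {}, {"stop": True}])
print(trace)  # ['IDLE', 'RUN', 'RUN', 'IDLE']
```

A structured representation of this kind is one plausible intermediate form between a parsed diagram and generated HDL or software, since each transition maps directly onto a guarded assignment of the next state.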