This paper presents a scalable design architecture for lightweight neural networks on resource-limited devices. As case studies, several field-programmable gate array (FPGA) designs with pipelined and parallel structures are developed for handwritten digit classification. Specifically, a static analysis of the timing diagrams is first carried out to trade off quality of results and speed against resource utilization. This allows users to find the optimal design architecture with the minimum area-latency-power cost for different design specifications. Second, the single-precision floating-point adder and multiplier are implemented at the register-transfer level to reduce design complexity. By integrating them with Vivado IP cores, including a subtractor, an accumulator, an exponential operator, and a reciprocal operator, the sigmoid neuron can be constructed and then instantiated to build the proposed neural networks. Experimental results show that the lightweight design architecture achieves more than a 3.88x speedup and 80% energy saving at a similar FPGA cost, with a classification accuracy of up to 95.68%.
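The sigmoid neuron's dataflow can be sketched behaviorally. The Python below is only an illustrative software model, not the RTL implementation; it assumes the standard sigmoid y = 1/(1 + exp(-(sum_i w_i x_i + b))) decomposed into the multiply, accumulate, subtract, exponential, and reciprocal operators named above (function and variable names are hypothetical).

```python
import math

def sigmoid_neuron(weights, inputs, bias):
    """Behavioral sketch of the sigmoid neuron's operator chain:
    multiply -> accumulate -> subtract -> exp -> reciprocal,
    mirroring the adder/multiplier plus Vivado IP composition
    described in the abstract (illustrative model only)."""
    # Multiplier + accumulator: weighted sum of the inputs plus bias
    acc = bias
    for w, x in zip(weights, inputs):
        acc += w * x          # maps to a floating-point multiply-accumulate
    # Subtractor: negate the accumulated sum (0 - acc)
    neg = 0.0 - acc
    # Exponential operator IP
    e = math.exp(neg)
    # Adder + reciprocal operator IP: 1 / (1 + e)
    return 1.0 / (1.0 + e)

# Example: a strongly positive activation drives the output toward 1
print(sigmoid_neuron([1.0, 2.0], [3.0, 4.0], 0.5))
```

In hardware, each arrow in this chain corresponds to a pipeline stage, so the static timing analysis mentioned above determines how deeply the stages can be pipelined or parallelized for a given resource budget.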