Stochastic computing, which employs random bit streams for computations, has shown low hardware cost and high fault-tolerance compared to the computations using a conventional binary encoding. Finite state machine (FSM) based stochastic computing elements can compute complex functions, such as the exponentiation and hyperbolic tangent functions, more effciently than those using combinational logic. However, the FSM, as a sequential logic, cannot be directly implemented in parallel like the combinational logic, so reducing the long latency of the calculation becomes diffcult. Applications in the relatively higher frequency domain would require an extremely fast clock rate using FSM. This paper proposes a parallel implementation of the FSM, using an estimator and a dispatcher to directly initialize the FSM to the steady state. Experimental results show that the outputs of four typical functions using the parallel implementation are very close to those of the serial version. The paralell FSM scheme further shows equivelent or better image quality than the serial implementation in three image processing applications, Contrast Strecthing, Edge Detection and Frame Difference.