This paper describes a system-level approach to improving the area and delay of datapath designs that perform polynomial computations over Z_(2^m), which are used in many applications in domains such as computer graphics and digital signal processing. The approach reduces the number of arithmetic operations needed to implement multivariate polynomial systems by optimizing them at the system level prior to high-level synthesis. The method uses univariate functional decomposition of polynomial expressions and a canonical form over Z_(2^m). We use the GAUT high-level synthesis tool to generate RTL datapath architectures for the optimized polynomials. Experimental results on a set of benchmark applications with polynomial expressions show that this method outperforms conventional methods in two respects: the area of the sequential datapath architectures in speed optimization mode, with an average improvement of 25.81%, and the required clock cycles in both speed optimization and area optimization modes, with average improvements of 23.48% and 38.24%, respectively.
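As a small illustration of why polynomial datapaths over Z_(2^m) can be simplified at all, the sketch below checks by exhaustive evaluation that two syntactically different polynomials compute the same function modulo 2^m. It is not the decomposition or canonical-form algorithm of this paper; the function names `horner_eval` and `equivalent_over_z2m` and the choice m = 3 are assumptions made only for the example. It relies on the fact that x(x+1)(x+2)(x+3) is a product of four consecutive integers, hence divisible by 24 and therefore by 8.

```python
def horner_eval(coeffs, x, m):
    """Evaluate a polynomial at x modulo 2**m using Horner's rule.

    coeffs lists the coefficients from the highest degree down to the constant.
    """
    mod = 1 << m
    acc = 0
    for c in coeffs:
        acc = (acc * x + c) % mod
    return acc


def equivalent_over_z2m(p, q, m):
    """Return True if p and q agree at every point of Z_(2^m) (brute force)."""
    return all(horner_eval(p, x, m) == horner_eval(q, x, m)
               for x in range(1 << m))


M = 3  # hypothetical bit width, so arithmetic is modulo 8

# x^4 + 6x^3 + 11x^2 + 6x = x(x+1)(x+2)(x+3) vanishes everywhere modulo 8.
vanishing = [1, 6, 11, 6, 0]

# Consequently x^4 can be replaced by a cubic with the same values modulo 8:
# x^4 = -6x^3 - 11x^2 - 6x = 2x^3 + 5x^2 + 2x (mod 8).
quartic = [1, 0, 0, 0, 0]
reduced = [2, 5, 2, 0]

if __name__ == "__main__":
    print(equivalent_over_z2m(vanishing, [0], M))
    print(equivalent_over_z2m(quartic, reduced, M))
```

Brute-force comparison is only practical for small m; it is used here purely to make the equivalence visible, and a canonical form serves the same purpose without enumerating all 2^m inputs.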