We propose, for the first time, the use of hierarchical matrix ($\mathcal{H}$-matrix) in the efficient finite-element-based (FE-based) direct solver implementation for both steady and transient thermal analyses of three-dimensional integrated circuits (3D ICs). $\mathcal{H}$-matrix was shown to provide a data-sparse way to approximate the matrices and their inverses with almost linear space and time complexities. We show this is also true for FE-based transient analysis of thermal parabolic partial differential equations (PDEs). Specifically, we show that the stiffness matrix from a FE-based steady and transient thermal analysis can be represented by $\mathcal{H}$-matrix without approximation, and its inverse and Cholesky factors can be evaluated by $\mathcal{H}$-matrix with controlled accuracy. We then show that the memory and time complexities of the solver are bounded by $\mathcal{O}(\mathit{k_1N}\log{N})$ and $\mathcal{O}(\mathit{k_1^2N}\log^2N)$, respectively, for very large scale thermal systems, where $k_1$ is a small quantity determined by accuracy requirements and $\mathit{N}$ is the number of unknowns in the system. Numerical results demonstrate that the proposed method shows significant advantages over the LU based analysis techniques in terms of both memory and time complexity.