Model-free reinforcement learning (RL) has become a promising technique for designing robust dynamic power management (DPM) frameworks that can cope with variations and uncertainties. In addition, the potentially significant benefit of performing application-level scheduling as part of system-level power management should be exploited. This paper presents an architecture for hierarchical DPM in an embedded system composed of a processor and connected I/O devices (i.e., system components), with the goal of reducing the power consumption of these components. The proposed adaptive DPM technique consists of two layers: an RL-based component-level local power manager (LPM) and a system-level global power manager (GPM). The LPM optimizes component power and latency. It employs temporal difference learning on a semi-Markov decision process (SMDP) for model-free RL and is specifically optimized for an environment with multiple types of applications. The GPM interacts with the CPU scheduler to perform effective application-level scheduling, thereby enabling the LPM to carry out further component power optimizations. The power-latency tradeoff for each type of application can be precisely controlled through a user-defined parameter. Experiments show that the proposed technique achieves up to 31.1% average power saving compared with existing approaches, without any increase in latency.
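As a rough illustration of what temporal difference learning on an SMDP involves, the sketch below shows a generic tabular Q-learning update with continuous-time discounting. Decisions are taken at irregular epochs, and the cost accrued during a sojourn of length tau is discounted accordingly. This is not the LPM's actual algorithm. The class `SMDPQLearner`, the discount rate `beta`, and the helper `cost_rate` are hypothetical. In particular, `cost_rate` folds power and a queue-length latency proxy into a single cost through a `weight` parameter, which is only an assumed stand-in for the paper's user-defined tradeoff parameter.

```python
import math
from collections import defaultdict


class SMDPQLearner:
    """Tabular TD(0) Q-learning for an SMDP with continuous-time discounting.

    Hypothetical sketch: Q-values are expected discounted costs, so the
    greedy policy picks the action with the LOWEST value.
    """

    def __init__(self, actions, alpha=0.1, beta=0.5):
        self.actions = list(actions)
        self.alpha = alpha  # learning rate
        self.beta = beta    # continuous-time discount rate (assumed)
        self.q = defaultdict(float)  # (state, action) -> estimated cost

    def best_action(self, state):
        # Greedy choice: the action with the lowest estimated cost.
        return min(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, cost_rate, tau, next_state):
        # Cost accrued at a constant rate over the sojourn time tau,
        # discounted continuously:
        #   integral_0^tau e^{-beta t} * cost_rate dt
        #   = cost_rate * (1 - e^{-beta tau}) / beta
        accrued = cost_rate * (1.0 - math.exp(-self.beta * tau)) / self.beta
        # Discount applied to the value at the next decision epoch.
        gamma = math.exp(-self.beta * tau)
        target = accrued + gamma * min(self.q[(next_state, a)] for a in self.actions)
        key = (state, action)
        self.q[key] += self.alpha * (target - self.q[key])
        return self.q[key]


def cost_rate(power_w, queue_len, weight):
    # Hypothetical power-latency cost: power plus a weighted queue length,
    # where queue length serves as a proxy for latency.
    return power_w + weight * queue_len


learner = SMDPQLearner(actions=["sleep", "active"])
c = cost_rate(power_w=1.0, queue_len=2, weight=0.5)
q_sleep = learner.update("idle", "sleep", c, tau=1.0, next_state="idle")
```

A larger `weight` penalizes queued requests more heavily, which pushes the learned policy toward keeping a component active. A smaller `weight` favors sleeping, so that single knob trades power against latency.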