This paper presents an efficient algorithm, Longest Processing Time with Load Balancing (LPT-LB), for mapping embedded application tasks to cores in a heterogeneous Multi-Processor System-on-Chip (MPSoC) platform. Our algorithm uses a heuristic to minimize the overall application execution time while evenly balancing the computational load on the individual cores in the platform. The algorithm executes in polynomial time, thereby making it more attractive than time-consuming offline algorithms based on genetic mutation and ILP solvers. LPT-LB also performs consistently better mapping, with regard to application execution time, than online algorithms proposed in the past. Experimental results for LPT-LB mapping of industrial scale applications such as Demosaic and Wimax, on heterogeneous MPSoC platforms with 2-16 PEs, show execution time reductions of 14% on average, and 35% in the best case, over previous online algorithms.