Abstract
A large parallel processing job can be delayed substantially if even one of its many tasks is
assigned to an unreliable or congested machine. To tackle this so-called straggler problem, most
parallel processing frameworks, such as MapReduce, have adopted strategies under which the
system speculatively launches additional copies of a task when its progress is abnormally slow
and idle resources are available. In this paper, we focus on the design of speculative execution
schemes for parallel processing clusters from an optimization perspective under different loading
conditions. For the lightly loaded case, we propose and analyze a cloning scheme, the Smart
Cloning Algorithm (SCA), which is based on maximizing the overall system utility. We also derive the
workload threshold under which SCA should be used for speculative execution. For the heavily loaded
case, we propose the Enhanced Speculative Execution (ESE) algorithm, an extension of the
Microsoft Mantri scheme, to mitigate stragglers. Our simulation results show that SCA reduces the total job
flowtime, i.e., the job delay or response time, by nearly 6% compared to the speculative execution strategy
of Microsoft Mantri. In addition, we show that the ESE algorithm outperforms the Mantri baseline
scheme by 71% in terms of job flowtime while consuming the same amount of computation resources.