
Persistence in MapReduce

Posted on devze.com (https://www.devze.com), 2023-02-02 09:02. Source: the web.


Related topics: mapreduce, persistence

Let's say you have divided your work for the map phase of map/reduce and mapping is running. Now, each unit of work takes about 1 minute. Let's say that you need to stop processing. How would you persist the state of the map/reduce so that you waste the least amount of time when you start back up?

You'd have to memoize the results in a way that lets you skip most of the processing for rows you've already seen. If each row has a candidate key that uniquely identifies it, you can use that key to look the row up in a cache and fetch the stored result instead of reprocessing it.
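As a rough illustration of the idea, here is a minimal single-node sketch that persists each map result to a local SQLite file keyed by the row's candidate key. On restart, rows whose key is already present are skipped. The class `MapCheckpoint`, the function `run_map`, the table name, and the JSON encoding of results are all hypothetical choices for this example, not part of any MapReduce framework.

```python
import json
import sqlite3
from typing import Any, Callable, Dict, Iterable, Tuple


class MapCheckpoint:
    """Durable memo of map outputs keyed by each row's candidate key.

    Hypothetical helper: the table name and JSON encoding are illustrative choices.
    """

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS map_results "
            "(key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        self.conn.commit()

    def lookup(self, key: str) -> Tuple[bool, Any]:
        # Returns (hit, value) so that a stored None still counts as a hit.
        row = self.conn.execute(
            "SELECT value FROM map_results WHERE key = ?", (key,)
        ).fetchone()
        return (False, None) if row is None else (True, json.loads(row[0]))

    def store(self, key: str, value: Any) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO map_results (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        # Commit after every unit: an interruption loses at most the unit in flight.
        self.conn.commit()

    def close(self) -> None:
        self.conn.close()


def run_map(rows: Iterable[Any], key_of: Callable[[Any], str],
            map_fn: Callable[[Any], Any], checkpoint: MapCheckpoint) -> Dict[str, Any]:
    """Apply map_fn to each row, reusing any result already checkpointed."""
    results: Dict[str, Any] = {}
    for row in rows:
        key = key_of(row)
        hit, value = checkpoint.lookup(key)
        if not hit:
            value = map_fn(row)  # the expensive (~1 minute) unit of work
            checkpoint.store(key, value)
        results[key] = value
    return results
```

Committing after each unit trades some write throughput for a small recovery window; with one-minute units, that overhead is negligible. If the job is stopped halfway and restarted against the same file, only the rows without a stored result are recomputed.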

Setting up your cluster with Memcached or Redis would be one approach for achieving memoization.
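For a cluster, the same lookup-then-compute pattern can sit in front of a shared key-value store so that any worker can reuse a result computed before the restart. The sketch below assumes only a client object with `get(key)` returning bytes or `None` and `set(key, value)`. The function name `memoized_map` and the `mapresult:` key prefix are hypothetical.

```python
import json
from typing import Any, Callable, Optional, Protocol


class KVClient(Protocol):
    """Minimal interface assumed of the cache client (get/set only)."""

    def get(self, key: str) -> Optional[bytes]: ...

    def set(self, key: str, value: str) -> Any: ...


def memoized_map(row: Any, key: str, map_fn: Callable[[Any], Any],
                 client: KVClient, prefix: str = "mapresult:") -> Any:
    """Return the cached result for `key`, computing and storing it on a miss."""
    cache_key = prefix + key
    raw = client.get(cache_key)
    if raw is not None:
        # json.loads accepts bytes, which is what the cache returns.
        return json.loads(raw)
    value = map_fn(row)
    client.set(cache_key, json.dumps(value))
    return value
```

With the redis-py package, `redis.Redis(host="localhost", port=6379)` returns a client whose `get` and `set` fit this interface (assuming a reachable Redis server); pymemcache's `Client(("localhost", 11211))` behaves similarly for Memcached. Note that Memcached may evict entries under memory pressure, so a miss must always fall back to recomputing, which this function does.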