
Parallel process an intensive IO function

https://www.devze.com 2023-03-12 20:59 Source: web

I have this sample code.

List<Dictionary<string,string>> objects = new List<Dictionary<string,string>>();

foreach (string url in urls)
{
    objects.Add(processUrl(url));
}

I need to process each URL: processUrl downloads the page, runs many regexes to extract some information, and returns a "C# JSON-like" object (a Dictionary<string,string>). I want to run this in parallel, and at the end I need a list of the objects, so I have to wait for all the tasks to finish before continuing. How can I accomplish this? I have seen many examples, but none of them saves the return value.

Regards


Like this?

var results = urls.AsParallel().Select(processUrl).ToList();

With Parallel:

Parallel.ForEach(
    urls,
    url =>
    {
        var result = processUrl(url);
        lock (syncObject)
            objects.Add(result);
    });

or

var objects = new ConcurrentBag<Dictionary<string,string>>();
Parallel.ForEach(urls, url => objects.Add(processUrl(url)));
var result = objects.ToList();

or with Tasks:

var tasks = urls
    .Select(url => Task.Factory.StartNew(() => processUrl(url)))
    .ToArray();

Task.WaitAll(tasks);
var results = tasks.Select(arg => arg.Result).ToList();


First, refactor as

processUrl(url, objects);

and make the task responsible for adding the results to the list.

Then add locking so two parallel tasks don't try to use the results list at exactly the same time.
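A minimal sketch of that refactoring, assuming the original single-argument processUrl still does the download-and-regex work; the names resultsLock and the two-argument overload are illustrative:

```csharp
// Illustrative sketch: the task itself adds its result to the shared list.
private static readonly object resultsLock = new object();

static void processUrl(string url, List<Dictionary<string, string>> objects)
{
    // Existing single-argument overload does the actual work.
    Dictionary<string, string> result = processUrl(url);

    // Locking ensures two parallel tasks never mutate the list at once.
    lock (resultsLock)
    {
        objects.Add(result);
    }
}
```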


Note: async support in the next version of .NET will make this trivially easy.
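Under that async model (C# 5 / .NET 4.5, which was still upcoming when this answer was written), the whole thing would read roughly like this sketch; ProcessAllAsync is an illustrative name, not part of the question's code:

```csharp
// Sketch only, assuming .NET 4.5: fan out the work, then await all results.
static async Task<List<Dictionary<string, string>>> ProcessAllAsync(IEnumerable<string> urls)
{
    var tasks = urls.Select(url => Task.Run(() => processUrl(url)));
    var results = await Task.WhenAll(tasks); // completes when every task has finished
    return results.ToList();
}
```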


You can use the Parallel class from the Task Parallel Library; this requires .NET 4.0:

System.Threading.Tasks.Parallel.ForEach(urls, url =>
{
    var result = processUrl(url);
    lock (objects)
    {
        objects.Add(result);
    }
});
