开发者

Python : run multiple queries in parallel and get the first finished

开发者 https://www.devze.com 2023-03-16 14:32 出处:网络
I try to create a Python script that performs queries to multiple sites. The script works well (I use urllib2) but just for one link. For multiples sites, I make multiple requests one after the other

I try to create a Python script that performs queries to multiple sites. The script works well (I use urllib2) but just for one link. For multiples sites, I make multiple requests one after the other but it is not very powerful.

What is the ideal solution (the threads I guess) to run multiple queries in parallel and stop others when a query returns a specific string please ?

I found this question but I have not found how to change it to stop the remaini开发者_运维问答ng threads... : Python urllib2.urlopen() is slow, need a better way to read several urls

Thank you in advance !

(sorry if I made mistakes in English, I'm French ^^)


You can use Twisted to deal with multiple requests concurrently. Internally it will use epoll (or iocp or kqueue depending on the platform) to get notified of tcp availability efficently, which is cheaper than using threads. Once one request matches, you cancel the others.

Here is the Twisted http agent tutorial.


Usually this is implemented with the following pattern (sorry, my Python skills are not so good).

You have a class named Runner. This class has long running method, which gets the information you need. Also, it has a Cancel method, which interrupts the long running method in some way (you can make the url request object a class member field, so the cancel class calls the equivalent of request.terminate()).

The long running method need to accept a callback function, which to signal when done.

Then, before you start your many threads, you create instances of all these objects of that class, and keep them in a list. In the same loop you can start these long running methods, passing a callback method of your main program.

And, in the callback method, you just go trough the list of all threaded classes and call their cancel method.

Please, edit my answer with any Python specific implementation :)


You can run your queries with the multiprocessing library, poll for results, and shutdown queries you no longer need. Documentation for the module includes information on the Process class which has a terminate() method. If you wish to limit the number of requests sent out, check out options for pooling.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号