
Is it possible to open multiple connections to multiple sites using only one thread?

Developer https://www.devze.com 2023-04-10 02:29 Source: Internet

Update

I already use a FixedThreadPool. At the moment, each thread opens one connection to one site. What I want is something asynchronous:

  1. Send a request to a server.
  2. Move on to the next request without waiting for the first one to complete.
  3. When a connection is established, notify another thread that it is ready for download.

I think this will speed up execution, because it would use fewer threads to open the same number of connections (or more) than the current approach does.

Currently, each thread sits idle while it waits for its connection to be established. With the new approach, it would always be working.


The Question

I want to know if there is a way to open connections to multiple sites with only one thread.

I'm writing a web crawler. I already use one thread per connection, but beyond a certain number of threads this stops helping, because the threads increasingly compete for the processor.

I want this to increase the number of pages downloaded. Is it possible? How?

This code opens a connection and does some processing. It is executed by the threads that open connections:

/*
 * Open connection to a server
 */
boolean openConnection(Link link) throws Exception {
    //set the connection parameters
    HttpURLConnection conn = (HttpURLConnection) new URL(link.getOriginalURL().getURL()).openConnection();
    conn.setRequestProperty("User-Agent", ROBOT_NAME);
    conn.setInstanceFollowRedirects(true);
    conn.setConnectTimeout(READ_TIMEOUT);
    conn.setReadTimeout(READ_TIMEOUT);
    link.setConnection(conn);
    //open the connection
    conn.connect();        
    //check the server answer
    if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
        return false;
    }
    //analyse the URL of the redirected URL
    urlAnalyzer.fillURL(link.getRedirectedURL(), getRedirectedURL(link.getConnection()));
    return true;
}

This starts the connection openers, each one in its own thread:

/*
 * Start the execution of the connection openers     
 */
private void executeConnectionOpeners() {
    LOGGER.info("Starting connection openers.");
    /* Execution */
    NameThreadFactory ntf = new NameThreadFactory("Connection Opener");
    crawlerOpenerExecutor = Executors.newFixedThreadPool(nOpeners, ntf);
    for (int i = 0; i < nOpeners; i++) {
        crawlerOpenerExecutor.submit(new ConnectionOpener(this));
    }
    /* End of execution */
    LOGGER.info(nOpeners + " connection openers created and running.");
}


Fetching web pages isn't a particularly processor-intensive job: you're going to spend almost all of your time waiting for the network unless you're fetching a lot of small pages over very fast local connections.

Of course, you should look at how many threads it's actually worth using, via benchmarking - you'll probably want to have a fixed set of threads working off a shared producer/consumer queue. (You don't want to create a genuine new thread for each request.)
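As a rough illustration of the "fixed set of threads working off a shared producer/consumer queue" idea, here is a minimal sketch. The class name QueueCrawler, the crawl method, and the fetcher parameter are hypothetical names introduced for this example; the fetcher stands in for whatever code actually opens the connection and downloads the page.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

public class QueueCrawler {
    // Unique sentinel object; compared by identity so it cannot clash with a real URL.
    private static final String POISON = new String("POISON");

    /**
     * Runs nWorkers threads that all pull URLs from one shared queue.
     * "fetcher" is a placeholder (an assumption of this sketch) for the real download code.
     */
    public static Map<String, String> crawl(List<String> urls, int nWorkers,
                                            Function<String, String> fetcher)
            throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(urls);
        Map<String, String> results = new ConcurrentHashMap<>();
        // One poison pill per worker, queued after the real work.
        for (int i = 0; i < nWorkers; i++) {
            queue.add(POISON);
        }
        ExecutorService pool = Executors.newFixedThreadPool(nWorkers);
        for (int i = 0; i < nWorkers; i++) {
            pool.submit(() -> {
                try {
                    while (true) {
                        String url = queue.take();   // blocks until work is available
                        if (url == POISON) {
                            return;                  // no more work for this worker
                        }
                        try {
                            results.put(url, fetcher.apply(url));
                        } catch (RuntimeException e) {
                            results.put(url, "ERROR: " + e); // keep the worker alive
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return results;
    }
}
```

Timing crawl with different values of nWorkers is one simple way to do the benchmarking suggested above.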

Now it should be possible to use only a few threads if you can perform the fetch asynchronously (potentially with NIO), but I would personally check first whether the "separate threads" approach is actually maxing out your CPU. The threaded approach will probably keep the code much simpler than asynchrony would, and if the bottleneck is really the network, going asynchronous leaves you with harder-to-maintain code for little (if any) benefit.
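For reference, this is roughly what opening several connections from a single thread looks like with NIO. It is only a sketch: SelectorConnector and connectAll are hypothetical names, and it establishes raw TCP connections only. HttpURLConnection is a blocking API and cannot be registered with a Selector, so with this approach the HTTP request and response handling would have to be written on top of the channels.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SelectorConnector {
    /** Starts all connects without blocking, then waits for them on one thread. */
    public static List<InetSocketAddress> connectAll(List<InetSocketAddress> targets,
                                                     long timeoutMillis) throws IOException {
        List<InetSocketAddress> connected = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            int pending = 0;
            for (InetSocketAddress target : targets) {
                SocketChannel ch = SocketChannel.open();
                ch.configureBlocking(false);
                if (ch.connect(target)) {
                    // Connected immediately (can happen for local addresses).
                    connected.add(target);
                    ch.close();
                } else {
                    // Connect in progress: ask the selector to tell us when it finishes.
                    ch.register(selector, SelectionKey.OP_CONNECT, target);
                    pending++;
                }
            }
            long deadline = System.currentTimeMillis() + timeoutMillis;
            while (pending > 0) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    break; // overall timeout reached
                }
                selector.select(remaining);
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isConnectable()) {
                        continue;
                    }
                    SocketChannel ch = (SocketChannel) key.channel();
                    pending--;
                    try {
                        if (ch.finishConnect()) {
                            // A real crawler would hand the channel to a downloader here.
                            connected.add((InetSocketAddress) key.attachment());
                        }
                    } catch (IOException e) {
                        // Connection refused or otherwise failed; skip this target.
                    } finally {
                        key.cancel();
                        ch.close();
                    }
                }
            }
            // Close channels that were still pending when the timeout expired.
            for (SelectionKey key : selector.keys()) {
                key.channel().close();
            }
        }
        return connected;
    }
}
```

The sketch closes each channel as soon as the connection is established, purely to stay short; the amount of bookkeeping already needed gives a sense of the extra complexity mentioned above.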


Take a look at Java 7's AsynchronousSocketChannel and see if you like it. Basically, you issue a read request, and when bytes are available, it calls your callback. Of course, the callback must be invoked on some thread; you have some options for configuring the threading policy.
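A minimal sketch of the callback style, applied to the connect step rather than to reads, might look like the following. AsyncConnector and connectAll are hypothetical names for this example. It uses the default channel group, so the callbacks run on threads managed by the JDK, and like any socket-level approach it leaves the HTTP protocol itself to you.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class AsyncConnector {
    /** Issues every connect at once; callbacks fire as each one completes or fails. */
    public static List<InetSocketAddress> connectAll(List<InetSocketAddress> targets,
                                                     long timeoutMillis)
            throws IOException, InterruptedException {
        List<InetSocketAddress> connected = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(targets.size());
        for (InetSocketAddress target : targets) {
            AsynchronousSocketChannel ch = AsynchronousSocketChannel.open();
            // connect() returns immediately; the handler is invoked later on a pool thread.
            ch.connect(target, target, new CompletionHandler<Void, InetSocketAddress>() {
                @Override
                public void completed(Void result, InetSocketAddress addr) {
                    // A real crawler would start reading or queue the channel for download.
                    connected.add(addr);
                    closeQuietly(ch);
                    done.countDown();
                }

                @Override
                public void failed(Throwable exc, InetSocketAddress addr) {
                    closeQuietly(ch);
                    done.countDown();
                }
            });
        }
        // The calling thread waits here only so the sketch can return a result.
        done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        return connected;
    }

    private static void closeQuietly(AsynchronousSocketChannel ch) {
        try {
            ch.close();
        } catch (IOException ignored) {
            // Nothing useful to do if close fails.
        }
    }
}
```

To control which threads run the callbacks, a channel can instead be opened with an AsynchronousChannelGroup built on your own executor.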


I've used xlightweb for a similar purpose, i.e. asynchronous HTTP.
