
Get URL contents from a sign-in site

Developer https://www.devze.com 2023-03-05 17:15 Source: Internet

I want to get the source code of a site using PHP, but when I do, it comes up empty. I believe this is because you have to sign in to the site before using it, and the PHP call is not made in a signed-in session when it asks for the contents. Is this correct? Is there a way to get around this, or to send the sign-in username and password from PHP so the call can be made afterwards?

I also tried logging in to the site in my browser and then calling my *.php file on localhost from the same browser, but it didn't work.

This is an example of a site that requires sign-in: I want to get the source of the page shown when I open my mailbox. This is how I would normally get a site's contents, but it comes up empty:

$url = "http://mail.yahoo.com/mc/welcome".$params;
$pagesource = file_get_contents( $url );

echo $pagesource;

This code works if I call it with, for example, $url = "http://stackoverflow.com/users/432539/elcool"; which is my profile page and is publicly available without signing in.

Any ideas?


You'll need to use something like cURL to emulate submitting the login form by sending its POST request to the remote server.

See this post here for a simple example: http://davidwalsh.name/execute-http-post-php-curl
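A minimal sketch of that POST in PHP with cURL. It assumes a hypothetical login form that posts fields named username and password to $loginUrl; check the real form's action URL and input names in the page source, since many sites also expect hidden fields (for example an anti-CSRF token) that would have to be scraped from the form first. The function names are illustrative only.

```php
<?php
// Sketch: submit a login form with cURL the way a browser would.
// Assumption (hypothetical): the form posts "username"/"password" to $loginUrl.

/**
 * Build the cURL options for a form-encoded POST to the login URL.
 */
function loginRequestOptions(string $loginUrl, array $fields): array
{
    return [
        CURLOPT_URL            => $loginUrl,
        CURLOPT_POST           => true,
        // A string body is sent as application/x-www-form-urlencoded,
        // the same encoding an HTML form submit uses.
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        // Make curl_exec() return the response body instead of printing it.
        CURLOPT_RETURNTRANSFER => true,
        // Follow the redirect that login handlers commonly send on success.
        CURLOPT_FOLLOWLOCATION => true,
    ];
}

/**
 * Send the login POST and return the response body.
 */
function postLogin(string $loginUrl, array $fields): string
{
    $ch = curl_init();
    curl_setopt_array($ch, loginRequestOptions($loginUrl, $fields));
    $body = curl_exec($ch);
    if ($body === false) {
        // false signals a transport error, not rejected credentials.
        throw new RuntimeException('Login request failed: ' . curl_error($ch));
    }
    return $body;
}

// Usage (hypothetical URL and credentials):
// $html = postLogin('https://example.com/login', ['username' => 'me', 'password' => 'secret']);
```

On its own this only logs in; to stay signed in for the next request, the session cookie the server returns has to be kept and sent back.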

I would, though, check whether the remote site offers an API you can use to authenticate and get the data you're looking for, because the approach you're taking (known as web scraping) is unreliable and may even be illegal, depending on the remote site's Terms of Service.


Yes. First you need to make a login request to the site, and use curl_setopt to set CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE so that cURL keeps track of the session cookies when you make later calls as an authenticated user.
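A minimal sketch of that flow, assuming a hypothetical login form that posts username/password to $loginUrl and a protected page at $pageUrl (both names, and the helper functions, are illustrative). Both requests reuse one cURL handle, so cookies set by the login response are sent with the second request; CURLOPT_COOKIEJAR additionally writes them to $cookieFile when the handle is released, and CURLOPT_COOKIEFILE lets a later run read them back.

```php
<?php
// Sketch: log in once, then fetch a page as the signed-in user.
// Assumptions (hypothetical): form field names, $loginUrl, $pageUrl.

/**
 * Options shared by every request in the session.
 */
function sessionOptions(string $cookieFile): array
{
    return [
        // Write received cookies to this file when the handle is released.
        CURLOPT_COOKIEJAR      => $cookieFile,
        // Read cookies from this file and enable cURL's cookie engine
        // (a missing or empty file still enables it).
        CURLOPT_COOKIEFILE     => $cookieFile,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
    ];
}

function fetchAsLoggedInUser(string $loginUrl, array $credentials, string $pageUrl, string $cookieFile): string
{
    $ch = curl_init();
    curl_setopt_array($ch, sessionOptions($cookieFile));

    // 1) Log in: Set-Cookie headers in the response are stored by the cookie engine.
    curl_setopt_array($ch, [
        CURLOPT_URL        => $loginUrl,
        CURLOPT_POST       => true,
        CURLOPT_POSTFIELDS => http_build_query($credentials),
    ]);
    if (curl_exec($ch) === false) {
        throw new RuntimeException('Login request failed: ' . curl_error($ch));
    }

    // 2) Switch the same handle back to GET; stored cookies are sent automatically.
    curl_setopt_array($ch, [
        CURLOPT_URL     => $pageUrl,
        CURLOPT_HTTPGET => true,
    ]);
    $page = curl_exec($ch);
    if ($page === false) {
        throw new RuntimeException('Page request failed: ' . curl_error($ch));
    }
    return $page;
}

// Usage (hypothetical URLs and credentials):
// $cookieFile = tempnam(sys_get_temp_dir(), 'cookies');
// $html = fetchAsLoggedInUser('https://example.com/login',
//     ['username' => 'me', 'password' => 'secret'],
//     'https://example.com/inbox', $cookieFile);
```

Reusing one handle avoids depending on when the cookie file is flushed to disk; the file matters only if a later script run should skip logging in again.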
