Problem logging into Facebook with Scrapy

Developer (https://www.devze.com), 2023-03-28 13:14, source: web
(I have asked this question on the Scrapy google-group without luck.)

I am trying to log into Facebook using Scrapy. I tried the following in the interactive shell:

I set the headers and created a request as follows:

from scrapy.http import Request, FormRequest

header_vals = {
    'Accept-Language': ['en'],
    'Content-Type': ['application/x-www-form-urlencoded'],
    'Accept-Encoding': ['gzip,deflate'],
    'Accept': ['text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'],
    'User-Agent': ['Mozilla/5.0 Gecko/20070219 Firefox/2.0.0.2'],
}

login_request = Request('https://www.facebook.com/login.php', headers=header_vals)

fetch(login_request)

I get redirected:

2011-08-11 13:54:54+0530 [default] DEBUG: Redirecting (meta refresh) to <GET https://www.facebook.com/login.php?_fb_noscript=1> from <GET https://www.facebook.com/login.php>

. . .

[s]   request    <GET https://www.facebook.com/login.php> 

[s]   response   <200 https://www.facebook.com/login.php?_fb_noscript=1> 

I guess it shouldn't be redirected there if I am supplying the right headers?

I still attempt to go ahead and supply login details using the FormRequest as follows:

new_request = FormRequest.from_response(
    response,
    formname='login_form',
    formdata={'email': '...@email.com', 'pass': 'password'},
    headers=header_vals,
)

new_request.meta['download_timeout'] = 180

new_request.meta['redirect_ttl'] = 30

fetch(new_request) results in:

2011-08-11 14:05:45+0530 [default] DEBUG: Redirecting (meta refresh) to <GET https://www.facebook.com/login.php?login_attempt=1&_fb_noscript=1> from <POST https://www.facebook.com/login.php?login_attempt=1>

. . .

[s]   response   <200 https://www.facebook.com/login.php?login_attempt=1&_fb_noscript=1> 

.

What am I missing here? Thanks for any suggestions and help.

I'll add that I've also tried this with a BaseSpider to see if this was a result of the cookies not being passed along in the shell, but it doesn't work there either.

I was able to use Mechanize to log on successfully. Can I take advantage of this to somehow pass cookies on to Scrapy?


Notice the "(meta refresh)" text next to "Redirecting" in the log. Facebook uses a noscript tag to automatically redirect clients without JavaScript to "/login.php?_fb_noscript=1". The problem is that you are posting to "/login.php" instead, so you always get redirected by the meta refresh.
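To make the meta refresh mechanism concrete, here is a minimal sketch that looks for a meta refresh target in a page using only the Python standard library. The sample markup, the `MetaRefreshFinder` class, and the `find_meta_refresh` helper are hypothetical and only modelled on the behaviour described above; this is not Facebook's actual page source, and it is not how Scrapy implements the redirect.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collect the target URL of every <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0; URL=/login.php?_fb_noscript=1".
        content = attrs.get("content") or ""
        _, _, rest = content.partition(";")
        key, _, url = rest.strip().partition("=")
        if key.strip().lower() == "url" and url:
            self.targets.append(url.strip().strip("'\""))


def find_meta_refresh(html):
    """Return a list of meta refresh target URLs found in the HTML text."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.targets


# Hypothetical markup for illustration only.
SAMPLE = """
<html><head>
<noscript><meta http-equiv="refresh" content="0; URL=/login.php?_fb_noscript=1" /></noscript>
</head><body><form id="login_form" action="/login.php?login_attempt=1"></form></body></html>
"""

print(find_meta_refresh(SAMPLE))
```

A client that does not run JavaScript honours the tag inside noscript, which is why every fetch in the logs above ends up on the "_fb_noscript=1" URL. In recent Scrapy versions, as far as I know, the meta refresh handling can be switched off with the METAREFRESH_ENABLED setting or per request with meta['dont_redirect'], but check the documentation for the version you run.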

Even if you get past this problem, it is against Facebook's robots.txt, so you shouldn't really be doing this.
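If you want to check a URL against a site's robots.txt rules before crawling, the standard library can do it. The following is a small sketch; the rules in `SAMPLE_ROBOTS`, the "mybot" user agent, and the example.com URLs are made up for illustration and are not Facebook's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; not any real site's robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /login.php
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS, "mybot", "https://www.example.com/login.php"))
print(is_allowed(SAMPLE_ROBOTS, "mybot", "https://www.example.com/about"))
```

Scrapy also has a ROBOTSTXT_OBEY setting that makes the crawler apply these rules itself.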

Why don't you just use the Facebook Graph API?
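On the question of reusing a Mechanize session: one possible approach is to flatten the cookie jar into a plain dict and hand that to Scrapy. The sketch below uses the standard library's `http.cookiejar` as a stand-in, on the assumption that the Mechanize cookie jar can be iterated in the same way and yields cookies with `name`, `value`, and `domain` attributes. The helper names, cookie names, and domains are hypothetical.

```python
from http.cookiejar import Cookie, CookieJar


def cookiejar_to_dict(jar, domain_suffix):
    """Return {name: value} for cookies whose domain ends with domain_suffix."""
    return {c.name: c.value for c in jar if c.domain.endswith(domain_suffix)}


def make_cookie(name, value, domain):
    """Build a session cookie for the demo below (illustration only)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=True, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# Stand-in for the jar a logged-in Mechanize browser would hold.
jar = CookieJar()
jar.set_cookie(make_cookie("session_id", "abc123", ".example.com"))
jar.set_cookie(make_cookie("tracker", "xyz", ".other.org"))

cookies = cookiejar_to_dict(jar, "example.com")
print(cookies)
```

Scrapy's Request accepts a cookies argument, so the resulting dict could be passed as Request(url, cookies=cookies). Whether the site accepts a session transplanted this way is a separate matter, and the robots.txt concern above still applies.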
