Is there a way to crawl all facebook fan pages? [closed]

Developer (https://www.devze.com), 2022-12-25 04:43. Source: web
Closed. This question needs to be more focused. It is not currently accepting answers.

Want to improve this question? Update the question so it focuses on one problem only by editing this post.

Closed 9 years ago.

Is there a way to crawl all Facebook fan pages and collect some information from them? For example, crawling the fan pages and saving their names, how many fans they have, and so on? Or at least, do you have a hint of how this could possibly be done?


Write a crawler.

  • I used Coca-Cola's page as an experiment: http://www.facebook.com/cocacola?v=wall

  • Parse out the "Fans" div, which contains an "All Fans" link. If you view the page source in your web browser, the link looks like this: /social_graph.php?node_id=40796308305&class=FanManager

  • Turn that into a facebook URL and crawl it: http://www.facebook.com/social_graph.php?node_id=40796308305&class=FanManager

  • Parse out the fans, then parse out the "Next page" link.

  • Repeat, ad nauseam.

  • Throttle your requests so facebook doesn't blacklist you.
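The loop described above (parse the fans, follow "Next page", throttle) could be sketched roughly as follows in Python. This is only an illustration under assumptions: the idea that fan links contain `profile.php`, the exact "Next page" link text, and the names `parse_fans_page`, `crawl_fans` and `fetch` are all hypothetical, and the markup of the real pages may differ. The `fetch` argument is any function you supply that returns the HTML for a URL.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def parse_fans_page(html, base_url):
    """Return (fan names, absolute next-page URL or None)."""
    parser = LinkParser()
    parser.feed(html)
    fans, next_url = [], None
    for href, text in parser.links:
        if text == "Next page":
            next_url = urljoin(base_url, href)
        elif "profile.php" in href:  # assumption: fan links look like this
            fans.append(text)
    return fans, next_url


def crawl_fans(start_url, fetch, delay_seconds=2.0, max_pages=50):
    """Follow "Next page" links, sleeping between requests."""
    fans, url, pages = [], start_url, 0
    while url and pages < max_pages:
        page_fans, url_next = parse_fans_page(fetch(url), url)
        fans.extend(page_fans)
        pages += 1
        url = url_next
        if url:
            time.sleep(delay_seconds)  # throttle so you are less likely to be blocked
    return fans
```

The `max_pages` cap is there so that a "Next page" link that loops back on itself cannot keep the crawler running forever.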


First, pick a listing page that covers the category of pages you want.

For example: http://www.facebook.com/pages/ or http://www.facebook.com/pages/?browse&ps=93

Then use a crawler to collect all the page links from it.

Now you can parse each page separately, using the extracted links.

You can use Simple HTML DOM to do the parsing.
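As a rough sketch of the link-collection step, here is one way to do it with Python's standard library instead of Simple HTML DOM. It assumes that links to individual fan pages live under the `/pages/` path on the same host as the listing; that pattern, and the names `HrefCollector` and `extract_page_links`, are assumptions for illustration only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_page_links(html, base_url, prefix="/pages/"):
    """Return unique absolute links under `prefix`, in document order."""
    collector = HrefCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    seen, result = set(), []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)
        parsed = urlparse(absolute)
        if parsed.netloc != host:
            continue  # skip links that leave the site
        # assumption: individual pages sit below the listing path itself
        if parsed.path.startswith(prefix) and parsed.path != prefix:
            if absolute not in seen:
                seen.add(absolute)
                result.append(absolute)
    return result
```

Each URL in the returned list can then be fetched and parsed on its own.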


Download and run the WebSPHINX jar. Enter http://www.facebook.com/pages/ under Starting URLs and select "the subtree" as the Crawl option. Don't forget to increase the Page Size and Page Timeout values. A higher number of threads (100-200) gives a better chance of crawling more pages successfully.
