Closed 9 years ago.
Is there a way to crawl all Facebook fan pages and collect some information? For example, crawling Facebook fan pages and saving their names, how many fans they have, and so on? Or at least, do you have a hint of how this could be done?
Write a crawler.
I used Coca-Cola's page as an experiment: http://www.facebook.com/cocacola?v=wall
Parse out the "Fans" div, which contains an "All Fans" link. If you view the page source in your web browser, the link looks like this: /social_graph.php?node_id=40796308305&class=FanManager
Turn that into a facebook URL and crawl it: http://www.facebook.com/social_graph.php?node_id=40796308305&class=FanManager
Parse out the fans, then parse out the "Next page" link.
Repeat, ad nauseam.
Throttle your requests so Facebook doesn't blacklist you.
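The loop described above (parse the fans, follow the "Next page" link, throttle) could be sketched roughly as follows in Python, using only the standard library. The markup is an assumption made for illustration: fans are taken to be `<a class="fan">` elements and the pagination link `<a class="next">`, which will not match Facebook's real HTML. `FanPageParser`, `crawl_fans`, and the `fetch` callable are hypothetical names, not part of any Facebook API.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urljoin


class FanPageParser(HTMLParser):
    """Collects fan names and the "Next page" link from one page of HTML.

    ASSUMPTION: fans are <a class="fan"> and pagination is <a class="next">.
    Adjust both to whatever the real page source contains.
    """

    def __init__(self):
        super().__init__()
        self.fans = []
        self.next_href = None
        self._in_fan = False

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "fan" in classes:
            self._in_fan = True
        elif "next" in classes:
            self.next_href = attrs.get("href")

    def handle_data(self, data):
        # Only text inside a fan link counts as a fan name.
        if self._in_fan and data.strip():
            self.fans.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_fan = False


def crawl_fans(start_url, fetch, delay_seconds=2.0, max_pages=50):
    """Follow "Next page" links from start_url, pausing between requests.

    fetch is any callable that takes a URL and returns the page HTML as str.
    """
    fans = []
    url = start_url
    pages = 0
    while url and pages < max_pages:
        parser = FanPageParser()
        parser.feed(fetch(url))
        fans.extend(parser.fans)
        pages += 1
        # Relative links such as /social_graph.php?... become absolute URLs.
        url = urljoin(url, parser.next_href) if parser.next_href else None
        if url:
            time.sleep(delay_seconds)  # throttle before the next request
    return fans
```

Passing `fetch` in as a parameter keeps the loop independent of how pages are downloaded, so it can be tried against saved HTML first. The `max_pages` cap is there so that a pagination loop cannot run forever.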
First, select a listing page that covers the category of pages you want:
For example: http://www.facebook.com/pages/ or http://www.facebook.com/pages/?browse&ps=93
Then use a crawler to collect all the page links.
Now you can parse each page separately using the extracted links.
You can use Simple HTML DOM for crawling.
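As a rough illustration of the link-collection step, here is a minimal Python sketch that uses the standard library's `html.parser` in place of Simple HTML DOM. It assumes that links to fan pages can be recognised by `/pages/` appearing in the URL, which is a guess based on the example URLs above. `LinkCollector` and `extract_page_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Gathers every <a href="..."> on a page as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def extract_page_links(html, base_url, must_contain="/pages/"):
    """Return unique links, in document order, whose URL contains must_contain.

    ASSUMPTION: fan-page links are identifiable by "/pages/" in the URL.
    """
    collector = LinkCollector(base_url)
    collector.feed(html)
    seen = set()
    result = []
    for link in collector.links:
        if must_contain in link and link not in seen:
            seen.add(link)
            result.append(link)
    return result
```

Each URL returned by `extract_page_links` can then be fetched and parsed on its own, as described above.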
Download and run the WebSPHINX jar. Enter http://www.facebook.com/pages/ under Starting URLs and select the subtree as Crawl. Don't forget to increase the Page Size and Page Timeout values. A higher number of threads (100-200) has a better chance of crawling more pages successfully.