Determine if a URL is in the header/footer of a web page given URL, page DOM, parent URL and other page URLs

Developer https://www.devze.com 2023-01-08 17:13 Source: Internet

Related topic: heuristics

Given a URL, the URL of the webpage that first URL is on, the DOM of the webpage, and a list of the rest of the URLs on the webpage, how can I reliably determine if the URL is in the header/footer of the page or if it's in neither?

I'm using C#/.NET.

I know that no solution is perfect, since webpages are not semantically marked up and some websites deliberately obfuscate their pages, but I would like to build some logic that would work for, say, 75% of webpages.

Also, are there other pieces of information that would be helpful to determine the location of the URL in the page?

I think the creative task here is to define "header" and "footer", for example "content less than x units away from the top" or "the last 200 characters on the page". Once you have settled on those rules, you can parse the page and classify each URL against them.
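A minimal sketch of the character-offset rule, assuming the DOM is available as a serialized HTML string. The names `PageRegion`, `LinkRegionHeuristic`, `Classify`, `headerChars` and `footerChars` are hypothetical, and the thresholds are placeholders you would have to tune against real pages. It only looks at the first occurrence of the URL inside `<body>`, so a link that appears in both the header and the footer is reported as header.

```csharp
using System;

public enum PageRegion { Header, Footer, Neither }

public static class LinkRegionHeuristic
{
    // Classifies the first occurrence of `url` in `html` by its character offset.
    // headerChars / footerChars are assumed thresholds, not values from any standard.
    public static PageRegion Classify(string html, string url,
                                      int headerChars, int footerChars)
    {
        if (string.IsNullOrEmpty(html) || string.IsNullOrEmpty(url))
            return PageRegion.Neither;

        // Measure distances inside <body> so that <head> markup is not counted.
        int bodyStart = html.IndexOf("<body", StringComparison.OrdinalIgnoreCase);
        if (bodyStart < 0) bodyStart = 0;
        int bodyEnd = html.LastIndexOf("</body", StringComparison.OrdinalIgnoreCase);
        if (bodyEnd <= bodyStart) bodyEnd = html.Length;

        int pos = html.IndexOf(url, bodyStart, StringComparison.Ordinal);
        if (pos < 0 || pos >= bodyEnd)
            return PageRegion.Neither;

        // "Less than x units away from the top" of the body.
        if (pos - bodyStart < headerChars) return PageRegion.Header;

        // "The last N characters on the page".
        if (bodyEnd - pos <= footerChars) return PageRegion.Footer;

        return PageRegion.Neither;
    }
}
```

For instance, `LinkRegionHeuristic.Classify(html, "/contact", 100, 100)` checks whether `/contact` first appears within 100 characters of either end of the body. On very short pages the two windows overlap and the header rule wins, which is one reason a real implementation would probably combine this with structural signals such as an enclosing `<header>`, `<footer>` or `<nav>` element, or whether the same URL also shows up on the parent page.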