I'm currently developing a Firefox plugin. This plugin has to handle some really crappy websites that are very badly formatted. I cannot modify these websites, so I have to cope with them.
I reduced the bug I'm facing to a short sample of HTML (if that name is appropriate for a horror like this):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>Some title.</title>
<!-- Oh yes ! -->
<div style="visability:hidden;">
<a href="//example.com"> </a>
</div>
<!-- If meta are reduced, then the bug disapears ! -->
<meta name="description" content="Homepage of Company.com, Company's corporate Web site" />
<meta name="keywords" content="Company, Company & Co., Inc., blablabla, blablabla, blablabla, blablabla, blablabla, 开发者_如何学运维blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla" />
<meta http-equiv="Content-Language" content="en-US" />
<meta http-equiv="content-type" content="text/html; charset=utf-8"/>
</head>
<body class="homePage">
<div class="globalWrapper"><a href="/page.html">My gorgeous link !</a></div>
</body>
</html>
When opening the web page, « My gorgeous link ! » is displayed and clickable. However, when I explore the DOM with JavaScript in my plugin, everything (DOM traversal and the innerHTML property) behaves as if the code were this:
<html>
<head>
<title>Some title.</title>
<!-- Oh yes ! -->
</head><body><div style="visability:hidden;">
<a href="//example.com"> </a>
</div>
<!-- If meta are reduced, then the bug disapears ! -->
<meta name="description" content="Homepage of Company.com, Company's corporate Web site">
<meta name="keywords" content="Company, Company & Co., Inc., blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla, blablabla">
<meta http-equiv="Content-Language" content="en-US">
</body>
</html>
So, when exploring the DOM from within the plugin, the document has somehow been fixed by Firefox. But this fixed DOM is inconsistent with what is displayed on the web page, so my plugin doesn't behave as expected.
I'm really puzzled by this issue. The problem exists in both Firefox 3.6 and Firefox 4 (I haven't tested Firefox 5 yet). Oddly, shortening the meta tags makes the issue go away.
Where does this discrepancy come from? How can I handle it?
EDIT: Given the answers I got, I think I should be a little more precise. I do know what Firefox is doing when it modifies the web page as shown in the second code snippet. The problem is this: in the fixed DOM that I get in my plugin, the gorgeous link doesn't appear anywhere, yet the link is actually visible on the web page, and it works. So the DOM I'm manipulating and the DOM of the web page are different: they have been fixed in different ways. So where does this difference in fixing behaviour come from, and how can I handle it? In other words, how can my plugin become aware of the existence of the gorgeous link?
NB: Exploring the DOM with Firebug shows a different DOM from the one I get in my module. Both DOMs have been fixed by Firefox, but in different ways. I get the DOM like this:
var html = browser.contentDocument.documentElement;
// Then, for example :
html.getElementsByTagName('a'); // Returns only the <a> element from the head. On the web page, only the <a> from the body appears.
DOM exploration with Firebug shows that the div and the a within the head have been removed, which is yet another behaviour.
EDIT²: The code in my plugin is run after the page has finished loading, through this mechanism:
gBrowser.addTabsProgressListener({
    onStateChange: function(aBrowser, aWebProgress, aRequest, aStateFlags, aStatus) {
        if (aStateFlags & Components.interfaces.nsIWebProgressListener.STATE_STOP) {
            // Some operations including the DOM parsing here
        }
    }
});
I tried to reproduce your issue and failed: everything seemed to work fine in Firefox 5. My psychic powers tell me that you are trying to access the document before it has finished loading. That's why you need a lengthy meta tag: the document body then downloads in two network packets, and you are looking at the document when only the first packet has been received. Wait for the DOMContentLoaded event before accessing the document. Or, if you are using a progress listener, wait for an onStateChange call with both the STATE_STOP and STATE_IS_DOCUMENT flags set.
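The progress-listener check described above could be sketched roughly as follows. This is only a sketch under assumptions: the names isDocumentLoadFinished, makeListener and onReady are made up for illustration, and the numeric flag values are copied in as stand-ins so the snippet runs outside Firefox. Inside the extension, read them from Components.interfaces.nsIWebProgressListener instead.

```javascript
// Stand-in flag values (assumption: they mirror nsIWebProgressListener;
// real extension code should use Components.interfaces.nsIWebProgressListener).
const STATE_STOP = 0x00000010;
const STATE_IS_DOCUMENT = 0x00020000;

// Hypothetical helper: true only when BOTH flags are set in aStateFlags.
function isDocumentLoadFinished(aStateFlags) {
  const wanted = STATE_STOP | STATE_IS_DOCUMENT;
  return (aStateFlags & wanted) === wanted;
}

// Builds a listener object with the same shape as the one in the question.
// onReady is a hypothetical callback that receives the browser element.
function makeListener(onReady) {
  return {
    onStateChange: function (aBrowser, aWebProgress, aRequest, aStateFlags, aStatus) {
      if (isDocumentLoadFinished(aStateFlags)) {
        onReady(aBrowser); // the DOM work would go here
      }
    }
  };
}
```

In the plugin, the object returned by makeListener would presumably be passed to gBrowser.addTabsProgressListener, as in the question.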
Where does this discrepancy come from ?
The start and end tags for the <head> and <body> elements are optional in HTML 4.
While inside the <head>, if something is encountered that must appear in the <body>, the <head> is automatically terminated and the <body> started.
The explicit </head><body> tags are then ignored as errors.
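As a rough illustration of that rule, here is a toy model, not Firefox's actual parser. It takes element names in source order and splits them at the first one that is not allowed in <head>. The function name splitHeadBody and the hard-coded list of head elements (taken from HTML 4 Transitional) are assumptions made for this sketch.

```javascript
// Elements HTML 4 allows inside <head> (assumption for this toy model).
const HEAD_ELEMENTS = ["title", "base", "isindex", "script", "style", "meta", "link", "object"];

// Hypothetical helper: split a source-order list of tag names into the part
// that stays in <head> and the part that ends up in <body>.
function splitHeadBody(tagNames) {
  let cut = tagNames.length;
  for (let i = 0; i < tagNames.length; i++) {
    if (HEAD_ELEMENTS.indexOf(tagNames[i].toLowerCase()) === -1) {
      cut = i; // first body-only element: <head> is implicitly closed here
      break;
    }
  }
  return { head: tagNames.slice(0, cut), body: tagNames.slice(cut) };
}

// Top-level elements of the question's <head>, in source order
// (the <a> is nested inside the <div>, so it is not listed separately).
const result = splitHeadBody(["title", "div", "meta", "meta", "meta", "meta"]);
// result.head is ["title"]; the div and every meta after it are in result.body
```

This matches the second snippet in the question, where the <div> and the <meta> elements after it show up inside <body>.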
How can I handle it ?
That rather depends on what you actually want to achieve. The DOM you get is the DOM you get though, so that is what you have to work with.