
wget requests a URL with a parent-directory segment ("../") directly after the hostname

Developer https://www.devze.com 2023-04-03 16:42 Source: Internet

Update: Upgrading wget from 1.10 to 1.12 solved the problem.

For example

www.example.com/level1/level2/../test.html

In this case, both wget and a browser will visit

www.example.com/level1/test.html

But for

www.example.com/../test.html

wget will visit

www.example.com/../test.html

while a browser will visit

www.example.com/test.html

I am using wget to fetch web pages and measure the size of each page and of the elements inside it. I found that some pages reference resources as "../css/xxx.jpg" instead of "css/xxx.jpg". A browser loads these pages fine, but wget does not.

Is there a way to solve it? Thank you.


Before passing URLs to wget, trim any leading "../" from the beginning of the path. (Splitting the URLs into components would help.)

How to do this depends on what language or framework you are using.
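As one possible illustration, here is a minimal sketch assuming Python 3 and only its standard library. The helper names `normalize_path`, `normalize_url`, and `resolve_link` are hypothetical, and the `http://` scheme is added to the example URLs only so they can be parsed. It splits the URL into components and collapses "." and ".." segments, dropping any ".." that would climb above the root, which mirrors what a browser does.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def normalize_path(path: str) -> str:
    """Collapse '.' and '..' segments; a '..' at the root is simply dropped.

    Note: this sketch also collapses repeated slashes ('//') in the path.
    """
    segments = []
    for seg in path.split("/"):
        if seg == "..":
            if segments:          # nothing to pop means we are at the root
                segments.pop()
        elif seg not in ("", "."):
            segments.append(seg)
    normalized = "/" + "/".join(segments)
    # Preserve a directory-style ending such as '/a/' or '/a/b/..'
    if segments and path.endswith(("/", "/.", "/..")):
        normalized += "/"
    return normalized


def normalize_url(url: str) -> str:
    """Return the URL with its path normalized; other components are kept."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=normalize_path(parts.path)))


def resolve_link(page_url: str, link: str) -> str:
    """Resolve a link found in a page (e.g. '../css/xxx.jpg') against the page URL."""
    return normalize_url(urljoin(page_url, link))


if __name__ == "__main__":
    print(normalize_url("http://www.example.com/../test.html"))
    print(resolve_link("http://www.example.com/index.html", "../css/xxx.jpg"))
```

The normalized URL can then be handed to wget in place of the original one. Calling `normalize_url` after `urljoin` is deliberate: it keeps the result the same even if `urljoin` leaves a leading "../" in place.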
