How to get list of servlet parameters in a web server?

Developer https://www.devze.com 2023-04-10 17:23 Source: web
I'm working on a web data mining project that extracts information directly from HTML by crawling server pages. My effort is focused on one specific website, which runs a Java web server with Caucho Resin installed.

Parameters are passed as name=value pairs in the URL, like www.xxxxxx.com/jm/search?act=see&id=909&... I have decoded many parameters by trial and error, but of course results are coming very slowly.

My question is: do you Java gurus know how to get all valid parameters of this kind of server? Is it possible?

I don't have access to the server and I don't know anything about Caucho Resin. I'm coding a utility in Java to do the job.


Unless the server you're communicating with publishes a complete API, there can be any number of parameters. Consider this: a web form may not post every parameter the server responds to, such as parameters meant for internal use.

Since parameter handling is implemented away from "public" eyes, on the server side, it is opaque to the outside world.

If you're referring to the possible values of the parameters, the answer is basically the same. For example, how many valid product SKUs does Amazon have?

(Also note that it might be better to call these "request parameters", as servlets also have "init parameters", which is an entirely different question :)


Whether a parameter is valid is not something which is defined by the web server. It's defined by the custom servlet code itself. That in turn is usually defined in a functional requirement and/or technical specification document, and probably also in the generated javadoc of the custom servlet.

Your best bet is to contact the owner/maintainer of the website for this information. If you can not or may not, then you're probably doing something which violates the website's policy. You can at least find all valid parameter names in the input elements of any public HTML form which submits to this servlet.
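Since you're already writing a Java utility, here is a rough sketch of that last step: pulling the name attributes out of form fields in a page you have already downloaded. The class name FormParamNames and the regular expressions are my own illustration, not anything Resin-specific. Regexes are fragile on messy markup, so a real HTML parser would be safer. This only approximates the idea.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the name attributes of form fields found in an HTML string.
class FormParamNames {
    // Matches opening <input>, <select>, <textarea> and <button> tags; group 1 is the attribute text.
    private static final Pattern FIELD_TAG = Pattern.compile(
            "<(?:input|select|textarea|button)\\b([^>]*)>", Pattern.CASE_INSENSITIVE);

    // Matches a name attribute whose value is double-quoted, single-quoted, or bare.
    private static final Pattern NAME_ATTR = Pattern.compile(
            "(?:^|\\s)name\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+))",
            Pattern.CASE_INSENSITIVE);

    static Set<String> extract(String html) {
        Set<String> names = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        Matcher tag = FIELD_TAG.matcher(html);
        while (tag.find()) {
            Matcher attr = NAME_ATTR.matcher(tag.group(1));
            if (attr.find()) {
                String name = attr.group(1) != null ? attr.group(1)
                        : attr.group(2) != null ? attr.group(2)
                        : attr.group(3);
                if (!name.isEmpty()) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Sample markup is made up for the demo; "act" and "id" are the names from the question.
        String html = "<form action=\"/jm/search\">"
                + "<input type=\"hidden\" name=\"act\" value=\"see\">"
                + "<input type=\"text\" name=\"id\">"
                + "</form>";
        System.out.println(extract(html));
    }
}
```

This only finds names that the page itself exposes. Anything the servlet accepts without a matching form field stays invisible to it.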


Update: as per your comment:

I'm talking about parameters not values. I did manage to find many of them by looking at HTML source code for "hidden" tags, but those are not the only ones, as I was able to find more of them by trial and error.

Just use Firebug or Fiddler to track the HTTP requests made by a real web browser. You'll get all the parameters that are sent, in a nice table of name=value pairs. No need for trial and error.
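If you export the request URLs from such a tool, a small Java helper can merge them into one list of parameter names and the values seen for each. This is a minimal sketch under my own assumptions: the class name CapturedParams is made up, the input is a plain list of GET URLs, and POST bodies are not handled. The "page" parameter in the demo is invented. Only "act" and "id" come from the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: merges the query strings of captured URLs into name -> observed values.
class CapturedParams {
    static Map<String, Set<String>> collect(List<String> urls) {
        Map<String, Set<String>> seen = new TreeMap<>();
        for (String url : urls) {
            int q = url.indexOf('?');
            if (q < 0) {
                continue; // no query string in this URL
            }
            String query = url.substring(q + 1);
            int hash = query.indexOf('#');
            if (hash >= 0) {
                query = query.substring(0, hash); // drop the fragment
            }
            for (String pair : query.split("&")) {
                if (pair.isEmpty()) {
                    continue;
                }
                int eq = pair.indexOf('=');
                String name = decode(eq < 0 ? pair : pair.substring(0, eq));
                String value = eq < 0 ? "" : decode(pair.substring(eq + 1));
                seen.computeIfAbsent(name, k -> new TreeSet<>()).add(value);
            }
        }
        return seen;
    }

    // Percent-decodes one component; malformed escapes raise IllegalArgumentException.
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        List<String> captured = Arrays.asList(
                "http://www.xxxxxx.com/jm/search?act=see&id=909",
                "http://www.xxxxxx.com/jm/search?act=list&page=2#top");
        System.out.println(collect(captured));
    }
}
```

Like the browser tools themselves, this only reports parameters a real browser actually sent, so it cannot prove the list is complete.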
