
Grab Form Data Via Python

Developer https://www.devze.com 2023-04-05 08:26 Source: Internet

I'm looking to grab the form data that needs to be passed along to a specific website and submit it. Below is the HTML (form only) that I need to simulate. I've been working on this for a few hours but can't get anything to work. I want this to run on Google App Engine. Any help would be appreciated.

<form method="post" action="/member/index.bv"> 
        <table cellspacing="0" cellpadding="0" border="0" width="100%"> 
            <tr> 
                <td align="left"> 
                    <h3>member login</h3><input type="hidden" name="submit" value="login" /><br /> 
                </td> 
            </tr> 
            <tr> 
                <td align="left" style="color: #8b6c46;"> 
                    email:<br /> 
                    <input type="text" name="email" style="width: 140px;" /> 
                </td> 
            </tr> 
            <tr> 
                <td align="left" style="color: #8b6c46;"> 
                    password:<br /> 
                    <input type="password" name="password" style="width: 140px;" /> 
                </td> 
            </tr>
            <tr> 
                <td> 
                    <input type="image" class="formElementImageButton" src="/resources/default/images/btnLogin.gif" style="width: 46px; height: 17px;" /> 
                </td> 
            </tr> 
            <tr> 
                <td align="left"> 
                    <div style="line-height: 1.5em;"> 
                        <a href="/join/" style="color: #8b6c46; font-weight: bold; text-decoration: underline; ">join</a><br /> 
                        <a href="/member/forgot/" style="color: #8b6c46; font-weight: bold; text-decoration: underline;">forgot password?</a><input type="hidden" name="lastplace" value="%2F"><br /> 
                        having trouble logging on, <a href="/cookieProblems.bv">click here</a> for help
                    </div> 
                </td> 
            </tr> 
        </table> 
    </form>

Currently I'm trying to use this code to access it, but it's not working. I'm pretty new to this, so maybe I'm just missing something.

import urllib2, urllib

url = 'http://blah.com/member/index.bv'
values = {'email' : 'someemail@gmail.com',
          'password' : 'somepassword'}

data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()


Is this login page for a third-party site? If so, there may be more to it than simply posting the form inputs.

For example, I just tried this with the login page on one of my own sites. A simple POST request doesn't work in my case, and the same may be true of the login page you are accessing.

For starters, the login form may have a hidden CSRF token value that you have to send when posting your login request. This means you'd first have to GET the login page and parse the resulting HTML for the CSRF token value. The server may also require its session cookie in the login request.

I'm using the requests module to handle the GET/POST and BeautifulSoup to parse the HTML.

import requests
from BeautifulSoup import BeautifulSoup

# first get the login page
response = requests.get('https://www.site.com')
# requests decodes gzip/deflate responses transparently, so response.text
# already holds the decompressed html (a requests response has no read() method)
html = response.text
# parse the html for the csrf token
soup = BeautifulSoup(html)
csrf_token = soup.find(name='input', id='csrf_token')['value']

# now, submit the login data, including csrf token and the original cookie data
login_response = requests.post('https://www.site.com/login',
                               data={'csrf_token': csrf_token,
                                     'username': 'username',
                                     'password': 'ckrit'},
                               cookies=response.cookies)

login_result = login_response.text
print(login_result)

I cannot say whether GAE will allow any of this, but it may at least help you figure out what your particular case requires. Also, as Carl points out, if a submit input is used to trigger the post, you have to include it. In my particular example, this isn't required.
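If requests turns out not to be available, the session-cookie part can probably be handled with the standard library alone. The following is only a rough sketch, not something tested against the real site: it reuses the URL and field names from the question, the helper name build_login_request is made up for illustration, and the network calls are left commented out. Whether the target server actually needs a cookie from an earlier GET is an assumption.

```python
try:  # Python 3
    from http.cookiejar import CookieJar
    from urllib.parse import urlencode
    from urllib.request import HTTPCookieProcessor, Request, build_opener
except ImportError:  # Python 2, as used in the question
    from cookielib import CookieJar
    from urllib import urlencode
    from urllib2 import HTTPCookieProcessor, Request, build_opener


def build_login_request(url, fields):
    """Hypothetical helper: build a POST request carrying the form fields."""
    body = urlencode(fields).encode('ascii')
    # Passing a body as the second argument makes this a POST request.
    return Request(url, body)


# The opener stores cookies set by any response in cookie_jar and sends
# them back on later requests made through the same opener.
cookie_jar = CookieJar()
opener = build_opener(HTTPCookieProcessor(cookie_jar))

login_request = build_login_request(
    'http://blah.com/member/index.bv',
    {'submit': 'login',
     'email': 'someemail@gmail.com',
     'password': 'somepassword'})

# Assumed flow (commented out so the sketch runs without network access):
# opener.open('http://blah.com/member/index.bv')  # GET: pick up the session cookie
# the_page = opener.open(login_request).read()    # POST: cookie is sent back
```

Every request has to go through the same opener for the cookie to be carried over; calling urlopen directly would bypass the cookie jar.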


You're missing the hidden submit=login argument. Have you tried:

import urllib2, urllib

url = 'http://blah.com/member/index.bv'
values = {'submit':'login',
          'email' : 'someemail@gmail.com',
          'password' : 'somepassword'}

data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
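Hidden fields like this are easy to miss when copying inputs by hand (the form above also carries a hidden lastplace field). One way to avoid that is to collect every named input from the form HTML and then fill in the user-supplied ones. This is a minimal sketch using only the standard library; the class name FormFieldCollector is made up for illustration, the HTML is a trimmed copy of the form from the question, and whether the site actually checks lastplace is unknown.

```python
try:  # Python 3
    from html.parser import HTMLParser
    from urllib.parse import urlencode
except ImportError:  # Python 2, as used in the question
    from HTMLParser import HTMLParser
    from urllib import urlencode


class FormFieldCollector(HTMLParser):
    """Hypothetical helper: gather name/value pairs of named <input> tags."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <input ... />.
        if tag != 'input':
            return
        attrs = dict(attrs)
        name = attrs.get('name')
        if name:  # inputs without a name are skipped here
            self.fields[name] = attrs.get('value') or ''


# Trimmed copy of the login form from the question.
form_html = '''
<form method="post" action="/member/index.bv">
  <input type="hidden" name="submit" value="login" />
  <input type="text" name="email" />
  <input type="password" name="password" />
  <input type="image" src="/resources/default/images/btnLogin.gif" />
  <input type="hidden" name="lastplace" value="%2F">
</form>
'''

collector = FormFieldCollector()
collector.feed(form_html)

values = collector.fields          # hidden defaults come from the page
values['email'] = 'someemail@gmail.com'
values['password'] = 'somepassword'

data = urlencode(values)           # ready to pass as the POST body
```

In a real run the HTML would come from fetching the login page first, and data would be posted to the form's action URL exactly as in the snippet above.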