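I ran into a problem while simulating a POST request with Python's urllib2, and I still can't tell whether it is a bug in urlencode or a quirk of how PHP handles POST data. Readers are welcome to help me analyze it.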
The scenario: I need to POST data to an API written in PHP. The data format is as follows:
::

    {"items":{"001":["1","2"]},"title":"test"}

This is the format the API can handle. Expressed as a page form, it corresponds to the following:
::

    <form action="/api/" method="post">
    <span>Which of the following numbers are less than 3?</span>
    <input type="checkbox" value="662469" id="item_662469" name="item[001][]">1 <br/>
    <input type="checkbox" value="662470" id="item_662470" name="item[001][]">2<br/>
    <input type="checkbox" value="662471" id="item_662471" name="item[001][]">3<br/>
    <input type="checkbox" value="662472" id="item_662472" name="item[001][]">4<br/>
    <input type="hidden" name="title" value="test"/>
    <input type="submit"/>
    </form>

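After this form is submitted to the PHP endpoint, printing the POST data on the PHP side gives the format shown at the top. So I need Python to simulate a POST that sends that format to the PHP API. The code is as follows:

.. code:: python

    import urllib
    import urllib2

    params = {"items": {"001": ["1", "2"]}, "title": "test"}
    req = urllib2.Request(
        url='http://test.com/api',
        data=urllib.urlencode(params),  # form-encode the request body
        headers={},
    )
    urllib2.urlopen(req).read()
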
Printed on the PHP side, the POST request sent this way looks like this:
::

    {"items":"{'001':['1','2']}","title":"test"}

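The value of items has been turned into a string. That points to the key piece: the urllib.urlencode function. It encodes the data in the same style as the query string of a GET request, and it appears to process only the top-level key/value pairs. A nested value is simply passed through str() and then percent-encoded, so the nesting is not handled.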
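The flattening can be reproduced without a server. The following is a minimal sketch, assuming Python 3, where the functions live in urllib.parse (in Python 2 they are urllib.urlencode and urlparse.parse_qs). It encodes the same params and decodes them again to show what the receiving side gets.

```python
from urllib.parse import urlencode, parse_qs

params = {"items": {"001": ["1", "2"]}, "title": "test"}
encoded = urlencode(params)

# Decoding the body again shows what a server would receive:
# 'items' is one string holding the Python repr of the dict,
# not a nested structure.
decoded = parse_qs(encoded)
print(decoded["items"])
print(decoded["title"])
```

The nested dict survives only as text, which matches what the PHP side printed.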
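Next I looked at the POST data sent when the form is submitted. Firebug shows both the form data and its encoded form, and comparing the two encoded payloads shows they are very different.

The POST data from the form submission is encoded like this: ``item%5B001%5D%5B%5D=1&item%5B001%5D%5B%5D=2&title=test``

The data encoded by urllib.urlencode looks like this: ``items=%7B%27001%27%3A%5B%271%27%2C%272%27%5D%7D&title=test``

This means that although the PHP API prints the POST data it received as ``{"items":{"001":["1","2"]},"title":"test"}``, the client did not necessarily send the data in that shape. Perhaps PHP does some processing of its own? (If you know how this works, please enlighten me.)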
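For what it's worth, PHP's documented behaviour is to turn field names containing square brackets into nested arrays when it fills ``$_POST``, which would explain the difference. The following is a minimal sketch, assuming Python 3 (in Python 2 the function is urllib.urlencode). It passes urlencode a list of pairs so that the repeated field name from the form can appear twice, which a dict cannot express.

```python
from urllib.parse import urlencode

# Each checkbox in the form contributes one (name, value) pair.
pairs = [
    ("item[001][]", "1"),
    ("item[001][]", "2"),
    ("title", "test"),
]
body = urlencode(pairs)
print(body)
```

The brackets are percent-encoded as ``%5B`` and ``%5D``, so the result has the same shape as the payload captured from the form.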
Now that the difference is clear, let's adapt urlencode; it's all for the sake of the business. That led to the code below. [Found a bug while using it today; fixed.]

.. code:: python

    # copied from urllib (Python 2), with a branch added for nested dicts
    import sys
    from urllib import quote, quote_plus, _is_unicode

    def urlencode(query, doseq=0):
        """Encode a sequence of two-element tuples or dictionary into a URL query string.

        If any values in the query arg are sequences and doseq is true, each
        sequence element is converted to a separate parameter.

        If the query arg is a sequence of two-element tuples, the order of the
        parameters in the output will match the order of parameters in the
        input.
        """
        if hasattr(query, "items"):
            # mapping objects
            query = query.items()
        else:
            # it's a bother at times that strings and string-like objects are
            # sequences...
            try:
                # non-sequence items should not work with len()
                # non-empty strings will fail this
                if len(query) and not isinstance(query[0], tuple):
                    raise TypeError
                # zero-length sequences of all types will get here and succeed,
                # but that's a minor nit - since the original implementation
                # allowed empty dicts that type of behavior probably should be
                # preserved for consistency
            except TypeError:
                ty, va, tb = sys.exc_info()
                raise TypeError, "not a valid non-string sequence or mapping object", tb

        l = []
        if not doseq:
            # preserve old behavior
            for k, v in query:
                k = quote_plus(str(k))
                v = quote_plus(str(v))
                l.append(k + '=' + v)
        else:
            # the nested-dict branch below only runs when doseq is true
            for k, v in query:
                raw_k = str(k)  # unquoted key, reused to build bracketed names
                k = quote_plus(raw_k)
                if isinstance(v, str):
                    v = quote_plus(v)
                    l.append(k + '=' + v)
                elif _is_unicode(v):
                    # is there a reasonable way to convert to ASCII?
                    # encode generates a string, but "replace" or "ignore"
                    # lose information and "strict" can raise UnicodeError
                    v = quote_plus(v.encode("ASCII", "replace"))
                    l.append(k + '=' + v)
                elif isinstance(v, dict):  # added by huyang
                    # nested dict: emit PHP-style names such as item[001][]
                    for _k, _v in v.items():
                        # quote the whole bracketed name once
                        encode_k = quote_plus('%s[%s][]' % (raw_k, _k))  # fixed
                        if isinstance(_v, list):
                            for inner_v in _v:
                                l.append(encode_k + '=' + quote_plus(str(inner_v)))
                        else:
                            # quote the inner value, not the enclosing dict
                            l.append(encode_k + '=' + quote_plus(str(_v)))
                else:
                    try:
                        # is this a sufficient test for sequence-ness?
                        len(v)
                    except TypeError:
                        # not a sequence
                        v = quote_plus(str(v))
                        l.append(k + '=' + v)
                    else:
                        # loop over the sequence
                        for elt in v:
                            l.append(k + '=' + quote_plus(str(elt)))
        return '&'.join(l)

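The patched function handles exactly one level of nesting (a dict whose values are lists). A more general alternative is to flatten the structure into bracketed field names first and then hand the pairs to the stock urlencode. The following is only a sketch under assumptions: it targets Python 3 (urllib.parse), ``flatten_php`` is a hypothetical helper name that is not part of any library, and the top-level value is assumed to be a dict.

```python
from urllib.parse import urlencode

def flatten_php(value, prefix=""):
    """Hypothetical helper: flatten nested dicts/lists into (name, value)
    pairs using PHP-style bracketed field names."""
    pairs = []
    if isinstance(value, dict):
        for key, item in value.items():
            # Top-level keys are used as-is; nested keys get brackets.
            name = "%s[%s]" % (prefix, key) if prefix else str(key)
            pairs.extend(flatten_php(item, name))
    elif isinstance(value, (list, tuple)):
        for item in value:
            # An empty bracket pair asks PHP to append to a list.
            pairs.extend(flatten_php(item, prefix + "[]"))
    else:
        pairs.append((prefix, str(value)))
    return pairs

params = {"items": {"001": ["1", "2"]}, "title": "test"}
pairs = flatten_php(params)
body = urlencode(pairs)
print(body)
```

Lists of dicts are a weak spot of this sketch, because every ``[]`` appends a new element on the PHP side. Explicit numeric indexes would be needed for that case.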
- from the5fire.com. WeChat official account: Python程序员杂谈