the5fire

Covering Python, Django, Vim, Linux, web development, team management, and the Internet. Life is short, we need Python.


Extending urlencode in urllib

Author: the5fire | Tags:   | Published: 2012-11-13 9:44 p.m. | Views: 23382, 22559

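I ran into a problem while simulating a POST request with Python's urllib2, and I still cannot tell whether it is a bug in urlencode or a quirk of how PHP handles POST data. Readers are welcome to help me analyze it.

The scenario: I need to POST data to an API endpoint written in PHP. The data looks like this: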

::

{"items":{"001":["1","2"]},"title":"test"}

This is the format the API can process. Expressed as a page form, it corresponds to the following:

::

<form action="/api/" method="post">
<span>下列小于3的数字是?</span>
<input type="checkbox" value="662469" id="item_662469" name="item[001][]">1 <br/>
<input type="checkbox" value="662470" id="item_662470" name="item[001][]">2<br/>
<input type="checkbox" value="662471" id="item_662471" name="item[001][]">3<br/>
<input type="checkbox" value="662472" id="item_662472" name="item[001][]">4<br/>
<input type="hidden" name="title" value="test"/>
<input type="submit"/>
</form>

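When this form is submitted to the PHP endpoint and the received POST data is printed, the output has the format shown above. So I need Python to simulate a POST that delivers the format defined at the top to the PHP API. The code is as follows: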

.. code:: python

import urllib   # provides urlencode
import urllib2  # provides Request and urlopen

params = {"items":{"001":["1","2"]},"title":"test"}
req = urllib2.Request(
                url = 'http://test.com/api',
                data = urllib.urlencode(params),
                headers = {},
            )
urllib2.urlopen(req).read()

On the PHP side, a POST request sent this way prints as:

::

{"items":"{'001':['1','2']}","title":"test"}

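The value of items has been turned into a string. That points to the key function: urllib.urlencode. It encodes the data into the key=value form used in GET query strings, and it appears to handle only the top-level key-value pairs: in its default mode each value is simply passed through str(), so a nested structure like this one is not expanded.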
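
The behavior described above can be reproduced in a few lines. This is a minimal sketch that assumes Python 3, where the same function lives in urllib.parse (the post itself uses Python 2's urllib); the variable names are only illustrative.

```python
from urllib.parse import urlencode, parse_qs

params = {"items": {"001": ["1", "2"]}, "title": "test"}

# Default mode: every value goes through str(), so the nested dict is sent
# as its Python repr instead of a structure the server can rebuild.
flat = urlencode(params)
decoded = parse_qs(flat)
print(decoded["items"])  # one string: the repr of the inner dict

# doseq=True expands only one level of sequences. Iterating a dict yields
# its keys, so the inner list is dropped entirely.
seq = urlencode(params, doseq=True)
print(seq)  # items=001&title=test
```

Neither mode produces the bracketed field names that the HTML form sends, which is why the PHP side never sees a nested array.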

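Next I looked at the POST data sent by a real form submission. Firebug shows both the form data and its encoded form, and comparing the two encoded bodies reveals that they are quite different.

The form submission encodes the POST body as:

item%5B001%5D%5B%5D=1&item%5B001%5D%5B%5D=2&title=test

urllib.urlencode encodes it as:

items=%7B%27001%27%3A%5B%271%27%2C%272%27%5D%7D&title=test

So although the PHP API prints the received POST data as {"items":{"001":["1","2"]},"title":"test"}, that does not mean the client sent the data in that format. Perhaps PHP does some processing of its own? (If you know this area well, please enlighten me.)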
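
On the question of whether PHP processes the data: it does. When PHP fills $_POST, it parses bracketed field names such as item[001][] into nested arrays. The sketch below imitates that step in Python 3 to show how the form's encoded body turns back into a nested structure. It is a rough approximation, not PHP's actual algorithm: parse_php_style is a hypothetical helper, and it only supports an empty [] as the last bracket of a name.

```python
import re
from urllib.parse import parse_qsl


def parse_php_style(body):
    """Rebuild nested data from a form-encoded body, roughly as PHP does."""
    result = {}
    for name, value in parse_qsl(body, keep_blank_values=True):
        # "item[001][]" -> keys ["item", "001", ""]
        base = name.split("[", 1)[0]
        keys = [base] + re.findall(r"\[([^\]]*)\]", name)
        if "" in keys[:-1]:
            raise ValueError("'[]' is only supported as the last bracket")
        node = result
        for i, key in enumerate(keys[:-1]):
            # Create a list if the next key is "[]", otherwise a dict.
            default = [] if keys[i + 1] == "" else {}
            node = node.setdefault(key, default)
        if keys[-1] == "":
            node.append(value)      # "[]" means "append to this list"
        else:
            node[keys[-1]] = value  # plain key: store the value
    return result


form_body = "item%5B001%5D%5B%5D=1&item%5B001%5D%5B%5D=2&title=test"
print(parse_php_style(form_body))
```

Under these assumptions the form's body decodes to a nested structure, while the body produced by urllib.urlencode contains no brackets in its field names and therefore stays a flat string.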

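Now that the difference is clear, the fix is to adapt urlencode; business needs come first. Note that the new dict branch sits in the doseq path, so the function has to be called with doseq set to a true value. Hence the code below. [Update: I found a bug while using it today and fixed it.]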

.. code:: python

# copied from urllib (Python 2), with a branch for dict values added
import sys  # needed for sys.exc_info() below
from urllib import quote, quote_plus, _is_unicode

def urlencode(query, doseq=0):
    """Encode a sequence of two-element tuples or dictionary into a URL query string.
    If any values in the query arg are sequences and doseq is true, each
    sequence element is converted to a separate parameter.
    If the query arg is a sequence of two-element tuples, the order of the
    parameters in the output will match the order of parameters in the
    input.
    """
    if hasattr(query,"items"):
        # mapping objects
        query = query.items()
    else:
        # it's a bother at times that strings and string-like objects are
        # sequences...
        try:
            # non-sequence items should not work with len()
            # non-empty strings will fail this
            if len(query) and not isinstance(query[0], tuple):
                raise TypeError
                # zero-length sequences of all types will get here and succeed,
                # but that's a minor nit - since the original implementation
                # allowed empty dicts that type of behavior probably should be
                # preserved for consistency
        except TypeError:
            ty,va,tb = sys.exc_info()
            raise TypeError, "not a valid non-string sequence or mapping object", tb

    l = []
    if not doseq:
        # preserve old behavior
        for k, v in query:
            k = quote_plus(str(k))
            v = quote_plus(str(v))
            l.append(k + '=' + v)
    else:
        for k, v in query:
            k = quote_plus(str(k))
            if isinstance(v, str):
                v = quote_plus(v)
                l.append(k + '=' + v)
            elif _is_unicode(v):
                # is there a reasonable way to convert to ASCII?
                # encode generates a string, but "replace" or "ignore"
                # lose information and "strict" can raise UnicodeError
                v = quote_plus(v.encode("ASCII","replace"))
                l.append(k + '=' + v)
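            elif isinstance(v, dict):    # added by huyang
                for _k, _v in v.items():
                    # k is already quoted above, so quote only the bracket suffix
                    encode_k = k + quote_plus('[%s]' % _k)    # fixed
                    if isinstance(_v, list):
                        # one "k[_k][]=value" pair per list element
                        for inner_v in _v:
                            l.append(encode_k + quote_plus('[]') + '=' + quote_plus(str(inner_v)))
                    else:
                        # scalar: encode the inner value _v, not the whole dict v
                        l.append(encode_k + '=' + quote_plus(str(_v)))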
            else:
                try:
                    # is this a sufficient test for sequence-ness?
                    len(v)
                except TypeError:
                    # not a sequence
                    v = quote_plus(str(v))
                    l.append(k + '=' + v)
                else:
                    # loop over the sequence
                    for elt in v:
                        l.append(k + '=' + quote_plus(str(elt)))
    return '&'.join(l)
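
The patched function above handles one level of dict nesting. As an alternative sketch, the nested structure can be flattened into bracketed field names first and then handed to the stock urlencode, which avoids copying library code. This assumes Python 3 (urllib.parse); flatten_php and urlencode_php are hypothetical names, not part of any library.

```python
from urllib.parse import urlencode


def flatten_php(value, prefix):
    """Yield (name, value) pairs using PHP bracket notation for nesting."""
    if isinstance(value, dict):
        for key, inner in value.items():
            yield from flatten_php(inner, "%s[%s]" % (prefix, key))
    elif isinstance(value, (list, tuple)):
        for inner in value:
            yield from flatten_php(inner, prefix + "[]")
    else:
        yield prefix, str(value)


def urlencode_php(params):
    """Flatten nested params, then let the standard urlencode do the quoting."""
    pairs = []
    for key, value in params.items():
        pairs.extend(flatten_php(value, str(key)))
    return urlencode(pairs)


body = urlencode_php({"items": {"001": ["1", "2"]}, "title": "test"})
print(body)  # items%5B001%5D%5B%5D=1&items%5B001%5D%5B%5D=2&title=test
```

The result has the same shape as the body the browser form sends. One caveat: a list of dicts would be written with empty [] indexes, and PHP treats each [] as a new element, so that case would need explicit numeric indexes.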