Uploading Files with Python requests: Implementation Steps

Official documentation: https://2.python-requests.org//en/master/

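At work I needed to implement a feature that uploads an attachment to an HTTP interface. The interface is specified as follows:

Submit the attachment with an HTTP POST in multipart/form-data format, url: http://test.com/flow/upload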

									Fields:
									md5:       // MD5 hash of (random value_current timestamp)
									filesize:  // file size
									file:      // file content (must include the file name)

									Response:

									{"success":true,"uploadName":"tmp.xml","uploadPath":"uploads\/201311\/758e875fb7c7a508feef6b5036119b9f"}
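As a quick illustration of the response format, the sample return value can be decoded with the standard json module. This is only a sketch: parse_upload_response is a hypothetical helper name, and raising an exception when success is false is an assumption, since the interface description does not say what a failed upload returns.

```python
import json

# Sample body copied from the interface description above.
SAMPLE = r'{"success":true,"uploadName":"tmp.xml","uploadPath":"uploads\/201311\/758e875fb7c7a508feef6b5036119b9f"}'


def parse_upload_response(text):
    """Decode the JSON body and return (upload_name, upload_path).

    Hypothetical helper: treating success == false as an error is an
    assumption, the interface description does not cover failures.
    """
    payload = json.loads(text)
    if not payload.get("success"):
        raise RuntimeError("upload failed: %r" % (payload,))
    # json.loads turns the escaped "\/" sequences into plain "/".
    return payload["uploadName"], payload["uploadPath"]


name, path = parse_upload_response(SAMPLE)
print(name, path)
```

With a real response the same function would be called as parse_upload_response(resp.text).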

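Since I mainly use Python at work and the project already uses the requests library in places, I planned to implement this with requests. I expected it to be a small, simple feature, but it ended up taking a lot of time: the example in the official requests documentation only covers uploading a file, without passing any additional parameters: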

https://2.python-requests.org//en/master/user/quickstart/#post-a-multipart-encoded-file

									>>> url = 'https://httpbin.org/post'

									>>> files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}

									>>> r = requests.post(url, files=files)

									>>> r.text

									{

									 ...

									 "files": {

									  "file": "<censored...binary...data>"

									 },

									 ...

									}

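When extra parameters have to be sent along with the file, however, you need two arguments of requests: data and files. Pass the file to upload in files and the other parameters in data; the requests library merges the two into a single multipart body and sends it to the server.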
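One way to check this merging without touching the network is to build the request and prepare it, then look at the resulting headers and body. The sketch below assumes made-up field values and an in-memory payload in place of a real file; only the URL comes from the interface description.

```python
import requests

content = b"<a>hello</a>"  # stand-in for the real file content

# Build the request without sending it, so no network access is needed.
req = requests.Request(
    "POST",
    "http://test.com/flow/upload",
    # Illustrative values only; the md5 string is not a real hash of anything.
    data={"md5": "0123456789abcdef0123456789abcdef", "filesize": len(content)},
    files={"file": ("tmp.xml", content)},
)
prepared = req.prepare()

content_type = prepared.headers["Content-Type"]
body = prepared.body

print(content_type)          # multipart/form-data with a generated boundary
print(body.decode("utf-8"))  # the data fields and the file part, one after another
```

The printed body should contain one part per entry in data plus one part for the file, all under the same boundary.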

The final implementation looks like this:

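									import hashlib
									import time

									import requests

									# file_name, request_url and MyLogger are defined elsewhere in the project.
									with open(file_name, 'rb') as f:
									    content = f.read()  # read as bytes so len() matches the size on disk

									request_data = {
									    # hashlib replaces the Python 2-only md5 module; it hashes bytes.
									    'md5': hashlib.md5(('%d_%d' % (0, int(time.time()))).encode('utf-8')).hexdigest(),
									    'filesize': len(content),
									}
									# Reuse the bytes already read instead of opening the file a second time.
									files = {'file': (file_name, content)}
									MyLogger().getlogger().info('url:%s' % (request_url))
									resp = requests.post(request_url, data=request_data, files=files)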
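The two non-file fields can also be computed in a small helper. The following is a sketch under assumptions: build_form_fields is a hypothetical name, and "random value_current timestamp" is read as two integers joined by an underscore, which the interface description does not spell out. It uses os.path.getsize so the file does not have to be read just to learn its size.

```python
import hashlib
import os
import random
import tempfile
import time


def build_form_fields(path):
    """Return the non-file form fields for the upload (hypothetical helper).

    Assumption: the md5 input is "<random int>_<unix timestamp>".
    """
    token = "%d_%d" % (random.randint(0, 2**31 - 1), int(time.time()))
    return {
        # hashlib works on bytes, so encode the text before hashing.
        "md5": hashlib.md5(token.encode("utf-8")).hexdigest(),
        # Ask the filesystem for the size instead of reading the whole file.
        "filesize": os.path.getsize(path),
    }


# Demo on a throwaway file.
with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
    tmp.write(b"<a>hello</a>")
fields = build_form_fields(tmp.name)
os.remove(tmp.name)
print(fields)
```

The returned dictionary can be passed directly as the data argument of requests.post.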

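Although the final code may look simple, it took me considerable effort to confirm that this approach works, including reading through the requests source code. The walk through the source is recorded below.

First, find the implementation of the post method, in requests/api.py: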

									def post(url, data=None, json=None, **kwargs):

									  r"""Sends a POST request.

									  :param url: URL for the new :class:`Request` object.

									  :param data: (optional) Dictionary, list of tuples, bytes, or file-like

									    object to send in the body of the :class:`Request`.

									  :param json: (optional) json data to send in the body of the :class:`Request`.

									  :param \*\*kwargs: Optional arguments that ``request`` takes.

									  :return: :class:`Response <Response>` object

									  :rtype: requests.Response

									  """

									  return request('post', url, data=data, json=json, **kwargs)

Here we can see that it calls the request method. Let's follow it into request, also in requests/api.py:

									def request(method, url, **kwargs):

									  """Constructs and sends a :class:`Request <Request>`.

									  :param method: method for the new :class:`Request` object: ``GET``, ``OPTIONS``, ``HEAD``, ``POST``, ``PUT``, ``PATCH``, or ``DELETE``.

									  :param url: URL for the new :class:`Request` object.

									  :param params: (optional) Dictionary, list of tuples or bytes to send

									    in the query string for the :class:`Request`.

									  :param data: (optional) Dictionary, list of tuples, bytes, or file-like

									    object to send in the body of the :class:`Request`.

									  :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.

									  :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.

									  :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.

									  :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.

									    ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``

									    or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string

									    defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers

									    to add for the file.

									  :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.

									  :param timeout: (optional) How many seconds to wait for the server to send data

									    before giving up, as a float, or a :ref:`(connect timeout, read

									    timeout) <timeouts>` tuple.

									  :type timeout: float or tuple

									  :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.

									  :type allow_redirects: bool

									  :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.

									  :param verify: (optional) Either a boolean, in which case it controls whether we verify

									      the server's TLS certificate, or a string, in which case it must be a path

									      to a CA bundle to use. Defaults to ``True``.

									  :param stream: (optional) if ``False``, the response content will be immediately downloaded.

									  :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.

									  :return: :class:`Response <Response>` object

									  :rtype: requests.Response

									  Usage::

									   >>> import requests

									   >>> req = requests.request('GET', 'https://httpbin.org/get')

									   <Response [200]>

									  """

									  # By using the 'with' statement we are sure the session is closed, thus we

									  # avoid leaving sockets open which can trigger a ResourceWarning in some

									  # cases, and look like a memory leak in others.

									  with sessions.Session() as session:

									    return session.request(method=method, url=url, **kwargs)
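The files docstring above lists three tuple forms. A small offline experiment can show what each one changes in the encoded body; the file name, payload and header below are made up for illustration, and plain bytes stand in for an open file object.

```python
import requests

payload = b"col1,col2\n1,2\n"

# Bytes payloads stand in for open file objects so nothing is read from disk.
variants = {
    "2-tuple": ("report.csv", payload),
    "3-tuple": ("report.csv", payload, "text/csv"),
    "4-tuple": ("report.csv", payload, "text/csv", {"Expires": "0"}),
}

bodies = {}
for label, file_tuple in variants.items():
    # Prepare only; the URL is never contacted.
    prepared = requests.Request(
        "POST", "https://httpbin.org/post", files={"file": file_tuple}
    ).prepare()
    bodies[label] = prepared.body
    print(label)
    print(prepared.body.decode("utf-8"))
```

Comparing the three printed bodies shows where the per-file content type and the custom headers end up inside the file part.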

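This method has a long docstring. From it we can already see that the files parameter is used to send files, but it still does not say what to do when parameters and a file have to be sent together. Next, follow session.request, in requests/sessions.py: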

									def request(self, method, url,

									      params=None, data=None, headers=None, cookies=None, files=None,

									      auth=None, timeout=None, allow_redirects=True, proxies=None,

									      hooks=None, stream=None, verify=None, cert=None, json=None):

									    """Constructs a :class:`Request <Request>`, prepares it and sends it.

									    Returns :class:`Response <Response>` object.

									    :param method: method for the new :class:`Request` object.

									    :param url: URL for the new :class:`Request` object.

									    :param params: (optional) Dictionary or bytes to be sent in the query

									      string for the :class:`Request`.

									    :param data: (optional) Dictionary, list of tuples, bytes, or file-like

									      object to send in the body of the :class:`Request`.

									    :param json: (optional) json to send in the body of the

									      :class:`Request`.

									    :param headers: (optional) Dictionary of HTTP Headers to send with the

									      :class:`Request`.

									    :param cookies: (optional) Dict or CookieJar object to send with the

									      :class:`Request`.

									    :param files: (optional) Dictionary of ``'filename': file-like-objects``

									      for multipart encoding upload.

									    :param auth: (optional) Auth tuple or callable to enable

									      Basic/Digest/Custom HTTP Auth.

									    :param timeout: (optional) How long to wait for the server to send

									      data before giving up, as a float, or a :ref:`(connect timeout,

									      read timeout) <timeouts>` tuple.

									    :type timeout: float or tuple

									    :param allow_redirects: (optional) Set to True by default.

									    :type allow_redirects: bool

									    :param proxies: (optional) Dictionary mapping protocol or protocol and

									      hostname to the URL of the proxy.

									    :param stream: (optional) whether to immediately download the response

									      content. Defaults to ``False``.

									    :param verify: (optional) Either a boolean, in which case it controls whether we verify

									      the server's TLS certificate, or a string, in which case it must be a path

									      to a CA bundle to use. Defaults to ``True``.

									    :param cert: (optional) if String, path to ssl client cert file (.pem).

									      If Tuple, ('cert', 'key') pair.

									    :rtype: requests.Response

									    """

									    # Create the Request.

									    req = Request(

									      method=method.upper(),

									      url=url,

									      headers=headers,

									      files=files,

									      data=data or {},

									      json=json,

									      params=params or {},

									      auth=auth,

									      cookies=cookies,

									      hooks=hooks,

									    )

									    prep = self.prepare_request(req)

									    proxies = proxies or {}

									    settings = self.merge_environment_settings(

									      prep.url, proxies, stream, verify, cert

									    )

									    # Send the request.

									    send_kwargs = {

									      'timeout': timeout,

									      'allow_redirects': allow_redirects,

									    }

									    send_kwargs.update(settings)

									    resp = self.send(prep, **send_kwargs)

									    return resp

Skimming this method: it first prepares the request, and the last step calls send, which presumably transmits it. So we need to follow prepare_request, in requests/sessions.py:

									def prepare_request(self, request):

									    """Constructs a :class:`PreparedRequest <PreparedRequest>` for

									    transmission and returns it. The :class:`PreparedRequest` has settings

									    merged from the :class:`Request <Request>` instance and those of the

									    :class:`Session`.

									    :param request: :class:`Request` instance to prepare with this

									      session's settings.

									    :rtype: requests.PreparedRequest

									    """

									    cookies = request.cookies or {}

									    # Bootstrap CookieJar.

									    if not isinstance(cookies, cookielib.CookieJar):

									      cookies = cookiejar_from_dict(cookies)

									    # Merge with session cookies

									    merged_cookies = merge_cookies(

									      merge_cookies(RequestsCookieJar(), self.cookies), cookies)

									    # Set environment's basic authentication if not explicitly set.

									    auth = request.auth

									    if self.trust_env and not auth and not self.auth:

									      auth = get_netrc_auth(request.url)

									    p = PreparedRequest()

									    p.prepare(

									      method=request.method.upper(),

									      url=request.url,

									      files=request.files,

									      data=request.data,

									      json=request.json,

									      headers=merge_setting(request.headers, self.headers, dict_class=CaseInsensitiveDict),

									      params=merge_setting(request.params, self.params),

									      auth=merge_setting(auth, self.auth),

									      cookies=merged_cookies,

									      hooks=merge_hooks(request.hooks, self.hooks),

									    )

									    return p
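The same Request, prepare_request, send sequence can be driven by hand, which makes it easy to inspect the prepared request before anything is transmitted. This is a rough sketch: the X-Demo header and the field values are hypothetical, and the send step is left as a comment so the example stays offline.

```python
import requests

session = requests.Session()
# Session-level defaults are merged into every request this session prepares.
session.headers["X-Demo"] = "session-default"  # hypothetical header

req = requests.Request(
    "POST",
    "http://test.com/flow/upload",
    data={"filesize": 12},
    files={"file": ("tmp.xml", b"<a>hello</a>")},
)
prepared = session.prepare_request(req)

print(prepared.method)
print(prepared.headers["X-Demo"])
print(prepared.headers["Content-Type"])

# session.send(prepared, timeout=10) would transmit it; omitted to stay offline.
session.close()
```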

prepare_request creates a PreparedRequest object and calls its prepare method. Follow prepare, in requests/models.py:

									def prepare(self,

									      method=None, url=None, headers=None, files=None, data=None,

									      params=None, auth=None, cookies=None, hooks=None, json=None):

									    """Prepares the entire request with the given parameters."""

									    self.prepare_method(method)

									    self.prepare_url(url, params)

									    self.prepare_headers(headers)

									    self.prepare_cookies(cookies)

									    self.prepare_body(data, files, json)

									    self.prepare_auth(auth, url)

									    # Note that prepare_auth must be last to enable authentication schemes

									    # such as OAuth to work on a fully prepared request.

									    # This MUST go after prepare_auth. Authenticators could add a hook

									    self.prepare_hooks(hooks)

This calls a series of prepare_xx methods. We only care about the one that handles data, files and json, so follow prepare_body, in requests/models.py:

									def prepare_body(self, data, files, json=None):

									    """Prepares the given HTTP body data."""

									    # Check if file, fo, generator, iterator.

									    # If not, run through normal process.

									    # Nottin' on you.

									    body = None

									    content_type = None

									    if not data and json is not None:

									      # urllib3 requires a bytes-like body. Python 2's json.dumps

									      # provides this natively, but Python 3 gives a Unicode string.

									      content_type = 'application/json'

									      body = complexjson.dumps(json)

									      if not isinstance(body, bytes):

									        body = body.encode('utf-8')

									    is_stream = all([

									      hasattr(data, '__iter__'),

									      not isinstance(data, (basestring, list, tuple, Mapping))

									    ])

									    try:

									      length = super_len(data)

									    except (TypeError, AttributeError, UnsupportedOperation):

									      length = None

									    if is_stream:

									      body = data

									      if getattr(body, 'tell', None) is not None:

									        # Record the current file position before reading.

									        # This will allow us to rewind a file in the event

									        # of a redirect.

									        try:

									          self._body_position = body.tell()

									        except (IOError, OSError):

									          # This differentiates from None, allowing us to catch

									          # a failed `tell()` later when trying to rewind the body

									          self._body_position = object()

									      if files:

									        raise NotImplementedError('Streamed bodies and files are mutually exclusive.')

									      if length:

									        self.headers['Content-Length'] = builtin_str(length)

									      else:

									        self.headers['Transfer-Encoding'] = 'chunked'

									    else:

									      # Multi-part file uploads.

									      if files:

									        (body, content_type) = self._encode_files(files, data)

									      else:

									        if data:

									          body = self._encode_params(data)

									          if isinstance(data, basestring) or hasattr(data, 'read'):

									            content_type = None

									          else:

									            content_type = 'application/x-www-form-urlencoded'

									      self.prepare_content_length(body)

									      # Add content-type if it wasn't explicitly provided.

									      if content_type and ('content-type' not in self.headers):

									        self.headers['Content-Type'] = content_type

									    self.body = body
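The branches of prepare_body shown above can be observed from the outside by preparing a few requests and comparing the Content-Type header each one ends up with. This is a sketch only; content_type_for and chunks are hypothetical helper names, and the URL is never contacted.

```python
import requests

URL = "https://httpbin.org/post"  # never contacted; requests are only prepared


def content_type_for(**kwargs):
    """Prepare a POST with the given body arguments and return its Content-Type."""
    prepared = requests.Request("POST", URL, **kwargs).prepare()
    return prepared.headers.get("Content-Type")


def chunks():
    """A generator body, which prepare_body treats as a stream."""
    yield b"part-1"
    yield b"part-2"


json_type = content_type_for(json={"a": 1})       # json branch
form_type = content_type_for(data={"a": 1})       # data without files
raw_type = content_type_for(data="a=1")           # string data: no type is set
multipart_type = content_type_for(
    data={"a": 1}, files={"file": ("f.txt", b"hi")}  # files branch
)

# A streamed body combined with files is rejected.
try:
    content_type_for(data=chunks(), files={"file": ("f.txt", b"hi")})
    stream_error = None
except NotImplementedError as exc:
    stream_error = str(exc)

print(json_type, form_type, raw_type, multipart_type, stream_error, sep="\n")
```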

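This function is fairly long. The part to focus on is the `if files:` branch under the `# Multi-part file uploads.` comment, which calls `_encode_files(files, data)`. Let's follow that method, also in requests/models.py: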

									def _encode_files(files, data):

									    """Build the body for a multipart/form-data request.

									    Will successfully encode files when passed as a dict or a list of

									    tuples. Order is retained if data is a list of tuples but arbitrary

									    if parameters are supplied as a dict.

									    The tuples may be 2-tuples (filename, fileobj), 3-tuples (filename, fileobj, contentype)

									    or 4-tuples (filename, fileobj, contentype, custom_headers).

									    """

									    if (not files):

									      raise ValueError("Files must be provided.")

									    elif isinstance(data, basestring):

									      raise ValueError("Data must not be a string.")

									    new_fields = []

									    fields = to_key_val_list(data or {})

									    files = to_key_val_list(files or {})

									    for field, val in fields:

									      if isinstance(val, basestring) or not hasattr(val, '__iter__'):

									        val = [val]

									      for v in val:

									        if v is not None:

									          # Don't call str() on bytestrings: in Py3 it all goes wrong.

									          if not isinstance(v, bytes):

									            v = str(v)

									          new_fields.append(

									            (field.decode('utf-8') if isinstance(field, bytes) else field,

									             v.encode('utf-8') if isinstance(v, str) else v))

									    for (k, v) in files:

									      # support for explicit filename

									      ft = None

									      fh = None

									      if isinstance(v, (tuple, list)):

									        if len(v) == 2:

									          fn, fp = v

									        elif len(v) == 3:

									          fn, fp, ft = v

									        else:

									          fn, fp, ft, fh = v

									      else:

									        fn = guess_filename(v) or k

									        fp = v

									      if isinstance(fp, (str, bytes, bytearray)):

									        fdata = fp

									      elif hasattr(fp, 'read'):

									        fdata = fp.read()

									      elif fp is None:

									        continue

									      else:

									        fdata = fp

									      rf = RequestField(name=k, data=fdata, filename=fn, headers=fh)

									      rf.make_multipart(content_type=ft)

									      new_fields.append(rf)

									    body, content_type = encode_multipart_formdata(new_fields)

									    return body, content_type
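Reading _encode_files this way suggests three observable behaviours: a list value in data produces one part per element, a None value is dropped, and the data fields are emitted before the file parts. The sketch below checks these on a prepared request; the field names tag and note are made up for illustration.

```python
import requests

prepared = requests.Request(
    "POST",
    "http://test.com/flow/upload",
    # "tag" and "note" are hypothetical fields used only to exercise the loop.
    data={"tag": ["a", "b"], "note": None, "filesize": 12},
    files={"file": ("tmp.xml", b"<a>hello</a>")},
).prepare()

body = prepared.body

# One part per list element.
tag_parts = body.count(b'name="tag"')
# None values are skipped entirely.
has_note = b'name="note"' in body
# data fields are appended first, file parts afterwards.
file_is_last = body.rfind(b'name="file"') > body.rfind(b'name="filesize"')

print(tag_parts, has_note, file_is_last)
```

This matches what the upload interface needs: the md5 and filesize fields travel as ordinary form parts in the same multipart body as the file.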