Handling POST Requests in a Python Crawler

POST requests that send Form Data

  • A POST request carrying Form Data

    Request URL:http://127.0.0.1:8080/test/test.do
    Request Method:POST
    Status Code:200 OK

    Request Headers
    Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
    Accept-Encoding:gzip,deflate,sdch
    Accept-Language:zh-CN,zh;q=0.8,en;q=0.6
    AlexaToolbar-ALX_NS_PH:AlexaToolbar/alxg-3.2
    Cache-Control:max-age=0
    Connection:keep-alive
    Content-Length:25
    Content-Type:application/x-www-form-urlencoded
    Cookie:JSESSIONID=74AC93F9F572980B6FC10474CD8EDD8D
    Host:127.0.0.1:8080
    Origin:http://127.0.0.1:8080
    Referer:http://127.0.0.1:8080/test/index.jsp
    User-Agent:Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.149 Safari/537.36

    Form Data
    name:mikan
    address:street

    Response Headers
    Content-Length:2
    Date:Sun, 11 May 2014 11:05:33 GMT
    Server:Apache-Coyote/1.1
  • Sending a POST request with Form Data using the Requests module

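    import requests

    # Template: fill in the target URL, a dict of request headers,
    # and a dict of form fields before running.
    response = requests.post(
        url="<request URL>",
        headers={},
        data={},
    )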
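As a minimal sketch of what `data=` does, the snippet below builds (but does not send) the captured request from the example above and prints the encoded body. The URL and field values come from that capture. Using `requests.Request(...).prepare()` to inspect the request offline is a choice made here for illustration, not something the original post does.

```python
import requests

# Values taken from the captured request above; nothing is sent over the network.
url = "http://127.0.0.1:8080/test/test.do"
form_data = {"name": "mikan", "address": "street"}

# Build the request without sending it, so the encoded body can be inspected.
prepared = requests.Request("POST", url, data=form_data).prepare()

print(prepared.headers["Content-Type"])    # application/x-www-form-urlencoded
print(prepared.body)                       # name=mikan&address=street
print(prepared.headers["Content-Length"])  # 25
```

The 25-byte body matches the `Content-Length:25` header in the capture, which suggests the dict passed to `data=` is encoded the same way the browser encoded the form.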

POST requests that send a Request Payload

  • A POST request carrying a Request Payload

    General
    Request URL: https://account.alibabacloud.com/csp/report.htm
    Request Method: POST
    Status Code: 200
    Remote Address: 47.88.251.186:443
    Referrer Policy: strict-origin-when-cross-origin

    Response Headers
    content-encoding: gzip
    content-type: text/html;charset=UTF-8
    date: Fri, 25 Jun 2021 07:54:07 GMT
    eagleeye-traceid: 0a98a6bb16246076475364468e2bdc
    server: Tengine
    strict-transport-security: max-age=0
    timing-allow-origin: *
    vary: Accept-Encoding

    Request Headers
    :authority: account.alibabacloud.com
    :method: POST
    :path: /csp/report.htm
    :scheme: https
    accept: */*
    accept-encoding: gzip, deflate, br
    accept-language: zh-CN,zh;q=0.9
    content-length: 1047
    content-type: application/csp-report
    cookie: ******************************************************************************************************
    origin: https://account.alibabacloud.com
    referer: https://account.alibabacloud.com/login/login.htm
    sec-ch-ua: " Not;A Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"
    sec-ch-ua-mobile: ?0
    sec-fetch-dest: report
    sec-fetch-mode: no-cors
    sec-fetch-site: same-origin
    user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36

    Request Payload
    {"csp-report":{"document-uri":"https://account.alibabacloud.com/login/login.htm","referrer":"https://www.google.com/","violated-directive":"script-src-elem","effective-directive":"script-src-elem","original-policy":"base-uri 'self';script-src 'self' 'unsafe-inline' 'unsafe-eval' 'report-sample' https: http: 'sha256-lfXlPY3+MCPOPb4mrw1Y961+745U3WlDQVcOXdchSQc=' 'sha256-QbgF6nrAFOI1VumLs3RwKgg0Qmj5JImgLwiAhJOUoeQ=' 'sha256-rRMdkshZyJlCmDX27XnL7g3zXaxv7ei6Sg+yt4R3svU=' 'sha256-kbHtQyYDQKz4SWMQ8OHVol3EC0t3tHEJFPCSwNG9NxQ=' 'sha256-46mc3H6z56gnOReRHr//8M7FxjqtSaDN7KetqqduuiE=' 'Strict-Dynamic' 'unsafe-hashes' 'nonce-0MVMRYu19o';frame-src 'self' *.aliyun.com *.alibaba.com *.alibabacloud.com gaic.alicdn.com g.alicdn.com;worker-src blob: 'self' data:;object-src 'self' g.alicdn.com;frame-ancestors *.aliyun.com;report-uri /csp/report.htm;","disposition":"report","blocked-uri":"inline","line-number":59,"source-file":"https://account.alibabacloud.com/login/login.htm","status-code":0,"script-sample":"var ALIYUN_ACCOUNT_LOGIN_CONFIG = {\n "}}
  • Sending a POST request with a Request Payload using the Requests module

    import json
    import requests

    url = ""  # fill in the request URL before running
    payloadHeaders = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36",
        "content-type": "application/json;charset=UTF-8",
    }

    payloadData = {
        "query": [],
        "start": 6000,
        "rows": 100,
        "sort_field": {"sort_field": "ImpactFactor"},
        "highlight_field": "",
        "pinyin_title": [],
        "class_code": "",
        "core_periodical": [],
        "sponsor_region": [],
        "publishing_period": [],
        "publish_status": "",
        "return_fields": ["Title", "Id", "CorePeriodical", "Award", "IsPrePublished"],
    }

    response = requests.post(
        url,
        # Serialize the dict to a JSON string so it is sent as the raw request body
        data=json.dumps(payloadData),
        headers=payloadHeaders,
    )
    result = response.json()  # parse the JSON response body
    print(result['value'][-1])
    print(len(result['value']))
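Requests also accepts a `json=` keyword that serializes the dict and sets the `Content-Type` header itself, which can replace the manual `json.dumps` call. The sketch below prepares such a request without sending it. The URL is a hypothetical placeholder (the original leaves it empty) and the payload is a shortened version of the one above.

```python
import json
import requests

# Hypothetical URL and a shortened payload, for illustration only.
url = "https://example.com/api/search"
payload = {"start": 6000, "rows": 100, "sort_field": {"sort_field": "ImpactFactor"}}

# json= serializes the dict and sets Content-Type; prepare() builds without sending.
prepared = requests.Request("POST", url, json=payload).prepare()

# Depending on the Requests version, the body may be bytes or str.
body = prepared.body
if isinstance(body, bytes):
    body = body.decode("utf-8")

print(prepared.headers["Content-Type"])  # application/json
print(json.loads(body) == payload)       # True
```

If the server insists on the exact `application/json;charset=UTF-8` header from the capture, passing it explicitly through `headers=` as in the example above remains the safer option.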

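Differences between the two

  • If the request headers contain Content-Type: application/x-www-form-urlencoded, the request is a form POST, and the request body is a standard query string of key=value pairs joined with &. This is the default for HTML forms, which is why it used to be the more common style.
  • Other kinds of POST requests carry their data as a Request Payload (nowadays usually in a format such as JSON, for readability), with Content-Type: application/json;charset=UTF-8 or with no Content-Type specified. In both cases the data travels in the request body; the browser's developer tools only label it differently.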
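The rule above can be sketched as a small helper that picks the right `requests.post` keyword from a captured Content-Type. The function name `build_post_kwargs` is hypothetical, and the last branch assumes the payload should go out as JSON text (as in the `application/csp-report` capture earlier), which will not hold for every media type.

```python
import json


def build_post_kwargs(content_type, payload):
    """Return keyword arguments for requests.post based on a captured Content-Type."""
    # Drop parameters such as ";charset=UTF-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/x-www-form-urlencoded":
        # Form POST: let Requests encode the dict as key=value&key=value.
        return {"data": payload}
    if media_type == "application/json":
        # JSON payload: let Requests serialize the dict and set the header.
        return {"json": payload}
    # Assumption: any other type carries JSON text under the captured header.
    return {"data": json.dumps(payload), "headers": {"Content-Type": content_type}}


print(build_post_kwargs("application/json;charset=UTF-8", {"rows": 100}))
```

A call would then look like `requests.post(url, **build_post_kwargs(captured_type, payload))`, with `url` and `captured_type` copied from the browser's network panel.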