Scraping AJAX content on a webpage with Python requests

I'm trying to scrape an AJAX-loaded part of a webpage without executing the JavaScript. Using the Chrome dev tools, I found that the AJAX container pulls its content from a URL through a POST request, so I want to duplicate that request with the Python requests package. But strangely, using the header information shown in Chrome I always get a 400 error, and the same happens with the curl command copied from Chrome. I'm wondering whether someone could kindly share some insight.



The website I'm interested in is here. In Chrome: Ctrl+Shift+I, Network tab, XHR filter; the part I want is 'content'. The script I'm using is:



import requests

# Headers copied from the Chrome dev tools for the XHR request
headers = {
    "authority": "cafe.bithumb.com",
    "path": "/boards/43/contents",
    "method": "POST",
    "origin": "https://cafe.bithumb.com",
    "accept-language": "zh-CN,zh;q=0.9,en;q=0.8",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36",
    "accept-encoding": "gzip, deflate, br",
    "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
    "accept": "application/json, text/javascript, */*; q=0.01",
    "referer": "https://cafe.bithumb.com/view/boards/43",
    "x-requested-with": "XMLHttpRequest",
    "scheme": "https",
    "content-length": "1107"}

s = requests.Session()
s.headers.update(headers)
r = s.post('https://cafe.bithumb.com/boards/43/contents')
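
For reference: authority, path, method, and scheme above are HTTP/2 pseudo-header entries copied from the dev tools rather than real request headers, and the hard-coded content-length of 1107 no longer matches the empty body being sent, so the 400 most likely comes from posting without the form data the endpoint expects. Below is a minimal sketch of the same request with those entries dropped and a body supplied; the form fields shown are an illustrative subset taken from the answer below, not verified against the site.

import requests

headers = {
    "accept": "application/json, text/javascript, */*; q=0.01",
    "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
    "origin": "https://cafe.bithumb.com",
    "referer": "https://cafe.bithumb.com/view/boards/43",
    "x-requested-with": "XMLHttpRequest",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36",
}

# Illustrative form fields (see the "Form Data" panel in the dev tools and the answer below);
# requests computes the Content-Length from this body itself.
form_data = {"draw": "1", "start": "0", "length": "30"}

r = requests.post("https://cafe.bithumb.com/boards/43/contents",
                  headers=headers, data=form_data)
print(r.status_code)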









python html ajax python-requests

asked Nov 24 '18 at 0:19

Lampard
1 Answer

You just need to compare the two POST payloads; you will find they are almost identical except for a few parameters (draw=<page> ... start=xx). That means you can scrape the Ajax data simply by modifying draw and start.



Edit: the payload is built as a dictionary, so we do not need to urlencode it, and we don't need a cookie either (I tested).



import requests
import json

headers = {
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Origin": "https://cafe.bithumb.com",
    "X-Requested-With": "XMLHttpRequest",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36",
    "DNT": "1",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Referer": "https://cafe.bithumb.com/view/boards/43",
    "Accept-Encoding": "gzip, deflate, br"
}

# Form data captured from the browser request (DataTables-style parameters)
string = """columns[0][data]=0&columns[0][name]=&columns[0][searchable]=true&columns[0][orderable]=false&columns[0][search][value]=&columns[0][search][regex]=false&columns[1][data]=1&columns[1][name]=&columns[1][searchable]=true&columns[1][orderable]=false&columns[1][search][value]=&columns[1][search][regex]=false&columns[2][data]=2&columns[2][name]=&columns[2][searchable]=true&columns[2][orderable]=false&columns[2][search][value]=&columns[2][search][regex]=false&columns[3][data]=3&columns[3][name]=&columns[3][searchable]=true&columns[3][orderable]=false&columns[3][search][value]=&columns[3][search][regex]=false&columns[4][data]=4&columns[4][name]=&columns[4][searchable]=true&columns[4][orderable]=false&columns[4][search][value]=&columns[4][search][regex]=false&start=30&length=30&search[value]=&search[regex]=false"""

article_root = "https://cafe.bithumb.com/view/board-contents/{}"

for page in range(1, 4):
    with requests.Session() as s:
        s.headers.update(headers)

        # Rebuild the form data as a dict and adjust the paging parameters
        data = {"draw": page}
        data.update({ele[:ele.find("=")]: ele[ele.find("=") + 1:] for ele in string.split("&")})
        data["start"] = 30 * (page - 1)

        r = s.post('https://cafe.bithumb.com/boards/43/contents', data=data, verify=False)  # set verify=False while you are using Fiddler

        json_data = json.loads(r.text).get("data")  # parse the JSON string into a dict so we can extract the data more easily
        for each in json_data:
            url = article_root.format(each[0])
            print(url)
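
As a side note (a sketch, not part of the original answer): the standard library's urllib.parse.parse_qsl can do the same string-to-dict conversion; with keep_blank_values=True the empty parameters such as columns[0][name]= are preserved.

from urllib.parse import parse_qsl

def build_payload(page, raw_query):
    # Parse the captured form-data string into a dict, keeping empty values,
    # then overwrite the two paging parameters.
    payload = dict(parse_qsl(raw_query, keep_blank_values=True))
    payload["draw"] = str(page)
    payload["start"] = str(30 * (page - 1))
    return payload

# build_payload(2, string) builds essentially the same payload the loop above builds for page 2.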





answered Nov 24 '18 at 6:21 by kcorlidy, edited Nov 24 '18 at 11:42


























• Thanks @kcorlidy, your code works like a charm. I'm new to this; could you please elaborate: 1. How do I compare the two POST payloads? The code in my question was trying to replicate what I got from the Chrome dev tools, but I was not able to intercept the original request. 2. Where did you get the data? 3. Your code fetches the content of the 2nd page; I've tried page = 0 but it does not work, so how should I change the code? 4. It seems to take 3-4 s to get the response compared to ~0.5 s within Chrome; is that normal? Sorry for the barrage of questions and thanks very much!

            – Lampard
            Nov 24 '18 at 10:43











• @Lampard 1 and 2: use Fiddler. 3: There is no page zero; page > 0. 4: It is normal; it takes 3-4 seconds because you are setting up a new connection each time (see the connection-reuse sketch after these comments). I will edit my answer to show you what to do.

            – kcorlidy
            Nov 24 '18 at 11:04











• @Lampard I'm sorry, I hadn't accessed page 2 at first. I found that the response depends on draw and start. I have edited my answer and compared it with the browser's response (they match).

            – kcorlidy
            Nov 24 '18 at 11:48
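
On point 4 in the comments above: a possible speed-up, sketched here rather than taken from the answer itself, is to create the Session once and reuse the same connection for all pages instead of opening a new one per request.

import json
import requests

# `headers`, `string` and `article_root` refer to the same objects defined in the answer above.
with requests.Session() as s:  # one session, so the underlying connection is reused
    s.headers.update(headers)
    for page in range(1, 4):
        data = {"draw": page}
        data.update({ele[:ele.find("=")]: ele[ele.find("=") + 1:] for ele in string.split("&")})
        data["start"] = 30 * (page - 1)
        r = s.post("https://cafe.bithumb.com/boards/43/contents", data=data)
        for each in json.loads(r.text).get("data"):
            print(article_root.format(each[0]))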










