Scraping AJAX content on a webpage with Python requests
I'm trying to scrape an AJAX-loaded part of a webpage without executing the JavaScript. Using the Chrome dev tools, I found that the AJAX container pulls its content from a URL through a POST request, so I want to replicate that request with the Python requests package. But strangely, using the header information given by Chrome, I always get a 400 error, and the same happens with the curl command copied from Chrome. So I'm wondering whether someone could kindly share some insights.
The website I'm interested in is here. Using Chrome: Ctrl+Shift+I, Network, XHR; the part I want is 'content'. The script I'm using is:
import requests

headers = {"authority": "cafe.bithumb.com",
           "path": "/boards/43/contents",
           "method": "POST",
           "origin": "https://cafe.bithumb.com",
           "accept-language": "zh-CN,zh;q=0.9,en;q=0.8",
           "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36",
           "accept-encoding": "gzip, deflate, br",
           "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
           "accept": "application/json, text/javascript, */*; q=0.01",
           "referer": "https://cafe.bithumb.com/view/boards/43",
           "x-requested-with": "XMLHttpRequest",
           "scheme": "https",
           "content-length": "1107"}

s = requests.Session()
s.headers.update(headers)
r = s.post('https://cafe.bithumb.com/boards/43/contents')
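One likely contributor to the 400 (a hedged guess, not confirmed against the site): Chrome's "Request Headers" pane mixes real headers with HTTP/2 pseudo-headers (`authority`, `path`, `method`, `scheme`) and transport-managed fields (`content-length`), which should not be resent as ordinary headers. A small, self-contained sketch of filtering them out of a copied header dict (the `copied` dict below is an abbreviated stand-in for the one in the question):

```python
# Chrome displays HTTP/2 pseudo-headers and transport-managed fields
# alongside genuine request headers; resending them by hand can confuse
# a server, so strip them before handing the rest to requests.
copied = {
    "authority": "cafe.bithumb.com",
    "path": "/boards/43/contents",
    "method": "POST",
    "scheme": "https",
    "content-length": "1107",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",
    "x-requested-with": "XMLHttpRequest",
}

PSEUDO = {"authority", "path", "method", "scheme", "content-length"}
headers = {k: v for k, v in copied.items() if k.lower() not in PSEUDO}

print(sorted(headers))  # only genuine request headers remain
```

Note also that the script above sends no POST body at all, while the server expects the form data Chrome shows under "Form Data"; that alone can explain a 400 regardless of the headers.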
python html ajax python-requests
asked Nov 24 '18 at 0:19 by Lampard
1 Answer
You just need to compare the two POST payloads; you will find they are almost the same, except for a few parameters (draw=page ... start=xx). That means you can scrape the AJAX data by modifying draw and start.
Edit: the data was transformed to a dictionary, so we do not need urlencode; we also don't need a cookie (I tested).
import requests
import json

headers = {
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Origin": "https://cafe.bithumb.com",
    "X-Requested-With": "XMLHttpRequest",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36",
    "DNT": "1",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Referer": "https://cafe.bithumb.com/view/boards/43",
    "Accept-Encoding": "gzip, deflate, br"
}

string = """columns[0][data]=0&columns[0][name]=&columns[0][searchable]=true&columns[0][orderable]=false&columns[0][search][value]=&columns[0][search][regex]=false&columns[1][data]=1&columns[1][name]=&columns[1][searchable]=true&columns[1][orderable]=false&columns[1][search][value]=&columns[1][search][regex]=false&columns[2][data]=2&columns[2][name]=&columns[2][searchable]=true&columns[2][orderable]=false&columns[2][search][value]=&columns[2][search][regex]=false&columns[3][data]=3&columns[3][name]=&columns[3][searchable]=true&columns[3][orderable]=false&columns[3][search][value]=&columns[3][search][regex]=false&columns[4][data]=4&columns[4][name]=&columns[4][searchable]=true&columns[4][orderable]=false&columns[4][search][value]=&columns[4][search][regex]=false&start=30&length=30&search[value]=&search[regex]=false"""

article_root = "https://cafe.bithumb.com/view/board-contents/{}"

for page in range(1, 4):
    with requests.Session() as s:
        s.headers.update(headers)
        data = {"draw": page}
        data.update({ele[:ele.find("=")]: ele[ele.find("=") + 1:] for ele in string.split("&")})
        data["start"] = 30 * (page - 1)
        r = s.post('https://cafe.bithumb.com/boards/43/contents', data=data, verify=False)  # set verify=False only while proxying through Fiddler
        json_data = json.loads(r.text).get("data")  # parse the JSON response so the fields are easy to extract
        for each in json_data:
            url = article_root.format(each[0])
            print(url)
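The manual `split("&")`/`find("=")` parsing in the loop above can also be done with the standard library's `urllib.parse.parse_qsl`, which preserves empty fields. A minimal sketch using a shortened stand-in payload (the real string above also carries the five `columns[...]` groups):

```python
from urllib.parse import parse_qsl

# Shortened stand-in for the long form string in the answer above.
string = "start=30&length=30&search[value]=&search[regex]=false"

# keep_blank_values=True keeps empty fields such as search[value]=,
# which the manual split-based parsing also preserved.
data = dict(parse_qsl(string, keep_blank_values=True))
data["draw"] = 2
data["start"] = 30  # page 2 when the page length is 30

print(data)
```

As a side note, requests can parse the response directly with `r.json()` instead of `json.loads(r.text)`.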
thanks @kcorlidy, your code works like a charm. I'm new to this; could you please elaborate on: 1. How do you compare the two POST payloads? The code in my question was trying to replicate what I got from the Chrome dev tools, but I was not able to intercept the original request. 2. Where did you get data, please? 3. Your code fetches the content of the 2nd page; I've tried page = 0 but it does not work, how should I change the code? 4. It seems to take 3-4 seconds to get the response, compared to ~0.5 s within Chrome; is that normal? Sorry for the barrage of questions and thanks very much!
– Lampard Nov 24 '18 at 10:43
@Lampard 1 and 2: use Fiddler. 3: there is no page zero; page > 0. 4: It is normal; it takes 3-4 seconds because you are setting up a new connection each time. I will edit my answer to show you what to do.
– kcorlidy Nov 24 '18 at 11:04
@Lampard I'm sorry, I hadn't accessed page 2 at first. I found the response depends on draw and start. I have edited my answer and compared it with the browser's response (same).
– kcorlidy Nov 24 '18 at 11:48
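As the comments note, there is no page zero: the endpoint pages by row offset, so page n (1-based, 30 rows per page) maps to start = 30 * (n - 1). The mapping used by the answer's loop, pulled out as a small helper for clarity:

```python
# Pagination used by the board endpoint: 30 rows per page, 1-based pages.
PAGE_SIZE = 30

def start_for_page(page: int) -> int:
    """Return the row offset ('start' form field) for a 1-based page number."""
    if page < 1:
        raise ValueError("pages are 1-based; there is no page 0")
    return PAGE_SIZE * (page - 1)

print([start_for_page(p) for p in (1, 2, 3)])  # [0, 30, 60]
```

This is why page = 0 fails: it would imply a negative offset, which the endpoint does not accept.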
answered Nov 24 '18 at 6:21 (edited Nov 24 '18 at 11:42) by kcorlidy