Programmatically identified cookie is not getting accepted























I am working on a web scraper in Python 2 that reads some content from a website. To access that content, I need to send a cookie. Right now, I find the cookie by opening the website in Chrome and looking it up in the site information panel. I hardcode this cookie into my scraper and can then fetch content from the website. However, the cookie gets invalidated after some hours, and from then on no information can be extracted from the website. To address this, I am trying to have the scraper itself refresh the cookie whenever a new one is needed.
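For reference, the hardcoded workflow described above can be sketched with `requests` roughly as follows. This is a minimal sketch, not the author's actual code: `BASE_URL` and the cookie value are placeholders, `make_session` is a hypothetical helper, and it assumes the site only needs the `JSESSIONID` cookie (the question does not say whether other cookies are involved).

```python
import requests

# Placeholders: the question does not give the real URL or cookie value.
BASE_URL = "https://example.com/"
HARDCODED_JSESSIONID = "value-copied-from-chrome"


def make_session(jsessionid):
    """Return a Session that sends the given JSESSIONID on every request."""
    session = requests.Session()
    # Cookies stored on session.cookies are sent with every request made
    # through this session (unlike the per-request ``cookies=`` argument).
    session.cookies.set("JSESSIONID", jsessionid)
    return session
```

Usage would be `make_session(HARDCODED_JSESSIONID).get(BASE_URL)`; the session then keeps any additional cookies the server sets in its responses.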



I have tried the following two approaches:



First approach



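import requests
import browsercookie

# base_url: the target site's URL, defined elsewhere in the scraper
try:
    # Load the cookies stored in the local Chrome profile
    cj = browsercookie.chrome()
    session = requests.Session()
    r = session.get(base_url, cookies=cj)
    # JSESSIONID as it appears in the session's cookie jar after the request
    new_cookie = str(session.cookies.get_dict()['JSESSIONID'])
except Exception as e:
    # Any error (e.g. a missing JSESSIONID) is silently ignored here
    pass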
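One documented `requests` behaviour that may be relevant to the first approach (whether it explains this particular site's response is an assumption): cookies passed per request via `cookies=` are sent with that request but are not stored in `session.cookies`. So `session.cookies.get_dict()['JSESSIONID']` afterwards contains only cookies the server set in its response, which may be a freshly issued JSESSIONID rather than the one loaded from Chrome. The sketch below shows this offline by preparing a request without sending it; the URL and the function name are placeholders.

```python
import requests

URL = "https://example.com/"  # placeholder, not the site from the question


def per_request_cookie_demo():
    """Prepare (but do not send) a request that carries a per-request cookie."""
    session = requests.Session()
    request = requests.Request("GET", URL, cookies={"JSESSIONID": "from-browser"})
    # prepare_request merges session cookies and per-request cookies into
    # the outgoing Cookie header without modifying session.cookies.
    prepared = session.prepare_request(request)
    return prepared.headers.get("Cookie"), session.cookies.get_dict()


cookie_header, stored = per_request_cookie_demo()
print(cookie_header)  # JSESSIONID=from-browser
print(stored)         # {} (nothing was persisted in the session)
```

If the Chrome cookies are meant to stay attached to the session, they could instead be added once with `session.cookies.update(cj)` before making requests.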


Second approach



with requests.Session() as s:
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest',
        'Connection': 'keep-alive',
    }
    r = s.get(base_url, headers=headers)
    new_cookie = s.cookies.get_dict()['JSESSIONID']


Both approaches return cookies that look perfectly fine. The problem is that with these programmatically obtained cookies, the scraper does not extract any results. When the scraper sends the cookie copied from the browser (hardcoded), it gets the website's DOM. But when it sends the cookie obtained programmatically, it cannot access the website's DOM.
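A possible difference worth ruling out (an assumption; the question only mentions JSESSIONID) is that the browser sends other cookies alongside JSESSIONID, which the scraper never sends when it extracts just that one value. A small, hypothetical helper for comparing the cookie names seen in the browser with those the scraper's session received, using made-up values:

```python
def missing_cookie_names(browser_cookies, scraper_cookies):
    """Return the cookie names the browser has but the scraper does not.

    Both arguments are plain dicts of name -> value, e.g. the result of
    requests' ``session.cookies.get_dict()`` or a dict typed in by hand
    from Chrome's cookie viewer (hypothetical inputs for illustration).
    """
    return sorted(set(browser_cookies) - set(scraper_cookies))


# Made-up example values, not taken from the question's site.
browser = {"JSESSIONID": "A1", "route": "x", "consent": "yes"}
scraper = {"JSESSIONID": "B2"}
print(missing_cookie_names(browser, scraper))  # ['consent', 'route']
```

If the list comes back empty, the difference is more likely elsewhere (request headers, or server-side session state). Reusing the same `Session` object for all requests, rather than copying out the JSESSIONID string, at least ensures every cookie the server set is sent back.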



According to the cookie information in the browser, the cookie expires "When the browsing session ends".
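"When the browsing session ends" is how Chrome describes a session cookie: one set without an Expires or Max-Age attribute, so the client discards it when the browser closes. The server can still invalidate the underlying session earlier, which matches the reported expiry after some hours. In the cookie jar used by `requests`, such cookies have `expires` set to `None`. A minimal sketch that lists them, using made-up cookies and a hypothetical helper name:

```python
from requests.cookies import RequestsCookieJar


def session_cookie_names(jar):
    """Names of cookies that carry no expiry time (session cookies)."""
    # Iterating a CookieJar yields http.cookiejar.Cookie objects
    # (cookielib.Cookie on Python 2); session cookies have expires=None.
    return sorted(c.name for c in jar if c.expires is None)


jar = RequestsCookieJar()
jar.set("JSESSIONID", "abc123", domain="example.com", path="/")  # no expiry
jar.set("prefs", "dark", domain="example.com", path="/", expires=4102444800)
print(session_cookie_names(jar))  # ['JSESSIONID']
```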



This is very puzzling. What am I missing in this whole process?


































  • If you need Chrome to get a good session cookie, then you should use Selenium or headless Chrome.
    – pguardiario
    Nov 22 at 9:33
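The comment's suggestion could look roughly like the sketch below: let a real (headless) Chrome load the page so that any JavaScript or redirects that set cookies actually run, then copy all of its cookies into a `requests.Session`. This rests on several assumptions: Selenium and a matching chromedriver must be installed, the `options=` keyword needs Selenium 3.8 or later, and copying the browser's User-Agent is a guess in case the site ties the session to it. Both function names are hypothetical.

```python
import requests


def copy_browser_cookies(cookie_dicts, session):
    """Copy cookies in Selenium's get_cookies() format into a requests Session."""
    for c in cookie_dicts:
        session.cookies.set(
            c["name"], c["value"],
            domain=c.get("domain", ""), path=c.get("path", "/"),
        )
    return session


def session_from_headless_chrome(url):
    """Load url in headless Chrome and return a Session carrying its cookies."""
    # Imported here so copy_browser_cookies works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        session = requests.Session()
        # Reuse the browser's User-Agent (assumption: the site may check it).
        session.headers["User-Agent"] = driver.execute_script(
            "return navigator.userAgent")
        return copy_browser_cookies(driver.get_cookies(), session)
    finally:
        driver.quit()
```

When the site stops returning data, calling `session_from_headless_chrome(base_url)` again would obtain a fresh set of cookies instead of a hand-copied one.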






















python cookies web-scraping request














edited Nov 22 at 7:52

























asked Nov 22 at 7:42









harshvardhan





























