Beautiful Soup - Get attribute values that contain a string

Suppose we have HTML like the following:

<span title="Sports Football">Football</span>

<span title="Sports Badminton">Tennis</span>

<span title="Sports Ski Jump">Ski Jump</span>

I want to extract the values of the title attribute when it contains Sports, so that in the end we have a variable sports:

sports = ['Football', 'Badminton', 'Ski Jump']

This is what I use:

sports = soup.find_all('span', {'title': 'Sports'})

But I get nothing back.

asked Nov 23 '18 at 5:45

JON PANTAU

1488

python html beautifulsoup


3 Answers

You can use re.compile with BeautifulSoup to find all span tags whose title attribute starts with "Sports":

import re
from bs4 import BeautifulSoup as soup

content = """
 <span title="Sports Football">Football</span>
 <span title="Sports Badminton">Tennis</span>
 <span title="Sports Ski Jump">Ski Jump</span>
"""

d = soup(content, 'html.parser')
# Keep spans whose title starts with "Sports" followed by whitespace
results = [i.text for i in d.find_all('span', {'title': re.compile(r'^Sports\s')})]

Output:

['Football', 'Tennis', 'Ski Jump']
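The output above is the text of each span, whereas the question asks for the part of the title that follows Sports. A minimal sketch of that variation, assuming the prefix is always the word Sports followed by whitespace; the fourth span (News Politics) is a hypothetical addition to show that non-matching titles are skipped:

```python
import re
from bs4 import BeautifulSoup

html = """
<span title="Sports Football">Football</span>
<span title="Sports Badminton">Tennis</span>
<span title="Sports Ski Jump">Ski Jump</span>
<span title="News Politics">Politics</span>
"""

soup = BeautifulSoup(html, "html.parser")

# Pattern for the leading "Sports" plus the whitespace after it
prefix = re.compile(r"^Sports\s+")

# find_all keeps only the spans whose title matches the pattern
matches = soup.find_all("span", title=prefix)

# Remove the prefix from each title, leaving only the remainder
sports = [prefix.sub("", tag["title"]) for tag in matches]
print(sports)  # ['Football', 'Badminton', 'Ski Jump']
```

Reading from tag["title"] rather than tag.text is what yields Badminton for the second span, whose visible text is Tennis.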

answered Nov 23 '18 at 16:06

Ajax1234

40.4k42653


You are getting nothing because no span has a title that is exactly Sports; a string value is compared against the whole attribute and does not work like a wildcard. If you want the value of the title attribute, you can call get(attr_name) on each tag object returned by find_all.

from bs4 import BeautifulSoup

html = '''<span title="Sports Football">Football</span>
<span title="Sports Badminton">Tennis</span>
<span title="Sports Ski Jump">Ski Jump</span>'''

soup = BeautifulSoup(html,"lxml")

title = [s.get('title') for s in soup.find_all('span')]
title

>> ['Sports Football', 'Sports Badminton', 'Sports Ski Jump']

In addition, if you only need the text of each element, use the .text attribute on the tag objects returned by find_all.

sports = [s.text for s in soup.find_all('span')]

sports

>>['Football', 'Tennis', 'Ski Jump']
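The snippets above collect every span on the page. To keep only the spans whose title contains Sports, one option is a CSS attribute selector passed to select(). This is a sketch under the assumption that a plain substring test is what is wanted; the News Politics span is a hypothetical addition used to show the filtering:

```python
from bs4 import BeautifulSoup

html = """
<span title="Sports Football">Football</span>
<span title="Sports Badminton">Tennis</span>
<span title="Sports Ski Jump">Ski Jump</span>
<span title="News Politics">Politics</span>
"""

soup = BeautifulSoup(html, "html.parser")

# A plain string is compared against the whole attribute value,
# so no span matches the title "Sports" on its own
exact = soup.find_all("span", {"title": "Sports"})
print(len(exact))  # 0

# [title*="Sports"] selects elements whose title contains the substring
contains = soup.select('span[title*="Sports"]')
titles = [tag.get("title") for tag in contains]
print(titles)  # ['Sports Football', 'Sports Badminton', 'Sports Ski Jump']
```

The `*=` operator is a substring match, so it would also select a title such as Sportswear; use `^=` instead if the value must start with Sports.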

edited Nov 23 '18 at 6:08

answered Nov 23 '18 at 6:03

BernardL

2,3381929

title = [s.get('title') for s in soup.find_all('span') if re.findall('(?<![a-zA-Z])Sports(?![a-zA-Z])',s.get('title'))] What about adding a regular expression? I think it should be able to extract the titles containing Sports.
– Cua
Nov 23 '18 at 7:01

I think regex is overkill just to check whether the word Sports exists. At that point it is up to the OP's intention which elements he wants to extract.
– BernardL
Nov 23 '18 at 14:11
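For the whole-word check discussed in these comments, a regex-free alternative is to pass a function as the attribute filter. The sketch below is one way to do that; the helper name has_sports_word and the last two spans are hypothetical, added to show a near-miss title and a span with no title at all:

```python
from bs4 import BeautifulSoup

html = """
<span title="Sports Football">Football</span>
<span title="Sports Badminton">Tennis</span>
<span title="Sports Ski Jump">Ski Jump</span>
<span title="Sportsmanship Award">Award</span>
<span>No title</span>
"""

soup = BeautifulSoup(html, "html.parser")


def has_sports_word(title):
    # Beautiful Soup passes None when the tag has no title attribute
    return title is not None and "Sports" in title.split()


# The function is called with each span's title value
matches = soup.find_all("span", title=has_sports_word)
titles = [tag["title"] for tag in matches]
print(titles)  # ['Sports Football', 'Sports Badminton', 'Sports Ski Jump']
```

Splitting on whitespace treats Sports as a separate word, so Sportsmanship Award is excluded, which is roughly what the lookaround regex in the comment is aiming for.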


-1

Maybe the example you gave was just made up off the top of your head, but the contents of your spans match exactly what you are looking for, so in that example you could work around it with:
sports = soup.find_all('span', {'title': 'Sports'}).contents
and that will give you the string versions of what you're looking for.

edited Nov 23 '18 at 6:55

SmashGuy

1,0471613

answered Nov 23 '18 at 6:03

Matthew Sciamanna

214

1

That will fail, soup.find_all returns a list, not a tag object.
– BernardL
Nov 23 '18 at 6:06

A list comprehension like [i.text for i in sports] might give a proper solution if your find_all works.
– SmashGuy
Nov 23 '18 at 6:08

Yeah, sorry, I was leaving the reader to turn the list into the list he wants at the top. I should've been clearer.
– Matthew Sciamanna
Nov 23 '18 at 23:43
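The point raised in these comments can be checked directly: find_all returns a list-like result, so tag attributes such as .contents have to be read per element. A small sketch, using find_all('span') without a title filter so that the result is non-empty:

```python
from bs4 import BeautifulSoup

html = """
<span title="Sports Football">Football</span>
<span title="Sports Badminton">Tennis</span>
<span title="Sports Ski Jump">Ski Jump</span>
"""

soup = BeautifulSoup(html, "html.parser")
spans = soup.find_all("span")

# Accessing a tag attribute on the result set itself raises AttributeError
try:
    spans.contents
    failed = False
except AttributeError:
    failed = True
print(failed)  # True

# Iterate over the result set and read the attribute from each tag instead
texts = [tag.get_text() for tag in spans]
print(texts)  # ['Football', 'Tennis', 'Ski Jump']
```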
