Beautiful Soup - Get attribute values that contain a string












3














Suppose we have HTML like the following:

<span title="Sports Football">Football</span>
<span title="Sports Badminton">Tennis</span>
<span title="Sports Ski Jump">Ski Jump</span>

I want to extract the rest of each title attribute value when the attribute contains Sports, so that in the end we have a variable sports:

sports = ['Football', 'Badminton', 'Ski Jump']

This is what I use:

sports = soup.find_all('span', {'title': 'Sports'})

But I get nothing back.










      python html beautifulsoup






asked Nov 23 '18 at 5:45 by JON PANTAU
























          3 Answers
                1






You can use re.compile with BeautifulSoup to find all span tags whose title attribute starts with "Sports":

content = """
<span title="Sports Football">Football</span>
<span title="Sports Badminton">Tennis</span>
<span title="Sports Ski Jump">Ski Jump</span>
"""

import re
from bs4 import BeautifulSoup as soup

d = soup(content, 'html.parser')
# Match titles that begin with "Sports" followed by whitespace
results = [i.text for i in d.find_all('span', {'title': re.compile(r'^Sports\s')})]

Output:

['Football', 'Tennis', 'Ski Jump']





answered Nov 23 '18 at 16:06 by Ajax1234
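The answer above returns the text of each span, while the question asks for the part of the title attribute that follows "Sports". A minimal sketch of that variant, assuming the prefix is always "Sports" followed by whitespace; the fourth span is a hypothetical extra element added only to show that non-matching titles are skipped:

```python
import re
from bs4 import BeautifulSoup

html = """
<span title="Sports Football">Football</span>
<span title="Sports Badminton">Tennis</span>
<span title="Sports Ski Jump">Ski Jump</span>
<span title="News Politics">Politics</span>
"""

soup = BeautifulSoup(html, "html.parser")

# Assumption: the wanted titles start with "Sports" plus whitespace.
prefix = re.compile(r"^Sports\s+")

# Keep only spans whose title matches, then strip the prefix from the title.
sports = [
    prefix.sub("", tag["title"])
    for tag in soup.find_all("span", title=prefix)
]
print(sports)  # ['Football', 'Badminton', 'Ski Jump']
```

The same compiled pattern is used twice: once as the find_all filter and once to remove the prefix, so the two steps cannot drift apart.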

























                    0






You are getting nothing because no span has a title that is exactly "Sports"; a plain string must match the whole attribute value and does not work like a wildcard. If you want the value of the title attribute, call get(attr_name) on each tag object that find_all returns.

from bs4 import BeautifulSoup

html = '''<span title="Sports Football">Football</span>
<span title="Sports Badminton">Tennis</span>
<span title="Sports Ski Jump">Ski Jump</span>'''

soup = BeautifulSoup(html, "lxml")

title = [s.get('title') for s in soup.find_all('span')]
title
>> ['Sports Football', 'Sports Badminton', 'Sports Ski Jump']

In addition, if you only need the text of each element, use the .text attribute on the tag objects from find_all.

sports = [s.text for s in soup.find_all('span')]
sports
>> ['Football', 'Tennis', 'Ski Jump']





edited Nov 23 '18 at 6:08
answered Nov 23 '18 at 6:03 by BernardL
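To make the "not a wildcard" point concrete, here is a small sketch contrasting the exact-string filter with a function filter that does a plain substring check. The helper name has_sports and the extra "News Politics" span are hypothetical, added for illustration only:

```python
from bs4 import BeautifulSoup

html = """
<span title="Sports Football">Football</span>
<span title="Sports Badminton">Tennis</span>
<span title="Sports Ski Jump">Ski Jump</span>
<span title="News Politics">Politics</span>
"""

soup = BeautifulSoup(html, "html.parser")

# A plain string must equal the whole attribute value, so nothing matches.
exact = soup.find_all("span", {"title": "Sports"})
print(exact)  # []

def has_sports(value):
    # value is None for tags that have no title attribute at all.
    return value is not None and "Sports" in value

# A function filter receives each attribute value and decides per tag.
matches = soup.find_all("span", title=has_sports)
titles = [tag.get("title") for tag in matches]
print(titles)  # ['Sports Football', 'Sports Badminton', 'Sports Ski Jump']
```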












• What about adding a regular expression? I think it should be able to extract the titles containing Sports: title = [s.get('title') for s in soup.find_all('span') if re.findall('(?<![a-zA-Z])Sports(?![a-zA-Z])', s.get('title'))]
  – Cua
  Nov 23 '18 at 7:01

• I think regex is overkill just to check whether the word Sports exists. At that point it is up to the OP's intention which elements he wants to extract.
  – BernardL
  Nov 23 '18 at 14:11
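Whether the regular expression is overkill depends on the data. The sketch below compares a plain substring test with the lookaround pattern from the comment above; the sample titles are hypothetical and chosen so that the two checks disagree:

```python
import re

# Pattern from the comment: "Sports" not directly touching another ASCII letter.
whole_word = re.compile(r"(?<![a-zA-Z])Sports(?![a-zA-Z])")

# Hypothetical attribute values, not taken from the question.
titles = ["Sports Football", "Sportswear Jackets", "Water Sports", "eSports League"]

substring_hits = [t for t in titles if "Sports" in t]
word_hits = [t for t in titles if whole_word.search(t)]

print(substring_hits)  # all four titles
print(word_hits)       # ['Sports Football', 'Water Sports']
```

If every title in the real page starts with the standalone word Sports, as in the question, both checks select the same elements and the substring test is the simpler choice.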





























                    -1






Maybe the example you gave was just made up off the top of your head, but the contents of your spans match exactly what you are looking for. In that example you could work around it with:
sports = soup.find_all('span', {'title': 'Sports'}).contents
and that will give you the string versions of what you're looking for.






edited Nov 23 '18 at 6:55 by SmashGuy
answered Nov 23 '18 at 6:03 by Matthew Sciamanna








• That will fail: soup.find_all returns a list, not a tag object.
  – BernardL
  Nov 23 '18 at 6:06

• A list comprehension like [i.text for i in sports] might give a proper solution if your find_all works.
  – SmashGuy
  Nov 23 '18 at 6:08

• Yeah, sorry, I was leaving it to the reader to turn the list into the one he wants at the top. I should've been clearer.
  – Matthew Sciamanna
  Nov 23 '18 at 23:43
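The two objections in these comments can be checked together. The following is a sketch, not the answerer's code: it swaps the exact-string filter for a regular expression so that find_all actually matches, shows that the returned list has no .contents attribute, and then reads the contents tag by tag:

```python
import re
from bs4 import BeautifulSoup

html = """
<span title="Sports Football">Football</span>
<span title="Sports Badminton">Tennis</span>
<span title="Sports Ski Jump">Ski Jump</span>
"""

soup = BeautifulSoup(html, "html.parser")

# Assumption: a prefix regex stands in for the exact-string filter, which matches nothing.
tags = soup.find_all("span", title=re.compile(r"^Sports\s"))

# find_all returns a list-like result, so .contents is not available on it.
try:
    tags.contents
    list_has_contents = True
except AttributeError:
    list_has_contents = False

# Per-tag access works: .contents is a list of children, get_text() joins their text.
contents = [tag.contents for tag in tags]
texts = [tag.get_text() for tag in tags]
print(list_has_contents)
print(texts)
```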































