Tag rows based on other columns' values

I have a pandas dataframe:

    street_name        eircode
    Malborough Road    BLT12
    123 Fake Road      NaN
    My Street          NaN

I would like to create another column called unique based on these conditions:

  1. If the row has an eircode, return 'yes' in the unique column.

  2. If it doesn't have an eircode, check the first word of street_name:

    • if the first word consists only of digits, return 'yes' in the unique column

    • if it does not, return 'no' in the unique column

I came up with this solution, in which I:

  1. Changed the data type of both columns, street_name and eircode, to string

  2. Got the first word of street_name using a lambda function

  3. Defined a tagging function and applied it to the data frame

    # change data types
    df['eircode'] = df['eircode'].astype('str')
    df['street_name'] = df['street_name'].astype('str')

    # get the first string from street_name column
    df['first_str'] = df['street_name'].apply(lambda x: x.split()[0])

    def tagging(x):
        if x['eircode'] != 'nan':
            return 'yes'
        elif x['first_str'].isdigit() == True:
            return 'yes'
        else:
            return 'no'

    df['unique'] = df.apply(tagging, axis=1)

The issue with this is that I have to change the data types and then create a separate column. Is there a more elegant or more concise way to achieve the same result?

Tags: python string pandas

edited Nov 23 at 16:37 by jpp
asked Nov 21 at 17:40 by mahf_i

          2 Answers
          You can combine those two conditions with the | operator, then map the resulting Boolean Series to 'yes' and 'no'. The first condition checks that eircode is not null, and the second uses a regex to check that street_name starts with a digit:

              >>> df['unique'] = ((~df.eircode.isnull()) | (df.street_name.str.match('^[0-9]'))).map({True: 'yes', False: 'no'})
              >>> df
                     street_name  eircode  unique
              0  Malborough Road    BLT12     yes
              1    123 Fake Road      NaN     yes
              2        My Street      NaN      no

          answered Nov 21 at 17:45 by sacul

          • Accepted this as an answer because the use of regex will evaluate any numbers and won't throw an error even if the number is a float type.
            – mahf_i
            Nov 23 at 16:40
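
For reference, here is a self-contained sketch of the regex approach above. The sample frame is rebuilt from the question's data. The fourth row (a missing street_name) and the `na=False` argument are assumptions added for illustration, so that a missing street name counts as "does not start with a digit" instead of yielding NaN. `np.where` is used in place of `map` purely as a matter of taste.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus one hypothetical row with no street name
df = pd.DataFrame({
    'street_name': ['Malborough Road', '123 Fake Road', 'My Street', np.nan],
    'eircode': ['BLT12', np.nan, np.nan, np.nan],
})

# True where an eircode is present
has_eircode = df['eircode'].notna()

# str.match anchors at the start of the string; na=False treats missing
# street names as non-matches
starts_with_digit = df['street_name'].str.match(r'[0-9]', na=False)

df['unique'] = np.where(has_eircode | starts_with_digit, 'yes', 'no')
print(df)
```

Because both conditions are plain Boolean Series, no dtype conversion and no helper column are needed.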


















          With Pandas, it's best to use column-wise calculations; apply with a custom function represents an inefficient, Python-level row-wise loop.

              import numpy as np
              import pandas as pd

              df = pd.DataFrame({'street_name': ['Malborough Road', '123 Fake Road', 'My Street'],
                                 'eircode': ['BLT12', None, None]})

              # True where there is no eircode
              cond1 = df['eircode'].isnull()
              # True where the first word of street_name is not all digits
              cond2 = ~df['street_name'].str.split(n=1).str[0].str.isdigit()

              # 'no' only when both conditions hold, 'yes' otherwise
              df['unique'] = np.where(cond1 & cond2, 'no', 'yes')

              print(df)

                 eircode      street_name  unique
              0    BLT12  Malborough Road     yes
              1     None    123 Fake Road     yes
              2     None        My Street      no

          answered Nov 21 at 17:45 by jpp

          • Code for cond2 is excellent, as I was trying to do the same (selecting the first word after the split) but did not know the correct way to do it. Great use of np.where, which I've never used before. One issue I encountered with this solution is that it raises an error when the extracted number is a float.
            – mahf_i
            Nov 23 at 16:35
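
The comment does not show the failing data, so the following is only a guess at the cause. If street_name holds a non-string value (for example a float such as NaN), the `.str` methods can return NaN for that row, and applying `~` to the resulting object Series can raise a TypeError. A minimal sketch of one workaround, assuming that missing or non-string street names should count as "not a digit"; the fourth row is hypothetical:

```python
import numpy as np
import pandas as pd

# Sample data from the answer, plus one hypothetical row with a float (NaN)
# in street_name
df = pd.DataFrame({
    'street_name': ['Malborough Road', '123 Fake Road', 'My Street', np.nan],
    'eircode': ['BLT12', None, None, None],
})

no_eircode = df['eircode'].isnull()
first_word = df['street_name'].str.split(n=1).str[0]

# isdigit() may yield NaN where street_name is missing; comparing with True
# gives a plain Boolean Series that is safe to negate
first_is_digit = first_word.str.isdigit().eq(True)

df['unique'] = np.where(no_eircode & ~first_is_digit, 'no', 'yes')
print(df)
```

Comparing with `True` is one of several ways to get rid of the NaN; it is chosen here only because it leaves the rest of the answer's logic unchanged.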















