Python: get values from one dataframe and add into another (row and column conditions)












2















Suppose we have the following two dataframes: df1 and df2



x1 = [{'partner': "Afghanistan", 'trade_value':100, 'commodity': 1, 'year': 2013}, 
{'partner': "Zambia",'trade_value':110, 'commodity': 2, 'year': 2016},
{'partner': "Germany",'trade_value':120, 'commodity': 2, 'year': 2014},
{'partner': "Afghanistan",'trade_value':150, 'commodity': 2, 'year': 2014},
{'partner': "USA",'trade_value':1120, 'commodity': 5, 'year': 2013}];

df1 = pd.DataFrame(x1)

#Add GDP column
df1['GDP'] = 0;

x2 = [{'country': "Afghanistan", 'commodity': 5, 'GDP-2013': 5000, 'GDP-2014': 5500},
{'country': "USA", 'commodity': 5, 'GDP-2013': 15012, 'GDP-2014': 16500},
{'country': "Germany", 'commodity': 7, 'GDP-2013': 7500, 'GDP-2014': 8300}];

df2 = pd.DataFrame(x2)


What I want to do is add the GDP for the countries in df1 from df2 based on the year indicated in df1.



For example: for the first row in df1, I want to get the GDP for Afghanistan for the year 2013.
I would have to go through df2 and find the row with the matching country and GDP-year and then get the value under GDP-year



The final output of df1 should look something like this:



|  partner   | trade_value | commodity |  year  |  GDP  | 
|------------|-------------|-----------|--------|-------|
| Afghanistan| 100 | 1 | 2013 | 5000 |
| Zambia | 110 | 2 | 2016 | NaN |
| Germany | 120 | 2 | 2014 | 8300 |
| Afghanistan| 150 | 2 | 2014 | 5500 |
| USA | 1120 | 5 | 2013 | 16500|


I have spent hours trying map and loc methods but nothing worked. What I am currently working on is this:



for index, row in df2.iterrows():
for column in df2:
df1.loc[df1['partner'] == row['country'] and 'GDP-'+str(df1['year']) == column, ['GDP']] = row[column];


I am getting the following error:



ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().









share|improve this question



























    2















    Suppose we have the following two dataframes: df1 and df2



    x1 = [{'partner': "Afghanistan", 'trade_value':100, 'commodity': 1, 'year': 2013}, 
    {'partner': "Zambia",'trade_value':110, 'commodity': 2, 'year': 2016},
    {'partner': "Germany",'trade_value':120, 'commodity': 2, 'year': 2014},
    {'partner': "Afghanistan",'trade_value':150, 'commodity': 2, 'year': 2014},
    {'partner': "USA",'trade_value':1120, 'commodity': 5, 'year': 2013}];

    df1 = pd.DataFrame(x1)

    #Add GDP column
    df1['GDP'] = 0;

    x2 = [{'country': "Afghanistan", 'commodity': 5, 'GDP-2013': 5000, 'GDP-2014': 5500},
    {'country': "USA", 'commodity': 5, 'GDP-2013': 15012, 'GDP-2014': 16500},
    {'country': "Germany", 'commodity': 7, 'GDP-2013': 7500, 'GDP-2014': 8300}];

    df2 = pd.DataFrame(x2)


    What I want to do is add the GDP for the countries in df1 from df2 based on the year indicated in df1.



    For example: for the first row in df1, I want to get the GDP for Afghanistan for the year 2013.
    I would have to go through df2 and find the row with the matching country and GDP-year and then get the value under GDP-year



    The final output of df1 should look something like this:



    |  partner   | trade_value | commodity |  year  |  GDP  | 
    |------------|-------------|-----------|--------|-------|
    | Afghanistan| 100 | 1 | 2013 | 5000 |
    | Zambia | 110 | 2 | 2016 | NaN |
    | Germany | 120 | 2 | 2014 | 8300 |
    | Afghanistan| 150 | 2 | 2014 | 5500 |
    | USA | 1120 | 5 | 2013 | 16500|


    I have spent hours trying map and loc methods but nothing worked. What I am currently working on is this:



    for index, row in df2.iterrows():
    for column in df2:
    df1.loc[df1['partner'] == row['country'] and 'GDP-'+str(df1['year']) == column, ['GDP']] = row[column];


    I am getting the following error:



    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().









    share|improve this question

























      2












      2








      2








      Suppose we have the following two dataframes: df1 and df2



      x1 = [{'partner': "Afghanistan", 'trade_value':100, 'commodity': 1, 'year': 2013}, 
      {'partner': "Zambia",'trade_value':110, 'commodity': 2, 'year': 2016},
      {'partner': "Germany",'trade_value':120, 'commodity': 2, 'year': 2014},
      {'partner': "Afghanistan",'trade_value':150, 'commodity': 2, 'year': 2014},
      {'partner': "USA",'trade_value':1120, 'commodity': 5, 'year': 2013}];

      df1 = pd.DataFrame(x1)

      #Add GDP column
      df1['GDP'] = 0;

      x2 = [{'country': "Afghanistan", 'commodity': 5, 'GDP-2013': 5000, 'GDP-2014': 5500},
      {'country': "USA", 'commodity': 5, 'GDP-2013': 15012, 'GDP-2014': 16500},
      {'country': "Germany", 'commodity': 7, 'GDP-2013': 7500, 'GDP-2014': 8300}];

      df2 = pd.DataFrame(x2)


      What I want to do is add the GDP for the countries in df1 from df2 based on the year indicated in df1.



      For example: for the first row in df1, I want to get the GDP for Afghanistan for the year 2013.
      I would have to go through df2 and find the row with the matching country and GDP-year and then get the value under GDP-year



      The final output of df1 should look something like this:



      |  partner   | trade_value | commodity |  year  |  GDP  | 
      |------------|-------------|-----------|--------|-------|
      | Afghanistan| 100 | 1 | 2013 | 5000 |
      | Zambia | 110 | 2 | 2016 | NaN |
      | Germany | 120 | 2 | 2014 | 8300 |
      | Afghanistan| 150 | 2 | 2014 | 5500 |
      | USA | 1120 | 5 | 2013 | 16500|


      I have spent hours trying map and loc methods but nothing worked. What I am currently working on is this:



      for index, row in df2.iterrows():
      for column in df2:
      df1.loc[df1['partner'] == row['country'] and 'GDP-'+str(df1['year']) == column, ['GDP']] = row[column];


      I am getting the following error:



      ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().









      share|improve this question














      Suppose we have the following two dataframes: df1 and df2



      x1 = [{'partner': "Afghanistan", 'trade_value':100, 'commodity': 1, 'year': 2013}, 
      {'partner': "Zambia",'trade_value':110, 'commodity': 2, 'year': 2016},
      {'partner': "Germany",'trade_value':120, 'commodity': 2, 'year': 2014},
      {'partner': "Afghanistan",'trade_value':150, 'commodity': 2, 'year': 2014},
      {'partner': "USA",'trade_value':1120, 'commodity': 5, 'year': 2013}];

      df1 = pd.DataFrame(x1)

      #Add GDP column
      df1['GDP'] = 0;

      x2 = [{'country': "Afghanistan", 'commodity': 5, 'GDP-2013': 5000, 'GDP-2014': 5500},
      {'country': "USA", 'commodity': 5, 'GDP-2013': 15012, 'GDP-2014': 16500},
      {'country': "Germany", 'commodity': 7, 'GDP-2013': 7500, 'GDP-2014': 8300}];

      df2 = pd.DataFrame(x2)


      What I want to do is add the GDP for the countries in df1 from df2 based on the year indicated in df1.



      For example: for the first row in df1, I want to get the GDP for Afghanistan for the year 2013.
      I would have to go through df2 and find the row with the matching country and GDP-year and then get the value under GDP-year



      The final output of df1 should look something like this:



      |  partner   | trade_value | commodity |  year  |  GDP  | 
      |------------|-------------|-----------|--------|-------|
      | Afghanistan| 100 | 1 | 2013 | 5000 |
      | Zambia | 110 | 2 | 2016 | NaN |
      | Germany | 120 | 2 | 2014 | 8300 |
      | Afghanistan| 150 | 2 | 2014 | 5500 |
      | USA | 1120 | 5 | 2013 | 16500|


      I have spent hours trying map and loc methods but nothing worked. What I am currently working on is this:



      for index, row in df2.iterrows():
      for column in df2:
      df1.loc[df1['partner'] == row['country'] and 'GDP-'+str(df1['year']) == column, ['GDP']] = row[column];


      I am getting the following error:



      ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().






      python pandas dataframe






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 23 '18 at 19:20









      Hassan DboukHassan Dbouk

      494




      494
























          2 Answers
          2






          active

          oldest

          votes


















          2














          IIUC use:



          x1 = [{'partner': "Afghanistan", 'trade_value':100, 'commodity': 1, 'year': 2013}, 
          {'partner': "Zambia",'trade_value':110, 'commodity': 2, 'year': 2016},
          {'partner': "Germany",'trade_value':120, 'commodity': 2, 'year': 2014},
          {'partner': "Afghanistan",'trade_value':150, 'commodity': 2, 'year': 2014},
          {'partner': "USA",'trade_value':1120, 'commodity': 5, 'year': 2013}];
          df1 = pd.DataFrame(x1)

          x2 = [{'country': "Afghanistan", 'commodity': 5, 'GDP-2013': 5000, 'GDP-2014': 5500},
          {'country': "USA", 'commodity': 5, 'GDP-2013': 15012, 'GDP-2014': 16500},
          {'country': "Germany", 'commodity': 7, 'GDP-2013': 7500, 'GDP-2014': 8300}];
          df2 = pd.DataFrame(x2)




          # Melt the dataframe.
          temp = df2.melt(id_vars=['commodity','country']).drop('commodity',1)
          # Extract year and convert to int.
          temp.variable = temp.variable.str.slice(4).astype(int)

          # Then merge it with df1.
          df3 = df1.merge(temp,left_on=['partner','year'],right_on=['country','variable'],how='left')
          # Drop columns which are not required.
          df3 = df3.drop(['country','variable'], axis=1).rename(columns={'value':'GDP'})

          print (df3)

          commodity partner trade_value year GDP
          0 1 Afghanistan 100 2013 5000.0
          1 2 Zambia 110 2016 NaN
          2 2 Germany 120 2014 8300.0
          3 2 Afghanistan 150 2014 5500.0
          4 5 USA 1120 2013 15012.0





          share|improve this answer

































            0














            With one shot:



            In [24]: df1['GDP'] = df1.apply(lambda s: np.where(df2.country.str.contains(s.partner).any(), 
            ...: df2.loc[df2.country == s.partner].get('GDP-' + str(s.year), pd.Series([np.NaN])).values[0], np.NaN), axis=1)

            In [25]: df1
            Out[25]:
            commodity partner trade_value year GDP
            0 1 Afghanistan 100 2013 5000.0
            1 2 Zambia 110 2016 nan
            2 2 Germany 120 2014 8300.0
            3 2 Afghanistan 150 2014 5500.0
            4 5 USA 1120 2013 15012.0





            share|improve this answer























              Your Answer






              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "1"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53451976%2fpython-get-values-from-one-dataframe-and-add-into-another-row-and-column-condi%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              2 Answers
              2






              active

              oldest

              votes








              2 Answers
              2






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              2














              IIUC use:



              x1 = [{'partner': "Afghanistan", 'trade_value':100, 'commodity': 1, 'year': 2013}, 
              {'partner': "Zambia",'trade_value':110, 'commodity': 2, 'year': 2016},
              {'partner': "Germany",'trade_value':120, 'commodity': 2, 'year': 2014},
              {'partner': "Afghanistan",'trade_value':150, 'commodity': 2, 'year': 2014},
              {'partner': "USA",'trade_value':1120, 'commodity': 5, 'year': 2013}];
              df1 = pd.DataFrame(x1)

              x2 = [{'country': "Afghanistan", 'commodity': 5, 'GDP-2013': 5000, 'GDP-2014': 5500},
              {'country': "USA", 'commodity': 5, 'GDP-2013': 15012, 'GDP-2014': 16500},
              {'country': "Germany", 'commodity': 7, 'GDP-2013': 7500, 'GDP-2014': 8300}];
              df2 = pd.DataFrame(x2)




              # Melt the dataframe.
              temp = df2.melt(id_vars=['commodity','country']).drop('commodity',1)
              # Extract year and convert to int.
              temp.variable = temp.variable.str.slice(4).astype(int)

              # Then merge it with df1.
              df3 = df1.merge(temp,left_on=['partner','year'],right_on=['country','variable'],how='left')
              # Drop columns which are not required.
              df3 = df3.drop(['country','variable'], axis=1).rename(columns={'value':'GDP'})

              print (df3)

              commodity partner trade_value year GDP
              0 1 Afghanistan 100 2013 5000.0
              1 2 Zambia 110 2016 NaN
              2 2 Germany 120 2014 8300.0
              3 2 Afghanistan 150 2014 5500.0
              4 5 USA 1120 2013 15012.0





              share|improve this answer






























                2














                IIUC use:



                x1 = [{'partner': "Afghanistan", 'trade_value':100, 'commodity': 1, 'year': 2013}, 
                {'partner': "Zambia",'trade_value':110, 'commodity': 2, 'year': 2016},
                {'partner': "Germany",'trade_value':120, 'commodity': 2, 'year': 2014},
                {'partner': "Afghanistan",'trade_value':150, 'commodity': 2, 'year': 2014},
                {'partner': "USA",'trade_value':1120, 'commodity': 5, 'year': 2013}];
                df1 = pd.DataFrame(x1)

                x2 = [{'country': "Afghanistan", 'commodity': 5, 'GDP-2013': 5000, 'GDP-2014': 5500},
                {'country': "USA", 'commodity': 5, 'GDP-2013': 15012, 'GDP-2014': 16500},
                {'country': "Germany", 'commodity': 7, 'GDP-2013': 7500, 'GDP-2014': 8300}];
                df2 = pd.DataFrame(x2)




                # Melt the dataframe.
                temp = df2.melt(id_vars=['commodity','country']).drop('commodity',1)
                # Extract year and convert to int.
                temp.variable = temp.variable.str.slice(4).astype(int)

                # Then merge it with df1.
                df3 = df1.merge(temp,left_on=['partner','year'],right_on=['country','variable'],how='left')
                # Drop columns which are not required.
                df3 = df3.drop(['country','variable'], axis=1).rename(columns={'value':'GDP'})

                print (df3)

                commodity partner trade_value year GDP
                0 1 Afghanistan 100 2013 5000.0
                1 2 Zambia 110 2016 NaN
                2 2 Germany 120 2014 8300.0
                3 2 Afghanistan 150 2014 5500.0
                4 5 USA 1120 2013 15012.0





                share|improve this answer




























                  2












                  2








                  2







                  IIUC use:



                  x1 = [{'partner': "Afghanistan", 'trade_value':100, 'commodity': 1, 'year': 2013}, 
                  {'partner': "Zambia",'trade_value':110, 'commodity': 2, 'year': 2016},
                  {'partner': "Germany",'trade_value':120, 'commodity': 2, 'year': 2014},
                  {'partner': "Afghanistan",'trade_value':150, 'commodity': 2, 'year': 2014},
                  {'partner': "USA",'trade_value':1120, 'commodity': 5, 'year': 2013}];
                  df1 = pd.DataFrame(x1)

                  x2 = [{'country': "Afghanistan", 'commodity': 5, 'GDP-2013': 5000, 'GDP-2014': 5500},
                  {'country': "USA", 'commodity': 5, 'GDP-2013': 15012, 'GDP-2014': 16500},
                  {'country': "Germany", 'commodity': 7, 'GDP-2013': 7500, 'GDP-2014': 8300}];
                  df2 = pd.DataFrame(x2)




                  # Melt the dataframe.
                  temp = df2.melt(id_vars=['commodity','country']).drop('commodity',1)
                  # Extract year and convert to int.
                  temp.variable = temp.variable.str.slice(4).astype(int)

                  # Then merge it with df1.
                  df3 = df1.merge(temp,left_on=['partner','year'],right_on=['country','variable'],how='left')
                  # Drop columns which are not required.
                  df3 = df3.drop(['country','variable'], axis=1).rename(columns={'value':'GDP'})

                  print (df3)

                  commodity partner trade_value year GDP
                  0 1 Afghanistan 100 2013 5000.0
                  1 2 Zambia 110 2016 NaN
                  2 2 Germany 120 2014 8300.0
                  3 2 Afghanistan 150 2014 5500.0
                  4 5 USA 1120 2013 15012.0





                  share|improve this answer















                  IIUC use:



                  x1 = [{'partner': "Afghanistan", 'trade_value':100, 'commodity': 1, 'year': 2013}, 
                  {'partner': "Zambia",'trade_value':110, 'commodity': 2, 'year': 2016},
                  {'partner': "Germany",'trade_value':120, 'commodity': 2, 'year': 2014},
                  {'partner': "Afghanistan",'trade_value':150, 'commodity': 2, 'year': 2014},
                  {'partner': "USA",'trade_value':1120, 'commodity': 5, 'year': 2013}];
                  df1 = pd.DataFrame(x1)

                  x2 = [{'country': "Afghanistan", 'commodity': 5, 'GDP-2013': 5000, 'GDP-2014': 5500},
                  {'country': "USA", 'commodity': 5, 'GDP-2013': 15012, 'GDP-2014': 16500},
                  {'country': "Germany", 'commodity': 7, 'GDP-2013': 7500, 'GDP-2014': 8300}];
                  df2 = pd.DataFrame(x2)




                  # Melt the dataframe.
                  temp = df2.melt(id_vars=['commodity','country']).drop('commodity',1)
                  # Extract year and convert to int.
                  temp.variable = temp.variable.str.slice(4).astype(int)

                  # Then merge it with df1.
                  df3 = df1.merge(temp,left_on=['partner','year'],right_on=['country','variable'],how='left')
                  # Drop columns which are not required.
                  df3 = df3.drop(['country','variable'], axis=1).rename(columns={'value':'GDP'})

                  print (df3)

                  commodity partner trade_value year GDP
                  0 1 Afghanistan 100 2013 5000.0
                  1 2 Zambia 110 2016 NaN
                  2 2 Germany 120 2014 8300.0
                  3 2 Afghanistan 150 2014 5500.0
                  4 5 USA 1120 2013 15012.0






                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Nov 23 '18 at 23:50

























                  answered Nov 23 '18 at 19:59









                  AbhiAbhi

                  2,480320




                  2,480320

























                      0














                      With one shot:



                      In [24]: df1['GDP'] = df1.apply(lambda s: np.where(df2.country.str.contains(s.partner).any(), 
                      ...: df2.loc[df2.country == s.partner].get('GDP-' + str(s.year), pd.Series([np.NaN])).values[0], np.NaN), axis=1)

                      In [25]: df1
                      Out[25]:
                      commodity partner trade_value year GDP
                      0 1 Afghanistan 100 2013 5000.0
                      1 2 Zambia 110 2016 nan
                      2 2 Germany 120 2014 8300.0
                      3 2 Afghanistan 150 2014 5500.0
                      4 5 USA 1120 2013 15012.0





                      share|improve this answer




























                        0














                        With one shot:



                        In [24]: df1['GDP'] = df1.apply(lambda s: np.where(df2.country.str.contains(s.partner).any(), 
                        ...: df2.loc[df2.country == s.partner].get('GDP-' + str(s.year), pd.Series([np.NaN])).values[0], np.NaN), axis=1)

                        In [25]: df1
                        Out[25]:
                        commodity partner trade_value year GDP
                        0 1 Afghanistan 100 2013 5000.0
                        1 2 Zambia 110 2016 nan
                        2 2 Germany 120 2014 8300.0
                        3 2 Afghanistan 150 2014 5500.0
                        4 5 USA 1120 2013 15012.0





                        share|improve this answer


























                          0












                          0








                          0







                          With one shot:



                          In [24]: df1['GDP'] = df1.apply(lambda s: np.where(df2.country.str.contains(s.partner).any(), 
                          ...: df2.loc[df2.country == s.partner].get('GDP-' + str(s.year), pd.Series([np.NaN])).values[0], np.NaN), axis=1)

                          In [25]: df1
                          Out[25]:
                          commodity partner trade_value year GDP
                          0 1 Afghanistan 100 2013 5000.0
                          1 2 Zambia 110 2016 nan
                          2 2 Germany 120 2014 8300.0
                          3 2 Afghanistan 150 2014 5500.0
                          4 5 USA 1120 2013 15012.0





                          share|improve this answer













                          With one shot:



                          In [24]: df1['GDP'] = df1.apply(lambda s: np.where(df2.country.str.contains(s.partner).any(), 
                          ...: df2.loc[df2.country == s.partner].get('GDP-' + str(s.year), pd.Series([np.NaN])).values[0], np.NaN), axis=1)

                          In [25]: df1
                          Out[25]:
                          commodity partner trade_value year GDP
                          0 1 Afghanistan 100 2013 5000.0
                          1 2 Zambia 110 2016 nan
                          2 2 Germany 120 2014 8300.0
                          3 2 Afghanistan 150 2014 5500.0
                          4 5 USA 1120 2013 15012.0






                          share|improve this answer












                          share|improve this answer



                          share|improve this answer










                          answered Nov 23 '18 at 23:22









                          RomanPerekhrestRomanPerekhrest

                          55.5k32253




                          55.5k32253






























                              draft saved

                              draft discarded




















































                              Thanks for contributing an answer to Stack Overflow!


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              To learn more, see our tips on writing great answers.




                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function () {
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53451976%2fpython-get-values-from-one-dataframe-and-add-into-another-row-and-column-condi%23new-answer', 'question_page');
                              }
                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown







                              Popular posts from this blog

                              Sphinx de Gizeh

                              Dijon

                              Guerrita