Python: get values from one dataframe and add into another (row and column conditions)
Suppose we have the following two dataframes: df1 and df2
x1 = [{'partner': "Afghanistan", 'trade_value':100, 'commodity': 1, 'year': 2013},
{'partner': "Zambia",'trade_value':110, 'commodity': 2, 'year': 2016},
{'partner': "Germany",'trade_value':120, 'commodity': 2, 'year': 2014},
{'partner': "Afghanistan",'trade_value':150, 'commodity': 2, 'year': 2014},
{'partner': "USA",'trade_value':1120, 'commodity': 5, 'year': 2013}];
df1 = pd.DataFrame(x1)
#Add GDP column
df1['GDP'] = 0;
x2 = [{'country': "Afghanistan", 'commodity': 5, 'GDP-2013': 5000, 'GDP-2014': 5500},
{'country': "USA", 'commodity': 5, 'GDP-2013': 15012, 'GDP-2014': 16500},
{'country': "Germany", 'commodity': 7, 'GDP-2013': 7500, 'GDP-2014': 8300}];
df2 = pd.DataFrame(x2)
What I want to do is add the GDP for the countries in df1 from df2 based on the year indicated in df1.
For example: for the first row in df1, I want to get the GDP for Afghanistan for the year 2013.
I would have to go through df2 and find the row with the matching country and GDP-year and then get the value under GDP-year
The final output of df1 should look something like this:
| partner | trade_value | commodity | year | GDP |
|------------|-------------|-----------|--------|-------|
| Afghanistan| 100 | 1 | 2013 | 5000 |
| Zambia | 110 | 2 | 2016 | NaN |
| Germany | 120 | 2 | 2014 | 8300 |
| Afghanistan| 150 | 2 | 2014 | 5500 |
| USA | 1120 | 5 | 2013 | 16500|
I have spent hours trying map and loc methods but nothing worked. What I am currently working on is this:
for index, row in df2.iterrows():
for column in df2:
df1.loc[df1['partner'] == row['country'] and 'GDP-'+str(df1['year']) == column, ['GDP']] = row[column];
I am getting the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
python pandas dataframe
add a comment |
Suppose we have the following two dataframes: df1 and df2
x1 = [{'partner': "Afghanistan", 'trade_value':100, 'commodity': 1, 'year': 2013},
{'partner': "Zambia",'trade_value':110, 'commodity': 2, 'year': 2016},
{'partner': "Germany",'trade_value':120, 'commodity': 2, 'year': 2014},
{'partner': "Afghanistan",'trade_value':150, 'commodity': 2, 'year': 2014},
{'partner': "USA",'trade_value':1120, 'commodity': 5, 'year': 2013}];
df1 = pd.DataFrame(x1)
#Add GDP column
df1['GDP'] = 0;
x2 = [{'country': "Afghanistan", 'commodity': 5, 'GDP-2013': 5000, 'GDP-2014': 5500},
{'country': "USA", 'commodity': 5, 'GDP-2013': 15012, 'GDP-2014': 16500},
{'country': "Germany", 'commodity': 7, 'GDP-2013': 7500, 'GDP-2014': 8300}];
df2 = pd.DataFrame(x2)
What I want to do is add the GDP for the countries in df1 from df2 based on the year indicated in df1.
For example: for the first row in df1, I want to get the GDP for Afghanistan for the year 2013.
I would have to go through df2 and find the row with the matching country and GDP-year and then get the value under GDP-year
The final output of df1 should look something like this:
| partner | trade_value | commodity | year | GDP |
|------------|-------------|-----------|--------|-------|
| Afghanistan| 100 | 1 | 2013 | 5000 |
| Zambia | 110 | 2 | 2016 | NaN |
| Germany | 120 | 2 | 2014 | 8300 |
| Afghanistan| 150 | 2 | 2014 | 5500 |
| USA | 1120 | 5 | 2013 | 16500|
I have spent hours trying map and loc methods but nothing worked. What I am currently working on is this:
for index, row in df2.iterrows():
for column in df2:
df1.loc[df1['partner'] == row['country'] and 'GDP-'+str(df1['year']) == column, ['GDP']] = row[column];
I am getting the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
python pandas dataframe
add a comment |
Suppose we have the following two dataframes: df1 and df2
x1 = [{'partner': "Afghanistan", 'trade_value':100, 'commodity': 1, 'year': 2013},
{'partner': "Zambia",'trade_value':110, 'commodity': 2, 'year': 2016},
{'partner': "Germany",'trade_value':120, 'commodity': 2, 'year': 2014},
{'partner': "Afghanistan",'trade_value':150, 'commodity': 2, 'year': 2014},
{'partner': "USA",'trade_value':1120, 'commodity': 5, 'year': 2013}];
df1 = pd.DataFrame(x1)
#Add GDP column
df1['GDP'] = 0;
x2 = [{'country': "Afghanistan", 'commodity': 5, 'GDP-2013': 5000, 'GDP-2014': 5500},
{'country': "USA", 'commodity': 5, 'GDP-2013': 15012, 'GDP-2014': 16500},
{'country': "Germany", 'commodity': 7, 'GDP-2013': 7500, 'GDP-2014': 8300}];
df2 = pd.DataFrame(x2)
What I want to do is add the GDP for the countries in df1 from df2 based on the year indicated in df1.
For example: for the first row in df1, I want to get the GDP for Afghanistan for the year 2013.
I would have to go through df2 and find the row with the matching country and GDP-year and then get the value under GDP-year
The final output of df1 should look something like this:
| partner | trade_value | commodity | year | GDP |
|------------|-------------|-----------|--------|-------|
| Afghanistan| 100 | 1 | 2013 | 5000 |
| Zambia | 110 | 2 | 2016 | NaN |
| Germany | 120 | 2 | 2014 | 8300 |
| Afghanistan| 150 | 2 | 2014 | 5500 |
| USA | 1120 | 5 | 2013 | 16500|
I have spent hours trying map and loc methods but nothing worked. What I am currently working on is this:
for index, row in df2.iterrows():
for column in df2:
df1.loc[df1['partner'] == row['country'] and 'GDP-'+str(df1['year']) == column, ['GDP']] = row[column];
I am getting the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
python pandas dataframe
Suppose we have the following two dataframes: df1 and df2
x1 = [{'partner': "Afghanistan", 'trade_value':100, 'commodity': 1, 'year': 2013},
{'partner': "Zambia",'trade_value':110, 'commodity': 2, 'year': 2016},
{'partner': "Germany",'trade_value':120, 'commodity': 2, 'year': 2014},
{'partner': "Afghanistan",'trade_value':150, 'commodity': 2, 'year': 2014},
{'partner': "USA",'trade_value':1120, 'commodity': 5, 'year': 2013}];
df1 = pd.DataFrame(x1)
#Add GDP column
df1['GDP'] = 0;
x2 = [{'country': "Afghanistan", 'commodity': 5, 'GDP-2013': 5000, 'GDP-2014': 5500},
{'country': "USA", 'commodity': 5, 'GDP-2013': 15012, 'GDP-2014': 16500},
{'country': "Germany", 'commodity': 7, 'GDP-2013': 7500, 'GDP-2014': 8300}];
df2 = pd.DataFrame(x2)
What I want to do is add the GDP for the countries in df1 from df2 based on the year indicated in df1.
For example: for the first row in df1, I want to get the GDP for Afghanistan for the year 2013.
I would have to go through df2 and find the row with the matching country and GDP-year and then get the value under GDP-year
The final output of df1 should look something like this:
| partner | trade_value | commodity | year | GDP |
|------------|-------------|-----------|--------|-------|
| Afghanistan| 100 | 1 | 2013 | 5000 |
| Zambia | 110 | 2 | 2016 | NaN |
| Germany | 120 | 2 | 2014 | 8300 |
| Afghanistan| 150 | 2 | 2014 | 5500 |
| USA | 1120 | 5 | 2013 | 16500|
I have spent hours trying map and loc methods but nothing worked. What I am currently working on is this:
for index, row in df2.iterrows():
for column in df2:
df1.loc[df1['partner'] == row['country'] and 'GDP-'+str(df1['year']) == column, ['GDP']] = row[column];
I am getting the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
python pandas dataframe
python pandas dataframe
asked Nov 23 '18 at 19:20
Hassan DboukHassan Dbouk
494
494
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
IIUC use:
x1 = [{'partner': "Afghanistan", 'trade_value':100, 'commodity': 1, 'year': 2013},
{'partner': "Zambia",'trade_value':110, 'commodity': 2, 'year': 2016},
{'partner': "Germany",'trade_value':120, 'commodity': 2, 'year': 2014},
{'partner': "Afghanistan",'trade_value':150, 'commodity': 2, 'year': 2014},
{'partner': "USA",'trade_value':1120, 'commodity': 5, 'year': 2013}];
df1 = pd.DataFrame(x1)
x2 = [{'country': "Afghanistan", 'commodity': 5, 'GDP-2013': 5000, 'GDP-2014': 5500},
{'country': "USA", 'commodity': 5, 'GDP-2013': 15012, 'GDP-2014': 16500},
{'country': "Germany", 'commodity': 7, 'GDP-2013': 7500, 'GDP-2014': 8300}];
df2 = pd.DataFrame(x2)
# Melt the dataframe.
temp = df2.melt(id_vars=['commodity','country']).drop('commodity',1)
# Extract year and convert to int.
temp.variable = temp.variable.str.slice(4).astype(int)
# Then merge it with df1.
df3 = df1.merge(temp,left_on=['partner','year'],right_on=['country','variable'],how='left')
# Drop columns which are not required.
df3 = df3.drop(['country','variable'], axis=1).rename(columns={'value':'GDP'})
print (df3)
commodity partner trade_value year GDP
0 1 Afghanistan 100 2013 5000.0
1 2 Zambia 110 2016 NaN
2 2 Germany 120 2014 8300.0
3 2 Afghanistan 150 2014 5500.0
4 5 USA 1120 2013 15012.0
add a comment |
With one shot:
In [24]: df1['GDP'] = df1.apply(lambda s: np.where(df2.country.str.contains(s.partner).any(),
...: df2.loc[df2.country == s.partner].get('GDP-' + str(s.year), pd.Series([np.NaN])).values[0], np.NaN), axis=1)
In [25]: df1
Out[25]:
commodity partner trade_value year GDP
0 1 Afghanistan 100 2013 5000.0
1 2 Zambia 110 2016 nan
2 2 Germany 120 2014 8300.0
3 2 Afghanistan 150 2014 5500.0
4 5 USA 1120 2013 15012.0
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53451976%2fpython-get-values-from-one-dataframe-and-add-into-another-row-and-column-condi%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
IIUC use:
x1 = [{'partner': "Afghanistan", 'trade_value':100, 'commodity': 1, 'year': 2013},
{'partner': "Zambia",'trade_value':110, 'commodity': 2, 'year': 2016},
{'partner': "Germany",'trade_value':120, 'commodity': 2, 'year': 2014},
{'partner': "Afghanistan",'trade_value':150, 'commodity': 2, 'year': 2014},
{'partner': "USA",'trade_value':1120, 'commodity': 5, 'year': 2013}];
df1 = pd.DataFrame(x1)
x2 = [{'country': "Afghanistan", 'commodity': 5, 'GDP-2013': 5000, 'GDP-2014': 5500},
{'country': "USA", 'commodity': 5, 'GDP-2013': 15012, 'GDP-2014': 16500},
{'country': "Germany", 'commodity': 7, 'GDP-2013': 7500, 'GDP-2014': 8300}];
df2 = pd.DataFrame(x2)
# Melt the dataframe.
temp = df2.melt(id_vars=['commodity','country']).drop('commodity',1)
# Extract year and convert to int.
temp.variable = temp.variable.str.slice(4).astype(int)
# Then merge it with df1.
df3 = df1.merge(temp,left_on=['partner','year'],right_on=['country','variable'],how='left')
# Drop columns which are not required.
df3 = df3.drop(['country','variable'], axis=1).rename(columns={'value':'GDP'})
print (df3)
commodity partner trade_value year GDP
0 1 Afghanistan 100 2013 5000.0
1 2 Zambia 110 2016 NaN
2 2 Germany 120 2014 8300.0
3 2 Afghanistan 150 2014 5500.0
4 5 USA 1120 2013 15012.0
add a comment |
IIUC use:
x1 = [{'partner': "Afghanistan", 'trade_value':100, 'commodity': 1, 'year': 2013},
{'partner': "Zambia",'trade_value':110, 'commodity': 2, 'year': 2016},
{'partner': "Germany",'trade_value':120, 'commodity': 2, 'year': 2014},
{'partner': "Afghanistan",'trade_value':150, 'commodity': 2, 'year': 2014},
{'partner': "USA",'trade_value':1120, 'commodity': 5, 'year': 2013}];
df1 = pd.DataFrame(x1)
x2 = [{'country': "Afghanistan", 'commodity': 5, 'GDP-2013': 5000, 'GDP-2014': 5500},
{'country': "USA", 'commodity': 5, 'GDP-2013': 15012, 'GDP-2014': 16500},
{'country': "Germany", 'commodity': 7, 'GDP-2013': 7500, 'GDP-2014': 8300}];
df2 = pd.DataFrame(x2)
# Melt the dataframe.
temp = df2.melt(id_vars=['commodity','country']).drop('commodity',1)
# Extract year and convert to int.
temp.variable = temp.variable.str.slice(4).astype(int)
# Then merge it with df1.
df3 = df1.merge(temp,left_on=['partner','year'],right_on=['country','variable'],how='left')
# Drop columns which are not required.
df3 = df3.drop(['country','variable'], axis=1).rename(columns={'value':'GDP'})
print (df3)
commodity partner trade_value year GDP
0 1 Afghanistan 100 2013 5000.0
1 2 Zambia 110 2016 NaN
2 2 Germany 120 2014 8300.0
3 2 Afghanistan 150 2014 5500.0
4 5 USA 1120 2013 15012.0
add a comment |
IIUC use:
x1 = [{'partner': "Afghanistan", 'trade_value':100, 'commodity': 1, 'year': 2013},
{'partner': "Zambia",'trade_value':110, 'commodity': 2, 'year': 2016},
{'partner': "Germany",'trade_value':120, 'commodity': 2, 'year': 2014},
{'partner': "Afghanistan",'trade_value':150, 'commodity': 2, 'year': 2014},
{'partner': "USA",'trade_value':1120, 'commodity': 5, 'year': 2013}];
df1 = pd.DataFrame(x1)
x2 = [{'country': "Afghanistan", 'commodity': 5, 'GDP-2013': 5000, 'GDP-2014': 5500},
{'country': "USA", 'commodity': 5, 'GDP-2013': 15012, 'GDP-2014': 16500},
{'country': "Germany", 'commodity': 7, 'GDP-2013': 7500, 'GDP-2014': 8300}];
df2 = pd.DataFrame(x2)
# Melt the dataframe.
temp = df2.melt(id_vars=['commodity','country']).drop('commodity',1)
# Extract year and convert to int.
temp.variable = temp.variable.str.slice(4).astype(int)
# Then merge it with df1.
df3 = df1.merge(temp,left_on=['partner','year'],right_on=['country','variable'],how='left')
# Drop columns which are not required.
df3 = df3.drop(['country','variable'], axis=1).rename(columns={'value':'GDP'})
print (df3)
commodity partner trade_value year GDP
0 1 Afghanistan 100 2013 5000.0
1 2 Zambia 110 2016 NaN
2 2 Germany 120 2014 8300.0
3 2 Afghanistan 150 2014 5500.0
4 5 USA 1120 2013 15012.0
IIUC use:
x1 = [{'partner': "Afghanistan", 'trade_value':100, 'commodity': 1, 'year': 2013},
{'partner': "Zambia",'trade_value':110, 'commodity': 2, 'year': 2016},
{'partner': "Germany",'trade_value':120, 'commodity': 2, 'year': 2014},
{'partner': "Afghanistan",'trade_value':150, 'commodity': 2, 'year': 2014},
{'partner': "USA",'trade_value':1120, 'commodity': 5, 'year': 2013}];
df1 = pd.DataFrame(x1)
x2 = [{'country': "Afghanistan", 'commodity': 5, 'GDP-2013': 5000, 'GDP-2014': 5500},
{'country': "USA", 'commodity': 5, 'GDP-2013': 15012, 'GDP-2014': 16500},
{'country': "Germany", 'commodity': 7, 'GDP-2013': 7500, 'GDP-2014': 8300}];
df2 = pd.DataFrame(x2)
# Melt the dataframe.
temp = df2.melt(id_vars=['commodity','country']).drop('commodity',1)
# Extract year and convert to int.
temp.variable = temp.variable.str.slice(4).astype(int)
# Then merge it with df1.
df3 = df1.merge(temp,left_on=['partner','year'],right_on=['country','variable'],how='left')
# Drop columns which are not required.
df3 = df3.drop(['country','variable'], axis=1).rename(columns={'value':'GDP'})
print (df3)
commodity partner trade_value year GDP
0 1 Afghanistan 100 2013 5000.0
1 2 Zambia 110 2016 NaN
2 2 Germany 120 2014 8300.0
3 2 Afghanistan 150 2014 5500.0
4 5 USA 1120 2013 15012.0
edited Nov 23 '18 at 23:50
answered Nov 23 '18 at 19:59
AbhiAbhi
2,480320
2,480320
add a comment |
add a comment |
With one shot:
In [24]: df1['GDP'] = df1.apply(lambda s: np.where(df2.country.str.contains(s.partner).any(),
...: df2.loc[df2.country == s.partner].get('GDP-' + str(s.year), pd.Series([np.NaN])).values[0], np.NaN), axis=1)
In [25]: df1
Out[25]:
commodity partner trade_value year GDP
0 1 Afghanistan 100 2013 5000.0
1 2 Zambia 110 2016 nan
2 2 Germany 120 2014 8300.0
3 2 Afghanistan 150 2014 5500.0
4 5 USA 1120 2013 15012.0
add a comment |
With one shot:
In [24]: df1['GDP'] = df1.apply(lambda s: np.where(df2.country.str.contains(s.partner).any(),
...: df2.loc[df2.country == s.partner].get('GDP-' + str(s.year), pd.Series([np.NaN])).values[0], np.NaN), axis=1)
In [25]: df1
Out[25]:
commodity partner trade_value year GDP
0 1 Afghanistan 100 2013 5000.0
1 2 Zambia 110 2016 nan
2 2 Germany 120 2014 8300.0
3 2 Afghanistan 150 2014 5500.0
4 5 USA 1120 2013 15012.0
add a comment |
With one shot:
In [24]: df1['GDP'] = df1.apply(lambda s: np.where(df2.country.str.contains(s.partner).any(),
...: df2.loc[df2.country == s.partner].get('GDP-' + str(s.year), pd.Series([np.NaN])).values[0], np.NaN), axis=1)
In [25]: df1
Out[25]:
commodity partner trade_value year GDP
0 1 Afghanistan 100 2013 5000.0
1 2 Zambia 110 2016 nan
2 2 Germany 120 2014 8300.0
3 2 Afghanistan 150 2014 5500.0
4 5 USA 1120 2013 15012.0
With one shot:
In [24]: df1['GDP'] = df1.apply(lambda s: np.where(df2.country.str.contains(s.partner).any(),
...: df2.loc[df2.country == s.partner].get('GDP-' + str(s.year), pd.Series([np.NaN])).values[0], np.NaN), axis=1)
In [25]: df1
Out[25]:
commodity partner trade_value year GDP
0 1 Afghanistan 100 2013 5000.0
1 2 Zambia 110 2016 nan
2 2 Germany 120 2014 8300.0
3 2 Afghanistan 150 2014 5500.0
4 5 USA 1120 2013 15012.0
answered Nov 23 '18 at 23:22
RomanPerekhrestRomanPerekhrest
55.5k32253
55.5k32253
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53451976%2fpython-get-values-from-one-dataframe-and-add-into-another-row-and-column-condi%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown