Conditionally $project from different fields
I have the unfortunate problem of a data structure changing on me around six months ago. Therefore, I have a document that used to look like...
{
fruits: [
{
id: 123
},
{
id: 456
}
]
}
(Important to note that the id
is not the ObjectId
BSON type, it's just a random series of characters generated by the client side).
...but now has the id
key changed to.
{
fruits: [
{
fruit_id: 'xxx'
},
{
fruit_id: 'yyy'
}
]
}
So, I'm trying to do a $project
to have both id
and fruit_id
changed to something generic like general_id
, so that I can continue with another aggregation like $group
and just reference the one field
I've tried something along the lines of:
[
$unwind: {
path: '$fruits'
},
$project: {
general_id: {
$cond: {
if: {
'fruits.fruit_id': {
$type: ['string']
}
},
then: '$fruits.fruit_id',
else: '$fruits.id'
}
}
}
]
mongodb mongoose mongodb-query aggregation-framework
add a comment |
I have the unfortunate problem of a data structure changing on me around six months ago. Therefore, I have a document that used to look like...
{
fruits: [
{
id: 123
},
{
id: 456
}
]
}
(Important to note that the id
is not the ObjectId
BSON type, it's just a random series of characters generated by the client side).
...but now has the id
key changed to.
{
fruits: [
{
fruit_id: 'xxx'
},
{
fruit_id: 'yyy'
}
]
}
So, I'm trying to do a $project
to have both id
and fruit_id
changed to something generic like general_id
, so that I can continue with another aggregation like $group
and just reference the one field
I've tried something along the lines of:
[
$unwind: {
path: '$fruits'
},
$project: {
general_id: {
$cond: {
if: {
'fruits.fruit_id': {
$type: ['string']
}
},
then: '$fruits.fruit_id',
else: '$fruits.id'
}
}
}
]
mongodb mongoose mongodb-query aggregation-framework
add a comment |
I have the unfortunate problem of a data structure changing on me around six months ago. Therefore, I have a document that used to look like...
{
fruits: [
{
id: 123
},
{
id: 456
}
]
}
(Important to note that the id
is not the ObjectId
BSON type, it's just a random series of characters generated by the client side).
...but now has the id
key changed to.
{
fruits: [
{
fruit_id: 'xxx'
},
{
fruit_id: 'yyy'
}
]
}
So, I'm trying to do a $project
to have both id
and fruit_id
changed to something generic like general_id
, so that I can continue with another aggregation like $group
and just reference the one field
I've tried something along the lines of:
[
$unwind: {
path: '$fruits'
},
$project: {
general_id: {
$cond: {
if: {
'fruits.fruit_id': {
$type: ['string']
}
},
then: '$fruits.fruit_id',
else: '$fruits.id'
}
}
}
]
mongodb mongoose mongodb-query aggregation-framework
I have the unfortunate problem of a data structure changing on me around six months ago. Therefore, I have a document that used to look like...
{
fruits: [
{
id: 123
},
{
id: 456
}
]
}
(Important to note that the id
is not the ObjectId
BSON type, it's just a random series of characters generated by the client side).
...but now has the id
key changed to.
{
fruits: [
{
fruit_id: 'xxx'
},
{
fruit_id: 'yyy'
}
]
}
So, I'm trying to do a $project
to have both id
and fruit_id
changed to something generic like general_id
, so that I can continue with another aggregation like $group
and just reference the one field
I've tried something along the lines of:
[
$unwind: {
path: '$fruits'
},
$project: {
general_id: {
$cond: {
if: {
'fruits.fruit_id': {
$type: ['string']
}
},
then: '$fruits.fruit_id',
else: '$fruits.id'
}
}
}
]
mongodb mongoose mongodb-query aggregation-framework
mongodb mongoose mongodb-query aggregation-framework
edited Nov 23 '18 at 10:27
asked Nov 23 '18 at 4:25
Modermo
552414
552414
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
It really depends on what you are after here, but for a general case of knowing two possibilities it's probably better to use $ifNull
to return the value for the field if present, otherwise return the other field's value.
Adding a little more data for demonstration since you probably don't want to loose anything else in the array elements:
{
_id: 1,
fruits: [
{
id: 123,
data: 1
},
{
id: 456,
data: 2
}
]
},
{
_id: 2,
fruits: [
{
fruit_id: 'xxx',
data: 1
},
{
fruit_id: 'yyy',
data: 2
}
]
},
{
_id: 3,
fruits: [
{
fruit_id: 'xxx',
data: 1,
},
{
fruit_id: 'yyy',
data: 2
},
{
id: 123,
data: 3
},
{
id: 456,
data: 4
}
]
}
Then you can either do the process using $unwind
as the first step, which does make path naming easier and especially with $addFields
instead of $project
:
Model.aggregate([
{ "$unwind": "$fruits" },
{ "$addFields": {
"fruits": {
"id": "$$REMOVE",
"fruit_id": "$$REMOVE",
"general_id": { "$ifNull": [ "$fruits.id", "$fruits.fruit_id" ] }
}
}}
])
That uses $$REMOVE
from MongoDB 3.6 and above ( which should be the minimal version you are using ) in order to "remove" the fields you don't want. You don't need to do that and can just declare everything you actually want with $project
if you don't have support.
Then of course has the alternate with an $ifNull
expression.
This gives results on that data like:
{ "_id" : 1, "fruits" : { "data" : 1, "general_id" : 123 } }
{ "_id" : 1, "fruits" : { "data" : 2, "general_id" : 456 } }
{ "_id" : 2, "fruits" : { "data" : 1, "general_id" : "xxx" } }
{ "_id" : 2, "fruits" : { "data" : 2, "general_id" : "yyy" } }
{ "_id" : 3, "fruits" : { "data" : 1, "general_id" : "xxx" } }
{ "_id" : 3, "fruits" : { "data" : 2, "general_id" : "yyy" } }
{ "_id" : 3, "fruits" : { "data" : 3, "general_id" : 123 } }
{ "_id" : 3, "fruits" : { "data" : 4, "general_id" : 456 } }
If you wanted to $group
on that value, then there's no need for an intermediate "project" of any kind. Just do the $ifNull
directly in that stage:
Model.aggregate([
{ "$unwind": "$fruits" },
{ "$group": {
"_id": { "$ifNull": [ "$fruits.id", "$fruits.fruit_id" ] },
"count": { "$sum": 1 }
}}
])
And output:
{ "_id" : "yyy", "count" : 2 }
{ "_id" : "xxx", "count" : 2 }
{ "_id" : 456, "count" : 2 }
{ "_id" : 123, "count" : 2 }
Or if you actually did not need to $unwind
the array for other purposes, you can use $map
and some other manipulation with $objectToArray
and $arrayToObject
:
Model.aggregate([
{ "$addFields": {
"fruits": {
"$map": {
"input": "$fruits",
"in": {
"$mergeObjects": [
{ "$arrayToObject": {
"$filter": {
"input": { "$objectToArray": "$$this" },
"cond": { "$not": { "$in": [ "$$this.k", ["fruit_id","id"] ] } }
}
}},
{
"general_id": { "$ifNull": ["$$this.id","$$this.fruit_id"] }
}
]
}
}
}
}}
])
Which returns results like:
{
"_id" : 1,
"fruits" : [
{
"data" : 1,
"general_id" : 123
},
{
"data" : 2,
"general_id" : 456
}
]
}
{
"_id" : 2,
"fruits" : [
{
"data" : 1,
"general_id" : "xxx"
},
{
"data" : 2,
"general_id" : "yyy"
}
]
}
{
"_id" : 3,
"fruits" : [
{
"data" : 1,
"general_id" : "xxx"
},
{
"data" : 2,
"general_id" : "yyy"
},
{
"data" : 3,
"general_id" : 123
},
{
"data" : 4,
"general_id" : 456
}
]
}
Adding an $unwind
after that returns just the same as before. But the more complex operations are probably better suited to where you want to keep this as an array.
This time we removed the id
and fruit_id
by converting each array element into an array of "key/value" pairs via $objectToArray
. We then $filter
the array based on those "k"
values, which are the names of the fields. The $arrayToObject
makes this an object again, with all other content except those fields.
The $mergeObjects
is to $map
what $addFields
is to the root "document", in that it takes multiple objects and "merges" them together. Thus the "filtered" object as described before, and the new object with only the general_id
key and it's value translated from whichever field was present.
Lists of More than Two Fields
As a final note, $ifNull
works better than $cond
where you have just two values, but neither is actually that great if there is a larger possible list. You can nest $cond
expressions or even use $switch
, but really it's probably best to filter content out via the $objectToArray
as seen before:
var valid_names = [ "id", "fruit_id", "apple_id", "orange_id" ];
Model.aggregate([
{ "$unwind": "$fruits" },
{ "$group": {
"_id": {
"$arrayElemAt": [
{ "$map": {
"input": {
"$filter": {
"input": { "$objectToArray": "$fruits" },
"cond": { "$in": [ "$$this.k", valid_names ] }
}
},
"in": "$$this.v"
}},
0
]
},
"count": { "$sum": 1 }
}}
])
That usually makes the most sense, otherwise for working with such a list in dynamic way you end up building aggregation pipeline stages in code, such as with using $switch
would be:
var valid_names = [ "id", "fruit_id", "apple_id", "orange_id" ];
var branches = valid_names.map(name =>
({
"case": { "$gt": [`$fruits.${name}`, null ] },
"then": `$fruits.${name}`
})
)
Model.aggregate([
{ "$unwind": "$fruits" },
{ "$group": {
"_id": { "$switch": { branches, "default": null } },
"count": { "$sum": 1 }
}}
])
Which looks cleaner in your code, but actually sends a much larger pipeline in BSON:
[
{ "$unwind" : "$fruits" },
{ "$group" : {
"_id" : {
"$switch" : {
"branches" : [
{
"case" : { "$gt" : [ "$fruits.id", null ] },
"then" : "$fruits.id"
},
{
"case" : { "$gt" : [ "$fruits.fruit_id", null ] },
"then" : "$fruits.fruit_id"
},
{
"case" : { "$gt" : [ "$fruits.apple_id", null ] },
"then" : "$fruits.apple_id"
},
{
"case" : { "$gt" : [ "$fruits.orange_id", null ] },
"then" : "$fruits.orange_id"
}
],
"default" : null
}
},
"count" : { "$sum" : 1 }
}}
]
1
This is a very comprehensive answer. Thanks for that.
– Modermo
Nov 23 '18 at 10:27
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53440639%2fconditionally-project-from-different-fields%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
It really depends on what you are after here, but for a general case of knowing two possibilities it's probably better to use $ifNull
to return the value for the field if present, otherwise return the other field's value.
Adding a little more data for demonstration since you probably don't want to loose anything else in the array elements:
{
_id: 1,
fruits: [
{
id: 123,
data: 1
},
{
id: 456,
data: 2
}
]
},
{
_id: 2,
fruits: [
{
fruit_id: 'xxx',
data: 1
},
{
fruit_id: 'yyy',
data: 2
}
]
},
{
_id: 3,
fruits: [
{
fruit_id: 'xxx',
data: 1,
},
{
fruit_id: 'yyy',
data: 2
},
{
id: 123,
data: 3
},
{
id: 456,
data: 4
}
]
}
Then you can either do the process using $unwind
as the first step, which does make path naming easier and especially with $addFields
instead of $project
:
Model.aggregate([
{ "$unwind": "$fruits" },
{ "$addFields": {
"fruits": {
"id": "$$REMOVE",
"fruit_id": "$$REMOVE",
"general_id": { "$ifNull": [ "$fruits.id", "$fruits.fruit_id" ] }
}
}}
])
That uses $$REMOVE
from MongoDB 3.6 and above ( which should be the minimal version you are using ) in order to "remove" the fields you don't want. You don't need to do that and can just declare everything you actually want with $project
if you don't have support.
Then of course has the alternate with an $ifNull
expression.
This gives results on that data like:
{ "_id" : 1, "fruits" : { "data" : 1, "general_id" : 123 } }
{ "_id" : 1, "fruits" : { "data" : 2, "general_id" : 456 } }
{ "_id" : 2, "fruits" : { "data" : 1, "general_id" : "xxx" } }
{ "_id" : 2, "fruits" : { "data" : 2, "general_id" : "yyy" } }
{ "_id" : 3, "fruits" : { "data" : 1, "general_id" : "xxx" } }
{ "_id" : 3, "fruits" : { "data" : 2, "general_id" : "yyy" } }
{ "_id" : 3, "fruits" : { "data" : 3, "general_id" : 123 } }
{ "_id" : 3, "fruits" : { "data" : 4, "general_id" : 456 } }
If you wanted to $group
on that value, then there's no need for an intermediate "project" of any kind. Just do the $ifNull
directly in that stage:
Model.aggregate([
{ "$unwind": "$fruits" },
{ "$group": {
"_id": { "$ifNull": [ "$fruits.id", "$fruits.fruit_id" ] },
"count": { "$sum": 1 }
}}
])
And output:
{ "_id" : "yyy", "count" : 2 }
{ "_id" : "xxx", "count" : 2 }
{ "_id" : 456, "count" : 2 }
{ "_id" : 123, "count" : 2 }
Or if you actually did not need to $unwind
the array for other purposes, you can use $map
and some other manipulation with $objectToArray
and $arrayToObject
:
Model.aggregate([
{ "$addFields": {
"fruits": {
"$map": {
"input": "$fruits",
"in": {
"$mergeObjects": [
{ "$arrayToObject": {
"$filter": {
"input": { "$objectToArray": "$$this" },
"cond": { "$not": { "$in": [ "$$this.k", ["fruit_id","id"] ] } }
}
}},
{
"general_id": { "$ifNull": ["$$this.id","$$this.fruit_id"] }
}
]
}
}
}
}}
])
Which returns results like:
{
"_id" : 1,
"fruits" : [
{
"data" : 1,
"general_id" : 123
},
{
"data" : 2,
"general_id" : 456
}
]
}
{
"_id" : 2,
"fruits" : [
{
"data" : 1,
"general_id" : "xxx"
},
{
"data" : 2,
"general_id" : "yyy"
}
]
}
{
"_id" : 3,
"fruits" : [
{
"data" : 1,
"general_id" : "xxx"
},
{
"data" : 2,
"general_id" : "yyy"
},
{
"data" : 3,
"general_id" : 123
},
{
"data" : 4,
"general_id" : 456
}
]
}
Adding an $unwind
after that returns just the same as before. But the more complex operations are probably better suited to where you want to keep this as an array.
This time we removed the id
and fruit_id
by converting each array element into an array of "key/value" pairs via $objectToArray
. We then $filter
the array based on those "k"
values, which are the names of the fields. The $arrayToObject
makes this an object again, with all other content except those fields.
The $mergeObjects
is to $map
what $addFields
is to the root "document", in that it takes multiple objects and "merges" them together. Thus the "filtered" object as described before, and the new object with only the general_id
key and it's value translated from whichever field was present.
Lists of More than Two Fields
As a final note, $ifNull
works better than $cond
where you have just two values, but neither is actually that great if there is a larger possible list. You can nest $cond
expressions or even use $switch
, but really it's probably best to filter content out via the $objectToArray
as seen before:
var valid_names = [ "id", "fruit_id", "apple_id", "orange_id" ];
Model.aggregate([
{ "$unwind": "$fruits" },
{ "$group": {
"_id": {
"$arrayElemAt": [
{ "$map": {
"input": {
"$filter": {
"input": { "$objectToArray": "$fruits" },
"cond": { "$in": [ "$$this.k", valid_names ] }
}
},
"in": "$$this.v"
}},
0
]
},
"count": { "$sum": 1 }
}}
])
That usually makes the most sense, otherwise for working with such a list in dynamic way you end up building aggregation pipeline stages in code, such as with using $switch
would be:
var valid_names = [ "id", "fruit_id", "apple_id", "orange_id" ];
var branches = valid_names.map(name =>
({
"case": { "$gt": [`$fruits.${name}`, null ] },
"then": `$fruits.${name}`
})
)
Model.aggregate([
{ "$unwind": "$fruits" },
{ "$group": {
"_id": { "$switch": { branches, "default": null } },
"count": { "$sum": 1 }
}}
])
Which looks cleaner in your code, but actually sends a much larger pipeline in BSON:
[
{ "$unwind" : "$fruits" },
{ "$group" : {
"_id" : {
"$switch" : {
"branches" : [
{
"case" : { "$gt" : [ "$fruits.id", null ] },
"then" : "$fruits.id"
},
{
"case" : { "$gt" : [ "$fruits.fruit_id", null ] },
"then" : "$fruits.fruit_id"
},
{
"case" : { "$gt" : [ "$fruits.apple_id", null ] },
"then" : "$fruits.apple_id"
},
{
"case" : { "$gt" : [ "$fruits.orange_id", null ] },
"then" : "$fruits.orange_id"
}
],
"default" : null
}
},
"count" : { "$sum" : 1 }
}}
]
1
This is a very comprehensive answer. Thanks for that.
– Modermo
Nov 23 '18 at 10:27
add a comment |
It really depends on what you are after here, but for a general case of knowing two possibilities it's probably better to use $ifNull
to return the value for the field if present, otherwise return the other field's value.
Adding a little more data for demonstration since you probably don't want to loose anything else in the array elements:
{
_id: 1,
fruits: [
{
id: 123,
data: 1
},
{
id: 456,
data: 2
}
]
},
{
_id: 2,
fruits: [
{
fruit_id: 'xxx',
data: 1
},
{
fruit_id: 'yyy',
data: 2
}
]
},
{
_id: 3,
fruits: [
{
fruit_id: 'xxx',
data: 1,
},
{
fruit_id: 'yyy',
data: 2
},
{
id: 123,
data: 3
},
{
id: 456,
data: 4
}
]
}
Then you can either do the process using $unwind
as the first step, which does make path naming easier and especially with $addFields
instead of $project
:
Model.aggregate([
{ "$unwind": "$fruits" },
{ "$addFields": {
"fruits": {
"id": "$$REMOVE",
"fruit_id": "$$REMOVE",
"general_id": { "$ifNull": [ "$fruits.id", "$fruits.fruit_id" ] }
}
}}
])
That uses $$REMOVE
from MongoDB 3.6 and above ( which should be the minimal version you are using ) in order to "remove" the fields you don't want. You don't need to do that and can just declare everything you actually want with $project
if you don't have support.
Then of course has the alternate with an $ifNull
expression.
This gives results on that data like:
{ "_id" : 1, "fruits" : { "data" : 1, "general_id" : 123 } }
{ "_id" : 1, "fruits" : { "data" : 2, "general_id" : 456 } }
{ "_id" : 2, "fruits" : { "data" : 1, "general_id" : "xxx" } }
{ "_id" : 2, "fruits" : { "data" : 2, "general_id" : "yyy" } }
{ "_id" : 3, "fruits" : { "data" : 1, "general_id" : "xxx" } }
{ "_id" : 3, "fruits" : { "data" : 2, "general_id" : "yyy" } }
{ "_id" : 3, "fruits" : { "data" : 3, "general_id" : 123 } }
{ "_id" : 3, "fruits" : { "data" : 4, "general_id" : 456 } }
If you wanted to $group
on that value, then there's no need for an intermediate "project" of any kind. Just do the $ifNull
directly in that stage:
Model.aggregate([
{ "$unwind": "$fruits" },
{ "$group": {
"_id": { "$ifNull": [ "$fruits.id", "$fruits.fruit_id" ] },
"count": { "$sum": 1 }
}}
])
And output:
{ "_id" : "yyy", "count" : 2 }
{ "_id" : "xxx", "count" : 2 }
{ "_id" : 456, "count" : 2 }
{ "_id" : 123, "count" : 2 }
Or if you actually did not need to $unwind
the array for other purposes, you can use $map
and some other manipulation with $objectToArray
and $arrayToObject
:
Model.aggregate([
{ "$addFields": {
"fruits": {
"$map": {
"input": "$fruits",
"in": {
"$mergeObjects": [
{ "$arrayToObject": {
"$filter": {
"input": { "$objectToArray": "$$this" },
"cond": { "$not": { "$in": [ "$$this.k", ["fruit_id","id"] ] } }
}
}},
{
"general_id": { "$ifNull": ["$$this.id","$$this.fruit_id"] }
}
]
}
}
}
}}
])
Which returns results like:
{
"_id" : 1,
"fruits" : [
{
"data" : 1,
"general_id" : 123
},
{
"data" : 2,
"general_id" : 456
}
]
}
{
"_id" : 2,
"fruits" : [
{
"data" : 1,
"general_id" : "xxx"
},
{
"data" : 2,
"general_id" : "yyy"
}
]
}
{
"_id" : 3,
"fruits" : [
{
"data" : 1,
"general_id" : "xxx"
},
{
"data" : 2,
"general_id" : "yyy"
},
{
"data" : 3,
"general_id" : 123
},
{
"data" : 4,
"general_id" : 456
}
]
}
Adding an $unwind
after that returns just the same as before. But the more complex operations are probably better suited to where you want to keep this as an array.
This time we removed the id
and fruit_id
by converting each array element into an array of "key/value" pairs via $objectToArray
. We then $filter
the array based on those "k"
values, which are the names of the fields. The $arrayToObject
makes this an object again, with all other content except those fields.
The $mergeObjects
is to $map
what $addFields
is to the root "document", in that it takes multiple objects and "merges" them together. Thus the "filtered" object as described before, and the new object with only the general_id
key and it's value translated from whichever field was present.
Lists of More than Two Fields
As a final note, $ifNull
works better than $cond
where you have just two values, but neither is actually that great if there is a larger possible list. You can nest $cond
expressions or even use $switch
, but really it's probably best to filter content out via the $objectToArray
as seen before:
var valid_names = [ "id", "fruit_id", "apple_id", "orange_id" ];
Model.aggregate([
{ "$unwind": "$fruits" },
{ "$group": {
"_id": {
"$arrayElemAt": [
{ "$map": {
"input": {
"$filter": {
"input": { "$objectToArray": "$fruits" },
"cond": { "$in": [ "$$this.k", valid_names ] }
}
},
"in": "$$this.v"
}},
0
]
},
"count": { "$sum": 1 }
}}
])
That usually makes the most sense, otherwise for working with such a list in dynamic way you end up building aggregation pipeline stages in code, such as with using $switch
would be:
var valid_names = [ "id", "fruit_id", "apple_id", "orange_id" ];
var branches = valid_names.map(name =>
({
"case": { "$gt": [`$fruits.${name}`, null ] },
"then": `$fruits.${name}`
})
)
Model.aggregate([
{ "$unwind": "$fruits" },
{ "$group": {
"_id": { "$switch": { branches, "default": null } },
"count": { "$sum": 1 }
}}
])
Which looks cleaner in your code, but actually sends a much larger pipeline in BSON:
[
{ "$unwind" : "$fruits" },
{ "$group" : {
"_id" : {
"$switch" : {
"branches" : [
{
"case" : { "$gt" : [ "$fruits.id", null ] },
"then" : "$fruits.id"
},
{
"case" : { "$gt" : [ "$fruits.fruit_id", null ] },
"then" : "$fruits.fruit_id"
},
{
"case" : { "$gt" : [ "$fruits.apple_id", null ] },
"then" : "$fruits.apple_id"
},
{
"case" : { "$gt" : [ "$fruits.orange_id", null ] },
"then" : "$fruits.orange_id"
}
],
"default" : null
}
},
"count" : { "$sum" : 1 }
}}
]
1
This is a very comprehensive answer. Thanks for that.
– Modermo
Nov 23 '18 at 10:27
add a comment |
It really depends on what you are after here, but for a general case of knowing two possibilities it's probably better to use $ifNull
to return the value for the field if present, otherwise return the other field's value.
Adding a little more data for demonstration since you probably don't want to loose anything else in the array elements:
{
_id: 1,
fruits: [
{
id: 123,
data: 1
},
{
id: 456,
data: 2
}
]
},
{
_id: 2,
fruits: [
{
fruit_id: 'xxx',
data: 1
},
{
fruit_id: 'yyy',
data: 2
}
]
},
{
_id: 3,
fruits: [
{
fruit_id: 'xxx',
data: 1,
},
{
fruit_id: 'yyy',
data: 2
},
{
id: 123,
data: 3
},
{
id: 456,
data: 4
}
]
}
Then you can either do the process using $unwind
as the first step, which does make path naming easier and especially with $addFields
instead of $project
:
Model.aggregate([
{ "$unwind": "$fruits" },
{ "$addFields": {
"fruits": {
"id": "$$REMOVE",
"fruit_id": "$$REMOVE",
"general_id": { "$ifNull": [ "$fruits.id", "$fruits.fruit_id" ] }
}
}}
])
That uses $$REMOVE
from MongoDB 3.6 and above ( which should be the minimal version you are using ) in order to "remove" the fields you don't want. You don't need to do that and can just declare everything you actually want with $project
if you don't have support.
Then of course has the alternate with an $ifNull
expression.
This gives results on that data like:
{ "_id" : 1, "fruits" : { "data" : 1, "general_id" : 123 } }
{ "_id" : 1, "fruits" : { "data" : 2, "general_id" : 456 } }
{ "_id" : 2, "fruits" : { "data" : 1, "general_id" : "xxx" } }
{ "_id" : 2, "fruits" : { "data" : 2, "general_id" : "yyy" } }
{ "_id" : 3, "fruits" : { "data" : 1, "general_id" : "xxx" } }
{ "_id" : 3, "fruits" : { "data" : 2, "general_id" : "yyy" } }
{ "_id" : 3, "fruits" : { "data" : 3, "general_id" : 123 } }
{ "_id" : 3, "fruits" : { "data" : 4, "general_id" : 456 } }
If you wanted to $group
on that value, then there's no need for an intermediate "project" of any kind. Just do the $ifNull
directly in that stage:
Model.aggregate([
{ "$unwind": "$fruits" },
{ "$group": {
"_id": { "$ifNull": [ "$fruits.id", "$fruits.fruit_id" ] },
"count": { "$sum": 1 }
}}
])
And output:
{ "_id" : "yyy", "count" : 2 }
{ "_id" : "xxx", "count" : 2 }
{ "_id" : 456, "count" : 2 }
{ "_id" : 123, "count" : 2 }
Or if you actually did not need to $unwind
the array for other purposes, you can use $map
and some other manipulation with $objectToArray
and $arrayToObject
:
Model.aggregate([
{ "$addFields": {
"fruits": {
"$map": {
"input": "$fruits",
"in": {
"$mergeObjects": [
{ "$arrayToObject": {
"$filter": {
"input": { "$objectToArray": "$$this" },
"cond": { "$not": { "$in": [ "$$this.k", ["fruit_id","id"] ] } }
}
}},
{
"general_id": { "$ifNull": ["$$this.id","$$this.fruit_id"] }
}
]
}
}
}
}}
])
Which returns results like:
{
"_id" : 1,
"fruits" : [
{
"data" : 1,
"general_id" : 123
},
{
"data" : 2,
"general_id" : 456
}
]
}
{
"_id" : 2,
"fruits" : [
{
"data" : 1,
"general_id" : "xxx"
},
{
"data" : 2,
"general_id" : "yyy"
}
]
}
{
"_id" : 3,
"fruits" : [
{
"data" : 1,
"general_id" : "xxx"
},
{
"data" : 2,
"general_id" : "yyy"
},
{
"data" : 3,
"general_id" : 123
},
{
"data" : 4,
"general_id" : 456
}
]
}
Adding an $unwind
after that returns just the same as before. But the more complex operations are probably better suited to where you want to keep this as an array.
This time we removed the id
and fruit_id
by converting each array element into an array of "key/value" pairs via $objectToArray
. We then $filter
the array based on those "k"
values, which are the names of the fields. The $arrayToObject
makes this an object again, with all other content except those fields.
The $mergeObjects
is to $map
what $addFields
is to the root "document", in that it takes multiple objects and "merges" them together. Thus the "filtered" object as described before, and the new object with only the general_id
key and it's value translated from whichever field was present.
Lists of More than Two Fields
As a final note, $ifNull
works better than $cond
where you have just two values, but neither is actually that great if there is a larger possible list. You can nest $cond
expressions or even use $switch
, but really it's probably best to filter content out via the $objectToArray
as seen before:
var valid_names = [ "id", "fruit_id", "apple_id", "orange_id" ];
Model.aggregate([
{ "$unwind": "$fruits" },
{ "$group": {
"_id": {
"$arrayElemAt": [
{ "$map": {
"input": {
"$filter": {
"input": { "$objectToArray": "$fruits" },
"cond": { "$in": [ "$$this.k", valid_names ] }
}
},
"in": "$$this.v"
}},
0
]
},
"count": { "$sum": 1 }
}}
])
That usually makes the most sense, otherwise for working with such a list in dynamic way you end up building aggregation pipeline stages in code, such as with using $switch
would be:
var valid_names = [ "id", "fruit_id", "apple_id", "orange_id" ];
var branches = valid_names.map(name =>
({
"case": { "$gt": [`$fruits.${name}`, null ] },
"then": `$fruits.${name}`
})
)
Model.aggregate([
{ "$unwind": "$fruits" },
{ "$group": {
"_id": { "$switch": { branches, "default": null } },
"count": { "$sum": 1 }
}}
])
Which looks cleaner in your code, but actually sends a much larger pipeline in BSON:
[
{ "$unwind" : "$fruits" },
{ "$group" : {
"_id" : {
"$switch" : {
"branches" : [
{
"case" : { "$gt" : [ "$fruits.id", null ] },
"then" : "$fruits.id"
},
{
"case" : { "$gt" : [ "$fruits.fruit_id", null ] },
"then" : "$fruits.fruit_id"
},
{
"case" : { "$gt" : [ "$fruits.apple_id", null ] },
"then" : "$fruits.apple_id"
},
{
"case" : { "$gt" : [ "$fruits.orange_id", null ] },
"then" : "$fruits.orange_id"
}
],
"default" : null
}
},
"count" : { "$sum" : 1 }
}}
]
It really depends on what you are after here, but for a general case of knowing two possibilities it's probably better to use $ifNull
to return the value for the field if present, otherwise return the other field's value.
Adding a little more data for demonstration since you probably don't want to loose anything else in the array elements:
{
_id: 1,
fruits: [
{
id: 123,
data: 1
},
{
id: 456,
data: 2
}
]
},
{
_id: 2,
fruits: [
{
fruit_id: 'xxx',
data: 1
},
{
fruit_id: 'yyy',
data: 2
}
]
},
{
_id: 3,
fruits: [
{
fruit_id: 'xxx',
data: 1,
},
{
fruit_id: 'yyy',
data: 2
},
{
id: 123,
data: 3
},
{
id: 456,
data: 4
}
]
}
Then you can either do the process using $unwind
as the first step, which does make path naming easier and especially with $addFields
instead of $project
:
Model.aggregate([
{ "$unwind": "$fruits" },
{ "$addFields": {
"fruits": {
"id": "$$REMOVE",
"fruit_id": "$$REMOVE",
"general_id": { "$ifNull": [ "$fruits.id", "$fruits.fruit_id" ] }
}
}}
])
That uses $$REMOVE
from MongoDB 3.6 and above ( which should be the minimal version you are using ) in order to "remove" the fields you don't want. You don't need to do that and can just declare everything you actually want with $project
if you don't have support.
Then of course has the alternate with an $ifNull
expression.
This gives results on that data like:
{ "_id" : 1, "fruits" : { "data" : 1, "general_id" : 123 } }
{ "_id" : 1, "fruits" : { "data" : 2, "general_id" : 456 } }
{ "_id" : 2, "fruits" : { "data" : 1, "general_id" : "xxx" } }
{ "_id" : 2, "fruits" : { "data" : 2, "general_id" : "yyy" } }
{ "_id" : 3, "fruits" : { "data" : 1, "general_id" : "xxx" } }
{ "_id" : 3, "fruits" : { "data" : 2, "general_id" : "yyy" } }
{ "_id" : 3, "fruits" : { "data" : 3, "general_id" : 123 } }
{ "_id" : 3, "fruits" : { "data" : 4, "general_id" : 456 } }
If you wanted to $group
on that value, then there's no need for an intermediate "project" of any kind. Just do the $ifNull
directly in that stage:
Model.aggregate([
{ "$unwind": "$fruits" },
{ "$group": {
"_id": { "$ifNull": [ "$fruits.id", "$fruits.fruit_id" ] },
"count": { "$sum": 1 }
}}
])
And output:
{ "_id" : "yyy", "count" : 2 }
{ "_id" : "xxx", "count" : 2 }
{ "_id" : 456, "count" : 2 }
{ "_id" : 123, "count" : 2 }
Or if you actually did not need to $unwind
the array for other purposes, you can use $map
and some other manipulation with $objectToArray
and $arrayToObject
:
Model.aggregate([
{ "$addFields": {
"fruits": {
"$map": {
"input": "$fruits",
"in": {
"$mergeObjects": [
{ "$arrayToObject": {
"$filter": {
"input": { "$objectToArray": "$$this" },
"cond": { "$not": { "$in": [ "$$this.k", ["fruit_id","id"] ] } }
}
}},
{
"general_id": { "$ifNull": ["$$this.id","$$this.fruit_id"] }
}
]
}
}
}
}}
])
Which returns results like:
{
"_id" : 1,
"fruits" : [
{
"data" : 1,
"general_id" : 123
},
{
"data" : 2,
"general_id" : 456
}
]
}
{
"_id" : 2,
"fruits" : [
{
"data" : 1,
"general_id" : "xxx"
},
{
"data" : 2,
"general_id" : "yyy"
}
]
}
{
"_id" : 3,
"fruits" : [
{
"data" : 1,
"general_id" : "xxx"
},
{
"data" : 2,
"general_id" : "yyy"
},
{
"data" : 3,
"general_id" : 123
},
{
"data" : 4,
"general_id" : 456
}
]
}
Adding an $unwind
after that returns just the same as before. But the more complex operations are probably better suited to where you want to keep this as an array.
This time we removed the id
and fruit_id
by converting each array element into an array of "key/value" pairs via $objectToArray
. We then $filter
the array based on those "k"
values, which are the names of the fields. The $arrayToObject
makes this an object again, with all other content except those fields.
The $mergeObjects
is to $map
what $addFields
is to the root "document", in that it takes multiple objects and "merges" them together. Thus the "filtered" object as described before, and the new object with only the general_id
key and it's value translated from whichever field was present.
Lists of More than Two Fields
As a final note, $ifNull
works better than $cond
where you have just two values, but neither is actually that great if there is a larger possible list. You can nest $cond
expressions or even use $switch
, but really it's probably best to filter content out via the $objectToArray
as seen before:
var valid_names = [ "id", "fruit_id", "apple_id", "orange_id" ];
Model.aggregate([
{ "$unwind": "$fruits" },
{ "$group": {
"_id": {
"$arrayElemAt": [
{ "$map": {
"input": {
"$filter": {
"input": { "$objectToArray": "$fruits" },
"cond": { "$in": [ "$$this.k", valid_names ] }
}
},
"in": "$$this.v"
}},
0
]
},
"count": { "$sum": 1 }
}}
])
That usually makes the most sense, otherwise for working with such a list in dynamic way you end up building aggregation pipeline stages in code, such as with using $switch
would be:
var valid_names = [ "id", "fruit_id", "apple_id", "orange_id" ];
var branches = valid_names.map(name =>
({
"case": { "$gt": [`$fruits.${name}`, null ] },
"then": `$fruits.${name}`
})
)
Model.aggregate([
{ "$unwind": "$fruits" },
{ "$group": {
"_id": { "$switch": { branches, "default": null } },
"count": { "$sum": 1 }
}}
])
Which looks cleaner in your code, but actually sends a much larger pipeline in BSON:
[
{ "$unwind" : "$fruits" },
{ "$group" : {
"_id" : {
"$switch" : {
"branches" : [
{
"case" : { "$gt" : [ "$fruits.id", null ] },
"then" : "$fruits.id"
},
{
"case" : { "$gt" : [ "$fruits.fruit_id", null ] },
"then" : "$fruits.fruit_id"
},
{
"case" : { "$gt" : [ "$fruits.apple_id", null ] },
"then" : "$fruits.apple_id"
},
{
"case" : { "$gt" : [ "$fruits.orange_id", null ] },
"then" : "$fruits.orange_id"
}
],
"default" : null
}
},
"count" : { "$sum" : 1 }
}}
]
edited Nov 23 '18 at 9:42
answered Nov 23 '18 at 5:34
Neil Lunn
97.1k22170181
97.1k22170181
1
This is a very comprehensive answer. Thanks for that.
– Modermo
Nov 23 '18 at 10:27
add a comment |
1
This is a very comprehensive answer. Thanks for that.
– Modermo
Nov 23 '18 at 10:27
1
1
This is a very comprehensive answer. Thanks for that.
– Modermo
Nov 23 '18 at 10:27
This is a very comprehensive answer. Thanks for that.
– Modermo
Nov 23 '18 at 10:27
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53440639%2fconditionally-project-from-different-fields%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown