How does mask_zero in Keras Embedding layer work?
I thought mask_zero=True would output zeros when the input value is 0, so that the following layers could skip computation. How does mask_zero actually work?
Example:
import numpy as np
from keras.layers import Input, Embedding
from keras.models import Model

data_in = np.array([
    [1, 2, 0, 0]
])
data_in.shape
>>> (1, 4)

# model
x = Input(shape=(4,))
e = Embedding(5, 5, mask_zero=True)(x)
m = Model(inputs=x, outputs=e)
p = m.predict(data_in)
print(p.shape)
print(p)
The actual output is (the numbers are random, since the embedding weights are untrained):
(1, 4, 5)
[[[ 0.02499047 0.04617121 0.01586803 0.0338897 0.009652 ]
[ 0.04782704 -0.04035913 -0.0341589 0.03020919 -0.01157228]
[ 0.00451764 -0.01433611 0.02606953 0.00328832 0.02650392]
[ 0.00451764 -0.01433611 0.02606953 0.00328832 0.02650392]]]
However, I expected the output to be:
[[[ 0.02499047 0.04617121 0.01586803 0.0338897 0.009652 ]
[ 0.04782704 -0.04035913 -0.0341589 0.03020919 -0.01157228]
[ 0 0 0 0 0]
[ 0 0 0 0 0]]]
python machine-learning keras word-embedding
asked Nov 25 '17 at 11:03 by crazytomcat, edited Nov 25 '18 at 18:13 by today
They're repeating the outputs of the last calculated steps. The documentation assures you that it's not "computing" them anymore. And since they're all the same for all the remaining steps, it's probably just a dummy repetition just to fill the shape of a numpy array.
– Daniel Möller
Nov 25 '17 at 13:15
Interested to know why these are non-zero. How are they computed?
– GRS
Nov 24 '18 at 16:10
1 Answer
Actually, setting mask_zero=True for the Embedding layer does not make it return a zero vector. The Embedding layer's behavior does not change at all: it still returns the embedding vector stored at index zero. You can confirm this by checking the Embedding layer weights (i.e. in the example you mentioned it would be m.layers[0].get_weights()). Instead, the mask affects the behavior of the following layers that support masking, such as RNN layers.
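For example, reusing m and p from the question, you can check that the vectors produced for the padded positions are just row 0 of the embedding matrix rather than zeros (a quick sanity check; the indexing below assumes the Embedding layer is the last layer of m):

import numpy as np

# Weight matrix of the Embedding layer, shape (5, 5); row 0 is the
# vector looked up whenever the input value is 0.
emb_matrix = m.layers[-1].get_weights()[0]

print(np.allclose(p[0, 2], emb_matrix[0]))  # True
print(np.allclose(p[0, 3], emb_matrix[0]))  # True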
If you inspect the source code of the Embedding layer, you will see a method called compute_mask:
def compute_mask(self, inputs, mask=None):
    if not self.mask_zero:
        return None
    output_mask = K.not_equal(inputs, 0)
    return output_mask
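For the sample input in the question, this mask is simply a boolean tensor marking the non-zero positions; a numpy equivalent (an illustrative sketch, not Keras code):

import numpy as np

data_in = np.array([[1, 2, 0, 0]])
mask = data_in != 0   # same logic as K.not_equal(inputs, 0)
print(mask)           # [[ True  True False False]]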
This output mask is then passed, as the mask argument, to the following layers that support masking. This is implemented in the __call__ method of the base Layer class:
# Handle mask propagation.
previous_mask = _collect_previous_mask(inputs)
user_kwargs = copy.copy(kwargs)
if not is_all_none(previous_mask):
    # The previous layer generated a mask.
    if has_arg(self.call, 'mask'):
        if 'mask' not in kwargs:
            # If mask is explicitly passed to __call__,
            # we should override the default mask.
            kwargs['mask'] = previous_mask
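A layer opts in to receiving this mask simply by accepting a mask argument in its call method. As an illustration (my own toy layer, not part of Keras), a custom layer that averages only the unmasked timesteps could look like this:

from keras import backend as K
from keras.layers import Layer  # keras.engine.topology.Layer in older Keras versions

class MaskedMean(Layer):
    """Toy layer: averages the timestep vectors, skipping masked positions."""

    def __init__(self, **kwargs):
        super(MaskedMean, self).__init__(**kwargs)
        self.supports_masking = True

    def call(self, inputs, mask=None):
        if mask is None:
            return K.mean(inputs, axis=1)
        mask = K.cast(mask, K.floatx())       # (batch, timesteps)
        mask = K.expand_dims(mask, axis=-1)   # (batch, timesteps, 1)
        return K.sum(inputs * mask, axis=1) / K.sum(mask, axis=1)

    def compute_mask(self, inputs, mask=None):
        return None  # the mask is consumed here and not passed further

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[2])

# usage: pooled = MaskedMean()(e)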
This makes the following layers ignore (i.e. not include in their computations) those input timesteps. Here is a minimal example:
import numpy as np
from keras.layers import Input, Embedding, LSTM
from keras.models import Model

data_in = np.array([
    [1, 0, 2, 0]
])

x = Input(shape=(4,))
e = Embedding(5, 5, mask_zero=True)(x)
rnn = LSTM(3, return_sequences=True)(e)
m = Model(inputs=x, outputs=rnn)
m.predict(data_in)
array([[[-0.00084503, -0.00413611, 0.00049972],
[-0.00084503, -0.00413611, 0.00049972],
[-0.00144554, -0.00115775, -0.00293898],
[-0.00144554, -0.00115775, -0.00293898]]], dtype=float32)
As you can see, the outputs of the LSTM layer for the second and fourth timesteps are identical to the outputs of the first and third timesteps, respectively. This means that those timesteps have been masked.
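Roughly speaking (a simplified sketch of the idea, not the actual Keras RNN code), a masking-aware recurrent layer carries the previous output and state forward on masked timesteps, which is exactly why the masked positions repeat their predecessors:

import numpy as np

def masked_rnn_step(prev_output, new_output, step_is_valid):
    # On a masked (padding) timestep, keep the previous output
    # instead of the freshly computed one.
    return np.where(step_is_valid, new_output, prev_output)

prev = np.array([-0.00084503, -0.00413611, 0.00049972])
new = np.array([0.1, 0.2, 0.3])
print(masked_rnn_step(prev, new, step_is_valid=False))  # repeats prev
print(masked_rnn_step(prev, new, step_is_valid=True))   # uses new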
Update: The mask is also taken into account when computing the loss, since the loss functions are internally wrapped to support masking using weighted_masked_objective:
def weighted_masked_objective(fn):
    """Adds support for masking and sample-weighting to an objective function.

    It transforms an objective function `fn(y_true, y_pred)`
    into a sample-weighted, cost-masked objective function
    `fn(y_true, y_pred, weights, mask)`.

    # Arguments
        fn: The objective function to wrap,
            with signature `fn(y_true, y_pred)`.

    # Returns
        A function with signature `fn(y_true, y_pred, weights, mask)`.
    """
when compiling the model:
weighted_losses = [weighted_masked_objective(fn) for fn in loss_functions]
You can verify this using the following example:
import numpy as np
from keras.layers import Input, Embedding, Dense
from keras.models import Model

data_in = np.array([[1, 2, 0, 0]])
data_out = np.arange(12).reshape(1, 4, 3)

x = Input(shape=(4,))
e = Embedding(5, 5, mask_zero=True)(x)
d = Dense(3)(e)
m = Model(inputs=x, outputs=d)
m.compile(loss='mse', optimizer='adam')

preds = m.predict(data_in)
loss = m.evaluate(data_in, data_out, verbose=0)
print(preds)
print('Computed Loss:', loss)
[[[ 0.009682 0.02505393 -0.00632722]
[ 0.01756451 0.05928303 0.0153951 ]
[-0.00146054 -0.02064196 -0.04356086]
[-0.00146054 -0.02064196 -0.04356086]]]
Computed Loss: 9.041069030761719
# verify that only the first two outputs
# have been considered in the computation of loss
print(np.square(preds[0,0:2] - data_out[0,0:2]).mean())
9.041070036475277
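The same value can be reproduced with an explicit mask, which is closer to what the wrapped loss does internally (an illustrative numpy sketch, not the exact Keras code):

mask = np.array([[1., 1., 0., 0.]])                        # 1 = real step, 0 = padding
per_step_mse = np.square(preds - data_out).mean(axis=-1)   # shape (1, 4)
masked_loss = (per_step_mse * mask).sum() / mask.sum()
print(masked_loss)  # ~9.0411, matching the loss reported above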
Thank you, so what happens at the evaluation of the model? Does it mean we need to shift our output vector by 1? Binary classification, i.e. when y in {0, 1}. Or, assuming the loss is computed with the mask, how do we actually evaluate such a generator at the end? When we run predictions, we still get an output y, for which we need to manually fit the mask? For example, if we pad sequences to length 100, y always has length 100, but the real sequences are of variable length. How do we get model.predict() to return these variable lengths?
– GRS
Nov 25 '18 at 20:28
So what I'm trying to say is: when I pass this output to the Dense layer, which doesn't support masking, it will still calculate some loss for the 1st and 3rd indices in your example, and compare it with the 0s in y. How does one implement such Dense layers?
– GRS
Nov 25 '18 at 20:58
@GRS The loss will be computed according to the mask as well. I have updated my answer; please take a look.
– today
Nov 25 '18 at 21:19