Derivative of inner product

If the inner product of some vector $\mathbf{x}$ can be expressed as

$$\langle \mathbf{x}, \mathbf{x}\rangle_G = \mathbf{x}^T G\mathbf{x}$$



where $G$ is some symmetric matrix, then the derivative of this inner product with respect to $\mathbf{x}$ should be a (column) vector, since it is the derivative of a scalar function with respect to a vector (https://en.wikipedia.org/wiki/Matrix_calculus#Scalar-by-vector).



However, this formula says that I should get a row vector rather than a column vector:

$$\frac{\mathrm{d}}{\mathrm{d} \mathbf{x}} (\mathbf{x}^T G\mathbf{x}) = 2\mathbf{x}^T G$$

(http://www.cs.huji.ac.il/~csip/tirgul3_derivatives.pdf)
and $2\mathbf{x}^T G$ is a row vector.

Where does this contradiction come from?

linear-algebra derivatives vectors inner-product-space

asked Nov 30 at 9:15
The Bosco

4 Answers

For a smooth $f:\mathbb{R}^n\to\mathbb{R}^m$, the differential is a map $df:\mathbb{R}^n\to\mathcal{L}(\mathbb{R}^n,\mathbb{R}^m)$.



Being differentiable is equivalent to:
$$
f(x+h)=f(x)+df(x)\cdot h+o(\|h\|)
$$



In your case, $f(x)=\langle x,x \rangle_G$ and $m=1$, hence the differential at $x$, $df(x)$, lies in $\mathcal{L}(\mathbb{R}^n,\mathbb{R})$: it is a linear form.



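Let's be more explicit (the two cross terms combine because $G$ is symmetric, so $\langle h,x \rangle_G=\langle x,h \rangle_G$):
\begin{align*}
f(x+h)=& \langle x+h,x+h \rangle_G \\
=& \underbrace{\langle x,x \rangle_G}_{f(x)} + \underbrace{2\langle x,h \rangle_G }_{df(x)\cdot h}+ \underbrace{\langle h,h \rangle_G}_{\in o(\|h\|)}
\end{align*}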
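
A quick exact check of this expansion, as a sketch in plain Python. The matrix `G` and the point `x` below are made-up example values (with `G` assumed symmetric positive definite), and `fractions.Fraction` is used so that the remainder can be compared with $\langle h,h\rangle_G$ without rounding error. The helper names `inner_G` and `remainder` are hypothetical.

```python
from fractions import Fraction

# Illustrative values (assumptions): a symmetric positive definite G and a point x.
G = [[Fraction(2), Fraction(1)],
     [Fraction(1), Fraction(3)]]
x = [Fraction(1), Fraction(-2)]


def inner_G(u, v):
    """<u, v>_G = u^T G v for vectors stored as Python lists."""
    n = len(u)
    return sum(u[i] * G[i][j] * v[j] for i in range(n) for j in range(n))


def f(v):
    return inner_G(v, v)


def remainder(h):
    """f(x + h) - f(x) - 2<x, h>_G."""
    x_plus_h = [xi + hi for xi, hi in zip(x, h)]
    return f(x_plus_h) - f(x) - 2 * inner_G(x, h)


ratios = []
for t in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    h = [t, -3 * t]
    # The remainder is exactly the quadratic term <h, h>_G ...
    assert remainder(h) == inner_G(h, h)
    # ... and remainder / |h|_1 shrinks with t, i.e. it is o(|h|).
    ratios.append(remainder(h) / (abs(h[0]) + abs(h[1])))
```

For $h=(t,-3t)$ the ratio works out to $23t/4$, which goes to $0$ with $t$; any norm on $\mathbb{R}^n$ would give the same conclusion.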



Hence your differential is defined by
$$
df(x)\cdot h = 2\langle x,h \rangle_G = (2x^tG)h
$$

where $2x^tG=\left(\partial_{x_1} f,\dots,\partial_{x_n} f\right)$ is your "row" vector.



Note that, because $m=1$, you can also represent $df(x)$ by a vector $\nabla f(x)$ via the canonical scalar product. This vector is, by definition, the gradient of $f$:



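$$
df(x)\cdot h = \langle \nabla f(x),h \rangle = \langle 2Gx,h \rangle
$$

where $\nabla f(x)=2Gx=\left(\begin{array}{c}\partial_{x_1} f \\ \vdots \\ \partial_{x_n} f\end{array}\right)$. This is your "column" vector. Since $G$ is symmetric, $(2x^tG)^t=2Gx$: the row and the column contain the same entries, so there is no contradiction, only two ways of representing the same linear form.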
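
To see concretely that the row $2x^tG$ and the column $2Gx$ hold the same numbers, here is a small sketch that compares both against central-difference estimates of the partial derivatives $\partial_{x_k} f$. The example `G` (symmetric) and `x` are illustrative assumptions, and vectors are stored as nested lists so that the $1\times n$ versus $n\times 1$ shape stays visible.

```python
# Illustrative values (assumptions): G is symmetric, x is an arbitrary point.
G = [[2.0, 1.0],
     [1.0, 3.0]]
x = [1.0, -2.0]
n = len(x)


def f(v):
    """f(v) = v^T G v."""
    return sum(v[i] * G[i][j] * v[j] for i in range(n) for j in range(n))


def partials(func, v, eps=1e-6):
    """Central-difference estimates of d func / d v_k for each k."""
    out = []
    for k in range(len(v)):
        vp = list(v)
        vm = list(v)
        vp[k] += eps
        vm[k] -= eps
        out.append((func(vp) - func(vm)) / (2 * eps))
    return out


# 2 x^T G as a 1 x n matrix (the "row" vector).
row = [[2 * sum(x[i] * G[i][j] for i in range(n)) for j in range(n)]]
# 2 G x as an n x 1 matrix (the "column" vector, i.e. the gradient).
col = [[2 * sum(G[i][j] * x[j] for j in range(n))] for i in range(n)]

approx = partials(f, x)
for k in range(n):
    # Same numbers in different shapes; row[0][k] == col[k][0] because G = G^T.
    assert abs(row[0][k] - approx[k]) < 1e-6
    assert abs(col[k][0] - approx[k]) < 1e-6
```

With a non-symmetric `G` the row would be $x^t(G+G^t)$ and the equality of entries with $2Gx$ would fail, which is why the symmetry assumption matters here.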

edited Nov 30 at 10:43
answered Nov 30 at 10:09
Picaud Vincent

The difference lies in how the author of the second reference prefers to arrange the components of the gradient. In the first paragraph they state:




Let $x\in \mathbb{R}^n$ (a column vector) and let $f : \mathbb{R}^n \to \mathbb{R}$. The derivative of $f$ with respect to $x$ is a row vector:
$$
\frac{\partial f}{\partial x} = \left(\frac{\partial f}{\partial x_1}, \cdots , \frac{\partial f}{\partial x_n} \right)
$$




One can argue that this is a better option than the first one (see e.g. this answer), but at the end of the day it is just a matter of notation. Pick the convention you prefer and stick with it to avoid problems down the line.
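
As a sketch of "just a matter of notation": the same list of partial derivatives can be packed as a $1\times n$ row (the convention quoted above) or as an $n\times 1$ column (the gradient), and each is the transpose of the other. The function names are hypothetical, and the numbers are the partials of $x^TGx$ at the illustrative point $x=(1,-2)$ with $G=\begin{pmatrix}2&1\\1&3\end{pmatrix}$.

```python
def numerator_layout(partials):
    """Row convention (as in the quoted notes): a 1 x n matrix."""
    return [list(partials)]


def denominator_layout(partials):
    """Column convention (the gradient): an n x 1 matrix."""
    return [[p] for p in partials]


def transpose(M):
    """Transpose of a nested-list matrix."""
    return [list(column) for column in zip(*M)]


# Partials of f(x) = x^T G x at the illustrative point: 2Gx = (0, -10).
partials = [0, -10]
row = numerator_layout(partials)    # [[0, -10]]
col = denominator_layout(partials)  # [[0], [-10]]

# Neither layout carries more information: one is the transpose of the other.
assert transpose(row) == col and transpose(col) == row
```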

answered Nov 30 at 9:29
caverac

More generally, suppose we differentiate any scalar-valued function $f$ of a vector $\mathbf{x}$ with respect to $\mathbf{x}$. By the chain rule, $$df=\sum_i\frac{\partial f}{\partial x_i}dx_i=\boldsymbol{\nabla}f\cdot d\mathbf{x}=\boldsymbol{\nabla}f^T d\mathbf{x}.$$ (Technically, I should write $df=(\boldsymbol{\nabla}f^T d\mathbf{x})_{11}$ to take the unique entry of a $1\times 1$ matrix.)



If you want to define the derivative of $f$ with respect to $\mathbf{x}$ as the $d\mathbf{x}$ coefficient in $df$, you use the last expression, obtaining the row vector $\boldsymbol{\nabla}f^T$. Defining it instead as the left-hand argument of the dot product, giving the column vector $\boldsymbol{\nabla}f$, is an alternative convention.
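
A small exact illustration of $df=(\boldsymbol{\nabla}f^T d\mathbf{x})_{11}$ for $f(\mathbf{x})=\mathbf{x}^TG\mathbf{x}$, assuming illustrative values for $G$, $\mathbf{x}$ and $d\mathbf{x}$: the row times the column gives a $1\times 1$ matrix whose single entry is $\sum_i \frac{\partial f}{\partial x_i}dx_i$, and the leftover change in $f$ is the quadratic term $d\mathbf{x}^TG\,d\mathbf{x}$. The code indexes from 0, so the $(\cdot)_{11}$ entry is `[0][0]`; the helper `matmul` is a hypothetical plain-list matrix product.

```python
from fractions import Fraction

# Illustrative values (assumptions): symmetric G, a point x and a small step dx.
G = [[Fraction(2), Fraction(1)],
     [Fraction(1), Fraction(3)]]
x = [Fraction(1), Fraction(-2)]
n = len(x)


def matmul(A, B):
    """Plain nested-list matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def f(v):
    """f(v) = v^T G v."""
    return sum(v[i] * G[i][j] * v[j] for i in range(n) for j in range(n))


grad = [[2 * sum(G[i][j] * x[j] for j in range(n))] for i in range(n)]  # n x 1, equals 2Gx
grad_T = [[g[0] for g in grad]]                                          # 1 x n row

dx = [[Fraction(1, 1000)], [Fraction(-1, 500)]]                          # n x 1 column
df_matrix = matmul(grad_T, dx)   # a 1 x 1 matrix
df = df_matrix[0][0]             # its unique entry

# Same number as the chain-rule sum over i of (df/dx_i) dx_i.
assert df == sum(grad[i][0] * dx[i][0] for i in range(n))

# The rest of the change in f is exactly the quadratic term dx^T G dx.
x_new = [x[i] + dx[i][0] for i in range(n)]
dx_vec = [d[0] for d in dx]
assert f(x_new) - f(x) - df == f(dx_vec)
```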

answered Nov 30 at 9:42
J.G.

Why not use the Leibniz rule? Writing $\langle .,.\rangle$ for the standard inner product, we have
$$D_p(\langle x,x\rangle_G)=2\langle p,x\rangle_G=2p^TGx=2\langle p,Gx\rangle.$$



Note that the derivative of $f\colon\mathbb R^n\to\mathbb R$ is not a vector but a linear form. The gradient $\nabla^{\langle .,.\rangle_G}f$ with respect to the inner product $\langle .,.\rangle_G$ is the unique vector that represents this linear form via that inner product. In our case we have
$$\nabla^{\langle .,.\rangle_G}f(x)=2x,\quad\text{that is}\quad
D_p(\langle x,x\rangle_G)=\langle p,2x\rangle_G,$$

whereas
$$\nabla^{\langle .,.\rangle}f(x)=2Gx,\quad\text{that is}\quad
D_p(\langle x,x\rangle_G)=\langle p,2Gx\rangle.$$
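
A sketch checking both representations of the same linear form with exact rational arithmetic: for a couple of directions $p$, the directional derivative equals both $\langle p,2x\rangle_G$ and $\langle p,2Gx\rangle$. The values of `G` and `x` are assumptions for illustration; `G` has to be symmetric positive definite for $\langle .,.\rangle_G$ to be an inner product. For a quadratic $f$ the central difference is exact, so no limit is needed.

```python
from fractions import Fraction

# Illustrative values (assumptions): symmetric positive definite G, point x.
G = [[Fraction(2), Fraction(1)],
     [Fraction(1), Fraction(3)]]
x = [Fraction(1), Fraction(-2)]
n = len(x)


def inner(u, v):
    """Standard inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def inner_G(u, v):
    """<u, v>_G = u^T G v."""
    return sum(u[i] * G[i][j] * v[j] for i in range(n) for j in range(n))


def f(v):
    return inner_G(v, v)


def directional_derivative(p, t=Fraction(1, 10)):
    """(f(x + tp) - f(x - tp)) / (2t); exact here because f is quadratic."""
    xp = [xi + t * pi for xi, pi in zip(x, p)]
    xm = [xi - t * pi for xi, pi in zip(x, p)]
    return (f(xp) - f(xm)) / (2 * t)


grad_G = [2 * xi for xi in x]                                             # w.r.t. <.,.>_G
grad_std = [2 * sum(G[i][j] * x[j] for j in range(n)) for i in range(n)]  # w.r.t. <.,.>

for p in ([Fraction(1), Fraction(0)], [Fraction(3), Fraction(-7)]):
    # One linear form D_p, represented by different vectors in different inner products.
    assert directional_derivative(p) == inner_G(p, grad_G) == inner(p, grad_std)
```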

edited Nov 30 at 16:21
answered Nov 30 at 16:14
Michael Hoppe