Binary trees constructed from the bottom up











up vote
1
down vote

favorite












I'm dealing with a set of random binary trees which I can't find referenced anywhere in literature. Computer scientists seem to prefer "random search trees" which is a different ensemble than mine (that's partially why I'm asking here rather than the CS stackexchange: I don't think this ensemble is relevant to CS at all, though I may be wrong).



I would be very grateful if anyone could point me in the right direction toward a better understanding of these lovely graphs.



"Fusion binary trees"



(Not to be confused with fusion trees.)
Consider a set of $ n $ labeled nodes $ {(1), dotsc, (n)} $, initially disjoint from each other. Two such nodes are considered "adjacent" if their labels differ by 1 modulo $ n $.



Adjacent nodes can be joined into a tree by introducing a new node having those nodes as left- and right-children. For instance, I can introduce a new node having $ (3) $ and $ (4) $ as children, and call it $ (3,4) $. Now $ (3) $ and $ (4) $ cannot be fused anymore, whereas $ (3,4) $ is considered adjacent to $ (2) $ and $ (5) $ and can be fused with them.



Iteratively I can thus fuse all the nodes together into a single binary tree.



The random ensemble



I can encode a sequence of fusions as a permutation of $ (1, dotsc, n) $, where $ k $ represents the fusion of the unique unfused node containing label $ k $ with the node immediately to its right (with (1) being considered "to the right of" (n)). This encoding is well-defined though not one-to-one (fusing far-away nodes one after another is independent of the order of fusions).



The random ensemble I'm considering is then defined by taking uniformly random permutations of $ (1, dotsc, n) $. Note that this is a biased distribution on the space of all binary trees with $ n $ leaves.



My question



I'm trying to compute the probability distribution of the graph distance between two adjacent leaves of this tree. I'm content with the asymptotic form for large $ n $. Note that this pdf is the same for all pairs of adjacent leaves due to the periodic boundary conditions (in alternative, use open boundary conditions and focus on the central leaves).



I expect from numerical results that this pdf have an average of $ O(log n) $, but I've been unsuccessful at finding analytical results.



If you have any idea on how to tackle this problem, or any references where this ensemble is considered in literature, thank you very much in advance.










share|cite|improve this question


























    up vote
    1
    down vote

    favorite












    I'm dealing with a set of random binary trees which I can't find referenced anywhere in literature. Computer scientists seem to prefer "random search trees" which is a different ensemble than mine (that's partially why I'm asking here rather than the CS stackexchange: I don't think this ensemble is relevant to CS at all, though I may be wrong).



    I would be very grateful if anyone could point me in the right direction toward a better understanding of these lovely graphs.



    "Fusion binary trees"



    (Not to be confused with fusion trees.)
    Consider a set of $ n $ labeled nodes $ {(1), dotsc, (n)} $, initially disjoint from each other. Two such nodes are considered "adjacent" if their labels differ by 1 modulo $ n $.



    Adjacent nodes can be joined into a tree by introducing a new node having those nodes as left- and right-children. For instance, I can introduce a new node having $ (3) $ and $ (4) $ as children, and call it $ (3,4) $. Now $ (3) $ and $ (4) $ cannot be fused anymore, whereas $ (3,4) $ is considered adjacent to $ (2) $ and $ (5) $ and can be fused with them.



    Iteratively I can thus fuse all the nodes together into a single binary tree.



    The random ensemble



    I can encode a sequence of fusions as a permutation of $ (1, dotsc, n) $, where $ k $ represents the fusion of the unique unfused node containing label $ k $ with the node immediately to its right (with (1) being considered "to the right of" (n)). This encoding is well-defined though not one-to-one (fusing far-away nodes one after another is independent of the order of fusions).



    The random ensemble I'm considering is then defined by taking uniformly random permutations of $ (1, dotsc, n) $. Note that this is a biased distribution on the space of all binary trees with $ n $ leaves.



    My question



    I'm trying to compute the probability distribution of the graph distance between two adjacent leaves of this tree. I'm content with the asymptotic form for large $ n $. Note that this pdf is the same for all pairs of adjacent leaves due to the periodic boundary conditions (in alternative, use open boundary conditions and focus on the central leaves).



    I expect from numerical results that this pdf have an average of $ O(log n) $, but I've been unsuccessful at finding analytical results.



    If you have any idea on how to tackle this problem, or any references where this ensemble is considered in literature, thank you very much in advance.










    share|cite|improve this question
























      up vote
      1
      down vote

      favorite









      up vote
      1
      down vote

      favorite











      I'm dealing with a set of random binary trees which I can't find referenced anywhere in literature. Computer scientists seem to prefer "random search trees" which is a different ensemble than mine (that's partially why I'm asking here rather than the CS stackexchange: I don't think this ensemble is relevant to CS at all, though I may be wrong).



      I would be very grateful if anyone could point me in the right direction toward a better understanding of these lovely graphs.



      "Fusion binary trees"



      (Not to be confused with fusion trees.)
      Consider a set of $ n $ labeled nodes $ {(1), dotsc, (n)} $, initially disjoint from each other. Two such nodes are considered "adjacent" if their labels differ by 1 modulo $ n $.



      Adjacent nodes can be joined into a tree by introducing a new node having those nodes as left- and right-children. For instance, I can introduce a new node having $ (3) $ and $ (4) $ as children, and call it $ (3,4) $. Now $ (3) $ and $ (4) $ cannot be fused anymore, whereas $ (3,4) $ is considered adjacent to $ (2) $ and $ (5) $ and can be fused with them.



      Iteratively I can thus fuse all the nodes together into a single binary tree.



      The random ensemble



      I can encode a sequence of fusions as a permutation of $ (1, dotsc, n) $, where $ k $ represents the fusion of the unique unfused node containing label $ k $ with the node immediately to its right (with (1) being considered "to the right of" (n)). This encoding is well-defined though not one-to-one (fusing far-away nodes one after another is independent of the order of fusions).



      The random ensemble I'm considering is then defined by taking uniformly random permutations of $ (1, dotsc, n) $. Note that this is a biased distribution on the space of all binary trees with $ n $ leaves.



      My question



      I'm trying to compute the probability distribution of the graph distance between two adjacent leaves of this tree. I'm content with the asymptotic form for large $ n $. Note that this pdf is the same for all pairs of adjacent leaves due to the periodic boundary conditions (in alternative, use open boundary conditions and focus on the central leaves).



      I expect from numerical results that this pdf have an average of $ O(log n) $, but I've been unsuccessful at finding analytical results.



      If you have any idea on how to tackle this problem, or any references where this ensemble is considered in literature, thank you very much in advance.










      share|cite|improve this question













      I'm dealing with a set of random binary trees which I can't find referenced anywhere in literature. Computer scientists seem to prefer "random search trees" which is a different ensemble than mine (that's partially why I'm asking here rather than the CS stackexchange: I don't think this ensemble is relevant to CS at all, though I may be wrong).



      I would be very grateful if anyone could point me in the right direction toward a better understanding of these lovely graphs.



      "Fusion binary trees"



      (Not to be confused with fusion trees.)
      Consider a set of $ n $ labeled nodes $ {(1), dotsc, (n)} $, initially disjoint from each other. Two such nodes are considered "adjacent" if their labels differ by 1 modulo $ n $.



      Adjacent nodes can be joined into a tree by introducing a new node having those nodes as left- and right-children. For instance, I can introduce a new node having $ (3) $ and $ (4) $ as children, and call it $ (3,4) $. Now $ (3) $ and $ (4) $ cannot be fused anymore, whereas $ (3,4) $ is considered adjacent to $ (2) $ and $ (5) $ and can be fused with them.



      Iteratively I can thus fuse all the nodes together into a single binary tree.



      The random ensemble



      I can encode a sequence of fusions as a permutation of $ (1, dotsc, n) $, where $ k $ represents the fusion of the unique unfused node containing label $ k $ with the node immediately to its right (with (1) being considered "to the right of" (n)). This encoding is well-defined though not one-to-one (fusing far-away nodes one after another is independent of the order of fusions).



      The random ensemble I'm considering is then defined by taking uniformly random permutations of $ (1, dotsc, n) $. Note that this is a biased distribution on the space of all binary trees with $ n $ leaves.



      My question



      I'm trying to compute the probability distribution of the graph distance between two adjacent leaves of this tree. I'm content with the asymptotic form for large $ n $. Note that this pdf is the same for all pairs of adjacent leaves due to the periodic boundary conditions (in alternative, use open boundary conditions and focus on the central leaves).



      I expect from numerical results that this pdf have an average of $ O(log n) $, but I've been unsuccessful at finding analytical results.



      If you have any idea on how to tackle this problem, or any references where this ensemble is considered in literature, thank you very much in advance.







      trees random-graphs






      share|cite|improve this question













      share|cite|improve this question











      share|cite|improve this question




      share|cite|improve this question










      asked Nov 28 at 10:00









      derpy

      1,127611




      1,127611






















          1 Answer
          1






          active

          oldest

          votes

















          up vote
          1
          down vote













          As a starting point, the expected value of the distance between two adjacent nodes is $4(1 - frac1n)$.



          If you traverse the shortest path from node $1$ to node $2$, then from node $2$ to node $3$, and so on until you take the shortest path from node $n$ to node $1$, then you end up using each edge of the tree exactly twice. (An edge created when we fused the subtree containing nodes ${i,i+1, dots, j-1,j}$ to another subtree is used when we go from $i-1$ to $i$, and again when we go from $j$ to $j+1$.)



          There are $2n-1$ total nodes in the tree, and therefore there are $2n-2$ edges. So the total length of these shortest paths is $4n-4$. By symmetry (in the periodic-boundary version), each shortest path has the same distribution, so each shortest path has length $frac{4n-4}{n} = 4(1 - frac1n)$ in expectation.





          Here's a not-very nice but exact description of the distribution.



          Another useful view of "fusion binary trees" with the open boundary condition is as "fission binary trees" (an equally made-up term). Start with a tree that has $1$ node and no edges. Then, repeatedly choose a uniformly random leaf, and add a left child and a right child to that leaf. After $n-1$ steps, number all the leaves $1, 2, dots, n$ from left to right.



          This model of random binary trees is equivalent, except that all the choices are made in reverse. We can prove this by induction. In your model, if we choose a random pair $(k,k+1)$ to fuse into one node, the rest of the tree is constructed from the vertices ${1, 2, dots, k-1, (k,k+1), k+2, dots, n}$, so it's a random $(n-1)$-leaf fusion binary tree. By the induction hypothesis, this is equivalent to a random $(n-1)$-leaf fission binary tree. Additionally, each adjacent pair $(k,k+1)$ was equally likely to be chosen to fuse first, so every leaf of the $(n-1)$-leaf tree is equally likely to be the fused one - and choosing which leaf that is is equivalent to fission of a random leaf of the $(n-1)$-leaf tree.



          The fission model gives us a different numbering of adjacent pairs: instead of numbering them ${1,2}$, ${2,3}$, and so on through ${n-1,n}$, we can number them according to the step in the fission process when that pair was split up. That is, the step at which their closest common ancestor was chosen as the leaf to receive children. Each pair gets a number between $1$ and $n-1$.



          This is useful, because we can talk about the distribution of the distance between the adjacent pair split up at step $k$. At each step after step $k$, there are $2$ possible leaves to choose which would contribute $1$ to that distance (starting from a baseline of $2$). So the distance between the adjacent pair has distribution
          $$
          2 + mathrm{Bernoulli}(tfrac2{k+1}) + mathrm{Bernoulli}(tfrac2{k+2}) + dots + mathrm{Bernoulli}(tfrac2{n-1})
          $$

          where the Bernoulli terms are independent. Similarly, the distance between the leftmost and rightmost vertex has the distribution
          $$
          2 + mathrm{Bernoulli}(tfrac2{2}) + mathrm{Bernoulli}(tfrac2{3}) + dots + mathrm{Bernoulli}(tfrac2{n-1})
          $$

          because (after the very first fission step) the distance between them increases by $1$ whenever the first or last vertex in the fission tree splits.



          Now let's go back to the fusion binary tree with periodic boundary, since that has symmetry. We can obtain it from the fission binary tree model by picking a random $s$ between $1$ and $n$, and numbering the leaves at the bottom of the $n$-leaf fission tree as $s, s+1, dots, n, 1, dots, s-1$ from left to right.



          Each adjacent pair has a $frac1n$ chance to be the pair split up at step $k$ in the fission process, and a $frac1n$ chance to end up being the pair ${s-1,s}$. So we get a mixture distribution of the $n$ possible Bernoulli sums given above. This mixture distribution describes the distribution of adjacent-pair distance exactly, though it's not a very nice answer.



          In particular, the $mathrm{Bernoulli}(frac2t)$ term has a $frac tn$ chance of being included for each adjacent pair, so we can think of this distribution as $2$ plus a sum of $n-2$ different $mathrm{Bernoulli}(frac2n)$ random variables. But this does not make the distribution $2 + mathrm{Binomial}(n-2, frac2n)$ because the $mathrm{Bernoulli}(frac2n)$'s are correlated: the $mathrm{Bernoulli}(frac2t)$ terms are not included independently.






          share|cite|improve this answer























          • Thank you! The simple argument about the average is very nice. I also like the "fission tree" representation, though I still need to think about it properly. Regardless, this is very helpful, so thanks again.
            – derpy
            Dec 3 at 16:13











          Your Answer





          StackExchange.ifUsing("editor", function () {
          return StackExchange.using("mathjaxEditing", function () {
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
          });
          });
          }, "mathjax-editing");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "69"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          noCode: true, onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3016976%2fbinary-trees-constructed-from-the-bottom-up%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          1
          down vote













          As a starting point, the expected value of the distance between two adjacent nodes is $4(1 - frac1n)$.



          If you traverse the shortest path from node $1$ to node $2$, then from node $2$ to node $3$, and so on until you take the shortest path from node $n$ to node $1$, then you end up using each edge of the tree exactly twice. (An edge created when we fused the subtree containing nodes ${i,i+1, dots, j-1,j}$ to another subtree is used when we go from $i-1$ to $i$, and again when we go from $j$ to $j+1$.)



          There are $2n-1$ total nodes in the tree, and therefore there are $2n-2$ edges. So the total length of these shortest paths is $4n-4$. By symmetry (in the periodic-boundary version), each shortest path has the same distribution, so each shortest path has length $frac{4n-4}{n} = 4(1 - frac1n)$ in expectation.





          Here's a not-very nice but exact description of the distribution.



          Another useful view of "fusion binary trees" with the open boundary condition is as "fission binary trees" (an equally made-up term). Start with a tree that has $1$ node and no edges. Then, repeatedly choose a uniformly random leaf, and add a left child and a right child to that leaf. After $n-1$ steps, number all the leaves $1, 2, dots, n$ from left to right.



          This model of random binary trees is equivalent, except that all the choices are made in reverse. We can prove this by induction. In your model, if we choose a random pair $(k,k+1)$ to fuse into one node, the rest of the tree is constructed from the vertices ${1, 2, dots, k-1, (k,k+1), k+2, dots, n}$, so it's a random $(n-1)$-leaf fusion binary tree. By the induction hypothesis, this is equivalent to a random $(n-1)$-leaf fission binary tree. Additionally, each adjacent pair $(k,k+1)$ was equally likely to be chosen to fuse first, so every leaf of the $(n-1)$-leaf tree is equally likely to be the fused one - and choosing which leaf that is is equivalent to fission of a random leaf of the $(n-1)$-leaf tree.



          The fission model gives us a different numbering of adjacent pairs: instead of numbering them ${1,2}$, ${2,3}$, and so on through ${n-1,n}$, we can number them according to the step in the fission process when that pair was split up. That is, the step at which their closest common ancestor was chosen as the leaf to receive children. Each pair gets a number between $1$ and $n-1$.



          This is useful, because we can talk about the distribution of the distance between the adjacent pair split up at step $k$. At each step after step $k$, there are $2$ possible leaves to choose which would contribute $1$ to that distance (starting from a baseline of $2$). So the distance between the adjacent pair has distribution
          $$
          2 + mathrm{Bernoulli}(tfrac2{k+1}) + mathrm{Bernoulli}(tfrac2{k+2}) + dots + mathrm{Bernoulli}(tfrac2{n-1})
          $$

          where the Bernoulli terms are independent. Similarly, the distance between the leftmost and rightmost vertex has the distribution
          $$
          2 + mathrm{Bernoulli}(tfrac2{2}) + mathrm{Bernoulli}(tfrac2{3}) + dots + mathrm{Bernoulli}(tfrac2{n-1})
          $$

          because (after the very first fission step) the distance between them increases by $1$ whenever the first or last vertex in the fission tree splits.



          Now let's go back to the fusion binary tree with periodic boundary, since that has symmetry. We can obtain it from the fission binary tree model by picking a random $s$ between $1$ and $n$, and numbering the leaves at the bottom of the $n$-leaf fission tree as $s, s+1, dots, n, 1, dots, s-1$ from left to right.



          Each adjacent pair has a $frac1n$ chance to be the pair split up at step $k$ in the fission process, and a $frac1n$ chance to end up being the pair ${s-1,s}$. So we get a mixture distribution of the $n$ possible Bernoulli sums given above. This mixture distribution describes the distribution of adjacent-pair distance exactly, though it's not a very nice answer.



          In particular, the $mathrm{Bernoulli}(frac2t)$ term has a $frac tn$ chance of being included for each adjacent pair, so we can think of this distribution as $2$ plus a sum of $n-2$ different $mathrm{Bernoulli}(frac2n)$ random variables. But this does not make the distribution $2 + mathrm{Binomial}(n-2, frac2n)$ because the $mathrm{Bernoulli}(frac2n)$'s are correlated: the $mathrm{Bernoulli}(frac2t)$ terms are not included independently.






          share|cite|improve this answer























          • Thank you! The simple argument about the average is very nice. I also like the "fission tree" representation, though I still need to think about it properly. Regardless, this is very helpful, so thanks again.
            – derpy
            Dec 3 at 16:13















          up vote
          1
          down vote













          As a starting point, the expected value of the distance between two adjacent nodes is $4(1 - frac1n)$.



          If you traverse the shortest path from node $1$ to node $2$, then from node $2$ to node $3$, and so on until you take the shortest path from node $n$ to node $1$, then you end up using each edge of the tree exactly twice. (An edge created when we fused the subtree containing nodes ${i,i+1, dots, j-1,j}$ to another subtree is used when we go from $i-1$ to $i$, and again when we go from $j$ to $j+1$.)



          There are $2n-1$ total nodes in the tree, and therefore there are $2n-2$ edges. So the total length of these shortest paths is $4n-4$. By symmetry (in the periodic-boundary version), each shortest path has the same distribution, so each shortest path has length $frac{4n-4}{n} = 4(1 - frac1n)$ in expectation.





          Here's a not-very nice but exact description of the distribution.



          Another useful view of "fusion binary trees" with the open boundary condition is as "fission binary trees" (an equally made-up term). Start with a tree that has $1$ node and no edges. Then, repeatedly choose a uniformly random leaf, and add a left child and a right child to that leaf. After $n-1$ steps, number all the leaves $1, 2, dots, n$ from left to right.



          This model of random binary trees is equivalent, except that all the choices are made in reverse. We can prove this by induction. In your model, if we choose a random pair $(k,k+1)$ to fuse into one node, the rest of the tree is constructed from the vertices ${1, 2, dots, k-1, (k,k+1), k+2, dots, n}$, so it's a random $(n-1)$-leaf fusion binary tree. By the induction hypothesis, this is equivalent to a random $(n-1)$-leaf fission binary tree. Additionally, each adjacent pair $(k,k+1)$ was equally likely to be chosen to fuse first, so every leaf of the $(n-1)$-leaf tree is equally likely to be the fused one - and choosing which leaf that is is equivalent to fission of a random leaf of the $(n-1)$-leaf tree.



          The fission model gives us a different numbering of adjacent pairs: instead of numbering them ${1,2}$, ${2,3}$, and so on through ${n-1,n}$, we can number them according to the step in the fission process when that pair was split up. That is, the step at which their closest common ancestor was chosen as the leaf to receive children. Each pair gets a number between $1$ and $n-1$.



          This is useful, because we can talk about the distribution of the distance between the adjacent pair split up at step $k$. At each step after step $k$, there are $2$ possible leaves to choose which would contribute $1$ to that distance (starting from a baseline of $2$). So the distance between the adjacent pair has distribution
          $$
          2 + mathrm{Bernoulli}(tfrac2{k+1}) + mathrm{Bernoulli}(tfrac2{k+2}) + dots + mathrm{Bernoulli}(tfrac2{n-1})
          $$

          where the Bernoulli terms are independent. Similarly, the distance between the leftmost and rightmost vertex has the distribution
          $$
          2 + mathrm{Bernoulli}(tfrac2{2}) + mathrm{Bernoulli}(tfrac2{3}) + dots + mathrm{Bernoulli}(tfrac2{n-1})
          $$

          because (after the very first fission step) the distance between them increases by $1$ whenever the first or last vertex in the fission tree splits.



          Now let's go back to the fusion binary tree with periodic boundary, since that has symmetry. We can obtain it from the fission binary tree model by picking a random $s$ between $1$ and $n$, and numbering the leaves at the bottom of the $n$-leaf fission tree as $s, s+1, dots, n, 1, dots, s-1$ from left to right.



          Each adjacent pair has a $frac1n$ chance to be the pair split up at step $k$ in the fission process, and a $frac1n$ chance to end up being the pair ${s-1,s}$. So we get a mixture distribution of the $n$ possible Bernoulli sums given above. This mixture distribution describes the distribution of adjacent-pair distance exactly, though it's not a very nice answer.



          In particular, the $mathrm{Bernoulli}(frac2t)$ term has a $frac tn$ chance of being included for each adjacent pair, so we can think of this distribution as $2$ plus a sum of $n-2$ different $mathrm{Bernoulli}(frac2n)$ random variables. But this does not make the distribution $2 + mathrm{Binomial}(n-2, frac2n)$ because the $mathrm{Bernoulli}(frac2n)$'s are correlated: the $mathrm{Bernoulli}(frac2t)$ terms are not included independently.






          share|cite|improve this answer























          • Thank you! The simple argument about the average is very nice. I also like the "fission tree" representation, though I still need to think about it properly. Regardless, this is very helpful, so thanks again.
            – derpy
            Dec 3 at 16:13













          up vote
          1
          down vote










          up vote
          1
          down vote









          As a starting point, the expected value of the distance between two adjacent nodes is $4(1 - frac1n)$.



          If you traverse the shortest path from node $1$ to node $2$, then from node $2$ to node $3$, and so on until you take the shortest path from node $n$ to node $1$, then you end up using each edge of the tree exactly twice. (An edge created when we fused the subtree containing nodes ${i,i+1, dots, j-1,j}$ to another subtree is used when we go from $i-1$ to $i$, and again when we go from $j$ to $j+1$.)



          There are $2n-1$ total nodes in the tree, and therefore there are $2n-2$ edges. So the total length of these shortest paths is $4n-4$. By symmetry (in the periodic-boundary version), each shortest path has the same distribution, so each shortest path has length $frac{4n-4}{n} = 4(1 - frac1n)$ in expectation.





          Here's a not-very nice but exact description of the distribution.



          Another useful view of "fusion binary trees" with the open boundary condition is as "fission binary trees" (an equally made-up term). Start with a tree that has $1$ node and no edges. Then, repeatedly choose a uniformly random leaf, and add a left child and a right child to that leaf. After $n-1$ steps, number all the leaves $1, 2, dots, n$ from left to right.



          This model of random binary trees is equivalent, except that all the choices are made in reverse. We can prove this by induction. In your model, if we choose a random pair $(k,k+1)$ to fuse into one node, the rest of the tree is constructed from the vertices ${1, 2, dots, k-1, (k,k+1), k+2, dots, n}$, so it's a random $(n-1)$-leaf fusion binary tree. By the induction hypothesis, this is equivalent to a random $(n-1)$-leaf fission binary tree. Additionally, each adjacent pair $(k,k+1)$ was equally likely to be chosen to fuse first, so every leaf of the $(n-1)$-leaf tree is equally likely to be the fused one - and choosing which leaf that is is equivalent to fission of a random leaf of the $(n-1)$-leaf tree.



          The fission model gives us a different numbering of adjacent pairs: instead of numbering them ${1,2}$, ${2,3}$, and so on through ${n-1,n}$, we can number them according to the step in the fission process when that pair was split up. That is, the step at which their closest common ancestor was chosen as the leaf to receive children. Each pair gets a number between $1$ and $n-1$.



          This is useful, because we can talk about the distribution of the distance between the adjacent pair split up at step $k$. At each step after step $k$, there are $2$ possible leaves to choose which would contribute $1$ to that distance (starting from a baseline of $2$). So the distance between the adjacent pair has distribution
          $$
          2 + mathrm{Bernoulli}(tfrac2{k+1}) + mathrm{Bernoulli}(tfrac2{k+2}) + dots + mathrm{Bernoulli}(tfrac2{n-1})
          $$

          where the Bernoulli terms are independent. Similarly, the distance between the leftmost and rightmost vertex has the distribution
          $$
          2 + mathrm{Bernoulli}(tfrac2{2}) + mathrm{Bernoulli}(tfrac2{3}) + dots + mathrm{Bernoulli}(tfrac2{n-1})
          $$

          because (after the very first fission step) the distance between them increases by $1$ whenever the first or last vertex in the fission tree splits.



          Now let's go back to the fusion binary tree with periodic boundary, since that has symmetry. We can obtain it from the fission binary tree model by picking a random $s$ between $1$ and $n$, and numbering the leaves at the bottom of the $n$-leaf fission tree as $s, s+1, dots, n, 1, dots, s-1$ from left to right.



          Each adjacent pair has a $frac1n$ chance to be the pair split up at step $k$ in the fission process, and a $frac1n$ chance to end up being the pair ${s-1,s}$. So we get a mixture distribution of the $n$ possible Bernoulli sums given above. This mixture distribution describes the distribution of adjacent-pair distance exactly, though it's not a very nice answer.



          In particular, the $mathrm{Bernoulli}(frac2t)$ term has a $frac tn$ chance of being included for each adjacent pair, so we can think of this distribution as $2$ plus a sum of $n-2$ different $mathrm{Bernoulli}(frac2n)$ random variables. But this does not make the distribution $2 + mathrm{Binomial}(n-2, frac2n)$ because the $mathrm{Bernoulli}(frac2n)$'s are correlated: the $mathrm{Bernoulli}(frac2t)$ terms are not included independently.






          share|cite|improve this answer














          As a starting point, the expected value of the distance between two adjacent nodes is $4(1 - frac1n)$.



          If you traverse the shortest path from node $1$ to node $2$, then from node $2$ to node $3$, and so on until you take the shortest path from node $n$ to node $1$, then you end up using each edge of the tree exactly twice. (An edge created when we fused the subtree containing nodes ${i,i+1, dots, j-1,j}$ to another subtree is used when we go from $i-1$ to $i$, and again when we go from $j$ to $j+1$.)



          There are $2n-1$ total nodes in the tree, and therefore there are $2n-2$ edges. So the total length of these shortest paths is $4n-4$. By symmetry (in the periodic-boundary version), each shortest path has the same distribution, so each shortest path has length $frac{4n-4}{n} = 4(1 - frac1n)$ in expectation.





          Here's a not-very nice but exact description of the distribution.



          Another useful view of "fusion binary trees" with the open boundary condition is as "fission binary trees" (an equally made-up term). Start with a tree that has $1$ node and no edges. Then, repeatedly choose a uniformly random leaf, and add a left child and a right child to that leaf. After $n-1$ steps, number all the leaves $1, 2, dots, n$ from left to right.



          This model of random binary trees is equivalent, except that all the choices are made in reverse. We can prove this by induction. In your model, if we choose a random pair $(k,k+1)$ to fuse into one node, the rest of the tree is constructed from the vertices ${1, 2, dots, k-1, (k,k+1), k+2, dots, n}$, so it's a random $(n-1)$-leaf fusion binary tree. By the induction hypothesis, this is equivalent to a random $(n-1)$-leaf fission binary tree. Additionally, each adjacent pair $(k,k+1)$ was equally likely to be chosen to fuse first, so every leaf of the $(n-1)$-leaf tree is equally likely to be the fused one - and choosing which leaf that is is equivalent to fission of a random leaf of the $(n-1)$-leaf tree.



          The fission model gives us a different numbering of adjacent pairs: instead of numbering them ${1,2}$, ${2,3}$, and so on through ${n-1,n}$, we can number them according to the step in the fission process when that pair was split up. That is, the step at which their closest common ancestor was chosen as the leaf to receive children. Each pair gets a number between $1$ and $n-1$.



          This is useful, because we can talk about the distribution of the distance between the adjacent pair split up at step $k$. At each step after step $k$, there are $2$ possible leaves to choose which would contribute $1$ to that distance (starting from a baseline of $2$). So the distance between the adjacent pair has distribution
          $$
          2 + mathrm{Bernoulli}(tfrac2{k+1}) + mathrm{Bernoulli}(tfrac2{k+2}) + dots + mathrm{Bernoulli}(tfrac2{n-1})
          $$

          where the Bernoulli terms are independent. Similarly, the distance between the leftmost and rightmost vertex has the distribution
          $$
          2 + mathrm{Bernoulli}(tfrac2{2}) + mathrm{Bernoulli}(tfrac2{3}) + dots + mathrm{Bernoulli}(tfrac2{n-1})
          $$

          because (after the very first fission step) the distance between them increases by $1$ whenever the first or last vertex in the fission tree splits.



          Now let's go back to the fusion binary tree with periodic boundary, since that has symmetry. We can obtain it from the fission binary tree model by picking a random $s$ between $1$ and $n$, and numbering the leaves at the bottom of the $n$-leaf fission tree as $s, s+1, dots, n, 1, dots, s-1$ from left to right.



          Each adjacent pair has a $frac1n$ chance to be the pair split up at step $k$ in the fission process, and a $frac1n$ chance to end up being the pair ${s-1,s}$. So we get a mixture distribution of the $n$ possible Bernoulli sums given above. This mixture distribution describes the distribution of adjacent-pair distance exactly, though it's not a very nice answer.



          In particular, the $mathrm{Bernoulli}(frac2t)$ term has a $frac tn$ chance of being included for each adjacent pair, so we can think of this distribution as $2$ plus a sum of $n-2$ different $mathrm{Bernoulli}(frac2n)$ random variables. But this does not make the distribution $2 + mathrm{Binomial}(n-2, frac2n)$ because the $mathrm{Bernoulli}(frac2n)$'s are correlated: the $mathrm{Bernoulli}(frac2t)$ terms are not included independently.







          share|cite|improve this answer














          share|cite|improve this answer



          share|cite|improve this answer








          edited Nov 30 at 22:41

























          answered Nov 30 at 21:01









          Misha Lavrov

          43.2k555103




          43.2k555103












          • Thank you! The simple argument about the average is very nice. I also like the "fission tree" representation, though I still need to think about it properly. Regardless, this is very helpful, so thanks again.
            – derpy
            Dec 3 at 16:13


















          • Thank you! The simple argument about the average is very nice. I also like the "fission tree" representation, though I still need to think about it properly. Regardless, this is very helpful, so thanks again.
            – derpy
            Dec 3 at 16:13
















          Thank you! The simple argument about the average is very nice. I also like the "fission tree" representation, though I still need to think about it properly. Regardless, this is very helpful, so thanks again.
          – derpy
          Dec 3 at 16:13




          Thank you! The simple argument about the average is very nice. I also like the "fission tree" representation, though I still need to think about it properly. Regardless, this is very helpful, so thanks again.
          – derpy
          Dec 3 at 16:13


















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Mathematics Stack Exchange!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          Use MathJax to format equations. MathJax reference.


          To learn more, see our tips on writing great answers.





          Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


          Please pay close attention to the following guidance:


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3016976%2fbinary-trees-constructed-from-the-bottom-up%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Berounka

          Different font size/position of beamer's navigation symbols template's content depending on regular/plain...

          Sphinx de Gizeh