Binary trees constructed from the bottom up
up vote
1
down vote
favorite
I'm dealing with a set of random binary trees which I can't find referenced anywhere in literature. Computer scientists seem to prefer "random search trees" which is a different ensemble than mine (that's partially why I'm asking here rather than the CS stackexchange: I don't think this ensemble is relevant to CS at all, though I may be wrong).
I would be very grateful if anyone could point me in the right direction toward a better understanding of these lovely graphs.
"Fusion binary trees"
(Not to be confused with fusion trees.)
Consider a set of $ n $ labeled nodes $ {(1), dotsc, (n)} $, initially disjoint from each other. Two such nodes are considered "adjacent" if their labels differ by 1 modulo $ n $.
Adjacent nodes can be joined into a tree by introducing a new node having those nodes as left- and right-children. For instance, I can introduce a new node having $ (3) $ and $ (4) $ as children, and call it $ (3,4) $. Now $ (3) $ and $ (4) $ cannot be fused anymore, whereas $ (3,4) $ is considered adjacent to $ (2) $ and $ (5) $ and can be fused with them.
Iteratively I can thus fuse all the nodes together into a single binary tree.
The random ensemble
I can encode a sequence of fusions as a permutation of $ (1, dotsc, n) $, where $ k $ represents the fusion of the unique unfused node containing label $ k $ with the node immediately to its right (with (1) being considered "to the right of" (n)). This encoding is well-defined though not one-to-one (fusing far-away nodes one after another is independent of the order of fusions).
The random ensemble I'm considering is then defined by taking uniformly random permutations of $ (1, dotsc, n) $. Note that this is a biased distribution on the space of all binary trees with $ n $ leaves.
My question
I'm trying to compute the probability distribution of the graph distance between two adjacent leaves of this tree. I'm content with the asymptotic form for large $ n $. Note that this pdf is the same for all pairs of adjacent leaves due to the periodic boundary conditions (in alternative, use open boundary conditions and focus on the central leaves).
I expect from numerical results that this pdf have an average of $ O(log n) $, but I've been unsuccessful at finding analytical results.
If you have any idea on how to tackle this problem, or any references where this ensemble is considered in literature, thank you very much in advance.
trees random-graphs
add a comment |
up vote
1
down vote
favorite
I'm dealing with a set of random binary trees which I can't find referenced anywhere in literature. Computer scientists seem to prefer "random search trees" which is a different ensemble than mine (that's partially why I'm asking here rather than the CS stackexchange: I don't think this ensemble is relevant to CS at all, though I may be wrong).
I would be very grateful if anyone could point me in the right direction toward a better understanding of these lovely graphs.
"Fusion binary trees"
(Not to be confused with fusion trees.)
Consider a set of $ n $ labeled nodes $ {(1), dotsc, (n)} $, initially disjoint from each other. Two such nodes are considered "adjacent" if their labels differ by 1 modulo $ n $.
Adjacent nodes can be joined into a tree by introducing a new node having those nodes as left- and right-children. For instance, I can introduce a new node having $ (3) $ and $ (4) $ as children, and call it $ (3,4) $. Now $ (3) $ and $ (4) $ cannot be fused anymore, whereas $ (3,4) $ is considered adjacent to $ (2) $ and $ (5) $ and can be fused with them.
Iteratively I can thus fuse all the nodes together into a single binary tree.
The random ensemble
I can encode a sequence of fusions as a permutation of $ (1, dotsc, n) $, where $ k $ represents the fusion of the unique unfused node containing label $ k $ with the node immediately to its right (with (1) being considered "to the right of" (n)). This encoding is well-defined though not one-to-one (fusing far-away nodes one after another is independent of the order of fusions).
The random ensemble I'm considering is then defined by taking uniformly random permutations of $ (1, dotsc, n) $. Note that this is a biased distribution on the space of all binary trees with $ n $ leaves.
My question
I'm trying to compute the probability distribution of the graph distance between two adjacent leaves of this tree. I'm content with the asymptotic form for large $ n $. Note that this pdf is the same for all pairs of adjacent leaves due to the periodic boundary conditions (in alternative, use open boundary conditions and focus on the central leaves).
I expect from numerical results that this pdf have an average of $ O(log n) $, but I've been unsuccessful at finding analytical results.
If you have any idea on how to tackle this problem, or any references where this ensemble is considered in literature, thank you very much in advance.
trees random-graphs
add a comment |
up vote
1
down vote
favorite
up vote
1
down vote
favorite
I'm dealing with a set of random binary trees which I can't find referenced anywhere in literature. Computer scientists seem to prefer "random search trees" which is a different ensemble than mine (that's partially why I'm asking here rather than the CS stackexchange: I don't think this ensemble is relevant to CS at all, though I may be wrong).
I would be very grateful if anyone could point me in the right direction toward a better understanding of these lovely graphs.
"Fusion binary trees"
(Not to be confused with fusion trees.)
Consider a set of $ n $ labeled nodes $ {(1), dotsc, (n)} $, initially disjoint from each other. Two such nodes are considered "adjacent" if their labels differ by 1 modulo $ n $.
Adjacent nodes can be joined into a tree by introducing a new node having those nodes as left- and right-children. For instance, I can introduce a new node having $ (3) $ and $ (4) $ as children, and call it $ (3,4) $. Now $ (3) $ and $ (4) $ cannot be fused anymore, whereas $ (3,4) $ is considered adjacent to $ (2) $ and $ (5) $ and can be fused with them.
Iteratively I can thus fuse all the nodes together into a single binary tree.
The random ensemble
I can encode a sequence of fusions as a permutation of $ (1, dotsc, n) $, where $ k $ represents the fusion of the unique unfused node containing label $ k $ with the node immediately to its right (with (1) being considered "to the right of" (n)). This encoding is well-defined though not one-to-one (fusing far-away nodes one after another is independent of the order of fusions).
The random ensemble I'm considering is then defined by taking uniformly random permutations of $ (1, dotsc, n) $. Note that this is a biased distribution on the space of all binary trees with $ n $ leaves.
My question
I'm trying to compute the probability distribution of the graph distance between two adjacent leaves of this tree. I'm content with the asymptotic form for large $ n $. Note that this pdf is the same for all pairs of adjacent leaves due to the periodic boundary conditions (in alternative, use open boundary conditions and focus on the central leaves).
I expect from numerical results that this pdf have an average of $ O(log n) $, but I've been unsuccessful at finding analytical results.
If you have any idea on how to tackle this problem, or any references where this ensemble is considered in literature, thank you very much in advance.
trees random-graphs
I'm dealing with a set of random binary trees which I can't find referenced anywhere in literature. Computer scientists seem to prefer "random search trees" which is a different ensemble than mine (that's partially why I'm asking here rather than the CS stackexchange: I don't think this ensemble is relevant to CS at all, though I may be wrong).
I would be very grateful if anyone could point me in the right direction toward a better understanding of these lovely graphs.
"Fusion binary trees"
(Not to be confused with fusion trees.)
Consider a set of $ n $ labeled nodes $ {(1), dotsc, (n)} $, initially disjoint from each other. Two such nodes are considered "adjacent" if their labels differ by 1 modulo $ n $.
Adjacent nodes can be joined into a tree by introducing a new node having those nodes as left- and right-children. For instance, I can introduce a new node having $ (3) $ and $ (4) $ as children, and call it $ (3,4) $. Now $ (3) $ and $ (4) $ cannot be fused anymore, whereas $ (3,4) $ is considered adjacent to $ (2) $ and $ (5) $ and can be fused with them.
Iteratively I can thus fuse all the nodes together into a single binary tree.
The random ensemble
I can encode a sequence of fusions as a permutation of $ (1, dotsc, n) $, where $ k $ represents the fusion of the unique unfused node containing label $ k $ with the node immediately to its right (with (1) being considered "to the right of" (n)). This encoding is well-defined though not one-to-one (fusing far-away nodes one after another is independent of the order of fusions).
The random ensemble I'm considering is then defined by taking uniformly random permutations of $ (1, dotsc, n) $. Note that this is a biased distribution on the space of all binary trees with $ n $ leaves.
My question
I'm trying to compute the probability distribution of the graph distance between two adjacent leaves of this tree. I'm content with the asymptotic form for large $ n $. Note that this pdf is the same for all pairs of adjacent leaves due to the periodic boundary conditions (in alternative, use open boundary conditions and focus on the central leaves).
I expect from numerical results that this pdf have an average of $ O(log n) $, but I've been unsuccessful at finding analytical results.
If you have any idea on how to tackle this problem, or any references where this ensemble is considered in literature, thank you very much in advance.
trees random-graphs
trees random-graphs
asked Nov 28 at 10:00
derpy
1,127611
1,127611
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
up vote
1
down vote
As a starting point, the expected value of the distance between two adjacent nodes is $4(1 - frac1n)$.
If you traverse the shortest path from node $1$ to node $2$, then from node $2$ to node $3$, and so on until you take the shortest path from node $n$ to node $1$, then you end up using each edge of the tree exactly twice. (An edge created when we fused the subtree containing nodes ${i,i+1, dots, j-1,j}$ to another subtree is used when we go from $i-1$ to $i$, and again when we go from $j$ to $j+1$.)
There are $2n-1$ total nodes in the tree, and therefore there are $2n-2$ edges. So the total length of these shortest paths is $4n-4$. By symmetry (in the periodic-boundary version), each shortest path has the same distribution, so each shortest path has length $frac{4n-4}{n} = 4(1 - frac1n)$ in expectation.
Here's a not-very nice but exact description of the distribution.
Another useful view of "fusion binary trees" with the open boundary condition is as "fission binary trees" (an equally made-up term). Start with a tree that has $1$ node and no edges. Then, repeatedly choose a uniformly random leaf, and add a left child and a right child to that leaf. After $n-1$ steps, number all the leaves $1, 2, dots, n$ from left to right.
This model of random binary trees is equivalent, except that all the choices are made in reverse. We can prove this by induction. In your model, if we choose a random pair $(k,k+1)$ to fuse into one node, the rest of the tree is constructed from the vertices ${1, 2, dots, k-1, (k,k+1), k+2, dots, n}$, so it's a random $(n-1)$-leaf fusion binary tree. By the induction hypothesis, this is equivalent to a random $(n-1)$-leaf fission binary tree. Additionally, each adjacent pair $(k,k+1)$ was equally likely to be chosen to fuse first, so every leaf of the $(n-1)$-leaf tree is equally likely to be the fused one - and choosing which leaf that is is equivalent to fission of a random leaf of the $(n-1)$-leaf tree.
The fission model gives us a different numbering of adjacent pairs: instead of numbering them ${1,2}$, ${2,3}$, and so on through ${n-1,n}$, we can number them according to the step in the fission process when that pair was split up. That is, the step at which their closest common ancestor was chosen as the leaf to receive children. Each pair gets a number between $1$ and $n-1$.
This is useful, because we can talk about the distribution of the distance between the adjacent pair split up at step $k$. At each step after step $k$, there are $2$ possible leaves to choose which would contribute $1$ to that distance (starting from a baseline of $2$). So the distance between the adjacent pair has distribution
$$
2 + mathrm{Bernoulli}(tfrac2{k+1}) + mathrm{Bernoulli}(tfrac2{k+2}) + dots + mathrm{Bernoulli}(tfrac2{n-1})
$$
where the Bernoulli terms are independent. Similarly, the distance between the leftmost and rightmost vertex has the distribution
$$
2 + mathrm{Bernoulli}(tfrac2{2}) + mathrm{Bernoulli}(tfrac2{3}) + dots + mathrm{Bernoulli}(tfrac2{n-1})
$$
because (after the very first fission step) the distance between them increases by $1$ whenever the first or last vertex in the fission tree splits.
Now let's go back to the fusion binary tree with periodic boundary, since that has symmetry. We can obtain it from the fission binary tree model by picking a random $s$ between $1$ and $n$, and numbering the leaves at the bottom of the $n$-leaf fission tree as $s, s+1, dots, n, 1, dots, s-1$ from left to right.
Each adjacent pair has a $frac1n$ chance to be the pair split up at step $k$ in the fission process, and a $frac1n$ chance to end up being the pair ${s-1,s}$. So we get a mixture distribution of the $n$ possible Bernoulli sums given above. This mixture distribution describes the distribution of adjacent-pair distance exactly, though it's not a very nice answer.
In particular, the $mathrm{Bernoulli}(frac2t)$ term has a $frac tn$ chance of being included for each adjacent pair, so we can think of this distribution as $2$ plus a sum of $n-2$ different $mathrm{Bernoulli}(frac2n)$ random variables. But this does not make the distribution $2 + mathrm{Binomial}(n-2, frac2n)$ because the $mathrm{Bernoulli}(frac2n)$'s are correlated: the $mathrm{Bernoulli}(frac2t)$ terms are not included independently.
Thank you! The simple argument about the average is very nice. I also like the "fission tree" representation, though I still need to think about it properly. Regardless, this is very helpful, so thanks again.
– derpy
Dec 3 at 16:13
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "69"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3016976%2fbinary-trees-constructed-from-the-bottom-up%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
As a starting point, the expected value of the distance between two adjacent nodes is $4(1 - frac1n)$.
If you traverse the shortest path from node $1$ to node $2$, then from node $2$ to node $3$, and so on until you take the shortest path from node $n$ to node $1$, then you end up using each edge of the tree exactly twice. (An edge created when we fused the subtree containing nodes ${i,i+1, dots, j-1,j}$ to another subtree is used when we go from $i-1$ to $i$, and again when we go from $j$ to $j+1$.)
There are $2n-1$ total nodes in the tree, and therefore there are $2n-2$ edges. So the total length of these shortest paths is $4n-4$. By symmetry (in the periodic-boundary version), each shortest path has the same distribution, so each shortest path has length $frac{4n-4}{n} = 4(1 - frac1n)$ in expectation.
Here's a not-very nice but exact description of the distribution.
Another useful view of "fusion binary trees" with the open boundary condition is as "fission binary trees" (an equally made-up term). Start with a tree that has $1$ node and no edges. Then, repeatedly choose a uniformly random leaf, and add a left child and a right child to that leaf. After $n-1$ steps, number all the leaves $1, 2, dots, n$ from left to right.
This model of random binary trees is equivalent, except that all the choices are made in reverse. We can prove this by induction. In your model, if we choose a random pair $(k,k+1)$ to fuse into one node, the rest of the tree is constructed from the vertices ${1, 2, dots, k-1, (k,k+1), k+2, dots, n}$, so it's a random $(n-1)$-leaf fusion binary tree. By the induction hypothesis, this is equivalent to a random $(n-1)$-leaf fission binary tree. Additionally, each adjacent pair $(k,k+1)$ was equally likely to be chosen to fuse first, so every leaf of the $(n-1)$-leaf tree is equally likely to be the fused one - and choosing which leaf that is is equivalent to fission of a random leaf of the $(n-1)$-leaf tree.
The fission model gives us a different numbering of adjacent pairs: instead of numbering them ${1,2}$, ${2,3}$, and so on through ${n-1,n}$, we can number them according to the step in the fission process when that pair was split up. That is, the step at which their closest common ancestor was chosen as the leaf to receive children. Each pair gets a number between $1$ and $n-1$.
This is useful, because we can talk about the distribution of the distance between the adjacent pair split up at step $k$. At each step after step $k$, there are $2$ possible leaves to choose which would contribute $1$ to that distance (starting from a baseline of $2$). So the distance between the adjacent pair has distribution
$$
2 + mathrm{Bernoulli}(tfrac2{k+1}) + mathrm{Bernoulli}(tfrac2{k+2}) + dots + mathrm{Bernoulli}(tfrac2{n-1})
$$
where the Bernoulli terms are independent. Similarly, the distance between the leftmost and rightmost vertex has the distribution
$$
2 + mathrm{Bernoulli}(tfrac2{2}) + mathrm{Bernoulli}(tfrac2{3}) + dots + mathrm{Bernoulli}(tfrac2{n-1})
$$
because (after the very first fission step) the distance between them increases by $1$ whenever the first or last vertex in the fission tree splits.
Now let's go back to the fusion binary tree with periodic boundary, since that has symmetry. We can obtain it from the fission binary tree model by picking a random $s$ between $1$ and $n$, and numbering the leaves at the bottom of the $n$-leaf fission tree as $s, s+1, dots, n, 1, dots, s-1$ from left to right.
Each adjacent pair has a $frac1n$ chance to be the pair split up at step $k$ in the fission process, and a $frac1n$ chance to end up being the pair ${s-1,s}$. So we get a mixture distribution of the $n$ possible Bernoulli sums given above. This mixture distribution describes the distribution of adjacent-pair distance exactly, though it's not a very nice answer.
In particular, the $mathrm{Bernoulli}(frac2t)$ term has a $frac tn$ chance of being included for each adjacent pair, so we can think of this distribution as $2$ plus a sum of $n-2$ different $mathrm{Bernoulli}(frac2n)$ random variables. But this does not make the distribution $2 + mathrm{Binomial}(n-2, frac2n)$ because the $mathrm{Bernoulli}(frac2n)$'s are correlated: the $mathrm{Bernoulli}(frac2t)$ terms are not included independently.
Thank you! The simple argument about the average is very nice. I also like the "fission tree" representation, though I still need to think about it properly. Regardless, this is very helpful, so thanks again.
– derpy
Dec 3 at 16:13
add a comment |
up vote
1
down vote
As a starting point, the expected value of the distance between two adjacent nodes is $4(1 - frac1n)$.
If you traverse the shortest path from node $1$ to node $2$, then from node $2$ to node $3$, and so on until you take the shortest path from node $n$ to node $1$, then you end up using each edge of the tree exactly twice. (An edge created when we fused the subtree containing nodes ${i,i+1, dots, j-1,j}$ to another subtree is used when we go from $i-1$ to $i$, and again when we go from $j$ to $j+1$.)
There are $2n-1$ total nodes in the tree, and therefore there are $2n-2$ edges. So the total length of these shortest paths is $4n-4$. By symmetry (in the periodic-boundary version), each shortest path has the same distribution, so each shortest path has length $frac{4n-4}{n} = 4(1 - frac1n)$ in expectation.
Here's a not-very nice but exact description of the distribution.
Another useful view of "fusion binary trees" with the open boundary condition is as "fission binary trees" (an equally made-up term). Start with a tree that has $1$ node and no edges. Then, repeatedly choose a uniformly random leaf, and add a left child and a right child to that leaf. After $n-1$ steps, number all the leaves $1, 2, dots, n$ from left to right.
This model of random binary trees is equivalent, except that all the choices are made in reverse. We can prove this by induction. In your model, if we choose a random pair $(k,k+1)$ to fuse into one node, the rest of the tree is constructed from the vertices ${1, 2, dots, k-1, (k,k+1), k+2, dots, n}$, so it's a random $(n-1)$-leaf fusion binary tree. By the induction hypothesis, this is equivalent to a random $(n-1)$-leaf fission binary tree. Additionally, each adjacent pair $(k,k+1)$ was equally likely to be chosen to fuse first, so every leaf of the $(n-1)$-leaf tree is equally likely to be the fused one - and choosing which leaf that is is equivalent to fission of a random leaf of the $(n-1)$-leaf tree.
The fission model gives us a different numbering of adjacent pairs: instead of numbering them ${1,2}$, ${2,3}$, and so on through ${n-1,n}$, we can number them according to the step in the fission process when that pair was split up. That is, the step at which their closest common ancestor was chosen as the leaf to receive children. Each pair gets a number between $1$ and $n-1$.
This is useful, because we can talk about the distribution of the distance between the adjacent pair split up at step $k$. At each step after step $k$, there are $2$ possible leaves to choose which would contribute $1$ to that distance (starting from a baseline of $2$). So the distance between the adjacent pair has distribution
$$
2 + mathrm{Bernoulli}(tfrac2{k+1}) + mathrm{Bernoulli}(tfrac2{k+2}) + dots + mathrm{Bernoulli}(tfrac2{n-1})
$$
where the Bernoulli terms are independent. Similarly, the distance between the leftmost and rightmost vertex has the distribution
$$
2 + mathrm{Bernoulli}(tfrac2{2}) + mathrm{Bernoulli}(tfrac2{3}) + dots + mathrm{Bernoulli}(tfrac2{n-1})
$$
because (after the very first fission step) the distance between them increases by $1$ whenever the first or last vertex in the fission tree splits.
Now let's go back to the fusion binary tree with periodic boundary, since that has symmetry. We can obtain it from the fission binary tree model by picking a random $s$ between $1$ and $n$, and numbering the leaves at the bottom of the $n$-leaf fission tree as $s, s+1, dots, n, 1, dots, s-1$ from left to right.
Each adjacent pair has a $frac1n$ chance to be the pair split up at step $k$ in the fission process, and a $frac1n$ chance to end up being the pair ${s-1,s}$. So we get a mixture distribution of the $n$ possible Bernoulli sums given above. This mixture distribution describes the distribution of adjacent-pair distance exactly, though it's not a very nice answer.
In particular, the $mathrm{Bernoulli}(frac2t)$ term has a $frac tn$ chance of being included for each adjacent pair, so we can think of this distribution as $2$ plus a sum of $n-2$ different $mathrm{Bernoulli}(frac2n)$ random variables. But this does not make the distribution $2 + mathrm{Binomial}(n-2, frac2n)$ because the $mathrm{Bernoulli}(frac2n)$'s are correlated: the $mathrm{Bernoulli}(frac2t)$ terms are not included independently.
Thank you! The simple argument about the average is very nice. I also like the "fission tree" representation, though I still need to think about it properly. Regardless, this is very helpful, so thanks again.
– derpy
Dec 3 at 16:13
add a comment |
up vote
1
down vote
up vote
1
down vote
As a starting point, the expected value of the distance between two adjacent nodes is $4(1 - frac1n)$.
If you traverse the shortest path from node $1$ to node $2$, then from node $2$ to node $3$, and so on until you take the shortest path from node $n$ to node $1$, then you end up using each edge of the tree exactly twice. (An edge created when we fused the subtree containing nodes ${i,i+1, dots, j-1,j}$ to another subtree is used when we go from $i-1$ to $i$, and again when we go from $j$ to $j+1$.)
There are $2n-1$ total nodes in the tree, and therefore there are $2n-2$ edges. So the total length of these shortest paths is $4n-4$. By symmetry (in the periodic-boundary version), each shortest path has the same distribution, so each shortest path has length $frac{4n-4}{n} = 4(1 - frac1n)$ in expectation.
Here's a not-very nice but exact description of the distribution.
Another useful view of "fusion binary trees" with the open boundary condition is as "fission binary trees" (an equally made-up term). Start with a tree that has $1$ node and no edges. Then, repeatedly choose a uniformly random leaf, and add a left child and a right child to that leaf. After $n-1$ steps, number all the leaves $1, 2, dots, n$ from left to right.
This model of random binary trees is equivalent, except that all the choices are made in reverse. We can prove this by induction. In your model, if we choose a random pair $(k,k+1)$ to fuse into one node, the rest of the tree is constructed from the vertices ${1, 2, dots, k-1, (k,k+1), k+2, dots, n}$, so it's a random $(n-1)$-leaf fusion binary tree. By the induction hypothesis, this is equivalent to a random $(n-1)$-leaf fission binary tree. Additionally, each adjacent pair $(k,k+1)$ was equally likely to be chosen to fuse first, so every leaf of the $(n-1)$-leaf tree is equally likely to be the fused one - and choosing which leaf that is is equivalent to fission of a random leaf of the $(n-1)$-leaf tree.
The fission model gives us a different numbering of adjacent pairs: instead of numbering them ${1,2}$, ${2,3}$, and so on through ${n-1,n}$, we can number them according to the step in the fission process when that pair was split up. That is, the step at which their closest common ancestor was chosen as the leaf to receive children. Each pair gets a number between $1$ and $n-1$.
This is useful, because we can talk about the distribution of the distance between the adjacent pair split up at step $k$. At each step after step $k$, there are $2$ possible leaves to choose which would contribute $1$ to that distance (starting from a baseline of $2$). So the distance between the adjacent pair has distribution
$$
2 + mathrm{Bernoulli}(tfrac2{k+1}) + mathrm{Bernoulli}(tfrac2{k+2}) + dots + mathrm{Bernoulli}(tfrac2{n-1})
$$
where the Bernoulli terms are independent. Similarly, the distance between the leftmost and rightmost vertex has the distribution
$$
2 + mathrm{Bernoulli}(tfrac2{2}) + mathrm{Bernoulli}(tfrac2{3}) + dots + mathrm{Bernoulli}(tfrac2{n-1})
$$
because (after the very first fission step) the distance between them increases by $1$ whenever the first or last vertex in the fission tree splits.
Now let's go back to the fusion binary tree with periodic boundary, since that has symmetry. We can obtain it from the fission binary tree model by picking a random $s$ between $1$ and $n$, and numbering the leaves at the bottom of the $n$-leaf fission tree as $s, s+1, dots, n, 1, dots, s-1$ from left to right.
Each adjacent pair has a $frac1n$ chance to be the pair split up at step $k$ in the fission process, and a $frac1n$ chance to end up being the pair ${s-1,s}$. So we get a mixture distribution of the $n$ possible Bernoulli sums given above. This mixture distribution describes the distribution of adjacent-pair distance exactly, though it's not a very nice answer.
In particular, the $mathrm{Bernoulli}(frac2t)$ term has a $frac tn$ chance of being included for each adjacent pair, so we can think of this distribution as $2$ plus a sum of $n-2$ different $mathrm{Bernoulli}(frac2n)$ random variables. But this does not make the distribution $2 + mathrm{Binomial}(n-2, frac2n)$ because the $mathrm{Bernoulli}(frac2n)$'s are correlated: the $mathrm{Bernoulli}(frac2t)$ terms are not included independently.
As a starting point, the expected value of the distance between two adjacent nodes is $4(1 - frac1n)$.
If you traverse the shortest path from node $1$ to node $2$, then from node $2$ to node $3$, and so on until you take the shortest path from node $n$ to node $1$, then you end up using each edge of the tree exactly twice. (An edge created when we fused the subtree containing nodes ${i,i+1, dots, j-1,j}$ to another subtree is used when we go from $i-1$ to $i$, and again when we go from $j$ to $j+1$.)
There are $2n-1$ total nodes in the tree, and therefore there are $2n-2$ edges. So the total length of these shortest paths is $4n-4$. By symmetry (in the periodic-boundary version), each shortest path has the same distribution, so each shortest path has length $frac{4n-4}{n} = 4(1 - frac1n)$ in expectation.
Here's a not-very nice but exact description of the distribution.
Another useful view of "fusion binary trees" with the open boundary condition is as "fission binary trees" (an equally made-up term). Start with a tree that has $1$ node and no edges. Then, repeatedly choose a uniformly random leaf, and add a left child and a right child to that leaf. After $n-1$ steps, number all the leaves $1, 2, dots, n$ from left to right.
This model of random binary trees is equivalent, except that all the choices are made in reverse. We can prove this by induction. In your model, if we choose a random pair $(k,k+1)$ to fuse into one node, the rest of the tree is constructed from the vertices ${1, 2, dots, k-1, (k,k+1), k+2, dots, n}$, so it's a random $(n-1)$-leaf fusion binary tree. By the induction hypothesis, this is equivalent to a random $(n-1)$-leaf fission binary tree. Additionally, each adjacent pair $(k,k+1)$ was equally likely to be chosen to fuse first, so every leaf of the $(n-1)$-leaf tree is equally likely to be the fused one - and choosing which leaf that is is equivalent to fission of a random leaf of the $(n-1)$-leaf tree.
The fission model gives us a different numbering of adjacent pairs: instead of numbering them ${1,2}$, ${2,3}$, and so on through ${n-1,n}$, we can number them according to the step in the fission process when that pair was split up. That is, the step at which their closest common ancestor was chosen as the leaf to receive children. Each pair gets a number between $1$ and $n-1$.
This is useful, because we can talk about the distribution of the distance between the adjacent pair split up at step $k$. At each step after step $k$, there are $2$ possible leaves to choose which would contribute $1$ to that distance (starting from a baseline of $2$). So the distance between the adjacent pair has distribution
$$
2 + mathrm{Bernoulli}(tfrac2{k+1}) + mathrm{Bernoulli}(tfrac2{k+2}) + dots + mathrm{Bernoulli}(tfrac2{n-1})
$$
where the Bernoulli terms are independent. Similarly, the distance between the leftmost and rightmost vertex has the distribution
$$
2 + mathrm{Bernoulli}(tfrac2{2}) + mathrm{Bernoulli}(tfrac2{3}) + dots + mathrm{Bernoulli}(tfrac2{n-1})
$$
because (after the very first fission step) the distance between them increases by $1$ whenever the first or last vertex in the fission tree splits.
Now let's go back to the fusion binary tree with periodic boundary, since that has symmetry. We can obtain it from the fission binary tree model by picking a random $s$ between $1$ and $n$, and numbering the leaves at the bottom of the $n$-leaf fission tree as $s, s+1, dots, n, 1, dots, s-1$ from left to right.
Each adjacent pair has a $frac1n$ chance to be the pair split up at step $k$ in the fission process, and a $frac1n$ chance to end up being the pair ${s-1,s}$. So we get a mixture distribution of the $n$ possible Bernoulli sums given above. This mixture distribution describes the distribution of adjacent-pair distance exactly, though it's not a very nice answer.
In particular, the $mathrm{Bernoulli}(frac2t)$ term has a $frac tn$ chance of being included for each adjacent pair, so we can think of this distribution as $2$ plus a sum of $n-2$ different $mathrm{Bernoulli}(frac2n)$ random variables. But this does not make the distribution $2 + mathrm{Binomial}(n-2, frac2n)$ because the $mathrm{Bernoulli}(frac2n)$'s are correlated: the $mathrm{Bernoulli}(frac2t)$ terms are not included independently.
edited Nov 30 at 22:41
answered Nov 30 at 21:01
Misha Lavrov
43.2k555103
43.2k555103
Thank you! The simple argument about the average is very nice. I also like the "fission tree" representation, though I still need to think about it properly. Regardless, this is very helpful, so thanks again.
– derpy
Dec 3 at 16:13
add a comment |
Thank you! The simple argument about the average is very nice. I also like the "fission tree" representation, though I still need to think about it properly. Regardless, this is very helpful, so thanks again.
– derpy
Dec 3 at 16:13
Thank you! The simple argument about the average is very nice. I also like the "fission tree" representation, though I still need to think about it properly. Regardless, this is very helpful, so thanks again.
– derpy
Dec 3 at 16:13
Thank you! The simple argument about the average is very nice. I also like the "fission tree" representation, though I still need to think about it properly. Regardless, this is very helpful, so thanks again.
– derpy
Dec 3 at 16:13
add a comment |
Thanks for contributing an answer to Mathematics Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3016976%2fbinary-trees-constructed-from-the-bottom-up%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown