Confusing Sampling from observed data


























Suppose we are given a small set of data on bundles of electrical wires with increasing voltages run through them, and we note how many of the individual wires fail.



For example, suppose the data set has 6 observations; for each observation $i$ there are $w_i$ wires, a voltage $v_i$, and $f_i$ of the wires fail.



Suppose we are given part of this information, for example the following (note that each sample has a higher voltage than the previous one, and we see an increasing proportion of failed wires):



$w_{1}=14$ and $f_{1}=4$



$w_{2}=13$ and $f_{2}=4$



$w_{3}=7$ and $f_{3}=3$



$w_{4}=10$ and $f_{4}=5$



$w_{5}=12$ and $f_{5}=7$



$w_{6}=20$ and $f_{6}=13$



That is, we have a parameter space (where $t_i$ is the proportion of wires that fail in observation $i$) $\{t_i : t_1 < t_2 < t_3 < \dots < t_6 \le 1\}$, and we assume a flat prior over it.



My goal is to model this as a conditional distribution and sample from it, so that I can make statements about each $t_i$, such as its mean and standard deviation (e.g. from histograms), assuming a flat prior.



I know about sampling, but I am wondering how, from just this simple data, I can accurately form the conditional distribution, for example using rejection sampling or transformations, and then use Gibbs sampling to draw conclusions about the individual failure proportions.
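A minimal sketch of the Gibbs sampler described above, assuming (as suggested at the end of the question) that $f_i \sim \mathrm{Binomial}(w_i, t_i)$ independently, with the flat prior on $t_1 < \dots < t_6 \le 1$. Under these assumptions the full conditional of $t_i$ is proportional to $t_i^{f_i}(1-t_i)^{w_i-f_i}$ restricted to $(t_{i-1}, t_{i+1})$, i.e. a truncated $\mathrm{Beta}(f_i+1, w_i-f_i+1)$. Instead of drawing exactly from that truncated Beta, this sketch updates each coordinate with a univariate slice sampler (slice-within-Gibbs, a swapped-in technique), which still leaves the posterior invariant. The function names, seed and run lengths are illustrative choices.

```python
import math
import random

# Data from the question, in order of increasing voltage: w_i wires tested, f_i failures.
W = [14, 13, 7, 10, 12, 20]
F = [4, 4, 3, 5, 7, 13]


def log_cond(t, w, f):
    """Unnormalised log full conditional of t_i: binomial likelihood times a flat prior."""
    return f * math.log(t) + (w - f) * math.log1p(-t)


def slice_update(x, lo, hi, w, f, rng):
    """One slice-sampling move for t_i on (lo, hi), the interval allowed by its neighbours."""
    log_y = log_cond(x, w, f) - rng.expovariate(1.0)  # random height under the density at x
    a, b = lo, hi
    while True:
        x_new = rng.uniform(a, b)
        if not (a < x_new < b):
            continue  # guard against landing exactly on an endpoint
        if log_cond(x_new, w, f) >= log_y:
            return x_new
        # Rejected: shrink the bracket towards the current point and retry.
        if x_new < x:
            a = x_new
        else:
            b = x_new


def gibbs_ordered(n_iter=6000, burn_in=1000, seed=1):
    """Sweep over t_1..t_6, updating each within (t_{i-1}, t_{i+1}); keep post-burn-in sweeps."""
    rng = random.Random(seed)
    k = len(W)
    t = [(i + 1) / (k + 1) for i in range(k)]  # strictly increasing start inside (0, 1)
    draws = []
    for sweep in range(n_iter):
        for i in range(k):
            lo = t[i - 1] if i > 0 else 0.0
            hi = t[i + 1] if i < k - 1 else 1.0
            t[i] = slice_update(t[i], lo, hi, W[i], F[i], rng)
        if sweep >= burn_in:
            draws.append(t.copy())
    return draws


if __name__ == "__main__":
    draws = gibbs_ordered()
    for i in range(len(W)):
        xs = [d[i] for d in draws]
        m = sum(xs) / len(xs)
        sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))
        print(f"t_{i + 1}: posterior mean {m:.3f}, sd {sd:.3f}")
```

Histograms of each column of `draws` give the marginal posteriors of the $t_i$; every stored draw satisfies the ordering by construction.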



My thoughts:



It seems that the number of wires that fail is a function of the voltage: as the voltage increases, so does the proportion of failed wires.



Possibly I could use the rejection method to sample from the distribution that generates this?



So I would want to find some function $g(x)$ such that $g(x) \ge f(x)$ for all $x$, then simulate uniform random variables and check the acceptance condition.
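One way the rejection idea can be made exact here, sketched under the assumption of independent binomial likelihoods $f_i \sim \mathrm{Binomial}(w_i, t_i)$ and the flat prior on $t_1 < \dots < t_6 \le 1$: take the envelope $g$ to be the unconstrained posterior, a product of independent $\mathrm{Beta}(f_i+1, w_i-f_i+1)$ densities. The ordered target equals $g$ times the indicator of the ordering, so $g \ge f$ everywhere and a proposal is accepted exactly when it happens to be ordered. The function name, seed and `max_tries` cap are illustrative.

```python
import random

# Data from the question, in order of increasing voltage.
W = [14, 13, 7, 10, 12, 20]  # wires tested
F = [4, 4, 3, 5, 7, 13]      # wires that failed


def sample_ordered_posterior(n_keep, seed=0, max_tries=10_000_000):
    """Rejection sampler for the flat-prior posterior restricted to t_1 < ... < t_6.

    Envelope g: the unconstrained posterior, a product of independent
    Beta(f_i + 1, w_i - f_i + 1) densities. The target is g times the
    indicator of the ordering, so the acceptance ratio is 1 if ordered, else 0.
    """
    rng = random.Random(seed)
    kept, tries = [], 0
    while len(kept) < n_keep and tries < max_tries:
        tries += 1
        t = [rng.betavariate(f + 1, w - f + 1) for w, f in zip(W, F)]
        if all(a < b for a, b in zip(t, t[1:])):
            kept.append(t)
    return kept, tries


if __name__ == "__main__":
    kept, tries = sample_ordered_posterior(1000)
    print(f"acceptance rate {len(kept) / tries:.3f}")
    for i in range(len(W)):
        xs = [t[i] for t in kept]
        print(f"t_{i + 1}: posterior mean {sum(xs) / len(xs):.3f}")
```

The catch is efficiency: the acceptance rate is the posterior probability that independent draws come out ordered, which shrinks quickly as observations are added or their posteriors overlap more.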



However, as of now I don't have a distribution. I guess I could sketch one by hand from the points, with $1, 2, \dots, 6$ on the x axis and the corresponding failure proportion on the y axis.



I know that for a distribution, the probabilities need to sum/integrate to 1.



The probabilities here, I assume, would be the probability that a certain proportion fails. So for $n$ wires, we would have a probability that a proportion $p_1 = \frac1n$ fails, a probability that a proportion $p_2 = \frac2n$ fails, and so on up to the probability that all wires fail.



So it looks more like a CDF in the voltage: writing it as a function, $F(v_1) = \frac{4}{14}$, $F(v_2) = \frac{4}{13}$, and so forth, and with an unlimited sample we would have $F(v_n) \to 1$ as $n \to \infty$.



and I suppose then $F^{-1}=f$ would be our density, but I am still not sure how to do this in the finite case.



Issues: we are not told anything about the underlying distribution, its parameters, or its form; only the data are given. So do we take the given data as the initial values?



I was thinking I could simply assume that the failures follow a binomial distribution, with the binomial parameter following some other distribution such as a beta. How does that sound? Would we then also need to put a distribution on the $w_i$? I would be okay trying it without that distribution, but I want to understand how I can make the failure probability increase.
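A minimal sketch of the binomial/beta idea, treating the $w_i$ as fixed, known sample sizes (so no distribution on them) and ignoring the ordering constraint $t_1 < \dots < t_6$. With a flat $\mathrm{Beta}(1,1)$ prior and $f_i \sim \mathrm{Binomial}(w_i, t_i)$, the posterior of each $t_i$ is the conjugate $\mathrm{Beta}(f_i+1,\, w_i-f_i+1)$, whose mean and standard deviation are available in closed form. The helper name is illustrative.

```python
import math

# Data from the question: wires tested and wires failed per observation.
W = [14, 13, 7, 10, 12, 20]
F = [4, 4, 3, 5, 7, 13]


def beta_posterior(w, f):
    """Flat Beta(1, 1) prior + Binomial(w, t) likelihood -> Beta(f + 1, w - f + 1) posterior."""
    a, b = f + 1, w - f + 1
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return a, b, mean, math.sqrt(var)


for i, (w, f) in enumerate(zip(W, F), start=1):
    a, b, m, s = beta_posterior(w, f)
    print(f"t_{i} ~ Beta({a}, {b}): mean {m:.3f}, sd {s:.3f}")
```

Because the ordering is not imposed, nothing here forces the failure probability to increase with voltage; that requires restricting the joint posterior to the ordered set.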



Any advice, ideas and answers are much appreciated.


































    This is probably better asked on cross-validated since it's pretty technical statistics. stats.stackexchange.com
    – Ethan Bolker
    Nov 30 at 0:28










  • I guess some sort of logistic regression (or other generalized linear model) could help as you are modeling a probability as a function of other independent variables. After you estimate the values, simulation is easy as you said they are conditional binomial.
    – BGM
    Nov 30 at 3:33










  • Is there a problem with the simple binomial model?
    – Mike Hawk
    Dec 4 at 15:02










  • Can you show the voltages?
    – Yuri Negometyanov
    Dec 4 at 19:23










  • voltages are unknown
    – Learning
    Dec 4 at 19:55























probability statistics bayesian conditional-probability sampling














edited Dec 1 at 22:00

























asked Nov 26 at 0:25









Learning



















2 Answers
































$\textbf{Edit of 06.12.2018}$



Let us consider the third observation ($w_3=7,\quad f_3=3$).



The binomial distribution can be presented as the table of values
$$P(w,f,p)=\binom wf p^f(1-p)^{w-f},\quad f=0,1,\dots,w,\tag1$$
$$\begin{vmatrix}
f & P(w_3,f,p) & P_i\left(w_3,f,\dfrac{f_3}{w_3}\right) & P_F(w_3,f)\\
0 & (1-p)^7 & 0.0198945 & 0.0512821\\
1 & 7p(1-p)^6 & 0.104446 & 0.130536\\
2 & 21p^2(1-p)^5 & 0.235004 & 0.195804\\
3 & 35p^3(1-p)^4 & 0.293755 & 0.217560\\
4 & 35p^4(1-p)^3 & 0.220316 & 0.190365\\
5 & 21p^5(1-p)^2 & 0.0991424 & 0.130536\\
6 & 7p^6(1-p) & 0.0247856 & 0.0652681\\
7 & p^7 & 0.0026556 & 0.018648
\end{vmatrix}\tag2$$

where $p$ is the unknown probability that a single wire fails in a test.



There are two main ways to obtain $p(w_3,f_3).$



The first way, MLM (maximum likelihood method), determines $p$ as the observed frequency
$$p(w_3,f_3) = \dfrac{f_3}{w_3},\tag4$$
(see also the Wolfram Alpha plot of the distribution)



[Plot: binomial distribution for $w_3=7$, $p=3/7$]



The second way is Fisher's fiducial approach, in which $p$ is treated as a random variable with density
$$f_F(w_i,f_i,p) = C\,P(w_i,f_i,p) = C\binom {w_i}{f_i} p^{f_i}(1-p)^{w_i-f_i},\tag5$$
where the constant $C$ is found from the normalization condition
$$\int\limits_0^1 f_F(w_i,f_i,p)\,\mathrm dp = 1.$$
For $i=3$,
$$f_F(w_3,f_3,p) = C_3P(w_3,f_3,p)= C_3\cdot35p^3(1-p)^4,\tag6$$
$$C_3=\dfrac1{\int\limits_0^1 P(w_3,f_3,p)\,\mathrm dp} = \dfrac1{\int\limits_0^1 35p^3(1-p)^4\,\mathrm dp}=8\tag7$$
(see also Wolfram Alpha).
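The value $C_3=8$ can be checked by hand with the Beta function, $\mathrm B(a,b)=\int_0^1 p^{a-1}(1-p)^{b-1}\,\mathrm dp=\frac{(a-1)!\,(b-1)!}{(a+b-1)!}$ for positive integers. The second line is a standard identity added here for convenience, not part of the original derivation; it shows that in general $C_i = w_i+1$:

$$\int\limits_0^1 35\,p^3(1-p)^4\,\mathrm dp = 35\,\mathrm B(4,5) = 35\cdot\frac{3!\,4!}{8!} = \frac{35\cdot 144}{40320} = \frac18,$$
$$\int\limits_0^1 \binom wf p^f(1-p)^{w-f}\,\mathrm dp = \binom wf\frac{f!\,(w-f)!}{(w+1)!} = \frac1{w+1}\quad\Longrightarrow\quad C_i = w_i+1.$$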



Therefore,
$$f_F(w_3,f_3,p) = 8\binom 73p^3(1-p)^4 = 280p^3(1-p)^4,\tag8$$
and the distribution $(1)$ becomes
$$P_F(w_3,f)= \int\limits_0^1 \binom{w_3}f p^f(1-p)^{w_3-f} f_F(w_3,f_3,p)\,\mathrm dp,\quad f=0,1,\dots,7\tag9$$
(see also the Wolfram Alpha plot of the distribution)



[Plot: fiducial distribution $P_F(w_3,f)$]
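The last two columns of table (2) can be recomputed with exact rational arithmetic. In this sketch the integral in $(9)$ is evaluated in closed form: the fiducial density $(8)$ is a $\mathrm{Beta}(f_3+1,\,w_3-f_3+1)$ density, so $P_F(w_3,f)=\binom{w_3}{f}\,\mathrm B(f+a,\,w_3-f+b)/\mathrm B(a,b)$ with $a=f_3+1$, $b=w_3-f_3+1$ (a beta-binomial distribution). The helper names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def beta_fn(a, b):
    """Euler Beta function B(a, b) for positive integers a, b, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def binomial_pmf(w, f, p):
    """Equation (1): probability of f failures out of w wires, each failing with probability p."""
    return comb(w, f) * p**f * (1 - p) ** (w - f)


def fiducial_predictive(w_new, w_obs, f_obs):
    """Equation (9): binomial pmf for w_new wires averaged over the fiducial density
    (w_obs + 1) * C(w_obs, f_obs) * p^f_obs * (1 - p)^(w_obs - f_obs)."""
    a, b = f_obs + 1, w_obs - f_obs + 1  # that density is Beta(a, b)
    return [comb(w_new, f) * beta_fn(f + a, w_new - f + b) / beta_fn(a, b)
            for f in range(w_new + 1)]


p_hat = Fraction(3, 7)  # MLM estimate f_3 / w_3
mlm = [binomial_pmf(7, f, p_hat) for f in range(8)]
fid = fiducial_predictive(7, 7, 3)
for f in range(8):
    print(f, f"{float(mlm[f]):.7f}", f"{float(fid[f]):.7f}")
```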



This approach looks more rigorous, because it takes the number of wires $w_i$ into account.



The expectation $E(f)$ can be calculated as
$$E(f) = \sum_{f=0}^w fP(f),$$
and the variance $V(f)$ as
$$V(f) = \sum_{f=0}^w (f-E(f))^2 P(f).$$



The information obtained about the parameter $p$ gives the distribution law for any $w$. For $w=20$, the plot of the calculated distribution for the first way is



[Plot: MLM, $p=3/7$, $w=20$]



and for the second one is



[Plot: fiducial, $w=20$]



This allows the probability distributions of observations with different sample sizes to be compared on a common footing.



$$\begin{vmatrix}
i & w_i & f_i & F_i & f_{Fi} & E\left(20,\frac{f_i}{w_i}\right) & V\left(20,\frac {f_i}{w_i}\right) & E_F(20,p) & V_F(20,p) \\
1 & 14 & 4 & \dfrac27 & 15015p^4(1-p)^{10} & \dfrac{40}7 & \dfrac{200}{49} & \dfrac{25}4 & \dfrac{2475}{272}\\
2 & 13 & 4 & \dfrac4{13} & 10010p^4(1-p)^{9} & \dfrac{80}{13} & \dfrac{720}{169} & \dfrac{20}3 & \dfrac{175}{18}\\
3 & 7 & 3 & \dfrac37 & 280p^3(1-p)^{4} & \dfrac{60}7 & \dfrac{240}{49} & \dfrac{80}9 & \dfrac{1160}{81}\\
4 & 10 & 5 & \dfrac12 & 2772p^5(1-p)^{5} & 10 & 5 & 10 & \dfrac{160}{13}\\
5 & 12 & 7 & \dfrac7{12} & 10296p^7(1-p)^{5} & \dfrac{35}3 & \dfrac{175}{36} & \dfrac{80}7 & \dfrac{544}{49}\\
6 & 20 & 13 & \dfrac{13}{20} & 1627920p^{13}(1-p)^{7} & 13 & \dfrac{91}{20} & \dfrac{140}{11} & \dfrac{23520}{2783}\\
\end{vmatrix}\tag{10}$$
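The entries of table (10) can be recomputed exactly with rational arithmetic. This sketch uses $E = np$, $V = np(1-p)$ with $p=f_i/w_i$ for the MLM columns, and for the fiducial columns the closed-form beta-binomial moments $E_F = na/(a+b)$ and $V_F = nab(a+b+n)/\big((a+b)^2(a+b+1)\big)$ with $a=f_i+1$, $b=w_i-f_i+1$; those moment formulas are standard results supplied here, not taken from the answer. For observation 1 they give $V_F = 2475/272$. The coefficient of $f_{Fi}$ is $(w_i+1)\binom{w_i}{f_i}$.

```python
from fractions import Fraction
from math import comb

W = [14, 13, 7, 10, 12, 20]
F = [4, 4, 3, 5, 7, 13]
N = 20  # new sample size used in table (10)


def mlm_moments(n, w, f):
    """Binomial mean and variance with p fixed at the frequency f / w."""
    p = Fraction(f, w)
    return n * p, n * p * (1 - p)


def fiducial_moments(n, w, f):
    """Mean and variance of the beta-binomial with a = f + 1, b = w - f + 1."""
    a, b = f + 1, w - f + 1
    s = a + b
    mean = Fraction(n * a, s)
    var = Fraction(n * a * b * (s + n), s * s * (s + 1))
    return mean, var


for i, (w, f) in enumerate(zip(W, F), start=1):
    coeff = (w + 1) * comb(w, f)  # leading constant of f_{Fi}
    print(i, coeff, *mlm_moments(N, w, f), *fiducial_moments(N, w, f))
```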

$\mathbf{Observation\ 1\quad w_1=14\quad f_1=4}$

MLM plot: [Plot: MLM 4/14]

Fiducial plot: [Plot: Fiducial 4/14]

$\mathbf{Observation\ 2\quad w_2=13\quad f_2=4}$

MLM plot: [Plot: MLM 4/13]

Fiducial plot: [Plot: Fiducial 4/13]

$\mathbf{Observation\ 3\quad w_3=7\quad f_3=3}$

MLM plot: [Plot: MLM 3/7]

Fiducial plot: [Plot: Fiducial 3/7]

$\mathbf{Observation\ 4\quad w_4=10\quad f_4=5}$

MLM plot: [Plot: MLM 5/10]

Fiducial plot: [Plot: Fiducial 5/10]

$\mathbf{Observation\ 5\quad w_5=12\quad f_5=7}$

MLM plot: [Plot: MLM 7/12]

Fiducial plot: [Plot: Fiducial 7/12]

$\mathbf{Observation\ 6\quad w_6=20\quad f_6=13}$

MLM plot: [Plot: MLM 13/20]

Fiducial plot: [Plot: Fiducial 13/20]



Analysis of the graphs shows that as the sample size increases, the results of the two methods converge.



















































    A sketch will help summarize the terms of the problem.



    [Sketch: Wire_Insulation_1]



    We have a production of wires whose insulation resistance is spread over a range of voltages,
    following a certain PDF with the corresponding CDF.



    We set a voltage $V_k$ in that range, take a relatively small sample of wires
    of size $w_k$ (which varies from test to test), and record the number $f_k$ of wires that fail.



    The $w_k$ wires will have a distribution of breaking voltages which ideally follows
    the population CDF; that is, if we divide the vertical (probability) range into
    $w_k$ equal intervals, we would expect to find one wire in each (placed at its center).

    In other words, the elements projected onto the vertical scale follow a uniform
    probability density on the $[0,1]$ interval.



    We then assign to $V_k$ a value $P'_k$ of the CDF, corresponding to the boundary
    between failed and non-failed wires, as indicated in the sketch ($0.4$ in the example shown).



    Now, with respect to the underlying population distribution, which corresponds to a huge sample,
    a small sample introduces two kinds of error:

    - a "discretization" error, because of the gap between the failed and the surviving wires;

    - a "sampling" error, because the sample will deviate from an exact uniform distribution.



    We can combine the two by asking ourselves:

    given $w_k$ elements from a uniform distribution on $[0,1]$, of which $f_k$ failed the test, what is the probability that
    one of the failed elements lies at the threshold $P'_k$ ($0 \le P'_k \le 1$), the remaining $f_k-1$ lie below it, and the other $w_k-f_k$ lie above it?



    This can be expressed as
    $$ \bbox[lightyellow] {
    p(P'_k)\,dP'_k = w_k\,dP'_k \binom{w_k - 1}{f_k - 1} {P'_k}^{f_k - 1} \left(1 - P'_k\right)^{w_k - f_k}
    }$$



    It is easy to check, through the expression of the Beta function,
    that the integral of the above is $1$.

    In fact $p(P'_k)$ is the PDF of a Beta distribution,
    $$ \bbox[lightyellow] {
    p(P'_k) = \mathrm{Beta}\left(f_k,\, w_k - f_k + 1\right)
    }$$

    because
    $$
    w\binom{w-1}{f-1}
    = w\,\frac{\Gamma(w)}{\Gamma(f)\,\Gamma(w-f+1)}
    = \frac{\Gamma(w+1)}{\Gamma(f)\,\Gamma(w+1-f)} = \frac{1}{\mathrm B(f,\,w-f+1)}
    $$
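The claim that the threshold, i.e. the $f$-th smallest of $w$ independent uniforms, follows $\mathrm{Beta}(f,\,w-f+1)$ can be checked by simulation. A minimal Monte Carlo sketch for the third observation of the question ($w=7$, $f=3$); the number of repetitions and the seed are arbitrary choices.

```python
import random


def threshold_samples(w, f, n_rep=20000, seed=0):
    """Sort w Uniform(0, 1) draws n_rep times and keep the f-th smallest each time."""
    rng = random.Random(seed)
    return [sorted(rng.random() for _ in range(w))[f - 1] for _ in range(n_rep)]


w, f = 7, 3  # third observation
xs = threshold_samples(w, f)
mc_mean = sum(xs) / len(xs)
mc_var = sum((x - mc_mean) ** 2 for x in xs) / (len(xs) - 1)
a, b = f, w - f + 1  # Beta(f, w - f + 1) from the derivation above
print(f"mean: simulated {mc_mean:.4f}, Beta {a / (a + b):.4f}")
print(f"var:  simulated {mc_var:.5f}, Beta {a * b / ((a + b) ** 2 * (a + b + 1)):.5f}")
```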



    Note that the cited reference states that
    "the beta distribution is a suitable model for the random behavior of percentages and proportions."



    In the construction above, we set the threshold $P'_k$ to coincide with the failed element of highest resistance.
    Actually there is a gap between this element and the first good one (the surviving element with the lowest resistance), so the threshold could be
    moved up to the latter. That is equivalent to choosing $\mathrm{Beta}\left(f_k+1,\, w_k - f_k\right)$.



    So, if no further sophistication is needed, we can place the threshold halfway across the gap, i.e. take
    $$ \bbox[lightyellow] {
    p(P'_k) = \mathrm{Beta}\left(f_k + 1/2,\, w_k - f_k + 1/2\right)
    }$$

    which gives a mean and variance of
    $$ \bbox[lightyellow] {
    E\left(P'_k\right) = \frac{f_k + 1/2}{w_k + 1}\quad \mathrm{var}\left(P'_k\right)
    = \frac{\left(f_k + 1/2\right)\left(w_k - f_k + 1/2\right)}{\left(w_k + 1\right)^2\left(w_k + 2\right)}
    }$$



    This mean is the value to assign to $P'_k$, with an "error" around it described by the Beta distribution.
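Applied to the six observations in the question, the mid-gap estimate and its spread follow directly from the $\mathrm{Beta}(f_k+1/2,\,w_k-f_k+1/2)$ formulas above. A minimal sketch; the helper name is illustrative, and since the voltages $V_k$ are unknown the points $(V_k, P'_k)$ cannot be formed here.

```python
import math

# Data from the question: wires tested (w_k) and wires failed (f_k).
W = [14, 13, 7, 10, 12, 20]
F = [4, 4, 3, 5, 7, 13]


def midgap_threshold(w, f):
    """Mean and sd of P'_k ~ Beta(f + 1/2, w - f + 1/2)."""
    a, b = f + 0.5, w - f + 0.5
    mean = a / (a + b)                          # = (f + 1/2) / (w + 1)
    var = a * b / ((a + b) ** 2 * (a + b + 1))  # = (f+1/2)(w-f+1/2) / ((w+1)^2 (w+2))
    return mean, math.sqrt(var)


for k, (w, f) in enumerate(zip(W, F), start=1):
    m, s = midgap_threshold(w, f)
    print(f"k={k}: P'_k = {m:.3f} +/- {s:.3f}")
```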



    After that, you can perform a regression on the resulting points $(V_k, P'_k)$, or fit a distribution,
    to estimate the underlying population CDF.





























      Your Answer





      StackExchange.ifUsing("editor", function () {
      return StackExchange.using("mathjaxEditing", function () {
      StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
      StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
      });
      });
      }, "mathjax-editing");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "69"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      noCode: true, onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3013626%2fconfusing-sampling-from-observed-data%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      2 Answers
      2






      active

      oldest

      votes








      2 Answers
      2






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      0














      $textbf{Edition of 06.12.2018}$



      Let us consider the third observation ($w_3=7,quad f_3=3$).



      The binomial distribution can be presented as the table of values
      $$P(w,f,p)=binom wf p^f(1-p)^{w-f},quad f=0,1,dots,w,tag1$$
      $$begin{vmatrix}
      f & P(w_3,f,p) & P_ileft(w_3,f,dfrac{f_3}{w_3}right) & P_F(w_3,f)\
      0 & (1-p)^7 & 0.0198945 & 0.0512821\
      1 & 7p(1-p)^6 & 0.104446 & 0.130536\
      2 & 21p^2(1-p)^5 & 0.235004 & 0.195804\
      3 & 35p^3(1-p)^7 & 0.293755 & 0.217560\
      4 & 35p^4(1-p)^3 & 0.220316 & 0.190365\
      5 & 21p^5(1-p)^2 & 0.0991424 & 0.130536\
      6 & 7p^6(1-p) & 0.0247856 & 0.0652681\
      7 & p^7 & 0.0026556 & 0.018648
      end{vmatrix}tag2$$

      where $p$ is unknown probability of the fail result in the single test.



      There are two main ways to obtain $p(w_3,f_3).$



      The first way MLM (maximum likelihood method) is to determine $p$ as the frequency
      $$p(w_3,f_3) = dfrac{f_3}{w_3},tag4$$
      (see also Wolfram Alpha plot of distribution)



      p=3/7



      The second way is Fiducial (Fisher) approach, when $p$ considers as the random value, the distribution function of which is
      $$f_F(w_i,f_i,p) = CP(f_i,p) = Cbinom {w_i}{f_i} p^{f_i}(1-p)^{w_i-f_i},tag5$$
      where the constant $C$ should be found from the condition
      $$intlimits_0^1 f_F(w_i,f_i,p),mathrm dp = 1,$$
      For $i=3$
      $$f_F(w_3,f_3,p) = C_3P(w_3,f_3,p)= C_3cdot35p^3(1-p)^4,tag6$$
      $$C_3=dfrac1{intlimits_0^1 P(w_3,f_3,p),mathrm dp} = dfrac1{intlimits_0^1 35p^3(1-p)^4,mathrm dp}=8tag7$$
      (see also Wolfram Alpha).



      Therefore,
      $$f_F(w_3,f_3,p) = 8binom 73p^3(1-p)^4 = 280p^3(1-p)^4,tag8$$
      and the distribution $(1)$ changes to
      $$P_F(w_3,f,p)= intlimits_0^1 binom{w_3}f p^f(1-p)^{w_3-f} f_F(w_3,f_3,p),mathrm dp,quad f=0,1,dots,7tag9$$
      (see also Wolfram Alpha plot of distribution)



      fiducial



      This approach looks more strict, because it takes in account parameter $w_i.$



      The expectation $E(f)$ can be calculated as



      $$E(f) = sum_{f=0}^w fP(f),$$
      and variance $V(f)$ - as
      $$V(f) = sum_{f=0}^w (f-E(f))^2 P(f)$$



      The obtained information about parameter $p$ allows to get the distributions law for any $w.$ For $w_3=20$ the plot of the calculated distributions for the first way is



      p=3/7 w=20



      and for the second one is



      Fiducial w=20



      This allow comparing the probability distributions under observations with inhomogeneous statistics.



      $$begin{vmatrix}
      i & w_i & f_i & F_i & f_{Fi} & Eleft(20,frac{f_i}{w_i}right) & Vleft(20,frac {f_i}{w_i}right) & E_F(20,p) & V_F(20,p) \
      1 & 14 & 4 & dfrac27 & 15015p^4(1-p)^{10} & dfrac{40}7 & dfrac{200}{49} & dfrac{25}4 & dfrac{2475}{72}\
      2 & 13 & 4 & dfrac4{13} & 10010p^4(1-p)^{9} & dfrac{80}{13} & dfrac{720}{169} & dfrac{20}3 & dfrac{175}{18}\
      3 & 7 & 3 & dfrac37 & 280p^3(1-p)^{4} & dfrac{60}7 & dfrac{240}{49} & dfrac{80}9 & dfrac{1160}{81}\
      4 & 10 & 5 & dfrac12 & 2772p^5(1-p)^{5} & 10 & 5 & 10 & dfrac{160}{13}\
      5 & 12 & 7 & dfrac7{12} & 10296p^7(1-p)^{5} & dfrac{35}3 & dfrac{175}{36} & dfrac{80}7 & dfrac{544}{49}\
      6 & 20 & 13 & dfrac{13}{20} & 1627920p^{13}(1-p)^{7} & 13 & dfrac{91}{20} & dfrac{140}{11} & dfrac{23520}{2783}\
      end{vmatrix}tag{10}$$

      $mathbf{Observation 1quad w_1=14quad f_1=4}$



      MLM plot:



      MLM 4/14



      Fiducial plot:



      Fiducial 4/14



      $mathbf{Observation 2quad w_2=13quad f_2=4}$



      MLM plot:



      MLM 4/13



      Fiducial plot:



      Fiducial 4/13



      $\mathbf{Observation\ 3\quad w_3=7\quad f_3=3}$

      MLM plot:

      [Figure: MLM 3/7]

      Fiducial plot:

      [Figure: Fiducial 3/7]



      $\mathbf{Observation\ 4\quad w_4=10\quad f_4=5}$

      MLM plot:

      [Figure: MLM 5/10]

      Fiducial plot:

      [Figure: Fiducial 5/10]



      $\mathbf{Observation\ 5\quad w_5=12\quad f_5=7}$

      MLM plot:

      [Figure: MLM 7/12]

      Fiducial plot:

      [Figure: Fiducial 7/12]



      $\mathbf{Observation\ 6\quad w_6=20\quad f_6=13}$

      MLM plot:

      [Figure: MLM 13/20]

      Fiducial plot:

      [Figure: Fiducial 13/20]



      The graphs show that, as the sample size grows, the results of the two methods converge.
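The convergence can be illustrated, though not proved, by a sketch with hypothetical scaled data. Keep the frequency of observation 3 fixed at $3/7$ and multiply the sample size by a factor $k$. The gap between the fiducial and MLM moments for a new test of $w=20$ wires then shrinks. For the mean, the gap is exactly $20/(49k+14)$. The closed-form beta-binomial moments are an assumption carried over from the standard Beta/binomial results, and the function name `moments` is hypothetical.

```python
from fractions import Fraction

def moments(w, f, n=20):
    """((MLM mean, MLM variance), (fiducial mean, fiducial variance)) for n new wires."""
    p = Fraction(f, w)
    a, b = f + 1, w - f + 1          # fiducial density of p is Beta(a, b)
    s = a + b
    mlm = (n * p, n * p * (1 - p))
    fid = (Fraction(n * a, s), Fraction(n * a * b * (s + n), s * s * (s + 1)))
    return mlm, fid

# Hypothetical scale factors k applied to observation 3 (frequency kept at 3/7)
for k in (1, 10, 100, 1000):
    mlm, fid = moments(7 * k, 3 * k)
    print(k, float(fid[0] - mlm[0]), float(fid[1] - mlm[1]))
```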














          edited Dec 8 at 22:21

























          answered Dec 5 at 20:42









          Yuri Negometyanov






































              A sketch will help to summarize the terms of the problem.



              [Figure: Wire_Insulation_1]



              We have a production of wires whose insulation
              resistance is spread over a range of voltages, with a certain PDF and corresponding CDF.



              We set a voltage $V_k$ within that range, take a relatively small sample of wires
              of size $w_k$ (variable for each test), and record the number $f_k$ of wires that fail.



              The $w_k$ wires will have a distribution of breakdown voltages that ideally follows
              the population CDF: if the vertical range of probability is divided into
              $w_k$ equal intervals, we would expect to find one wire in each (placed at its center).

              That is to say, the elements projected onto the vertical scale follow a uniform
              probability density
              on the $[0,1]$ interval.
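This uniformity is the probability integral transform. The following is a quick Monte Carlo sketch in Python. Purely for illustration, it assumes a Weibull population of breakdown voltages with scale 1000 V and shape 3. These parameter values are hypothetical and do not come from the question, and any continuous distribution would behave the same way.

```python
import math
import random

# Hypothetical population of breakdown voltages: Weibull, scale 1000 V, shape 3
SCALE, SHAPE = 1000.0, 3.0

def cdf(v):
    """Population CDF of the breakdown voltage (Weibull)."""
    return 1.0 - math.exp(-((v / SCALE) ** SHAPE))

rng = random.Random(1)
volts = [rng.weibullvariate(SCALE, SHAPE) for _ in range(100_000)]
u = [cdf(v) for v in volts]  # projections onto the vertical (probability) scale

# Ten equal bins on [0, 1] should each hold about 10% of the wires
counts = [0] * 10
for x in u:
    counts[min(int(x * 10), 9)] += 1
print([c / len(u) for c in counts])
```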



              We then assign to $V_k$ a CDF value $P'_k$, corresponding to the boundary
              between failed and not-failed wires as indicated in the sketch ($0.4$ in the example shown).



              Now, with respect to the underlying population distribution (which corresponds to a huge sample),
              a small sample introduces two kinds of error:

              - a "discretization" error, because of the gap between the failed and the surviving wires;

              - a "sampling" error, because the sample deviates from an exact uniform distribution.



              We can combine the two by asking ourselves:

              given $w_k$ elements from a uniform distribution on $[0,1]$, $f_k$ of which failed the test, what is the probability that
              one of the failed elements lies at the threshold $P'_k$ ($0 \le P'_k \le 1$), the remaining $f_k-1$ below it, and the other $w_k-f_k$ above it?



              That is clearly expressible as
              $$ \bbox[lightyellow] {
              p(P'_{\,k} )\,dP'_{\,k} = w_{\,k} \,dP'_{\,k} \left( \matrix{ w_{\,k} - 1 \cr
              f_{\,k} - 1 \cr} \right) {P'_{\,k}} ^{f_{\,k} - 1} \left( {1 - P'_{\,k} } \right)^{w_{\,k} - f_{\,k} }
              }$$



              Using the expression of the Beta function, it is easy to check
              that the integral of the above over $[0,1]$ correctly gives $1$.
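The following Python sketch checks this normalization for the six observations of the question in two ways. The first is exact, through $B(f,\,w-f+1)$. The second is a midpoint-rule integral. The helper names `beta_fn`, `density` and `midpoint_integral` are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

DATA = [(14, 4), (13, 4), (7, 3), (10, 5), (12, 7), (20, 13)]  # (w_k, f_k)

def beta_fn(a: int, b: int) -> Fraction:
    """Exact B(a, b) = (a-1)!(b-1)!/(a+b-1)! for positive integers a, b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

def density(x: float, w: int, f: int) -> float:
    """p(P') = w * C(w-1, f-1) * P'^(f-1) * (1 - P')^(w-f)."""
    return w * comb(w - 1, f - 1) * x ** (f - 1) * (1 - x) ** (w - f)

def midpoint_integral(w: int, f: int, n: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of the density over [0, 1]."""
    return sum(density((i + 0.5) / n, w, f) for i in range(n)) / n

for w, f in DATA:
    exact = w * comb(w - 1, f - 1) * beta_fn(f, w - f + 1)  # exactly 1
    print(w, f, exact, round(midpoint_integral(w, f), 9))
```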

              In fact $p(P'_{\,k} )$ is a Beta distribution PDF
              $$ \bbox[lightyellow] {
              p(P'_{\,k} ) = {\rm Beta}\left( {f_{\,k} ,\,w_{\,k} - f_{\,k} + 1} \right)
              }$$

              because
              $$
              w\left( \matrix{ w - 1 \cr f - 1 \cr} \right)
              = w{{\Gamma \left( w \right)} \over {\Gamma \left( f \right)\Gamma \left( {w - f + 1} \right)}}
              = {{\Gamma \left( {w + 1} \right)} \over {\Gamma \left( f \right)\Gamma \left( {w + 1 - f} \right)}} = {1 \over {{\rm B}\left( {f,w - f + 1} \right)}}
              $$
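The order-statistic argument can also be checked by simulation. This sketch assumes observation 3 ($w=7$, $f=3$). The $f$-th smallest of $w$ uniform draws should have the ${\rm Beta}(f,\,w-f+1)$ mean $f/(w+1)$ and variance $f(w-f+1)/\big((w+1)^2(w+2)\big)$. The function name `threshold_samples` is hypothetical.

```python
import random

def threshold_samples(w, f, trials, rng):
    """Monte Carlo draws of the f-th smallest of w Uniform(0, 1) values,
    i.e. the failed element sitting exactly at the threshold P'_k."""
    samples = []
    for _ in range(trials):
        u = sorted(rng.random() for _ in range(w))
        samples.append(u[f - 1])
    return samples

rng = random.Random(0)
w, f = 7, 3                     # observation 3 of the question
xs = threshold_samples(w, f, 100_000, rng)
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)

a, b = f, w - f + 1             # Beta(f, w - f + 1) parameters
print(mean, a / (a + b))                          # simulated vs exact mean
print(var, a * b / ((a + b) ** 2 * (a + b + 1)))  # simulated vs exact variance
```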



              Note that the cited reference states that
              "the beta distribution is a suitable model for the random behavior of percentages and proportions."



              In the construction above, we have set the threshold $P'_k$ to coincide with the failed element of highest resistance.
              Actually, there is a gap between this element and the first good one (the surviving element with the lowest resistance), so the threshold could be
              moved up to the latter. That is equivalent to choosing a ${\rm Beta}\left( {f_{\,k}+1 ,\,w_{\,k} - f_{\,k}} \right)$.



              So, if no more sophistication is needed, we can place the threshold halfway across the gap, i.e. take
              $$ \bbox[lightyellow] {
              p(P'_{\,k} ) = {\rm Beta}\left( {f_{\,k} + 1/2,\,w_{\,k} - f_{\,k} + 1/2} \right)
              }$$

              which gives a mean and variance of
              $$ \bbox[lightyellow] {
              E\left( {P'_{\,k} } \right) = {{f_{\,k} + 1/2} \over {w_{\,k} + 1}}\quad {\rm var}\left( {P'_{\,k} } \right)
              = {{\left( {f_{\,k} + 1/2} \right)\left( {w_{\,k} - f_{\,k} + 1/2} \right)} \over {\left( {w_{\,k} + 1} \right)^{\,2} \left( {w_{\,k} + 2} \right)}}
              }$$
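For the six observations in the question, the mid-gap choice can be tabulated and sampled in a few lines. This Python sketch computes exact means and variances from the formulas above. It also draws 10,000 values per observation with `random.betavariate`, which can be used for histograms. The function name `mid_gap_moments` is hypothetical.

```python
import random
from fractions import Fraction

DATA = [(14, 4), (13, 4), (7, 3), (10, 5), (12, 7), (20, 13)]  # (w_k, f_k)

def mid_gap_moments(w, f):
    """Exact mean and variance of Beta(f + 1/2, w - f + 1/2); note a + b = w + 1."""
    a, b = Fraction(2 * f + 1, 2), Fraction(2 * (w - f) + 1, 2)
    mean = a / (w + 1)
    var = a * b / ((w + 1) ** 2 * (w + 2))
    return mean, var

rng = random.Random(42)
for k, (w, f) in enumerate(DATA, start=1):
    mean, var = mid_gap_moments(w, f)
    # 10,000 independent draws of P'_k, e.g. for a histogram
    draws = [rng.betavariate(f + 0.5, w - f + 0.5) for _ in range(10_000)]
    print(k, float(mean), float(var) ** 0.5, sum(draws) / len(draws))
```

Each $P'_k$ is treated independently here, so the ordering $t_1<t_2<\dots<t_6$ assumed in the question is not imposed on the draws.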



              This mean is the value to assign to $P'_k$, with an "error" around it that follows the Beta distribution.



              After that, you can perform a regression on the resulting $(V_k, P'_k)$ points, or fit a distribution to them,
              to estimate the underlying population CDF.






              share|cite|improve this answer




























                0














                A sketch will be of much help to resume the terms of the problem.



                Wire_Insulation_1



                We have a production of wires in which the insulation
                resistance is spread over a range of voltages with a certain PDF and relevant CDF.



                We set a voltage $V_k$ in the range, and we take a relatively small sample of wires,
                of size $w_k$ (variable for each test) and record the number of wires that fails $f_k$.



                The $w_k$ wires will have a distribution of breaking voltages which ideally follows
                the population CDF, that is, when dividing the vertical range of probability into
                $w_k$ equal intervals, we would expect to find one wire into each (placed at its center).

                That means to say that the elements projected on the vertical scale will follow there a uniform
                probability density
                on the $[0,1]$ interval.



                Then we are going to assign to $V_k$ a value $P'_k$ for the CDF, corresponding to the interval limit
                between failed/not-failed as indicated in the sketch ($0.4$ in the example shown).



                Now, with respect to the underlying population distribution, corresponding to a huge sample,
                a small sample will introduce two kind of error:

                - a "discretization" error, because of the gap interval between failed / survived;

                - a "sampling" error, because the sample will deviate from an exact uniform distribution.



                We can inglobate the two by asking ourselves:

                given $w_k$ elements from a uniform distribution on $[0,1]$, with $f_k$ that failed the test, which is the probability that
                one of the failed elements be at the limit of the threshold $0 le P'_k le 1$, the remaining $f_k-1$ be below that, and $w_k-f_k$ above.



                That is clearly expressible as
                $$ bbox[lightyellow] {
                p(P'_{,k} ),dP'_{,k} = w_{,k} ,dP'_{,k} left( matrix{ w_{,k} - 1 cr
                w_{,k} - 1 cr} right) {P'_{,k}} ^{f_{,k} - 1} left( {1 - P'_{,k} } right)^{w_{,k} - f_{,k} }
                }$$



                It is easy to check, through the expression of the Beta Function
                that the integral of the above correctly gives $1$.

                In fact $p(P'_{,k} )$ is a Beta Distribution PDF
                $$ bbox[lightyellow] {
                p(P'_{,k} ) = Betaleft( {f_{,k} ,,w_{,k} - f_{,k} + 1} right)
                }$$

                because
                $$
                wleft( matrix{ w - 1 cr f - 1 cr} right)
                = w{{Gamma left( w right)} over {Gamma left( f right)Gamma left( {w - f + 1} right)}}
                = {{Gamma left( {w + 1} right)} over {Gamma left( f right)Gamma left( {w + 1 - f} right)}} = {1 over {{rm B}left( {f,w - f + 1} right)}}
                $$



                Note that in the cited reference it is affirmed that
                The beta distribution is a suitable model for the random behavior of percentages and proportions.



                In the construction above, we have set the threshold $P'_k$ to coincide with the failed element of highest resistance.
                Actually there is a gap between this and the first good item (that with lower resistance), so that the threshold could be
                moved up to this. That is equivalent to choosing a $ Betaleft( {f_{,k}+1 ,,w_{,k} - f_{,k}} right)$.



                So, if there is not a need for more sophistication, we can take the threshold to be at half of the gap, thus to take
                $$ bbox[lightyellow] {
                p(P'_{,k} ) = Betaleft( {f_{,k} + 1/2,,w_{,k} - f_{,k} + 1/2} right)
                }$$

                which gives a mean and variance of
                $$ bbox[lightyellow] {
                Eleft( {P'_{,k} } right) = {{f_{,k} + 1/2} over {w_{,k} + 1}}quad {rm var}left( {P'_{,k} } right)
                = {{left( {f_{,k} + 1/2} right)left( {w_{,k} - f_{,k} + 1/2} right)} over {left( {w_{,k} + 1} right)^{,2} left( {w_{,k} + 2} right)}}
                }$$



                It is this mean the value to assign to $P'_k$, associated with an "error" following the Beta distribution around that.



                After that you can perform a regression on the plot $V_k, P_k$ obtained, or a distribution fitting,
                to estimate the underlying population CDF.






                share|cite|improve this answer


























                  0












                  0








                  0






                  A sketch will be of much help to resume the terms of the problem.



                  Wire_Insulation_1



                  We have a production of wires in which the insulation
                  resistance is spread over a range of voltages with a certain PDF and relevant CDF.



                  We set a voltage $V_k$ in the range, and we take a relatively small sample of wires,
                  of size $w_k$ (variable for each test) and record the number of wires that fails $f_k$.



                  The $w_k$ wires will have a distribution of breaking voltages which ideally follows
                  the population CDF, that is, when dividing the vertical range of probability into
                  $w_k$ equal intervals, we would expect to find one wire into each (placed at its center).

                  That means to say that the elements projected on the vertical scale will follow there a uniform
                  probability density
                  on the $[0,1]$ interval.



                  Then we are going to assign to $V_k$ a value $P'_k$ for the CDF, corresponding to the interval limit
                  between failed/not-failed as indicated in the sketch ($0.4$ in the example shown).



                  Now, with respect to the underlying population distribution, corresponding to a huge sample,
                  a small sample will introduce two kind of error:

                  - a "discretization" error, because of the gap interval between failed / survived;

                  - a "sampling" error, because the sample will deviate from an exact uniform distribution.



                  We can inglobate the two by asking ourselves:

                  given $w_k$ elements from a uniform distribution on $[0,1]$, with $f_k$ that failed the test, which is the probability that
                  one of the failed elements be at the limit of the threshold $0 le P'_k le 1$, the remaining $f_k-1$ be below that, and $w_k-f_k$ above.



                  That is clearly expressible as
                  $$ bbox[lightyellow] {
                  p(P'_{,k} ),dP'_{,k} = w_{,k} ,dP'_{,k} left( matrix{ w_{,k} - 1 cr
                  w_{,k} - 1 cr} right) {P'_{,k}} ^{f_{,k} - 1} left( {1 - P'_{,k} } right)^{w_{,k} - f_{,k} }
                  }$$



                  It is easy to check, through the expression of the Beta Function
                  that the integral of the above correctly gives $1$.

                  In fact $p(P'_{,k} )$ is a Beta Distribution PDF
                  $$ bbox[lightyellow] {
                  p(P'_{,k} ) = Betaleft( {f_{,k} ,,w_{,k} - f_{,k} + 1} right)
                  }$$

                  because
                  $$
                  wleft( matrix{ w - 1 cr f - 1 cr} right)
                  = w{{Gamma left( w right)} over {Gamma left( f right)Gamma left( {w - f + 1} right)}}
                  = {{Gamma left( {w + 1} right)} over {Gamma left( f right)Gamma left( {w + 1 - f} right)}} = {1 over {{rm B}left( {f,w - f + 1} right)}}
                  $$



                  Note that in the cited reference it is affirmed that
                  The beta distribution is a suitable model for the random behavior of percentages and proportions.



                  In the construction above, we have set the threshold $P'_k$ to coincide with the failed element of highest resistance.
                  Actually there is a gap between this and the first good item (that with lower resistance), so that the threshold could be
                  moved up to this. That is equivalent to choosing a $ Betaleft( {f_{,k}+1 ,,w_{,k} - f_{,k}} right)$.



                  So, if there is not a need for more sophistication, we can take the threshold to be at half of the gap, thus to take
                  $$ bbox[lightyellow] {
                  p(P'_{,k} ) = Betaleft( {f_{,k} + 1/2,,w_{,k} - f_{,k} + 1/2} right)
                  }$$

                  which gives a mean and variance of
                  $$ bbox[lightyellow] {
                  Eleft( {P'_{,k} } right) = {{f_{,k} + 1/2} over {w_{,k} + 1}}quad {rm var}left( {P'_{,k} } right)
                  = {{left( {f_{,k} + 1/2} right)left( {w_{,k} - f_{,k} + 1/2} right)} over {left( {w_{,k} + 1} right)^{,2} left( {w_{,k} + 2} right)}}
                  }$$



                  It is this mean the value to assign to $P'_k$, associated with an "error" following the Beta distribution around that.



                  After that you can perform a regression on the plot $V_k, P_k$ obtained, or a distribution fitting,
                  to estimate the underlying population CDF.






                  share|cite|improve this answer














                  A sketch will be of much help to resume the terms of the problem.



                  Wire_Insulation_1



                  We have a production of wires in which the insulation
                  resistance is spread over a range of voltages with a certain PDF and relevant CDF.



                  We set a voltage $V_k$ in the range, and we take a relatively small sample of wires,
                  of size $w_k$ (variable for each test) and record the number of wires that fails $f_k$.



                  The $w_k$ wires will have a distribution of breaking voltages which ideally follows
                  the population CDF, that is, when dividing the vertical range of probability into
                  $w_k$ equal intervals, we would expect to find one wire into each (placed at its center).

                  That means to say that the elements projected on the vertical scale will follow there a uniform
                  probability density
                  on the $[0,1]$ interval.



                  Then we are going to assign to $V_k$ a value $P'_k$ for the CDF, corresponding to the interval limit
                  between failed/not-failed as indicated in the sketch ($0.4$ in the example shown).



                  Now, with respect to the underlying population distribution, corresponding to a huge sample,
                  a small sample will introduce two kind of error:

                  - a "discretization" error, because of the gap interval between failed / survived;

                  - a "sampling" error, because the sample will deviate from an exact uniform distribution.



                  We can inglobate the two by asking ourselves:

                  given $w_k$ elements from a uniform distribution on $[0,1]$, with $f_k$ that failed the test, which is the probability that
                  one of the failed elements be at the limit of the threshold $0 le P'_k le 1$, the remaining $f_k-1$ be below that, and $w_k-f_k$ above.



                  That is clearly expressible as
                  $$ bbox[lightyellow] {
                  p(P'_{,k} ),dP'_{,k} = w_{,k} ,dP'_{,k} left( matrix{ w_{,k} - 1 cr
                  w_{,k} - 1 cr} right) {P'_{,k}} ^{f_{,k} - 1} left( {1 - P'_{,k} } right)^{w_{,k} - f_{,k} }
                  }$$



                  It is easy to check, through the expression of the Beta Function
                  that the integral of the above correctly gives $1$.

                  In fact $p(P'_{,k} )$ is a Beta Distribution PDF
                  $$ bbox[lightyellow] {
                  p(P'_{,k} ) = Betaleft( {f_{,k} ,,w_{,k} - f_{,k} + 1} right)
                  }$$

because
$$
w\binom{w - 1}{f - 1}
= w\,\frac{\Gamma(w)}{\Gamma(f)\,\Gamma(w - f + 1)}
= \frac{\Gamma(w + 1)}{\Gamma(f)\,\Gamma(w + 1 - f)} = \frac{1}{\mathrm{B}(f,\, w - f + 1)}
$$
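The identity can also be checked exactly for integer $w, f$, using $\Gamma(n) = (n-1)!$. This is a small sketch with hypothetical helper names, run on the question's six bundles; `Fraction` keeps the right-hand side exact.

```python
from fractions import Fraction
from math import comb, factorial

# (w_k, f_k) pairs as listed in the question
BUNDLES = [(14, 4), (13, 4), (7, 3), (10, 5), (12, 7), (20, 13)]

def counting_factor(w, f):
    """w * C(w-1, f-1): choose the element at the threshold, then the
    f - 1 failed elements below it among the remaining w - 1."""
    return w * comb(w - 1, f - 1)

def inverse_beta(f, w):
    """1 / B(f, w - f + 1) = Gamma(w + 1) / (Gamma(f) Gamma(w - f + 1)),
    using Gamma(n) = (n - 1)! for positive integers n."""
    return Fraction(factorial(w), factorial(f - 1) * factorial(w - f))

for w, f in BUNDLES:
    lhs, rhs = counting_factor(w, f), inverse_beta(f, w)
    print(w, f, lhs, rhs, lhs == rhs)
```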



Note that the cited reference states that
"the beta distribution is a suitable model for the random behavior of percentages and proportions."



In the construction above, we have set the threshold $P'_k$ to coincide with the failed element of highest resistance.
Actually there is a gap between this and the first surviving item (the one with the lowest resistance among the survivors), so the threshold could be
moved up to that item. This is equivalent to choosing $\mathrm{Beta}\left(f_k + 1,\, w_k - f_k\right)$.
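Both thresholds can be illustrated by simulation, under the same model as above: each bundle has $w$ independent uniform "resistances" on $[0,1]$, the $f$ lowest fail, and we record the highest failed value (the $f$-th smallest) and the lowest surviving value (the $(f+1)$-th smallest). Their averages should approach the Beta means $f/(w+1)$ and $(f+1)/(w+1)$. The function name and the trial count are illustrative choices, not part of the answer.

```python
import random

def mean_order_statistics(w, f, trials=50_000, seed=0):
    """Simulate w sorted uniform resistances per bundle; the f lowest fail.
    Return the average of the highest failed value (f-th smallest) and of
    the lowest surviving value ((f+1)-th smallest). Requires 1 <= f < w."""
    rng = random.Random(seed)
    last_failed = first_good = 0.0
    for _ in range(trials):
        u = sorted(rng.random() for _ in range(w))
        last_failed += u[f - 1]
        first_good += u[f]
    return last_failed / trials, first_good / trials

w, f = 14, 4  # first bundle from the question
sim_low, sim_high = mean_order_statistics(w, f)
# Beta(f, w-f+1) has mean f/(w+1); Beta(f+1, w-f) has mean (f+1)/(w+1)
print(f"simulated {sim_low:.4f} vs Beta(f, w-f+1) mean {f / (w + 1):.4f}")
print(f"simulated {sim_high:.4f} vs Beta(f+1, w-f) mean {(f + 1) / (w + 1):.4f}")
```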



So, unless more sophistication is needed, we can place the threshold halfway across the gap, i.e. take
$$ \bbox[lightyellow] {
p(P'_k) = \mathrm{Beta}\left(f_k + 1/2,\, w_k - f_k + 1/2\right)
}$$

which gives a mean and variance of
$$ \bbox[lightyellow] {
E\left(P'_k\right) = \frac{f_k + 1/2}{w_k + 1} \quad
\mathrm{var}\left(P'_k\right) = \frac{\left(f_k + 1/2\right)\left(w_k - f_k + 1/2\right)}{\left(w_k + 1\right)^2 \left(w_k + 2\right)}
}$$
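Applied to the six bundles given in the question, the formulas above are a couple of lines of arithmetic. The sketch uses the general Beta moments $\alpha/(\alpha+\beta)$ and $\alpha\beta/\left((\alpha+\beta)^2(\alpha+\beta+1)\right)$ with $\alpha = f_k + 1/2$, $\beta = w_k - f_k + 1/2$, which reduce to the boxed expressions since $\alpha + \beta = w_k + 1$.

```python
# (w_k, f_k) pairs as listed in the question
BUNDLES = [(14, 4), (13, 4), (7, 3), (10, 5), (12, 7), (20, 13)]

def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

for k, (w, f) in enumerate(BUNDLES, start=1):
    mean, var = beta_mean_var(f + 0.5, w - f + 0.5)
    print(f"k={k}: E(P'_k)={mean:.4f}  sd(P'_k)={var ** 0.5:.4f}")
```

For the first bundle ($w_1 = 14$, $f_1 = 4$) this gives $E = 4.5/15 = 0.3$ and $\mathrm{var} = 47.25/3600 = 0.013125$.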



This mean is the value to assign to $P'_k$, with an "error" around it that follows the Beta distribution.
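If histograms are wanted (as in the question), one can simply draw from each $\mathrm{Beta}(f_k + 1/2,\, w_k - f_k + 1/2)$ with Python's standard `random.betavariate` and summarize the draws. This is a sketch under the answer's model: each bundle is treated independently, so the ordering $t_1 < \dots < t_6$ from the question is not imposed. The sample size and seed are arbitrary.

```python
import random
import statistics

# (w_k, f_k) pairs as listed in the question
BUNDLES = [(14, 4), (13, 4), (7, 3), (10, 5), (12, 7), (20, 13)]

def sample_threshold(w, f, n=20_000, seed=1):
    """Draw n samples of P'_k from Beta(f + 1/2, w - f + 1/2)."""
    rng = random.Random(seed)
    return [rng.betavariate(f + 0.5, w - f + 0.5) for _ in range(n)]

summary = []
for k, (w, f) in enumerate(BUNDLES, start=1):
    draws = sample_threshold(w, f)  # these draws can be fed to a histogram
    summary.append((statistics.fmean(draws), statistics.stdev(draws)))
    print(f"k={k}: sample mean={summary[-1][0]:.3f}  sample sd={summary[-1][1]:.3f}")
```

The sample means and standard deviations should agree closely with the closed-form values given above.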



After that you can perform a regression on the points $(V_k, P_k)$ thus obtained, or fit a distribution to them,
to estimate the underlying population CDF.
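The answer leaves the regression model open, and the question does not report the voltages $v_i$. The sketch below therefore uses the index $k = 1,\dots,6$ as a hypothetical stand-in for $V_k$ (the question itself suggests plotting against 1 to 6) and fits a straight line by weighted least squares, with weights $1/\mathrm{var}(P'_k)$ so that the more precise bundles count more. A straight line is only the simplest possible choice; with real voltages, a CDF-shaped model (e.g. logistic or normal) would usually be more appropriate.

```python
# (w_k, f_k) pairs as listed in the question
BUNDLES = [(14, 4), (13, 4), (7, 3), (10, 5), (12, 7), (20, 13)]

def weighted_line_fit(xs, ys, wts):
    """Weighted least squares fit of y = a + b x; returns (a, b)."""
    sw = sum(wts)
    xbar = sum(wt * x for wt, x in zip(wts, xs)) / sw
    ybar = sum(wt * y for wt, y in zip(wts, ys)) / sw
    sxy = sum(wt * (x - xbar) * (y - ybar) for wt, x, y in zip(wts, xs, ys))
    sxx = sum(wt * (x - xbar) ** 2 for wt, x in zip(wts, xs))
    b = sxy / sxx
    return ybar - b * xbar, b

xs = list(range(1, 7))  # hypothetical stand-in for V_1..V_6 (voltages not given)
ys, wts = [], []
for w, f in BUNDLES:
    ys.append((f + 0.5) / (w + 1))  # E(P'_k)
    var = (f + 0.5) * (w - f + 0.5) / ((w + 1) ** 2 * (w + 2))
    wts.append(1.0 / var)  # weight = 1 / var(P'_k)

a, b = weighted_line_fit(xs, ys, wts)
print(f"P'(x) ~ {a:.3f} + {b:.3f} x")
```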







                  edited Dec 7 at 17:42

























                  answered Dec 6 at 22:22









                  G Cab
