Numpy transformation to normal distribution























I have an array of data. I checked if it was normally distributed:



import sys
import scipy
from scipy import stats
from scipy.stats import mstats
from scipy.stats import normaltest

Data = []
for line in open(sys.argv[1]):
    line = line.strip()
    Data.append(float(line))
print(scipy.stats.normaltest(Data))


The output was: (36.444648754208075, 1.2193968690198398e-08)



Then, I wrote a small script to normalise the data:



import sys
import numpy as np
fileopen = open(sys.argv[1])
UntransformedArray = []
for line in fileopen:
    line = float(line.strip())
    UntransformedArray.append(line)
TransformedArray = (UntransformedArray - np.mean(UntransformedArray)/np.std(UntransformedArray))
NewList = TransformedArray.tolist()
for i in NewList:
    print(i)


And then I checked for normality again using the first script and the output was
(36.444648754209595, 1.2193968690189117e-08).



...the same as the previous score, and not normally distributed.



Is one of my scripts wrong?



I should also mention that the average of my data is 0.056 and the numbers range from 0.014 to 0.171 (85 observations). I'm not sure whether the fact that the numbers are so small matters.



A sample of the untransformed and transformed data:



Untransformed:



0.055
0.074
0.049
0.067
0.038
0.037
0.045
0.041


Transformed data:



-2.13696814254
-2.11796814254
-2.14296814254
-2.12496814254
-2.15396814254
-2.15496814254
-2.14696814254


Edit 1:



When I edit the code slightly to account for the parentheses being in the wrong place:



TransformedMean = (UntransformedArray - np.mean(UntransformedArray))
TransformedArray = (TransformedMean/np.std(UntransformedArray))
NewList = TransformedArray.tolist()
for i in NewList:
    print(i)


The output I get is different:



Example:



-0.0385683544143
0.705333390576
-0.273484694937
0.431264326632
-0.704164652563
-0.743317375984


However, when I check for normality:
(36.444648754241328, 1.2193968689995659e-08)



It is still not normally distributed (and is still the exact same score as the other times)?



Edit 2:



I then tried a different method of normalising the data:



import sys
import scipy
from scipy import stats
from scipy.stats import boxcox

Data = [(float(line.strip())) for line in open(sys.argv[1])]
scipy.stats.boxcox(Data)


I get the error: TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'float'



Edit 3: Following a comment from a user, the problem was that I had not understood the difference between normalising values and normalising a distribution.



Edited code:



import sys
import numpy as np

fileopen = open(sys.argv[1])
UntransformedArray = []
for line in fileopen:
    line = float(line.strip())
    UntransformedArray.append(line)

List1 = np.log(UntransformedArray)
for i in List1:
    print(i)


Checking for normality:
(4.0435072214905938, 0.13242304287973003)



(works in this case, depending on skewness of the data).



Edit 4: Or using a BoxCox transformation:



import sys
import scipy
from scipy import stats
from scipy.stats import boxcox
import numpy as np

Data = []
for line in open(sys.argv[1]):
    line = line.strip()
    Data.append(float(line))

data = scipy.stats.boxcox(np.array(Data))
for i in data[0]:
    print(i)


Check for normalisation: (2.9085877478631956, 0.23356523218452238)
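
For reference, here is a minimal sketch of the Box-Cox step on synthetic data. The lognormal sample, the seed, and the variable names are assumptions for illustration, not the data from the question. scipy.stats.boxcox returns the transformed values together with the fitted lambda, which is why the script above indexes data[0]. It requires strictly positive input.

```python
import numpy as np
from scipy import stats

# Synthetic, strictly positive, right-skewed sample (an assumption,
# chosen only to resemble small positive measurements).
rng = np.random.default_rng(1)
data = rng.lognormal(mean=-3.0, sigma=0.6, size=85)

# With no lambda supplied, boxcox fits lambda by maximum likelihood
# and returns (transformed_values, fitted_lambda).
transformed, lam = stats.boxcox(data)

stat, p = stats.normaltest(transformed)
print("fitted lambda:", lam)
print("normaltest on transformed data:", stat, p)
```

A fitted lambda close to 0 would mean the transform behaves much like a plain log transform.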






























  • Don't you have a parenthesis problem in the TransformedArray calculation? (UntransformedArray - np.mean(UntransformedArray)) / np.std(UntransformedArray)
    – joao
    Nov 30 '15 at 13:30












  • This is what I have: TransformedArray = (UntransformedArray - np.mean(UntransformedArray)/np.std(UntransformedArray)) and it seems to run without complaining? I don't get any error about parentheses.
    – Tom
    Nov 30 '15 at 13:42








  • Division (/) does not have the same precedence as subtraction (-). You are therefore dividing the mean by the std first, and only then applying the subtraction. I believe your parentheses are misplaced there.
    – joao
    Nov 30 '15 at 13:51












  • Thanks. I've changed the script slightly (see edit). Is it possibly something wrong with the checking for normality script? The reason I ask is that now I've given the checking for normality script two different lists, (for example, my original transformed output, where all the numbers start with -2.XXX, and in my edit, where the numbers are e.g. 0.43, -0.7 etc), and I still get the exact same output from checking for normality script?
    – Tom
    Nov 30 '15 at 14:21










  • Re. boxcox: Try scipy.stats.boxcox(np.array(Data)) (and add import numpy as np at the top of your script if you don't already have it). By the way, scipy.stats.boxcox(Data) works in newer versions of scipy. What version are you using? Run import scipy; print(scipy.__version__) to find out.
    – Warren Weckesser
    Nov 30 '15 at 15:16
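
The precedence point raised in the comments can be checked directly. The sketch below uses the eight sample values posted in the question; the names shifted and zscores are made up for illustration.

```python
import numpy as np

# Sample values taken from the question
x = np.array([0.055, 0.074, 0.049, 0.067, 0.038, 0.037, 0.045, 0.041])

# Division binds tighter than subtraction, so this is x - (mean / std):
# every value is shifted by the same constant, nothing is rescaled.
shifted = x - np.mean(x) / np.std(x)

# With explicit parentheses this is the usual z-score: (x - mean) / std.
zscores = (x - np.mean(x)) / np.std(x)

print(shifted)
print(zscores)
print(zscores.mean(), zscores.std())
```

Both results are shifted or rescaled copies of x, which is consistent with normaltest reporting the same score for either version.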















python numpy normalization






edited Nov 30 '15 at 15:25

























asked Nov 30 '15 at 13:23









Tom

3 Answers













As expected, subtracting the mean and rescaling to unit variance does not change the shape of the distribution. normaltest correctly returns the same output in both cases, telling you that your data is not normally distributed.
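
A small sketch of this point, using a synthetic skewed sample rather than the asker's data (the lognormal parameters, the seed, and the variable names are assumptions). normaltest is built on sample skewness and kurtosis, and neither changes when every value is shifted and rescaled by the same constants:

```python
import numpy as np
from scipy import stats

# Synthetic right-skewed sample (an assumption, not the data from the question)
rng = np.random.default_rng(0)
data = rng.lognormal(mean=-3.0, sigma=0.6, size=85)

# Standardize: subtract the mean, divide by the standard deviation
standardized = (data - data.mean()) / data.std()

stat_raw, p_raw = stats.normaltest(data)
stat_std, p_std = stats.normaltest(standardized)

# The two lines agree up to floating-point rounding
print(stat_raw, p_raw)
print(stat_std, p_std)
```

The tiny differences in the last digits of the scores quoted in the question are the same kind of rounding noise.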















































    I agree with Thomas. But to be more precise: you are standardizing the distribution of your array. This does not change the shape of the distribution. You might want to use the numpy.histogram() function to get an impression of the distributions.
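
As a rough illustration of the histogram suggestion, here is a sketch on synthetic data (the sample, the seed, and the choice of 10 bins are assumptions). With the same number of equal-width bins, the raw and standardized arrays should give the same bin counts; only the bin edges are relabelled.

```python
import numpy as np

# Synthetic right-skewed sample (an assumption, not the data from the question)
rng = np.random.default_rng(2)
data = rng.lognormal(mean=-3.0, sigma=0.6, size=85)
standardized = (data - data.mean()) / data.std()

# np.histogram returns (counts, bin_edges)
counts_raw, edges_raw = np.histogram(data, bins=10)
counts_std, edges_std = np.histogram(standardized, bins=10)

print(counts_raw)   # shape of the raw distribution
print(counts_std)   # same shape after standardizing
print(edges_raw[0], edges_raw[-1])
print(edges_std[0], edges_std[-1])
```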



    I think you have fallen prey to the confusing double usage of 'normalization'. On the one hand, normalization is used to describe standardization of variables (putting variables on the same scale, which is what you did). On the other hand, normalization is used to describe attempts to change the shape of a probability distribution (scipy.stats.normaltest() is used to check the shape of such distributions). One easy strategy for making a distribution more normal is a log transformation. numpy.log() might do the trick here, but only if the original distribution is not too skewed.
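
A minimal sketch of the log-transform idea, again on synthetic data (the lognormal sample and the seed are assumptions; real data may respond differently). Unlike standardizing, taking logs changes the shape of the distribution, which shows up in the sample skewness and in the normaltest result.

```python
import numpy as np
from scipy import stats

# Synthetic, positive, right-skewed sample (an assumption)
rng = np.random.default_rng(3)
data = rng.lognormal(mean=-3.0, sigma=0.6, size=85)

# The log transform requires strictly positive values
logged = np.log(data)

skew_raw = stats.skew(data)
skew_log = stats.skew(logged)

print("skewness before/after:", skew_raw, skew_log)
print("normaltest before:", stats.normaltest(data))
print("normaltest after: ", stats.normaltest(logged))
```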



























    • This was really useful thank you, particularly in the clarification of the understanding. I have made an edit with the updated code that I used.
      – Tom
      Nov 30 '15 at 15:16










    • glad it helped!
      – Dominix
      Nov 30 '15 at 20:52































    I came across the same problem. My data was not normal, like yours, and I had to transform it to a normal distribution. To transform your data to normal you can use a normal score transform, which can be computed by different methods, as described here. You can also use these formulas. I have written Python code that transforms your list of elements to a normal distribution as follows:



    x = [0.055, 0.074, 0.049, 0.067, 0.038, 0.037, 0.045, 0.041]

    from scipy.stats import rankdata, norm

    # map each value's rank to the matching quantile of the standard normal
    newX = norm.ppf(rankdata(x)/(len(x) + 1))
    print(newX)

    output:
    [ 0.4307273 1.22064035 0.1397103 0.76470967 -0.76470967 -1.22064035
    -0.1397103 -0.4307273 ]


    After this transformation the new data closely follows a normal distribution, as a Q-Q plot shows:



    from scipy import stats
    import matplotlib.pyplot as plt

    ax4 = plt.subplot(111)
    res = stats.probplot(newX, plot=plt)
    plt.show()
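
If a number is preferred over a plot, scipy.stats.probplot also returns the least-squares fit of the Q-Q points, including the correlation coefficient r. The sketch below reuses the sample values from the question; the name new_x is made up for illustration. Note that a rank-based transform places the values on normal quantiles by construction, so an r close to 1 is expected here and says little about the original data.

```python
import numpy as np
from scipy import stats
from scipy.stats import rankdata, norm

# Sample values from the question
x = [0.055, 0.074, 0.049, 0.067, 0.038, 0.037, 0.045, 0.041]

# Rank-based normal score transform, as in the answer above
new_x = norm.ppf(rankdata(x) / (len(x) + 1))

# Without plot=..., probplot only computes the Q-Q points and the fit:
# (theoretical quantiles, ordered sample), (slope, intercept, r)
(osm, osr), (slope, intercept, r) = stats.probplot(new_x, dist="norm")

print("Q-Q correlation r:", r)
```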





First answer: answered Nov 30 '15 at 14:19 by thomas

          1,200513




          1,200513
























              up vote
              1
              down vote













              I agree with Thomas. But to be more precise: You are standardizing the distribution of your array! This does not change the shape of the distribution! You might want to use the numpy.histogram() function to get an impression of the distributions!



              I think you have fallen prey to the confusing double usage of 'normalization'. On the one hand, normalization is used to describe standardization of variables (getting variables on the same scale - this is what you did). On the other hand, normalization is used to describe attempts of changing the shape of a probability distribution (the scipy.stats.normaltest() is used to check the shape of such distributions). One easy strategy to try to get a distribution more normally is to use a log transformation. numpy.log() might do the trick here, but only if the original distribution is not too skewed.






              share|improve this answer





















              • This was really useful thank you, particularly in the clarification of the understanding. I have made an edit with the updated code that I used.
                – Tom
                Nov 30 '15 at 15:16










              • glad it helped!
                – Dominix
                Nov 30 '15 at 20:52















              answered Nov 30 '15 at 15:10









              Dominix



              I came across the same problem. My data, like yours, was not normally distributed, and I had to transform it to a normal distribution. To do this you can use a normal score transform, which can be computed by different methods, as described here. You can also use these formulas. I have written some Python code that transforms your list of values to a normal distribution as follows:



              x = [0.055, 0.074, 0.049, 0.067, 0.038, 0.037, 0.045, 0.041]

              from scipy.stats import rankdata, norm

              # Rank the values, rescale the ranks into (0, 1), then map them to standard normal quantiles
              newX = norm.ppf(rankdata(x) / (len(x) + 1))
              print(newX)

              output:
              [ 0.4307273   1.22064035  0.1397103   0.76470967 -0.76470967 -1.22064035
               -0.1397103  -0.4307273 ]


              After this transformation the values are standard normal quantiles, so the points fall close to a straight line on a Q-Q plot:



              from scipy import stats
              import matplotlib.pyplot as plt

              ax4 = plt.subplot(111)
              res = stats.probplot(newX, plot=plt)
              plt.show()
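One property of this rank-based transform is worth seeing explicitly: the result depends only on the ordering of the values, not on their magnitudes. The sketch below wraps the same computation in a helper, `normal_scores`, which is a hypothetical name introduced here for illustration and not part of SciPy.

```python
import numpy as np
from scipy.stats import norm, rankdata


def normal_scores(values):
    """Rank-based normal score transform (hypothetical helper name)."""
    values = np.asarray(values, dtype=float)
    ranks = rankdata(values)  # tied values receive their average rank
    # Dividing by n + 1 keeps every probability strictly inside (0, 1)
    return norm.ppf(ranks / (len(values) + 1))


x = [0.055, 0.074, 0.049, 0.067, 0.038, 0.037, 0.045, 0.041]
scores = normal_scores(x)

# The ordering of the observations is preserved
same_order = np.array_equal(np.argsort(x), np.argsort(scores))

# Any strictly increasing transformation of x yields the same scores
same_scores = np.allclose(scores, normal_scores(np.log(x)))

print(same_order, same_scores)
```

Because only the ranks are used, the distances between the original observations are discarded. The transformed values look normal by construction, so a normality test on them says little about the original data.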





                  edited Nov 21 at 21:53

























                  answered Nov 21 at 21:31









                  Sara
