Guest Post by Sweta Sharma, COO, InRev
For as long as I have been associated with the subject ‘Statistics’, I have invariably been asked to explain the importance of the ‘Normal Distribution’. As a member of the ‘Back Benchers’ club, I clearly remember being thrown out of class for my obliviousness of the very concept on which my degree rested. Of course, it was blasphemy, and I would get those “I can freeze you with my looks” stares for the next i classes (where i = 1 to infinity). Non-statisticians might laugh at this, but we statisticians consider the Normal Distribution as sacred as the Gita is to Hindus.
Yes, that picture of the bell is called the Normal Curve! It is undoubtedly the mother of all statistical fuss. Put simply, normally distributed data means that most of the observations in a dataset lie close to the "average", while relatively few lie toward one extreme or the other.
A normal curve is defined by two measures: the mean and the standard deviation. Btw, if you don’t know the meaning of these two terms, you are (statistically) doomed. If the curve represents the distribution of a population, the location of the peak is the population mean, and how broad or narrow the curve is reflects the spread of the population (its standard deviation, or equivalently its variance).
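To make the two measures concrete, here is a minimal Python sketch of the normal density, f(x) = exp(-(x - mean)² / (2·sd²)) / (sd·√(2π)). The function name normal_pdf is just an illustrative choice, not taken from any particular library. Changing the mean slides the peak along the x-axis, while a larger standard deviation gives a lower, broader bell.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Height of the normal curve at x for the given mean and standard deviation."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    z = (x - mean) / sd  # distance from the mean, in standard deviations
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


# The peak sits at the mean; a larger standard deviation flattens and widens it.
for sd in (0.5, 1.0, 2.0):
    print(f"sd={sd}: height at the mean = {normal_pdf(0.0, 0.0, sd):.4f}")
```

Doubling the standard deviation halves the height of the peak, because the total area under the curve always stays equal to 1.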
This clearly outlines the basic property of a normal curve, often called the 68-95-99.7 rule: about 68% of the population lies within 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations.
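These percentages can be checked numerically. The Python sketch below uses the standard identity that the probability of a normal value falling within k standard deviations of its mean is erf(k/√2), whatever the mean and standard deviation are, and then cross-checks the figures against a simulated sample. The mean of 50, standard deviation of 10 and sample size are arbitrary illustrative values.

```python
import math
import random


def within_k_sd(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of its mean."""
    # For any mean and standard deviation: P(|X - mean| < k * sd) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sd: {100 * within_k_sd(k):.1f}%")  # 68.3%, 95.4%, 99.7%

# Cross-check against a simulated normal sample (mean 50 and sd 10 are arbitrary).
random.seed(42)
mean, sd = 50.0, 10.0
sample = [random.gauss(mean, sd) for _ in range(100_000)]
for k in (1, 2, 3):
    share = sum(abs(x - mean) < k * sd for x in sample) / len(sample)
    print(f"simulated within {k} sd: {100 * share:.1f}%")
```

The simulated shares land very close to the exact values, which is exactly what the rule promises for normally distributed data.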
“What makes the Normal Curve so important?” This question has been as perturbing to me as to most of the people who have put it to me instead of asking Google. There are various reasons that substantiate the importance of this distribution, the very basis of most statistical tests.
Decoding this, the normal distribution turns out to be the most user-friendly, easily adaptable and universal distribution. It forms the underlying assumption of most statistical tests, so much so that we often don’t even check the normality assumption before applying a test, and may end up with faulty results.
What if your data doesn’t follow the normality assumption? You don’t necessarily need to force the data into a normal shape; you can go for a non-parametric test (a test that doesn’t assume any particular underlying distribution for the data). Such tests are generally not preferred, since they tend to be less powerful when the data really are normal and offer less flexibility in the conclusions they support. An alternative to this can be
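As an illustration of a non-parametric test, here is a minimal Python sketch of the Mann-Whitney U test (also known as the Wilcoxon rank-sum test), which compares two independent samples using only the order of the values rather than an assumed distribution. It is just one example of such a test, picked for illustration. The p-value uses the usual large-sample normal approximation without a tie correction, so for small samples or many ties an exact version from a statistics package is preferable. The sample values are made up.

```python
import math
from itertools import product


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """U statistic for sample x versus sample y, plus an approximate two-sided p-value."""
    n1, n2 = len(x), len(y)
    # U counts the (x, y) pairs in which the x value is larger; ties count as half.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a, b in product(x, y))
    # If the groups do not differ, U has mean n1*n2/2 and this standard deviation
    # (large-sample approximation, no correction for ties).
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail area
    return u, p


# Hypothetical, clearly skewed measurements from two groups.
group_a = [1.2, 0.8, 3.5, 2.1, 0.4, 15.0, 1.9, 0.7]
group_b = [4.8, 6.1, 3.9, 22.0, 5.5, 7.2, 4.4, 9.8]
u, p = mann_whitney_u(group_a, group_b)
print(f"U = {u}, approximate p = {p:.4f}")
```

Because only comparisons between values matter, the outliers 15.0 and 22.0 carry no more weight than any other observation, which is what makes the test robust to non-normal data.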
To be safe, it’s always advisable to check the normality assumption before going ahead with any test. This also gives you insight into the dataset, so you are better prepared to handle it efficiently. A normal probability plot can be used to check the normality assumption: if the data are normal, the points fall roughly on a straight line.
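A normal probability plot puts the ordered data on one axis and the matching quantiles of a standard normal distribution on the other. The Python sketch below computes those points with statistics.NormalDist and summarises how straight they are with a correlation coefficient, where a value close to 1 suggests normality. The plotting-position formula (i - 0.375)/(n + 0.25) is one common convention among several, and the helper names normal_probability_points and straightness are illustrative; in practice you would also draw the points with a plotting library.

```python
import random
import statistics


def normal_probability_points(data: list[float]) -> list[tuple[float, float]]:
    """(theoretical standard normal quantile, ordered observation) pairs."""
    n = len(data)
    std_normal = statistics.NormalDist()  # mean 0, standard deviation 1
    # Blom plotting positions; other conventions such as (i - 0.5) / n also exist.
    return [(std_normal.inv_cdf((i - 0.375) / (n + 0.25)), x)
            for i, x in enumerate(sorted(data), start=1)]


def straightness(points: list[tuple[float, float]]) -> float:
    """Pearson correlation of the plot points: near 1 when they lie on a straight line."""
    qs = [q for q, _ in points]
    xs = [x for _, x in points]
    mq, mx = statistics.fmean(qs), statistics.fmean(xs)
    cov = sum((q - mq) * (x - mx) for q, x in zip(qs, xs))
    var_q = sum((q - mq) ** 2 for q in qs)
    var_x = sum((x - mx) ** 2 for x in xs)
    if var_q == 0 or var_x == 0:
        raise ValueError("need at least two distinct values")
    return cov / (var_q * var_x) ** 0.5


random.seed(1)
normal_data = [random.gauss(100, 15) for _ in range(200)]
skewed_data = [random.expovariate(1.0) for _ in range(200)]
for label, data in (("normal", normal_data), ("skewed", skewed_data)):
    print(f"{label}: r = {straightness(normal_probability_points(data)):.3f}")
```

The normal sample should give a correlation very close to 1, while the skewed sample bends away from the line and scores noticeably lower.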