This article is for beginners as well as intermediate-level machine learning enthusiasts. Here, a few of the most basic Statistics concepts are described in a simple yet illustrative way.

1. Quantiles, Percentiles and Quartiles

Quantiles are the "cut points" that divide a distribution into continuous intervals with equal probabilities. For example, a single quantile divides a distribution into two equal parts, with 50% on each side; that quantile is called the Median. In a similar way, a set of quantiles gets a collective name based on the number of equal partitions it makes:

  1. The quantile which partitions a distribution into 2 equal groups is called the Median.
  2. The quantiles for 3 equal partitions are called Tertiles or Terciles.
  3. The quantiles for 4 equal partitions are called Quartiles: the 1st Quartile (25th percentile), the 2nd Quartile (50th percentile, which is also the Median) and the 3rd Quartile (75th percentile). The four groups themselves (bottom 25%, next 25%, another 25% and last or top 25%) are also loosely called the 1st to 4th quartiles.
  4. Quintiles (5), Sextiles (6), Septiles (7), Octiles (8), Deciles (10) and so on.

Other important quantiles are:

  1. Percentiles for 100-quantiles.
  2. Permilles or Milliles for 1000-quantiles.

Remember that the libraries available in Python, R or Matlab offer a variety of methods to calculate quantiles, and these methods differ in their implementation. If the distribution or sample is small, high-order quantiles (such as percentiles) will differ from one implementation to another.

For example, take a distribution of 15 items. The 0th percentile is the first item while the 100th percentile is the last. Similarly, the 3 Quartiles corresponding to the 25th, 50th and 75th percentiles fall on midpoints, as shown.

In this example, it is not appropriate to ask for each of the 100 percentile partitions, as the distribution contains only 15 items. For such small distributions, the results for high-order quantiles differ between implementations. When the distribution is large, implementations from different libraries give the same or very similar results.
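To make the point about differing implementations concrete, here is a minimal sketch using Python's standard `statistics.quantiles`, which offers two interpolation methods. The item values 1 to 15 are an assumption chosen for illustration; the article does not say what the 15 items are.

```python
import statistics

# 15 items, as in the example above (the values 1..15 are assumed).
data = list(range(1, 16))

# The same three Quartiles, computed with two different methods.
inclusive = statistics.quantiles(data, n=4, method="inclusive")
exclusive = statistics.quantiles(data, n=4, method="exclusive")

print(inclusive)  # [4.5, 8.0, 11.5]
print(exclusive)  # [4.0, 8.0, 12.0]

# Asking for all 99 percentile cut points of only 15 items still runs,
# but most of the values are interpolated between neighbouring items.
percentiles = statistics.quantiles(data, n=100, method="inclusive")
print(len(percentiles))  # 99
```

Both methods agree on the Median, but the 1st and 3rd Quartiles already differ for this small sample. Other libraries expose further methods, so it is worth checking which one a given function uses by default.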

[Figure: Statistics quantiles, percentiles and quartiles]

2. Standard Deviation, Variance and Standard Error

The difference between Standard Deviation and Standard Error is one of the most confusing topics in Statistics. Here is a simple and precise explanation:

Standard Deviation

Standard Deviation is a measure of how far the data is spread out from its mean/average, on both sides of the number line.

[Figure: Standard Deviation]

How is the spread calculated?

Take the difference of each value from the mean, square it, average the squared differences, and then take the square root of that average.
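As a sketch in standard notation (the symbols are assumptions, since the original formula image is not reproduced here): for N values x_1, ..., x_N with mean \mu, the population Standard Deviation \sigma can be written as

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(x_i - \mu\right)^2}
```

This is the population form, which matches the "average of the squared differences" wording used for variance in this article. When the values are a sample from a larger population, the sum is commonly divided by N - 1 instead of N.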

Variance

The Variance is the average of the squared differences from the mean. In other words, it is the square of the Standard Deviation.
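The two definitions can be checked with a few lines of Python. This is a minimal sketch on made-up data (the values are not from the article), using the population form and cross-checking against the standard library's `statistics.pvariance` and `statistics.pstdev`.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data, assumed for this sketch

mean = sum(values) / len(values)

# Variance: the average of the squared differences from the mean.
squared_diffs = [(x - mean) ** 2 for x in values]
variance = sum(squared_diffs) / len(values)

# Standard Deviation: the square root of the variance.
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)  # 5.0 4.0 2.0

# Cross-check against the standard library's population functions.
assert math.isclose(variance, statistics.pvariance(values))
assert math.isclose(std_dev, statistics.pstdev(values))
```

For sample data, `statistics.variance` and `statistics.stdev` divide by N - 1 instead and give slightly larger values.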

Standard Error

We just saw how Standard Deviation is calculated for a set of values.

When we have several sets of values, we get one mean for each set. If we plot the means of all the sets on a number line and calculate the standard deviation of those means, the standard deviation so calculated is called the Standard Error.
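The description above can be simulated directly. The sketch below draws many sets of values from a hypothetical population (mean 50, Standard Deviation 10; these numbers are assumptions for illustration), takes the mean of each set, and computes the standard deviation of those means. It also shows the well-known shortcut, not stated in the article, that the Standard Error of the mean is roughly the population Standard Deviation divided by the square root of the set size.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 50.0, 10.0  # hypothetical population parameters
SET_SIZE, NUM_SETS = 25, 2000  # 2000 sets of 25 values each

# One mean per set of values.
set_means = []
for _ in range(NUM_SETS):
    one_set = [random.gauss(POP_MEAN, POP_SD) for _ in range(SET_SIZE)]
    set_means.append(statistics.mean(one_set))

# Standard Error: the standard deviation of the set means.
standard_error = statistics.pstdev(set_means)

# Shortcut: population SD divided by sqrt(set size), here 10 / 5 = 2.0.
shortcut = POP_SD / math.sqrt(SET_SIZE)

print(round(standard_error, 2), shortcut)
```

The simulated value should land close to the shortcut value of 2.0, and it shrinks as the sets get larger.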

3. Normal Distribution or Central Limit Theorem

Normal Distribution

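Normal Distribution, Gaussian Distribution and Bell-shaped curve are all names for the same statistical distribution, which is (1.) symmetrical about the mean/median and (2.) bell shaped: most values cluster around the mean and become rarer further away from it. The Standard Deviation controls the width, so a relatively large Standard Deviation gives a wide bell while a very small one makes the curve look like a spike.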

[Figure: Normal Distribution or Bell Curve]

Normal Distribution is of special importance in Statistics because lots of day-to-day distributions approximately follow this pattern. For example, Height vs Weight, Salary vs Employees, Traffic vs Time of the day, etc.
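The properties of the bell curve can be explored with the standard library's `statistics.NormalDist`. This is a small sketch with hypothetical numbers (heights with mean 170 cm and Standard Deviation 10 cm are assumptions, not data from the article).

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=10)  # hypothetical heights in cm

# Symmetry: points equally far below and above the mean are equally likely.
left_density = heights.pdf(160)
right_density = heights.pdf(180)

# Half of the distribution lies below the mean (which is also the median).
below_mean = heights.cdf(170)

# Share of values within one Standard Deviation of the mean (about 68%).
within_one_sd = heights.cdf(180) - heights.cdf(160)
print(round(within_one_sd, 3))  # 0.683

# The Standard Deviation only controls the width of the bell:
# a smaller value gives a narrower, taller curve around the same mean.
narrow = NormalDist(mu=170, sigma=2)
print(narrow.pdf(170) > heights.pdf(170))  # True
```

The same object can also answer questions such as "what fraction is taller than 185 cm" via `1 - heights.cdf(185)`.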

4. p-Value

p-Value is the probability that random chance generated the data, or something else that is equal or rarer.

This simple definition is a little tricky to understand at first reading. Let us break it into 3 parts.

(p-Value is the probability that random chance generated the data) [part 1], (or something else that is equal) [part 2], (or rarer) [part 3].

The first part is the probability of the event itself. The second part says to add the probabilities of all the other events whose chances of occurrence are the same as the given event. The third part says to also add the probabilities of all the events whose probabilities are lower than that of the given event.

Let us take an example:

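p-Value for 2 Heads while tossing 2 coins:

The probability of 2 Heads is 1/4. But for the p-Value we add: probability of 2 Heads (part 1) + probability of 2 Tails (part 2) + probabilities of all events whose probability of occurrence is lower than that of 2 Heads (part 3). 2 Tails is exactly as likely as 2 Heads (1/4), and the only other outcome, 1 Head and 1 Tail, has probability 1/2, which is not rarer. So the p-Value is 1/4 + 1/4 + 0 = 0.5.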
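The three-part sum can be checked by enumerating every outcome of the coin tosses. The function below is a hypothetical helper written for this illustration (its name and signature are not from any library); it assumes fair coins and groups outcomes by their number of Heads.

```python
from fractions import Fraction
from itertools import product


def p_value_heads(num_coins, observed_heads):
    """p-Value of seeing `observed_heads` Heads when tossing fair coins.

    Adds the probability of the observed event, of every event that is
    equally likely, and of every event that is rarer.
    """
    outcomes = list(product("HT", repeat=num_coins))
    total = len(outcomes)

    # Probability of each possible number of Heads.
    prob = {}
    for outcome in outcomes:
        heads = outcome.count("H")
        prob[heads] = prob.get(heads, 0) + Fraction(1, total)

    observed = prob[observed_heads]
    # Keep the events that are as likely as, or rarer than, the observed one.
    return sum(p for p in prob.values() if p <= observed)


print(p_value_heads(2, 2))  # 1/2
print(p_value_heads(5, 5))  # 1/16
```

With 2 coins the p-Value for 2 Heads is 1/2, so 2 Heads is not surprising at all. With 5 coins, 5 Heads gives 1/32 + 1/32 = 1/16, a much rarer result.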

Another example illustrated below:

[Figure: p-Value example]


Thank you for reading it all along. Hope you liked this article!!


About the author

Prakash