Hi guys, we all know that statistics plays a key role in analytics. But many people who do not come from a statistical background are scared of “Statistics”. So here I am going to describe different statistical concepts with very simple practical examples, which will help you get familiar with statistics and understand how it is used in analytics to make decisions.
I will begin with the concept of “Data”, and subsequently I will cover the descriptive and inferential concepts of statistics.
As an analyst, you will analyze data, so it is very important to have a clear idea about the different types of data.
First, we should know: what is data?
Data is nothing but a piece of information. If I want to know the average sales of a firm for the last 6 months, I need to collect some data, and by looking at that data I can compute the average and spot the trend in sales.
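To make the sales example concrete, here is a minimal sketch in R. The monthly figures and the variable names are made up purely for illustration; they are not real data from any firm.

```r
# Hypothetical monthly sales figures for the last 6 months (made-up numbers)
monthly_sales <- c(Jan = 120, Feb = 135, Mar = 128, Apr = 150, May = 162, Jun = 175)

# Average sales over the 6 months
average_sales <- mean(monthly_sales)
print(average_sales)

# Month-over-month change: mostly positive values suggest an upward trend
monthly_change <- diff(monthly_sales)
print(monthly_change)
```

The point is simply that the collected numbers are the data, and the average and the trend are what we read off from them.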
Classification: There are mainly two types of data:
- Quantitative: When data is expressed in numerical terms, e.g. height, weight. Quantitative data is also called a “Variable”, as its value can change.
- Qualitative: When the data cannot be expressed in numerical terms, e.g. religion, caste. Qualitative data is also called an “Attribute”.
Variables and attributes can each be classified further: variables into discrete and continuous data, and attributes into nominal and ordinal data.
Suppose I ask you, “How many members are there in your family?” Your answer will be 2 or 3 or 5 or 10, etc. These are all isolated values, and a count cannot fall in between them. This type of data is known as “Discrete Data”. Now if I want to know your height, you might answer 5.8888 feet or 6.2222 feet. So your height can take any value within a specific range, i.e. between 5 and 6 or between 6 and 7. This is an example of “Continuous Data”. Discrete and continuous data both come under the category of “Variable”.
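The difference can be sketched in R as follows. This is only an illustration; the values and the variable names are assumptions chosen to match the examples above.

```r
# Hypothetical answers to "How many members are there in your family?"
family_size <- c(2L, 3L, 5L, 10L)   # whole-number counts: discrete

# Hypothetical heights in feet
height_ft <- c(5.8888, 6.2222)      # any value within a range: continuous

# Counts are always whole numbers
print(all(family_size == round(family_size)))

# Between any two heights there is always another possible height
midpoint <- mean(height_ft)
print(midpoint > min(height_ft) && midpoint < max(height_ft))
```

Storing counts as integers and measurements as decimals is a common convention, though R does not force it on you.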
There are some qualities, or attributes, which cannot be compared or rank ordered. Color of eyes, color of hair and religion cannot be ranked. As an individual you might have your own preference, but logically you cannot say which religion is the best. We call this kind of quality “Nominal”. When the quality can be ranked, it is “Ordinal Data”. Suppose you are asked to give feedback on this post by giving stars, where 1 star means “Very Bad”, 2 stars means “Bad”, 3 stars means “Good”, 4 stars means “Very Good” and 5 stars means “Excellent”. Here you are ranking the quality of this post. This is what we call ordinal data.
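One way to represent the two kinds of attribute in R is with factors: an ordered factor for ordinal data and a plain factor for nominal data. The sketch below uses made-up ratings and eye colors, and the variable names are assumptions for illustration only.

```r
# Hypothetical star ratings given by five readers (made-up data)
stars <- c(5, 3, 4, 1, 4)
rating_labels <- c("Very Bad", "Bad", "Good", "Very Good", "Excellent")

# Ordinal: the labels have a meaningful rank, so mark the factor as ordered
rating <- factor(stars, levels = 1:5, labels = rating_labels, ordered = TRUE)
print(rating)

# Rank comparisons are allowed on an ordered factor
at_least_good <- rating >= "Good"
print(at_least_good)

# Nominal: eye colors have no rank, so a plain (unordered) factor is used
eye_color <- factor(c("brown", "blue", "green", "brown"))
print(is.ordered(eye_color))
```

Asking whether one eye color is “greater than” another has no meaning, which is exactly why the nominal factor is left unordered.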
Now, if you have worked in R, you might know that R treats some data as a “Factor”. A factor is nothing but “Categorical Data”.
What is Categorical Data?
A categorical variable represents data which may be divided into a finite number of groups. Gender is an example of a categorical variable. If we divide gender into groups, we will get mainly 3 categories: “Male”, “Female” and “Others”. If R is treating a variable as a factor, you can check the structure of that variable (for example with str()) to see its different levels. If you want to convert a variable into a factor, you can use the as.factor() function.
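Here is a small sketch of that conversion. The gender values are made-up sample data and the variable names are assumptions; as.factor(), str(), levels() and table() are standard base R functions.

```r
# Hypothetical gender column stored as plain text (made-up data)
gender <- c("Male", "Female", "Others", "Female", "Male")
print(class(gender))          # a character vector, not yet a factor

# Convert to a factor with as.factor()
gender_factor <- as.factor(gender)

# str() shows the structure, including the factor's levels
str(gender_factor)

# levels() lists the categories; table() counts observations in each
print(levels(gender_factor))
print(table(gender_factor))
```

Note that as.factor() picks the levels from the distinct values it finds, so a category that never appears in the data will not show up as a level.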
Hope you find the post useful. I will be back very soon with another interesting topic in “Statistics”. For any clarification or suggestion, feel free to write in the comment box. Till then, “Keep on Learning, Keep on Practicing”.