
Introduction (Some More Classifications of Data Only) : Part - 3

Statistical Knowledge | Statistics Series | Article - 3

Hello friends, this is the third article on my blog. In it, we'll continue our series of introductory articles; this is Part - 3 of that series. Now, it's time to dive into the article 😊

Aim of Article

In this article, we'll discuss five more classifications of Data only (not of Variables).

Primary and Secondary Data

Primary Data : Data that are collected by the researcher or investigator himself/herself, for his/her own research or study work, are known as Primary Data.
  • For Example : Data collected by a student (say, Ram) himself in the physics laboratory for his practical work are an example of Primary Data.

Secondary Data : Data that are collected by someone else, and that the researcher or investigator only uses for his/her research or study work, are known as Secondary Data.
  • For Example : If the data collected by Ram are also used by another student (say, Gita) to complete her practical work, then for Gita the same data become an example of Secondary Data.

Note : If I ask Ram and Gita, "What type of data did you use to complete your practical work?", Ram will say "Primary Data" while Gita will say "Secondary Data", if both are honest 😂

I have given a very basic example to explain the concept of primary and secondary data. For now, this is enough, because our aim in this article is only to understand the classifications of data. There will be a separate article on Primary and Secondary Data in which we'll discuss them in more detail.

Ungrouped and Grouped Data

Ungrouped Data : If data collected on a variable are given as individual observations (data points), then such data are known as Ungrouped Data.
  • For Example : Consider the data on age (in years) of 20 persons : 18, 20, 12, 15, 29, 34, 19, 36, 45, 46, 28, 25, 17, 14, 26, 39, 49, 59, 65, 70. This is an example of ungrouped data, since the value of the age variable is available for each and every individual.

Grouped Data : If data collected on a variable are given as non-overlapping groups or intervals (more appropriately termed Class Intervals), formed by aggregating individual observations (data points), then such data are known as Grouped Data.
  • For Example : Suppose the data on age (in years) of 20 persons are collected in the following manner -
    • If the age of an individual (person) lies in the set {11, 12, 13,...., 20}, record the data as 11-20
    • If the age of an individual (person) lies in the set {21, 22, 23,...., 30}, record the data as 21-30
    • If the age of an individual (person) lies in the set {31, 32, 33,...., 40}, record the data as 31-40, and so on...
Then, if the actual ages (in years) of those 20 persons are 18, 20, 12, 15, 29, 34, 19, 36, 45, 46, 28, 25, 17, 14, 26, 39, 49, 59, 65, 70, the collected data on age will be 11-20, 11-20, 11-20, 11-20, 21-30, 31-40, 11-20, 31-40, 41-50, 41-50, 21-30, 21-30, 11-20, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 61-70. Such data (recorded using the class intervals 11-20, 21-30, 31-40, 41-50, 51-60, 61-70) are known as grouped data.
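The recording rule above can be tried out in a few lines of Python. This is only a minimal sketch: the function name `inclusive_interval` and its parameters `start` and `width` are my own choices for illustration, and it assumes whole-number ages of 11 or more with intervals of width 10.

```python
# Actual ages (in years) of the 20 persons from the example.
ages = [18, 20, 12, 15, 29, 34, 19, 36, 45, 46,
        28, 25, 17, 14, 26, 39, 49, 59, 65, 70]

def inclusive_interval(age, start=11, width=10):
    """Return the class interval that contains `age`, e.g. 20 -> '11-20'.

    Assumes integer ages >= start; both limits belong to the interval.
    """
    index = (age - start) // width      # 0 for 11..20, 1 for 21..30, ...
    lower = start + index * width
    upper = lower + width - 1           # upper limit is part of the interval
    return f"{lower}-{upper}"

# Ungrouped data in, grouped data out.
grouped = [inclusive_interval(age) for age in ages]
print(grouped)
```

Each age is replaced by the label of its class interval, so the output list has the same 20 entries as the grouped data written above.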


Note : Mainly, we can form these groups or class intervals in two ways, namely -
  1. Inclusive Group/Interval : A group or interval whose upper limit (or point) is also included in the same interval is known as an inclusive interval. In the above example, I have made inclusive class intervals, since the interval 11-20 includes its own upper limit (20). Similarly, 21-30 includes 30, 31-40 includes 40, 41-50 includes 50, 51-60 includes 60 and 61-70 includes 70, and hence all the intervals are of inclusive type.
  2. Exclusive Group/Interval : A group or interval whose upper limit (or point) is not included in the same interval, but is included in the next interval, is known as an exclusive interval. For the same example as above, exclusive groups or class intervals can be formed as : 11-21, 21-31, 31-41, 41-51, 51-61, 61-71. Here, 21 (the upper limit of the interval 11-21) is not included in the interval 11-21 but is included in the next interval, 21-31. The other intervals are interpreted in the same way.
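To see the difference between the two types at the boundary values, here is a small Python sketch. The function names `inclusive_label` and `exclusive_label`, and the parameters `start` and `width`, are hypothetical names used only for this illustration; whole-number ages of 11 or more are assumed.

```python
def inclusive_label(age, start=11, width=10):
    """Inclusive intervals 11-20, 21-30, ...: the upper limit belongs to the interval."""
    lower = start + ((age - start) // width) * width
    return f"{lower}-{lower + width - 1}"

def exclusive_label(age, start=11, width=10):
    """Exclusive intervals 11-21, 21-31, ...: the upper limit belongs to the NEXT interval."""
    lower = start + ((age - start) // width) * width
    return f"{lower}-{lower + width}"

# Compare the two schemes around the boundary between the first two intervals.
for age in (20, 21, 30, 31):
    print(age, inclusive_label(age), exclusive_label(age))
```

For example, age 21 is labelled 21-30 under the inclusive scheme and 21-31 under the exclusive scheme; it never falls in 11-21, even though 21 appears as that interval's upper limit.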

Here, you may have some questions in mind, such as : how do we decide the width of the groups (intervals), or what is a suitable number of groups (intervals) to form in a given problem? There is a method to answer such questions. We'll talk about it in more detail in upcoming articles.

Non-Frequency and Frequency Data

Non-Frequency Data : Data where the identity of each individual value has to be kept in view are called non-frequency data.
  • For Example : Consider the data on age (in years) of 20 persons : 18, 20, 12, 15, 29, 34, 19, 36, 45, 46, 28, 25, 17, 14, 26, 39, 49, 59, 65, 70, and keep these data as they are. Then, they are an example of non-frequency data, since the identity of each individual value of age is maintained. It means that we know exactly what the 20 individual values of age are.

Frequency Data : Data where the identity of each individual value is not important at all are called frequency data. In this type of data, we are simply interested in the characteristics of the groups or intervals formed by aggregating individual observations (or values).
  • For Example : Again, consider the data on age (in years) of the same 20 persons, but this time recorded in the form of groups (intervals) as -
    • If the age of an individual (person) lies in the set {11, 12, 13,...., 20}, record the data as 11-20
    • If the age of an individual (person) lies in the set {21, 22, 23,...., 30}, record the data as 21-30
    • If the age of an individual (person) lies in the set {31, 32, 33,...., 40}, record the data as 31-40, and so on...
Then, the collected data on age will be 11-20, 11-20, 11-20, 11-20, 21-30, 31-40, 11-20, 31-40, 41-50, 41-50, 21-30, 21-30, 11-20, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 61-70. Now, these grouped data are an example of frequency data, since the identity of each individual value of age is lost due to grouping. How?

Suppose I ask someone, "Tell me the actual ages of these 20 persons by looking only at the grouped data (that is, 11-20, 11-20, 11-20, 11-20, 21-30, 31-40, 11-20, 31-40, 41-50, 41-50, 21-30, 21-30, 11-20, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 61-70)". He or she will never be able to tell the actual ages of those 20 persons, which means that the individual values of age can no longer be identified. That's why it is written above that the identity of each individual value of age is lost due to grouping.
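A short Python sketch can make this loss of identity concrete. It counts how many persons fall in each interval using `collections.Counter` from the standard library; the helper name `interval_of` is hypothetical, and whole-number ages with inclusive intervals of width 10 starting at 11 are assumed.

```python
from collections import Counter

# Non-frequency data: every individual age is known.
ages = [18, 20, 12, 15, 29, 34, 19, 36, 45, 46,
        28, 25, 17, 14, 26, 39, 49, 59, 65, 70]

def interval_of(age, start=11, width=10):
    """Map an integer age to its inclusive class interval, e.g. 34 -> '31-40'."""
    lower = start + ((age - start) // width) * width
    return f"{lower}-{lower + width - 1}"

# Frequency data: only the number of persons in each interval is kept.
frequencies = Counter(interval_of(age) for age in ages)
for interval in sorted(frequencies):
    print(interval, frequencies[interval])
# 11-20 7
# 21-30 4
# 31-40 3
# 41-50 3
# 51-60 1
# 61-70 2
```

From these counts alone we can say that 7 persons are aged between 11 and 20, but we cannot say whether any of them is 18 or 12. That is exactly the information given up when moving from non-frequency to frequency data.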


Univariate, Bivariate and Multivariate Data

Univariate Data : If we have data on a single variable only, then data is known as Univariate Data.
  • For Example : Data on Age of a group of persons is an example of Univariate Data.



Bivariate Data : If we have data on two different variables simultaneously, then data is known as Bivariate Data.
  • For Example : Data on Age and Gender of a group of persons is an example of Bivariate Data.


Multivariate Data : If we have data on more than two different variables simultaneously, then data is known as Multivariate Data.
  • For Example : Data on Age, Gender, Education, Weight, Income of a group of persons is an example of Multivariate Data.
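As a rough illustration, the three cases differ only in how many variables are recorded for each person. In the Python sketch below, the three records are invented purely for illustration (they are not real data), and the names `select` and `classify` are hypothetical.

```python
# Made-up records for three persons, invented only to illustrate the idea.
people = [
    {"age": 18, "gender": "F", "education": "Intermediate", "weight": 52.0, "income": 0},
    {"age": 34, "gender": "M", "education": "Graduate", "weight": 70.5, "income": 30000},
    {"age": 45, "gender": "F", "education": "Postgraduate", "weight": 61.0, "income": 55000},
]

def select(records, variables):
    """Keep only the chosen variables, one tuple per person."""
    return [tuple(record[v] for v in variables) for record in records]

def classify(variables):
    """Name the type of data by the number of variables observed together."""
    if len(variables) < 1:
        raise ValueError("at least one variable is needed")
    if len(variables) == 1:
        return "univariate"
    if len(variables) == 2:
        return "bivariate"
    return "multivariate"

for variables in (["age"],
                  ["age", "gender"],
                  ["age", "gender", "education", "weight", "income"]):
    print(classify(variables), select(people, variables))
```

The same group of persons gives univariate, bivariate or multivariate data depending on how many variables we choose to look at simultaneously.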



Cross-Sectional and Time-Series Data


Cross-Sectional Data : Data that are collected on some (one or more) variables at a single time point are known as cross-sectional data.
  • For Example : Suppose we have collected data on the height, weight and age of the intermediate students of session 2014-15 of a particular school. These data are an example of cross-sectional data because they have been collected at a single time point, that is, 2014-15.

Time-Series Data : Data that are collected on some (one or more) variables at various time points are known as time-series data.
  • For Example : Again, suppose we have collected data on the height, weight and age of the intermediate students of sessions 2014-15, 2015-16, 2016-17, 2017-18, 2018-19, 2019-20 and 2020-21 of a particular school. Now, these data are an example of time-series data because they have been collected at various time points, namely 2014-15, 2015-16, 2016-17, 2017-18, 2018-19, 2019-20 and 2020-21.

Note : Time points may be hours, days, months, years or any other unit of time. In the above example, I have taken the time point to be a complete year. Session 2014-15 represents a complete year running from 1 July 2014 to 30 June 2015.
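The distinction depends only on how many distinct time points the data cover, which can be sketched in Python as follows. The function name `classify_by_time` is hypothetical, and the sessions are written as plain text labels.

```python
def classify_by_time(time_points):
    """Return 'cross-sectional' for one distinct time point, 'time-series' for several."""
    distinct = set(time_points)
    if not distinct:
        raise ValueError("at least one time point is needed")
    return "cross-sectional" if len(distinct) == 1 else "time-series"

# Sessions taken from the two examples above.
single_session = ["2014-15"]
many_sessions = ["2014-15", "2015-16", "2016-17", "2017-18",
                 "2018-19", "2019-20", "2020-21"]

print(classify_by_time(single_session))   # cross-sectional
print(classify_by_time(many_sessions))    # time-series
```

Note that the variables (height, weight and age) are the same in both cases; only the number of time points changes the classification.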

Work for you : In the comment box, write at least one example of each type of data covered in this article.

So, that's all about the classifications of data in this article. In the next article, we'll talk about the next topic, which is the Presentation of Data. Till then, Good Bye !

Happy Learning ! 😊

If you find any mistakes or have any suggestions, just let me know using the Suggestion Form given below (for mobile users) or in the sidebar (for laptop users). Thank you in advance ! 😊
