Descriptive Statistics

Asit

3 years ago

What is descriptive statistics?

Descriptive statistics summarize and organize the characteristics of a data set, which may be a sample or an entire population.

In quantitative research, after collecting data, the first step of statistical analysis is to describe the characteristics of the responses, such as the average of one variable (e.g., age), or the relation between two variables (e.g., age and creativity).
The next step is inferential statistics, which helps you decide whether your data confirms or refutes your hypothesis and whether it is generalizable to a larger population.

Types of descriptive statistics

There are 3 main types of descriptive statistics:

1. Central tendency: describes the centre (average) of the data points.
2. Variability: describes the variation between the values.
3. Distribution: describes the frequency of each value.

Central tendency

What are the measures of central tendency?

A measure of central tendency is a single value that describes the middle or centre of a whole data set.
It is also called a measure of centre or central location.
Each measure gives a different indication of the central value in the distribution.
There are three main measures of central tendency:
- Mean
- Mode
- Median

Mean (Arithmetic)

The mean is the sum of each value divided by the number of observations. This is also known as the arithmetic average.

The mean (or average) is the most popular and well-known measure of central tendency.
It can be used with both discrete and continuous data.
Because the mean includes every value in the distribution, it is influenced by outliers and skewed distributions.
If the data values are $x_1, x_2, \dots, x_n$ and $n$ is the number of observations, the mean is $\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$.
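
The definition of the mean can be sketched in a few lines of Python. The `ages` list is made-up data used only for illustration; the manual calculation is compared with `statistics.mean` from the standard library.

```python
from statistics import mean

# Hypothetical ages, invented for this example.
ages = [23, 25, 31, 35, 36]

# Sum of the values divided by the number of observations.
manual_mean = sum(ages) / len(ages)

# The standard library computes the same quantity.
library_mean = mean(ages)

print(manual_mean, library_mean)
```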

Mode

The mode is the most frequently occurring value in a data set or distribution.
In a bar chart or histogram, it corresponds to the highest bar.
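
As a small illustration, the mode can be found by counting how often each value occurs. The `shoe_sizes` and `tied` lists are invented data. `statistics.multimode` (Python 3.8+) is shown as well because a data set can have more than one mode.

```python
from collections import Counter
from statistics import mode, multimode

# Hypothetical shoe sizes, invented for this example.
shoe_sizes = [7, 8, 8, 9, 8, 10, 9]

# Count occurrences and pick the most frequent value.
counts = Counter(shoe_sizes)
most_common_value, frequency = counts.most_common(1)[0]

# The standard library gives the same answer.
library_mode = mode(shoe_sizes)

# When several values tie for most frequent, multimode returns all of them.
tied = [1, 1, 2, 2, 3]
all_modes = multimode(tied)

print(most_common_value, frequency, library_mode, all_modes)
```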

Median

The median is the middle value in a distribution when the values are arranged in ascending or descending order.

The median divides the distribution in half (there are 50% of observations on either side of the median value).

In a distribution with an odd number of observations, the median value is the middle value.

If the number of observations is even, the median is the average of the two middle values.
The median is less affected by outliers and skewed data.
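
The odd and even cases described above can be sketched as follows. The helper name `middle_value` and both lists are hypothetical, chosen only to show the two branches; `statistics.median` is used as a cross-check.

```python
from statistics import median

def middle_value(values):
    """Return the median by sorting and taking the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: the single middle value.
        return ordered[mid]
    # Even number of observations: average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [3, 1, 7, 5, 9]        # sorted: 1, 3, 5, 7, 9
even_data = [3, 1, 7, 5, 9, 11]   # sorted: 1, 3, 5, 7, 9, 11

print(middle_value(odd_data), median(odd_data))
print(middle_value(even_data), median(even_data))
```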

Application of Mean, median and mode with data type

The best measure of central tendency with respect to the type of variable

How do outliers influence the measures of central tendency?

Outliers are extreme data values that are notably different from the rest of the data.
Outliers can alter the mean, median, and mode.
The mean is more sensitive to the existence of outliers than the median or mode.
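
A quick way to see this sensitivity is to add one extreme value to a small data set and compare the results. The salary figures below are invented for illustration only.

```python
from statistics import mean, median

# Hypothetical salaries in thousands, invented for this example.
salaries = [30, 32, 35, 36, 37]
with_outlier = salaries + [200]

# The mean moves a long way when the outlier is added.
print(mean(salaries), mean(with_outlier))

# The median barely changes.
print(median(salaries), median(with_outlier))
```

In this sketch the mean rises from 34 to more than 61, while the median only moves from 35 to 35.5.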

Variability or dispersion

The goal of measuring variability is to obtain a measure of how spread out the scores are in a distribution.
A measure of variability usually accompanies a measure of central tendency as basic descriptive statistics for a set of scores.
Variability serves both as a descriptive measure and as an important component of most inferential statistics.
As a descriptive statistic, variability measures the degree to which the scores are spread out or clustered together in a distribution.
In the context of inferential statistics, variability provides a measure of how accurately an individual score or sample represents the entire population.
When the population variability is small, all of the scores are clustered close together and any individual score or sample will necessarily provide a good representation of the entire set.
When variability is large and scores are widely spread, it is easy for one or two extreme scores to give a distorted picture of the general population.

Why Understanding Variability is Important

A low dispersion indicates that the data points tend to be clustered tightly around the centre.
High dispersion signifies that the data points are spread far from the centre.

Measuring Variability

Variability is determined by measuring distance. It can be measured by calculating:

1. Range
2. Interquartile range
3. Standard deviation or variance

a) Range

The range is the simplest measure of variability.
Take the smallest number and subtract it from the largest number to calculate the range. This shows the spread of our data.
The range is sensitive to outliers or values that are significantly higher or lower than the rest of the data set, and should not be used when outliers are present.
The range is the total distance covered by the distribution, from the highest value to the lowest value.
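
As a minimal sketch, the range is one subtraction. The `values` list is made-up data, and the second list shows how a single outlier inflates the result.

```python
# Hypothetical data, invented for this example.
values = [4, 8, 15, 16, 23, 42]

# Largest value minus smallest value.
data_range = max(values) - min(values)
print(data_range)  # 38

# One extreme value changes the range dramatically.
values_with_outlier = values + [400]
range_with_outlier = max(values_with_outlier) - min(values_with_outlier)
print(range_with_outlier)  # 396
```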

b) Interquartile range

The IQR, or the middle fifty, is the range of the middle fifty percent of the data. Because the IQR only considers the middle values, it is largely unaffected by outliers.
The interquartile range is the distance covered by the middle 50% of the distribution (the difference between Q3 and Q1).

IQR = Q3 - Q1

Steps to calculate the IQR:
1. List the data in numerical order.
2. Find the median.
3. Treat the data points above the median as the Q3 zone.
4. Treat the data points below the median as the Q1 zone.
5. Find the median of the data in the Q1 zone. This is Q1.
6. Find the median of the data in the Q3 zone. This is Q3.
7. Find the interquartile range using the formula IQR = Q3 - Q1.
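
The steps above can be sketched as a short function. The name `iqr_by_halves` and the sample data are hypothetical. This sketch assumes the median-of-halves convention, in which the median itself is left out of both zones when the number of observations is odd. Library routines that compute quartiles may use other interpolation rules and return slightly different values.

```python
from statistics import median

def iqr_by_halves(values):
    """Return (Q1, Q3, IQR) using the median-of-halves method."""
    ordered = sorted(values)          # Step 1: numerical order
    n = len(ordered)
    half = n // 2
    # Steps 2-4: split around the median; for odd n the median
    # itself is excluded from both zones (an assumption of this sketch).
    lower_zone = ordered[:half]
    upper_zone = ordered[half + (n % 2):]
    q1 = median(lower_zone)           # Step 5
    q3 = median(upper_zone)           # Step 6
    return q1, q3, q3 - q1            # Step 7

even_data = [1, 3, 5, 7, 9, 11, 13, 15]
odd_data = [13, 1, 9, 3, 11, 5, 7]

print(iqr_by_halves(even_data))
print(iqr_by_halves(odd_data))
```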

c) Standard deviation

Standard deviation measures the typical distance between a data value and the mean.
Follow these steps to calculate the standard deviation:

1. Find the mean of all values.
2. Subtract the mean from each data point to get its distance from the mean.
3. Square each distance.
4. Add up all of the squared distances.
5. For a population: divide the sum of the squared distances by N (the number of data points in the population).
6. For a sample: divide the sum of the squared distances by n - 1 (where n is the number of data points in the sample).
7. Take the square root of the value from step 5 or 6 (the variance) to get the standard deviation.
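
The procedure can be written out step by step. The function name `standard_deviation`, its `sample` flag, and the `scores` list are hypothetical choices for this sketch. The results are checked against `statistics.pstdev` (population) and `statistics.stdev` (sample).

```python
import math
from statistics import pstdev, stdev

def standard_deviation(values, sample=False):
    """Follow the listed steps; divide by n - 1 when sample=True."""
    n = len(values)
    mean = sum(values) / n                           # Step 1
    distances = [x - mean for x in values]           # Step 2
    squared = [d ** 2 for d in distances]            # Step 3
    total = sum(squared)                             # Step 4
    variance = total / (n - 1 if sample else n)      # Step 5 or 6
    return math.sqrt(variance)                       # Step 7

# Hypothetical scores, invented for this example.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

population_sd = standard_deviation(scores)
sample_sd = standard_deviation(scores, sample=True)

print(population_sd, pstdev(scores))
print(sample_sd, stdev(scores))
```

For these scores the mean is 5 and the squared distances sum to 32, so the population variance is 4 and the population standard deviation is 2. The sample version divides by 7 instead of 8 and is therefore slightly larger.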

Properties of the Standard Deviation

If a constant is added to every score in a distribution, the standard deviation will not be changed.
If you visualize the scores in a frequency distribution histogram, then adding a constant will move each score so that the entire distribution is shifted to a new location.
The centre of the distribution (the mean) changes, but the standard deviation remains the same.
If each score is multiplied by a constant, the standard deviation will be multiplied by the same constant (by its absolute value if the constant is negative).
Multiplying by a constant multiplies the distance between scores, and because the standard deviation is a measure of distance, it is multiplied too.
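
Both properties can be checked numerically. The `scores` list and the constants 10 and 3 are arbitrary values picked for this sketch.

```python
import math
from statistics import mean, pstdev

# Hypothetical scores, invented for this example.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Add a constant to every score: the mean shifts, the spread does not.
shifted = [x + 10 for x in scores]

# Multiply every score by a constant: the spread is multiplied too.
scaled = [x * 3 for x in scores]

print(mean(scores), pstdev(scores))
print(mean(shifted), pstdev(shifted))
print(mean(scaled), pstdev(scaled))
```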