What are descriptive statistics?
Descriptive statistics summarize and organize the characteristics of a data set, which may be a sample or an entire population.
- In quantitative research, after collecting data, the first step of statistical analysis is to describe the characteristics of the responses, such as the average of one variable (e.g., age), or the relation between two variables (e.g., age and creativity).
- The next step is inferential statistics, which helps you decide whether your data confirms or refutes your hypothesis and whether it is generalizable to a larger population.
Types of descriptive statistics
There are three main types of descriptive statistics:
- Central tendency: describes the averages of the data points.
- Variability: describes the variation between the values.
- Distribution: describes the frequency of each value.
Central tendency
What are the measures of central tendency?
- A measure of central tendency is a single value that describes the middle or centre of the whole data set.
- It is also called a measure of centre or central location.
- Each measure gives a different indication of the central value of the distribution.
- There are three main measures of central tendency:
  - Mean
  - Mode
  - Median
Mean (Arithmetic)
The mean is the sum of each value divided by the number of observations. This is also known as the arithmetic average.
- The mean (or average) is the most popular and well-known measure of central tendency.
- It can be used with both discrete and continuous data.
- Because the mean includes every value in the distribution, it is influenced by outliers and skewed distributions.
- If the data values are x1, x2, …, xn and n is the number of values, the mean is (x1 + x2 + … + xn) / n.
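As a minimal sketch of the definition above, the mean can be computed by hand and checked against Python's standard `statistics` module. The `ages` values are made up for illustration and do not come from the article.

```python
from statistics import mean

# Hypothetical sample of ages (illustrative values only).
ages = [23, 25, 31, 35, 36]

# Sum of the values divided by the number of observations.
manual_mean = sum(ages) / len(ages)

print(manual_mean)                 # 30.0
print(manual_mean == mean(ages))   # True
```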
Mode
- The mode is the most frequently occurring value in a data set or distribution.
- In a histogram, it corresponds to the highest bar.
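One way to find the mode is to count how often each value occurs and pick the most frequent one. The sketch below does this with `collections.Counter` and `statistics.multimode` (Python 3.8+); the `scores` list is an assumed example, not data from the article.

```python
from collections import Counter
from statistics import multimode

# Hypothetical quiz scores (illustrative values only).
scores = [7, 8, 8, 9, 8, 6, 9]

# Count how often each value occurs; the most frequent value is the mode.
counts = Counter(scores)
value, frequency = counts.most_common(1)[0]
print(value, frequency)          # 8 3

# multimode returns every value tied for the highest frequency.
print(multimode(scores))         # [8]
print(multimode([1, 1, 2, 2]))   # [1, 2]
```

A data set can have more than one mode, which is why `multimode` returns a list.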
Median
The median is the middle value of a distribution when the values are arranged in ascending or descending order.
- The median divides the distribution in half (50% of the observations lie on either side of the median value).
- In a distribution with an odd number of observations, the median is the middle value.
- If the number of observations is even, the median is the average of the two middle values.
- The median is less affected by outliers and skewed data.
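The odd and even cases described above can be sketched as a small function. `median_by_hand` is a hypothetical helper written for this illustration; the standard library's `statistics.median` is used only as a cross-check.

```python
from statistics import median

def median_by_hand(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [3, 9, 5, 1, 7]         # sorted: 1 3 5 7 9
even_data = [3, 9, 5, 1, 7, 11]    # sorted: 1 3 5 7 9 11

print(median_by_hand(odd_data))    # 5
print(median_by_hand(even_data))   # 6.0
```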
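Application of mean, median and mode by data type
- The best measure of central tendency depends on the type of variable: the mode suits nominal (categorical) data, the median suits ordinal data and skewed numerical data, and the mean suits roughly symmetric numerical data.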
How do outliers influence the measures of central tendency?
- Outliers are extreme data value(s) that are notably different from the rest of the data.
- Outliers can alter the values of the mean, median, and mode.
- The mean is more sensitive to the existence of outliers than the median or mode.
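A small experiment illustrates the difference in sensitivity. The `salaries` values below are invented for this sketch; one extreme value is appended and the three measures are recomputed.

```python
from statistics import mean, median, mode

# Hypothetical salaries in thousands (illustrative values only).
salaries = [30, 32, 32, 35, 38]
with_outlier = salaries + [400]   # one extreme value added

print(mean(salaries), median(salaries), mode(salaries))              # 33.4 32 32
print(mean(with_outlier), median(with_outlier), mode(with_outlier))  # 94.5 33.5 32
```

In this example the mean nearly triples, the median moves only slightly, and the mode does not change.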
Variability or dispersion
- The goal of measuring variability is to quantify how spread out the scores are in a distribution.
- A measure of variability usually accompanies a measure of central tendency as basic descriptive statistics for a set of scores.
- Variability serves both as a descriptive measure and as an important component of most inferential statistics.
- As a descriptive statistic, variability measures the degree to which the scores are spread out or clustered together in a distribution.
- In the context of inferential statistics, variability provides a measure of how accurately an individual score or sample represents the entire population.
- When the population variability is small, all of the scores are clustered close together and any individual score or sample will necessarily provide a good representation of the entire set.
- When variability is large and scores are widely spread, it is easy for one or two extreme scores to give a distorted picture of the general population.
Why Understanding Variability is Important
- Low dispersion indicates that the data points tend to be clustered tightly around the centre.
- High dispersion indicates that the data points are spread far from the centre.
Measuring Variability
- Variability is determined by measuring distance. It can be measured by calculating:
  - Range
  - Interquartile range
  - Standard deviation or variance
a) Range
- The range is the simplest measure of variability.
- Take the smallest number and subtract it from the largest number to calculate the range. This shows the spread of our data.
- The range is sensitive to outliers or values that are significantly higher or lower than the rest of the data set, and should not be used when outliers are present.
- The range is the total distance covered by the distribution, from the highest value to the lowest value.
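As a brief illustration, the range is a one-line calculation, and adding a single extreme value shows why it is sensitive to outliers. The `scores` list is assumed for the example.

```python
# Hypothetical scores (illustrative values only).
scores = [4, 6, 7, 8, 9, 10]

# Range = largest value minus smallest value.
data_range = max(scores) - min(scores)
print(data_range)            # 6

# A single outlier stretches the range dramatically.
with_outlier = scores + [50]
outlier_range = max(with_outlier) - min(with_outlier)
print(outlier_range)         # 46
```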
b) Interquartile range
- The IQR, or the "middle fifty", is the range of the middle fifty percent of the data. Because it only considers the middle values, it is largely unaffected by outliers.
- The interquartile range is the distance covered by the middle 50% of the distribution: the difference between the third quartile (Q3) and the first quartile (Q1).
IQR = Q3 - Q1
- Steps to calculate the IQR:
  - List the data in numerical order.
  - Find the median of the whole data set.
  - Treat the data points below the median as the Q1 zone (lower half).
  - Treat the data points above the median as the Q3 zone (upper half).
  - Find the median of the data in the Q1 zone; this is Q1.
  - Find the median of the data in the Q3 zone; this is Q3.
  - Find the interquartile range using the formula IQR = Q3 - Q1.
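The steps above can be sketched as follows. `iqr_median_split` is a hypothetical helper name, the data are invented, and the sketch assumes the median itself is left out of both halves when the number of values is odd. Other quartile conventions (for example those in `statistics.quantiles`) can give slightly different results.

```python
from statistics import median

def iqr_median_split(values):
    """Return (Q1, Q3, IQR) using the median-split steps described in the text."""
    if len(values) < 2:
        raise ValueError("need at least two data points")
    ordered = sorted(values)                  # list the data in numerical order
    half = len(ordered) // 2
    lower = ordered[:half]                    # Q1 zone: points below the median
    upper = ordered[len(ordered) - half:]     # Q3 zone: points above the median
    q1 = median(lower)
    q3 = median(upper)
    return q1, q3, q3 - q1

# Hypothetical data (illustrative values only); the median is 7.
data = [1, 3, 4, 6, 7, 8, 9, 12, 15]
q1, q3, iqr = iqr_median_split(data)
print(q1, q3, iqr)   # 3.5 10.5 7.0
```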
c) Standard deviation
- Standard deviation measures the typical (standard) distance between a data value and the mean.
- Follow these steps to calculate the standard deviation:
  - Find the mean of all values.
  - Subtract the mean from each data point to get its distance from the mean.
  - Square each distance.
  - Add up all of the squared distances.
  - For a population: divide the sum of the squared distances by N (the number of data points in the population).
  - For a sample: divide the sum of the squared distances by n - 1 (where n is the number of data points in the sample).
  - Take the square root of the result (the variance) to get the standard deviation.
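The steps above translate almost line for line into code. The function name `standard_deviation`, its `sample` flag, and the `scores` data are assumptions made for this sketch; `statistics.pstdev` and `statistics.stdev` serve as cross-checks for the population and sample versions.

```python
from math import sqrt
from statistics import pstdev, stdev

def standard_deviation(values, sample=False):
    """Standard deviation following the steps in the text."""
    n = len(values)
    m = sum(values) / n                               # mean of all values
    squared = [(x - m) ** 2 for x in values]          # squared distances from the mean
    ss = sum(squared)                                 # sum of squared distances
    variance = ss / (n - 1) if sample else ss / n     # n - 1 for a sample, N for a population
    return sqrt(variance)

# Hypothetical scores (illustrative values only); mean = 5, sum of squares = 32.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(standard_deviation(scores))                          # 2.0
print(round(standard_deviation(scores, sample=True), 3))   # 2.138
```

Dividing by n - 1 makes the sample value slightly larger than the population value computed from the same numbers.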
Properties of the Standard Deviation
- If a constant is added to every score in a distribution, the standard deviation will not be changed.
- If you visualize the scores in a frequency distribution histogram, then adding a constant will move each score so that the entire distribution is shifted to a new location.
- The centre of the distribution (the mean) changes, but the standard deviation remains the same.
- If each score is multiplied by a constant, the standard deviation will be multiplied by the same constant.
- Multiplying by a constant will multiply the distance between scores, and because the standard deviation is a measure of distance, it will also be multiplied.
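Both properties can be checked numerically. The sketch below uses an assumed set of scores with a population standard deviation of 2, then adds 10 to every score and, separately, multiplies every score by 3.

```python
from statistics import mean, pstdev

# Hypothetical scores (illustrative values only).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

shifted = [x + 10 for x in scores]   # add a constant to every score
scaled = [x * 3 for x in scores]     # multiply every score by a constant

print(mean(scores), mean(shifted))                       # 5 15
print(pstdev(scores), pstdev(shifted), pstdev(scaled))   # 2.0 2.0 6.0
```

For a negative constant, the standard deviation is multiplied by the constant's absolute value, since a distance cannot be negative.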
Descriptive Statistics: Mean and Standard Deviation
- If you are given numerical values for the mean and the standard deviation, you should be able to construct a visual image (or a sketch) of the distribution of scores.
- As a general rule for a roughly normal (bell-shaped) distribution, about 68% (roughly 70%) of the values will be within one standard deviation of the mean, and about 95% will be within two standard deviations of the mean.
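These proportions can be checked for an exactly normal distribution, which is an assumption of this sketch; real data will only match approximately. It uses `statistics.NormalDist` from the standard library (Python 3.8+).

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)

# Proportion of values within one and two standard deviations of the mean.
within_one = z.cdf(1) - z.cdf(-1)
within_two = z.cdf(2) - z.cdf(-2)

print(round(within_one, 3), round(within_two, 3))   # 0.683 0.954
```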
Difference between Central Tendency and Variability
- Central tendency describes the central point of the distribution, and variability describes how the data values are scattered around that central point.
- Together, central tendency and variability are the two primary values that are used to describe a distribution of a population or a sample.
Frequency distributions
A frequency distribution is an organized tabulation of the number of individuals located in each category on the scale of measurement.
The following set of N = 20 scores was obtained from a 10-point statistics quiz. We will organize these scores by constructing a frequency distribution table. Scores:
8, 9, 8, 7, 10, 9, 6, 4, 9, 8, 7, 8, 10, 9, 8, 6, 9, 7, 8, 8
Frequency Table:
X | f
10 | 2
9 | 5
8 | 7
7 | 3
6 | 2
5 | 0
4 | 1
- The score (X) is in the first column.
- The frequency (f) associated with each score is recorded in the second column.
- The X values in the frequency distribution table represent the scale of measurement, not the actual set of scores. Example: the X column lists the value 10 only once, but the frequency column indicates that there are actually two scores of X = 10.
- The highest score is X = 10, and the lowest score is X = 4.
- No one had a score of X = 5.
- Plotting score against frequency on a bar graph gives a visual picture of the frequency distribution.
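The frequency distribution for the N = 20 quiz scores above can be tallied with `collections.Counter`. The sketch below lists every point on the scale from the highest to the lowest score, including X = 5 with a frequency of zero, and prints a simple text bar for each frequency. The output layout is an illustrative choice, not the article's original figure.

```python
from collections import Counter

# The 20 quiz scores from the text.
scores = [8, 9, 8, 7, 10, 9, 6, 4, 9, 8, 7, 8, 10, 9, 8, 6, 9, 7, 8, 8]

counts = Counter(scores)

# One row per point on the scale, highest to lowest; missing scores get frequency 0.
table = [(x, counts[x]) for x in range(max(scores), min(scores) - 1, -1)]

print(" X  f")
for x, f in table:
    print(f"{x:>2} {f:>2} {'#' * f}")   # '#' marks act as a text bar graph
```

The frequencies add up to 20, the total number of scores.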
Frequency Bar Graph
- The frequency of each object is calculated.
- A bar graph is then plotted of object against frequency:
  - On the x-axis: objects
  - On the y-axis: frequencies