What is a five number summary in statistics?

The five number summary is a set of basic descriptive statistics that describes a set of data. It captures the shape, center, and spread of a sample in universal terms that can be used to analyze any sample, regardless of the underlying distribution. It consists of five key metrics: the minimum observed value, the first quartile (25th percentile), the median (the center), the third quartile (75th percentile), and the maximum observed value.

Why Is The Five Number Summary Important?

The five number summary is a concise description of a set of observations. It can be quickly calculated, describes the general shape of the distribution, identifies the likely range of values, and - most importantly - does not involve any assumptions about the shape of the underlying distribution. In this sense, the five number summary is a universal description of the key practical elements of a distribution of observations.

How to calculate the five number summary

Well, the simple way is to use our five number summary calculator. But if you're doing this by hand:

1. Sort the observations, ranking by value.
2. Count the total number of observations.
3. For each percentile, take the appropriate point in the ranked list.
4. If the precise percentile falls between two points, average the nearest two points.

And if this was your homework assignment, you're welcome.
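The hand calculation above can be sketched in Python. This is only an illustration under one stated assumption: the quartiles are taken as the medians of the lower and upper halves of the sorted data (a common textbook convention). The calculator on this page may use a different rule, and the function name five_number_summary is made up for this example.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a non-empty sequence.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data, leaving out the middle point when the count is odd.
    """
    data = sorted(values)          # step 1: rank the observations by value
    n = len(data)                  # step 2: count the observations
    if n == 0:
        raise ValueError("need at least one observation")
    half = n // 2
    lower = data[:half]            # observations below the median position
    upper = data[half + n % 2:]    # observations above the median position
    # With a single observation both halves are empty, so reuse that value.
    q1 = median(lower) if lower else data[0]
    q3 = median(upper) if upper else data[-1]
    # statistics.median averages the two middle points when needed (step 4).
    return data[0], q1, median(data), q3, data[-1]


print(five_number_summary([15, 3, 9, 1, 13, 7, 5, 11]))
# (1, 4.0, 8.0, 12.0, 15)
```

Other quartile conventions can give slightly different values for Q1 and Q3 on the same data, so small differences from a calculator or spreadsheet are normal.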

How Do You Find Q1 and Q3?

Well, the simple way is to use our five number summary calculator. But if you're doing this by hand: sort the list as described above (rank observations by value). Count the total number of records. Divide by 4. That is the position in the list of the 25th percentile (Q1, the 1st quartile). Multiply this amount by 3. That is the position in the list of the 75th percentile (Q3, the top of the 3rd quartile and the start of the upper quartile). If a position falls between two points, the general convention is to average the points. There are more complicated approaches (a weighted average) but this usually will suffice.

Values outside the Q1 to Q3 range are not automatically outliers; they are simply the lowest and highest quarters of the data. The first quartile runs from the minimum value to the 25th percentile, the second from the 25th percentile to the median, the third from the median to the 75th percentile, and the fourth from the 75th percentile to the maximum value. The distance between Q1 and Q3 is your interquartile range. The median splits the sample into a lower half (smallest value to middle value) and an upper half (middle value to largest value). This approach is independent of sample size.
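As a rough sketch, the "divide by 4, multiply by 3" rule can be written out literally in Python. The helper names value_at_fraction and q1_q3 are hypothetical, positions are counted from 1, and the clamping for very small samples is an added assumption rather than something the rule specifies.

```python
import math


def value_at_fraction(sorted_data, numerator, denominator=4):
    """Value at position n * numerator / denominator in a sorted list.

    Positions start at 1. A position that falls between two points is
    resolved by averaging its two neighbours.
    """
    n = len(sorted_data)
    if n == 0:
        raise ValueError("need at least one observation")
    position = n * numerator / denominator
    # Assumption: clamp so that tiny samples never index outside the list.
    low = min(max(math.floor(position), 1), n)
    high = min(max(math.ceil(position), 1), n)
    # low == high when the position is a whole number.
    return (sorted_data[low - 1] + sorted_data[high - 1]) / 2


def q1_q3(values):
    """Return (Q1, Q3) using the count / 4 and 3 * count / 4 positions."""
    data = sorted(values)
    return value_at_fraction(data, 1), value_at_fraction(data, 3)


print(q1_q3([20, 2, 18, 4, 16, 6, 14, 8, 12, 10]))
# (5.0, 15.0)
```

With ten records the positions are 2.5 and 7.5, so each quartile is the average of two neighbouring points. Other conventions, including the weighted-average approaches mentioned above, can return slightly different quartiles for the same data.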

How Do You Build a Box Plot?

The five number summary can be used to create a box plot graph. The box spans from the first quartile (Q1) to the third quartile (Q3), so its length is the interquartile range, and a line inside the box marks the median. The lowest and highest quarters of the data (below Q1 and above Q3) fall outside the box and are represented by the whiskers. In this whisker diagram, the whiskers extend to the extreme values (maximum value, minimum value) of the data. There is another form of the boxplot referred to as a modified box plot. This adjusts the box and whisker plot so that the whiskers stop short of outlier data points, which are plotted individually instead. This site uses a histogram as a descriptive statistics tool; we can add a modified boxplot if there's sufficient demand.

While the five number summary is a good basic measure of a distribution, it doesn't show the standard deviation, mean, or variance. You need to carefully manage any suspected outlier data points.

What Are Upper and Lower Fences?

You can use the information from the 5 number summary calculator to calculate this. The upper and lower fences are a simple way to flag potential outliers in a distribution. This approach uses the interquartile range (IQR = Q3 - Q1) to assess how far from the quartiles a value can sit before it is treated as a possible outlier. The inner fences are 1.5 x the interquartile range below the 1st quartile and above the 3rd quartile. The outer fences are 3.0 x the interquartile range below the 1st quartile and above the 3rd quartile. Note that the lower fences can be negative numbers (if the IQR is wide and the first quartile is small). This is common in many logistics problems. In most cases, the underlying data isn't from a normal distribution.
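A minimal Python sketch of the fence arithmetic follows. It takes Q1 and Q3 as inputs (for example, from the calculator output); the function names fences and classify and the wording of the labels are illustrative choices, not part of this site's tool.

```python
def fences(q1, q3):
    """Return the inner and outer fences as (lower, upper) pairs."""
    iqr = q3 - q1  # interquartile range
    return {
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }


def classify(value, q1, q3):
    """Say where a value sits relative to the fences."""
    limits = fences(q1, q3)
    inner_low, inner_high = limits["inner"]
    outer_low, outer_high = limits["outer"]
    if value < outer_low or value > outer_high:
        return "beyond outer fence"
    if value < inner_low or value > inner_high:
        return "between inner and outer fences"
    return "inside inner fences"


print(fences(4, 12))
# {'inner': (-8.0, 24.0), 'outer': (-20.0, 36.0)}
print(classify(30, 4, 12))
# between inner and outer fences
```

With Q1 = 4 and Q3 = 12 the IQR is 8, so both lower fences come out negative even though the quartiles themselves are positive.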

What is the lower hinge?

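That's how some people refer to the first quartile. Strictly, the lower hinge is the median of the lower half of the sorted data, which is equal or very close to the first quartile. It marks the top of the bottom quarter of your frequency distribution.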

Additional Measures - Seven Number Summary

For convenience, we've included two additional measures (10th and 90th percentile) which can be used to generate a similar package known as the seven number summary. The additional two metrics give you better visibility into what is happening at the tails of the distribution. While outliers and distribution tails are a small fraction of your data, they can frequently have a disproportionate impact on overall performance. For example, a group of likely voters may exhibit a range of satisfaction scores with a particular candidate - but only the top and bottom 10% are truly motivated enough to take action based on their opinions. In business, similar models can be used to explain customer defection to another supplier and contribution margin economics within a distribution business.
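For illustration, a seven number summary can be put together with Python's standard statistics module. The choice of method="inclusive" (linear interpolation that treats the data as covering the full range) is an assumption for this sketch and may not match the percentile rule used by the calculator; the function name seven_number_summary is hypothetical.

```python
from statistics import quantiles


def seven_number_summary(values):
    """Return (min, 10th pct, Q1, median, Q3, 90th pct, max)."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two observations")
    # n=10 gives nine cut points; the first and last are the
    # 10th and 90th percentiles.
    deciles = quantiles(data, n=10, method="inclusive")
    # n=4 gives the three quartile cut points.
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], deciles[0], q1, med, q3, deciles[-1], data[-1]


print(seven_number_summary(range(1, 12)))
# (1, 2.0, 3.5, 6.0, 8.5, 10.0, 11)
```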

Data Storage

This tool is designed to make it easy to repeat statistical calculations. You can save your data to local device storage (if your phone or computer supports HTML5), allowing you to retrieve and edit data from past calculations. A list of saved datasets is provided below the main calculation area - click on the name of the dataset and the data table above will update. Important: these are locally saved only (cannot be accessed on other devices, are not sent to our servers, and will be deleted if your cache is cleared). If you need to save this data permanently or share it between devices (or with a colleague), send it as a link. Click on the dataset name to load it into the list of data points in the calculator, hit the calculate button, and copy the URL. You can easily email the URL to your colleagues or post it on a message board. When anyone clicks on the URL, the calculator will open with the shared values already loaded.