Understanding Descriptive Statistics
Complete the following steps to interpret descriptive statistics. Key output includes N, the mean, the median, the standard deviation, and several graphs. Step 1: Describe the size of your sample.
Statistics is a branch of mathematics that deals with the collection, organization, analysis, and interpretation of data. Initially, when we get the data, instead of applying fancy algorithms and making predictions, we first try to read and understand the data by applying statistical techniques. By doing this, we are able to understand what type of distribution the data has.
This blog aims to answer the following questions: What is descriptive statistics? What are the types of descriptive statistics? What is skewness? What is kurtosis? What is correlation? Descriptive statistics involves summarizing and organizing the data so they can be easily understood. Descriptive statistics, unlike inferential statistics, seeks to describe the data but does not attempt to make inferences from the sample to the whole population.
Here, we typically describe the data in a sample. This generally means that descriptive statistics, unlike inferential statistics, is not developed on the basis of probability theory. Descriptive statistics are broken down into two categories.
These are measures of central tendency and measures of variability (spread). The mean, or average, is a measure of the central tendency of the data. In a way, it is a single number that can stand in for the whole data set. The median is the value that divides the sorted data into two equal parts. The IQR, which is also based on sorted data, is covered later in this blog. The median is the middle term if the number of terms is odd.
The median is the average of the middle two terms if the number of terms is even. In the example data set, the median is 59, which divides the set of numbers into two equal parts. Since the set has an even number of values, the answer is the average of the middle numbers 51 and 67. Note: When values are in arithmetic progression (the difference between consecutive terms is constant), the mean and the median are equal. For example, for 5 numbers in arithmetic progression with a common difference of 2 and a mean of 6, the median is also 6. The mode is the value that appears the maximum number of times in the data set. In this data set, the mode is 67 because it occurs more often than the rest of the values. But there could be a data set with no mode at all, where all values appear the same number of times.
If two values appear the same number of times and more often than the rest of the values, the data set is bimodal. If three values appear the same number of times and more often than the rest, the data set is trimodal, and a data set with n modes is multimodal. Measure of spread refers to the idea of variability within your data.
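The central tendency measures described above (mean, median, mode) can be checked with a short sketch using Python's standard `statistics` module. The data set below is hypothetical: it is an assumption chosen only so that its median is 59 and its mode is 67, matching the figures quoted in the text.

```python
from statistics import mean, median, multimode

# Hypothetical data set (an assumption), chosen so that the median is 59
# and the mode is 67, as in the text.
data = [12, 24, 41, 51, 67, 67, 85, 99]

data_mean = mean(data)        # sum of the values divided by their count
data_median = median(data)    # even count: average of the two middle values
data_modes = multimode(data)  # every value tied for the highest frequency

print("mean:", data_mean)
print("median:", data_median)
print("mode(s):", data_modes)

# Two values tied for most frequent: a bimodal data set.
print("bimodal example:", multimode([1, 1, 2, 3, 3]))

# Every value appears once: multimode returns all of them,
# which is how a data set with no single mode shows up here.
print("no single mode:", multimode([4, 5, 6]))
```

`multimode` is used rather than `mode` because it returns all tied values, which makes bimodal and multimodal data sets visible.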
Standard deviation measures the typical distance between each quantity and the mean, that is, how far the data are spread out from the mean.
A low standard deviation indicates that the data points tend to be close to the mean of the data set, while a high standard deviation indicates that the data points are spread out over a wider range of values. There are situations when we have to choose between the sample and the population standard deviation.
When we are asked to find the SD of some part of a population, a segment of the population, we use the sample standard deviation. But when we have to deal with a whole population, we use the population standard deviation. Although a sample is part of a population, the two SD formulas are not the same: the sample formula divides by n - 1 rather than n.
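As a minimal sketch of the difference between the two formulas, the snippet below computes both standard deviations by hand and compares them with `statistics.pstdev` (population) and `statistics.stdev` (sample). The data set is a hypothetical example, not taken from the original post.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

# Hypothetical data set (an assumption for illustration only).
data = [12, 24, 41, 51, 67, 67, 85, 99]

center = mean(data)
sum_sq = sum((x - center) ** 2 for x in data)  # sum of squared deviations

population_sd = sqrt(sum_sq / len(data))       # divide by n
sample_sd = sqrt(sum_sq / (len(data) - 1))     # divide by n - 1

print("population SD:", round(population_sd, 2))
print("sample SD:    ", round(sample_sd, 2))

# The standard library functions follow the same two formulas.
print(abs(population_sd - pstdev(data)) < 1e-9)
print(abs(sample_sd - stdev(data)) < 1e-9)
```

Because the sample formula divides by a smaller number, the sample SD is always slightly larger than the population SD for the same values.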
As you know, in descriptive statistics we generally deal with the data available in a sample, not in a whole population. So we take the previous data set and substitute its values into the sample formula. Mean absolute deviation is the average of the absolute differences between each value in a set of values and the average of all values of that set.
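The definition above (the average of absolute differences from the mean) can be sketched as a small helper. The function name `mean_absolute_deviation` and the data set are assumptions made for illustration.

```python
from statistics import mean

def mean_absolute_deviation(values):
    """Average of the absolute differences between each value and the mean."""
    center = mean(values)
    return sum(abs(v - center) for v in values) / len(values)

# Hypothetical data set (an assumption for illustration only).
data = [12, 24, 41, 51, 67, 67, 85, 99]

mad = mean_absolute_deviation(data)
print("mean absolute deviation:", mad)
```

Unlike the standard deviation, this measure does not square the differences, so large deviations are not weighted more heavily than small ones.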
As before, we can take the previous data set and substitute its values. Variance is the average squared distance between each quantity and the mean. That is, it is the square of the standard deviation. Range is one of the simplest techniques of descriptive statistics.
It is the difference between the lowest and highest values. Percentile is a way to represent the position of a value in a data set. To calculate a percentile, the values in the data set should always be in ascending order. The median, 59, has 4 values less than itself out of 8, so it sits at the 50th percentile. In statistics and probability, quartiles are values that divide your data into quarters, provided the data is sorted in ascending order.
There are three quartile values. The first quartile is at the 25th percentile, the second quartile is at the 50th percentile, and the third quartile is at the 75th percentile.
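The percentile and quartile ideas can be sketched as follows. This is one possible convention, not the only one: it assumes ascending order, counts values strictly below a given value for the percentile position, and takes Q1 and Q3 as the medians of the lower and upper halves. The helper names and the data set are hypothetical.

```python
from statistics import median

def percentile_rank(values, x):
    """Percentage of values that are strictly less than x."""
    return 100 * sum(v < x for v in values) / len(values)

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    xs = sorted(values)            # ascending order
    n = len(xs)
    lower = xs[: n // 2]           # values below the middle
    upper = xs[(n + 1) // 2 :]     # values above the middle
    return median(lower), median(xs), median(upper)

# Hypothetical data set (an assumption), with median 59.
data = [12, 24, 41, 51, 67, 67, 85, 99]

q1, q2, q3 = quartiles(data)
iqr = q3 - q1                      # interquartile range

print("percentile rank of 59:", percentile_rank(data, 59))
print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr)
print("range:", max(data) - min(data))
```

Statistical packages use several different interpolation rules for quartiles, so values from other tools may differ slightly from this convention.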
The second quartile (Q2) is the median of the whole data. With the data in ascending order, the first quartile (Q1) is the median of the lower half of the data, and the third quartile (Q3) is the median of the upper half. The interquartile range (IQR) is Q3 - Q1. Note: If you sort the data in descending order and compute the IQR the same way, it will be negative. The magnitude will be the same; just the sign will differ.
A negative IQR is fine if your data is in descending order. But because we subtract the smaller value from the larger one, we prefer ascending order (Q3 - Q1). Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, negative, or undefined. In a perfect normal distribution, the tails on either side of the curve are exact mirror images of each other.
When the tail on the left side of the distribution is longer, the distribution is skewed to the left. This situation is also called negative skewness. When the tail on the right side is longer, the distribution is skewed to the right. This situation is also called positive skewness. How do we calculate the skewness coefficient? To calculate the skewness coefficient of the sample, there are two methods: Pearson's first coefficient of skewness, which is based on the mode, and Pearson's second coefficient of skewness, which is based on the median. The first method depends on the mode. Therefore, if the frequency of the modal value is very low, it will not give a stable measure of central tendency. For example, two sets of data can share the same mode while, in the first set, the mode appears only twice.
The exact interpretation of the measure of kurtosis used to be disputed, but is now settled: it is about the existence of outliers. Kurtosis is a measure of whether the data are heavy-tailed (profusion of outliers) or light-tailed (lack of outliers) relative to a normal distribution. There are three types of kurtosis. Mesokurtic is a distribution whose kurtosis is similar to that of the normal distribution, whose excess kurtosis is zero.
Leptokurtic is a distribution that has kurtosis greater than a mesokurtic distribution. The tails of such distributions are thick and heavy. If the curve of a distribution is more peaked than a mesokurtic curve, it is referred to as a leptokurtic curve. Platykurtic is a distribution that has kurtosis less than a mesokurtic distribution. The tails of such distributions are thinner. If the curve of a distribution is less peaked than a mesokurtic curve, it is referred to as a platykurtic curve.
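To make skewness and kurtosis concrete, here is a minimal sketch under stated assumptions. It uses the median-based Pearson coefficient, 3 × (mean − median) / standard deviation, with the population standard deviation, plus simple moment-based skewness and excess kurtosis (kurtosis minus 3, so a normal distribution scores 0). The function names and data sets are hypothetical, and statistical packages usually apply additional small-sample corrections that are not shown here.

```python
from statistics import mean, median, pstdev

def pearson_median_skewness(values):
    """Median-based Pearson skewness: 3 * (mean - median) / SD."""
    return 3 * (mean(values) - median(values)) / pstdev(values)

def central_moment(values, k):
    """k-th central moment: average of (x - mean) ** k."""
    m = mean(values)
    return sum((v - m) ** k for v in values) / len(values)

def moment_skewness(values):
    """Third central moment scaled by the variance to the power 3/2."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]         # mirror image around 3
right_skewed = [1, 1, 2, 2, 3, 10]  # long tail on the right

print("symmetric skewness:", moment_skewness(symmetric))
print("right-skewed skewness:", moment_skewness(right_skewed))
print("right-skewed Pearson:", pearson_median_skewness(right_skewed))
print("symmetric excess kurtosis:", excess_kurtosis(symmetric))
```

A negative excess kurtosis, as for the flat `symmetric` list, corresponds to the platykurtic case; a positive value corresponds to the leptokurtic case.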
The main difference between skewness and kurtosis is that the skewness refers to the degree of symmetry, whereas the kurtosis refers to the degree of presence of outliers in the distribution. Correlation is a statistical technique that can show whether and how strongly pairs of variables are related.
It is measured by the correlation coefficient r, which ranges from -1 to +1. If r is close to 0, it means there is no linear relationship between the variables. If r is positive, it means that as one variable gets larger the other gets larger. If r is negative, it means that as one variable gets larger the other gets smaller. This was a basic rundown of some basic statistical techniques that can help you understand data science in the long run.
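To make the correlation coefficient concrete, the sketch below computes Pearson's r by hand. It assumes that r in the text refers to the Pearson product-moment coefficient; the function name `pearson_r` and the sample lists are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations from the means.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_dev / (dev_x * dev_y)

r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])              # rise together
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])            # one rises, one falls
r_none = pearson_r([1, 2, 3, 4, 5], [2, 1, 3, 1, 2])      # no linear trend

print(round(r_up, 6), round(r_down, 6), round(r_none, 6))
```

The function is undefined when either variable is constant, because one of the deviation terms in the denominator is then zero.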
Types of descriptive statistics
Descriptive statistics are important for establishing the validity of your sample as a representation of the sampled population. Including these in your dissertation will allow comparison to other similar studies, while placing your results in perspective. Use the mean to describe the sample with a single value that represents the center of the data. Many statistical analyses use the mean as a standard measure of the center of the distribution of the data. The median and the mean both measure central tendency. But unusual values, called outliers, affect the median less than they affect the mean.
This page shows examples of how to obtain descriptive statistics, with footnotes explaining the output. The data used in these examples were collected on high school students and are scores on various tests, including science, math, reading and social studies (socst).
The variable female is a dichotomous variable coded 1 if the student was female and 0 if male. In the syntax below, the get file command is used to load the data into SPSS. In quotes, you need to specify where the data file is located on your computer. Remember that you need to use the.
There are several commands that you can use to get descriptive statistics for a continuous variable. We will show two: descriptives and examine. We have added some options to each of these commands, and we have deleted unnecessary subcommands to make the syntax as short and understandable as possible.
You will find that the examine command always produces a lot of output. This can be very helpful if you know what you are looking for, but can be overwhelming if you are not used to it. If you need just a few numbers, you may want to use the descriptives command.
We will use the hsb2 data set.
N — This is the number of valid observations for the variable. The total number of observations is the sum of N and the number of missing values.
Mean — This is the arithmetic mean across the observations. It is the most widely used measure of central tendency. It is commonly called the average. The mean is sensitive to extremely large or small values.
Std. Deviation — Standard deviation is the square root of the variance. It measures the spread of a set of observations. The larger the standard deviation is, the more spread out the observations are.
Variance — The variance is a measure of variability. It is the sum of the squared distances of the data values from the mean divided by the variance divisor. The Corrected SS is the sum of squared distances of the data values from the mean. Therefore, the variance is the corrected SS divided by N - 1. Because the variance is expressed in squared units, it is not generally used as an index of spread. Instead, we use standard deviation.
Skewness — Skewness measures the degree and direction of asymmetry. A symmetric distribution such as a normal distribution has a skewness of 0, and a distribution that is skewed to the left has a negative skewness.
Valid — This refers to the non-missing cases. In this column, the N is given, which is the number of non-missing cases; and the Percent is given, which is the percent of non-missing cases.
Missing — This refers to the missing cases. In this column, the N is given, which is the number of missing cases; and the Percent is given, which is the percent of the missing cases.
Total — This refers to the total number of cases, both non-missing and missing. In this column, the N is given, which is the total number of cases in the data set; and the Percent is given, which is the total percent of cases in the data set.
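The case counts and the basic statistics described above can be reproduced in a short Python sketch. This is not SPSS syntax: the scores are hypothetical, `None` is an assumed marker for a missing value, and the variance divisor is taken to be N - 1.

```python
from math import sqrt

# Hypothetical test scores (an assumption); None marks a missing case.
scores = [52, 59, None, 33, 44, None, 52, 60]

valid = [s for s in scores if s is not None]
n_total = len(scores)
n_valid = len(valid)
n_missing = n_total - n_valid

pct_valid = 100 * n_valid / n_total
pct_missing = 100 * n_missing / n_total

mean_score = sum(valid) / n_valid
# Corrected SS: sum of squared distances of the values from the mean.
corrected_ss = sum((s - mean_score) ** 2 for s in valid)
variance = corrected_ss / (n_valid - 1)  # variance divisor N - 1
std_dev = sqrt(variance)                 # back in the original units

print(f"Valid N = {n_valid} ({pct_valid}%), Missing N = {n_missing} ({pct_missing}%)")
print(f"Mean = {mean_score}, Variance = {variance:.2f}, Std. Deviation = {std_dev:.2f}")
```

Only the valid cases enter the mean and the variance, which is why N in the descriptives table can be smaller than the total number of cases.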
Std. Error — These are the standard errors for the descriptive statistics. The standard error gives some idea about the variability possible in the statistic. For the mean, this gives you some idea about the variability of the estimate of the true population mean.
5% Trimmed Mean — This is the mean computed after the lowest and highest 5% of values are set aside. However, you cannot assume that all outliers have been removed from the trimmed mean.
Median — This is the median. The median splits the distribution such that half of all values are above this value, and half are below.
Std. Deviation — Standard deviation is the square root of the variance.
Range — The range is a measure of the spread of a variable. It is equal to the difference between the largest and the smallest observations. It is easy to compute and easy to understand. However, it depends only on the two most extreme observations, so it says little about the variability of the values in between.
Interquartile Range — The interquartile range is the difference between the upper and the lower quartiles. It measures the spread of a data set. It is robust to extreme observations.
Kurtosis — Kurtosis is a measure of the heaviness of the tails of a distribution. In SAS, a normal distribution has kurtosis 0. Extremely nonnormal distributions may have high positive or negative kurtosis values, while nearly normal distributions will have kurtosis values close to 0.
Weighted Average — These are the percentiles for the variable write. Some of the values are fractional, which is a result of how they are calculated. If there is not a value at exactly the 5th percentile, for example, the value is interpolated.
Tukey's Hinges — These are the quartiles, calculated the way that Tukey originally proposed when he came up with the idea of a boxplot. The values are not interpolated; rather, they are approximations that can be obtained with little calculation.
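The contrast between interpolated percentiles and Tukey's hinges can be sketched as follows. Both functions are illustrative assumptions, not SPSS's exact implementation: the weighted average is assumed to interpolate at position (n + 1) × p in the sorted data, and the hinges are taken as the medians of the lower and upper halves, each including the overall median when n is odd.

```python
from statistics import median

def weighted_average_percentile(values, p):
    """Interpolated percentile at 1-based position (n + 1) * p (assumed rule)."""
    xs = sorted(values)
    n = len(xs)
    pos = (n + 1) * p
    if pos < 1:
        return xs[0]
    if pos >= n:
        return xs[-1]
    k = int(pos)          # rank of the value just below the position
    frac = pos - k        # how far to move toward the next value
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])

def tukey_hinges(values):
    """Lower hinge, median, upper hinge: medians of the two halves."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2   # each half includes the median when n is odd
    return median(xs[:half]), median(xs), median(xs[n - half:])

# Hypothetical values (an assumption for illustration only).
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

wa_q1 = weighted_average_percentile(values, 0.25)
wa_q3 = weighted_average_percentile(values, 0.75)
hinges = tukey_hinges(values)

print("weighted average Q1, Q3:", wa_q1, wa_q3)
print("Tukey's hinges:", hinges)
```

In this example the interpolated quartiles are fractional while the hinges land on actual data values, which is the difference the two rows of the percentiles table reflect.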
Percentiles — These columns give you the values of the variable at various percentiles. These tell you about the distribution of the variable. Percentiles are determined by ordering the values of the variable from lowest to highest, and then looking at whatever percent to see the value of the variable there. For example, the column labeled 5 shows the value of the variable write at the 5th percentile. Because this is a weighted average, SPSS takes into account the fact that there are several values of 35, which is why the weighted average can be fractional.
Median — The 50th percentile is the median. It is a measure of central tendency. It is the middle number when the values are arranged in ascending or descending order. Sometimes, the median is a better measure of central tendency than the mean. It is less sensitive than the mean to extreme observations.
Histogram — A histogram shows the frequency of values of a variable. The size of the bins is determined by default when you use the examine command to create a histogram, but you can use either the graph or ggraph command to create a histogram over which you can have much more control. In this histogram, each bin contains two values. For example, the first bin contains values 30 and 31, the second bin contains 32 and 33, and so on.
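The binning described above (bins of two values: 30 and 31, then 32 and 33, and so on) can be sketched in a few lines. The scores are hypothetical, and `bin_counts` is an illustrative helper, not an SPSS command.

```python
from collections import Counter

def bin_counts(values, start, width):
    """Count integer values per bin; each bin is keyed by its left edge."""
    counts = Counter(start + ((v - start) // width) * width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical scores (an assumption for illustration only).
scores = [30, 31, 33, 35, 35, 38, 39, 39]

counts = bin_counts(scores, start=30, width=2)

# Text histogram: one row per non-empty bin.
for left, count in counts.items():
    print(f"{left}-{left + 1}: {'*' * count}")
```

Bins with no values (here 36 and 37) simply do not appear in the result; a plotted histogram would show them as gaps.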
The histogram is a graphical representation of the percentiles that were displayed above. As with percentiles, the purpose of the histogram is to give you an idea about the distribution of the variable.
Stem — This is the stem. It is the number in the 10s place of the value of the variable. For example, in the first line, the stem is 3 and the leaf is 1. The value of the variable is 31. The 3 is in the 10s place, so it is the stem.
Leaf — This is the leaf. It is the number in the 1s place of the value of the variable. The number of leaves tells you how many of these numbers are in the variable. For example, on the fifth line, there is one 8 and five 9s (hence, the frequency is six). This means that there is one value of 38 and five values of 39 in the variable write.
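A stem-and-leaf display of this kind can be sketched as below. The scores are hypothetical, and for simplicity the sketch prints one row per stem, whereas the output described above splits a stem across several lines.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group two-digit values by their 10s digit (stem) and 1s digit (leaf)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    return dict(rows)

# Hypothetical scores (an assumption): one 31, one 38, five 39s, and two 40s values.
scores = [39, 31, 39, 44, 38, 39, 41, 39, 39]

plot = stem_and_leaf(scores)

# Frequency, stem, and leaves, in the same left-to-right order as the output.
for stem, leaves in plot.items():
    print(f"{len(leaves):>3}  {stem} | {''.join(str(leaf) for leaf in leaves)}")
```

The frequency column is just the number of leaves on the row, which matches the description of frequency given for the plot.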
The top of the upper whisker is the maximum score unless there are values more than 1.5 times the interquartile range above Q3. The bottom of the lower whisker is the minimum score unless there are values more than 1.5 times the interquartile range below Q1.
Valid N (listwise) — This is the number of non-missing values.
Minimum — This is the minimum, or smallest, value of the variable.
Maximum — This is the maximum, or largest, value of the variable.
Statistic — These are the descriptive statistics.
Frequency — This is the frequency of the leaves.
Q3 — This is the third quartile (Q3), also known as the 75th percentile.
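The whisker rule for the boxplot can be sketched as follows, assuming the conventional fences at 1.5 times the interquartile range beyond Q1 and Q3. The quartiles here are computed as medians of the lower and upper halves, which is an assumption and may differ slightly from the values a statistics package reports; the function name and the scores are hypothetical.

```python
from statistics import median

def whiskers(values):
    """Return (lower whisker, upper whisker, outliers) using 1.5 * IQR fences."""
    xs = sorted(values)
    n = len(xs)
    q1 = median(xs[: n // 2])          # median of the lower half
    q3 = median(xs[(n + 1) // 2 :])    # median of the upper half
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    # Whiskers stop at the most extreme values that are still inside the fences.
    return inside[0], inside[-1], outliers

# Hypothetical scores (an assumption); 120 lies far above the rest.
scores = [31, 40, 45, 50, 52, 54, 57, 60, 62, 65, 67, 120]

low, high, outliers = whiskers(scores)
print("lower whisker:", low)
print("upper whisker:", high)
print("outliers:", outliers)
```

When no value falls outside the fences, the whiskers are simply the minimum and maximum scores.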