Descriptive stats 2
Measures of spread (5 number summary)
Here we are trying to figure out how spread out the data is. Two data sets may have the same mean but different minimum and maximum values.
A histogram is useful for this.
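Here is a minimal sketch of that idea, using two made-up data sets that share the same mean but have very different minimum and maximum values. The `text_histogram` helper is hypothetical: it puts each value into a bin of width 10 and prints one `*` per item, a crude stand-in for a plotted histogram.

```python
from collections import Counter
from statistics import mean

# Hypothetical data sets (made-up numbers): same mean, different spread.
narrow = [48, 49, 50, 50, 51, 52]
wide = [20, 35, 50, 50, 65, 80]

def text_histogram(data, bin_width=10):
    """Print a crude text histogram: one row per bin, one '*' per item."""
    counts = Counter((x // bin_width) * bin_width for x in data)
    for start in sorted(counts):
        print(f"{start:>3}-{start + bin_width - 1:<3} {'*' * counts[start]}")

for name, data in [("narrow", narrow), ("wide", wide)]:
    print(name, "mean:", mean(data), "min:", min(data), "max:", max(data))
    text_histogram(data)
```

Both sets have a mean of 50, but the histogram for `wide` is stretched across many more bins.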
5 number summary =>
minimum,
first quartile (median of the data below the overall median),
second quartile (the median itself),
third quartile (median of the data above the overall median),
maximum
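The definitions above can be sketched in Python as follows. The function name `five_number_summary` is our own, and the quartile rule (leave the middle value out of both halves when the count is odd) is only one of several conventions; spreadsheet tools and libraries may give slightly different quartiles.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for at least two values.

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data; when the count is odd, the middle value is left out of both halves.
    """
    xs = sorted(data)
    if len(xs) < 2:
        raise ValueError("need at least two values")
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

print(five_number_summary([1, 3, 5, 7, 9, 11, 13]))  # (1, 3, 7, 11, 13)
```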
Standard deviation -
Measures the spread of a data set with a single value.
Roughly, it tells us how far a typical item lies from the mean (formally, it is the square root of the average squared deviation from the mean).
For example, when comparing the prices of two different stocks, the one with the higher standard deviation is considered riskier, since its price swings further and more often away from its average.
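A small sketch of the stock example, using made-up prices for two hypothetical stocks with the same mean. `std_dev` computes the population standard deviation directly from the definition and is checked against the standard library's `statistics.pstdev`. Note that `statistics.stdev` (the sample version) divides by n - 1 instead of n, so it gives slightly larger values.

```python
import math
from statistics import mean, pstdev

# Hypothetical daily closing prices (made-up numbers), both averaging 100.
stock_a = [100, 101, 99, 100, 102, 98]
stock_b = [100, 120, 80, 110, 90, 100]

def std_dev(data):
    """Population standard deviation: square root of the mean squared deviation."""
    mu = mean(data)
    return math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))

for name, prices in [("stock_a", stock_a), ("stock_b", stock_b)]:
    print(f"{name}: mean={mean(prices)}, std={std_dev(prices):.2f}")
# stock_a: mean=100, std=1.29
# stock_b: mean=100, std=12.91
```

Both stocks average 100, but `stock_b` has a standard deviation about ten times larger, so it would be considered the riskier one.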
Shape of our data
Tips for a first look at your data
- Plot the data.
- Deal with the outliers (this may need domain expertise).
- Normally distributed data can be fully described by its mean and standard deviation.
- For skewed data you will need the 5 number summary to describe it correctly.
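As a sketch of the last two tips, the made-up, right-skewed data below has one very large value. The 1.5 × IQR rule (Tukey's fences) used by the hypothetical `iqr_outliers` helper is one common convention for flagging candidate outliers, not the only one; whether a flagged value should actually be dropped is still a domain decision.

```python
from statistics import mean, median

def quartiles(data):
    """Q1 and Q3 as medians of the lower/upper halves (middle value excluded when n is odd)."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(upper)

def iqr_outliers(data, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

# Hypothetical right-skewed data: mostly small values plus one very large one.
incomes = [30, 32, 35, 36, 38, 40, 41, 45, 250]
print("mean:", round(mean(incomes), 1), "median:", median(incomes))  # mean: 60.8 median: 38
print("outliers:", iqr_outliers(incomes))  # outliers: [250]
```

The single large value drags the mean well above almost every data point while the median stays put, which is why the mean and standard deviation alone can mislead for skewed data.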
Descriptive vs inferential statistics
Descriptive statistics is about describing the data we have. By itself, the data is just a bunch of numbers sitting in a grid. Think of it like the very first scientists naming the bodies of the solar system, making sense of the universe as they went about discovering it.
Inferential statistics, on the other hand, is about drawing conclusions from the data, perhaps even foretelling trends from the patterns hidden in it. Kinda like predicting the next solar storm, or maybe even doomsday.
It is about drawing conclusions about the entire universe (which we cannot survey) from the data we gathered from a sample of planets (which we can survey). In statistical terms, the universe is the population and the surveyed planets are the sample.