Statistics | Revision Notes

Statistics

  • The mean x̅ of n values x1, x2, x3, ..., xn is given by
    \overline { x } = \frac { x _ { 1 } + x _ { 2 } + x _ { 3 } + \ldots + x _ { n } } { n }
  • Mean of grouped data (without class-intervals)
    • Direct method : If the frequencies of n observations x1, x2, x3, ..., xn are f1, f2, f3, ..., fn respectively, then the mean x̅ is given by
      \overline { x } = \frac { \Sigma f _ { i } x _ { i } } { \Sigma f _ { i } }
    • Deviation method or Assumed mean method
      In this case, the mean x̅ is given by
      \overline { x } = a + \frac { \Sigma f _ { i } \left( x _ { i } - a \right) } { \Sigma f _ { i } } = a + \frac { \Sigma f _ { i } d _ { i } } { \Sigma f _ { i } }
      Where, a = assumed mean, Σfi = total frequency, di = xi – a
      Σfi(xi – a) = sum of the products of deviations and corresponding frequencies.
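As a rough illustration, the two methods above can be checked against each other in a few lines of Python. The data and the function names `mean_direct` and `mean_assumed` are made up for this sketch and are not part of the notes.

```python
def mean_direct(values, freqs):
    """Direct method: sum(f_i * x_i) / sum(f_i)."""
    total = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / total


def mean_assumed(values, freqs, a):
    """Assumed mean method: a + sum(f_i * d_i) / sum(f_i), where d_i = x_i - a."""
    total = sum(freqs)
    return a + sum(f * (x - a) for x, f in zip(values, freqs)) / total


# Hypothetical data: the value x_i occurs f_i times.
values = [10, 20, 30, 40, 50]
freqs = [2, 3, 5, 3, 2]

print(mean_direct(values, freqs))         # 30.0
print(mean_assumed(values, freqs, a=30))  # 30.0
```

Any choice of a gives the same mean; a value near the middle of the data simply keeps the deviations small.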
  • Mean of grouped data (with class-intervals)
    In this case the class marks are treated as xi.
    \text { Class mark } = \frac { \text { Lower class limit } + \text { Upper class limit } } { 2 }
    • Direct method
      If the frequencies corresponding to the class marks x1, x2, x3, ..., xn are f1, f2, f3, ..., fn respectively, then the mean x̅ is given by
      \overline { x } = \frac { x _ { 1 } f _ { 1 } + x _ { 2 } f _ { 2 } + x _ { 3 } f _ { 3 } + \ldots + x _ { n } f _ { n } } { f _ { 1 } + f _ { 2 } + f _ { 3 } + \ldots + f _ { n } } = \frac { \Sigma f _ { i } x _ { i } } { \Sigma f _ { i } }
    • Deviation or Assumed mean method
      In this case the mean x̅ is given by
      \overline { x } = a + \frac { \Sigma f _ { i } d _ { i } } { \Sigma f _ { i } }
      Where, a = assumed mean, Σfi = total frequency and di = xi – a
    • Step Deviation method
      In this case we use the following formula.
      \overline { x } = a+ \frac { \Sigma f _ { i } \left( \frac { x _ { i } - a } { h } \right) } { \Sigma f _ { i } } \times h = a + h \left( \frac { \Sigma f_ { i } u _ { i } } { \Sigma f _ { i } } \right)
      Where, a = assumed mean, Σfi = total frequency, h = class-size and ui = (xi – a)/h.
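The class-mark and step deviation formulas above can be sketched in Python as follows. The class intervals, frequencies and function names are hypothetical, chosen only to show that the direct and step deviation methods agree.

```python
def class_marks(intervals):
    """Class mark = (lower class limit + upper class limit) / 2."""
    return [(lo + hi) / 2 for lo, hi in intervals]


def mean_direct_grouped(intervals, freqs):
    """Direct method applied to the class marks."""
    marks = class_marks(intervals)
    return sum(f * x for x, f in zip(marks, freqs)) / sum(freqs)


def mean_step_deviation(intervals, freqs, a, h):
    """Step deviation method: a + h * sum(f_i * u_i) / sum(f_i), u_i = (x_i - a) / h."""
    marks = class_marks(intervals)
    u = [(x - a) / h for x in marks]
    return a + h * sum(f * ui for f, ui in zip(freqs, u)) / sum(freqs)


# Hypothetical grouped data with class size h = 10.
intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 15, 16, 6]

print(mean_direct_grouped(intervals, freqs))               # 27.0
print(mean_step_deviation(intervals, freqs, a=25, h=10))   # 27.0
```

Here a = 25 is the class mark of the middle class, so the u values are the small integers -2, -1, 0, 1, 2.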
  • Mode is that value among the observations which occurs most often i.e., the value of the observation having the maximum frequency.
  • If more than one value in the data has the same maximum frequency, the data is said to be multimodal.
  • In a grouped frequency distribution, the class which has the maximum frequency is called the modal class.
  • We use the following formula to find the mode of a grouped frequency distribution.
    \operatorname { Mode } \left( \mathrm { M } _ { o } \right) = l + \left( \frac { f _ { 1 } - f _ { 0 } } { 2 f _ { 1 } - f _ { 0 } - f _ { 2 } } \right) \times h
    • where
      l = lower limit of modal class,
      h = size of the class-interval,
      f1 = frequency of the modal class,
      f0 = frequency of the class preceding the modal class,
      f2 = frequency of the class succeeding the modal class.
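A minimal Python sketch of the mode formula above, on made-up data. Two choices here are assumptions of the sketch rather than statements from the notes: when the modal class is the first or last class, the missing neighbouring frequency is taken as 0, and when several classes tie for the maximum frequency, the first one is used.

```python
def grouped_mode(intervals, freqs):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * h for the modal class."""
    # Index of the modal class (first class with the maximum frequency).
    i = max(range(len(freqs)), key=lambda k: freqs[k])
    l, upper = intervals[i]
    h = upper - l
    f1 = freqs[i]
    # Assumption: a missing neighbouring class counts as frequency 0.
    f0 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula is undefined when f0 = f1 = f2")
    return l + (f1 - f0) / denominator * h


# Hypothetical grouped data; the modal class is 30-40 with frequency 16.
intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 15, 16, 6]

print(grouped_mode(intervals, freqs))  # 30 + 10/11, about 30.91
```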
  • Median is the value of the middlemost item when the data are arranged in ascending or descending order of magnitude.
  • Median of ungrouped data
    • If the number of items n in the data is odd, then
      \text { Median } = \text { value of } \left( \frac { n + 1 } { 2 } \right) \text { th item }
    • If the total number of items n in the data is even, then
      \text { Median } = \frac { 1 } { 2 } \left[ \text { value of } \frac { n } { 2 } \text { th item } + \text { value of } \left( \frac { n } { 2 } + 1 \right) \text { th item } \right]
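The odd and even cases above can be sketched in Python like this. The function name `median_ungrouped` and the sample lists are illustrative only; note that the "k th item" of the formulas is index k - 1 in a Python list.

```python
def median_ungrouped(data):
    """Median of raw data, following the odd/even rule for n items."""
    items = sorted(data)  # arrange in ascending order first
    n = len(items)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        # Odd n: value of the ((n + 1) / 2)th item.
        return items[(n + 1) // 2 - 1]
    # Even n: average of the (n / 2)th and (n / 2 + 1)th items.
    return (items[n // 2 - 1] + items[n // 2]) / 2


print(median_ungrouped([7, 3, 9, 1, 5]))  # 5
print(median_ungrouped([7, 3, 9, 1]))     # 5.0
```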
  • Cumulative frequency of a particular value of the variable (or class) is the sum total of all the frequencies up to that value (or the class).
  • There are two types of cumulative frequency distributions.
    • cumulative frequency distribution of less than type.
    • cumulative frequency distribution of more than type.
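As a small illustration, both types of cumulative frequency can be computed from a list of class frequencies. The helper names and the frequencies are made up for this sketch.

```python
from itertools import accumulate


def less_than_cf(freqs):
    """Entry i = total frequency up to and including class i
    (paired with the upper limit of class i)."""
    return list(accumulate(freqs))


def more_than_cf(freqs):
    """Entry i = total frequency from class i onwards
    (paired with the lower limit of class i)."""
    return list(accumulate(reversed(freqs)))[::-1]


# Hypothetical class frequencies.
freqs = [5, 8, 15, 16, 6]

print(less_than_cf(freqs))  # [5, 13, 28, 44, 50]
print(more_than_cf(freqs))  # [50, 45, 37, 22, 6]
```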
  • Median of grouped data with class-intervals
    In this case, we first find half of the total frequency, i.e., n/2. The first class whose cumulative frequency is greater than or equal to n/2 is called the median class, and the median lies in this class.
    We use the following formula for finding the median.
    \operatorname { Median } \left( \mathrm { M } _ { e } \right) = l + \left( \frac { \frac { n } { 2 } - c f } { f } \right) \times h
    • Where,
      l = lower limit of the median class,
      n = number of observations,
      cf = cumulative frequency of the class preceding the median class,
      f = frequency of the median class,
      h = class size.
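A minimal sketch of the grouped median formula in Python, using made-up data. It takes the median class to be the first class whose cumulative frequency reaches n/2; the function name is hypothetical.

```python
def grouped_median(intervals, freqs):
    """Median = l + (n/2 - cf) / f * h for the median class."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("median of an empty distribution is undefined")
    half = n / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for (l, upper), f in zip(intervals, freqs):
        if cf + f >= half:
            # Current class is the median class.
            return l + (half - cf) / f * (upper - l)
        cf += f
    raise ValueError("intervals and freqs do not match")


# Hypothetical grouped data: n = 50, so n/2 = 25 falls in the class 20-30.
intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 15, 16, 6]

print(grouped_median(intervals, freqs))
```

For this data cf = 13, f = 15 and h = 10, so the median works out to 20 + (12/15) × 10 = 28.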
  • The three measures mean, mode and median are connected by the following empirical relation.
    Mode = 3 Median – 2 Mean
  • The graphical representation of a cumulative frequency distribution is called an ogive or cumulative frequency curve.
  • We can draw two types of ogives for a frequency distribution. These are less than ogive and more than ogive.
  • For a less than ogive, we plot the points corresponding to the ordered pairs given by (upper limit, corresponding less than cumulative frequency). After joining these points by a freehand curve, we get an ogive of less than type.
  • For a more than ogive, we plot the points corresponding to the ordered pairs given by (lower limit, corresponding more than cumulative frequency). After joining these points by a freehand curve, we get an ogive of more than type.
  • An ogive can be used to estimate the median of the data. There are two methods to do so.
    • First method : Mark a point corresponding to n/2, where n is the total frequency, on the cumulative frequency axis (y-axis).
      From this point, draw a line parallel to the x-axis to cut the ogive at a point.
      From this point, draw a line perpendicular to the x-axis to get another point.
      The abscissa of this point gives the median.
    • Second method : Draw both the ogives (less than ogive and more than ogive) on the same graph paper so that they intersect at a point.
      From this point, draw a line perpendicular to the x-axis, to get another point.
      The abscissa of this point gives the median.
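The first method can be imitated numerically. The sketch below is only an approximation of the graphical procedure: it assumes the plotted points of the less than ogive are joined by straight line segments instead of a freehand curve, and it assumes a starting point (lower limit of the first class, 0) has been added. The function name and the data are hypothetical.

```python
def median_from_less_than_ogive(points, n):
    """Find the x-value where the less than ogive reaches height n/2.

    points: (upper limit, less than cumulative frequency) pairs in
    ascending order, preceded by (lower limit of first class, 0).
    Assumption: consecutive points are joined by straight segments.
    """
    target = n / 2
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1 and y1 > y0:
            # Linear interpolation along the segment that crosses n/2.
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("n/2 lies outside the plotted cumulative frequencies")


# Hypothetical less than ogive for class frequencies 5, 8, 15, 16, 6.
points = [(0, 0), (10, 5), (20, 13), (30, 28), (40, 44), (50, 50)]

print(median_from_less_than_ogive(points, n=50))
```

With straight segments this gives about 28 for the data above, the same value the grouped median formula would give; a freehand curve read off graph paper will usually differ slightly.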