# 5.1 Statistics

## Detailed Theory: Statistics

### **1. Introduction to Statistics**

#### **1.1 What is Statistics?**

Statistics is the science of collecting, organizing, analyzing, interpreting, and presenting data.

**Two Main Branches:**

**Descriptive Statistics:** Methods for summarizing and describing data (mean, median, graphs, etc.)

**Inferential Statistics:** Methods for making predictions or inferences about a population based on sample data

#### **1.2 Basic Terminology**

**a) Data**

Information collected for analysis.

**Types of Data:**

**Qualitative Data:** Descriptive/non-numerical data (colors, gender, yes/no)

**Quantitative Data:** Numerical data that can be measured

* **Discrete Data:** Countable values (number of students, cars)
* **Continuous Data:** Measurable values (height, weight, temperature)

**b) Population vs Sample**

**Population:** Complete set of all items/individuals of interest

**Sample:** A subset of the population selected for study

**c) Variable**

A characteristic that can take different values.

**Example:** In a study of students: age, height, marks are variables

**d) Parameter vs Statistic**

**Parameter:** Numerical measure describing a population characteristic (denoted by Greek letters: $$\mu$$, $$\sigma$$)

**Statistic:** Numerical measure describing a sample characteristic (denoted by Roman letters: $$\bar{x}$$, $$s$$)

***

### **2. Data Collection and Organization**

#### **2.1 Methods of Data Collection**

1. **Direct Observation:** Watching and recording
2. **Experiments:** Controlled conditions
3. **Surveys/Questionnaires:** Asking questions
4. **Interviews:** Face-to-face questioning
5. **Secondary Data:** Using existing data

#### **2.2 Frequency Distribution**

A table showing how often each value or range of values occurs.

**a) For Ungrouped Data**

**Example:** Test scores: 5, 7, 8, 5, 9, 7, 5, 8, 7, 7

**Frequency Table:**

| Score (x) | Frequency (f) |
| --------- | ------------- |
| 5         | 3             |
| 7         | 4             |
| 8         | 2             |
| 9         | 1             |
| **Total** | **10**        |
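
As a minimal sketch, the tally above could be reproduced with Python's standard `collections.Counter`. The variable names are illustrative only.

```python
from collections import Counter

scores = [5, 7, 8, 5, 9, 7, 5, 8, 7, 7]

# Count how often each score occurs, then list the scores in ascending order.
frequency = Counter(scores)
table = sorted(frequency.items())

for score, count in table:
    print(score, count)
print("Total", sum(frequency.values()))
```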

**b) For Grouped Data**

When data has many different values, we group them into classes.

**Example:** Heights of 50 students (in cm)

| Class Interval | Frequency (f) |
| -------------- | ------------- |
| 150-155        | 5             |
| 155-160        | 10            |
| 160-165        | 15            |
| 165-170        | 12            |
| 170-175        | 8             |
| **Total**      | **50**        |

#### **2.3 Types of Frequency**

**1. Absolute Frequency:** Simple count (denoted by $$f$$)

**2. Relative Frequency:** Proportion or percentage

$$\text{Relative Frequency} = \frac{\text{Frequency of class}}{\text{Total frequency}}$$

**3. Cumulative Frequency:** Running total of frequencies

**Less than type:** Cumulative frequency up to upper limit of each class

**More than type:** Cumulative frequency from lower limit of each class
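
The three kinds of frequency can be computed together. The sketch below reuses the heights table from Section 2.2; the list names are assumptions made for illustration.

```python
from itertools import accumulate

classes = ["150-155", "155-160", "160-165", "165-170", "170-175"]
freqs = [5, 10, 15, 12, 8]
total = sum(freqs)

# Relative frequency: share of the total falling in each class.
relative = [f / total for f in freqs]

# "Less than" type: running total up to the upper limit of each class.
less_than_cf = list(accumulate(freqs))

# "More than" type: observations at or above the lower limit of each class.
more_than_cf = [total - cf + f for cf, f in zip(less_than_cf, freqs)]

for row in zip(classes, freqs, relative, less_than_cf, more_than_cf):
    print(row)
```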

#### **2.4 Class Interval Details**

**Class Limits:** Lower and upper bounds of a class

**Class Boundaries:** True limits (for continuous data)

If classes are 150-154, 155-159, etc., boundaries are 149.5-154.5, 154.5-159.5

**Class Width:** Difference between upper and lower boundaries

**Class Mark (Midpoint):** Average of class limits

$$\text{Class Mark} = \frac{\text{Lower limit} + \text{Upper limit}}{2}$$
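
A small sketch of these definitions, assuming classes written with a gap between consecutive limits (such as 150-154, 155-159). The function name and the `gap` parameter are hypothetical.

```python
def class_details(lower_limit, upper_limit, gap=1):
    """Return (boundaries, width, mark) for one class.

    `gap` is the distance between one class's upper limit and the next
    class's lower limit (1 for classes like 150-154, 155-159).
    """
    lower_boundary = lower_limit - gap / 2
    upper_boundary = upper_limit + gap / 2
    width = upper_boundary - lower_boundary
    mark = (lower_limit + upper_limit) / 2
    return (lower_boundary, upper_boundary), width, mark

print(class_details(150, 154))
```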

***

### **3. Measures of Central Tendency**

These are single values that represent the center of a data set.

#### **3.1 Mean (Average)**

**a) Arithmetic Mean for Ungrouped Data**

For $$n$$ values $$x\_1, x\_2, \ldots, x\_n$$:

$$\text{Mean} = \bar{x} = \frac{x\_1 + x\_2 + \cdots + x\_n}{n} = \frac{\sum\_{i=1}^{n} x\_i}{n}$$

**Example:** Find mean of: 5, 8, 12, 15, 10

$$\bar{x} = \frac{5 + 8 + 12 + 15 + 10}{5} = \frac{50}{5} = 10$$

**b) Arithmetic Mean for Grouped Data**

For grouped data with frequencies:

$$\bar{x} = \frac{\sum f\_i x\_i}{\sum f\_i}$$

where $$x\_i$$ = class mark, $$f\_i$$ = frequency of i-th class

**Example:** Find mean from:

| Class     | Frequency (f) | Class Mark (x) | f × x   |
| --------- | ------------- | -------------- | ------- |
| 0-10      | 5             | 5              | 25      |
| 10-20     | 8             | 15             | 120     |
| 20-30     | 12            | 25             | 300     |
| 30-40     | 5             | 35             | 175     |
| **Total** | **30**        |                | **620** |

$$\bar{x} = \frac{620}{30} = 20.67$$
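
Both mean calculations above can be checked with a short sketch. The ungrouped case uses the standard `statistics.mean`; the grouped case weights each class mark by its frequency.

```python
import statistics

# Ungrouped data: plain arithmetic mean.
values = [5, 8, 12, 15, 10]
ungrouped_mean = statistics.mean(values)

# Grouped data: weight each class mark by its frequency.
class_marks = [5, 15, 25, 35]
freqs = [5, 8, 12, 5]
grouped_mean = sum(f * x for f, x in zip(freqs, class_marks)) / sum(freqs)

print(ungrouped_mean)
print(round(grouped_mean, 2))
```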

**c) Assumed Mean Method (Shortcut)**

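When the class marks are large, subtracting a convenient assumed mean $$A$$ (usually a central class mark) keeps the arithmetic small: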

$$\bar{x} = A + \frac{\sum f\_i d\_i}{\sum f\_i}$$

where $$A$$ = assumed mean, $$d\_i = x\_i - A$$

**d) Step Deviation Method**

When class intervals are equal:

$$\bar{x} = A + h \times \frac{\sum f\_i u\_i}{\sum f\_i}$$

where $$h$$ = class width, $$u\_i = \frac{x\_i - A}{h}$$
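
The sketch below applies the direct, assumed mean, and step deviation methods to the grouped table from part (b) and suggests that all three give the same mean. The choice $$A = 25$$ is arbitrary; any class mark would do.

```python
class_marks = [5, 15, 25, 35]
freqs = [5, 8, 12, 5]
N = sum(freqs)

A = 25   # assumed mean: an arbitrary, convenient class mark
h = 10   # common class width

d = [x - A for x in class_marks]          # deviations d_i = x_i - A
u = [(x - A) / h for x in class_marks]    # step deviations u_i = (x_i - A) / h

direct = sum(f * x for f, x in zip(freqs, class_marks)) / N
assumed = A + sum(f * di for f, di in zip(freqs, d)) / N
step = A + h * sum(f * ui for f, ui in zip(freqs, u)) / N

print(round(direct, 2), round(assumed, 2), round(step, 2))
```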

#### **3.2 Median**

The middle value when data is arranged in order.

**a) For Ungrouped Data**

**Step 1:** Arrange data in ascending order

**Step 2:** If $$n$$ is odd: $$\text{Median} = \left(\frac{n+1}{2}\right)$$-th term

If $$n$$ is even: $$\text{Median} = \frac{\left(\frac{n}{2}\right)\text{-th term} + \left(\frac{n}{2}+1\right)\text{-th term}}{2}$$

**Example 1 (odd):** 3, 7, 1, 9, 5 → Arrange: 1, 3, 5, 7, 9

$$n=5$$ (odd), Median = $$\left(\frac{5+1}{2}\right)$$-th term = 3rd term = 5

**Example 2 (even):** 4, 8, 2, 6 → Arrange: 2, 4, 6, 8

$$n=4$$ (even), Median = $$\frac{2\text{nd term} + 3\text{rd term}}{2} = \frac{4+6}{2} = 5$$
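
The two cases can be written as one small function. This is a sketch for illustration; the standard `statistics.median` gives the same results.

```python
import statistics

def median(values):
    """Median of ungrouped data: sort, then pick or average the middle terms."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # ((n+1)/2)-th term
    return (ordered[mid - 1] + ordered[mid]) / 2   # mean of (n/2)-th and (n/2+1)-th

print(median([3, 7, 1, 9, 5]))
print(median([4, 8, 2, 6]))
```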

**b) For Grouped Data**

For grouped data with cumulative frequency:

$$\text{Median} = L + \left(\frac{\frac{N}{2} - F}{f}\right) \times h$$

where:

$$L$$ = lower boundary of median class

$$N$$ = total frequency

$$F$$ = cumulative frequency before median class

$$f$$ = frequency of median class

$$h$$ = class width

**Median Class:** First class with cumulative frequency ≥ $$\frac{N}{2}$$

**Example:** Find median from:

| Class     | Frequency (f) | Cumulative Frequency |
| --------- | ------------- | -------------------- |
| 0-10      | 5             | 5                    |
| 10-20     | 8             | 13                   |
| 20-30     | 12            | 25                   |
| 30-40     | 5             | 30                   |
| **Total** | **30**        |                      |

$$N=30$$, $$\frac{N}{2}=15$$

Median class is 20-30 (first with CF ≥ 15)

$$L=20$$, $$F=13$$, $$f=12$$, $$h=10$$

$$\text{Median} = 20 + \left(\frac{15 - 13}{12}\right) \times 10 = 20 + \left(\frac{2}{12}\right) \times 10$$

$$= 20 + \frac{20}{12} = 20 + 1.67 = 21.67$$
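
A minimal sketch of the grouped median formula, assuming equal class widths. The function name and argument layout are hypothetical.

```python
def grouped_median(lower_boundaries, freqs, h):
    """Median = L + ((N/2 - F) / f) * h for the first class with CF >= N/2."""
    N = sum(freqs)
    F = 0  # cumulative frequency before the current class
    for L, f in zip(lower_boundaries, freqs):
        if f > 0 and F + f >= N / 2:
            return L + ((N / 2 - F) / f) * h
        F += f
    raise ValueError("frequencies must sum to a positive total")

result = grouped_median([0, 10, 20, 30], [5, 8, 12, 5], 10)
print(round(result, 2))
```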

#### **3.3 Mode**

The value that occurs most frequently.

**a) For Ungrouped Data**

Simply the most frequent value.

**Example:** 3, 5, 7, 5, 2, 5, 9 → Mode = 5 (appears 3 times)

**Note:** Data can have no mode, one mode (unimodal), or multiple modes (bimodal, multimodal)

**b) For Grouped Data**

For grouped data:

$$\text{Mode} = L + \left(\frac{f\_1 - f\_0}{2f\_1 - f\_0 - f\_2}\right) \times h$$

where:

$$L$$ = lower boundary of modal class

$$f\_1$$ = frequency of modal class

$$f\_0$$ = frequency of class before modal class

$$f\_2$$ = frequency of class after modal class

$$h$$ = class width

**Modal Class:** Class with highest frequency

**Example:** Find mode from:

| Class | Frequency (f) |
| ----- | ------------- |
| 0-10  | 5             |
| 10-20 | 8             |
| 20-30 | 12            |
| 30-40 | 5             |
| 40-50 | 3             |

Modal class is 20-30 (highest frequency 12)

$$L=20$$, $$f\_1=12$$, $$f\_0=8$$, $$f\_2=5$$, $$h=10$$

$$\text{Mode} = 20 + \left(\frac{12 - 8}{2\times12 - 8 - 5}\right) \times 10 = 20 + \left(\frac{4}{24 - 13}\right) \times 10$$

$$= 20 + \left(\frac{4}{11}\right) \times 10 = 20 + \frac{40}{11} = 20 + 3.64 = 23.64$$
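
A sketch of the grouped mode formula follows. Two choices here are assumptions rather than part of the formula: if the modal class is the first or last class, the missing neighbouring frequency is taken as 0, and if several classes tie for the highest frequency, the first is used.

```python
def grouped_mode(lower_boundaries, freqs, h):
    """Mode = L + ((f1 - f0) / (2*f1 - f0 - f2)) * h for the modal class."""
    i = max(range(len(freqs)), key=freqs.__getitem__)     # index of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                     # assumption: 0 if no class before
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0        # assumption: 0 if no class after
    return lower_boundaries[i] + (f1 - f0) / (2 * f1 - f0 - f2) * h

result = grouped_mode([0, 10, 20, 30, 40], [5, 8, 12, 5, 3], 10)
print(round(result, 2))
```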

#### **3.4 Relationship between Mean, Median, Mode**

For moderately skewed distributions:

$$\text{Mode} = 3 \times \text{Median} - 2 \times \text{Mean}$$

This is called the **empirical relationship**.

***

### **4. Measures of Dispersion (Variability)**

These measure how spread out the data is.

#### **4.1 Range**

Simplest measure of dispersion:

$$\text{Range} = \text{Maximum value} - \text{Minimum value}$$

**Limitation:** Affected by extreme values, doesn't consider all data

#### **4.2 Mean Deviation**

Average of absolute deviations from central value.

**a) Mean Deviation about Mean**

For ungrouped data:

$$\text{MD}(\bar{x}) = \frac{\sum |x\_i - \bar{x}|}{n}$$

For grouped data:

$$\text{MD}(\bar{x}) = \frac{\sum f\_i |x\_i - \bar{x}|}{\sum f\_i}$$

**b) Mean Deviation about Median**

For ungrouped data:

$$\text{MD}(\text{Med}) = \frac{\sum |x\_i - \text{Median}|}{n}$$

For grouped data:

$$\text{MD}(\text{Med}) = \frac{\sum f\_i |x\_i - \text{Median}|}{\sum f\_i}$$

**c) Mean Deviation about Mode**

For ungrouped data:

$$\text{MD}(\text{Mode}) = \frac{\sum |x\_i - \text{Mode}|}{n}$$

For grouped data:

$$\text{MD}(\text{Mode}) = \frac{\sum f\_i |x\_i - \text{Mode}|}{\sum f\_i}$$
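
All six formulas share one pattern, so a single helper can cover them. In this sketch the optional `freqs` argument (a hypothetical name) switches between the ungrouped and grouped versions.

```python
import statistics

def mean_deviation(values, center, freqs=None):
    """Average absolute deviation of `values` from `center`, weighted by `freqs`."""
    if freqs is None:
        freqs = [1] * len(values)          # ungrouped data: each value counts once
    total = sum(f * abs(x - center) for x, f in zip(values, freqs))
    return total / sum(freqs)

data = [3, 5, 7, 5, 2, 5, 9]
md_mean = mean_deviation(data, statistics.mean(data))
md_median = mean_deviation(data, statistics.median(data))
md_mode = mean_deviation(data, statistics.mode(data))

# Grouped data: class marks with frequencies, measured about the grouped mean.
marks, freqs = [5, 15, 25, 35], [5, 8, 12, 5]
grouped_mean = sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)
md_grouped = mean_deviation(marks, grouped_mean, freqs)

print(round(md_mean, 3), round(md_median, 3), round(md_mode, 3), round(md_grouped, 3))
```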

#### **4.3 Variance and Standard Deviation**

Most important measures of dispersion.

**a) Variance (**$$\sigma^2$$ **or** $$s^2$$**)**

Average of squared deviations from mean.

**For ungrouped data:**

**Population Variance:** $$\sigma^2 = \frac{\sum (x\_i - \mu)^2}{N}$$

**Sample Variance:** $$s^2 = \frac{\sum (x\_i - \bar{x})^2}{n-1}$$

**For grouped data:**

**Population Variance:** $$\sigma^2 = \frac{\sum f\_i (x\_i - \mu)^2}{\sum f\_i}$$

**Sample Variance:** $$s^2 = \frac{\sum f\_i (x\_i - \bar{x})^2}{(\sum f\_i) - 1}$$

**b) Standard Deviation (**$$\sigma$$ **or** $$s$$**)**

Square root of variance. It is easier to interpret than variance because it has the same units as the data.

**For ungrouped data:**

**Population SD:** $$\sigma = \sqrt{\frac{\sum (x\_i - \mu)^2}{N}}$$

**Sample SD:** $$s = \sqrt{\frac{\sum (x\_i - \bar{x})^2}{n-1}}$$

**For grouped data:**

**Population SD:** $$\sigma = \sqrt{\frac{\sum f\_i (x\_i - \mu)^2}{\sum f\_i}}$$

**Sample SD:** $$s = \sqrt{\frac{\sum f\_i (x\_i - \bar{x})^2}{(\sum f\_i) - 1}}$$
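
The population and sample versions differ only in the divisor. The sketch below uses the standard `statistics` module for ungrouped data and a small hand-written helper (a hypothetical name) for grouped data.

```python
import statistics
from math import sqrt

data = [5, 8, 12, 15, 10]
pop_var = statistics.pvariance(data)    # divides by N
samp_var = statistics.variance(data)    # divides by n - 1
pop_sd = statistics.pstdev(data)
samp_sd = statistics.stdev(data)

def grouped_variance(class_marks, freqs, sample=False):
    """Frequency-weighted variance about the grouped mean."""
    N = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, class_marks)) / N
    squared = sum(f * (x - mean) ** 2 for f, x in zip(freqs, class_marks))
    return squared / (N - 1 if sample else N)

g_pop = grouped_variance([5, 15, 25, 35], [5, 8, 12, 5])
g_samp = grouped_variance([5, 15, 25, 35], [5, 8, 12, 5], sample=True)
print(pop_var, samp_var, round(sqrt(g_pop), 2), round(sqrt(g_samp), 2))
```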

**c) Shortcut Formulas for Variance**

**Direct Method:** $$\sigma^2 = \frac{\sum f\_i x\_i^2}{\sum f\_i} - \left(\frac{\sum f\_i x\_i}{\sum f\_i}\right)^2$$

**Step Deviation Method:** $$\sigma^2 = h^2 \times \left\[\frac{\sum f\_i u\_i^2}{\sum f\_i} - \left(\frac{\sum f\_i u\_i}{\sum f\_i}\right)^2\right]$$

where $$u\_i = \frac{x\_i - A}{h}$$
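
As a check, the sketch below computes the population variance of one grouped table three ways: from the definition, by the direct method, and by the step deviation method. The values $$A = 25$$ and $$h = 10$$ are illustrative choices.

```python
class_marks = [5, 15, 25, 35]
freqs = [5, 8, 12, 5]
N = sum(freqs)
A, h = 25, 10   # assumed mean and class width (illustrative choices)

mean = sum(f * x for f, x in zip(freqs, class_marks)) / N

# Definition: average squared deviation from the mean.
by_definition = sum(f * (x - mean) ** 2 for f, x in zip(freqs, class_marks)) / N

# Direct method: mean of squares minus square of mean.
direct = sum(f * x * x for f, x in zip(freqs, class_marks)) / N - mean ** 2

# Step deviation method: work with u_i, then scale back by h^2.
u = [(x - A) / h for x in class_marks]
mean_u = sum(f * ui for f, ui in zip(freqs, u)) / N
step = h ** 2 * (sum(f * ui * ui for f, ui in zip(freqs, u)) / N - mean_u ** 2)

print(round(by_definition, 3), round(direct, 3), round(step, 3))
```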

#### **4.4 Coefficient of Variation (CV)**

Relative measure of dispersion, expressed as percentage:

$$\text{CV} = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100\%$$

Used to compare variability of different data sets.

**Lower CV means less variability relative to mean.**

**Example:** Compare two data sets:

Set A: Mean = 50, SD = 5 → CV = $$\frac{5}{50} \times 100\% = 10\%$$

Set B: Mean = 100, SD = 15 → CV = $$\frac{15}{100} \times 100\% = 15\%$$

Set A is more consistent (lower CV).
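
The comparison can be expressed in a few lines. The helper name below is hypothetical.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: standard deviation relative to the mean."""
    return 100 * sd / mean

cv_a = coefficient_of_variation(5, 50)
cv_b = coefficient_of_variation(15, 100)
more_consistent = "A" if cv_a < cv_b else "B"
print(cv_a, cv_b, more_consistent)
```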

#### **4.5 Quartiles and Interquartile Range (IQR)**

**a) Quartiles**

Divide data into four equal parts:

**Q1 (First Quartile):** 25th percentile

**Q2 (Second Quartile):** 50th percentile (same as median)

**Q3 (Third Quartile):** 75th percentile

**b) For Ungrouped Data**

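Arrange the data in ascending order first.

**To find Q1:** Value at position $$\frac{n+1}{4}$$

**To find Q3:** Value at position $$\frac{3(n+1)}{4}$$

If a position is not a whole number, interpolate between the two neighbouring values.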
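
A sketch of the position rule for ungrouped data. Linear interpolation for fractional positions is an assumption here; textbooks and software use several slightly different conventions, so results may differ a little between sources.

```python
def value_at_position(ordered, pos):
    """Value at 1-based position `pos`, interpolating linearly between neighbours."""
    lower = int(pos)              # whole part of the position
    frac = pos - lower            # fractional part
    if lower < 1:
        return ordered[0]
    if lower >= len(ordered):
        return ordered[-1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

def quartiles(values):
    """Return (Q1, Q2, Q3) using positions (n+1)/4, (n+1)/2 and 3(n+1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    return tuple(value_at_position(ordered, k * (n + 1) / 4) for k in (1, 2, 3))

print(quartiles([2, 4, 6, 8, 10, 12, 14]))
print(quartiles([1, 3, 5, 7, 9, 11]))
```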

**c) For Grouped Data**

Similar to median formula:

$$Q\_k = L + \left(\frac{\frac{kN}{4} - F}{f}\right) \times h$$

where $$k=1,2,3$$ for Q1, Q2, Q3

**d) Interquartile Range (IQR)**

$$\text{IQR} = Q\_3 - Q\_1$$

Measures spread of middle 50% of data.

**e) Quartile Deviation (Semi-IQR)**

$$\text{QD} = \frac{Q\_3 - Q\_1}{2}$$

**f) Coefficient of Quartile Deviation**

$$\text{Coefficient of QD} = \frac{Q\_3 - Q\_1}{Q\_3 + Q\_1}$$
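
The grouped quartile formula and the measures built on it can be sketched as follows, assuming equal class widths. The function name is hypothetical; the data is the table used for the grouped median in Section 3.2.

```python
def grouped_quartile(k, lower_boundaries, freqs, h):
    """Q_k = L + ((k*N/4 - F) / f) * h for the first class with CF >= k*N/4."""
    N = sum(freqs)
    target = k * N / 4
    F = 0  # cumulative frequency before the current class
    for L, f in zip(lower_boundaries, freqs):
        if f > 0 and F + f >= target:
            return L + (target - F) / f * h
        F += f
    raise ValueError("frequencies must sum to a positive total")

bounds, freqs, h = [0, 10, 20, 30], [5, 8, 12, 5], 10
q1 = grouped_quartile(1, bounds, freqs, h)
q3 = grouped_quartile(3, bounds, freqs, h)

iqr = q3 - q1
qd = iqr / 2
coeff_qd = (q3 - q1) / (q3 + q1)
print(round(q1, 3), round(q3, 3), round(iqr, 3), round(qd, 3), round(coeff_qd, 3))
```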

***

### **5. Graphical Representation of Data**

#### **5.1 Bar Graph**

For categorical/discrete data. Bars with gaps between them.

**Types:**

**Simple Bar Graph:** One variable

**Multiple Bar Graph:** Compare multiple variables

**Component Bar Graph:** Shows parts of whole

#### **5.2 Histogram**

For continuous grouped data. Bars without gaps.

Area of bars represents frequency.

**Key Points:**

Classes must be continuous

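If class intervals are unequal, plot frequency density (frequency ÷ class width) as the bar height so that the area of each bar stays proportional to its frequency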

#### **5.3 Frequency Polygon**

Line graph connecting midpoints of tops of histogram bars.

**To draw:** Plot points (class mark, frequency) and connect them.

#### **5.4 Ogive (Cumulative Frequency Curve)**

Graph of cumulative frequency.

**Types:**

**Less than Ogive:** Plot upper limits vs cumulative frequency (rising curve)

**More than Ogive:** Plot lower limits vs cumulative frequency (falling curve)

**Median from Ogive:** The x-coordinate of the point where the less than and more than ogives intersect gives the median

#### **5.5 Pie Chart (Circle Graph)**

Shows proportions as sectors of a circle.

**Angle for each category:** $$\text{Angle} = \frac{\text{Frequency}}{\text{Total frequency}} \times 360^\circ$$
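
For instance, the sector angles for the test-score frequencies from Section 2.2 could be computed as in this sketch:

```python
frequencies = {5: 3, 7: 4, 8: 2, 9: 1}   # score -> frequency
total = sum(frequencies.values())

# Each category gets a share of 360 degrees proportional to its frequency.
angles = {score: f * 360 / total for score, f in frequencies.items()}

print(angles)
```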

#### **5.6 Box Plot (Box-and-Whisker Plot)**

Shows five-number summary: Minimum, Q1, Median, Q3, Maximum

**Construction:**

Draw box from Q1 to Q3

Draw line inside box at median

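Draw whiskers to the minimum and maximum. If outliers are shown separately, draw each whisker only to the most extreme value within 1.5×IQR of the box (below Q1 or above Q3) and plot any points beyond that individually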
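
The numbers needed to draw a box plot can be computed without any plotting library. This sketch uses `statistics.quantiles` with its default method, which is one of several quartile conventions, together with the common 1.5×IQR rule for flagging outliers. The data set is made up for illustration.

```python
import statistics

data = [2, 4, 6, 8, 10, 12, 14, 40]   # illustrative data with one extreme value

q1, median, q3 = statistics.quantiles(data, n=4)   # default method="exclusive"
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers reach the most extreme values inside the fences; the rest are outliers.
inside = [x for x in data if lower_fence <= x <= upper_fence]
outliers = [x for x in data if x < lower_fence or x > upper_fence]
whiskers = (min(inside), max(inside))
five_number = (min(data), q1, median, q3, max(data))

print(five_number, whiskers, outliers)
```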

***

### **6. Correlation and Regression**

#### **6.1 Correlation**

Measures strength and direction of linear relationship between two variables.

**a) Types of Correlation**

**Positive Correlation:** Both variables increase together

**Negative Correlation:** One increases, other decreases

**No Correlation:** No linear relationship between the variables

**Perfect Correlation:** All points lie on straight line

**b) Karl Pearson's Correlation Coefficient (r)**

Measures linear correlation:

$$r = \frac{\sum (x\_i - \bar{x})(y\_i - \bar{y})}{\sqrt{\sum (x\_i - \bar{x})^2 \cdot \sum (y\_i - \bar{y})^2}}$$

Shortcut formula:

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\[n\sum x^2 - (\sum x)^2]\[n\sum y^2 - (\sum y)^2]}}$$

**Properties of r:**

$$-1 \leq r \leq 1$$

$$r=1$$: Perfect positive correlation

$$r=-1$$: Perfect negative correlation

$$r=0$$: No linear correlation
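
The definition and the shortcut formula should agree. The sketch below implements both with hypothetical function names and compares them on a small illustrative data set.

```python
from math import sqrt

def pearson_definition(x, y):
    """r from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def pearson_shortcut(x, y):
    """r from raw sums (the shortcut formula)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
print(round(pearson_definition(x, y), 3), round(pearson_shortcut(x, y), 3))
```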

**c) Spearman's Rank Correlation Coefficient (ρ)**

For ranked (ordinal) data, or when the relationship is monotonic but not necessarily linear. When there are no tied ranks:

$$\rho = 1 - \frac{6\sum d\_i^2}{n(n^2 - 1)}$$

where $$d\_i$$ = difference in ranks

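**If ties exist:** Give tied values the average of the ranks they would occupy, then use a tie-corrected formula (or compute Pearson's $$r$$ on the ranks)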
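
A minimal sketch of the rank formula for data without ties. The helper names are hypothetical, and the ranking function deliberately rejects tied values because the simple formula does not apply to them.

```python
def ranks(values):
    """Rank from 1 (smallest) to n; assumes all values are distinct."""
    if len(set(values)) != len(values):
        raise ValueError("ties present: the simple formula does not apply")
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d = difference in ranks."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
print(round(spearman_rho(x, y), 2))
```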

#### **6.2 Regression**

Finds relationship to predict one variable from another.

**a) Regression Lines**

**Line of regression of y on x:** Predicts y from x

$$y - \bar{y} = r\frac{\sigma\_y}{\sigma\_x}(x - \bar{x})$$

**Line of regression of x on y:** Predicts x from y

$$x - \bar{x} = r\frac{\sigma\_x}{\sigma\_y}(y - \bar{y})$$

**b) Regression Coefficients**

**Regression coefficient of y on x:** $$b\_{yx} = r\frac{\sigma\_y}{\sigma\_x}$$

**Regression coefficient of x on y:** $$b\_{xy} = r\frac{\sigma\_x}{\sigma\_y}$$

**Note:** $$b\_{yx} \times b\_{xy} = r^2$$
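
The sketch below computes both regression coefficients for a small illustrative data set, checks that their product equals $$r^2$$, and uses the line of y on x for a prediction. Population standard deviations are used for $$\sigma_x$$ and $$\sigma_y$$; `predict_y` is a hypothetical helper.

```python
import statistics

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
n = len(x)

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
sd_x, sd_y = statistics.pstdev(x), statistics.pstdev(y)
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
r = cov / (sd_x * sd_y)

b_yx = r * sd_y / sd_x   # slope of the line of y on x
b_xy = r * sd_x / sd_y   # slope of the line of x on y

def predict_y(x_new):
    """Line of regression of y on x: y - mean_y = b_yx * (x - mean_x)."""
    return mean_y + b_yx * (x_new - mean_x)

print(round(b_yx, 3), round(b_xy, 3), round(b_yx * b_xy, 3), round(r ** 2, 3))
print(round(predict_y(6), 2))
```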

**c) Angle Between Regression Lines**

If $$\theta$$ is angle between two regression lines:

$$\tan\theta = \frac{1 - r^2}{|r|} \cdot \frac{\sigma\_x \sigma\_y}{\sigma\_x^2 + \sigma\_y^2}$$

**Special Cases:**

If $$r=0$$: Lines are perpendicular

If $$r=\pm1$$: Lines coincide (angle = 0)

***

### **7. Probability Basics (for Statistics)**

#### **7.1 Basic Concepts**

**Probability:** Measure of likelihood of an event (0 to 1)

**Sample Space (S):** Set of all possible outcomes

**Event (E):** Subset of sample space

#### **7.2 Probability Formulas**

**Classical Probability:** $$P(E) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}}$$

**Addition Rule:** $$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

For mutually exclusive events: $$P(A \cup B) = P(A) + P(B)$$

**Complement Rule:** $$P(A') = 1 - P(A)$$

**Multiplication Rule:** For independent events: $$P(A \cap B) = P(A) \times P(B)$$

**Conditional Probability:** $$P(A|B) = \frac{P(A \cap B)}{P(B)}$$
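
These rules can be checked on a single roll of a fair die. The events `A`, `B`, and `C` below are chosen purely for illustration, and exact fractions avoid rounding.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}          # sample space: one roll of a fair die

def P(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}                   # an even number
B = {4, 5, 6}                   # a number greater than 3
C = {1, 2}                      # a number of at most 2

union = P(A | B)
addition_rule = P(A) + P(B) - P(A & B)
complement = P(S - A)
conditional = P(A & B) / P(B)            # P(A | B)
independent = P(A & C) == P(A) * P(C)    # multiplication rule holds for A and C

print(union, addition_rule, complement, conditional, independent)
```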

***

### **8. Random Variables and Probability Distributions**

#### **8.1 Random Variable**

Variable whose values depend on outcomes of random experiment.

**Discrete Random Variable:** Countable values

**Continuous Random Variable:** Measurable values

#### **8.2 Probability Distribution**

For discrete random variable X with values $$x\_1, x\_2, \ldots$$ and probabilities $$p\_1, p\_2, \ldots$$:

**Conditions:** $$0 \leq p\_i \leq 1$$ and $$\sum p\_i = 1$$

#### **8.3 Mean (Expected Value) of Discrete Random Variable**

$$\mu = E(X) = \sum x\_i p\_i$$

#### **8.4 Variance of Discrete Random Variable**

$$\sigma^2 = E(X^2) - \[E(X)]^2 = \sum x\_i^2 p\_i - (\sum x\_i p\_i)^2$$

#### **8.5 Standard Deviation**

$$\sigma = \sqrt{\text{Variance}}$$
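
As an illustration, take X to be the number of heads in two tosses of a fair coin. The sketch first checks the two conditions on a probability distribution, then computes the mean, variance, and standard deviation.

```python
from fractions import Fraction
from math import sqrt

values = [0, 1, 2]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

# Conditions: each probability lies in [0, 1] and they sum to 1.
valid = all(0 <= p <= 1 for p in probs) and sum(probs) == 1

mean = sum(x * p for x, p in zip(values, probs))                  # E(X)
second_moment = sum(x ** 2 * p for x, p in zip(values, probs))    # E(X^2)
variance = second_moment - mean ** 2
sd = sqrt(float(variance))

print(valid, mean, variance, round(sd, 4))
```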

#### **8.6 Binomial Distribution**

For experiments with:

* Fixed number of trials (n)
* Two outcomes (success/failure)
* Constant probability of success (p)
* Independent trials

**Probability Mass Function:**

$$P(X = r) = \binom{n}{r} p^r (1-p)^{n-r}$$ for $$r=0,1,2,\ldots,n$$

where $$\binom{n}{r} = \frac{n!}{r!(n-r)!}$$

**Mean:** $$\mu = np$$

**Variance:** $$\sigma^2 = np(1-p)$$
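
A short sketch of the probability mass function using the standard `math.comb`. It also recomputes the mean and variance from the distribution itself, which should match $$np$$ and $$np(1-p)$$. The values $$n = 5$$ and $$p = 0.5$$ are illustrative.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for n independent trials with success probability p."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p = 5, 0.5
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

mean = sum(r * pr for r, pr in enumerate(pmf))
variance = sum(r * r * pr for r, pr in enumerate(pmf)) - mean ** 2

print(pmf[2], sum(pmf), mean, variance)
```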

#### **8.7 Normal Distribution**

Most important continuous distribution (bell curve).

**Properties:**

Bell-shaped, symmetric about mean

Mean = median = mode

Total area under curve = 1

**Standard Normal Distribution:** Mean = 0, SD = 1

**Z-score:** $$z = \frac{x - \mu}{\sigma}$$
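
A z-score converts any normal value to the standard normal scale, where probabilities can be looked up. The sketch below uses the standard `statistics.NormalDist`; the mean of 70, standard deviation of 10, and value of 85 are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 70, 10    # hypothetical population mean and SD
x = 85

z = (x - mu) / sigma                      # standardise x
standard_normal = NormalDist(mu=0, sigma=1)
p_below = standard_normal.cdf(z)          # P(Z <= z)

# The same probability taken directly from the original distribution.
p_direct = NormalDist(mu, sigma).cdf(x)

print(z, round(p_below, 4), round(p_direct, 4))
```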

***

### **9. Solved Examples**

#### **Example 1:** Find Mean, Median, Mode

Data: 12, 15, 18, 12, 20, 15, 12, 25, 18

**Solution:**

**Mean:** $$\bar{x} = \frac{12+15+18+12+20+15+12+25+18}{9} = \frac{147}{9} = 16.33$$

**Median:** Arrange: 12, 12, 12, 15, 15, 18, 18, 20, 25

$$n=9$$ (odd), Median = $$\frac{9+1}{2}=5$$-th term = 15

**Mode:** 12 (appears 3 times, most frequent)

#### **Example 2:** Grouped Data Calculations

Given:

| Class | Frequency |
| ----- | --------- |
| 0-10  | 5         |
| 10-20 | 8         |
| 20-30 | 12        |
| 30-40 | 7         |
| 40-50 | 3         |

Find mean, median, mode, standard deviation.

**Solution:**

First prepare table:

| Class     | f      | x (mid) | fx      | fx²       | CF |
| --------- | ------ | ------- | ------- | --------- | -- |
| 0-10      | 5      | 5       | 25      | 125       | 5  |
| 10-20     | 8      | 15      | 120     | 1800      | 13 |
| 20-30     | 12     | 25      | 300     | 7500      | 25 |
| 30-40     | 7      | 35      | 245     | 8575      | 32 |
| 40-50     | 3      | 45      | 135     | 6075      | 35 |
| **Total** | **35** |         | **825** | **24075** |    |

**Mean:** $$\bar{x} = \frac{825}{35} = 23.57$$

**Median:** $$N=35$$, $$\frac{N}{2}=17.5$$

Median class: 20-30 (CF reaches 25 at this class)

$$L=20$$, $$F=13$$, $$f=12$$, $$h=10$$

Median = $$20 + \left(\frac{17.5-13}{12}\right) \times 10 = 20 + \frac{45}{12} = 23.75$$

**Mode:** Modal class: 20-30 (highest f=12)

$$L=20$$, $$f\_1=12$$, $$f\_0=8$$, $$f\_2=7$$, $$h=10$$

Mode = $$20 + \left(\frac{12-8}{24-8-7}\right) \times 10 = 20 + \frac{4}{9} \times 10 = 24.44$$

**Variance:** $$\sigma^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$$

$$= \frac{24075}{35} - \left(\frac{825}{35}\right)^2 \approx 687.86 - 555.61 \approx 132.2$$

**Standard Deviation:** $$\sigma = \sqrt{132.2} = 11.5$$

#### **Example 3:** Correlation Calculation

Find correlation coefficient for:

| x | y |
| - | - |
| 1 | 2 |
| 2 | 4 |
| 3 | 5 |
| 4 | 4 |
| 5 | 6 |

**Solution:**

Prepare table:

| x      | y      | xy     | x²     | y²     |
| ------ | ------ | ------ | ------ | ------ |
| 1      | 2      | 2      | 1      | 4      |
| 2      | 4      | 8      | 4      | 16     |
| 3      | 5      | 15     | 9      | 25     |
| 4      | 4      | 16     | 16     | 16     |
| 5      | 6      | 30     | 25     | 36     |
| **15** | **21** | **71** | **55** | **97** |

$$n=5$$, $$\sum x=15$$, $$\sum y=21$$, $$\sum xy=71$$, $$\sum x^2=55$$, $$\sum y^2=97$$

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\[n\sum x^2 - (\sum x)^2]\[n\sum y^2 - (\sum y)^2]}}$$

$$= \frac{5\times71 - 15\times21}{\sqrt{\[5\times55 - 15^2]\[5\times97 - 21^2]}}$$

$$= \frac{355 - 315}{\sqrt{\[275-225]\[485-441]}} = \frac{40}{\sqrt{50 \times 44}} = \frac{40}{\sqrt{2200}} = \frac{40}{46.9} = 0.853$$

Strong positive correlation.

***

### **10. Important Formulas Summary**

#### **10.1 Measures of Central Tendency**

**Mean (ungrouped):** $$\bar{x} = \frac{\sum x\_i}{n}$$

**Mean (grouped):** $$\bar{x} = \frac{\sum f\_i x\_i}{\sum f\_i}$$

**Median (grouped):** $$L + \left(\frac{\frac{N}{2} - F}{f}\right) \times h$$

**Mode (grouped):** $$L + \left(\frac{f\_1 - f\_0}{2f\_1 - f\_0 - f\_2}\right) \times h$$

**Empirical Relation:** Mode = 3×Median - 2×Mean

#### **10.2 Measures of Dispersion**

**Range:** Max - Min

**Variance:** $$\sigma^2 = \frac{\sum f\_i x\_i^2}{\sum f\_i} - \left(\frac{\sum f\_i x\_i}{\sum f\_i}\right)^2$$

**Standard Deviation:** $$\sigma = \sqrt{\text{Variance}}$$

**Coefficient of Variation:** $$\text{CV} = \frac{\sigma}{\bar{x}} \times 100\%$$

**Quartiles:** $$Q\_k = L + \left(\frac{\frac{kN}{4} - F}{f}\right) \times h$$

**IQR:** $$Q\_3 - Q\_1$$

#### **10.3 Correlation and Regression**

**Correlation Coefficient:**

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\[n\sum x^2 - (\sum x)^2]\[n\sum y^2 - (\sum y)^2]}}$$

**Regression Line (y on x):** $$y - \bar{y} = r\frac{\sigma\_y}{\sigma\_x}(x - \bar{x})$$

**Regression Coefficients:** $$b\_{yx} = r\frac{\sigma\_y}{\sigma\_x}$$, $$b\_{xy} = r\frac{\sigma\_x}{\sigma\_y}$$

**Relation:** $$b\_{yx} \times b\_{xy} = r^2$$

#### **10.4 Probability Distributions**

**Binomial:** $$P(X=r) = \binom{n}{r} p^r (1-p)^{n-r}$$

**Mean of Binomial:** $$np$$

**Variance of Binomial:** $$np(1-p)$$

***

### **11. Exam Tips and Common Mistakes**

#### **11.1 Common Mistakes to Avoid**

1. **Using wrong formula** for grouped vs ungrouped data
2. **Confusing population vs sample formulas** (divide by n vs n-1 for variance)
3. **Forgetting to arrange data** before finding median
4. **Incorrect class boundaries** for grouped data
5. **Misinterpreting correlation coefficient** (correlation ≠ causation)

#### **11.2 Problem-Solving Strategy**

1. **Identify data type:** Ungrouped or grouped? Discrete or continuous?
2. **Choose correct formulas** based on what's asked
3. **Create tables** for organized calculations (especially for grouped data)
4. **Show all steps** clearly
5. **Include units** in final answer

#### **11.3 Quick Checks**

1. **Mean, median, mode relationship:** For a symmetric, unimodal distribution: Mean = Median = Mode
2. **Standard deviation:** Always non-negative
3. **Correlation coefficient:** Between -1 and 1
4. **Probability:** Between 0 and 1
5. **Variance formulas:** Population: divide by N, Sample: divide by n-1
