5.1 Statistics

Detailed Theory: Statistics

1. Introduction to Statistics

1.1 What is Statistics?

Statistics is the science of collecting, organizing, analyzing, interpreting, and presenting data.

Two Main Branches:

Descriptive Statistics: Methods for summarizing and describing data (mean, median, graphs, etc.)

Inferential Statistics: Methods for making predictions or inferences about a population based on sample data

1.2 Basic Terminology

a) Data

Information collected for analysis.

Types of Data:

Qualitative Data: Descriptive/non-numerical data (colors, gender, yes/no)

Quantitative Data: Numerical data that can be measured

  • Discrete Data: Countable values (number of students, cars)

  • Continuous Data: Measurable values (height, weight, temperature)

b) Population vs Sample

Population: Complete set of all items/individuals of interest

Sample: A subset of the population selected for study

c) Variable

A characteristic that can take different values.

Example: In a study of students: age, height, marks are variables

d) Parameter vs Statistic

Parameter: Numerical measure describing a population characteristic (denoted by Greek letters: $\mu$, $\sigma$)

Statistic: Numerical measure describing a sample characteristic (denoted by Roman letters: $\bar{x}$, $s$)


2. Data Collection and Organization

2.1 Methods of Data Collection

  1. Direct Observation: Watching and recording

  2. Experiments: Controlled conditions

  3. Surveys/Questionnaires: Asking questions

  4. Interviews: Face-to-face questioning

  5. Secondary Data: Using existing data

2.2 Frequency Distribution

A table showing how often each value or range of values occurs.

a) For Ungrouped Data

Example: Test scores: 5, 7, 8, 5, 9, 7, 5, 8, 7, 7

Frequency Table:

| Score (x) | Frequency (f) |
|---|---|
| 5 | 3 |
| 7 | 4 |
| 8 | 2 |
| 9 | 1 |
| Total | 10 |

b) For Grouped Data

When data has many different values, we group them into classes.

Example: Heights of 50 students (in cm)

| Class Interval | Frequency (f) |
|---|---|
| 150-155 | 5 |
| 155-160 | 10 |
| 160-165 | 15 |
| 165-170 | 12 |
| 170-175 | 8 |
| Total | 50 |

2.3 Types of Frequency

1. Absolute Frequency: Simple count (denoted by $f$)

2. Relative Frequency: Proportion or percentage

$$\text{Relative Frequency} = \frac{\text{Frequency of class}}{\text{Total frequency}}$$

3. Cumulative Frequency: Running total of frequencies

Less than type: Cumulative frequency up to upper limit of each class

More than type: Cumulative frequency from lower limit of each class
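The three kinds of frequency can be sketched in a few lines of Python. This is a minimal illustration using the test scores from the ungrouped example; the variable names are chosen here for illustration and are not part of the original notes.

```python
from collections import Counter
from itertools import accumulate

# Test scores from the ungrouped frequency-table example
scores = [5, 7, 8, 5, 9, 7, 5, 8, 7, 7]

counts = Counter(scores)
values = sorted(counts)               # distinct scores in ascending order
freq = [counts[v] for v in values]    # absolute frequency f of each score
total = sum(freq)

# Relative frequency: frequency of each value divided by total frequency
rel_freq = [f / total for f in freq]

# "Less than" type: running total from the smallest value upward
cum_less = list(accumulate(freq))

# "More than" type: running total from the largest value downward
cum_more = list(accumulate(freq[::-1]))[::-1]

for row in zip(values, freq, rel_freq, cum_less, cum_more):
    print(row)
```

The last "less than" entry and the first "more than" entry both equal the total frequency, which is a quick check on the table.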

2.4 Class Interval Details

Class Limits: Lower and upper bounds of a class

Class Boundaries: True limits (for continuous data)

If classes are 150-154, 155-159, etc., boundaries are 149.5-154.5, 154.5-159.5

Class Width: Difference between upper and lower boundaries

Class Mark (Midpoint): Average of class limits

$$\text{Class Mark} = \frac{\text{Lower limit} + \text{Upper limit}}{2}$$


3. Measures of Central Tendency

These are single values that represent the center of a data set.

3.1 Mean (Average)

a) Arithmetic Mean for Ungrouped Data

For $n$ values $x_1, x_2, \ldots, x_n$:

$$\text{Mean} = \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Example: Find mean of: 5, 8, 12, 15, 10

$$\bar{x} = \frac{5 + 8 + 12 + 15 + 10}{5} = \frac{50}{5} = 10$$

b) Arithmetic Mean for Grouped Data

For grouped data with frequencies:

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$

where $x_i$ = class mark, $f_i$ = frequency of $i$-th class

Example: Find mean from:

| Class | Frequency (f) | Class Mark (x) | f × x |
|---|---|---|---|
| 0-10 | 5 | 5 | 25 |
| 10-20 | 8 | 15 | 120 |
| 20-30 | 12 | 25 | 300 |
| 30-40 | 5 | 35 | 175 |
| Total | 30 | | 620 |

$$\bar{x} = \frac{620}{30} \approx 20.67$$

c) Assumed Mean Method (Shortcut)

For large numbers, use:

$$\bar{x} = A + \frac{\sum f_i d_i}{\sum f_i}$$

where $A$ = assumed mean, $d_i = x_i - A$

d) Step Deviation Method

When class intervals are equal:

$$\bar{x} = A + h \times \frac{\sum f_i u_i}{\sum f_i}$$

where $h$ = class width, $u_i = \frac{x_i - A}{h}$
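As a sanity check, the direct, assumed mean, and step deviation methods should all give the same result. The sketch below applies them to the grouped table from the mean example; the choice $A = 25$ is arbitrary and made here only for illustration.

```python
# Grouped data from the worked mean example: (lower limit, upper limit, frequency)
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 5)]

marks = [(lo + hi) / 2 for lo, hi, _ in classes]  # class marks x_i
freqs = [f for _, _, f in classes]                # frequencies f_i
n = sum(freqs)

# Direct method: sum(f_i * x_i) / sum(f_i)
mean_direct = sum(f * x for f, x in zip(freqs, marks)) / n

# Assumed mean method: A is an arbitrary convenient value (assumption: A = 25)
A = 25
mean_assumed = A + sum(f * (x - A) for f, x in zip(freqs, marks)) / n

# Step deviation method: h is the common class width, u_i = (x_i - A) / h
h = 10
mean_step = A + h * sum(f * ((x - A) / h) for f, x in zip(freqs, marks)) / n

print(round(mean_direct, 2), round(mean_assumed, 2), round(mean_step, 2))
```

The shortcut methods only reduce the size of the numbers handled by hand; they do not change the answer.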

3.2 Median

The middle value when data is arranged in order.

a) For Ungrouped Data

Step 1: Arrange data in ascending order

Step 2: If $n$ is odd: $\text{Median} = \left(\frac{n+1}{2}\right)$-th term

If $n$ is even: $\text{Median} = \frac{\left(\frac{n}{2}\right)\text{-th term} + \left(\frac{n}{2}+1\right)\text{-th term}}{2}$

Example 1 (odd): 3, 7, 1, 9, 5 → Arrange: 1, 3, 5, 7, 9

$n=5$ (odd), Median = $\left(\frac{5+1}{2}\right)$-th term = 3rd term = 5

Example 2 (even): 4, 8, 2, 6 → Arrange: 2, 4, 6, 8

$n=4$ (even), Median = $\frac{2\text{nd term} + 3\text{rd term}}{2} = \frac{4+6}{2} = 5$
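The two-step rule for ungrouped data can be written as a short Python function. This is a sketch; the function name `median_ungrouped` is chosen here for illustration, and Python's standard `statistics.median` applies the same rule.

```python
import statistics

def median_ungrouped(values):
    """Median by the sort-then-pick-the-middle rule."""
    ordered = sorted(values)        # Step 1: arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th term, which is index mid (0-based)
        return ordered[mid]
    # Even n: average of the (n / 2)-th and (n / 2 + 1)-th terms
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_result = median_ungrouped([3, 7, 1, 9, 5])
even_result = median_ungrouped([4, 8, 2, 6])
print(odd_result, even_result)
print(statistics.median([3, 7, 1, 9, 5]), statistics.median([4, 8, 2, 6]))
```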

b) For Grouped Data

For grouped data with cumulative frequency:

$$\text{Median} = L + \left(\frac{\frac{N}{2} - F}{f}\right) \times h$$

where:

$L$ = lower boundary of median class

$N$ = total frequency

$F$ = cumulative frequency before median class

$f$ = frequency of median class

$h$ = class width

Median Class: First class with cumulative frequency ≥ $\frac{N}{2}$

Example: Find median from:

| Class | Frequency (f) | Cumulative Frequency |
|---|---|---|
| 0-10 | 5 | 5 |
| 10-20 | 8 | 13 |
| 20-30 | 12 | 25 |
| 30-40 | 5 | 30 |
| Total | 30 | |

$N=30$, $\frac{N}{2}=15$

Median class is 20-30 (first with CF ≥ 15)

$L=20$, $F=13$, $f=12$, $h=10$

$$\text{Median} = 20 + \left(\frac{15 - 13}{12}\right) \times 10 = 20 + \left(\frac{2}{12}\right) \times 10$$

$$= 20 + \frac{20}{12} \approx 20 + 1.67 = 21.67$$
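The grouped median calculation can be sketched in Python as follows. The function name `grouped_median` and the tuple layout are assumptions made for this illustration; the logic follows the formula above, locating the first class whose cumulative frequency reaches $N/2$.

```python
def grouped_median(classes):
    """classes: ordered list of (lower boundary, upper boundary, frequency)."""
    n = sum(f for _, _, f in classes)   # N: total frequency
    if n == 0:
        raise ValueError("total frequency must be positive")
    cumulative = 0                      # F: cumulative frequency before the class
    for lower, upper, f in classes:
        if cumulative + f >= n / 2:     # first class with CF >= N/2
            h = upper - lower           # class width
            return lower + (n / 2 - cumulative) / f * h
        cumulative += f

table = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 5)]
median_value = grouped_median(table)
print(round(median_value, 2))
```

Because the formula interpolates inside the median class, the result is an estimate that assumes observations are spread evenly across that class.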

3.3 Mode

The value that occurs most frequently.

a) For Ungrouped Data

Simply the most frequent value.

Example: 3, 5, 7, 5, 2, 5, 9 → Mode = 5 (appears 3 times)

Note: Data can have no mode, one mode (unimodal), or multiple modes (bimodal, multimodal)

b) For Grouped Data

For grouped data:

$$\text{Mode} = L + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h$$

where:

$L$ = lower boundary of modal class

$f_1$ = frequency of modal class

$f_0$ = frequency of class before modal class

$f_2$ = frequency of class after modal class

$h$ = class width

Modal Class: Class with highest frequency

Example: Find mode from:

| Class | Frequency (f) |
|---|---|
| 0-10 | 5 |
| 10-20 | 8 |
| 20-30 | 12 |
| 30-40 | 5 |
| 40-50 | 3 |

Modal class is 20-30 (highest frequency 12)

$L=20$, $f_1=12$, $f_0=8$, $f_2=5$, $h=10$

$$\text{Mode} = 20 + \left(\frac{12 - 8}{2\times12 - 8 - 5}\right) \times 10 = 20 + \left(\frac{4}{24 - 13}\right) \times 10$$

$$= 20 + \left(\frac{4}{11}\right) \times 10 = 20 + \frac{40}{11} \approx 20 + 3.64 = 23.64$$
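A small Python sketch of the grouped mode formula is given below. The function name `grouped_mode` is hypothetical, and treating a missing neighbouring class as having frequency 0 (when the modal class is first or last) is an assumption of this sketch rather than something stated in the notes.

```python
def grouped_mode(classes):
    """classes: ordered list of (lower boundary, upper boundary, frequency)."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))          # modal class: first class with highest f
    lower, upper, f1 = classes[i]
    # Assumption: a missing neighbour is treated as frequency 0
    f0 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula undefined: modal class and neighbours are equal")
    return lower + (f1 - f0) / denominator * (upper - lower)

table = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 5), (40, 50, 3)]
mode_value = grouped_mode(table)
print(round(mode_value, 2))
```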

3.4 Relationship between Mean, Median, Mode

For moderately skewed distributions:

$$\text{Mode} = 3 \times \text{Median} - 2 \times \text{Mean}$$

This is called the empirical relationship.


4. Measures of Dispersion (Variability)

These measure how spread out the data is.

4.1 Range

Simplest measure of dispersion:

$$\text{Range} = \text{Maximum value} - \text{Minimum value}$$

Limitation: Affected by extreme values, doesn't consider all data

4.2 Mean Deviation

Average of absolute deviations from central value.

a) Mean Deviation about Mean

For ungrouped data:

$$\text{MD}(\bar{x}) = \frac{\sum |x_i - \bar{x}|}{n}$$

For grouped data:

$$\text{MD}(\bar{x}) = \frac{\sum f_i |x_i - \bar{x}|}{\sum f_i}$$

b) Mean Deviation about Median

For ungrouped data:

$$\text{MD}(\text{Med}) = \frac{\sum |x_i - \text{Median}|}{n}$$

For grouped data:

$$\text{MD}(\text{Med}) = \frac{\sum f_i |x_i - \text{Median}|}{\sum f_i}$$

c) Mean Deviation about Mode

For ungrouped data:

$$\text{MD}(\text{Mode}) = \frac{\sum |x_i - \text{Mode}|}{n}$$

For grouped data:

$$\text{MD}(\text{Mode}) = \frac{\sum f_i |x_i - \text{Mode}|}{\sum f_i}$$

4.3 Variance and Standard Deviation

Most important measures of dispersion.

a) Variance ($\sigma^2$ or $s^2$)

Average of squared deviations from mean.

For ungrouped data:

Population Variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$

Sample Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$

For grouped data:

Population Variance: $\sigma^2 = \frac{\sum f_i (x_i - \mu)^2}{\sum f_i}$

Sample Variance: $s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{(\sum f_i) - 1}$

b) Standard Deviation ($\sigma$ or $s$)

Square root of variance. More interpretable as it has same units as data.

For ungrouped data:

Population SD: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$

Sample SD: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$

For grouped data:

Population SD: $\sigma = \sqrt{\frac{\sum f_i (x_i - \mu)^2}{\sum f_i}}$

Sample SD: $s = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{(\sum f_i) - 1}}$

c) Shortcut Formulas for Variance

Direct Method: $\sigma^2 = \frac{\sum f_i x_i^2}{\sum f_i} - \left(\frac{\sum f_i x_i}{\sum f_i}\right)^2$

Step Deviation Method: $\sigma^2 = h^2 \times \left[\frac{\sum f_i u_i^2}{\sum f_i} - \left(\frac{\sum f_i u_i}{\sum f_i}\right)^2\right]$

where $u_i = \frac{x_i - A}{h}$
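The difference between the population and sample formulas, and the direct shortcut for grouped data, can be illustrated with Python's standard `statistics` module. The data below reuse the ungrouped mean example and the grouped mean table; the helper `grouped_pvariance` is a name introduced here for illustration.

```python
import math
import statistics

data = [5, 8, 12, 15, 10]               # ungrouped example, mean = 10

pop_var = statistics.pvariance(data)    # population variance: divides by N
samp_var = statistics.variance(data)    # sample variance: divides by n - 1
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

def grouped_pvariance(marks, freqs):
    """Direct shortcut: sum(f x^2)/sum(f) - (sum(f x)/sum(f))^2."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, marks)) / n
    mean_of_squares = sum(f * x * x for f, x in zip(freqs, marks)) / n
    return mean_of_squares - mean ** 2

marks = [5, 15, 25, 35]                 # class marks of 0-10, 10-20, 20-30, 30-40
freqs = [5, 8, 12, 5]
grouped_var = grouped_pvariance(marks, freqs)

print(pop_var, samp_var, round(grouped_var, 2))
```

The sample variance is always the larger of the two for the same data, since it divides the same sum of squares by $n-1$ instead of $N$.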

4.4 Coefficient of Variation (CV)

Relative measure of dispersion, expressed as percentage:

$$\text{CV} = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100\%$$

Used to compare variability of different data sets.

Lower CV means less variability relative to mean.

Example: Compare two data sets:

Set A: Mean = 50, SD = 5 → CV = $\frac{5}{50} \times 100\% = 10\%$

Set B: Mean = 100, SD = 15 → CV = $\frac{15}{100} \times 100\% = 15\%$

Set A is more consistent (lower CV).
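The comparison above can be reproduced with a one-line helper. This is a sketch; the function name `coefficient_of_variation` is chosen here, and the guard against a zero mean is an addition of this sketch, since CV is undefined in that case.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: (standard deviation / mean) * 100."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return sd / mean * 100

cv_a = coefficient_of_variation(5, 50)     # Set A
cv_b = coefficient_of_variation(15, 100)   # Set B

more_consistent = "A" if cv_a < cv_b else "B"
print(cv_a, cv_b, more_consistent)
```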

4.5 Quartiles and Interquartile Range (IQR)

a) Quartiles

Divide data into four equal parts:

Q1 (First Quartile): 25th percentile

Q2 (Second Quartile): 50th percentile (same as median)

Q3 (Third Quartile): 75th percentile

b) For Ungrouped Data

To find Q1: Value at position $\frac{n+1}{4}$

To find Q3: Value at position $\frac{3(n+1)}{4}$

c) For Grouped Data

Similar to median formula:

$$Q_k = L + \left(\frac{\frac{kN}{4} - F}{f}\right) \times h$$

where $k=1,2,3$ for Q1, Q2, Q3

d) Interquartile Range (IQR)

$$\text{IQR} = Q_3 - Q_1$$

Measures spread of middle 50% of data.

e) Quartile Deviation (Semi-IQR)

$$\text{QD} = \frac{Q_3 - Q_1}{2}$$

f) Coefficient of Quartile Deviation

$$\text{Coefficient of QD} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
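For ungrouped data, the position rule $k(n+1)/4$ often lands between two observations. The sketch below interpolates linearly between them, which is one common convention (other textbooks and software use slightly different rules). The function name `quartile` and the sample data are chosen here for illustration.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) using position k(n + 1)/4 with interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    pos = k * (n + 1) / 4            # 1-based position in the ordered data
    if pos <= 1:
        return ordered[0]
    if pos >= n:
        return ordered[-1]
    whole = int(pos)                 # position of the lower neighbour
    frac = pos - whole               # fractional distance to the upper neighbour
    low, high = ordered[whole - 1], ordered[whole]
    return low + frac * (high - low)

data = [12, 15, 18, 12, 20, 15, 12, 25, 18]

q1, q2, q3 = (quartile(data, k) for k in (1, 2, 3))
iqr = q3 - q1                        # interquartile range
qd = iqr / 2                         # quartile deviation (semi-IQR)
coeff_qd = (q3 - q1) / (q3 + q1)     # coefficient of quartile deviation

print(q1, q2, q3, iqr, qd, round(coeff_qd, 3))
```

Python's `statistics.quantiles(data, n=4)` uses the same $(n+1)$-based positions by default, so it can serve as a cross-check.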


5. Graphical Representation of Data

5.1 Bar Graph

For categorical/discrete data. Bars with gaps between them.

Types:

Simple Bar Graph: One variable

Multiple Bar Graph: Compare multiple variables

Component Bar Graph: Shows parts of whole

5.2 Histogram

For continuous grouped data. Bars without gaps.

Area of bars represents frequency.

Key Points:

Classes must be continuous

If class intervals are unequal, adjust heights

5.3 Frequency Polygon

Line graph connecting midpoints of tops of histogram bars.

To draw: Plot points (class mark, frequency) and connect them.

5.4 Ogive (Cumulative Frequency Curve)

Graph of cumulative frequency.

Types:

Less than Ogive: Plot upper limits vs cumulative frequency (rising curve)

More than Ogive: Plot lower limits vs cumulative frequency (falling curve)

Median from Ogive: The x-coordinate of the point where the less than and more than ogives intersect gives the median

5.5 Pie Chart (Circle Graph)

Shows proportions as sectors of a circle.

Angle for each category: $\text{Angle} = \frac{\text{Frequency}}{\text{Total frequency}} \times 360^\circ$

5.6 Box Plot (Box-and-Whisker Plot)

Shows five-number summary: Minimum, Q1, Median, Q3, Maximum

Construction:

Draw box from Q1 to Q3

Draw line inside box at median

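Draw whiskers to min and max (or, when outliers are shown separately, to the most extreme values within 1.5×IQR of Q1 and Q3, plotting points beyond that as outliers)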


6. Correlation and Regression

6.1 Correlation

Measures strength and direction of linear relationship between two variables.

a) Types of Correlation

Positive Correlation: Both variables increase together

Negative Correlation: One increases, other decreases

No Correlation: No relationship

Perfect Correlation: All points lie on straight line

b) Karl Pearson's Correlation Coefficient (r)

Measures linear correlation:

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \cdot \sum (y_i - \bar{y})^2}}$$

Shortcut formula:

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$

Properties of r:

$-1 \leq r \leq 1$

$r=1$: Perfect positive correlation

$r=-1$: Perfect negative correlation

$r=0$: No linear correlation

c) Spearman's Rank Correlation Coefficient (ρ)

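For ranked (ordinal) data, or for monotonic relationships that need not be linear:

$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$

where $d_i$ = difference in ranks

If ties exist: Give tied values the average of their ranks and use a tie-corrected formula; the simple formula above is then only approximate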
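Both coefficients can be computed directly from their formulas. The sketch below uses the shortcut formula for Pearson's $r$ and the simple rank formula for Spearman's $\rho$; the function names and the second data set `y_ranked` are illustrative choices, and the ranking helper assumes there are no tied values.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r via the shortcut (raw sums) formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def ranks(values):
    """Rank 1 = smallest. Assumes all values are distinct (no ties)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rho via 1 - 6*sum(d^2) / (n(n^2 - 1)); valid without ties."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]          # contains a tie, so only Pearson's r is used here
y_ranked = [2, 3, 5, 4, 6]   # hypothetical tie-free data for the rank formula

r = pearson_r(x, y)
rho = spearman_rho(x, y_ranked)
print(round(r, 3), round(rho, 3))
```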

6.2 Regression

Finds relationship to predict one variable from another.

a) Regression Lines

Line of regression of y on x: Predicts y from x

$$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$$

Line of regression of x on y: Predicts x from y

$$x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})$$

b) Regression Coefficients

Regression coefficient of y on x: $b_{yx} = r\frac{\sigma_y}{\sigma_x}$

Regression coefficient of x on y: $b_{xy} = r\frac{\sigma_x}{\sigma_y}$

Note: $b_{yx} \times b_{xy} = r^2$

c) Angle Between Regression Lines

If $\theta$ is angle between two regression lines:

$$\tan\theta = \frac{1 - r^2}{|r|} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$$

Special Cases:

If $r=0$: Lines are perpendicular

If $r=\pm1$: Lines coincide (angle = 0)
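The regression coefficients and the identity $b_{yx} \times b_{xy} = r^2$ can be checked numerically. The sketch below uses a small illustrative data set and population standard deviations; the function names are chosen here and are not part of the notes.

```python
import math

def regression_summary(xs, ys):
    """Return (mean_x, mean_y, r, b_yx, b_xy) using population SDs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    sigma_x, sigma_y = math.sqrt(sxx / n), math.sqrt(syy / n)
    b_yx = r * sigma_y / sigma_x     # slope of the line of y on x
    b_xy = r * sigma_x / sigma_y     # slope of the line of x on y
    return mean_x, mean_y, r, b_yx, b_xy

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
mean_x, mean_y, r, b_yx, b_xy = regression_summary(x, y)

def predict_y(x_value):
    """Line of regression of y on x: y - mean_y = b_yx (x - mean_x)."""
    return mean_y + b_yx * (x_value - mean_x)

print(round(b_yx, 3), round(b_xy, 3), round(b_yx * b_xy, 3), round(r ** 2, 3))
print(round(predict_y(6), 2))
```

Both lines pass through the point $(\bar{x}, \bar{y})$, which is why each is written in terms of deviations from the means.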


7. Probability Basics (for Statistics)

7.1 Basic Concepts

Probability: Measure of likelihood of an event (0 to 1)

Sample Space (S): Set of all possible outcomes

Event (E): Subset of sample space

7.2 Probability Formulas

Classical Probability: $P(E) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}}$

Addition Rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

For mutually exclusive events: $P(A \cup B) = P(A) + P(B)$

Complement Rule: $P(A') = 1 - P(A)$

Multiplication Rule: For independent events: $P(A \cap B) = P(A) \times P(B)$

Conditional Probability: $P(A|B) = \frac{P(A \cap B)}{P(B)}$
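These rules can be verified on a small sample space. The sketch below models one roll of a fair die with Python sets and exact fractions; the events $A$ (even number) and $B$ (number greater than 3) are examples chosen here for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}               # sample space: one roll of a fair die

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}                        # event: even number
B = {4, 5, 6}                        # event: number greater than 3

p_union = prob(A) + prob(B) - prob(A & B)   # addition rule
p_not_a = 1 - prob(A)                       # complement rule
p_a_given_b = prob(A & B) / prob(B)         # conditional probability

# A and B are independent only if P(A and B) equals P(A) * P(B)
independent = prob(A & B) == prob(A) * prob(B)

print(p_union, p_not_a, p_a_given_b, independent)
```

Here $P(A \cap B) = \frac{1}{3}$ while $P(A) \times P(B) = \frac{1}{4}$, so the multiplication rule for independent events does not apply to this pair.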


8. Random Variables and Probability Distributions

8.1 Random Variable

Variable whose values depend on outcomes of random experiment.

Discrete Random Variable: Countable values

Continuous Random Variable: Measurable values

8.2 Probability Distribution

For discrete random variable X with values $x_1, x_2, \ldots$ and probabilities $p_1, p_2, \ldots$:

Conditions: $0 \leq p_i \leq 1$ and $\sum p_i = 1$

8.3 Mean (Expected Value) of Discrete Random Variable

$$\mu = E(X) = \sum x_i p_i$$

8.4 Variance of Discrete Random Variable

$$\sigma^2 = E(X^2) - [E(X)]^2 = \sum x_i^2 p_i - \left(\sum x_i p_i\right)^2$$

8.5 Standard Deviation

$$\sigma = \sqrt{\text{Variance}}$$
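As a worked illustration of these formulas, the sketch below takes $X$ to be the number of heads in two tosses of a fair coin (an example chosen here, not taken from the notes) and computes its mean, variance, and standard deviation with exact fractions.

```python
from fractions import Fraction
import math

# X = number of heads in two fair coin tosses (illustrative distribution)
values = [0, 1, 2]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

# Conditions for a valid probability distribution
assert all(0 <= p <= 1 for p in probs) and sum(probs) == 1

mean = sum(x * p for x, p in zip(values, probs))            # E(X)
mean_sq = sum(x * x * p for x, p in zip(values, probs))     # E(X^2)
variance = mean_sq - mean ** 2                              # E(X^2) - [E(X)]^2
sd = math.sqrt(variance)

print(mean, variance, round(sd, 4))
```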

8.6 Binomial Distribution

For experiments with:

  • Fixed number of trials (n)

  • Two outcomes (success/failure)

  • Constant probability of success (p)

  • Independent trials

Probability Mass Function:

$P(X = r) = \binom{n}{r} p^r (1-p)^{n-r}$ for $r=0,1,2,\ldots,n$

where $\binom{n}{r} = \frac{n!}{r!(n-r)!}$

Mean: $\mu = np$

Variance: $\sigma^2 = np(1-p)$
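The probability mass function and the formulas $\mu = np$ and $\sigma^2 = np(1-p)$ can be checked numerically. The sketch below uses `math.comb` from the Python standard library; the values $n = 5$ and $p = 0.5$ are arbitrary choices for illustration.

```python
from math import comb

def binomial_pmf(n, r, p):
    """P(X = r) for a binomial distribution with n trials and success probability p."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p = 5, 0.5                                   # illustrative parameters
pmf = [binomial_pmf(n, r, p) for r in range(n + 1)]

total = sum(pmf)                                # should be 1
mean = sum(r * pr for r, pr in enumerate(pmf))  # should equal n * p
variance = sum(r * r * pr for r, pr in enumerate(pmf)) - mean ** 2

print(pmf[2], total, mean, variance)
```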

8.7 Normal Distribution

Most important continuous distribution (bell curve).

Properties:

Bell-shaped, symmetric about mean

Mean = median = mode

Total area under curve = 1

Standard Normal Distribution: Mean = 0, SD = 1

Z-score: $z = \frac{x - \mu}{\sigma}$
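A z-score converts any normal value to the standard normal scale, where areas under the curve can be looked up or computed. The sketch below computes the area to the left of $z$ using the standard identity $\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}\left(z/\sqrt{2}\right)\right]$, which is not stated in the notes above; the values $\mu = 50$, $\sigma = 5$, $x = 60$ are hypothetical.

```python
import math

def z_score(x, mu, sigma):
    """Standardise x: number of standard deviations from the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 50, 5            # hypothetical population mean and SD
z = z_score(60, mu, sigma)   # x = 60 lies 2 SDs above the mean
p_below = standard_normal_cdf(z)

print(z, round(p_below, 4))
```

By symmetry, `standard_normal_cdf(0)` is 0.5, and the areas to the left of $z$ and $-z$ add up to 1.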


9. Solved Examples

Example 1: Find Mean, Median, Mode

Data: 12, 15, 18, 12, 20, 15, 12, 25, 18

Solution:

Mean: $\bar{x} = \frac{12+15+18+12+20+15+12+25+18}{9} = \frac{147}{9} \approx 16.33$

Median: Arrange: 12, 12, 12, 15, 15, 18, 18, 20, 25

$n=9$ (odd), Median = $\frac{9+1}{2}=5$-th term = 15

Mode: 12 (appears 3 times, most frequent)

Example 2: Grouped Data Calculations

Given:

| Class | Frequency |
|---|---|
| 0-10 | 5 |
| 10-20 | 8 |
| 20-30 | 12 |
| 30-40 | 7 |
| 40-50 | 3 |

Find mean, median, mode, standard deviation.

Solution:

First prepare table:

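| Class | f | x (mid) | fx | fx² | CF |
|---|---|---|---|---|---|
| 0-10 | 5 | 5 | 25 | 125 | 5 |
| 10-20 | 8 | 15 | 120 | 1800 | 13 |
| 20-30 | 12 | 25 | 300 | 7500 | 25 |
| 30-40 | 7 | 35 | 245 | 8575 | 32 |
| 40-50 | 3 | 45 | 135 | 6075 | 35 |
| Total | 35 | | 825 | 24075 | |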

Mean: $\bar{x} = \frac{825}{35} \approx 23.57$

Median: $N=35$, $\frac{N}{2}=17.5$

Median class: 20-30 (first class whose CF, 25, is at least 17.5)

$L=20$, $F=13$, $f=12$, $h=10$

Median = $20 + \left(\frac{17.5-13}{12}\right) \times 10 = 20 + \frac{45}{12} = 23.75$

Mode: Modal class: 20-30 (highest $f=12$)

$L=20$, $f_1=12$, $f_0=8$, $f_2=7$, $h=10$

Mode = $20 + \left(\frac{12-8}{24-8-7}\right) \times 10 = 20 + \frac{4}{9} \times 10 \approx 24.44$

Variance: $\sigma^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$

$= \frac{24075}{35} - \left(\frac{825}{35}\right)^2 \approx 687.857 - 555.612 \approx 132.24$

Standard Deviation: $\sigma = \sqrt{132.24} \approx 11.5$

Example 3: Correlation Calculation

Find correlation coefficient for:

| x | y |
|---|---|
| 1 | 2 |
| 2 | 4 |
| 3 | 5 |
| 4 | 4 |
| 5 | 6 |

Solution:

Prepare table:

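| x | y | xy | x² | y² |
|---|---|---|---|---|
| 1 | 2 | 2 | 1 | 4 |
| 2 | 4 | 8 | 4 | 16 |
| 3 | 5 | 15 | 9 | 25 |
| 4 | 4 | 16 | 16 | 16 |
| 5 | 6 | 30 | 25 | 36 |
| $\sum x = 15$ | $\sum y = 21$ | $\sum xy = 71$ | $\sum x^2 = 55$ | $\sum y^2 = 97$ |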

$n=5$, $\sum x=15$, $\sum y=21$, $\sum xy=71$, $\sum x^2=55$, $\sum y^2=97$

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$

$$= \frac{5\times71 - 15\times21}{\sqrt{[5\times55 - 15^2][5\times97 - 21^2]}}$$

$$= \frac{355 - 315}{\sqrt{[275-225][485-441]}} = \frac{40}{\sqrt{50 \times 44}} = \frac{40}{\sqrt{2200}} \approx \frac{40}{46.9} \approx 0.853$$

Strong positive correlation.


10. Important Formulas Summary

10.1 Measures of Central Tendency

Mean (ungrouped): $\bar{x} = \frac{\sum x_i}{n}$

Mean (grouped): $\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$

Median (grouped): $L + \left(\frac{\frac{N}{2} - F}{f}\right) \times h$

Mode (grouped): $L + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h$

Empirical Relation: Mode = 3×Median - 2×Mean

10.2 Measures of Dispersion

Range: Max - Min

Variance: $\sigma^2 = \frac{\sum f_i x_i^2}{\sum f_i} - \left(\frac{\sum f_i x_i}{\sum f_i}\right)^2$

Standard Deviation: $\sigma = \sqrt{\text{Variance}}$

Coefficient of Variation: $\text{CV} = \frac{\sigma}{\bar{x}} \times 100\%$

Quartiles: $Q_k = L + \left(\frac{\frac{kN}{4} - F}{f}\right) \times h$

IQR: $Q_3 - Q_1$

10.3 Correlation and Regression

Correlation Coefficient:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$

Regression Line (y on x): $y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$

Regression Coefficients: $b_{yx} = r\frac{\sigma_y}{\sigma_x}$, $b_{xy} = r\frac{\sigma_x}{\sigma_y}$

Relation: $b_{yx} \times b_{xy} = r^2$

10.4 Probability Distributions

Binomial: $P(X=r) = \binom{n}{r} p^r (1-p)^{n-r}$

Mean of Binomial: $np$

Variance of Binomial: $np(1-p)$


11. Exam Tips and Common Mistakes

11.1 Common Mistakes to Avoid

  1. Using wrong formula for grouped vs ungrouped data

  2. Confusing population vs sample formulas (divide by n vs n-1 for variance)

  3. Forgetting to arrange data before finding median

  4. Incorrect class boundaries for grouped data

  5. Misinterpreting correlation coefficient (correlation ≠ causation)

11.2 Problem-Solving Strategy

  1. Identify data type: Ungrouped or grouped? Discrete or continuous?

  2. Choose correct formulas based on what's asked

  3. Create tables for organized calculations (especially for grouped data)

  4. Show all steps clearly

  5. Include units in final answer

11.3 Quick Checks

  1. Mean, median, mode relationship: For symmetric unimodal data: Mean = Median = Mode

  2. Standard deviation: Always non-negative

  3. Correlation coefficient: Between -1 and 1

  4. Probability: Between 0 and 1

  5. Variance formulas: Population: divide by N, Sample: divide by n-1
