LECTURE 1 (3)
STA 811
STATISTICAL INFERENCE
By
Dr. Argwings Otieno
Department of Mathematical Statistics
19th January 2021
INTRODUCTION
◼ A variable $X$ follows a particular distribution
◼ The distribution is represented by $f(x, \theta)$
◼ where $\theta \in \Theta$ is unknown
◼ $\theta$ is called a parameter
◼ $\Theta$ is called the parameter space
◼ Problem: estimation of $\theta$ or $\tau(\theta)$
◼ where $\tau(\theta)$ is some function of $\theta$
◼ Estimation is based on a sample $(X_1, X_2, X_3, \dots, X_n)$
◼ Two approaches to estimation:
1. Point estimation: gives a single value obtained from a specific estimator $\delta(X_1, X_2, \dots, X_n)$.
2. Interval estimation: constructs an interval defined by two statistics $\delta_1(X_1, X_2, \dots, X_n)$ and $\delta_2(X_1, X_2, \dots, X_n)$, where $\delta_1 < \delta_2$.
■ Then $\theta$ or $\tau(\theta)$ will fall within the interval with some specified probability.
■ Some properties of good estimators:
1. Unbiasedness: the estimator $\hat\theta$ on average equals the true parameter, $E(\hat\theta) = \theta$, $\forall\, \theta$.
2. Sufficiency: the estimator contains all the information in the sample about the parameter.
3. Consistency: as the sample size becomes large, the estimator tends to the true parameter.
Simple consistency: for every $\epsilon > 0$,
$$\lim_{n \to \infty} P\big(\theta - \epsilon < \hat\theta_n < \theta + \epsilon\big) = 1, \quad \text{for } \theta \in \Theta.$$
◼ Mean squared error (MSE) consistency:
$$\lim_{n \to \infty} E\big[(\hat\theta_n - \theta)^2\big] = 0, \quad \text{for } \theta \in \Theta.$$
◼ Note: MSE consistency implies simple consistency.
◼ $\hat\theta_n$ is MSE consistent if:
1. $\hat\theta_n$ is (asymptotically) unbiased, and
2. $\mathrm{Var}(\hat\theta_n) \to 0$ as $n \to \infty$.
◼ Other properties:
◼ Minimum variance unbiased estimator (MVUE)
◼ Best asymptotically normal (BAN)
◼ $\hat\theta_n$ is BAN if
$$\sqrt{n}\,(\hat\theta_n - \theta) \xrightarrow{d} N\big(0, \sigma^2(\theta)\big),$$
with $\sigma^2(\theta)$ the smallest attainable asymptotic variance.
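The two notions of consistency above can be checked by simulation. The sketch below is a minimal illustration, assuming $N(\theta, 1)$ data with $\theta = 2$ and $\epsilon = 0.1$ (values chosen for illustration, not from the notes): as $n$ grows, the coverage probability tends to 1 and the MSE tends to 0 for the sample mean.

```python
import numpy as np

# Minimal sketch: simple and MSE consistency of the sample mean
# for N(theta, 1) data; theta, eps and the sample sizes are
# illustrative choices, not from the lecture notes.
rng = np.random.default_rng(42)
theta, eps, reps = 2.0, 0.1, 2000

for n in (10, 100, 1000):
    means = rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1)
    coverage = np.mean(np.abs(means - theta) < eps)  # estimates P(|theta_hat_n - theta| < eps)
    mse = np.mean((means - theta) ** 2)              # estimates E[(theta_hat_n - theta)^2]
    print(f"n={n:5d}  P(|error| < {eps}) ~ {coverage:.3f}  MSE ~ {mse:.5f}")
```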
◼ The class of unbiased estimators is infinite.
■ Example
■ Let $(X_1, X_2, \dots, X_n)$ be a sample from $f(x, \theta)$.
■ If $E(X_i) = \theta$, $\forall\, \theta$, then each $X_i$ is an unbiased estimator.
■ Also $\bar X = \frac{1}{n}\sum_{i=1}^n X_i$ is an unbiased estimator:
$$E(\bar X) = E\left(\frac{1}{n}\sum_{i=1}^n X_i\right) = \frac{1}{n}\sum_{i=1}^n E(X_i) = \theta.$$
■ In general, $\sum_{i=1}^n k_i X_i$ is an unbiased estimator if
$$E\left(\sum_{i=1}^n k_i X_i\right) = \sum_{i=1}^n k_i E(X_i) = \theta \sum_{i=1}^n k_i = \theta, \quad \forall\, \theta.$$
■ This happens if $\sum_{i=1}^n k_i = 1$.
■ Hence unbiased estimators are infinite.
■ We need a further restriction on the class of unbiased estimators (see the sketch below).
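A small simulation makes the point concrete. The sketch below (assuming a $N(\theta, 1)$ model with $\theta = 2$; the weight vectors are illustrative) shows that several weightings with $\sum k_i = 1$ are all unbiased, but equal weights $k_i = 1/n$ give the smallest variance, which motivates the MVUE idea that follows.

```python
import numpy as np

# Sketch: any weights summing to 1 give an unbiased estimator of theta,
# but equal weights (the sample mean) have the smallest variance.
rng = np.random.default_rng(0)
theta, n, reps = 2.0, 5, 20000
weight_sets = {
    "equal k_i = 1/n": np.full(n, 1 / n),
    "front-loaded":    np.array([0.6, 0.1, 0.1, 0.1, 0.1]),
    "two-point":       np.array([0.5, 0.5, 0.0, 0.0, 0.0]),
}
x = rng.normal(theta, 1.0, size=(reps, n))
for name, k in weight_sets.items():
    est = x @ k   # one weighted estimate per simulated sample
    print(f"{name:16s} mean = {est.mean():.4f}  var = {est.var():.4f}")
```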
◼ Minimum Variance Unbiased Estimator (MVUE)
■ An estimator $\delta(X_1, X_2, \dots, X_n)$ is MVUE if:
1. $E[\delta(X_1, X_2, \dots, X_n)] = \theta$, $\forall\, \theta$;
2. $\mathrm{Var}[\delta(X_1, X_2, \dots, X_n)]$ is minimum among all unbiased estimators.
◼ Likelihood Function
■ If $(X_1, X_2, \dots, X_n)$ is a random sample from $f(x, \theta)$, the likelihood function is $L(\theta) = \prod_{i=1}^n f(x_i, \theta)$.
◼ Sufficient Statistic
■ A statistic $T = T(X_1, X_2, \dots, X_n)$ is sufficient for $\theta$ if $L(\theta) = h(T, \theta)\, g(x_1, \dots, x_n)$,
■ where $g(x_1, \dots, x_n)$ is a function of $x_1, \dots, x_n$ only (free of $\theta$),
■ or a constant.
LECTURE 1(9)
INTRODUCTION TO
REGRESSION ANALYSIS
COURSE LECTURER: DR. JULIUS K. KOECH, PHD
DEPARTMENT: MATHEMATICS & COMPUTER SCIENCE
■ In correlation, the two variables are treated as equals.
■ In regression analysis, one variable is considered the independent (predictor, covariate) variable, denoted by $X$, and the other the dependent (outcome, response) variable, usually denoted by $Y$.
Regression: Overview
Basic Idea: Use data to identify relationships between variables, and use these relationships to
make predictions.
What is “Linear”?
■ Remember this: $y = mx + b$
(Figure: functional form of a model with a line of best fit.)
■ The goal of regression analysis is to express the response/outcome variable as a function of the predictor variables.
■ Hence, non-representative or improperly collected data result in a poor fit and wrong conclusions.
Effective Use of Regression Data
Thus, for effective use of regression analysis one must:
■ Investigate the data collection process and ensure a good data management plan.
■ Identify any limitations in the data collected.
■ Restrict conclusions accordingly, i.e., discuss only important findings.
Use of Regression Analysis
■ Making predictions
■ Model specification
■ Parameter estimation
The Linear Regression Model
■ A linear regression model expresses the conditional expectation of $y$ given $X$, $E(y \mid X = x)$, as a linear function of $X$.
■ Each sample observation is assumed to be generated by an underlying process described by the linear model.
■ We want to find the best line (linear function) $y = f(x)$ to explain the data.
(Figure: data with the line of best fit.)
Linear Regression Model
Relationship Between Variables is a linear function
$$y = \beta_0 + \beta_1 x + \epsilon$$
where:
■ $y$ is the dependent (response) variable
■ $\beta_0$ is the population $y$-intercept
■ $\beta_1$ is the population slope
■ $x$ is the independent (explanatory) variable
■ $\epsilon$ is the random error
Important questions in Regression Analysis
● What is the association between $y$ and $x$?
● How can changes in $y$ be explained by changes in $x$?
● What are the functional relationships between $y$ and $x$?
A functional relationship is symbolically written as $y = f(x)$.
(Figure: piecewise plot of $f(x) = x \log(x)$, with a tangent approaching infinity at zero.)
$y = \beta_1 x$, where $\beta_1$ is the SLOPE of the line.
Example: Linear relationship $y = \beta_0 + \beta_1 x$, where $\beta_0$ is the intercept and $\beta_1$ is the slope.
◼ Concerns:
◼ The proposed functional relationship will not fit exactly; i.e., something is either wrong with the data (errors in measurement) or the model is inadequate (errors in specification).
◼ The relationship is not truly known until we assign values to the parameters of the model.
The possibility of errors in the proposed relationship is acknowledged in the functional symbolism as follows: $y = f(x) + \epsilon$, where $\epsilon$ is a random variable representing the result of both errors in model specification and measurement. The variance of $\epsilon$ is the background variability with respect to which we will assess the significance of the factors (explanatory variables).
Linear Regression model with one or more covariates
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, 2, \dots, n$$
$$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \epsilon_i$$
Regression Estimation assuming Binary and Categorical Variables
1. $y_i = \alpha + \beta_1\,\mathrm{age} + \beta_2\,\mathrm{gender}$, with gender coded male = 1, female = 0.
2. $y_i = \alpha + \beta_1\,\mathrm{age} + \beta_2\,\mathrm{education}$, with education coded primary = 1, secondary = 2, university = 3.
3. $y_i = \alpha + \beta_1\,\mathrm{age} + \beta_2\,\mathrm{gender} + \beta_3\,\mathrm{education}$, with gender coded male = 1, female = 0 and education coded on four levels: none = 1, primary = 2, secondary = 3, university = 4.
Interpreting Regression Coefficients
● In equation (1) above, assume $\alpha = 2.35$, $\beta_1 = 5.75$ and $\beta_2 = 8.20$.
Interpreting linear coefficients:
● Fix the values of the parameters in equation (1) and discuss the effect of these covariates on the outcome:
$$\hat y_i = 2.35 + 5.75\,\mathrm{age} + 8.20\,\mathrm{gender} \quad (\text{male} = 1, \text{female} = 0)$$
Effects of covariates on the response variable (assume $y_i$ is salary in US dollars):
● What is the effect of age on the response variable, salary? (See the sketch below.)
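As a numeric sketch of the interpretation (using the assumed coefficient values above; the ages and genders below are made up for illustration), each additional year of age raises the predicted salary by 5.75, and being male adds a constant 8.20:

```python
# Sketch: predictions from y = 2.35 + 5.75*age + 8.20*gender (1 = male, 0 = female),
# using the assumed coefficient values from the slide; ages are illustrative.
def predict_salary(age: float, gender: int) -> float:
    return 2.35 + 5.75 * age + 8.20 * gender

for age, gender in [(25, 0), (25, 1), (26, 0)]:
    print(f"age={age}, gender={gender}: salary = {predict_salary(age, gender):.2f}")
# Holding gender fixed, one extra year of age raises the prediction by 5.75;
# holding age fixed, male vs female predictions differ by the constant 8.20.
```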
Linear Regression
The predicted value of $y$ is given by:
$$\hat y = \hat\beta_0 + \sum_{j=1}^{p} x_j \hat\beta_j$$
The vector of coefficients $\hat\beta$ is the regression model.
If $x_0 = 1$, the formula becomes a matrix product: $\hat y = x^{\mathsf T}\hat\beta$.
The error term:
Another way to write it is $\epsilon = y - f(x)$, or, emphasizing that $f(x)$ depends on unknown parameters,
$$y = f(x \mid \beta_0, \beta_1) + \epsilon.$$
What if we don’t know the functional form of the relationship?
Parameter estimates are given by the following equations:
$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x$$
Significance testing in linear regression
$$H_0 : \beta_1 = \beta_2 \quad \text{vs} \quad H_1 : \beta_1 \neq \beta_2$$
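A minimal sketch of the estimation formulas above on simulated data (the true values $\beta_0 = 1$, $\beta_1 = 2$ and the noise level are assumed for illustration); the t statistic shown tests the more common hypothesis $H_0 : \beta_1 = 0$:

```python
import numpy as np

# Sketch: least-squares slope and intercept from the formulas above,
# on simulated data with assumed true values beta0 = 1, beta1 = 2.
rng = np.random.default_rng(1)
n = 50
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.5, n)

sxx = np.sum((x - x.mean()) ** 2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = sxy / sxx                  # slope: S_xy / S_xx
b0 = y.mean() - b1 * x.mean()   # intercept: ybar - b1 * xbar

resid = y - (b0 + b1 * x)
se_b1 = np.sqrt(resid @ resid / (n - 2) / sxx)  # standard error of the slope
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, t = {b1 / se_b1:.2f}")  # t tests H0: beta1 = 0
```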
Steps in Regression Analysis
■ Examine the scatter plot of the data.
■ Does the relationship look linear?
■ Are there points in locations they shouldn’t be?
■ Do we need a transformation?
■ Assuming a linear function looks appropriate, estimate the regression parameters. How do we do this? [Use the method of least squares.]
■ Test whether there really is a statistically significant linear relationship. Just because we assumed a linear function, it does not follow that the data support this assumption. How do we test this? [F-test for variances.]
■ If there is a significant linear relationship, estimate the response, $\hat y$, for the given values of $x$, and compute the residuals.
■ Examine the residuals for systematic inadequacies in the linear model as fit to the data.
■ Is there evidence that a more complicated relationship (say, a polynomial) should be considered; are there problems with the regression assumptions? (Residual analysis.)
■ Are there specific data points which do not seem to follow the proposed relationship?
Linear Model Specification
■ Identify the dependent and the independent variables of interest.
■ The dependent variable is typically an outcome, such as wages earned or attendance at a post-secondary institution.
■ The independent variables are factors known to affect the outcome variable of interest.
■ Specifying a regression model involves selecting a dependent variable and the related independent variables.
■ The type of dependent variable determines the type of regression, which is either a linear or
logistic regression model.
THANK YOU
LECTURE 1(10)
STA 814
MULTIVARIATE ANALYSIS
By Dr. Argwings Otieno
2020/2021
MOTIVATION
◼ Univariate: a single variable is considered.
● Examples:
● Pupils’ scores in a subject
● Height of individuals
● Body Mass Index (BMI)
● Weight of babies at birth
● Blood pressure
BIVARIATE DATA
■ Bivariate means two variables.
■ Examples:
■ (Age, Weight) for babies under 5 years
■ (Age, Height) for babies under 5 years
■ (Wife, Husband) ages
■ Notation: variables $(X, Y)$; values $(x, y)$
■ Notation: joint pdf or pmf $f_{X,Y}(x, y)$
■ Parameters: $E(X) = \mu_X$, $E(Y) = \mu_Y$, $\mathrm{Var}(X) = \sigma_X^2$, $\mathrm{Var}(Y) = \sigma_Y^2$
■ Covariance: $\mathrm{Cov}(X, Y) = \sigma_{XY} = E\big[(X - \mu_X)(Y - \mu_Y)\big]$
■ Correlation between $X$ and $Y$:
$$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}, \qquad -1 \le \rho_{XY} \le 1.$$
For the vector $\mathbf X = (X, Y)'$:
$$E(\mathbf X) = \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \qquad \mathrm{Cov}(\mathbf X) = \begin{pmatrix} \mathrm{Var}(X) & \mathrm{Cov}(X, Y) \\ \mathrm{Cov}(Y, X) & \mathrm{Var}(Y) \end{pmatrix} = \begin{pmatrix} \sigma_X^2 & \sigma_{XY} \\ \sigma_{XY} & \sigma_Y^2 \end{pmatrix} = \begin{pmatrix} \sigma_X^2 & \rho_{XY}\sigma_X\sigma_Y \\ \rho_{XY}\sigma_X\sigma_Y & \sigma_Y^2 \end{pmatrix}.$$
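The sketch below computes sample versions of these quantities; the (age, weight) pairs are made up for illustration:

```python
import numpy as np

# Sketch: sample covariance matrix and correlation for bivariate data;
# the ages/weights below are made-up illustrations.
age    = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
weight = np.array([9.5, 12.1, 14.0, 16.2, 18.1])

cov = np.cov(age, weight)             # 2x2 matrix [[var(X), cov], [cov, var(Y)]]
rho = np.corrcoef(age, weight)[0, 1]  # correlation, always in [-1, 1]
print(cov)
print(f"rho = {rho:.4f}")
```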
Basic Matrix Results
■ A matrix $A_{m \times n}$ is an $m \times n$ array of elements in rows and columns.
■ Write $A_{m \times n} = (a_{ij})$, $i = 1, 2, \dots, m$, $j = 1, 2, \dots, n$.
■ The transpose of $A$ is denoted $A'$.
■ Column vector $x_{n \times 1}$; row vector $x' = (x_1, x_2, \dots, x_n)$; and
$$x' x = \sum_{i=1}^n x_i^2.$$
■ Product: $A_{n \times m} B_{m \times r} = C_{n \times r}$ with $c_{ik} = \sum_{j=1}^m a_{ij} b_{jk}$, $i = 1, \dots, n$, $k = 1, \dots, r$.
■ In general $AB \neq BA$.
■ If $A$, $B$, $C$ are matrices with rows and columns compatible, then:
$$(AB)C = A(BC), \qquad A(B + C) = AB + AC, \qquad (A + B)C = AC + BC.$$
■ Also: $(A')' = A$, $(A + B)' = A' + B'$, $(AB)' = B'A'$.
■ A matrix $A_{n \times n}$ is non-singular if $|A| \neq 0$.
■ The inverse $A^{-1}$ of $A$ is such that $A A^{-1} = A^{-1} A = I$.
■ The trace of a square matrix $A_{n \times n}$ is defined as $\mathrm{tr}(A) = \sum_{i=1}^n a_{ii}$.
■ Properties: $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$ and $\mathrm{tr}(AB) = \mathrm{tr}(BA)$.
■ If $A$ is diagonal, then $|A| = \prod_{i=1}^n a_{ii}$.
■ A square matrix $T$ is upper triangular if $t_{ij} = 0$ for $i > j$, and lower triangular if $t_{ij} = 0$ for $i < j$.
■ If $A$ is symmetric (and positive definite), then $A = TT'$.
■ For a matrix $A_{m \times n}$, the rank $\mathcal R(A)$ is the number of linearly independent rows (columns); $A$ is rank deficient if $\mathcal R(A) < \min(m, n)$.
■ The characteristic equation for $A_{n \times n}$ is $|A - \lambda I| = 0$; it is a polynomial of degree $n$ in $\lambda$.
■ Its solutions $\lambda_1, \lambda_2, \dots, \lambda_n$ are the EIGENVALUES, or characteristic roots.
■ For a given characteristic root $\lambda_0$, a solution $x$ of the homogeneous equation $(A - \lambda_0 I)x = 0$ is called an eigenvector.
■ A matrix $A_{n \times n}$ is ORTHOGONAL if $A^{-1} = A'$.
■ For a symmetric matrix $A_{n \times n}$, the quadratic form is
$$x' A x = \sum_{i=1}^n a_{ii} x_i^2 + 2\sum_{i < j} a_{ij} x_i x_j.$$
■ A symmetric matrix $A_{n \times n}$ is:
■ positive definite if $x' A x > 0$, $\forall\, x \neq 0$;
■ negative definite if $x' A x < 0$, $\forall\, x \neq 0$;
■ positive semi-definite if $x' A x \ge 0$;
■ negative semi-definite if $x' A x \le 0$.
(See the sketch below.)
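Several of the facts above can be checked numerically. The sketch below uses an arbitrary $2 \times 2$ symmetric positive definite matrix:

```python
import numpy as np

# Sketch: checking matrix facts on a small symmetric positive
# definite matrix (the entries are arbitrary illustrations).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

print(np.trace(A))                # tr(A) = 7
print(np.linalg.det(A))           # |A| = 11, nonzero => non-singular
print(A @ np.linalg.inv(A))       # A A^{-1} = I
vals, vecs = np.linalg.eig(A)     # eigenvalues solve |A - lambda I| = 0
print(vals)                       # all positive => A is positive definite
L = np.linalg.cholesky(A)         # A = T T' for symmetric positive definite A
print(np.allclose(L @ L.T, A))    # True
```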
LECTURE 2
LEAST SQUARES METHOD OF
ESTIMATION
By Dr. Julius Koech
Department of Mathematics and Computer Science
Overview of The Regression Model
■ A regression model estimates the nature of the relationship between the dependent/outcome and independent/predictor variables.
■ It looks at the effect of change on the dependent variable as a result of changes in the measured covariates:
■ the strength of the relationship;
■ the statistical significance of the relationship.
The Bivariate and Multivariate Models
(Education) X —————> Y (Income)

(Education) X₁ —————> Y
(Sex) X₂ —————> Y
(Years of Experience) X₃ —————> Y   (Y = Income)
(Age) X₄ —————> Y
Association is not causation!!
Price of Rice <----------------------------------------------------------> Quantity of Rice Produced
Regression Line
■ The regression model is $y = \beta_0 + \beta_1 x + \epsilon$.
■ Data about $x$ and $y$ are obtained from a sample.
■ From the sample values of $x$ and $y$, estimates $b_0$ of $\beta_0$ and $b_1$ of $\beta_1$ are obtained using the least squares or another method.
■ The resulting estimate of the model is $\hat y = b_0 + b_1 x$.
■ The symbol $\hat y$ is termed “y hat” and refers to the predicted values of the dependent variable $y$ that are associated with the values of $x$.
Uses of Regression
■ The amount of change in a dependent variable that results from changes in the independent variable(s); can be used to estimate elasticities, returns on investment in human capital, etc.
■ Attempts to determine the causes of phenomena.
■ Prediction and forecasting of sales, economic growth, etc.
■ Informing policy through use of improved theoretical models.
Challenge with determining the line of best fit
How would you draw a line through the points? How do you determine which line “fits best”?
■ The line of “best fit” means that the differences between the actual $Y$ values and the predicted $Y$ values are a minimum; some variability remains when using linear estimation.
■ The method of least squares minimizes the sum of the squared differences (errors):
$$\mathrm{SSE} = \sum_{i=1}^n \big(Y_i - \hat Y_i\big)^2 = \sum_{i=1}^n \hat\epsilon_i^2$$
■ The general form of the linear model is:
$$Y_i = B_1 + B_2 X_{2i} + B_3 X_{3i} + \dots + B_k X_{ki} + u_i$$
■ In matrix form, $Y = XB + u$, where $\hat B = (X^{\mathsf T} X)^{-1} X^{\mathsf T} Y$ and $u$ is the vector of errors.
■ Here $Y$ is the outcome variable, $X$ is a matrix of predictors (sometimes referred to as the design matrix), and $u$ is the error term.
■ $B_1$ is the intercept.
■ $B_2$ to $B_k$ are the slope coefficients.
■ Collectively, they are the regression coefficients or regression parameters.
■ Each slope coefficient measures the (partial) rate of change in the MEAN VALUE of $Y$ for a unit change in the value of the covariate.
Linear Model in Matrix Form (Illustration with a sample data set)
ID Age (x1) Gender (x2) Dist_km (x3) Wt (y)
1 18 1 5 50
2 20 0 6 60
3 25 1 2 70
■ Gender: gender of respondent, 1 = male, 0 = female
■ Age: age in years
■ Dist_km: distance in kilometers
■ Weight: weight of respondent in kg; this is also the response variable of interest.
Linear Model Illustration
$$Y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 50 \\ 60 \\ 70 \end{pmatrix}, \qquad X = \begin{pmatrix} 1 & 18 & 1 & 5 \\ 1 & 20 & 0 & 6 \\ 1 & 25 & 1 & 2 \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}$$
Familiarize yourself with finding the inverse of an $n \times p$ matrix (data structure).
If $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, then $A^{-1} = \dfrac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$.
■ During your free time, read on this concept!
■ Matrix transpose.
■ Try assignment 1 (a window of two weeks will be given for submission).
■ We will later learn how to use R to solve for parameter values in a linear model.
The General Equation for a Linear Model
■ The general linear model estimate of the coefficients by the method of least squares is given by the following equation (see the sketch below):
$$\hat\theta = (X^{\mathsf T} X)^{-1} X^{\mathsf T} Y$$
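A sketch applying this formula to the illustration above. Note that with only three observations and four coefficients, $X^{\mathsf T}X$ is $4 \times 4$ but of rank at most 3, so its plain inverse does not exist; the sketch uses numpy's least-squares solver instead, which foreshadows the rank and estimability issues discussed in Lecture 4.

```python
import numpy as np

# Sketch: least-squares estimation with the illustration data above.
X = np.array([[1, 18, 1, 5],
              [1, 20, 0, 6],
              [1, 25, 1, 2]], dtype=float)
Y = np.array([50.0, 60.0, 70.0])

# With 3 observations and 4 coefficients, X'X (4x4) has rank at most 3,
# so (X'X)^{-1} does not exist; lstsq returns a minimum-norm solution.
beta, residuals, rank, _ = np.linalg.lstsq(X, Y, rcond=None)
print(f"rank(X) = {rank}")   # 3 < 4: the model is not of full rank
print(beta)                  # one of infinitely many solutions fitting Y exactly

# With more observations than coefficients and X of full column rank,
# the textbook formula applies directly:
# beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y
```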
Thank You All!!
LECTURE 2A
Sufficient Statistics
■ Factorization criterion
■ Let $X_1, X_2, \dots, X_n$ be a random sample from $f(x, \theta)$.
■ Then the statistic $T = T(X_1, X_2, \dots, X_n)$ is sufficient iff
$$f(x_1, x_2, \dots, x_n; \theta) = h\big(T(x_1, \dots, x_n), \theta\big)\, g(x_1, \dots, x_n) = h(T, \theta)\, g(x_1, \dots, x_n).$$
■ Remark: If $T = T(X_1, X_2, \dots, X_n)$ is sufficient for $\theta$, then it is also sufficient for $\tau(\theta)$, where $\tau(\cdot)$ is a one-to-one mapping.
■ Jointly sufficient statistics
■ Let $X_1, X_2, \dots, X_n$ be a sample from
■ pdf $f(x; \theta_1, \theta_2, \dots, \theta_r)$,
■ where $\theta_i$, $i = 1, 2, \dots, r$, are unknown parameters.
■ Then
$$T_1 = T_1(X_1, \dots, X_n),\; T_2 = T_2(X_1, \dots, X_n),\; \dots,\; T_r = T_r(X_1, \dots, X_n)$$
are JOINTLY SUFFICIENT for $\theta_1, \theta_2, \dots, \theta_r$ iff
$$f(x_1, \dots, x_n; \theta_1, \dots, \theta_r) = h\big(T_1, T_2, \dots, T_r; \theta_1, \dots, \theta_r\big)\, g(x_1, \dots, x_n).$$
■ Example 1
■ Let $X_1, X_2, \dots, X_n$ be a random sample from Bernoulli($p$).
■ Show that (1) $T = \sum_{i=1}^n X_i$ and (2) $\bar X = \frac{1}{n}\sum_{i=1}^n X_i$ are sufficient for $p$.
■ (1) Solution:
■ Likelihood function:
$$L(p; x_1, \dots, x_n) = \prod_{i=1}^n f(x_i, p) = \prod_{i=1}^n p^{x_i}(1 - p)^{1 - x_i} = p^{\sum_{i=1}^n x_i}(1 - p)^{\,n - \sum_{i=1}^n x_i} = h(T, p)\, g(x_1, \dots, x_n),$$
with $g(x_1, \dots, x_n) = 1$.
■ Hence $T = \sum x_i$ is SUFFICIENT for $p$.
■ (2) Solution:
■ Write $\sum_{i=1}^n x_i = n\bar x$. Then:
$$L(p; x_1, \dots, x_n) = p^{\,n\bar x}(1 - p)^{\,n - n\bar x} = h(\bar x, p)\, g(x_1, \dots, x_n),$$
so $\bar X$ is also sufficient for $p$.
■ Example: Let $(X_1, X_2, \dots, X_n) \sim N(\mu, \sigma^2)$. Then:
$$f(x_1, \dots, x_n; \mu, \sigma^2) = \prod_{i=1}^n f(x_i; \mu, \sigma^2) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{\!n}\exp\!\left(-\frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}\right)$$
■ Show that $\bar x$ and $\sum_{i=1}^n (x_i - \bar x)^2$ are JOINTLY SUFFICIENT for $(\mu, \sigma^2)$.
■ If
$$f(x_1, \dots, x_n; \theta_1, \dots, \theta_r) = h\big(T_1, \dots, T_r; \theta_1, \dots, \theta_r\big)\, g(x_1, \dots, x_n) = h_1(T_1, \theta_1)\, h_2(T_2, \theta_2) \cdots h_r(T_r, \theta_r)\, g(x_1, \dots, x_n),$$
■ then $T_i$ is SUFFICIENT for $\theta_i$, $i = 1, 2, \dots, r$.
Complete Statistics
■ A statistic $T = T(X_1, X_2, \dots, X_n)$ is said to be COMPLETE if
$$E_\theta[\psi(T)] = 0 \;\;\forall\, \theta \implies \psi(T) = 0 \;\;\forall\, T,$$
■ except possibly on a set whose probability measure is 0 for all $\theta$.
Completeness
■ Example:
$$X \sim \text{Bernoulli}(p), \qquad f(x; p) = p^x (1 - p)^{1 - x}, \quad x = 0, 1$$
Likelihood:
$$L(x_1, \dots, x_n; p) = \prod_{i=1}^n p^{x_i}(1 - p)^{1 - x_i} = p^{\,n\bar x}(1 - p)^{\,n - n\bar x}$$
■ Note $T = \sum_{i=1}^n x_i$ is sufficient for $p$
$$\implies \bar X = T/n \text{ is also sufficient for } p.$$
■ To show that $T = \sum_{i=1}^n X_i$ is complete, note $T \sim \text{Binomial}(n, p)$:
$$E[\psi(T)] = \sum_{t=0}^n \psi(t)\, f(t, p) = \sum_{t=0}^n \psi(t)\binom{n}{t} p^t (1 - p)^{\,n - t} = \sum_{t=0}^n a(t)\, p^t (1 - p)^{\,n - t}, \quad \text{where } a(t) = \binom{n}{t}\psi(t).$$
■ If this equals $0$ for all $p$, then $a(t) = 0$ for $t = 0, 1, 2, \dots, n$
$$\implies \binom{n}{t}\psi(t) = 0 \;\;\forall\, t \implies \psi(t) = 0 \;\;\forall\, t, \quad \because \binom{n}{t} \neq 0.$$
■ Hence $T$ is COMPLETE.
Lehmann-Scheffé Theorem
■ Let $X$ have pdf $f(x; \theta)$ and $T = T(X_1, X_2, \dots, X_n)$ be a SUFFICIENT STATISTIC for $\theta$. Suppose $T$ is also complete. Then every estimable function $g(\theta)$ possesses an unbiased estimator with UNIFORMLY MINIMUM VARIANCE (UMVUE).
■ That is, if $T$ is sufficient and complete for $\theta$ and $E[\varphi(T)] = g(\theta)$,
■ then $\varphi(T)$ is UMVUE.
■ Example: $X \sim \text{Bernoulli}(p)$:
$T = \sum_{i=1}^n X_i$ is sufficient for $p$; also $T$ is complete.
■ $E(T/n) = p$, $\forall\, p$; that is, $T/n$ is unbiased.
■ It follows that $\varphi(T) = \bar X$ is
■ the UMVUE of $p$.
Maximum Likelihood
Let $X$ have pdf $f(x, \theta)$ and $X_1, X_2, \dots, X_n$ be a random sample.
Define the likelihood function:
$$L(x_1, \dots, x_n; \theta) = \prod_{i=1}^n f(x_i, \theta)$$
Choose $\hat\theta$ such that $L(x; \theta)$ is maximum for the fixed, observed $(x_1, \dots, x_n)$.
Also, for more than one parameter:
$$L(x_1, \dots, x_n; \theta_1, \theta_2, \dots, \theta_k) = \prod_{i=1}^n f(x_i; \theta_1, \theta_2, \dots, \theta_k)$$
Choose $(\hat\theta_1, \hat\theta_2, \dots, \hat\theta_k)$ such that $L(x; \theta)$ is maximum.
Maximizing the likelihood is the same as maximizing the log-likelihood.
■ Log-likelihood: $\ell(\theta) = \log L(x, \theta)$.
■ Example: $(X_1, X_2, \dots, X_n)$ is a sample from $N(\mu, \sigma^2)$.
■ Likelihood:
$$L(\mu, \sigma^2) = \prod_{i=1}^n f(x_i; \mu, \sigma^2) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{\!n}\exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2\right)$$
■ Log-likelihood:
$$\ell(\mu, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2$$
$$\frac{\partial \ell}{\partial \mu} = 0 \implies \sum_{i=1}^n (x_i - \mu) = 0 \implies \hat\mu = \bar x$$
$$\frac{\partial \ell}{\partial \sigma^2} = 0 \implies -\frac{n}{2\sigma^2} + \frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^4} = 0 \implies \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \hat\mu)^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2$$
■ Therefore the maximum likelihood estimates are
$$\hat\mu = \bar x, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2.$$
However, $\hat\mu = \bar x$ is UNBIASED, while $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2$ is BIASED.
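A simulation sketch of this bias (true values $\mu = 0$, $\sigma^2 = 4$ are assumed for illustration): the MLE divides by $n$ and averages below the true variance by the factor $(n-1)/n$, while the $n - 1$ version is unbiased.

```python
import numpy as np

# Sketch: the MLE of sigma^2 (dividing by n) is biased low; the n-1
# version is unbiased. mu = 0, sigma^2 = 4 are assumed for illustration.
rng = np.random.default_rng(7)
n, reps, sigma2 = 10, 100000, 4.0
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))

mle = np.mean((x - x.mean(axis=1, keepdims=True)) ** 2, axis=1)  # divide by n
unbiased = mle * n / (n - 1)                                     # divide by n-1
print(f"E[mle]      ~ {mle.mean():.3f}   (true 4.0; biased low by (n-1)/n)")
print(f"E[unbiased] ~ {unbiased.mean():.3f}")
```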
Conclusion
■ MLEs are not necessarily unbiased.
■ Exercise: Find the MLE of $\theta$ if $f(x, \theta) = \frac{1}{\theta}$, for $0 < x < \theta$.
■ Show that $\hat\theta = \max(X_1, X_2, \dots, X_n)$ is the MLE and that it is biased, while
$$\tilde\theta = \frac{n+1}{n}\max(X_1, X_2, \dots, X_n)$$
is unbiased (see the sketch below).
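A simulation sketch of the exercise (with $\theta = 5$ and $n = 10$ chosen for illustration): the MLE $\max(X_i)$ systematically underestimates $\theta$, and the $(n+1)/n$ rescaling removes the bias.

```python
import numpy as np

# Sketch: for Uniform(0, theta), the MLE max(X) underestimates theta
# (E[max] = n/(n+1) * theta); the ((n+1)/n)-scaled version is unbiased.
rng = np.random.default_rng(3)
theta, n, reps = 5.0, 10, 100000
x = rng.uniform(0, theta, size=(reps, n))

mle = x.max(axis=1)
corrected = (n + 1) / n * mle
print(f"E[max(X)]           ~ {mle.mean():.3f}  (n/(n+1)*theta = {n/(n+1)*theta:.3f})")
print(f"E[(n+1)/n * max(X)] ~ {corrected.mean():.3f}  (true theta = {theta})")
```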
Properties of MLE
■ 1. They are simple consistent and mean squared error consistent.
■ 2. They are functions of sufficient statistics.
■ 3. They have the property of invariance.
■ 4. They are asymptotically efficient and Best Asymptotically Normal (BAN) estimates.
Invariance Property
■ If $X$ has pdf $f(x, \theta)$ and $(X_1, X_2, \dots, X_n)$ is a sample,
■ and $\hat\theta$ is the MLE of $\theta$,
■ then if $\psi(\theta)$ is a single-valued function of $\theta$,
■ $\psi(\hat\theta)$ is the MLE of $\psi(\theta)$.
■ Example: $(X_1, X_2, \dots, X_n)$ is Poisson($\theta$). Find the MLE of $\Pr(X = 0)$.
■ Solution: $f(x, \theta) = \frac{e^{-\theta}\theta^x}{x!}$, $x = 0, 1, 2, 3, \dots$
■ The MLE is $\hat\theta = \bar x$.
■ But $\Pr(X = 0) = e^{-\theta}$, which is a single-valued function of $\theta$.
■ Hence, by the invariance property, $\mathrm{mle}(\Pr(X = 0)) = e^{-\bar x}$ (see the sketch below).
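A short sketch of the invariance property in action (assuming $\theta = 2$; the sample size is illustrative): the plug-in estimate $e^{-\bar x}$ agrees closely with the empirical frequency of zeros.

```python
import numpy as np

# Sketch: invariance of the MLE for Poisson data; theta = 2 is assumed.
# By invariance, the MLE of P(X = 0) = exp(-theta) is exp(-xbar).
rng = np.random.default_rng(11)
theta, n = 2.0, 500
x = rng.poisson(theta, n)

theta_hat = x.mean()             # MLE of theta
p0_mle = np.exp(-theta_hat)      # MLE of P(X = 0) via invariance
p0_empirical = np.mean(x == 0)   # direct empirical estimate
print(f"exp(-xbar) = {p0_mle:.4f}, empirical = {p0_empirical:.4f}, true = {np.exp(-theta):.4f}")
```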
■ Show that MLEs are functions of sufficient statistics:
$$L(x, \theta) = \prod_{i=1}^n f(x_i, \theta) = h\big(T(x), \theta\big)\, k(x_1, \dots, x_n), \quad T \text{ sufficient.}$$
■ Log-likelihood:
$$\ell(\theta) = \ln h\big(T(x), \theta\big) + \ln k(x_1, \dots, x_n)$$
Solving:
$$\frac{\partial \ell}{\partial \theta} = 0 \implies \frac{\partial \ln h(T, \theta)}{\partial \theta} = 0$$
Since the estimating equation involves the data only through $T$, the solution is of the form
$$\hat\theta = \varphi(T).$$
Hence MLEs are necessarily functions of sufficient statistics.
■ Asymptotic normality
■ The MLEs $(\hat\theta_1, \hat\theta_2, \dots, \hat\theta_k)$ of the parameters of the density $f(x; \theta_1, \theta_2, \dots, \theta_k)$, from a sample of size $n$, are for large samples approximately distributed as multivariate normal with means $\theta_1, \theta_2, \dots, \theta_k$
■ and matrix $nR$ in the quadratic form, where $R = (r_{ij})$ and
$$r_{ij} = -E\!\left[\frac{\partial^2 \log f(x; \theta_1, \theta_2, \dots, \theta_k)}{\partial\theta_i\,\partial\theta_j}\right]$$
■ The variance-covariance matrix is $\frac{1}{n}R^{-1}$.
■ Note:
$$f(\hat\theta_1, \dots, \hat\theta_k) = \frac{1}{(2\pi)^{k/2}\,\big|(nR)^{-1}\big|^{1/2}}\exp\!\left(-\frac{1}{2}(\hat\theta - \theta)'(nR)(\hat\theta - \theta)\right)$$
LECTURE 4
INTRODUCTION TO
ESTIMABILITY
COURSE LECTURER: DR. JULIUS KOECH
REVIEW: Ordinary Least Squares Approach
■ Find $\beta$ that minimizes $\|\gamma - X\beta\|^2 = \epsilon'\epsilon$.
■ The ordinary least squares estimates are $\hat\beta = (X'X)^{-1}X'\gamma$.
■ Under assumptions, the ordinary least squares estimates are maximum likelihood:
$$\epsilon \sim N(0, \sigma^2 I) \implies \gamma \sim N(X\beta, \sigma^2 I) \;\text{ and }\; \hat\beta \sim N\big(\beta, \sigma^2 (X'X)^{-1}\big)$$
$$\hat\sigma^2 = \frac{\hat\epsilon'\hat\epsilon}{N - p},$$
where $p$ is the number of coefficients.
■ To test a hypothesis, we construct “test statistics”.
■ The null hypothesis $H_0$: typically what we want to disprove (no effect).
■ ⇒ The alternative hypothesis $H_A$ expresses the outcome of interest.
Contrasts
■ We are usually not interested in the whole $\beta$ vector.
■ A contrast can select a specific effect of interest:
■ ⇒ a contrast $c$ is a vector of length $p$;
■ ⇒ $c'\beta$ is a linear combination of the regression coefficients $\beta$.
Let $c = [\,1\;\; 0\;\; 0\;\; 0\;\; 0\; \dots]$:
$$c'\beta = 1\cdot\beta_1 + 0\cdot\beta_2 + 0\cdot\beta_3 + 0\cdot\beta_4 + 0\cdot\beta_5 + \dots = \beta_1$$
Let $c = [\,0\;\; {-1}\;\; 1\;\; 0\;\; 0\; \dots]$:
$$c'\beta = 0\cdot\beta_1 - 1\cdot\beta_2 + 1\cdot\beta_3 + 0\cdot\beta_4 + 0\cdot\beta_5 + \dots = \beta_3 - \beta_2$$
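A minimal sketch of contrasts in numpy (the coefficient values are assumed for illustration):

```python
import numpy as np

# Sketch: evaluating contrasts on an assumed coefficient vector.
beta = np.array([3.0, 1.5, 2.5, 0.7, -0.2])   # illustrative beta_1..beta_5

c1 = np.array([1, 0, 0, 0, 0])    # selects beta_1
c2 = np.array([0, -1, 1, 0, 0])   # difference beta_3 - beta_2
print(c1 @ beta)                  # 3.0
print(c2 @ beta)                  # 2.5 - 1.5 = 1.0

# Under the model assumptions, c'beta_hat ~ N(c'beta, sigma^2 * c'(X'X)^{-1} c),
# so a t statistic for H0: c'beta = 0 divides c @ beta_hat by its standard error.
```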
(Figure: sample from a matrix normal distribution.)
■ Under the assumptions: $c'\hat\beta \sim N\big(c'\beta,\; \sigma^2\, c'(X'X)^{-1} c\big)$.
■ Definition: A linear function of the parameters, $\lambda'\alpha$, is said to be an estimable function of $\alpha$ if it is identically equal to some linear function of the expected value of the vector of observations, $\gamma$.
That is, $\lambda'\alpha$ is estimable if:
$$\lambda'\alpha = a'E[\gamma] \;\text{ for some vector } a.$$
■ Consider the general linear model $\gamma_{n \times 1} = X_{n \times p}\,\beta_{p \times 1} + \epsilon_{n \times 1}$.
■ We say that $\beta$ is identifiable if knowing the mean $E(\gamma)$ gives us $\beta$.
■ Definition: The parameterization $\beta$ is identifiable if for any $\beta_1$ and $\beta_2$,
$f(\beta_1) = f(\beta_2)$ implies $\beta_1 = \beta_2$.
■ Estimability ≠ identifiability!!
■ Identifiability: attribute of the model.
■ Estimability: attribute of the data.
■ Identifiability addresses the question of whether (and with what degree of certainty) it is possible to uniquely estimate parameters for a given model and data set.
■ Within the framework of identifiability, one considers whether the parameters can be estimated uniquely, in the best-case scenario of noise-free, perfectly measured data.
■ While this is unrealistic, it is a prerequisite to successful estimation from real-world data.
■ The term identifiability is also referred to as estimability (McLean and McAuley, 2012).
■ The existence of sampling errors or variability may hinder the ability to uniquely estimate the parameters of interest in a linear model.
■ Therefore, every linear model can be viewed as estimable and identifiable.
■ However, this may depend on whether the design matrix is of FULL RANK or not (see the sketch below).
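A sketch of the full-rank point (the design below is an assumed one-way layout, for illustration): when the design matrix is not of full column rank, $\beta$ itself is not identifiable, since many coefficient vectors give the same mean, yet a contrast such as the group difference is still estimable.

```python
import numpy as np

# Sketch: a rank-deficient one-way layout (illustrative design). The two
# group dummies sum to the intercept column, so beta is not identifiable,
# but the group difference c'beta is still estimable.
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1]], dtype=float)   # columns: intercept, group A, group B
y = np.array([5.0, 6.0, 9.0, 10.0])

print(np.linalg.matrix_rank(X))          # 2 < 3: not full column rank
b1, *_ = np.linalg.lstsq(X, y, rcond=None)
b2 = b1 + np.array([1.0, -1.0, -1.0])    # another solution with the same fit
print(np.allclose(X @ b1, X @ b2))       # True: E(y) identical for both

c = np.array([0, 1, -1])                 # contrast: group A minus group B
print(c @ b1, c @ b2)                    # same value: c'beta is estimable
```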
Thank You
LECTURE 5
ESTIMATION SPACE AND
ERROR SPACE
COURSE LECTURER: DR. JULIUS KOECH
DEPARTMENT: MATHEMATICS AND COMPUTER SCIENCE
THE LINEAR MODEL AND THE ERROR
■ From the start of the lectures to this stage, we have seen that the only equation we ever really need is the linear model, given below:
$$\text{outcome}_i = \text{model}_i + \epsilon_i$$
The Normal Linear Model
■ The normal linear model may be written in matrix form as $\gamma = X\beta + \epsilon$, where $\epsilon \sim N(0, \sigma^2 I)$.
■ The fundamental idea: each outcome can be predicted from a model, plus some error associated with that prediction, $\epsilon_i$.
ASSUMPTIONS OF THE LINEAR MODEL IN ESTIMATION
■ As mentioned, the normal linear model incorporates strong assumptions about the data. Some of these assumptions include:
■ Linearity
■ Constant variance (homoscedasticity)
■ Normality
■ Independence
The Error Component Problems
■ Errors may be heterogeneous (unequal variance).
■ Errors may be correlated.
■ Errors may not be normally distributed.
■ The last defect is less serious than the first two because, even if the errors are not normal, the $\hat\beta$’s will tend to normality due to the power of the central limit theorem. With larger datasets, normality of the data is not much of a problem.
Known Methods that Minimize the Error of Prediction
■ Mixed-effects models
■ Random-effects models
■ Co-integration approach: a good model for minimizing errors
■ Bayesian estimation, though one needs to specify priors for the linear parameters and make an assumption about the given distribution
■ And many other models not specified here
Errors In The Linear Model
■ Familiarize yourself with these types of linear models, as the first two will be covered in the upcoming classes.
■ Read also on how to employ either SAS or Wolfram Mathematica statistical software in analyzing data using these models.
QUESTIONS?
Thank You!!