Econometrics: Model-Building Paper
王岩 0912010220, 刘冬一 0912010140, 王雪 0912010219, 庄晓静 0904030224, 王冰蕾 0920010236
Econometrics Term Project
Ⅰ.Abstract
Our group uses regression analysis to make quantitative estimates of how several variables relate to a student's decision to pursue a master's degree. The project proceeds as follows:
First, we develop our topic (described in II and III);
Second, we collect our data (described in detail in III and IV);
Third, we document our data sources (the questionnaire is given in the appendix, and the data are shown in part IV);
Fourth, we offer practical advice (in part V we run three tests to support it).
We hope this abstract serves as a useful guide to the whole project.
Ⅱ.Introduction
After discussion, our group wanted to know which factors affect an undergraduate's decision to pursue a master's degree, so we built a regression model. First, we review the literature and develop the theoretical model. Second, we specify the model: select the independent variables and the functional form. Third, we hypothesize the expected signs of the coefficients. Fourth, we collect, inspect, and clean the data. Fifth, we estimate and evaluate the equation. Sixth, we document the results. Last but not least, we run some tests to decide whether our hypotheses are correct.
Ⅲ.Study objective and data description
Our project studies which factors affect students' decisions to pursue a master's degree.
We define four independent variables (family income, college grades, employment pressure, and satisfaction with the major), plus gender as a dummy variable.
The general theoretical functional relationship is:
Qi = f(Ri, GRi, Pi, Si, GEi)
where
Qi = the number of students who intend to pursue a master's degree
Ri = the annual income of the family
GRi = the college grades
Pi = the pressure of employment
Si = the satisfaction with the major
GEi = a dummy variable equal to 1 if the ith student is male, 0 if female
For every variable except the dummy variable, we measure the degree of agreement on a five-point scale (1-5), corresponding to strongly disagree, somewhat disagree, neutral, somewhat agree, and strongly agree, respectively.
*The questionnaire is attached at the end of the paper.
Ⅳ.Data collection and data description.
We collected the data with a questionnaire survey of 27 students, so the model is estimated on a sample of 27 students (N = 27). The questionnaire scores we collected are:
Qi    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
Ri    3  3  2  3  4  4  5  4  5  5  3  5  2  1  2  2  2  3  4  2  1  2  3  3  5  2  5
GRi   4  2  5  3  5  3  4  2  3  5  4  4  3  4  4  3  5  4  2  3  4  3  1  3  3  3  2
Pi    2  3  3  4  2  3  5  2  4  5  3  4  5  5  1  4  5  4  2  5  4  4  2  4  5  4  5
Si    2  5  2  4  4  2  2  4  3  3  4  2  3  2  3  3  3  4  2  2  2  3  2  4  3  5  3
GEi   1  0  1  0  1  0  1  1  1  0  1  0  0  0  1  0  0  0  0  1  1  1  1  1  0  0  0
From these data we obtain the following estimated equation (standard errors in parentheses):
Qi = 19.0263 - 1.3097 Ri - 3.3687 GRi + 2.7989 Pi - 0.1022 Si + 1.0273 GEi
               (1.1287)    (1.4324)     (1.3521)    (1.5317)    (3.225)
           t = -1.1603     -2.3518       2.0669     -0.0665      0.3185
R̄² (adjusted R²) = 0.1465    N = 27
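The estimation step can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the software we actually used: it takes the Qi row exactly as listed in the table above as the dependent variable, and the helper names `solve` and `ols` are our own. It solves the normal equations X'Xb = X'y directly in plain Python.

```python
# Questionnaire scores for the 27 students, copied from the table above.
Q  = list(range(1, 28))
R  = [3,3,2,3,4,4,5,4,5,5,3,5,2,1,2,2,2,3,4,2,1,2,3,3,5,2,5]
GR = [4,2,5,3,5,3,4,2,3,5,4,4,3,4,4,3,5,4,2,3,4,3,1,3,3,3,2]
P  = [2,3,3,4,2,3,5,2,4,5,3,4,5,5,1,4,5,4,2,5,4,4,2,4,5,4,5]
S  = [2,5,2,4,4,2,2,4,3,3,4,2,3,2,3,3,3,4,2,2,2,3,2,4,3,5,3]
GE = [1,0,1,0,1,0,1,1,1,0,1,0,0,0,1,0,0,0,0,1,1,1,1,1,0,0,0]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(y, *regressors):
    """Return OLS coefficients (intercept first) from the normal equations."""
    X = [[1.0] + [float(v[i]) for v in regressors] for i in range(len(y))]
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)

# Coefficients in the order: intercept, R, GR, P, S, GE.
beta = ols(Q, R, GR, P, S, GE)

# Residuals e_i = Q_i - fitted value, needed later for the Durbin-Watson test.
residuals = [q - sum(b * x for b, x in zip(beta, [1, r, g, p, s, ge]))
             for q, r, g, p, s, ge in zip(Q, R, GR, P, S, GE)]
```

Printing `beta` gives the six estimates in the same order as the terms of the equation, so they can be compared with the reported coefficients one by one.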
When we dropped the variable Ri, R̄² decreased to 0.1221. When we dropped the variable Si, R̄² increased to 0.1859. When we dropped the variable GEi, R̄² decreased to 0.1093. So we can conclude that Si may be an irrelevant variable.
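The reason R̄² can rise when a variable is dropped is its degrees-of-freedom adjustment, which the plain R² does not have. A minimal sketch of the standard formula follows; the function name `adjusted_r2` and the input value 0.30 are illustrative assumptions, not figures from our regressions.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k slope coefficients
    (the intercept is not counted in k)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Hypothetical illustration: the same plain R-squared of 0.30
# with five regressors and with four regressors, N = 27.
with_five = adjusted_r2(0.30, n=27, k=5)
with_four = adjusted_r2(0.30, n=27, k=4)
```

If dropping a variable barely lowers the plain R², the gain in degrees of freedom outweighs the loss of fit and R̄² goes up, which is the pattern we use to flag a possibly irrelevant variable.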
We were also concerned about an omitted variable, so we defined another independent variable, Ai = the atmosphere around the student. After adding this omitted variable to modify the model, we surveyed the students and collected the data again. The new data are as follows:
Q     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
R     3  3  2  3  4  4  5  4  5  5  3  5  2  1  2  2  2  3  4  2  1  2  3  3  5  2  5
GR    4  2  5  3  5  3  4  2  3  5  4  4  3  4  4  3  5  4  2  3  4  3  1  3  3  3  2
P     2  3  3  4  2  3  5  2  4  5  3  4  5  5  1  4  5  4  2  5  4  4  2  4  5  4  5
A     3  4  5  5  3  5  5  4  5  5  5  4  5  3  5  4  4  2  3  2  2  1  3  4  3  5  4
GE    1  0  1  0  1  0  1  1  1  0  1  0  0  0  1  0  0  0  0  1  1  1  1  1  0  0  0
From the new data we obtain the following estimated equation (standard errors in parentheses):
Qi = 24.9947 - 0.7758 Ri - 2.8903 GRi + 2.5063 Pi - 2.1009 Ai + 0.0287 GEi
               (1.1025)    (1.3669)     (1.2696)    (1.2322)    (3.0291)
           t = -0.7036     -2.1146       1.9741     -1.705       0.0095
R̄² (adjusted R²) = 0.2501    N = 27
R̄² increases, which suggests that Ai was indeed an omitted variable.
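Each t-value reported under the equation is simply the estimated coefficient divided by its standard error. The short sketch below rechecks that arithmetic for the second equation; the numbers are copied from the equation above in the order of its five slope terms, and the variable names are our own.

```python
# Slope coefficients, standard errors and t-values as reported above,
# listed in the order in which the terms appear in the equation.
coefficients    = [-0.7758, -2.8903, 2.5063, -2.1009, 0.0287]
standard_errors = [1.1025, 1.3669, 1.2696, 1.2322, 3.0291]
reported_t      = [-0.7036, -2.1146, 1.9741, -1.705, 0.0095]

# t = coefficient / standard error
computed_t = [b / se for b, se in zip(coefficients, standard_errors)]

# Largest absolute difference between recomputed and reported t-values.
max_gap = max(abs(c - r) for c, r in zip(computed_t, reported_t))
```

A small `max_gap` means the reported t-values are consistent with the reported coefficients and standard errors up to rounding.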
Ⅴ.Multiple regression analysis
The first step is a test for serial correlation:
As the results show, ρ = 0.0470 > 0, which points to positive first-order serial correlation.
We now use the Durbin-Watson d test to check for it:
d = Σ(e_t - e_{t-1})² / Σ e_t² = 47/992 ≈ 0.05
Because N = 28 and k = 5, we use Appendix B to find the upper critical d value, dU, and the lower critical d value, dL: dL = 1.03 and dU = 1.85.
We set up the test hypotheses:
H0: ρ ≤ 0
HA: ρ > 0
Because d = 0.05 is smaller than dL, we reject the null hypothesis. We now turn to correcting the serial correlation problem. We believe ours is a case of impure serial correlation:
We think that there is an omitted variable in our equation, and an omitted variable can easily make the Durbin-Watson d test signal serial correlation. So we added another variable, GE, to the equation to address this problem.
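The Durbin-Watson calculation and the one-sided decision rule used in this step can be sketched in Python as follows. This is a minimal sketch, not our original computation: the function names and the short residual series are hypothetical, while 47/992, dL = 1.03 and dU = 1.85 are the figures reported above.

```python
def durbin_watson(residuals):
    """d = sum of squared first differences of the residuals / sum of squared residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

def dw_positive_test(d, d_lower, d_upper):
    """One-sided test of H0: rho <= 0 against HA: rho > 0."""
    if d < d_lower:
        return "reject H0"          # evidence of positive serial correlation
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"           # d lies between dL and dU

# Hypothetical residuals that alternate in sign, to show the mechanics only.
d_example = durbin_watson([1, -1, 1, -1])

# The d statistic and critical values reported in this section.
d_reported = 47 / 992
decision = dw_positive_test(d_reported, d_lower=1.03, d_upper=1.85)
```

A d value near 0 indicates strong positive serial correlation, a value near 2 indicates none, and a value near 4 indicates negative serial correlation, which is why a d as small as the one we found falls well below dL.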
The second step:
For heteroskedasticity, we see no need to run a formal test, for the following reasons:
(1) there are no obvious specification errors;
(2) the subject of the research is unlikely to be afflicted with heteroskedasticity, because data sets with large variations in the size of the dependent variable are the ones particularly susceptible to it;
(3) the graph of the residuals doesn't show any evidence of heteroskedasticity.
The third step:
We tested for multicollinearity. There are five major consequences of multicollinearity:
1. Estimates will remain unbiased.
2. The variances and standard errors of the estimates will increase:
a. It is harder to distinguish the effect of one variable from the effect of another, so we are much more likely to make large errors in estimating the βs than without multicollinearity.
b. As a result, the estimated coefficients, although still unbiased, now come from distributions with much larger variances and, therefore, larger standard errors.
3. The computed t-scores will fall.
4. Estimates will become very sensitive to changes in specification:
a. The addition or deletion of an explanatory variable or of a few observations will often cause major changes in the values of the estimated βs when significant multicollinearity exists.
b. For example, if you drop a variable, even one that appears to be statistically insignificant, the coefficients of the remaining variables in the equation sometimes will change dramatically.
c. This is again because with multicollinearity, it is much harder to distinguish the effect of one variable from the effect of another.
5. The overall fit of the equation and the estimation of the coefficients of nonmulticollinear variables will be largely unaffected.
The Detection of Multicollinearity:
First, we recognize that some multicollinearity exists in every equation: all variables are correlated to some degree.
So it’s really a question of how much multicollinearity exists in an equation, rather than whether any multicollinearity exists.
There are basically two characteristics that help detect the degree of multicollinearity for a given application:
1. High simple correlation coefficients
2. High Variance Inflation Factors (VIFs)
For our model we found:
a. r = 0.628
b. VIF = 1/(1 - R²) = 1.6513
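The VIF arithmetic can be checked with a few lines of Python. The sketch below assumes the two-regressor case, in which the R² of the auxiliary regression equals the square of the simple correlation coefficient; the reported VIF is consistent with that reading, but the function name `vif` and this interpretation are our assumptions.

```python
def vif(r_squared):
    """Variance inflation factor given the R-squared of the auxiliary regression."""
    return 1.0 / (1.0 - r_squared)

r = 0.628                  # simple correlation coefficient reported above
vif_from_r = vif(r ** 2)   # auxiliary R-squared taken as r squared (assumption)
```

The result is roughly 1.65, far below the value of 5 that is commonly used as a rule of thumb for severe multicollinearity.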
There are essentially three remedies for multicollinearity:
1. Do nothing:
a. Multicollinearity will not necessarily reduce the t-scores enough to make them statistically insignificant and/or change the estimated coefficients enough to make them differ from expectations.
b. The deletion of a multicollinear variable that belongs in an equation will cause specification bias.
2. Drop a redundant variable:
a. This is a viable strategy when two variables measure essentially the same thing.
b. Always use theory as the basis for this decision.
3. Increase the sample size:
a. This is frequently impossible but is a useful alternative to consider if feasible.
b. The idea is that a larger sample will normally reduce the variance of the estimated coefficients, diminishing the impact of the multicollinearity.
In this model we choose the “Do nothing” remedy.
Ⅵ. Conclusion
We chose our topic, defined the independent and dependent variables, collected the data, and ran the multiple regression model. We found differences between the estimated model and our expectations. We used a series of tests to check for irrelevant variables, omitted variables, multicollinearity, serial correlation, and heteroskedasticity. We then arrived at a function that reflects the relationship between the independent variables and the dependent variable.
Ⅶ. Appendix
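Questionnaire on the Factors Influencing the Decision to Pursue a Master's Degree
Gender: Male / Female
Major:
University:
Note: Each item below is rated on five levels: strongly disagree, somewhat disagree, neutral, somewhat agree, and strongly agree, scored from 1 to 5 in that order. Please score each item according to your own view and mark your choice with a check.
1. A student's family financial situation has a strong influence on the decision to pursue a master's degree:
Strongly disagree 1 - 2 - 3 - 4 - 5 Strongly agree
2. Students with excellent college grades are more inclined to pursue a master's degree:
Strongly disagree 1 - 2 - 3 - 4 - 5 Strongly agree
3. Students choose to pursue a master's degree because they feel strong employment pressure:
Strongly disagree 1 - 2 - 3 - 4 - 5 Strongly agree
4. Students who are dissatisfied with their undergraduate major are more inclined to pursue a master's degree:
Strongly disagree 1 - 2 - 3 - 4 - 5 Strongly agree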
Ⅷ. Bibliography
① A. H. Studenmund, Using Econometrics, 6th ed.
② http://www.pearsoned.com