The data file WAGE.dta contains information on the monthly salaries of 741 working men in 1980. This dataset also includes information on education, experience, and tenure, as well as personal and family background characteristics such as marital status, parents' education, and number of siblings. In our lectures, we used data on married working women to estimate the return to education. Now, you want to estimate the return to education for working men using the data provided.

Start by estimating the following model, using OLS:

lwage = β_{0} + β_{1} educ + β_{2} exper + β_{3} tenure + β_{4} married + β_{5} black + β_{6} urban + β_{7} south + u

The estimated coefficient on education tells us that, holding all other factors constant, one additional year of education increases wages, on average, by Blank 1 percent (round to 1 decimal place). Because of omitted variable bias, we suspect that the OLS estimate of the coefficient on education is Blank 2 (overstating/understating) the true effect of education on wages.
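A minimal Stata sketch of this first regression, assuming WAGE.dta sits in the current working directory and that the variables carry exactly the names used in the model above. The `vce(robust)` option is an assumption, since this question does not say which standard errors to report; it affects only the standard errors, not the coefficient estimates.

```stata
* Load the data (assumes WAGE.dta is in the current working directory)
use WAGE.dta, clear

* OLS estimate of the log-wage equation
* vce(robust) is an assumption: it changes the standard errors only
regress lwage educ exper tenure married black urban south, vce(robust)

* In a log-level model, 100 * coefficient approximates the percent change in wage
display "Approximate return to education (%): " 100*_b[educ]
```

The last line uses the stored coefficient `_b[educ]`, so the percentage is computed from the regression just run rather than typed in by hand.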

The variable feduc gives the number of years of schooling of the father of each man in the sample. You use it as an instrument because you believe that feduc is positively correlated with education, controlling for all the other exogenous variables (exper, tenure, married, black, urban and south), and that father's education is uncorrelated with factors in the error term u, such as unobserved ability and motivation. Use 2SLS to estimate the return to education (assume that the errors are heteroskedastic). Surprisingly, the coefficient on education now tells us that, all else constant, an additional year of education increases earnings by ______ percent (round to 1 decimal place).
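The 2SLS estimation described here can be sketched in Stata as follows, assuming the data are already loaded. In `ivregress`, the exogenous controls are listed directly, and the endogenous regressor and its excluded instrument go inside the parentheses.

```stata
* 2SLS with feduc as the excluded instrument for educ
* (assumes WAGE.dta is already loaded)
* vce(robust) requests heteroskedasticity-robust standard errors
ivregress 2sls lwage exper tenure married black urban south (educ = feduc), vce(robust)

* Percent interpretation of the log-level coefficient
display "2SLS return to education (%): " 100*_b[educ]
```

Stata drops any observation with a missing value in a variable used by the command, so it is worth checking that the number of observations reported matches the one you expect.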

In question 2, you used feduc as an instrument for educ, and you obtained an estimate of the return to education. The purpose of this question is to convince yourself that using feduc as an instrument for educ is NOT the same as just plugging feduc in for educ and running an OLS regression. Therefore, run the OLS regression of lwage directly on feduc and all the other exogenous regressors (exper, tenure, married, black, urban and south), assuming heteroskedastic errors. The results of the estimation tell us that, controlling for all the other factors, one additional year of a man's father's education is associated with a(n) Blank 1 (increase/decrease) in his wage of about Blank 2 percent (rounded to one decimal place). The t-statistic on feduc is about Blank 3 (use two decimal places).
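A short sketch of this regression in Stata, assuming the data are already loaded. The t-statistic is computed from the stored coefficient and standard error, so it can be read to as many decimals as needed.

```stata
* OLS with feduc plugged in where educ used to be
* (assumes WAGE.dta is already loaded)
regress lwage feduc exper tenure married black urban south, vce(robust)

* Percent interpretation and t-statistic from stored results
display "Effect of one more year of father's education (%): " 100*_b[feduc]
display "t-statistic on feduc: " _b[feduc]/_se[feduc]
```

With a single instrument and the same estimation sample, the 2SLS coefficient on educ equals this coefficient on feduc divided by the coefficient on feduc in the regression of educ on feduc and the controls, which is one way to see why the two procedures give different numbers.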

The purpose of the next couple of questions is to compare the estimates and standard errors obtained by correctly using 2SLS (in Stata, that means using the command ivregress 2sls with the appropriate syntax) with those obtained using inappropriate procedures.

You used the correct procedure in question 2. Now, carry out 2SLS manually. That is, first regress (using OLS) educ on the instrument (feduc) and the other exogenous variables (exper, tenure, married, black, urban, south). Obtain the fitted values (in Stata, that means using the predict command) and call them educ_hat. Then, run the second-stage regression of lwage on educ_hat and the other control variables (exper, tenure, married, black, urban, south). Assume errors are heteroskedastic in both stages of the regression. Look at the estimate of the education coefficient and compare it with the one you obtained in question 2. Which of the following is correct?

The estimated coefficient on education is lower than the one obtained in question 2.

The estimated coefficient on education is exactly the same as the one obtained in question 2.

The estimated coefficient on education is higher than the one obtained in question 2.

The estimated coefficient on education is almost the same but not quite the same as the one obtained in question 2.
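The manual two-stage procedure described in this question can be sketched as follows, assuming the data are already loaded. The variable name educ_hat comes from the question; everything else uses standard `regress` and `predict` syntax.

```stata
* First stage: educ on the instrument and all exogenous controls
* (assumes WAGE.dta is already loaded)
regress educ feduc exper tenure married black urban south, vce(robust)

* Fitted values from the first stage
predict educ_hat, xb

* Second stage: educ is replaced by its first-stage fitted values
regress lwage educ_hat exper tenure married black urban south, vce(robust)
```

Placing this output next to the `ivregress 2sls` output makes the comparison of coefficients and standard errors straightforward.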

You used the correct procedure in question 2. In question 4, you manually carried out 2SLS. Look at the robust standard error you obtained for the education coefficient and compare it with the one you obtained in question 2. The robust standard error for the education coefficient obtained in question 4 is Blank 1 (lower than, identical to, higher than) the one obtained in question 2. The robust standard errors obtained from the second stage regression when manually carrying out 2SLS are generally Blank 2 (inappropriate, accurate).

In this question, use the following two-step procedure, which generally yields inconsistent estimates of the coefficients as well as inconsistent standard errors. In step one, regress educ on the instrument (feduc) only. Obtain the fitted values and call them educ_tilda. Note that this is an INCORRECT first-stage regression. Then, in step two, run the regression of lwage on educ_tilda and the control variables (exper, tenure, married, black, urban, south). How do the estimate and standard error of the education coefficient from this incorrect two-step procedure compare with the correct estimate and standard error you obtained in question 2?

The estimate of the education coefficient is lower than the one obtained in question 2, but the standard error is higher than the one obtained in question 2.

Both the estimate and the standard error of the education coefficient are lower than the ones obtained in question 2.

Both the estimate and the standard error of the education coefficient are higher than the ones obtained in question 2.

The estimate of the education coefficient is higher than the one obtained in question 2, but the standard error is lower than the one obtained in question 2.
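A sketch of the incorrect two-step procedure in Stata, assuming the data are already loaded. The question does not say which standard errors to use here, so the `vce(robust)` option in the second step is an assumption made for comparability with the earlier questions.

```stata
* INCORRECT first stage: the exogenous controls are left out
* (assumes WAGE.dta is already loaded)
regress educ feduc

* Fitted values from the incorrect first stage
predict educ_tilda, xb

* Second step: lwage on the fitted values and the controls
* vce(robust) is an assumption, chosen to match the other questions
regress lwage educ_tilda exper tenure married black urban south, vce(robust)
```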

Your dataset has other variables that could be used as instrumental variables for education. In particular, mother's education, the number of siblings, and birth order could be considered valid instruments for education.

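Use feduc, meduc and sibs as instruments for educ and estimate the regression

lwage = β_{0} + β_{1} educ + β_{2} exper + β_{3} tenure + β_{4} married + β_{5} black + β_{6} urban + β_{7} south + u

by 2SLS (again, assume heteroskedastic errors).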

One more year of education is now associated with an increase in wages of about Blank 1 percent (rounded to one decimal place), a somewhat Blank 2 (smaller/larger) effect than the one estimated in question 2. The standard error of the coefficient on education in this regression with 3 instruments is Blank 3 (higher/lower) than the one in question 2.
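The estimation with three instruments differs from the single-instrument case only in the list inside the parentheses. A minimal sketch, assuming the data are already loaded:

```stata
* 2SLS with three excluded instruments for educ
* (assumes WAGE.dta is already loaded)
ivregress 2sls lwage exper tenure married black urban south (educ = feduc meduc sibs), vce(robust)

* Coefficient (as a percent) and robust standard error on educ
display "2SLS return to education (%): " 100*_b[educ]
display "Robust standard error on educ: " _se[educ]
```

If meduc or sibs contain missing values, Stata drops those observations, so the estimation sample may be smaller than in the regression that uses feduc alone.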

In this case of a single endogenous variable, your textbook describes an easy way to check whether the instruments are weak. Carry out that test for this case, in which you have 3 instrumental variables (feduc, meduc, sibs) and the usual control variables (exper, tenure, married, black, urban, south). The value of the test statistic is Blank 1 (use one decimal place), which means that you Blank 2 (need, do not need) to worry about weak instruments.
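The assignment does not name the textbook test, so the sketch below assumes it is the first-stage F-statistic on the excluded instruments, which is commonly compared with a rule-of-thumb threshold of 10. The data are assumed to be loaded already.

```stata
* First stage with all three instruments and the exogenous controls
* (assumes WAGE.dta is already loaded)
regress educ feduc meduc sibs exper tenure married black urban south, vce(robust)

* Joint F-test that the coefficients on the excluded instruments are all zero
* ASSUMPTION: this first-stage F is the statistic the textbook has in mind
test feduc meduc sibs
display "First-stage F-statistic: " r(F)
```

After `ivregress 2sls`, the postestimation command `estat firststage` also reports first-stage statistics, which can serve as a cross-check.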

You want to check whether your instruments are exogenous. Since your coefficients are overidentified (3 instruments and 1 endogenous variable), it is possible to test the overidentifying restrictions. Carry out the test of overidentifying restrictions. Start by computing the residuals from the estimated 2SLS regression using all three instruments. Exogeneity implies that the instruments should be approximately uncorrelated with the computed residuals. Therefore, calculate the appropriate test statistic the way we presented it in lectures (not the way the textbook presents it) to carry out the test. You should have a test statistic that follows an F distribution (not a chi-squared distribution). The test statistic has the value of Blank 1 (rounded to two decimal places) and follows an F distribution with (2, ∞) degrees of freedom. This means you will Blank 2 (reject, fail to reject) the null hypothesis that the instruments are Blank 3 (exogenous, not exogenous). (Note: Use a 5% significance level for the test.)
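The lecture's exact formula is not reproduced in this assignment, so the following is only a sketch under a stated assumption: the residuals are regressed on all instruments and exogenous controls, the F-statistic on the m = 3 instruments is converted to J = mF, which is chi-squared with m - k = 2 degrees of freedom, and J/2 is then compared with F(2, ∞) critical values. The names uhat, F_inst and J are hypothetical, and the default (non-robust) F-statistic in the residual regression is also an assumption.

```stata
* 2SLS with all three instruments (assumes WAGE.dta is already loaded)
ivregress 2sls lwage exper tenure married black urban south (educ = feduc meduc sibs), vce(robust)

* 2SLS residuals, computed with actual educ rather than first-stage fitted values
predict uhat, residuals

* Regress the residuals on the instruments and the exogenous controls
regress uhat feduc meduc sibs exper tenure married black urban south

* Joint F-test on the three instruments
test feduc meduc sibs
scalar F_inst = r(F)

* ASSUMPTION: J = m*F with m = 3; J is chi-squared with m - k = 2 degrees of
* freedom, so J/2 can be compared with F(2, infinity) critical values
scalar J = 3*F_inst
display "J statistic: " J
display "J / (m - k): " J/2
display "5% critical value of F(2, infinity): " invchi2tail(2, 0.05)/2
```

If the lecture version differs (for example, in how the degrees of freedom are adjusted), only the last few lines would need to change.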

Upload your log file showing all the work you did to answer the questions above. Failure to submit a proper log file will decrease your homework grade by 75%.
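One way to produce such a log in Stata is to open it before running any analysis and close it at the end. The file name homework_iv.log is a hypothetical placeholder.

```stata
* Open a plain-text log before running any analysis commands
* (homework_iv.log is a hypothetical file name)
log using homework_iv.log, replace text

* ... commands for the questions above go here ...

* Close the log so the file is complete before uploading
log close
```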
