Econometrics, ICEF, 2023 Midterm 1


Question 1

Multiple-choice test

Which of the following statements is true?

  1. If the calculated value of the $F$ statistic is higher than the critical value, we reject the alternative hypothesis in favor of the null hypothesis.

  2. The $F$ statistic is always nonnegative, as $SSR_r$ is never smaller than $SSR_{ur}$.

  3. The number of degrees of freedom of a restricted model is always less than that of an unrestricted model.

  4. The $F$ statistic is more flexible than the $t$ statistic for testing a hypothesis with a single restriction.

  5. None of the above.

Question 2

Multiple-choice test

In a regression model, if the variance of the dependent variable $Y$, conditional on an explanatory variable $X$, is not constant, then:

  1. The $t$ statistics are invalid and confidence intervals are valid for small sample sizes.

  2. The $t$ statistics are valid and confidence intervals are invalid for small sample sizes.

  3. The $t$ statistics and confidence intervals are valid no matter how large the sample size is.

  4. The $t$ statistics and confidence intervals are both invalid no matter how large the sample size is.

  5. The OLS estimators are biased, and hence there is no need to discuss $t$ statistics and confidence intervals.

Question 3

Multiple-choice test

In econometrics, simultaneity bias arises when:

  1. Strictly exogenous explanatory variables determine the dependent variable through a step-by-step process.

  2. The disturbance term is correlated with the dependent variable.

  3. One or more of the explanatory variables is jointly determined with the dependent variable.

  4. Heteroscedasticity is present in the model.

  5. There is correlation between some explanatory variables.

Question 4

Multiple-choice test

For the model

$$Y_i=\beta_1+\beta_2X_i+u_i,$$

where $X_i$ are non-stochastic and the Model A assumptions are satisfied, the estimator

$$b=\frac{\sum_{i=2}^{n}(Y_i-Y_{i-1})}{\sum_{i=2}^{n}(X_i-X_{i-1})}$$

is, generally speaking:

  1. An unbiased and efficient estimator of $\beta_2$.

  2. An unbiased but inefficient estimator of $\beta_2$.

  3. A biased estimator of $\beta_2$.

  4. A non-linear estimator of $\beta_2$.

  5. Non-stochastic.
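Because the sums in $b$ telescope, the estimator reduces to $(Y_n-Y_1)/(X_n-X_1)$, a linear function of the first and last observations only. The Monte Carlo sketch below compares its sampling mean and variance with those of the OLS slope. The design ($\beta_1=2$, $\beta_2=0.5$, $\sigma=1$, $X_i=1,\ldots,20$, normal disturbances), the seed, and the helper names are assumptions made purely for illustration.

```python
import random
import statistics

# Assumed illustrative design (not from the question): beta1 = 2, beta2 = 0.5,
# sigma = 1, fixed regressor X = 1, 2, ..., 20, normal disturbances.
BETA1, BETA2, SIGMA = 2.0, 0.5, 1.0
X = [float(i) for i in range(1, 21)]
N_REPS = 5000


def difference_estimator(x, y):
    """b = sum(Y_i - Y_{i-1}) / sum(X_i - X_{i-1}) over i = 2..n (0-based 1..n-1)."""
    num = sum(y[i] - y[i - 1] for i in range(1, len(y)))
    den = sum(x[i] - x[i - 1] for i in range(1, len(x)))
    return num / den


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    xm, ym = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    sxx = sum((a - xm) ** 2 for a in x)
    return sxy / sxx


def simulate(seed=0):
    """Draw N_REPS samples and return mean/variance of both estimators."""
    rng = random.Random(seed)
    b_vals, ols_vals = [], []
    for _ in range(N_REPS):
        y = [BETA1 + BETA2 * xi + rng.gauss(0.0, SIGMA) for xi in X]
        b_vals.append(difference_estimator(X, y))
        ols_vals.append(ols_slope(X, y))
    return (statistics.fmean(b_vals), statistics.variance(b_vals),
            statistics.fmean(ols_vals), statistics.variance(ols_vals))


mean_b, var_b, mean_ols, var_ols = simulate()
print(f"difference estimator: mean = {mean_b:.4f}, variance = {var_b:.5f}")
print(f"OLS slope:            mean = {mean_ols:.4f}, variance = {var_ols:.5f}")
```

Comparing the two variances in the output shows how much precision is lost by discarding the interior observations.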

Question 5

Multiple-choice test

For a sample of 55 observations, models (1) and (2) were estimated:

$$Y=\beta_0+\beta_1X_1+\beta_2X_2+u, \tag{1}$$
$$Y=\beta_0+\beta_1(X_1-X_2)+u. \tag{2}$$

The coefficients of determination for these models are $R_1^2=0.9$ and $R_2^2=0.7$, respectively. The $F$ statistic for testing the hypothesis $\beta_1=\beta_2$ in (1) equals:

  1. 6.7.

  2. 8.2.

  3. 30.

  4. 60.

  5. You cannot test this hypothesis using (1) and (2).
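When the restricted and unrestricted models share the same dependent variable, the $F$ statistic for $q$ linear restrictions can be computed from the two $R^2$ values. A minimal Python sketch of that formula follows; the function name `f_from_r2` is hypothetical, and the closing comments record which restriction each candidate restricted specification of (1) would impose.

```python
def f_from_r2(r2_ur, r2_r, q, n, k):
    """F statistic for q linear restrictions, computed from R^2 values.

    Valid only when both models have the same dependent variable and the
    restricted model imposes exactly the hypothesis under test.
    k is the number of parameters (including the intercept) in the
    unrestricted model.
    """
    if r2_ur < r2_r:
        raise ValueError("unrestricted R^2 cannot be below restricted R^2")
    return ((r2_ur - r2_r) / q) / ((1.0 - r2_ur) / (n - k))


# Restriction implied by each possible restricted version of (1):
#   beta1 =  beta2  ->  Y = beta0 + beta1 * (X1 + X2) + u
#   beta1 = -beta2  ->  Y = beta0 + beta1 * (X1 - X2) + u   (the form of model (2))
```

The formula only makes sense once it is clear which restriction the restricted model actually encodes, so that check comes before any arithmetic.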

Question 6

Multiple-choice test

An expenditure function for cosmetics, depending on disposable personal income, has been estimated by OLS on a representative sample of people:

$$Y=\beta_0+\beta_1D_1+\beta_2X+\beta_3X(1-D_2)+u,$$

where $Y$ is expenditure on cosmetics, $X$ is disposable personal income, $D_1=1$ for females and $0$ for males, and $D_2=1$ for males and $0$ for females.

For this regression, the following is correct:

  1. The estimated intercepts are the same for the male and female subsamples, while the estimated slope coefficients, generally speaking, differ.

  2. The estimated slope coefficients are the same for the male and female subsamples, while the estimated intercepts, generally speaking, differ.

  3. Both the estimated intercepts and the estimated slope coefficients, generally speaking, differ between the male and female subsamples.

  4. Both the estimated intercepts and the estimated slope coefficients are the same for the male and female subsamples.

  5. The combination of intercept and slope dummies is incorrect, and the model cannot be estimated.

Question 7

Multiple-choice test

If you have estimated the parameters of the following model using OLS directly, with the Gauss-Markov conditions satisfied,

$$y=\alpha+\beta_1x_1+\beta_2x_2+\bigl(\beta_2(1+\beta_3)\bigr)x_3+u,$$

then:

  1. You can get an unbiased estimate of $\beta_3$.

  2. You cannot get an unbiased estimate of $\beta_3$, but can easily get a consistent estimate of it.

  3. You cannot get an unbiased, or biased but consistent, estimate of $\beta_3$.

  4. You cannot get any estimate of $\beta_3$.

  5. All the above statements are incorrect.

Question 8

Multiple-choice test

If OLS is used in a simple regression model in the case of heteroscedasticity, the population variance of the slope coefficient is

$$\operatorname{var}(b_2)=\frac{\sum_{i=1}^{n}x_i^2\sigma_i^2}{\left(\sum_{i=1}^{n}x_i^2\right)^2}. \tag{1}$$

The formula for the homoscedasticity case is

$$\operatorname{var}(b_2)=\frac{\sigma^2}{\sum_{i=1}^{n}x_i^2}. \tag{2}$$

Here $x_i=X_i-\overline X$ are deviations from the sample mean. Let $\sigma_i^2=\sigma^2k_i$, where $k_i$ are unknown weights and $\sum k_i=1$. Then:

  1. Expression (1) is always greater than (2).

  2. Expression (1) is always less than (2).

  3. Expression (1) is greater than or equal to (2).

  4. Expression (1) is less than or equal to (2).

  5. Expression (1) can be greater than, less than, or equal to (2), depending on the nature of the relationship between $\sigma_i$ and $x_i$.
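With $\sigma_i^2=\sigma^2k_i$, dividing (1) by (2) gives the ratio $\sum x_i^2k_i\big/\sum x_i^2$, which depends only on how the weights line up with the $x_i$. The sketch below evaluates both expressions for a few hypothetical deviation values and weight patterns; all numbers are assumptions chosen for illustration.

```python
def var_hetero(x_dev, sigma2, k):
    """Expression (1) with sigma_i^2 = sigma2 * k_i; x_dev are deviations from the mean."""
    sxx = sum(x * x for x in x_dev)
    return sum(x * x * sigma2 * ki for x, ki in zip(x_dev, k)) / sxx ** 2


def var_homo(x_dev, sigma2):
    """Expression (2)."""
    return sigma2 / sum(x * x for x in x_dev)


x_dev = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical deviations, summing to zero
sigma2 = 1.0
weight_patterns = {                    # hypothetical weights, each summing to 1
    "equal": [0.2] * 5,
    "rising with |x|": [0.3, 0.15, 0.1, 0.15, 0.3],
    "all on the largest x": [0.0, 0.0, 0.0, 0.0, 1.0],
}
for name, k in weight_patterns.items():
    ratio = var_hetero(x_dev, sigma2, k) / var_homo(x_dev, sigma2)
    print(f"{name:>22}: (1)/(2) = {ratio:.3f}")
```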

Question 9

Multiple-choice test

In the regression model

$$y=\alpha+\beta x+u,$$

where $u$ satisfies the Gauss-Markov conditions and is normally distributed, the explanatory variable $x$ is measured with random errors that are independent, normally distributed, homoscedastic, not autocorrelated, and have zero expected values. Suppose $\beta<0$ and the mean value of $x$ is negative. When the model is estimated by OLS, in large samples:

  1. The estimator of $\alpha$ will be biased upwards.

  2. The estimator of $\alpha$ will be biased downwards.

  3. The estimator of $\alpha$ will be unbiased.

  4. The estimator of $\alpha$ may be biased upwards or downwards.

  5. The OLS estimator of $\alpha$ does not exist.
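A large-sample simulation can make the direction of the intercept bias concrete. The sketch assumes $\alpha=1$, $\beta=-2$, a true regressor distributed $N(-3,1)$, and measurement error and disturbance both $N(0,1)$; these values, the seed, and the helper `ols` are illustrative choices, not part of the question. It compares the OLS estimates with the probability limits $\operatorname{plim} b=\lambda\beta$ and $\operatorname{plim} a=\alpha+(1-\lambda)\beta\mu_x$, where $\mu_x$ is the mean of the true regressor and $\lambda$ is the ratio of its variance to the variance of the observed regressor.

```python
import random
import statistics


def ols(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    xm, ym = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    sxx = sum((a - xm) ** 2 for a in x)
    slope = sxy / sxx
    return ym - slope * xm, slope


# Assumed illustrative values, not part of the question.
ALPHA, BETA = 1.0, -2.0
MU_X, VAR_X, VAR_ERR = -3.0, 1.0, 1.0
N = 100_000
rng = random.Random(42)

x_true = [rng.gauss(MU_X, VAR_X ** 0.5) for _ in range(N)]
y = [ALPHA + BETA * xt + rng.gauss(0.0, 1.0) for xt in x_true]   # u ~ N(0, 1)
x_obs = [xt + rng.gauss(0.0, VAR_ERR ** 0.5) for xt in x_true]   # mismeasured x

a_hat, b_hat = ols(x_obs, y)
lam = VAR_X / (VAR_X + VAR_ERR)              # attenuation factor
plim_b = lam * BETA
plim_a = ALPHA + (1 - lam) * BETA * MU_X     # large-sample limit of the intercept
print(f"b_hat = {b_hat:.3f} (plim {plim_b:.3f})")
print(f"a_hat = {a_hat:.3f} (plim {plim_a:.3f}, true alpha {ALPHA})")
```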

Question 10

Multiple-choice test

For a simultaneous equations model with 7 equations, 7 endogenous variables and 7 exogenous variables, the following statement is true:

  1. With that number of potential instruments, any equation is identified in the model.

  2. An equation in the model is identified if and only if only exogenous variables are available on its right-hand side.

  3. The number of potential instruments is insufficient to make all the equations identified.

  4. No equation can be overidentified in the model.

  5. None of the above.

Question 11

Multiple-choice test

The economic model is described by the following simultaneous equations:

$$y_1=\delta+\tau y_2+\pi x_2+u_2, \tag{1}$$
$$y_2=\alpha+\pi y_1+\gamma x_1+\phi x_2+u_1. \tag{2}$$

Here $y_1$ and $y_2$ are endogenous variables; $x_1$ and $x_2$ are stochastic exogenous variables; and $u_1$ and $u_2$ are disturbance terms satisfying the Gauss-Markov conditions. Indicate the correct statement:

  1. You may apply TSLS in (1), but not in (2).

  2. You may apply TSLS in (2), but not in (1).

  3. You may apply TSLS in both (1) and (2).

  4. You may not apply TSLS in either (1) or (2).

  5. TSLS is not needed since OLS provides consistent estimates in (1) and (2).
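The order condition compares, for each equation, the number of exogenous variables excluded from it with the number of endogenous variables on its right-hand side. A small sketch applying that count to system (1)-(2) follows. The function name is hypothetical, and since the order condition is only necessary, a full check would also need the rank condition.

```python
def order_condition(endog_rhs, excluded_exog):
    """Classify one equation by the order condition (necessary, not sufficient).

    endog_rhs: number of endogenous variables on the right-hand side.
    excluded_exog: number of the system's exogenous variables excluded from it.
    """
    if excluded_exog < endog_rhs:
        return "underidentified"
    if excluded_exog == endog_rhs:
        return "exactly identified"
    return "overidentified"


# Exogenous variables in the system: x1, x2.
# Equation (1): y2 on the right-hand side; x1 is excluded.
# Equation (2): y1 on the right-hand side; neither x1 nor x2 is excluded.
print("Equation (1):", order_condition(endog_rhs=1, excluded_exog=1))
print("Equation (2):", order_condition(endog_rhs=1, excluded_exog=0))
```

TSLS needs at least as many excluded exogenous variables as right-hand-side endogenous ones, which is exactly what this count checks.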

Question 12

Multiple-choice test

Consider a model of the monthly pension $P_i$ as a function of work experience $WE_i$ and average earnings $EARN_i$:

$$P_i=\beta_1+\beta_2WE_i+\beta_3EARN_i+u_i.$$

Pensions are bounded above by $P_U$ and below by $P_L$, but the sample contains no observations with $P=P_U$ or $P=P_L$. The student decided to estimate a Tobit model for this sample. Indicate the correct statement:

  1. The Tobit estimators of the model coefficients are biased and inconsistent.

  2. The Tobit estimators of the model coefficients are biased but consistent.

  3. The Tobit model estimates will be the same as the OLS estimates here.

  4. The Tobit model may not be estimated for this sample.

  5. None of the above.

Part 2. Free Response Questions — 1 hour 30 minutes.

Section A. Answer all questions from this section (original Questions 1-2).

Question 13

Written Question 1 — 25 marks

A student is investigating factors that affect schoolchildren's consumption of unhealthy food at fast-food restaurants, such as McDonald's. Let $Y_i$ be the average number of hamburgers consumed per month in 2021, and let $X_i$ be age. The student wants to understand whether this dependence differs between boys and girls. She introduces a dummy variable $D_i$ equal to 1 for boys and 0 for girls.

Using a sample of 17 boys and 13 girls, for a total of 30 observations, she first runs the simple regression

$$\widehat Y_i=-0.56+0.24X_i,\qquad R^2=0.17,$$

with standard errors

$$(0.53)\qquad(0.10). \tag{1}$$

Assuming that boys eat at fast-food restaurants more frequently, she defines a slope dummy variable $(XD)_i=X_iD_i$ and fits the regression

$$\widehat Y_i=-1.43+0.19X_i+0.52D_i+0.78(XD)_i,\qquad R^2=0.36,$$

with standard errors

$$(0.36)\qquad(0.07)\qquad(0.33)\qquad(0.42). \tag{2}$$

(a)

  • What is the meaning of the coefficients of regression (2)?
  • Is there any difference in the influence of $X_i$ on $Y_i$ between boys and girls? How should the significance of this difference be tested?
  • Can the answer to the previous question be obtained using the Chow test? What additional information is needed, and how can it be obtained?
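The arithmetic behind the tests in (a) can be sketched as follows. Because regression (2) contains a full set of intercept and slope dummies, its residual sum of squares equals the sum from separate regressions for boys and girls, so the Chow statistic can be computed from the two $R^2$ values as a joint $F$ test of $D_i$ and $(XD)_i$. The helper names are hypothetical, and critical values still have to be taken from the $t$ and $F$ tables.

```python
def t_ratio(coef, se):
    """t statistic for H0: coefficient = 0."""
    return coef / se


def f_from_r2(r2_ur, r2_r, q, n, k):
    """F statistic for q restrictions from R^2 values (same dependent variable)."""
    return ((r2_ur - r2_r) / q) / ((1.0 - r2_ur) / (n - k))


N_OBS = 30
t_slope_dummy = t_ratio(0.78, 0.42)      # coefficient of (XD)_i in (2)
t_intercept_dummy = t_ratio(0.52, 0.33)  # coefficient of D_i in (2)

# Chow test as a joint F test of D_i and (XD)_i: regression (1) is the
# restricted (pooled) model, regression (2) the unrestricted one with k = 4.
f_chow = f_from_r2(0.36, 0.17, q=2, n=N_OBS, k=4)

print(f"t(XD) = {t_slope_dummy:.2f}, t(D) = {t_intercept_dummy:.2f}")
print(f"Chow F(2, {N_OBS - 4}) = {f_chow:.2f}")
```

The $t$ ratio of $(XD)_i$ addresses the slope difference alone, whereas the Chow statistic tests the intercept and slope differences together, so the two answer slightly different questions.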

(b)

When the student showed her results to the supervisor, the supervisor advised her to evaluate a simplified regression of the form

$$Y_i=\beta_0+\beta_1D_i+u_i. \tag{3}$$

This regression does not take into account the effect of age $X_i$. The student did not have a computer with her to recalculate the coefficients. The supervisor noted that it is sufficient to know the average number of hamburgers consumed per month by girls, $\overline Y_0$, and by boys, $\overline Y_1$, because it can be shown that the OLS estimates of the coefficients of (3) are

$$\hat\beta_1=\overline Y_1-\overline Y_0\qquad\text{and}\qquad\hat\beta_0=\overline Y_0.$$
  • Show that these statements are true for regression (3).
  • Provide the intuition behind these statements.
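The claim can be checked numerically before proving it: with a single 0/1 regressor, OLS reproduces the two group means. The data below are made up purely for illustration, and `ols_simple` is a hypothetical helper.

```python
import statistics


def ols_simple(x, y):
    """Return (intercept, slope) of the OLS regression of y on a single regressor x."""
    xm, ym = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    sxx = sum((a - xm) ** 2 for a in x)
    slope = sxy / sxx
    return ym - slope * xm, slope


# Made-up data: hamburgers per month; D = 1 for boys, 0 for girls.
d = [0, 0, 0, 1, 1, 1, 1]
y = [1.0, 2.0, 3.0, 2.0, 4.0, 5.0, 5.0]

b0, b1 = ols_simple(d, y)
ybar_girls = statistics.fmean([yi for yi, di in zip(y, d) if di == 0])
ybar_boys = statistics.fmean([yi for yi, di in zip(y, d) if di == 1])

print(f"b0 = {b0:.4f}, girls' mean = {ybar_girls:.4f}")
print(f"b1 = {b1:.4f}, boys' mean - girls' mean = {ybar_boys - ybar_girls:.4f}")
```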

Question 14

Written Question 2 — 25 marks

A student tries to determine how expenditure on education $E_i$, in billions of dollars, relates to GDP $Y_i$, in billions of dollars, and population $P_i$, in millions, using 2020 data on 34 developed and developing countries with high, medium, and low aggregate income. Here and below, $e_i$ denotes the regression residual.

She estimates

$$E_i=-4.52+0.043Y_i+e_i,\qquad R^2=0.75,\qquad i=1,\ldots,34,$$

with standard errors

$$(3.40)\qquad(0.004). \tag{1}$$

(a)

  • Why may the student fear the presence of heteroscedasticity? Explain using your understanding of heteroscedasticity.
  • How could heteroscedasticity influence the regression results?
  • How can heteroscedasticity be detected using graphs? Specify the relevant graphs.
  • The student sorts the countries by $Y_i$ and runs two regressions with specification (1). For the 10 countries with the highest $Y_i$, she obtains $SSR_1=5795.4$; for the 14 countries with the lowest $Y_i$, she obtains $SSR_2=41.7$. Conduct an appropriate heteroscedasticity test using this information.
  • The supervisor advises the student to use the per-capita values $E_i/P_i$ and $Y_i/P_i$ instead of the absolute values $E_i$ and $Y_i$. The student again sorts the countries, now by $Y_i/P_i$. For the 10 countries with the highest $Y_i/P_i$, she obtains $SSR_1=0.19$; for the 14 countries with the lowest $Y_i/P_i$, she obtains $SSR_2=0.33$. Explain the idea behind this advice and assess its usefulness using an appropriate test.
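The Goldfeld-Quandt arithmetic for both pairs of subsamples can be sketched as follows. Dividing each $SSR$ by its own degrees of freedom ($n_j-2$ here, since each subsample regression has two parameters) is one common way to handle unequal subsample sizes; treat that choice, and the helper name `goldfeld_quandt`, as assumptions rather than the course's prescribed formula.

```python
def goldfeld_quandt(ssr_high, n_high, ssr_low, n_low, k=2):
    """Goldfeld-Quandt ratio with each SSR scaled by its own degrees of freedom.

    k is the number of estimated parameters per subsample regression.
    Returns (statistic, (numerator df, denominator df)).
    """
    df_high, df_low = n_high - k, n_low - k
    stat = (ssr_high / df_high) / (ssr_low / df_low)
    return stat, (df_high, df_low)


levels = goldfeld_quandt(5795.4, 10, 41.7, 14)
per_capita = goldfeld_quandt(0.19, 10, 0.33, 14)
print("levels:     F = %.2f with df %s" % levels)
print("per capita: F = %.2f with df %s" % per_capita)
```

Each statistic is compared with the upper critical value of $F(8,12)$. A per-capita value below 1 offers no evidence that the variance rises with income; testing the opposite direction would use its reciprocal against $F(12,8)$.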

(b)

The student next estimates the multiple regression

$$E_i=-1.57-0.0056Y_i+0.88P_i+e_i,\qquad R^2=0.98,$$

with standard errors

$$(0.94)\qquad(0.0027)\qquad(0.044). \tag{2}$$
  • Compare the coefficients of $Y_i$ in equations (1) and (2). How have the meaning, value, and significance of the coefficient changed? Which value seems more reasonable, and why?
  • To test equation (2) for heteroscedasticity, the student uses the Breusch-Pagan test and obtains $R^2=0.38$ for the auxiliary regression. Complete the test, describe the procedure, and state the result.

On the advice of a friend, the student estimates the model in logarithms:

$$\ln E_i=9.63-0.37\ln Y_i+1.37\ln P_i+e_i,\qquad R^2=0.95,$$

with standard errors

$$(0.34)\qquad(0.09)\qquad(0.09). \tag{3}$$

For equation (3), the auxiliary regression has $R^2=0.16$ for the Breusch-Pagan test and $R^2=0.36$ for the White test with cross terms.

  • Why might using logarithms help eliminate heteroscedasticity? Did it help according to the Breusch-Pagan and White tests? Complete both tests. Why do their results differ, and which test should be trusted more?
  • What would you advise the student to do to eliminate heteroscedasticity in equation (2)?
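The Breusch-Pagan and White statistics can be computed in their $nR^2$ form and compared with a $\chi^2$ distribution whose degrees of freedom equal the number of regressors in the auxiliary regression: 2 for Breusch-Pagan with two explanatory variables, and 5 for White with two levels, two squares, and one cross term. The sketch below assumes that form, $n=34$, and the standard 5% critical values 5.991 for $\chi^2(2)$ and 11.070 for $\chi^2(5)$.

```python
# 5% critical values of the chi-square distribution (standard table values).
CHI2_CRIT_5PCT = {2: 5.991, 5: 11.070}
N_COUNTRIES = 34


def lm_stat(n, r2_aux):
    """nR^2 statistic from an auxiliary regression of squared residuals."""
    return n * r2_aux


# (auxiliary R^2, number of auxiliary regressors = chi-square degrees of freedom)
tests = {
    "Breusch-Pagan, equation (2)": (0.38, 2),  # regressors: Y, P
    "Breusch-Pagan, equation (3)": (0.16, 2),  # regressors: ln Y, ln P
    "White with cross terms, (3)": (0.36, 5),  # ln Y, ln P, their squares, product
}

for name, (r2, df) in tests.items():
    stat = lm_stat(N_COUNTRIES, r2)
    crit = CHI2_CRIT_5PCT[df]
    verdict = "reject homoscedasticity" if stat > crit else "do not reject"
    print(f"{name}: nR^2 = {stat:.2f} vs chi2({df}) 5% = {crit:.3f} -> {verdict}")
```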

Section B. Answer one question from this section (original Question 3 or Question 4).

Question 15

Written Question 3 — 25 marks

A student in ICEF's econometrics course uses data on a sample of 100 students to study which factors determine the score $Y_i$, out of 100 points, on the winter econometrics exam. Since econometrics relies heavily on statistics, one possible factor is the student's knowledge of statistics $Z_i$:

$$Y_i=\beta_1+\beta_2Z_i+u_i. \tag{1}$$

Direct measurement of $Z_i$ is not possible. The available variable is $S_i$, the score, also out of 100 points, obtained in the second-year statistics exam. Because students were nervous during that exam, the student assumes a measurement error:

$$S_i=Z_i+w_i,$$

where $w_i$ is independent of $Z_i$ and $u_i$, with

$$E(w_i)=0,\qquad\operatorname{Var}(w_i)=\sigma_w^2.$$

Using OLS, she obtains

$$\widehat Y_i=-8.26+0.80S_i,\qquad R^2=0.42,$$

with standard errors

$$(6.00)\qquad(0.09). \tag{2}$$

(a)

  • What are the consequences of measurement error in the regressor when $\beta_2$ is estimated by OLS?
  • A friend points out that the statistics exam was graded very harshly, lowering students' grades, so it may be more appropriate to assume $E(w_i)=\mu_w<0$. What additional consequences does this assumption have for the OLS estimation of $\beta_2$?

(b)

  • What are the consequences of measurement error for the OLS estimate of the intercept $\beta_1$ under the assumption $E(w_i)=0$?
  • Illustrate graphically the result obtained in the previous question for the estimation of $\beta_1$.
[Figure: Effect of attenuation bias on the fitted intercept (axes $S$ and $Y$)]
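The probability limits behind (a) and (b) can be written down directly. With $\lambda=\sigma_Z^2/(\sigma_Z^2+\sigma_w^2)$, $\operatorname{plim} b_2=\lambda\beta_2$, while the intercept satisfies $\operatorname{plim} b_1=\beta_1+\beta_2\mu_Z-\lambda\beta_2(\mu_Z+\mu_w)$, where $\mu_Z$ and $\sigma_Z^2$ are the mean and variance of $Z_i$. The sketch evaluates these formulas for hypothetical values ($\beta_1=0$, $\beta_2=1$, $\mu_Z=60$, $\sigma_Z^2=100$, $\sigma_w^2=25$) with and without a negative $\mu_w$; the numbers and the function name are illustrative assumptions.

```python
def plim_ols(beta1, beta2, mu_z, var_z, mu_w, var_w):
    """Probability limits (plim b1, plim b2) of OLS when S = Z + w replaces Z.

    Assumes w is independent of Z and u, with mean mu_w and variance var_w.
    """
    lam = var_z / (var_z + var_w)        # reliability ratio, 0 < lam <= 1
    plim_b2 = lam * beta2                # slope is attenuated; mu_w plays no role
    # Intercept = mean(Y) - b2 * mean(S), with mean(Y) -> beta1 + beta2 * mu_z
    # and mean(S) -> mu_z + mu_w.
    plim_b1 = beta1 + beta2 * mu_z - plim_b2 * (mu_z + mu_w)
    return plim_b1, plim_b2


# Hypothetical values for illustration only.
no_shift = plim_ols(beta1=0.0, beta2=1.0, mu_z=60.0, var_z=100.0, mu_w=0.0, var_w=25.0)
harsh = plim_ols(beta1=0.0, beta2=1.0, mu_z=60.0, var_z=100.0, mu_w=-5.0, var_w=25.0)
print(f"E(w) = 0:  plim b1 = {no_shift[0]:.2f}, plim b2 = {no_shift[1]:.2f}")
print(f"E(w) = -5: plim b1 = {harsh[0]:.2f}, plim b2 = {harsh[1]:.2f}")
```

In a graph of $Y$ against $S$, the flatter fitted line still passes through the sample means, which is why the intercept moves when the slope is attenuated.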

(c)

The student also has grades in other subjects: $M_i$ for mathematics, $B_i$ for banking, $L_i$ for linear algebra, and others. She assumes that these variables are not subject to measurement error. She regresses $S_i$ on all these variables, saves the residuals $E_i$, and includes them in the equation

$$\widehat Y_i=-20.69+1.00S_i-0.47E_i,\qquad R^2=0.46,$$

with standard errors

$$(7.64)\qquad(0.12)\qquad(0.19). \tag{3}$$
  • Comment on the aim and logic of this procedure and on the results obtained.
  • During the winter econometrics exam, students were also nervous, introducing a measurement error $v_i$ into $Y_i$, with $E(v_i)=0$ and $\operatorname{Var}(v_i)=\sigma_v^2$. How does this assumption affect the properties of the estimate of $\beta_2$ in equation (2)?
  • Would your conclusions change if, because the econometrics exam was held just before New Year, graders were instructed to resolve all controversial cases in favor of students, so that $E(v_i)=a>0$? No rigorous derivation is required.

Question 16

Written Question 4 — 25 marks

After graduating from university, a student joins a consulting firm dealing with the promotion of candidates and the organization of online elections. Her first assignment is to advise potential candidate A for the position of head of the student organization.

Candidate A is young and inexperienced. He can spend only $2,000 on advertising, but respondents rank his attractiveness as 5. Candidate B is more experienced and plans to spend $5,000 on advertising, with an attractiveness rank of 2. Each candidate receives 3 free appearances on local television; additional appearances cost $799 each.

The student has data from 200 past elections:

  • $V$: number of votes cast for the candidate;
  • $E$: binary variable equal to 1 if the candidate was elected;
  • $AD$: amount spent on promotion, in thousands of US dollars;
  • $TV$: number of appearances on television special events;
  • $APP$: personal appeal of the candidate, on a scale from 1 to 5.

Using these data, she estimates the following models. Standard errors, or their counterparts, are given in parentheses. For the probit model,

$$f(z)=\frac{1}{\sqrt{2\pi}}e^{-z^2/2}$$

is the standard normal probability density function.

$$\widehat V_i=-41.60+25.15AD_i+32.64TV_i+21.60APP_i,\qquad R^2=0.66,$$

with standard errors

$$(17.84)\qquad(2.08)\qquad(2.75)\qquad(4.72). \tag{1-OLS}$$

$$\widehat E_i=-0.74+0.14AD_i+0.18TV_i+0.11APP_i,\qquad R^2=0.51,$$

with standard errors

$$(0.13)\qquad(0.016)\qquad(0.02)\qquad(0.04). \tag{2-OLS}$$

$$\widehat E_i=-5.64+0.64AD_i+0.75TV_i+0.54APP_i,\qquad\text{McFadden }R^2=0.49,$$

with standard errors

$$(0.94)\qquad(0.13)\qquad(0.12)\qquad(0.19). \tag{3-Probit}$$

(a)

  • Explain the meaning of regression (1). Compare the candidates' chances based on model (1), assuming that a higher expected number of votes is an indicator of success.
  • Explain the meaning of regression (2) and its coefficients.
  • Explain the logic of model (3), including the mechanism used to obtain its regression results.

(b)

According to model (3), what are the chances for each candidate to be elected? Compare them with the results of model (2). Which model can be trusted more?

Recall the candidate data:

  • Candidate A: $AD=2$, $TV=3$, $APP=5$;
  • Candidate B: $AD=5$, $TV=3$, $APP=2$.

(c)

Using the marginal effects of advertising and television appearances in model (3), advise Candidate A how to reallocate his funds between advertising and television appearances to close the gap with Candidate B or overtake him. One additional television appearance costs $799, and the candidate has no additional funds. Show with calculations that the proposed reallocation can improve Candidate A's chances.
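The probit probabilities, marginal effects, and a candidate reallocation can be sketched from the reported coefficients alone. The probit probability is $\Phi(z)$ with $z$ the fitted index from (3), and the marginal effect of a regressor is $f(z)$ times its coefficient; the linear probability model (2) is evaluated alongside for comparison. The reallocation loop (buying one or two extra appearances at 0.799 thousand dollars each and cutting advertising by the same amount) is a hypothetical illustration, not the official solution, and the helper names are assumptions.

```python
import math


def phi_cdf(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi_pdf(z):
    """Standard normal density f(z) as defined in the question."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


# Coefficients copied from (3-Probit) and (2-OLS); AD is in thousands of dollars.
PROBIT = {"const": -5.64, "AD": 0.64, "TV": 0.75, "APP": 0.54}
LPM = {"const": -0.74, "AD": 0.14, "TV": 0.18, "APP": 0.11}
TV_COST = 0.799  # thousands of dollars per extra appearance


def index(coefs, ad, tv, app):
    """Linear index (probit) or fitted probability (LPM) for given regressors."""
    return coefs["const"] + coefs["AD"] * ad + coefs["TV"] * tv + coefs["APP"] * app


candidates = {"A": (2.0, 3, 5), "B": (5.0, 3, 2)}
for name, (ad, tv, app) in candidates.items():
    z = index(PROBIT, ad, tv, app)
    print(f"{name}: z = {z:.2f}, probit P = {phi_cdf(z):.3f}, LPM P = {index(LPM, ad, tv, app):.2f}")
    print(f"   marginal effects: AD {phi_pdf(z) * PROBIT['AD']:.3f}, TV {phi_pdf(z) * PROBIT['TV']:.3f}")

# Index gain per thousand dollars: advertising vs. buying TV appearances.
print(f"index gain per 1000 USD: AD {PROBIT['AD']:.3f}, TV {PROBIT['TV'] / TV_COST:.3f}")

# Hypothetical reallocation for A: k extra appearances paid for by cutting AD.
for k in (1, 2):
    ad_left = 2.0 - k * TV_COST
    z_new = index(PROBIT, ad_left, 3 + k, 5)
    print(f"A with {k} extra TV: AD = {ad_left:.3f}, probit P = {phi_cdf(z_new):.3f}")
```

Comparing index gains per thousand dollars, rather than per unit, is what makes the two instruments comparable, since one appearance and one thousand dollars of advertising cost different amounts.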