Sunday, February 24, 2019
Econometrics Chapter Summaries Essay
2) Basic Ideas of Linear Regression: The Two-Variable Model

In this chapter we introduced some fundamental ideas of regression analysis. Starting with the key concept of the population regression function (PRF), we developed the concept of the linear PRF. This book is primarily concerned with linear PRFs, that is, regressions that are linear in the parameters regardless of whether or not they are linear in the variables. We then introduced the idea of the stochastic PRF and discussed in detail the nature and role of the stochastic error term u. The PRF is, of course, a theoretical or idealized construct because, in practice, all we have is a sample (or samples) from some population. This necessitated the discussion of the sample regression function (SRF). We then considered the question of how we actually go about obtaining the SRF. Here we discussed the popular method of ordinary least squares (OLS) and presented the appropriate formulas to estimate the parameters of the PRF. We illustrated the OLS method with a fully worked-out numerical example as well as with several practical examples. Our next task is to find out how good the SRF obtained by OLS is as an estimator of the true PRF. We undertake this important task in Chapter 3.

3) The Two-Variable Model: Hypothesis Testing

In Chapter 2 we showed how to estimate the parameters of the two-variable linear regression model. In this chapter we showed how the estimated model can be used for the purpose of drawing inferences about the true population regression model. Although the two-variable model is the simplest possible linear regression model, the ideas introduced in these two chapters are the foundation of the more involved multiple regression models that we discuss in ensuing chapters. As we will see, in many ways the multiple regression model is a straightforward extension of the two-variable model.

4) Multiple Regression: Estimation and Hypothesis Testing

In this chapter we considered the simplest of the multiple regression models, namely, the three-variable linear regression model: one dependent variable and two explanatory variables. Although in many ways a straightforward extension of the two-variable linear regression model, the three-variable model introduced several new concepts, such as partial regression coefficients, the adjusted and unadjusted multiple coefficient of determination, and multicollinearity. Insofar as estimation of the parameters of the multiple regression model is concerned, we still worked within the framework of the classical linear regression model and used the method of ordinary least squares (OLS). The OLS estimators of the multiple regression model, as in the two-variable model, possess several desirable statistical properties summed up in the Gauss-Markov property of best linear unbiased estimators (BLUE). With the assumption that the disturbance term follows the normal distribution with zero mean and constant variance σ², we saw that, as in the two-variable case, each estimated coefficient in the multiple regression follows the normal distribution with a mean equal to the true population value and a variance given by the formulas developed in the text. Unfortunately, in practice, σ² is not known and has to be estimated. The OLS estimator of this unknown variance is σ̂² = Σeᵢ²/(n - k), where the eᵢ are the residuals, n is the sample size, and k is the number of parameters estimated.
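As a minimal illustrative sketch (not from the text: the data are simulated and all variable names and numbers are made up), these OLS formulas for the three-variable model can be written directly in Python with numpy:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    x2 = rng.uniform(0.0, 10.0, n)             # first explanatory variable (made up)
    x3 = rng.uniform(0.0, 5.0, n)              # second explanatory variable (made up)
    u = rng.normal(0.0, 1.0, n)                # stochastic disturbance term
    y = 2.0 + 1.5 * x2 - 0.8 * x3 + u          # "true" PRF, known only because we simulated it

    X = np.column_stack([np.ones(n), x2, x3])  # regressor matrix with an intercept column
    b = np.linalg.solve(X.T @ X, X.T @ y)      # OLS estimates
    e = y - X @ b                              # residuals
    k = X.shape[1]                             # number of parameters estimated (incl. intercept)
    sigma2_hat = e @ e / (n - k)               # estimator of the unknown error variance
    se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))
    print(b, se, b / se)                       # estimates, standard errors, t ratios

The last lines compute the estimator σ̂² = Σeᵢ²/(n - k) and the resulting standard errors and t ratios discussed next.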
But if we replace σ² by σ̂², then, as in the two-variable case, each estimated coefficient of the multiple regression follows the t distribution, not the normal distribution. The knowledge that each multiple regression coefficient follows the t distribution with d.f. equal to (n - k), where k is the number of parameters estimated (including the intercept), means we can use the t distribution to test statistical hypotheses about each multiple regression coefficient individually. This can be done on the basis of either the t test of significance or the confidence interval based on the t distribution. In this respect, the multiple regression model does not differ much from the two-variable model, except that proper allowance must be made for the d.f., which now depend on the number of parameters estimated. However, when testing the hypothesis that all partial slope coefficients are simultaneously equal to zero, the individual t testing referred to earlier is of no help. Here we should use the analysis of variance (ANOVA) technique and the attendant F test. Incidentally, testing that all partial slope coefficients are simultaneously equal to zero is the same as testing that the multiple coefficient of determination R² is equal to zero. Therefore, the F test can also be used to test this latter but equivalent hypothesis. We also discussed the question of when to add a variable or a group of variables to a model, using either the t test or the F test. In this context we also discussed the method of restricted least squares.

5) Functional Forms of Regression Models

In this chapter we considered models that are linear in parameters, or that can be rendered as such with suitable transformation, but that are not necessarily linear in variables. There are a variety of such models, each having special applications. We considered five major types of nonlinear-in-variable but linear-in-parameter models, namely:
1. The log-linear model, in which both the dependent variable and the explanatory variable are in logarithmic form.
2. The log-lin, or growth, model, in which the dependent variable is logarithmic but the independent variable is linear.
3. The lin-log model, in which the dependent variable is linear but the independent variable is logarithmic.
4. The reciprocal model, in which the dependent variable is linear but the independent variable enters in inverse (1/X) form.
5. The polynomial model, in which the independent variable enters with various powers.
Of course, there is nothing that prevents us from combining the features of one or more of these models. Thus, we can have a multiple regression model in which the dependent variable is in log form and some of the X variables are also in log form, but some are in linear form. We studied the properties of these various models in terms of their relevance in applied research, their slope coefficients, and their elasticity coefficients. We also showed with several examples the situations in which the various models could be used. Needless to say, we will come across several more examples in the remainder of the text. In this chapter we also considered the regression-through-the-origin model and discussed some of its features.
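As a quick illustration of the first of these functional forms (a simulated sketch, not an example from the text), in the double-log model the slope coefficient estimates the elasticity of Y with respect to X:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(1.0, 100.0, 200)
    y = 3.0 * x ** 0.7 * np.exp(rng.normal(0.0, 0.1, 200))  # constant-elasticity data

    # Log-linear (double-log) model: ln y = b1 + b2 ln x, where b2 is the elasticity
    X = np.column_stack([np.ones(x.size), np.log(x)])
    b, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
    print("estimated elasticity:", b[1])                    # close to the true value 0.7

The log-lin, lin-log, reciprocal, and polynomial variants differ only in which columns get transformed before the same OLS step.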
It cannot be overemphasized that in choosing among the competing models, the overriding objective should be the economic relevance of the various models and not merely the summary statistics, such as R². Model building requires a proper balance of theory, availability of the appropriate data, a good understanding of the statistical properties of the various models, and the elusive quality that is called practical judgment. Since the theory underlying a topic of interest is never perfect, there is no such thing as a perfect model. What we hope for is a reasonably good model that will balance all these criteria. Whatever model is chosen in practice, we have to pay careful attention to the units in which the dependent and independent variables are expressed, for the interpretation of regression coefficients may hinge upon units of measurement.

6) Dummy Variable Regression Models

In this chapter we showed how qualitative, or dummy, variables taking values of 1 and 0 can be introduced into regression models alongside quantitative variables. As the various examples in the chapter showed, the dummy variables are essentially a data-classifying device in that they divide a sample into various subgroups based on qualities or attributes (sex, marital status, race, religion, etc.) and implicitly run individual regressions for each subgroup. Now if there are differences in the responses of the dependent variable to the variation in the quantitative variables in the various subgroups, they will be reflected in the differences in the intercepts or slope coefficients of the various subgroups, or both. Although it is a versatile tool, the dummy variable technique has to be handled carefully. First, if the regression model contains a constant term (as most models usually do), the number of dummy variables must be one less than the number of classifications of each qualitative variable. Second, the coefficient attached to the dummy variables must always be interpreted in relation to the control, or benchmark, group (the group that gets the value of zero). Finally, if a model has several qualitative variables with several classes, introduction of dummy variables can consume a large number of degrees of freedom (d.f.). Therefore, we should weigh the number of dummy variables to be introduced into the model against the total number of observations in the sample. In this chapter we also discussed the possibility of committing a specification error, that is, of fitting the wrong model to the data. If intercepts as well as slopes are expected to differ among groups, we should build a model that incorporates both the differential intercept and slope dummies. In this case a model that introduces only the differential intercepts is likely to lead to a specification error. Of course, it is not always easy a priori to find out which is the true model. Thus, some amount of experimentation is required in a concrete study, especially in situations where theory does not provide much guidance. The topic of specification error is discussed further in Chapter 7. In this chapter we also briefly discussed the linear probability model (LPM), in which the dependent variable is itself binary. Although the LPM can be estimated by ordinary least squares (OLS), there are several problems with a routine application of OLS. Some of the problems can be resolved easily and some cannot. Therefore, alternative estimating procedures are needed.
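One well-known problem is that OLS fitted values from the LPM need not lie between 0 and 1, even though they are supposed to be probabilities. A small simulated sketch (invented data, not from the text) makes this visible:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 200
    x = rng.uniform(-3.0, 3.0, n)
    p_true = 1.0 / (1.0 + np.exp(-2.0 * x))          # true response probabilities
    y = (rng.uniform(size=n) < p_true).astype(float) # binary dependent variable

    X = np.column_stack([np.ones(n), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)        # LPM: OLS on the 0/1 variable
    fitted = X @ b
    print("fitted values below 0 or above 1:", int(((fitted < 0) | (fitted > 1)).sum()))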
We mentioned two such alternatives, the logit and probit models, but we did not discuss them in view of the somewhat advanced nature of these models (but see Chapter 12).

7) Model Selection: Criteria and Tests

The major points discussed in this chapter can be summarized as follows:
1. The classical linear regression model assumes that the model used in empirical analysis is correctly specified.
2. The term correct specification of a model can mean several things, including:
a. No theoretically relevant variable has been excluded from the model.
b. No unnecessary or irrelevant variables are included in the model.
c. The functional form of the model is correct.
d. There are no errors of measurement.
3. If a theoretically relevant variable(s) has been excluded from the model, the coefficients of the variables retained in the model are generally biased as well as inconsistent, and the error variance and the standard errors of the OLS estimators are biased. As a result, the conventional t and F tests are of questionable value.
4. Similar consequences ensue if we use the wrong functional form.
5. The consequences of including irrelevant variable(s) in the model are less serious in that estimated coefficients still remain unbiased and consistent, the error variance and standard errors of the estimators are correctly estimated, and the conventional hypothesis-testing procedure is still valid. The major penalty we pay is that estimated standard errors tend to be relatively large, which means parameters of the model are estimated rather imprecisely. As a result, confidence intervals tend to be somewhat wider.
6. In view of the potential seriousness of specification errors, in this chapter we considered several diagnostic tools to help us find out if we have the specification error problem in any concrete situation. These tools include a graphical examination of the residuals and more formal tests, such as MWD and RESET.
Since the search for a theoretically correct model can be exasperating, in this chapter we also considered several practical criteria that we should keep in mind in this search, such as (1) parsimony, (2) identifiability, (3) goodness of fit, (4) theoretical consistency, and (5) predictive power. As Granger notes, "In the last analysis, model building is probably both an art and a science." A sound knowledge of theoretical econometrics and the availability of an efficient computer program are not enough to ensure success.

8) Multicollinearity: What Happens If Explanatory Variables Are Correlated?

An important assumption of the classical linear regression model is that there are no exact linear relationships, or multicollinearity, among explanatory variables. Although cases of exact multicollinearity are rare in practice, situations of near exact or high multicollinearity occur frequently. In practice, therefore, the term multicollinearity refers to situations where two or more variables can be highly linearly related. The consequences of multicollinearity are as follows. In cases of perfect multicollinearity we cannot estimate the individual regression coefficients or their standard errors. In cases of high multicollinearity individual regression coefficients can be estimated and the OLS estimators retain their BLUE property. But the standard errors of one or more coefficients tend to be large in relation to their coefficient values, thereby reducing t values. As a result, based on estimated t values, we may conclude that the coefficient with the low t value is not statistically different from zero.
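A small simulation (made-up data, not an example from the chapter) makes this inflation of standard errors concrete: the same regression is run once with a nearly redundant regressor and once with an unrelated one:

    import numpy as np

    def ols(X, y):
        # OLS estimates and their standard errors, using the Chapter 4 formulas
        b = np.linalg.solve(X.T @ X, X.T @ y)
        e = y - X @ b
        s2 = e @ e / (X.shape[0] - X.shape[1])
        return b, np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

    rng = np.random.default_rng(3)
    n = 100
    x2 = rng.normal(size=n)
    u = rng.normal(size=n)

    x3_collinear = x2 + rng.normal(scale=0.01, size=n)  # nearly an exact copy of x2
    y = 1.0 + 1.0 * x2 + 1.0 * x3_collinear + u
    _, se_high = ols(np.column_stack([np.ones(n), x2, x3_collinear]), y)

    x3_unrelated = rng.normal(size=n)                   # an independent regressor
    y2 = 1.0 + 1.0 * x2 + 1.0 * x3_unrelated + u
    _, se_low = ols(np.column_stack([np.ones(n), x2, x3_unrelated]), y2)

    print("standard errors, near collinearity:", se_high)
    print("standard errors, no collinearity:  ", se_low)

With x2 and x3 nearly identical, the standard errors of the slope coefficients are far larger than in the second regression, and their t values correspondingly smaller.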
In other words, we cannot assess the marginal or individual contribution of the variable whose t value is low. Recall that in a multiple regression the slope coefficient of an X variable is the partial regression coefficient, which measures the (marginal or individual) effect of that variable on the dependent variable, holding all other X variables constant. However, if the objective of study is to estimate a group of coefficients fairly accurately, this can be done so long as collinearity is not perfect. In this chapter we considered several methods of detecting multicollinearity, pointing out their pros and cons. We also discussed the various remedies that have been proposed to solve the problem of multicollinearity and noted their strengths and weaknesses. Since multicollinearity is a feature of a given sample, we cannot foretell which method of detecting multicollinearity or which remedial measure will work in any given concrete situation.

9) Heteroscedasticity: What Happens If the Error Variance Is Nonconstant?

A critical assumption of the classical linear regression model is that the disturbances uᵢ all have the same (i.e., homoscedastic) variance. If this assumption is not satisfied, we have heteroscedasticity. Heteroscedasticity does not destroy the unbiasedness property of OLS estimators, but these estimators are no longer efficient. In other words, OLS estimators are no longer BLUE. If the heteroscedastic variances σᵢ² are known, then the method of weighted least squares (WLS) provides BLUE estimators. Despite heteroscedasticity, if we continue to use the usual OLS method not only to estimate the parameters (which remain unbiased) but also to establish confidence intervals and test hypotheses, we are likely to draw misleading conclusions, as in the NYSE Example 9.8. This is because estimated standard errors are likely to be biased and therefore the resulting t ratios are likely to be biased, too. Thus, it is important to find out whether we are faced with the heteroscedasticity problem in a specific application. There are several diagnostic tests of heteroscedasticity, such as plotting the estimated residuals against one or more of the explanatory variables, the Park test, the Glejser test, or the rank correlation test (see Problem 9.13). If one or more diagnostic tests reveal that we have the heteroscedasticity problem, remedial measures are called for. If the true error variance σᵢ² is known, we can use the method of WLS to obtain BLUE estimators. Unfortunately, knowledge about the true error variance is seldom available in practice. As a result, we are forced to make some plausible assumptions about the nature of heteroscedasticity and to transform our data so that in the transformed model the error term is homoscedastic. We then apply OLS to the transformed data, which amounts to using WLS. Of course, some skill and experience are required to obtain the appropriate transformations. But without such a transformation, the problem of heteroscedasticity is insoluble in practice. However, if the sample size is reasonably large, we can use White's procedure to obtain heteroscedasticity-corrected standard errors.

10) Autocorrelation: What Happens If Error Terms Are Correlated?

The major points of this chapter are as follows:
1. In the presence of autocorrelation OLS estimators, although unbiased, are not efficient. In short, they are not BLUE.
2. Assuming the Markov first-order autoregressive, or AR(1), scheme, we pointed out that the conventionally computed variances and standard errors of OLS estimators can be seriously biased.
3. As a result, standard t and F tests of significance can be seriously misleading.
4. Therefore, it is important to know whether there is autocorrelation in any given case. We considered three methods of detecting autocorrelation:
a. graphical plotting of the residuals
b. the runs test
c. the Durbin-Watson d test
5. If autocorrelation is found, we suggest that it be corrected by appropriately transforming the model so that in the transformed model there is no autocorrelation. We illustrated the actual mechanics with several examples.

11) Simultaneous Equation Models

In contrast to the single equation models discussed in the preceding chapters, in simultaneous equation regression models what is a dependent (endogenous) variable in one equation appears as an explanatory variable in another equation. Thus, there is a feedback relationship between the variables. This feedback creates the simultaneity problem, rendering OLS inappropriate to estimate the parameters of each equation individually. This is because the endogenous variable that appears as an explanatory variable in another equation may be correlated with the stochastic error term of that equation. This violates one of the critical assumptions of OLS: that the explanatory variable be either fixed, or nonrandom, or, if random, that it be uncorrelated with the error term. Because of this, if we use OLS, the estimates we obtain will be biased as well as inconsistent. Besides the simultaneity problem, a simultaneous equation model may have an identification problem. An identification problem means we cannot uniquely estimate the values of the parameters of an equation. Therefore, before we estimate a simultaneous equation model, we must find out if an equation in such a model is identified. One cumbersome method of finding out whether an equation is identified is to obtain the reduced form equations of the model. A reduced form equation expresses a dependent (or endogenous) variable solely as a function of exogenous, or predetermined, variables, that is, variables whose values are determined outside the model. If there is a one-to-one correspondence between the reduced form coefficients and the coefficients of the original equation, then the original equation is identified. A shortcut to determining identification is via the order condition of identification. The order condition counts the number of equations in the model and the number of variables in the model (both endogenous and exogenous). Then, based on whether some variables are excluded from an equation but included in other equations of the model, the order condition decides whether an equation in the model is underidentified, exactly identified, or overidentified. An equation in a model is underidentified if we cannot estimate the values of the parameters of that equation. If we can obtain unique values of the parameters of an equation, that equation is said to be exactly identified. If, on the other hand, the estimates of one or more parameters of an equation are not unique in the sense that there is more than one value of some parameters, that equation is said to be overidentified.
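A standard textbook illustration of the order condition (not one of this chapter's own examples) is the simple Keynesian model:

    C_t = β1 + β2*Y_t + u_t    (consumption function)
    Y_t = C_t + I_t            (income identity)

Here C (consumption) and Y (income) are endogenous, while I (investment) is exogenous, i.e., predetermined. The order condition requires that the number of predetermined variables excluded from an equation be at least as large as the number of endogenous variables it includes as explanatory variables. The consumption function excludes one predetermined variable (I) and includes one endogenous explanatory variable (Y), so it is exactly identified: β1 and β2 can be recovered uniquely from the reduced form coefficients.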
If an equation is underidentified, it is a dead-end case: there is not much we can do, short of changing the specification of the model (i.e., developing another model). If an equation is exactly identified, we can estimate it by the method of indirect least squares (ILS). ILS is a two-step procedure. In step 1, we apply OLS to the reduced form equations of the model, and then, in step 2, we retrieve the original structural coefficients from the reduced form coefficients. ILS estimators are consistent; that is, as the sample size increases indefinitely, the estimators converge to their true values. The parameters of an overidentified equation can be estimated by the method of two-stage least squares (2SLS). The basic idea behind 2SLS is to replace the explanatory variable that is correlated with the error term of the equation in which that variable appears by a variable that is not so correlated. Such a variable is called a proxy, or instrumental, variable. 2SLS estimators, like the ILS estimators, are consistent estimators.

12) Selected Topics in Single Equation Regression Models

In this chapter we discussed several topics of considerable practical importance. The first topic we discussed was dynamic modeling, in which time, or lag, explicitly enters into the analysis. In such models the current value of the dependent variable depends upon one or more lagged values of the explanatory variable(s). This dependence can be due to psychological, technological, or institutional reasons. These models are generally known as distributed lag models. Although the inclusion of one or more lagged terms of an explanatory variable does not violate any of the standard CLRM assumptions, the estimation of such models by the usual OLS method is generally not recommended because of the problem of multicollinearity and the fact that every additional coefficient estimated means a loss of degrees of freedom. Therefore, such models are usually estimated by imposing some restrictions on the parameters of the models (e.g., that the values of the successive lagged coefficients decline from the first coefficient onward). This is the approach adopted by the Koyck, the adaptive expectations, and the partial, or stock, adjustment models. A unique feature of all these models is that they replace all lagged values of the explanatory variable by a single lagged value of the dependent variable. Because of the presence of the lagged value of the dependent variable among the explanatory variables, the resulting model is called an autoregressive model. Although autoregressive models achieve economy in the estimation of distributed lag coefficients, they are not free from statistical problems. In particular, we have to guard against the possibility of autocorrelation in the error term, because in the presence of autocorrelation and the lagged dependent variable as an explanatory variable, the OLS estimators are biased as well as inconsistent. In discussing the dynamic models, we pointed out how they help us to assess the short- and long-run impact of an explanatory variable on the dependent variable. The next topic we discussed related to the phenomenon of spurious, or nonsense, regression. Spurious regression arises when we regress a nonstationary random variable on one or more nonstationary random variables. A time series is said to be (weakly) stationary if its mean, variance, and covariances at various lags are not time dependent. To find out whether a time series is stationary, we can use the unit root test.
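To make the unit root test concrete, here is a minimal sketch using the augmented Dickey-Fuller test from the statsmodels package (the series are simulated; only the adfuller function from statsmodels is assumed):

    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    rng = np.random.default_rng(4)
    shocks = rng.normal(size=500)
    random_walk = np.cumsum(shocks)    # nonstationary: today's value = yesterday's + shock
    white_noise = shocks               # stationary by construction

    for name, series in [("random walk", random_walk), ("white noise", white_noise)]:
        stat, pvalue = adfuller(series)[:2]   # ADF unit root test statistic and p-value
        print(f"{name}: ADF statistic = {stat:.2f}, p-value = {pvalue:.3f}")

The random walk fails to reject the unit root, while the stationary series rejects it decisively.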
If the unit root test (or other tests) shows that the time series of interest is stationary, then the regression based on such time series may not be spurious. We also introduced the concept of cointegration. Two or more time series are said to be cointegrated if there is a stable, long-term relationship between them even though individually each may be nonstationary. If this is the case, regression involving such time series may not be spurious. Next we introduced the random walk model, with or without drift. Several financial time series are found to follow a random walk; that is, they are nonstationary either in their mean value or their variance or both. Variables with these characteristics are said to follow stochastic trends. Stock prices are a prime example of a random walk. It is hard to tell what the price of a stock will be tomorrow just by knowing its price today. The best guess about tomorrow's price is today's price plus or minus a random error term (or shock, as it is called). If we could predict tomorrow's price fairly accurately, we would all be millionaires! The next topic we discussed in this chapter was the dummy dependent variable, where the dependent variable can take values of either 1 or 0. Although such models can be estimated by OLS, in which case they are called linear probability models (LPM), this is not the recommended procedure, since probabilities estimated from such models can sometimes be negative or greater than 1. Therefore, such models are usually estimated by the logit or probit procedures. In this chapter we illustrated the logit model with concrete examples. Thanks to excellent computer packages, estimation of logit and probit models is no longer a difficult or forbidding task.
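As a closing sketch (simulated data, not one of the chapter's examples, assuming the statsmodels package), a logit model estimated by maximum likelihood keeps its fitted probabilities strictly between 0 and 1, unlike the LPM:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    n = 500
    x = rng.normal(size=n)
    p_true = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x)))  # logistic response probabilities
    y = (rng.uniform(size=n) < p_true).astype(int)   # binary dependent variable

    X = sm.add_constant(x)
    result = sm.Logit(y, X).fit(disp=0)   # maximum-likelihood estimation of the logit model
    print(result.params)                  # estimates of the intercept and slope
    print(float(result.predict(X).min()), float(result.predict(X).max()))  # always in (0, 1)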