If the covariance matrix of the errors is unknown, one can instead use a consistent estimate of it. [3] One strategy for building an implementable version of GLS is the feasible generalized least squares (FGLS) estimator, which proceeds in two stages. (1) The model is estimated by OLS or another consistent (but inefficient) estimator, and the residuals are used to build a consistent estimator of the error covariance matrix. Doing so often requires imposing additional restrictions on the model; for example, if the errors follow a time series process, some theoretical assumptions about that process are generally needed to ensure that a consistent estimator is available. (2) The GLS estimator is then computed with this estimated covariance matrix in place of the true one.
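The two stages can be sketched in NumPy for the simplest case, uncorrelated but heteroskedastic errors, where the covariance matrix is diagonal. This is a minimal illustration under stated assumptions, not a general FGLS implementation: the function names `ols` and `fgls_diagonal` are hypothetical, the data are simulated, and the choice to model the log error variance as linear in the regressors is an illustrative assumption (time-series error structures are not covered).

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares coefficients via a least-squares solve (no explicit inverse)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def fgls_diagonal(X, y):
    """Hypothetical two-stage FGLS sketch for heteroskedastic, uncorrelated errors.

    Assumes (for illustration only) that log(Var(e_i)) is linear in the columns of X.
    """
    # Stage 1: consistent but inefficient OLS fit, then residuals.
    beta_ols = ols(X, y)
    resid = y - X @ beta_ols
    # Estimate the variance function by regressing log squared residuals on X;
    # the small constant guards against log(0).
    gamma = ols(X, np.log(resid**2 + 1e-12))
    sigma2_hat = np.exp(X @ gamma)  # estimated diagonal of the error covariance
    # Stage 2: GLS with a diagonal covariance is weighted least squares with
    # weights 1 / sigma2_hat; rescale each row by sqrt(weight) and rerun OLS.
    sw = np.sqrt(1.0 / sigma2_hat)
    beta_fgls = ols(X * sw[:, None], y * sw)
    return beta_ols, beta_fgls, sigma2_hat


# Simulated example: the noise standard deviation grows with x.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(1.0, 5.0, n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([1.0, 2.0])
y = X @ true_beta + rng.normal(0.0, 0.5 * x)

beta_ols, beta_fgls, sigma2_hat = fgls_diagonal(X, y)
print("OLS :", beta_ols)
print("FGLS:", beta_fgls)
```

The log-residual regression estimates each variance only up to a common multiplicative constant, but multiplying every weight by the same constant leaves the weighted least squares solution unchanged, so this bias does not affect the second-stage coefficients.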

For example, going back to our height prediction scenario, there may be more variation in the heights of people who are ten years old than in those who are fifty years old, or there may be more variation in the heights of people who weigh 100 pounds than in those who weigh 200 pounds. The upshot is that some points in our training data are more likely to be affected by noise than others, which means that some points in our training set are more reliable than others. We don't want to ignore the less reliable points completely (that would waste valuable information), but they should count less in our computation of the optimal constants c0, c1, c2, …, cn than points that come from regions of space with less noise. To do this, one can use the technique known as weighted least squares, which puts more "weight" on more reliable points. In practice, though, the amount of noise at each point in feature space is typically not known, so approximate methods (such as feasible generalized least squares), which estimate the noise level from the data, are used instead.
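A minimal NumPy sketch of weighted least squares, assuming the weights are already known (for instance, each weight set to one over that point's noise variance). The function name `weighted_least_squares` and the four data points are hypothetical toy values chosen for illustration, not data from the height example. The last point lies far off the line y = 1 + 2x and plays the role of an unreliable observation.

```python
import numpy as np


def weighted_least_squares(X, y, weights):
    """Minimise sum_i weights[i] * (y[i] - X[i] @ c)**2 over the coefficients c."""
    w = np.asarray(weights, dtype=float)
    # Scaling each row by sqrt(w_i) turns the weighted problem into ordinary least squares.
    sw = np.sqrt(w)
    c, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return c


x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])  # columns correspond to c0 (intercept) and c1 (slope)
y = np.array([1.0, 3.0, 5.0, 17.0])        # last point is far off the line y = 1 + 2x

# Equal weights reduce to ordinary least squares, which is pulled toward the noisy point.
c_ols = weighted_least_squares(X, y, np.ones(4))
# Down-weighting the (hypothetically) noisy last point lets the reliable points dominate.
c_wls = weighted_least_squares(X, y, np.array([1.0, 1.0, 1.0, 1e-6]))
print(c_ols)  # close to [-1, 5]
print(c_wls)  # close to [1, 2]
```

Rescaling rows by the square roots of the weights is algebraically equivalent to solving the weighted normal equations, but avoids forming and inverting the matrix explicitly.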

Note that for empirical tests, the appropriate W is not known for sure and must be estimated. For this, feasible generalized least squares (FGLS) techniques may be used.
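In matrix form, and assuming the usual linear-model notation (design matrix X, response vector y, coefficient vector β; these symbols are not defined in the text above), the weighted least squares estimator and its feasible counterpart can be written as follows, where \widehat{W} denotes an estimate of W obtained, for instance, from first-stage OLS residuals:

```latex
\hat{\beta}_{\mathrm{WLS}} = \left(X^{\top} W X\right)^{-1} X^{\top} W y,
\qquad
\hat{\beta}_{\mathrm{FGLS}} = \left(X^{\top} \widehat{W} X\right)^{-1} X^{\top} \widehat{W} y .
```

When W is diagonal, its entries are the weights on the individual observations; in the GLS setting, W is the inverse of the error covariance matrix.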