iheartpopla.blogg.se - Residuals in fathom 2

RESIDUALS IN FATHOM 2 SOFTWARE

Previously, in Lesson 4, we mentioned two measures that we use to help identify outliers. This time, we add a little more detail.

As you know, ordinary residuals are defined for each observation, i = 1, ..., n, as the difference between the observed and predicted responses:

\[e_i = y_i - \hat{y}_i\]

For example, consider a very small (contrived) data set containing n = 4 data points (x, y). In the regression output for these data, the column labeled "FITS1" contains the predicted responses, while the column labeled "RESI1" contains the ordinary residuals.

So far, we have learned various measures for identifying extreme x values (high leverage observations) and unusual y values (outliers). When trying to identify outliers, one problem that can arise is a potential outlier that influences the regression model to such an extent that the estimated regression function is "pulled" towards it, so that it isn't flagged as an outlier using the standardized residual criterion. To address this issue, studentized residuals offer an alternative criterion for identifying outliers.

The basic idea is to delete the observations one at a time, each time refitting the regression model on the remaining n - 1 observations. Then, we compare the observed response values to their fitted values based on the models with the i-th observation deleted. This produces deleted residuals. Standardizing the deleted residuals produces studentized residuals.

Let \(y_i\) denote the observed response for the i-th observation, and let \(\hat{y}_{(i)}\) denote the predicted response for the i-th observation based on the model fitted with the i-th observation deleted. The i-th deleted residual is then:

\[d_i = y_i - \hat{y}_{(i)}\]

The studentized residuals can also be computed from the fit to all n observations, without refitting the model n times:

\[t_i = r_i \left( \frac{n-k-2}{n-k-1-r_i^2} \right)^{1/2}\]

where \(r_i\) is the i-th standardized residual, n is the number of observations, and k is the number of predictors.

In general, studentized residuals are going to be more effective for detecting outlying y observations than standardized residuals. If an observation has a studentized residual that is larger than 3 in absolute value, we can call it an outlier.

Let's return to our example with n = 4 data points (3 blue and 1 red). Regressing y on x and requesting the studentized residuals, the software output lists them in the column labeled "TRES1". The studentized residual for the red data point is \(t_4\) = -19.7990.

Now we just have to decide if this is large enough to deem the data point influential. To do that, we rely on the fact that, in general, studentized residuals follow a t distribution with (n - k - 2) degrees of freedom. That is, all we need to do is compare the studentized residuals to the t distribution with (n - k - 2) degrees of freedom. If a data point's studentized residual is extreme, that is, it sticks out like a sore thumb, then the data point is deemed influential.

Here, the t distribution has 4 - 1 - 2 = 1 degree of freedom. For the t distribution with 1 degree of freedom, most of the probability (about 84%) falls between -4 and 4. Three of the studentized residuals (-1.7431, 0.1217, and 1.6361) are all reasonable values for this distribution. But the studentized residual for the fourth (red) data point (-19.799) sticks out like a very sore thumb. Based on studentized residuals, the red data point is deemed influential.

Another example. Let's return to the Example #2 data set (influence2.txt). In the software output for these data, the studentized residuals again appear in the column labeled "TRES1". The studentized residual for the red data point is \(t_{21}\) = 6.69013. Because n - k - 2 = 21 - 1 - 2 = 18, to determine whether the red data point is influential, we compare the studentized residual to a t distribution with 18 degrees of freedom. The studentized residual for the red data point (6.69013) sticks out like a sore thumb. Again, it is "off the chart." Based on studentized residuals, the red data point in this example is deemed influential.

Incidentally, recall that earlier in this lesson, we deemed the red data point not influential for this example because it did not affect the estimated regression equation all that much. On the other hand, the red data point did substantially inflate the mean square error. Perhaps it is in this sense that one would want to treat the red data point as influential.
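As a rough illustration of the delete-one idea, here is a minimal Python sketch using only NumPy. The data are made up for illustration and are not the lesson's data set, and the function name `studentized_residuals` is hypothetical. It computes the studentized residuals two ways: by actually deleting each observation and refitting, and by the usual shortcut based on the standardized residual \(r_i\) from the full fit. The two should agree up to rounding.

```python
import numpy as np

def studentized_residuals(x, y):
    """Return (standardized, studentized via shortcut, studentized via delete-one)
    for a simple linear regression of y on x (k = 1 predictor)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = len(y), 1
    X = np.column_stack([np.ones(n), x])       # design matrix with intercept

    # Fit on all n observations
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                            # ordinary residuals
    mse = e @ e / (n - k - 1)
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # leverages
    r = e / np.sqrt(mse * (1 - h))              # standardized residuals

    # Shortcut: t_i = r_i * sqrt((n - k - 2) / (n - k - 1 - r_i^2))
    t_formula = r * np.sqrt((n - k - 2) / (n - k - 1 - r**2))

    # Delete-one: refit without observation i, then standardize d_i
    t_loo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        bi, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
        ei = yi - Xi @ bi
        mse_i = ei @ ei / (n - k - 2)           # n - 1 points, k + 1 parameters
        d_i = y[i] - X[i] @ bi                  # deleted residual
        var_d = mse_i * (1 + X[i] @ np.linalg.inv(Xi.T @ Xi) @ X[i])
        t_loo[i] = d_i / np.sqrt(var_d)

    return r, t_formula, t_loo

# Hypothetical data: five points near a line plus one unusual y value
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 20.0]

r, t_formula, t_loo = studentized_residuals(x, y)
for i in range(len(y)):
    print(i + 1, round(r[i], 4), round(t_formula[i], 4), round(t_loo[i], 4))
```

In this made-up data set, the last observation pulls the fitted line towards itself, so its standardized residual looks unremarkable while its studentized residual is far beyond the cutoff of 3. That is the situation the delete-one approach is meant to catch.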