Segmentation of Model Localization Sub-areas by Getis Statistics

Models for large areas (global models) are often biased in smaller sub-areas, even when the model is unbiased for the whole area. Localization of the global model removes the local bias, but the problem is to find homogeneous sub-areas in which to localize the function. In this study, we used the segmentation process of eCognition Professional 4.0 (later versions called Definiens Pro) to segment the study area into sub-areas that are homogeneous with respect to the residuals of a global form height model and/or the local Getis statistics calculated for the residuals, i.e., G_i*-indices. The segmentations were based on four different raster layers: 1) residuals of the global model, 2) the local G_i*-index, 3) residuals and the local G_i*-index without weighting, and 4) residuals and the local G_i*-index weighted by the inverse of the variance. The global model was then localized (re-fitted) for these sub-areas. The number of resulting sub-areas varied from 4 to 366. On average, the root mean squared errors (RMSEs) after localization were 3.6% lower than the global model RMSEs in the same sub-areas before localization. However, localization actually increased the RMSE in some sub-areas, indicating that these sub-areas were not appropriate for local fitting. For 56% of the sub-areas, the coordinates and the distance from the coastline were not statistically significant variables; in other words, these areas were spatially homogeneous. To compare the segmentations, we calculated an aggregate standard error from the RMSEs of the individual sub-areas in each segmentation. The segmentations in which the local index was present had slightly lower standard errors than the segmentations based on residuals alone.


Introduction
Growing stock properties such as height, volume, and form height are costly, time-consuming, and sometimes impossible to measure in field surveys. Normally the height of only a fraction of all the trees is measured, and the volume and the form height are estimated using models. Such a model can be either parametric or non-parametric and, in addition, either local or global. "Global" refers to the whole of the area under consideration as a single unit, no matter how large that unit may be. A model is called global if the same model is fitted over the entire area. A local model, by contrast, has at least one parameter that is estimated only for one particular sub-area.
The estimates of the global model may be unbiased over the whole area, but for one or more specific sub-areas they could be badly biased (e.g., Korhonen 1993). For example, there may be unexplained spatial variation in the dependent variable due to one or more variables that for some reason cannot be included in the model. The question is how to find sub-areas in which this unexplained variation is homogeneous, i.e., the most appropriate size and location of sub-areas for localizing the global model. There are several alternatives for how the localization could be carried out. In this study, we first segment the entire study area and then re-fit the original regression model individually to each sub-area. We do not add new variables to the model; we only remove statistically insignificant variables from the original form height model.
Delineation of proper localization sub-areas can be problematic if localization is carried out by local fitting of the models only at certain points or in small areas. There is an infinite number of possible solutions to this problem. Selecting the optimal one is possible in principle, but not feasible in practice. However, the sub-areas should be large enough to provide enough observations for local fitting of the model, but not too large, as larger sub-areas become increasingly heterogeneous. If the sub-areas are fully homogeneous, it is enough to adjust the level of the predictions in each sub-area with a simple correction coefficient. The alternative is to use methods that do not need segmentation into reasonably homogeneous regions. For form height in the study area, these could be variance-based, e.g., kriging (Cressie 1991, Köhl and Gertner 1997), mixed models including local-level random effects (Kangas and Korhonen 1995), or non-parametric, such as k-nearest neighbour (k-NN) imputation (Sironen et al. 2008). Compared with these methods, localization of a parametric model by local fitting or a local correction coefficient is simple to implement once the global area has been segmented into homogeneous sub-areas.
Local indicators of spatial association (LISA) measure the degree of clustering of similar values in the neighbourhood of the calculation points, called pivots. The first LISAs were introduced in the 1990s by Anselin (1995) (local Moran's I_i and local Geary's c_i) and by Getis and Ord (1992; Ord and Getis 1995) (G_i and G_i*). Both Geary's c_i and Moran's I_i detect departures from spatial randomness, but they are calculated differently and therefore yield different numerical results. G_i and G_i* are commonly used in "hot spot analysis", where they indicate spatial clusters of high or low values (z-scores) within the whole dataset. These indices have recently been used in many scientific disciplines, for example epidemiology and regional and urban science. In forestry, they have been used to characterize within-stand structure (Wells and Getis 1999) and competition between single trees (Shi and Zhang 2003, Wulder et al. 2007), for forest damage detection (Weishampel et al. 2007), and for model analysis (Zhang et al. 2005, Wulder et al. 2007).
In an earlier study (Räty and Kangas 2007), we tested the suitability of LISAs such as Moran's I_i, Geary's c_i, G_i and G_i* for delineating homogeneous sub-areas from the residuals of the global model. Moran's I_i and Geary's c_i revealed only a couple of small, statistically significant clusters, whereas both Getis statistics, G_i and G_i*, indicated several clusters dividing the study area into positive and negative sub-areas (see Räty and Kangas 2007, Fig. 4). In the second study (Räty and Kangas 2008), we divided the whole study area with a recursive partitioning algorithm (following classification and regression trees (CART), introduced by Breiman et al. 1984), using the G_i*-index and/or residuals together with spatial coordinates as dividers. The idea in CART is to find the most profitable divisions among the given variables (spatial coordinates), i.e., those that minimize the deviance in the dependent variable (the G_i*-index and/or residuals). The result of the dividing process can be presented as a tree, but since the dividers were coordinates, the division could also be drawn on a map. Although CART was successful, its results did not differ from a division of the study area into the eleven forestry administrative centres (see Räty and Kangas 2008, Fig. 2a), which was used as a baseline for assessing the method. The sub-areas delineated by CART were rectangular, whereas they should have been more flexible and variable in both size and geometry.
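To illustrate the CART idea described above, the sketch below performs a single step of recursive partitioning: it searches for the threshold on one coordinate that minimizes the total within-group sum of squares (the deviance for a normally distributed response) of the dependent variable, such as G_i*-indices. It is a minimal sketch under these assumptions, not the implementation used in Räty and Kangas (2008); a full tree would search both coordinates and recurse on each side, and the function name `best_split` is hypothetical.

```python
import numpy as np

def best_split(coord, y):
    """Return (deviance, threshold) of the best binary split on one coordinate.

    Deviance is the summed within-group sum of squares of y.
    """
    order = np.argsort(coord)
    c = np.asarray(coord, dtype=float)[order]
    v = np.asarray(y, dtype=float)[order]
    best = (np.inf, None)
    for k in range(1, len(c)):
        if c[k] == c[k - 1]:
            continue  # no threshold can separate identical coordinate values
        left, right = v[:k], v[k:]
        dev = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if dev < best[0]:
            # Place the threshold midway between neighbouring coordinates
            best = (float(dev), (c[k - 1] + c[k]) / 2.0)
    return best

# Hypothetical example: index values change abruptly between x = 4 and x = 5 km
x_km = np.arange(10)
gi = np.array([-1.2, -1.0, -1.1, -0.9, -1.0, 1.1, 0.9, 1.0, 1.2, 0.8])
print(best_split(x_km, gi))
```

Because the dividers are coordinates, each split is a straight line parallel to an axis, which is why repeated splits produce the rectangular sub-areas mentioned above.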
In this study, we segment the study area using eCognition Professional 4.0® (also known as Definiens Pro), which is designed to cluster and segment satellite and aerial images but also accepts other kinds of raster data layers (eCognition... 2009). The software uses object-oriented image analysis, in which the initial segmentation is based on primary features such as the digital value (Benz et al. 2004). Objects are created by progressively merging pixels with a bottom-up region-merging technique controlled by a heterogeneity criterion. Heterogeneity combines colour and shape components and is limited by the scale parameter, a threshold value for the optimization process in region merging that acts, for instance, as a stop criterion. The initial segmentation routine of the software, which we use, has been employed to segment data in various studies (e.g., Guindon et al. 2004, Bock et al. 2005, Yu et al. 2006, Johansen et al. 2007, Van Coillie et al. 2007, Straatsma and Baptist 2008, Triepke et al. 2008, Mustonen et al. 2008). The decision variables in the present study are the same as in the CART study: the residuals and the G_i*-index in combination with spatial position.
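The region-merging decision can be illustrated with a simplified sketch of the colour (digital value) part of the heterogeneity criterion as it is commonly described for multiresolution segmentation: the increase in size-weighted standard deviation caused by merging two adjacent objects is summed over layers with layer weights, and the merge is accepted only if it stays below a threshold. This is an illustration under stated assumptions, not eCognition's implementation: the shape term is ignored (the study used a shape factor of only 0.1), the threshold is assumed to be the squared scale parameter, and the function names are hypothetical.

```python
import numpy as np

def colour_heterogeneity_increase(obj1, obj2, layer_weights):
    """Weighted increase in spectral heterogeneity if obj1 and obj2 are merged.

    obj1, obj2: arrays of shape (n_pixels, n_layers) with layer values.
    layer_weights: one weight per layer.
    """
    obj1, obj2 = np.asarray(obj1, float), np.asarray(obj2, float)
    merged = np.vstack([obj1, obj2])
    n1, n2, nm = len(obj1), len(obj2), len(merged)
    # Per-layer standard deviations inside each object (population SD)
    sd1, sd2, sdm = obj1.std(axis=0), obj2.std(axis=0), merged.std(axis=0)
    per_layer = nm * sdm - (n1 * sd1 + n2 * sd2)
    return float(np.dot(layer_weights, per_layer))

def should_merge(obj1, obj2, layer_weights, scale):
    # Assumed stop criterion: merge only if the increase is below scale**2
    return colour_heterogeneity_increase(obj1, obj2, layer_weights) < scale ** 2

# Hypothetical one-layer residual objects (values in dm3/cm2)
a = np.array([[0.10], [0.12], [0.11]])
b = np.array([[0.10], [0.13]])
c = np.array([[5.0], [5.2]])
print(should_merge(a, b, [1.0], scale=1.0))  # similar values: merged
print(should_merge(a, c, [1.0], scale=1.0))  # very different values: kept apart
```

In this sketch, a larger scale parameter raises the threshold, so more merges are accepted and fewer, larger sub-areas result, which agrees with the behaviour reported in the Methods.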
The aims of this study are 1) to evaluate the localization sub-areas formed by multiresolution segmentation, and 2) to compare these segmentations with the segmentations formed by CART. The CART segmentations were presented in detail in our previous study, so their results are introduced only briefly here (Räty and Kangas 2008, Table 8).
Once the segmentations had been made, we also wanted 3) to test whether the properties of the local data, such as the range and the variation of the variables in each sub-area, contributed to the success of localization. We compared the model outputs and residuals after localization with the same values before localization.

Materials
The data consist of Scots pine (Pinus sylvestris) sample trees from the ninth National Forest Inventory of Finland (NFI9) (Fig. 1, Table 1a). The trees were selected from each plot with an angle gauge (basal area factor BAF = 2). We modelled a measure proportional to the form height (f), hereafter simply called form height (Eq. 1).
where the diameter at breast height (d) and the volume (v) are from the tree data and the total basal area (BA) is from the plot data. Eq. 1 contains three geographical measures: the XC- and YC-coordinates (Eq. 2) and the distance from the coastline, RDIST (Eq. 3), where X and Y are the geographic coordinates in kilometres and DIST is the straight-line Euclidean distance from the coastline in kilometres, all in Kartastokoordinaattijärjestelmä (KKJ), the national Gauss-Krüger map projection system used in Finland.
The main interest is in the residuals of this global model and later in the residuals of the fitted local models. A residual is the difference between the observed value $f_i$ and the fitted value $\hat{f}_i$ (Eq. 4):

$$\varepsilon_i = f_i - \hat{f}_i \qquad (4)$$

In earlier studies, we have used LISAs in the localization of regression models (Räty and Kangas 2007, 2008). Here we used the same local G_i*-index (Eq. 5) (Getis and Ord 1992, 1996), where ε_j is the residual of the global model (Eq. 4) and w_ij are binary weights. The neighbourhood is a circle with a radius of 20 km (see Räty and Kangas 2007 for the neighbourhood selection); i.e., observation j has unit weight if it is less than 20 km from pivot i, and zero weight otherwise (see Anselin 1995). In the G_i*-index, pivot i itself also belongs to its neighbourhood. We used the Z-standardized version of the index, ZG_i*, hereafter called the G_i*-index or simply G* in tables:

$$ZG_i^* = \frac{G_i^* - E[G_i^*]}{SD[G_i^*]} \qquad (6)$$

where E[G_i*] is the expected value, i.e., the mean of the G_i*-indices, and SD[G_i*] is their standard deviation.
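The binary 20-km neighbourhood and the Z-standardization of Eq. 6 can be sketched as follows. As an assumption, the raw index here is the local weighted sum of residuals, the sum over j of w_ij ε_j, which is the numerator of the Getis-Ord G_i* statistic; Eq. 5 is not reproduced in this text, so its exact normalization may differ. Coordinates are assumed to be in kilometres, and the function name is hypothetical.

```python
import numpy as np

def local_gi_star_z(xy_km, resid, radius_km=20.0):
    """Z-standardized local sums of residuals with binary distance weights.

    xy_km: (n, 2) coordinates in km; resid: n residuals of the global model.
    """
    xy = np.asarray(xy_km, dtype=float)
    e = np.asarray(resid, dtype=float)
    # Pairwise Euclidean distances between all observations (n x n matrix)
    d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1))
    # w_ij = 1 if j is closer than radius_km to pivot i; the pivot itself
    # (distance 0) is included, which is what distinguishes G_i* from G_i
    w = (d < radius_km).astype(float)
    g = w @ e  # local weighted sum of residuals at each pivot
    # Eq. 6: subtract the mean and divide by the SD of the indices over all pivots
    return (g - g.mean()) / g.std()

# Hypothetical example: a positive and a negative cluster 100 km apart
xy = [[0, 0], [1, 0], [0, 1], [100, 0], [101, 0], [100, 1]]
res = [1, 1, 1, -1, -1, -1]
print(local_gi_star_z(xy, res))
```

The full distance matrix keeps the sketch short; for tens of thousands of sample trees, a spatial index (for example a k-d tree radius query) would avoid the n-by-n memory cost.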

Methods
As a preliminary step, we interpolated the point data (single trees) of residuals and G_i*-indices into grids, hereafter called (data) "layers", using inverse distance weighting (IDW) in ArcMap®: the value of each pixel was calculated as a weighted average of the five nearest observations, with weights equal to the inverse of the squared distance between each observation and the pixel (Longley et al. 2005). We then passed these layers to eCognition Professional 4.0® (Definiens Imaging) for multiresolution segmentation (Benz et al. 2004, eCognition... 2009).
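The interpolation settings described above (five nearest observations, weights equal to the inverse squared distance) can be sketched as below. This is a minimal illustration of the IDW rule with these parameters, not ArcMap's own code; the function name and the example grid are hypothetical.

```python
import numpy as np

def idw_grid(obs_xy, obs_val, grid_xy, k=5, power=2.0):
    """Inverse distance weighted interpolation from the k nearest observations."""
    obs_xy = np.asarray(obs_xy, dtype=float)
    obs_val = np.asarray(obs_val, dtype=float)
    grid_xy = np.asarray(grid_xy, dtype=float)
    out = np.empty(len(grid_xy))
    for m, p in enumerate(grid_xy):
        dist = np.hypot(*(obs_xy - p).T)   # distances from pixel p to all observations
        nearest = np.argsort(dist)[:k]     # indices of the k nearest observations
        dn = dist[nearest]
        if dn[0] == 0.0:                   # pixel centre coincides with an observation
            out[m] = obs_val[nearest[0]]
            continue
        wts = 1.0 / dn ** power            # inverse squared distance weights
        out[m] = np.sum(wts * obs_val[nearest]) / np.sum(wts)
    return out

# Hypothetical usage: residuals at three trees interpolated to a 3 x 3 grid (km)
trees = [[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]]
resid = [0.05, -0.02, 0.01]
gx, gy = np.meshgrid(np.arange(3.0), np.arange(3.0))
layer = idw_grid(trees, resid, np.column_stack([gx.ravel(), gy.ravel()])).reshape(gx.shape)
print(layer)
```

Normalizing by the sum of the weights makes the result a weighted average, so the interpolated layer never leaves the range of the neighbouring observations.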
In addition, we formed two additional layers as combinations of the former two, namely residuals and the local G_i*-index with and without weighting by the inverse of the variance. We produced segmentations for each of the four layers by changing the number of sub-areas indirectly via the scale parameter, whose value is given after the layer abbreviation in the name of each segmentation in Table 2. This procedure led to several segmentations per layer. In the following list, the abbreviation is in parentheses and the number of segmentations follows the content of the layer:
1) Residuals of the global model (RES), six segmentations
2) G_i*-index (G*), 11 segmentations
3) Residuals and G_i*-index as separate layers (Res;G*), seven segmentations
4) Weighted residuals and G_i*-index, with the layers weighted by the inverse of the layer variances (Weight), six segmentations
Increasing the scale parameter decreased the number of sub-areas. The parameter values were not predetermined, because the effect of the scale parameter was not consistent. Instead, we used the scale parameter analysis in eCognition to determine the values individually for each segmentation. All other decision variables were kept at their defaults: the shape factor was 0.1 and smoothness/compactness 0.5. The former is the weight given to the shape of the delineated sub-area relative to the digital value in the layer. A value of 0.1 means that the delineation depended mainly on the digital value, which suited our purposes, since our homogeneity information was in the layer and we had no interest in constraining the shapes of the sub-areas.
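For the weighted combination (Weight), each layer was weighted by the inverse of its variance. A minimal sketch of computing such weights is shown below, assuming the layers are NumPy arrays in which NaN marks no-data cells; how the weights were entered into eCognition is not described in the text, and the function name is hypothetical.

```python
import numpy as np

def inverse_variance_weights(*layers):
    """Return one weight per raster layer equal to 1 / variance of its cells.

    NaN cells (assumed here to mark no-data) are ignored.
    """
    return np.array([1.0 / np.nanvar(np.asarray(layer, dtype=float)) for layer in layers])

# Hypothetical 2 x 2 layers: residuals (variance 4) and Z-scored G_i* (variance 1)
residual_layer = np.array([[-2.0, 2.0], [2.0, -2.0]])
gi_layer = np.array([[-1.0, 1.0], [1.0, -1.0]])
print(inverse_variance_weights(residual_layer, gi_layer))
```

With such weights, the layer with the larger spread receives the smaller weight in the heterogeneity calculation.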
The second parameter sets the smoothness of the border line against the compactness of the delineated objects. Its value of 0.5 was a compromise; we did not favour either property. We allowed the program to do its initial segmentation of the study area without any limitation, and then localized the model (Eq. 1) to the resulting sub-areas. We excluded sub-areas with fewer than ten observations from the localization, since the regression function had nine variables and a constant, so that with ten observations the original function had no residual degrees of freedom at all. This rule removed 128 of the 2682 sub-areas in the 30 segmentations; the removed sub-areas came from 14 segmentations. The limit could have been higher, since 30 or more observations are usually recommended for fitting a regression. However, raising the limit to 30 would have increased the number of rejected sub-areas to 472, or 18% of all sub-areas. This is why we kept the limit at ten observations.
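The degrees-of-freedom reasoning and the minimum-size rule can be written out as a small sketch. The helper names and the example counts are hypothetical; only the nine variables plus a constant, the limits of 10 and 30, and the counts 472 of 2682 come from the text.

```python
def residual_df(n_obs, n_vars=9):
    """Residual degrees of freedom of a linear model with n_vars predictors plus a constant."""
    return n_obs - (n_vars + 1)

def usable_subareas(counts, min_obs=10):
    """Keep the ids of sub-areas with at least min_obs observations."""
    return [sid for sid, n in counts.items() if n >= min_obs]

counts = {"A": 8, "B": 10, "C": 31}  # hypothetical tree counts per sub-area
print(residual_df(10))               # 0: the full model leaves no residual df
print(usable_subareas(counts))       # ['B', 'C']
print(usable_subareas(counts, 30))   # ['C']
print(round(100 * 472 / 2682))       # 18 (% rejected with a limit of 30)
```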
The global form height model (Eq. 1) was re-fitted with the stepwise selection function stepAIC, which uses the Akaike information criterion (AIC) as the penalty criterion and can step both backward and forward during variable selection. The function is available in the MASS package (Venables and Ripley 2002) of the statistical software R. Each variable in a localized regression model had to be statistically significant at the p = 0.05 level. The success of localization was measured by the root mean squared error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \varepsilon_i^2}{df}}$$

where ε_i is the residual (Eq. 4) of the global or the localized model, respectively, and df is the degrees of freedom. The RMSE can also be expressed as the square root of the sum of the squared bias and the variance σ² (Eq. 7):

$$\mathrm{RMSE} = \sqrt{\mathrm{bias}^2 + \sigma^2} \qquad (7)$$

The RMSE was calculated for each sub-area both before and after localization. The bias was zero for the localized functions, since each model was fitted to the local data available. The bias is the mean of the residuals (Eq. 4):

$$\mathrm{bias} = \frac{1}{n}\sum_{i=1}^{n} \varepsilon_i \qquad (8)$$

The change in RMSE under localization was expressed in two ways, where loc denotes the localized function and glo the global function for sub-area i: the first comparison shows the change relative to the global level, and the second shows the actual percentage change in that sub-area under localization.
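The error measures can be sketched as below. By default the divisor is the number of observations (an assumption; the text uses the degrees of freedom df). With that divisor, the decomposition of Eq. 7 holds exactly when σ² is the population variance of the residuals; with df smaller than n, it holds only approximately. The percentage-change helper is a hypothetical illustration of comparing a localized RMSE with a reference RMSE and is not necessarily the exact form of the study's comparison measures.

```python
import numpy as np

def rmse(resid, df=None):
    """Root mean squared error; the divisor defaults to the number of residuals."""
    e = np.asarray(resid, dtype=float)
    divisor = len(e) if df is None else df
    return float(np.sqrt(np.sum(e ** 2) / divisor))

def bias(resid):
    """Mean of the residuals (Eq. 8)."""
    return float(np.mean(resid))

def pct_change(rmse_loc, rmse_ref):
    """Hypothetical helper: percentage change of a localized RMSE from a reference RMSE."""
    return 100.0 * (rmse_loc - rmse_ref) / rmse_ref

# Hypothetical residuals of the global model in one sub-area (dm3/cm2)
e_glo = np.array([0.10, -0.05, 0.20, 0.00])
# Eq. 7 check with divisor n: RMSE^2 equals bias^2 plus the population variance
print(rmse(e_glo) ** 2, bias(e_glo) ** 2 + np.var(e_glo))
print(pct_change(0.09, 0.10))  # a localized RMSE 10% below the reference
```

A localized model fitted by least squares with a constant has residuals that average to zero, so its RMSE reduces to the standard deviation term alone, consistent with the zero bias noted above.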
One way to assess the success of localization is to examine which spatial variables (the XC- and YC-coordinates and the distance from the coastline, RDIST) were included in the localized function. If spatial variables were included in the localized function, the segmentation failed the criterion of spatial homogeneity, i.e., spatial variation remained in the sub-area.
Besides the segmentation algorithm, the success of localization may also depend on the properties of the local data used for local fitting. The accuracy of the regression model depends in many ways on the design matrix X (see Draper and Smith 1981). Here we assume that the suitability of the data for fitting the model in a sub-area can be measured by the balance of the data and by the range and variance of the variables in the sub-area. By balance, we mean that the distribution of the independent variables (d, BA) is uniform (Table 1a).
To test balance, we calculated Shannon's entropy index:

$$H = -\sum_{k} p_k \ln p_k \qquad (11)$$

where p_k is the proportion of the trees in a particular class k, e.g., a diameter class or a basal area class, relative to the total number of trees in a sub-area. For the whole dataset, Shannon's entropy index was 1.96 for the diameter and 1.91 for the basal area (Table 1b). For uniformly distributed data with exactly the same number of trees divided into as many classes as the present data, the entropy would be 2.56 and 2.69, respectively. We compared the entropy (Eq. 11) with the local RMSEs by calculating correlations between them.
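A minimal sketch of Eq. 11 for one sub-area is given below, assuming the class limits are supplied as histogram bin edges (the actual diameter and basal area classes of Table 1b are not reproduced here). Empty classes are skipped, as in the study; this matches the usual convention that 0·ln 0 is taken as 0. For k equally filled classes the index reaches its maximum, ln k, which for the 13 diameter classes is about 2.565, in line with the value 2.56 given above.

```python
import numpy as np

def shannon_entropy(values, bin_edges):
    """Shannon's entropy index (natural log) of the class distribution of values."""
    counts, _ = np.histogram(values, bins=bin_edges)
    p = counts[counts > 0] / counts.sum()  # proportions of non-empty classes only
    return float(-np.sum(p * np.log(p)))

# Hypothetical check: 10 trees in each of 13 unit-width classes (uniform distribution)
edges = np.arange(14)
uniform = np.repeat(np.arange(13) + 0.5, 10)
print(shannon_entropy(uniform, edges), np.log(13))
```

Because the index depends only on the occupied classes, two sub-areas with very different numbers of empty classes can have similar values, which is the limitation raised in the Discussion.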
To study the effect of balance further, we divided the sub-areas into classes according to their G_i*-index values (Table 3). If the mean G_i*-index of a sub-area plus its standard deviation σ was below zero ($\bar{G}_i^* + \sigma < 0$), the sub-area was classified as negative. If the mean minus σ was above zero ($\bar{G}_i^* - \sigma > 0$), the sub-area was classified as positive. Sub-areas meeting neither criterion were classified as neutral. Within each class, we calculated correlations between the localized RMSEs and Shannon's entropy index for the diameter and other properties of the variables in the sub-areas.
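The classification rule can be sketched as follows. The use of the sample standard deviation (ddof=1) is an assumption, since the text does not state which estimator was used; with ddof=1, a sub-area needs at least two values, otherwise the SD is NaN and the sub-area falls into the neutral class. The function name is hypothetical.

```python
import numpy as np

def classify_subarea(gi_star_values):
    """Classify a sub-area as negative, positive, or neutral from its G_i* values."""
    g = np.asarray(gi_star_values, dtype=float)
    mean, sd = g.mean(), g.std(ddof=1)  # sample SD (assumption)
    if mean + sd < 0:
        return "negative"
    if mean - sd > 0:
        return "positive"
    return "neutral"

print(classify_subarea([-3.0, -2.5, -2.0]))  # negative
print(classify_subarea([2.0, 2.5, 3.0]))     # positive
print(classify_subarea([-1.0, 0.0, 1.0]))    # neutral
```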
To compare the segmentations across the whole area, we calculated an aggregate estimate of the standard error for the area, se_aggr, from the squared RMSEs of the sub-areas, i.e., their mean squared errors (MSEs), weighted by the number of observations:

$$se_{aggr} = \sqrt{\frac{\sum_{i=1}^{m} n_i \,\mathrm{RMSE}_i^2}{\sum_{i=1}^{m} n_i}}$$

where m is the number of sub-areas in a segmentation and n_i is the number of observations in sub-area i.
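The aggregate standard error can be computed as in the sketch below, assuming it is the square root of the observation-weighted mean of the sub-area MSEs, as the description above suggests; the function name and the example numbers are hypothetical.

```python
import numpy as np

def aggregate_se(rmse_by_subarea, n_by_subarea):
    """Square root of the observation-weighted mean of the sub-area MSEs."""
    r = np.asarray(rmse_by_subarea, dtype=float)
    n = np.asarray(n_by_subarea, dtype=float)
    return float(np.sqrt(np.sum(n * r ** 2) / np.sum(n)))

# Hypothetical segmentation with three sub-areas (RMSEs in dm3/cm2)
print(aggregate_se([0.095, 0.110, 0.088], [120, 45, 300]))
```

Weighting by n_i lets large sub-areas count in proportion to their data, so a segmentation cannot lower se_aggr merely by producing a few small sub-areas with very low RMSEs.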

Results
For the localized models, the median and mean RMSEs were 3-4% lower than the global RMSE of 0.1027 dm³/cm² (Eq. 1) (Table 4). The minimum RMSEs were 39-42% of the global model value, whereas the maximum RMSE was one and a half times the global model value (ΔRMSE). The minimum, mean and median RMSEs all increased as the number of sub-areas decreased. For the maximum RMSEs, the effect was the reverse: the RMSE decreased as the number of sub-areas decreased. The mean and median localized RMSEs were also 3-4% lower than the RMSEs in the same sub-areas before localization (ΔRMSE_i), but the range of these values was 9.4 percentage points (-8.5% to 0.9%), whereas the range of ΔRMSE was 14.7 percentage points (-10.9% to 3.8%). In other words, both the greatest benefits and the greatest drawbacks of localization are most likely in small sub-areas.
Attaining spatial homogeneity in the sub-areas was one target. The percentage of localized functions without spatial variables varied from 0% to 75%. At first, the percentage increased with the number of sub-areas, rising above 50% after 40 sub-areas and reaching its maximum of 60% between 100 and 200 sub-areas. Finally, the percentage of functions lacking spatial variables saturated at 55% (Fig. 2a). The number of variables in a localized function decreased slightly as the number of sub-areas increased (Fig. 2b). The localized function had three variables after 50 sub-areas and two variables after 90 sub-areas. The number of spatial variables in all localized functions dropped from a maximum of three to zero after 50 sub-areas (Fig. 2c). Because these are median values, there may be some localized functions at any number of sub-areas that still contain spatial variables (Fig. 2d). Therefore, fully homogeneous sub-areas could not be created with the data layers and segmentation criteria used.
Our hypothesis was that the uniformity of the distribution of the variables (balance) affects the localization, and as a measure of this balance, we calculated Shannon's entropy index for the diameter and basal area distributions in the sub-areas. For these two indices and for the ranges and variances of the diameter and basal area, we calculated the correlations with the localized model RMSE_i,loc in the sub-areas (Table 5a). For the diameter, both Shannon's entropy index and the variance had medium to large positive correlations with RMSE_i,loc; i.e., the variance and the entropy index were large where the RMSEs were large. One would expect both correlations to be negative.
In all three classes of the sub-area classification (Table 3), the entropy index showed a positive correlation of medium strength with the localized RMSE_i,loc (Table 5b). The evenness of the diameter distribution therefore had no positive impact on the localization result in the studied case. Two correlations were large and positive: that of the localized RMSE_i,loc with the global RMSE_i,glo, and that of RMSE_i,loc with the change under localization, ΔRMSE_i (Eq. 9). Classifying the localized models and their variables in the same way into positive, negative and neutral classes reveals differences between the models (Table 6). In neutral sub-areas, where the mean G_i*-index did not differ significantly from zero, the localized models generally had more variables, and the occurrence percentages of all variables (except YC²) were higher than for the models in the two other classes. The largest differences were in the occurrences of the basal area (BA) and diameter (d and d²) variables, which were considerably higher for neutral sub-areas than for the two other classes; i.e., in neutral sub-areas the differences in form height were explained by tree and stand variables more often than in positive or negative sub-areas. In other words, form height was more heterogeneous in the neutral class than in the other two classes. This is a logical result, since by definition a sub-area in the neutral class can be either a cluster of near-zero values or a mixture of varied G_i*-indices whose mean is near zero.
The segmentations were evaluated using the aggregate standard errors. The minimum, se_aggr = 0.0967 dm³/cm², was obtained when the residual layer was segmented with the scale parameter value 14 (RES14) (Table 7), i.e., when the residual layer was split into 73 sub-areas. Altogether 14 segmentations had an se_aggr below 0.1 dm³/cm². All four layers, based on residuals and/or the G_i*-index alone or together, were represented in that group, and these segmentations had from nine to 265 sub-areas. However, four of the five segmentations with the highest aggregate errors were based on the residual layer. The maximum se_aggr, 0.1031 dm³/cm², was obtained when the residual layer was split into 22 sub-areas with the scale parameter value 27 (RES27). The ten segmentations with the highest aggregate errors included segmentations from all four layers, with from six to 62 sub-areas.

Discussion
We segmented the study area into smaller localization sub-areas by different methods. By changing the data layers on which the segmentation was based and, through the region-merging criteria, the number of sub-areas, we obtained 30 different segmentations of the study area with 4 to 366 sub-areas (Table 2). As in earlier research (Räty and Kangas 2008), the G_i*-index proved to be a more efficient area divider than the residuals of the global model. The changes in RMSEs for the G_i*-index segmentations were larger, i.e., the localized RMSEs were smaller (Table 4), and so in general were the aggregate standard errors (Table 7). This suggests that a variable that averages the residuals over a larger neighbourhood, in this case the local G_i*-index, reveals trends hidden in the residuals.
A known deficiency of LISAs is that they cannot identify clusters of values near zero or near the mean value of the variable over the entire layer (Tiefelsdorf and Boots 1997). In this study, the mean of the residuals is zero, since the fitted regression model has no bias. This is a drawback for delineating localization areas, because an index value near zero can indicate either homogeneous or heterogeneous surroundings. The classification of the sub-areas by G_i*-index into three classes, negative, neutral, and positive (Table 3), revealed differences in the RMSEs and in the changes in RMSEs under localization (Table 5b). The "neutral" indices are those with values around zero, which might be expected to reflect the difficulties of LISAs. Firstly, there were about twice as many negative sub-areas, and twice as many positive ones, as neutral sub-areas, and the mean area of the neutral sub-areas was three times that of the others in all segmentations (Fig. 3). This is partly due to the number of sub-areas in the segmentations: when the number decreased, the sub-areas became larger, lost homogeneity and were classified as neutral. There were thus segmentations in which all sub-areas were classified as neutral (Weight36, Weight56, RES47), in which one of the three classes was missing (G*103, G*130), or in which one class was small compared with the others (RES35) (Fig. 4). If these segmentations are excluded from the calculations, the mean area of the neutral sub-areas drops to less than twice that of the others.
Secondly, looking at the sub-areas with a G_i*-index near zero reveals that the correlations between Shannon's entropy index for tree diameter and the localized RMSE_i,loc, and between the localized RMSE_i,loc and the original RMSE_i,glo, are higher in sub-areas classified as either positive or negative (Table 5b) than in the neutral group. The first correlation suggests that balance in the data (i.e., an even distribution of diameters) does not lead to lower RMSEs in the models localized to sub-areas.
The second implies that the localized RMSE_i,loc follows the original global RMSE_i,glo: a sub-area with an originally high RMSE stays high relative to the other localized RMSEs, whereas an originally low RMSE stays low. This is a logical result; a G_i*-index near zero by definition means that there is nothing that could be fixed by re-fitting a model containing exactly the same variables as the original one. The correlation between Shannon's entropy index and the number of distribution classes in the local data was large and positive. A disadvantage of Shannon's entropy index is that the natural logarithm of zero is undefined, so the index cannot be defined for empty distribution classes, and we had to omit the empty classes from the calculations. The original data had 13 diameter classes (Table 1b), but the single sub-areas had from 4 to 12 classes; i.e., none of the sub-areas had trees in all available classes. All Shannon's entropy indices would therefore have been undefined without this constraint. Shannon's entropy index is thus not an appropriate measure of balance; the actual number of classes used in the calculations should somehow be included in the index. Thirdly, the RMSEs alone do not necessarily show the relative importance of the various variables in the data following localization. Earlier in this study, we presented the percentages of the localized models that have no spatial variables and the lengths of the localized models (Fig. 2a, b). Tree and stand variables were included more often than spatial variables in the localized models. A closer look at the variables in the localized models (Table 6) showed that the heterogeneity of form height in the neutral sub-areas was larger than in the positive or negative sub-areas. When the segmentations with mainly neutral sub-areas were removed, the remaining neutral class became similar to the other two classes with respect to form height.
Wulder and Boots (1998) suggested that Getis statistics could be used for image segmentation. Mallinis et al. (2008) used LISA measures and classification and regression trees (CART) in the classification phase after an initial segmentation with eCognition. We have tested these methods separately (Räty and Kangas 2008) (Tables 4, 7, 8). With fewer than 30 sub-areas, the RMSEs were at the same level for both methods, but with more than 30 sub-areas, the maximum RMSEs of CART tended to be higher than those of segmentations of the same size in the current study. At the same time, the minimum and median RMSEs of the current study's sub-areas were lower than those of the CART sub-areas. The methods did not, however, differ in their aggregate standard errors. Compared with CART, the sub-areas created by our current procedure are more variable in shape and size. For an example, see Fig. 4, which compares the sub-areas created by the current method using different input layers but with the same number of sub-areas in each.
In three of the four segmentations, the layer was interpolated from point data of local indicators calculated within a 20-km neighbourhood. Each data point therefore represents properties of its surroundings, and neighbouring indicators are correlated when their neighbourhoods overlap. We use the indicator only for segmentation, to show the changes in values across the layer, and do not use it in any calculations. These two properties are therefore not crucial for this work, but they should be kept in mind. The indicators were calculated only for the variable of interest, i.e., the form height of a tree, or more precisely the residuals of the globally fitted form height model, since we wanted to find sub-areas that are homogeneous with respect to it. The other variables of the model are not taken into account at any step of the segmentation; their values and properties are used for model localization and studied afterwards. The method is not intended to form sub-areas that remain unchanged over time. The segmented sub-areas are strictly valid only for these data, this variable, and this model. However, as the differences between CART and the current study were quite small, the same sub-areas are probably applicable to many other tree-level variables. The method itself, by contrast, can be adopted for other datasets, variables, and models. It would also be possible to use a weighted combination of the RMSEs of several different models as a reference point and thus to consider several models at the same time.
To conclude, our purpose was to use the eCognition Professional 4.0 software to segment the study area using various inputs and segmentation parameters, and to document and analyze the effects of this segmentation on the localized form height regression models. The minimum local RMSEs are promisingly low, implying that the method did identify uniform sub-areas. The maximum local RMSEs are high, however. Methods for improving the estimates are a subject for future study. This could be as simple as adding variables already available in the inventory data (e.g., site index, site fertility, and temperature sum) or digital terrain models to the segmentation method and/or the regression model. In judging the segmentations, we found that segmentations containing a G_i*-index layer had lower aggregate standard errors than those based only on the residuals. When we classified the sub-areas into three classes according to the mean G_i*-index and its variance in the sub-area, we found that sub-areas whose mean G_i*-index differed from zero, i.e., those classified as either positive or negative, differed from the neutral ones, whose mean G_i*-index was near zero. Since the neutral sub-areas were more heterogeneous than the negative and positive sub-areas, one possible application would be to use the method presented in this paper to delineate positive and negative sub-areas and then to localize the models only to these sub-areas.

Fig. 2. The relationship between the number of sub-areas in the segmentations and a) the percentage of localized functions that have no spatial variables, b) the number of variables in a localized function, c) the number of spatial variables in all localized functions, and d) the number of spatial variables in those localized functions. The line in each panel is a locally fitted trend line.

Fig. 3. The mean area of positive (dashed line) and negative (solid line) sub-areas as a percentage of the mean area of neutral sub-areas.

Table 1b. The frequency distributions of diameter and stand basal area for the Scots pine (Pinus sylvestris) sample tree data.

Table 1a. Basic properties of the Scots pine (Pinus sylvestris) sample trees: diameter at breast height (d), diameter at 6 metres height (d_6), height (h), volume (v), basal area (BA), measure proportional to the form height (f), and actual form height (fh).

Table 2. For all segmentations: the number of sub-areas (DIV), the name of the segmentation (SEG), and descriptive statistics of the number of trees per sub-area (Observations).

Table 3. The classification of the sub-areas into three classes, negative, neutral, and positive, according to the signs of the mean of the G_i*-index (Mean) and its standard deviation (SD) in the sub-area.

Table 4. The ratio (in percent) of the localized RMSE to the global model RMSE and to the global model RMSE in the same sub-area.

Table 5a. Correlations between the RMSE of the localized function and the following six measures: Shannon's entropy index (H) for diameter (d) and basal area (BA), and the ranges (Range) and variances (Var) of diameter and basal area in the sub-areas.

Table 5b. Sub-areas classified into three classes, negative, neutral, and positive, according to the G_i*-index. The first part of the table reports the number of sub-areas in each class and the total area they represent. The second part gives the correlations (Cor) between the RMSE of the localized function (RMSE_i,loc) and Shannon's index for diameter (H(d)), the RMSE before localization (RMSE_i,glo), and the change in RMSE under localization (ΔRMSE_i) for each class.

Table 6. Variables in the regression models after localization, with the data divided into three classes, negative, neutral and positive, according to the G_i*-index values and their deviation in the sub-areas. The numbers in the table are the percentages of the localized models in which the variable occurs.

Table 7. The aggregate standard errors (se_aggr) for the segmentations.