Calibrating Predicted Tree Diameter Distributions in Catalonia, Spain

Several probability density functions have been used to describe the diameter distributions of forest stands. When both the stand basal area and the number of stems per hectare are assessed, the fitted or predicted distribution is scaled using only one of these variables, with the result that the distribution often gives incorrect values for the other variable. Using a distribution that provides incorrect values for known characteristics means wasting information. Calibrating the distribution so that it is compatible with the additional information on stand characteristics is a way to avoid such waste. This study examined the effect of calibration on the accuracy of the predicted diameter distributions of the main tree species of Catalonia. The distributions were calibrated with and without considering the prediction errors of the frequencies of diameter classes. When prediction errors were assumed, the calibration was done with and without making allowance for estimation errors in the stand-level calibration variables. Calibrated distributions were more accurate than non-calibrated ones in terms of sums of different powers of diameters. The set of calibration variables that gave the most accurate results included six stand variables: number of trees per hectare, stand basal area, basal-area-weighted mean diameter, non-weighted mean diameter, median diameter, and basal area median diameter. Of the tested three-variable combinations, the best was number of trees per hectare, stand basal area, and basal-area-weighted mean diameter. Means were more useful calibration variables than medians.


Introduction
Catalonian forests are characterized by heterogeneous stands with a large variation in the spatial distribution of trees, tree species composition, number of stems per hectare, diameter distribution, and vertical structure of the stand. Efficient management planning tools for these forests require growth and yield functions that can produce detailed predictions of stand development under different management schedules. Several researchers have recently developed growth and yield models based on an individual-tree approach (e.g. Palahí 2002, Palahí et al. 2003, Palahí and Grau Corbí 2003, Trasobares 2003, Trasobares and Pukkala 2004, Trasobares et al. 2004a, Trasobares et al. 2004b) to address this need in Catalonia. However, when only stand-level inventory data are available, management planning that uses tree-wise growth models requires the diameter distribution of trees to be estimated (Siitonen 1993). This is the prevailing situation, for instance, in Finland, where ocular inventory is used to assess stand-level characteristics. Only a few actual measurements are taken, most of them relascope counts of stand basal area. Mean or median diameter is assessed by subjectively selecting a mean or median tree and measuring its diameter (e.g. Mehtätalo 2004).
Various probability density functions, such as normal, gamma, Johnson's SB, Gram-Charlier, beta and Weibull, have been used to describe diameter distributions of forest stands (e.g. Cajanus 1914, Bailey and Dell 1973, Maltamo et al. 1995, Hafley and Schreuder 1977, Maltamo et al. 2000, Palahí et al. 2006b). Palahí et al. (2006b) compared the beta, Johnson's SB, Weibull and left-truncated Weibull functions for describing the diameter distributions of basal area and number of stems in forest stands of Catalonia. In that study, the left-truncated Weibull function for stand basal area appeared to be the most accurate function.
When both the stand basal area and the number of stems per hectare are assessed, the predicted distribution is often scaled using one of these variables, and the other is only used as an explanatory variable to predict the parameters of the distribution (Kangas and Maltamo 2003). Using a distribution that provides incorrect values for known characteristics means wasting information. Therefore, it seems rational to calibrate the predicted distribution so that it is compatible with any additional information on the stand characteristics (Kangas and Maltamo 2000, 2003).
The aim of this study was to examine how calibration affects the accuracy of the predicted diameter distributions of stands dominated by the main tree species of Catalonia. The method is based on the calibration estimation method of Deville and Särndal (1992), which has already been studied by Kangas and Maltamo (2000, 2003) in Finnish conditions. In the first variant of the calibration procedure (referred to as Method 1), neither the prediction errors of class frequencies nor the estimation errors of stand-level calibration variables were taken into account. Method 2 considered the prediction errors but not the estimation errors in the field-assessed calibration variables, and Method 3 took into account both the prediction and the estimation errors.
Since no models were available for predicting the parameters of the diameter distribution function found to be the best in Catalonia, the left-truncated Weibull function for basal area (Palahí et al. 2006b), new parameter prediction models were developed for the main tree species of Catalonia.

Material
The data were provided by the Spanish National Forest Inventory (ICONA 1993a, 1993b, 1993c, 1993d). They consisted of a systematic sample of permanent plots distributed on a 1-km square grid, with a 10-year re-measurement interval. From the inventory plots over the whole of Catalonia, all plots of the first inventory (1989-1990) with at least 20 trees were used in this study. This resulted in 3284 plots, the main characteristics of which are shown in Table 1. One of the eight most common tree species of Catalonia was dominant in 2787 plots. The plots represented all the different stand structures that can be found in Catalonian forests. Most stands were naturally regenerated.
The sampling method of the national forest inventory uses concentric circular plots in which the plot radius depends on the tree's diameter at breast height (dbh, 1.3 m): a 5 m radius is used for trees with dbh between 7.5 and 12.49 cm; 10 m for 12.5-22.49 cm; 15 m for 22.5-42.49 cm; and 25 m for dbh greater than or equal to 42.5 cm. Because of the variable-radius concentric plots, the number of stems and the basal area per hectare represented by the tree were calculated for every measured tree. The fitting of the diameter distributions used these frequencies rather than assuming an equal frequency for every measured tree.
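The per-tree expansion described above can be sketched as follows. This is a minimal illustration, not the inventory's own software: the function names are hypothetical, the radii are the ones listed in the text, and the quadratic mean diameter is computed as sqrt(40000 G/(πN)), the standard conversion when G is in m² ha⁻¹ and dbh in cm.

```python
import math

# Minimum dbh (cm) of each concentric ring and the plot radius (m) used for it,
# as listed in the text; checked from the largest threshold downwards.
RING_RADII = [(42.5, 25.0), (22.5, 15.0), (12.5, 10.0), (7.5, 5.0)]


def plot_radius(dbh):
    """Radius (m) of the concentric plot in which a tree of this dbh (cm) is tallied."""
    for min_dbh, radius in RING_RADII:
        if dbh >= min_dbh:
            return radius
    return None  # trees below 7.5 cm were not measured


def tree_expansion(dbh):
    """Trees/ha and basal area (m2/ha) represented by one measured tree."""
    radius = plot_radius(dbh)
    if radius is None:
        return 0.0, 0.0
    n_ha = 10000.0 / (math.pi * radius ** 2)      # 1 ha = 10 000 m2
    g_ha = n_ha * math.pi * (dbh / 200.0) ** 2   # dbh in cm -> tree radius in m
    return n_ha, g_ha


def plot_summary(dbhs):
    """Stand-level N (trees/ha), G (m2/ha) and quadratic mean diameter Dq (cm)."""
    n_total = g_total = 0.0
    for dbh in dbhs:
        n_ha, g_ha = tree_expansion(dbh)
        n_total += n_ha
        g_total += g_ha
    dq = math.sqrt(40000.0 * g_total / (math.pi * n_total))
    return n_total, g_total, dq


if __name__ == "__main__":
    # Hypothetical plot with seven measured trees.
    print(plot_summary([8.2, 11.0, 14.5, 19.8, 26.0, 33.1, 47.5]))
```

Because each ring is a full circle of its own radius, a 10-cm tree tallied on the 5-m plot stands for about 127 trees per hectare, whereas a 45-cm tree on the 25-m plot stands for about 5.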

Fitting the Diameter Distributions
The probability density function of the left-truncated Weibull distribution is (Zutter et al. 1986)
$$f(d) = \frac{c}{b}\left(\frac{d}{b}\right)^{c-1}\exp\left[-\left(\left(\frac{d}{b}\right)^{c}-\left(\frac{t}{b}\right)^{c}\right)\right], \qquad d \ge t \tag{1}$$
where t is the truncation diameter, d is dbh, and b and c are parameters. The truncation diameter t was taken as 7.5 cm, which is the smallest diameter measured on the plots used as study material. Parameters b and c were estimated by maximizing the following log-likelihood function:
$$\ln L = \sum_{i=1}^{n}\frac{g_i}{G}\,\ln f(d_i) \tag{2}$$
where L is the likelihood function, n is the number of trees on the plot, $g_i$ is the basal area represented by tree i (m² ha⁻¹) and G is the total basal area of trees (m² ha⁻¹). The IMSL library routine DBCONF (IMSL… 1997) was employed to maximize the logarithm of the likelihood function.

Modelling the Parameters of Distributions
The estimated parameters were regressed using mean stand characteristics as explanatory variables. Species-specific models were fitted for the eight most common species (Table 1). In addition, a general model for all species was fitted; it was used when none of the eight major species was dominant on the plot (i.e. accounted for at least 50% of stand basal area). Because the modelling data only seldom included stand age, and since the goal was to develop models that could be applied in stands of any age structure, variables such as site index or stand age were not used as explanatory variables. Instead, after graphical analyses, stand basal area (G), number of trees per hectare (N) and their transformations and combinations (such as the quadratic mean diameter, $D_q = \sqrt{40000\,G/(\pi N)}$), together with geo-topographical variables available for the plots (elevation, slope, aspect, etc.), were used as potential predictors. Of the latter group, only elevation was a significant predictor. The models were fitted by linear regression using ordinary least squares (OLS). All predictors had to be significant at the 0.05 level and the residuals had to indicate an unbiased model.

Calibrating the Diameter Distribution
The effect of calibration was tested with the predicted diameter distributions. Parameters b and c of the truncated Weibull distribution for stand basal area were predicted from N, G and elevation using the models developed in this study. The same 3284 plots that were used to fit the parameter prediction models were used to test the calibration alternatives. It was assumed that the empirical distributions of the plots represent true diameter distributions of Catalonian stands: although the individual distributions contain sampling error, all of them are possible true population values in Catalonian forests. This assumption is justified by the high variability of stand structure caused by partial fire damage, irregular cutting systems, multiple tree species and variable growing conditions.
Once the parameters of the distribution function had been predicted, the function was used to calculate the frequencies of trees in different diameter classes. The lower limit of the first diameter class was equal to the truncation diameter (7.5 cm). As the Weibull distribution has no upper limit, the largest diameter class was taken to be the one beyond which the predicted class frequency falls below 0.1 trees per hectare. The frequencies obtained from the truncated Weibull function represented the basal areas of the diameter classes. They were converted into numbers of trees by dividing each class frequency by the basal area of the class mid-point tree.
The class frequencies obtained in this way were calibrated using different combinations of the calibration variables N, G, $D_g$, D, $D_{gM}$ and $D_M$ (Tables 5, 6 and 7). When N or G alone was used, the class frequencies were only scaled so that the total number of trees or the total basal area of the classes equalled N or G, respectively. When N and G were used together with Method 1, the class frequencies were calibrated by solving the following optimization problem with the simplex method (see Deville and Särndal 1992, Kangas and Maltamo 2000):
$$\min \sum_{i=1}^{I}\left(s_i^{+}+s_i^{-}\right) \tag{3}$$
subject to
$$w_i = f_i + s_i^{+} - s_i^{-}, \qquad i = 1,\dots,I$$
$$\sum_{i=1}^{I} w_i = N$$
$$\sum_{i=1}^{I} g_i w_i = G$$
$$w_i,\ s_i^{+},\ s_i^{-} \ge 0, \qquad i = 1,\dots,I$$
where $s_i^+$ and $s_i^-$ measure how much the calibrated frequency of diameter class i ($w_i$) exceeds ($s_i^+$) or falls short of ($s_i^-$) the non-calibrated frequency ($f_i$); I is the number of diameter classes; and $g_i$, $w_i$, $f_i$ and $d_i$ are, respectively, the tree basal area (m²), calibrated frequency, non-calibrated frequency and mid-point diameter (cm) of diameter class i. One-cm diameter classes were used with Method 1.
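The non-calibrated class frequencies $f_i$ in this problem come from the predicted left-truncated Weibull distribution. The following sketch uses hypothetical parameter values (b = 20, c = 2.5, G = 25 m² ha⁻¹) rather than the species models of Tables 3 and 4. It evaluates the density and its cumulative form, splits the basal-area distribution into 1-cm classes starting at t = 7.5 cm, applies the 0.1 trees/ha cut-off, and converts class basal areas into trees per hectare with the mid-point tree. It also maximizes the basal-area-weighted log-likelihood by a crude grid search, which merely stands in for the IMSL routine DBCONF; all function names are illustrative.

```python
import math

T = 7.5  # truncation diameter t (cm)


def tw_pdf(d, b, c, t=T):
    """Left-truncated Weibull density, defined for d >= t."""
    if d < t:
        return 0.0
    return (c / b) * (d / b) ** (c - 1) * math.exp(-((d / b) ** c - (t / b) ** c))


def tw_cdf(d, b, c, t=T):
    """Cumulative distribution of the left-truncated Weibull (0 at d = t)."""
    if d <= t:
        return 0.0
    return 1.0 - math.exp(-((d / b) ** c - (t / b) ** c))


def weighted_loglik(dbhs, weights, b, c):
    """Sum of (g_i / G) * ln f(d_i): trees weighted by the basal area they represent."""
    total = sum(weights)
    return sum(w / total * math.log(tw_pdf(d, b, c)) for d, w in zip(dbhs, weights))


def fit_grid(dbhs, weights, b_grid, c_grid):
    """Crude grid-search maximum likelihood (stand-in for a proper optimiser)."""
    return max(((b, c) for b in b_grid for c in c_grid),
               key=lambda bc: weighted_loglik(dbhs, weights, bc[0], bc[1]))


def class_frequencies(b, c, G, t=T, width=1.0, min_trees=0.1, max_classes=500):
    """Class mid-points (cm), basal areas (m2/ha) and trees/ha, scaled with G."""
    mids, basal, trees = [], [], []
    lo = t
    for _ in range(max_classes):
        hi, mid = lo + width, lo + width / 2.0
        ba = G * (tw_cdf(hi, b, c, t) - tw_cdf(lo, b, c, t))
        n = ba / (math.pi * (mid / 200.0) ** 2)  # divide by basal area of mid-point tree
        # 0.1 trees/ha cut-off; applying it only in the upper tail (cdf > 0.5)
        # is an assumption of this sketch, not a rule stated in the text.
        if n < min_trees and tw_cdf(lo, b, c, t) > 0.5:
            break
        mids.append(mid)
        basal.append(ba)
        trees.append(n)
        lo = hi
    return mids, basal, trees


def scale_to_n(trees, N):
    """Alternative scaling: rescale class frequencies so that they sum to N."""
    total = sum(trees)
    return [x * N / total for x in trees]


if __name__ == "__main__":
    mids, basal, trees = class_frequencies(b=20.0, c=2.5, G=25.0)  # hypothetical values
    b_hat, c_hat = fit_grid(mids, basal, [x / 2 for x in range(20, 61)],
                            [x / 10 for x in range(10, 41)])
    print(len(mids), round(sum(trees)), b_hat, c_hat)
```

In the example, the class mid-points and class basal areas play the role of individual trees and their $g_i$, so the grid search lands close to the values that generated the classes.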
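The calibration adjusted the frequencies of the diameter classes so that the total number of trees per hectare (N) and the stand basal area (G), when calculated from the mid-point trees of the diameter classes, agreed with the measured values of these characteristics. When mean or median diameters were used as additional calibration variables, the corresponding constraints were added to the problem formulation (Kangas and Maltamo 2000, Pukkala and Miina 2005):
$$\sum_{i=1}^{I} g_i w_i d_i = G\,D_g$$
$$\sum_{i=1}^{I} w_i d_i = N\,D$$
$$\sum_{i:\,d_i < D_{gM}} g_i w_i = G/2$$
$$\sum_{i:\,d_i < D_M} w_i = N/2$$
where $D_g$ is the basal-area-weighted mean diameter, D the arithmetic mean diameter, $D_{gM}$ the basal area median diameter and $D_M$ the median diameter (cm). In the last problem, where N, G, $D_g$, D, $D_{gM}$ and $D_M$ were all used to calibrate the distribution, all the constraints were simultaneously included in the problem formulation.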
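The constraints compare stand variables computed from the class frequencies with the field-assessed values. A minimal sketch of those computations follows. It takes class mid-points and frequencies as input, uses the basal area of the mid-point tree as in the calibration problem, and returns class-level medians without interpolation, which is a simplifying assumption rather than the authors' definition; the function and key names are hypothetical.

```python
import math


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half of the total
    (class-level median without interpolation, a simplifying assumption)."""
    half, cum = sum(weights) / 2.0, 0.0
    for v, w in sorted(zip(values, weights)):
        cum += w
        if cum >= half:
            return v
    return max(values)


def stand_variables(d, w):
    """Stand variables from class mid-points d (cm) and frequencies w (trees/ha)."""
    g = [math.pi * (di / 200.0) ** 2 for di in d]  # basal area of mid-point tree (m2)
    gw = [gi * wi for gi, wi in zip(g, w)]          # basal area per class (m2/ha)
    N, G = sum(w), sum(gw)
    return {
        "N": N,
        "G": G,
        "D": sum(wi * di for wi, di in zip(w, d)) / N,   # arithmetic mean diameter
        "Dg": sum(x * di for x, di in zip(gw, d)) / G,   # basal-area-weighted mean
        "DM": weighted_median(d, w),                     # median diameter
        "DgM": weighted_median(d, gw),                   # basal-area median diameter
    }


if __name__ == "__main__":
    # Hypothetical calibrated frequencies for 5-cm classes.
    print(stand_variables([10.0, 15.0, 20.0, 25.0, 30.0], [280.0, 190.0, 110.0, 50.0, 20.0]))
```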
When Method 2 was used in calibration, the prediction errors of class frequencies were taken into account as proposed by Mehtätalo (2004). The objective function was now:
$$\min \sum_{i=1}^{I}\frac{s_i^{+}+s_i^{-}}{\sigma_i} \tag{11}$$
where $\sigma_i$ is the standard deviation of the prediction error of tree frequency in diameter class i.
The prediction errors were calculated for 5-cm diameter classes using the whole study material of 3284 plots (Fig. 1A). Therefore, Method 2 also used 5-cm diameter classes in calibration. The prediction errors were used to calculate the relative RMSEs (root mean square errors) of the class frequencies (Fig. 1B), from which the standard deviations ($\sigma_i$) in Eq. 11 were derived (predicted frequency × relative RMSE). The constraints in Method 2 were the same as in Method 1. Method 3 was otherwise similar to Method 2, except that the assumed estimation errors in the calibration variables were also taken into account. In the absence of information from Spain, Finnish studies were consulted (Table 2 in Haara 2002) and the following relative standard errors were assumed: 30% for N, 20% for G and 15% for all means and medians. Method 3 is the way in which calibration estimation should be used in practical calculations based on ocular stand inventory and predicted diameter distributions. It was therefore made more realistic also in terms of the prediction errors of class frequencies: these were halved from the values in Fig. 1B, because those values are in reality only partly due to prediction errors, the other part being sampling error. Method 3 was used only when there was more than one calibration variable. As preliminary tests suggested that medians are less useful calibration variables than means, medians were not used with Method 3.
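When N and G are used as calibration variables with Method 3, the objective function and the constraints for N and G become as follows (the other constraints are the same as before):
$$\min \sum_{i=1}^{I}\frac{s_i^{+}+s_i^{-}}{\sigma_i}+\frac{N^{+}+N^{-}}{\sigma_N}+\frac{G^{+}+G^{-}}{\sigma_G}$$
$$\sum_{i=1}^{I} w_i = N + N^{+} - N^{-}$$
$$\sum_{i=1}^{I} g_i w_i = G + G^{+} - G^{-}$$
where $N^+$, $N^-$, $G^+$ and $G^-$ measure how much the number of trees or the stand basal area exceeds ($N^+$, $G^+$) or falls short of ($N^-$, $G^-$) the field-assessed value (N, G), and $\sigma_N$ and $\sigma_G$ are the assumed standard errors of N and G. The other calibration problems were formulated in the corresponding way.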
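The three calibration variants can be sketched as one linear program. The block below is a self-contained illustration, not the authors' implementation: it uses a small two-phase simplex written in plain Python instead of a library solver, supports only N, G, $D_g$ and D as calibration variables, and its key names, example frequencies and 40% relative RMSE are hypothetical. Passing per-class standard deviations weights the deviations by $1/\sigma_i$ (Method 2); passing relative standard errors as well turns each calibration equation into a goal with its own positive and negative deviation variables (Method 3). For the $D_g$ equation, the standard error is approximated by applying the relative error of the mean to $G D_g$, an assumption of this sketch.

```python
import math


def simplex_min(c, A, b, eps=1e-9):
    """Two-phase tableau simplex (Bland's rule) for: min c.x  s.t.  A x = b, x >= 0.
    Returns x, or None if the problem is infeasible or unbounded."""
    m, n = len(A), len(c)
    T, basis = [], []
    for r in range(m):
        s = -1.0 if b[r] < 0 else 1.0  # keep right-hand sides non-negative
        art = [1.0 if k == r else 0.0 for k in range(m)]
        T.append([s * v for v in A[r]] + art + [s * b[r]])
        basis.append(n + r)  # start from an all-artificial basis

    def pivot(r, k):
        p = T[r][k]
        T[r] = [v / p for v in T[r]]
        for i in range(m):
            if i != r and T[i][k] != 0.0:
                f = T[i][k]
                T[i] = [a - f * q for a, q in zip(T[i], T[r])]
        basis[r] = k

    def run(cost, cols):
        while True:
            # Bland's rule: the first column with a negative reduced cost enters.
            enter = next((j for j in cols if cost[j] - sum(
                cost[basis[i]] * T[i][j] for i in range(m)) < -eps), None)
            if enter is None:
                return True
            rows = [i for i in range(m) if T[i][enter] > eps]
            if not rows:
                return False  # unbounded
            pivot(min(rows, key=lambda i: (T[i][-1] / T[i][enter], basis[i])), enter)

    run([0.0] * n + [1.0] * m, range(n + m))  # phase 1: minimise the artificials
    if sum(T[i][-1] for i in range(m) if basis[i] >= n) > 1e-7:
        return None  # infeasible: the targets cannot all be met
    for i in range(m):  # pivot zero-level artificials out of the basis if possible
        if basis[i] >= n:
            k = next((j for j in range(n) if abs(T[i][j]) > eps), None)
            if k is not None:
                pivot(i, k)
    if not run(list(c) + [0.0] * m, range(n)):  # phase 2: the real objective
        return None
    x = [0.0] * n
    for i in range(m):
        if basis[i] < n:
            x[basis[i]] = T[i][-1]
    return x


def calibrate(f, d, targets, sigma=None, rel_se=None):
    """Calibrate class frequencies f (trees/ha) with mid-points d (cm). targets uses the
    hypothetical keys 'N', 'G', 'Dg', 'D' ('Dg' needs 'G', 'D' needs 'N'). sigma=None is
    Method 1, per-class sigma is Method 2, sigma plus rel_se (soft targets) is Method 3."""
    I = len(f)
    g = [math.pi * (x / 200.0) ** 2 for x in d]  # basal area of mid-point tree (m2)
    lhs = {"N": [1.0] * I, "G": g, "Dg": [gi * di for gi, di in zip(g, d)], "D": list(d)}
    rhs = {"N": targets.get("N"), "G": targets.get("G"),
           "Dg": targets.get("G", 0.0) * targets.get("Dg", 0.0),
           "D": targets.get("N", 0.0) * targets.get("D", 0.0)}
    keys = [k for k in ("N", "G", "Dg", "D") if k in targets]
    nvar = 3 * I + (2 * len(keys) if rel_se else 0)
    dev_cost = [1.0 / max(s, 1e-6) for s in sigma] if sigma else [1.0] * I
    c = [0.0] * I + dev_cost + dev_cost + [0.0] * (nvar - 3 * I)
    A, b = [], []
    for i in range(I):  # w_i - s+_i + s-_i = f_i
        row = [0.0] * nvar
        row[i], row[I + i], row[2 * I + i] = 1.0, -1.0, 1.0
        A.append(row)
        b.append(f[i])
    for j, k in enumerate(keys):
        row = lhs[k] + [0.0] * (nvar - I)
        if rel_se:  # sum(lhs * w) - e+ + e- = target; deviations cost 1/sigma_k
            col = 3 * I + 2 * j
            row[col], row[col + 1] = -1.0, 1.0
            c[col] = c[col + 1] = 1.0 / max(rel_se[k] * abs(rhs[k]), 1e-6)
        A.append(row)
        b.append(rhs[k])
    x = simplex_min(c, A, b)
    if x is None:
        return None
    return x[:I], sum(cj * xj for cj, xj in zip(c, x))


if __name__ == "__main__":
    d = [10.0, 15.0, 20.0, 25.0, 30.0]            # 5-cm class mid-points (cm)
    f = [300.0, 200.0, 120.0, 60.0, 20.0]         # hypothetical predicted trees/ha
    field = {"N": 650.0, "G": 15.0, "Dg": 20.0}   # hypothetical field assessment
    sig = [0.4 * fi for fi in f]                  # hypothetical 40 % relative RMSE
    print(calibrate(f, d, field))                 # Method 1
    print(calibrate(f, d, field, sigma=sig))      # Method 2
    print(calibrate(f, d, field, [s / 2 for s in sig], {"N": 0.3, "G": 0.2, "Dg": 0.15}))
```

`calibrate` returns `None` when the targets cannot be met exactly, which mirrors the infeasible Method 1 and Method 2 problems reported in the next section. With soft targets (Method 3) the problem is always feasible, because the deviation variables can absorb any mismatch between the predicted distribution and the field values.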

Testing the Alternative Calibration Methods
Some of the goal programming problems that were formulated to calibrate the diameter distributions were infeasible. When Method 1 was used, the problem with all six calibration variables could not be solved in about 0.5% of the plots, and the problem with N, G and $D_g$ was infeasible in about 10 plots. Plots with at least one infeasible calibration problem were not used in the comparisons of calibration variables. With Method 2, which used 5-cm diameter classes, the number of adjustable frequencies was small. Therefore, in about 20% of the plots at least one three-variable problem was infeasible, and as many as 85% of the problems were infeasible when there were six calibration variables. The plots with no infeasible three-variable problems were used to compare combinations of one, two or three calibration variables, and the six-variable comparison used those plots that could be calibrated for all six variables. All problems were feasible with Method 3, so the comparisons were based on the whole material of 3284 plots. The calibration methods were evaluated in the same way as in Maltamo (1997), by calculating the relative biases and RMSEs (square root of mean squared error) of various diameter sums:
$$\mathrm{bias}_c\,\% = 100\cdot\frac{\frac{1}{N}\sum_{j=1}^{N}\left(D_j^{c}-\hat{D}_j^{c}\right)}{\frac{1}{N}\sum_{j=1}^{N} D_j^{c}}$$
$$\mathrm{RMSE}_c\,\% = 100\cdot\frac{\sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(D_j^{c}-\hat{D}_j^{c}\right)^{2}}}{\frac{1}{N}\sum_{j=1}^{N} D_j^{c}}$$
where c is the power to which the diameter is raised, $D_j^c$ is the empirical diameter sum of plot j with power c, $\hat{D}_j^c$ is the corresponding diameter sum calculated from the predicted distribution, and N is the number of plots. The empirical diameter sums for plot j were calculated from
$$D_j^{c} = \sum_{i=1}^{n_j} n_{ij}\, d_{ij}^{c}$$
where $n_j$ is the number of trees measured on plot j, $n_{ij}$ is the number of trees per hectare represented by tree i, and $d_{ij}$ is the diameter of tree i on plot j. Power c varied from zero to four; power zero yields the total number of trees per hectare.
The second, third and fourth powers of diameter approximate the stand basal area, stand volume and the economic value of a stand, respectively (Maltamo et al. 1995). To avoid discrepancies caused by other sources of error (e.g. height and volume models), diameter sums were used as the comparison criteria instead of tree volumes (Maltamo et al. 1995).
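The evaluation criteria can be computed as below. This sketch assumes errors defined as empirical minus predicted, so a positive bias means underestimation; that sign convention is adopted here rather than confirmed by the source, and the plot data in the example are hypothetical.

```python
import math


def diameter_sum(n_trees, diameters, power):
    """Sum of n_i * d_i**power (trees/ha times dbh in cm); power 0 gives N."""
    return sum(n * d ** power for n, d in zip(n_trees, diameters))


def relative_bias_rmse(observed, predicted):
    """Relative bias and RMSE (%) over plots, with errors = observed - predicted."""
    k = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / k
    bias = sum(errors) / k
    rmse = math.sqrt(sum(e * e for e in errors) / k)
    return 100.0 * bias / mean_obs, 100.0 * rmse / mean_obs


if __name__ == "__main__":
    # Two hypothetical plots: measured (trees/ha, dbh) and calibrated classes (w_i, d_i).
    plots_obs = [([127.3, 31.8, 14.1], [9.0, 16.0, 27.0]),
                 ([31.8, 31.8, 5.1], [14.0, 21.0, 44.0])]
    plots_pred = [([120.0, 40.0, 12.0], [10.0, 15.0, 25.0]),
                  ([35.0, 30.0, 5.0], [15.0, 20.0, 45.0])]
    for c in range(5):  # powers 0..4
        obs = [diameter_sum(n, dd, c) for n, dd in plots_obs]
        pred = [diameter_sum(w, dd, c) for w, dd in plots_pred]
        print(c, relative_bias_rmse(obs, pred))
```

Multiplying the c = 2 sum by π/40000 converts it to basal area in m² ha⁻¹, which is why the second power approximates stand basal area.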

Parameter Prediction Models
The coefficients of the models for predicting the parameters of the two-parameter truncated Weibull function for the main forest tree species in Catalonia were significant (Tables 3 and 4). The t-values of all parameter estimates were greater than two. N, G, $D_q$ and transformations of these variables were the main predictors in the parameter prediction models. Elevation (Ele) or squared elevation (Ele²) was a significant predictor in the common model and in the models for Pinus sylvestris, P. uncinata and Quercus ilex (Tables 3 and 4).
The model efficiency (R²), bias, and the absolute and relative RMSE of the prediction models for parameters b and c are shown in Table 2. The model efficiency (R²) and precision (RMSE%) were much lower for the models of parameter c than for the models of parameter b. No serious bias was found in any of the parameter prediction models. The precision (RMSE%) of the model for b ranged from 14% in Pinus halepensis to 71% in Abies alba. The precision of the model for c was highest in Quercus suber (27%) and lowest in Q. ilex (46%).

Effect of Calibration on the Predicted Distributions
Clear differences in the performance of the calibration alternatives were found (Tables 5, 6 and 7). The total ranks for bias and precision summarize the overall accuracy of the tested combinations of calibration variables in terms of the diameter sums used as comparison criteria. Only small differences were found in the ranking of the sets of calibration variables between Methods 1 and 2 (Tables 5 and 6), and Method 3 also resulted in a very similar ranking of the tested combinations (Table 7).
The least accurate results were obtained when only N or G was used to scale the predicted diameter distributions. The combination that gave the most accurate results was the one that included the most stand variables: N, G, $D_g$, D, $D_{gM}$ and $D_M$ with Methods 1 and 2, and N, G, $D_g$ and D with Method 3. This was followed in the ranking by the combination N, G and $D_g$. The third position in overall accuracy was held by N, G and D. The non-weighted mean diameter (D) was almost as useful a calibration variable as the basal-area-weighted mean ($D_g$) in terms of rank sum (Tables 5 and 6), and sometimes even better (Table 7). However, when the sums of diameters raised to the third or fourth power were examined, $D_g$ was clearly superior to D. Using a median diameter ($D_M$ or $D_{gM}$) as a calibration variable did not much improve the accuracy of the predicted diameter distributions compared with using only G and N. Accurate results for both the stand basal area and the total number of trees required that both N and G were used in calibration. Using the non-weighted mean diameter (D) as a third calibration variable removed all errors from the sums of first powers of diameter, even with Method 3, in which exact agreement was not required. The use of $D_g$ with N and G enabled accurate results for the sum of the third powers of diameter. Accurate estimation of the sum of fourth powers required the use of all six (or, with Method 3, four) calibration variables, but the combination N, G and $D_g$ was also quite good.
Table 5. Relative biases and RMSEs of different diameter sums that measure the difference between empirical and calibrated predicted distributions when the prediction errors of class frequencies are not considered (Method 1). $D_g$ is the basal-area-weighted mean diameter (cm), D is mean diameter (cm), $D_{gM}$ is the basal area median diameter (cm), $D_M$ is the median diameter (cm), $n_i$ is the frequency (number of trees per hectare) and $d_i$ the mid-point diameter of class i, and R stands for rank.
Table 6. Relative biases and RMSEs of different diameter sums that measure the difference between empirical and calibrated distributions when the prediction errors of class frequencies are taken into account (Method 2). $D_g$ is the basal-area-weighted mean diameter (cm), D is mean diameter (cm), $D_{gM}$ is the basal area median diameter (cm), $D_M$ is the median diameter (cm), $n_i$ is the frequency (number of trees per hectare) and $d_i$ the mid-point diameter of class i, and R stands for rank.
Examples of the ability of different sets of calibration variables to improve the predicted diameter distributions of six plots with Method 1 are shown in Fig. 2. There were clear differences between the best set (six stand variables) and using G as a scaling variable. With six calibration variables (alternative 'All' in Fig. 2), unimodal, bimodal, descending and irregular distributions could be accurately described.
Fig. 3 shows examples of the effect of calibration method (1-3) on the resulting distribution.Method 3, in which there are no strict constraints for N, G and mean diameters, produces smooth distributions that do not deviate much from the predicted non-calibrated distribution.

Discussion
This study developed parameter prediction models for the truncated Weibull function for the diameter distribution of stand basal area of the main forest tree species of Catalonia. It then examined how calibration affected the accuracy of the predicted distributions. The modelling data used in the study reflected the complexity and heterogeneity of Catalonian forests. The data encompassed plots of regular and irregular stand structures with unimodal, decreasing, uniform and even multimodal size distributions of trees (see Figs. 2 and 3). In these types of stands, plots with the same basal area and number of stems may have quite different diameter distributions. However, since the sampling method was not specifically designed for developing models and estimating diameter distributions, the sample has some limitations. One limitation was that small trees (< 7.5 cm) were not measured individually, with the consequence that all our results concern the functions' ability to describe the distribution of trees larger than 7.49 cm in diameter. In addition, the plots were rather small and most of them had too few trees to reliably characterize the diameter distribution of the whole stand. Because of this, only plots with at least 20 measured trees were used. This is reflected in the accuracy of the parameter prediction models.
The idea behind the comparison criteria was to study the performance of different calibration alternatives in estimating and predicting variables that correlate with the number of stems, mean diameter, stand basal area, stand volume and the economic value of the stand, while avoiding discrepancies caused by other sources of error (e.g. height and volume models). Therefore, diameter sums were used instead of, for instance, stand volume or the stumpage value of trees.
The sum of the absolute deviations from the non-calibrated frequencies (Eq. 3) was used as the distance measure to be minimised subject to the calibration equations.Kangas and Maltamo (2000) tested several distance functions and obtained the most accurate results for the function that was used in this study.However, Kangas and Maltamo (2003) concluded that further studies are required to analyse the performance of different distance functions.The problem of the objective function employed in this study is that neighbouring diameter classes can have very unequal calibrated frequencies.The peaks and falls of the true diameter distributions would most probably distribute over more diameter classes than the solutions of our calibration problems suggest.However, the tendency of goal programming to concentrate the changes in tree frequencies on too few diameter classes may have a rather small influence on the calculation results that are based on calibrated distributions.The problem of very unequal frequencies of neighbouring classes is much smaller with calibration Method 3, which allows flexibility in the values of stand variables computed from the calibrated distribution.
Calibrating the diameter distributions proved to be an efficient way of using all available information and generating more accurate diameter distributions in terms of sums of different powers of diameter. These results are in accordance with the study of Kangas and Maltamo (2000), who found that using G, N and $D_{gM}$ to calibrate the distribution produced more accurate results than using only G and N. In our study, the use of medians as additional calibration variables also improved the distribution, but the combination G, N and $D_{gM}$ performed clearly worse than G and N with a mean diameter (D or $D_g$). The order of the means and medians, when one of them was used together with G and N, was (from best to worst): $D_g$ > D > $D_{gM}$ and $D_M$. This suggests that it would be more useful to measure means rather than medians in the field survey. The advantage of medians is that they can be measured more easily than means in circular ($D_M$) or relascope ($D_{gM}$) plots.
In our study the RMSE% values for the sum of the third powers of diameter were higher than the RMSE% values in Kangas and Maltamo (2000) for volume when the same calibration equations were used.This might be due to the complex structures (irregular, bimodal, descending) of the stands used as the data of this study.However, at the same time the structural complexity justifies even more the use of calibration to improve the prediction of the diameter distributions of Catalonian forest stands.
Calibrating the diameter distribution is an efficient way of using additional information in the calculation of forest inventory results. One advantage of calibration estimation is that the same variables need not be known for every stand (Kangas and Maltamo 2000). Calibration estimation enables the use of any combination of stand variables collected in the stand inventory. The minimum requirement is that the variables needed to predict the parameters of the diameter distribution are assessed (G and N for the parameter prediction models developed in this study).
In Catalonia, forest inventory is the most expensive task of forest management planning.Usually, many circular, concentric or relascope plots are placed within the compartment, and the diameters of individual trees of the plots are measured and recorded.This study shows that cheaper forest inventory methods could also be used with only G and N measured in all stands, plus mean diameters in heterogeneous stands or in places where more accurate information is required.

Fig. 1. Root mean squared error (RMSE) of the predicted frequencies of 5-cm diameter classes in the four provinces of Catalonia (A) and the relative RMSE (RMSE/mean prediction) in the whole study material (B).

Table 4. Regression models for the shape (c) parameter of the truncated Weibull distribution. G is stand basal area (m² ha⁻¹), $D_q$ is quadratic mean diameter (cm), and Ele is elevation (m a.s.l.) of the plot.

Fig. 2. Examples of measured, predicted (scaled with G), calibrated for N, G and $D_g$, and calibrated for six variables (All) distributions for six plots of the study material when Method 1 is used in calibration (no prediction error in class frequencies and no estimation error in calibration variables). 'Frequency' is the number of trees per hectare of the 5-cm diameter class.

Fig. 3. Examples of measured, predicted (scaled with G), and calibrated (with N, G, $D_g$ and D) distributions for six plots of the study material when three different methods are used in calibration. Method 1 assumes no prediction errors in class frequencies and no estimation errors in calibration variables. Method 2 assumes prediction errors in class frequencies but no estimation errors in calibration variables. Method 3 assumes prediction errors in class frequencies and estimation errors in calibration variables.

Table 1. Mean, standard deviation (S.D.) and range of some characteristics of the study plots. N is number of trees (ha⁻¹), G is stand basal area (m² ha⁻¹), $D_q$ is quadratic mean diameter (cm), Ele is elevation (m a.s.l.), and b and c are Weibull parameters.

Table 2. Coefficient of determination (R²), and absolute and relative RMSE for prediction models of parameters b and c of the truncated Weibull distribution.

Table 3. Regression models for the scale (b) parameter of the truncated Weibull distribution. G is stand basal area (m² ha⁻¹), $D_q$ is quadratic mean diameter (cm), and Ele is elevation (m a.s.l.) of the plot.

Table 7. Relative biases and RMSEs of different diameter sums that measure the difference between empirical and calibrated distributions when the prediction errors of class frequencies and the estimation errors of calibration variables are taken into account (Method 3). $D_g$ is the basal-area-weighted mean diameter (cm), D is mean diameter (cm), $n_i$ is the frequency (number of trees per hectare) and $d_i$ the mid-point diameter of class i, and R stands for rank.