Silva Fennica 34(4) research articles

Percentile Based Basal Area Diameter Distribution Models for Scots Pine, Norway Spruce and Birch Species

Information about diameter distribution is used for predicting stand total volume, timber volume and stand growth for forest management planning. Often, the diameter distribution is obtained by predicting the parameters of some probability density function, using means and sums of tree characteristics as predictors. However, the results have not always been satisfactory: the predicted distributions practically always have a similar shape, and multimodal distributions cannot be obtained. Diameter distribution can also be predicted using distribution-free methods. In the percentile method, the diameters at certain percentiles of the distribution are predicted with models. The empirical diameter distribution function is then obtained by interpolating between the predicted diameters. In this paper, models for diameters at 12 percentiles of stand basal area are presented for Scots pine, Norway spruce and birch species. Two sets of models are estimated: one with and one without number of stems as a predictor. Including the number of stems as a predictor improved the volume and saw timber volume estimates for all species, but the improvements were especially large for the number of stems estimates obtained from the predicted distribution. Using the number of stems as a predictor presupposes that this characteristic can be included among the measured stand variables.


Introduction
Diameter distribution is one of the most descriptive and important stand characteristics. However, in forestry practice the empirical diameter distribution is seldom measured. For example, in Finnish compartmentwise inventory, the growing stock is described by means and sums of tree characteristics, such as mean height and basal area, for each tree species. In applications, the diameter distribution is predicted with models. The predicted distribution is used to compute stand volume characteristics with treewise height and volume models and as a basis for tree growth predictions (Bailey and Dell 1973, Päivinen 1980).
Different theoretical distributions, for example beta, Weibull and Johnson's SB functions, have been used to describe the diameter distribution (e.g. Loetsch et al. 1973, Bailey and Dell 1973, Hafley and Schreuder 1977, Kilkki and Päivinen 1986, Kilkki et al. 1989, Maltamo 1997). The parameters of these functions have either been predicted as a function of stand characteristics or solved from a system of equations that equates measured stand attributes with their analytical counterparts (Hyink 1980, Hyink and Moser 1983).
In Finland, forest inventories are mainly carried out by sampling trees with Probability Proportional to Size (PPS sampling) (Tomppo 1996). The most commonly used application of PPS sampling in large scale forest inventories is relascope (angle-count) sampling, i.e. measurement of Bitterlich sample plots (Bitterlich 1984). When applying angle-count sampling, the resulting diameter distribution is weighted by tree basal area to emphasise larger and economically more valuable trees (Päivinen 1980). This weighted distribution is called the basal area diameter distribution.
When a diameter distribution is predicted from the mean characteristics of the growing stock using a probability density function, the variation observed in the original stands inevitably diminishes. The shape of the predicted diameter distribution is quite similar in most stands. Small trees are often described inaccurately, especially when using basal area diameter distributions. Siipilehto (1999) proposed the use of number of stems as an additional measurement to improve the accuracy of the basal area diameter distribution. This approach especially improved the accuracy of the description of small trees.
So-called distribution-free methods, which do not rely on any predefined functional form, have also been presented. For example, Borders et al. (1987) developed the percentile based diameter distribution prediction method. This method characterises an empirical distribution function with 12 percentiles. The percentiles were defined with respect to number of stems in a stand. The numbers of stems in desired diameter classes were calculated assuming a uniform distribution between adjacent percentiles. In other words, the distribution function was obtained by linear interpolation between the predicted diameters at the 12 percentiles.
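The linear interpolation step can be illustrated with a minimal Python sketch. It is not the implementation of Borders et al. (1987): the function name, the example diameters and the choice of percentile levels (taken here from the levels used for the models in this paper) are assumptions made for illustration only.

```python
from bisect import bisect_right

# Percentile levels in percent. Assumption: the 12 levels used for the
# models in this paper; other level sets work the same way.
LEVELS = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 100]


def cdf_from_percentiles(diameters, levels=LEVELS):
    """Return F(d), the cumulative share (0..1), by linear interpolation
    between the diameters given at the percentile levels."""
    if len(diameters) != len(levels):
        raise ValueError("one diameter is needed per percentile level")
    if any(b <= a for a, b in zip(diameters, diameters[1:])):
        raise ValueError("diameters must be strictly increasing")

    def F(d):
        if d <= diameters[0]:
            return 0.0
        if d >= diameters[-1]:
            return 1.0
        i = bisect_right(diameters, d) - 1      # interval containing d
        d_lo, d_hi = diameters[i], diameters[i + 1]
        p_lo, p_hi = levels[i], levels[i + 1]
        # Uniform distribution assumed between adjacent percentiles.
        return (p_lo + (p_hi - p_lo) * (d - d_lo) / (d_hi - d_lo)) / 100.0

    return F


# Hypothetical predicted diameters (cm), for illustration only.
example_d = [6.0, 10.0, 12.0, 13.5, 15.0, 16.0,
             17.0, 18.5, 20.0, 22.0, 24.0, 28.0]
F = cdf_from_percentiles(example_d)
share_10_to_16 = F(16.0) - F(10.0)   # share of the distribution in [10, 16] cm
```

The share falling in any diameter class is the difference of F at the class limits, which is how class frequencies follow from the predicted percentile diameters.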
This method has been further developed to project future stand tables (Borders and Patterson 1990). Maltamo et al. (2000) used the percentile based approach to predict irregular stem frequency diameter distributions of stands in a natural state. With the percentile method, multimodal distributions could also be reproduced. In addition to linear interpolation, they used Späth's rational spline to interpolate between the predicted diameters (Lether 1984, Späth 1974).
The purpose of this study is to estimate percentile based basal area diameter distribution models for the three most common tree species in Finland. First, models for diameters at 12 percentiles are estimated. The basal area diameter distribution function is then obtained using rational spline interpolation.

Material
The empirical data for modelling basal area diameter distributions consist of pure and mixed stands on both mineral soil and peatland located in central and eastern Finland. These stands are owned by a private forest enterprise, and all of them have been managed according to normal thinning regimes (Table 1). Six to twelve relascope (angle-count) sample plots were systematically located in each stand. Diameter at breast height (dbh) was recorded in 1 cm classes for each tree included in the sample plots using the basal area factor two (m²/ha). The basal area diameter distributions were formed by summing the trees in the relascope sample plots in the stand. A stand was included in this study if at least 15 trees of the same tree species were measured. This resulted in 416 stands being included for Scots pine (Pinus sylvestris L.), 251 stands for Norway spruce (Picea abies Karst.) and 121 for silver and pubescent birch (Betula pendula Roth, B. pubescens Ehrh.).
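Because every tree tallied in an angle-count plot represents the same basal area per hectare, the pooled tally of diameters is already a basal area weighted sample. The following Python sketch shows one way to derive stand basal area and empirical percentile diameters from such a tally. The function names, the example tally and the interpolation rule between sorted diameters are assumptions for illustration; the paper does not state how the empirical percentiles were computed.

```python
def stand_basal_area(n_tallied, n_plots, baf=2.0):
    """Basal area (m2/ha) of the tallied species: each tallied tree
    represents `baf` m2/ha on its plot, averaged over the plots."""
    return baf * n_tallied / n_plots


def basal_area_percentile(dbh_tally, p):
    """Diameter at basal area percentile p (0..100) of a pooled
    angle-count tally. Assumed rule: linear interpolation between the
    sorted tallied diameters."""
    d = sorted(dbh_tally)
    if not d:
        raise ValueError("empty tally")
    if p <= 0:
        return d[0]
    if p >= 100:
        return d[-1]
    pos = p / 100.0 * (len(d) - 1)   # fractional position in sorted tally
    lo = int(pos)
    frac = pos - lo
    return d[lo] + frac * (d[lo + 1] - d[lo])


# Hypothetical pooled tally (dbh in cm), for illustration only.
tally = list(range(10, 31, 2))        # 10, 12, ..., 30
median_d = basal_area_percentile(tally, 50)
g_species = stand_basal_area(n_tallied=16, n_plots=8)
```

With a basal area factor of 2 m²/ha, 16 tallied trees on 8 plots correspond to 4 m²/ha for that species.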

Modelling the Diameter Distribution
The diameter distribution was modelled using the percentile based diameter distribution method (Borders et al. 1987, Maltamo et al. 2000). In this method, the empirical basal area diameter distribution is described with the aid of the diameters at different percentiles of stand basal area (0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95 and 100%), denoted by d_0, d_10, ..., d_100. The 5th percentile was not used, since d_0 and d_10 were quite close in most stands. The logarithms of these 12 diameters were modelled using measured stand variables as predictors. In estimating models for these percentiles, the seemingly unrelated regression (SUR) method was used (Zellner 1962). In SUR, the correlations among the error terms of different models are utilised in order to improve the estimation. The median of the distribution (50th percentile) is commonly assessed in compartmentwise inventory in Finland and was thus assumed to be known.
To be able to predict the diameter distribution using the predicted diameters at different percentiles, all the diameters must be positive. The diameters are also required to be in increasing (or at least nondecreasing) order (d_0 < d_10 < ... < d_100), in order to produce a monotone distribution function and nonnegative frequencies for the diameter classes. Logarithmic models were used in order to meet the first requirement. In most cases, logical predictions of diameters at different percentiles could be obtained by choosing the regressors in the models properly. However, in some cases the models produced, for example, a minimum diameter that was greater than the diameter at the 10th percentile, d_10.
One possibility to ensure the nondecreasing order would have been to model the logarithm of the deviations between adjacent percentiles, for example log(d_10 - d_0). However, this approach does not guarantee positive diameters. This approach was also tested, but in some cases it produced a negative diameter for the 0th percentile.
Finally, to obtain both positive diameters and a logical order for the predicted diameters, a method combining both approaches was chosen. In addition to the logarithms of the diameters, one additional model was included in each model set. This additional model was used to model the difference between d_10 and d_0 (d_40 and d_30, in the case of the birch model). These additional models did not include predictors; their only term was the intercept. Since SUR estimation minimises the variance with respect to each model considered, the additional models act like constraints in the estimation process.
A similar approach has been previously used by Zhang et al. (1997) to create individual tree growth models that produce compatible estimates for stand growth. With this approach, logical predictions were obtained in each case. However, in the case of very peculiar stands, illogical estimates may occur. Thus, the monotonicity of the diameter distribution always needs to be checked.
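The monotonicity check called for above is simple to automate. The Python sketch below back-transforms predicted logarithmic diameters and reports the positions where the order is violated. The function name and the example values are hypothetical, and the sketch only detects violations; it does not attempt to repair them.

```python
import math


def check_percentile_diameters(log_diameters):
    """Back-transform predicted log diameters and list the indices i
    where diameter i is smaller than diameter i-1 (order violation).
    Exponentiation guarantees positive diameters, so only the order
    needs to be checked."""
    diameters = [math.exp(v) for v in log_diameters]
    violations = [i for i in range(1, len(diameters))
                  if diameters[i] < diameters[i - 1]]
    return diameters, violations


# Hypothetical predictions: the first diameter exceeds the second one,
# which mimics a minimum diameter greater than d_10.
logs_bad = [math.log(9.0), math.log(8.5), math.log(11.0)]
logs_ok = [math.log(7.0), math.log(8.5), math.log(11.0)]
_, bad = check_percentile_diameters(logs_bad)
_, ok = check_percentile_diameters(logs_ok)
```

A non-empty list signals that the predicted percentile diameters cannot be used as such to build a monotone distribution function for that stand.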
The constraining models, however, need to be used carefully, since the diameters at different percentiles are not analytically related as are the tree and stand growth models in Zhang et al. (1997). Adding a constraining model without careful consideration could move the problem of the previous diameter being greater than the following one elsewhere or otherwise worsen the results. Adding too many such constraints also makes the models linearly dependent.

Application Stage
In the application stage, the estimate of the relative basal area in each diameter class [d_1, d_2] was calculated from the cumulative distribution of diameters F as F(d_2) - F(d_1). The value of the empirical distribution F was obtained by interpolating the percentiles as a function of the predicted diameters with Späth's rational spline interpolation (Späth 1974, Lether 1984, see Maltamo et al. 2000). The relative basal areas were scaled to the measured basal area in the stand to obtain an absolute value of basal area b_k in each diameter class k. Finally, the frequency f_k in each diameter class k was calculated from the class basal area by dividing it by the basal area of the mean tree in this class:

$$f_k = \frac{b_k}{g_k},$$

where g_k is the basal area of the mean tree in diameter class k.

The rational spline combines the cubic splines and piecewise linear splines in order to obtain a smooth curve which does not wiggle. The rational function for interval k is of the form

$$f(x) = A_k u + B_k t + \frac{C_k u^3}{p_k t + 1} + \frac{D_k t^3}{q_k u + 1}, \qquad t = \frac{x - x_k}{x_{k+1} - x_k}, \quad u = 1 - t,$$

where x_k and x_{k+1} are the end points of interval k, A_k, B_k, C_k and D_k are coefficients that are chosen so that f has a continuous second derivative, and p_k and q_k are user-defined nonnegative real numbers that control the tautness of the spline fit (Lether 1984). When q_k and p_k approach infinity, the rational spline degenerates to a piecewise linear function, and making q_k and p_k zero produces a cubic spline. In this study, 25 was used as the value of parameter q_k and 30 for p_k.
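The conversion from a cumulative distribution to class basal areas and stem frequencies can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the mean tree of each class is taken to be a tree at the class midpoint, and a simple linear toy function stands in for the rational spline F.

```python
import math


def class_frequencies(cdf, class_edges, basal_area):
    """For each diameter class [d1, d2] (cm) return its basal area
    b_k (m2/ha) and stem frequency f_k (stems/ha).

    Assumption: the mean tree of a class has the class midpoint
    diameter, so its basal area is pi/4 * (d_mid / 100)^2 m2.
    """
    rows = []
    for d1, d2 in zip(class_edges, class_edges[1:]):
        share = cdf(d2) - cdf(d1)               # relative basal area
        b_k = share * basal_area                # scaled to the stand
        d_mid = 0.5 * (d1 + d2)
        g_k = math.pi / 4.0 * (d_mid / 100.0) ** 2
        rows.append({"class": (d1, d2), "basal_area": b_k,
                     "stems": b_k / g_k})
    return rows


def toy_cdf(d):
    """Stand-in for the interpolated F: uniform between 10 and 30 cm."""
    return min(1.0, max(0.0, (d - 10.0) / 20.0))


rows = class_frequencies(toy_cdf, [10.0, 20.0, 30.0], basal_area=20.0)
total_stems = sum(r["stems"] for r in rows)
```

Both classes in the toy example hold the same basal area, but the smaller class holds more stems, which shows why errors at the small-diameter end of F affect the number of stems estimate most strongly.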
The rational spline function always gives a monotone distribution, provided that the values of the p_k and q_k parameters are high enough. However, for certain values of the p_k and q_k parameters, the distribution may not be monotone, and thus the monotonicity needs to be checked. In the future, monotonicity-preserving spline functions (e.g. Lahtinen 1988) should be tested.
Volumes for each diameter class were calculated with Laasasenaho's models (1982), using diameter at breast height as a predictor. Saw timber volume was defined as the volume of trees with a diameter of at least 16.5 cm. The basic performance of the models was examined by calculating the root mean square errors and biases of stand volume estimates (m³/ha) obtained with these methods. The absolute root mean square error (RMSE) was calculated as

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (V_i - \hat{V}_i)^2}{n}},$$

where n is the number of sample stands, V_i is the true volume of stand i and \hat{V}_i is the volume of stand i estimated from the predicted distribution. The relative RMSE of the volume estimate was calculated by dividing the absolute RMSE by the true mean volume \bar{V} of the stands. The bias of the predictions was calculated as

$$\mathrm{bias} = \frac{\sum_{i=1}^{n} (V_i - \hat{V}_i)}{n}.$$

In addition to stand volume, the RMSE and bias of saw timber volume and number of stems were considered.
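The accuracy measures can be computed as in the following Python sketch. It assumes the divisor n and the sign convention "true minus predicted" for the bias; the function names and the example volumes are hypothetical.

```python
import math


def rmse(true, pred):
    """Absolute root mean square error over the sample stands."""
    n = len(true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / n)


def relative_rmse(true, pred):
    """RMSE divided by the true mean value of the stands."""
    return rmse(true, pred) / (sum(true) / len(true))


def bias(true, pred):
    """Mean of (true - predicted); positive values mean underestimation."""
    return sum(t - p for t, p in zip(true, pred)) / len(true)


# Hypothetical stand volumes (m3/ha), for illustration only.
v_true = [100.0, 200.0, 300.0]
v_pred = [110.0, 190.0, 300.0]
abs_rmse = rmse(v_true, v_pred)           # sqrt(200 / 3)
rel_rmse = relative_rmse(v_true, v_pred)  # abs_rmse / 200
mean_bias = bias(v_true, v_pred)          # errors cancel out here
```

The example shows that a zero bias does not imply accurate predictions: the two opposite errors cancel in the bias but both contribute to the RMSE.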

Results
Models for the diameters at different percentiles for Scots pine are presented in Tables 2 and 3, for Norway spruce in Tables 4 and 5 and for birch species in Tables 6 and 7. For each species, there are two sets of models. The main difference between these model sets is that the first set for each species (Tables 2, 4 and 6) does not include number of stems per hectare as a predictor (Percentiles 1), but the other set of models (Tables 3, 5 and 7) does (Percentiles 2).
The predictors for the models were chosen from commonly measured stand characteristics. Basal area median diameter was used as a predictor in each model. For the model sets not including number of stems, stand age was another common predictor. In the model sets including number of stems as a predictor, it was used to predict the diameters below the 50th percentile and age was used to predict the diameters above it. The number of stems was not used as such, but relative to the basal area of the stand. Using number of stems as such would have led to illogical results in stands with a very low number of stems.
Using number of stems related to basal area describes the size distribution better. A shape index (G/(N g_M)) presented by Siipilehto (1999), where g_M is the basal area of the median tree, was also examined but was not found useful in the logarithmic models used.
The diameters near the 50th percentile could, quite obviously, be predicted accurately with both model sets. The minimum and maximum diameters were more difficult to predict, especially when number of stems was not used as a predictor. The largest standard errors were obtained for birch. The standard errors presented in the tables are from the OLS fit used as a basis for SUR analysis.
The relative root mean square errors (RMSE) and the absolute biases of stand volume (m³/ha), saw timber volume (m³/ha) and number of stems (per hectare) in the modelling data set are presented in Table 8. The results for Scots pine were the most reliable, and the results for birch were the least reliable. With all species, the most reliable results were obtained with the Percentiles 2 method. For pine, including the number of stems as a predictor reduced the RMSE of stand volume by 33%, that of saw timber volume by 16% and that of number of stems by 85%. For the other species, the results were quite similar, although the improvements were not quite as high as for pine (Table 8). The biases of stand volume and saw timber volume were negligible. By contrast, the bias of number of stems may be quite high, especially if number of stems is not used as a predictor in a model. It is notable that even if the number of stems is assumed to be known, the predicted diameter distribution does not reproduce the number of stems exactly. To obtain correct values for a known number of stems, calibration is required (see Kangas and Maltamo 2000a,b).

Discussion
In this paper, models predicting the diameters at different percentiles of stand basal area were estimated using the seemingly unrelated regression (SUR) method. An estimate of the empirical diameter distribution was obtained by interpolating between the predicted diameters. From the empirical distribution function, the basal area in desired diameter classes could be calculated. However, the predicted diameters could have been used directly to describe the diameter distribution, without using interpolation. For example, if diameters at the 5th, 15th, …, 95th percentiles were predicted with models, the distribution would be described with the aid of ten individual trees, each representing 10% of the total number of stems or basal area. The models were estimated for logarithmic diameters in order to avoid negative diameters in predictions. An additional model for the difference between the minimum diameter and the 10th percentile diameter (30th and 40th percentile diameters for birch) was included in order to obtain a monotone cumulative distribution. With this model structure, logical distributions were obtained in all cases.
However, it seems probable that illogical results could be avoided or at least reduced by collecting more data, or by using a method other than angle-count sampling in collecting the data. This is because in angle-count sample plots the reliability of the measured diameters at the smallest percentiles is poorer than the reliability of the diameters at the largest percentiles. Observing a very small tree in an angle-count sample plot is quite rare. However, when one small tree is measured on an angle-count plot, the estimate of the number of small trees will be very high for this plot. Thus, the more angle-count sample plots are measured, the more stable the occurrence of such trees in a given stand becomes.
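The effect of a single small tree can be made concrete with a short calculation. In angle-count sampling a tallied tree represents the basal area factor divided by its own basal area in stems per hectare. The Python sketch below assumes the basal area factor of 2 m²/ha used in the data; the function name and the example diameters are hypothetical.

```python
import math


def stems_per_ha_per_tally(dbh_cm, baf=2.0, n_plots=1):
    """Stems per hectare represented by one tallied tree of diameter
    dbh_cm, averaged over n_plots angle-count plots."""
    g = math.pi / 4.0 * (dbh_cm / 100.0) ** 2   # tree basal area, m2
    return baf / g / n_plots


small = stems_per_ha_per_tally(5.0)      # about 1019 stems/ha
large = stems_per_ha_per_tally(30.0)     # about 28 stems/ha
small_10_plots = stems_per_ha_per_tally(5.0, n_plots=10)  # about 102 stems/ha
```

One tallied 5 cm tree thus counts for 36 times as many stems as one 30 cm tree, and its influence on the stand-level estimate shrinks in proportion to the number of plots measured.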
The models were also tested with respect to the accuracy of predicted stand volume, saw timber volume and number of stems in the modelling data set. Thorough tests in varying conditions are presented in Kangas and Maltamo (2000b). The models worked quite well for predicting the diameter distribution. With regard to the tree species, the Scots pine results were the most accurate, and the birch results the least accurate. The accuracy of the percentile based method using number of stems as a predictor (Percentiles 2) was superior to that of the model set not including number of stems (Percentiles 1). The percentile based method presented by Borders et al. (1987) has proved to be a good alternative for describing diameter distribution in Finnish conditions (see also Maltamo et al. 2000). Siipilehto (1999) proposed that number of stems be added to the standard stand characteristics to be measured. In Finland, since relascope tables came into use (Nyyssönen 1954), basal area has been the main measured stand density characteristic, and number of stems has seldom been used. If the stand volume results are of primary interest, measuring the number of stems may not be necessary. However, the description of stand structure improves considerably if the number of stems is measured, resulting in more accurate estimates of all stand characteristics, for example pulpwood volume (Siipilehto 1999). This may also have a profound effect on the growth and yield predictions, a subject which remains to be studied.
If the models of this study had been estimated from sample plots of a fixed radius, the results would probably have been better in terms of number of stems. On the other hand, with the same number of tallied trees, the results would probably have been worse in terms of stand volume characteristics. This is due to the fact that in angle-count sampling large trees, which have the greatest effect on volume characteristics, are emphasised (Päivinen 1980). A change in modelling data would also improve the description of diameter distribution of birch species because the data used here was not very representative.
In Finland, several diameter distribution models have been presented since the beginning of the 1980s (e.g. Päivinen 1980, Kilkki and Päivinen 1986, Hökkä et al. 1991, Maltamo et al. 1995, Maltamo 1997, Maltamo and Kangas 1998, Siipilehto 1999, Maltamo et al. 2000). These studies include parameter models for beta, Weibull and Johnson's SB distributions and also applications which use non-parametric and distribution-free approaches. The accuracy of most of these models has been tested only in their own modelling data. Test results in independent data sets have been presented in the studies by Maltamo and Kangas (1998), Siipilehto (1999) and Kangas and Maltamo (2000b).
The next step in diameter distribution studies could be the optimal use of constructed models and measurements in different situations. Using calibration estimation, applied in connection with diameter distribution by Cao and Baldwin (1999) and Kangas and Maltamo (2000a), it is possible to modify the predicted distribution to produce correct values for all measured stand characteristics. However, the number of measured stand variables should presumably vary between situations (stand age, main tree species, stand density etc.).

Table 1. Mean stand characteristics in modelling data.

Table 2. Regression models (SUR) for different percentile diameters of Scots pine (Percentiles 1). Median point (d_gM or d_50) is expected to be known. Clarifications of variable codes: Soil = dummy variable for stands on mineral soil, sd = standard deviation of the model. For other variable codes, see Table 1.

Table 3. Regression models (SUR) for different percentile diameters of Scots pine (Percentiles 2). Median point (d_gM) is expected to be known. For variable codes, see Tables 1 and 2.

Table 4. Regression models (SUR) for different percentile diameters of Norway spruce (Percentiles 1). Median point (d_gM) is expected to be known. Clarifications of variable codes: Soil = dummy variable for stands on mesic and poorer mineral soil. For other variable codes, see Tables 1 and 2.

Table 5. Regression models (SUR) for different percentile diameters of Norway spruce (Percentiles 2). Median point (d_gM) is expected to be known. Clarifications of variable codes: Soil = dummy variable for stands on mesic and poorer mineral soil, G_total = the total stand basal area. For other variable codes, see Tables 1 and 2.

Table 6. Regression models (SUR) for different percentile diameters of birch species (Percentiles 1). Median point (d_gM) is expected to be known. Clarifications of variable codes: Soil = dummy variable for stands on mesic and poorer mineral soil. For other variable codes, see Tables 1 and 2.

Table 7. Regression models (SUR) for different percentile diameters of birch species (Percentiles 2). Median point (d_gM) is expected to be known. Clarifications of variable codes: Soil = dummy variable for stands on mesic and poorer mineral soil. For other variable codes, see Tables 1 and 2.

Table 8. The results of the prediction of basal area diameter distribution of Scots pine, Norway spruce and birch.