Bayesian Estimation of Diameter Distribution during Harvesting

This research aims to combine two different data sets with Bayesian statistics in order to predict the diameter distribution of trees at harvest. The parameters of prior distribution are derived from the forest management plans supplemented by additional ocular information. We derive the parameters for the sample data from the first trees harvested, and then create the posterior distribution within the Bayesian framework. We apply the standard normal distribution to construct diameter (dbh) distributions, although many other theoretical distributions have been proved better with dbh data available. The methodology developed is then tested on nine mature spruce (Picea abies) dominated stands, on which the normal distribution seems to work well in mature spruce stands. The tests indicate that prediction of diameter distribution for the whole stand based on the first trees harvested is not wise, since it tends to give inaccurate predictions. Combining the first trees harvested with prior information seems to increase the reliability of predictions.


Introduction
The normal Nordic harvesting technique converts tree stems into smaller logs at harvest.Modern harvesters are equipped with information systems able not only to continuously measure the length and diameter of the stem but also to predict the profile of the unknown section.In normal implementation, the harvester head first feeds the tree through the measuring and delimbing device for a given length, where after the system predicts the rest of the stem profile and calculates the optimal cross-cutting points for the whole stem (see e.g., Näsberg 1985, Uusitalo 2002).
The basic dilemma of tree bucking control is that optimization of the individual stem value tends to produce a log distribution that is not suitable for the sawmill, even where price lists are calibrated according to the demand for each saw-log category.System developers thus started to improve procedures in the early phase of development in such a way that the needs of the individual sawmill could be met more satisfactorily (Bergstrand 1990).Today, fine tuning of bucking optimization is carried out by calibrating the price list according to demand tables or by choosing close-to-optimal bucking alternatives that simultaneously satisfy both value and demand tables (e.g.Kivinen and Uusitalo 2002).
Efficient wood procurement and stand level allocation requires accurate prior information about the stand composition.Kivinen and Uusitalo (2002) have shown that the pre-harvest control of price matrices is beneficial in most cases provided that reliable stem-level information on the composition of stands is available.Several methods have been developed and tested to estimate the composition of stand structure (Uusitalo 1997).Since the pre-harvest measurement methods already developed have proved to be expensive and inaccurate, they have not been adopted in actual forest operations.
Forest management plans are known to provide relatively accurate volumetric information on the proportions of various species and the mean characteristics of the stand (e.g mean diameter), but since they do not usually provide estimates of the diameter distributions of the trees, this needs to be estimated by using various theoretical distributions.Researchers have mainly applied the Weibull distribution (Bailey and Dell 1973, Kilkki et al. 1989, Maltamo 1997) but some other similar distribution methods such as beta (Päivinen 1980) and Johnson's SB (Hafley andScreuder 1977, Siipilehto 1999) have also been tested with equal success.
Since the theoretical, parametric models have been observed to produce quite similar probability distributions in almost all conditions, researchers have more recently turned their attention to nonparametric density functions such as the Kernel function (Droessler andBurk 1989, Uusitalo 1997) and percentile based methods (Borders 1987, Maltamo et al. 2000, Kangas and Maltamo 2000b), which can take the various shapes of the distributions into account more precisely.The weak point of the non-parametric density function is that an additional inventory is often required in order to derive the distribution, although Kangas and Maltamo (2000a) have published general regression models predicting different percentile diameters.Provided additional stem data information is available, both parametric and non-parametric models can also be calibrated by the method presented by Deville and Särndal (1992) (Kangas andMaltamo 2000b, Kangas andMaltamo 2000c).Another way to predict diameter distribution is to use a k-nearest neighbor regression method that utilizes identical stands for predicting the characteristics of the target stands (Moeur and Stage 1995, Haara et al. 1997, Maltamo and Kangas 1998).It has been suggested and demonstrated that stem banks collected by the harvester measurement system can be used in harvesting to predict the composition of a new stand (Tommola et al. 1999, Malinen 2003).
We have hypothesized that tree bucking control could probably be improved provided price lists were calibrated during harvesting, not simply according to the difference between output and demand matrices, but also according to the improved predictions about the stand structure.Basically, accurate information about the stand composition when harvesting is needed for two reasons.First, logging managers and harvester operators have to decide what kind and what amounts of various wood products are worth logging in each stand.It is well known that transportation of small amounts of particular wood products from numerous stands is not very cost effective.Accurate prior knowledge gives managers a chance to log greater amounts of one wood assortment from one stand and another assortment from another stand so as to reduce total tree harvesting costs (Arce et al. 2002).The number of various, often very similar wood assortments has been growing rapidly recently, since sawmills and other mechanical factories are increasingly specifying product-oriented wood products.Second, the current trend is to control tree bucking within one wood product according to specified diameter-length distribution.The recent investigations clearly demonstrate that utilization of accurate prior information improves bucking outcomes, even where bucking is controlled online by bucking-to-demand procedures (Kivinen and Uusitalo 2002).
Forest management plans provide relatively accurate predictions of mean diameter and, in many cases, good estimates of diameter distributions can be derived from tree stems already harvested and measured.The maximum and minimum diameters that are quite easily estimated by brief ocular estimation can be utilized in predicting the shape or at least the variance of diameter distribution.Bayesian statistics provides a promising basis for combining two different kinds of data sets.The basic difference between the frequentist and Bayesian schools of inference lies in whether we use only likelihood or whether we also use prior probabilities (Bernardo and Smith 1994).
The aim of this paper is to test the effect of combining two different data sets with the Bayesian statistics in order to predict the diameter distribution of trees at harvest.We apply the standard normal distribution to construct diameter (dbh) distributions, although many other theoretical distributions have proved better with dbh data available.The main reason for this is that derivation of parameters for the Weibull or beta function would require far more complicated, computer intensive calculation methods, such as the Gibbs sampler (Green et al. 1994) during the actual harvesting.Puustelli et al. (2002) have previously conducted some theoretical experiments in applying the Bayesian approach to dbh distribution, but we discuss far more simplified models here and test them with real data sets.The theoretical properties of the normal distribution are well known and it is computationally feasible in a harvesting situation as well.The prior information provided by the forest management plans is also easily expressed in the form of the parameters of the normal distribution.
We first derive models of how data sets are combined and unknown parameters estimated.Next we test our models in nine different study stands, comparing the results with distributions derived both without the Bayesian statistics and with distributions derived from prior information.

Bayesian Prediction of dbh Data
When using the Bayesian approach to predict the dbh-distribution of trees at harvest, the data on the first trees harvested, say the first 30 to 120 trees of the entire stand is assumed to be normally distributed with an expected value of θ and a variance of σ 2 .In the Bayesian framework θ is considered to be a random variable.In our study θ is also assumed to be normally distributed with an expected value of δ and a variance of η 2 .This distribution will be derived from the prior information.The problem is to derive the predictive distribution of all trees to be harvested from the first trees harvested and the prior information.
Our assumptions are x | θ ∼ N(θ,σ 2 ) and θ ∼ N(δ,η 2 ).From this it is easily seen that the conditional distribution of the mean dbh x calculated from n first harvested trees is ) θ θ σ 2 .Since conditional random variables x and the future observation x n+1 are marginally normally distributed, their joint conditional distribution is bivariate normal, i.e. ( , . The covariance is zero since these two variables are independent.We use basic statistical results (see e.g.Rohatgi 2003, p. 283, 286) to derive the unconditional expected values, variances and covariance, i.e.
Since the joint conditional distribution of x and x n+1 is bivariate normally distributed the joint distribution of x and x n+1 is also bivariate normal distribution.This concludes the result Next we derive the posterior predictive distribution for the future observation conditional on the mean value of n first harvested trees.Basic statistical results of bivariate normal distribution (see e.g.Rohatgi 2003, p. 435) are used to find the expected value and variance for prediction, i.e.
Since the joint marginal distribution of these two variables is bivariate normal also its conditional distribution is normal as follows in Eq. 1 This is our posterior predictive distribution.

Frequentist Prediction of dbh Data
In the frequentist approach the dbh data is assumed to be normally distributed with a fixed expected value of θ and a variance of σ 2 and all the observations are assumed to be independent.The distribution of a sample mean x of n first harvested trees is also normal, with an expected value of θ and a variance of σ 2 /n.When the future observation x n+1 is predicted the unbiased estimator for the expected value is x.When calculating the variance of the prediction the estimating error of the expected value needs to be taken into account.It is of form since x n+1 and x are independent.Now the predictive distribution is where θ is estimated with x .

Estimation of the Unknown Parameters
We assume that the prior mean δ is known from a forest management plan, while the other parameters are unknown.If the sample is large, the population variance σ 2 can be estimated using the sample variance s 2 derived from the first trees harvested.The only parameter that still needs to be determined is the prior variance η 2 .The minimum and maximum dbh are known from the forest management plans.Since the normal distribution is used, it is known that approximately 99.7% of the data lie between x ± 3σ .From this we estimate that The prior variance can then be estimated as where n is the estimated number of trees in the entire area harvested.

Prediction of dbh Data with Prior Information
There is still one more prediction method that may be used along with Bayesian and frequentist prediction.The final prediction method does not use any information from the data, only the prior information from forest management plans.Prediction in this method is simply based on the mean dbh and Formula 3, the predictive distribution being x N n+ ( ) 3 Empirical Tests with Models

Material
The study material consisted of nine mature spruce (Picea abies (L.) Karst.)dominated stands from central Finland, close to the city of Mänttä (Table 1).The stands, scheduled for logging during Autumn 2000, were owned by Metsämannut Oy.The stands had valid forest management plans that were based on earlier inventories and updates.Each stand was visited prior to logging and inventory data was supplemented by quick ocular estimation of the following characteristics of the spruces: minimum and maximum diameter at breast height (dbh) and stand density (trees/ ha).Logging was carried out using the Ponsse harvester that employs the PonsseOpti-tree measurement and optimization system (Ponsse Oyj 2002).Tree taper data for all trees in the stands was registered in the stm-format, (StandForD 1997) in which tree diameter is registered to an accuracy of 1 mm every 10 cm from a given starting point (usually 1.2…1.8m from the butt) to the final cutting point.Since the harvester started to measure the taper 1.5 m from the butt, the dbh for each tree was derived by presuming that tapering from 1.5 m to 1.7 m is equivalent to tapering from 1.3 m to 1.5 m.Since the stems appear in the stm-file in the same order as they were originally logged, it is easy to derive a subset of the data from the first trees harvested.In our tests, the following subsets were derived: 30, 90, and 150 trees harvested first.To summarize our tests, parameters for the Eqs. 1, 2 and 5 presented in the previous section were derived from the following sources: -Prior mean (δ): forest management plan -Prior variance (η 2 ): Eq. 4; minimum and maximum dbh and stand density (ocular estimation) -Sample mean ( x ): Mean of the first trees harvested (stm-files) -Sample variance (s 2 ): Variance of the first trees harvested (stm-files) Since we are mainly interested in examining the significance of calibration of dbh distributions in the early phases of logging, we compared the following distributions: prior distribution derived from the inventory data; Bayesian distribution derived with prior information and the first 30, 90 and 150 trees harvested; and frequentist distribution derived from the first 30, 90 and 150 trees harvested

Results
Three examples of predicted spruce dbh distributions are illustrated in Figs.1-3.Stands 2 and 3 (Figs. 1 and 2) are examples of a dbh distribution of unimodal shape.It is apparent that in the cases where estimates of minimum and maximum values are available, the normal distribution fits the dbh data relatively well.In Stand 2 (Fig. 1) the prior mean is inaccurate, which means that the prior distribution gives relatively poor results.Since the first trees harvested fit the data quite well, the Bayesian method is able to improve the goodness of fit compared to the predictive distribution derived entirely from the prior information.
The frequentist approach gives a more accurate distribution than the Bayesian approach, since an inaccurate prior mean affects the Bayesian predictive distribution.Stand 3 (Fig. 2) represents the reverse situation.The prior information is accurate and the sample trees logged from one corner of the stand predict the distribution relatively poorly.The Bayesian method is markedly more accurate than the frequentist method.
Stand 9 (Fig. 3) illustrates the weakness of the parametric distribution.If the distribution has a Fig. 1.Histogram of the entire study stand 2 and predictive diameter distributions of spruce derived with different methods.Prior refers to prior distribution (Eq.5), bayes 30 and 90 to Bayesian distribution (Eq. 1) with 30 or 90 sample trees and freq 30 and 90 to frequentist approach (Eq.2) with 30 or 90 sample trees.

Fig. 2.
Histogram of the entire study stand 3 and predictive diameter distributions of spruce derived with different methods.Prior refers to prior distribution (Eq.5), bayes 30 and 90 to Bayesian distribution (Eq. 1) with 30 or 90 sample trees and freq 30 and 90 to frequentist approach (Eq.2) with 30 or 90 sample trees.

Fig. 3.
Histogram of the entire study stand 9 and predictive diameter distributions of spruce derived with different methods.Prior refers to prior distribution (Eq.5), bayes 30 and 90 to Bayesian distribution (Eq. 1) with 30 or 90 sample trees and freq 30 and 90 to frequentist approach (Eq.2) with 30 or 90 sample trees.2.
In all stands except one (Stand 2), and with almost all sample sizes, the Bayesian method was more accurate than the frequentist approach.The results also reveal that our prior information was very accurate, except in Stand 2. The prior distribution gave better results than frequentist approach with 150 sample trees in 6 cases out of 9, indicating that predictions based only on sample trees logged from one corner of the stand tend to be inaccurate.Combining sample trees with the prior information improved the accuracy fairly slowly, especially in those stands where the first sample trees gave poor results.The Bayesian method with 30 sample trees gave better predictions than the prior information in 5 cases out of 9, while the Bayesian method with 150 trees gave better results in 7 cases out of 9.

Discussion
Since the number of study stands is very small and represent only well-managed mature spruce forest in central Finland, one cannot draw general conclusions about the reliability or validity of the results.We received relatively accurate prior information about the stand structure, which is not always the case in real situations.Logging can be started from practically any point at the edge of the stand and the order of tree selection always varies from one harvester operator to another.There are still a couple of things that can be addressed, given these findings, however.
The normal distribution seems to work well in our mature, managed spruce stands.Provided we can obtain accurate estimates of maximum and minimum trees, that are used to give an estimate of variance and mean diameter, the normal distribution fits relatively well in most similar well-managed spruce stands.We should recall that our intention is to predict the distribution in order to improve bucking.From our point of view, error with small trees is almost as bad as with big trees, whereas traditionally researchers in Finland in this area have converted diameter distribution into basal area distribution in order to stress the importance of big trees.
The normal distribution and the combination presented here do not however fit well in all kinds of forest.It is well known that most forests Prior information refers to prior distribution (Eq.5), Bayes 30, 90 and 150 to Bayesian distribution (Eq. 1) with 30, 90 or 150 sample trees and Freq 30, 90 and 150 to frequentist approach (Eq.2) with 30, 90 or 150 sample trees.Mark "*" indicates that the p-value is between 0.001-0.01,mark "**" that the p-value is between 0.01-0.05and mark "***" that the p-value exceeds the probability of 0.05.It is therefore very important in the future to develop methods that are able to combine skewed distributions.Green et al. (1994) have suggested a way of combining the Weibull distributions within the framework of the Bayesian approach.This approach will certainly yield better results, but requires that both the prior information and the sample information be fitted with the Gibbs sampler technique, which is not feasible in our application.The models presented here are numerically exact and easy to calculate.The normal distribution does not work well in bimodal shape stands, but this is the case with all parametric distributions.Taking into account our intention to apply diameter distribution predictions, the non-parametric methods do not necessarily improve the results.Prior distribution can be accurate in cases with unimodal shape if the prior mean estimate is accurate.On the other hand, basing the prediction on the first trees harvested from one corner is not wise, since they tend to give inaccurate predictions.In comparison, the Bayesian model produced the best diameter distribution estimates in almost all cases.Calibration seems to work in both directions.The Bayesian method seems to increase the reliability of diameter distribution predictions, although there are certainly cases where the calibration of a sample with the prior information may even decrease accuracy.
The use of our findings in tree bucking control still needs more research.Logically, the accuracy of bucking outcomes (i.e.goodness of fit between the demand and output) should improve if we can incorporate better stand structure predictions into tree bucking control.The distribution of trees in a stand has considerable spatial variation, which makes on-line inference of the stand structure very complicated.However, nice recursive techniques (Kalman 1960, Kalman andBucy 1961) have been developed to estimate the parameters of the normal distribution when new data becomes available.At the same time we can smoothly diminish the effect of trees already harvested.In the next phase we are planning to test this Kalman filter and the methodology introduced in this study with more extensive data.We are also planning to investigate the use of skew-normal distribution (Azzalini 1985) within the Bayesian framework.The advantage of this distribution is that it has many of the appealing properties of the normal distribution.Skew-normal distribution cannot be used with a multimodal stand structure, but the advantage is that it fits skewed distributions well, which normal distribution does not.Skew-normal distribution might be a bit less flexible than the Weibull distribution, but it is much more easy to operate with.Prior information is also easier to incorporate into the formulations.We are planning to make some tests to fit the joint distribution of diameter and height as well.This is simply done by generalizing our formulas into two dimensions.Furthermore, we are going to formulate this situation with skew-normal distribution.
marked bimodal shape, parametric distributions cannot provide accurate predictions.Kolmogorov-Smirnov goodness of fit test (e.g., Rohatgi 2003) is used to test how well different prediction techniques work in all study stands.The test is based on the vertical deviation between the empirical cumulative density function and the cumulative density function under null hypothesis H 0 : F(x) = F 0 (x).The test statistic D is the largest absolute deviation between the two.In other words, small values of the test statistics D and large p-values indicate a better fit.Results are shown in Table

Table 1 .
Mean characteristics of study stands according to forest management plans (1 complemented with quick ocular estimation (2 .Other tree species than spruce were excluded.

Table 2 .
The goodness-of-fit of diameter distributions of spruce in terms of Kolmogorov-Smirnov test value D.