SILVA FENNICA Using the Process-based Stand Model ANAFORE Including Bayesian Optimisation to Predict Wood Quality and Quantity and Their Uncertainty in Slovenian Beech

The purpose of this study was to expand an existing semi-mechanistic forest model, ANAFORE (ANAlysing Forest Ecosystems), to allow for the prediction of log quality and the accompanying uncertainty as influenced by climate and management. The forest stand is described as consisting of trees of different cohorts, either of the same or of different species (deciduous or coniferous). In addition to photosynthesis, transpiration, total growth and yield, the model simulates the daily evolution in vessel biomass and radius, parenchyma and branch development. From these data early and latewood biomass, wood tissue composition, knot formation and density are calculated. The new version presented here, includes the description of log quality, including red heart formation of beeches. A Bayesian optimisation routine for the species parameters was added to the stand model. From a given range of input parameters (prior), the model calculates an optimised range for the parameters (posterior) based on given output data, as well as an uncertainty on the predicted values. A case study was performed for Slovenian beech forests to illustrate the main model functioning and more in particular the simulation of the wood quality. The results indicate that the ANAFORE model is a useful tool for analyzing wood quality development and forest ecosystem functioning in response to management, climate and stand characteristics. However, the Bayesian optimization showed that the remaining uncertainty on the input parameters for the chosen stand was very large, due to the large number of input parameters in comparison to the limited stand data.


Introduction
Forest models can be roughly categorized as either more mechanistic (based on current understanding of physiology), or empirical.Physiologybased forest models have the advantage of being able to deal with changes in environment, climate or management outside the original dataset for which the model was parameterized.However, most mechanistic models actually include some empirical relationships as well, and can therefore also be categorized as hybrid models.Simulation of wood quality is important, as the value of forest produce is very dependent on the quality (Knoke 2006).However, concerning wood quality predictions, mechanisms are not well understood for all processes.For ring width and tracheid dimensions a number of models have been developed (Fritts et al. 1999, Ikonen et al. 2003, Mäkelä and Makinen 2003, Deckmyn et al. 2008), for red heart (Knoke 2003) and knot and branch formation (Hein et al. 2007).In general, emphasis in wood quality modeling has been on more empirical models, yielding good predictions for specific sites, but needing parameterization for each site/condition (Leban et al. 1996, Ikonen et al. 2003, Soares and Tomé 2003, Lemieux et al. 2001, Constant et al. 2003).Other followed pathways are the development of single tree models (needing detailed tree input, and therefore not intentioned for practical use at stand level), or the simulation of a stand consisting of identical (average) trees (whereas for wood quality the number of 'top quality' trees is important.Some efforts have been made to simulate the final wood price (Carino and Biblis 2003).
To create a practical tool for the simulation of wood log production at the stand scale, we have tried to use a mechanistic model for the processes that are more or less understood, as far as possible towards wood quality development, and a limited number of empirical relations to calculate the final m distance from h tot /5 to the crown base J m25 µmol m -2 s -1 maximal electron transport rate per unit leaf area at 25 °C f starch dimensionless fraction starch that is used in spring (0-1) f emb,day dimensionless daily fraction of pipes embolising in summer(0-1) f emb,win dimensionless fraction of pipes embolising in winter (0-1) h tot m total stem height I 0 µmol m -2 s -1 light compensation point n brWhorl # number of branches in a whorl N l,LAres kgNg m -² residual leaf N (when V m equals zero) p rpip dimensionless species-specific empirical input parameter, for r pip,day calculation µmol m -2 s -1 photosynthetic Rubisco capacity per unit leaf area at 25 °C W horlMax m maximal distance between whorls of branches α kgC -1 species-specific pipe radius plasticity θ, θ', θ t Matrix of 128 input parameters and their prior distribution ε random vector and p(ε) = p(-ε) φ Gaussian probability density function quality of the wood logs where necessary.The ANAFORE model is a good starting tool because of the semi-mechanistic simulation of wood development (wood density within each year ring, early wood and late wood, vessel dimensions and knot formation), and the simulation of tree cohorts in a stand (Deckmyn et al. 2006 and2008).A limited number of empirical relations were added to calculate log quality following the European log standards, including a relation for red heart formation (Torelli 1984 and2008).
Another problem in forest modeling concerns parameterization and uncertainty.Most mechanistic models are over parameterized, and therefore will give a good fit to almost any output, but a very high uncertainty on the predictions (although in many studies this problem is ignored).In addition, parameters and validation data are often uncertain (measured on different sites or even species, or cannot be measured in situ).The use of Bayesian statistics can clarify these problems by explicitly using the uncertainty on the input parameters and the validation data, and by yielding an uncertainty on the predictions as well as a posterior uncertainty on the input parameters.The test case in this study is a Slovenian beech stand on which, besides stand inventory data, a number of wood quality parameters were measured, but basic physiology data were lacking.The central question for this stand was whether, given the limited amount of available data, anything significant could be concluded concerning the management of this stand.More in particular whether conventional thinning would result in a large difference in wood quality compared to the actual thinning in the stand (which is used as a seed stand).
Our goals in this study were: - A full description of the model, including all equations, has recently been published (Deckmyn et al. 2008).In the short description below we focus only on the relations that are important for the wood quality formation, and on the equations that have been added since the published version.The ANAFORE model has already been validated concerning the simulation of the water balance (Verbeeck et al. 2007a and b) and stomatal conductance (Op de Beeck et al. 2007).

Simulation of Wood Density
Wood density is simulated through the daily simulation of C allocated to the different stem tissues, and the daily calculation of the pipe radius as a function of growth rate.Daily C allocated to pipes is calculated through pipe theory, in such a way that the potential flow through the pipes equals potential transpiration rate, as described in Deckmyn et al. 2006.
Transition from early growth (leaf and crown development) to late growth (emphasis on storage) is induced by soil water stress, or at a fixed date.

Log Grading
As the grading of logs differs strongly between countries and for different end-uses, the ANA-FORE model has adopted grading rules as input.The user can define up to 46 characteristics, and for each can define the threshold values of up to 4 quality classes.In this study we use the European Log standards (19 quality characteristics, threshold values see Table 2).

Relations between Log Quality and Stem Characteristics
The quality characteristics relating to size, growth rate and knots are calculated through the growth model ANAFORE.Additional quality characteristics (ovality, red heart, white rot) are simulated (semi-) empirically.Different relations between these log quality characteristics and stem or tree characteristics can be used.However, most relationships are with diameter at breast height (DBH), crown size and growth rate.For the ANA-FORE model we opted to have the shape of each relation variable (linear, exponential or power function) for each of up to 46 wood quality characteristics.The parameters of each relationship are also input variables.
For example, in our study red heart formation (R RH , radius of red heart at h tot /5, h tot = total stem height) is simulated semi-empirically using the relation described by Torelli et al. 2008: with D H5 the distance from h tot /5 to the crown base, and R H5 the mean stem radius at h tot /5 (if calculated R RH is negative, R RH is set to 0).This equation was chosen because the ANAFORE model simulates changes in crown depth and stem to crown height ratio's in relation to the environment and the combined result of ANA-FORE with the Torelli equation therefore yields a semi-mechanistic prediction of red heart formation.
For some characteristics (such as spiral grain, fluting, shakes, forks) there might not be any (known) relation to tree or stem characteristics.In this case, the user can use a purely empirical fixed % of each quality class showing this characteristic.

Bayesian Optimization
Although in other environmental sciences Bayesian approach is often used, in forest research only a limited number of studies have been published (Van Oijen et al. 2005, Svensson et al. 2008).In a Bayesian approach, all input and outputs consist of probability distributions.Rules of probability    The central equation states that the likelihood of θ, given output data (D), is related to the likelihood of the data using θ and the likelihood of θ. (2) The procedure followed in our Bayesian approach is to make a random walk through the parameter space (= all possible priors), in such a way that the collection of visited points forms a representative sample of the distributions of the parameters.This is done through the use of the Metropolis-Hastings random walk (a Markov chain Monte Carlo) as described by Van Oijen et al. 2005.The new prior to test, or proposal, θ' is calculated from the previous one (θ t ) by: Where ε is a random vector and p(ε) = p(-ε).In our case, we used a gaussian distribution for ε, with variances equal to the square of 5% of the prior parameter range, and 0 covariances as suggested by Van Oijen et al. (2005).
For each new parameter set used, the probability of the measured data given these parameters is calculated (a low probability therefore shows a less good fit), P (D│θ), by assuming a Guassian distribution for the measured data, with 30% of the mean value as standard deviation.
Where φ denotes a Gaussian probability density function with given mean and variance.We also calculate the likelihood of the prior.For our study, since input data were very limited, we assumed and equal likelihoods over this range (not that starting from a normal distribution would not, in the end, change the result significantly, but might change the number of runs needed to get to a good result).From the likelihoods calculated in (Eq. 3) and the likelihood of the prior (in our case constant since a flat distribution was used) we get the likelihood of the proposal using (Eq. Then, the proposal is either accepted or rejected as a new prior (θ t+1 ) by comparing β to a random variable between 0 and 1 (therefore, if the likelihood is better (β > 1), the proposal is always accepted, if not, it depends on the random number generated).
In our case, we run the model 10 000 times to get posterior distributions.
Finally, the predictive uncertainty of the model is determined by running the model with the different parameters of the posterior distribution.
For the ANAFORE model we included all species-related input parameters (128 in total) in the prior, but uncertainty concerning the soil parameters was not included.

Case Study
For this case study, we chose to use an altimontane Slovenian beech forest, situated at mount Blegoš.The forest is an even-aged beech stand of approximately 130 years old.The site index of the stand (SI 22-26), and the bedrock type (dolomite) situate this stand a stand of average productivity.Precipitation, temperature and irradiance of the stand are depicted in Fig. 1.In 2004, a range of the forest was approved as seed stand.In 2004, 448 trees were felled (phenotypic negative trees as well as competitive trees with regard to selected seed trees).
In 2007, stem characteristics of the remaining trees were measured.In 10,5-ha stand a sampling grid 40 × 30 m was established, where 66 2-are plots were set out.Parameters of diameter, height and crown size were calculated according to equations of systematic sampling.Although stand is even-aged, trees in the upper part of the stand are of a bit larger dimensions.For this reason statistical analysis was performed separately for each part of the stand.Due to the fact that the seed stand is a finite population, a correction factor (1 -n / N) 1/2 for standard error was used.Parameters in Table 3 show to a greater variability with regard to tree diameter, while differences in height and crown size are smaller.Due to the greater variability of the tree diameter, i.e. due to too few number of trees per plot, the size of the latter would preferably be larger (3 are).At the significance level 5% it can be then calculated the lower and the upper bound of diameter for both parts of the stand (t 1 = 2,052 / t 2 = 2,026; df = n -1), respectively.
In each plot, DBH, tree trunk description, trunk defects, quality class from class A (sliced veneer,  3).In addition, diameter growth of 5 of the felled trees was analyzed for the entire growth period.As no data were available concerning the physiology of these trees, we started from the input values that were found by optimizing the ANAFORE model for growth of a German beech forest (Deckmyn et al. 2007).As the uncertainty of this prior parameterization is quite large we chose to use 30% uncertainty on most input parameters, and 50% to 100% on selected input parameters for which few data were available.
For the log quality calculations on the simulated trees, we graded only using the simulated dimensions, growth rate, red heart and stem defects (forking), as other characteristics were not measured on the site and therefore could not be validated.The Bayesian parameter estimation was done against the DBH, and average height (from allometric relations to the DBH) measured on five dominant trees, over 100 years.In a second run, we included the wood quality estimations as well.We use the simulated and measured crown depth over 10 DBH classes to validate the model, as well as the measured growth data of a similar unmanaged stand.Finally, we ran the model for 3 different management strategies: the current strategy (thinning low quality trees in 2003), equal thinning over all categories and without thinning, over the entire rotation period.

Posterior Versus Prior Distributions
Running the model 10 000 times ended in more than 80% of the cases not yielding viable trees, so many parameter settings are excluded by the Bayesian procedure.Reduced uncertainty of the parameters was not found for all parameters (although many runs are excluded, in some cases there was no clear distribution to the values that Table 6.Summary data for the posterior distribution of 17 input parameters of the ANAFORE model.The 10 parameters with the largest effect on growth, as well as 7 additional parameters with the greatest and least reduction in standard deviation from the prior to the posterior distribution are given.Of these 17, the average, minimum and maximum and standard deviation are given, as well as the value for the best fit are given.were excluded).In Table 5 the values and uncertainties of the 10 parameters most influencing total productivity (as was found in a previous paper on the ANAFORE model, Deckmyn et al. 2008), as well as the 7 parameters that were most and least improved by the procedure are given.Because the optimization was performed against stand and growth data, the uncertainty of parameters that are directly related to height growth and crown dimensions is shown in Table 6.For these parameters, uncertainty is reduced (Table 6).As an example, the posterior distribution of the maximal distance between whorls of branches, W horlMax , is depicted in Fig. 2., showing an important improvement from the prior flat distribution.

Wood Quality Predictions
Although the predicted and measured log grading show similar results over the tree cohorts (Fig. 5, r² = 0.93), due to the high uncertainty of the predictions (standard deviations over 50%) this good result is only found showing the results using the 'best fit' input parameters.In Table 4, for the shown best fit data, the predictions of crown depth and stem height are good, yielding an overall good prediction of read heart formation.

Management Effects
Reduced thinning limits the growth of the trees, so they never reach the required dimensions for A-logs.This was found in the measured data (from an unmanaged stand) and the simulated data (data not shown).Conventional thinning (equal over all classes instead of selective thinning of trees with defects or competing with the seed trees) results in a small increase in predicted wood quality of the remaining trees (+7% A-logs).As the uncertainty on the simulated logs is very large, the changes due to management were not significant.

Posterior Versus Prior Distributions
As we started from a very high uncertainty, it was expected that the output of the model would be very variable prior to the parameter estimation.Concerning basic physiology, since no measurements were made (for example of photosynthesis or transpiration rates), uncertainty was hardly diminished by the runs.For example the ratio between J m25 and V m25 , q Jm25-Vm25 (Fig. 3) has hardly evolved from the prior flat distribution.This can easily be explained as several of the photosynthesis related parameters are coupled through leaf nitrogen (N), but leaf N was not measured on the site.Therefore, high gross photosynthesis and dark respiration will yield as good a fit as low values for both parameters.In a study by Svensson et al. (2008) where data on C-fluxes were used for the parameterization of a forest growth model, the uncertainty of most of the parameters was reduced.Inclusion of a limited number of specific measurements (leaf N, specific leaf area S LA ) could improve our results significantly.Van Oijen et al. (2005) also showed that a bayesian parameterization can be greatly improved by increasing the variety of measured data.
A number of quality-related parameters in the ANAFORE model cannot easily be measured as such.For example, onset of late wood production (if it is not triggered by drought).In this case study, uncertainty is considerably reduced after the Bayesian procedure (Table 4).This is explained by the fact that it is during the late wood phase that the trees store most C.If this phase is too short, too little C is stored to survive the winter and regrow leaves in spring.Problematic parameters are those that have an important influence on the result, but a high (remaining) uncertainty such as f emb,day , the daily fraction of pipe embolism in summer.The amount of pipe embolism drives the need for pipe formation in the wood and therefore is of major influence to wood formation.However, this fraction cannot easily be measured and uncertainty remains high after the parameterization procedure.In Table 6, the minimum and maximum values of the parameters yielding live trees are given.It is clear, that for some parameters the original values were badly chosen, and large part of the prior distribution results in early tree death.For example if the maximal distance between branches is allowed to reach 4m the trees do not survive.However, the bad initial settings are not a problem, since the procedure will result in improved minimum and maximum values.
Overall, the results indicate the model is over parameterized compared to the limited dataset available for parameterization, but it is unclear whether this is still the case if the model is used in a study with more available data.

Wood Quality Predictions
Overall, the yield per log grade was well simulated, as a result of a realistic simulation of stem height and width.As measured height and width were used to parameterize the model, and the model tends to be over-parameterized, a good fit was to be expected.
The close fit of measured and simulated red heart indicates a close fit of simulated and measured crown and tree height, as red heart was calculated using the same equations for the stand and the simulated data.For the red heart prediction, in this case, the strength of the empirical relationship as constructed by Torelli (1984) determines the accurateness of the final result.Although the Torelli equation yielded good results in the original Slovenian study (which included trees from a wide variety of stands), it is not sure whether is would perform as well in other regions/climates.Red heart is an important determinant of the price of beech timber (Knoke et al. 2006), but prediction of red heart remains difficult.Recent studies emphasize the physiology of red hear formation (in relation to branch mortality), but do not necessarily yield better models for the prediction of red heart (Knoke 2003, Wernsdörfer et al. 2005).As the uncertainty of our tree predictions (crown and stem dimensions) was very high, uncertainty of the red heart formation is extremely high as well.Therefore, although the model is clearly able to simulate wood quality parameters, better and more data are necessary before the predictions become reliable.

Management Effects
Red heart formation is an interesting example for the wood quality predictions, as it empirically links crown characteristics (which are mechanistically simulated) to wood quality.Changing the management (reduced thinning) influences crown dimensions and therefore wood quality.Although a good fit was found between measured and simulated wood quality data, uncertainty was very high, and no significant effects of the management strategies on predicted wood quality formation could be shown.However, this does not mean there was no effect of management, only that due to our huge uncertainty, the effect was insignificant.

Conclusions
Bayesian statistics are a useful tool for the parameterization and analyses of mechanistic models such as ANAFORE, as has been found in a few previous studies (Svensson et al. 2008, Van Oijen et al. 2005).Although optimization of the input parameters yields high correlation between measured and simulated data, the Bayesian analyses indicates the high uncertainty.The Bayesian procedure is also helpful in indicating which measurements are most useful in reducing uncertainty.For example, biomass alone hardly influences the distributions, whereas height and crown depth have an important effect on several parameters.
Optimization of parameters related to crown development is quite efficient, so calculation of derived (empirical) wood quality characteristics such as red heart development is also relatively stable.However, some problematic parameters remain that have a high impact on the results, but are not easily measured or optimized, and future studies should focus on these (such as embolism).
In conclusion, although a good fit was obtained, the uncertainty is too high to make predictions concerning management effects on yield quality of Slovenian seed stand.This study clearly underlines the importance of including uncertainty when analyzing the performance of models (especially mechanistic model).The posterior distributions gained from this study can be used as prior distributions in future runs of the same species, which will reduce the uncertainty of the next model runs considerably.

Fig. 1 .
Fig. 1.Average monthly climate data (precipitation, temperature and irradiance at the altimontane beech site at mount Blegoš.

Fig. 4 .
Fig. 4. Simulated and measured log quality distribution over 10 tree cohorts of beech at mount Blegoš.

Table 1 .
List of symbols.

Table 2 .
European log standards (grading rules) for beech as used in the ANAFORE model.

Materials and Methods 2.1 The ANAFORE Model
2 O fluxes), daily (all C pools) and yearly (wood quality).Aboveground woody biomass pools, i.e. stem and branches, are further divided into four tissue pools to allow detailed simulation of wood development.These are 1) conductive pipes, i.e. vessels and/or tracheids; 2) fibers, which give mechanical support; 3) parenchyma, serving as storage compartment for C and H 2 O; and 4) bark tissue.

Table 3 .
Descriptives of the beech stand at Blegoš, devided into 2 plots.Number of trees n, mean, variance, standard deviation s and standard error s.e. are given.

Table 4 .
Measured (meas)and simulated (sim) data from the best fit (including standard deviation from all runs) concerning crown depth, height and red heart of 10 DBH cohorts of adult beech at mount Blegoš.

Table 5 .
Data from 10 000 Prior distributions of 17 input parameters of the ANAFORE model.The 10 parameters with the largest effect on growth, as well as the 7 additional parameters with the greatest and least reduction in standard deviation from the prior to the posterior distribution are given.Of these 17, the average, minimum and maximum and standard deviation are given.