Recruitment Models for Norway Spruce, Scots Pine, Birch and Other Broadleaves in Young Growth Forests in Norway

The objective of the present study was to develop recruitment models for Norway spruce, Scots pine, birch and other broadleaves in young growth forests in Norway. The models were developed from permanent sample plots established by the National Forest Inventory, and they will be included in a growth simulator that is part of a large-scale forestry scenario model. The modelling was therefore restricted to independent variables directly or indirectly available from inventories for practical forest management planning. A two-stage modelling approach that suited the stochastic nature of recruitment in boreal forests was used. Models predicting the probability of recruitment were estimated in a first stage, and conditional models for the number of recruits were developed in a second. The probability models as well as the conditional models were biologically realistic and logical. The goodness of fit tests revealed that the probability models fitted the data well, while the coefficients of determination for the conditional models were relatively low. No independent test data were available, but comparisons of predicted and observed number of recruits in different sub-groups of the data revealed few large deviations. The high level of large random errors was probably due to the great variability observed in number of recruits rather than inappropriate specifications of the models. Provided the generally high level of uncertainty connected to analysis performed with large-scale forestry scenario models and the stochastic nature of recruitment, the presented models seem to give satisfactory levels of accuracy.


Introduction
Recruitment is usually quantified by means of number of trees or seedlings that reach or exceed a specific threshold size over a certain period, as a result of different regeneration processes like establishment, growth and mortality of seedlings.A recruitment model predicts the number of trees or seedlings that pass a specific threshold size during a specified period of time.Usually, such models do not directly consider factors influencing seedling establishment (e.g.production, dispersion, predation and germination of seeds and competition from herbaceous vegetation), growth and mortality.Instead, recruitment models predict number of recruits directly from independent variables describing site and forest conditions.
Growth simulators and large-scale forestry scenario models are widely used to predict growth and yield of forest stands and to estimate the profitability of different harvesting regimes, and their influence on long-term timber production, stand structure and biodiversity.In Norway, several growth simulators and large-scale forestry scenario models covering even-aged forests have been developed from area-based growth models (e.g.GAYA (Hoen and Eid 1990) and AVVIRK (Eid and Hobbelstad 2000)).These models can, however, not predict, in a reliable way, stand development in uneven-aged forests, mainly because prediction of growth and yield is done independently of the tree size distribution.For this reason, a growth simulator able to predict stand development in any-aged forests has recently been developed in Norway (Gobakken et al. 2005).Another objective with the growth simulator is to consider stochastic regeneration, recruitment and mortality in projections.Hence, the mortality as well as the recruitment models needs to be stochastic.
The growth simulator consists of different submodels, Fig. 1, including a regeneration model predicting number of seedlings and species composition after regeneration cutting.Independent variables in the regeneration model is site index, vegetation type and regeneration method (Hoen et al. 1998).In addition to the regeneration model, the growth simulator consists of growth models predicting basal area growth of individual trees (Andreassen and Tomter 2003), stochastic mortality models for individual trees (Eid and Tuhus 2001) and stochastic recruitment models covering even-aged as well as uneven-aged forests (Lexerød 2005).Plots with young growth were, however, not included in the data material used by Lexerød (2005), and consequently the models do not cover such forests.Hence, stochastic recruitment models covering young growth forests needed to be developed in order to improve projections with the simulator.Several different approaches to recruitment modelling have been applied.Models predicting recruitment with one single equation is the most common approach (e.g.Moser (1972), Adams and Ek (1974), Buongiorno and Michie (1980), Hyink and Moser (1983) and Vanclay (1989)).Recruitment is the result of a stochastic process and, for several species and forest conditions in Norway, recruitment is either present during a particular growth period, or it is not.Hence, if both observations with and without recruitment are retained for estimation of one single equation, normal distribution of residuals will rarely appear.Thus, it will be difficult to evaluate the overall performance of the models.Alternatively, if only observations with recruitment are retained for model development, recruitment will be overestimated.
The stochastic nature of recruitment can be modelled by defining two separate states with the following characteristics; (i) observations can be placed with certainty in either of the two states and (ii) the dependent variable (i.e.number of recruits) is zero for all observations in one state.Such a system could be modelled by using a two-stage modelling approach that is previously applied to recruitment data by Vanclay (1992) and Lexerød (2005).Probability models predicting the probability that recruitment is present during the next simulation cycle are estimated in the first stage.In the second stage, conditional models are estimated in order to predict number of recruits given that recruitment is present.
Separate models for Norway spruce, Scots pine, birch and other broadleaves were developed based on data from permanent sample plots established by the National Forest Inventory (NFI) in Norway.The data covers all geographic regions, forest conditions and silvicultural prescriptions representative for the conditions under which the growth simulator will be used.The modelling was restricted to include independent variables directly or indirectly available from forest inventories for practical forest management planning.

Data Collection and Preparation
According to the Norwegian classification system young growth is defined as development class II.Maximum total stand ages for development class II are 20,20,25,30,35,45 and 55 years for site index 23,20,17,14,11,8 and 6 respectively if the forest is dominated by conifers or other broadleaves.The maximum ages for birch are 15, 15, 20, 25, 25, 25 and 30 years, respectively.Site index (SI) is estimated according to the H 40 system, which is based on the dominant height (i.e.mean height of the 100 largest trees ha -1 ) at breast height age 40 years (Strand 1967, Tveite 1976, 1977).
The NFI in Norway established permanent sample plots between 1986 and 1993.All plots were remeasured from 1994 to 1998, and approximately 60% of the plots, with a representative distribution, were remeasured for the second time in the period 1999-2001.The permanent sample plots cover the forest area of Norway (Finnmark county excluded) in a 3 km × 3 km grid (NIJOS 2000).Diameters were unfortunately not recorded in development class II at the time of establishment.Hence, recruitment in young forest could only be calculated for the second growth period consisting of approximately 60% of plots.The time period between the measurements was five years, and the size of the circular sample plots was 250 m 2 .The total number of available sample plots in development class II, after exclusion of non-productive areas (i.e. annual production < 1 m 3 ha -1 ), was 897 (Table 1).Dominant tree species (spruce, pine or broadleaves) were determined as the single tree species with the largest proportion of crown cover.
Tree species and diameter at breast height (dbh) were recorded for all trees with dbh ≥ 5 cm on each sample plot.In addition, numbers of trees (N) with height larger than 0.3 m was registered within an area of 1000 m 2 which included the 250 m 2 sample plot together with a buffer zone.Altitude (ALT), latitude (LAT), SI, stand age (A) and arithmetic mean height (H) were also based on the area of 1000 m 2 .SI was estimated according to the H 40 system described above for the dominant tree species on the plot, while number of trees and arithmetic mean height were measured separately for conifers and broadleaves.Vegetation type was registered according to the Norwegian classification of vegetation types (Fremstad 1997).Vegeta-tion types having similar influence on recruitment of a particular species were clustered using a method described by Lexerød (2005).The cluster procedure resulted in three different clusters describing good (VT1), intermediate (VT2) and poor (VT3) conditions for recruitment.
Recruitment was calculated for each plot as number of seedlings that reached or exceeded dbh = 5 cm between the two measurements.The proportion of plots with recruitment and number of recruits per plot varied considerably between species (Table 2).Observations of recruits were converted into annual recruitment rates (iN ha -1 yr -1 ) based on observations with recruitment only, and all observations, respectively.Norway spruce (Picea abies L.) and Scots pine (Pinus sylvestris L.) were treated separately while the broadleaved species were defined in  ) and Lodgepole pine (Pinus contorta).Some trees were recruited because of measurement errors (e.g. a tree considered to be outside the plot at the beginning of the growth period was considered to be inside at the end, or a tree inside the plot, but not recorded at the initial measurement was recorded at the remeasurement).In order to exclude such measurement errors it was assumed that a recruited tree could not be larger at the end of the growth period than what could be expected from predicted diameter growth.Single tree basal area growth models (Andreassen and Tomter 2003) were used to predict the expected diameter of recruits if the recruits had a diameter just below the threshold size at the beginning of the growth period.The predicted diameters were used as maximum size of recruits.Recruits that were wind thrown or not alive at the time of remeasurement were also excluded to ensure compatibility with the mortality models.

Model Development
Since only trees with dbh ≥ 5 cm were recorded by the NFI, a threshold size of dbh = 5 cm was used (i.e. the models will predict number of seedlings that reach or exceed a dbh = 5 cm).
Probability of recruitment as well as number of recruits was assumed to depend on location, site conditions and stand characteristics.Independent variables were restricted to those directly or indirectly available from inventories for practical forest management planning.It was important to develop models that were biologically realistic in order to ensure a logical behaviour when applied outside the range of the modelling data.Thus, an appropriate biological interpretation was the main criteria with respect to variable selection.Independent variables, with an appropriate biological interpretation, were selected among those describing location (ALT, LAT), site conditions (SI, VT1, VT2, VT3) and stand characteristics (A, N, PN sp , H,) through visual analysis of data plots, residual analysis and forward and stepwise regression (SAS Institute Inc. 1999).
Given the binary variable 0 = absence and 1 = presence of recruitment, the probability of presence of recruitment over a five-year period π(x) was modelled with the following logistic regression model (Agresti 1996): where α and β are parameters defining the S -shaped function of x.The logistic model has a linear form for the logit of this probability; Model (2) was estimated for each species group with PROC LOGISTIC using maximum likelihood estimation with the logit link function and the Newton-Raphson optimization algorithm (SAS Institute Inc. 1999).The significance of the parameter estimates was tested by means of Wald statistics (i.e.z 2 = β / ASE where β is the parameter estimate and ASE is the corresponding asymptotic standard error (Agresti 1996).The different models were evaluated with the Hosmer and Lemeshow goodness-of-fit test (Hosmer and Lemeshow 1989).
The relationship between the number of recruits (iN ha -1 5yr -1 ) and the independent variables was expressed with the following conditional model: where X' is the transpose of the vector of independent variables, β is a vector of regression coefficients and ε is a vector of random errors.It was assumed that the error term ε had E(ε) = 0 and V(ε) = σ 2 and that the errors were uncorrelated.
The assumption of constant variance is a basic requirement for regression analysis, and model (3) was therefore log-transformed in order to stabilize variance; Model (4) was estimated using PROC REG (SAS Institute Inc. 1999).In order to obtain unbiased estimates for the response variable when converted back to original scale, the intercept term was adjusted by adding half of the variance (Miller 1984).The relationship between the dependent variable and some of the independent variables were non-linear, and these variables were transformed in order to express a linear relationship.The final models were inspected with respect to multicollinearity by calculation of a condition index (Belsley et al. 1980).Since the intercept had no physical interpretation, data were centered before calculation of the condition index (Montgomery and Peck 1992).

Model Evaluation
Validation of a model should in general involve independent data preferably from controlled and replicated experiments measured over a long time (Vanclay and Skovsgaard 1997), or data representing another growth period, sample plot size or representation of forest conditions.Such data were, however, not available because dbh was not recorded in young forests at the time of plot establishment (i.e.only data from one growth period were available in young forests).In addition, no other suitable independent data were available at the present moment.The evaluation of the models was therefore based on the same data as the model development.The overall performance of the models was evaluated by examination of prediction graphs and residual plots, and by calculating the mean differences between predicted and observed number of recruits, together with the corresponding standard deviations, as recommended by Huang et al. (2003) and Kozak and Kozak (2003).In addition, the conditional models were evaluated by means of cross-validation and calculation of the predicted error sum of squares (PRESS).In all evaluations predicted number of recruits was calculated deterministically by multiplying the probability of recruitment with the expected number of recruits.

Results
Variables describing location (ALT, LAT), site (SI, VT1, VT2, VT3) and stand conditions (A, N, PN sp ) were tested as independent variables in the probability models.Neither ALT nor LAT had significant influence on the probability of recruitment for any species group (Table 3).The variables SI and VT indirectly describe site conditions as soil fertility, moisture and competition from herbs, all factors expected to influence recruitment.VT was, however, not significant for any species when N was included in the models, while the influence of SI was significant for all species except birch.The relationship between SI and the probability of recruitment was positive (i.e. the probability increased with increasing SI).Both A and N were significant for all species, and the probability of recruitment increased with increasing values of A and N. The probability of recruitment of a particular species significantly increased with increasing PN sp for all species groups, except other broadleaves.ALT and LAT were only included in the conditional models for other broadleaves (Table 4).The number of recruits decreased when ALT and LAT increased.SI had a significant effect on number of recruits for all species except birch, and number of recruits increased with increasing SI.The relationship with number of recruits where negative when SI was tested as independent variable in the birch model (i.e.number of recruits decreased with increasing SI).Number of recruits were significantly influenced by A and PN sp for all species groups, and number of recruits increased with increasing values of A and PN sp .Number of recruits was expected to increase with increasing N up to an optimal level, with respect to number of recruits, and thereafter slowly decrease with increasing N due to increased competition and mortality.Such a relationship was included in all models except for Scots pine.The parameter estimates for the variable N in the birch model and for the transformation of N in the model for other broadleaves were not significant, but the variables were still included since the sign and magnitude of the parameter estimates were in agreement with the ones estimated for the other species groups.
The Hosmer and Lemeshow test revealed no significant differences between observed and expected number of plots with recruitment (Table 3).The proportion of misclassified plots was 6.7%, 7.9%, 6.9% and 9.0% for Norway spruce, Scots pine, birch and other broadleaves, respectively.The coefficient of determination (R 2 ) Dependent variable is the natural logarithm of number of recruits during the next five years, ln (iN ha -1 5yr -1 ).Threshold size, dbh = 5 cm.a) ALT: altitude; LAT: latitude; SI: Site index, dominant height at breast height age 40 years; A: age; N: number of trees; PN sp : proportion of Norway spruce, Scots pine and broadleaves, respectively.b) ns = not significant (p > 0.10); * p < 0.10; ** p < 0.05; *** p < 0.01 for the conditional models was 0.40, 0.31, 0.16 and 0.19 for Norway spruce, Scots pine, birch and other broadleaves, respectively (Table 4).The coefficient of variation (C.V.) varied from 13.8 to 18.4%, and there were no sign of multicollinearity in any of the models.
The overall mean difference between observed and predicted number of recruits was 9.7, -1.3, -5.2 and -10.6 ha -1 5 yr -1 for Norway spruce, Scots pine, birch and other broadleaves, respectively (Tables 5-8).This corresponds to an overestimation of 7.2% for Norway spruce and an underestimation of 3.0%, 4,8% and 24.3% for Scots pine, birch and other broadleaves, respectively.There were in general few large deviations between predicted and observed number of recruits in different parts of the material (Tables 5-8).In general, the spruce model seemed to slightly overestimate number of recruits when the observed recruitment rate was low, while the models for birch and other broadleaves underestimated number of recruits when forest conditions favoured recruitment of the respective species, as in broadleaved dominated forests.

Data
The main advantage of the NFI data used for model development was that it covered all geographic regions, forest conditions and silvicultural practices in Norway.This was essential since the models will be included in a growth simulator supposed to cover all forest conditions and silvicultural treatments in Norway.Several recruitment models applied in growth simulators has been developed from NFI data to ensure a representative distribution of forest conditions and silvicultural practices (e.g.Sterba et al. (1997), Wikberg (2004) and Lexerød (2005)) The fact that all independent variables used in the presented models were registered on a 1000 m 2 large area including the sample plot (250 m 2 ) may have caused some discrepancies between the dependent variable (number of recruits) observed on the 250 m 2 plots and the independent variables.This is particularly relevant for N and PN sp since these variables may differ when observed on 250 m 2 compared to 1000 m 2 .The problem is probably less important for SI and A since it is reasonable to assume only minor differences in these variables between a 250 m 2 sample plot and a 1000 m 2 large area surrounding it.These discrepancies have, however, probably not biased the parameter estimates, since the difference can be treated as random measurement errors.In addition, the relatively large number of plots has probably reduced the effect of extreme combinations of variables caused by the different sample plot sizes.
There were some uncertainties related to the use of the growth models by Andreassen and Tomter (2003), since the models do not cover development class II, and the accuracy of the models when extrapolated is unknown.Unfortunately, no models predicting diameter growth of individual trees in development class II were available, and the models by Andreassen and Tomter (2003) had to be used.
No independent data were available for model evaluation, and both data splitting and resampling techniques (e.g.cross-validation) were considered.It has, however, been questioned if such data splitting and resampling provides any additional information.Kozak and Kozak (2003), for instance, states that the goodness of fit statistics and lack of fit statistics calculated directly from the data used for model development provides a better and more reliable description of the prediction errors than statistics obtained from splitting the original data set.In addition, models developed from a subset of the entire data set will not be as reliable as models developed from the entire data set because some observations are disregarded.The quality of the model fit can also be assumed to reflect the quality of predictions when, as in this study, the data are collected through a welldesigned and representative sampling-process, when the models will be used within the range of the data and when the models are biologically realistic (Huang et al. 2003).Thus, as long as the model fit was found satisfactory there are probably no reasons for a reduced confidence in the presented models despite the lack of independent test data.
Table 5. Mean differences (D) and standard deviations (SD) for the differences between predicted and observed number of recruits of Norway spruce (ha -1 5 yr -1 ).Table 7. Mean differences (D) and standard deviations (SD) for the differences between predicted and observed number of recruits of birch (ha -1 5 yr -1 ).

Model Development and Evaluation
The relatively low R 2 for the conditional models (Table 4) were probably due to the large variability in observed number of recruits, rather than inappropriate specifications of the models.
Residual plots indicated no serious violation of the assumption of constant variance, nor did they reveal any patterns that might indicate inappropriate specifications of the models.The relatively poor model fit for the species groups birch and other broadleaves were probably due to differences between species within the respective species groups.For instance, the species group other broadleaves consisted of 10 different species.The large deviations between predicted and observed number of recruits of birch and other broadleaves in forests dominated by these species were probably related to limitations in the description of dominant tree species.PN sp in the models for birch and other broadleaves includes both species groups, and the effect of forest type on number of recruits of birches and other broadleaves had probably been estimated more precisely with a better description of species distribution.
A general problem related to recruitment models is that current recruitment depends on regeneration conditions of previous years (Leak and Graber 1976).The long time interval between seedling establishment and recruitment means that conditions (e.g.temperature, moisture) favouring recruitment of a particular species in the recruitment year are, most probably, different from the conditions under which the seedlings were established.Thus, if variables describing temperature and moisture in the year of recruitment are included in the models both number of recruits and species distribution may be biased (Yaussy et al. 1996).This also illustrates the weakness of using a relatively short growth period for model development.
The presented recruitment models are logical with respect to feedback from silvilcultural treatments.Planting and soil scarification increases N and A at a specific point in time because of a larger amount of seedlings and a shorter regeneration period.The positive relationship between N and A on one side and number of recruits on the other ensures that number of recruits increases with an increase in N and/or A. In addition, planting influences the species composition, and hence regulates number of recruits of a particular species through the variable PN sp.If, for example, Norway spruce is planted at a site with naturally regenerated birch and pine seedlings, then the proportion of Norway spruce increases, with increased recruitment of Norway spruce as a consequence.Precommercial thinning will also influence recruitment since stand density, as well as species composition, are represented in the models.If, for example, Scots pine is favoured at the expense of Norway spruce, the presented recruitment models will reflect this since the number of pines and spruces recruited will increase and decrease, respectively.

Model Application
The objective of the present study was to develop recruitment models for application in a growth simulator that will be a part of a large-scale forestry scenario model.Although stands or sample plots are used as basic units for calculations in such models, the target levels with respect to accuracy of the predictions are usually at national, regional or forest-estate level.Detailed studies of the forest dynamics at the stand level are seldom an important part of analysis performed with a large-scale forestry scenario models.It is also important to address the many sources of uncertainty related to the use of large-scale forestry scenario models in general, such as uncertainty related to input variables (e.g.Kangas and Kangas (1999) and Eid (2000)), to model errors of the numerous sub-models used for predictions (e.g.Kangas (1996)), to the stochastic processes of stand development (e.g.Pasanen (1998)) and to future price and cost variations (e.g.Ståhl (1994) and Leskinen and Kangas (1998)).The uncertainties related to the presented models should be seen in this perspective.
The differences between expected and observed number of recruits were relatively large in some strata, especially for birch and other broadleaves.The consequences of such large errors are, however, minor with respect to stand volume.An underestimation of, for example, 50 recruits ha -1 during a five year period will lead to an underestimation in total stem number after five years of 2.5%, if the initial stem number is 1900 stems ha -1 and predicted recruitment is 50 recruits ha -1 instead of 100 ha -1 .In addition, most growth simulators include number of stems as an explanatory variable.The mortality sub-models will therefore compensate for an underestimation in number of recruits since total stand mortality will be underestimated.Diameter growth for each individual tree may also be overestimated if number of stems is underestimated.The overall underestimation considering volume at the stand level will therefore be lower than the corresponding underestimation of number of trees.
The developed models can be applied both deterministically and stochastically.The deterministic approach involves multiplying the probability of recruitment with the expected number of recruits.This approach will produce average values at the plot or stand level.If the models are used stochastically, the probability of recruitment within a plot is calculated.Whether or not recruitment is present is then determined by drawing a random number from a uniform distribution.If the stand is classified as recruited, the number of recruits is predicted directly with the conditional models.Separate models for each species group give a stochastic species distribution if the models are used stochastically.The stochastic approach should be restricted to plots with a plot size corresponding to the one used for model development (250 m 2 ).Mortality should be predicted before recruitment is added to the initial number of stems, if the models are used together with a mortality model, since the presented models predict net amount of recruits (i.e.all recruits are assumed to be alive at the end of the simulation period).
Although the data used for model development were based on 250 m 2 sample plots, the models can be applied deterministically to any plot size and also at the stand level.A small plot size means that the probability of recruitment is relatively small, while the conditional number of recruits per unit area will be relatively large.By calibrating conditional number of recruits per unit area with the probability of recruitment, an appropriate final level is determined.The opposite will be the case if the plot size is large or the models are applied at the stand level.A large plot size gives a large probability of recruitment, but a correspondingly low number of recruits per unit area, and accordingly an appropriate final level.
When the models are applied in the growth simulator described by Gobakken et al. (2005), the number of years that the models are used will vary with species groups and site index according to Table 9.The number of years that the models are used decreases with increasing site index for all species groups except birch, because of the positive effect of site index on the probability of recruitment as well as number of recruits.For birch there is no positive effect of site index, and hence the number of years to use the models does not vary with site index.Once the number of recruits is predicted with the probability and conditional models presented in this paper, each recruited tree needs to be given a diameter and height before its growth can be projected by the simulation system.Hence, the growth simulator includes a diameter distribution function and a diameter/height function (Gobakken et al. 2005) Usually only larger stems are sampled in inventories for practical forest management planning, and smaller stems remain unsampled.This means that data "censorship" may occur when the threshold size used in the recruitment model is smaller than the minimum size of the trees sampled.Data "censorships" will bias prediction of stand development.One solution to avoid this is, of course, to sample all stems above the threshold size.Other possibilities are to develop models that either predicts the number and size of small stems at the time of the inventory (Randall et al. 1988), or to make the recruitment models more flexible (see e.g.Shifley et al. (1993)).

Conclusion
Both the probability models and the conditional models were biologically realistic and logical.
The large random error levels were probably due to lack of independent variables describing the stochastic process of seedling establishment rather than inappropriate model specification.Provided the general uncertainty connected to analysis performed with large-scale forestry scenario models, and the stochastic nature of recruitment, the presented models seem to give a satisfactory level of accuracy.However, since the models were developed from data collected over a relatively short time period they should be evaluated and, if necessary, revised or re-estimated when more data becomes available.

Fig. 1 .
Fig. 1.Overview of the different sub-models in the growth simulator.The figure shows at which states the sub-models are used and how they interact with silvicultural treatments.

Table 2 .
Total number of observations, observations with recruitment and annual recruitment rate for observations with recruitment and for all observations by tree species.

Table 3 .
Parameter estimates for the probability models.

Table 4 .
Parameter estimates, coefficient of determination (R 2 ), square root of mean square error (RMSE), coefficient of variation (C.V.), condition index (C.I.) and predicted error sum of squares (PRESS) for the conditional models for number of recruits.

Table 6 .
Mean differences (D) and standard deviations (SD) for the differences between predicted and observed number of recruits of Scots pine (ha -1 5 yr -1 ).

Table 8 .
Mean differences (D) and standard deviations (SD) for the differences between predicted and observed number of recruits of other broadleaves (ha -1 5 yr -1 ).

Table 9 .
Gobakken et al. (2005)resented models are used for prediction of recruitment in young growth forests in the growth simulator described byGobakken et al. (2005).