Imputing stem frequency distributions using harvester and airborne laser scanner data: a comparison of inventory approaches

Stem frequency distributions provide useful information for pre-harvest planning. We compared four inventory approaches for imputing stem frequency distributions using harvester data as reference data and predictor variables computed from airborne laser scanner (ALS) data. We imputed distributions and stand mean values of stem diameter, tree height, volume, and sawn wood volume using the k-nearest neighbor technique. We compared the inventory approaches: (1) individual tree crown (ITC), semi-ITC, area-based (ABA) and enhanced ABA (EABA). We assessed the accuracies of imputed distributions using a variant of the Reynold’s error index, obtaining the best mean accuracies of 0.13, 0.13, 0.10 and 0.10 for distributions of stem diameter, tree height, volume and sawn wood volume, respectively. Accuracies obtained using the semi-ITC, ABA and EABA inventory approaches were significantly better than accuracies obtained using the ITC approach. The forest attribute, inventory approach, stand size and the laser pulse density had significant effects on the accuracies of imputed frequency distributions, however the ALS delay and percentage of deciduous trees did not. This study highlights the utility of harvester and ALS data for imputing stem frequency distributions in pre-harvest inventories.


Introduction
Accurate and high-resolution information on forest attributes is essential for optimizing forest management and wood procurement (Duvemo and Lämås 2006).In the Nordic countries, forest stands are the primary unit in forest planning.Decision support systems used in forest planning rely on information on forest resources, commonly in the form of stand averages and totals, such as basal area, tree height and numbers of stems (Eid and Hobbelstad 2000).Besides averages and totals, stem frequency distributions, such as diameter distributions, have been a key source of information both in long-term forest planning (Saad et al. 2015) and tactical, pre-harvest planning (Peuhkurinen et al. 2007).Distributions provide information on potential timber assortments, the forest structure, and the silvicultural state of a stand (Uuttera and Maltamo 1995; Kuuluvainen et al. 1996).In addition, information on individual trees can be derived from the distributions, which allow for simulating future stand development (Wikström et al. 2011).
Airborne laser scanning (ALS) has been a main data source for forest inventory over the past decades (Maltamo et al. 2014).ALS data provide spatially continuous and detailed information on forest structure as well as terrain height and allow for estimating a range of forest attributes.Two main inventory approaches are commonly used in estimating forest attributes from ALS data.The area-based approach (ABA; Naesset 2002) is used most frequently, by which ALS metrics are calculated for sample plots and linked to the plots' forest attributes in statistical models.The models are then used to predict the corresponding attributes over a grid tessellating the inventory area.Alternatively, the individual tree crown (ITC) approach has been proposed to provide treelevel information for tree crown segments delineated from ALS data (Brandtberg 1999;Hyyppä 1999).In the ITC approach, tree crowns of individual trees are segmented from ALS data and tree attributes are then predicted from the ALS data for each segment (Dalponte et al. 2018).
The ABA approach is practical in terms of field work, as co-location between field plot data and ALS data is conveniently done using coordinates of sample plot centers.However, the ABA can in some cases be susceptible to edge effects along borders of sample plots and grid cells, potentially leading to systematic errors (Naesset et al. 2013;Packalén et al. 2015).The ITC approach, on the other hand, relies on the capabilities of algorithms to derive the positions and heights of detected trees from the ALS data (Pascual 2019).Resultingly, only 30-70% of trees can be expected to be delineated (Kaartinen et al. 2012;Vauhkonen et al. 2012), whereby larger trees are more likely to be detected (Solberg et al. 2006).Further, tree crown segments are assumed to contain only a single tree, which can result in additional systematic errors in estimates of forest attributes (Persson et al. 2002;Peuhkurinen et al. 2011).
Variations on the ABA and ITC approaches have emerged to mitigate the mentioned sources of error.Packalén et al. (2015) proposed the enhanced ABA (EABA), in which edge effects due to tree crowns overlapping boundaries of sample plots and grid cells are accounted for.In the approach, boundaries of plots and cells are adjusted for tree crowns segmented from the ALS data, whereby crown segments of "in" trees are merged with the cell, and crown segments of "out" trees are discarded from the cell.This has been shown to resolve edge effects as well as co-registration between reference data and ALS data (Pascual 2019).Breidenbach et al. (2010) proposed the semi-ITC approach to account for systematic errors resulting from trees missing in the ITC data.In the semi-ITC approach, tree crown segments can contain single, multiple or no trees, which can reduce systematic errors substantially as shown by Kandare et al. (2017).
Besides being a useful data source for prediction of forest attributes at tree-and plot-level, ALS data have proven useful for estimating stem frequency distributions, mainly diameter distributions (Gobakken and Naesset 2004;Maltamo et al. 2006).However, few studies have assessed the use of ALS data for prediction of distributions of attributes other than stem diameter.Peuhkurinen et al. (2008) predicted height-diameter distributions at stand level to subsequently predict saw log recoveries from a stem bank.Mehtätalo et al. (2015) demonstrated a method for predicting tree height distributions from ALS data at plot level.Barth et al. (2015) used individual tree data obtained from an ALS inventory to predict top diameter distributions of sawlogs.Tompalski et al. (2015) predicted volume distributions by downscaling plot-level volume predictions obtained from ALS.However, there is a need to further assess the use of ALS data for predicting distributions of forest attributes other than stem diameter at stand level.Such distributions can provide tree-level data useful for preharvest planning, allowing for the assessment of potential assortments and timber values.
Methods for predicting stem frequency distributions from ALS data have mainly been based on parameter estimation of theoretical distributions (Gobakken and Naesset 2004;Thomas et al. 2008).Other methods have been proposed, such as the parameter recovery technique (Mehtätalo et al. 2007), or non-parametric methods by which tree data are imputed using nearest neighbor techniques (Packalén and Maltamo 2008;Räty et al. 2018).In the latter case, tree lists are imputed for target units based on a selected number of reference observations that are most similar, or nearest, in a space of ALS metrics.For this purpose, the k-nearest neighbor (kNN; Cover and Hart 1967) technique has been used widely, whereby values are imputed for target units based on the k-nearest neighbors in the reference data.However, kNN techniques require large reference data sets to ensure that target observations have sufficient neighbors that accurately describe the target observation, which can be expensive and time-consuming to obtain.
In recent years, data from cut-to-length harvesters have emerged as a valuable data source for forest inventory (Kemmerer and Labelle 2020).With advancements in harvester positioning systems, harvested trees can now be georeferenced with sub-meter accuracy (Hauglin et al. 2017;Noordermeer et al. 2021), providing a range of opportunities for forest inventory (Lindroos et al. 2015).Prior research has demonstrated that harvester data, georeferenced with sub-meter accuracy, can be used as ground reference data in ALS-based forest inventories (Hauglin et al. 2018;Maltamo et al. 2019).Furthermore, Maltamo et al. (2019) reported better accuracies in predicting stand-level stem diameter distributions using harvester data than previous studies that relied on field plot data.Despite these promising findings, there remains a need to evaluate the suitability of harvester and ALS data for imputing stem frequency distributions by forest attributes other than stem diameter.Therein, a comparative assessment of inventory approaches is necessary to inform operational inventory practices.
The main objective of this study was to compare ALS-based inventory approaches for imputing stand-level stem frequency distributions using harvester data as reference data.In principle, distributions of all variables recorded by cut-to-length harvesters can be imputed.In the current study, we imputed stand-level distributions of stem diameter, tree height, volume and sawn wood volume, and compared four ALS-based forest inventory approaches: (1) individual tree crown approach (ITC), semi-ITC, area-based approach (ABA) and enhanced ABA (EABA).

Study overview
Fig. 1 shows an overview of the study design.We used data collected by a cut-to-length harvester as reference data for imputing stand-level stem frequency distributions based on ALS data.The harvester was equipped with a real-time kinematic Global Navigation Satellite System (GNSS) and a system for positioning the crane tip (details are provided in following sections).The positioning system allowed locations of harvested trees to be determined with submeter accuracy.1. Study design to compare four inventory approaches for imputing stand-level distributions of stem diameter, tree height, volume and sawn wood volume using harvester and ALS data.We assessed the accuracies of the approaches by comparing the imputed distributions with reference distributions recorded by the harvester, as well as stand mean values.We further assessed which factors influenced the obtained errors.
We segmented trees from the ALS data and delineated harvested areas from the harvester data.We tessellated polygons of harvested areas using a regular grid of 250 m 2 and used grid cells that were located completely within those polygons as ABA harvester plots.We then used polygons of segmented tree crowns that intersected ABA grid cells to adjust cell borders to obtain EABA plots.We used the same tree crown segments as observations for the ITC and semi-ITC approaches.For the ITC approach, we spatially joined harvested trees of which the coordinates of the tree top (x,y,z) in the harvester data were nearest to the coordinates of the tree top detected in the ALS data.For the semi-ITC approach, we spatially joined all trees to their nearest tree crown segment.We computed ALS metrics for all ABA and EABA plots and tree crown segments.
We used the kNN technique to impute stem frequency distributions of stem diameter, tree height, volume and sawn wood volume from ALS data.In the imputation, individual tree data from nearest neighbors were used to impute stem frequencies within class values of the studied forest attributes.We compared the imputed distributions with observed distributions recorded by the harvester.Finally, we performed an error analysis in which we assessed how the forest attribute, inventory approach, imputation technique, stand size, ALS pulse density, delay between ALS data acquisition and harvesting, and percentage of deciduous trees in the stand influenced errors in imputed distributions.
The harvester data were collected in the period between March 2019 and June 2021 using a Komatsu 931XC harvester with a 230H crane and a C144 harvester head.The harvester was equipped with a Septentrio AsteRx-U real-time kinematic GNSS comprising two antennas of which positions and rotations were logged at a one second rate in the National Marine Electronics Association (NMEA) format.Locations of harvested trees were recorded with a planimetric accuracy of approximately 1 m, using positions and rotations obtained from the GNSS and the crane extension as measured by sensor hardware (Noordermeer et al. 2021).
Harvester production report (HPR) files were exported after each operation in the StanFord 2010 format (Arlinger et al. 2012).The HPR files included data on tree species, diameter measurements at 10 cm intervals along the stem, log volumes over bark and crane tip coordinates.For each harvested tree, we computed the stem diameter as the diameter (over bark) recorded by the harvester at a length of 110 cm along the stem, the volume as the sum of log volumes (over bark) recorded by the harvester and sawn log volume as the sum of sawn log volumes.Further, because stem profiles only contained data on the processed part of the stem, we predicted the total tree height from the stem profile using taper models (Hansen et al. 2023).
We generated polygons of harvested areas as unary unions of buffers around the positions of harvested stems (Fig. 1, panel 4).We tessellated polygons of harvested areas into grid cells of 250 m 2 using a regular grid.The grid size conformed to conventional practices in commercial ALS-based forest inventories in Norway, where sample plots and grid cells of 250 m 2 are used in an area-based inventory approach.We defined stands as clusters of grid cells with a minimum total size of 0.2 ha, and that were located completely within the spatial extent of polygons of harvested areas (Fig. 1, panel 7).The minimum size of 0.2 ha conformed to the typical minimum size of forest stands in commercial Norwegian forest inventories.The resulting dataset comprised 49 stands containing 102 289 trees of which 89% were spruce, 7% were pine and 4% were deciduous.Stand sizes ranged from 0.2 to 5.3 ha with a mean of 1.0 ha.A summary of the harvested trees is provided in Table 1.

Airborne laser scanner data
Four ALS surveys covered the harvested stands, acquired in 2013, 2016, 2017 and 2019 with different instruments and acquisi tion parameters (Table 2).The ALS data covered different areas which overlapped in some places.The elapsed time between ALS acquisition and harvesting varied from zero to eight years with a mean of five years.The contractors Blom Geomatics AS and Terratec AS processed the raw ALS data and classified laser echoes as ground or non-ground.For each harvesting operation, we used the most recently acquired ALS data.We used the lidR package (Roussel et al. 2020) in R (R Core Team 2022) to construct digital terrain models as triangulated irregular networks (TIN) from the laser echoes classified as ground.We then normalized the ALS data by computing the height relative to the terrain height of the TIN for all echoes.

Tree crown segmentation
We generated canopy height models for the harvested stands as rasterized values of maximum ALS height with a spatial resolution of 0.5 m.We located trees in the canopy height models using the locate_trees function of the lidR package, which implements a local maximum filter proposed by Popescu and Wynne (2004), using a window size of 5 m.We then segmented trees from the canopy height model using the dalponte2016 function of the lidR package, using default values.The function implements the segmentation algorithm proposed by Dalponte and Coomes (2016).We computed the height and coordinates (x,y,z) of the highest laser echo within each crown segment.The resulting tree crown polygons represented spatial extents of tree crowns and contained coordinates of treetops detected in the ALS data.

Polygons of grid cells and tree crown segments
Our analysis required polygons of grid cells, hereafter harvester plots, and tree crown segments as reference and target observations.For the ABA inventory approach, we used the regular grid cells of 250 m 2 (Fig. 1, panel 7) as harvester plots, i.e., the grid cells located completely within the spatial extent of polygons of harvested areas.The resulting dataset comprised 2040 harvester plots.
For the EABA inventory approach, we used polygons of segmented tree crowns that intersected the ABA plots to adjust plot borders following the approach proposed by Packalén et al. (2015).
We merged each tree crown segment with the intersecting ABA plot with which it overlapped most (Fig. 1, panel 8).For the ITC approach, we used the same tree crown segments that intersected the ABA plots, which we spatially joined with the nearest harvested trees.In case of multiple harvested trees being joined with a given tree crown segment, we used the harvested tree of which the coordinates of the tree top (x,y,z) were nearest to the coordinates of the tree top detected in the ALS data.For the semi-ITC approach, we used the same tree crown segments as those used for the ITC approach, and spatially joined them to the nearest harvested tree.However, multiple harvested trees were allowed for a given tree crown segment.Resultingly, some semi-ITC segments were empty, some contained a single tree, and some contained multiple trees.For each inventory approach, we generated tree lists for the harvester plots or tree crown segments.

Airborne laser scanner metrics
We extracted ALS echoes from within harvester plots and tree crown segments and computed ALS metrics from first (and single) laser echoes.We computed the maximum height, mean height, standard deviation, skewness and kurtsosis.We further computed the percentage of echoes above the mean height and 2 m, and the normalized Shanon diversity index proposed by van Ewijk et al. (2011).We computed echo heights at the 10th, 20th, …, 90th percentiles of the height distributions as well as the 95th percentile.Finally, we divided the height range between the lowest canopy echo > 2 m and the 95th percentile into 10 fractions of equal height and computed canopy density metrics (D0, D1, …, D9) as the proportion of echoes above each fraction divided by the total number of echoes.

Imputation of stem frequency distributions
We imputed stand-level distributions of stem diameter, tree height, volume and sawn wood volume from the ALS data using the four inventory approaches: ITC, semi-ITC, ABA, and EABA.We used the kNN technique to impute tree lists for each target observation based on the nearest neighbors, i.e., observations in the reference data of which a selection of ALS metrics were most similar to the corresponding ALS metrics of the target observation.We applied three-fold spatial cross validation, whereby we removed one third of the stands at a time as target data and used the remaining stands as reference data.We assigned spatially close stands to the same folds using k-means clustering.
For each fold, we imputed tree lists for target observations using the Euclidean distance metric.We then selected ALS metrics as predictor variables and determined the number of nearest neighbors.
In the following sections, the methods used for the imputation are explained in depth.

Nearest neighbor imputation
When nearest neighbor techniques are used for imputation, predictor variables should be selected to determine the similarity between pairs of observations (McRoberts et al. 2015).We used the leaps package (Lumley 2004) for the selection of ALS metrics as predictor variables for each of the four studied forest attributes separately.We used the regsubsets function to find optimal subsets of predictor variables for linear regression models with stem diameter, tree height, volume and sawn wood volume as response variables.We used exhaustive search and set a maximum of five predictor variables in each subset.We used the caret package (Kuhn 2008) to select the number of neighbors, k, for the four forest attributes separately.We used potential values of k = 5, 7, …, 15, and selected the value of k that minimized the root mean square error.We then used the yaImpute package (Crookston and Finley 2008) in R to identify the k-nearest neighbors in the reference data, based on the selected ALS metrics and k.

Removal of potentially erroneous reference data
A preliminary analysis revealed possible co-location errors between harvester and ALS data, i.e., harvester plots and tree crown segments of which ALS heights differed substantially from the heights of harvested trees (Fig. 3).Such large deviations in height may have been caused by errors in the harvester data or temporal decorrelation caused by the delay between ALS acquisition and harvesting.We identified and excluded such observations in the reference data prior to the imputation, similar to methods used by Breidenbach et al. (2010).We fitted a simple linear regression model with the maximum tree height obtained from the harvester data as the response variable and the 95th percentile of ALS height as the predictor variable.We then labeled those observations in the reference data that had a Cooks distance > 0.5 of the mean Cooks distance as erroneous.We removed erroneous observations only from the reference data, i.e., not from the target data, prior to the nearest neighbor imputation in each fold.

Imputation of tree lists
We imputed sets of trees for target observations following the methods described by Packalén and Maltamo (2008): where T uj is a set of trees imputed for the target observation j based on k reference observations denoted u, each containing 1, 2, …, n trees.For example, t u j 1 1 is the first tree of the first reference observation used for imputation for target observation j.We computed the similarities between each target observation and the selected k-nearest neighbors in the reference data, as the inverse of the distance matrices.We then used the similarities as weights in imputing the resultant tree lists for target observations: where T j is the imputed tree list for target observation j and W kj is the similarity between reference observation u and target observation j.Thus, each stem frequency in each class of a reference observation was weighted with the similarity between the target and reference observation.For each target observation, we obtained an imputed stem frequency for each class within the distributions of the studied forest attributes.The classes covered the range of the observed tree attributes in the harvester data, divided into 20 intervals of equal size for all the studied attributes.Finally, from the stand-wise imputed tree lists, we estimated the mean stem diameter, tree height, volume, sawn wood volume and number of stems for each stand, which in the latter three cases we scaled to a per ha unit.

Accuracy assessment
We compared the imputed distributions with observed distributions obtained from the harvester data at stand level.Because the harvested trees included in ABA and EABA plots differed slightly in some cases due to differences in cell borders (Fig. 1), the observed distributions were computed for the two approaches separately.The observed distributions obtained for the EABA approach also served as observed distributions for the ITC and semi-ITC approaches because the spatial extent of tree crown segments corresponded, and the trees included in the reference data were thus identical.We assessed the accuracies of imputed distributions for each stand using an error index (EI) proposed by Packalén and Maltamo (2008): where c is the number of classes in the frequency distribution, f i is the reference stem frequency within class i, ˆi f is the corresponding imputed stem frequency, N is the reference number of stems within all classes and N is the imputed number of stems within all classes.The EI ranges from 0 to 1 and is inversely proportional to accuracy, i.e., a lower value implies better accuracy.We further assessed the accuracies of stand mean estimates using the root mean square error relative to the mean: where n is the number of stands, X i is the observed stand-level forest attribute in stand i, ˆi X is the corresponding estimated forest attribute, and X is the mean value of the studied forest attribute for all stands.We further computed the mean difference between observed and estimated values, relative to the mean:

Error analysis
We assessed whether the errors obtained for the imputed distributions could be explained by the forest attribute, inventory approach, imputation technique, stand size, pulse density, number of years between ALS and harvester data acquisition (ALS delay), or the percentage of deciduous trees (%Deciduous).We fitted a linear model with the stand-wise EI obtained for all forest attributes, inventory approaches, and imputation techniques as the response, and the forest attribute, inventory approach, stand size, pulse density, ALS delay, and percentage of deciduous trees as predictor variables.The data used for the modelling were characterized by a hierarchical structure whereby error indices were repeated measurements for stands.We accounted for this dependency by fitting a mixed model with a random intercept for the stand.We fitted the model using the lme4 package (Bates et al. 2015) in R. We selected the following model form based on model diagnostics and analysis of variance comparisons: where S i is the random intercept for stand i, β 0,1, ... ,10 are parameters to be estimated, Attri bute Height , Attri bute Volume , and Attri bute SawnVolume are indicator variables representing the forest attributes tree height, volume and sawn wood volume, respectively, where stem diameter is the baseline, Approach ITC , Approach semi-ITC , and Approach EABA are indicator variables representing the inventory approaches ITC, semi-ITC and EABA, respectively, where ABA is the baseline, StandSize is the stand size in ha, PulseDensity is the mean number of laser pulses per m 2 in the stand, ALSdelay is the number of years between ALS and harvester data acquisition, %Deciduous is the percentage of deciduous trees in the stand and ε is the model error.We tested whether the predictor variables significantly influenced the error indices by constructing 95% confidence intervals of parameter estimates from the likelihood profile.Further, we assessed how the predictor variables affected the stand-level error indices based on estimates obtained for β 7 -β 10 .

Imputation of stem frequency distributions
Fig. 3 shows an example of observations used as reference data in the imputation and observations labeled as erroneous and therefore not used for nearest neighbor imputation.The percentage of trees labeled as erroneous ranged from 20% to 32% for the four inventory approaches and three folds.The number of nearest neighbors selected for the four inventory approaches and in the three folds ranged from 11 to 15 with a mean of 13.
Error indices obtained for the stand-wise imputed distributions for the four inventory approaches are shown in Fig. 4. We obtained smallest error indices for imputation of volume distributions using the semi-ITC, ABA, and EABA inventory approaches.Error indices were similar across forest attributes.We obtained best accuracies, i.e., smallest values of EI, of 0.13, 0.13, 0.10 and 0.10, for distributions of stem diameter, tree height, volume, and sawn wood volume.
Fig. 5 shows reference and imputed distributions of the four attributes for a representative example stand.Distributions imputed following the ITC approach showed systematic errors, with underpredicted stem frequencies in smaller classes.Imputed stem frequencies obtained for the remaining inventory approaches tended to be similar.

Estimation of stand-level forest attributes
The results of the stand-level estimation of mean values are shown in Table 3.Values of RMSE% obtained for the ITC approach were substantially better than those obtained for the remaining approaches.Apart from the ITC approach, values of MD%, i.e., systematic errors, were < 10% for stand mean estimates of all forest attributes.Overall, the semi-ITC approach gave the best mean accuracies.

Error analysis
The results of the fitted model used in the error analysis (Eq. 6) are shown in Fig. 6.The forest attribute, inventory approach, stand size and the laser pulse density significantly influenced the stand-level EI (p < 0.05).On average, values of EI increased by 20.4 when the ITC approach was used, in comparison to the baseline which was the ABA approach.Values of EI obtained for the semi-ITC and EABA approaches did not differ significantly from the ABA approach.Increased ALS delay and the percentage of deciduous trees in the stand did not cause larger errors in imputed frequency distributions.

Discussion
Stem frequency distributions can provide detailed information useful for pre-harvest forest planning.We compared four inventory approaches for imputation of stem frequency distributions by four forest attributes.We further assessed which factors influenced the accuracy of the imputed distributions.
Most previous studies on predicting distributions from ALS data have focused on stem diameter distributions and used conventional field plot measurements as reference data.For example, Packalén and Maltamo (2008) predicted species-specific stem diameter distributions and compared nearest neighbor imputation with parameter estimation of a theoretical (Weibull) distribution.The nearest neighbor technique outperformed the parametric technique, and they obtained  values of EI of 0.33 and 0.21 in predicting Norway spruce stem diameter distributions, for the two approaches, respectively.We obtained better accuracies for imputed stem diameter distributions in this study, i.e., a mean EI of 0.13 obtained for the semi-ITC, ABA and EABA approaches.This may have been due to the relatively large numbers of reference observations used in this study, i.e., 2040 harvester plots and 11 975 tree crown segments, in comparison to <500 field plots used in the mentioned study.In addition, the mentioned study used data comprising various stand development classes and a mixture of species, as opposed to mature spruce stands ready for harvest.Furthermore, the mentioned study used data collected from an average of seven plots per stand, while we used harvester data which comprised entire stands.In another study, Söderberg et al. (2021) used harvester data from clear-felled stands and ALS data to impute diameter distributions at stand level.Indeed, that study reported values of EI of 0.13-0.14,i.e., equivalent to the level of accuracy obtained here.Maltamo et al. (2019) imputed stand-level stem diameter distributions using harvester reference data and the kNN technique.The study reported better accuracies than previous studies using field plot data; values of EI ranged from 0.15 to 0.19, depending on the size of the ABA harvester plots used in the reference data, however 0.15 when a plot size of 200 m 2 was used and comparable to our plot size of 250 m 2 .In this study, we obtained a similar level of accuracy, i.e., mean EI of 0.13 obtained for the ABA, EABA and semi-ITC inventory approaches, confirming the accuracy reported in the mentioned study.
Accuracies of imputed distributions of tree height, volume and sawn wood volume were similar to, or better than, accuracies obtained for imputed distributions of stem diameter (Fig. 4).This result highlights the utility of harvester and ALS data for imputing distributions of forest attributes other than stem diameter.In a previous study on predicting stand-level top diameter distributions of sawlogs using harvester and ALS data, Barth et al. (2015) obtained values of EI of 0.34 and 0.15 for all diameter classes and diameter classes ≥15 cm, respectively.The mentioned study demonstrated the utility of harvester data as reference data in predicting product recovery in logging operations.In another study, however, Mehtätalo et al. (2015) used ALS data to predict plot-wise distributions of tree height using a parametric approach.The study reported unsatisfactory results based on comparisons of the predicted distributions with empirical data and therefore advised against practical use of the method.
We obtained significantly poorer accuracies for the ITC approach compared to other inventory approaches.This result confirms findings of previous studies in that the ITC approach can lead to systematic errors due to trees not being detected in the ALS data and thus missing in the reference data (Breidenbach et al. 2010;Maltamo et al. 2004).Smaller trees are less likely to be detected and delineated from the ALS data, and resultingly, smaller trees are typically missing in the imputed distributions (Bergseng et al. 2015).In previous studies assessing tree detection from ALS data, detection rates have varied considerably and typically ranged between 25% and 80% (Kaartinen et al. 2012).The detection rate has been found to be strongly affected by forest structure (Falkowski et al. 2008;Vauhkonen et al. 2012).In this study, a total of 43 371 trees were segmented from the ALS data within the harvested stands, while the harvester data comprised 102 289 trees within the corresponding areas, i.e., a detection rate of 42%.Thus, these trees not detected in the reference data are likely the main cause of the relatively poor results obtained for the ITC approach.
Error indices obtained for distributions imputed using the semi-ITC and ABA inventory approaches were similar for most forest attributes, as well as errors in stand-level estimates of those attributes.This result was in accordance with findings from Rahlf et al. (2015) who compared the two approaches in predicting forest attributes from image-based point cloud data, and obtained similar accuracies for the two approaches.By accounting for edge effects along harvester plot borders, we assessed whether the EABA would result in better accuracies than the ABA.The error analysis revealed that this was not the case.This finding contrasted with earlier studies that showed that by resolving edge effects and co-location errors between reference data and ALS data, better accuracies can be obtained using an EABA (Packalén et al. 2015;Pascual 2019).
In previous studies, the integration of ITC and ABA approaches has yielded promising results for ALS-based imputation of tree-level data (Breidenbach et al. 2012;Lindberg et al. 2010).In a study on predicting stem diameter distributions from ALS data, Xu et al. (2014) found that combining the ITC and ABA approaches can improve the accuracy of predicted distributions.The study compared two methods, namely (1) replacement, whereby large diameter classes in ABA-derived distributions were replaced by ITC-derived trees, and (2) by matching histograms obtained with ABA and ITC.The best accuracy was obtained by the second approach, whereby the accuracy of predicted distributions only increased slightly due to the ITC-derived information.
Our findings demonstrate that harvester data can be linked to ALS data to impute stem frequency distributions of forest attributes other than stem diameter with satisfactory accuracy.This suggests the potential applicability of the proposed methods in ALS-based inventories.However, harvester data primarily comprise mature forest stands ready for harvest.This can introduce sampling biases, and, consequently, lead to systematic errors in forest inventory based on harvester data, as demonstrated by Räty et al. (2022).Their study revealed that harvester data proved more useful in productive forests than unproductive forests, although they reported systematic errors in both cases.The study also showed that such systematic errors could be mitigated when a probability sample of field measurements obtained from sample plots is available within the inventory area.Thus, while the methods proposed here show promise in pre-harvest inventory of mature forest stands, operational application of the methods in other forest types may require supplementary sample plot data for unbiased results.Therefore, further research is needed to assess whether the methods are applicable to more variable forest types, potentially supported by samples of field plots typically used in conventional ALS-based inventories.

Conclusions
This study highlights the utility of harvester and ALS data for imputing stem frequency distributions in mature spruce-dominated stands.The choice of inventory approach had great impact on the accuracies of the imputed distributions.We obtained significantly better accuracies for the semi-ITC, ABA and EABA inventory approaches than the ITC approach.The forest attribute, inventory approach, stand size and the laser pulse density had significant effects on the accuracies of imputed frequency distributions, however the ALS delay and percentage of deciduous trees did not.

Declaration of openness of research materials, data, and code
The harvester data used in this study are owned by the contractor, Valdres Skog AS, and therefore not openly available.The ALS data are available at https://hoydedata.no.The statistical code is available upon request.

Silva
Fig.1.Study design to compare four inventory approaches for imputing stand-level distributions of stem diameter, tree height, volume and sawn wood volume using harvester and ALS data.We assessed the accuracies of the approaches by comparing the imputed distributions with reference distributions recorded by the harvester, as well as stand mean values.We further assessed which factors influenced the obtained errors.

Fig. 3 .
Fig. 3. Heights of harvested trees plotted against the 95 percentile of laser echo heights for observations in the reference data used for an example fold and the individual tree crown approach.Observations used as reference data in the imputation are shown as transparent blue circles and observations labeled as erroneous are shown as red dots.

Fig. 4 .
Fig. 4. Box plot of error indices obtained for stand-level imputed distributions of the studied attributes, obtained for the four inventory approaches: individual tree crown (ITC), semi-ITC, area-based approach (ABA) and enhanced ABA.

Fig. 5 .
Fig.5.Imputed stem frequency distributions obtained for a representative example stand, i.e., with an error index nearest to the mean error index obtained for all stands, recorded by the harvester and imputed using the four inventory approaches: individual tree crown (ITC), semi-ITC, area-based approach (ABA) and enhanced ABA (EABA).

Fig. 6 .
Fig. 6.Parameter estimates (bars) and 95% confidence intervals (whiskers) obtained for the fitted model used in the error analysis.Variables that significantly influenced the obtained error indices (p < 0.05) are labeled with an asterisk.

Table 1 .
Summary statistics (mean and range) of stem diameters, tree heights predicted with taper models, volumes (over bark), and sawnwood volumes (over bark) of harvested trees.

Table 3 .
Errors obtained for estimates of stand-level forest attributes, as root mean square errors and mean differences relative to the observed mean values (RMSE%, MD%), for the four inventory approaches: individual tree crown (ITC), semi-ITC, area-based approach (ABA) and enhanced ABA (EABA).