Improving Satellite Image Based Forest Inventory by Using A Priori Site Quality Information

The purpose of this study was to test the benefits of a forest site quality map, when applying satellite image based forest inventory. By combining field sample plot data from national forest inventories with satellite imagery and forest site quality data, it is possible to estimate forest stand characteristics with higher accuracy for smaller areas. The reliability of the estimates was evaluated using the data from a standwise survey for area sizes ranging from 0.06 ha to 300 ha. When the mean volume was estimated, a relative error of 14 per cent was obtained for areas of 50 ha; for areas of 30 ha the corresponding figure was below 20 per cent. The relative gain in interpretation accuracy, when including the forest site quality information, ranged between 1 and 6 per cent. The advantage increased according to the size of the target area. The forest site quality map had the effect of decreasing the relative error in Norway spruce volume estimations, but it did not contribute to Scots pine volume estimation procedure.


Introduction
In Finland, national forest inventories (NFI), based on field sample plots, have been carried out since the 1920s. Nowadays, additional data from satellite imagery are used to obtain more accurate results (Tomppo 1992a). Several studies have utilized satellite data to generalize the NFI field sample plot information over specific inventory areas in Fennoscandia (Green et al. 1981, Jonasson 1987, Kilkki et al. 1988, Poso et al. 1990, Thomas 1990). To provide estimates for smaller areas, such as individual municipalities, Kilkki and Päivinen (1987) proposed a new method called "the reference sample plot method", in which satellite data are utilized in searching for reference sample plots for pixels within an inventory area.
Several applications have also been developed for obtaining better precision, even down to the forest stand level. Predictions of stand characteristics from regression models for image segmentation based stands (Tomppo 1986, Häme et al. 1988, Hagner 1989) have been used in growing stock estimation and forest monitoring (Häme 1986), as well as in forest site quality estimation (Tomppo 1992b). The spectral stratification of satellite image pixels has also been successfully used in the estimation of forest stand characteristics (Poso et al. 1987).
Satellite image classification accuracy can be increased by using information from ancillary sources, such as topographical maps (Strahler 1980, Hutchinson 1982, Poso et al. 1984, Poso et al. 1987, Skidmore 1989, Tomppo 1990, 1992b, Franklin 1994) and image spatial characteristics (Peddle and Franklin 1991, Franklin and Wilson 1992, Tomppo 1992b). Ancillary data can be used either before, during or after classification, through stratification, classifier operations or post-classification sorting (Hutchinson 1982). The use of ancillary data prior to classification requires division of the target area into smaller strata based on some criteria.
The purpose of this study was to test the applicability of forest site quality maps as an ancillary data source to improve forest inventory based on satellite imagery and the NFI sample plot data covering large areas. The main advantage of using forest site quality maps could be expected when estimating volume increment, because determination of the different quality classes is based on forest site fertility. The reference sample plot method (RSP) is used to combine the field sample plots and satellite data (Kilkki and Päivinen 1986, Tomppo 1990). This method, which may be seen as a combination of the traditional nearest neighbour classification and stratification approaches, has been further developed by Muinonen and Tokola (1990). The method is based on the assumptions that all field sample plots within a large area, which is typically the area of one satellite image or a part of it, can be used as ground truth data for any subarea, and that the satellite data can be used to estimate the areas represented by each sample plot. The area weights for the field sample plots are derived as a function of the distance between the reflectance responses of the pixels in the inventory area and the pixels corresponding to the field sample plots. As a result, the inventory data are in the same form as the original field measurement data. The method has been used to generalize the NFI and other field sample plot information over specific inventory areas in southern Finland (Muinonen and Tokola 1990). In this study, the estimates of the stand mean volume and the volumes by species were derived, and the accuracy of the estimates was studied for areas of varying size.

Satellite Data
The study area is located in eastern Finland (Fig. 1). A Landsat Thematic Mapper image recorded on 19 July 1988 was used in the study. The image, covering the whole study area, was captured from the Landsat WRS path 186 between rows 16c and 17a.
The image was rectified to a 1:20 000 base map with a pixel size of 20 m by 20 m using fifteen control points. A second order polynomial model for coordinate transformation and the bilinear interpolation resampling method (Campbell 1987) were applied in the rectification. The root mean square error of the rectification was estimated at 0.26 pixels. Because the area was relatively flat, the errors caused by altitude parallaxes were ignored.

Sample Plots of the National Forest Inventory
The sample plot data from the 8th national forest inventory (NFI) were used (Fig. 1). All field data were collected in 1988. The primary sampling unit in the NFI consists of 21 relascope sample plots (basal area factor 2) (Kuusela and Salminen 1969), and the distance between plots is 200 m. The distance between two neighbouring primary sample units is 8 km in the north-south direction and 7 km in the east-west direction.
The reference area (RA) was defined to be the part of the Landsat TM scene in which the field sample plots were situated. The distance from the nearest forest boundary was recorded in the field. Thus, the sample plots falling near the forest/non-forest boundary were excluded from the field data.

Test area
The total number of secondary sample units was 295, a figure that excludes scrubland and wasteland. Field sample plots were sometimes located on forest stand borders, where forest characteristics change considerably. All such stand boundary sample plots were included in the field data. The accuracy of locating the sample plots in the satellite image was examined by computing the correlations between the forest stand characteristics and the 3-by-3 pixel windows of the satellite data centred on the sample plots. The correlations were highest in the middle of the window.
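The location check described above can be illustrated with a minimal Python sketch. It assumes a single image band stored as a 2-D list and plots given as (row, column, value) triples; the function names, the band values and the plot volumes are all hypothetical and are not taken from the study. For each of the nine offsets in the 3-by-3 window, the plot values are correlated with the pixel values at that offset, and the offset with the strongest correlation indicates where the plots actually fall in the image.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def window_correlations(band, plots):
    """Correlate plot values with the band at each offset of a 3-by-3 window.

    band  : 2-D list of pixel values (one image channel)
    plots : list of (row, col, value); plots must lie at least one pixel
            away from the image edge
    """
    values = [v for _, _, v in plots]
    result = {}
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            pixels = [band[r + dr][c + dc] for r, c, _ in plots]
            result[(dr, dc)] = pearson(pixels, values)
    return result

# Made-up data: background noise, with the centre pixels tied to plot volume.
band = [[float((7 * r + 13 * c) % 10) for c in range(6)] for r in range(6)]
plots = [(1, 1, 50.0), (1, 4, 120.0), (4, 1, 200.0), (4, 4, 260.0)]
for r, c, v in plots:
    band[r][c] = 100.0 - 0.3 * v  # reflectance falls as volume rises

correlations = window_correlations(band, plots)
best_offset = max(correlations, key=lambda k: abs(correlations[k]))
```

In this constructed example the strongest correlation occurs at offset (0, 0), which corresponds to the finding that the correlations were highest in the middle of the window.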

Stand Data Provided by the District Forestry Board
The District Forestry Board performs stand surveys based on interpretation of aerial photographs and subjective field measurements. Stand boundaries and stand characteristics were available for a test area of 2280 ha (Fig. 1, Table 1). The test area was dominated by young stands of Norway spruce and the average stand size was small, about 1 ha.
The stand boundaries were digitized and transformed into raster format. The stand data were checked in 1990 using two-stage sampling. A total of 33 randomly sampled stands were measured using a systematic layout of relascope sample points in the stands (Laasenaho and Päivinen 1986). The variables of interest were: total mean volume (V_tot, m³ ha⁻¹), specieswise mean stand volumes for Scots pine (Pinus sylvestris) (V_pine, m³ ha⁻¹), Norway spruce (Picea abies) (V_spruce, m³ ha⁻¹) and the broadleaved species (V_dec, m³ ha⁻¹), and the annual volume increment (I_v, m³ ha⁻¹). The broadleaved species include birch (Betula pendula, B. pubescens), alder (Alnus incana) and aspen (Populus tremula). The annual volume increment was computed using the stand increment models of Nyyssönen and Mielikäinen (1978).
When the control relascope data were compared to the District Forestry Board data, the standwise root mean square errors (RMSE) appeared to be considerably high for the species volumes: 42 per cent for V_pine and 28 per cent for V_spruce. The relative RMSE of V_tot was estimated at 16 per cent (Table 2). The volume of pine seemed to be overestimated, and the volumes of spruce and broadleaved species underestimated, in the District Forestry Board data. The stands with large volumes were underestimated, but the number of such stands was considered too small to justify any calibration, and no calibration was made. The stand data of the District Forestry Board were assumed to be correct and were used as reference material in the study.

Numerical Maps from the National Board of Surveying
The numerical forest site quality map used in this study was initially prepared for forest taxation purposes. The taxation classes, which depict the production capacity of forest stands, are determined by using aerial photographs, field measurements and other geomorphological data. Four distinct classes are separated based on forest site types (Cajander 1909). Class I indicates the highest productivity level. Stoniness and wetness of the soil decrease the productivity and quality of a stand. Table 3 shows the distribution of the site quality classes within the study area.
The two poorest classes were combined in this study, because there were not enough NFI sample plots in those classes.

Methods
The study was based on the assumption that all the NFI sample plots inside the area covered by the Landsat TM image can be used as ground truth for any subarea of the image. It was also assumed that the location of forest land is known.
In this study the reference sample plot method (RSP) was used to post-stratify the NFI field sample plots according to the spectral properties of the pixel covering a plot. The forest site quality map was used as a priori information in stratification of the target areas, and the NFI field sample plot data were used in the estimation; e.g. for a poor stand, only the reference sample plots from the same site quality class were used.
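The pre-stratification step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the dictionary key "site_class", the class labels (with "III-IV" standing for the two combined poorest classes), the function names and the plot volumes are all hypothetical.

```python
from collections import defaultdict

def group_plots_by_site_class(plots):
    """Group reference plots by their site quality class."""
    strata = defaultdict(list)
    for plot in plots:
        strata[plot["site_class"]].append(plot)
    return dict(strata)

def candidate_plots(pixel_site_class, strata):
    """Return only the reference plots that share the pixel's site class."""
    return strata.get(pixel_site_class, [])

# Made-up reference plots (volumes in m3/ha are illustrative only).
plots = [
    {"id": 1, "site_class": "I", "volume": 210.0},
    {"id": 2, "site_class": "II", "volume": 150.0},
    {"id": 3, "site_class": "III-IV", "volume": 80.0},
    {"id": 4, "site_class": "I", "volume": 190.0},
]
strata = group_plots_by_site_class(plots)

# A pixel mapped to class I may only draw on plots 1 and 4.
class_one_ids = [p["id"] for p in candidate_plots("I", strata)]
```

Restricting the candidate set in this way is why each class needs enough reference plots of its own, which is consistent with the decision to combine the two poorest classes.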
The reference sample plot method (RSP) is closely related to the grouping method (Poso 1972) for combining the field sample plots and satellite data (Kilkki and Päivinen 1986). Forest characteristics can be estimated as a function of the distance between the channel responses of the pixels in the inventory area and the pixels corresponding to the reference field sample plots. The weighted Euclidean distances between the channel values can be calculated as
$$d_{ij} = \sqrt{\sum_{h=1}^{nc} p_h \, c_{ijh}^2} \qquad (1)$$
where
nc = number of channels,
i = pixel i, which covers an NFI sample plot,
j = pixel j, to which forest characteristics are estimated using the NFI sample plots,
c_ijh = difference in the spectral values of pixels i and j in channel h, and
p_h = empirical constant for channel h.
The optimal parameters for the spectral distance function (Formula 1) and the weighting parameter t (Formula 2) for the RSP method were determined heuristically. Their selection was based on the RMSE and the bias of the plotwise volume estimates. The estimates (Formula 3) were computed using the fifteen nearest sample plots, because this number gives an adequate level of accuracy (Pitkänen 1991).
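The estimation chain described above (spectral distance, kernel weight, weighted average over the nearest plots) can be sketched in Python. Several details are assumptions made only for illustration: the distance is taken as the square root of the channel-weighted sum of squared differences, the kernel is taken as 1 / (1 + d)^t, and the function names and data are hypothetical. The text states only that the distance is a weighted Euclidean distance and that t is an empirical parameter.

```python
import math

def spectral_distance(pixel, plot_pixel, p):
    """Weighted Euclidean distance over channels (assumed form)."""
    return math.sqrt(
        sum(ph * (a - b) ** 2 for ph, a, b in zip(p, pixel, plot_pixel))
    )

def rsp_estimate(pixel, plots, p, t=1.0, k=15):
    """Weighted mean of the k spectrally nearest reference plots.

    pixel : channel values of the target pixel
    plots : list of (channel values, field-measured value)
    p     : one empirical constant per channel
    t     : kernel parameter; the kernel 1 / (1 + d)**t is an ASSUMPTION
    k     : number of nearest plots used (fifteen in the study)
    """
    nearest = sorted(
        (spectral_distance(pixel, spectrum, p), value)
        for spectrum, value in plots
    )[:k]
    weights = [1.0 / (1.0 + d) ** t for d, _ in nearest]
    weighted_sum = sum(w * v for w, (_, v) in zip(weights, nearest))
    return weighted_sum / sum(weights)

# Made-up two-channel example with three reference plots.
plots = [((10, 20), 100.0), ((12, 22), 140.0), ((40, 60), 300.0)]
channel_weights = (1.0, 1.0)
estimate = rsp_estimate((10, 20), plots, channel_weights, t=1.0, k=2)
```

With k=2 the spectrally distant third plot is ignored, and the estimate lies between the two nearest plot values, closer to the plot whose spectrum matches the pixel exactly.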
Reliability of the pixelwise estimates was evaluated using a cross-validation method with the NFI data: the forest stand characteristics were estimated for each sample plot in turn from the rest of the sample plots, and the estimates were compared to the values obtained from the field measurements. The formula of accuracy used to compare the estimates was
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}} \qquad (4)$$
where
y_i = variable y in NFI sample plot i,
ŷ_i = satellite image based estimate for variable y in NFI sample plot i, and
n = total number of NFI plots.
A simulation study was used to find out how the error of the satellite image based estimates decreases when the target area increases in size. The estimates of forest characteristics were derived using the RSP method with and without the forest site quality class based pre-stratification. The reliability estimates over larger areas were determined using repeated random sampling with replacement. Five hundred square shaped areas were randomly placed within the test area. The estimates for each areal unit were calculated using the differences between the averages of the estimated forest parameter values and the average stand volumes of the pixels. The formula for the accuracy measure, the relative standard error, was
$$SE = \frac{\sqrt{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / n}}{\bar{y}} \qquad (5)$$
where
y_i = mean estimate of the survey by stands (District Forestry Board) for variable y in simulated area i,
ŷ_i = mean of the satellite image based estimates for variable y in simulated area i,
ȳ = mean of the simulated areal estimates, and
n = number of simulated areas.
The error in the standwise survey was not taken into account. The error estimate can be reduced if the errors are assumed to be independent and the amount of the survey error is known in terms of the size of the area. For example, for a stand of one hectare in size, the relative error of the total mean volume estimates would decrease from 36.2 to 32.5 per cent, if an error of 16 per cent occurs in the standwise survey.
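The cross-validation procedure can be sketched in Python as a leave-one-out loop. This is a minimal illustration: the stand-in estimator simply returns the value of the single spectrally nearest plot, the bias is defined here as estimate minus observation (an assumed sign convention), and the function names and data are hypothetical.

```python
import math

def leave_one_out(plots, estimator):
    """Estimate each plot from all the others; return (rmse, bias).

    plots     : list of (channel values, field-measured value)
    estimator : function(spectrum, reference_plots) -> estimate
    """
    errors = []
    for i, (spectrum, observed) in enumerate(plots):
        rest = plots[:i] + plots[i + 1:]  # every plot except plot i
        errors.append(estimator(spectrum, rest) - observed)
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    bias = sum(errors) / n  # positive = overestimation (assumed convention)
    return rmse, bias

def nearest_plot_estimator(spectrum, reference):
    """Stand-in estimator: value of the spectrally nearest reference plot."""
    return min(reference, key=lambda r: math.dist(spectrum, r[0]))[1]

# Made-up data: two spectrally distinct pairs of plots.
plots = [
    ((10, 20), 100.0),
    ((11, 21), 110.0),
    ((30, 40), 250.0),
    ((31, 41), 240.0),
]
rmse, bias = leave_one_out(plots, nearest_plot_estimator)
```

In this example every plot is estimated from its spectral neighbour with an error of 10 units in alternating directions, so the RMSE is 10 and the bias is zero. Any estimator with the same call signature, including a weighted nearest neighbour estimator, can be passed in the same way.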

Plotwise Reliability
The separation of the NFI sample plots according to site quality slightly improved the estimates in the sample plot specific cross-validation (Table 4). The bias was small compared to the deviation of the differences and it was not statistically significant. In particular, stratification decreased the bias in the estimation of pine and broadleaved species. Due to pre-stratification, the root mean square errors (RMSE) decreased by 3.35 m³ ha⁻¹ for pine volume (V_pine), 2.68 m³ ha⁻¹ for spruce volume (V_spruce), 1.17 m³ ha⁻¹ for the volume of broadleaved species (V_dec) and 4.30 m³ ha⁻¹ for the total mean volume (V_tot). The volume increment estimates were also slightly better with pre-stratification (Table 4).

Area-wise Reliability
When using pre-stratification according to the site quality map, V_tot could be estimated with a relative standard error (SE) of 14 per cent for an area of 50 ha and 9 per cent for 100 ha areas (Fig. 2, Table 5). The standard error of the total mean volume increased rapidly when the area became less than 30 ha in size. The relative SE of the pre-stratified estimates was on average 3.4 per cent lower when compared to purely satellite image based estimates. For large areas (300 ha, 6.2 per cent) the advantage was greater than for small areas (10 ha, 2.1 per cent) (Fig. 2). No statistically significant bias was observed (Fig. 3, Fig. 4), although slight overestimates were obtained (Fig. 3).
The area-wise error estimates for the volumes of tree species were much higher (Table 5). For individual stands, one hectare in size, the area-related errors were 95.4 per cent for the volume of pine. This means that the method could not identify pine trees among the spruce dominated stands; randomly assigned values would be almost as accurate. For an area of 50 ha, the relative SE of the mean volume of pine was 49.5 per cent, and even for an area of 300 ha it was about 34 per cent (Fig. 5). Pre-stratification did not significantly increase the accuracy of pine volume estimation, whereas it did improve the accuracy of the other estimates (Table 5). For the other tree species, the standard errors were smaller than for the volume of pine, but not as good as for the total mean volume. The pine volumes were overestimated (4-14 m³ ha⁻¹), but the bias was not statistically significant (Fig. 6).
The spruce volumes were estimated with the smallest specieswise standard error. Although pre-stratification was used, the relative SE for the stand level estimates was still high (61.6 per cent). It decreased rapidly and was 28.9 per cent for an inventory area of 30 ha, and the 15.7 per cent error level was reached when inventorying an area of 200 ha. The gain resulting from using the forest site quality map was 5.1 per cent on average for areas between 30 and 300 hectares in size. The variation of the bias was so high for spruce, as well as for the broadleaved species, that no statistical inference about the bias could be made.
The volume of broadleaved tree species was estimated with a relative SE of 39.0 per cent for an area of 50 ha, and the error decreased to 20.2 per cent when inventorying an area of 200 ha. The gain provided by using the site quality map increased from 0 to 13.6 per cent according to the size of the area.
The estimates of the volume increment obtained using the site quality map were only slightly better for areas of less than 30 ha (Fig. 7). With the inventory area ranging from 30 ha to 300 ha, the standard error was about 27 per cent and the gain provided by using the site quality map averaged 3.3 per cent.
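The area-wise standard errors in this section come from repeated random placement of square areas within the test area. A minimal Python sketch of that simulation is given below. It assumes that the survey values and the satellite based estimates are available as two rasters of equal shape, and that the relative standard error is the root mean square difference of the area means divided by the mean of the survey based area means; the choice of denominator, the function names, the fixed random seed and the data are assumptions made for illustration.

```python
import math
import random

def window_mean(raster, row, col, size):
    """Mean of a size-by-size square whose top-left corner is (row, col)."""
    cells = [
        raster[r][c]
        for r in range(row, row + size)
        for c in range(col, col + size)
    ]
    return sum(cells) / len(cells)

def relative_se(reference, estimate, size, n_areas=500, seed=0):
    """Relative standard error of area means over randomly placed squares."""
    rng = random.Random(seed)
    rows, cols = len(reference), len(reference[0])
    ref_means, est_means = [], []
    for _ in range(n_areas):
        # Squares are drawn with replacement, so they may overlap.
        r = rng.randint(0, rows - size)
        c = rng.randint(0, cols - size)
        ref_means.append(window_mean(reference, r, c, size))
        est_means.append(window_mean(estimate, r, c, size))
    mse = sum((y - yh) ** 2 for y, yh in zip(ref_means, est_means)) / n_areas
    # Denominator: mean of the survey based area means (an ASSUMPTION).
    return math.sqrt(mse) / (sum(ref_means) / n_areas)

# Made-up rasters: a constant reference and a checkerboard of +/- 20 errors.
n_px = 6
reference = [[100.0] * n_px for _ in range(n_px)]
estimate = [
    [100.0 + (20.0 if (r + c) % 2 == 0 else -20.0) for c in range(n_px)]
    for r in range(n_px)
]
se_single = relative_se(reference, estimate, size=1)
se_block = relative_se(reference, estimate, size=2)
```

The checkerboard is an artificial extreme in which pixel errors cancel completely within any 2-by-2 block, so the relative error falls from 0.2 for single pixels to zero for blocks. With real, spatially autocorrelated errors the decline with area is slower, which is why it is determined by simulation rather than computed directly.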

Discussion
The point accuracy of satellite image based estimates is generally not very good when detailed forest characteristics are concerned. At present there are many additional data sources which can be utilized for improving the interpretation procedure. Moreover, in area-wise estimation the accuracy improves as the number of pixels increases. Due to the spatial autocorrelation of the pixels, the degree of improvement cannot be directly computed. However, using randomized simulated areas, the accuracy of area averages can be determined. Average estimates for areas of 1-300 ha could be used for forest property taxation purposes and for the planning of annual cutting or buying activities in timber companies.
The reference sample plot method used here (Kilkki and Päivinen 1987) could be called k-nearest neighbour estimation with a kernel function. It has been shown that kernel and nearest neighbour estimators are biased (Altman 1992), but the method has other advantages. When stand characteristics are estimated for an unknown pixel, the area weight of the pixel is divided among the k nearest field sample plots according to their spectral distances. Thus, the variation in the field data is preserved, and the covariance structure of the stand variables is retained. When low-precision satellite data are utilized in satellite image based forest inventory, it is always possible that separate interpretation of forest characteristics leads to inconsistent results. If weighting is used to give smaller weight to those nearest points that are farther from the average, the bias can also be smaller (Altman 1992). Another problem in nonparametric estimation is overfitting, which has to be avoided. If few neighbours are used in estimation, the estimate will be very close to the original data. Due to overfitting, the estimate will be almost unbiased, but it will have a large variance if the sampling is repeated (Altman 1992). If many neighbours are used and heavy weighting of distance is applied, the estimates can become highly biased. In this study, the weighting parameters were determined using cross-validation of the NFI sample plot based reference data. To further avoid bias, the method was applied in the test area with the site quality pre-stratification data, and the gain from that information in the interpretation procedure was studied.
The estimates for the mean volume of the growing stock within the 2280 ha test area were compared with the results of the standwise survey. When the total mean volume was estimated, a relative error of 14 per cent was obtained when inventorying an area of 50 ha, while in an area of 30 ha the error was less than 20 per cent. The benefit of using the forest site quality map ranged between 0.6 and 6.2 per cent, and the benefit increased according to the size of the target area. The specieswise results were not as good. Only the volume of the dominant tree species, Norway spruce, was estimated with a relative error of less than 20 per cent. Using the site quality map, the error decreased by 7 per cent in regard to spruce volume estimation in an area of 100 ha. However, the site quality map did not improve the accuracy of Scots pine volume estimation. The mean volume estimates for the different tree species were inaccurate, but the bias was not statistically significant.
The standwise survey data of the District Forestry Board had about 16 per cent relative error for the mean volume of each stand. If this error is uncorrelated with the error of the satellite estimates, it can be subtracted from the error of the satellite estimates. With this assumption, a standwise error of 32.5 per cent was obtained for the total mean volume of the satellite estimates. This is higher than was obtained in the study by Hagner (1989). One reason for this discrepancy may be the effect of the stand boundary pixels. Another reason may be the more variable, pixelwise training data of this study. For larger areas the mean estimates were better. The mean timber volume of a regular farm forest holding, about 30 ha, can be estimated with a relative error of 17.4 per cent. For the individual tree species, the estimates were far worse; in a forest area of 30 ha, the relative error in the pine volume estimate was 55.1 per cent. This means that the satellite image based inventory results are suitable for farm level total volume assessments, but not for stand level timber management planning.
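The subtraction of the survey error can be checked with a short Python calculation. It assumes that the two error components are independent, so that their squares add and the reference error can be removed in quadrature; the function name is hypothetical. The figures are those reported for a stand of one hectare: an observed relative error of 36.2 per cent and a survey error of 16 per cent.

```python
import math

def remove_reference_error(total_error, reference_error):
    """Subtract an independent reference error in quadrature (per cent)."""
    return math.sqrt(total_error ** 2 - reference_error ** 2)

# One-hectare stand: observed error 36.2 %, standwise survey error 16 %.
corrected = remove_reference_error(36.2, 16.0)
rounded = round(corrected, 1)  # 32.5, matching the reported figure
```

Because the errors are combined as squares, the correction is small when the reference error is well below the observed error, as it is here.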
The test data from the District Forestry Board and the site quality map data from the National Board of Surveying have been produced using a similar kind of subjective method. However, the test data are primarily intended as a data source for timber management planning, and they are an expensive alternative to the tested methodology. Site quality also indicates properties similar to those determined in the District Forestry Board data, although soil fertility receives the main attention in the delineation of different forest types. The study could be improved significantly if more detailed test material were available.
In the conditions prevailing in Finland, the soil type affects the spectral responses received from forested areas. Different objects, e.g. a dense stand of spruce and open swamp areas, can have similar reflectance properties (Saukkola 1982). Thus, a digitized peatland map of the inventory area can be used to separate peatlands and mineral soils. If digital ancillary data, e.g. forest or soil type maps concerning the target area, are available, the reliability of the subarea estimation can be improved and more representative field samples can be chosen on the basis of a priori information.

Fig. 1. The location of the test area (2280 ha) inside the Landsat TM scene and the layout of the NFI sample plot tracks within the area covered by the image. The reference area includes all the sample plots within the scene area. All the areas belonging to the taxation class I were omitted from the shaded part of the test area.
The kernel function is used to give the weight for spectral differences. The weight of each reference sample plot is calculated for each pixel in the inventory area. The weight w_ij is calculated as a function of the spectral distance (Formula 2), in which t is an empirical parameter. The estimates for the forest characteristics can be calculated as the moving averages of the NFI sample plot values x_j, which are the estimates of the pixel's forest characteristic values. The forest characteristic estimate ŷ_i per unit area for pixel i is calculated from
$$\hat{y}_i = \frac{\sum_{j=1}^{k} w_{ij} x_j}{\sum_{j=1}^{k} w_{ij}} \qquad (3)$$
where k = number of the reference sample plots used.

Fig. 2. The relative standard error of the total mean volume in the various area sizes (w/o site quality classes = crossed line, w site quality classes = dotted line).

Fig. 3. The bias and bias deviation of the total mean volume in the various area sizes with forest site quality pre-stratification.

Fig. 5.The relative standard error of the mean pine volume in the various area sizes (w/o site quality classes = crossed line, w site quality classes = dotted line).

Fig. 6. The bias and bias deviation of the mean pine volume in the various area sizes with forest site quality pre-stratification.

Table 1. Means and standard deviations of the variables of interest for the reference area (from the NFI sample plots, n = 791) and the test area (from the District Forestry Board data, n = 1900).

Table 3. The distribution of taxation site quality classes within the reference and test area data.

Table 4. The estimates of plotwise root mean square errors, the bias estimates and bias deviation (sd_b) for the growing stock. The estimates are calculated with (w) and without (w/o) site quality pre-stratification.

Table 5. The standard error estimates for the growing stock in the areas of different sizes. The estimates are calculated with (w) and without (w/o) site quality pre-stratification.