Silva Fennica 39(4) research articles Landsat TM Imagery and High Altitude Aerial Photographs in Estimation of Forest Characteristics

Satellite sensor data have traditionally been used in multi-source forest inventory for estimating forest characteristics. Their advantages generally are large geographic coverage and large spectral range. Another remote sensing data source for forest inventories offering a large geographic coverage is high altitude aerial photography. In high altitude aerial photographs the spectral range is very narrow but the spatial resolution is high. This allows the extraction of texture features for forest inventory purposes. In this study we utilized a Landsat 7 ETM satellite image, a photo mosaic composed of high altitude panchromatic aerial photographs, and a combination of the aforementioned in estimating forest attributes for an area covering approximately 281 000 ha in Forestry Centre Hame-Uusimaa in Southern Finland. Sample plots of 9th National Forest Inventory (NFI9) were used as field data. In the estimation, 6 Landsat 7 ETM image channels were used. For aerial photographs, 4 image channels were composed from the spectral averages and texture features. In combining both data sources, 6 Landsat channels and 3 aerial image texture channels were selected for the analysis. The accuracy of forest estimates based on the Landsat image was better than that of estimates based on high altitude aerial photographs. On the other hand, using the combination of Landsat ETM spectral features and textural features on high altitude aerial photographs improved the estimation accuracy of most forest attributes.


Introduction
Satellite imagery (e.g.Landsat TM or SPOT XS) has been widely used in remote sensing aided forest inventories to produce geospatial estimates of forest characteristics.A successful example of this is the Finnish multi-source National Forest Inventory (Tomppo, 1993).The method employs a systematic sample of field plots, satellite images, and digital map data to produce estimates of selected forest attributes in the form of thematic maps and forest statistics at a national, regional and municipal level.Poso and Kujala (1971) utilized visual interpretation of black and white aerial photographs in combination with two-phase plot sampling in the fifth national forest inventory (NFI 5) in Northern Finland.The first-phase sample plots were stratified into fairly small strata on the basis of an interpretation of aerial photo stereo pairs.One plot from each stratum was sampled for field measurement and the field data of the plot was transferred to all first-phase sample plots belonging to the same stratum.This method was also used in NFI 6 and NFI 7 with some modifications (Mattila 1985), until satellite images substituted for aerial photo interpretation.Nowadays the interest in visual aerial photo interpretation has mainly diminished.
Currently high altitude aerial photography provides up-to-date image material with a large geographic coverage.In Finland, high altitude black and white aerial photographs covering the entire country are acquired for mapping purposes with intervals of 3-4 years (Pätynen 1998).The scale of these images usually is 1:60 000, enabling digital image products with approximately onemeter spatial resolution.
When comparing satellite imagery, such as Landsat TM or SPOT multispectral, and high altitude aerial photography, the main advantage of satellite imagery is good spectral resolution (wide spectral range), whereas the high altitude aerial images have superior spatial resolution.Furthermore, high altitude aerial imagery is acquired at a very low altitude when compared to satellite imagery and, thus, spectral distortions caused by the sun-object-sensor geometry such as bidirectional reflectance are present in the imagery.Bidirectional effects are governed by the illumina-tion conditions, the viewing geometry, the atmospheric conditions and factors such as land cover or vegetation type (e.g.Jackson et al. 1990, Li and Strahler 1992, Lillesand and Kiefer 1994, Pellikka et al. 2000).The bidirectional effects are typically greater in data acquired from low altitudes and using wide-angle lenses (Lillesand andKiefer 1994, Pellikka et al. 2000) such as aerial photographs, since every pixel in the image is viewed with different zenith and azimuth angles.Because of bidirectional reflectance, similar objects may have different spectral characteristics in different parts of the image.Therefore it can be assumed that the spectral values of satellite image pixels provide better basis for estimating forest characteristics than the high altitude aerial photograph pixels do.
On the other hand, the higher spatial resolution of high altitude photographs makes it possible to utilize the textural properties of the imagery in estimating forest attributes.Image texture has been defined as the spatial organization of the gray-levels of the image pixels (Haralick et al. 1973).When utilizing the image texture, an element of major importance is the scale factor between the size of the objects measured and the spatial resolution of the image.When the spatial resolution of the image is significantly finer than the size of the objects in the image, the image pixels are highly correlated and the local variance is low.If the resolution is similar to the objects in the image, fewer neighboring pixels are likely to have similar values and the local variance rises.When the resolution is coarse in relation to the objects, single pixels contain many objects and the local variance again decreases (Woodcock and Strahler 1987).In satellite imagery such as Landsat TM the resolution is coarse in relation to the objects, i.e. tree canopies, and consequently, studies applying satellite imagery have shown that at spatial resolutions similar to Landsat TM, the spectral features of the images produce better results than textural features in estimating forest attributes (e.g.Shang andWaite 1991, Cohen andSpies 1992).At higher spatial resolution the studies by e.g.Hyppänen (1996), Wulder et al. (1998) and Levesque and King (2003) have shown that texture features have advantages in estimating forest characteristics.Generally, the texture features that have been applied are based on the co-occurrence matrices or semivariances (variograms) of the image gray-levels.
This study was a part of a research project where the applicability of the Finnish NFI data and methodology was examined.Here growing stock related forest attributes were of particular interest.The objective of this study was to examine the accuracy of multi-source forest inventory estimates utilizing satellite and high altitude aerial image features and their combination.

Study Area
The study area covered approximately 281 000 ha (of which 172 000 ha forestry land) in the surroundings of Lahti, in the area of the Forestry Centre of Häme-Uusimaa (Fig. 1).The area is divided into two geologically and topographically different parts by the First Salpausselkä ridge, which was formed at the edge of the continental ice sheet at the final stages of the most recent (Weichselian or Vistula) glacial period.The area north of Salpausselkä is characterized by forests and lakes, and the elevation variation is approximately between 80 and 180 m.The area south of Salpausselkä is characterized by forest and agricultural land on clay soils with relatively flat topography, elevation variation approximately 40-100 m.

Satellite Imagery
The Landsat 7 ETM scene, acquired on 27 August 2000, was used in this study.The image was floated along the path.The original scenes for the floating were WRS 188/17 and 188/18.The image was rectified to the Finnish national coordinate system and resampled to a pixel size of 25 m.Land cover classes other than forestry land were masked out using digital map data.The image was cut into a smaller subset that covered the study area.Only channels 1-5 and 7 were used in the analysis, channel 6 was excluded because of its poor spatial resolution.

High Altitude Aerial Photography
High altitude aerial panchromatic photographs at a scale of 1:60 000 were acquired in 1999.The imaging altitude was approximately 9100 m.The aerial photographs were digitized and saved as single band TIFF files.Finally, the images were orthorectified using 25-meter resolution raster DEM and ground control points, resampled to a pixel size of 1.0 m, and combined in a photo mosaic covering the study area.This photo mosaic was then used as the basis for further image analysis.
A four channel "synthetic" image was composed using the spectral and textural properties of the image.Channel 1 pixel values were defined as averages of the original pixel values calculated per polygons that were delineated by automatic image segmentation.A modified implementation of the "Image Segmentation with directed trees" algorithm     (Narendra and Goldberg 1980) was applied in the segmentation.Detailed description of the method has been published in Pekkarinen (2002).
Channel 2 pixel values were defined as standard deviations of pixels within image segments using the same segments as in channel 1. Image segments were applied in extracting the spectral average and standard deviation features in order to extract stand or sub-stand spectral averages and standard deviations of image pixel values (i.e. in order to retain the original spectral variation between stands).The assumption that image segments retain more of the original spectral variation than e.g.square raster windows in extracting image features was based on the results by Pekkarinen and Tuominen (2003).
Additionally two texture features based on the image gray-level co-occurrence matrix (Haralick et al. 1973, Haralick 1979) were extracted from aerial photograph: Channel 3 was defined as Haralick contrast (Eq. 1) calculated from 20 m × 20 m pixel window.
Channel 4 was defined as Haralick local homogeneity (Eq.2) calculated from 20 m × 20 m pixel window.
In equations 1 and 2: p(q, r) = M(q, r) / Nt Nt = total number of possible pairs in the image window M(q, r) = the co-occurrence matrix of the requantified pixel values q and r The requantified pixel value c was defined as (Tuominen and Pekkarinen 2005): where p = original pixel value; wi = (max -min)/ cnt; min, max = the minimum and maximum pixel values; cnt = number of requantification classes.
The selection of the aerial image texture features was based on the results by Tuominen and Pekkarinen (2005) where the correlations between different aerial image features and forest attributes were studied.The standard deviations of pixel values, Haralick contrast and local homogeneity were among the image features correlating moderately or well with forest attributes.The size of the raster window in extracting Haralick contrast and local homogeneity was based on results by Holopainen and Wang (1998), where 20 m × 20 m appeared to be near-optimal window size for extracting aerial photograph features for forest inventory purposes.The 4-channel image composed of the high altitude aerial photograph features was resampled to similar resolution as the Landsat 7 ETM image (25 m).The aerial photograph channels are presented in Fig. 2 (channel 1), Fig. 3 (channel 2), Fig. 4 (channel 3) and Fig. 5 (channel 4).The figures give an idea how different ground features are illustrated by different image features.The basic assumption behind using these features is that if certain forest attributes can not be discriminated when using one image feature it may be possible to discriminate them using another image feature, thus cumulating the information over different image features.

Field Material
The sample plot data of the 9th NFI of Finland were used in this study.Only those field plots of the study area which were located within a single stand and measured in 1998 or later were included.The sample plots of clear-cut areas were also located and excluded in the case of temporal discrepancy between the field measurements and the images.The total number of the sample plots in the study area was 691 (341 on forestry land) for the satellite image and 698 (346 on forestry land) for the mosaic of high altitude aerial photographs.The difference in the amounts of the plots is due the clear cuttings and the clouds in the satellite image.
A measure of canopy cover was acquired in the field as an additional variable to the standard NFI sample plot data.These measurements were carried out by taking photographs upwards from the ground using a digital camera.The photo-graphs were acquired from 7 points; the sample plot centre and 6 other points located at 8 meter distances from the sample plot centre in compass directions of 0°, 60°, 120°, 180°, 240°, 300°.The camera field of view used in acquiring the photographs was approx.54° horizontally and 42° vertically.The canopy cover photograph acquisition is illustrated in Fig. 6.The images were processed to binary format, where the canopy cover was represented by 1 and the background (sky) was represented by 0. The measure of canopy cover for the sample plot was calculated as the average proportion of canopy cover pixels of the 7 photographs.
The main forest characteristics of the study area are presented in Table 1.Some of the plots are treeless, so the minimum values of the characteristics in field material are 0.

Estimation Procedure
The applicability of the extracted image features was evaluated by using them in the estimation of actual forest attributes for the field sample plots.The k nearest neighbor (k-nn) estimator (e.g.Kilkki and Päivinen 1987, Muinonen and Tokola 1990, Tomppo 1990) was applied in the estimation.The k nearest neighbors were determined by the Euclidean distances between the observations in the n-dimensional feature space, where n stands for the number of image features.The stand variable estimates for the sample plots were calculated as weighted averages of the stand variables of the k nearest neighbors (Eq.3).Weighting by inverse squared Euclidean distance in the feature space was applied.The justification for weighting was that it was shown to reduce the bias of the estimates (Altman 1992).

ˆ( )
where ŷ = estimate for variable y y i = measured value of variable y in the ith nearest neighbor plot d i = Euclidean distance to the ith nearest neighbor plot k = number of nearest neighbors The alternative data combinations tested in the estimation were: 3) Landsat 7 ETM channels 1-5 and 7 and high altitude aerial photograph texture channels (2-4).
For the estimation procedure the aerial photograph features were scaled to a range of 0-255 (8-bit integer), which corresponds to the satellite image data type.The original range of the texture features was considerably smaller, and without scaling these features would have had smaller weights in the estimation procedure compared to satellite image features.All channels had equal weights in the estimation.The local spectral average of aerial photograph (channel 1) was left out of the combination of satellite and aerial images.This was based on the fact that it had relatively high correlation with the satellite image pixel values (Table 2.).Also the number of the aerial image texture features was limited to three since the estimation procedure applied in this study allows maximally 9 input variables.
The estimation tests were carried out using a cross-validation technique, in which each sample plot in turn was left out from the set of reference plots and estimated with the aid of the remaining plots.The estimates of the field plots were compared with the corresponding measured values (ground truth) of the plots, producing a measure of their accuracy as root mean square errors, RMSE (Eq.4).Furthermore, the estimation biases were examined (Eq.5).Based on the results of the cross-validation tests the value of k was set as 5.A maximum geographical horizontal (between the estimated pixel and the nearest neighbors) distance was applied in the k-nn estimation.Based on the results of the cross-validation the distance was set to 40 km.

Results and Discussion
Cross-validation tests of the k-nn estimates acquired using different data combinations show that the Landsat 7 ETM image based estimation generally resulted in higher accuracy than high altitude aerial imagery.The Landsat ETM based estimates had 7.0-11.8%lower RMSE in the forest attributes compared to high altitude photograph based estimates.On the other hand, using the high altitude photographs resulted in relatively accurate estimates considering the poor spectral resolution of the original data.The RMSE-values and biases of the forest attribute estimates derived using different image data are presented in Table 3. Combination of the satellite image spectral features and aerial image textural features improved the estimation accuracy of most forest attributes, although the improvement generally was relatively low.Exceptions here were the canopy cover, where Landsat ETM data alone resulted in similar accuracy of estimates as the combination of the images, and the volumes of pine and birch, where Landsat ETM data alone resulted in the most accurate estimates.The effect of bias seems to be marginal in contributing the estimation errors, thus RMSE consists mainly of random error component.
Based on the results shown in Table 3 it can be noted that when estimating forest attributes that are associated with stand structure of tree (canopy) size, such as DBH, height, basal area or volume, the use of image texture improves the estimation accuracy.In other forest attributes, such as tree species composition, the texture features do not improve the estimation accuracy.
When producing map from estimates with the help of satellite imagery, it is typical that the image coverage has gaps due to clouds.The high altitude photograph features could be used as a simple solution for dealing with these gaps.The unit price of the high altitude imagery is significantly higher than the price of satellite imagery, approx.100 €/100 km 2 in Finland, compared to approx.600 € per basic Landsat scene (180 km × 180 km).
The total growing stock volume estimates of this study had a relative RMSE of 74.5% for Landsat ETM imagery, 83.3% for high altitude aerial photograph and 72.9% for the combination of the images.For example, Poso et al. (1999) have reported volume RMSEs of 73-81% with Landsat 5 TM imagery, 71-74% with IRS-1C PAN imagery and 74% with digitized aerial photographs at sample plot level in similar conditions as this study.Franco-Lopez et al. (2001) have reported volume RMSE of approximately 65% in USA (Minnesota) using a larger field material than Poso et al. (1999).Thus, the general accuracy of the estimates of this study seems to be quite typical under the circumstances.
The use of the spectral values of panchromatic aerial photographs in estimation is not always advisable because of the distortions caused by the sun-object-sensor geometry.These distortions often require some kind of correction in the spectral values of the image.The alternative methods that have been suggested for the correction of aerial photographs can be divided into methods based on physical or empirical modeling.Physical models aim at theoretical modeling of the mechanism of the bidirectional reflectance distribution function (BRDF) (e.g.Nilson and Kuusk 1989, Li and Strahler 1992, Leblanc et al. 1999).Empirical radiometric calibration models have been more common approach for forest inventory applications (e.g.King 1991, Holopainen andWang 1998).The use of image correction methods in forest inventory applications has been rare because of the problems associated with the alternative image correction methods.Physical modeling of BRDF requires radiometrically calibrated sensors (Pellikka et al. 2000).They are also very complex in forest vegetation.The empirical modeling is generally less complex, and they can Their main problem is that they often require a priori knowledge of the vegetation of the inventory area (e.g.Li and Strahler 1992, Holopainen and Wang 1998, Leblanc et al. 1999, Pellikka et al. 2000).Tuominen and Pekkarinen (2004) have presented a local radiometric correction method that is based on the idea of adjusting the BRDF -affected intensities of aerial photographs to the local intensities of reference imagery.The reference imagery has to be independent from the BRDF, or the effect of BRDF has to be insignificant.Satellite images typically fulfil this requirement.The method was not applied here, because it was necessary to study the estimation results using the high altitude aerial photograph independently.On the other hand, when studying the combination of satellite imagery and aerial photograph, the spectral channels were extracted from the satellite imagery and only texture channels of aerial photographs were utilized, thus eliminating the need for spectral correction.
The presented way of combining satellite and aerial photo material within the k-nn procedure may not be the optimal one.Since the optimal weights of the input variables are usually not known in the estimation, it is not possible to control their contribution in determining the nearest neighbors.For example, even if the texture features indicated dissimilarity in the forest structure, the similarity of a higher number of spectral channels could overrule the effect of the texture features.Thus plots that do not have similarity in their forest attributes can be selected as nearest neighbors.This problem can be avoided using techniques that can generate optimal weights for the features or are able to select the best-performing subset of the features for the actual analysis.For example, sequential forward selection has been applied for analyzing, which features significantly contributed to the estimation accuracy and what was the appropriate number of features needed for accurate estimation results (e.g.Tuominen and Pekkarinen, 2005).Another option in utilizing both data in a different way would the combining pre-stratification with the k-nn estimation.For example, aerial photograph texture features can be used in pre-stratification of the sample plots into a number of strata that are assumed to have less structural variation than the entire plot material.In the k-nn estimation the nearest neighbors are then selected within the same stratum on the basis of satellite channel values.
As noted in Table 2, the local spectral averages of aerial photographs are highly correlated with the satellite image channels.On the other hand, the satellite image channels are highly correlated with each other, as also are aerial photograph texture features.Using a high number of image features that contain approximately the same information gives them a high weight in the estimation, while their performance in the estimation often is not known a priori.Thus, it may be advisable to use, for example, principal component analysis, in which the variation of the original input variables with high mutual correlation can be retained in smaller number of principal components which do not correlate with each other.
measured value of variable y on plot i ŷi = estimated value of variable y on plot i n = number of plots

Table 1 .
The main forest characteristics of the study area (based on the field plots).

Table 2 .
The correlation matrix of the image features.

Table 3 .
RMSE and bias values of forest estimates derived with different image materials.