Performance of the DL filter
A distortion-corrected image dataset of 50 frames × 16 FOVs was denoised with the DL filter trained on the slow scan images. For comparison, this section also shows the result of the DL filter trained on the DCFI images. Figures 3a–c show a representative rapid scan image from the validation image dataset and its two corresponding references, i.e., the DCFI image and the slow scan image, respectively. The original image set (Fig. 3a) was denoised with the DL filter trained on the DCFI image set (Fig. 3d, DLF-DCFI) or on the slow scan image set (Fig. 3e, DLF-Slow). The results of identical denoising for other FOVs are shown in Supplementary information B. In Figs. 3d and e, the DLF-DCFI image remained blurred, whereas the statistical noise and the unidirectional blurring were successfully removed in the DLF-Slow image, as clearly shown by the comparison of contour plots (Fig. 3f). This is expected because the DCFI images themselves contained the blur. As noted in the previous section, the advantage of using the DCFI images as reference data is that, since there is no distortion between the rapid scan images and the DCFI images, the DL network can be trained without distortion correction. Because the distortion correction developed in this study requires calculating the local image cross-correlation between the deformed rapid scan image and the corresponding part of the slow scan image, distortion correction of images with too weak an electron signal or with a significant amount of noise would be challenging. The DLF-DCFI might be required for denoising such images, at the cost of leaving the rapid-scanning-specific artifacts unremoved. The superior performance of DLF-Slow is demonstrated by comparing the FFT patterns of these images: a vertical streak (circled by broken lines) in the original image was removed in the DLF-Slow image but not in the DLF-DCFI image.
Since features in an FFT spectrum appear perpendicular to the corresponding lines in the original image, the unidirectional blurring, which extends in the horizontal direction (the scanning direction of the STEM), appears as an intense vertical feature in the FFTs of Figs. 3a, b and d. Again, the DLF-Slow image shows the weakest vertical feature as well as the weakest high-frequency component, which indicates the statistical noise. The removal of the statistical noise by DLF-Slow is not trivial, since the slow scan images, which were used as the reference images in the training process, themselves contained the high-frequency component, as marked by the broken line in Fig. 3c. In fact, the statistical noise seems to be more or less evenly distributed in the slow scan images, and it appears to be reduced by the DLF-Slow, suggesting that the output of the DLF-Slow could be even less noisy than the reference images. This is likely because the DLF was tuned so that the mean squared error between “all” the output images and the reference images was minimized; i.e., features common to the reference images selectively survive the filtering process, while random components such as the statistical noise are eliminated.
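The averaging argument above can be illustrated with a toy calculation. The following sketch is not the network used in this study; it assumes a single hypothetical pixel with a noise-free value `signal` and zero-mean Gaussian noise of width `noise_sigma` in every reference. Under those assumptions, the output that minimizes the mean squared error over many noisy references is their mean, which lies closer to the noise-free value than a typical single reference does.

```python
import numpy as np


def mse(output, targets):
    """Mean squared error between one candidate output and many noisy targets."""
    return float(np.mean((output - targets) ** 2))


rng = np.random.default_rng(0)
signal = 100.0       # hypothetical noise-free pixel value (assumption)
noise_sigma = 10.0   # hypothetical noise level of each reference (assumption)
targets = signal + rng.normal(0.0, noise_sigma, size=2500)

# Closed-form minimizer of the MSE: the mean of the noisy targets.
best_output = float(np.mean(targets))

# Brute-force check on a grid of candidate outputs.
candidates = np.linspace(signal - 20.0, signal + 20.0, 401)
losses = [mse(c, targets) for c in candidates]
grid_best = float(candidates[int(np.argmin(losses))])
```

Here `grid_best` agrees with `best_output` to within the grid spacing, and both sit much closer to `signal` than the spread of the individual noisy targets.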
We evaluated the PSNR, a typical measure of image quality, relative to the slow scan images for the rapid scan, the DCFI, the DLF-DCFI, and the DLF-Slow images. Here, the signal intensity of each image was scaled prior to the calculation so that the minimum and the maximum are 0 and 65,535, respectively. Figure 4a shows the average PSNR for all four types of images, with error bars indicating the standard deviation of the PSNR. The figure shows that the DLF-Slow images had the highest PSNR among the four, about 7% higher on average than that of the original images. Although the noise was successfully removed by the DLF-Slow operation as shown in Fig. 3e, the PSNR did not increase greatly. This is because of the background intensity difference between the slow scan images and the DLF-Slow images, as demonstrated by the line profiles of the two images in Fig. 4b. The signal intensity of the DLF-Slow image (orange) matched that of the slow scan image (red) at nearly all dips (highlighted in light blue) that represent the dislocations. At the remaining locations, however, the two profiles were not well aligned. Judging from the intensity profile of the DCFI image (green), this result stems from the fact that the background intensity of the rapid scan image originally differed from that of the slow scan image. Note that in the DCFI image, the statistical noise has been reduced by averaging 50 frames of the rapid scan image; thus, the profile of the DCFI image indicates the net signal obtained by the rapid scanning.
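The PSNR evaluation described above can be sketched as follows. This is a minimal illustration rather than the code used in this study: the function names are hypothetical, and it assumes the standard definition PSNR = 10 log10(peak² / MSE) with a peak value of 65,535 after the min-max scaling.

```python
import numpy as np


def rescale_uint16(img):
    """Linearly map an image so that its minimum is 0 and its maximum is 65535."""
    img = np.asarray(img, dtype=np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img)
    return (img - lo) / (hi - lo) * 65535.0


def psnr(test, reference, peak=65535.0):
    """PSNR in dB of `test` relative to `reference`, after rescaling both images."""
    t = rescale_uint16(test)
    r = rescale_uint16(reference)
    mse = np.mean((t - r) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)


# Synthetic usage example (random data, not STEM images).
rng = np.random.default_rng(0)
reference = rng.uniform(0.0, 1.0, size=(64, 64))
slightly_noisy = reference + rng.normal(0.0, 0.01, size=reference.shape)
very_noisy = reference + rng.normal(0.0, 0.2, size=reference.shape)
psnr_slight = psnr(slightly_noisy, reference)
psnr_heavy = psnr(very_noisy, reference)
```

Because both images are rescaled independently, a uniform background offset or gain difference between them is partly absorbed by the scaling, but a spatially varying background difference still lowers the PSNR.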
The background difference might be caused by a difference in the BF detector response between the rapid scan at 100 ns/pixel and the slow scan at 5 μs/pixel. The detector’s response could vary depending on the dose of the incident electron beam. If that is the case, the signal intensity profiles under these two conditions would not be equal. Clarifying why the detector response appears to differ is beyond the scope of this manuscript.
The DLF-Slow operation seems to increase the signal intensity over the entire area to compensate for the intensity difference in the low-signal-intensity section, as seen representatively in the line profile beyond 200 pixels in Fig. 4b. This means that the signal intensity of DLF-Slow processed images would match that of the slow scan images only within a limited signal intensity range. In fact, the PSNR calculated from the DLF-Slow images within a limited signal intensity range of 20,000–40,000 was 13% higher on average than that of the rapid scan images. The PSNR calculated outside this range (0–20,000 and 40,000–65,535), on the other hand, showed only a 2% increase. The signal intensity matching within a limited range suggests that datasets with various signal intensity ranges should be used in the training process to avoid over-compensation. In this study, the training dataset was acquired so that both the minimum and maximum signal intensities fit into a certain range rather than spanning the entire 16-bit dynamic range of the detector; for example, the minimum and the maximum intensities in one area were 7000–10,000 and 17,000–20,000, respectively, while those in another area were 15,000–20,000 and 30,000–40,000. Therefore, to improve the PSNR, it might be necessary to acquire images over varying signal intensity ranges or to adjust the contrast non-linearly, for example by adaptive histogram equalization, before training. Despite the small improvement in the PSNR value, the microstructural features of interest, dislocations in this case, were successfully and sufficiently recovered by the DLF-Slow operation, indicating that the DL-based filter trained on slow scan images showed satisfactory performance.
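A range-restricted PSNR of the kind discussed above can be sketched as follows. The text does not state which image defines the intensity range, so this sketch assumes that pixels are selected by the intensity of the reference (slow scan) image and that both images are already scaled to 0–65,535; the function name is hypothetical.

```python
import numpy as np


def psnr_in_range(test, reference, lo, hi, peak=65535.0):
    """PSNR in dB computed only over pixels whose reference intensity is in [lo, hi)."""
    test = np.asarray(test, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    mask = (reference >= lo) & (reference < hi)   # assumption: mask on the reference
    if not mask.any():
        return float("nan")
    mse = np.mean((test[mask] - reference[mask]) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)


# Synthetic example: the test image agrees well with the reference only in the
# 20,000-40,000 range and carries a larger offset elsewhere.
reference = np.linspace(0.0, 65535.0, 1000)
in_mid = (reference >= 20000) & (reference < 40000)
test = np.where(in_mid, reference + 100.0, reference + 3000.0)

psnr_mid = psnr_in_range(test, reference, 20000, 40000)
psnr_low = psnr_in_range(test, reference, 0, 20000)
```

In this synthetic case `psnr_mid` exceeds `psnr_low`, mirroring the situation in which the filtered image matches the reference only within the intensity range covered by the training data.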
To further evaluate the image quality, this study also discusses the line profiles of the FFT spectra in the horizontal direction, as shown in Fig. 4c, where the natural logarithm of the power spectrum (the squared amplitude of the FFT-processed intensity, F(I)) is evaluated. In this study, all the FFT spectra were obtained after applying a Hanning window to the original images. The line profiles of the DCFI image and the DLF-Slow image deviated from each other starting from the low-frequency region. The more rapid fall-off in the Fourier spectrum along the slow scan direction of the DCFI corrected image, relative to the DLF-Slow corrected and reference images, demonstrates the ability of the DLF algorithm to recover spatial information lost in fast scanning. The FFT spectrum of the DLF-Slow image, on the other hand, almost coincided with that of the slow scan image up to a wave number of about 120 μm⁻¹, which corresponds to about 8.3 nm in real space and about 4 pixels in the original image. Since most dislocation dips in the original images were wider than 4 pixels, the DLF-Slow could successfully reproduce the signals with a higher spatial resolution than the DCFI.
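The spectral evaluation described above can be sketched as follows. This is an illustration under stated assumptions, not the code used in this study: the function names and the `pixel_size_um` parameter are hypothetical, the wave number is taken as cycles per unit length (consistent with 120 μm⁻¹ corresponding to about 8.3 nm), and a small constant is added before the logarithm to avoid log(0).

```python
import numpy as np


def log_power_spectrum(img):
    """Natural log of the 2D power spectrum after applying a 2D Hanning window."""
    img = np.asarray(img, dtype=np.float64)
    ny, nx = img.shape
    window = np.outer(np.hanning(ny), np.hanning(nx))
    spectrum = np.fft.fftshift(np.fft.fft2(img * window))
    power = np.abs(spectrum) ** 2
    return np.log(power + 1e-12)  # small offset avoids log(0)


def horizontal_profile(img, pixel_size_um):
    """Return (wave number in 1/um, ln power) along the horizontal axis of the FFT."""
    spec = log_power_spectrum(img)
    ny, nx = spec.shape
    freqs = np.fft.fftshift(np.fft.fftfreq(nx, d=pixel_size_um))
    row = spec[ny // 2]          # row through zero vertical frequency
    keep = freqs >= 0            # keep the non-negative half only
    return freqs[keep], row[keep]


# Synthetic check: a horizontal cosine with a period of 8 pixels.
x = np.arange(64)
img = np.tile(np.cos(2.0 * np.pi * x / 8.0), (64, 1))
k, profile = horizontal_profile(img, pixel_size_um=0.002)
peak_wavenumber = float(k[int(np.argmax(profile))])
```

For the synthetic image, the profile peaks at the wave number 1 / (8 pixels × 0.002 μm/pixel), which gives a quick sanity check of the frequency axis before comparing profiles of real images.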
From the above discussion, it was shown that the DL filter supervised by slow scan images could remove not only the statistical noise but also the unidirectional blurring, which likely originates from the delay of the detector response and was difficult to remove even with the well-known noise filter BM3D. The DLF-Slow obtained in this study could recover the signals from the weak signal in the rapid scan images. The DL-based noise filter enabled us to acquire STEM images both quickly and accurately.
Application to in-situ observation
It is well known that the performance of a DL network depends on the training data. Therefore, its performance is sometimes limited to specific cases. In this section, the DLF-Slow developed in this study is applied to in-situ heating results to further validate its generality.
We employed a 20%-cold-rolled FCC polycrystalline sample of A1050 grade pure aluminum as the in-situ heating specimen. The sample preparation and equipment are described in the “Method” section. The sample was placed on a MEMS heater chip, which can raise and lower the temperature instantaneously. Then, we continuously acquired 2000 frames of 512 × 512 pixel images with rapid scanning at 100 ns/pixel for 6 series of in-situ heating observations: raising the temperature to 90 °C, and heating at 1 °C/s in the ranges of 90–150 °C, 150–210 °C, 210–270 °C, 270–330 °C and 330–400 °C. In each experiment, the temperature was kept constant after it reached the designated temperature. Slow scan images at 5 μs/pixel were obtained after each experiment for comparison.
Figure 5 shows the results of continuous heating from 90 °C to 150 °C as a representative case. The other results are shown in Supplementary information C, and the corresponding movies are available as Supplementary movies. In Fig. 5, it is clear that the noise contained in the rapid scan images was removed by the DLF-Slow and that the dislocations are clearly visible in the DL filtered images. The final state was also similar to that obtained by the slow scan, with a 14% increase in PSNR. Therefore, the superior performance of DLF-Slow was also demonstrated for in-situ observation in this temperature range.
Single FOV training
The DLF-Slow showed high performance in eliminating the statistical noise and the unidirectional image blurring. In this study, the noise filter was made by training on either 50 frames × 50 FOVs sets of the rapid scan images or the corresponding views of the slow scan images. Although this dataset was sufficient for denoising, a single FOV might be enough if we only wish to remove the statistical noise and the blurring, because these artifacts would be present even in a single FOV. Since the vast majority of nanoscale phenomena, e.g., dislocation emission from a crack tip, are site-specific, the observation area is sometimes limited to a few FOVs. A DL noise filter would be required even for in-situ observations in which the training data are difficult to obtain. In the following, we examine whether the artifacts can be removed by training with multiple frames taken from only a single FOV.
A new dataset of 2500 frames was acquired. Figure 6a shows the new distortion-corrected training dataset, acquired under the same conditions as the previous dataset except for the magnification and the pixel size (here 99,000× and 1.7 nm/pixel, respectively). We trained the U-net anew using up to 2500 frames of the rapid scan images, i.e., 100, 500, 1000 and 2500 frames. In the rest of this paper, the trained DL filters are referred to as DLF-100, DLF-500, DLF-1000, and DLF-2500, respectively. The training conditions were also the same as in the section above except for the learning rate. Since the training dataset shown in Fig. 6a contains many dislocations and tangles of them, the U-net risks over-fitting, which would result in an overall loss of denoising capability. To avoid over-fitting, we lowered the learning rate from 0.001 to 1 × 10⁻⁵ for DLF-100, DLF-500, and DLF-1000, and to 5 × 10⁻⁶ for DLF-2500. Figure 6b shows sample images after applying each filter, together with the corresponding FFT spectra. In the FFT diagrams, the statistical noise was removed even by DLF-100, although the unidirectional blurring remained. The features in the filtered image became clearer as the number of training frames increased. The vertical feature in the spectrum representing the blurring, however, could not be removed even by DLF-2500.
The line profiles of the FFT diagrams are also evaluated, as shown in Fig. 6c. The DLF-2500 image is chosen as a representative case of the single FOV training. In Fig. 6c, the line profile of DLF-2500 did not match that of the slow scan beyond the low-frequency region of about 48 μm⁻¹. Since the matching range was narrower than that of the DLF-Slow, which almost coincided up to around 120 μm⁻¹, the ability to reproduce slow-scan-equivalent image quality became inferior when only a single FOV was used for the training. The single FOV training might be less effective than the multi-FOV training because the unidirectional blurring depends on the dislocation shape or the dislocation signal intensity, as shown in the contour plot of Fig. 1d. During the training process, the DL network is trained so that it reproduces the slow scan images from the corresponding rapid scan images. The transformation function that corrects the difference in background intensity would be learned in this process. The background intensity of the rapid scan images, however, contains the unidirectional blurring, whose shape depends on the shape of the dislocations or the intensity of the dislocation signal. Therefore, multiple patterns of dislocation signal would need to be learned to eliminate the blurring. Although the single FOV training could partially eliminate both the statistical noise and the blurring when the number of frames was increased, the denoising performance would likely be limited because the DL network would not learn the dislocation-shape dependence or the dislocation-signal-intensity dependence of the blurring.
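The wave numbers quoted above can be converted to real-space lengths with a short calculation. The helper below is a hypothetical illustration that assumes the wave number is expressed in cycles per micrometre; the pixel size of 1.7 nm/pixel is the value stated for the single FOV dataset.

```python
def wavenumber_to_length(k_per_um, pixel_size_nm):
    """Convert a wave number in 1/um to a real-space period in nm and in pixels."""
    period_nm = 1000.0 / k_per_um          # 1 um = 1000 nm
    period_px = period_nm / pixel_size_nm
    return period_nm, period_px


# Matching limit of DLF-2500 in the single FOV dataset (1.7 nm/pixel).
period_nm_48, period_px_48 = wavenumber_to_length(48.0, 1.7)

# Matching limit of DLF-Slow; only the real-space period is used here because
# the pixel size of that dataset is not restated in this section.
period_nm_120, _ = wavenumber_to_length(120.0, 1.7)
```

Under these assumptions, 48 μm⁻¹ corresponds to a period of roughly 21 nm, or about 12 pixels at 1.7 nm/pixel, whereas 120 μm⁻¹ corresponds to about 8.3 nm.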
In conclusion, the DL filter trained on a single FOV could partially remove the image blurring, although its performance was inferior to that of the filter trained on 50 FOVs. This study indicated that the performance of the DL noise filter would improve if more FOVs were included in the training. In cases where obtaining many different FOVs is practically challenging, the DL filter will still be worth training to eliminate the statistical noise. This study has also demonstrated that the weak electron signals caused by rapid scanning in the STEM could be recovered by DL filtering operations. Therefore, the DL-based technique could be applied mainly to in-situ observation using the STEM, where temporal resolution is required.
Since the STEM is generally more tolerant of thicker samples, in-situ observation using the STEM can capture the dynamic evolution of phenomena with less influence from the surface. The STEM also enables us to obtain the chemical composition, bonding state, etc., at the same time as we observe the texture of a sample, providing a large amount of data even in a single experiment. Such big-data-like acquisition could become standard in current data-driven materials science. For these reasons, the STEM would be a better tool than the CTEM for operando observation utilizing data science. Therefore, the denoising technique developed in this study could be applied to determine which area should be examined further by advanced analysis methods, e.g., electron energy loss spectroscopy, as well as to improve the quality and accuracy of STEM images taken with a high temporal resolution. This paper has demonstrated an essential approach for electron microscope observations that are difficult to establish in the conventional TEM-based way, such as observing the dynamic evolution of dislocation structures in a thick sample under external stimuli.