J. Phys. Soc. Jpn. 86, 123601 (2017) [4 Pages]

Transfer Learning to Accelerate Interface Structure Searches

Affiliations
1 Institute of Industrial Science, The University of Tokyo, Meguro, Tokyo 153-8505, Japan
2 Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan
3 Center for Materials Research by Information Integration, National Institute for Materials Science, Tsukuba, Ibaraki 305-0047, Japan
4 RIKEN Center for Advanced Intelligence Project, Chuo, Tokyo 103-0027, Japan

Interfaces have atomic structures that are significantly different from those in the bulk, and play crucial roles in material properties. The central structures at the interfaces that provide properties have been extensively investigated. However, determination of even one interface structure requires searching for the stable configuration among many thousands of candidates. Here, a powerful combination of machine learning techniques based on kriging and transfer learning (TL) is proposed as a method for unveiling the interface structures. Using the kriging+TL method, thirty-three grain boundaries were systematically determined from 1,650,660 candidates in only 462 calculations, representing an increase in efficiency over conventional all-candidate calculation methods, by a factor of approximately 3,600.

©2017 The Author(s)
This article is published by the Physical Society of Japan under the terms of the Creative Commons Attribution 4.0 License. Any further distribution of this work must maintain attribution to the author(s) and the title of the article, journal citation, and DOI.

Interfaces play crucial roles in materials properties. The abundant interfaces in polycrystalline materials, namely the grain boundaries (GBs), determine electronic and ionic conductivities and mechanical strengths.1–4) Furthermore, interfaces in thin films often endow new functions such as the emergence of two-dimensional electron gases or superconductivity.5–9) In addition to such positive effects, negative effects such as embrittlement are also known to originate from interfaces.10,11)

Large interface effects are caused by atomic structures that differ significantly from those in the bulk. Determining the central structures at the interface from which material properties originate, and understanding the relationships between these structures and the properties, are among the most important tasks in materials research. However, the determination of interface structures remains challenging because of the large number of geometrical degrees of freedom at the interface.12) Even for one type of coincidence site lattice GB in simple metals (a very simplified Σ GB), the number of candidate configurations reaches \(10^{3}\)–\(10^{5}\). In such cases, the most stable structure must be found in the data space of all candidate structures, conventionally by calculating every candidate (all-candidate calculation). Methods that use machine learning techniques such as kriging to accelerate interface structure searches have recently been reported.13–16) Kriging is an effective interpolation method based on a Gaussian process and Bayesian optimization that determines the optimal point in a multi-dimensional data space.17) In the kriging process, which is shown schematically in Fig. 1(a), a prediction model, called the “predictor”, is constructed using Gaussian process regression supplemented with Bayesian optimization. The predictor proposes the next search point and is sequentially updated during the structure search. Kriging thus allows one to find the most stable structure [indicated by the star symbol in Fig. 1(a)] with fewer calculations than the conventional all-candidate approach. It is a very powerful method for finding the optimum point in a multi-dimensional data space and has been successfully applied to interface structure determination in metals13) and oxides,14) as well as to surface structure determination.18)
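As an illustration of the Gaussian process regression that underlies the predictor, the following is a minimal sketch using only the Python standard library. It is not the implementation used in this work: the zero prior mean, the unit kernel amplitude, the `length` and `noise` values, and the function names are all assumptions made for the example.

```python
import math

def gaussian_kernel(a, b, length=1.0):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def gp_posterior(x_obs, y_obs, x_new, length=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at one unobserved point."""
    n = len(x_obs)
    # Kernel matrix of the observed points, with a small diagonal noise term.
    k = [[gaussian_kernel(x_obs[i], x_obs[j], length) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [gaussian_kernel(x, x_new, length) for x in x_obs]
    alpha = solve(k, y_obs)    # K^{-1} y
    beta = solve(k, k_star)    # K^{-1} k_*
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    var = gaussian_kernel(x_new, x_new, length) - sum(ks * b for ks, b in zip(k_star, beta))
    return mean, max(var, 0.0)

# Toy one-dimensional data, chosen only to exercise the functions.
x_obs = [[0.0], [1.0], [2.0]]
y_obs = [1.0, 0.5, 2.0]
print(gp_posterior(x_obs, y_obs, [1.5]))
```

At an already observed point the posterior variance collapses to nearly zero, whereas far from all observations the mean returns to the prior (zero here) and the variance to the prior variance. In practice one would typically subtract the mean energy before fitting, or use an established Gaussian process library.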

Figure 1. (Color online) Schematics of (a) kriging and (b) combined method with transfer learning (kriging+TL). (a) In kriging, each GB has an individual three-dimensional search space. (b) By contrast, a common 74-dimensional search space is used in the kriging+TL method.

However, the current kriging method requires a separate predictor to be constructed for each interface, as shown in Fig. 1(a), in which the predictors indicated in red, blue, and yellow must be constructed for GB1, GB2, and GB3, respectively. Kriging by itself is therefore still inefficient for the systematic determination of many interface structures. If the speed of kriging could be improved and a feasible predictor constructed, the structure determination of interfaces could be dramatically accelerated. Such an enhancement would facilitate the systematic investigation of interfaces and pave the way for a deeper understanding of the mechanisms from which interface properties arise.

In this study, kriging was combined with another machine learning technique, transfer learning (TL), to accelerate the search process. TL improves learning efficiency by solving a given task using data and learning results from other related tasks.19) The concept of kriging combined with TL (kriging+TL) for interface structure searches is illustrated schematically in Fig. 1(b). In the kriging+TL process, a common predictor (green) is used for all tasks, and the data space obtained by a given kriging run is transferred to the next one. The data space is thus shared among all GBs, and the predictor gradually becomes more “intelligent” during the GB calculation process. As a result, the speed, robustness, and feasibility of interface determination can be improved. We demonstrate here that the kriging+TL process is a very powerful method for the systematic determination of interface structures.

In this study, 33 Fe [110] symmetric tilt GBs whose structures have already been reported20,21) were systematically investigated. Static lattice calculations using the General Utility Lattice Program (GULP) code22) were performed to optimize the structures and calculate the lattice energy of each GB supercell. Finnis–Sinclair-type interatomic potentials were employed.23)

To find the most stable structure, three-dimensional translations with 0.1– Å steps were applied to one side of the grain. The schematics of the GB model, for instance \(\Sigma3(11\bar{1})\), and the definitions of the x-, y-, and z-directions in the existing model are shown in Fig. 2. In particular, the y-direction translations along the intergranular distance ranged over 0.9–1.4 Å.

Figure 2. (Color online) Schematic of the GB model. Here, two crystals for the [110] symmetric tilt Fe \(\Sigma3(1\bar{1}1)\) GB are shown.

The GB energy, \(E_{\text{GB}}\), was estimated using
\begin{equation} E_{\text{GB}} = \frac{\mathit{TE}_{\text{GB}} - \mathit{TE}_{\text{bulk}}}{2A}, \tag{1} \end{equation}
where \(\mathit{TE}_{\text{GB}}\) and \(\mathit{TE}_{\text{bulk}}\) are the total lattice energies of the supercells of the GB and bulk, respectively, and A is the GB area. Because the supercell contains two GBs, the energy difference is divided by 2A rather than A.
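Equation (1) can be evaluated as in the short sketch below. The input units (total energies in eV and area in Å\(^{2}\)) and the function name are assumptions made for the example, and the numbers in the usage line are hypothetical and are not taken from the calculations reported here.

```python
# 1 eV/Å^2 expressed in J/m^2 (1 eV = 1.602176634e-19 J, 1 Å^2 = 1e-20 m^2).
EV_PER_A2_TO_J_PER_M2 = 16.02176634

def gb_energy(te_gb_ev, te_bulk_ev, area_a2):
    """Eq. (1): excess energy per unit area, shared by the two GBs in the supercell.

    Assumes energies in eV and the GB area in Å^2; returns J/m^2.
    """
    if area_a2 <= 0:
        raise ValueError("GB area must be positive")
    e_gb_ev_per_a2 = (te_gb_ev - te_bulk_ev) / (2.0 * area_a2)
    return e_gb_ev_per_a2 * EV_PER_A2_TO_J_PER_M2

# Hypothetical numbers, chosen only to exercise the formula.
print(round(gb_energy(-410.0, -412.0, 10.0), 3))  # 1.602
```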

Kriging is a non-parametric regression analysis based on a Gaussian process and Bayesian statistics. In the kriging process carried out here, five initial configurations were randomly selected for the first GB [GB1 in Fig. 1(b)], and structure optimizations and energy calculations were performed for the selected configurations. The prediction model (predictor) was then constructed based on the Gaussian process supplemented with Bayesian optimization. The predictor is described by a Gaussian kernel. The mean and standard deviation at each unobserved point in the search space were obtained from the probability distribution given by the Gaussian process, and the z-score, \(Z_{i}\), was evaluated to find the next search point:
\begin{equation} Z_{i} = \frac{E_{\text{min}} - E_{i}}{\sqrt{\sigma_{i}}}, \tag{2} \end{equation}
where \(E_{\text{min}}\) is the minimum GB energy found up to the given (\(\underline{i}\)-th) moment, and \(E_{i}\) and \(\sigma_{i}\) are, respectively, the mean and standard deviation at the i-th point in the search space. The point with the maximum z-score, which is the most likely at that moment to have the global minimum GB energy, was chosen as the next sampling configuration, and its structure optimization and energy calculation were then performed. These operations were repeated until the convergence criterion was satisfied: the structure search continued until five structures with identical (within \(\pm 0.005\) J/m\(^{2}\)) lowest GB energies had been found for a single GB.
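The selection rule of Eq. (2) and the stopping rule can be sketched as follows. Two readings are assumed here and should be treated as such: the quantity under the square root is taken to be the predictive variance, so that the denominator is a standard deviation, and the stopping rule is taken to mean that the five lowest energies found so far agree to within the tolerance. The function names are illustrative.

```python
import math

def next_point(e_min, means, variances):
    """Index of the unobserved candidate with the largest z-score, Eq. (2).

    Assumption: `variances` holds predictive variances, so sqrt() gives
    the standard deviation used in the denominator.
    """
    best, best_z = None, -math.inf
    for i, (mu, var) in enumerate(zip(means, variances)):
        if var <= 0.0:  # no uncertainty left at this point: skip it
            continue
        z = (e_min - mu) / math.sqrt(var)
        if z > best_z:
            best, best_z = i, z
    return best

def converged(energies, n_same=5, tol=0.005):
    """True once the n_same lowest energies agree to within tol (J/m^2)."""
    if len(energies) < n_same:
        return False
    lowest = sorted(energies)[:n_same]
    return lowest[-1] - lowest[0] <= tol

# Toy values: candidate 1 has a low predicted mean and a sizeable uncertainty.
print(next_point(1.5, [1.6, 1.4, 1.55], [0.01, 0.04, 0.0001]))  # 1
print(converged([1.51, 1.512, 1.511, 1.513, 1.51, 2.0]))        # True
```

A candidate scores highly either because its predicted energy lies below the current minimum or, when it lies above, because its uncertainty is large enough to make an improvement plausible.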

As shown schematically in Fig. 1(b), in the kriging+TL method the data space obtained by a given kriging run is transferred to the next one, so that the data space is shared among all GBs. Because the 74-dimensional data space is common to all GBs, it becomes “smarter” as the kriging operation is repeated. Thus, if the transfer learning is successful, the numbers of calculations required with TL for GB1, GB2, and GB3 (\(l'\), \(m'\), and \(n'\)) compare with those required without TL (l, m, and n) as \(l' = l\), \(m' < m\), and \(n' < n\). The random selection was performed only for the initial GB and not for the subsequent GBs.
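The bookkeeping described above (one shared data pool, random initial sampling for the first GB only) can be sketched as below. This shows only the data flow and is not the authors' code: `run_all`, `search_one`, and `toy_search` are hypothetical names, and the per-GB search itself is passed in as a callable so that any kriging implementation could be plugged in.

```python
def run_all(gbs, search_one, n_random_first=5):
    """Search the GBs in order while sharing one pool of observations.

    gbs        : list of candidate lists, one list of descriptor tuples per GB
    search_one : callable(candidates, shared, n_random) returning the new
                 (descriptor, energy) pairs evaluated for that GB
    """
    shared = []  # (descriptor, energy) pairs from every GB searched so far
    counts = []  # number of energy calculations spent on each GB
    for k, candidates in enumerate(gbs):
        # Random initial sampling is used for the first GB only.
        n_random = n_random_first if k == 0 else 0
        new_obs = search_one(candidates, shared, n_random)
        shared.extend(new_obs)  # knowledge is carried over to the next GB
        counts.append(len(new_obs))
    return shared, counts

def toy_search(candidates, shared, n_random):
    """Stand-in search: evaluates the initial picks plus one more candidate."""
    picked = candidates[: n_random + 1]
    return [(c, float(sum(c))) for c in picked]

shared, counts = run_all([[(0,), (1,), (2,)], [(3,), (4,)]], toy_search,
                         n_random_first=2)
print(counts)  # [3, 1]
```

Because every descriptor vector lives in the same data space, the pairs collected for one GB can be used directly as training data when the predictor is rebuilt for the next GB.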

The misorientation angles, Σ values, GB planes, lattice constants, number of atoms in the supercell, and number of candidate configurations for the employed GBs are listed in the Supplemental materials.24) The total number of candidates for the 33 GBs was 1,650,660, meaning that it was necessary to perform 1,650,660 computations to determine the stable structures of all of the GBs.

Details of kriging for GB structure searches have been described previously;13) in that work, kriging was performed in a three-dimensional space spanned by the x, y, and z translations. Although such “three-dimensional kriging” is adequate when searching for the structure of a single GB, the three-dimensional space of one GB cannot be shared with other GBs [Fig. 1(a)]. In this study, the data space was therefore expanded to a 75-dimensional data space consisting of 74 dimensions of search space and one dimension for the GB energy; a schematic of this expansion is shown in Fig. 1(b), and the descriptors for the 74 dimensions are listed in Table I. The original values of the descriptors listed in Table I were obtained from the interface models, and their squares, inverses, exponentials, and exponential inverses (except for the inverse of zero) were also generated and used as descriptors, giving 74 descriptors in total. Such a relatively large number of descriptors was used for two reasons: 1) a larger number of suitable descriptors helps to describe the complex data space, and 2) the computational time for regression does not change greatly with the number of descriptors. In three-dimensional kriging, each data space is specific to its own GB [Fig. 1(a)]; with 74-dimensional kriging, all GBs can be treated in the same data space, which makes TL possible, as shown in Fig. 1(b).
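The generation of derived descriptors can be sketched as follows. Several points are assumptions made for this example: “exponential inverse” is read as \(\exp(-x)\), an inverse column is skipped whenever any of its raw values is zero, and the descriptor names in the usage line are hypothetical (the actual descriptors are those of Table I).

```python
import math

def expand_descriptors(columns):
    """Add square, exp, exp(-x), and inverse columns to raw descriptors.

    columns: dict mapping a descriptor name to its raw values over all
    candidate configurations.
    """
    out = {}
    for name, vals in columns.items():
        out[name] = list(vals)
        out[name + "^2"] = [v * v for v in vals]
        out["exp(" + name + ")"] = [math.exp(v) for v in vals]
        # Assumption: "exponential inverse" is taken to mean exp(-x).
        out["exp(-" + name + ")"] = [math.exp(-v) for v in vals]
        # Assumption: the inverse is dropped if the raw column contains a zero.
        if all(v != 0 for v in vals):
            out["1/" + name] = [1.0 / v for v in vals]
    return out

# Hypothetical descriptor names and values.
cols = expand_descriptors({"a": [1.0, 2.0], "b": [0.0, 3.0]})
print(len(cols))  # 9: five columns from "a", four from "b"
```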

Table I. List of descriptors for regression analysis. Square, inverse, exponential, and exponential inverse of the original descriptors were considered. A total of 74 descriptors were generated.

The feasibility of 74-dimensional (74D) kriging was confirmed by calculating the \(\Sigma3(1\bar{1}1)\) GB [Fig. 3(a)]. According to the all-candidate calculations, this GB has 17,466 initial (candidate) configurations and a GB energy of 1.51 J/m\(^{2}\). The 74D-kriging process was repeated ten times for the same GB; on average, 11.2 calculations, including the five initial random samplings, were required to reach convergence. The converged GB energies were all 1.51 J/m\(^{2}\). Furthermore, the resulting structure agrees well with a previously reported structure [orange circles in Fig. 3(a)],20,21) confirming that 74D-kriging worked correctly, with an efficiency approximately 1,500 times greater than that of conventional all-candidate calculations.

Figure 3. (Color online) Atomic structures of (a) \(\Sigma3[110](11\bar{1})\) calculated using the 74-dimensional kriging, and (b) \(\Sigma3[110](11\bar{2})\) calculated using kriging+TL. Orange circles represent the previously reported atomic structure.20,21)

The above 74D-kriging process was then combined with a TL process in which knowledge obtained in previous calculations is transferred to succeeding calculations [as shown schematically in Fig. 1(b)]. In the previous 3D-kriging case described above, the kriging process used individual 3D data spaces with essentially independent predictors for each GB [Fig. 1(a)]. In the kriging+TL process, the predictor obtained after the GB1 calculation was preserved, transferred to the next (GB2) calculation, and updated there. In this manner, the predictor was continuously transferred to the analyses of the succeeding GBs. If the transfer succeeds, the calculation efficiency should improve; in other words, the numbers of calculations needed to reach convergence, l, m, and n for kriging and \(l'\), \(m'\), and \(n'\) for kriging+TL, should satisfy \(l = l'\), \(m > m'\), and \(n > n'\) [Fig. 1(b)].

In the kriging+TL process, kriging was performed in order of the Σ value, namely, from \(\Sigma3\) to \(\Sigma129\). Figure 4(a) shows the results of this process. The convergence rate was calculated as the number of calculations required with TL divided by the number required without TL, so that smaller values indicate faster convergence. The predictor constructed in the initial \(\Sigma3(1\bar{1}1)\) GB calculation was transferred to the calculation for the next GB, \(\Sigma3(1\bar{1}2)\), whose stable structure was determined in an average (over ten trials) of 12.9 calculations. Without TL, the \(\Sigma3(1\bar{1}2)\) GB structure was determined in 17.4 calculations, indicating a 35% acceleration through the use of TL. The calculated \(\Sigma3(1\bar{1}2)\) GB structure is shown in Fig. 3(b) in comparison with a previously reported structure (orange circles).20,21) The converged structure obtained using kriging+TL closely resembles the previously reported structure. Furthermore, the history of the selected points in the kriging+TL search for the \(\Sigma3(1\bar{1}2)\) GB structure indicates that the most stable structure was found at the first selection; further selections were necessary only to meet the strict convergence criterion. These results indicate that TL effectively decreased the number of kriging trials and that kriging+TL successfully determined the GB structure with improved efficiency.
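The two figures of merit quoted for the \(\Sigma3(1\bar{1}2)\) GB can be reproduced from the average calculation counts, as in the sketch below. Reading the “35% acceleration” as the ratio of the counts minus one is an interpretation adopted here; the function names are illustrative.

```python
def convergence_ratio(n_with_tl, n_without_tl):
    """Calculations with TL divided by calculations without TL (smaller is faster)."""
    return n_with_tl / n_without_tl

def acceleration(n_with_tl, n_without_tl):
    """Relative speed-up: how much faster the search converges with TL."""
    return n_without_tl / n_with_tl - 1.0

# Average counts quoted in the text for the second GB.
print(f"{convergence_ratio(12.9, 17.4):.2f}")  # 0.74
print(f"{acceleration(12.9, 17.4):.0%}")       # 35%
```

Equivalently, TL removed about 26% of the calculations needed for this GB.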

Figure 4. (Color online) (a) Convergence speed ratio attained by transfer learning (TL). The convergence speed ratio corresponds to the number of calculations using TL divided by the number without TL; thus, smaller values indicate faster searching. (b) GB energies as a function of tilt angle calculated by kriging+TL compared with previous study results.20,21) Individual points are the GB energies of the 33 GBs listed in the Supplemental materials.24)

The predictor was then continuously updated and applied to the other 31 GBs. As shown in Fig. 4(a), kriging+TL always had a faster (up to 70% acceleration) convergence rate than that of kriging without TL, suggesting that the predictor gradually became more “intelligent” during the GB calculation process and transferred its knowledge to successive krigings.

Figure 4(b) plots the calculated GB energies of the 33 GBs as a function of the misorientation angle. To confirm the validity of the above calculations, these are compared with GB energies reported in previous molecular dynamics simulations.20,21) Because of differences in the empirical potentials used, the absolute GB energies differ from those in previous reports.19) However, the trend in the GB energy (convex shapes with some cusps) reproduces that seen in previous work. This indicates that TL provided acceleration while maintaining robustness. Even though there were 1,650,660 candidate structures in total for the 33 GBs, kriging+TL required only 462 computations to determine the 33 GB structures. In other words, the proposed kriging+TL method was approximately 3,600 times more efficient than the conventional all-candidate calculation method and approximately three times more efficient than kriging without TL.
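The efficiency factor relative to all-candidate calculation follows directly from the two counts given in the text; as a check, with the per-GB averages added for orientation:

\[ \frac{1{,}650{,}660}{462} \approx 3{,}573 \approx 3.6 \times 10^{3}, \qquad \frac{462}{33} = 14, \qquad \frac{1{,}650{,}660}{33} = 50{,}020 . \]

That is, on average about 14 energy calculations were spent per GB, against roughly 50,000 candidate configurations per GB.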

In summary, kriging+TL, a powerful combination of machine learning techniques, was shown to accelerate interface structure searches. It was confirmed that in the kriging+TL process the prediction model, or predictor, became increasingly intelligent by sequentially transferring knowledge between process iterations. The proposed kriging+TL method was approximately 3,600 times more efficient than the conventional all-candidate calculation method. This study thus demonstrated that transfer learning dramatically improves the kriging convergence speed.

In crystalline materials, there is a very wide variety of interface types with differing atomic structures that can govern many types of material properties. The systematic investigation of a variety of interfaces is therefore indispensable in gaining an understanding of interface properties. To perform such systematic investigations, the kriging+TL presented here could be quite powerful, and we believe that the proposed method will pave the way for the investigation and design of material interfaces.


This study was supported by the Japan Science and Technology Agency–Precursory Research for Embryonic Science and Technology (JST-PRESTO, JPMJPR16NB 16814592), Japan, Grants-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology, Japan (MEXT; Nos. 25106003 and 17H06094), and the special fund of Institute of Industrial Science, The University of Tokyo (Tenkai5504850104).


  • 1 K.-S. Chang, Y.-F. Lin, and K.-L. Tung, J. Power Sources 196, 9322 (2011). 10.1016/j.jpowsour.2011.07.085
  • 2 D. Chen, X.-F. Zhang, and R. O. Ritchie, J. Am. Ceram. Soc. 83, 2079 (2000). 10.1111/j.1151-2916.2000.tb01515.x
  • 3 J. P. Buban, K. Matsunaga, J. Chen, N. Shibata, W. Y. Ching, T. Yamamoto, and Y. Ikuhara, Science 311, 212 (2006). 10.1126/science.1119839
  • 4 K. Matsunaga, H. Nishimura, H. Muto, T. Yamamoto, and Y. Ikuhara, Appl. Phys. Lett. 82, 1179 (2003). 10.1063/1.1555690
  • 5 S. Thiel, G. Hammerl, A. Schmehl, C. W. Schneider, and J. Mannhart, Science 313, 1942 (2006). 10.1126/science.1131091
  • 6 H. Ohta, S. Kim, Y. Mune, T. Mizoguchi, K. Nomura, S. Ohta, T. Nomura, Y. Nakanishi, Y. Ikuhara, M. Hirano, H. Hosono, and K. Koumoto, Nat. Mater. 6, 129 (2007). 10.1038/nmat1821
  • 7 T. Mizoguchi, H. Ohta, H.-S. Lee, N. Takahashi, and Y. Ikuhara, Adv. Funct. Mater. 21, 2258 (2011). 10.1002/adfm.201100230
  • 8 A. Ohtomo and H. Y. Hwang, Nature 427, 423 (2004). 10.1038/nature02308
  • 9 J. Matsuno, N. Ogawa, K. Yasuda, F. Kagawa, W. Koshibae, N. Nagaosa, Y. Tokura, and M. Kawasaki, Sci. Adv. 2, e1600304 (2016). 10.1126/sciadv.1600304
  • 10 G. H. Li and L. D. Zhang, Scr. Metall. 32, 1335 (1995). 10.1016/0956-716X(95)00167-T
  • 11 R. Schweinfest, A. T. Paxton, and M. W. Finnis, Nature 432, 1008 (2004). 10.1038/nature03198
  • 12 A. Sutton and R. Balluffi, Interfaces in Crystalline Materials Clarendon (Oxford University Press, Oxford, U.K., 1995).
  • 13 S. Kiyohara, H. Oda, K. Tsuda, and T. Mizoguchi, Jpn. J. Appl. Phys. 55, 045502 (2016). 10.7567/JJAP.55.045502
  • 14 S. Kikuchi, H. Oda, S. Kiyohara, and T. Mizoguchi, Phys. B: Condens. Matter, (2017) in press.
  • 15 T. Ueno, T. D. Rhone, Z. Hou, T. Mizoguchi, and K. Tsuda, Mater. Discovery 4, 18 (2016). 10.1016/j.md.2016.04.001
  • 16 S. Kiyohara, H. Oda, T. Miyata, and T. Mizoguchi, Sci. Adv. 2, e1600746 (2016). 10.1126/sciadv.1600746
  • 17 C. E. Rasmussen, Gaussian Processes for Machine Learning (MIT Press, Cambridge, MA, 2006).
  • 18 D. M. Packwood and T. Hitosugi, Appl. Phys. Express 10, 065502 (2017). 10.7567/APEX.10.065502
  • 19 S. J. Pan and Q. Yang, IEEE Trans. Knowl. Data Eng. 22, 1345 (2010). 10.1109/TKDE.2009.191
  • 20 R. J. Kurtz and H. L. Heinisch, J. Nucl. Mater. 329–333, 1199 (2004). 10.1016/j.jnucmat.2004.04.262
  • 21 E. Nakajima and M. Takeuchi, Tetsu-to-Hagane 86, 357 (2000). 10.2355/tetsutohagane1955.86.5_357
  • 22 J. D. Gale, J. Chem. Soc., Faraday Trans. 93, 629 (1997). 10.1039/a606455h
  • 23 M. W. Finnis and J. E. Sinclair, Philos. Mag. A 50, 45 (1984). 10.1080/01418618408244210
  • 24 (Supplemental Material) List of calculated GBs in this study. In all, 33 [110] symmetric tilt GBs of iron were calculated. Misorientation angle, Σ value, interface plane, number of atoms in the supercell, size of the supercell, and number of candidate configurations are listed.