1 Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan
2 Japan Synchrotron Radiation Research Institute (JASRI), Sayo, Hyogo 679-5198, Japan
3 Institute of Pulsed Power Science, Kumamoto University, Kumamoto 860-8555, Japan
4 Kyushu Synchrotron Light Research Center, Tosu, Saga 841-0005, Japan
5 Research and Services Division of Materials Data and Integrated Systems, National Institute for Materials Science, Tsukuba, Ibaraki 305-0047, Japan
Received December 4, 2018; Accepted December 25, 2018; Published February 19, 2019
Core-level X-ray photoelectron spectroscopy (XPS) is a useful measurement technique for investigating the electronic states of a strongly correlated electron system. Usually, to extract physical information about a target material from a core-level XPS spectrum, we need to construct an effective Hamiltonian, based on physical considerations, that expresses the complicated electron-electron interactions involved in the core-level XPS transition, and then manually tune its physical parameters so that it reproduces the measured spectrum. The physical information is then extracted from the tuned parameters. In this paper, we propose an automated method for analyzing core-level XPS spectra based on the Bayesian model selection framework, which selects the effective Hamiltonian and estimates its parameters automatically. The Bayesian model selection, which often has a large computational cost, was carried out by the exchange Monte Carlo sampling method. By applying our proposed method to the 3d core-level XPS spectra of Ce and La compounds, we confirmed that it selected an effective Hamiltonian and estimated its parameters appropriately; the results were consistent with conventional knowledge obtained from physical studies. Moreover, our proposed method can also evaluate the uncertainty of the estimated values and clarify why a particular effective Hamiltonian was selected. Such information is difficult to obtain by conventional analysis methods.
Core-level X-ray photoelectron spectroscopy (XPS) is a useful means of investigating the electronic state of a strongly correlated electron system.1) The XPS spectral distribution is reproduced from the state transition probabilities, which are calculated on the basis of an effective Hamiltonian that expresses the complex electron-electron interactions involved in the XPS transition. The parameters of the effective Hamiltonian are the physical parameters of the measurement target. Therefore, by adjusting the parameters of the effective Hamiltonian so as to reproduce the measured XPS spectrum, the physical parameters of the target material are obtained as the tuned parameter values.2–5) Normally, this parameter adjustment is performed manually, and the effective Hamiltonian, which is the premise of the analysis, is set by the researchers on the basis of physical consideration of the XPS transition process.2–5)
Spectral deconvolution is an information science methodology that regresses a spectrum as a linear sum of unimodal basis functions, such as Gaussian functions. Nagata et al.6) introduced Bayesian inference into spectral deconvolution using the exchange Monte Carlo method,7) an efficient sampling method. Bayesian spectral deconvolution has realized the quantitative selection of the spectral model, such as the selection of the number of peaks. It has been further developed to realize the efficient estimation of noise intensity,8) the analysis of time-series spectra,9) and a fast Bayesian spectral deconvolution algorithm10) applicable to high-dimensional spectra. Many studies have used Bayesian spectral deconvolution to analyze spectra measured in various scientific fields.11–13) Normally, in spectral deconvolution, the parameters of the unimodal basis functions, such as their positions, intensities, and variances, are estimated. Therefore, if the parameters of the physical model do not directly correspond to the parameters of the unimodal basis functions, the physical parameters can only be estimated indirectly. In the analysis of the core-level XPS spectrum, the parameters of the effective Hamiltonian likewise do not directly correspond to the peak positions or peak intensities. To realize the direct estimation of physical parameters, Bayesian spectral deconvolution methods that build an internal model into the spectral model have been developed.9,11) Here, the internal model is defined as a model whose parameters are the physical parameters themselves, unlike the parameters of the unimodal basis functions such as the mean, intensity, and variance. For instance, Kasai et al. applied the relationship between the protein species and the peak intensity ratio as an internal model and estimated the protein species directly.11) Murata et al. applied an autoregressive model representing the time evolution of the peak positions of the spectrum and estimated the parameters of the latent dynamics directly.9)
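As a point of reference for the discussion below, the following Python sketch shows the kind of model used in ordinary spectral deconvolution: the spectrum is expressed as a linear sum of Gaussian basis functions, and the peak positions, intensities, and widths are the parameters to be estimated. The numerical values in the usage example are arbitrary illustrations, not values taken from the cited studies.

```python
import numpy as np

def gaussian_mixture_spectrum(omega, positions, intensities, widths):
    """Spectral model as a linear sum of unimodal (Gaussian) basis functions.

    positions, intensities, and widths are arrays of length K (the number of
    peaks); these are the basis-function parameters estimated in ordinary
    Bayesian spectral deconvolution.
    """
    omega = np.asarray(omega)[:, None]   # shape (N, 1) for broadcasting
    return np.sum(intensities * np.exp(-(omega - positions) ** 2
                                       / (2.0 * widths ** 2)), axis=1)

# Illustrative two-peak spectrum on a 400-point energy grid
omega = np.linspace(0.0, 20.0, 400)
spectrum = gaussian_mixture_spectrum(omega, np.array([5.0, 12.0]),
                                     np.array([1.0, 0.4]), np.array([0.7, 0.7]))
```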
In this study, we aim to achieve the automatic selection of an effective Hamiltonian and the estimation of its parameters based on the support of statistical science. By incorporating the effective Hamiltonian of core-level XPS into the Bayesian spectral deconvolution method as an internal model, we propose a Bayesian spectral deconvolution method that enables the automatic analysis of core-level XPS spectra. Our proposed method is not merely an automated spectral analysis method for estimating the parameters and selecting a model. In Bayesian inference, a physical parameter is treated as a statistical variable. Therefore, the physical parameter is estimated as the probability distribution of the values that the parameter can possibly take, which is called the posterior distribution. As a result, we can estimate the parameter and, at the same time, determine its estimation accuracy. This makes it possible to obtain the information necessary for developing measurement plans, such as the number of measurements and the measurement method, to satisfy the required estimation accuracy. In addition, by analyzing the shape of the distribution, we can also discuss the nature of the effective model. For example, we can discuss the uncertainty of the model parameters when fitting the model to the data. This provides information such as the relationship between the expressive ability of the model and the data complexity. It is difficult to obtain this information from simple parameter fitting. Thus, our proposed method can extract useful information for physical discussion from XPS spectra.
In the current methods of analyzing core-level XPS spectra, researchers use effective Hamiltonians such as the cluster model14) and the impurity Anderson model.15,16) The cluster model contains the whole process of core-level XPS spectrum generation, such as the spin–orbit interaction, multiplet effect, and crystal field effect. In the impurity Anderson model, the effect of the conduction band structure on the hybridization interaction between \(4f\) electrons and conduction electrons is considered. In contrast, as the internal model, our proposed method adopts two of the simplest effective Hamiltonians that can represent the XPS spectra of La2O3 and CeO2. The analysis of the core-level XPS spectra started from these effective Hamiltonians.2,3) Since these effective Hamiltonians do not contain many factors, such as the effects of the spin–orbit interaction, multiplet, and crystal field, they cannot be applied to the general core-level XPS spectra of a rare-earth insulator compound. However, the framework of the proposed method does not limit the inner model to these effective Hamiltonians. Even if the effective Hamiltonian is replaced with an arbitrary one in the framework of the proposed method, this simply increases the number of parameters to be estimated and does not change the principle of the framework. This suggests that our proposed method is applicable to general core-level XPS spectra.
Finally, we applied the proposed method to emulated measurement data of La2O3 and CeO2 and of their intermediate electronic states. As a result, it was confirmed that the two effective Hamiltonians, which are also regarded as effective Hamiltonians in conventional physical studies,2,3) were selected for the spectrum data of La2O3 and CeO2, respectively. Furthermore, the analysis of the posterior distribution showed that the effective Hamiltonian of CeO2 has an excessively large degree of freedom for expressing the spectrum data of La2O3. In addition, we also applied our proposed method to spectra that are the intermediate states of a seamless transition from the CeO2 spectrum (three peaks) to the La2O3 spectrum (two peaks). As a result, it was confirmed that the proposed method selects an effective Hamiltonian on the basis of not only the information about the peak number but also other information contained in the effective Hamiltonian. This cannot be realized by existing spectral deconvolution methods applicable to XPS spectra, which select the model on the basis of the peak number alone.6,8)
2. Generative Model of Spectrum
In this study, we analyzed spectra simulated on the basis of an effective model that can emulate all types of simplified \(4f\)-electron-derived \(3d\) core-level XPS spectra of rare-earth insulating compounds. We refer to this effective model as the generative model. The generative model was proposed by Kotani et al.3) It employs one of the simplest cluster models as the effective Hamiltonian, in which the spin–orbit interaction, the multiplet effect, and the crystal field effect are ignored. Moreover, the hybridization between \(4f\) electrons and conduction electrons is simplified by assuming that there is only one ligand state. Note here that the generative model contains two effective models, which we later apply as the inner models of the spectral deconvolution method. We refer to these effective models as recognition models, which are explained later.
There are three eigenstates of \(4f\) electrons: \(|f^{0}\rangle\), \(|f^{1}\rangle\), and \(|f^{2}\rangle\). In the \(|f^{0}\rangle\) state, there is no electron in the \(4f\) orbital, whereas there is one electron in the \(|f^{1}\rangle\) state and two in the \(|f^{2}\rangle\) state. In such a state space of \(4f\) electrons, the effective Hamiltonian of the generative model is given as \begin{align} \mathcal{H} &= \epsilon_{L} \sum_{\nu=1}^{N_{f}} a^{\dagger}_{L\nu}a_{L\nu} + \epsilon^{0}_{f} \sum_{\nu=1}^{N_{f}} a^{\dagger}_{f\nu}a_{f\nu} + \epsilon_{c} a^{\dagger}_{c} a_{c}\\ &\quad + \frac{V}{\sqrt{N_{f}}} \sum_{\nu=1}^{N_{f}} (a^{\dagger}_{L\nu}a_{f\nu} + a_{L\nu}a^{\dagger}_{f\nu}) + U_{ff} \sum_{\nu>\nu'} a^{\dagger}_{f\nu}a_{f\nu}a^{\dagger}_{f\nu'}a_{f\nu'}\\ &\quad - U_{fc} \sum_{\nu=1}^{N_{f}} a^{\dagger}_{f\nu}a_{f\nu}(1- a^{\dagger}_{c} a_{c}), \end{align} (1) where \(\epsilon_{L}\), \(\epsilon^{0}_{f}\), and \(\epsilon_{c}\) are the energies of the conducting electron of \(4f\) rare-earth metals (\(5d\), \(6s\) electrons), the \(4f\) electron, and the core electron, respectively. The index ν (\(\nu=1-N_{f}\), \(N_{f}=14\)) represents the quantum number of the spin and f-orbital. V, \(U_{ff}\), and \(-U_{fc}\) are the energies of the hybridization interaction between \(4f\) electrons and conduction electrons, the Coulomb interaction between \(4f\) electrons, and the core-hole Coulomb potential for \(4f\) electrons, respectively. In both the initial and final states, we set the energy level of \(|f^{0}\rangle\) to 0 as the standard energy level. Thus, the number of parameters of the effective Hamiltonian \(\mathcal{H}\) is reduced to four: Δ (\(=\epsilon^{0}_{f} -\epsilon_{L}\)), V, \(U_{ff}\), and \(U_{fc}\). Here, we refer to the parameter set of the effective Hamiltonian \(\mathcal{H}\) as \({\boldsymbol{{\vartheta}}}_{\mathcal{H}} = \{\Delta,V,U_{ff},U_{fc}\}\).
The conceptual diagram of the \(4f\) electron state from the initial state to the final state is shown in Fig. 1. The initial state is defined as the state before X-rays are irradiated, and the final state is defined as the state after X-rays are irradiated and a core hole is generated. Furthermore, we define the initial eigenstate of the minimum energy \(E_{G}({\boldsymbol{{\vartheta}}}_{\mathcal{H}})\) as \(|G\rangle\) and the final eigenstates of the three energy levels \(E_{j}({\boldsymbol{{\vartheta}}}_{\mathcal{H}})\) (\(j = 0,1,2\)) as \(|F_{j}\rangle\) (\(j = 0,1,2\)) (Fig. 1). Using Fermi's golden rule, we define the transition probability from the initial ground state to the final states as \begin{equation} F(\omega;{\boldsymbol{{\vartheta}}}_{\mathcal{H}}) = \sum_{j=0}^{2} |\langle F_{j}|a_{c}|G\rangle|^{2}\delta (\omega - E_{j}({\boldsymbol{{\vartheta}}}_{\mathcal{H}}) + E_{G}({\boldsymbol{{\vartheta}}}_{\mathcal{H}})). \end{equation} (2) By convoluting \(F(\omega;{\boldsymbol{{\vartheta}}}_{\mathcal{H}})\) with the Lorentz function and adding the Gaussian noise ϵ, we obtain the intensity distribution of the spectrum as \begin{align} &\mathcal{I}(\omega;{\boldsymbol{{\vartheta}}}) \notag\\ &\quad= \sum_{j=0}^{2} |\langle F_{j}|a_{c}|G\rangle|^{2} \frac{\Gamma/\pi}{(\omega-(E_{j}({\boldsymbol{{\vartheta}}}_{\mathcal{H}})-E_{G}({\boldsymbol{{\vartheta}}}_{\mathcal{H}})))^{2}+\Gamma^{2}} + \epsilon, \end{align} (3) where Γ is the half-width of the Lorentz function and \({\boldsymbol{{\vartheta}}}\) is the parameter set of the intensity distribution \(\mathcal{I}(\omega;{\boldsymbol{{\vartheta}}})\), defined as \({\boldsymbol{{\vartheta}}} = \{{\boldsymbol{{\vartheta}}}_{\mathcal{H}},\Gamma\} = \{\Delta,V,U_{ff},U_{fc},\Gamma\}\). We define the data generated following \(\mathcal{I}(\omega;{\boldsymbol{{\vartheta}}})\) as “emulated measurement data”. Because the generative model is uniquely defined by the effective Hamiltonian \(\mathcal{H}\), we label this model as \(\mathcal{H}\). Following de Groot and Kotani,5) the emulated parameters of the XPS spectra of La2O3 and CeO2 are set as listed in Table I. The generated data based on Table I are shown in Fig. 2. The two peaks in Fig. 2(a) correspond to La2O3 and the three peaks in Fig. 2(b) correspond to CeO2.
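To make the construction of \(\mathcal{I}(\omega;{\boldsymbol{{\vartheta}}})\) concrete, the following Python sketch diagonalizes the effective Hamiltonian in the three-configuration basis \(\{|f^{0}\rangle,|f^{1}\rangle,|f^{2}\rangle\}\), evaluates the transition intensities \(|\langle F_{j}|a_{c}|G\rangle|^{2}\) as overlaps between the final eigenvectors and the initial ground state, and convolutes the result with a Lorentz function of half-width Γ. The explicit matrix elements used here, in particular setting both hybridization off-diagonal elements to V, are simplifying assumptions made for illustration rather than the exact matrix elements of Eq. (1).

```python
import numpy as np

def emulated_spectrum(omega, Delta, V, Uff, Ufc, Gamma, sigma_noise=0.0, rng=None):
    """Emulated XPS spectrum of the simplified cluster model (a sketch).

    Basis: (|f0>, |f1>, |f2>).  The |f0> energy is fixed to 0; in the final
    state each 4f electron gains -Ufc from the core-hole potential.  Both
    off-diagonal hybridization elements are set to V here (an assumption;
    the |f1>-|f2> element may carry an extra combinatorial factor).
    """
    def hamiltonian(core_hole):
        shift = -Ufc if core_hole else 0.0
        H = np.diag([0.0, Delta + shift, 2.0 * Delta + Uff + 2.0 * shift])
        H[0, 1] = H[1, 0] = V      # |f0>-|f1> mixing
        H[1, 2] = H[2, 1] = V      # |f1>-|f2> mixing (assumed strength)
        return H

    Ei, Ci = np.linalg.eigh(hamiltonian(core_hole=False))   # initial states
    Ef, Cf = np.linalg.eigh(hamiltonian(core_hole=True))    # final states
    ground = Ci[:, 0]                                       # |G>
    intensities = (Cf.T @ ground) ** 2                      # |<F_j|a_c|G>|^2
    positions = Ef - Ei[0]                                  # E_j - E_G

    intensity = np.zeros_like(omega)
    for w2, pos in zip(intensities, positions):
        intensity += w2 * (Gamma / np.pi) / ((omega - pos) ** 2 + Gamma ** 2)
    if sigma_noise > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        intensity += rng.normal(0.0, sigma_noise, size=omega.shape)
    return intensity, positions, intensities
```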
Figure 1. Conceptual figure of the initial and final states of the XPS process. Each state represents the state of the \(4f\) electron \(|f^{j}\rangle\) (\(j=0,1,2\)).
Figure 2. (a) XPS spectrum of La2O3. The gray dots represent the observed data, the gray curve represents the true spectral density, and the vertical lines represent the peak positions and peak intensities. (b) XPS spectrum of CeO2. Both spectra include Gaussian noise with a standard deviation of 0.001.
Table I. Reproduction parameters of XPS spectra of La2O3 and CeO2.
With the parameters of the two-peak spectrum, the intensity of the peak at the highest energy \(E_{2}\) becomes almost zero. This is caused by the weakened interaction between the \(|f^{2}\rangle\) state and the \(|f^{0}\rangle\) state, originating from the increased energy of the \(|f^{2}\rangle\) state. This suggests that Δ is the essential parameter for changing the peak number, because Δ controls the energies of the initial and final \(|f^{2}\rangle\) states. In fact, the most significant difference between the parameters of La2O3 and CeO2 is in Δ (Table I). Thus, when we discuss the properties of the La2O3 and CeO2 spectra, we can ignore the differences in parameters between La2O3 and CeO2 other than Δ.
3. Recognition Model of Emulated Spectrum
In this section, we describe the two effective models used to recognize the emulated measurement data generated following the generative model \(\mathcal{H}\). Hereinafter, we refer to these effective models as the recognition models: the two-state Hamiltonian model and the three-state Hamiltonian model.
Two-state Hamiltonian model
Kotani and Toyozawa2) proposed a two-state Hamiltonian model as an effective model of La2O3 XPS spectra. The two-state Hamiltonian model is defined using an effective Hamiltonian on two eigenstates of \(4f\) electrons \(|f^{0}\rangle\) and \(|f^{1}\rangle\). The effective Hamiltonian is given as \begin{align} H_{2} &= \epsilon_{L} \sum_{\nu=1}^{N_{f}} a^{\dagger}_{L\nu}a_{L\nu} + \epsilon^{0}_{f} \sum_{\nu=1}^{N_{f}} a^{\dagger}_{f\nu}a_{f\nu} + \epsilon_{c} a^{\dagger}_{c} a_{c}\\ &\quad + \frac{V}{\sqrt{N_{f}}} \sum_{\nu=1}^{N_{f}} (a^{\dagger}_{L\nu}a_{f\nu} + a_{L\nu}a^{\dagger}_{f\nu}) \notag\\ &\quad-U_{fc} \sum_{\nu=1}^{N_{f}} a^{\dagger}_{f\nu}a_{f\nu}(1- a^{\dagger}_{c} a_{c}), \end{align} (4) where, as in the generative model Hamiltonian \(\mathcal{H}\), \(\epsilon_{L}\), \(\epsilon^{0}_{f}\), and \(\epsilon_{c}\) are the energies of the conducting electron of \(4f\) rare-earth metals (\(5d\), \(6s\) electrons), the \(4f\) electron, and the core electron, respectively. The index ν (\(\nu=1-N_{f}\), \(N_{f}=14\)) represents the quantum number of the spin and f-orbital. V and \(-U_{fc}\) are the energies of the hybridization interaction and the core-hole Coulomb potential for the \(4f\) electrons, respectively. As in the generative model \(\mathcal{H}\), we set the energy level of the initial and final states \(|f^{0}\rangle\) to 0 as the standard energy level. Thus, the number of parameters of the two-state Hamiltonian model is reduced to two: \(\Delta^{\prime}\) (\(=\epsilon^{0}_{f} -\epsilon_{L} - U_{fc}\)) and V. Here, we refer to the parameter set of the two-state Hamiltonian \(H_{2}\) as \({\boldsymbol{{\theta}}}_{H_{2}} = \{\Delta^{\prime}, V \}\).
We define the initial eigenstate of the minimum energy \(E_{G}({\boldsymbol{{\theta}}}_{H_{2}})\) as \(|G\rangle\) and the final eigenstates of the two energy levels \(E_{j}({\boldsymbol{{\theta}}}_{H_{2}})\) (\(j = 0,1\)) as \(|F_{j}\rangle\) (\(j = 0,1\)). In the two-state Hamiltonian model, the initial eigenstate \(|G\rangle\) is set equal to the \(4f\) electron eigenstate \(|f^{0}\rangle\). By using Fermi's golden rule, convoluting the transition probability with the Lorentz function, and adding the Gaussian noise ϵ, we obtain the spectral distribution \begin{align} I_{2}(\omega;{\boldsymbol{{\theta}}}_{2}) &= \sum_{j=0}^{1} |\langle F_{j}|a_{c}|G\rangle|^{2} \notag\\ &\quad \times\frac{\Gamma_{j}/\pi}{(\omega-(E_{j}({\boldsymbol{{\theta}}}_{H_{2}})-E_{G}({\boldsymbol{{\theta}}}_{H_{2}}) - b))^{2}+\Gamma^{2}_{j}} + \epsilon, \end{align} (5) where each Lorentz function corresponding to each eigenenergy \(E_{j}({\boldsymbol{{\theta}}}_{H_{2}})\) can take a different half-width \(\Gamma_{j}\) and \({\boldsymbol{{\theta}}}_{2}\) is the parameter set of the spectral distribution \(I_{2}(\omega;{\boldsymbol{{\theta}}}_{2})\), defined as \({\boldsymbol{{\theta}}}_{2} = \{{\boldsymbol{{\theta}}}_{H_{2}},\Gamma_{1},\Gamma_{2},b\} = \{\Delta^{\prime},V,\Gamma_{1},\Gamma_{2},b\}\). The energy shift parameter b is added to the model to compensate for the difference in the standard energy level between the models. The energy shift parameter is also used in the conventional analysis of XPS spectra. Because the two-state Hamiltonian model is uniquely defined by the effective Hamiltonian \(H_{2}\), we label this model as \(H_{2}\).
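A minimal Python sketch of the noise-free part of \(I_{2}(\omega;{\boldsymbol{{\theta}}}_{2})\) is given below, assuming that the final-state Hamiltonian in the \(\{|f^{0}\rangle,|f^{1}\rangle\}\) basis takes the matrix form \(\bigl(\begin{smallmatrix}0 & V\\ V & \Delta^{\prime}\end{smallmatrix}\bigr)\); the function name and argument layout are illustrative.

```python
import numpy as np

def spectrum_H2(omega, Delta_p, V, Gammas, b):
    """Noise-free part of the two-state recognition model I_2 (a sketch).

    Basis (|f0>, |f1>).  Delta_p = eps_f - eps_L - Ufc already absorbs the
    core-hole potential, so only the final-state 2x2 problem is diagonalized;
    the initial ground state |G> is fixed to |f0> with energy E_G = 0.
    Gammas = (Gamma_1, Gamma_2) are the per-peak half-widths; b is the
    overall energy shift of Eq. (5).
    """
    g = np.array([1.0, 0.0])                 # |G> = |f0>
    E_G = 0.0
    Hf = np.array([[0.0, V],
                   [V,   Delta_p]])          # final-state Hamiltonian
    Ef, Cf = np.linalg.eigh(Hf)
    intensities = (Cf.T @ g) ** 2            # |<F_j|a_c|G>|^2
    I = np.zeros_like(omega)
    for w2, Ej, Gj in zip(intensities, Ef, Gammas):
        center = Ej - E_G - b                # peak position of Eq. (5)
        I += w2 * (Gj / np.pi) / ((omega - center) ** 2 + Gj ** 2)
    return I
```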
It was reported that the XPS spectra of La2O3 can be reproduced by the two-state Hamiltonian model \(H_{2}\).2) As we mentioned, the emulated measurement data of the La2O3 XPS spectrum is generated on the basis of the generative model \(\mathcal{H}\), which is more complex than the two-state Hamiltonian model \(H_{2}\). One purpose of this study is to confirm that this physical knowledge is reproduced by the proposed method.
Three-state Hamiltonian model
Kotani et al.3) proposed a three-state Hamiltonian model as an effective model of CeO2 XPS spectra. The effective Hamiltonian of the three-state Hamiltonian model is defined as \begin{align} H_{3} &= \epsilon_{L} \sum_{\nu=1}^{N_{f}} a^{\dagger}_{L\nu}a_{L\nu} + \epsilon^{0}_{f} \sum_{\nu=1}^{N_{f}} a^{\dagger}_{f\nu}a_{f\nu} + \epsilon_{c} a^{\dagger}_{c} a_{c}\\ &\quad + \frac{V}{\sqrt{N_{f}}} \sum_{\nu=1}^{N_{f}} (a^{\dagger}_{L\nu}a_{f\nu} + a_{L\nu}a^{\dagger}_{f\nu}) + U_{ff} \sum_{\nu>\nu'} a^{\dagger}_{f\nu}a_{f\nu}a^{\dagger}_{f\nu'}a_{f\nu'}\\ &\quad - U_{fc} \sum_{\nu=1}^{N_{f}} a^{\dagger}_{f\nu}a_{f\nu}(1- a^{\dagger}_{c} a_{c}), \end{align} (6) where the index ν (\(\nu=1-N_{f}\), \(N_{f}=14\)) represents the quantum number of the spin and f-orbital. In both the initial and final states, we set the energy level of \(|f^{0}\rangle\) to 0 as the standard energy level. Thus, the number of parameters of the effective Hamiltonian \(H_{3}\) is reduced to four: Δ (\(=\epsilon^{0}_{f} -\epsilon_{L}\)), V, \(U_{ff}\), and \(U_{fc}\). Here, we refer to the parameter set of the effective Hamiltonian \(H_{3}\) as \({\boldsymbol{{\theta}}}_{H_{3}} = \{\Delta,V,U_{ff},U_{fc}\}\). Then, as in the previously described model, the XPS spectral model is derived as \begin{align} I_{3}(\omega;{\boldsymbol{{\theta}}}_{3}) &= \sum_{j=0}^{2} |\langle F_{j}|a_{c}|G\rangle|^{2} \notag\\ &\quad \times\frac{\Gamma_{j}/\pi}{(\omega-(E_{j}({\boldsymbol{{\theta}}}_{H_{3}})-E_{G}({\boldsymbol{{\theta}}}_{H_{3}}) - b))^{2}+\Gamma^{2}_{j}} + \epsilon, \end{align} (7) where \(|G\rangle\) is the initial eigenstate of the minimum energy \(E_{G}({\boldsymbol{{\theta}}}_{H_{3}})\) and \(|F_{j}\rangle\) (\(j = 0,1,2\)) are the final eigenstates of the three energy levels \(E_{j}({\boldsymbol{{\theta}}}_{H_{3}})\). As in the \(H_{2}\) model, each Lorentz function corresponding to each eigenenergy \(E_{j}({\boldsymbol{{\theta}}}_{H_{3}})\) can take a different half-width \(\Gamma_{j}\), and \({\boldsymbol{{\theta}}}_{3}\) is the parameter set of the spectral distribution \(I_{3}(\omega;{\boldsymbol{{\theta}}}_{3})\), defined as \({\boldsymbol{{\theta}}}_{3} = \{{\boldsymbol{{\theta}}}_{H_{3}},\Gamma_{1},\Gamma_{2},\Gamma_{3}, b\} = \{\Delta,V,U_{ff},U_{fc},\Gamma_{1},\Gamma_{2},\Gamma_{3},b\}\). The three-state Hamiltonian model and the generative model are similar except for b and the degree of freedom of Γ.
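For completeness, a corresponding sketch of the noise-free part of \(I_{3}(\omega;{\boldsymbol{{\theta}}}_{3})\) is shown below. It uses the same 3×3 diagonalization as the generative-model sketch in Sect. 2 (with the same assumption of equal off-diagonal elements V) but allows independent half-widths \(\Gamma_{j}\) and the energy shift b of Eq. (7).

```python
import numpy as np

def spectrum_H3(omega, Delta, V, Uff, Ufc, Gammas, b):
    """Noise-free part of the three-state recognition model I_3 (a sketch).

    Gammas = (Gamma_1, Gamma_2, Gamma_3) are the per-peak half-widths and b
    is the energy shift; the equal off-diagonal elements V are again a
    simplifying assumption.
    """
    def H(core_hole):
        shift = -Ufc if core_hole else 0.0
        M = np.diag([0.0, Delta + shift, 2.0 * Delta + Uff + 2.0 * shift])
        M[0, 1] = M[1, 0] = M[1, 2] = M[2, 1] = V
        return M

    Ei, Ci = np.linalg.eigh(H(False))            # initial states
    Ef, Cf = np.linalg.eigh(H(True))             # final states (core hole)
    intensities = (Cf.T @ Ci[:, 0]) ** 2         # |<F_j|a_c|G>|^2
    I = np.zeros_like(omega)
    for w2, Ej, Gj in zip(intensities, Ef, Gammas):
        I += w2 * (Gj / np.pi) / ((omega - (Ej - Ei[0] - b)) ** 2 + Gj ** 2)
    return I
```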
As we mentioned earlier, it was reported that the XPS spectra of CeO2 can be reproduced by the three-state Hamiltonian model \(H_{3}\).3) Because the three-state Hamiltonian model is uniquely defined by the effective Hamiltonian \(H_{3}\), we label this model as \(H_{3}\).
4. Method
Bayesian model selection
We evaluate the recognition models \(H_{2}\) and \(H_{3}\) in terms of their capability to represent the spectrum data \(\boldsymbol{{D}}=\{\boldsymbol{{w}},\boldsymbol{\mathcal{I}}\}=\{(w_{1},w_{2},\cdots w_{N}),(\mathcal{I}(w_{1};{\boldsymbol{{\vartheta}}}),\mathcal{I}(w_{2};{\boldsymbol{{\vartheta}}}),\cdots\mathcal{I}(w_{N};{\boldsymbol{{\vartheta}}}))\}\) generated by the generative model \(\mathcal{H}\). The likelihood of the recognition model \(H_{k}\) (\(k=2,3\)) for the dataset \(\boldsymbol{{D}}\) is defined as \begin{equation} \mathrm{P}(H_{k}|\boldsymbol{{D}}) = \frac{\mathrm{P}(\boldsymbol{{D}}|H_{k})\mathrm{P}(H_{k})}{\mathrm{P}(\boldsymbol{{D}})} \propto \mathrm{P}(\boldsymbol{{D}}|H_{k})\mathrm{P}(H_{k}), \end{equation} (8) where \(\mathrm{P}(\boldsymbol{{D}})\) is a normalization constant. In this study, we assume that there is no prior knowledge about the likelihood of the model. Thus, we set the prior probability \(\mathrm{P}(H_{k})\) as a uniform distribution; in this study, it is equal to \(\frac{1}{2}\). We also assume that \(\boldsymbol{{w}}\) in the dataset \(\boldsymbol{{D}}=\{\boldsymbol{{w}},\boldsymbol{\mathcal{I}}\}\) is given deterministically, that is, non-probabilistically. Then, the likelihood of the model is transformed as \begin{align} \mathrm{P}(H_{k}|\boldsymbol{{D}}) &\propto \mathrm{P}(\boldsymbol{{D}}|H_{k}) = \mathrm{P}(\boldsymbol{\mathcal{I}}|H_{k})\\ &= \int_{-\infty}^{\infty} \mathrm{P}(\boldsymbol{\mathcal{I}}|{\boldsymbol{{\theta}}}_{k},H_{k}) \mathrm{P}({\boldsymbol{{\theta}}}_{k}|H_{k})\,d{\boldsymbol{{\theta}}}_{k}, \end{align} (9) where \({\boldsymbol{{\theta}}}_{k}\) is the parameter set of the recognition model \(H_{k}\) described in Sect. 3.
The conditional probability \(\mathrm{P}(\boldsymbol{\mathcal{I}}|{\boldsymbol{{\theta}}}_{k},H_{k})\) of Eq. (9) is a stochastic generative model of the recognition model \(H_{k}\). When the additive noise ϵ of the XPS spectra is given as an independent and identically distributed Gaussian with average 0 and standard deviation \(\sigma_{\text{noise}}\), \begin{align} \mathrm{P}(\boldsymbol{\mathcal{I}}|{\boldsymbol{{\theta}}}_{k},H_{k}) &= \prod_{i=1}^{N} \mathrm{P}(\mathcal{I}(w_{i})|{\boldsymbol{{\theta}}}_{k},H_{k})\\ &= \left(\frac{1}{2\pi\sigma_{\text{noise}}^{2}}\right)^{N/2} \prod_{i=1}^{N} \exp\left[-\frac{1}{2\sigma_{\text{noise}}^{2}}(\mathcal{I}(w_{i};{\boldsymbol{{\vartheta}}}) - I_{k}(w_{i};{\boldsymbol{{\theta}}}_{k}))^{2} \right]\\ &= \left(\frac{1}{2\pi\sigma_{\text{noise}}^{2}}\right)^{N/2} \exp\left\{-\sum_{i=1}^{N} \left[\frac{1}{2\sigma_{\text{noise}}^{2}}(\mathcal{I}(w_{i};{\boldsymbol{{\vartheta}}}) - I_{k}(w_{i};{\boldsymbol{{\theta}}}_{k}))^{2}\right] \right\}. \end{align} (10) The probability \(\mathrm{P}({\boldsymbol{{\theta}}}_{k}|H_{k})\) in Eq. (9) represents the prior knowledge about the model parameters \({\boldsymbol{{\theta}}}_{k}\) as a probability distribution. By substituting Eq. (10) into Eq. (9), we obtain the following: \begin{align} \mathrm{P}(\boldsymbol{\mathcal{I}}|H_{k}) &= \left(\frac{1}{2\pi\sigma_{\text{noise}}^{2}}\right)^{N/2} \int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2\sigma_{\text{noise}}^{2}}\sum_{i=1}^{N} [\mathcal{I}(w_{i};{\boldsymbol{{\vartheta}}}) - I_{k}(w_{i};{\boldsymbol{{\theta}}}_{k})]^{2}\right\}\mathrm{P}({\boldsymbol{{\theta}}}_{k}|H_{k})\,d{\boldsymbol{{\theta}}}_{k}\\ &= \left(\frac{1}{2\pi\sigma_{\text{noise}}^{2}}\right)^{N/2} \int_{-\infty}^{\infty} \exp[-\mathit{NE}({\boldsymbol{{\theta}}}_{k})]\mathrm{P}({\boldsymbol{{\theta}}}_{k}|H_{k})\,d{\boldsymbol{{\theta}}}_{k}, \end{align} (11) where \begin{equation} E({\boldsymbol{{\theta}}}_{k}) = \frac{1}{2N\sigma_{\text{noise}}^{2}}\sum_{i=1}^{N} [\mathcal{I}(w_{i};{\boldsymbol{{\vartheta}}}) - I_{k}(w_{i};{\boldsymbol{{\theta}}}_{k})]^{2}. \end{equation} (12) The probability \(\mathrm{P}(\boldsymbol{\mathcal{I}}|H_{k})\) is often referred to as the marginal likelihood and is proportional to the likelihood of the recognition model \(H_{k}\). The negative log marginal likelihood, \begin{equation} F(H_{k}) = -{\log \mathrm{P}}(\boldsymbol{\mathcal{I}}|H_{k}), \end{equation} (13) is often referred to as the Bayesian free energy (\(\mathit{FE}\)). In this way, \(\mathit{FE}\) corresponds to the negative log-likelihood of the recognition model \(H_{k}\) up to an additive constant. Therefore, the effective model \(H_{k}\) with the smallest \(\mathit{FE}\) value is the most plausible model.
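In code, Eqs. (10) and (12) reduce to a mean-squared misfit and a Gaussian log-likelihood. The following sketch assumes that model_spectrum(omega, theta_k) is one of the recognition-model spectra sketched in Sect. 3; the function names are ours, not part of the method.

```python
import numpy as np

def energy_E(theta_k, omega, I_obs, model_spectrum, sigma_noise):
    """Mean-squared misfit E(theta_k) of Eq. (12)."""
    residual = I_obs - model_spectrum(omega, theta_k)
    return np.sum(residual ** 2) / (2.0 * len(omega) * sigma_noise ** 2)

def log_likelihood(theta_k, omega, I_obs, model_spectrum, sigma_noise):
    """log P(I | theta_k, H_k) of Eq. (10): -N E(theta_k) minus the Gaussian normalization."""
    N = len(omega)
    return (-N * energy_E(theta_k, omega, I_obs, model_spectrum, sigma_noise)
            - 0.5 * N * np.log(2.0 * np.pi * sigma_noise ** 2))
```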
Exchange Monte Carlo method
To obtain the value of \(\mathit{FE}\), we need to execute the integration in Eq. (11). However, it is difficult to analytically execute the integration owing to the complicated relationship between \(\boldsymbol{\mathcal{I}}\) and \({\boldsymbol{{\theta}}}_{k}\). We overcame this difficulty by numerical integration using the exchange Monte Carlo method.7)
Markov chain Monte Carlo (MCMC) methods17) are efficient for sampling from a probability distribution in a high-dimensional space, such as the space of \({\boldsymbol{{\theta}}}_{k}\). To apply an MCMC method to execute an integration, we need to transform the integration to a mean value calculation. When applying an MCMC method to the calculation of \(\mathit{FE}\), we introduce an auxiliary variable β and transform Eq. (13) into \begin{align} F(H_{k}) &= -{\log}\int_{-\infty}^{\infty} \exp[-\mathit{NE}({\boldsymbol{{\theta}}}_{k})]\mathrm{P}({\boldsymbol{{\theta}}}_{k}|H_{k})\,d{\boldsymbol{{\theta}}}_{k} + \frac{N}{2}\log(2\pi\sigma_{\text{noise}}^{2})\\ &= \int^{1}_{0} \frac{\partial}{\partial \beta}\left\{-{\log}\left[\int_{-\infty}^{\infty} \exp(-\beta \mathit{NE}({\boldsymbol{{\theta}}}_{k}))\mathrm{P}({\boldsymbol{{\theta}}}_{k}|H_{k})\,d{\boldsymbol{{\theta}}}_{k}\right]\right\}\,d\beta + \frac{N}{2}\log(2\pi\sigma_{\text{noise}}^{2})\\ &= \int^{1}_{0} \int_{-\infty}^{\infty} \mathit{NE}({\boldsymbol{{\theta}}}_{k})\mathrm{P}({\boldsymbol{{\theta}}}_{k}|\boldsymbol{\mathcal{I}},\beta)\,d{\boldsymbol{{\theta}}}_{k}\,d\beta + \frac{N}{2}\log(2\pi\sigma_{\text{noise}}^{2})\\ &= \int^{1}_{0} \langle \mathit{NE}({\boldsymbol{{\theta}}}_{k})\rangle_{\mathrm{P}({\boldsymbol{{\theta}}}_{k}|\boldsymbol{\mathcal{I}},\beta)}\,d\beta + \frac{N}{2}\log(2\pi\sigma_{\text{noise}}^{2}), \end{align} (14) where \(\langle\cdot\rangle\) represents an average and \begin{equation} \mathrm{P}({\boldsymbol{{\theta}}}_{k}|\boldsymbol{\mathcal{I}},\beta) = \frac{\exp[-\beta \mathit{NE}({\boldsymbol{{\theta}}}_{k})]\mathrm{P}({\boldsymbol{{\theta}}}_{k}|H_{k})}{\displaystyle\int_{-\infty}^{\infty} \exp[-\beta \mathit{NE}({\boldsymbol{{\theta}}}_{k})]\mathrm{P}({\boldsymbol{{\theta}}}_{k}|H_{k})\,d{\boldsymbol{{\theta}}}_{k}}. \end{equation} (15) When \(\mathit{NE}({\boldsymbol{{\theta}}}_{k})\) is regarded as energy, this equation suggests that \(\mathrm{P}({\boldsymbol{{\theta}}}_{k}|\boldsymbol{\mathcal{I}},\beta)\) corresponds to the Boltzmann distribution in statistical physics. In the same way, the auxiliary variable β corresponds to the inverse temperature in statistical physics. Equation (14) is approximated by the following quadrature: \begin{equation} F(H_{k}) \simeq \sum_{l=1}^{L-1} \langle \mathit{NE}({\boldsymbol{{\theta}}}_{k})\rangle_{\mathrm{P}({\boldsymbol{{\theta}}}_{k}|\boldsymbol{\mathcal{I}},\beta_{l})} \Delta \beta_{l} + \frac{N}{2}\log(2\pi\sigma_{\text{noise}}^{2}), \end{equation} (16) where \(\Delta\beta_{l} = \beta_{l+1} - \beta_{l}\), \(\beta_{l}\) is given as a sequence of inverse temperatures \(0=\beta_{1}<\beta_{2}<\cdots<\beta_{L}=1\) obtained by dividing the interval from \(\beta=0\) to 1 into L points, and each \(\langle\mathit{NE}({\boldsymbol{{\theta}}}_{k})\rangle_{\mathrm{P}({\boldsymbol{{\theta}}}_{k}|\boldsymbol{\mathcal{I}},\beta_{l})}\) is obtained by performing MCMC sampling independently at each inverse temperature \(\beta_{l}\). However, such MCMC sampling is often trapped at local minima.
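Once the per-temperature averages \(\langle \mathit{NE}({\boldsymbol{{\theta}}}_{k})\rangle_{\beta_{l}}\) have been obtained (in practice by the exchange Monte Carlo method described next), the discretized integral of Eq. (16) can be evaluated directly. A minimal sketch is shown below; it uses a rectangle rule, with the trapezoidal rule being an equally valid choice.

```python
import numpy as np

def free_energy_from_averages(mean_NE, betas, N, sigma_noise):
    """Bayesian free energy F(H_k) via the discretized integral of Eq. (16).

    mean_NE[l] is the Monte Carlo estimate of <N E(theta_k)> under
    P(theta_k | I, beta_l); betas is the increasing sequence
    0 = beta_1 < ... < beta_L = 1.
    """
    betas = np.asarray(betas)
    mean_NE = np.asarray(mean_NE)
    dbeta = np.diff(betas)                      # beta_{l+1} - beta_l
    integral = np.sum(mean_NE[:-1] * dbeta)     # sum over l = 1 .. L-1
    return integral + 0.5 * N * np.log(2.0 * np.pi * sigma_noise ** 2)
```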
The exchange Monte Carlo (EMC) method is an MCMC algorithm designed to avoid trapping in local minima. This method runs multiple samplers in parallel, each drawing from a density with a different inverse temperature \(\beta_{l}\). The EMC takes samples from the joint density \begin{equation} \mathrm{P}({\boldsymbol{{\theta}}}_{k}^{1},{\boldsymbol{{\theta}}}_{k}^{2}\cdots {\boldsymbol{{\theta}}}_{k}^{L}|\boldsymbol{\mathcal{I}}) = \prod_{l=1}^{L}\mathrm{P}({\boldsymbol{{\theta}}}_{k}^{l}|\boldsymbol{\mathcal{I}},\beta_{l}), \end{equation} (17) where the probability density \(\mathrm{P}({\boldsymbol{{\theta}}}_{k}^{l}|\boldsymbol{\mathcal{I}},\beta_{l})\) is defined in Eq. (15). The EMC algorithm realizes sampling from the joint density \(\mathrm{P}({\boldsymbol{{\theta}}}_{k}^{1},{\boldsymbol{{\theta}}}_{k}^{2}\cdots {\boldsymbol{{\theta}}}_{k}^{L}|\boldsymbol{\mathcal{I}})\) on the basis of the following updates.
1 Sampling from each density \(\mathrm{P}({\boldsymbol{{\theta}}}_{k}^{l}|\boldsymbol{\mathcal{I}},\beta_{l})\)
Sample from \(\mathrm{P}({\boldsymbol{{\theta}}}_{k}^{l}|\boldsymbol{\mathcal{I}},\beta_{l})\) using a conventional MCMC method, such as the Metropolis–Hastings algorithm.18)
2 Exchange process between two densities corresponding to adjacent inverse temperatures
Exchange the configurations \({\boldsymbol{{\theta}}}_{k}^{l}\) and \({\boldsymbol{{\theta}}}_{k}^{l+1}\) corresponding to adjacent inverse temperatures with probability \(R=\min(1,r)\), where \begin{align} r &= \frac{\mathrm{P}({\boldsymbol{{\theta}}}_{k}^{1},\cdots, {\boldsymbol{{\theta}}}_{k}^{l+1}, {\boldsymbol{{\theta}}}_{k}^{l},\cdots,{\boldsymbol{{\theta}}}_{k}^{L}|\boldsymbol{\mathcal{I}})}{\mathrm{P}({\boldsymbol{{\theta}}}_{k}^{1},\cdots, {\boldsymbol{{\theta}}}_{k}^{l}, {\boldsymbol{{\theta}}}_{k}^{l+1},\cdots,{\boldsymbol{{\theta}}}_{k}^{L}|\boldsymbol{\mathcal{I}})}\\ &= \frac{\mathrm{P}({\boldsymbol{{\theta}}}_{k}^{l+1}|\boldsymbol{\mathcal{I}},\beta_{l})\mathrm{P}({\boldsymbol{{\theta}}}_{k}^{l}|\boldsymbol{\mathcal{I}},\beta_{l+1})}{\mathrm{P}({\boldsymbol{{\theta}}}_{k}^{l}|\boldsymbol{\mathcal{I}},\beta_{l})\mathrm{P}({\boldsymbol{{\theta}}}_{k}^{l+1}|\boldsymbol{\mathcal{I}},\beta_{l+1})}\\ &= \exp\{N[\beta_{l+1} - \beta_{l}][E({\boldsymbol{{\theta}}}_{k}^{l+1}) - E({\boldsymbol{{\theta}}}_{k}^{l})]\}. \end{align} (18) Sampling from a distribution with a smaller β corresponds to sampling from a distribution with a larger noise intensity; such a distribution tends not to have local minima. On the other hand, sampling from a distribution with a larger β corresponds to sampling from a distribution with local minima. Hence, sampling from the joint density \(\mathrm{P}({\boldsymbol{{\theta}}}_{k}^{1},{\boldsymbol{{\theta}}}_{k}^{2}\cdots {\boldsymbol{{\theta}}}_{k}^{L}|\boldsymbol{\mathcal{I}})\) overcomes trapping in local minima and enables the fast convergence of the sampling (a minimal sketch of this update cycle is given below).
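The following Python sketch implements the above update cycle. The callables NE(theta), returning \(N E({\boldsymbol{{\theta}}}_{k})\) of Eq. (12), and log_prior(theta), returning \(\log \mathrm{P}({\boldsymbol{{\theta}}}_{k}|H_{k})\) (e.g., \(-\infty\) outside the uniform prior range), as well as the Gaussian random-walk proposal and the function names, are assumptions of this sketch rather than part of the original algorithm specification. Burn-in removal is omitted for brevity.

```python
import numpy as np

def exchange_monte_carlo(NE, log_prior, theta0, betas, n_steps, step_size, rng):
    """Minimal exchange Monte Carlo (replica exchange) sketch.

    betas must be increasing with betas[-1] = 1.  Each step performs one
    Metropolis-Hastings update per replica followed by one sweep of exchange
    attempts between adjacent inverse temperatures [Eq. (18)].  Returns the
    beta = 1 samples and the per-temperature averages <NE> used in Eq. (16).
    """
    L = len(betas)
    thetas = [np.array(theta0, dtype=float) for _ in range(L)]
    energies = np.array([NE(t) for t in thetas])     # stores N*E(theta^l)
    samples_beta1 = []
    sum_NE = np.zeros(L)
    for _ in range(n_steps):
        # (1) Metropolis-Hastings update within each replica
        for l in range(L):
            prop = thetas[l] + step_size * rng.normal(size=thetas[l].shape)
            ne_prop = NE(prop)
            dlogp = ((-betas[l] * ne_prop + log_prior(prop))
                     - (-betas[l] * energies[l] + log_prior(thetas[l])))
            if np.log(rng.random()) < dlogp:
                thetas[l], energies[l] = prop, ne_prop
        # (2) exchange between adjacent inverse temperatures [Eq. (18)]
        for l in range(L - 1):
            log_r = (betas[l + 1] - betas[l]) * (energies[l + 1] - energies[l])
            if np.log(rng.random()) < log_r:
                thetas[l], thetas[l + 1] = thetas[l + 1], thetas[l]
                energies[l], energies[l + 1] = energies[l + 1], energies[l]
        sum_NE += energies
        samples_beta1.append(thetas[-1].copy())
    return np.array(samples_beta1), sum_NE / n_steps
```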
Using the sampling result of the \(\beta=1\) state, we obtained a posterior density of the parameter \(\mathrm{P}({\boldsymbol{{\theta}}}_{k}|\boldsymbol{\mathcal{I}})\) [Eq. (15)]. From the posterior density of \({\boldsymbol{{\theta}}}_{k}\), we can estimate the model parameters \({\boldsymbol{{\theta}}}_{k}\) of \(H_{k}\) and the related information, such as estimation accuracy.
5. Results
We applied our proposed method to the emulated measurement data and estimated the likelihood of the effective models \(H_{2}\) and \(H_{3}\). As described in Sect. 3, from the physical knowledge, the two recognition models \(H_{2}\) and \(H_{3}\) are expected to be selected for the emulated measurement data of the La2O3 and CeO2 XPS spectra, respectively. The spectrum of CeO2 has a three-peak structure, and that of La2O3 has a two-peak structure. If the selection of an effective model were based only on the peak number, the effective model could also be selected indirectly by the existing spectral deconvolution methods, which have no internal model.6,8) On the other hand, because our proposed method builds an effective model into the spectral deconvolution, various information about the peak structure, such as the peak positions or the order of the peak intensities, should be used to select the effective model. To confirm this, we applied our proposed method not only to the spectra of CeO2 and La2O3, but also to spectra that are the intermediate states of a seamless transition from the CeO2 spectrum to the La2O3 spectrum.

As we mentioned in Sect. 2, Δ is an important parameter for controlling the properties of the XPS spectra from La2O3 to CeO2. The generative parameters V and \(U_{fc}\) of the La2O3 spectrum are also slightly different from those of the CeO2 spectrum (Table I). However, the emulated measurement data of the La2O3 spectrum generated by replacing the V and \(U_{fc}\) values with the parameter values of the CeO2 spectrum have almost the same peak positions and peak intensities as those of the La2O3 spectrum [Figs. 3(a)-(1), 3(a)-(2), and 2(a)]. Therefore, we refer to this parameter-replaced emulated spectrum as the La2O3 spectrum. That is, by shifting Δ from 1.6 to 12.5 and fixing the other parameters at the CeO2 values, we can generate spectra that have structures intermediate between the La2O3 and CeO2 spectra. The increase in the parameter Δ from CeO2 to La2O3 induces a decrease in the transition amplitude \(\langle F_{\text{max}}|a_{c}|G\rangle\) from the initial ground state \(|G\rangle\) to the final state \(|F_{\text{max}}\rangle\) with the maximum eigenenergy \(E_{\text{max}}\). Because the square of the transition amplitude, \(|\langle F_{\text{max}}|a_{c}|G\rangle|^{2}\), determines the peak intensity, this decrease transforms the three-peak CeO2 spectrum into a two-peak spectrum. To be precise, the emulated measurement data of La2O3 have three peaks, but the peak intensity corresponding to the largest eigenvalue \(E_{2}\) is very small. Hence, in this study, we define the peak number of the emulated measurement data as the number of peaks whose intensity is larger than the noise intensity \(\sigma_{\text{noise}}\).

In this way, we generated emulated measurement data by shifting the parameter Δ and applied our proposed method to them. More specifically, all parameters except Δ are the same, [\(V=0.76\), \(U_{ff}=10.5\), \(U_{fc}=12.5\), \(\Gamma = 0.5\)], for all the emulated measurement data used. As mentioned above, the peak number is defined using the noise intensity \(\sigma_{\text{noise}}\) and the peak intensity. Therefore, we evaluated the effect of not only Δ but also the noise intensity \(\sigma_{\text{noise}}\). Hence, we also generated the emulated measurement data by setting the noise standard deviation to \(\sigma_{\text{noise}}\in \{0.001, 0.0028, 0.0046, 0.0064, 0.0082, 0.01\}\).
The peak number, peak position, and peak intensity of the emulated measurement data are given in Table II, and some examples of emulated measurement data are shown in Fig. 3(a). In the emulated measurement data of \(\Delta = 10.08\), if the noise intensity \(\sigma_{\text{noise}}\) is smaller than 0.0052, the peak number is three, whereas if the noise intensity \(\sigma_{\text{noise}}\) is larger than 0.0052, the peak number is two (Table II). In this study, each emulated measurement data consists of \(N = 400\) samples.
Figure 3. Graphs of spectra generated by the generative model \(\mathcal{H}\) with parameters \(V=0.76\), \(U_{ff}=10.5\), \(U_{fc}=12.5\), and \(\Gamma = 0.7\). The gray dots represent the observed data, the gray curve represents the true spectral density, and the vertical lines represent the peak positions and peak intensities. For Δ and \(\sigma_{\text{noise}}\), the following values were set for each spectrum: (a)-(1) \([\Delta,\sigma_{\text{noise}}] = [12.5, 0.001]\), (a)-(2) \([12.5, 0.01]\), (a)-(3) \([10.08, 0.001]\), (a)-(4) \([10.08, 0.01]\), (a)-(5) \([1.6, 0.001]\), (a)-(6) \([1.6, 0.01]\). (b) Enlarged plots of the spectra in (a).
Table II. Properties of the spectrum structure. \(E_{j} - E_{G}\) corresponds to the peak position, and the value of the Lorentzian at its center, \(|\langle F_{j}|a_{c}|G\rangle|^{2}\,\frac{\Gamma/\pi}{((E_{j}-E_{G})-(E_{j}-E_{G}))^{2}+\Gamma^{2}}=\frac{|\langle F_{j}|a_{c}|G\rangle|^{2}}{\pi\Gamma}\), corresponds to the peak intensity. The number of peaks is defined as the number of peaks whose intensity is larger than the noise intensity \(\sigma_{\text{noise}}\).
To apply Bayesian estimation, we set the prior density \(\mathrm{P}({\boldsymbol{{\theta}}}_{k}|H_{k})\) to a uniform distribution over the ranges given in Table III. In the execution of the EMC sampling, we adopted the Metropolis–Hastings algorithm18) to sample the state at each inverse temperature. The inverse temperatures were set according to the following schedule:19) \begin{equation} \beta_{l} = \begin{cases} 0.0 &\text{($l=1$)}\\ \gamma^{l-L} &\text{($l\neq 1$)}\end{cases}, \end{equation} (19) where \(L = 40\) and \(\gamma=1.4\). We discarded the first 10,000 steps as burn-in and used the next 1,000,000 steps for sampling.
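For reference, the inverse-temperature sequence of Eq. (19) with these settings can be generated as follows.

```python
import numpy as np

# Eq. (19): beta_1 = 0 and beta_l = gamma**(l - L) for l = 2, ..., L (so beta_L = 1)
L, gamma = 40, 1.4
betas = np.array([0.0] + [gamma ** (l - L) for l in range(2, L + 1)])
```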
Table III. Range of uniform prior densities of parameters.
On the basis of the obtained Bayesian free energies \(F(H_{2})\) and \(F(H_{3})\), which correspond to the two-state Hamiltonian model \(H_{2}\) and the three-state Hamiltonian model \(H_{3}\), respectively, the likelihood of \(H_{3}\), \(\mathrm{P}(H_{3}|\boldsymbol{{D}})\), was calculated as \begin{equation} \mathrm{P}(H_{3}|\boldsymbol{{D}}) = \frac{\exp[-F(H_{3})]}{\exp[-F(H_{2})]+\exp[-F(H_{3})]}. \end{equation} (20)
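Because the free energies can be large in magnitude, direct exponentiation in Eq. (20) can overflow or underflow; the following sketch evaluates it after shifting both free energies by the smaller one.

```python
import numpy as np

def posterior_H3(F_H2, F_H3):
    """P(H_3 | D) from the Bayesian free energies, Eq. (20), computed stably."""
    F_min = min(F_H2, F_H3)
    w2, w3 = np.exp(-(F_H2 - F_min)), np.exp(-(F_H3 - F_min))
    return w3 / (w2 + w3)
```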
If \(\mathrm{P}(H_{3}|\boldsymbol{{D}})>0.5\), then \(H_{3}\) is a more plausible model than \(H_{2}\); otherwise, if \(\mathrm{P}(H_{3}|\boldsymbol{{D}})<0.5\), then \(H_{2}\) is a more plausible model than \(H_{3}\). From the phase diagram of \(\mathrm{P}(H_{3}|\boldsymbol{{D}})\) (Fig. 4), the three-state Hamiltonian model \(H_{3}\) was selected for the emulated measurement data of \(\Delta = 1.6\), corresponding to the CeO2 spectra, and the two-state Hamiltonian model \(H_{2}\) was selected for the emulated measurement data of \(\Delta = 12.5\), corresponding to the La2O3 spectra. Furthermore, the proposed method selected the three-state Hamiltonian model \(H_{3}\) for all intermediate emulated measurement data from \(\Delta = 1.6\) to 12.5. These included the emulated measurement data of \(\Delta = 10.08\) and \(\sigma_{\text{noise}}\geq 0.0046\), for which the intensity of the highest-energy peak was smaller than the noise intensity. For further analysis, we evaluated the difference in the \(\mathit{FE}\)s, \(F(H_{3}) - F(H_{2})\), which corresponds to the logarithm of the ratio of the likelihoods, \(\mathrm{P}(H_{2}|\boldsymbol{{D}})/\mathrm{P}(H_{3}|\boldsymbol{{D}})\). \(F(H_{3}) - F(H_{2}) < 0\) means that \(H_{3}\) is a better model than \(H_{2}\), and \(F(H_{3}) - F(H_{2}) > 0\) means that \(H_{2}\) is a better model than \(H_{3}\). The value of \(F(H_{3}) - F(H_{2})\) tends to increase gradually as Δ and \(\sigma_{\text{noise}}\) of the generated data increase [Fig. 4(b)].
Figure 4. (Color online) (a) Likelihood of the \(H_{3}\) Hamiltonian model. The colored cells represent the value of \(\mathrm{P}(H_{3}|\boldsymbol{{D}})\). (b) Difference of the Bayesian free energy of the \(H_{3}\) Hamiltonian model and that of \(H_{2}\). The colored cells represent the value of \(F(H_{3}) - F(H_{2})\).
To estimate the model parameters and their estimation uncertainty on the basis of the effective model \(H_{3}\), we evaluated the posterior density of the parameters, \(\mathrm{P}({\boldsymbol{{\theta}}}_{3}|\boldsymbol{\mathcal{I}})\). The estimate of the posterior density should be based on independent samples; however, the sampling time series of the EMC has a time correlation. Therefore, we extracted samples at sufficiently separated intervals, such that the correlation coefficient was 0.5 or lower, and calculated the posterior distribution from them. Furthermore, to visualize a posterior distribution with more than three dimensions, we calculated the following marginal posterior density, which marginalizes the posterior distribution over the parameters \({\boldsymbol{{\theta}}}_{k}^{\lnot m}\) other than the parameter of interest \(\theta_{k}^{m}\): \begin{equation} \mathrm{P}(\theta_{k}^{m}|\boldsymbol{\mathcal{I}}) = \int_{-\infty}^{\infty} \mathrm{P}({\boldsymbol{{\theta}}}_{k}|\boldsymbol{\mathcal{I}})\,d{\boldsymbol{{\theta}}}_{k}^{\lnot m}. \end{equation} (21) Assuming that T samples of one parameter m, \(\{\theta_{k}^{m}(t)\}_{t=1}^{T}\), are extracted from the sampling result of the EMC, the marginalized posterior distribution \(\mathrm{P}(\theta_{k}^{m}|\boldsymbol{\mathcal{I}})\) can be estimated from the sampling result by the kernel density estimation method using Gaussian kernels. We determined the bandwidth h of the Gaussian kernels using the following common rule of thumb:20) \begin{equation} h = \frac{C\sigma}{T^{1/5}}, \end{equation} (22) where C is a constant and σ is the standard deviation of the samples of \(\theta_{k}^{m}\). In this study, we set C to 1.0.
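A minimal sketch of this kernel density estimate, with the bandwidth of Eq. (22), is given below; the function name and the evaluation grid are illustrative.

```python
import numpy as np

def marginal_posterior_kde(samples_m, grid, C=1.0):
    """Kernel density estimate of a marginal posterior P(theta^m | I).

    samples_m: thinned EMC samples of one parameter at beta = 1.
    The bandwidth follows the rule of thumb of Eq. (22), h = C*sigma/T**(1/5).
    """
    samples_m = np.asarray(samples_m)
    T = len(samples_m)
    h = C * samples_m.std() / T ** 0.2
    # Gaussian kernels centred on each sample, averaged over the samples
    diff = (grid[:, None] - samples_m[None, :]) / h
    return np.exp(-0.5 * diff ** 2).sum(axis=1) / (T * h * np.sqrt(2.0 * np.pi))
```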
From the evaluation of the marginal posterior densities for the emulated measurement data with Δ equal to 7.66 or less, we found a sharp peak in the posterior distributions around the true parameters, which we used to generate the emulated measurement data (Figs. 5–10). From the evaluation of the marginal posterior densities for the emulated measurement data of \(\Delta=10.08\), whose Δ value lies at the model switching boundary from \(H_{3}\) to \(H_{2}\), we found that the widths of the posterior distributions of Δ, \(U_{fc}\), and \(U_{ff}\) increased as the noise intensity \(\sigma_{\text{noise}}\) increased (Figs. 5–7). On the other hand, the marginal posterior densities of Δ, \(U_{ff}\), and \(U_{fc}\) for the emulated measurement data of \(\Delta=12.5\) are almost uniform.
Figure 5. (Color online) Marginalized posterior densities of Δ arranged in the same order as in the heat map of Fig. 4. The horizontal axis of each graph is Δ and the vertical axis is \(P(\Delta|\boldsymbol{{I}})\). The range of the horizontal axis of all graphs is from 0.0 to 20.0, the range of the vertical axis of the graphs with \(\Delta < 10.08\) is from 0.0 to 3.0, and that of the graphs with \(\Delta\geq 10.08\) is from 0.0 to 1.0. The red circle in each graph indicates the true parameter, which we used to generate the emulated measurement data.
Figure 6. (Color online) Marginalized posterior densities of \(U_{fc}\) arranged in the same order as in the heat map of Fig. 4. The horizontal axis of each graph is \(U_{fc}\) and the vertical axis is \(P(U_{fc}|\boldsymbol{{I}})\). The range of the horizontal axis of all graphs is from 0.0 to 20.0, the range of the vertical axis of the graphs with \(\Delta < 10.08\) is from 0.0 to 4.0, and that of the graphs with \(\Delta\geq 10.08\) is from 0.0 to 1.0. The red circle in each graph indicates the true parameter, which we used to generate the emulated measurement data.
Figure 7. (Color online) Marginalized posterior densities of \(U_{ff}\) arranged in the same order as in the heat map of Fig. 4. The horizontal axis of each graph is \(U_{ff}\) and the vertical axis is \(P(U_{ff}|\boldsymbol{{I}})\). The range of the horizontal axis of all graphs is from 0.0 to 20.0, the range of the vertical axis of the graphs with \(\Delta < 10.08\) is from 0.0 to 4.0, and that of the graphs with \(\Delta\geq 10.08\) is from 0.0 to 1.0. The red circle in each graph indicates the true parameter, which we used to generate the emulated measurement data.
Figure 8. (Color online) Marginalized posterior densities of V arranged in the same order as in the heat map of Fig. 4. The horizontal axis of each graph is V and the vertical axis is \(P(V|\boldsymbol{{I}})\). The range of the horizontal axis of all graphs is from 0.0 to 4.0, the range of the vertical axis of the graphs with \(\Delta < 10.08\) is from 0.0 to 50.0, and that of the graphs with \(\Delta\geq 10.08\) is from 0.0 to 20.0. The red circle in each graph indicates the true parameter, which we used to generate the emulated measurement data.
Figure 9. (Color online) Marginalized posterior densities of \(\Gamma_{2}\) arranged in the same order as in the heat map of Fig. 4. The horizontal axis of each graph is \(\Gamma_{2}\) and the vertical axis is \(P(\Gamma_{2}|\boldsymbol{{I}})\). The range of the horizontal axis of all graphs is from 0.01 to 1.0, and the range of the vertical axis of all graphs is from 0.0 to 50.0. The red circle in each graph indicates the true parameter, which we used to generate the emulated measurement data.
Figure 10. (Color online) Marginalized posterior densities of \(\Gamma_{3}\) arranged in the same order as in the heat map of Fig. 4. The horizontal axis of each graph is \(\Gamma_{3}\) and the vertical axis is \(P(\Gamma_{3}|\boldsymbol{{I}})\). The range of the horizontal axis of all graphs is from 0.01 to 1.0, the range of the vertical axis of the graphs with \(\Delta < 10.08\) is from 0.0 to 50.0, and that of the graphs with \(\Delta\geq 10.08\) is from 0.0 to 20.0. The red circle in each graph indicates the true parameter, which we used to generate the emulated measurement data.
In this study, model parameters were estimated from such posterior distributions using the following two methods. The first method was the maximum a posteriori (MAP) method. The MAP method estimates parameters on the basis of the following equation: \begin{equation} {\boldsymbol{{\theta}}}_{k}^{\textit{MAP}} = \mathop{\mathrm{arg\,max}}_{{\boldsymbol{{\theta}}}_{k}} \mathrm{P}({\boldsymbol{{\theta}}}_{k}|\boldsymbol{\mathcal{I}}). \end{equation} (23) The second method was the maximizer of the posterior marginal (MPM) method. The MPM method estimates parameters on the basis of the following equation: \begin{align} \theta_{k}^{\textit{m MPM}} &= \mathop{\mathrm{arg\,max}}_{\theta_{k}^{m}} \mathrm{P}(\theta_{k}^{m}|\boldsymbol{\mathcal{I}}) \notag\\ &= \mathop{\mathrm{arg\,max}}_{\theta_{k}^{m}} \int_{-\infty}^{\infty} \mathrm{P}({\boldsymbol{{\theta}}}_{k}|\boldsymbol{\mathcal{I}})\,d{\boldsymbol{{\theta}}}_{k}^{\lnot m}, \end{align} (24) where, as with the marginal posterior density, m is the index of a certain parameter \(\theta_{k}^{m}\) included in the parameter set \({\boldsymbol{{\theta}}}_{k}\). The MPM corresponds to using the maximizer of the marginal posterior density as the estimate. To evaluate the estimation uncertainty of a parameter, we defined the variation \(\chi^{m}\) of the sampling data \(\{\theta_{k}^{m}(t)\}_{t=1}^{T}\) from the MPM as \begin{equation} \chi^{m} = \sqrt{\frac{1}{T}\sum_{t=1}^{T} (\theta_{k}^{m}(t) - \theta_{k}^{\textit{m MPM}})^{2}}. \end{equation} (25) Here, we focused on the parameters of the effective model \(H_{3}\), which are Δ, \(U_{fc}\), \(U_{ff}\), V, \(\Gamma_{2}\), and \(\Gamma_{3}\) (Tables IV–IX, respectively).
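The following sketch computes the MAP and MPM estimates and the variation χ from the thinned β = 1 samples. Approximating the MAP by the sample with the highest (unnormalized) posterior value, the assumed callable log_posterior, and the choice of evaluation grid for the marginal densities are assumptions of this sketch.

```python
import numpy as np

def map_mpm_and_chi(samples, log_posterior, grid_points=512, C=1.0):
    """MAP [Eq. (23)], MPM [Eq. (24)], and variation chi [Eq. (25)].

    samples: thinned beta = 1 samples, shape (T, M); log_posterior(theta)
    returns log P(theta | I) up to a constant (e.g. -N*E(theta) + log prior).
    Each MPM is the maximizer of a one-dimensional kernel density estimate
    with the rule-of-thumb bandwidth of Eq. (22).
    """
    T, M = samples.shape
    # MAP: best sample under the (unnormalized) posterior
    theta_map = samples[np.argmax([log_posterior(t) for t in samples])]

    theta_mpm = np.empty(M)
    chi = np.empty(M)
    for m in range(M):
        s = samples[:, m]
        h = C * s.std() / T ** 0.2                              # Eq. (22)
        grid = np.linspace(s.min(), s.max(), grid_points)
        dens = np.exp(-0.5 * ((grid[:, None] - s[None, :]) / h) ** 2).sum(axis=1)
        theta_mpm[m] = grid[np.argmax(dens)]                    # Eq. (24)
        chi[m] = np.sqrt(np.mean((s - theta_mpm[m]) ** 2))      # Eq. (25)
    return theta_map, theta_mpm, chi
```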
Table IV. Estimated values of Δ by MAP and MPM methods. The estimation accuracy can be evaluated by comparison between the estimated parameter and true parameter. The uncertainty of estimation can be evaluated using the variation χ.
Table V. Estimated values of \(U_{fc}\) by MAP and MPM methods. The estimation accuracy can be evaluated by comparison between the estimated parameter and true parameter. The uncertainty of estimation can be evaluated using the variation χ.
Table VI. Estimated values of \(U_{ff}\) by MAP and MPM methods. The estimation accuracy can be evaluated by comparison between the estimated parameter and true parameter. The uncertainty of estimation can be evaluated using the variation χ.
Table VII. Estimated values of V by MAP and MPM methods. The estimation accuracy can be evaluated by comparison between the estimated parameter and true parameter. The uncertainty of estimation can be evaluated using the variation χ.
Table VIII. Estimated values of \(\Gamma_{2}\) by MAP and MPM methods. The estimation accuracy can be evaluated by comparison between the estimated parameter and true parameter. The uncertainty of estimation can be evaluated using the variation χ.
Table IX. Estimated values of \(\Gamma_{3}\) by MAP and MPM methods. The estimation accuracy can be evaluated by comparison between the estimated parameter and true parameter. The uncertainty of estimation can be evaluated using the variation χ.
For the emulated measurement data of \(\Delta<10.08\), all parameters were estimated correctly by both the MAP and MPM methods. In more detail, the gaps between the estimated parameters and the true parameters increased as the noise variance increased, and the variation χ also increased as the noise intensity \(\sigma_{\text{noise}}\) increased. For the emulated measurement data of \(\Delta\geq 10.08\), the gaps between the estimated parameters and the true parameters were much larger than the others, except for the emulated measurement data of \(\Delta=10.08\) with \(\sigma_{\text{noise}}=0.001\) and 0.0028. We evaluated the uncertainty of the parameter estimation from the variation χ of the marginal posterior density. As a result, at \(\Delta=10.08\) and \(\sigma_{\text{noise}}=0.0046\), the variation χ of Δ, \(U_{fc}\), and \(U_{ff}\) increased by more than one order of magnitude (Tables IV–VI). Also, at \(\Delta = 12.5\), a large χ was observed regardless of the noise intensity (Tables IV–VI).
6. Discussion
By applying our proposed method to the emulated measurement data, we determined that the two-state Hamiltonian model \(H_{2}\) should be applied to the emulated measurement data corresponding to the La2O3 spectra, whereas the three-state Hamiltonian model \(H_{3}\) should be applied to the emulated measurement data corresponding to the CeO2 spectra. These results are consistent with those of previous studies.2,3) For the emulated measurement data of \(\Delta=10.08\) and \(\sigma_{\text{noise}}\geq 0.0046\), our proposed method selected the three-state Hamiltonian model \(H_{3}\). These spectral distributions, like the La2O3 spectra [Fig. 3(a)-(2)], are two-peak spectra (Table II). Here, we consider applying the existing methods to the emulated measurement data of \(\Delta=10.08\). The existing Bayesian spectral deconvolution methods that are applicable to the analysis of the core-level \(3d\) XPS spectrum have no internal model.6,8) Such methods simply select the model whose number of peaks is the same as the number of peaks appearing in the spectra.21) Therefore, if the existing methods were applied to the emulated measurement data of \(\Delta=10.08\), a model with a two-peak structure would be selected. Such differences in the model selection results between our proposed method and the existing Bayesian spectral deconvolution methods arise from whether the internal model, i.e., the effective Hamiltonian, is built into the spectral deconvolution model. This suggests that our proposed method should be applied to the analysis of core-level XPS spectra when candidate effective Hamiltonians are given.
From the posterior distribution, we can estimate the uncertainties of the estimated parameters. Indeed, the analysis of the posterior distribution of the \(H_{3}\) model confirmed that the estimation uncertainty, which corresponds to the variation χ, increased as the noise intensity \(\sigma_{\text{noise}}\) increased. The marginalized posterior distributions, except those of Γ and V, for \(\Delta=10.08\) and \(\sigma_{\text{noise}}=0.0046\) were broad and had no sharp peak structure, whereas the marginalized posterior distributions for \(\Delta=10.08\) and \(\sigma_{\text{noise}}=0.0028\) had sharp peak structures. These properties suggest that the estimation uncertainty increases significantly between \(\sigma_{\text{noise}}=0.0028\) and 0.0046 at \(\Delta=10.08\), where the peak number of the spectrum changes. This suggests that, to obtain high estimation accuracy, the noise intensity must be reduced to less than \(\sigma_{\text{noise}}=0.0028\).21) Through such an analysis of the posterior distribution, it is possible to make a measurement plan, in terms of, e.g., the number of measurements and the measurement method, that satisfies the required estimation accuracy.21) Such a broadening of the marginalized posterior distribution, i.e., an increase in the variation χ, is presumed to occur because some parameters become indeterminate when the effective model \(H_{3}\) has excessive expressive power for the data. In particular, the fact that the marginalized posterior densities begin to spread at \(\Delta=10.08\) and \(\sigma_{\text{noise}}=0.0046\), where the peak number changes, suggests that the effective model \(H_{3}\) has excessive expressive capability for a two-peak spectrum. Information about the estimation accuracy of the parameters or the expressive capability of the effective model is difficult to obtain by conventional analysis methods, such as core-level XPS analysis based on manual tuning or spectral deconvolution based on simple fitting.
In this study, we adopted the simplified cluster model as the effective model of core-level \(3d\) XPS spectra. This effective model does not take into account the spin–orbit interaction, the multiplet effect, or the crystal field effect, and it is generally too simplistic to explain actual measured core-level XPS spectra. However, as long as the spectral intensity model is generated by Fermi's golden rule, our proposed method can easily replace the internal model with another model. For example, apart from the difficulty associated with the increased number of parameters, the effective Hamiltonian used in this study can easily be replaced with a model that takes into account the spin–orbit interaction, the multiplet effect, and the crystal field effect. This means that we can apply our proposed method to a wider range of actually observed XPS spectra by replacing the effective model with a cluster model that includes more interactions or with the impurity Anderson model, in which the effect of the band structure of the conduction electrons on the hybridization is taken into account. Thus, it is suggested that the framework of the proposed method has wide applicability to actual measurement data of core-level XPS. Moreover, an analysis analogous to that of the intermediate-state emulated measurement data can also be carried out for actual spectra that share a similar spectral structure.
7. Summary
By incorporating the effective Hamiltonian into the stochastic model of spectral deconvolution, we developed a Bayesian spectral deconvolution method for core-level XPS to realize the automatic analysis of core-level XPS spectra. By applying our proposed method to the emulated \(3d\) core-level XPS spectra of La2O3 and CeO2, it was confirmed that our proposed method selects effective Hamiltonians that are consistent with knowledge obtained from the conventional study of physics. We also applied our proposed method to spectra that are the intermediate states of a seamless transition from the CeO2 spectrum (three peaks) to the La2O3 spectrum (two peaks). As a result, it was confirmed that the proposed method selects an effective Hamiltonian on the basis of not only the information about the peak number but also other information contained in the effective Hamiltonian. This cannot be realized by existing Bayesian spectral deconvolution methods applicable to XPS spectra, which select the model on the basis of the peak number.6,8) Our proposed method also enables the parameter estimation of the effective model using the posterior distribution of its parameter. Using the MAP or MPM methods, we were able to estimate the true parameters of the generative model \(\mathcal{H}\) from the posterior distributions. Furthermore, using the posterior distribution, we were able to evaluate the parameter estimation accuracy or obtain information about the properties of the effective model for the spectrum. This capability of our proposed method can yield information for scientific discussion, e.g., the detection limit,21) or the improvement of the effective Hamiltonian using the observed data. In conventional analysis methods, such as those using manual tuning or a simple fitting technique, such information cannot be obtained. It is also suggested through our discussion that the framework of the proposed method has wide applicability to actual measurement data of core-level XPS. We expect that our proposed method will pave the way for the highly quantitative analysis of core-level XPS spectra.
Acknowledgments
This work was supported by the Cross-ministerial Strategic Innovation Promotion Program (SIP) “Structural Materials for Innovation” (Funding agency: JST) and JST CREST (JPMJCR1761, JPMJCR1861).
References
1) J. Kanamori and A. Kotani, Proc. 10th Taniguchi Int. Symp., 1987, p. 81. 10.1007/978-3-642-83437-0
2) A. Kotani and Y. Toyozawa, J. Phys. Soc. Jpn. 37, 912 (1974). 10.1143/JPSJ.37.912
3) A. Kotani, H. Mizuta, T. Jo, and J. C. Parlebas, Solid State Commun. 53, 805 (1985). 10.1016/0038-1098(85)90223-6
4) A. Kotani, M. Okada, T. Jo, A. Bianconi, A. Marcelli, and J. C. Parlebas, J. Phys. Soc. Jpn. 56, 798 (1987). 10.1143/JPSJ.56.798
5) F. M. F. de Groot and A. Kotani, Core Level Spectroscopy of Solids (CRC Press, London, 2008).
6) K. Nagata, S. Sugita, and M. Okada, Neural Networks 28, 82 (2012). 10.1016/j.neunet.2011.12.001
7) K. Hukushima and K. Nemoto, J. Phys. Soc. Jpn. 65, 1604 (1996). 10.1143/JPSJ.65.1604
8) S. Tokuda, K. Nagata, and M. Okada, J. Phys. Soc. Jpn. 86, 024001 (2017). 10.7566/JPSJ.86.024001
9) S. Murata, K. Nagata, M. Uemura, and M. Okada, J. Phys. Soc. Jpn. 85, 104003 (2016). 10.7566/JPSJ.85.104003
10) Y. Mototake, Y. Igarashi, H. Takenaka, K. Nagata, and M. Okada, J. Phys. Soc. Jpn. 87, 114004 (2018). 10.7566/JPSJ.87.114004
11) T. Kasai, K. Nagata, M. Okada, and T. Kigawa, J. Phys.: Conf. Ser. 699, 012003 (2016). 10.1088/1742-6596/699/1/012003
12) K. Iwamitsu, S. Aihara, M. Okada, and I. Akai, J. Phys. Soc. Jpn. 85, 094716 (2016). 10.7566/JPSJ.85.094716
13) P. K. Hong, H. Miyamoto, T. Niihara, S. Sugita, K. Nagata, J. M. Dohm, and M. Okada, J. Geol. Geophys. 5, 243 (2016). 10.4172/2381-8719.1000243
14) F. M. F. de Groot, J. C. Fuggle, B. T. Thole, and G. A. Sawatzky, Phys. Rev. B 42, 5459 (1990). 10.1103/PhysRevB.42.5459
15) O. Gunnarsson and K. Schönhammer, Phys. Rev. B 28, 4315 (1983). 10.1103/PhysRevB.28.4315
16) T. Jo and A. Kotani, J. Phys. Soc. Jpn. 57, 2288 (1988). 10.1143/JPSJ.57.2288
17) N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, J. Chem. Phys. 21, 1087 (1953). 10.1063/1.1699114
18) W. K. Hastings, Biometrika 57, 97 (1970). 10.1093/biomet/57.1.97
19) K. Nagata and S. Watanabe, Neural Networks 21, 980 (2008). 10.1016/j.neunet.2007.11.002
20) D. W. Scott, Multivariate Density Estimation: Theory, Practice, and Visualization (Wiley, New York, 1992).