Coupled Space Learning of Image Style Transformation Dahua Lin Department of Information Engineering The Chinese University of Hong Kong Xiaoou Tang Microsof...



Abstract

In this paper, we present a new learning framework for image style transforms. Considering that the images in different style representations constitute different vector spaces, we propose a novel framework called Coupled Space Learning to learn the relations between different spaces and use them to infer the images from one style to another style. Observing that for each style, only the components correlated to the space of the target style are useful for inference, we first develop the Correlative Component Analysis to pursue the embedded hidden subspaces that best preserve the inter-space correlation information. Then we develop the Coupled Bidirectional Transform algorithm to estimate the transforms between the two embedded spaces, where the coupling between the forward transform and the backward transform is explicitly taken into account. To enhance the capability of modelling complex data, we further develop the Coupled Gaussian Mixture Model to generalize our framework to a mixture-model architecture. The effectiveness of the framework is demonstrated in applications including face super-resolution and bidirectional portrait style transforms.

Xiaoou Tang, Microsoft Research Asia, Beijing, China

1. Introduction

In recent years, transformation between image style representations has become an active research topic in computer vision. Representative works on style transforms include image hallucination[3][1][6] and non-photorealistic rendering[8][4]. Unlike conventional approaches, which treat different types of transforms separately, we study different transform tasks from a unified perspective and develop a new learning framework to improve the quality of the resultant images. In statistical learning, each image can be represented by a vector, and thus the images in a certain style form a vector space. Under this formulation, the relation between two image styles can be seen as the relation between the two vector spaces associated with them.

In the literature, a series of statistical learning approaches have been proposed to model the image space. Among these methods, the best known is PCA[10], which finds a principal subspace where the variational energy is maximized. However, PCA is aimed at modelling a single sample space with the goal of best reconstruction, and thus cannot be directly applied to model inter-space dependencies. Several works extend the conventional PCA model to learn the dependency between two sample spaces. There are mainly two families of methods: one establishes a subspace model in the joint space of the two vector spaces[9]; the other learns the relation between two principal subspaces. A representative method in the latter family is eigentransformation[8], which learns the relationship between photo space and sketch space by transferring the synthesis coefficients obtained by PCA. An important drawback of both families is that some correlative information, which is not necessarily significant for reconstruction, may be lost when the samples are projected onto principal subspaces learned individually, because the learning of these individual subspaces does not take the correlation between the two spaces into account. To address this issue, de la Torre and Black developed Asymmetric Coupled Component Analysis (ACCA)[2], where a hidden parameter space is made explicit to serve as a bridge coupling the two spaces. Although ACCA explicitly accounts for the coupling, its simple formulation, as shown later, does not fully reflect the essence of coupling and lacks the capability of modelling complex dependencies.

In this paper, we propose a novel framework called Coupled Space Learning to learn the dependency between two vector spaces, with each space corresponding to one image style.
The core of our framework is to couple the learning process of the forward and the backward transforms. Observing that only the components that are correlated to the other vector space contribute to the inference, we derive the Maximum Correlation Criteria and develop a new

Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV’05) 1550-5499/05 $20.00 © 2005 IEEE

algorithm called Correlative Component Analysis to pursue the hidden spaces associated with the two representation spaces so that the correlative information is best preserved. Then the Coupled Bidirectional Transform algorithm is developed to learn the bidirectional transforms between the two hidden spaces in a coupled manner, where the relation between the transforms in the two opposite directions is explicitly taken into account. The coupling between the forward transform and the backward transform is gradually established through repeated information exchange between the two transforms. To further enhance the framework's capability of modelling complex data, we generalize our framework to a mixture-model architecture, called the Coupled Gaussian Mixture Model, where the GMMs for both spaces are jointly trained. The system consists of multiple models: in the training phase each model is adapted to a part of the samples, and in the testing phase the results produced by these models for a new sample are fused together by a weighting scheme that uses the model posteriors as weights.

Our framework is of broad interest in the realm of computer vision. To illustrate its effectiveness, we conduct comparative experiments in style-transform applications including face super-resolution and bidirectional transforms between portrait styles. In the rest of the paper, we first present the theoretical principle and algorithms for Coupled Space Learning in section 2. In section 3, we generalize our model to a mixture-model system with the Coupled GMM. Section 4 introduces the integrated framework for image style transform, and experiments and their results are presented in section 5. Finally, we conclude the paper in section 6.

2. Coupled Space Learning

2.1. Framework of Coupled Modelling

Suppose we have a set of visual objects, denoted as $a_1, a_2, \ldots, a_n$, where $n$ is the number of objects. Each object can be expressed by images in different styles, such as photos and non-photorealistic paintings. In each style, the objects are represented as vectors. The vectors in the first style, denoted as $x_1, x_2, \ldots, x_n$, constitute a sample space $\mathcal{X}$; likewise, the vector space and vectors in the second style are denoted as $\mathcal{Y}$ and $y_1, y_2, \ldots, y_n$. Note that the dimensions of the two representation spaces are not necessarily equal; we denote them as $d_x$ and $d_y$ respectively. Since both sample spaces are associated with the same object space, it is reasonable to assume that there is an intrinsic hidden space $\mathcal{H}$ reflecting the variations that the visual objects inherently bear, and that the observed spaces are transformed versions of the hidden space. This is the fundamental principle of coupled learning.

Figure 1. Illustration of Harmony Coupled Learning Framework (diagram labels: spaces X and Y, hidden subspaces U and V, projections $P_X^T$ and $P_Y^T$, transforms A and B, reconstructions $P_X$ and $P_Y$)

Denote the vectors in the hidden space as $h$, and the transforms from the hidden space to the observed spaces as $T_X$ and $T_Y$:

$$x = T_X h + m_x, \qquad y = T_Y h + m_y. \tag{1}$$

Here, $m_x$ and $m_y$ are the mean vectors of $x$ and $y$. Assume the dimension of the hidden space is $d$; then $T_X$ is a $d_x \times d$ matrix while $T_Y$ is a $d_y \times d$ matrix. To investigate the composition of the transforms, we perform compact SVD on them as $T_X = U_X D_X V_X^T$ and $T_Y = U_Y D_Y V_Y^T$. Here $U_X$ is a $d_x \times d$ matrix and $U_Y$ is a $d_y \times d$ matrix, while $D_X$, $D_Y$, $V_X$, $V_Y$ are all $d \times d$ matrices. Considering Eq.(1), we have

$$U_X^T (x - m_x) = D_X V_X^T h, \qquad U_Y^T (y - m_y) = D_Y V_Y^T h. \tag{2}$$

This equation can be interpreted as follows: the orthonormal $U_X$ projects the $d_x$-dimensional vector $x - m_x$ to a subspace with the same dimension as $\mathcal{H}$, which is actually a rotated and scaled version of $\mathcal{H}$, denoted as $\mathcal{U}$. A similar interpretation applies to $y$, where the embedded subspace is denoted as $\mathcal{V}$. Based on this interpretation, we see that there are two $d$-dimensional embedded spaces associated with $\mathcal{X}$ and $\mathcal{Y}$, each related to $\mathcal{H}$ by rotation and scaling. It further follows that the two embedded subspaces are related to each other by rotation and scaling. To emphasize the projection role of $U_X$ and $U_Y$, we denote them as $P_X$ and $P_Y$. Based on this rationale, we design a three-level framework for Coupled Space Learning (CSL), as illustrated in Figure 1, where the connection between $\mathcal{U}$ and $\mathcal{V}$ is established through $d \times d$ transform matrices $A$ and $B$. Mathematically, the whole transform procedure can be represented as follows:

$$y - m_y = P_Y A P_X^T (x - m_x), \tag{3}$$

$$x - m_x = P_X B P_Y^T (y - m_y). \tag{4}$$
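As a minimal sketch of how Eq.(3) and Eq.(4) act on a sample, the following NumPy snippet maps a vector from one style to the other and back. The function names `forward` and `backward`, the dimensions, and the random matrices are illustrative assumptions, not part of the original method; $B$ is set to the inverse of $A$ only to show the ideal case of perfect coupling.

```python
import numpy as np

def forward(x, m_x, m_y, P_X, P_Y, A):
    """Eq. (3): infer y from x through the two embedded subspaces."""
    return m_y + P_Y @ (A @ (P_X.T @ (x - m_x)))

def backward(y, m_x, m_y, P_X, P_Y, B):
    """Eq. (4): infer x from y through the two embedded subspaces."""
    return m_x + P_X @ (B @ (P_Y.T @ (y - m_y)))

# Toy setup; every number below is an illustrative assumption.
rng = np.random.default_rng(0)
d_x, d_y, d = 8, 6, 3

# Projection matrices with orthonormal columns (reduced QR of random matrices).
P_X, _ = np.linalg.qr(rng.standard_normal((d_x, d)))
P_Y, _ = np.linalg.qr(rng.standard_normal((d_y, d)))

# A small d x d transform and, for the ideal coupled case, its exact inverse.
A = 2.0 * np.eye(d) + 0.1 * rng.standard_normal((d, d))
B = np.linalg.inv(A)

m_x = rng.standard_normal(d_x)
m_y = rng.standard_normal(d_y)

# A sample whose mean-offset part lies in the subspace spanned by P_X.
x = m_x + P_X @ rng.standard_normal(d)

y = forward(x, m_x, m_y, P_X, P_Y, A)
x_back = backward(y, m_x, m_y, P_X, P_Y, B)
```

Because the sample is constructed inside the span of $P_X$ and $B$ inverts $A$, the round trip recovers `x`; for real data the part of $x - m_x$ outside the subspace is discarded by the projection.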

It is worthwhile to emphasize the following points concerning the design of the framework:

1. Given that the relation between $x$ and $y$ is linear under this formulation, why don't we directly use the form $y = Ax$ and $x = By$? As mentioned above, the fundamental concept in coupled learning is the hidden space, and the $P_X$ and $P_Y$ in Eq.(3) and Eq.(4) embody this concept and enforce it as a structural constraint on the transforms.

2. Why don't we directly focus on $T_X$, $T_Y$ and $\mathcal{H}$, but decompose the process into $P_X$, $P_Y$, $A$ and $B$? This is mainly for the sake of computational efficiency. Because directly solving transforms between two spaces of different dimensions is difficult and unstable, especially when $d_x$ and $d_y$ are large, it is desirable to first learn projections that map the vectors to subspaces of equal dimension and then solve the transforms within the subspaces.

Figure 2. Illustration of Components Decomposition (each of X and Y is split into correlated variations and independent variations)

2.2. Correlative Component Analysis

The essence of coupling comes from the statistical dependencies between the two sample spaces, which are also the foundation for bidirectional inference. In a linear model under the Gaussian assumption, statistical dependency is equivalent to correlation, and uncorrelated components provide no information for predicting each other. Concretely, as illustrated in Figure 2, each space can be decomposed into two subspaces: one preserves the correlative information for inter-space communication, while the other only captures independent variations specific to the space itself. Only the former contributes to inter-space inference.

Suppose $x \sim N(m_x, C_x)$ and $y \sim N(m_y, C_y)$. For a component in $\mathcal{X}$ characterized by projection direction $p_x$ and a component in $\mathcal{Y}$ characterized by $p_y$, their correlation can be measured in terms of covariance as

$$E\left[ \left(p_x^T (x - m_x)\right) \left(p_y^T (y - m_y)\right)^T \right] = p_x^T C_{xy} p_y. \tag{5}$$

Here, $C_{xy} = E\left[(x - m_x)(y - m_y)^T\right]$ is the covariance matrix between $x$ and $y$. Since $C_{xy}$ is in general not a positive semidefinite matrix, this covariance value can be negative; however, only its magnitude, not its sign, represents the intensity of the correlation. For mathematical tractability, we use the square of the covariance value as the Correlation Intensity:

$$CI(p_x, p_y) = \left(p_x^T C_{xy} p_y\right)^2. \tag{6}$$

For a set of components obtained by projection matrices $P_X$ and $P_Y$, their covariance is a generalization of Eq.(5):

$$E\left[ \left(P_X^T (x - m_x)\right) \left(P_Y^T (y - m_y)\right)^T \right] = P_X^T C_{xy} P_Y. \tag{7}$$

Taking all components as a whole, the total correlation intensity can be derived as

\begin{align}
CI(P_X, P_Y) &= \mathrm{tr}\left[ \left(P_X^T C_{xy} P_Y\right) \left(P_X^T C_{xy} P_Y\right)^T \right] \notag \\
&= \mathrm{tr}\left( P_X^T C_{xy} P_Y P_Y^T C_{yx} P_X \right) \tag{8} \\
&= \mathrm{tr}\left( P_Y^T C_{yx} P_X P_X^T C_{xy} P_Y \right). \tag{9}
\end{align}

Here, $C_{yx} = E\left[(y - m_y)(x - m_x)^T\right] = C_{xy}^T$ is the covariance matrix between $y$ and $x$.

Given training sets $[x_1, x_2, \ldots, x_n]$ and $[y_1, y_2, \ldots, y_n]$, denote their sample means as $\bar{x}$ and $\bar{y}$. We can arrange the mean-offset samples into two matrices $\tilde{X} = [x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_n - \bar{x}]$ and $\tilde{Y} = [y_1 - \bar{y}, y_2 - \bar{y}, \ldots, y_n - \bar{y}]$, so that the maximum likelihood estimates of the covariance matrices[5] can be written as $C_{xy} = \frac{1}{n} \tilde{X} \tilde{Y}^T$ and $C_{yx} = \frac{1}{n} \tilde{Y} \tilde{X}^T$.

To facilitate further analysis of the relation between the two spaces, it is desirable to pursue the subspaces that best preserve the correlative information. Hence we derive the Maximum Correlation Criteria for learning two correlative subspaces as follows:

$$(P_X, P_Y) = \operatorname*{argmax}_{P_X, P_Y} CI(P_X, P_Y). \tag{10}$$

Here, up to the constant factor $1/n^2$,

\begin{align}
CI(P_X, P_Y) &= \mathrm{tr}\left( P_X^T \tilde{X} \tilde{Y}^T P_Y P_Y^T \tilde{Y} \tilde{X}^T P_X \right) \tag{11} \\
&= \mathrm{tr}\left( P_Y^T \tilde{Y} \tilde{X}^T P_X P_X^T \tilde{X} \tilde{Y}^T P_Y \right). \tag{12}
\end{align}
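The equivalence of the two forms in Eq.(11) and Eq.(12) can be checked numerically. The sketch below evaluates both traces on random data; the function names, the sample sizes, and the random projection matrices are assumptions chosen only for illustration.

```python
import numpy as np

def correlation_intensity(Xc, Yc, P_X, P_Y):
    """Eq. (11): Xc and Yc hold mean-offset samples as columns."""
    M = P_X.T @ Xc @ Yc.T @ P_Y          # d x d cross-covariance (times n)
    return float(np.trace(M @ M.T))

def correlation_intensity_dual(Xc, Yc, P_X, P_Y):
    """Eq. (12): the same quantity written from the side of Y."""
    M = P_Y.T @ Yc @ Xc.T @ P_X
    return float(np.trace(M @ M.T))

# Illustrative random data (assumed sizes).
rng = np.random.default_rng(1)
n, d_x, d_y, d = 50, 7, 5, 2
X = rng.standard_normal((d_x, n))
Y = rng.standard_normal((d_y, n))
Xc = X - X.mean(axis=1, keepdims=True)   # mean-offset samples
Yc = Y - Y.mean(axis=1, keepdims=True)

# Arbitrary projections with orthonormal columns.
P_X, _ = np.linalg.qr(rng.standard_normal((d_x, d)))
P_Y, _ = np.linalg.qr(rng.standard_normal((d_y, d)))

ci_x = correlation_intensity(Xc, Yc, P_X, P_Y)
ci_y = correlation_intensity_dual(Xc, Yc, P_X, P_Y)
```

Both values equal the squared Frobenius norm of $P_X^T \tilde{X} \tilde{Y}^T P_Y$, which is why the criterion is never negative.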

To optimize the Maximum Correlation Criteria, we develop an algorithm called Correlative Component Analysis (CCA), which pursues the optimal $P_X$ and $P_Y$ alternately. The procedure is described in Table 1.

1. Initialize $P_X^{(0)}$ and $P_Y^{(0)}$ to be identity matrices.
2. Repeat the following steps; at the $t$-th step:
   (a) Compute $S_X^{(t)} = \tilde{X} \tilde{Y}^T P_Y^{(t-1)} P_Y^{(t-1)T} \tilde{Y} \tilde{X}^T$.
   (b) Update $P_X$ by $P_X^{(t)} = \operatorname{argmax}_{P_X} \mathrm{tr}\left( P_X^T S_X^{(t)} P_X \right)$.
   (c) Compute $S_Y^{(t)} = \tilde{Y} \tilde{X}^T P_X^{(t)} P_X^{(t)T} \tilde{X} \tilde{Y}^T$.
   (d) Update $P_Y$ by $P_Y^{(t)} = \operatorname{argmax}_{P_Y} \mathrm{tr}\left( P_Y^T S_Y^{(t)} P_Y \right)$.
   (e) Compute the objective function $C^{(t)}$ by Eq.(10).
3. Stop and exit when $C^{(t)} - C^{(t-1)} < \varepsilon$.

Table 1. Training process of CCA

Note that for a positive semidefinite matrix $S$, $\operatorname{argmax}_P \mathrm{tr}(P^T S P)$ over $P$ with orthonormal columns can be obtained by performing eigen-decomposition on $S$ and taking the $d$ eigenvectors associated with the largest eigenvalues as the column vectors of $P$.

Discussion

1. Eq.(11) and Eq.(12) are equivalent, which elegantly reflects the duality of the two spaces.

2. When $P_X$ or $P_Y$ is fixed, the objective function is convex w.r.t. $P_Y$ or $P_X$, thus each update achieves the globally optimal $P_Y$ or $P_X$ with the other matrix fixed. Moreover, due to the equivalence of the two forms of the objective function, steps 2(b) and 2(d) actually optimize the same objective, and the procedure is thus guaranteed to converge.

3. Intuitively, the Correlative Component Analysis algorithm embodies a Negotiation Mechanism: in each iteration, both spaces convey information about themselves through the projection matrices, and adjust their subspace projections to cater for the other party's need. In this procedure the commonality between the two subspaces is gradually amplified via continuous conversation between the two parties.

2.3. Coupled Bidirectional Transform

Once the two hidden subspaces are established by Correlative Component Analysis, we can learn the bidirectional transforms between them. We denote the vectors in the two hidden spaces as $[u_1, u_2, \ldots, u_n]$ and $[v_1, v_2, \ldots, v_n]$ respectively, which are computed as

$$u_i = P_X^T (x_i - \bar{x}), \tag{13}$$

$$v_i = P_Y^T (y_i - \bar{y}). \tag{14}$$

Before deriving the objective function, it is worthwhile to analyze the goal of learning. Unlike a conventional unidirectional model, where transform accuracy is the chief aim, a bidirectional model should take the relation between the forward transform and the backward transform into account. In the ideal case, they should be inverse processes of each other. Therefore, there are two goals in learning the pair of transforms: the first is the accuracy of the transforms, as in unidirectional models, while the second is their coupling relationship, which can be measured in terms of fidelity of reconstruction. Based on this rationale, the objective function to be minimized can be written as

$$J(A, B) = \sum_{i=1}^{n} \left( \|v_i - A u_i\|^2 + \|u_i - B v_i\|^2 + \|u_i - B A u_i\|^2 + \|v_i - A B v_i\|^2 \right). \tag{15}$$

Denote $U = [u_1, u_2, \ldots, u_n]$ and $V = [v_1, v_2, \ldots, v_n]$; then Eq.(15) can be rewritten as

$$J(A, B) = \|V - AU\|_F^2 + \|U - BV\|_F^2 + \|U - BAU\|_F^2 + \|V - ABV\|_F^2. \tag{16}$$

Figure 3. Illustration of Objectives of Bidirectional Transform (the four terms relate u, v, Au, Bv, BAu and ABv)

Figure 4. Illustration of Significance of Coupling (panels: Without Considering Coupling; Considered Coupling)

As illustrated in Figure 3, the 1st and 2nd terms of Eq.(16) correspond to the goal of transform accuracy, while the other two terms correspond to the fidelity of coupling. The objective function is nonlinear with respect to $A$ and $B$, and it has no analytic solution. One approach is to employ gradient-based numerical optimization; the derivatives of the objective w.r.t. $A$ and $B$ are

\begin{align}
\frac{\partial J}{\partial A} &= -2VU^T + 2AUU^T - 2B^T UU^T - 2VV^T B^T + 2B^T BAUU^T + 2ABVV^T B^T, \tag{17} \\
\frac{\partial J}{\partial B} &= -2UV^T + 2BVV^T - 2A^T VV^T - 2UU^T A^T + 2A^T ABVV^T + 2BAUU^T A^T. \tag{18}
\end{align}

A drawback of traditional optimization methods is that they are computationally expensive and converge slowly. To enhance the efficiency of optimization, we develop a novel algorithm to learn the Coupled Bidirectional Transform (CBT), as described in Table 2.

1. Initialize $A$ and $B$ by linear regression:

$$A^{(0)} = \operatorname*{argmin}_A \|V - AU\|_F^2 = (VU^T)(UU^T)^{-1}, \tag{19}$$

$$B^{(0)} = \operatorname*{argmin}_B \|U - BV\|_F^2 = (UV^T)(VV^T)^{-1}. \tag{20}$$

2. Iterate the following steps until the change of the objective is below a specified threshold:
   (a) Backward-transform the variables $[v_1, v_2, \ldots, v_n]$ by $B^{(t-1)}$. Form the augmented sample matrices $U_{aug} = [U, B^{(t-1)} V]$ and $V_{aug} = [V, V]$, then update $A$ as $A^{(t)} = \operatorname{argmin}_A \|V_{aug} - A U_{aug}\|_F^2$.
   (b) Forward-transform the variables $[u_1, u_2, \ldots, u_n]$ by $A^{(t)}$. Form the augmented sample matrices $U_{aug} = [U, U]$ and $V_{aug} = [V, A^{(t)} U]$, then update $B$ as $B^{(t)} = \operatorname{argmin}_B \|U_{aug} - B V_{aug}\|_F^2$.

Table 2. Training process of CBT
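The steps of Table 2 can be sketched in NumPy as below. This is an illustrative reading rather than the authors' implementation: the helper names, the tolerance `tol`, the iteration cap `max_iter`, and the toy data are assumptions, and the least-squares problems are solved with `numpy.linalg.lstsq` instead of forming the explicit inverses of Eq.(19) and Eq.(20).

```python
import numpy as np

def lstsq_transform(S, T):
    """Return M minimizing ||T - M S||_F^2 (equals T S^T (S S^T)^{-1} when S S^T is invertible)."""
    # ||T - M S||_F = ||S^T M^T - T^T||_F, so solve for M^T and transpose back.
    return np.linalg.lstsq(S.T, T.T, rcond=None)[0].T

def objective(A, B, U, V):
    """Eq. (16): transform accuracy terms plus coupling fidelity terms."""
    return (np.linalg.norm(V - A @ U) ** 2 + np.linalg.norm(U - B @ V) ** 2
            + np.linalg.norm(U - B @ A @ U) ** 2
            + np.linalg.norm(V - A @ B @ V) ** 2)

def coupled_bidirectional_transform(U, V, tol=1e-8, max_iter=100):
    """Alternate the augmented least-squares updates of Table 2."""
    A = lstsq_transform(U, V)            # Eq. (19)
    B = lstsq_transform(V, U)            # Eq. (20)
    J_prev = objective(A, B, U, V)
    for _ in range(max_iter):
        # Step 2(a): augment with backward-transformed samples, update A.
        A = lstsq_transform(np.hstack([U, B @ V]), np.hstack([V, V]))
        # Step 2(b): augment with forward-transformed samples, update B.
        B = lstsq_transform(np.hstack([V, A @ U]), np.hstack([U, U]))
        J = objective(A, B, U, V)
        if abs(J_prev - J) < tol:        # change of objective below threshold
            break
        J_prev = J
    return A, B

# Toy usage on noisy, linearly related hidden-space samples (assumed data).
rng = np.random.default_rng(0)
U = rng.standard_normal((3, 200))
V = (2.0 * np.eye(3) + 0.1 * rng.standard_normal((3, 3))) @ U
V = V + 0.05 * rng.standard_normal(V.shape)
A, B = coupled_bidirectional_transform(U, V)
J_final = objective(A, B, U, V)
```

Stacking `[U, B V]` against `[V, V]` makes the single least-squares update of `A` account for both $\|V - AU\|_F^2$ and $\|V - ABV\|_F^2$, which is how the backward transform influences the forward one.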

Discussion

1. The goal of transform accuracy and that of reconstruction fidelity are not equivalent. This is illustrated in Figure 4: although the transform accuracy of the left case equals that of the right one, the right one achieves much higher reconstruction fidelity and is thus preferable, which indicates the significance of explicitly considering reconstruction fidelity in learning bidirectional transforms.

2. As in Correlative Component Analysis, the process of learning the coupled bidirectional transforms also reflects the "negotiation mechanism": the forward transform $A$ conveys information about itself by augmenting the training set of $B$ with the forward-transformed variables $AU$, so that the optimization of $B$ takes the information from $A$ into account. The principle is similar for the backward transform. Therefore, the "augmented set" plays a crucial role in exchanging information between the two transforms; through repeated communication, the two transforms are finally coupled together and a good equilibrium is achieved between transform accuracy and reconstruction fidelity.

2.4. Procedure of Coupled Space Learning

Here, we summarize the whole procedure of coupled space learning. In the training stage, the process can be briefly described as follows:

1. Compute the mean vectors $m_x$ and $m_y$, and the covariance matrices $C_x$, $C_y$, $C_{xy}$ and $C_{yx}$ for both spaces.
2. Learn the two hidden subspaces $P_X$ and $P_Y$ by Correlative Component Analysis.
3. Project the mean-offset samples $\tilde{X}$ and $\tilde{Y}$ onto the hidden spaces to obtain $U$ and $V$.
4. Learn the bidirectional transforms between $U$ and $V$ using the Coupled Bidirectional Transform algorithm.

Table 3. The whole training procedure of CSL

In the testing stage, for an arbitrary new sample $x$ or $y$, we can infer the corresponding $y$ or $x$ following Eq.(3) or Eq.(4).
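The training procedure of Table 3 can be sketched end to end as follows. This is a hedged illustration: the function names are hypothetical, the alternating loop follows Table 1 with the literal identity initialization, and, to keep the sketch short, step 4 is stood in for by the least-squares initialization of Eq.(19) and Eq.(20) only, so the coupled refinement of Table 2 is not repeated here.

```python
import numpy as np

def top_eigvecs(S, d):
    """Eigenvectors of a symmetric PSD matrix for its d largest eigenvalues."""
    _, Q = np.linalg.eigh(S)             # eigenvalues come in ascending order
    return Q[:, ::-1][:, :d]

def correlative_component_analysis(Xc, Yc, d, tol=1e-10, max_iter=100):
    """Alternating eigen-decompositions of Table 1 on mean-offset samples."""
    C = Xc @ Yc.T                        # proportional to C_xy
    P_Y = np.eye(Yc.shape[0])            # Table 1, step 1: identity start
    prev = -np.inf
    for _ in range(max_iter):
        P_X = top_eigvecs(C @ P_Y @ P_Y.T @ C.T, d)   # steps 2(a)-(b)
        P_Y = top_eigvecs(C.T @ P_X @ P_X.T @ C, d)   # steps 2(c)-(d)
        obj = np.linalg.norm(P_X.T @ C @ P_Y) ** 2    # step 2(e), Eq. (11)
        if obj - prev < tol:                          # step 3
            break
        prev = obj
    return P_X, P_Y

def train_csl(X, Y, d):
    """X is d_x x n and Y is d_y x n, with paired samples as columns."""
    m_x = X.mean(axis=1, keepdims=True)
    m_y = Y.mean(axis=1, keepdims=True)
    Xc, Yc = X - m_x, Y - m_y                                  # step 1
    P_X, P_Y = correlative_component_analysis(Xc, Yc, d)       # step 2
    U, V = P_X.T @ Xc, P_Y.T @ Yc                              # step 3
    # Step 4 stand-in (assumption): regression of Eqs. (19)-(20) only.
    A = np.linalg.solve(U @ U.T, U @ V.T).T
    B = np.linalg.solve(V @ V.T, V @ U.T).T
    return {"m_x": m_x, "m_y": m_y, "P_X": P_X, "P_Y": P_Y, "A": A, "B": B}

def infer_y(model, X_new):
    """Testing stage, Eq. (3), applied to the columns of X_new."""
    hidden = model["A"] @ (model["P_X"].T @ (X_new - model["m_x"]))
    return model["m_y"] + model["P_Y"] @ hidden

# Toy usage: two observed spaces generated from a shared hidden space.
rng = np.random.default_rng(0)
H = rng.standard_normal((3, 100))
X = rng.standard_normal((10, 3)) @ H + 0.01 * rng.standard_normal((10, 100))
Y = rng.standard_normal((8, 3)) @ H + 0.01 * rng.standard_normal((8, 100))
model = train_csl(X, Y, d=3)
Y_hat = infer_y(model, X)
```

Replacing the step 4 stand-in with the iterative updates of Table 2 would give the full coupled training; the rest of the pipeline stays the same.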

3. Generalization to Mixture Model

3.1. Coupled Gaussian Mixture Model

Due to the complexity of real data, a single linear model is often not enough to capture all aspects of the variations and dependencies. Motivated by the successful application of the Gaussian Mixture Model (GMM)[13] in many practical problems, we develop the Coupled Gaussian Mixture Model (CGMM), which integrates GMM and Coupled Space Learning. The fundamental difference between CGMM and GMM is that the pair of samples $u_i$ and $v_i$ is dealt with as a whole instead of being handled individually. In CGMM, suppose we have $K$ models, denoted as $M_1, M_2, \ldots, M_K$; then the probability of a sample pair $(u_i, v_i)$ conditioned on the $k$-th model is

$$p(u_i, v_i \mid M_k) = p(u_i \mid m_{uk}, \Sigma_{uk}) \, p(v_i \mid m_{vk}, \Sigma_{vk}). \tag{21}$$

Here $m_{uk}$, $\Sigma_{uk}$ and $m_{vk}$, $\Sigma_{vk}$ are the mean vectors and covariance matrices of the samples belonging to $M_k$ in the hidden spaces $\mathcal{U}$ and $\mathcal{V}$ respectively.

3.2. Optimization by EM Algorithm

With the definition of the coupled conditional probability, the CGMM can be learned by an Expectation-Maximization algorithm similar to that for GMM[13]. The procedure is described as follows:

1. Initialize the CGMM by random clustering:
   (a) Randomly select $K$ pairs of samples as cluster centers, denoted as $m_{u1}^{(0)}, \ldots, m_{uK}^{(0)}$ and $m_{v1}^{(0)}, \ldots, m_{vK}^{(0)}$, which are also the initial estimates of the mean vectors of the CGMM.
   (b) Assign each pair of samples $(u_i, v_i)$ to the cluster whose center is closest. The distance is simply defined as $d_{ik} = \|u_i - m_{uk}^{(0)}\|^2 + \|v_i - m_{vk}^{(0)}\|^2$.
   (c) Compute the covariance matrices within the clusters, $\Sigma_{u1}^{(0)}, \ldots, \Sigma_{uK}^{(0)}$ and $\Sigma_{v1}^{(0)}, \ldots, \Sigma_{vK}^{(0)}$, as the initial estimates of the covariance matrices.
   (d) Initialize the prior probabilities of all models to be equal, i.e. $P^{(0)}(M_1) = \cdots = P^{(0)}(M_K) = \frac{1}{K}$.
2. Update the CGMM by iterating the following steps:
   (a) Compute the probability of every training sample pair belonging to the $k$-th model as

   $$w_{ik} = \frac{P^{(t-1)}(M_k) \, p(u_i, v_i \mid M_k)}{\sum_{j=1}^{K} P^{(t-1)}(M_j) \, p(u_i, v_i \mid M_j)}. \tag{22}$$

   The conditional probability is calculated by Eq.(21) using the mean vectors and covariance matrices computed in the $(t-1)$-th step.
   (b) Update the priors of the models as

   $$P(M_k) = \frac{1}{n} \sum_{i=1}^{n} w_{ik}. \tag{23}$$

   (c) Update the mean vectors and covariance matrices as follows:

   $$m_{uk}^{(t)} = \frac{1}{n P(M_k)} \sum_{i=1}^{n} w_{ik} u_i, \tag{24}$$

   $$m_{vk}^{(t)} = \frac{1}{n P(M_k)} \sum_{i=1}^{n} w_{ik} v_i, \tag{25}$$

   $$\Sigma_{uk}^{(t)} = \frac{1}{n P(M_k)} \sum_{i=1}^{n} w_{ik} (u_i - m_{uk}^{(t)})(u_i - m_{uk}^{(t)})^T, \tag{26}$$

   $$\Sigma_{vk}^{(t)} = \frac{1}{n P(M_k)} \sum_{i=1}^{n} w_{ik} (v_i - m_{vk}^{(t)})(v_i - m_{vk}^{(t)})^T. \tag{27}$$

Training process of CGMM
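The EM procedure above can be sketched in NumPy as follows. Several details are assumptions made for a runnable illustration and are not stated in the text: the fixed iteration count `n_iter`, the small ridge `reg` added to each covariance for numerical stability, the log-domain evaluation of Eq.(22), and all function names.

```python
import numpy as np

def log_gauss(Z, m, S):
    """Log-density of N(m, S) evaluated at every column of Z."""
    d = Z.shape[0]
    diff = Z - m[:, None]
    _, logdet = np.linalg.slogdet(S)
    maha = np.sum(diff * np.linalg.solve(S, diff), axis=0)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def weighted_cov(Z, w, m, reg):
    """Weighted covariance around m (Eqs. 26-27) plus a small ridge (assumption)."""
    diff = Z - m[:, None]
    return (w * diff) @ diff.T / max(w.sum(), 1e-12) + reg * np.eye(Z.shape[0])

def train_cgmm(U, V, K, n_iter=50, reg=1e-6, seed=0):
    """EM for the coupled mixture; U and V hold paired samples as columns."""
    rng = np.random.default_rng(seed)
    n = U.shape[1]
    # Step 1(a): K random sample pairs serve as the initial means.
    idx = rng.choice(n, size=K, replace=False)
    mu, mv = U[:, idx].copy(), V[:, idx].copy()
    # Step 1(b): hard assignment by d_ik = ||u_i - m_uk||^2 + ||v_i - m_vk||^2.
    dist = (((U[:, :, None] - mu[:, None, :]) ** 2).sum(axis=0)
            + ((V[:, :, None] - mv[:, None, :]) ** 2).sum(axis=0))
    W = np.eye(K)[dist.argmin(axis=1)]               # one-hot, n x K
    # Steps 1(c)-(d): cluster covariances and uniform priors.
    Su = [weighted_cov(U, W[:, k], mu[:, k], reg) for k in range(K)]
    Sv = [weighted_cov(V, W[:, k], mv[:, k], reg) for k in range(K)]
    prior = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # Step 2(a), Eq. (22), using the coupled likelihood of Eq. (21).
        logp = np.stack([np.log(prior[k])
                         + log_gauss(U, mu[:, k], Su[k])
                         + log_gauss(V, mv[:, k], Sv[k])
                         for k in range(K)], axis=1)
        W = np.exp(logp - logp.max(axis=1, keepdims=True))
        W /= W.sum(axis=1, keepdims=True)
        # Steps 2(b)-(c), Eqs. (23)-(27); note n * P(M_k) = sum_i w_ik.
        Nk = np.maximum(W.sum(axis=0), 1e-12)
        prior = Nk / n
        mu, mv = U @ W / Nk, V @ W / Nk
        Su = [weighted_cov(U, W[:, k], mu[:, k], reg) for k in range(K)]
        Sv = [weighted_cov(V, W[:, k], mv[:, k], reg) for k in range(K)]
    return prior, mu, mv, Su, Sv, W

# Toy usage with two well-separated groups of paired samples (assumed data).
rng = np.random.default_rng(0)
U = np.hstack([rng.standard_normal((2, 30)) - 5, rng.standard_normal((2, 30)) + 5])
V = np.hstack([rng.standard_normal((3, 30)) - 5, rng.standard_normal((3, 30)) + 5])
prior, mu, mv, Su, Sv, W = train_cgmm(U, V, K=2)
```

The only difference from a standard GMM fit is in the E-step, where the responsibility of model $k$ for a pair multiplies the densities of $u_i$ and $v_i$, so a pair is always assigned to the models as a whole.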

Figure 5. Illustration of the CGMM Inference Process (x is projected to u in U space; the per-cluster results v1, v2, v3 of models M1, M2, M3 are weighted by P(M1|u), P(M2|u), P(M3|u) and summed to give v)

Figure 6. Illustration of Patch-based Models (patch positions with their own models M1 to M9)

Figure 7. Overlapped Patch Partition and Weighted Pixel Synthesis

The CGMM is trained on $U$ and $V$ after the hidden subspaces are obtained. After the CGMM is trained, the bidirectional transforms are learned for every model, denoted as $A_1, \ldots, A_K$ and $B_1, \ldots, B_K$. A new sample $x$ is first projected to $\mathcal{U}$ to obtain $u$; then $v$ can be computed as

$$v = \sum_{k=1}^{K} P(M_k \mid u) \, (A_k u). \tag{28}$$

Here the posterior can be calculated as

$$P(M_k \mid u) = \frac{p(u \mid M_k) \, P(M_k)}{\sum_{j=1}^{K} p(u \mid M_j) \, P(M_j)}. \tag{29}$$
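A minimal sketch of the inference in Eq.(28) and Eq.(29) is given below, assuming Gaussian component densities in the hidden space as in Eq.(21). The function name `infer_v`, the two-model toy parameters, and the log-domain evaluation of the posterior are illustrative assumptions.

```python
import numpy as np

def infer_v(u, priors, means_u, covs_u, transforms):
    """Eq. (28): posterior-weighted fusion of the per-model predictions A_k u."""
    K = len(priors)
    logp = np.empty(K)
    for k in range(K):
        diff = u - means_u[k]
        _, logdet = np.linalg.slogdet(covs_u[k])
        maha = diff @ np.linalg.solve(covs_u[k], diff)
        # The (2*pi)^(d/2) factor is shared by all models and cancels in Eq. (29).
        logp[k] = np.log(priors[k]) - 0.5 * (logdet + maha)
    post = np.exp(logp - logp.max())
    post /= post.sum()                                  # Eq. (29)
    v = sum(post[k] * (transforms[k] @ u) for k in range(K))
    return v, post

# Toy two-model example; all parameters are made up for illustration.
priors = np.array([0.5, 0.5])
means_u = [np.array([0.0, 0.0]), np.array([10.0, 10.0])]
covs_u = [np.eye(2), np.eye(2)]
A_0 = np.array([[2.0, 0.0], [0.0, 1.0]])
A_1 = np.array([[0.0, 1.0], [1.0, 0.0]])

u = np.array([0.5, -0.5])                               # close to the first model
v, post = infer_v(u, priors, means_u, covs_u, [A_0, A_1])
```

Since `u` lies close to the first mean and far from the second, the posterior puts almost all weight on the first model and `v` is essentially `A_0 @ u`; for samples between clusters the prediction blends the per-model outputs.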

This inference process is illustrated in Figure 5.

4. Integrated Framework for Image Style Transform

In this section we introduce the framework integrating the coupled space learning algorithm with image analysis and synthesis. Due to the complexity and high dimensionality of images, directly modelling the whole image sample space is very difficult and inefficient. However, the images associated with one kind of visual object, such as faces, maintain a stable global structure that does not change notably when the style changes. For example, whatever style is employed to express a face, the eyes, nose and mouth remain in all the styles, and correspondence can be established between the same facial components in different image styles. Moreover, inter-pixel dependency is believed to exist mainly within a neighboring region rather than across the whole image. Based on these rationales, we can partition the images into patches and learn the dependencies within these patches. As illustrated in Figure 6, in our patch-based approach we train different models for patches in different positions. The patch-based strategy brings two merits: 1) the vector space dimension for each model is much lower, thus both robustness and efficiency are enhanced; 2) since each model focuses on a small region, more subtle details can be captured by the model.

Since patches in different positions are modelled independently, continuity at the patch boundaries cannot be guaranteed. To enhance the smoothness of the whole image and reduce the artifacts incurred by inter-patch discontinuities, we design a scheme (illustrated in Figure 7) where adjacent patches overlap and the value of each pixel is a weighted sum of the values in the synthesized patches covering that pixel. The weight of a patch on each pixel attenuates smoothly as the distance from the pixel to the center of the patch increases. Specifically, we employ an exponential function to describe the attenuation of the weights, $w(r) = \exp(-r^2/\sigma^2)$, where $r$ is the distance of a pixel to the patch center and $\sigma$ controls the speed of attenuation.

The whole procedure of image transform can be described as follows:

Step 1. For an input image, divide it into overlapped patches. The partition scheme should follow that in the training stage.
Step 2. For each patch, infer its corresponding part using the Coupled Space Learning model and the Coupled GMM as introduced in the previous sections.
Step 3. Use the weighted-sum scheme to combine all synthesized patches into the entire image.

5. Experiments

In this section, we test the framework in two applications: face super-resolution and portrait style transforms.

5.1. Face Super-resolution

Super-resolution aims to infer a high-resolution image from a given low-resolution image in order to restore the details of the image. There are mainly two families of super-resolution methods: reconstruction-based and learning-based. Recently, learning-based approaches have become more popular due to their capability of utilizing prior knowledge in the inference, which is shown to play a crucial role, especially for face super-resolution. Baker and Kanade[1] propose the Gradient Prior Prediction algorithm incorporated in an MAP framework. Liu et al.[6] develop a framework integrating a global parametric linear model and a local patch-based Markov Random Field. Wang and Tang propose to use eigentransformation for inferring high-resolution faces from low-resolution ones[12][11]. Although face super-resolution seems to be a unidirectional process, it is related to its inverse process, down-sampling, and we therefore believe that it can benefit from coupled learning.

We conduct our experiments on the FERET database[7], where 1276 images are selected for the training set and another 1272 images for the testing set. Each image is preprocessed by an affine transform to fix the positions of the eyes and the mouth center, and cropped to a size of 96 × 120 as the high-resolution image. Then each one is low-pass filtered and down-sampled to a size of 24 × 30 as the low-resolution image. Models are then trained on the 1276 pairs of images. Here, we employ the patch-based strategy: each image is divided into 11 × 11 overlapped patches; the size of the patches is 16 × 20 in the high-resolution images and 4 × 5 in the low-resolution images. In testing, a low-resolution image is input and its high-resolution counterpart is inferred by the algorithms under test.

We compare the experimental results of our framework with those produced by other state-of-the-art algorithms in Figure 8. It can be seen from the results that the quality of the high-resolution images obtained by our Coupled Space Learning (CSL) framework is better than that of the other algorithms. Moreover, the Coupled GMM further refines the details of the image and reduces the artifacts. An objective evaluation of all these algorithms is given in Table 4 in terms of mean square error (MSE) with respect to the original high-resolution image, which shows that the resultant images of Coupled Space Learning approximate the desired high-resolution images better.

Algorithm       MSE
B-Spline        0.00813
Baker           0.00290
Eigen Trans.    0.00243
C. Liu          0.00152
CSL             0.00094
CSL + CGMM      0.00047

Table 4. Comparison of reconstruction errors of different methods
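The patch layout used in this experiment, together with the weighted-sum synthesis of section 4, can be sketched as below. Several choices are assumptions rather than details given in the text: patches are taken to be 16 pixels wide and 20 pixels high on evenly spaced origins, the weighted sum is normalized by the accumulated weights, `sigma=8.0` is an arbitrary value, and the function names are hypothetical.

```python
import numpy as np

def patch_origins(length, patch, count):
    """Evenly spaced start coordinates so that `count` patches span `length`."""
    return np.linspace(0, length - patch, count).round().astype(int)

def extract_patches(img, ph, pw, rows, cols):
    """Cut an image into a rows x cols grid of overlapped ph x pw patches."""
    tops = patch_origins(img.shape[0], ph, rows)
    lefts = patch_origins(img.shape[1], pw, cols)
    return [[img[t:t + ph, l:l + pw] for l in lefts] for t in tops]

def blend_patches(patches, height, width, ph, pw, rows, cols, sigma):
    """Combine synthesized patches with weights w(r) = exp(-r^2 / sigma^2)."""
    yy, xx = np.mgrid[0:ph, 0:pw]
    cy, cx = (ph - 1) / 2.0, (pw - 1) / 2.0
    w = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / sigma ** 2)
    acc = np.zeros((height, width))
    norm = np.zeros((height, width))
    for i, top in enumerate(patch_origins(height, ph, rows)):
        for j, left in enumerate(patch_origins(width, pw, cols)):
            acc[top:top + ph, left:left + pw] += w * patches[i][j]
            norm[top:top + ph, left:left + pw] += w
    # Normalizing by the accumulated weights is an assumption of this sketch.
    return acc / norm

# High-resolution layout from the experiment: 96 x 120 image (width x height),
# 11 x 11 overlapped patches of size 16 x 20.
height, width, ph, pw, rows, cols = 120, 96, 20, 16, 11, 11
img = np.arange(height * width, dtype=float).reshape(height, width)
patches = extract_patches(img, ph, pw, rows, cols)
recon = blend_patches(patches, height, width, ph, pw, rows, cols, sigma=8.0)
```

With these sizes the patch origins step by 8 pixels horizontally and 10 pixels vertically, so neighboring patches overlap by half. Blending unmodified patches reproduces the input image, which makes a convenient sanity check before the patches are replaced by inferred ones.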

5.2. Portrait Style Transforms

The transforms between different art styles are mainly investigated in Computer Graphics. The most representative work is image analogies[4], which uses graphics techniques to filter an image so that it takes on a desired artistic effect. Tang and Wang[8] treat the task as a learning problem and derive an eigentransformation method to transform photos to sketches for recognition.

In our experiments, we apply the CSL framework to learn the transforms between two image styles for portraits. The face images in the FERET database[7] are divided into a training set with 1276 samples and a testing set with 1272 samples. The images are normalized to a size of 96 × 120 and partitioned into 11 × 11 patches of size 16 × 20. The relationship between real portraits and PosterEdge-style images and that between real portraits and halftone images are learned separately in the training stage. Figure 9 illustrates the results of forward and backward transforms between real photos and PosterEdge-style renderings, while Figure 10 illustrates the results for real photos and images with halftone effects. The results show the good performance of our framework in style-transform applications.

6. Conclusion

In this paper, we have proposed a new framework to learn the dependency between two associated vector spaces. Correlative Component Analysis and Bidirectional Transforms are integrated to learn the relation in a coupled manner. We further develop a Coupled GMM to enhance the framework's capability of modelling data under complex distributions by adapting each model to a part of the samples and fusing the results together. Experiments in face super-resolution and portrait style transforms demonstrate the effectiveness of the framework.

Acknowledgement

The work described in this paper was fully supported by grants from the Research Grant Council of the Hong Kong Special Administrative Region (4190/01E and N CUHK409/03). The work was conducted at the Chinese University of Hong Kong.

References

[1] S. Baker and T. Kanade. Limits on super-resolution and how to break them. IEEE Trans. on PAMI, pages 1167–1183, 2002.
[2] F. de la Torre and M. J. Black. Dynamic coupled component analysis. Proc. of CVPR'01, pages 643–650, 2001.
[3] W. T. Freeman and E. C. Pasztor. Learning low-level vision. Proc. of ICCV'99, pages 1182–1189, 1999.
[4] A. Hertzmann, C. E. Jacobs, N. Oliver, B. Curless, and D. H. Salesin. Image analogies. Proc. of SIGGRAPH'01, pages 687–694, 2001.
[5] R. A. Johnson and D. W. Wichern. Applied Multivariate Statistical Analysis. Pearson Education, Inc., 5th edition, 2003.
[6] C. Liu, H. Shum, and C. S. Zhang. A two-step approach to hallucinating faces: Global parametric model and local nonparametric model. Proc. of CVPR'01, pages 192–198, 2001.
[7] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss. The FERET evaluation methodology for face-recognition algorithms. IEEE Trans. on PAMI, 22(10):1090–1104, 2000.
[8] X. Tang and X. Wang. Face sketch synthesis and recognition. Proc. of ICCV'03, pages 687–694, 2003.
[9] T. Cootes, G. Edwards, and C. Taylor. Active appearance models. Proc. of ECCV'98, pages 484–498, 1998.
[10] M. Turk and A. Pentland. Face recognition using eigenfaces. Proc. of CVPR'91, pages 586–591, 1991.
[11] X. Wang and X. Tang. Face hallucination and recognition. Proc. of AVBPA'03, pages 486–494, 2003.
[12] X. Wang and X. Tang. Face hallucination by eigentransformation. IEEE Trans. on Systems, Man and Cybernetics-Part C, Special issues on Biometrics Systems, 35(3), Aug 2004.
[13] A. Webb. Statistical Pattern Recognition. John Wiley & Sons Ltd., 2nd edition, 2002.

Figure 8. Results of face hallucination (columns: Input Low-Resolution, Cubic B-Spline, Baker's Method, Eigentransform, Ce Liu's Method, CSL, CSL + CGMM, Original High-Resolution)

Figure 9. Results of bidirectional transforms between real photos and PosterEdge-Renderings (columns: Input, EigenTransform, CSL, CSL + CGMM, Groundtruth)

Figure 10. Results of bidirectional transforms between real photos and Halftone-images (columns: Input, EigenTransform, CSL, CSL + CGMM, Groundtruth)