Improving Color Image Segmentation by Spatial-Color Pixel Clustering
Henryk Palus and Mariusz Frackiewicz
Silesian University of Technology, ul. Akademicka 16, Gliwice, Poland

ABSTRACT
Image segmentation is one of the most difficult steps in the computer vision process. Pixel clustering is only one of many techniques used in image segmentation. This paper proposes a new segmentation technique that clusters pixels in a five-dimensional feature space built from three color components and two spatial coordinates. The advantages of taking information about the image structure into account in pixel clustering are shown. Like other segmentation techniques, the proposed 5D k-means technique requires additional postprocessing to eliminate oversegmentation. Our approach is evaluated on both simple and complex images.

Keywords: image segmentation, spatial-color pixel clustering
1. INTRODUCTION
Color image segmentation techniques play a fundamental role in many machine vision systems. Image segmentation partitions the image into homogeneous regions corresponding to objects located in the scene. The result of image segmentation is a much simpler image (a set of labeled regions), which facilitates further analysis. The regions separated during the segmentation process meet certain homogeneity criteria, which may be based on color, grey level, texture etc. The growing computational capabilities of computer equipment make it possible to use more sophisticated segmentation techniques with additional preprocessing, e.g. denoising filtering, and postprocessing, e.g. region merging. However, there is no universal technique for color image segmentation. In recent years, review works on color image segmentation techniques [1–3] have been published, as well as the first books devoted to this subject [4, 5].

Among the many image segmentation techniques we can find different pixel clustering techniques such as k-means [6], k-harmonic means [7], mean-shift [8] and others. These are pixel-based techniques: they do not use information about the structure of the processed image and are therefore called spatially blind. The dominant colors of an image naturally form clusters in the color space, and clustering techniques can be considered tools for the unsupervised classification of hundreds of thousands, and sometimes millions, of image pixels.

Further considerations are limited to the classic clustering technique named k-means (KM). KM is applied here to cluster color pixels in a three-dimensional color space such as RGB, CIELAB etc., and hence this technique is denoted 3DKM. After adding the two spatial pixel coordinates to the three color components of a pixel, we can cluster pixels in a five-dimensional space, hence this version is denoted 5DKM.
The details of such spatial-color pixel clustering are described below.
2. SPATIAL-COLOR PIXEL CLUSTERING
The KM technique is one of the oldest [6], most popular and fastest clustering techniques. It requires determining the number of clusters k and choosing their starting centers, which is an important limitation. The segmentation results of the 3DKM technique depend significantly on the positions of the starting cluster centers. These starting centers may be selected randomly from the colors occurring in the image. The pixels assigned to one cluster generally belong to different regions of the segmented image.

Further author information: Henryk Palus: E-mail: [email protected], Telephone: +48 322372744; Mariusz Frackiewicz: E-mail: [email protected], Telephone: +48 322371066

Clustering may be performed in the RGB color space using the Euclidean distance $d_{RGB}$ between two pixels:

$$d_{RGB} = \sqrt{(R_1 - R_2)^2 + (G_1 - G_2)^2 + (B_1 - B_2)^2} \tag{1}$$

where $R_1, G_1, B_1$ are the color components of a pixel with coordinates $(X_1, Y_1)$ and $R_2, G_2, B_2$ are the color components of a pixel with coordinates $(X_2, Y_2)$. Alternatively, the color components of another color space obtained by converting the RGB space may be used, for example the perceptually uniform CIELAB color space:

$$d_{LAB} = \sqrt{(L_1 - L_2)^2 + (a_1 - a_2)^2 + (b_1 - b_2)^2} \tag{2}$$

where L is the lightness and a and b are the chrominance components of the image pixel.

To take the pixel coordinates into account, a normalizing spatial weight factor SW has been introduced [9]:

$$SW = \frac{\sqrt{M \cdot N}}{m} \tag{3}$$

where $M \cdot N$ is the spatial resolution of the image and m is a parameter. KM needs an assessment of the similarity of image pixels to cluster centers. Using color components and pixel coordinates together in the similarity formula is problematic because their ranges of variability differ. The SW factor solves this problem:

$$d_{RGBXY} = \sqrt{(R_1 - R_2)^2 + (G_1 - G_2)^2 + (B_1 - B_2)^2 + \left(\frac{X_1 - X_2}{SW}\right)^2 + \left(\frac{Y_1 - Y_2}{SW}\right)^2} \tag{4}$$
After simple transformations we get:

$$d_{RGBXY} = \sqrt{(R_1 - R_2)^2 + (G_1 - G_2)^2 + (B_1 - B_2)^2 + \left(\frac{m (X_1 - X_2)}{\sqrt{M \cdot N}}\right)^2 + \left(\frac{m (Y_1 - Y_2)}{\sqrt{M \cdot N}}\right)^2} \tag{5}$$
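The transformation leading to (5) amounts to substituting the spatial weight factor from (3) into each spatial term of (4). For the X coordinate (the Y coordinate is handled identically):

$$\frac{X_1 - X_2}{SW} = \frac{X_1 - X_2}{\sqrt{M \cdot N} / m} = \frac{m (X_1 - X_2)}{\sqrt{M \cdot N}}$$

As an illustrative calculation, which is not taken from the reported experiments, for an image of 320x200 pixels and m = 20 we have $\sqrt{M \cdot N} = \sqrt{64000} \approx 252.98$ and $SW \approx 12.65$, so a displacement of about 12.65 pixels contributes to the distance as much as a difference of one level in a single color component.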
Formula (5) applies to both segmentation techniques: for m = 0 it reduces to formula (1), which is used in 3DKM. It remains an open question what value of the parameter m properly balances the color and spatial aspects.
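The spatial-color distance of formula (5) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name `d_rgbxy`, the pixel layout `(R, G, B, X, Y)` and the argument names `width` and `height` (standing for M and N) are assumptions made for this example.

```python
import math


def d_rgbxy(p1, p2, width, height, m):
    """Spatial-color distance of formula (5).

    p1, p2 -- pixels given as (R, G, B, X, Y) tuples (assumed layout)
    width, height -- image dimensions, so that M * N = width * height
    m -- spatial parameter; m = 0 gives the pure color distance (1)
    """
    scale = m / math.sqrt(width * height)  # equals 1 / SW when m > 0
    color = sum((a - b) ** 2 for a, b in zip(p1[:3], p2[:3]))
    spatial = (scale * (p1[3] - p2[3])) ** 2 + (scale * (p1[4] - p2[4])) ** 2
    return math.sqrt(color + spatial)
```

Writing the weight as m / sqrt(M·N) rather than dividing by SW avoids a division by zero for m = 0, so the same function covers both 3DKM and 5DKM.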
3. POSTPROCESSING
The use of clustering techniques in image segmentation requires region labeling in the final phase. Postprocessing is also often performed to remove oversegmentation, which may result from the segmentation technique itself, a poor selection of its parameters or noise in the segmented image. One of the most effective postprocessing methods is the removal of small regions from the image by merging them into neighboring regions. This task is simplified by the earlier labeling step, which usually generates a list of the formed regions with information about pixel membership. A single pixel can belong to only one region. The area of a region is expressed as the number of pixels making up the region. Thus, finding regions with an area smaller than some threshold A is not a difficult task.
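The small-region removal described above can be sketched as follows. The text does not specify which neighboring region receives a small region, so this example assumes the neighbor sharing the longest 4-connected border; the function name `merge_small_regions` and the list-of-rows label map are likewise illustrative assumptions.

```python
from collections import Counter


def merge_small_regions(labels, min_area):
    """Merge every region with area < min_area into a neighboring region.

    labels -- label map as a list of rows of integer region labels,
              assumed to come from an earlier region labeling step
    Assumption: the receiving neighbor is the one with the longest
    shared 4-connected border (the paper does not fix this choice).
    """
    labels = [row[:] for row in labels]  # work on a copy
    h, w = len(labels), len(labels[0])
    while True:
        areas = Counter(v for row in labels for v in row)
        small = [lab for lab, area in areas.items() if area < min_area]
        if not small or len(areas) == 1:
            return labels
        # Process the smallest region first (ties broken by label value).
        target = min(small, key=lambda lab: (areas[lab], lab))
        border = Counter()
        for y in range(h):
            for x in range(w):
                if labels[y][x] != target:
                    continue
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] != target:
                        border[labels[ny][nx]] += 1
        receiver = max(border, key=lambda lab: (border[lab], -lab))
        for row in labels:
            for x in range(w):
                if row[x] == target:
                    row[x] = receiver
```

Each pass removes exactly one region, so the loop terminates after at most as many passes as there are regions. Merging by color similarity instead of border length would be an equally plausible choice.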
4. QUALITY EVALUATION CRITERIA
The evaluation of image segmentation results lacks both commonly accepted criteria and commonly accepted procedures. Objective methods for evaluating segmentation results, described in the classical work of Zhang [10], are divided into analytical and experimental methods. However, since there is no general theory of image segmentation, the analytical methods are poorly developed. Experimental methods are dominated by two approaches. The first approach, named empirical goodness, does not require a reference segmented image; the evaluation is carried out with respect to the original image. Examples of goodness measures are the homogeneity of regions and the contrast between regions. In the second approach, a discrepancy measure is computed as a difference between the segmented image and a reference image (ground truth). The reference image is an image manually segmented by an expert. Generating such reference images is often problematic, because different people create different segmentations of the same image. The discrepancy measure may be based on the number of mis-segmented pixels, the position of mis-segmented pixels etc.

Another form of evaluation of segmentation results is a subjective assessment carried out by an expert or a group of experts. Additionally, in some cases the final quality index of a vision system can indicate the quality of segmentation, e.g. the recognition rate of an object recognition system.

Borsotti et al. [11] proposed an empirical quality function Q(I) for segmentation evaluation and applied it to clustering-based segmentation techniques:

$$Q(I) = \frac{1}{10000 (M \cdot N)} \sqrt{R} \sum_{i=1}^{R} \left[ \frac{e_i^2}{1 + \log A_i} + \left( \frac{R(A_i)}{A_i} \right)^2 \right] \tag{6}$$

where I is the segmented image, $M \cdot N$ is the spatial resolution of the image, R is the number of regions in the segmented image, $A_i$ is the area of the region with index i, $R(A_i)$ is the number of regions with area equal to $A_i$ and $e_i$ is the color error of the region with index i. The error in the RGB color space is calculated as the sum of the Euclidean distances between the color components of the region pixels and the components of the average color, which is the color attribute of this region in the segmented image. The first term in (6) is a normalization factor, the second term penalizes oversegmentation (results with too many regions), and the third term penalizes segmented images with non-homogeneous regions. Because the color error is greater for large regions, the last term is scaled by the region area. The main idea of this kind of function can be formulated as follows: the smaller the value of the evaluation function Q(I), the better the segmentation result.
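A direct evaluation of the quality function (6) can be sketched as below. The text does not state the base of the logarithm, so base 10 is assumed here and exposed as a parameter; the function name `borsotti_q` and the list-of-rows data layout are also assumptions made for this illustration.

```python
import math
from collections import Counter, defaultdict


def borsotti_q(image, labels, log=math.log10):
    """Evaluate Q(I) of formula (6); smaller values mean better results.

    image  -- rows of (R, G, B) tuples
    labels -- rows of integer region labels, same shape as image
    log    -- logarithm used in 1 + log(A_i); base 10 is an assumption
    """
    h, w = len(image), len(image[0])
    regions = defaultdict(list)
    for y in range(h):
        for x in range(w):
            regions[labels[y][x]].append(image[y][x])
    # R(A): how many regions have exactly the area A
    same_area = Counter(len(colors) for colors in regions.values())
    total = 0.0
    for colors in regions.values():
        area = len(colors)
        mean = [sum(c[i] for c in colors) / area for i in range(3)]
        # e_i: sum of Euclidean distances of region pixels to the mean color
        e = sum(math.dist(c, mean) for c in colors)
        total += e ** 2 / (1 + log(area)) + (same_area[area] / area) ** 2
    return math.sqrt(len(regions)) * total / (10000 * h * w)
```

For a uniformly colored image, splitting it into many one-pixel regions leaves every color error at zero but increases both the square-root factor and the R(A_i)/A_i term, which is how the function penalizes oversegmentation.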
5. EXPERIMENTAL TESTS
To study the proposed segmentation methods, we selected two groups of images. The first group contains relatively simple images (Fig. 1) that show uniformly colored objects on a uniform background. These images were acquired under laboratory conditions and their spatial resolution is 320x200 pixels. The second group (Fig. 2) contains complex images of natural scenes taken from the Berkeley image database [12], which is frequently used in studies on image segmentation. The spatial resolution of these images is 481x321 pixels. All tests were performed on three simple and three complex test images.
Figure 1. Simple test images: (a) Scene1, (b) Scene2, (c) Scene3.
Figure 2. Complex test images [12]: (a) #12003, (b) #124084, (c) #249061.

Figure 3. Segmentation quality vs. m parameter value: (a) simple images, (b) complex images.

Table 1. Values of criterion for simple images (k = 8)

| Technique | Scene1: A | R | e | Q(I) | Scene2: A | R | e | Q(I) | Scene3: A | R | e | Q(I) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3DKM | 0 | 4521 | 8 | 1438 | 0 | 4470 | 11 | 2419 | 0 | 439 | 7 | 2703 |
| 5DKM | 0 | 1422 | 9 | 425 | 0 | 980 | 11 | 667 | 0 | 388 | 8 | 1658 |
| 3DKM+PP | 2200 | 7 | 10 | 76 | 1200 | 12 | 12 | 98 | 1600 | 49 | 9 | 10 |
| 5DKM+PP | 2200 | 9 | 11 | 73 | 1200 | 8 | 12 | 83 | 1600 | 43 | 9 | 9 |

Both tested techniques, 3DKM and 5DKM, were initialized randomly by drawing random pixel coordinates; the color components of the drawn pixels determine the initial cluster centers. For 3DKM only the color components were used, and for 5DKM the pixel coordinates were used as well. Random initialization required a series of drawings: in each image segmentation, 10 random drawings were adopted for each cluster. The number of clusters for simple images (k = 8) was smaller than for complex images (k = 32). The 5DKM technique also required choosing the value of the parameter m. Fig. 3 presents the relationship between the value of the evaluation function Q(I), averaged over 10 random drawings, and the value of the parameter m in the range from 0 to 150. On the basis of these test results the authors proposed m = 20 for simple images and m = 140 for complex images. Both KM techniques used the same number of iterations, equal to 20. The postprocessing described in Section 3, which depends on the threshold value A (small region area), is used to improve the segmentation results. The threshold A should be selected according to the scene and cannot be greater than the area of the region corresponding to the smallest object that should be segmented. The adopted values of A are included in the tables showing the results (Table 1 and Table 2).
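The clustering procedure used in these tests (cluster centers seeded from randomly drawn pixel positions, a fixed number of 20 iterations, and m = 0 for 3DKM or m > 0 for 5DKM) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: the names `random_positions` and `kmeans_5d`, the list-of-rows image layout and the choice to keep the old center of an empty cluster are all assumptions of this example.

```python
import math
import random


def random_positions(width, height, k, rng):
    """Draw k distinct pixel positions (x, y) used to seed the clusters."""
    indices = rng.sample(range(width * height), k)
    return [(i % width, i // width) for i in indices]


def kmeans_5d(image, init_positions, m, iterations=20):
    """k-means on (R, G, B, s*X, s*Y) with s = m / sqrt(M * N).

    image -- rows of (R, G, B) tuples
    init_positions -- list of (x, y) pixels whose features seed the centers
    m -- spatial parameter; m = 0 makes this plain color clustering (3DKM)
    Returns the cluster index of every pixel as a list of rows.
    """
    height, width = len(image), len(image[0])
    scale = m / math.sqrt(width * height)
    feats = [(*image[y][x], scale * x, scale * y)
             for y in range(height) for x in range(width)]
    centers = [feats[y * width + x] for x, y in init_positions]
    labels = [0] * len(feats)
    for _ in range(iterations):
        # Assignment step: nearest center in the 5D feature space.
        for i, f in enumerate(feats):
            labels[i] = min(range(len(centers)),
                            key=lambda c: math.dist(f, centers[c]))
        # Update step: an empty cluster keeps its old center (assumption).
        for c in range(len(centers)):
            members = [feats[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return [labels[y * width:(y + 1) * width] for y in range(height)]
```

A typical call would be `kmeans_5d(image, random_positions(width, height, k, random.Random(seed)), m)`, repeated for several seeds when results are to be averaged over random drawings. The output is a map of cluster indices, so a region labeling step is still needed before small regions can be counted or merged.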
Table 2. Values of criterion for complex images (k = 32)

| Technique | #12003: A | R | e | Q(I) | #124084: A | R | e | Q(I) | #249061: A | R | e | Q(I) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3DKM | 0 | 27337 | 12 | 347432 | 0 | 15041 | 9 | 40428 | 0 | 19583 | 8 | 101584 |
| 5DKM | 0 | 8356 | 19 | 6642 | 0 | 4241 | 17 | 1571 | 0 | 6324 | 12 | 2759 |
| 3DKM+PP | 1500 | 47 | 34 | 833 | 1500 | 42 | 28 | 625 | 1500 | 27 | 20 | 590 |
| 5DKM+PP | 1500 | 32 | 39 | 1345 | 1500 | 32 | 30 | 782 | 1500 | 31 | 21 | 528 |

6. CONCLUSION
The results of the studies on the 5DKM technique have shown that oversegmentation in the segmented images is smaller than in the case of the 3DKM technique. Similarly, in the case of 5DKM the value of the evaluation function Q(I) is generally also smaller. This suggests that the simultaneous inclusion of color and spatial information in the clustering process improves the obtained segmentation results. A similar approach, which uses a five-dimensional feature space locally, is currently being developed in the form of the SLIC superpixel technique [9]. The spatial-color segmentation technique presented in this article, based on KM clustering in a five-dimensional space, can also be developed for other clustering methods, particularly those which are generalizations of KM, e.g. KHM.
ACKNOWLEDGMENTS
This work was supported by the Polish Ministry for Science and Higher Education under internal grant BK265/RAu1/2014 for the Institute of Automatic Control, Silesian University of Technology, Gliwice, Poland.
REFERENCES
[1] Vantaram, S. R. and Saber, E., “Survey of contemporary trends in color image segmentation,” Journal of Electronic Imaging 21(4), 040901–1–040901–28 (2012).
[2] Palus, H., “Color image segmentation: selected techniques,” in [Color Image Processing: Methods and Applications], Lukac, R. and Plataniotis, K., eds., 103–108, CRC Press, Boca Raton, FL, USA (2006).
[3] Cheng, H., Jiang, X., Sun, Y., and Wang, J., “Color image segmentation: advances and prospects,” Pattern Recognition 34(12), 2259–2281 (2001).
[4] Zhang, Y.-J., [Advances in Image and Video Segmentation], IRM Press, Hershey, PA, USA (2006).
[5] Ho, P.-G. P., ed., [Image Segmentation], InTech, Rijeka, Croatia (2011).
[6] MacQueen, J., “Some methods for classification and analysis of multivariate observations,” in [Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA], 281–297 (1967).
[7] Zhang, B., Hsu, M., and Dayal, U., “K-harmonic means - a data clustering algorithm,” Tech. Rep. TR HPL-1999-124, Hewlett Packard Labs, Palo Alto, CA, USA (1999).
[8] Comaniciu, D. and Meer, P., “Mean shift: A robust approach toward feature space analysis,” IEEE Trans. Pattern Anal. Mach. Intell. 24, 603–619 (May 2002).
[9] Hsu, C.-Y. and Ding, J.-J., “Efficient image segmentation algorithm using SLIC superpixels and boundary-focused region merging,” in [Proceedings of 9th International Conference on Information, Communications and Signal Processing (ICICS)], 1–5 (Tainan, Taiwan, 2013).
[10] Zhang, Y. J., “A survey on evaluation methods for image segmentation,” Pattern Recognition 29(8), 1335–1346 (1996).
[11] Borsotti, M., Campadelli, P., and Schettini, R., “Quantitative evaluation of color image segmentation results,” Pattern Recognition Letters 19(8), 741–747 (1998).
[12] Martin, D., Fowlkes, C., Tal, D., and Malik, J., “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in [Proceedings of the 8th International Conference on Computer Vision], 416–423 (Vancouver, BC, Canada, 2001).