The bacon image is a spectroscopic image of "HK American Bacon" slices. The image measures 1344x953 pixels and has 1024 channels, whose wavelengths extend from 339 nm to 1010 nm. The exact wavelengths are tabulated in the file pekoni_lambda.txt.
One layer of the spectroscopic image, corresponding to the wavelength of 501 nm, is shown in Fig 1. Spectra of some fat and some meat pixels are shown in Fig 2.

Fig 1: Bacon, 501 nm. Fig 2: The spectra.
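Picking the layer for a given wavelength amounts to finding the channel whose tabulated wavelength is closest to the target. The following is a minimal sketch, not the code used for the figures. It assumes that pekoni_lambda.txt holds whitespace-separated numbers, one per channel, and that the image cube is stored as (rows, columns, channels); the function names are hypothetical, and the demo uses an evenly spaced stand-in for the real wavelength table.

```python
import numpy as np

def load_wavelengths(path):
    """Read the channel wavelengths (nm) from a plain-text file.

    Assumption: the file contains whitespace-separated numbers only.
    """
    return np.loadtxt(path).ravel()

def nearest_channel(wavelengths, target_nm):
    """Index of the channel whose wavelength is closest to target_nm."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    return int(np.argmin(np.abs(wavelengths - target_nm)))

def extract_layer(cube, wavelengths, target_nm):
    """2-D layer of a (rows, cols, channels) cube nearest to target_nm."""
    return cube[:, :, nearest_channel(wavelengths, target_nm)]

if __name__ == "__main__":
    # Stand-in data: evenly spaced wavelengths and a tiny random cube.
    # The real wavelengths come from pekoni_lambda.txt and need not be evenly spaced.
    wavelengths = np.linspace(339.0, 1010.0, 1024)
    cube = np.random.default_rng(0).random((8, 6, 1024))
    layer = extract_layer(cube, wavelengths, 501.0)
    print(layer.shape)
```

With the real data, `load_wavelengths("pekoni_lambda.txt")` would replace the stand-in array, provided the file layout matches the assumption above.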
Then I created masks for four separate areas of the image, to show the classifier some pixels which are meat, some pixels which are fat, and the pixels whose class I want to find. The rest of the pixels are not included in the classification. The meat, fat and unknown pixels therefore form the region of interest (ROI). The masks are shown in Fig 3.
Fig 3: The 4 masks. Fig 4: 4 segmentations. Fig 5: The classifier.
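The role of a mask can be sketched as a label image that sorts ROI pixels into training spectra and unknown spectra. The label codes below (0 = outside the ROI, 1 = meat, 2 = fat, 3 = unknown) and the function name are assumptions made for illustration; the actual mask encoding is not described here.

```python
import numpy as np

# Hypothetical label codes for the mask image.
OUTSIDE, MEAT, FAT, UNKNOWN = 0, 1, 2, 3

def split_by_mask(cube, mask):
    """Split a (rows, cols, channels) cube into training and unknown spectra.

    Returns the training spectra with their labels, the unknown spectra,
    and the (row, col) positions of the unknown pixels so that predictions
    can later be written back into an image.
    """
    train_sel = (mask == MEAT) | (mask == FAT)
    unknown_sel = mask == UNKNOWN
    X_train = cube[train_sel]            # shape (n_train, channels)
    y_train = mask[train_sel]            # MEAT or FAT for each training pixel
    X_unknown = cube[unknown_sel]        # shape (n_unknown, channels)
    unknown_pos = np.argwhere(unknown_sel)
    return X_train, y_train, X_unknown, unknown_pos

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cube = rng.random((4, 5, 16))                # tiny stand-in cube
    mask = rng.integers(0, 4, size=(4, 5))       # random stand-in mask
    X_train, y_train, X_unknown, pos = split_by_mask(cube, mask)
    print(X_train.shape, X_unknown.shape, pos.shape)
```

Boolean indexing visits pixels in row-major order, so the rows of `X_unknown` line up with the rows of `unknown_pos`.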
The sizes of the ROIs in the masks are about 600x350 pixels. I extracted the spectra of all pixels of all ROIs for classification. The pixels which are already classified as fat or meat by the masks form the training sets. Each dataset contains thousands of samples of meat and fat spectra and more than 150 000 samples of unknown class, which need to be assigned to the meat or fat class. I have also constructed a small random subset of the whole training set that can be used for testing. The details of the datasets are shown in the following table:
| Type | Mask 1 | Mask 2 | Mask 3 | Mask 4 |
|---|---|---|---|---|
| Meat | 16802 | 12498 | 6673 | 15054 |
| Fat | 22848 | 7663 | 6050 | 11945 |
| Unknown | 206878 | 197652 | 173277 | 157405 |
| Total | 246528 | 217813 | 186000 | 184404 |
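Drawing the small random test subset can be sketched as sampling row indices without replacement. This is only an illustration: the subset size, the seed and the function name are hypothetical, and the text does not say whether the subset was drawn per class or over the whole training set (the sketch samples over the whole set).

```python
import numpy as np

def random_subset(X, y, n_samples, seed=0):
    """Return n_samples randomly chosen (spectrum, label) pairs without repeats.

    If fewer than n_samples rows are available, all rows are returned
    in random order.
    """
    rng = np.random.default_rng(seed)
    n = min(n_samples, len(y))
    idx = rng.choice(len(y), size=n, replace=False)
    return X[idx], y[idx]

if __name__ == "__main__":
    # Stand-in training set: 1000 spectra with 8 channels and labels 1 or 2.
    rng = np.random.default_rng(1)
    X = rng.random((1000, 8))
    y = rng.integers(1, 3, size=1000)
    X_small, y_small = random_subset(X, y, n_samples=100)
    print(X_small.shape, y_small.shape)
```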
I used the training set to teach a Naive Bayesian Classifier (NBC) to separate meat from fat for all four cases. One of the resulting classifiers and the samples in the small subset are shown in Fig 5. Each of the four classifiers used only a small subset of all available wavelengths. These wavelengths (in nm) are listed in the following table:
| NBC 1 | NBC 2 | NBC 3 | NBC 4 |
|---|---|---|---|
| 373.0 | |||
| 381.1 | |||
| 382.4 | |||
| 383.0 | |||
| 391.2 | |||
| 392.4 | |||
| 398.7 | |||
| 438.1 | |||
| 448.1 | |||
| 471.9 | 471.9 | ||
| 578.9 | |||
| 598.9 | |||
| 611.4 | |||
| 612.0 | |||
| 943.7 | |||
| 955.6 | |||
| 973.1 | |||
| 1007.5 |
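As an illustration of the classification step, the sketch below implements a naive Bayes classifier with Gaussian class-conditional densities on a handful of selected channels. The Gaussian form is an assumption, since the text does not state which density model the classifiers use, and the sketch does not cover how the wavelengths were selected. The class name, the channel indices and the synthetic spectra are all hypothetical.

```python
import numpy as np

class GaussianNaiveBayes:
    """Naive Bayes with one independent Gaussian per class and feature."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # A small floor keeps the variance strictly positive.
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        self.log_prior_ = np.log(np.array([(y == c).mean() for c in self.classes_]))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        diff = X[:, None, :] - self.mean_[None, :, :]
        # Sum of per-feature Gaussian log densities, shape (n_samples, n_classes).
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff**2 / self.var_[None]).sum(axis=2)
        return self.classes_[np.argmax(log_lik + self.log_prior_[None], axis=1)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in spectra with 1024 channels: class 1 "meat", class 2 "fat".
    meat = rng.normal(0.3, 0.05, size=(500, 1024))
    fat = rng.normal(0.6, 0.05, size=(500, 1024))
    X = np.vstack([meat, fat])
    y = np.array([1] * 500 + [2] * 500)
    channels = [50, 200, 900]    # hypothetical selected channel indices
    model = GaussianNaiveBayes().fit(X[:, channels], y)
    print((model.predict(X[:, channels]) == y).mean())
```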
When the pixels had been assigned to classes, I put them back into the original image to visually examine the quality of the classification. The result is shown in Fig 4.
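Writing the predicted classes back into image coordinates can be sketched as follows. The sketch assumes that the (row, column) position of every classified pixel was kept when the spectra were extracted; the function name and the background value are hypothetical.

```python
import numpy as np

def paint_segmentation(shape, positions, labels, background=0):
    """Build a label image from pixel positions and their predicted classes.

    shape      -- (rows, cols) of the output image
    positions  -- integer array of shape (n, 2) holding (row, col) pairs
    labels     -- array of n predicted class codes
    background -- value for pixels that were not classified
    """
    positions = np.asarray(positions)
    seg = np.full(shape, background, dtype=int)
    seg[positions[:, 0], positions[:, 1]] = labels
    return seg

if __name__ == "__main__":
    positions = np.array([[0, 0], [0, 1], [2, 3]])
    labels = np.array([1, 2, 2])     # e.g. 1 = meat, 2 = fat (assumed codes)
    print(paint_segmentation((3, 4), positions, labels))
```

The resulting label image can be displayed next to the original layer to judge the segmentation by eye.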
The next step was to test the stability of the segmentation. Can the classifier trained for the upper right corner properly classify pixels from the lower left corner, and vice versa? I tested this by choosing samples from the original training set of one area and checking whether the classifier trained for another area makes the right decision. I found out that some pixels had been assigned to the wrong class already by the mask. The usual consequence was that the classifier couldn't achieve a 100% classification rate even on its own training set. Therefore I only accepted those samples from the training set which were also classified correctly by the classifier trained for that set. The training sets were still inconveniently large, so I chose only a random subset of each training set as a test set. The results of this classifier cross validation are listed in the following table:
| Classifier | Dataset 1 | Dataset 2 | Dataset 3 | Dataset 4 |
|---|---|---|---|---|
| 1 | 100% | 98% | 98% | 99% |
| 2 | 99% | 100% | 100% | 99% |
| 3 | 99% | 100% | 100% | 99% |
| 4 | 98% | 99% | 99% | 100% |
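The cross validation procedure described above can be sketched as: filter each training set with its own classifier, draw a random test subset, then score every classifier on every test set. Because of the filtering step, the diagonal of the resulting matrix is 100% by construction. The function name, the test set size and the toy threshold classifiers in the demo are hypothetical.

```python
import numpy as np

def cross_accuracy(predictors, datasets, n_test=1000, seed=0):
    """Accuracy (in percent) of every classifier on every filtered test set.

    predictors -- list of callables mapping spectra X to predicted labels
    datasets   -- list of (X, y) training sets, one per classifier
    Row i, column j of the result is classifier i scored on dataset j.
    """
    rng = np.random.default_rng(seed)
    test_sets = []
    for predict, (X, y) in zip(predictors, datasets):
        # Keep only samples that the set's own classifier gets right.
        keep = np.flatnonzero(predict(X) == y)
        pick = rng.choice(keep, size=min(n_test, keep.size), replace=False)
        test_sets.append((X[pick], y[pick]))
    acc = np.zeros((len(predictors), len(test_sets)))
    for i, predict in enumerate(predictors):
        for j, (X, y) in enumerate(test_sets):
            acc[i, j] = 100.0 * np.mean(predict(X) == y)
    return acc

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy stand-ins: two one-feature datasets and two threshold classifiers.
    X1 = rng.normal(size=(500, 1)); y1 = (X1[:, 0] > 0.0).astype(int)
    X2 = rng.normal(size=(500, 1)); y2 = (X2[:, 0] > 0.2).astype(int)
    p1 = lambda X: (X[:, 0] > 0.0).astype(int)
    p2 = lambda X: (X[:, 0] > 0.2).astype(int)
    print(cross_accuracy([p1, p2], [(X1, y1), (X2, y2)], n_test=100))
```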
These results show that even though the classifiers use different sets of wavelengths, they are about equally effective for segmentation across all areas. The number of variables used by each classifier is at most 0.7% of all available variables (about 7 of the 1024 channels). This implies that for simple queries only a few variables are needed, and that there are several good sets of variables which can be used. The question of which variables are the best in general may be inappropriate, since the value of a given variable depends both on the query and on which other variables have been selected.