A take-home assignment project for SilverAI application. The topic is similar photo clustering.
- Import a large batch of photos (up to 10,000 photos).
- Multiple clustering approaches (with a distance threshold of 10.0).
- Display the first photo of each cluster in the view.
- Execution time results.
| Import photos | Mode selection | Execution results | Groups |
|---|---|---|---|
Known issue:
This app employs PHPickerViewController from PhotoKit as its image picker; however, a known issue has been identified: you can't import more than 503 photos at once, or else the import gets throttled (and yes, exactly >503 images). I believe it's an iOS bug, because the Freeform app has the same issue, or it could just be due to my device.
A possible solution is to switch to PHImageManager and use requestImage(for:). However, I will not implement it right now.
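For illustration only, a sketch of that alternative could look like the following. It is not implemented in the project; the loadThumbnails function, its parameters, and the assumption that the picker was configured with PHPickerConfiguration(photoLibrary: .shared()) (so results carry asset identifiers) are all hypothetical.

```swift
import Photos
import UIKit

// Illustrative alternative (not implemented here): resolve the picked asset
// identifiers and let PHImageManager deliver already-downsized thumbnails.
// Assumes photo library access has been granted.
func loadThumbnails(for identifiers: [String],
                    targetSize: CGSize = CGSize(width: 300, height: 300)) -> [UIImage] {
    let assets = PHAsset.fetchAssets(withLocalIdentifiers: identifiers, options: nil)
    let options = PHImageRequestOptions()
    options.isSynchronous = true          // blocks, so call this off the main thread
    options.deliveryMode = .highQualityFormat
    options.isNetworkAccessAllowed = true

    var thumbnails: [UIImage] = []
    assets.enumerateObjects { asset, _, _ in
        PHImageManager.default().requestImage(for: asset,
                                              targetSize: targetSize,
                                              contentMode: .aspectFill,
                                              options: options) { image, _ in
            if let image = image { thumbnails.append(image) }
        }
    }
    return thumbnails
}
```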
To optimize the loading of the image batch, we shouldn't load the highest-resolution version of each photo. To load only small thumbnails, we specify the maximum pixel size of the thumbnail in the PHPickerViewControllerDelegate using kCGImageSourceThumbnailMaxPixelSize (here, 300 pixels at most). Pass the entire downsampleOptions dictionary to CGImageSourceCreateThumbnailAtIndex and retrieve its thumbnail; proceed with the thumbnails only. Use a DispatchGroup to coordinate the asynchronous loads and get notified when every item has finished. The whole process is as follows:
```swift
let dispatchGroup = DispatchGroup()
let dispatchQueue = DispatchQueue(label: "com.verny.PhotoClustering")

// Keep only the picked items that actually contain an image.
let itemProviders = results.map { $0.itemProvider }
    .filter { $0.hasItemConformingToTypeIdentifier(UTType.image.identifier) }

itemProviders.forEach { itemProvider in
    dispatchGroup.enter()
    itemProvider.loadFileRepresentation(forTypeIdentifier: UTType.image.identifier) { url, error in
        if let error = error {
            Logger().error("itemProvider.loadFileRepresentation \(error.localizedDescription)")
        }
        guard let url = url else { dispatchGroup.leave(); return }

        // Open an image source without caching or decoding the full-size image.
        let sourceOptions = [
            kCGImageSourceShouldCache: false,
            kCGImageSourceCreateThumbnailFromImageIfAbsent: true
        ] as CFDictionary
        guard let source = CGImageSourceCreateWithURL(url as CFURL, sourceOptions) else {
            dispatchGroup.leave(); return
        }

        // Downsample to a thumbnail of at most 300 pixels on its longest side.
        let downsampleOptions = [
            kCGImageSourceCreateThumbnailFromImageAlways: true,
            kCGImageSourceCreateThumbnailWithTransform: true,
            kCGImageSourceThumbnailMaxPixelSize: 300,
        ] as CFDictionary
        guard let cgImage = CGImageSourceCreateThumbnailAtIndex(source, 0, downsampleOptions) else {
            dispatchGroup.leave(); return
        }

        // Serialize appends to the shared array on the private queue.
        dispatchQueue.sync {
            let image = UIImage(cgImage: cgImage)
            self.viewModel.images.append(image)
        }
        dispatchGroup.leave()
    }
}

// Once every item has finished loading, start processing on the main queue.
dispatchGroup.notify(queue: .main) {
    self.processing()
}
```
The similarity distance between photos is calculated with the computeDistance(_:to:) API of VNFeaturePrintObservation. Its underlying backbone is broadly similar to a combination of MobileNet and PCA, and the distance it returns is essentially the plain Euclidean distance between two feature-print vectors, given by:

$$d(u, v) = \lVert u - v \rVert = \sqrt{\sum_{i}(u_i - v_i)^2}$$
While comparing the distance between two arbitrary (non-normalized) vectors in iOS 16 is questionable, measuring the distance between two normalized vectors in iOS 17 is far more meaningful. In data analysis, a typical measure of vector similarity is the cosine distance, which is closely connected to the Euclidean distance by the following formula, where θ is the angle between the two vectors u and v:

$$\lVert u - v \rVert^2 = \lVert u \rVert^2 + \lVert v \rVert^2 - 2\,\lVert u \rVert\,\lVert v \rVert\cos\theta$$

so for normalized (unit-length) vectors the distance reduces to $\sqrt{2(1 - \cos\theta)}$.
- In iOS 16, a feature print is a non-normalized float vector of 2048 values. Typical distance values range between 0.0 and 40.0.
- In iOS 17, a feature print is a normalized float vector of length 768. Distance values range between 0.0 and 2.0 (in my testing set, the highest distance was 1.4).
Because the app must support users on iOS 16, who don't have access to the latest Vision revision, I specified an earlier revision:

```swift
visionRequest.revision = VNGenerateImageFeaturePrintRequestRevision1
```
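For reference, a minimal sketch of generating revision-1 feature prints and measuring their distance could look like this. The featurePrint(for:) and distance(between:and:) helpers are illustrative names, not part of the project:

```swift
import Vision
import CoreGraphics

// Illustrative helper: generate a revision-1 feature print for a CGImage.
func featurePrint(for cgImage: CGImage) throws -> VNFeaturePrintObservation? {
    let request = VNGenerateImageFeaturePrintRequest()
    // Pin the revision so iOS 16 and iOS 17 devices produce comparable vectors.
    request.revision = VNGenerateImageFeaturePrintRequestRevision1
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])
    return request.results?.first
}

// Illustrative helper: similarity distance between two thumbnails.
func distance(between lhs: CGImage, and rhs: CGImage) throws -> Float? {
    guard let printA = try featurePrint(for: lhs),
          let printB = try featurePrint(for: rhs) else { return nil }
    var distance: Float = 0
    try printA.computeDistance(&distance, to: printB)
    return distance
}
```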
Specified threshold:
Because we stick with revision 1 of VNFeaturePrintObservation, whose distances range from 0.0 to 40.0, we took an empirical approach to determine an adequate similarity criterion for our use case (removing duplicates or near-duplicates from a collection of photos). We combined images of very similar dog breeds with images of various cats, ducks, towns, human faces, and other dog breeds. Then we asked Vision to calculate the distance between each pair of photographs, and we repeated the experiment with other datasets (a mix of similar and distinct pictures).
Every time, we noticed that a cutoff value between 9.0 and 11.0 could separate the cluster of similar pictures from the different ones. The ideal threshold falls at around one-quarter of the maximum value (40.0), which in this case is 10.0 ± 1.0.

```swift
let threshold: Float = 10.0
```
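To reproduce that kind of experiment, a small sketch along these lines could be used; it reuses the illustrative featurePrint(for:) helper from the earlier snippet and is not the project's actual code:

```swift
// Illustrative experiment: compute every pairwise distance in a mixed batch of
// similar and dissimilar thumbnails, then inspect the spread to pick a cutoff.
func pairwiseDistances(of images: [CGImage]) throws -> [Float] {
    let prints = try images.compactMap { try featurePrint(for: $0) }
    var distances: [Float] = []
    for i in 0..<prints.count {
        for j in (i + 1)..<prints.count {
            var d: Float = 0
            try prints[i].computeDistance(&d, to: prints[j])
            distances.append(d)
        }
    }
    return distances.sorted()   // sorted so the gap around the cutoff is easy to spot
}
```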
Node Clustering, also known as Community Detection, is the process of starting with each node as its own cluster and merging clusters until a balancing point is reached.
DISCUSSION: The function's goal is to generate an undirected graph with vertices connected by weighted edges (the weight here being the similarity distance between photos). The lower the edge weight, the more similar the vertices connected by that edge. This function creates clusters for the specified graph, with each cluster containing vertices that are similar to one another.
The construction of the clusters is determined by the threshold value supplied as input. If the weight of an edge between two clusters divided by the minimum weight of the two clusters is less than or equal to the tolerance threshold, the two clusters are combined into a single cluster. By default, each vertex is a cluster with a weight of 1.
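A simplified sketch of that merging rule, using a plain edge list and a union-find structure, might look like the following. It is not the project's actual implementation, and since the rule above doesn't specify how a merged cluster's weight evolves, summing the weights is an assumption:

```swift
// Illustrative sketch: vertices are photo indices, edge weights are
// feature-print distances, every vertex starts as its own cluster with weight 1.
struct Edge { let a: Int; let b: Int; let weight: Float }

func cluster(vertexCount: Int, edges: [Edge], threshold: Float) -> [[Int]] {
    var parent = Array(0..<vertexCount)                          // union-find forest
    var clusterWeight = [Float](repeating: 1, count: vertexCount)

    func find(_ v: Int) -> Int {
        var v = v
        while parent[v] != v { v = parent[v] }
        return v
    }

    // Visit the most similar pairs first (lowest distance).
    for edge in edges.sorted(by: { $0.weight < $1.weight }) {
        let rootA = find(edge.a), rootB = find(edge.b)
        guard rootA != rootB else { continue }
        // Merge when edge weight / min cluster weight stays within the threshold.
        if edge.weight / min(clusterWeight[rootA], clusterWeight[rootB]) <= threshold {
            parent[rootB] = rootA
            clusterWeight[rootA] += clusterWeight[rootB]         // assumed weight rule
        }
    }
    // Group vertices by their final root to build the clusters.
    return Dictionary(grouping: 0..<vertexCount, by: find).map { $0.value }
}
```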
Linear Marching marches through all of the photographs, grouping similar neighboring photos together and never looking back.
IMPORTANT: This approach produces fragmented but similar clusters, which is a disadvantage, but it is offset by its optimized time complexity.
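A minimal sketch of this single-pass idea, operating on precomputed feature prints, could look like this. It is an illustration, not the project's exact code, and comparing each photo against the first photo of the current group is an assumption:

```swift
import Vision

// Illustrative single-pass ("linear marching") grouping: walk the photos in
// order, compare each one against the current group's first photo, and start a
// new group as soon as the distance exceeds the threshold. Earlier groups are
// never revisited, which is why clusters can end up fragmented.
func linearMarch(_ prints: [VNFeaturePrintObservation], threshold: Float) throws -> [[Int]] {
    guard !prints.isEmpty else { return [] }
    var groups: [[Int]] = [[0]]
    for index in 1..<prints.count {
        let representative = prints[groups[groups.count - 1][0]]
        var distance: Float = 0
        try representative.computeDistance(&distance, to: prints[index])
        if distance <= threshold {
            groups[groups.count - 1].append(index)   // similar enough: join the current group
        } else {
            groups.append([index])                   // dissimilar: start a new group
        }
    }
    return groups
}
```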
| Approach | Time complexity | Benefits | Downsides |
|---|---|---|---|
| Node Clustering | | Precise; works with all kinds of input variations. Extendable, a potential long-term solution. | Time-consuming execution. |
| Linear Marching | | Fast. | Produces fragmented (though similar) clusters; less precise. |
These tests and records were performed on my device, an iPhone XS Max.
| Number of images | Node Clustering | Linear Marching |
|---|---|---|
| 10 | 1.0418 seconds | 0.1875 seconds |
| 100 | 101.8176 seconds | 2.2491 seconds |
| 500 | Too long... | 10.2574 seconds |
I used the Stanford Dogs dataset, which contains roughly 20,000 photos classified into 120 dog breeds. There are relatively few photographs for each breed, and some are duplicated, which is a good reason to use it. The breeds I chose for the performance test include the Pekingese, Shih-Tzu, Blenheim Spaniel, and Papillon, which are all visually quite similar.
- iOS 16.0+
- Swift 5.1+
- Xcode 14.0+
Installation: This project doesn't contain any package dependencies; simply register the bundle with your developer account and build it.
PhotoClustering is available under the MIT license. See the LICENSE file for more info.