I went to multiple spots on campus and took a series of pictures at each spot with my phone. I did my best to keep the center of projection fixed by rotating the phone about its camera. Unfortunately, I did not have access to a tripod.
Li Ka Shing
Original left image
Original right image
Bowles Hall Lounge
Original left image
Original right image
Cory Courtyard View
Original left image
Original right image
When trying to recover the homography transformation from one image to the other, I manually selected corresponding points in both images. The idea is that if H is a 3x3 matrix representing the homography, and p and p' are a pair of corresponding points in homogeneous coordinates, then Hp ≈ p'. That is, H should approximately transform points in one image to the other image. Ideally, H would transform the points exactly, but due to noise when selecting corresponding points, it is often infeasible to find an H that does this. For that reason, we prefer to select more than the minimum 4 corresponding points needed to solve for the homography. We end up with an overdetermined system that may have no exact solution, but we can find a good solution for H using least squares.
We set up a least squares equation to solve for the entries of H. Note that there are 8 unknowns because we can fix the 9th entry of H (lower-right) to be 1. Once we solve for these entries, we reconstruct H.
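A minimal sketch of this least-squares setup (assuming corresponding points are given as N×2 NumPy arrays of (x, y) coordinates; the function name is mine):

```python
import numpy as np

def compute_homography(pts1, pts2):
    """Least-squares H mapping pts1 -> pts2; pts are N x 2 arrays, N >= 4."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        # Each correspondence contributes two rows in the 8 unknowns of H.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)  # fix H[2, 2] = 1
```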
We warp the left image to the perspective of the right image by applying the homography transformation to the points in the left image. Specifically, we perform an inverse warp: for each pixel in the output, we map back through the inverse homography and interpolate pixel values from the source image.
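A sketch of the inverse warp, assuming a 3-channel image and bilinear interpolation via SciPy (the function name and output-shape convention are mine):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def inverse_warp(img, H, out_shape):
    """Warp a 3-channel img by H via inverse mapping + bilinear interpolation."""
    Hinv = np.linalg.inv(H)
    ys, xs = np.indices(out_shape[:2])
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = Hinv @ coords                      # pull each output pixel back
    src = src[:2] / src[2]                   # Cartesian (x, y) in the source
    out = np.zeros(out_shape)
    for ch in range(out_shape[2]):
        out[..., ch] = map_coordinates(
            img[..., ch], [src[1], src[0]],  # (row, col) order
            order=1, cval=0.0).reshape(out_shape[:2])
    return out
```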
Using our tool to recover the homography transformation, we can now rectify images! A rectangular object may appear as a trapezoid in an image taken from an angle. We can compute the homography that transforms the view of the object from angled (appearing trapezoidal) to straight-on (appearing rectangular). This rectifies the image.
In order to compute the homography, we must use corresponding points. We select the corners of the object in the original image and then set the corresponding points in the target image to form a rectangle. For the Ticket to Ride one below, we use [(500, 500), (2500, 500), (2500, 2500), (500, 2500)]. For Calvin and Hobbes, we use [(100, 100), (1100, 100), (1100, 1500), (100, 1500)].
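Putting the two sketches together, rectification is just a homography from the clicked corners to the chosen rectangle. The source corners below are hypothetical placeholders for the manually clicked ones, and `img` is assumed to be the loaded photo:

```python
import numpy as np

# Hypothetical clicked corners of the board in the angled photo,
# clockwise from top-left -- the real values come from manual selection.
src = np.array([(812, 640), (2710, 598), (2980, 2455), (530, 2390)])
# Target corners for the Ticket to Ride image, as listed above.
dst = np.array([(500, 500), (2500, 500), (2500, 2500), (500, 2500)])

H = compute_homography(src, dst)             # least-squares sketch above
rectified = inverse_warp(img, H, (3000, 3000, 3))
```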
Original image, taken from an angle
Original image, with corner points marked
Rectified image by computing homography
Original image, taken from an angle
Original image, with corner points marked
Rectified image by computing homography
We generated mosaics by first warping one image using a homography transformation, changing its perspective to that of the second image. We then overlay the two images, applying to the second image the same shift computed while warping the first, so the two line up on a common canvas.
We then use alpha-blending to blend the seams together. The technique we chose first extracts the region of overlap between the two images. We then create an alpha parameter that decreases linearly from 1 to 0 as we go horizontally across this region of overlap. This linearly reduces the influence of the left image and linearly increases the influence of the right image as we go across.
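A sketch of this blend, assuming both images have already been warped and placed onto same-size canvases (zeros where they have no pixels) and that their overlap spans columns x0 to x1:

```python
import numpy as np

def alpha_blend(left, right, x0, x1):
    """Cross-fade two same-size canvases whose overlap spans columns [x0, x1)."""
    alpha = np.zeros(left.shape[1])
    alpha[:x0] = 1.0                            # left image only
    alpha[x0:x1] = np.linspace(1, 0, x1 - x0)   # linear ramp across overlap
    a = alpha[None, :, None]                    # broadcast over rows/channels
    return a * left + (1 - a) * right
```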
Original left image with corresponding points marked
Original right image with corresponding points marked
Left image warped to the right image's perspective
Mosaic formed by blending
Original left image with corresponding points marked
Original right image with corresponding points marked
Left image warped to the right image's perspective
Mosaic formed by blending
Original left image with corresponding points marked
Original right image with corresponding points marked
Left image warped to the right image's perspective
Mosaic formed by blending
To automate stitching, we begin by detecting Harris corners in each image to serve as candidate interest points.
Detected Harris corners
Detected Harris corners
Detected Harris corners
Detected Harris corners
Detected Harris corners
Detected Harris corners
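A sketch of this detection step using scikit-image's Harris implementation (the `sigma` and threshold values are my own choices, not necessarily the ones used here):

```python
from skimage.color import rgb2gray
from skimage.feature import corner_harris, corner_peaks

def get_harris(img):
    """Harris response map plus (row, col) coordinates of its local maxima."""
    response = corner_harris(rgb2gray(img), sigma=1)
    coords = corner_peaks(response, min_distance=2, threshold_rel=0.01)
    return response, coords
```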
We apply adaptive non-maximal suppression (ANMS) to reduce the number of interest points, which speeds up downstream calculations like feature matching. ANMS also maintains good coverage of the image, rather than concentrating interest points in just one part, because it factors in the distance from each peak to other peaks.
Our algorithm works as follows. For each interest point, we consider all points whose corner strength, multiplied by c_robust (set to 0.9), exceeds the corner strength of the point in question. The distance to the closest of these significantly stronger points is that point's suppression radius. We then pick the top k interest points with the highest suppression radii. This essentially allows us to pick interest points that are far enough from stronger interest points.
For all of the following, we look to pick 500 interest points.
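A sketch of ANMS under these settings (`coords` holds the (row, col) interest points and `strengths` their corner responses; the O(n²) distance matrix is fine at this scale):

```python
import numpy as np

def anms(coords, strengths, k=500, c_robust=0.9):
    """Keep the k points with the largest suppression radii."""
    # Squared distances between all pairs of interest points.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    # Point j may suppress point i only if f_i < c_robust * f_j.
    suppresses = strengths[:, None] < c_robust * strengths[None, :]
    d2 = np.where(suppresses, d2, np.inf)
    radii = d2.min(axis=1)           # inf for the globally strongest point
    return coords[np.argsort(radii)[::-1][:k]]
```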
ANMS applied to detected Harris corners
ANMS applied to detected Harris corners
ANMS applied to detected Harris corners
ANMS applied to detected Harris corners
ANMS applied to detected Harris corners
ANMS applied to detected Harris corners
Around each of the interest points, we extract a 40 by 40 window. We then subsample every 5th pixel in this 40 by 40 window to end up with an 8 by 8 subsampled feature descriptor. We then bias/gain-normalize the feature descriptors.
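A sketch of the descriptor extraction (assuming a grayscale image and (row, col) interest points; skipping windows that fall off the image is my own handling):

```python
import numpy as np

def extract_descriptors(gray, coords, window=40, spacing=5):
    """8x8 bias/gain-normalized descriptors subsampled from 40x40 windows."""
    half, side = window // 2, window // spacing
    descs, kept = [], []
    for r, c in coords:
        patch = gray[r - half:r + half:spacing, c - half:c + half:spacing]
        if patch.shape != (side, side):
            continue                 # window falls outside the image
        patch = (patch - patch.mean()) / (patch.std() + 1e-8)  # bias/gain
        descs.append(patch.ravel())
        kept.append((r, c))
    return np.array(descs), np.array(kept)
```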
Feature descriptor extracted
Feature descriptor extracted
For each pair of features in the two images, we compute the error (using squared L2 distance) if we matched those features. Then, for each feature, we find its 1st and 2nd nearest neighbors (i.e., the features in the other image with the lowest error). We apply Lowe's trick, rejecting features whose ratio of 1st- to 2nd-nearest-neighbor error is above a certain threshold (we chose 0.6): what we ideally want is a feature with a much lower distance to its first neighbor than to its second, since this indicates a distinctive match. Out of the remaining features, we match each with its 1st nearest neighbor, and we only keep mutual matches.
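A sketch of this matching step (descriptors as rows of `d1` and `d2`; requires at least two features in `d2`):

```python
import numpy as np
from scipy.spatial.distance import cdist

def match_features(d1, d2, ratio=0.6):
    """Lowe's ratio test on squared L2 errors, then a mutual-match check."""
    dists = cdist(d1, d2, 'sqeuclidean')
    nn = np.argsort(dists, axis=1)    # nn[:, 0] = 1-NN, nn[:, 1] = 2-NN
    rows = np.arange(len(d1))
    keep = dists[rows, nn[:, 0]] / dists[rows, nn[:, 1]] < ratio
    back = np.argmin(dists, axis=0)   # best d1 match for each d2 feature
    return [(i, nn[i, 0]) for i in np.flatnonzero(keep) if back[nn[i, 0]] == i]
```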
Feature matches for Li Ka Shing images
Feature matches for Bowles Hall images
Feature matches for Cory Courtyard images
There may still be outliers even after matching features, applying Lowe's trick, and rejecting asymmetries. We can compute a robust homography estimate with the Random Sample Consensus (RANSAC) method. We randomly sample four matches and compute a homography estimate from them. Then, for this estimate, we count the inliers among the entire set of matches: specifically, the matches that the homography models within an error threshold of 2px. We repeat for 10,000 iterations, keeping track of the largest inlier set found, and finally re-compute the homography estimate from that largest inlier set.
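A sketch of this RANSAC loop, reusing the least-squares homography from earlier (matched points as aligned N×2 arrays):

```python
import numpy as np

def ransac_homography(pts1, pts2, iters=10000, eps=2.0):
    """Fit H to 4 random matches per iteration, keep the largest inlier set
    (reprojection error < eps pixels), then refit H on those inliers."""
    n = len(pts1)
    hom = np.hstack([pts1, np.ones((n, 1))])  # homogeneous coordinates
    best = np.arange(4)                       # fallback inlier set
    for _ in range(iters):
        idx = np.random.choice(n, 4, replace=False)
        H = compute_homography(pts1[idx], pts2[idx])
        proj = H @ hom.T
        proj = (proj[:2] / proj[2]).T         # back to Cartesian
        inliers = np.flatnonzero(np.linalg.norm(proj - pts2, axis=1) < eps)
        if len(inliers) > len(best):
            best = inliers
    return compute_homography(pts1[best], pts2[best]), best
```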
We use this homography estimate to warp the left images to the right, and then we stitch the images together. When stitching images together, we use alpha-blending where in the region of overlap, we linearly decrease the influence of the left image and linearly increase the influence of the right image as we move from left to right.
Left Li Ka Shing image warped
Right Li Ka Shing image
Warped left and regular right Li Ka Shing image
Manually-stitched Li Ka Shing mosaic from Part A
Left Bowles Hall image warped
Right Bowles Hall image
Warped left and regular right Bowles Hall image
Manually-stitched Bowles mosaic from Part A
Left Cory image warped
Right Cory image
Warped left and regular right Cory image
Manually-stitched Cory mosaic from Part A
The coolest thing I learned is RANSAC. It's incredible that a simple randomized algorithm can completely get rid of outliers (not just reduce their influence) that could otherwise drastically affect our result.