Project 4A: Image Warping and Mosaicing

Part 1: Shoot the Pictures

I went to multiple spots on campus and took a series of pictures at each spot with my phone. I did my best to fix the center of projection by rotating the phone about its camera, though unfortunately I did not have access to a tripod.

Li Ka Shing

Original left image

Original right image

Bowles Hall Lounge

Original left image

Original right image

Cory Courtyard View

Original left image

Original right image

Part 2: Recover Homographies

To recover the homography transformation from one image to the other, I manually selected corresponding points in both images. The idea is that if H is a 3x3 matrix representing the homography, and p and p' are a pair of corresponding points in homogeneous coordinates, then Hp ≈ p'. That is, H should approximately transform points in one image to the other. Ideally, H would transform the points exactly, but due to noise in selecting correspondences, it is often infeasible to find an H that does this. Because of this noise, we select more than the minimum of 4 correspondences needed to solve for the homography. We end up with an overdetermined system that may have no exact solution, but we can find a good solution for H using least squares.

We set up a least squares equation to solve for the entries of H. Note that there are 8 unknowns because we can fix the 9th entry of H (lower-right) to be 1. Once we solve for these entries, we reconstruct H.
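The least squares setup above can be sketched as follows. Each correspondence contributes two rows to the system, and fixing the lower-right entry to 1 leaves 8 unknowns. The function name is my own, not taken from the project code:

```python
import numpy as np

def compute_homography(pts_src, pts_dst):
    """Solve for the 3x3 homography H (with H[2,2] fixed to 1) via least squares.

    pts_src, pts_dst: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    Each pair (x, y) -> (x', y') yields two rows of the linear system.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts_src, pts_dst):
        # From x' = (h1 x + h2 y + h3) / (h7 x + h8 y + 1), cross-multiplied:
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```

With more than 4 correspondences, `lstsq` returns the H minimizing the squared residual over all equations.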

Part 3: Warp the Images

We warp the left image to the perspective of the right image by applying a homography transformation to the points in the left image. We specifically do an inverse warp and interpolate pixel values in the resulting image.
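A minimal sketch of the inverse warp: for each pixel in the output canvas, apply the inverse homography to find where it comes from in the source image, then sample there. For brevity this sketch uses nearest-neighbor sampling, whereas the actual implementation interpolates (e.g., bilinearly); the function name is mine:

```python
import numpy as np

def warp_image(img, H, out_shape):
    """Inverse-warp img through homography H onto an out_shape = (h, w) canvas.

    For each output pixel, map through H^{-1} to a source location and
    sample with nearest-neighbor (a stand-in for proper interpolation).
    """
    Hinv = np.linalg.inv(H)
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])  # homogeneous
    src = Hinv @ coords
    sx = src[0] / src[2]
    sy = src[1] / src[2]
    sxi = np.round(sx).astype(int)
    syi = np.round(sy).astype(int)
    # Only fill pixels whose preimage falls inside the source image.
    valid = (sxi >= 0) & (sxi < img.shape[1]) & (syi >= 0) & (syi < img.shape[0])
    out = np.zeros(tuple(out_shape) + img.shape[2:], dtype=img.dtype)
    out[ys.ravel()[valid], xs.ravel()[valid]] = img[syi[valid], sxi[valid]]
    return out
```

Working backwards from output pixels (rather than forward-mapping source pixels) guarantees every output pixel gets a value and avoids holes.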

Part 4: Rectification

Using our tool to recover the homography transformation, we can now rectify images! An object that is rectangular in nature may appear as a trapezoid in an image if the image is taken from an angle. We can compute the homography transformation to transform the perspective of the object from an angle (appearing trapezoidal) to straight on (appearing rectangular). This rectifies the image.

In order to compute the homography, we must use corresponding points. We select the corners of the object in the original image and then set the corresponding points in the target image to form a rectangle. For the Ticket to Ride example below, we use [(500, 500), (2500, 500), (2500, 2500), (500, 2500)]. For Calvin and Hobbes, we use [(100, 100), (1100, 100), (1100, 1500), (100, 1500)].

Original image, taken from an angle

Original image, with corner points marked

Rectified image by computing homography
and warping image

Original image, taken from an angle

Original image, with corner points marked

Rectified image by computing homography
and warping image

Part 5: Blending Mosaics

We generated mosaics by first warping one image using a homography transformation, changing its perspective to match that of the second image. We then overlay the two images, making sure to apply the same translation offset computed during the warp to the second image.

We then use alpha-blending to blend the seams together. Our technique first extracts the region of overlap between the two images. We then create an alpha parameter that decreases linearly from 1 to 0 as we move horizontally across this region of overlap. This linearly reduces the influence of the left image and linearly increases the influence of the right image as we go across.
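The linear alpha ramp across the overlap can be sketched as follows (grayscale for simplicity; a color image would need an extra channel axis on alpha; the function name is mine):

```python
import numpy as np

def alpha_blend(left, right, x0, x1):
    """Blend two same-size images whose overlap spans columns [x0, x1).

    Alpha falls linearly from 1 to 0 across the overlap, so the left image
    dominates at the overlap's left edge and the right image at its right edge.
    """
    alpha = np.ones(left.shape[1])
    alpha[x1:] = 0.0                              # right of overlap: right image only
    alpha[x0:x1] = np.linspace(1.0, 0.0, x1 - x0)  # linear ramp inside overlap
    alpha = alpha[None, :]                        # broadcast over rows
    return alpha * left + (1 - alpha) * right
```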

Li Ka Shing

Stitched together two images showing different views of Li Ka Shing and surrounding buildings.

Original left image with
correspondences selected

Original right image with
correspondences selected

Left image warped to the right homography

Mosaic formed by blending

Bowles Hall Lounge

Stitched together two images showing different views of the lounge at Bowles Hall (my dorm!)

Original left image with
correspondences selected

Original right image with
correspondences selected

Left image warped to the right homography

Mosaic formed by blending

Cory Courtyard

Stitched together two images from Cory Courtyard

Original left image with
correspondences selected

Original right image with
correspondences selected

Left image warped to the right homography

Mosaic formed by blending

Project 4B: Feature Matching for Autostitching

Part 1: Harris Interest Point Detector

In order to get a set of candidate interest points, we detect Harris interest points (i.e., corners) in the images. We compute the corner strength function and keep points that are local maxima within a small radius of 1 px, i.e., points whose corner strength exceeds that of all their immediate neighbors.
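One way this detection can be sketched: build the smoothed structure tensor from image gradients, take the harmonic-mean corner strength f = det(M)/trace(M), and keep local maxima in a 3x3 neighborhood. The function name, Sobel gradients, and mean-based threshold are my own choices, not necessarily those of the project code:

```python
import numpy as np
from scipy import ndimage

def harris_corners(img, sigma=1.0):
    """Return the corner-strength map f and (y, x) coordinates of peaks."""
    Ix = ndimage.sobel(img, axis=1)  # horizontal gradient
    Iy = ndimage.sobel(img, axis=0)  # vertical gradient
    # Gaussian-smoothed entries of the 2x2 structure tensor M.
    Sxx = ndimage.gaussian_filter(Ix * Ix, sigma)
    Syy = ndimage.gaussian_filter(Iy * Iy, sigma)
    Sxy = ndimage.gaussian_filter(Ix * Iy, sigma)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    f = det / (trace + 1e-10)  # harmonic-mean corner strength
    # A peak is a 3x3 local maximum (radius 1 px) above a weak global threshold.
    is_peak = (f == ndimage.maximum_filter(f, size=3)) & (f > f.mean())
    ys, xs = np.nonzero(is_peak)
    return f, np.column_stack([ys, xs])
```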

Detected Harris corners
in the left Li Ka Shing image

Detected Harris corners
in the right Li Ka Shing image

Detected Harris corners
in the left Bowles Hall image

Detected Harris corners
in the right Bowles Hall image

Detected Harris corners
in the left Cory Courtyard image

Detected Harris corners
in the right Cory Courtyard image

Part 2: Adaptive Non-Maximal Suppression

We apply adaptive non-maximal suppression (ANMS) in order to reduce the number of interest points so that we can speed up downstream calculations like feature matching. Adaptive non-maximal suppression helps us do this and also maintains good coverage of the image, rather than concentrating interest points in just one part. This is because ANMS factors in the distance from one peak to other peaks.

Our algorithm works as follows. For each interest point, we consider all points whose corner strength, multiplied by c_robust (set to 0.9), still exceeds the corner strength of the point in question. The distance from our point to the closest of these significantly stronger points is its suppression radius. We then keep the k interest points with the largest suppression radii. This essentially selects interest points that are far enough away from stronger ones.
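The suppression-radius computation above can be written in a vectorized form (function and variable names are my own):

```python
import numpy as np

def anms(points, strengths, k=500, c_robust=0.9):
    """Adaptive non-maximal suppression.

    points: (N, 2) coordinates; strengths: (N,) corner strengths.
    r_i is the distance to the nearest point j with f_i < c_robust * f_j;
    we keep the k points with the largest r_i.
    """
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    # stronger[i, j] is True when point j significantly dominates point i.
    stronger = strengths[None, :] * c_robust > strengths[:, None]
    d2 = np.where(stronger, d2, np.inf)
    radii = np.sqrt(d2.min(axis=1))  # inf for the globally strongest point
    keep = np.argsort(-radii)[:k]
    return points[keep]
```

The globally strongest point has no dominating neighbor, so its radius is infinite and it is always kept first.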

For all of the results below, we keep k = 500 interest points.

ANMS applied to detected Harris corners
in the left Li Ka Shing image

ANMS applied to detected Harris corners
in the right Li Ka Shing image

ANMS applied to detected Harris corners
in the left Bowles Hall image

ANMS applied to detected Harris corners
in the right Bowles Hall image

ANMS applied to detected Harris corners
in the left Cory Courtyard image

ANMS applied to detected Harris corners
in the right Cory Courtyard image

Part 3: Feature Descriptor Extraction

Around each of the interest points, we extract a 40 by 40 window. We then subsample every 5th pixel in this window to end up with an 8 by 8 feature descriptor, which we bias/gain-normalize (subtract the mean and divide by the standard deviation).
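The extraction step is short enough to sketch directly (the function name is mine; this sketch also omits the low-pass blur a full MOPS pipeline would apply before subsampling):

```python
import numpy as np

def extract_descriptor(img, y, x):
    """8x8 descriptor for the interest point at (y, x).

    Take the 40x40 window centered on the point, subsample every 5th
    pixel, then bias/gain-normalize (zero mean, unit standard deviation).
    Assumes the point is at least 20 px from every image border.
    """
    patch = img[y - 20:y + 20, x - 20:x + 20]
    desc = patch[::5, ::5]  # 40/5 = 8 samples per axis
    return (desc - desc.mean()) / (desc.std() + 1e-8)
```

Normalizing away bias and gain makes descriptors comparable across images with different exposure or lighting.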

Feature descriptor extracted
from Li Ka Shing left image

Feature descriptor extracted
from Bowles right image

Part 4: Feature Matching

For each pair of features across the two images, we compute the matching error as the squared L2 distance between their descriptors. Then, for each feature, we find its 1st and 2nd nearest neighbors (the features in the other image with the lowest error). We apply Lowe's trick, rejecting features whose ratio of 1st-nearest-neighbor error to 2nd-nearest-neighbor error is above a threshold (we chose 0.6) -- what we ideally want is a feature with a much lower distance to its first neighbor than its second, since this indicates an unambiguous match. We match the remaining features to their 1st nearest neighbors, and we only keep mutual matches.
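The ratio test plus mutual-consistency check can be sketched as follows (function name mine; descriptors are assumed flattened to vectors):

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.6):
    """Match rows of desc1 (N, D) to rows of desc2 (M, D), M >= 2.

    Keep a match only if it passes Lowe's ratio test AND the two
    features are each other's nearest neighbors (mutual match).
    """
    # Pairwise squared L2 distances, shape (N, M).
    d2 = ((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(-1)
    nn12 = np.argsort(d2, axis=1)  # neighbors in image 2, nearest first
    nn21 = np.argmin(d2, axis=0)   # best match back into image 1
    matches = []
    for i in range(len(desc1)):
        j1, j2 = nn12[i, 0], nn12[i, 1]
        if d2[i, j1] < ratio * d2[i, j2] and nn21[j1] == i:
            matches.append((i, j1))
    return matches
```

Comparing `d2[i, j1] < ratio * d2[i, j2]` avoids a division, which also sidesteps a zero second-neighbor distance.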

Feature matches for Li Ka Shing images

Feature matches for Bowles Hall images

Feature matches for Cory Courtyard images

Part 5: RANSAC

There may still be outliers even after matching features, applying Lowe's trick, and rejecting asymmetries. We can compute a robust homography estimate with the Random Sample Consensus (RANSAC) method. We randomly sample four matches and compute a homography estimate from them. Then, for this estimate, we count the number of inliers among the entire set of matches -- specifically, matches that are well modeled by the estimate, with reprojection error within a 2 px threshold. We repeat for 10,000 iterations and keep track of the largest set of inliers found. We then re-compute the homography estimate from this largest inlier set.

We use this homography estimate to warp the left images to the right, and then we stitch the images together. When stitching images together, we use alpha-blending where in the region of overlap, we linearly decrease the influence of the left image and linearly increase the influence of the right image as we move from left to right.

Left Li Ka Shing image warped
using homography computed by RANSAC

Right Li Ka Shing image

Warped left and regular right Li Ka Shing image
stitched together and blended with alpha-blending

Manually-stitched Li Ka Shing mosaic from Part A

Left Bowles Hall image warped
using homography computed by RANSAC

Right Bowles Hall image

Warped left and regular right Bowles Hall image
stitched together and blended with alpha-blending

Manually-stitched Bowles mosaic from Part A

Left Cory image warped
using homography computed by RANSAC

Right Cory image

Warped left and regular right Cory image
stitched together and blended with alpha-blending

Manually-stitched Cory mosaic from Part A

Coolest Thing Learned

The coolest thing I learned is RANSAC. It's so incredible that a simple randomized algorithm is able to completely get rid of outliers (not just reduce the influence of outliers) that can drastically affect our result.