What The Hare Said To Hector

Hector: Alright, explain your game to me, Mr. Hare.

Hare: First, I will stand somewhere along the race track, but I won’t tell you where, and this wall hides me from view. Now, do you see this large contraption here?

Hector: Indeed. There is a l-meter long net suspended from a high cable that extends along the track between two poles. Attached to the pole at this end (behind the wall) is a box with a crank, a lever, an anemometer, and a graphing calculator. I surmise that the crank moves the net along the cable, and that the lever releases it.

Hare: Very astute!

Hector: And I suppose that my goal is to trap you under the net?

Hare: Without any information to guide you? No such nonsense. In fact, the game is much more subtle than that. You will indeed crank the net to a position of your choosing — let’s name the end closest to you n — and then release it. I will then reveal my position — call it h — to you, and you must wager with me on whether I am under the net.

Hector: Too easy! If n \leq h \leq n + l, then I wager that you are under the net; otherwise, that you aren’t. I can’t lose!

Hare: Ah, not so fast. You see, there are random winds along the direction of the race track in these parts, and so the net may be blown considerably off from a straight downward course.

Hector: Which, of course, I won’t be able to see, due to the wall.

Hare: Precisely.

Hector: So, we could say that the net covers not [n, n + l], but [u, u + l], where u is a n-mean Gaussian process with standard deviation \sigma.

Hare: If you like.

Hector: Hence the anemometer, from which I can estimate \sigma. I see. Thus, my task is reduced to the following: given n, l, \sigma, and h, compute the probability that u \leq h \leq u + l. From there, I can decide whether the wager has a positive expected value for me.

Hare: Well played, Hector. But there is still one outstanding consideration.

Hector: And that is?

Hare: Actually computing the probability you speak of quickly enough to make your wager! I have somewhere to be promptly after this game, and as you know, we leporids hate to be late…

And with that, the Hare races off around the wall and down the track.

Comments Off
Jan 20th, 2012

Miss Register, I Presume?

In my AIM 2010 paper, I describe how to obtain a projective transformation \mathbf{H} from the image plane to the laser plane in a line laser 3D range imaging system. With the laser oriented vertically (i.e. perpendicular to the transport direction of the object being scanned), this allows for mapping of image coordinates directly to 3D coordinates.

Specifically, the (u, v) coordinates of a point in the image map through \mathbf{H} to yield the (x, z) coordinates in the laser plane. Then, if y_\Delta is the transport direction offset between profiles, the 3D coordinates of such a point in profile i are (x, i y_\Delta, z).

There is an additional step when the angle between the laser and the vertical axis is nonzero, as in the common configuration shown below, where the camera (rather than the laser) is orthogonal to the transport surface.

Ordinary Configuration

In this case, knowing the angle \beta is the key. Assuming the transport direction is positive away from the laser side and the object runs toward the laser, a point (u, v) in the image maps to 3D coordinates (x, i y_\Delta - z\sin\beta, z\cos\beta), where x and z are found as before.

I have a feeling that it is possible to derive \beta from \mathbf{H}. More formally, given the 3 \times 3 matrix representation \mathbf{H} of a projective transformation between two planes P_1 and P_2, and supposing L is the line of intersection of P_1 and P_2, what is the angle \alpha between vectors \vec{n}_1 and \vec{n}_2 lying, respectively, in P_1 and P_2 and perpendicular to L?

Comments Off
Jun 4th, 2011

Rise of the Quaternions

Adolphus finally quit messing around and started using a quaternion representation for rotations internally! The quaternion class itself is simple, and conversion to and from rotation matrix and axis-angle representations is fairly straightforward. The magic happens in converting from Euler angles — all twelve valid conventions!

By solving the conversion to quaternion for all twelve possibilities, I managed to squeeze out a pattern that allows me to solve them with a near-minimum of redundant code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
    @staticmethod
    def from_euler(convention, angles):
        def qterm(index):
            bin = lambda x: [(x >> 2) % 2, (x >> 1) % 2, x % 2]
            return copysign(1, index) * reduce(lambda a, b: a * b,
                [(bit and sin or cos)(angles[i] / 2.0) \
                for i, bit in enumerate(bin(abs(index)))])
        # TODO: can these be generated from the convention?
        eulerquat = {'xyx': [0, -5, 1, 4, 2, 7, 3, -6],
                     'xyz': [0, -7, 1, 6, 2, -5, 3, 4],
                     'xzx': [0, -5, 1, 4, -3, 6, 2, 7],
                     'xzy': [0, 7, 1, -6, -3, 4, 2, 5],
                     'yxy': [0, -5, 2, 7, 1, 4, -3, 6],
                     'yxz': [0, 7, 2, 5, 1, -6, -3, 4],
                     'yzx': [0, -7, 3, 4, 1, 6, 2, -5],
                     'yzy': [0, -5, 3, -6, 1, 4, 2, 7],
                     'zxy': [0, -7, 2, -5, 3, 4, 1, 6],
                     'zxz': [0, -5, 2, 7, 3, -6, 1, 4],
                     'zyx': [0, 7, -3, 4, 2, 5, 1, -6],
                     'zyz': [0, -5, -3, 6, 2, 7, 1, 4]}
        a, b, c, d = [sum([qterm(eulerquat[convention][i + j * 2]) \
                      for i in range(2)]) for j in range(4)]
        if a > 0:
            return Quaternion(a, -b, -c, -d)
        else:
            return Quaternion(-a, b, c, d)

Now, those hard-coded sequences of numbers? I am sure there is some relationship between the patterns and the conventions that would allow a more concise translation from one to the other. Note that, for a sequence [a, b, c, d, e, f, g, h], any or all of the pairs (a, b), (c, d), (e, f), and (g, h) can be swapped without changing anything (for example, the sequence for zyx could just as easily be [0, 7, 4, -3, 2, 5, -6, 1]). This has the unfortunate effect of making finding a pattern more difficult.

There are a number of tantalizing clues. The first number is always 0. The second is always -5 when two of the rotations are about the same axis, -7 when they are all different and in (rotated) xyz order, and 7 when they are all different and in zyx order. The 1 is always positive and appears in the second pair when the first axis is x, the pair when y, and the fourth when z. And so on.

I wonder if anyone else has figured out something similar? Adolphus wants a 9-line conversion function for its birthday.

Comments Off
Oct 18th, 2010

Rotating Lines

Problem:

Given a line of slope m in the Euclidean plane, what is the slope m’ of the line rotated (counterclockwise) by angle θ?

Solution:

Suppose we have an equation for the line of the form y = mx + b. We can ignore b as it is unrelated to the slope (in effect, we are working in an affine space).

So, y = mx for our purposes. Every point satisfying this equation is a multiple of

\left[\begin{array}{c}1 \\ m\end{array}\right]

and, similarly, every point satisfying the equation y = m’x of the rotated line is a multiple of

\left[\begin{array}{c}1 \\ m'\end{array}\right]

Since the latter point is the image of the former after rotation by θ, the points are related by a rotation matrix, like so:

\left[\begin{array}{c}1 \\ m'\end{array}\right] = \left[\begin{array}{cc}\cos\theta & -\sin\theta \\ \sin\theta & \cos\theta\end{array}\right] \left[\begin{array}{c}1 \\ m\end{array}\right]

Solving for m’ then yields

m' = \frac{\sin\theta + m\cos\theta}{\cos\theta - m\sin\theta}

which, of course, is our solution in terms of m and θ.

Comments Off
Jun 28th, 2010
Tags:

So Close, Yet So Far Away

I humbly entreat any analytical intellects of greater constitution than my own (of which, to be sure, there is no dearth) to enlighten me in matters mathematical.

First of all, if I have a 5-dimensional space which consists of a 3-dimensional Euclidean space plus direction (defined by inclination and azimuth) — that is, the set of vectors (x, y, z, \rho, \eta) where x, y, z \in \mathbb{R}, \rho \in [0, \pi], and \eta \in [0, 2\pi) — what is that called? Of course, this generalizes to a (2p – 1)-dimensional space with a p-dimensional Euclidean spatial component. I have been calling fuzzy subsets of this space spatial-directional fuzzy sets, but spatial-directional space sounds patently ridiculous.

Second, is it possible to define a useful distance metric in such a space? Chaudhuri and Rosenfeld generalize the Hausdorff distance to arbitrary fuzzy subsets of a metric space, but this is of little use to me if my universal space is not metric. The natural way to define distance between two directions is to use the angle between the corresponding vectors, or similarly a norm on the surface of a torus. The obvious problem is that the numbers used for space and angle bear no relation to one another, so it seems nonsensical to combine them in a single metric (scaling and other such hackery need not apply). Yet, configuration spaces with similar discord among the units of their dimensions abound in engineering. Surely someone has tried to do something like this before?

Comments Off
Jun 15th, 2010
Tags: ,

The Road To Here

In my M.A.Sc. work on stereo camera network calibration, I made use of a series of graphs to describe the relationships between nodes. Existing work had already produced the vision graph and what I will call the transition graph (after Farrell and Davis‘ transition model). In both graphs, each vertex represents a node (camera or camera pair, usually) in the multi-camera network.

The Vision Graph

In a typical sensor network, the sensing modality is simple and omnidirectional, and thus adjacency is a function of proximity. A consequence of this is that the sensing graph is usually a subset of, equal to, or a superset of the communication graph. Therefore, it is usually possible to cheat by initializing the sensing graph model using either physical topology (from, say, GPS) or communication topology.

Camera networks break from this in that their sensing modality is complex and directional, so sensing adjacency has relatively little to do with communication adjacency. Devarajan and Radke realized this early on, and proposed the vision graph. An edge in the vision graph represents information about shared field of view between the two nodes represented by its vertices; there is an edge wherever there is significant overlap of the observed scene. But what does significant mean? Assuming we’re referring to a classic, unweighted graph for the moment, in order for it to be useful, significant must mean that there is sufficient overlap that we can – or at least, that it is reasonable to expect that we could – achieve whatever task is intended for the shared data.

Since every published use of the vision graph I am aware of is a calibration application, let’s talk about that. Originally, Devarajan and Radke had to specify the vision graph manually a priori in order to use it for calibration; a follow-up with Cheng proposed a method to build it by propagating digests of SIFT features from the various images to other nodes. I myself only used it in concept, as a theoretical upper limit for the final calibration graph.

Now, for a digression…

The Transition Graph

Meanwhile, folks working on surveillance and tracking applications were also using a graph formalism – well, in most cases, a reflexive binary relation on the set of nodes that translates trivially into a graph formalism. An edge in the transition graph represents a different, although related, kind of adjacency between the two nodes represented by its vertices: the possibility of a target moving from the field of view of one node to that of the other. This, of course, includes overlapping fields of view, but importantly, it also extends to the non-overlapping case.

It quickly became clear to someone that it was worthwhile to represent not just the possibility, but the probability, of a transition. This may have been an obvious consequence of some of the methods used to actually determine this graph or relation – for example, Marinakis et al. used a Monte Carlo expectation-maximization method, and Farrell and Davis used sequential Bayesian estimation, both probability-oriented. Though intended to properly capture the information presented, it turns out that weighting the graph with these probabilities allows for some good optimization when it comes time to do predictions or camera hand-offs.

The Fuzzy Vision Graph

Seeing this, we thought, what if we similarly retain the uncertainty in the vision graph (rather than thresholding it out on an task-specific basis, as in all prior work) and put it to work optimizing calibration and other applications? Probability is, in general, ill-suited to the job, because we aren’t talking about transitions here. So, we considered another representation of uncertainty: fuzzy sets. A graph is an ordered pair (V, E) of a set of vertices V and a set of edged E; a fuzzy graph (in most definitions) is simply a generalization where V and E are fuzzy sets.

One nice thing immediately apparent is that, with the inherent, well, fuzziness of fuzzy graphs, there’s no longer any need to set task-specific thresholds in advance when constructing the model; instead of summarily judging whether overlap is significant, we simply capture the degree of significance, which if we really want can be thresholded later via task-specific alpha cuts. Beyond this, a number of opportunities for automatically optimizing various tasks in more advanced ways presented themselves as I poked at the idea. (This is the time when FuzzPy was born.)

The major problem, so far, is that there is no real understanding of what the ideal fuzzy vision graph actually represents. We can build them in various ways: using feature digests like Cheng, for example. But in order to decouple the practical optimization research from the practical modeling research, we need some definition of what a perfect vision graph for a camera network actually is.

Modeling Camera Network Coverage

To approach the problem of modeling the topology of camera coverage overlap, what is really needed is a proper model of camera network coverage. Much of the existing work on this has been developed specifically for optimal camera placement algorithms; it provides some inspiration, but as these models are rather simplistic (discrete, two-dimensional, triangular in shape, etc.) they don’t provide much in the way of an ideal model. Delving deeper, some decades-old work on task-oriented camera placement by Cowan and Kovesi provides an idea for a more realistic model of camera coverage, if you invert it, but is only for single cameras and uses task-specific thresholds.

Coverage, like overlap in the vision graph, is an imprecise concept when it is not tied to a specific task. Hence, fuzzy sets once again presented themselves as a possible framework for the representation: one can assign a membership degree in the set of covered points to every point in three-dimensional Euclidean space. Like Cowan and Kovesi, I identified visibility, focus, and resolution as the major factors in single-camera coverage (in the absence of a priori scene knowledge), developed “trapezoidal” fuzzy sets for each, then combined them via algebraic product fuzzy intersection to get a complete model that, given the intrinsic parameters of a camera and a couple of intuitive application (not necessarily single-task) parameters, would tell me how well any point in its local 3-space is covered. This is called the fuzzy coverage model.

It becomes evident that a fourth factor, direction, is needed when looking at the multi-camera case, where cameras are situated in a common coordinate system based on their extrinsic parameters, and we are interested in part in their field-of-view overlap. If two cameras face the same general volume of space, but from opposite sides, is the volume in common covered by both cameras? If you’re tempted to think yes, consider a task such as feature matching. Cameras can only see opaque surfaces with normals in the interior half-space defined by a plane tangent to the point in question and perpendicular to the principal axis. Furthermore, a number of studies have shown that, in practice, feature matching performance degrades over rotation of the viewpoint. Thus, if we extend our three-dimensional concept of space to include direction as well, making the fuzzy coverage model 5-dimensional, we consider points at the same location in 3-space but with different direction not the same point at all. Some geometry and a fuzzifying parameter gives us another component fuzzy set for the single-camera model, which we can incorporate by algebraic product intersection.

Another major consideration for the multi-camera case, which various work on camera network topology and placement optimization has attempted to tackle, is occlusion from the static scene. This may be known a priori (e.g. CAD layout of a building) or estimated by the camera network itself. It is possible to do this in continuous space, like Tarabanis et al., and so ideally you’d have a complete 3D model of the unoccluded field of view for each camera, and all other points have \mu = 0. However, the complexity seems prohibitive in practice for large camera networks. If we instead move to a discrete representation of the fuzzy coverage model, we can do what I did: make the scene finite, mark a subset of its voxels opaque, then use the voxel traversal of the line from a given point to the principal point of the camera to determine whether it is occluded (i.e. if the traversal includes any opaque voxels).

The easy part is the total network coverage. For simple multi-camera networks, where one camera covering a point (remember, we account for direction implicitly) is enough, simply combine all the appropriately transformed, discretized, and occlusion-ified single-camera coverage models together via standard fuzzy union. For 3D multi-camera networks, where we want at least two cameras covering a point, first generate a coverage model for each pair of cameras via standard fuzzy intersection on the pair, then combine all these via union.

Okay, now what?

This is about where I’m at. There’s plenty more that could be done with the fuzzy coverage model itself: solving the optimal camera placement problem in new and exciting ways is one really obvious possibility that I talk about in an upcoming paper.

As far as getting back to the graph stuff, I defined an overlap metric involving the scalar cardinality, which could be used to derive the fuzzy vision graph from the discrete fuzzy coverage model; this is significant in that it indicates a direct link from the \mu values of the edges all the way back to the basic parameters of the camera network and the scene (and we know where all the simplifications are). Even better, though, might be something more analytic and general, like using something along the lines of the Hausdorff distance between two cameras’ coverage models, which might not only allow us to find the fuzzy vision graph, but also start unifying it with the non-overlapping concept of the transition graph.

Back to the future…

Comments Off
May 23rd, 2010
Tags:

Axes of Evil

A quick and dirty set of 3D axes for Visual:

from visual import arrow, cylinder
 
def visual_axes( scale ):
    """\
    Display a set of 3D axes.
 
    @param scale: The scale of the axis set.
    @type scale: C{float}
    """
    for axis in [ tuple( [ i == j and scale * 5 or 0 for i in range( 3 ) ] ) \
                  for j in range( 3 ) ]:
        arrow( pos = ( 0, 0, 0 ), axis = axis, shaftwidth = scale / 10.0 )
 
    cylinder( pos = ( ( scale * 6.0 ), -( scale / 4.0 ), 0 ),
              axis = ( -( scale / 2.0 ), ( scale / 2.0 ), 0 ),
              radius = scale / 20.0 )
    cylinder( pos = ( scale * 5.5, -( scale / 4.0 ), 0 ),
              axis = ( ( scale / 2.0 ), ( scale / 2.0 ), 0 ),
              radius = scale / 20.0 )
 
    cylinder( pos = ( 0, ( scale * 5.5 ), 0 ),
              axis = ( 0, ( scale / 4.0 ), 0 ),
              radius = scale / 20.0 )
    cylinder( pos = ( 0, ( scale * 5.75 ), 0 ),
              axis = ( -( scale * 0.17 ), ( scale / 4.0 ), ( scale * 0.17 ) ),
              radius = scale / 20.0 )
    cylinder( pos = ( 0, ( scale * 5.75 ), 0 ),
              axis = ( ( scale * 0.17 ), ( scale / 4.0 ), -( scale * 0.17 ) ),
              radius = scale / 20.0 )
 
    cylinder( pos = ( 0, -( scale / 4.0 ), ( scale * 6.0 ) ),
              axis = ( 0.0, ( scale / 2.0 ), -( scale / 2.0 ) ),
              radius = scale / 20.0 )
    cylinder( pos = ( 0, -( scale / 4.0 ), ( scale * 6.0 ) ),
              axis = ( 0.0, 0.0, -( scale / 2.0 ) ),
              radius = scale / 20.0 )
    cylinder( pos = ( 0, ( scale / 4.0 ), ( scale * 6.0 ) ),
              axis = ( 0.0, 0.0, -( scale / 2.0 ) ),
              radius = scale / 20.0 )

Yep, this is how I’m validating my geometry module by eye.

Comments Off
Mar 22nd, 2010
Tags: ,

Topology, To What End?

There have been a number of approaches to build topological models of multi-camera networks. The idea is to represent, in some useful and relatively simple mathematical way, the relationship — usually meaning the overlap — between the fields of view of the cameras. To date, it seems as though all such methods fall into two basic types.

On one hand, we have the motion-based methods, which are typically targeted at tracking applications. Ellis et al. temporally correlate objects transiting between adjacent fields of view, after establishing each camera’s entry and exit zones. Similarly, Mandel et al. correlate simultaneous motion between views to establish overlap. Detmold et al. propose the exclusion algorithm, which is basically the opposite and potentially more robust: if there is motion in one view, and not in another, then those views cannot overlap (initially all views are assumed to overlap). Farrell and Davis present a slightly different model, dubbed the transition model, which focuses even more on tracking as it expresses the probability of transitions between views; this is also determined from observations of motion.

On the other hand, we have the feature-based methods, typically used as a precursor to or substitute for multi-camera calibration. Devarajan and Radke first proposed the vision graph without specifying an automated means of obtaining it; they later approached this topic in Cheng et al. using distributed matching of SIFT features. Kurillo et al. also use common features to build their vision graph for calibration. In my ICDSC 2008 paper, the vision graph is a theoretical upper limit for the grouping and calibration graphs, which are built from 3D feature matches via registration. Kulkarni et al. use a calibration target to explicitly match spatial points, and build an actual tessellation of 3D space to represent the range and degree of overlap of the cameras’ fields of view. Finally, Lobaton et al. use scene features (in their case, bisecting lines indicating wall delineations) to construct their algebraic topological model, which is proposed as a substitute for full calibration in certain (unspecified) secondary applications.

So is generic topological information about multi-camera networks only useful for tracking and calibration, the two problem areas that appear to have spawned every topological model in the literature? Are there any other applications that simply don’t have any convenient information of their own for building such models?

Comments Off
Mar 19th, 2010
Tags:

Snap To Grid

I like Gabor Herman‘s definition of discrete Euclidean space. He defines, for any positive real number \delta and any positive integer N:

\delta \mathbb{Z}^N = \{ ( \delta x_1, \ldots, \delta x_N ) | x_n \in \mathbb{Z} \text{ for } 1 \leq n \leq N \}

This chops N-space up into a square (cubic, etc.) Bravais lattice, with primitive vectors of magnitude \delta. Each point in the discrete space is then associated with a primitive cell (or Voronoi neighbourhood, in Herman’s parlance) consisting of all points in the associated continuous space which are closer to that discrete point than any other discrete point.

This relationship to continuous space makes discretization of otherwise continuous sets — or regions, as say the GIS people — a snap (pun intended). In the simplest case, it is equivalent to basic sampling.

Voxels

In 3-space, these are essentially voxels, which are intuitive for visual thinkers like me, and allow for neat tricks from the computer graphics literature like voxel traversals.

Comments Off
Mar 17th, 2010
Tags: ,

Pi Day Madness

You have a right-handed Cartesian coordinate basis of a three-dimensional Euclidean space, with axes x, y, and z. You’re given some spatial-directional vectors of the form (x,y,z,\rho,\eta), where \rho and \eta are, respectively, the inclination angle (from the positive z-axis zenith) and the azimuth angle (measured right-handed from the positive x-axis) of an associated direction, which is unrelated to the direction of vector (x,y,z). How do you tell if the direction (\rho,\eta) of such a vector falls inside the exterior half-space defined by the plane normal to (x,y,z)?

If it helps, visualize it this way. Travel to a point in space (x,y,z). Then, find a point on the unit sphere surrounding (x,y,z) defined by (\rho,\eta) (like latitude and longitude). Now, cut all of space in half with a plane passing through (x,y,z) and perpendicular to the line between the origin and (x,y,z). Is the point on the unit sphere in the same half of space as the origin?

If x = y = 0, it’s dead simple: \rho \geq \pi/2. Doesn’t even matter what \eta is.

If y = 0 and x > 0, there’s no effect on the azimuth, so it’s still pretty straightforward: \rho \geq \pi/2 + \sin\eta\arctan(x/z). The \arctan(x/z) accounts for the tilt of the plane, and the \sin\eta accounts for the azimuth.

In the general case where x, y \not= 0, and thus the azimuth angle relative to the plane’s direction of tilt differs from the absolute azimuth angle, it gets a bit trickier. First, we need the magnitude of (x,y) for the \arctan bit:

r = \sqrt{x^2 + y^2}

Now, for z > 0, what we want is \rho \geq \pi/2 + \cos(\eta - \theta)\arctan(r/z), where \theta is the right-handed angle of (x,y) from the positive x-axis. Conceptually, subtracting \theta is like transforming the azimuth angle into a new coordinate system where the plane tilt is back in the x-z plane (as if y = 0 and x = r). We could leave it like this, and have the reader calculate \theta as:

A Giant Ugly Mess

But, with a well-known trigonometric identity we can make it a bit prettier:

\rho \geq \frac{\pi}{2} + \left(\frac{y}{r}\sin\eta + \frac{x}{r}\cos\eta\right) \arctan\left(\frac{r}{z}\right)

More than a few scrap half-pages were harmed during the making of this inequality.

Comments Off
Mar 14th, 2010
Tags: ,