Monday, May 10, 2010

Easy way to create artificial anaglyphic images.


Flat Perspective Projection - Summary

When projecting something 3d onto two dimensions, my algorithm is simple: I treat the computer monitor as a window through which the viewer looks at the object behind it. Projecting onto 2D means actual depth isn't observed, so all we need to know is where the object would appear on the window. That is the point where the line of sight between the viewer's eye and the object intersects the monitor.

Here is some code in no particular language. For simplicity's sake we'll say that (x, y, z) values are in inches, (0, 0, 0) is the center of the screen, and z is depth (positive behind the screen). vd is the viewing distance in inches. w and h are the graphics window's width and height in pixels. dpi is discussed later in the essay.

r = vd/(object.z+vd)
x = w/2 + object.x*r*dpi
y = h/2 + object.y*r*dpi

(Don't forget to scale your object's apparent size by r too. Since you're dealing with 3d, chances are this isn't a sprite, so I don't know what format it's in and can't include code to do it.)

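Here's an example where (x, y, z) is a voxel position in pixel-sized units and (0, 0, 0) is the top left corner of the screen. Because everything is in pixels, the viewing distance has to be converted to pixels with dpi in both places it appears:

r = vd*dpi/(object.z+vd*dpi)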
x = w/2+(object.x-w/2)*r
y = h/2+(object.y-h/2)*r

Now that I think about it, though, I suppose nobody but me would put (0, 0, 0) anywhere other than directly forward. (I did because my world was a 1000x1000x1000 box.)

Flat Perspective Projection - Theory


The orange dot is the object we're looking at. The green line is the line of sight to that object. D is the line of sight if the viewer were looking directly forward. A lies at 50% of the distance between the viewer and the depth of the object (meaning strictly the z value, or in other words distance along D), and its length is what the horizontal displacement between the object and D would be if the screen were at that depth. It should be obvious that the length of A is half the length of C (imagine the green line as a graph with D as the x axis: it's a straight line through the viewer's eye, so rise is directly proportional to run).

The length of C is obviously 100% of the horizontal displacement between the object and D, and our distance along D at that point is 100% of the depth of the object.

Similarly, if B lies at 78% of the distance from the viewer to the object, then the length of B will be 78% of C. Since B is where our screen is, we will obviously draw our object {length of B} away from the center of the screen, horizontally.
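The same proportion written out, with $v_d$ for the viewing distance and $z$ for the object's depth behind the screen (so the object is $v_d + z$ from the eye along D). The numbers in the second line are a made-up example where the screen sits exactly halfway, as at A:

$$
\frac{B}{C} = \frac{v_d}{v_d + z} = r \qquad\Longrightarrow\qquad B = r\,C
$$

$$
v_d = 31\ \text{in},\; z = 31\ \text{in},\; C = 4\ \text{in}
\;\Rightarrow\; r = \frac{31}{62} = 0.5,\quad B = 2\ \text{in} = 2 \times 96 = 192\ \text{px at 96 DPI}
$$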

Vertically it's the same principle. Just imagine I had illustrated the whole thing from the side instead of from the top. y would be, in this case, 78% of the object's vertical displacement.

Of course, when we talk about the distance between the viewer and the screen, we're talking about a physical measure. In order to use this value in the pixel realm, we have to know the DPI of the screen. Sort of. This isn't really necessary if we don't care how large the picture appears to us or whether the perspective is completely accurate. Overall, a viewing distance scaled by the wrong factor will simply result in a world that's smaller or larger than it should be.

We also, of course, have to make an assumption about the viewer's distance from the screen. Average distance is supposedly about one and a third arm's lengths. My measured distance is 31 inches (which is exactly my "arm's length" too), so I use that value.

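On Windows, DPI can be acquired from the following registry key: HKEY_CURRENT_USER\Control Panel\Desktop\WindowMetrics\AppliedDPI. Keep in mind that this is the logical DPI from the display-scaling setting (usually 96, 120 or 144), which isn't necessarily the monitor's physical pixel density.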

You could also calculate DPI by dividing the X resolution of the current screen mode by the screen's physical width in inches (or use the X resolution of the monitor if that's all you have; I'm assuming most people use max. resolution these days). If all you have is the diagonal measurement (for example, a '21" monitor' means the screen is 21" diagonally) and an aspect ratio (some are 4:3, some are 16:9, and many are neither), then you could use X_resolution/(diagonal_length*sin(atan(aspect_ratio))). This works because the diagonal makes an angle of atan(aspect_ratio) with the screen's vertical edge, so diagonal_length*sin(atan(aspect_ratio)) is the screen's width. If the aspect ratio is 16:9, aspect_ratio would be 16/9. atan is arctangent -- not the same as tan. But since you probably don't have all that information, a safe bet is that the monitor has 96 DPI.

If you're not using the imperial system, you can multiply DPI by about 39 to get dots per meter. (I'm not bothering with more precision than that because we don't actually know our viewer's distance anyway.) If you used one of the above formulas to get DPI, plugging in meters instead of inches will naturally give you dots per meter instead of dots per inch.
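Here are the DPI formulas and the meter conversion above as a small Python sketch. The function names are my own, and the 21" 16:9 monitor running 1920 pixels across is just an example, not a measured screen.

```python
import math


def dpi_from_width(x_resolution, screen_width_in):
    """DPI when you know the screen's physical width in inches."""
    return x_resolution / screen_width_in


def dpi_from_diagonal(x_resolution, diagonal_in, aspect_ratio):
    """DPI from the diagonal size in inches and the aspect ratio (width / height).

    The diagonal makes an angle of atan(aspect_ratio) with the screen's
    vertical edge, so the width is diagonal * sin(atan(aspect_ratio)).
    """
    width_in = diagonal_in * math.sin(math.atan(aspect_ratio))
    return dpi_from_width(x_resolution, width_in)


def dots_per_meter(dpi):
    """Convert dots per inch to dots per meter (1 m is about 39.37 in)."""
    return dpi * 39.37


# A 21" 16:9 monitor running 1920 pixels across:
print(round(dpi_from_diagonal(1920, 21, 16 / 9), 1))  # 104.9
```

A quick sanity check of the geometry: a 20" 4:3 screen comes out 16" wide, since sin(atan(4/3)) = 0.8.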

So let's say we have our object's (x, y, z) position, where z is depth and z=0 is where the screen is.

1. Take the object's total depth from the viewer's eyes, i.e. its distance along D (not the straight-line distance). If your x, y, z are in physical units, that's simply z + viewer distance. If they're in pixel-sized units, then you'll want to scale your viewing distance by DPI. Let's call the total depth dzt.
2. Take the ratio of viewing distance over dzt. Call it r.
3. Take your distance from x to the center of the screen along the x axis. This is simply x if x=0 in your world is directly ahead of the viewer. Otherwise, just add the distance you need to x so that x=0 is straight ahead. Call this dx.
4. Take x resolution / 2 (mid-point) and add to it dx*r. This is the object's x position on the screen.
5. Do steps 3-4 again, substituting x with y for the y position.
6. Scale the object to original size * r.

Obviously wherever we add or multiply physical sizes by pixels or some other scale we have to convert to the same measure, and then if we're left with a physical size in our result (or some other scale) we have to convert back to pixels before we display it.
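Steps 1-6 can be sketched in Python roughly like this. It assumes world coordinates are either in inches or in pixel-sized units, with the straight-ahead point passed in explicitly. The parameter names and the 800x600 window are my own choices; the 31" and 96 DPI defaults come from the text. Following the formulas above, y is added to the screen's mid-point, and screen y grows downward on most displays, so flip its sign if your world's y points up.

```python
def project(x, y, z, size, *, viewer_distance_in=31.0, dpi=96.0,
            screen_w_px=800, screen_h_px=600, units="inches",
            straight_ahead=(0.0, 0.0)):
    """Flat perspective projection following steps 1-6.

    x, y, z and size are in `units` ("inches" or "pixels"). z is depth:
    0 at the screen, positive behind it. `straight_ahead` is the world
    (x, y) directly in front of the viewer. Returns (screen_x, screen_y,
    apparent_size), all in pixels.
    """
    if units == "inches":
        vd = viewer_distance_in          # already the same unit as z
        to_px = dpi                      # convert inches to pixels at the end
    elif units == "pixels":
        vd = viewer_distance_in * dpi    # step 1: scale viewing distance by DPI
        to_px = 1.0
    else:
        raise ValueError("units must be 'inches' or 'pixels'")

    dzt = z + vd                         # step 1: depth from the viewer's eyes
    if dzt <= 0:
        raise ValueError("object is level with or behind the viewer's eyes")
    r = vd / dzt                         # step 2
    dx = x - straight_ahead[0]           # step 3
    dy = y - straight_ahead[1]           # step 5: same as step 3, for y
    screen_x = screen_w_px / 2 + dx * r * to_px   # step 4
    screen_y = screen_h_px / 2 + dy * r * to_px   # step 5
    apparent_size = size * r * to_px              # step 6
    return screen_x, screen_y, apparent_size


# An object 4" right of center and 31" behind the screen is drawn half as far
# from the center (r = 0.5): 2" = 192 px at 96 DPI.
print(project(4, 0, 31, 1))  # (592.0, 300.0, 48.0)
```

For a voxel world like the one in the summary, pass `units="pixels"` and set `straight_ahead` to the world point that sits in front of the viewer.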

You may have noticed that, in the diagram, the line of sight is not actually pointing to one of his eyes; it stops in between. I was just starting off with a non-anaglyphic image for the basics.

First, in case you don't know how anaglyphs work, I'll give a brief explanation. If you already know how they work, skip to "Anaglyphic Perspective Projection" ahead.

But before that, a brief explanation of depth perception. If you already understand it then you can skip to "Anaglyphs - Theory."

Depth Perception

Humans don't really see in 3d. We see in 2d. If we saw in 3d, we would see the complete insides of everything, similarly to how you can see everything inside and outside of a shape when you look at a 2d image, without any parts occluded. Instead, we only see the surfaces of things when looking at the 3d world. If we didn't see in 2d, then looking at 2-dimensional images of 3d objects on a screen wouldn't make much sense to us: it would be analogous to trying to infer an object from a line of pixels. But instead, the experience of looking at a 2d photograph is largely realistic.

Although we see in 2d, depth perception works for two main reasons: 1) apparent size is inversely proportional to distance, and 2) we have two eyes. How our brain processes the binocular vision is similar to how surveyors and astronomers use parallax. Say you're looking at an object exactly one foot away:




Your eyes cross just enough that they're both pointing to the exact same spot on the object, where your center of vision is. Notice there's a red dot halfway between the viewer and the object. Note also that, relative to the left eye's line of sight, the red dot is slightly to the right. Relative to the right eye's line of sight, the red dot is slightly to the left. So from the observer's point of view, the left eye will see this:



And the right eye will see this:



The brain recognizes that the left and right eyes are seeing the same object at two different apparent positions, and based on how far apart those positions are, it determines the object's distance from you. This of course works not only with whole objects but (more so) with the individual features of an object. I say "more so" because when the apparent images are too far displaced they just appear as double vision, and distinct objects are more likely to be too different in depth for true stereopsis. I've personally experienced complete stereopsis over the entire range of depth (coinciding with a fixation span covering the full field of view), but I'm not sure why and that's another issue entirely. Perhaps, once upon a time, this was a normal thing for humans.

Anaglyphs - Theory

Thus in order to have the 3d sensation, your two eyes need to be able to see two different things. (Yes, even when looking at a monitor the eyes will see slightly different things at points on the monitor besides the one you're staring at because the eyes are looking at the monitor's 3d shape from two different angles. But the point of 3d vision is to give the eyes two distinct sensations that are actually based on the depths of the original objects.)

This is where 3D glasses come in. Even if you had a display that could somehow direct one image to one eye and another image to the other eye, it would work only if it knew exactly where your eyes were. That would be some pretty expensive technology in any case. Holograms (the baseball card kind, not the Star Trek kind) are of course an exception to this, but nobody knows how to animate a hologram from two video signals, or if somebody does then it's still not easy.

With photos or cinematography of real-life things, it's pretty easy to generate the two separate views: just record or photograph the scene exactly like the eyes would, that is, with two separate cameras about 2.5 inches apart. Show the image captured by the left camera to the left eye, and show the image captured by the right camera to the right eye.

Sometimes, you might want to take pictures with cameras further away from each other. This makes the object/expanse seem smaller than it really is, but renders greater depth information that can be used for analysis purposes or to allow us to see stereoptically over distances we normally can't (like a landscape) because our eyes are just too close together in comparison to the distance.

Anaglyphs - Types

There are a few different kinds of 3d glasses, but they all come down to the same principle: the display, in some way or another, shows both images, and it's the role of the glasses to filter the results so that the left eye sees only the left-eye component and the right eye sees only the right-eye component. Anaglyphs are a really crude method of doing this, but probably also the cheapest and the only way that works on normal media such as paper or a computer monitor (stereograms, stereoscopy or ChromoDepth notwithstanding).

Red/cyan anaglyph glasses filter the image so that the left eye sees only the red component of the image and the right eye sees only the blue and green components. Thus, to produce a red/cyan anaglyph, you take only the red channel of the left-eye image and add to it only the blue and green channels of the right-eye image. This, of course, immediately results in an insane psychalgia of binocular color rivalry, but it does work.
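A minimal sketch of that channel mixing in Python, assuming each image is a plain list of rows of (r, g, b) tuples of the same size. A real program would do the same thing with whatever image library it already uses.

```python
def red_cyan_anaglyph(left, right):
    """Merge left- and right-eye images into one red/cyan anaglyph.

    Both images are lists of rows of (r, g, b) tuples with the same size.
    The red channel comes from the left image; green and blue come from
    the right image.
    """
    if len(left) != len(right) or any(len(a) != len(b) for a, b in zip(left, right)):
        raise ValueError("left and right images must be the same size")
    return [
        [(lp[0], rp[1], rp[2]) for lp, rp in zip(left_row, right_row)]
        for left_row, right_row in zip(left, right)
    ]


left = [[(200, 10, 20), (0, 0, 0)]]
right = [[(30, 40, 50), (255, 255, 255)]]
print(red_cyan_anaglyph(left, right))  # [[(200, 40, 50), (0, 255, 255)]]
```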

Red/cyan (aka IYF) isn't the only option for anaglyphs. Other flavors of anaglyph are:
  • Red/green: this completely excludes blue, so you can have red, green, black, yellow, reddish yellows and greenish yellows, but nothing else. Not very good for skin tones, unless, perhaps, you're a Simpsons character. Works better with printing (as in children's books and comics) than computer monitors because the green of computer monitors leaks through the red filter.

  • Green/magenta (aka TrioScopic): should be about as effective as red/cyan. I don't know which is better, but since our eyes are most sensitive to green, green should be a better channel than red to single out against a combination of the other two channels, as far as lightness rivalry goes. Red is also the worst color when it comes to definition -- especially when using paper glasses -- so it may not be the best color to restrict a whole eye to.

  • Red/blue: completely excludes green. Not as good as red/cyan.

  • Blue/yellow: yellow is red+green, so this covers the whole gamut, like magenta/green and red/cyan do. It's somewhat newer and might be better than red/cyan, but is apparently less popular than amber/blue. (There may not even be a blue/yellow, except as another name for amber/blue. I'm not sure yet.)

  • Amber/dark blue (aka ColorCode 3D): Full-color and shows red better than red/cyan, but can only be viewed well in a dark room, and apparently some skill on the part of the anaglyph maker is required to make blue and yellow show up. One person claimed that green/magenta is better than amber/blue because not much detail is carried in the blue channel on most video systems. One of the benefits of amber/blue is that we're least sensitive to blue light, so an amber/blue anaglyph appears more like a normal picture when viewed without 3d glasses. The blue filter passes 450-nm wavelengths, and the amber filter passes wavelengths ≥ 500 nm, which is red, green and a little bit of blue, so calculating left/right colors for ColorCode isn't as simple as just separating sRGB channels.

  • Red/clear: Not really popularized. See an example at http://abdownload.free.fr/Anaglyphs/13/13987_Body.html

  • Dark red/cyan+a little red (aka Anachrome): Like red/cyan, but the red is darker and the cyan lets some of the red through. Shows red better than red/cyan, but the image is darker and there's more lightness rivalry. This is a little-known format.

Red/cyan seems to be by far the most popular nowadays. One caveat regarding red/cyan is that, because of the eye's chromatic aberration, red and cyan light don't come into focus at quite the same distance, so the eye doesn't focus the red perfectly, giving the left eye less detail. Only the non-paper (plastic) variety of red/cyan glasses, which you have to pay for, puts a slight corrective lens on the red side to fix this problem. Also, red hues show up really badly with red/cyan.

In case I didn't make it clear, the method used to create the anaglyphic image must correspond with the colors used in the glasses, so the crux here is that the user won't be able to appreciate your production if they don't have the kind of glasses your application requires. (Some cases might be "good enough," such as viewing a red/blue anaglyph using red/cyan glasses and vice versa, but some mismatches won't work at all.)

As of this writing, 3D HDTVs (using shutter glasses) are just coming out. I hear that some graphics cards have HDMI output, so it might not be too difficult to use a 3D HDTV as a computer monitor. However, HDTV-3D viewing requires twice as many frames per second (120), and I'm not sure if current HDMI-capable graphics cards support that. I'm not even sure HDMI supports that. But even if not, it may become a possibility in the near future, and this would be a *much* better alternative than using anaglyphs -- except that you *may* need either total control of the device (such as in full-screen mode), or for the OS to be 3D-aware.

Anaglyphic Perspective Projection

In a nutshell, it's the algorithm I explained in "Flat Perspective Projection," except that you create two images. In one, D is 1.25" left of the center of the screen, and in the other D is 1.25" right of the center. This is because human eyes are, on average, spaced 2.5" apart (though yes, it does vary). The fact that the user's eyes won't necessarily be positioned exactly along the lines of Dleft and Dright doesn't really matter...much. For a red/cyan anaglyph, remove the blue and green channels from the left image and remove the red channel from the right image, and combine the results. (Generally red/cyan glasses have red on the left. That's not because the original patent had red on the left, but because it didn't. :P)



Just apply the logic given for a flat projection to the above model, once for each blue line. That's all there is to it. The formula automatically accounts for things with a negative depth, too -- i.e., things that pop out in front of the screen.
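Here is a minimal Python sketch of the two-eye projection. It assumes inches with (0, 0, 0) at the center of the screen and z positive behind it, and uses the 31" viewing distance, 2.5" eye spacing and 96 DPI from the text as defaults; the function and parameter names are my own. Each eye is the flat projection again, just measured from that eye's D instead of from the screen center.

```python
def project_stereo(x, y, z, *, viewer_distance=31.0, eye_separation=2.5,
                   dpi=96.0, screen_w_px=800, screen_h_px=600):
    """Project one point once for each eye.

    Coordinates are in inches with (0, 0, 0) at the center of the screen and
    z positive behind it. Each eye's straight-ahead line D sits half the eye
    separation left or right of center. Returns ((left_x, left_y),
    (right_x, right_y)) in pixels.
    """
    dzt = z + viewer_distance
    if dzt <= 0:
        raise ValueError("object is level with or behind the viewer's eyes")
    r = viewer_distance / dzt
    points = []
    for eye_x in (-eye_separation / 2, eye_separation / 2):  # left, then right
        # Same flat projection, but measured from this eye's D instead of center.
        x_in = eye_x + (x - eye_x) * r
        y_in = y * r  # the eyes are level, so y is the same for both
        points.append((screen_w_px / 2 + x_in * dpi,
                       screen_h_px / 2 + y_in * dpi))
    return tuple(points)


# Behind the screen: uncrossed (left image on the left).
print(project_stereo(0, 0, 31))     # ((340.0, 300.0), (460.0, 300.0))
# In front of the screen: crossed (left image on the right).
print(project_stereo(0, 0, -15.5))  # ((520.0, 300.0), (280.0, 300.0))
```

A point on the screen plane (z = 0) lands on the same pixel in both images, which is why objects at the screen's depth look "flat" on it.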

For any of the other types of anaglyph glasses, it's pretty much the same idea: just restrict each component image to whatever channels the filter for that eye passes. So for green/magenta, for example, restrict the left image to the green channel and the right image to the red and blue channels, since red+blue = magenta. An exception is amber/blue, for which the formula is this: restrict the left image to red, and for the right image use no red or green, but make the blue channel red*15%+green*15%+blue*7% (taken from the right image's channels). Why? I dunno. Also, an alternative way to color for red/cyan glasses is that, for the red, instead of just taking the red channel, you use the overall lightness of all three channels (as if desaturated to greyscale). I don't know what difference it makes.
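The channel-restriction idea for a few of the glasses types listed earlier can be sketched as a table of on/off channel masks. This is a sketch under assumptions: the mask table and function names are mine, ColorCode (amber/blue) is deliberately left out because it uses its own formula, and the Rec. 601 luma weights for the "overall lightness" variant are my choice since the text doesn't say how to compute lightness. I've also allowed that variant for every type, which goes beyond the red/cyan case described.

```python
# Channels each eye's filter passes, as (R, G, B) on/off masks: (left, right).
# ColorCode (amber/blue) is left out because it needs its own formula.
FILTERS = {
    "red/cyan": ((1, 0, 0), (0, 1, 1)),
    "green/magenta": ((0, 1, 0), (1, 0, 1)),
    "red/blue": ((1, 0, 0), (0, 0, 1)),
    "red/green": ((1, 0, 0), (0, 1, 0)),
}


def anaglyph_pixel(left_px, right_px, glasses="red/cyan", grey_left=False):
    """Combine one left-eye and one right-eye (r, g, b) pixel."""
    left_mask, right_mask = FILTERS[glasses]
    if grey_left:
        # Use the left pixel's overall lightness instead of its own channels.
        # Rec. 601 luma weights are an assumption; the text doesn't pick any.
        lum = round(0.299 * left_px[0] + 0.587 * left_px[1] + 0.114 * left_px[2])
        left_px = (lum, lum, lum)
    # The masks never overlap, so each output channel comes from one eye (or is 0).
    return tuple(lv * lm + rv * rm
                 for lv, rv, lm, rm in zip(left_px, right_px, left_mask, right_mask))


def anaglyph(left, right, glasses="red/cyan", grey_left=False):
    """Apply anaglyph_pixel to two same-sized images (lists of rows of pixels)."""
    return [[anaglyph_pixel(lp, rp, glasses, grey_left) for lp, rp in zip(lrow, rrow)]
            for lrow, rrow in zip(left, right)]


print(anaglyph_pixel((200, 10, 20), (30, 40, 50), "green/magenta"))  # (30, 10, 50)
```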

There are a number of advanced techniques some have developed for making anaglyphs which reduce lightness rivalry, reduce "ghost images" (due to cross-leakage of color), improve color rendition, etc. I don't know much about them, but here are the links I have.





Motion-tracking??

Obviously, as a person moves, their perspective changes, so ideally the image being displayed should change too. Can you imagine the 3D image looking fixed in space no matter how you move? You could even look at it from a different direction to "look around" an object.

3D glasses happen to be almost ideal for tracking using just a webcam and a fairly straightforward algorithm. Just scan the image for a pure red blob (at any intensity), then scan near it for a pure cyan blob.
  • The apparent distance (in pixels) between the red and cyan blobs will tell you how far the glasses are from the camera.
  • Their location in the field of view (combined with distance) will tell you where the glasses are actually located, which will give you Dleft, Dright, and viewer_distance.
  • The angle between them will tell you if there's any rotation, i.e. if their head is cocked. If there is, the two D's could be rotated around each other by the same angle so that the 3D still works, but I think that would require sine and cosine in the projection formula, making it slightly more complicated.
  • The person can also change the way he's facing, which would be indicated by skewing of the shapes. You don't really need to know the way he's facing for its own sake, but this has to be watched out for because skewing could easily be misinterpreted as further distance from the camera using a more naive algorithm.

  • It would be a bit difficult to pull off, but possible and would make for a really impressive display. Some caveats, though:

    • You'd have to know where the camera is positioned, or have some calibration method, to be able to determine where the glasses are. A calibration method might involve telling the person to put their face in a certain spot and then click. Otherwise, an educated guess that they start off at normal viewing distance and in line with the center of the screen might at least have cool, if mildly inaccurate, results.

    • You have to know the camera's angle of view. That, together with its current resolution, will allow you to make meaningful size and distance inferences. I don't know if there's a way to find that other than looking up the specific model. A typical guess for this might work somewhat, not sure.

      Making an assumption based on the calibration frame might also work, at least if the camera is in a predictable spot like centered on top of the monitor. (Calculate distance from camera using assumed viewer distance, monitor dimensions and camera's location relative to the monitor, then based on assumed distance between color filters, distance from camera and pixels between the filters, calculate the camera's angle of view per pixel.)

    • You may want to use a camera resolution that supports at least 30fps. Otherwise the display update will be too jerky and that's just lame. However, the lower the resolution you use the less finely you can deduce position. That's probably less of an issue -- a little bit of position-inaccuracy will be hardly noticeable.

This method could apply to normal visualizations as well as 3D. Somebody's actually done something similar on the Wii (not in 3D vision), but by switching the roles of the controller and the sensor bar and wearing the sensor bar on his head, rather than using a webcam. It's pretty neat and he made a little test program for it: http://www.youtube.com/watch?v=Jd3-eiid-Uw. HOWEVER I DID NOT GET ANY OF MY IDEAS FROM HIM. :P
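To make the distance, position and tilt ideas from the tracking bullets a bit more concrete, here is a rough Python sketch of just the geometry. It assumes the red and cyan blob centers have already been found (blob detection isn't shown), a simple pinhole camera with a known horizontal angle of view, and filters sitting about 2.5" apart like the eyes behind them. The 60° angle of view and 640x480 frame are placeholder values, and all names are hypothetical. Turning the results into Dleft, Dright and viewer_distance still needs the camera's placement relative to the screen, per the caveats above.

```python
import math


def locate_glasses(red_px, cyan_px, *, image_w_px=640, image_h_px=480,
                   hfov_deg=60.0, filter_separation=2.5):
    """Estimate where the glasses are from the red and cyan blob centers.

    red_px and cyan_px are (u, v) pixel centers in the camera image.
    Returns (x, y, distance, tilt): x and y are offsets in inches from the
    camera's optical axis along the image's u and v directions, distance is
    in inches, and tilt is the angle in radians of the line through the two
    blobs relative to horizontal.
    """
    # Pinhole camera: focal length in pixels from the horizontal angle of view.
    f = (image_w_px / 2) / math.tan(math.radians(hfov_deg) / 2)
    du = cyan_px[0] - red_px[0]
    dv = cyan_px[1] - red_px[1]
    pixel_sep = math.hypot(du, dv)
    if pixel_sep == 0:
        raise ValueError("the two blobs coincide")
    # Similar triangles: real separation / distance = pixel separation / f.
    distance = filter_separation * f / pixel_sep
    mid_u = (red_px[0] + cyan_px[0]) / 2
    mid_v = (red_px[1] + cyan_px[1]) / 2
    x = (mid_u - image_w_px / 2) * distance / f
    y = (mid_v - image_h_px / 2) * distance / f
    # Measure the blob line from its leftmost end so a level head gives 0
    # whether or not the camera image is mirrored.
    if du < 0:
        du, dv = -du, -dv
    tilt = math.atan2(dv, du)
    return x, y, distance, tilt


def eye_offsets(tilt, eye_separation=2.5):
    """Offsets of the two eyes from their midpoint, rotated by the head tilt."""
    half = eye_separation / 2
    ox, oy = half * math.cos(tilt), half * math.sin(tilt)
    return (-ox, -oy), (ox, oy)
```

Because turning the head shrinks the apparent blob separation, this naive version reads a turned head as a more distant one, which is exactly the skewing caveat above.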

