Saturday, September 18, 2010

Poro 0.1.0 Released

Today I released the Ruby gem Poro v0.1.0. It is the first working version of an extensible persistence engine. Currently it only supports MongoDB, but I plan on also supporting SQL and MemCache before going to version 1.0.

You can get it either by running gem install poro or by downloading or forking the source on GitHub.

Poro takes a slightly different philosophy from existing gems. Recently I was struck by a thought: in the Ruby world, the majority of persistence engines take the philosophy of having a base persistence object, and then subclassing that object for everything that needs persistence. This, it seems to me, puts the thought of persistence before the thought of what the object does. In other words, it puts the cart before the horse.

In my mind, one should first be worried about what an object does, and implement that. After that, one should be able to transparently add persistence to the object without actively affecting it.

Poro (which stands for Plain Ol' Ruby Object) tries to take a hands-off approach. It does this by generating and configuring persistence management objects--called contexts--that manage the persistence for a given class. Thus, each class that one wishes to persist has a context sitting off to the side that can be used (in a functional-language-like way) to persist an object.
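
To give a feel for the shape of this, here is a rough sketch of persisting an object through a context. (Treat the method names as illustrative; I'm writing this from memory, so check the RDocs before copying anything.)

    require 'poro'

    # A plain ol' Ruby object: no base class, no persistence mixin.
    class Person
      attr_accessor :name
    end

    person = Person.new
    person.name = 'Alice'

    # The context sits off to the side of the class; fetch it and use it
    # to save the object, functional-style.
    context = Poro::Context.fetch(Person)
    context.save(person)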

Of course, I realize the convenience of models, so there is a basic model mixin that can be mixed into your object to add methods like find and save. (In the future, I plan on breaking this mixin apart into several modules that can be included all at once--as they are now--or in pieces, so that you can add only the parts you want to use.)
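
With the model mixin, usage looks roughly like this (again, the module and method names here are my shorthand; consult the gem's documentation for the real ones):

    require 'poro'

    class Person
      include Poro::Model   # illustrative name for the model mixin
      attr_accessor :name
    end

    person = Person.new
    person.name = 'Bob'
    person.save                       # persists through the class's context
    found = Person.find(person.id)    # assuming save assigns an id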

At this point, the major things missing from Poro are testing and tuning. It probably contains bugs and is slow in places, but I have plans to use the MongoDB context for a project soon, so it should be brought up to snuff relatively quickly.

Thursday, January 7, 2010

Rant: Names Are NOT Identifiers

Names Everywhere

I see this happen often: Someone creates a list of items with a "name" column. The backend team uses the name to fetch and compare these items. The UI team uses the name as a human readable description. Hilarity ensues.

Let me back up and give an example: let's say we have a list of states that an order can be in, things like "New", "Being Fulfilled", "Shipped", etc. The backend team, naturally, writes code like "transition all the orders that are in the state 'New' and over 2 days old into the state 'Overdue'". The front-end team creates a list of orders that clients can view online and places the state name on the screen.

So what happens when it is decided that "New" needs to read "Pending"? If the UI team changes the name in the table to "Pending", all the code written around lookup and comparison to "New" breaks!

Names are Two Concepts in One: Identifiers and Labels

What happened here is that the developers confused the concept of an identifier with a label. The former should never be shown in a UI, but should always be used by developers for look-up, comparison, and whatever else they need. The latter should only ever be used for user display, and never for anything else.

In other words, mixing these into a single column is a layering violation, because you are mixing the business logic and user interface display layers together! Making sure you don't breach these layers ensures that you can change the label willy-nilly at any time and not break anything.
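
To make that concrete, here's a quick sketch (illustrative Ruby, not from any particular project) of what keeping the two concepts separate looks like:

    require 'ostruct'

    # Each state carries a stable identifier for code and a label for humans.
    ORDER_STATES = [
      { :identifier => 'new',        :label => 'Pending' },
      { :identifier => 'fulfilling', :label => 'Being Fulfilled' },
      { :identifier => 'shipped',    :label => 'Shipped' }
    ]

    orders = [OpenStruct.new(:state_identifier => 'new', :age_in_days => 3)]

    # The backend compares against the identifier, never the label...
    overdue = orders.select { |o| o.state_identifier == 'new' && o.age_in_days > 2 }

    # ...while the UI only ever shows the label, which is free to change.
    state = ORDER_STATES.find { |s| s[:identifier] == 'new' }
    puts state[:label]   # "Pending" today; relabel it tomorrow and nothing breaks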

What About My "id" Column?

As a last note, a question I often get asked (mostly by users of ORMs that rely on a numeric id column as a convention) is: why not use the numeric id as the identifier? In a small project this is probably fine. But in large or long-running projects, it can get more complex: sometimes the table's contents aren't predictable enough. For example, sometimes you have multiple clients running the same application, but with different (developer-defined) entries in their own databases. The numeric IDs just aren't reliable because they are created via a sequence counter (auto-increment, serial, whatever). You are just asking for trouble if your project becomes anything more than a pet one.

Friday, May 15, 2009

Tip: Compiling the Postgres Gem for Ruby 1.9.1

[EDIT (2010-JAN-16): The original postgres gem has been replaced by ruby-pg. If you are having the problem below, try installing the pg gem via sudo gem install pg.]

Crash!

I compiled a fresh copy of Ruby 1.9.1 onto my MacBook Pro and proceeded to install Ramaze, Sequel, and Thin. Things went well and I was happy... that is, until I tried to install the postgres gem. I'm talking about the C-extension one, which runs so much faster than the pure-Ruby one that it isn't even funny.

$ sudo gem install postgres

That's when it happened: bam! exception!

/usr/local/bin/ruby extconf.rb install postgres
extconf.rb:4:in `<main>': uninitialized constant PLATFORM (NameError)

No good. How am I supposed to develop enterprise Ruby software when I cannot connect to the database?!

What Happened

After poking around on the internet, I discovered two things about Ruby 1.9.1:

  1. The PLATFORM constant is now RUBY_PLATFORM, and
  2. The C macros for working with a Ruby Array changed.

The Solution

To solve this problem, one could work out what to do and hand-edit all the code. That is a waste of time. I set up a couple of sed filters instead. (Note the -i "" flag, which tells OS X's sed to edit each file in place; redirecting sed's output back onto its own input file would truncate the file before it is read.) Thus, to get your postgres adapter working, just do the following:

$ cd /usr/local/lib/ruby/gems/1.9.1/gems/postgres-0.7.9.2008.01.28/ext
$ sudo sed "s/PLATFORM/RUBY_PLATFORM/" extconf.rb > ./extconf.rb
$ sudo ruby extconf.rb
$ sudo sed "s/RARRAY(\([_a-zA-Z0-9]*\))->ptr/RARRAY_PTR(\1)/; s/RARRAY(\([_a-zA-Z0-9]*\))->len/RARRAY_LEN(\1)/; s/row->len/RARRAY_LEN(row)/; s/row->ptr/RARRAY_PTR(row)/" postgres.c > ./postgres.c
$ sudo make
$ sudo make install

Disclaimer

While I have been successfully using this patch, I have not tested it in a production environment. Therefore, you should put this patch through its paces before using it on anything critical.

Lastly, but definitely most importantly, I am NOT liable for anything bad that may happen as a result of using this patch. It is up to you to thoroughly test it for any problems, which may include (but are not limited to) loss of data on your system, loss of data on your database, corruption of your Ruby installation, self destruction of your hard drive, spontaneous combustion of your printer, Swine flu infection, SARS transmission, broken lawn mowers, and rancid ice cream.

Tip: Ruby 1.9.1 RDocs in HTML

That's A Big Process!

I like my HTML RDocs. They are easy to browse and search. Unfortunately, unlike Ruby 1.8.x, which made compiling the HTML RDocs trivial, Ruby 1.9.1 only seems to make the ri documentation trivial... and I soon found out why.
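
(If you want to try it yourself, the brute-force approach is to point rdoc at the Ruby source tree--something along these lines, though the exact flags vary with your RDoc version, so treat this as a sketch:)

$ cd ruby-1.9.1
$ rdoc --all --op ~/ruby-1.9.1-rdoc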

It turns out that on my old MacBook Pro (maxed out with only 2 GB of RAM), I didn't have enough memory to finish the build. Somewhere around 3/4 of the way through, the thing would jump into swap and grind to a halt. I aborted and decided I'd have to live with online and ri documentation.

Recently, however, I upgraded to a new MacBook Pro with 4 GB of RAM on board and decided to try compiling the RDocs again. After 25 minutes and around 2 GB of RAM for the single ruby process, it worked! (Why it takes so much RAM is beyond me... they need to rethink how RDoc is written).

Where Do I Get It?

Considering that I've gone through all the work, there is no reason why I can't share, so I've published the directory to my GitHub account in the ruby-rdoc project.

While other GitHub users will probably just clone the repository to keep up on the latest updates, most people will want to download a tarball or zip archive. To do this, simply go to the downloads page and pick the documentation that goes with the ruby version you want.

Tuesday, July 15, 2008

Essay: Scale Railroading (Part 2: Time)

Model railroaders frequently like to talk about scale speeds, and often quote the scale speeds at which their models run. However, it is very clear there are no physicists amongst them, because the way they scale speed is incorrect by most measures.

How It's Done

Let me back up. I should first explain how modelers currently do it, and why they do it the way they do.

To do this, I need to do a little math. Bear with me, because I don't need a lot of it, but for this article to be meaningful to anyone, a little bit of simple algebra needs to be invoked.

Speed is calculated by the equation
    v = d/t,
where d is the distance travelled and t is the time it took.

Additionally, we will introduce an equation to take a real distance d and change it into a scale distance d' using the scale factor m
    d' = m*d.

Now if I want to calculate the scale speed v'--according to the vast majority of modelers--I can do it by cleverly combining these two formulas. I must first observe that I can change out the scalable quantities in the velocity equation to produce
    v' = d'/t,
and then combine that with the scaled distance equation to produce the relationship
    v' = m*v.

For those of you who love examples, an HO scale railroad gives m a value of 1/87.1, or approximately 0.0115. Thus a track speed of 60 mph would be equivalent to very nearly 0.7 mph, which is about 1 ft/sec.
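
If you prefer code to algebra, the classic calculation is a couple of lines of Ruby:

    m = 1.0 / 87.1                              # HO scale factor

    real_speed_mph  = 60.0
    scale_speed_mph = m * real_speed_mph        # v' = m*v
    scale_speed_fps = scale_speed_mph * 5280 / 3600

    puts scale_speed_mph   # ~0.69 mph
    puts scale_speed_fps   # ~1.01 ft/sec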

What About Time?

Of course, the question we should ask ourselves is: why do we scale both the velocity and the distance in the above equation, but not time? Is there a fundamental reason why time should not be scaled?

To most, the reason is intuitive. It just seems absurd. And if you were to naively scale time in the same way we scale distance, you'd be right.

Going back to the ugly math, let us write the velocity equation with all the coordinates scaled, scaling both distance and time by the factor m. Doing so produces the equation for scaled velocity,
    v' = d'/t',
and after relating it back to the original velocity through the scaling equations, d' = m*d and t' = m*t, we get
    v' = v,
which is, of course, an absurd result!

Imagine having to run your model trains at an actual 60mph around the track! Your very expensive train, if it could even produce that much speed, would go flying off the track and probably injure someone in the process!

But this is not the end of the story for time scaling; indeed, this is just the beginning.

One Man's Scale Clock

One of the justifications one hears for measuring scaled time the way modelers do is that our concept of time is so thoroughly rooted in our own wristwatches. Indeed, when timing how far a model train goes, we run off to our wristwatches, happily counting off non-scaled seconds, and never give it a second thought.

Human cognitive function is so rooted in this absolute concept of time that the theory of relativity (which dictates that time runs at seemingly contradictory rates depending on where you are) is still difficult for most scientists to work with intuitively.

That being said, let's devise an interesting thought experiment that messes with our concept of a uniform clock.

In the real world, I set up a simple pendulum clock next to the train tracks, and see how far the next locomotive to pass the clock gets in one cycle of the pendulum.

Now, if we made a scale model of the simple pendulum clock, and put it next to our model train tracks, how fast (in the real world) would we need to run that model train to cover the scaled distance the real train traveled in one cycle of the pendulum?

In other words, given the scaled distance and the scaled concept of time from the ticking pendulum, how does the speed scale?

To solve this problem, one must (once again) invoke some math. More specifically, the first step is to find the equation that relates the length of one tick of the pendulum, T, with the length of the pendulum L. A little looking around the web will show you that the period of a pendulum's swing is given by
    T = 2*pi*sqrt(L/g),
where g is the acceleration of gravity, which is constant at any position on the Earth. Combining that with the speed equation, we find that the speed of the train is related to the pendulum's length via
    v = (d/(2*pi)) * sqrt(g/L).

If one scales the length L by the factor m and uses the techniques presented above, the surprising result falls right out. The scaled pendulum's period is
    T' = 2*pi*sqrt(m*L/g) = sqrt(m)*T,
so the scaled velocity becomes
    v' = d'/T' = m*d/(sqrt(m)*T) = sqrt(m)*v.
In other words, the scaled velocity v' is related to the real velocity v by the relationship
    v' = v * sqrt(m).

Surprise!

So what does that mean?

Think about it for a moment. Let's say, in the real world, we have a clock with a pendulum that swings a complete cycle once every second. This allows a train passing at 60mph to travel 88ft during that one second. Now if we made an HO scale model of this clock, and placed it next to our model train, we'd have to run the train at about 9.43 ft/sec in order to cover a scaled 88 ft distance (12 1/8 inches) in one tick of the pendulum! That is nearly 10 times faster than the speed of 1 ft/sec we calculated using the classic relationship.

Of course, the reason for this is that the clock itself runs faster. Indeed, a pendulum that is shorter by a factor of 100 will run at a rate that is 10 times faster than that of the longer pendulum.

Thus, in this example at least, it is clear that we have scaled not just distance, but also time.
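
(If you want to check that factor-of-ten claim yourself, it's a one-liner in Ruby:)

    period = lambda { |len| 2 * Math::PI * Math.sqrt(len / 9.8) }
    puts period.call(1.0) / period.call(0.01)   # 10.0: 100x shorter, 10x faster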

Superelevation

Of course, I hear you say, this is just one example. One example that could have been carefully constructed to give the desired result. And I'm happy to hear that skepticism, because that is what real science is all about.

So let's try another situation entirely. Train curves are superelevated, just like roads, in order to make corners smoother. Ideally, this is done by finding the elevation angle such that the rails tilt the coach in a way that exactly balances the forces so that they pull straight down through the bottom of the coach, causing you to feel no lateral forces at all (and keeping the tea in your cup from spilling).

So the question is, if we built a scale model of a real world curve that is ideally superelevated, and we perfectly model that curve in our model train set, what speed would we have to run the model train at so that the forces are balanced for the model passengers with their model tea cups in the model coach?

Of course, to answer that, we need to know the equation relating the superelevation angle theta of the coach (and hence the rails it is on) to the curve's radius R and the design speed v of the train around the curve.

Fortunately, this is a problem that is solved in the first few months of class by every student of physics in the world, and the correct answer is always
    v^2 = g*R*tan(theta).

Applying the same techniques to this equation as to the examples before it (noting that the angle theta is dimensionless and does not scale, while the radius scales as R' = m*R), we get v'^2 = g*(m*R)*tan(theta) = m*v^2. Yet again, the relationship between the scale velocity v' and the real world velocity v is given by
    v' = v * sqrt(m).

In other words, if a real world curve has its superelevation perfectly designed for a 60 mph train, then an HO scale model train would have to go around that curve at 9.43 ft/sec to keep the tea in the model cup from spilling while going around the corner. (This, of course, is a good deal faster than the 1 ft/sec scale speed that modelers would calculate!)
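
For the skeptics, here is that example worked numerically in Ruby, with a 1 degree prototype curve chosen just for illustration:

    g = 32.2                  # ft/s^2; gravity, which refuses to scale
    m = 1.0 / 87.1            # HO scale factor

    r = 5729.0                # radius of a 1 degree prototype curve, in feet
    v = 88.0                  # 60 mph design speed, in ft/sec
    theta = Math.atan(v**2 / (g * r))   # ideal superelevation angle

    # Same angle on the scaled-down radius: what speed balances the forces?
    v_model = Math.sqrt(g * (m * r) * Math.tan(theta))
    puts v_model              # ~9.43 ft/sec, i.e. 88 * sqrt(1/87.1)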

Welcome to the Movies

It's hard to argue with physics, and so far I've given two entirely unrelated physical situations in which I have no choice but to produce the same, very surprising result.

But, being the stubborn kind of people we are, it is hard to accept radically different ideas (indeed, cognitive dissonance is a fascinating field of psychology), and so I fear I must produce one more example before you actually believe what I have to say.

Imagine you are a special effects artist working for Robert Zemeckis and Bob Gale, and are in charge of the crash sequence at the end of Back to the Future III, where the beautiful steam engine drives off the cliff at 88mph, and plummets into the Earth below.

Obviously, Sierra Railway will not let you drive their prized engine number 3 off a cliff, so you must resort to models to do it!

Being the sort of person you are, you want the shot to look realistic. This requires you to satisfy two criteria:

  1. You must find the correct speed to drive your model train off the cliff so that it follows the exact same parabolic path through the air as the real thing would, except scaled down according to the scaling equations we now know and love.
  2. You must run the film through the camera at a faster rate so that, when slowed back down to the standard 24 fps of a theater projector, it looks like it took the same amount of time to hit the ground as the real thing, even though it actually hit much sooner due to the much shorter distance.

Now the math for this one is much too ugly to show here, but rest assured that over the last year I have calculated it several different times, several different ways, and always come to the same conclusions:

  • Velocity scales by the same sqrt(m) factor as we found in all of the previous examples.
  • Time scales by sqrt(m) as well! (A quick numerical check follows this list.)
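
Here is that numerical check, using a simple drop off a cliff (the 1:87.1 scale and 30 m cliff are just for illustration; a film model would actually be much larger):

    g = 9.8                   # m/s^2; again, gravity does not scale
    m = 1.0 / 87.1            # scale factor

    height  = 30.0                             # a real 30 m cliff
    t_real  = Math.sqrt(2 * height / g)        # real fall time
    t_model = Math.sqrt(2 * m * height / g)    # fall time from the scaled cliff

    puts t_model / t_real     # ~0.107, which is sqrt(m): time scales!
    puts 24 / Math.sqrt(m)    # ~224: fps to shoot at so the replayed fall
                              # takes as long on screen as the real one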

Time Does Scale

So we've determined that velocity scales according to the square root of the scale factor, and that time also scales by the square root of the scale factor. These are not independent results! Indeed, following the lead of theories such as special relativity, the fundamental equations are best taken as a transformation of the basic coordinates; that is to say, the basic equations are:
    d' = m*d
    t' = t*sqrt(m)

One can then easily derive the velocity transformation equation by applying these transforms to the scaled velocity equation v' = d'/t' to get v' = v*sqrt(m).
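
Packaged up as code, the whole transformation is small enough to carry around in your head:

    def scale_distance(d, m); m * d; end             # d' = m*d
    def scale_time(t, m);     t * Math.sqrt(m); end  # t' = t*sqrt(m)
    def scale_speed(v, m);    v * Math.sqrt(m); end  # v' = d'/t' = v*sqrt(m)

    m = 1.0 / 87.1                 # HO
    puts scale_speed(88.0, m)      # ~9.43 ft/sec for a real 88 ft/sec train
    puts scale_distance(88.0, m)   # ~1.01 ft (about 12 1/8 in) per scaled tick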

But Why?

But why does time scale in this seemingly absurd way? It is very counterintuitive, even to me, that this would be the case.

As it turns out, the common thread is gravity. The pendulum uses gravity to drive it. Superelevation is determined by balancing gravity against centrifugal acceleration. And, of course, projectile motion is the quintessential problem of gravity.

More specifically, gravity is scale invariant. No matter how you scale your model railroad, the gravitational acceleration roots you down to a very specific scale of motion. In other words, you have no choice but to accelerate at a real 9.8 meters per second per second, no matter how you scale!

This scale invariance, then, gives both the brain and measuring devices a way to determine the scale factors involved, unless you change the rate of time!

So why does the rate of time change the way it does? The easiest derivation uses a bit of differential calculus, which is beyond the scope of this article. Instead, I will note that the constraint on how to scale time is imposed by the quadratic relationship between distance and time under the effects of gravity: for a falling object, d = (1/2)*g*t^2. Since g cannot scale, requiring d' = m*d forces
    m*(1/2)*g*t^2 = (1/2)*g*t'^2,
which necessarily constrains t to scale by t' = t*sqrt(m).

So Now Then

Modelers as a whole have settled on what feels like a reasonable scaling methodology (space scales, but time does not). But to anyone serious about modeling the physics of trains correctly, it is important to realize that it isn't a matter of what feels right; it is determined by what is right.

And as much as everyone feels that time does not scale, the physics does not agree. Indeed, time must scale in order to replicate the balance of the physical forces involved.

Monday, July 14, 2008

Essay: Scale Railroading (Part 1: Space)

If you spend much time around railroad modelers, they tend to talk a lot about scale, and particularly in reference to speed.

Being a physicist, I've put some thought into scaling things, and don't agree with the reasoning for how to derive scale speed. But that is for Part 2: Time.

In this part, I want to cover some basics about pure spatial scaling, and how poorly standard layouts already agree with the real world.

Length is very important. But how big is a model layout in the real world? How long would a real North American train be in a scale layout?

Prototypicality

Train Length

Let's answer the latter question first. In North America, trains are often 100+ cars long, resulting in lengths of around a mile. In the most favored modeling scale, HO, which has a 1:87.1 scale factor, 1 mile comes out to a hair under 61 feet!

This means that a scale model railroad of North American trains needs 60 feet just for a single train!

To put this in perspective, an 8'x4' layout, which is fairly common to find in spare bedrooms, only gives about 20' of oval track around the edge, which is only about 1/3 of a mile!
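
These conversions are trivial to script if you want to size up your own layout (plug in your own scale and track length):

    scale = 87.1   # HO; use 160.0 for N or 48.0 for O

    layout_feet = 20.0                # track on the 8'x4' oval
    puts layout_feet * scale / 5280   # ~0.33 scale miles represented
    puts 5280 / scale                 # ~60.6 layout feet per scale mile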

So how does this scale relate to the real world?

The Shopping Mall

Doing even a small real-life layout would take a space about the size of a shopping mall. Indeed, let's look at malls typical of a mid-sized town. Spokane Valley, WA, and Santa Rosa, CA, each serve communities of around a quarter million people, and each has a mall that supports three department stores and is built length-wise. In each case, if you ran an HO scale layout from the far end of one department store, through the main corridor of shops, to the far end of the opposite department store, you'd have about 1400 ft in which to model. Translate this to scale, and you get only about 23 miles of layout! This is smaller than many steam excursion branch railway lines!

Trees Are How Tall?

While this puts a lot into perspective, I find the most interesting case--and the one that will actually surprise a lot of modelers--to be not a geographic distance, but the scale modeling of an evergreen tree.

The Northern California coast is filled with dense redwood forests which were, and still are, logged. These trees, the tallest in the world, can grow over 300 feet tall and over 26 feet in diameter, and in the era when they were heavily logged, it was common to find trees 200 feet tall and 18 feet in diameter.

Thus, if you were modeling a logging railroad, you'd need redwood trees that are nearly 3.5 feet tall whose trunks are made from dowels about 2.5" in diameter! I dare you to build a layout with trees that large and not have all your friends call it absurd.

So how big are those 6" tall model evergreens you probably have on your layout? A measly 44 feet, which is about 1/5 the maximum height of a Pacific Coast Ponderosa Pine tree!

Curves

Lengths are all fine and dandy, but something that modelers tend to pay less attention to is how tight those curves really are. This is compounded by the fact that track manufacturers tend to sell pre-molded curve pieces that are sizes like 15", 18" and 24".

But just how tight are those curves? To answer that, we must first learn how curves are measured.

Surveying Curves

Railroads survey their curves by using what is called the "degree" of the curve. According to an excellent article by Robert S. McGonigal in TRAINS Magazine, this is done by measuring the change in heading, in degrees, across a piece of curved track 100 ft in length. Expressed mathematically, the radius of a curve in feet is found by dividing 5729 by the degree of the curve.

The article goes on to explain that for a typical mainline, curves are limited to only 1-2 degrees, while mountainous territory dictates curves of around 5-10 degrees. Furthermore, the limit for a four-axle diesel with rolling stock is about 20 degrees, and the locomotive by itself can handle curves up to about 40 degrees.

Scaling Curves Down

So how do these real world curves relate to an HO scale model?

As modelers tend to like the beauty of mountainous scenery, let's start with those 5-10 degree curves. In the real world, these correspond to a curve radius of about 570-1150 feet. Scaled down, these correspond to a relatively giant 79"-158" curve radii!

But if those curves are so tight, how tight are those model curves that the track manufacturers make?

As it turns out, the standard 18" curve is already nearly a 44 degree, 130 scale-foot curve, which is already in excess of the maximum for a four-axle engine without any rolling stock!

So what are the limits? For a 20 degree curve (287 ft radius), you do not want to go smaller than a 39.5" curve radius, and for a 40 degree curve (144 ft radius), you do not want to go smaller than a 20" curve radius.

Indeed, as a rule of thumb, an HO scale modeler can divide 800" by the degree of curve desired to get a pretty good estimate of the necessary model curve radius. Equivalently, you can divide 800" by the radius of a curve on your layout and get a fairly good estimate of the curve's degree.
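
Or, if you want exact numbers rather than the rule of thumb, the conversion is easy to write out (the 5729 constant is the surveying relation from above):

    SCALE = 87.1   # HO

    # Model curve radius (inches) for a given prototype curve degree.
    def model_radius_in(degree)
      5729.0 / degree * 12 / SCALE
    end

    # Prototype curve degree for a given model curve radius (inches).
    def curve_degree(radius_in)
      5729.0 / (radius_in * SCALE / 12)
    end

    puts model_radius_in(20)   # ~39.5" for a four-axle diesel with cars
    puts curve_degree(18)      # ~44 degrees for the standard 18" curve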

Of course, in real life, if you plan on having speed, you'll want to model those 1-2 degree curves, whose radii correspond to around 1/2-1 mile. But most people don't have the room to construct a prototypical curve in their layout, as a 33'-66' curve radius is bigger than pretty much any layout.

Saturday, July 12, 2008

Rant: Proper Planning is Pivotal

My brother and I are, on the side, developing some software for our own enjoyment. However, we take our development effort seriously, and have spent a good deal of time writing small technology tests and, most importantly, documenting our specification so that we can agree on the path that we are taking.

Our spec is mainly composed of two things:

  1. The feature specification, and
  2. the architectural plan.

Both of these are important. Even if everyone agrees on the features that should be implemented (and we'll assume that there weren't the normal communication issues involved in doing this without a written spec), everyone's programming/architectural styles can be incredibly different.

Unfortunately, this can lead to a mess later.

The Draw

It is very tempting to ignore this obvious fact, especially when you have small teams of developers working under tight timelines and budgets. Your team will find itself having one-hour meetings to verbally discuss a feature and how it should work, and then the developers will all run off to their caves and start writing code.

Initially this actually works out really well. Everyone is building their separate pieces, plugging along and writing code in the way they write best, which results in quick turnaround.

It is a manager's dream. Everything is getting done ahead of time and under budget!

The Problem

But what happens when the pieces have to start fitting together?

At some point the pieces that everyone writes have to start fitting together. If everyone is writing to their own architectural drum in their own style, how easy is this integration going to be?

Even worse, how many of your developers are good architects? Did those quick up-front times result from the developer doing whatever was quickest to get their pieces done? What happens when, as anything in real life happens, the spec changes a little? How about when it changes a lot?

And none of this includes what happens when developers have to work on each other's code. Where is the document to help them figure out what was done, especially when the code is so incredibly different from their own?

It Happens

These aren't all hypothetical questions. I've encountered this problem in the real world. Specs aren't thought out or documented, all to save time. It's alluring. I've even bought into it. Turnaround times are fast and everyone is happy...

...at first.

But after a year on a project, this can start to turn around. You start discovering all the webs of spaghetti code that were woven to meet deadlines, the hacks made when a developer didn't want to spend the time to ask someone else how to integrate with their component, and the bugs that appear because of the bad assumptions that lazy programmers make.

I've seen all these problems over and over again, and sometimes they can bog down development to the point that whole sections of code must be stripped out and re-written to get around them.

The Solution

In programming, the money is in hard problems that need to be solved, and hindsight is 20/20. So no matter how much planning you do, you'll never be entirely happy with your solution. There is always something that you should have done differently.

Which is why architecture is so important. If architected properly, a system is set up so that when these problems are found, they are easier to solve. One must strive for expandability and refactorability, but without spending time creating interfaces and functionality that are never used.

As I always say, a good software architect dares to dream and plans ahead for the future, but knows where to stop to keep development times and costs down.

For this to succeed, there must be something in control. Something that guides everyone toward a common goal.

That is the architectural spec.

The Spec

Of course, someone has to create the spec. That should be the job of multiple people. Ideally, there should be an open dialog between the developers that are writing new code, the developers that wrote any code that is being interfaced with, and an architect that can coordinate the input from these developers into the common design goals and styles of the application as a whole.

And this is easy to do with the small teams that are usually lured into ignoring spec writing altogether!

Words of Advice

My advice to you: whether you are two programmers or hundreds of programmers, specification documents, and clear architectural leaders, are important to the overall long-term success of your software.

You may be tempted by The Devil to ignore proper specification in order to achieve the impossible up-front, but like any deal with The Devil, you had better be prepared to tackle the ugly consequences that follow.