Saturday, September 18, 2010

Poro 0.1.0 Released

Today I released the Ruby Gem Poro v0.1.0. It is a first working version of an extensible persistence engine. Currently it only has support for MongoDB, but I plan on also supporting SQL and MemCache before going to version 1.0.

You can get it by either calling gem install poro or by downloading or forking the source on GitHub.

Poro takes a slightly different philosophy from existing gems. Recently I was struck with a thought: in the Ruby world, the majority of persistence engines take the philosophy of having a base persistence object, and then subclassing that object for everything that needs persistence. This, it seems to me, puts the thought of persistence before the thought of what the object does. In other words, it seems like putting the cart before the horse.

In my mind, one should first be worried about what an object does and implementing that. After that, they should be able to transparently add persistence to the object, without actively affecting it.

Poro (which stands for Plain Ol' Ruby Object) tries to take a hands-off approach. It does this by generating and configuring persistence management objects--called contexts--that manage the persistence for a given class. Thus, each class that one wishes to persist has a context sitting off to the side, that can be used (in a functional language like way) to persist an object.
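To make the idea of a context concrete, here is a minimal sketch in plain Ruby. It is not Poro's actual API: the names Invoice, InMemoryContext, save, and fetch are hypothetical, and an in-memory hash stands in for a real data store. The point is only that the persisted class knows nothing about persistence.

```ruby
# A plain object: no base class, no mixin, no knowledge of persistence.
class Invoice
  attr_reader :total

  def initialize(total)
    @total = total
  end
end

# Hypothetical context (NOT Poro's API): it sits off to the side of the
# class and owns everything related to persisting instances of it.
class InMemoryContext
  def initialize(klass)
    @klass   = klass
    @records = {}                   # id => object
    @ids     = {}.compare_by_identity # object => id, keyed by object identity
  end

  # Store the object and return its id; saving the same object twice
  # reuses the id it was given the first time.
  def save(obj)
    raise ArgumentError, "expected a #{@klass}" unless obj.is_a?(@klass)
    id = (@ids[obj] ||= @records.size + 1)
    @records[id] = obj
    id
  end

  # Look up a previously saved object by id (nil if unknown).
  def fetch(id)
    @records[id]
  end
end

context = InMemoryContext.new(Invoice)
id      = context.save(Invoice.new(42.5))
puts context.fetch(id).total
```

Because the id lives in the context rather than on the object, Invoice stays untouched; that is the hands-off property this sketch is meant to illustrate.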

Of course, I realize the convenience of models, so there is a basic model mixin that can be included in your object to add methods like find and save. (In the future, I plan on breaking apart this mixin into several modules that can be included all at once in the same way--as they are now--or in pieces, so that you can add only the pieces you want to use.)

At this point, the major thing missing from Poro is testing and tuning. It will probably contain bugs and be slow, but I have plans to use the MongoDB context for a project soon, so it should be brought up to snuff relatively quickly.

Thursday, January 7, 2010

Rant: Names Are NOT Identifiers

Names Everywhere

I see this happen often: Someone creates a list of items with a "name" column. The backend team uses the name to fetch and compare these items. The UI team uses the name as a human readable description. Hilarity ensues.

Let me back up and give an example: Let's say we have a list of states that an order can be in, things like "New", "Being Fulfilled", "Shipped", etc. The backend team, naturally, writes code like "Transition all the orders that are in the state 'New' and over 2 days old into the state 'Overdue'". The front-end team creates a list of Orders that clients can view online and places the state name on the screen.

So what happens when it is decided that "New" needs to read "Pending"? If the UI team changes the name in the table to "Pending", all the code written around lookup and comparison to "New" breaks!

Names are Two Concepts in One: Identifiers and Labels

What happened here is that the developers confused the concept of an identifier with a label. The former should never be shown in a UI, but should always be used by developers for look-up, comparison, and whatever else they need. The latter should only ever be used for user display, and never for anything else.

In other words, mixing these into a single column is a layering violation, because you are mixing business logic and user interface display layers together! Making sure you don't breach these layers ensures that you can change the label all willy-nilly at any time and not break anything.
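As a rough illustration of the split, here is a small Ruby sketch. The OrderState struct, the STATES table, and the overdue? helper are all hypothetical names invented for this example; in a real application the two values would typically be two columns in the states table.

```ruby
# Each state carries an identifier (for code) and a label (for people).
OrderState = Struct.new(:identifier, :label)

STATES = [
  OrderState.new(:new,             "New"),
  OrderState.new(:being_fulfilled, "Being Fulfilled"),
  OrderState.new(:shipped,         "Shipped"),
  OrderState.new(:overdue,         "Overdue")
].map { |state| [state.identifier, state] }.to_h

# Business logic looks up and compares identifiers only.
def overdue?(state_identifier, age_in_days)
  state_identifier == :new && age_in_days > 2
end

# The UI team renames the label; nothing that compares identifiers changes.
STATES[:new].label = "Pending"

puts STATES[:new].label # the text shown to users
puts overdue?(:new, 3)  # the backend rule still works
```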

What About My "id" Column?

As a last note, a question I often get asked (mostly by users of ORMs that rely on a numeric id column as a convention) is: why not use the numeric id as an identifier? In a small project this is probably fine. But in large or long-running projects, it can get more complex: sometimes the table's contents aren't predictable enough. For example, sometimes you have multiple clients running the same application, but with different (developer-defined) entries in their own databases. The numeric IDs just aren't reliable because they are created via a sequence counter (auto-increment, serial, whatever). You are just asking for trouble if your project becomes anything more than a pet one.

Friday, May 15, 2009

Tip: Compiling the Postgres Gem for Ruby 1.9.1

[EDIT (2010-JAN-16): The original postgres gem has been replaced by ruby-pg. If you are having the problem below, try installing the pg gem via sudo gem install pg.]

Crash!

I compiled a fresh copy of Ruby 1.9.1 onto my MacBook Pro and proceeded to install Ramaze, Sequel, Thin. Things went well and I was happy... that is, until I tried to install the postgres gem. I'm talking the C-extension one, which runs so much faster than the native one that it isn't even funny.

$ sudo gem install postgres

That's when it happened: bam! exception!

/usr/local/bin/ruby extconf.rb install postgres
extconf.rb:4:in `<main>': uninitialized constant PLATFORM (NameError)

No good. How am I supposed to develop enterprise Ruby software when I cannot connect to the database?!

What Happened

After poking around on the internet, I discovered two things about Ruby 1.9.1:

  1. The PLATFORM constant is now RUBY_PLATFORM, and
  2. The C macros for working with a Ruby Array changed.

The Solution

To solve this problem, one could learn what to do and hand-change all the code. This is a waste of time. I set up a couple of sed filters instead. Thus, to get your postgres adapter working, just do the following:

$ cd /usr/local/lib/ruby/gems/1.9.1/gems/postgres-0.7.9.2008.01.28/ext
# Edit in place (keeping a .bak copy); redirecting sed's output onto its own input file would truncate it before it is read.
$ sudo sed -i.bak "s/PLATFORM/RUBY_PLATFORM/" extconf.rb
$ sudo ruby extconf.rb
$ sudo sed -i.bak "s/RARRAY(\([_a-zA-Z0-9]*\))->ptr/RARRAY_PTR(\1)/g; s/RARRAY(\([_a-zA-Z0-9]*\))->len/RARRAY_LEN(\1)/g; s/row->len/RARRAY_LEN(row)/g; s/row->ptr/RARRAY_PTR(row)/g" postgres.c
$ sudo make
$ sudo make install
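To see what the array-macro substitutions do, here is the same set of patterns sketched in Ruby and applied to a single line of C. The sample line is made up for illustration and is not taken from postgres.c, and modernize_rarray is a hypothetical helper name. This sketch replaces every match on a line.

```ruby
# Rewrite Ruby 1.8-style RARRAY(x)->ptr / RARRAY(x)->len accesses into
# the RARRAY_PTR(x) / RARRAY_LEN(x) macros that Ruby 1.9 expects.
def modernize_rarray(line)
  line
    .gsub(/RARRAY\(([_a-zA-Z0-9]*)\)->ptr/, 'RARRAY_PTR(\1)')
    .gsub(/RARRAY\(([_a-zA-Z0-9]*)\)->len/, 'RARRAY_LEN(\1)')
    .gsub(/row->len/, 'RARRAY_LEN(row)')
    .gsub(/row->ptr/, 'RARRAY_PTR(row)')
end

# Hypothetical line of C, not copied from the gem's source.
sample = "for (i = 0; i < RARRAY(ary)->len; i++) rb_p(RARRAY(ary)->ptr[i]);"

puts modernize_rarray(sample)
# prints: for (i = 0; i < RARRAY_LEN(ary); i++) rb_p(RARRAY_PTR(ary)[i]);
```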

Disclaimer

While I have been successfully using this patch, I have not tested it in a production environment. Therefore, you should put this patch through its paces before using it on anything critical.

Lastly, but definitely most importantly, I am NOT liable for anything bad that may happen as a result of using this patch. It is up to you to thoroughly test it for any problems, which may include (but are not limited to) loss of data on your system, loss of data on your database, corruption of your Ruby installation, self destruction of your hard drive, spontaneous combustion of your printer, Swine flu infection, SARS transmission, broken lawn mowers, and rancid ice cream.

Tip: Ruby 1.9.1 RDocs in HTML

That's A Big Process!

I like my HTML RDocs. They are easy to browse and search. Unfortunately, unlike ruby 1.8.x, which made compiling the HTML RDocs trivial, ruby 1.9.1 seems to only make ri documentation trivial... and I soon found out why.

It turns out that on my old MacBook Pro (maxed out with only 2 GB of RAM), I didn't have enough RAM to finish the build. Somewhere around 3/4 of the way done, the thing would jump into swap and crawl to a halt. I aborted and decided I'd have to live with online and ri documentation.

Recently, however, I upgraded to a new MacBook Pro with 4 GB of RAM on board and decided to try compiling the RDocs again. After 25 minutes and around 2 GB of RAM for the single ruby process, it worked! (Why it takes so much RAM is beyond me... they need to rethink how RDoc is written).

Where Do I Get It?

Considering that I've gone through all the work, there is no reason why I can't share, so I've published the directory to my GitHub account in the ruby-rdoc project.

While other GitHub users will probably just clone the repository to keep up on the latest updates, most people will want to download a tarball or zip archive. To do this, simply go to the downloads page and pick the documentation that goes with the ruby version you want.

Tuesday, July 15, 2008

Essay: Scale Railroading (Part 2: Time)

Model railroaders frequently like to talk about scale speeds, and often quote the scale speeds at which their models run. However, it is very clear there are no physicists amongst them, because the way they scale speed is incorrect by most measures.

How It's Done

Let me back up. I should first explain how modelers currently do it, and the reason why they do it the way they do.

To do this, I need to do a little math. Bear with me, because I don't need a lot of math, but for this article to be meaningful to anyone, a little bit of simple algebra needs to be invoked.

Speed is calculated by the equation
    v = d/t,
where d is the distance travelled and t is the time it took.

Additionally, we will introduce an equation to take a real distance d and change it into a scale distance d' using the scale factor m
    d' = m*d.

Now if I want to calculate the scale speed v'--according to the vast majority of modelers--I can do it by cleverly combining these two formulas. I must first observe that I can change out the scalable quantities in the velocity equation to produce
    v' = d'/t,
and then combine that with the scaled distance equation to produce the relationship
    v' = m*v.

For those of you that love examples, an HO scale railroad would give m a value of 1/87.1, or approximately 0.0115. Thus a track speed of 60 mph would be equivalent to very nearly 0.7 mph, which is 1 ft/sec.
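The arithmetic in that example can be checked with a few lines of Ruby. This is only a sketch of the classic v' = m*v rule; the constant and method names are my own.

```ruby
HO_SCALE     = 1 / 87.1 # scale factor m for HO
FT_PER_MILE  = 5280.0
SEC_PER_HOUR = 3600.0

# Convert miles per hour to feet per second.
def mph_to_fps(mph)
  mph * FT_PER_MILE / SEC_PER_HOUR
end

# The classic modeler's rule: scale distance, leave time alone.
def classic_scale_speed(real_speed, m)
  m * real_speed
end

model_mph = classic_scale_speed(60.0, HO_SCALE)
puts format("%.3f mph = %.2f ft/sec", model_mph, mph_to_fps(model_mph))
# prints: 0.689 mph = 1.01 ft/sec
```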

What About Time?

Of course, the question we should ask ourselves, is why do we scale both the velocity and the distance in the above equation, but not time? Is there a fundamental reason why time should not be scaled?

To most, the reason is intuitive. It just seems absurd. And if you were to naively scale time in the same way we scale distance, you'd be right.

Going back to the ugly math, let us write the velocity equation with all the coordinates scaled, and then scale both distance and time by the factor m. Doing so produces the equation for scaled velocity as
    v' = d'/t',
and after relating it back to the original velocity through the scaling equations, d'=m*d and t'=m*t, we get
    v' = v,
which is, of course, an absurd result!

Imagine having to run your model trains at an actual 60mph around the track! Your very expensive train, if it could even produce that much speed, would go flying off the track and probably injure someone in the process!

But this is not the end of the story for time scaling; Indeed, this is just the beginning.

One Man's Scale Clock

One of the justifications one hears for measuring scaled time the way modelers do is that our concept of time is so thoroughly rooted in our own wrist watches. Indeed, when timing out the distance a model train goes, we run off to our wrist watches, happily counting off non-scaled seconds, and never even give it a second thought.

Human cognitive function is so rooted in this absolute concept of time, that the theory of relativity (which dictates that time runs at seemingly contradictorily different rates depending on where you are) is still impossible for most scientists to intuitively work with.

That being said, let's devise an interesting thought experiment that messes with our concept of a uniform clock.

In the real world, I set up a simple pendulum clock next to the train tracks, and see how far the next locomotive to pass the clock gets in one cycle of the pendulum.

Now, if we made a scale model of the simple pendulum clock, and put it next to our model train tracks, how fast (in the real world) would we need to run that model train to cover the scaled distance the real train traveled in one cycle of the pendulum?

In other words, given the scaled distance and the scaled concept of time from the ticking pendulum, how does the speed scale?

To solve this problem, one must (once again) invoke some math. More specifically, the first step is to find the equation that relates the length of one tick of the pendulum, T, with the length of the pendulum L. A little looking around the web will show you that the period of a pendulum's swing is given by
    T = 2*pi*sqrt(L/g),
where g is the acceleration of gravity, which is a constant at any position on the Earth. Combining that with the speed equation, we find that the speed of the train is related to the pendulum's length via
    v = (d/(2*pi)) * sqrt(g/L).

If one scales the distances d and L by the factor m, and uses the techniques presented above, it is very easy to show the surprising result that the scaled velocity v' is related to the real velocity v by the relationship
    v' = (m*d/(2*pi)) * sqrt(g/(m*L)) = v * sqrt(m).

Surprise!

So what does that mean?

Think about it for a moment. Let's say, in the real world, we have a clock with a pendulum that swings a complete cycle once every second. This allows a train passing at 60mph to travel 88ft during that one second. Now if we made an HO scale model of this clock, and placed it next to our model train, we'd have to run the train at about 9.43 ft/sec in order to cover a scaled 88 ft distance (12 1/8 inches) in one tick of the pendulum! That is nearly 10 times faster than the speed of 1 ft/sec we calculated using the classic relationship.

Of course, the reason for this is that the clock itself runs faster. Indeed, a pendulum that is shorter by a factor of 100 will run at a rate that is 10 times faster than that of the longer pendulum.

Thus, in this example at least, it is clear that we have scaled not just distance, but also time.
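For anyone who would rather see numbers than algebra, here is a small Ruby sketch of the pendulum example. It assumes standard gravity of 32.174 ft/s^2 and the small-swing pendulum formula; the constant and method names are my own.

```ruby
G_FT_S2  = 32.174   # standard gravity in ft/s^2 (an assumed value)
HO_SCALE = 1 / 87.1 # scale factor m for HO

# Period of a simple pendulum of the given length: T = 2*pi*sqrt(L/g).
def pendulum_period(length_ft)
  2 * Math::PI * Math.sqrt(length_ft / G_FT_S2)
end

# Length of the pendulum that has the given period (the same formula inverted).
def pendulum_length(period_s)
  G_FT_S2 * (period_s / (2 * Math::PI))**2
end

real_length   = pendulum_length(1.0) # real clock: one cycle per second
real_distance = 88.0                 # ft covered at 60 mph in that second

# Scale every length by m, then let gravity decide how long a tick takes.
model_period   = pendulum_period(HO_SCALE * real_length)
model_distance = HO_SCALE * real_distance
model_speed    = model_distance / model_period

puts format("model tick: %.3f s, model speed: %.2f ft/sec", model_period, model_speed)
# prints: model tick: 0.107 s, model speed: 9.43 ft/sec
```

Note that the model clock's tick comes out as sqrt(m) seconds without time ever being scaled by hand; it falls out of the shorter pendulum.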

Superelevation

Of course, I hear you say, this is just one example. One example that could have been carefully constructed to give the desired result. And I'm happy to hear that skepticism, because that is what real science is all about.

So let's try another situation entirely. Train curves are superelevated, just like roads, in order to make corners smoother. Ideally, this is done by finding the elevation angle such that the rails tilt the coach in a way that exactly balances the forces so that they pull straight down through the bottom of the coach, causing you to feel no lateral forces at all (and keeping the tea in your cup from spilling).

So the question is, if we built a scale model of a real world curve that is ideally superelevated, and we perfectly model that curve in our model train set, what speed would we have to run the model train at so that the forces are balanced for the model passengers with their model tea cups in the model coach?

Of course, to answer that, we need to know the equation relating the superelevation angle theta of the coach (and hence the rails it is on) to the curve's radius R and the design speed of the train around the curve v.

Fortunately, this is a problem that is solved in the first few months of class by every student of physics in the world, and the correct answer is always
   v^2 = g*R*tan(theta).

Applying the same techniques to this equation as to the examples before it, we find something surprising: yet again, the relationship between the scale velocity v' and the real world velocity v is given by
    v' = v * sqrt(m).

In other words, if a real world curve has its superelevation perfectly designed for a 60mph train, then an HO scale model train would have to go around that curve at 9.43 ft/sec to keep the tea in the model cup from spilling while going around the corner. (This, of course, being a good deal faster than the 1 ft/sec scale speed that modelers would calculate!)
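The superelevation result can be checked the same way. In this Ruby sketch the real curve radius of 5729 ft is a hypothetical value picked for illustration, gravity is assumed to be 32.174 ft/s^2, and the method names are my own.

```ruby
G_FT_S2  = 32.174   # standard gravity in ft/s^2 (an assumed value)
HO_SCALE = 1 / 87.1 # scale factor m for HO

# Bank angle (radians) that balances a curve, from v^2 = g*R*tan(theta).
def balanced_angle(speed_fps, radius_ft)
  Math.atan(speed_fps**2 / (G_FT_S2 * radius_ft))
end

# Speed for which a curve of this radius and bank angle is balanced.
def balanced_speed(radius_ft, angle_rad)
  Math.sqrt(G_FT_S2 * radius_ft * Math.tan(angle_rad))
end

real_radius = 5729.0 # ft; hypothetical curve radius
real_speed  = 88.0   # ft/sec, i.e. 60 mph

# A faithful model keeps the angle and shrinks only the radius.
angle       = balanced_angle(real_speed, real_radius)
model_speed = balanced_speed(HO_SCALE * real_radius, angle)

puts format("balanced model speed: %.2f ft/sec", model_speed)
# prints: balanced model speed: 9.43 ft/sec
```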

Welcome to the Movies

It's hard to argue with physics, and so far I've given two entirely unrelated physical situations in which I have no choice but to produce the same, very surprising result.

But, being the stubborn kind of people we are, it is hard to accept radically different ideas (indeed cognitive dissonance is a very fascinating field of psychology), and so I fear I must produce one more example before you actually believe what it is I have to say.

Imagine you are a special effects artist working for Robert Zemeckis and Bob Gale, and are in charge of the crash sequence at the end of Back to the Future III, where the beautiful steam engine drives off the cliff at 88mph, and plummets into the Earth below.

Obviously, Sierra Railway will not let you drive their prized engine number 3 off a cliff, so you must resort to models to do it!

Being the sort of person you are, you want the shot to look realistic. This requires you to satisfy two criteria:

  1. You must find the correct speed to drive your model train off the cliff so that it follows the exact same parabolic path through the air as the real thing would, except scaled down according to the scaling equations we now know and love.
  2. You must run the film through the camera at a faster rate so that, when slowed back down to the standard 24 fps of a theater projector, it looks like it took the same amount of time to hit the ground as the real thing, even though it actually hit much sooner due to the much shorter distance.

Now the math for this one is much too ugly to show here, but rest assured that over the last year I have calculated it several different times, several different ways, and always come to the same conclusions:

  • Velocity scales by the same sqrt(m) factor as we found in all of the previous examples.
  • Time scales by sqrt(m) as well!
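Both conclusions can be reproduced numerically for the simplest version of the problem: a train leaving a cliff horizontally, with no air resistance. This Ruby sketch assumes, purely for illustration, an HO scale model and a 200 ft cliff; neither number comes from the film, and the names are my own.

```ruby
G_FT_S2        = 32.174   # standard gravity in ft/s^2 (an assumed value)
HO_SCALE       = 1 / 87.1 # hypothetical model scale, chosen for illustration
PROJECTION_FPS = 24.0     # standard theater projection rate

# Time to fall a given height from rest: h = g*t^2/2.
def fall_time(height_ft)
  Math.sqrt(2 * height_ft / G_FT_S2)
end

real_height = 200.0               # ft; hypothetical cliff height
real_speed  = 88 * 5280 / 3600.0  # 88 mph in ft/sec
real_time   = fall_time(real_height)
real_range  = real_speed * real_time

# Criterion 1: same parabola, scaled by m, so height and range both shrink by m.
model_time  = fall_time(HO_SCALE * real_height)
model_speed = HO_SCALE * real_range / model_time

# Criterion 2: shoot fast enough that the short fall plays back at real length.
camera_fps = PROJECTION_FPS * real_time / model_time

puts format("time ratio:  %.4f", model_time / real_time)
puts format("speed ratio: %.4f", model_speed / real_speed)
puts format("sqrt(m):     %.4f", Math.sqrt(HO_SCALE))
puts format("camera rate: %.0f fps", camera_fps)
```

The time ratio and the speed ratio both come out equal to sqrt(m), and the camera has to run 1/sqrt(m) times faster than the projector.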

Time Does Scale

So we've determined that velocity scales according to the square root of the scale factor, and that time also scales by the square root of the scale factor. These are not independent results! Indeed, following the lead of theories such as special relativity, the fundamental equations are best taken as a transformation of the basic coordinates; that is to say, the basic equations are:
    d' = m*d
    t' = t*sqrt(m)

One can then easily derive the velocity transformation by applying these transforms to the scaled velocity equation v'=d'/t' to get v'=v*sqrt(m).

But Why?

But why does time scale in this seemingly absurd way? It is very counter intuitive, even to me, that this would be the case.

As it turns out, the common thread is gravity. The pendulum uses gravity to drive it. Superelevation is determined by balancing gravity with centrifugal acceleration. And, of course, projectile motion is the quintessential problem of gravity.

More specifically, gravity is scale invariant. No matter how you scale your model railroad, the gravitational acceleration roots you down to a very specific scale of motion. In other words, you have no choice but to accelerate at a real 9.8 meters per second per second, no matter how you scale!

This scale invariance, then, gives a way for both the brain, and for measuring devices, to determine the scale factors involved, unless you change the rate of time!

So why does the rate of time change the way it does? The easiest way is to use a bit of differential calculus. But that is beyond the scope of this article. Instead, I will note that the constraint on how to scale time is imposed by the quadratic relationship between distance and time under the effects of gravity. More technically, since d is proportional to t^2, and we know that d scales according to d'=m*d, it necessarily constrains t to scale by t'=t*sqrt(m).

So Now Then

Modelers as a whole have settled on what feels like a reasonable scaling methodology (space scales, but time does not). But to anyone serious about modeling the physics of trains correctly, it is important to realize that it isn't a matter of what feels right, but of what is right.

And as much as everyone feels that time does not scale, the physics does not agree. Indeed, time must scale in order to replicate the balance of the physical forces involved.

Monday, July 14, 2008

Essay: Scale Railroading (Part 1: Space)

If you spend much time around railroad modelers, they tend to talk a lot about scale, and particularly in reference to speed.

Being a physicist, I've put some thought into scaling things, and don't agree with the reasoning for how to derive scale speed. But that is for Part 2: Time.

In this part, I want to cover some basics about pure spatial scaling, and how standard layouts and the real world already agree very poorly.

Length is very important. But how big is a model layout in the real world? How long would a real North American train be in a scaled layout?

Prototypicality

Train Length

Let's answer the latter question first. In North America, trains are often 100+ cars long, resulting in lengths of around a mile. Given the most favored modeling scale, HO, which has a 1:87.1 scale factor, 1 mile is a hair under 61 feet!

This means that a scale model railroad of North American trains needs 60 feet just for a single train!

To put this in perspective, an 8'x4' layout, which is fairly common to find in spare bedrooms, only gives about 20' on an oval track around the edge, which is only about 1/3rd of a mile!

So how does this scale relate to the real world?

The Shopping Mall

Doing even a small real life layout would take a space about the size of a shopping mall. Indeed, let's look at a mall of about average mid-sized town length. Spokane Valley, WA, and Santa Rosa, CA, each serve communities of around 1/4 million, and each have a mall that supports three department stores, and are of a length-wise construction. In each case, if you ran an HO scale layout from the far end of one department store, through the main corridor of shops, and to the far end of the opposite department store, you'd have about 1400ft in which to model. Translate this to miles, and you get only about 23 scale miles of layout! This is smaller than many steam excursion branch railway lines!
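The length conversions used so far are simple enough to sketch in Ruby. The method names are my own; the only inputs are the 1:87.1 HO ratio and the distances already quoted in the text.

```ruby
HO_RATIO    = 87.1   # real feet per model foot in HO
FT_PER_MILE = 5280.0

# Real-world length (ft) to the length of its HO model (ft).
def model_feet(real_feet)
  real_feet / HO_RATIO
end

# Length of model track (ft) to the scale miles it represents.
def scale_miles(model_ft)
  model_ft * HO_RATIO / FT_PER_MILE
end

puts format("1 mile train:  %.1f ft of track", model_feet(FT_PER_MILE))
puts format("20 ft oval:    %.2f scale miles", scale_miles(20))
puts format("1400 ft mall:  %.1f scale miles", scale_miles(1400))
# prints:
# 1 mile train:  60.6 ft of track
# 20 ft oval:    0.33 scale miles
# 1400 ft mall:  23.1 scale miles
```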

Trees Are How Tall?

While this puts a lot into perspective, I do find the most interesting case--and the one that will actually surprise a lot of modelers--to not be a geographic distance, but instead the scale modeling of an evergreen tree.

The Northern California coast is filled with dense redwood forests which were, and still are, logged of their trees. These trees, the tallest in the world, can grow over 300 feet tall and become over 26 feet in diameter, and in the era in which they were logged, it was common to find trees 200 feet tall and 18 feet in diameter.

Thus, if you were modeling a logging railroad, you'd need redwood trees that are nearly 3.5 feet tall whose trunks are made from dowels about 2.5" in diameter! I dare you to build a layout with trees that large and not have all your friends call it absurd.

So how big are those 6" tall model evergreens you probably have on your layout? A measly 44 feet, which is about 1/5 the maximum height of a Pacific Coast Ponderosa Pine tree!

Curves

Lengths are all fine and dandy, but something that modelers tend to pay less attention to is how tight those curves really are. This is compounded by the fact that track manufacturers tend to sell pre-molded curve pieces that are sizes like 15", 18" and 24".

But just how tight are those curves? To answer that, we must first learn how curves are measured.

Surveying Curves

Railroads survey their curves by using what is called the "degree" of the curve. According to an excellent article by Robert S. McGonigal in TRAINS Magazine, this is done by measuring the change in heading of a piece of curved track that is 100ft in length. Expressed mathematically, the radius of a curve in feet can be found by dividing 5729 by the degree of the curve.

Furthermore, the article goes on to explain that for a typical mainline, curves are limited to only 1-2 degrees, but that mountainous territory dictates curves of around 5-10 degrees. Furthermore, the limit for a four-axle diesel with rolling stock is about 20 degrees and the locomotive by itself can handle curves up to about 40 degrees.

Scaling Curves Down

So how do these real world curves relate to an HO scale model?

As modelers tend to like the beauty of mountainous scenery, let's start with those 5-10 degree curves. In the real world, these correspond to a curve radius of about 570-1150 feet. Scaled down, these correspond to relatively giant curve radii of 79"-158"!

But if those curves are so tight, how tight are those model curves that the track manufacturers make?

As it turns out, the standard 18" curve is already nearly a 44 degree, 130 scale foot curve, which is already in excess of the maximum for a four-axle engine without any rolling stock!

So what are the limits? For a 20 degree curve (287ft radius), you do not want to have smaller than a 39.5" curve radius, and for a 40 degree curve (144ft radius), you do not want to have smaller than a 20" curve radius.

Indeed, as a rule of thumb, an HO scale modeler can divide 800" by the desired degree of curve and get a pretty good estimate of the necessary curve radius for the model. Equivalently, you can divide 800" by the radius of a curve on your layout and get a fairly good estimate of the curve degree.
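Here is a short Ruby sketch of the exact conversion behind that rule of thumb, using the 5729 ft constant and the 1:87.1 HO ratio from above. The method names are my own.

```ruby
HO_RATIO         = 87.1   # real feet per model foot in HO
DEGREE_CONSTANT  = 5729.0 # radius in feet = 5729 / degree of curve
INCHES_PER_FOOT  = 12

# Real-world radius (ft) of a curve of the given degree.
def real_radius_ft(degree)
  DEGREE_CONSTANT / degree
end

# HO model radius (inches) needed to represent a curve of the given degree.
def model_radius_in(degree)
  real_radius_ft(degree) * INCHES_PER_FOOT / HO_RATIO
end

# Degree of curve that a given HO model radius (inches) represents.
def degree_of_model_curve(radius_in)
  DEGREE_CONSTANT * INCHES_PER_FOOT / (radius_in * HO_RATIO)
end

puts format("20 degree curve: %.1f in radius", model_radius_in(20))
puts format("40 degree curve: %.1f in radius", model_radius_in(40))
puts format("18 in curve:     %.1f degrees", degree_of_model_curve(18))
# prints:
# 20 degree curve: 39.5 in radius
# 40 degree curve: 19.7 in radius
# 18 in curve:     43.8 degrees
```

The exact constant works out to about 789", which is why dividing 800" by the degree gives a close estimate.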

Of course, in real life, if you plan on having speed, you'll want to model those curves of only 1-2 degrees, which correspond to radii of around 1/2-1 mile. But most people don't have the room to construct a prototypical curve in their layout, as a 30'-60' curve radius is bigger than pretty much any layout.

Saturday, July 12, 2008

Rant: Proper Planning is Pivotal

My brother and I are, on the side, developing some software for our own enjoyment. However we take our development effort seriously, and have spent a good deal of time writing small technology tests and, most importantly, documenting our specification so that we can agree on the path that we are taking.

Our spec is mainly composed of two things:

  1. The feature specification, and
  2. the architectural plan.

Both of these are important. Even if everyone agrees on the features that should be implemented (and we'll assume that there weren't the normal communication issues involved in doing this without a written spec), everyone's programming/architectural styles can be incredibly different.

Unfortunately, this can lead to a mess later.

The Draw

It is very tempting to ignore this obvious fact, especially when you have small teams of developers working under tight timelines and budgets. Your team will find itself having one-hour meetings to verbally discuss a feature and how it should work, and then the developers will all run off to their caves and start writing code.

Initially this actually works out really well. Everyone is building their separate pieces, plugging along and writing code in the way they write best, which results in quick turnaround.

It is a manager's best dream. Everything is getting done ahead of time and under budget!

The Problem

But what happens when the pieces have to start fitting together?

At some point the pieces that everyone writes have to start fitting together. If everyone is writing to their own architectural drum in their own style, how easy is this integration going to be?

Even worse, how many of your developers are good architects? Did those quick up-front times result from the developer doing whatever was quickest to get their pieces done? What happens when, as anything in real life happens, the spec changes a little? How about when it changes a lot?

And none of this includes what happens when developers have to work on each other's code. Where is a document to help them figure out what was done, especially since the code is so incredibly different from their own?

It Happens

These aren't all hypothetical questions. I've encountered this problem in the real world. Specs aren't thought out and aren't documented to save time. It's alluring. I've even bought into it. Turn around times are fast and everyone is happy...

...at first.

But after a year on a project, this can start to turn around. You start discovering all the webs of spaghetti code that were woven to meet deadlines, and the hacks involved when a developer didn't want to spend the time to ask someone else how to integrate with their component, or the bugs that appear because of the bad assumptions that lazy programmers will make.

I've seen all these problems over and over again, and sometimes these problems can bog down development to the point that whole sections of code must be stripped out and re-written to get around it.

The Solution

In programming, the money is where there are hard problems that need to be solved, and hindsight is 20/20. So no matter how much planning you do, you'll never be happy with your solution. There is always something that you should have done.

Which is why architecture is so important. If architected properly, a system is set-up so that when these problems are found, they are easier to solve. One must strive for expandability and refactorability, but without spending time creating interfaces and functionality that are never used.

As I always say, a good software architect dares to dream and plans ahead for the future, but knows where to stop to keep development times and costs down.

For this to succeed, there must be something that is in control. Something that is guiding everyone else towards a common goal.

That is the architectural spec.

The Spec

Of course, someone has to create the spec. That should be the job of multiple people. Ideally, there should be an open dialog between the developers that are writing new code, the developers that wrote any code that is being interfaced with, and an architect that can coordinate the input from these developers into the common design goals and styles of the application as a whole.

And this is easy to do with the small teams that are usually lured into ignoring spec writing altogether!

Words of Advice

My advice to you: whether you are two programmers or hundreds of programmers, specification documents, and clear architectural leaders, are important to the overall long-term success of your software.

You may be tempted by The Devil to ignore proper specification in order to achieve the impossible up-front, but like any deal with The Devil, you better be prepared to tackle the ugly consequences that follow.

Wednesday, April 23, 2008

Observatory: Fear

The only thing we truly fear is the unknown.

We are afraid of snakes and spiders because we don't know if they are going to strike.

We are afraid of heights because we don't know if we are going to fall.

We are afraid of strangers because we don't know if they have bad intentions.

If we take the time to learn about the thing we fear, we will find that we are no longer afraid.

It is only then that we can make responsible decisions.

Thursday, November 29, 2007

Tip: Bug in Ruby 1.8.5 on 10.5 Leopard

When I first got Mac OS X 10.5 Leopard, one of the first things I had to do was compile and install ruby 1.8.5 in order to facilitate my job as a Ruby on Rails developer (as we are just getting around to upgrading our thousands of lines of enterprise code to work in rails 1.2.x).

I downloaded, compiled, and installed ruby 1.8.5 p114, and for the most part things went smoothly.

However I would intermittently get the weirdest error:

NoMemoryError (negative allocation size (or too big))

I still don't know what the root cause of this error is, but as of this morning it seems that downgrading to p52 fixed my problem.

Hopefully this will pan out in the long run, but so far this error (which I'd see several times an hour during active development) has disappeared.

Tuesday, November 20, 2007

Rant: Auto-Refresh News Sites

I was reading an article at www.computerweekly.com this morning. Like many other news sites, they seem to think that they need to force a server side refresh on the whole page every five minutes to reload an article so that I can have the latest, greatest news in some ticker.

In this case the problem was that their server is flaky, and on refresh, it gave me a "Server Too Busy" error! So now instead of getting to read the article (which was apparently in need of updating because the content of individual articles changes so frequently), I get to stare at an error message generated by a Microsoft Web Server.

This is just the most infuriating case in a string of annoyances with auto-refreshing news pages.

One thing I consistently find annoying is how Google News has to refresh the news listing on me, while I'm browsing through the article summaries.

Indeed, it always seems to do it when I am about to click on the link to an article I think would be very interesting to read, and it always seems that that article has rolled off the bottom of the list and I have to dig around if I want to read it!

Now the question on my mind is "why"? Why do we need to have our news updated so frequently (or at all)? Did so many important things happen in the course of the ten minutes I was reading the site that I need to have it updated?

And if I'm that addicted to news, what am I going to do for the hours that I'm actually working? Or for that matter, how do I possibly think I could make it through a night's sleep!

Or is it that they think we are so freakin' lazy that we can't hit 'refresh' several hours later to reload the page if we want to be ten extra minutes up to date with the cutting edge news?

Wednesday, October 31, 2007

How To: Manage Your Own Subversion Repository In Leopard

Mac OS X 10.5 Leopard ships with Subversion 1.4.4 pre-installed. It also ships with Apache2 pre-installed. It does not, however, ship with a pre-installed subversion repository configuration.

So let's say you want to create your own subversion repository host on your Leopard box for your own source code management goodness?

You could go to the subversion homepage and download the free svn book and sort through the instructions trying to figure out how they apply to you... Or you could follow these simple directions which I've laid out for you.

Make a Repository

The first thing you need to do is to make a repository. Actually, for my needs, I had to make multiple repositories, so these instructions will set everything up to make that work. It really only changes two steps anyway, so it isn't a big deal.

Now I decided to make my repository collection root directory be in /Users/Shared/, but you can really make it be anything you want, including the ever popular /usr/local. Just be sure to replace /Users/Shared/ with your directory of choice whenever necessary.

Anyway, I opened Terminal and entered the following commands:

$ sudo mkdir /Users/Shared/svn
$ sudo mkdir /Users/Shared/svn/reposname
$ sudo svnadmin create /Users/Shared/svn/reposname
$ sudo chown -R www:www /Users/Shared/svn/reposname

Note that you can create multiple repositories by following these directions but replacing every instance of reposname with the name of the repository you want to use. Thus, if you have multiple repositories, you will have multiple directories in /Users/Shared/svn.
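If you have several repositories to set up, the commands above can be generated rather than retyped. The following is only a sketch: repo_commands is a hypothetical helper (not part of Subversion), the repository names are made up, and it prints the commands instead of running them so you can review them first. It uses mkdir -p so the parent directory is created along the way.

```ruby
require 'shellwords'

# Hypothetical helper: builds (but does not run) the setup commands
# for each repository name under a shared parent directory.
def repo_commands(parent, names)
  names.flat_map do |name|
    # Escape the path so unusual repository names are safe in a shell.
    path = Shellwords.escape(File.join(parent, name))
    [
      "sudo mkdir -p #{path}",
      "sudo svnadmin create #{path}",
      "sudo chown -R www:www #{path}"
    ]
  end
end

# Example repository names; substitute your own.
puts repo_commands('/Users/Shared/svn', %w[projectone projecttwo])
```

Paste the printed lines into Terminal once they look right.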

Make Access

Most directions do this later, but I'm going to do it now because I think you are smarter than that.

You might want to create a password file, unless you want full public access to your repository. For our purposes, simple HTTP basic authentication is fine, but remember that the password is only weakly encoded and the traffic isn't encrypted at all, so a snooper could get to the information if you access your repository from outside of your own computer.

So if you do want to use authentication, create the password using the following command, substituting username for a user name of your choice, and following the directions for password creation:

$ sudo htpasswd -cm /etc/apache2/svn-auth-file username

To add other users to the file, just ditch the c switch in the -cm options to htpasswd. The c stood for create, and since the file has been created you don't want it anymore.

Note that you can put the svn-auth-file anywhere you want, but this seemed like a good place for it in my mind. (Just remember where you hid it from yourself if you put it anywhere else.)

Apache Configuration

Navigate to /etc/apache2/other and use your favorite command line text editor as root to make a file named anything you want (I chose svn.conf, but you could name it foobar_banana.conf and it would still work!):

$ cd /etc/apache2/other
$ sudo vim svn.conf

Now that you are editing this file as root, you want to make it contain the following bits, and save:

LoadModule dav_svn_module /usr/libexec/apache2/mod_dav_svn.so

<Location /svn>
    DAV svn
    
    SVNParentPath /Users/Shared/svn
    
    AuthType Basic
    AuthName "Subversion repository"
    AuthUserFile /etc/apache2/svn-auth-file
    Require valid-user
</Location>

Note that you can leave off all the authentication related stuff if you didn't want authentication on your repository. Also note that you need to fix the SVNParentPath and the AuthUserFile if you varied from my directions.
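If you did vary from these directions, one way to keep the two paths straight is to generate the file text from them. This is just a sketch: svn_conf is a hypothetical helper, and it only reproduces the directives shown above. Passing no auth file leaves the authentication lines out.

```ruby
# Hypothetical helper: renders the svn.conf text from the two paths
# that may differ on your machine.
def svn_conf(parent_path, auth_file = nil)
  lines = [
    'LoadModule dav_svn_module /usr/libexec/apache2/mod_dav_svn.so',
    '',
    '<Location /svn>',
    '    DAV svn',
    "    SVNParentPath #{parent_path}"
  ]
  if auth_file
    # Only include the authentication block when a password file is given.
    lines << '    AuthType Basic'
    lines << '    AuthName "Subversion repository"'
    lines << "    AuthUserFile #{auth_file}"
    lines << '    Require valid-user'
  end
  lines << '</Location>'
  lines.join("\n") + "\n"
end

puts svn_conf('/Users/Shared/svn', '/etc/apache2/svn-auth-file')
```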

Restart Apache

Now restart Apache. This can be done in the Sharing panel of the System Preferences application. Just click to turn off, and then back on, Web Sharing.

Now, if you didn't make a mistake, you should be ready! Try going to http://localhost/svn/reposname (where you need to put the repository name you chose earlier instead of reposname!) and see what happens.

If you are lucky you'll see revision 0 of your repository. But most people are human and will have made a typo that results in an error. For hints on what created the error, try checking out /var/log/system.log and /var/log/apache2/error_log. (And as a bonus, the Console application works great for monitoring these logs as they are written to!)

Where From Here?

And now you are ready to use your repository. At this point I figure you already know how to use SVN and don't need my help anymore. But if you need to know how to use SVN, just refer to their book, which tells you everything you need to know, including how to make your server better!

EDIT (11/6/07): Forgot to encode my character entities in the example script source. I fixed this so that the <Location> tag actually shows now.

EDIT (04/23/08): The chown line now uses the full path as it should. Thanks to all those who pointed this confusion inducing mistake out.

Tuesday, October 23, 2007

Video: Keep It Simple Stupid!

Being that I'm a simple-is-sexy kind of person, I've always been flabbergasted by the busy packaging so many corporate marketing teams put out.

As a developer, I've learned that the KISS principle (Keep It Simple Stupid) is very important for solving problems. And as a teacher I've experienced the confusion an onslaught of information can provide. Indeed, in most situations, whether you are a scientist, an engineer, a teacher, or even a parent, keeping it simple is important.

Which is why I loved the following video. It does such a good job of punctuating the differences between Apple's approach of simplicity, against the corporate world's attempts to attract attention while cramming the package with information.

Of course the ironic thing is that, with all these super-busy boxes on the shelves, the one that actually stands-out is the simple-sexy one.

Rant: Rails Lacks Accessor Bottlenecks

Being a longtime user of (and zealot for) Ruby, I have made a name around the office for my unusually strong understanding of its details. Thus my employers decided that, out of our team of developers, I should be the one to write the security sub-layer for our Ruby on Rails based chiropractic applications (as it would require digging into, and understanding, the internals of Rails).

Now before I proceed to beat Rails around a little bit, I don't want you to get me wrong. I love using Ruby on Rails and would easily choose to do this project (and future projects) in it again, but sometimes I find some of their design decisions to be, well, web-developerish, while we need a more robust enterprise-developerish solution.

A Bottle With Too Many Openings Can't Hold Water

One of the design patterns that I have seen over and over again in Object Oriented APIs is the use of bottle-necking. That is to say, even within your own class, you choose to use a set of accessors to get at instance variables instead of poking at them directly.

The big benefits to this approach are twofold: It allows you to be more agile with changes (as you only have to edit the accessors to change behavior), and it allows third party developers to easily augment or override the default behavior of your code.

A simple example of this may be a vector class that stores a magnitude and an angle. Now this angle, in all reality, needs to be between 0 and 360 degrees. So if you have two ways to set this data (from rectangular or polar coordinates), you would have to enforce these limits, probably by using a modulus operator. (Which isn't very DRY, now is it?) But what happens when you have to change the behavior? Maybe the angle needs to be stored in radians, maybe the range needs to be changed to -90 to 90, or maybe it needs to be compass oriented instead of right-handed-axis oriented? Or maybe you just want to add a way to set the angle without changing the magnitude?

All of these require refactoring your code, which requires duplicating and/or altering your code in several locations! But if you were to make a single accessor for getting, and a single accessor for setting the internal value, then all your other code could go through this accessor (including possible intermediate accessors) to enable fast, agile, and DRY code changes!
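To make the vector example concrete, here is a minimal sketch of what such a bottleneck might look like. The class name PolarVector and its methods are invented for illustration. The point is that the range rule lives in exactly one place (angle=), and everything else, including the constructor, goes through it.

```ruby
# Illustrative class (hypothetical names): every write to the angle
# funnels through angle=, so the 0...360 rule is written only once.
class PolarVector
  attr_reader :magnitude, :angle

  def initialize(magnitude, angle)
    self.magnitude = magnitude
    self.angle = angle
  end

  # Alternate constructor from rectangular coordinates; it still ends
  # up in angle= via initialize, so no modulus is repeated here.
  def self.from_rectangular(x, y)
    new(Math.hypot(x, y), Math.atan2(y, x) * 180.0 / Math::PI)
  end

  def magnitude=(value)
    @magnitude = value
  end

  # The bottleneck: change this one method to switch to radians,
  # a different range, or compass orientation.
  def angle=(degrees)
    @angle = degrees % 360
  end

  # Even internal code uses the accessors instead of poking @angle.
  def rotate(degrees)
    self.angle = angle + degrees
    self
  end
end

v = PolarVector.new(2, 450)
puts v.angle               # 450 wraps around to 90
puts v.rotate(-180).angle  # 90 - 180 wraps around to 270
```

A subclass that wants different behavior only has to override angle= and everything else follows.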

Playing With Steam Engines

So, now, let's imagine your employer gives you the spec that your enterprise Rails driven software must have a security system where you can restrict read and write ability to different models, on a per attribute basis. This is what I faced over a year ago.

The way I really wanted to solve this was to override the accessors to the basic hash that stores, as key/value pairs, all the data for the table row the object represents. That way I could add my own code that would allow or deny access at this fundamental level.

Unfortunately this basic and simple idea (based on past experience with other APIs) exploded in my face. You see, the @attributes hash instance variable that ActiveRecord uses to store these key/value pairs is directly poked and prodded by many separate methods throughout the ActiveRecord class. Thus I had to override all of these methods, making them call the same friggin security check method, to decide if they could continue onto their default implementation or if I should restrict access. (I'm only lucky that the runtime generated accessors utilized standard accessors, or else I'd be re-writing the code-generation routines as well.)

Of course, all this work meant that besides sinking a lot of the company's time and money into implementing a task that should have been simple, maintaining our release against newer versions of Rails is more costly because of all the hacks I've had to do to meet the requirements.

Which is why I'm so peeved at their code. If only they had followed good design practices and bottle-necked all calls to @attributes through a read and write accessor method, it would have been easy to implement customized behavior at this point in the program flow, and thus would have made my security requirements a trivial task.
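For illustration, here is a rough sketch of the design being wished for. It is plain Ruby, not ActiveRecord, and every name in it (Record, SecuredRecord, read_value, write_value, AccessDenied) is hypothetical. The assumption is simply that all reads and writes of the attribute hash pass through two methods, so a subclass can add permission checks by overriding just those two.

```ruby
class AccessDenied < StandardError; end

# Hypothetical stand-in for a model. Only read_value and write_value
# ever touch @attributes after construction.
class Record
  def initialize(attributes = {})
    @attributes = {}
    attributes.each { |name, value| @attributes[name.to_s] = value }
  end

  def read_value(name)
    @attributes[name.to_s]
  end

  def write_value(name, value)
    @attributes[name.to_s] = value
  end

  # Convenience methods go through the bottleneck too.
  def [](name)
    read_value(name)
  end

  def []=(name, value)
    write_value(name, value)
  end
end

# Per-attribute security needs only two overrides.
class SecuredRecord < Record
  def initialize(attributes = {}, readable = [], writable = [])
    super(attributes)
    @readable = readable.map { |n| n.to_s }
    @writable = writable.map { |n| n.to_s }
  end

  def read_value(name)
    raise AccessDenied, "cannot read #{name}" unless @readable.include?(name.to_s)
    super
  end

  def write_value(name, value)
    raise AccessDenied, "cannot write #{name}" unless @writable.include?(name.to_s)
    super
  end
end

patient = SecuredRecord.new({ :name => 'Pat', :ssn => '000-00-0000' }, [:name], [])
puts patient[:name]
begin
  patient[:ssn]
rescue AccessDenied => e
  puts "Denied: #{e.message}"
end
```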

Sunday, October 21, 2007

Rant: Marching Forward; Why Should I Upgrade?

With the oncoming release of Mac OS X 10.5 Leopard, I feel compelled to rant about a post I saw on the discussion board for the open-source software AudioSlicer.

On this forum there was a post in which the author seemed to be taking out his frustrations on the AudioSlicer developer because the software was not backwards compatible with his old version of Mac OS X.

Now, if I picture myself as the average user, ignorant of anything to do with software development, I can see where the guy is coming from. However, as a developer, I find his attitude to be quite maddening, particularly because AudioSlicer is one man's solution to a problem he was trying to solve for himself, and he was nice enough to share it with anyone who wanted to download it!

Building a Home Of Code

Building software is kind of like building a home. Each operating system gives you a set of tools and building blocks to build your "home", and then you figure out how to stack them with building blocks of your own and glue them all together to make an application for people to enjoy.

In the old days, there were no power tools and, depending on the place and time you lived, you might even have to mix your own plaster and maybe even cut down your own trees. Thus, most of your time, energy, and money went just into building things for construction, much less building a complex and interesting house.

In modern times, however, you are lucky enough to be given power tools, pre-cut lumber, drywall, and fiberglass insulation, allowing you to spend more time on building the house the way you want it, and less time preparing to build.

Now what happens if someone wants you to build that nice modern home but with old fashioned techniques and materials? You'd have to spend inordinate amounts of time re-inventing technologies such as drywall and fiberglass fabrication, and spend large amounts of time overcoming the lack of power tools and even back-hoes!

No contractor would dare to take on this task because it is just not economically feasible for anyone to build a modern house using old-time building techniques.

The same is true for software developers. We are given many building blocks and tools from our operating system manufacturer that smart people have spent months or even years developing and testing, and then we take advantage of their hard work so that we can focus on building better software applications for end users.

Jaguars, Panthers, and Tigers, Oh My!

Most users were very unimpressed by OS X 10.4 Tiger's release, and I can agree with them. From a user's perspective, it was a lackluster release. It was hard for the average person to justify spending $130 to upgrade from their comfortable and productive little OS X 10.3 Panther machines.

But what these people didn't realize was the massive amount of really cool new building blocks Apple gave developers. These new building blocks (called APIs if you want to learn a new word), got developers very excited as it allowed them to make much better software for a lot less time and cost.

As a result, several months after Tiger was released, a flood of cool software came out that could only run on OS X 10.4 Tiger. This, of course, caused all the people who wanted to stick with OS X 10.3 Panther to complain that no one was releasing their cool new offerings for them.

The Amish Can't Have Cell Phones

As much as many people hate it, technology marches forward at a breakneck pace. What was state-of-the-art in computing four years ago is slow and a useless novelty now. It may be cheaper not to upgrade, but to stay in the past is to be left behind.

If you refuse to buy pre-manufactured building materials and use power tools to build your house, you can't expect the same quality of house for the same costs and built in the same small amount of time.

So why do you expect anything different from software developers?

Misunderstood

Development is a very long, complicated, and expensive process. Most software projects--even while taking advantage of all the tools and building blocks they have available to them--still run significantly overtime and over-budget. So how do you expect developers to find the extra time and money to spend the months (or even years) necessary to re-invent building blocks from scratch, just so that you can live in the past?

Indeed, in reality, many new pieces of software that require new building blocks were only ever made because the new building blocks exist. CoreData and CoreImage allow developers to build database-driven libraries of scanned images, and Cocoa Bindings allow developers to build whole CoreData driven GUIs while touching hardly any code. Without these nifty new technologies, cool applications like Delicious Library would be such a chore to make that only companies with large development and marketing budgets would be able to build them.

Change Course Into The Wind

The world is always changing, and technology is no different. People are scared of change. But those who do not cope with change get left behind while those that embrace change become successful. As much as it can be a large time, energy, and monetary investment to change, it is worth it to yourself and your family to learn and adapt...

...or don't complain when you get left in the dust.

Thursday, October 18, 2007

Rant: Fixing Bugs is Like Solving a Crime

So I got a bug report the other day, from our own staff member no-less, that simply said: "Can not 'submit fixes' in finalize screen."

Now let's think about this for a minute. Fixing bugs is very much like solving a crime. You are presented with a bunch of clues about what went wrong, and now you have to follow those clues down different avenues to piece together what went wrong (which is a very time consuming process to do right). But you have to do this, because you can only fix the bug when you understand why it exists.

So, now, if you were a police detective and someone called with the brief report "I found a dead body", would you be able to catch the murderer? You don't even know where the body was found to be able to start collecting crime scene data with your trained eyes. It would, thus, be an impossible case to solve with just the information provided from your well intentioned, possibly anonymous citizen.

So, then, how does the submitter of such a brief bug report expect me to solve their bug?

The Things You Need To Provide

For a bug to be successfully fixed, the developer has a few things that he needs to know.

Although it is very important to give your computer's specifications and the software version that threw the bug, it turns out that it is actually more important for you to supply the following information:

  1. What you were trying to do,
  2. What you were doing before the bug occurred,
  3. What you expected to happen when the bug occurred,
  4. What you expected to see when the bug occurred, and
  5. What you saw when the bug occurred.

Of course, just like a murder investigation, the more information you can take the time to give, the more likely a developer can fix your problem.

Remember that a developer has to be able to re-create the bug you saw in order to fix it, so if you don't give them enough information to make the same thing happen on their own computer, they will possibly just jump to the next bug in their bug list!

And Why You Should Give It To Them

Of course, the main reason why you should give all this information to them is because you want the bug to be fixed! But you should also realize that giving all of this information also saves both you and the developer time (and money in most instances).

For one, it usually takes you about the same amount of time to describe a bug up-front or after asked for details later, so being so brief doesn't actually save you any time in the long run.

By not giving the information up front, however, the developer loses quite a bit of time due to a bunch of small reasons, such as lost time in bug list triage, inability to group and collectively solve related bugs, and loss of time in having to contact users for details.

And, I'm not going to lie; commercial companies are driven by money. So if it is going to cost them money to get the details on a minor bug, they are likely to just bump it to the bottom of the list and go on to more important things. It sucks, it may even seem wrong, but I guarantee you every software company does it (and that you would do it too).

Last Words

If there is one last thing I want to impress upon you, it is that, despite all my harsh words for people that don't give details, having a bad bug report is still better than no bug report. We can't fix a problem we don't know about, and appreciate any information we can get. Just remember that it can be frustrating for everyone if you don't spend the extra two minutes of your life to give us some details.

Wednesday, October 17, 2007

Article: Egos Stop Innovation; How to Have a Discussion

Any academic professional, from a software architect to a physicist, is at their peak innovative performance when they can effectively communicate, discuss, and refine their ideas with others.

Unfortunately, it seems that a large number of people are more concerned with their own egos than with innovation, as evidenced by their inability to communicate with others. It seems that these people are always irrationally attacking ideas that are not their own while taking an emotional bias towards ideas that are their own.

This is a natural thing for people to do. It is in our blood. We evolved from the genes of the top-dog alpha-males and their mating successes.

But today should be different. The human race is now capable of attaining much greater heights if we work with others instead of against them.

Take Quantum Mechanics, for example. Quantum Mechanics was not the invention of a single mind quietly working away. No. It is the hard won innovation that resulted from many great minds working together to solve a common goal.

So how can we keep from being the jerk down the hall that no one wants to work with, and help to further the innovations of yourself and your company, making your managers happy and helping you to attain popularity, love, and wealth?

I Like Friends

Let's try to learn by example.

I had two friends that, through countless discussions and debates, showed me most of what I know about a successful exchange of ideas.

One of these friends was infuriatingly calm and methodical in his approach, but his goal was always to lead to a common understanding of the truths behind the material we discussed.

The other of these friends was irrational and stubborn, always hanging onto his idea no matter how well it could be proved false, and then would stomp off in a hissy fit whenever he was defeated.

What I learned from all this is contained in the ten rules below; but before I go there I wanted to follow my own rules and define two terms. These definitions aren't the dictionary definitions of these terms, but as long as you can understand my definition then you can follow what it is I'm trying to say.

Defining Talking

The way I see it, there are essentially two ways to exchange and refine ideas with others: discussion and debate.

I define "discussion" to be the friendly and logical open exchange of ideas, where the goal of everyone involved is to reach a new, common understanding of the material, knowing that this will most likely be different than any of the ideas brought to the table by anyone there.

On the other hand, I define "debate" to be what happens when a discussion breaks down into egos and arguing, caused by even just one person not wanting to budge from their flawed arguments, leading to an overall breakdown of the process of innovation.

The Ten Commandments

These rules take practice and hard work to follow, but following them is important not just to others, but to yourself as well.

One last thing before I start, though: I should note that rules 1-3 are mostly concerned with how to hold yourself, rules 4-7 are about communication, rules 8 and 9 are about arriving at a conclusion, and rule 10 is stating an obvious fact that people seem to forget about in the heat of a debate.

So without further ado, on with the show!

1. Be civil; always treat other people with respect and dignity.

People will only take you seriously if you treat them like an intelligent human being. If you let your frustration take over, you run the risk of insulting another person, causing them to close themselves from your point of view, destroying the whole process.

2. Place your ego aside; readily admit when you are wrong.

I doubt the knowledge you bring to a discussion is without flaws, inaccuracies, and other mistakes. Therefore you need to know and admit the limits of your knowledge. Admitting when you are wrong is probably the biggest and hardest step for people, but being ready to admit when your idea just isn't right is an important part of innovation. Put another way: don't let yourself look like an idiot by defending a lame-duck idea to the bitter end. People just won't ask for your input anymore because no one likes a self-centered, stubborn donkey!

3. Be open to new and different ideas; put yourself in the shoes of others.

Great thinkers are able to view things from many points of view other than their own. You do want to be like a great thinker, right? It is important, then, that you put yourself into the shoes of people presenting alternative (and usually contradictory) ideas and try hard to understand why they support that idea. This can help you either rebut their idea, accept their idea, or realize that there is no way to agree.

4. Make sure everyone agrees as to what the question really is.

As stupid as it seems, I have seen (and been in) many discussions or debates where each person thought a different question was trying to be answered! This, of course, causes much frustration. If it seems like the other person isn't understanding, try rephrasing what it is you are trying to find out, and see if they agree that that is the question at hand.

5. Define terms; be vigilant of disagreements caused by different definitions.

One of the funny things that often happens is that communication breakdowns can be the cause of many long discussions where everyone actually agreed the whole time. For example, I was in a debate with someone once where, after three hours, we found out that we were using slightly different definitions of the word "money". Once we hammered out a common definition, we suddenly found that we never disagreed on the real question at hand! This happens more often than one would think! So be vigilant of disagreements stemming from different definitions of terms and try to nip them in the bud.

6. Listen patiently and carefully to what others say.

This is really a two fold problem. One is that people get in such a hurry to say what is in their mind that they stop listening to what everyone else is saying and just want to blurt out their own thoughts. But listening turns out to be one of the most important skills in innovation. So don't be a jerk, listen up! The other part of this is that people naturally interpret, filter, and infer the words of others. It is important to pay attention to detail and make sure you understand what they mean and to ask questions when you don't understand.

7. Say what you mean.

There seems to be some sort of mangle-o-matic filter between the brain and the mouth. Be careful to say what you mean, try to make statements that don't leave anything to inference, and be willing to re-explain yourself in different terms if someone is confused as to what you meant. (Seems simple? It is harder than you'd think!)

8. Strive to reach the crux of any disagreement.

In order to reach resolution on a disagreement, it is important to find the crux of what it is, exactly, that you disagree on. It is no fun spending three hours hammering over a topic just to find that the crux of the disagreement lay in a misunderstanding of a word definition. Peeling away the layers to reveal the point of disagreement quickly will save everyone a lot of time, energy, frustration, and headache.

9. Discussions hinging on personal values are doomed to become debates.

Some discussions have no agreeable resolution. This is especially true of many socio-political discussions. When the crux of a disagreement hinges on a personal value or opinion, there is no way to agree. Whether it be a disagreement over something as stupid as the best flavor of ice-cream or the best band ever, or it be over deep issues such as abortion, gay marriage, and the validity of your own religion, there just isn't an answer that everyone can agree on. This doesn't mean that you can't understand and respect what the other person believes, but it means that you'll probably never agree, and so should agree to disagree.

10. Use logic, facts, and reasoning.

This should go without saying, but it doesn't seem to be the case. People cloud their reasoning with emotion. This is, once again, part of being human. But if you want to convince someone of the validity of a viewpoint, you must always support that with facts and logical reasoning, while being careful to avoid such traps as logical fallacy, inaccurate facts, and mis-representation of your knowledge limits. (But if you follow the other 9 rules, none of this should happen to you, right?)

Every Article Needs A Conclusion

So there they are, in all their glory. Some simple rules that take a lot of hard work to follow; but will quickly make you that innovative, team-playing, cool-guy, that everyone wants to have on their team and at their parties.

Monday, October 15, 2007

Code: Passphrase Generator

In a couple previous articles I talk about the benefits of randomly generated passphrases. But how do I generate random passphrases you may ask?

The answer is that I use the following Ruby script that looks at the words list that is provided on every Mac OS X 10.4 box:

#!/usr/bin/ruby

words = []
File.foreach('/usr/share/dict/words') { |line| words << line.chomp }

selected_words = []
while( selected_words.length != 4 )
  w = words[rand(words.length)].capitalize
  selected_words << w if (3..6).include?(w.length)
end

puts selected_words.join

In the event that you don't have access to the same words list, you can use your own by changing the filepath in the second line to that of your own text file. Just put one word per line and save!

EDIT: In case you are wondering, I did have another 2-line version of the script that harnesses the power of ruby's API and syntax, but it actually runs slower, so I went with this one.
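For anyone who wants to reuse the idea with their own word list, here is one possible variation. It is a sketch, not the 2-line version mentioned above: the passphrase method and its parameters are made up for illustration, it takes the words as an array instead of reading a fixed file, and it swaps rand for SecureRandom from Ruby's standard library, which is the usual choice when the output is used as a secret. Filtering by length first and then picking uniformly gives the same distribution as picking and rejecting.

```ruby
require 'securerandom'

# Hypothetical helper: joins `count` randomly chosen, capitalized words
# whose length falls within `lengths`.
def passphrase(words, count = 4, lengths = 3..6)
  candidates = words.select { |w| lengths.include?(w.length) }
  raise ArgumentError, 'no words of a suitable length' if candidates.empty?

  picks = Array.new(count) do
    candidates[SecureRandom.random_number(candidates.length)].capitalize
  end
  picks.join
end

# A tiny made-up list for demonstration; a real list should have
# thousands of words (for example, the lines of your own text file).
sample_words = %w[pieces study smooth catch river a extraordinary lamp]
puts passphrase(sample_words)
```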

Saturday, October 13, 2007

Rant: Passphrases Continued -- Why They Haven't Caught On?

In a previous article I talked about passwords vs. passphrases, and why, in general, randomly generated passphrases are a better idea than passwords. But while discussing my results with others the question came up "if they are so cool, why haven't they caught on?"

I think there are several reasons for that. One of which is that, historically speaking, space was limited, so passwords couldn't be longer than 8 characters. Thus password culture has been built around an eight-character password, even though this limitation has disappeared from pretty much every system.

But I think the main reason is that they just don't feel secure.

Think about it. Which one feels more secure to you? Opjk8J2Q or PiecesStudySmoothCatch? With all the security experts always telling you that you need a password with symbols and numbers in it, while avoiding dictionary words, passphrases seem to go against everything we've been told!

But as my article showed, a random four-word passphrase that is generated from a dictionary of only about 4000 words is just as difficult to crack as a random 8-character password made of alphanumeric characters! Indeed, if the exact dictionary of words you are using for password generation is unknown by an attacker, it becomes even more secure, as they have to try passphrases with words that have no relevance to your system.
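The comparison can be checked with a little arithmetic. The sketch below assumes a dictionary of exactly 4000 words and a 62-character alphabet (upper case, lower case, and digits); those exact figures are assumptions for illustration, since the text only says "about 4000" and "alphanumeric".

```ruby
dictionary_size  = 4000          # assumed word count
passphrase_words = 4
alphabet_size    = 26 + 26 + 10  # assumed: a-z, A-Z, 0-9
password_length  = 8

# Number of equally likely outcomes an attacker has to search.
passphrase_count = dictionary_size**passphrase_words
password_count   = alphabet_size**password_length

puts passphrase_count  # 256000000000000
puts password_count    # 218340105584896

# The same comparison expressed in bits of entropy.
puts Math.log2(passphrase_count).round(1)
puts Math.log2(password_count).round(1)
```

Under these assumptions the passphrase space is slightly larger than the password space, at roughly 48 bits each.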

A Little Extra

While I'm here I do want to mention two extra thoughts that came up while discussing my previous article with others.

The first thing is that passphrases are really no good if people can choose their own, as people are likely to choose certain words more over others. This uneven distribution could be figured out through simple studies and then exploited by an attacker.

On the other hand, a random passphrase is so much easier for someone to remember that, in a secure situation, issuing passphrases that people cannot change will work out better than issuing random passwords (since people usually have to write down random passwords, creating a place to breach security by a spying party; e.g. a janitor) while being more secure than a password that someone chooses for themselves.

Remember, depending on the system, it only takes one weak password for an attacker to hack into a system and get access to password information to start doing brute-force attacks on administrator passwords!

Video: iPhone in a Blender

Last night I was checking my e-mail after a long evening of playing The Sims 2, and found that my mother-in-law sent me this video.

As someone who loves his two Macs (and rarely boots his PC because he has little use for it), I should be appalled by the wanton destruction of an iPhone. But the curious scientist in me wanted to see what happened, and I found myself quite entertained by the absurd and humorous presentation.



Friday, October 12, 2007

Article: Security Words

Ever since I was a kid, I've been fascinated with codes and cryptography. Of course, this topic has some great overlap with security, such as solving problems with storing, retrieving, and comparing passwords. So I was the obvious choice to implement the security engines for the software at the small start-up company that I work for.

Now this article is NOT about implementing security in Rails. This is a big and complicated topic that I do plan on writing about because implementing enterprise level security in Rails is not easy at all.

No, this is about another common topic. We often have to reset passwords for users. (We, of course, only store the salted, hashed form of the password so that no one can get them!)

Now most of the time, assigned passwords are generated out of random sets of characters. This is great. A randomly generated password is very secure. But it is really hard to remember.

So I got to thinking. What if instead of passwords, we used a passphrase? That is to say, instead of joining together a series of individual characters, what if we joined together a smaller number of words? For me, at least, it is easier to remember a series of words than random characters, as I can come up with some visual or some rhyme to help me remember it.

For example, most sys-admins might give you a random password that looks something like 2qLzj94k. But what if, instead, they gave you a passphrase like GreenRunDallasOrchard? I'd be willing to bet that you'd be much more likely to remember the 4 words than the 8 random characters.

But this gives rise to the question: How many words would I need to string together, and from a dictionary how large, in order to match the security of a random string of characters?

Now For The Math

If you hate math, you may want to skim this section. Though I am pretty sure that if you are still actually reading this, then either you are a technical individual and like math, or you are a manager who was forced to read this by your IT staff because you just don't get it.

Normal passwords can be generated using all the letters (uppercase and lowercase), digits, and a plethora of symbols. Looking at an ASCII table, there are 94 eligible characters that one could use in a password.

Strangely enough, though, most random passwords are generated from just the letters and digits, giving only 62 possible characters. Dropping those 32 symbols removes 5,877,349,279,825,920 of the 6,095,689,385,410,816 possible 8-character passwords, leaving only about 1/28th of them!
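The arithmetic behind those figures can be checked with a few lines of Python. This is only a quick sketch; the constant and variable names are my own, chosen for illustration.

```python
# Sizes of the two 8-character password spaces discussed above.
FULL_ALPHABET = 94    # printable ASCII characters, excluding the space
ALNUM_ALPHABET = 62   # 26 lowercase + 26 uppercase + 10 digits
LENGTH = 8

full_space = FULL_ALPHABET ** LENGTH
alnum_space = ALNUM_ALPHABET ** LENGTH

# How many passwords disappear when the symbols are dropped,
# and how many times larger the full space is.
lost = full_space - alnum_space
ratio = full_space / alnum_space

print(f"all printable characters: {full_space:,}")
print(f"letters and digits only:  {alnum_space:,}")
print(f"passwords lost:           {lost:,}")
print(f"full space is about {ratio:.1f} times larger")
```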

Now to calculate the number of random passwords of a given length that are possible with just letters and numbers, we just raise 62 to the power of that length. The following table highlights the number of possible passwords that exist with lengths of 4, 6, 8, and 16.


Password Length    Possible Passwords
4                  62^4  = 14,776,336
6                  62^6  = 56,800,235,584
8                  62^8  = 218,340,105,584,896
16                 62^16 = 47,672,401,706,823,533,450,263,330,816
Possible Passwords Composed of Letters and Numbers
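For anyone who wants to reproduce the table, here is a minimal Python sketch. The function name possible_passwords is made up for this example.

```python
ALPHABET_SIZE = 62  # letters (both cases) plus digits

def possible_passwords(length, alphabet_size=ALPHABET_SIZE):
    """Number of distinct random passwords of the given length."""
    return alphabet_size ** length

# Print one row per password length used in the table.
for length in (4, 6, 8, 16):
    print(f"{length:>2}  {possible_passwords(length):,}")
```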

So the real question then becomes: If we had a pass-phrase of 4 words, how many words would have to be in our dictionary of random words to match the security of a random string of letters and numbers of a given length?

To calculate this, we just solve the reverse of the problem above. We know how many passwords we want there to be, and we know the passphrase needs to be 4 words long, so we take the fourth root of each password count (rounding up) to produce the following table:


Password Length    Dictionary Size
4                  62^(4/4)  = 62
6                  62^(6/4)  = 489 (rounded up)
8                  62^(8/4)  = 3,844
16                 62^(16/4) = 14,776,336
Dictionary Size Needed For 4 Word Passphrases to Match Passwords of a Given Length
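The dictionary sizes can be computed the same way. The sketch below (with a hypothetical helper named dictionary_size) starts from a floating-point root and then corrects it with exact integer arithmetic, so the answer is the smallest dictionary that gives at least as many passphrases as there are passwords.

```python
def dictionary_size(password_length, words=4, alphabet_size=62):
    """Smallest dictionary whose `words`-word passphrases number at least
    as many as the alphanumeric passwords of `password_length` characters."""
    target = alphabet_size ** password_length
    # Floating-point estimate of the n-th root; it may be off by one
    # for large values, so it is adjusted with exact integers below.
    size = max(1, round(target ** (1.0 / words)))
    while size ** words < target:
        size += 1
    while size > 1 and (size - 1) ** words >= target:
        size -= 1
    return size

for length in (4, 6, 8, 16):
    print(f"{length:>2}  {dictionary_size(length):,}")
```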

The Answer For The Math Weenies

So what do all those numbers mean? They mean that to reach the security of a randomly generated 8-character password made up only of letters and numbers, we only need to pull four random words from a dictionary of 3844 words, which is a completely reasonable feat.

Indeed, doing a quick
grep -Ec "^[a-z]{3,6}$" /usr/share/dict/words
on my OS X box seems to indicate that there are 29,041 words, none of which are proper nouns, that are from 3 to 6 letters long that could be used. And expanding the list to contain proper nouns results in a dictionary of 33,925 possible words.
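To make the idea concrete, here is a rough sketch of what a passphrase generator built on such a word list might look like. The function names load_words and make_passphrase are invented for this example, the default path is simply the dictionary file from the grep above, and capitalizing and joining the words is just one way to format the result. It uses Python's secrets module so the words are picked with a cryptographically strong random source.

```python
import re
import secrets

# Same filter as the grep above: lowercase words of 3 to 6 letters.
WORD_PATTERN = re.compile(r"^[a-z]{3,6}$")

def load_words(path="/usr/share/dict/words"):
    """Read a word list (one word per line) and keep only matching words."""
    with open(path) as handle:
        words = {line.strip() for line in handle}
    return sorted(word for word in words if WORD_PATTERN.match(word))

def make_passphrase(words, count=4):
    """Pick `count` words uniformly at random and join them, capitalized."""
    return "".join(secrets.choice(words).capitalize() for _ in range(count))

# Example usage (requires the dictionary file to exist on your system):
# print(make_passphrase(load_words()))
```

Each word is drawn independently, so the same word can appear twice. That is what the math above assumes.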

Thus with a dictionary of 30,000 words, it would be possible to roughly match the security of a random 10-character password made of letters and numbers, with only a four-word passphrase! And if we increased the number of words in the passphrase to five, it would add 24,299,190,000,000,000,000,000 passphrases, which puts the security level between that of a 12- and a 13-character alphanumeric password!
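A passphrase can be converted into an equivalent alphanumeric password length by comparing logarithms: a passphrase of w words from a dictionary of d words matches a password of w * log(d) / log(62) characters. A small sketch, with a function name of my own choosing:

```python
import math

def equivalent_alnum_length(dictionary_size, words, alphabet_size=62):
    """Length of a random alphanumeric password with the same number of
    possibilities as a `words`-word passphrase from the dictionary."""
    return words * math.log(dictionary_size) / math.log(alphabet_size)

# Compare four-word and five-word passphrases from a 30,000-word dictionary.
for words in (4, 5):
    length = equivalent_alnum_length(30_000, words)
    print(f"{words} words: about {length:.2f} characters")
```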