blogstrapping http://blogstrapping.com
Blogstrapping is Chad Perrin's development blog, focusing on Lump development, software design, and the philosophy of programming.
Wed, 22 Feb 2012 21:38:06 +0000

Droll Has Class http://blogstrapping.com/?page=2012.026.17.33.38 Sun, 01 Jan 2012 00:00:00 +0000

Droll Has Class

In the last month, give or take, I dove into my droll project again. I finally did that minor overhaul, turning it into an object oriented library that provides a Droll class. Most of the functionality within it is tucked away in private methods. I have big plans for the future, but at present the big news is the run-up to a 1.0 release.

Classy

As already mentioned, there is now a Droll class. It can be used to deal with die roll codes essentially in the style of old-school roleplaying games such as Dungeons & Dragons (e.g. 1d10+3). It allows for much more sophisticated die code syntax, though, to indicate things like two different types of "exploding" die rolls, dice that start at 0 instead of 1, and so on. Example:

droll = Droll.new '1d20+3'
puts droll.roll

Output looks something like this:

1d20+3: [18] + 3 = 21

First it shows the die code you gave it, then the raw roll number(s), followed by the modifier (+ 0 if there was no modifier), and finally the result of the die roll after an equals sign.
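To make that output format concrete, here is a minimal sketch of a roller for the basic NdS+M form only. It is not Droll's actual implementation: the method name roll_die_code, the injectable rng parameter, and the comma-separated display of multiple dice are all assumptions made for illustration.

```ruby
# Hypothetical sketch, not Droll's code: parse a basic die code such as
# "1d20+3" and format a result line in the style shown above.
DIE_CODE = /\A(\d+)d(\d+)([+-]\d+)?\z/

def roll_die_code(code, rng = Random.new)
  match = DIE_CODE.match(code) or raise ArgumentError, "bad die code: #{code}"
  count = match[1].to_i
  sides = match[2].to_i
  modifier = match[3].to_i # nil.to_i is 0 when no modifier was given

  # One random integer from 1 through sides for each die.
  rolls = Array.new(count) { rng.rand(1..sides) }
  total = rolls.sum + modifier

  sign = modifier < 0 ? '-' : '+'
  "#{code}: [#{rolls.join(', ')}] #{sign} #{modifier.abs} = #{total}"
end

puts roll_die_code('1d20+3')
```

Passing in the random number source keeps the sketch easy to test with a fixed sequence of rolls.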

Execution

In addition to the core library/class, the project also provides a command line front end and an IRC dicebot front end. Each of these has a --help option, which provides useful information about how to use it. In fact, especially in the case of the dicebot, run it with the help option first. The command line front end is called droll:

droll --help
droll 1d20+3
droll -q 1d20+3

The output of the first example is basic information about how to use droll. It suggests the --syntax option for finding out more; that option provides a lot of information -- and I do mean a lot of information -- about the die code syntax droll uses.

The output of the second is pretty much exactly like what you get from the "Classy" example above. The output of the third only shows the number that would come after the equal sign.

The IRC dicebot front end is called drollbot:

drollbot --help
drollbot --config
drollbot start

The first example provides simple execution instructions, as with droll. The --config option provides a bunch of information about how to create a configuration file for drollbot, and you need that configuration file for it to work. Finally, drollbot start fires up the dicebot so it can connect to the IRC channels specified in its config file and provide its functionality for your enjoyment.

Are Doc?

I added some RDoc documentation to the Droll library so you can get information via the documentation browser functionality familiar to most Rubyists. For instance, there's ri:

ri Droll
ri Droll.roll

Those two examples would give you the (probably over-informative) documentation for the class itself, and for the roll instance method, respectively. Then, there's also the ability to run an RDoc server:

gem serve

This starts a local Web server that you can access in your browser by entering this address, by default:

http://127.0.0.1:8808

Testy

I had to recreate the test suite for this thing, in part because it was previously incomplete (in that it didn't deal with all methods in the droll program) and in larger part because of the reorganization of the entire project. The test suite, however, is still a work in progress. I wouldn't call it comprehensive, yet -- just method-complete.

License-ous Behavior

As always, the droll project is available under the Open Works License, a copyfree license that pretty much allows you to do whatever you like as long as you keep the license with anything you redistribute (modified or otherwise). The license is short and simple, so it is not one of those ugly EULA- or GPL-like things that even lawyers frequently misunderstand.

The Die Is Cast

I'm looking forward to big things in the future for this library, like compound die codes and "exploding" dice whose "explosion" rolls use different die codes than the original roll (I guess that's another kind of compound die code, now that I think about it). Mostly, this kind of stuff is being put off for v2.0; for now, v1.0 is what I am trying to finish polishing up for public consumption.
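For readers unfamiliar with the term, one common tabletop convention for an "exploding" die is that a roll of the maximum face earns another roll, which is added to the total. The sketch below assumes that convention only. It is not taken from Droll, which supports two variants that are not described here, and the method name and max_explosions cap are hypothetical.

```ruby
# Hypothetical sketch of one common "exploding" convention, not Droll's code:
# whenever a die shows its maximum face, roll it again and add the result.
def exploding_roll(sides, rng = Random.new, max_explosions = 100)
  raise ArgumentError, 'need at least two sides' if sides < 2

  rolls = [rng.rand(1..sides)]
  # Keep rolling while the most recent roll was the maximum face; the cap
  # only guards against a pathological run of maximum rolls.
  while rolls.last == sides && rolls.size <= max_explosions
    rolls << rng.rand(1..sides)
  end
  rolls
end

rolls = exploding_roll(6)
puts "#{rolls.inspect} = #{rolls.sum}"
```

A compound version would draw the follow-up rolls from a different die code than the first roll, which is the extra complication being put off until v2.0.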

Gimme!

In addition to the droll repository on Bitbucket, there's also the droll page on Ohloh. When I get around to publishing it with gemcutter, you should also be able to get droll by way of the gem install droll command. That'll be primarily for release versions only; for the latest bleeding edge version, you can download a gem file from the Downloads page of the Bitbucket repository, then install it with a command that would look something like this:

gem install droll-1.0rc5.2.2.gem

Give Back!

Complaints, bugs, feature requests, contributions to the codebase (even if not made up of actual code), and death threats should probably be submitted via the Bitbucket repository's issue tracker, pull requests, private messaging system, and so on. Anything that does not seem to fit well within those options could be sent to me via the blogstrapping contact page I suppose, or via discussion at reddit or Hacker News (see links below). Long-term, I am probably less likely to notice the discussion options with any alacrity than the other choices, so plan accordingly.

Tim O'Reilly Gets It http://blogstrapping.com/?page=2012.014.19.45.58 Sun, 01 Jan 2012 00:00:00 +0000

Tim O'Reilly Gets It

Abstract

Tim O'Reilly discusses an important problem underlying the Stop Online Piracy Act and PROTECT IP Act. His company is a popular publisher of quality technical books, and uses digital media and the Internet to strengthen his business, rather than regarding them and his customers as enemies like the largest media conglomerates do.

The Pedestrian Genius Of The O'Reilly Business Model

Tim O'Reilly, publisher of extraordinary (and a few ordinary) technical books beloved by geeks like me all over the place, really seems to understand the economics of the expanding markets in which he does business. Contrast him with the hidebound corporate bureaucracies run by geriatric people referred to in mainstream news media as each having "home addresses" rather than "a home address" like the rest of us, who are so out of touch that in most cases they barely understand how to operate a computer and can't tell the difference between "the blue e" and "the Internets". Tim O'Reilly offers anyone who has purchased a hardcopy O'Reilly book anywhere, from anyone, a five dollar ebook copy of the same text in any of several file formats -- actually, as many copies as they want, on whatever devices they own that can handle them, without DRM.

This forward-looking recognition of the digital era of publishing, with its effectively zero publication costs for not just industry giants but anyone at all, allows this publishing business to grow and profit from digital distribution as marketing rather than suffer the travails of costly, ineffective attempts to enforce the unenforceable through the act of mass punishment of its best customers. This policy of embracing new technologies and markets, and of embracing customers who wish to benefit from them rather than persecuting those customers, has made something of a loyal customer of me -- someone who is in general even more disinclined to be a "loyal customer" than most people in our increasingly disposable culture.

O'Reilly Points Out "We" Are Solving The Wrong Problem

In Before Solving a Problem, Make Sure You've Got the Right Problem, Tim O'Reilly details some substance of the real problem behind legislative arrogance such as SOPA and the PROTECT IP Act. It's worth a read.

To those of us smart enough to have spent five minutes examining the situation as it stands, and as it will continue to evolve, the enlightenment in that piece of writing is eminently obvious. To those who simply seek to impose their wills and their preconceived notions in service of maintaining a stranglehold on outdated markets, it is possible the insights Tim O'Reilly offers may never appear to be anything other than traitorous lies, regardless of their truth and obviousness for those with open eyes to see it.

Unsolicited Endorsement

By the way, I heartily recommend Ruby Best Practices, an excellent book for intermediate through advanced Rubyists (and precocious Advanced Beginners according to the Dreyfus model of skill acquisition). Buy it as cheaply as you can get it, wherever you like, then go to oreilly.com to register it and get your five dollar ebook. I did.

Barnes & Noble Doesn't Understand "Service" http://blogstrapping.com/?page=2011.356.12.57.36 Sat, 01 Jan 2011 00:00:00 +0000

Barnes & Noble Doesn't Understand "Service"

Barnes & Noble has a history of not getting the point.

How To Completely Screw Up Coupon Code Promotions

A few months ago, I received a promotional coupon code via email, as a holder of a Barnes & Noble membership. It was a very specific coupon code, one of those things for which I usually have no use because it only grants discounts on half a dozen books or so, though they are usually pretty deep discounts. My reading habits do not tend to match up with the mainstream, so the books they offer are typically of little or no interest to me. This time, however, there was a discount for something like 40% or 50% off Neal Stephenson's new book. As it happens, my membership was on the verge of expiration, too, so I decided to get a new book and a renewed membership in one stop to the bn.com site.

That is when the trouble started. For some reason, bn.com would not accept my coupon code. I fought with it for a while, then used a contact form on the site to send a message to what one can only call a "customer service" department if one crosses one's fingers behind one's back.

It turns out that there was already a discount applied to the book due to some other promotion that lasted longer than the coupon, and because of that discount the coupon would not work. Well, okay -- except it is somewhere on the other side of absurd that I would get a coupon code for a specific product, but the coupon is invalid for that product. Remember: the coupon code expired before the discount. There was no indication that the discount was different from bn.com's normal "online price", which is generally lower than cover price anyway, either.

Well, no biggie. Barnes & Noble lost a sale, and a brick-and-mortar independent bookstore got my business when I bought the book there, full price, at a Neal Stephenson book signing instead.

How To Prevent People From Buying Your Products

More recently, I got one of those "You could save as much as 50%!" things from Barnes & Noble. I decided to give it a whirl, though I had a sneaking suspicion the coupon code would have some kind of problem. As far as I know now, after the fact, it might well have had some kind of problem, but I never got that far.

I tried quite diligently to buy a book and a Barnes & Noble membership. I failed -- or, rather, Barnes & Noble failed to sell these things to me -- without even getting far enough to enter the coupon code. This is what I sent to Barnes & Noble via the customer support contact form:

Every time I try to place an order in which I try to renew my (now expired)
membership, something goes wrong.  I was trying to place an order using
coupon code 25N5AKLUEVTQ1 for the book "The Passionate Programmer", and to
add a membership to my order, but every time I try to continue my checkout
to get to the point where I can enter my coupon code the thing pops up a
CSS window asking me if I want to buy a membership.  If I say yes, it goes
nowhere and pops up the same CSS window.  If I say no, it goes nowhere and
pops up the same CSS window.  I guess you lost a sale -- again.

I probably could have been a little more felicitous, but it is not like this is the first time bn.com has failed me. Just like last time I tried getting in touch with someone in customer support for Barnes & Noble to fix a problem I had with ordering something, it took several days for someone to get back to me. There is, of course, some kind of message on the site when sending a message to customer support that it might be some period of time, either twenty-four or forty-eight hours (I really do not recall), though in the previous experience it took notably longer than that to get a reply. I don't know about you, but if someone was trying to buy something from me and failing, I would definitely try to fix the problem pronto.

It Gets Worse

I had no intention of writing about any of this until I got my (slow, as usual) response from a "customer service representative" whose job it evidently is to send canned responses. The response did not even have my name attached to it, like any respectable automated system replying to contact from a site where the customer is logged in should; it calls me Customer.

Thank you for your e-mail.

We have received your inquiry regarding renewing your Barnes & Noble
Membership Program.  Members who have not opted for a Continuous Service
membership can renew at any register or by visiting our site at www.bn.com.
Your Barnes & Noble Membership Program expiration date will be updated at
the time your renewal fee is paid.

If, however, you would like to enroll in Continuous Service so that you may
enjoy uninterrupted benefits and offers, you may fill out a Member Update
form at any Barnes & Noble store, or simply call the Barnes & Noble Member
Service Center.

If we can be of further assistance please do not hesitate to contact us
on-line at [email address elided]. If you prefer, we can be reached at
[telephone number elided] and a customer service representative will be
happy to assist you.

This, of course, completely misses the main point -- that the site will not let me buy anything. My coupon code has expired, and a new membership will not do me any good until I want to buy something again. Rather than help me buy things from bn.com, these jokers are telling me I should go to a brick and mortar store to pay for a membership, or do so over the telephone. This is asinine. I am done playing this silly game, at least until I get another coupon code. Maybe then I will go all the way to the physical store to get a membership, then come back home and use the online coupon code. Maybe I'll just buy from Amazon instead, or walk across the street to an independent bookstore near my home if the book I want is something it carries.

You see . . . I am willing to pay a little extra to get some actual customer service. I am not as willing to engage in a lengthy back-and-forth with a low-level flunky whose only job is to copy and paste canned replies when certain keywords appear in a customer service request -- if there's a human being on the other end at all -- or play telephone tag with bureaucrats.

Past Transgressions

Customer service is not the only thing Barnes & Noble fails to understand.

In An Open Letter To Barnes & Noble About Text Files, for instance, I talked about the incredibly brain-dead fact that my Nook Simple Touch Reader works with EPUB and PDF files, among others, but is incapable of reading a simple ASCII text file.

An issue that is not entirely the fault of Barnes & Noble, though the company is complicit at least, is that of ebook pricing. The majority of fiction newer than 1923 costs the same as an ebook as it does as a physical volume. If the book is out only in hardback so far, the ebook is sold at hardback prices. I have adopted a policy of not buying an ebook unless it is enough cheaper than the hardcover to compensate me for the fact an ebook cannot be sold to a used book store. Even worse, these things typically come with DRM that prevents me from sharing with a friend.

These days, I just get most of my ebooks from other sources. Barnes & Noble is obviously not interested in retaining me as a very active customer. At least my Nook is a pretty nice device, apart from that text file lobotomy.

Epilogue

I got an email inviting me to fill out a customer satisfaction survey following my experience with Barnes & Noble customer service. Of course I took the opportunity to inform Barnes & Noble's automated system of my dissatisfaction. The page is currently sitting open to the last page of the survey, where I am asked:

Is there anything else you would like to tell us?

I will enter the URI for this blogstrapping essay in the relevant field. Maybe an explanation of my experience on the web will make more of an impression than direct contact, and someone will fix the chronic problems of bn.com's support for customer satisfaction.

By the way, the email is in that format that indicates one of two things: either spam/phishing emails or a corporation that is totally out of touch. Specifically, it is an HTML email with no plaintext fallback. The fact I was using w3m at the time as my HTML email viewing option is a big part of the reason I did not fill out a similar survey after my bn.com experience failing to buy a Neal Stephenson novel; w3m is kind of a pain in the butt to use.

I mentioned more about that sort of thing in Use Browsers With Mutt.

Use Browsers With Mutt http://blogstrapping.com/?page=2011.355.16.18.59 Sat, 01 Jan 2011 00:00:00 +0000

Use Browsers With Mutt

I use a console-based mail user agent called Mutt to read my email. By default, it does not do anything to render HTML email. It just shows the raw HTML as plain text -- though you might need to open an HTML attachment to see it. Any remotely reasonable mail user agent or email client offers plain text as a back-up plan for cases where HTML will not render, though, so HTML-only emails are almost always spam and phishing emails, anyway.

I only really need to view an HTML email about once every six or eight months, on average. When I do, of course, I like to have some mechanism for doing so. For a while I used a script I had written myself that cleaned markup out of emails and presented it all as plain text. Alas, I managed to lose it somewhere along the way, thanks to the extreme rarity of my need for it and the fact I occasionally move to a new laptop as technology advances.
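The original script is lost, so the following is only a hypothetical stand-in for that kind of filter: a crude, regex-based cleanup in Ruby that strips markup and decodes a handful of common entities. The method name html_to_text and the choice of which tags count as line breaks are assumptions, and a regex pass like this is not a real HTML parser.

```ruby
# Hypothetical stand-in for a markup-cleanup filter; not the lost script.
ENTITIES = {
  '&amp;' => '&', '&lt;' => '<', '&gt;' => '>',
  '&quot;' => '"', '&#39;' => "'", '&nbsp;' => ' '
}.freeze

def html_to_text(html)
  # Drop script and style blocks entirely, since they hold no readable text.
  text = html.gsub(%r{<(script|style)\b.*?</\1>}mi, '')
  # Turn line breaks and the ends of block-level elements into newlines.
  text = text.gsub(%r{<br\s*/?>|</(p|div|h[1-6]|li|tr)>}i, "\n")
  # Strip whatever tags remain.
  text = text.gsub(/<[^>]+>/, '')
  # Decode a few common entities in a single pass.
  text = text.gsub(/&(?:amp|lt|gt|quot|nbsp|#39);/, ENTITIES)
  # Tidy up whitespace left behind by the removed markup.
  text.gsub(/[ \t]+/, ' ').gsub(/ ?\n ?/, "\n").gsub(/\n{3,}/, "\n\n").strip
end

puts html_to_text('<p>Hello, <b>world</b> &amp; friends.</p><p>Bye.</p>')
```

Something along these lines could be wired into a mailcap entry in place of a browser, though malformed or heavily styled email would quickly show its limits.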

A more typical way to do things is to set a configuration line in a mailcap file to direct text/html MIME types to be opened in a browser. I had tried out a console-based browser called w3m (distributed under the terms of the MIT/X11 License) for a while before writing my markup-cleanup script, but despite my generally keyboard-oriented way of using my computer I find w3m's interface a touch less than ideal, and got tired of it. After losing my filter script, I went back to using w3m until I got around to rewriting that program.

To use w3m to view HTML attachments, add a line like the following in either ~/.mailcap or your system-wide mailcap file (probably /etc/mailcap or something along those lines):

text/html; /path/to/w3m -T text/html '%s'; needsterminal; description=HTML Text; nametemplate=%s.html

Replace /path/to with the path to w3m on your system, of course. It should be largely self-explanatory, aside from the part that you can figure out from the w3m manpage.
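For anyone who does not find the line self-explanatory, this rough Ruby sketch shows how a mail reader might interpret such an entry: semicolon-separated fields, with %s standing in for the temporary file that holds the attachment. It is not Mutt's code. The method names and the /tmp/mutt-attachment base name are hypothetical, and escaped semicolons in real mailcap files are ignored.

```ruby
# Rough sketch of mailcap entry handling; not Mutt's implementation, and it
# ignores escaped semicolons (\;) that real mailcap files may contain.
def parse_mailcap_entry(line)
  type, command, *rest = line.split(';').map(&:strip)
  fields = rest.each_with_object({}) do |field, acc|
    name, value = field.split('=', 2)
    # Bare words such as needsterminal are flags with no value.
    acc[name] = value.nil? ? true : value
  end
  { type: type, command: command, fields: fields }
end

def viewer_command(entry, tempfile_base)
  # nametemplate says how the temporary file should be named; %s stands for
  # the base name picked by the mail reader.
  template = entry[:fields].fetch('nametemplate', '%s')
  filename = template.sub('%s', tempfile_base)
  # In the command itself, %s is replaced by that file name.
  entry[:command].gsub('%s', filename)
end

entry = parse_mailcap_entry(
  "text/html; /path/to/w3m -T text/html '%s'; needsterminal; " \
  'description=HTML Text; nametemplate=%s.html'
)
puts viewer_command(entry, '/tmp/mutt-attachment')
```

The nametemplate field matters because some viewers decide how to treat a file by its extension, so the temporary file needs to end in .html.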

I never got around to rewriting that script. Today, I decided to switch to a browser called surf instead. Like w3m, it is an MIT/X11 License application. Unlike w3m, surf is a GUI application, rather than console-based. It is also easier to use, and simple enough for purposes of viewing an HTML email once every six or eight months. The mailcap configuration I use looks something like this:

text/html; /path/to/surf 'file://%s'; nametemplate=%s.html

Enjoy.

Vim For Programming, Sometimes In C http://blogstrapping.com/?page=2011.346.00.02.43 Sat, 01 Jan 2011 00:00:00 +0000

Vim For Programming, Sometimes In C

Most of the time, when I write code, I write it in Ruby. I also write code in languages like Perl and even Scheme. For them, I tend to like my indentation depth at two columns (character widths). Once in a while, I write code in C or even C++. For them, I like my indentation depth around four columns. For all of the above, I use Vim -- not because I love everything that Vim does, but because there are just a few things it does that I really want, above and beyond what nvi provides. What I would most prefer is actually nvi plus just a handful of extra features.

I use a custom color scheme for syntax highlighting. I don't really feel like I need syntax highlighting, and in fact without this customized color scheme I would prefer no syntax highlighting at all -- because any other color schemes I have seen (including Vim's default) are all worse than no syntax highlighting at all, at least for my tastes. I did finally go to the effort of tweaking and fine-tuning a color scheme that makes syntax highlighting a better option for my purposes than no syntax highlighting, so I use it these days.

I occasionally use folding. This is where a block of code gets collapsed into a highlighted line that tells me how many lines have been "folded" away, so I can see the basic structure of the code without having all the code within those blocks taking up space on the screen. It is difficult to explain; I recommend just doing some searches online for screenshots of folding in Vim to get an idea what I mean.

What follows is a selection of some of the configuration options I use in my ~/.vimrc file. I specifically chose the options that pertain to the above described preferences.

filetype plugin indent on

syntax enable

set autoindent
set shiftwidth=2
set tabstop=4
set expandtab

set nofoldenable
set fdm=indent

autocmd BufEnter *.txt syntax off
autocmd BufEnter *.c set shiftwidth=4
autocmd BufEnter *.cpp set shiftwidth=4
autocmd BufEnter *.h set shiftwidth=4

The filetype line basically says that Vim should use settings derived from plugins that are sensitive to the type of file currently opened in the editor.

The syntax line turns on syntax highlighting.

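The autoindent line tells Vim that I want each new line to start at the same indentation as the line before it. The smarter indenting and unindenting based on the syntax of the code I write comes from the indent part of the filetype line above.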

The shiftwidth line tells Vim how many columns of indentation it should default to starting or ending when doing its autoindent thing. I choose a value of 2 for this because that is the level of indentation I want for the high-level dynamic languages I use most often.

The tabstop line tells Vim that it should default to moving the cursor to the nearest-next multiple of four columns when I press the Tab key.

The expandtab line ensures that all tabs are actually expanded into the correct number of spaces rather than inserted into the file as the \t (tab) character.
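The arithmetic behind "nearest-next multiple of four columns" is easy to get off by one, so here it is as a small Ruby illustration. This is not Vim's source, the method names are made up for the example, and it assumes softtabstop and smarttab do not change what the Tab key does.

```ruby
# Illustrative arithmetic only (not Vim's source): how many spaces a Tab
# press turns into when tabs are expanded to the next multiple of tabstop.
def spaces_to_next_tabstop(column, tabstop = 4)
  # column is zero-based: 0 means the cursor sits at the start of the line.
  tabstop - (column % tabstop)
end

def expand_tabs(line, tabstop = 4)
  line.each_char.reduce(+'') do |out, char|
    out << (char == "\t" ? ' ' * spaces_to_next_tabstop(out.length, tabstop) : char)
  end
end

p expand_tabs("ab\tc")   # "ab  c"
p expand_tabs("\tx")     # "    x"
```

A Tab pressed at column 2 adds only two spaces, while one pressed at column 0 or column 4 adds a full four.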

The nofoldenable line turns off folding by default, because a lot of the time I do not actually want to use folding. I tend to use za in Vim's command mode to fold something because it does some automagical stuff, including turning on foldenable the first time I use it -- and za also unfolds when I use it where there is already a fold. Check :help foldenable and :help za for more details.

The fdm line tells Vim that I want it to use indentation levels as indicators of where a level of folding should occur. This is a good choice if your indentation discipline is good. If it is not so good, you probably should not be programming much anyway.
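As a rough illustration of why indentation discipline matters here: Vim's help for the indent fold method describes a line's fold level as its indentation divided by shiftwidth, rounded down. The Ruby sketch below applies just that rule to a few lines. The method name fold_levels is made up, and Vim's special handling of blank lines is left out.

```ruby
# Sketch of the rule for foldmethod=indent: a line's fold level is its
# indentation divided by shiftwidth, rounded down. Vim's treatment of blank
# lines is deliberately omitted.
def fold_levels(lines, shiftwidth = 2)
  lines.map { |line| line[/\A */].length / shiftwidth }
end

code = [
  'def greet(name)',
  '  if name',
  '    puts name',
  '  end',
  'end'
]
p fold_levels(code)   # [0, 1, 2, 1, 0]
```

Sloppy indentation produces fold levels that no longer match the structure of the code, which is why this fold method only pays off when indentation is consistent.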

The autocmd BufEnter lines execute Vim commands automatically when opening a file with the specified filename format (e.g. *.txt indicating a filename ending in .txt):

  • I do not want to use syntax highlighting for plaintext English files.
  • I want indentation to default to a value of 4 for the .c, .cpp, and .h files that are common for the kinds of basic C and C++ programming that I tend to do.

I may add to this file or edit it in the future.

Zed Shaw Is Not My OSS Friend http://blogstrapping.com/?page=2011.338.15.57.06 Sat, 01 Jan 2011 00:00:00 +0000

Zed Shaw Is Not My OSS Friend

Zed Shaw writes open source software. He seems very proud of this fact, and demands respect for it. He also gives others little or no respect that most of us can see (though I am pretty sure he probably does so in non-prominent ways that he could use as a counterargument). He uses contradictory reasoning to disagree with others, and gets indignant at the disagreement of other people.

As I recently said about the man:

He says some smart things sometimes (laced with profanity), but also says some monumentally stupid things. I think he's a classic case of a strong-personality smart guy who outwits himself, and ends up not actually examining his own assumptions enough to avoid coming off like a clueless jackass sometimes.

Let us examine the situation in a bit more depth.

This will go in your permanent record.

Zed Shaw was once famous as the creator of Mongrel, a Web server written in Ruby. He then became equally famous for blowing up at the Ruby community and excusing himself from that community's company in some kind of epic hissy fit. Over time, he has developed a number of rants, screeds, and hissy fits that are every bit as abrasive as one of Rails creator DHH's conference presentations, but less technically oriented and helpful for actual coding. His fame has gradually shifted from recognition of his work on Mongrel to recognition of his abrasive personality and sense of entitlement.

One of the more entertaining things he has created is Programming, Motherfucker -- Do you speak it?, which is in general hilarious (in a good way) and full of good ideas he took too damned far, to the point of denigrating other good ideas and dismissing them out of hand with profanity laced scorn. Basically, it's a good rant that went a little too far -- without which, of course, it would not be as fun to read.

More recently, he wrote Why I (A/L)GPL, in which he attributes much of his notoriety in the Ruby community to his use of a more permissive set of terms for distribution than he now prefers:

I wrote Mongrel and then gave it away, on the hopes that it would help a bunch of other people, and that giving it away would come back to me in some way. Maybe a job, or some respect, or hell maybe my own company doing more software like it.

. . .

After Mongrel I almost need companies to have to admit they use my software. I would actually rather nobody use my software than be in a situation where everyone is using my gear and nobody is admitting it.

Or worse, everyone is using it, and at the same time saying I can't code.

He goes on to complain that "Programmers Are Plagiarists" (direct quote, including capitalization):

You take the software, and use it like Excalibur to slay your dragon and then take the credit for it. You don't give out any credit, and in fact, I've ran into a vast majority of you who constantly try to say that I can't code as a way of covering your ass.

This all seems to add up to basically a strong sense of entitlement and hurt feelings over the fact that he's not lauded as a hero. He's so emotionally injured by all this that he goes so far as to call the world's programmers (by implication, including me and probably you) "plagiarists". For the record, I've never used Mongrel for one of my projects, and if I had I would give him credit for producing something I found useful (most likely just before saying that he may be a good hacker but would probably make an awful employee). Maybe I'll use it in the future. Maybe I'll use (or write) something else, just to avoid the taint of his karma.

This is not really about his record with Ruby, or about his hurt feelings over Mongrel, or even about his decision to use the LGPL, GPL, or even AGPL (egad!) for future projects to try to force people to give offerings at his temple. This is about his passive-aggressive attacks on people who disagree with him all while claiming everyone who writes open source software should be singing songs together around the campfire.

Freedom and courtesy go hand in hand.

More recently than all the above, he wrote Is BSD The New GPL?. In it, he starts passing out a ration of shit to people who find his recent licensing behavior objectionable. He quoted a couple of "tweets" to kick off the party, and identified what he sees as an inconsistency in their reasoning:

@jacobian @zedshaw Django uses the BSD, so Lamson can ship an example of Django integration. Lamson's GPL means Django couldn't ship w/that same code.

@jessenoller @zedshaw The problem with you using the BSD code with GPL'ed code is that the BSD'ers can't reciprocate. A company using the code can.

Wait a minute, is this some kind of trick? I thought BSD was all about "true freedom" to use their code? When did they start getting pissy about people not returning the favor? I thought that whole "viral" thing was why they hated the GPL? I guess I'm not in Lawrence, Kansas anymore.

What I'm seeing is the following double standard among BSD license users:

  1. If you are a commercial company like Apple or Google then BSD licensors love you and want you to have "true freedom".

  2. If you are a GPL project, they pressure you into releasing your code BSD licensed and get offended because you use their code.

Of course, the situation is nowhere near that simple. Only someone who has never stopped to think about why someone with different motivations might choose to use a copyfree license could believe otherwise. For instance, even if we go with the most simplistic interpretation of a principled choice for using copyfree licenses, as he has, we should consider the consequences of that choice, which include the fact that "freedom" is a matter of universal rights -- which are still violated by people who claim to be all about freedom but restrict that of other people.

The people whose "tweets" he quotes are referring to a very simple fact that he appears to overlook willfully. He replied [1] to one of them thus:

I find it funny that people cry it's not fair I use the GPL and use their BSD software, but then don't cry when a company uses their gear.

My analysis of the situation, and specifically the essence of what both @jacobian and @jessenoller were saying, was passed along to Zed Shaw in this form:

@zedshaw -- It's not that you don't reciprocate; it's that you demand reciprocation, but hypocritically deny others the same consideration.

He also seems to completely miss the fact that a business using copyfree licensed code in a proprietary codebase can still contribute back to the original project, where the code can still be used in that project under its own license. A GPLed project, on the other hand, establishes a licensing bubble from which nothing can escape, including back to the original project.

Open source Atheros wireless network driver development ran into a similar problem a few years back, where a huge brouhaha erupted over the fact that development on these wireless drivers within the Linux community was being distributed under the terms of a copyleft license, thus preventing that development from going back into the codebase of the original driver project distributed under a copyfree license. When the dust settled, some GNU/copyleft partisans essentially ended up apologizing and ensuring the original, copyfree Atheros driver project got the code under friendly license terms after all. The bad press was evidently enough of a deterrent to the maintenance of GPL lock-in, in this case. Zed Shaw, on the other hand, seems immune to such opprobrium, just assuming it is a consequence of everyone in the world but him being the asshole.

When all is said and done, it just seems a bit hypocritical to many that someone (like Zed Shaw) might launch a protracted whine onto the Internet, belaboring the demands for reciprocation and credit that he seeks for the code he writes, then specifically set out to write code that implicitly depends on copyfree licensed code and license his code under a copyleft license so that the original project can never make use of it at all. Obviously, people find his projects useful, or they would likely not give two shits about them -- but they're also obviously annoyed that he refuses to let other people use his code in the same contexts where they use the project on which it depends, especially given the hypocritical nature of the demands vs. what he himself gives others.

"Don't lecture me on freedom while denying such freedom to me," in other words.

Define "friend" for me.

At the end of his Is BSD The New GPL? complaint, we find a section entitled We're All OSS Friends Here. He opens it with a statement of his scorn for those who have differences of opinion about licensing:

There's a deeper problem here than just GPL vs. BSD. First off, I don't believe in idiotic dichotomies like this.

That does not seem very friendly, but it might be justifiable as part of a statement that we should all be friendly with each other, and this is an "idiotic dichotomy" that keeps us all from getting along. Okay, taken in a vacuum, I guess that makes a certain amount of sense. It utterly ignores the good reasons some people disagree on license choice, of course, but so far it serves as a statement of his position, which is fine. It is good to know where he stands.

Next, however, he goes on to set himself up as specifically sitting on one side of the "idiotic dichotomy" he identified (which, incidentally, makes him one of the idiots):

The real problem here is that a bunch of people who give software away get mad at other people who give software away because of some retarded difference in their interpretation of "free".

He appears to just want to impose a particular set of principles on the community of people who prefer copyfree licenses as their sole motivation for preferring them -- principles that create for him the perfect opportunity to treat them and their code exactly the same way he claims he and his code were treated by others, prompting him to choose GNU licenses instead to enforce recognition of his heroic contributions to the world. He wants people to release things under one of the several BSD Licenses if they want to, but to then be happy that he takes their code, uses it, redistributes it, builds things on it, and specifically excludes them from any recognition or reciprocal technical support that might arise from his dependence on the technical support of their projects, and what he builds with the help of that dependence.

If he did not demand from them things he is unwilling to give them, the people involved in those projects would probably be less annoyed by his behavior.

The very next thing he says is:

For fuck's sake, we're both on the same damn side! Petty squabbling over bullshit like this, and even worse doing it in such a contradictory way, is stupid. This kind of useless bickering doesn't help promote excellent software and "freedom" to express yourself with code.

I think he means "'freedom' to express myself with code", because -- as he seems to have missed -- he is refusing to allow those original projects to adhere to their own licensing standards while building on what he has made. If you want to keep the code to yourself, go ahead, but please do not presume to lecture me on how I am not supporting your freedom to express yourself with my [2] code by pointing out that you have not given me the same consideration.

Finally, he wraps up with this gem:

Because, I am releasing my code for you to use, unlike most of those corporations you seem to love.

Some developers of copyfree software certainly choose copyfree licenses because of principles that involve support of corporate business endeavors, just as some developers of copyleft software do. In fact, copyleft licenses are often used precisely because of their usefulness in the effort to create and maintain anticompetitive advantage over other organizations and open source development communities while at the same time reaping some of the benefits of open source development.

Are you familiar with "open core" development? That's where the core of something is distributed under an open source license, but extensions to it that make it actually useful to a much greater degree can only be had at great expense, supporting the corporation's business model. Copyleft licensing is far more useful for this development model because competitors cannot create similar extensions and keep their source closed. Only the owner of the copyrights for the copyleft components is able to fully take advantage of that business model.

One might notice the "open core" model sounds strangely similar to what happens to the licensing situation of a copyfree licensed project when someone starts developing copyleft extensions to it. You get a "copyfree core" project, with a more restrictive copyleft model of development based on copyleft principles.

Dear Zed Shaw:

You create a licensing trap with your extension code, so that people have to decide between reimplementing what you have done in a pseudo-cleanroom environment, giving up their own licensing choice, or simply doing without the opportunity to build on what you have created. For those who have the luxury of the time and other resources necessary to undertake the first option, that is surely the most palatable, and the last is a common choice as well. Those who choose copyfree licenses are unlikely to choose the middle option, unless compelling practical need overrides the concerns that prompted them to choose a different licensing model than you in the first place.

For those who do not choose to just drop their own licensing preferences in favor of yours, the difference between your "contributions" and you never having "contributed" at all is only that you are effectively making sanctimonious demands on the contributive spirit of others -- demands that you, yourself, would not similarly choose to fulfill.

That does not sound very friendly to me. You are not my "OSS Friend", Zed.

Notes

  1. It appears that he has somehow ensured I do not have access to his Twitter commentary on this matter any longer. Twitter keeps telling me I am "not authorized to see this status". Maybe he deleted it, or maybe there is something else going on. I am not a Twitter expert by any stretch of the imagination.

  2. Well, not exactly my code. As far as I'm aware, he has not used any of my code. I'm just sympathizing with the people whose code he has used, here.

Dennis Ritchie, Innovator http://blogstrapping.com/?page=2011.286.13.52.17 http://blogstrapping.com/?page=2011.286.13.52.17 Sat, 01 Jan 2011 00:00:00 +0000

Dennis Ritchie, Innovator

In a recent posting to a Linux User Group mailing list, one of the members mentioned that Dennis Ritchie had passed away and went on to describe his formative years when he encountered, and used, the C programming language and Unix operating systems that would never have existed in their current forms without the influence of Dennis Ritchie. The following is adapted from my response to that list.

Real Contributions

At some point in the last couple days, I heard about Dennis Ritchie's passing. As I have remarked in a few venues, Dennis Ritchie has done a lot more for technology markets, the advancement of the state of the art, and the lives and livelihoods of people with an interest in computing technologies than Steve Jobs ever did -- and remember, everyone who loves his or her smartphone, uses TiVo, or browses the Web has an interest in computing technologies. This disparity in the good these two men did applies even if we discount the damage Steve Jobs has done when we tally his contributions. Despite this, I doubt discussion of Dennis Ritchie's passing will even reach 1% the volume of discussion surrounding Steve Jobs' passing. While tech pundits all over the Internet are singing paeans to the Second Coming of Steve Jobs, a true great is being ignored except in small niches -- a great innovator whose contributions make Jobs' look like making mud pies in the sandbox and whose damage done is immeasurably small beside the tremendous harm Jobs has caused (even if we only count the harm done via his time at Apple and his cult of personality). I speak, of course, of niches such as LUGs, for those LUGs that have noticed Ritchie's passing.

We lost Dennis Ritchie just as I have altered the direction of my career, turning toward professional C and C++ development. My first C and C++ source code encounter was about a quarter century ago, and while I have touched the stuff on occasion since then (including what amounts to 1.5 college courses around the turn of the millennium and occasional piddling about since then), I have never really seriously delved into C and C++ until now. I'm also about ready to become a FreeBSD port maintainer for a paste buffer utility written in C. The timing is ironic, and as I look at the copy of Kernighan & Ritchie's The C Programming Language -- often referred to by programmers as "The White Bible" or just "K&R" -- it makes me sad that I never considered tracking Dennis Ritchie down in person to ask him to sign the book for me. I might need to take that book and my copy of Kernighan & Pike's The Unix Programming Environment on a pilgrimage to get Kernighan and Pike to place their signatures appropriately some time in the next couple years.

I am glad that subscriber to that Linux User Group mailing list was motivated to describe his experiences. I think the world needs more references to the undervalued contributions Dennis Ritchie gave to the world, both in terms of personal impact on those of us who are aware of his importance and in terms of the sweeping changes he helped bring into the lives of pretty much everyone on the planet (at least indirectly). I also think the world needs more reference to Steve Jobs' "contributions" -- a mixed bag at best -- like you (dear reader) need a hole in your head. The exception would be in cases where any new references exist for the purpose of contrasting the relative paucity and backhandedness of what Jobs has given the world with the more substantial, foundational, pervasive, lasting, and broadly positive contributions of Dennis Ritchie.

The Unrevealed Mystery

Incidentally, as I wrote the first (mailing list) draft of this, I recalled an interesting piece of writing: Dabbling in the Cryptographic World--A Story, by Dennis Ritchie. I am saddened that the day evidently never came when he felt comfortable sharing the final details of that story, and I wonder if they will ever come to light. It seems unlikely Robert Morris (Sr.) will ensure we learn the rest of the story in honor of Ritchie's evident suppressed desire to tell the tale, but I suppose hope springs eternal.

More About Debian Clang Stupidity http://blogstrapping.com/?page=2011.235.13.10.13 http://blogstrapping.com/?page=2011.235.13.10.13 Sat, 01 Jan 2011 00:00:00 +0000

More About Debian Clang Stupidity

I wrote a toy program in C that looked like this:

#include <stdio.h>

int a, b;

int main() {
  a = 3;
  b = 5;

  printf("a = %d\n", a);
  printf("b = %d\n", b);

  a = a + b;
  b = a - b;
  a = a - b;

  printf("a = %d\n", a);
  printf("b = %d\n", b);

  return (0);
}

I tried compiling it:

$ clang varswap.c -o varswap
In file included from varswap.c:1:
In file included from /usr/include/stdio.h:28:
/usr/include/features.h:323:10: fatal error: 'bits/predefs.h' file not found
#include <bits/predefs.h>
         ^
1 diagnostic generated.

So . . . no compiled program resulted. Here's the same program again, with a critical line removed:

int a, b;

int main() {
  a = 3;
  b = 5;

  printf("a = %d\n", a);
  printf("b = %d\n", b);

  a = a + b;
  b = a - b;
  a = a - b;

  printf("a = %d\n", a);
  printf("b = %d\n", b);

  return (0);
}

I tried compiling that:

$ clang varswap.c -o varswap
varswap.c:7:3: warning: implicitly declaring C library function 'printf' with type 'int (char const *, ...)' [-pedantic]
  printf("a = %d\n", a);
  ^
varswap.c:7:3: note: please include the header <stdio.h> or explicitly provide a declaration for 'printf'
2 diagnostics generated.

Hmm. Different error message. The really interesting part, though, is that I then had a varswap file:

$ ./varswap
a = 3
b = 5
a = 5
b = 3

As mentioned in Reminding Myself How To Write Simple C, I do not have these problems on a FreeBSD laptop. It compiles with the #include <stdio.h> line without complaints, and gives me a program that Just Works.

Reminding Myself How To Write Simple C http://blogstrapping.com/?page=2011.234.16.15.35 http://blogstrapping.com/?page=2011.234.16.15.35 Sat, 01 Jan 2011 00:00:00 +0000

Reminding Myself How To Write Simple C

I have written a little bit of C in the past, including a couple of actually useful programs. Mostly, however, I have used higher level languages -- especially Perl, PHP, and Ruby. I have decided to refamiliarize myself with basic C development, though, then learn more about writing good code in C than I ever learned before. I have half a dozen C books (give or take) lurking about my bookshelves waiting to be read.

The Syllabus (Or: What I Have Lying Around)

The list currently looks something like this:

  1. Practical C Programming by Steve Oualline

    This is not anywhere near the best written beginner level book for a programming language that I have ever seen, but it does present basic concepts of the language in a fairly clear, beginner oriented manner, so it will at least provide a gentle re-introduction to C, even if not a brilliant re-introduction. It is painfully slow, too, but I will surely manage to slog through it nonetheless.

  2. The C Programming Language by Kernighan and Ritchie

    'Nuff said.

  3. The Unix Programming Environment by Kernighan and Pike

    This book is basically the Unix companion to The C Programming Language. It seems likely to be the ideal follow-up to K&R for a Unix-phile like me.

  4. Mastering Algorithms with C by Kyle Loudon

    An algorithms book focused on C seems like a pretty good choice for continuing self-education in the language.

  5. Test-Driven Development for Embedded C by James W. Grenning

    My experiences with test-driven development just keep getting more rewarding. I want to carry the benefits of TDD with me from Ruby to C. I am also thinking about tackling some simple embedded development with Arduino projects as a way to sharpen my skills and have fun hacking at something new to me, though as the book's author points out embedded development is not a requirement for making effective use of TDD for Embedded C.

  6. Designing BSD Rootkits by Joseph Kong

    Along with The Unix Programming Environment, this points me squarely in the direction of a big part of the reason I chose now to start working on C familiarity: development for FreeBSD. Of course, the book contains some assembly language too, which is something I intend to learn (unlike C, it is not a language I have ever really touched before), but I may need some other resources to help sort out the assembly language bits. I suppose I will see when the time comes.

This is far from a rigid didactic plan; it is subject to change at any time. I guess I will take something of an "agile" approach to the learning path. I will use short, goal-oriented task iterations, reassess the plans for the next step as each iteration draws to a close, and so on -- though not in nearly so formal a manner as this description might make it sound, of course.

FizzBuzz (Or: Hello World For Internet Denizens)

In the meantime, I have managed to remind myself about enough C to write a fizzbuzz implementation in about five minutes without having to use any references in the middle of it:

#include <stdio.h>

int count = 1;

int main() {
  while (count <= 100) {
    if (count % 3 == 0) printf("Fizz");
    if (count % 5 == 0) printf("Buzz");

    if ((count % 3 != 0) && (count % 5 != 0)) printf("%d", count);

    printf("\n");

    ++count;
  }

  return (0);
}

Simple enough.

Compiler Choice (Or: Crummy Platforms Are Limiting Factors)

I decided to use LLVM/Clang as my compiler of choice for these investigations into C refamiliarization.

I have been using Debian GNU/Linux for much of this year due to some hardware requirements with my new laptop that are not yet properly supported in FreeBSD. The frustrations of dealing with a Linux-based system are much magnified since I used Linux-based systems much more regularly half a dozen years ago. Things have gone downhill in the interim, to put it gently. I will not go into detail here, though. I could fill a book with the shortcomings and annoyances of dicking around with what has started to feel like a rinky-dink excuse for a Unix-like OS.

I have an older laptop that was just lying around unused for a while. An acquaintance has been convincing me to let him mentor me into becoming a FreeBSD port maintainer. As part of preparing for that, I decided to install the preview release of FreeBSD 9.0 on that older laptop. Guess what comes next.

I installed Clang on the ThinkPad T510 running Debian, then I started trying to use it. No dice; something is broken. I started trying to find some guidance on where to look for a fix by plugging error messages into Google. This is an example:

$ clang fizzbuzz.c
In file included from fizzbuzz.c:1:
In file included from /usr/include/stdio.h:28:
/usr/include/features.h:323:10: fatal error: 'bits/predefs.h' file not found
#include <bits/predefs.h>
         ^
1 diagnostic generated.

I did not get very far, because it occurred to me I would probably get more done if I installed Clang on FreeBSD. I did so, and discovered a complete lack of problems. On FreeBSD, I got this result instead:

> clang fizzbuzz.c

No command output -- just a useful little a.out file. It works like a charm. I get to add yet one more ridiculous, petty failure to the "no" column when tallying up reasons for whether I should use Debian instead of FreeBSD.

Clang itself, by the way, is working great so far.

Why Use A C-Style For Loop? http://blogstrapping.com/?page=2011.234.11.39.11 http://blogstrapping.com/?page=2011.234.11.39.11 Sat, 01 Jan 2011 00:00:00 +0000

Why Use A C-Style For Loop?

Using C as the example language, I'll compare two ways to loop over an incrementing variable.

First, the for loop:

#include <stdio.h>

int counter, total;

int main() {
  total = 0;

  for (counter = 0; counter < 5; ++counter) {
    total += counter;
  }

  printf("The total is %d.\n", total);
}

Next, the while loop:

#include <stdio.h>

int counter, total;

int main() {
  counter = 0;
  total = 0;

  while (counter < 5) {
    total += counter;
    ++counter;
  }

  printf("The total is %d.\n", total);
}

A diff of the two versions is pretty simple:

5a6
>   counter = 0;
8c9
<   for (counter = 0; counter < 5; ++counter) {
---
>   while (counter < 5) {
9a11
>     ++counter;

Basically, the whole syntactic difference is that the for version combines three lines from the while version into one line. There is a pretty violent ongoing disagreement on the Web between two extreme camps with regard to line counts. One side says that line count is good (as a measure of verbosity), because it means your code is easier to read and understand. The other side says that line count is bad (as a measure of verbosity), because it means there is unnecessary complexity and bureaucratic overhead in the code. Both sides get some things right; their point of difference seems primarily to be where they focus on one good principle to the exclusion of another. Of course, most developers have a much more sophisticated and nuanced perspective, but many of those still tend to lean a bit too far in one direction or the other.

This looping code example -- simplistic though it may be -- is, I think, a pretty good example of where one of these two extreme camps gets something right.

Read the code for each out loud. When you are done, consider which of them would make the most sense to someone unfamiliar with C-style syntax, as a succinct plain-English explanation of what the code is doing, without adding a whole lot of text that is not immediately evident in the code. I think the explanations would go something like the following.

First, the for explanation:

"Total" starts at zero. For variable "counter" starting at zero, for counter less than five, increment counter. Add counter to total. Loop. Print total message.

Next, the while explanation:

"Total" starts at zero. Variable "counter" starts at zero. While counter is less than five, add counter to total. Increment counter. Loop. Print total message.

I don't know about you, but I think the second explanation is much clearer, and more easily translated into code in languages that do not use a C syntax with confidence that I am writing code that correctly produces the desired result. The interesting thing about that fact is, I think, that it does so while directly modeling the C syntax. In short, the while loop syntax more clearly and directly models the problem domain.

In this case, the greater verbosity camp is right, as far as pure syntactic clarity is concerned. The for loop appears to be little more than an utterly misguided bit of purpose-focused syntactic sugar. Its existence in the language, however, prompts one to use it where it seems like a natural fit -- makes it seem like the right tool for some jobs. Why would it exist, otherwise?

From where I am sitting, the only immediate syntactic downside to choosing the while loop is that, for sufficiently sophisticated operations within the loop, the while version separates the counter increment operation from the loop condition. This may be suboptimal in some way for grasping the code, but I am not entirely convinced.

I will keep my mind open to the possibility that a for loop makes for clearer code in some case, and if I find such a case I will use it, but otherwise my default will be the while approach. Counter-arguments are welcome.

Response

Brant Wedel replied in email. I made a couple of minor formatting edits and added a comment line to the C code example of a do/while loop in the second email to make it look a little better (to my eyes, at least). His commentary, with my responses attached, follows:

A counter argument is that anyone with basic proficiency in a programming language will be 'head wired' to understand a For loop, so assuming that people that know how to program in C are reading your C the For loop provides a better balance between readability, typing speed and understanding. This is especially true when people are thinking in the foreach mindset when looping through a collection of some number of elements it is largely unnecessary to actually worry about what is happening as far as counter variables and conditionals, it is better to have a for statement with a small comment (of course pointer arithmetic within a while loop over some more advanced list such as a linked list rather than a simple array easily breaks this point). . . . But in many cases from a readability viewpoint commenting a for loop to explain what is happening in the loop is vastly superior in readability to expanding the loop to a while loop especially for simple cases.

You could look at it from an instruction set viewpoint, the mind is not limited to x86 instructions, so coders have a hard wired pathway that can execute a for loop in one clock cycle where reading a while loop would take several mental cycles because of the scattered instruction parameters, like the counter being after the actual repeated block etc. these are things that I wouldn't imagine the brain would be as efficient at.

Anyway, while loops are great, but they aren't the end-all be-all of language readability and efficiency. I would hope that you use them where it its beneficial but use For loops where they are beneficial and comment more, a comment often is much better than expanded code especially where something trivial or common is taking place =P

He makes some good points. In reverse order:

  1. I do try to make sure I use the best construct for the job at hand, based on the circumstances of course, to ensure the most readable (and thus maintainable) code I can reasonably write. I do not dispute that the for loop might be preferable in some cases, given that it is already a standard part of the language. I do think it might have been better left out of the language, though. Note "might" in that sentence. I am still undecided.

  2. The reduction in mental clock cycles might be accomplished by a trade-off with state maintenance. It is often the case that you achieve reduced parsing complexity only at the expense of maintaining increased memory usage, and it seems to me that the potential reduction in the case of a for loop is often minimal in comparison to the fact that you have to remember all of its conditions while reading through the body of the loop to still come out the other end remembering how its iteration is handled. I'm not entirely sure about this, though, and it may depend on the way the specific programmer thinks and juggles abstractions. This does not entirely contradict Brant's commentary about mental cycles; it just cautions against assuming that necessarily means the for loop is the better choice all the time.

  3. I don't find that comments are a suitable replacement for clear syntax, generally speaking. Programmers get used to the for loop, so that it might look clear to those who use it regularly, but from a more objective perspective I think it takes a fairly cryptic form that must be memorized before it starts feeling clear and easy (which doesn't take long, but that step is still there). Comments actually clutter up code, particularly when they explain what you're doing rather than why you are doing it. I prefer keeping my comments succinct and limited to explaining why I do something (or making notes about stuff that needs to be changed on the next pass through, sometimes), ensuring clear ("self-documenting") code to convey the what-and-how of it.

    Of course, there is an argument to be made that the fact that the crypticness of the for loop goes away for those used to reading it is all we need to justify its use, if there are other benefits (such as tighter representations of integrated pieces of code -- i.e., the increment, the condition, and the start value; such as encapsulation of the initial value within the loop syntax). After all, understanding requires at least basic knowledge of the language, and understanding the syntax of a for loop is pretty basic to the C language.

Brant sent another email, before I had written the above response:

Also you might include the

   do
   {
       /* do something */
   } while(true);

. . . possibly as a more accurate model of the domain, altho I almost never use it myself . . . probably because I'm hard wired to understand the while more than do-while and so the do-while is therefore more difficult to comprehend even if in reality sometimes its a more accurate model.

Also you might have inspired me to write my own blog/cms engine, I've been seriously looking at wordpress but I have been fighting with -- for non-blog reasons -- flexible demand scalability of cloud servers and wordpress may not be flexible enough for dynamic scaling by dynamic scaling I mean scaling up and down throughout the day to compensate for traffic on the fly on a hosted cloud platform (amazon ec2 for me). It seems the majority of information out there is for static scalability and of course MySQL has its limitations.

He makes an excellent point about the do/while loop, which appears to live in a blind spot for me, considering I did not once think of it while writing the original essay here, and even when it was pointed out to me I at first rebelled against the idea of it serving as a better syntactic expression of the computation model here than the while loop. There is a bit of a hiccup, however, in that the do/while ensures execution of the block of the loop itself at least once, which is not entirely obvious from the syntax, and might surprise an unwary programmer.

I'm heartened to hear that I'm inspiring people to write useful software, and I'll take this moment to urge Brant to write his own CMS rather than use WordPress. The popularity of WordPress is like that of PHP and MySQL, honestly; undeserved, and prone to creating problems later on down the line. His notion of dynamic scaling in a cloud deployment for a CMS sounds like a fun project to undertake, and I wish him luck.

Metaprogrammatic Hash Accessors http://blogstrapping.com/?page=2011.219.21.26.32 http://blogstrapping.com/?page=2011.219.21.26.32 Sat, 01 Jan 2011 00:00:00 +0000

Metaprogrammatic Hash Accessors

I used to have a metric crapton of accessors, individually defined the old-fashioned way, in a Ruby class (that's an object oriented class, not a school class). I was not using attr_accessor and friends for this, because the accessors do not access discrete instance variables; they access the elements of a specific hash in the instance. I had been thinking about doing something to reduce the weight of repetitive code for this, but had never really gotten around to it. By "something", of course, I mean "metaprogramming". Eventually -- meaning a day or two ago -- Bitbucket user captainjey took a look at the code and commented to me in IRC about how I should sprinkle some metaprogramming magic on it to get rid of all the crufty repetitiveness, and while I basically said "maybe later, it works for now and I have more important things to work on," it was not long before I felt inspired to tackle the problem.

I am now using metaprogramming to accomplish what I used to do with brute force: defining accessors based on hash keys rather than discrete instance variables. I feel like I'm doing something a bit hackish/kludgey with the syntax, specifically where using the send message for define_method, but I have not come up with a more elegant way to do it. Any suggestions are welcome. You can use the contact page on this site to get in touch with me with such suggestions.

Actual Working Code

class Persona
  attr_accessor :personalia

  def initialize(personalia=Hash.new)
    @personalia = personalia

    # skip a bunch of stuff

    hash_accessors
  end

  # skip a bunch of stuff

  def hash_accessors
    @personalia.keys.each do |k|
      unless self.respond_to?(k.to_sym)
        reader_code = Proc.new { @personalia[k] }
        self.class.send :define_method, k.to_sym, reader_code

        writer_code = Proc.new {|new_value| @personalia[k] = new_value }
        self.class.send :define_method, "#{k}=".to_sym, writer_code
      end
    end
  end

  # skip a bunch of stuff
end

Code Notes

  • unless self.respond_to?(k.to_sym) exists to ensure that hash_accessors will not accidentally overwrite an accessor for which I have done something special -- and I do have one or two of those in the class.

  • reader_code = Proc.new { @personalia[k] } exists because, while define_method is supposed to be able to take a block, it does not appear to work that way when using send to send the define_method message. The same applies to writer_code = Proc.new {|new_value| @personalia[k] = new_value } later on.

  • self.class.send :define_method, k.to_sym, reader_code is the money shot for setting a getter, and also the most hackish looking part of all this, to me, along with its setter sibling self.class.send :define_method, "#{k}=".to_sym, writer_code. It basically just uses define_method to define a method with a name that matches the hash key and whose code is what's in the reader_code or writer_code proc.

Used Twice

I have actually used this technique in two different classes, in two different projects, this weekend. In both cases, accessors for the parent hashes of the keys I'm turning into method names already exist. For convenience purposes, though, I wanted the keys to become method names for accessors themselves. In the case not shown in the above example, I only use the technique to generate getters; there are no setters for the keys of that hash. Of course, in that case the hash in question (and stuff to manipulate it) is not pretty much the entire object instantiated from the class, as in the case in Persona.

Language Design and Metaprogramming

I have plans to delve more deeply into C than I ever have before, some time later this year (probably in a month or so). I expect that when I do so I will miss the metaprogramming facilities and dynamism of Ruby pretty bitterly. On the other hand, after doing some embedded work in C (which is kinda where I plan to go with it for a while, maybe starting with a low-end Arduino kit), I expect that I'll miss some of the capabilities of C when working with Ruby, too. It has been long enough since I have done anything with C that, right now, I don't really miss it, because all I really remember is the amount of work that often has to be done to achieve things that are exceedingly simple in many other languages.

As I said above, I really feel like there must be a cleaner way to code this up. If there isn't, though, I have to wonder about the design decisions that led to this. It is entirely unstraightforward, in that it's the sort of thing that requires more knowledge of the fiddly bits of the language than I tend to feel such things should (yeah, and soon I'll be using C a lot more, where everything in the language is fiddly bits). I do not have enough experience with Common Lisp (precious little, in fact) to be able to judge Ruby by that benchmark, but from the way people talk about Lisp macros I expect things are probably a bit more eloquent in the realm of Lispy metaprogramming than this example might suggest about Ruby.

RVM Day http://blogstrapping.com/?page=2011.217.10.21.18 http://blogstrapping.com/?page=2011.217.10.21.18 Sat, 01 Jan 2011 00:00:00 +0000

RVM Day

You may have noticed that I'm making an effort to do more Rubinius-related stuff today (Friday, 05 August 2011) because it's rbx day. I've been doing stuff with Rubinius off-and-on anyway, ever since first giving Rubinius a try, but this is a great excuse to start doing more.

One thing I have not done that I decided to do today is start using the Ruby Version Manager, or RVM, making this RVM Day as well as RBX Day. I installed RVM for the first time, and am now trying to get used to using it. So far, it's pretty slick.

The downside so far is that it is not very clearly explained anywhere -- and not just it, but the Ruby installation options that come with it. I saw an example somewhere (I don't recall where) that suggested using rvm install rbx-head, but I do not recall it saying anything about why one should use rbx-head rather than rbx, for instance. I imagine rbx-head must give me up-to-date rbx from the current state of the Rubinius project, but beyond that my guesswork does not provide me much in the way of specifics.

I have not yet found a single point where one can go to get a quick start with RVM that provides understanding as well as examples, thus resulting in my choices being either by-rote cargo cult installation and use or hours of sifting through help documentation and pestering people in email or IRC for details.

I know that RVM's popular use is a relatively recent thing, and of course I grant some leeway for that fact; people have likely just not had time to bring the quality of tutorial documentation up to par with the needs of new users of RVM, and it is totally understandable. I am just pointing out where that deficiency exists. Unfortunately, I am still so much a novice with RVM that I am not qualified to fill in the gaps in documentation.

Some Nascent Tutorial Stuff

Before you start just blindly following my examples below, I recommend you actually read the whole thing, and check out the linked pages. I know this is kind of a lot of reading, but this is not in fact a fully cleaned up tutorial, ready for prime time. It is more of a rambling explanation of my experiences getting started with RVM, including explanations that might be of help to other people just getting started. If you have a better (more succinct) source of information than this, you should use it -- and you should tell me about it. I have a contact page here at blogstrapping you can use to tell me about any tutorials and explanations you've found.

If I do not find something suitable before I get around to it, I may write up a complete startup tutorial at some point when I feel sufficiently well educated about all the relevancies of RVM to do so. For now, though, this is what you get from me.

Installing RVM itself employs this incantation:

bash < <(curl -s https://rvm.beginrescueend.com/install/rvm)

The -s is "silent mode" for the curl command. Curl itself is basically a download tool (though a fair bit more, as well). By issuing that parenthesized curl command, you are effectively telling it "give me the file at the indicated URI, and don't tell me any details". Obviously, the -s can be omitted if you want more information about what is going on. If I am not mistaken in my assumption, the redirects before the opening parenthesis basically tell your shell that instead of saving the file somewhere, it should send its contents to bash for execution. This makes me wonder if it would work with other shells than bash, such as sh, but I have not investigated the matter.

The Ruby Version Manager - Installing RVM page actually suggests a complete script for getting up and running with RVM:

#!/usr/bin/env bash

# Install git
bash < <( curl -s https://rvm.beginrescueend.com/install/git )

# Install RVM
bash < <(curl -s https://rvm.beginrescueend.com/install/rvm)

# Install some rubies
source "$HOME/.rvm/scripts/rvm"
rvm install 1.9.2,rbx,jruby

The 1.9.2,rbx,jruby part at the end of that pretty obviously seems to indicate that RVM should install the MRI/YARV implementation of Ruby 1.9.2, the Rubinius implementation, and the JRuby implementation. Edit to suit. Of course, I wanted to try something a bit more nuanced that I found elsewhere; the following example from beginrescueend.com's Ruby Version Manager - Rubinius:

rvm install rbx -- --enable-version=1.9,1.8 --default-version=1.8

Also, I already have Git installed, so that part seemed unnecessary. As such, I just ran the command using curl to install RVM, and used this rvm install command. Unfortunately, it did not work as advertised.

Rather than doing what one might expect -- installing Rubinius with support for Ruby versions 1.9 and 1.8, defaulting to 1.8 -- it errors out. Apparently, the --enable-version and --default-version options are not actually supported by whatever tool is supposed to handle them. I may investigate this further, but for now the error messages I received did not offer much in the way of useful information for figuring it out. At that point, I just eliminated everything after rbx and ran this much shorter command to install Rubinius via RVM:

rvm install rbx

Once you have your Ruby version of choice (rbx -- right?) installed via rvm, you can execute a program using that version pretty easily:

rvm rbx program_name.rb

What else can you do? The output of rvm help is a gigantic wall of text that can be a little difficult to wade through. Spend hours on it?

Well, I suppose I could have done that as the next step in my own odyssey of learning about RVM, but it would be nice if there was something a little quicker and easier to give me a boost. In fact, the situation is worsened a bit by the fact that rvm rbx does something surprising, in that it fails to start an instance of irb, the Ruby REPL, when run without a program name argument. This is surprising because via other means I have used to install Rubinius just executing rbx with no arguments would start the REPL. In addition, while managing various different Ruby versions simultaneously with a centralized system like RVM would hopefully make things easier for admin purposes, I am not really sure this is a huge win when every time I need to run a program I need to prepend the rvm command to the command I would otherwise use.

Luckily, that is not actually required. With some prompting from j`ey in IRC, I figured out the value of this:

rvm use rbx

When j`ey mentioned this, I immediately entered the rvm help use command to get specific information about the use command for RVM. The results:

∴ rvm use [ruby-string]

Setup current shell to use a specific ruby version.


For a list of currently install ruby string please run

  rvm list

Please see documentation for further information:

  http://rvm.beginrescueend.com/rvm/basics
For additional information please visit RVM's documentation website:

    https://rvm.beginrescueend.com/
If you still cannot find what an answer to your question, find me 'wayneeseguin' in #rvm on irc.freenode.net:

    http://webchat.freenode.net/?channels=rvm

Well, this helps a lot. Specifically, this very succinct line -- the second line of text in that help output -- explains quite clearly what rvm use rbx will do for me:

Setup current shell to use a specific ruby version.

Within that shell, but not others, RVM establishes the indicated rbx install as the Ruby implementation that should be used every time I enter a ruby or irb command. Other shells will use whatever is the system default, but that one uses what I have just told it to use. This is, in fact, pretty slick.
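
One quick way to confirm which implementation a given shell is actually using is to ask Ruby itself. This is a minimal sketch; the file name which_ruby.rb is only an assumption for illustration. RUBY_ENGINE is not defined on Ruby 1.8, hence the fallback.

```ruby
# which_ruby.rb (hypothetical file name)
# RUBY_ENGINE is missing on Ruby 1.8, so fall back to "ruby" there.
engine = defined?(RUBY_ENGINE) ? RUBY_ENGINE : 'ruby'
report = "#{engine} #{RUBY_VERSION}"
puts report
```

Running ruby which_ruby.rb in a shell where rvm use rbx has been issued, and again in a fresh shell, should show whether the two shells really resolve the ruby command to different implementations.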

Conclusion

So far, so good. I just wish there had been a nicely formatted page somewhere obvious explaining all of this, plus the information I still do not know about what has gone before, in a clear and helpful manner. At least trial and error, in the cases where I have used that approach to muddle through so far, has not resulted in screwing up something on my system yet (as far as I'm aware).

Perhaps, once I get some more experience and understanding with this stuff, I'll flesh out and clean up this explanation for an RVM tutorial page. Then, maybe I'll try providing it to the people maintaining RVM so they can put it somewhere obvious enough that people like me who come along in the future can gain the benefit of my experience.

RBX Day http://blogstrapping.com/?page=2011.216.21.57.29 http://blogstrapping.com/?page=2011.216.21.57.29 Sat, 01 Jan 2011 00:00:00 +0000

RBX Day

As I write this Thursday evening, tomorrow (Friday, 05 August 2011) is "rbx day", aka #rbxday.

Background

Rubinius is a VM written in C++, which runs bytecode produced by a Ruby parser written in Ruby. Yes, that's right; it's a bootstrapped Ruby implementation. It is also distributed under the terms of a three-clause BSD License which, as you probably know, is considered a copyfree license, an open source license, and a "Free Software" license (in the terminology of the Free Software Foundation). That makes it a winner of the open source software triple crown, I guess. Rubinius is actually pretty awesome, in my humble opinion, and among other things offers an awesome way for people learning Ruby to learn a lot more about it -- by reading the source code of a Ruby implementation in Ruby.

I wrote about Rubinius a while back, right here on blogstrapping, in Giving Rubinius a Try. It's amazing how far Rubinius has come since then -- and it was already great software at that time.

The Event

From the #rbxday In Real Life page of the Rubinius Weblog:

Originally, the idea for #rbxday was a day that people all around the world could have fun experimenting with Ruby and Rubinius. Try your application or pet Ruby project on Rubinius, or pull out that idea you've been wanting to explore, code it up, and run it on Rubinius. We are not asking anyone to contribute to Rubinius, but we would be most flattered if you wanted to dig into the Rubinius code to see what's going on under the hood. To sum up, the motto of the day is "Ruby, Rubinius, Fun Fun Fun Fun".

As of this writing, that page shows information about meatspace #rbxday meetings in Amsterdam, Netherlands; Barcelona, Spain; Los Angeles, CA, USA; Montevideo, Uruguay; Mountain View, CA, USA; Portland, OR, USA; and San Francisco, CA, USA. It does not look like there will be one near me. I have mentioned the idea in the IRC channel for my local LUG, and nobody so much as blinked. I'm not surprised, considering it's pretty well overrun by Pythonistas (and Ubuntu users).

The page also lists the following ideas for how to celebrate #rbxday and help out the Rubinius project:

  • Wearing your Rubinius shirt
  • Talking to co-workers about Rubinius
  • Asking your company to sponsor some of your time contributing to Rubinius
  • Writing libraries in Ruby
  • Writing blog posts or books about Ruby technology, like fully-concurrent threads in JRuby and Rubinius 2.0
  • Talking to people about what you find painful in Ruby and how we may be able to improve those pain points
  • Testing your code on Rubinius and submitting bug/performance reports

These are ideas other than hacking on, and contributing to, Rubinius itself, which (as pointed out on the #rbxday page) is welcome but hardly required.

They seem pretty focused on Twitter, but I find Twitter nearly useless for my own purposes. I guess I just don't get the interest people have in sitting around staring at their browsers hitting the refresh/reload button for the Twitter site for hours at a time. That's why I wrote this, instead. I might write more related to Rubinius tomorrow.

There's also an official promotional page for #rbxday, which calls it:

A global day of Rubinius performance testing, bug reporting, community strengthening and super awesome fun times.

Let's make it a party!

There's a slightly longer list of things to do for #rbxday on that page. Apart from that, there's a bit less information about #rbxday than in the Rubinius Weblog, though.

My RBX Day

For #rbxday, I have already committed a Rubinius-specific front-end tool to the Ruby project on which I've been hacking the most, recently -- an RPG character tracking library I call Persona. The standard front-end is called pertracker.rb, which I have symlinked within my execution path as pt. The Rubinius-specific version, called ptx.rb and symlinked as ptx in my execution path, does the same thing, but instead of executing the irb command on the local system (which often points the irb REPL at the MRI/YARV reference implementation of Ruby), it executes the rbx command (which specifically uses the Rubinius implementation with the irb REPL, instead). I've been using ptx instead of pt to use and test Persona functionality since I created the ptx.rb frontend script a couple days ago, and will keep using it at least through the end of tomorrow. If I do not encounter any problems with it in the meantime, I will keep using that except in cases where I specifically want to test things with MRI/YARV.

I'm also using rbx persona_tests.rb to run my test suite, in addition to ruby persona_tests.rb (which runs Ruby 1.8.7 on this system) and r persona_tests.rb (which is an alias that runs 1.9.2 on this system).

$ rbx persona_tests.rb 
Loaded suite persona_tests
Started
..............................
Finished in 0.007965 seconds.

30 tests, 43 assertions, 0 failures, 0 errors

Looks good to me. Did I mention I think Rubinius is pretty awesome? The developers that hang out in the irc.freenode.net #rubinius channel are generally pretty cool, too.

Programming With Discretion http://blogstrapping.com/?page=2011.197.13.23.05 http://blogstrapping.com/?page=2011.197.13.23.05 Sat, 01 Jan 2011 00:00:00 +0000

Programming With Discretion

In Reviews: The Book of Weird Ruby and Eloquent Ruby, I commented on the coding style Huw Collingbourne uses for Ruby in The Book of Ruby. One statement I made was:

I spent about ten minutes thinking about, and refreshing my memory of, idiomatic styles for other languages, with Google as my guide; I was trying to figure out what language might have been the foundation for this author developing the style he uses. I thought maybe he was doing something like writing C++ in Ruby (as an example -- despite the wide range of styles considered "idiomatic" for C++, none of which I am aware exactly fit the bill). I have not yet come up with a language whose idiomatic style could explain this.

There are two discussions on reddit following Reviews. In one of them, user redditornongrata said:

The camelCaseWithoutLeadingCaps convention in The Book of Ruby is a Javaism. The strange indentation and whitespace is just bizarre.

I responded:

I saw a number of different bits of formatting style in that book's sample code that could be blamed on several different languages' idiomatic styles; the Javaism is one of them. I still have no idea of any single language that could account for all, or even most, of them.

Eventually, Huw Collingbourne commented as well, in one case to offer a link to his response to my Reviews: Programming With Style. He admits right away, at the beginning of Style, that he makes an effort to avoid adopting idiomatic style for the language:

So, when I switch from one programming language to another do I change my coding style to fit the language? The answer is: up to a point. Or, to put it another way: as little as possible.

His explanation involved thinly veiled statements that adherence to idiomatic style for a language when programming in that language is essentially slavish attachment to empty traditions. His choice of terms is not quite so blunt, using phrasing like "fiercely devoted to language-specific idioms". He also states that he dislikes underscores in method names for "aesthetic" reasons.

He goes on to explain that, though apparently people often think he gets his programming style from Java, "That is simply not the case. For some insight into my stylistic preferences, however, you may want to take a look at a series of articles I wrote called Ruby The Smalltalk Way."

I suddenly recalled some similarities between his style and that of some Smalltalk code I have seen following a TechRepublic article of mine, Understanding Ruby Blocks, including the camelCaseWithoutLeadingCaps style of labeling. I have, of course, seen more Smalltalk code than that -- but not recently, so it was not the first thing that came to mind.

Even so, Smalltalk does not explain everything about Huw Collingbourne's style. For instance, I have never seen any Smalltalk code formatted quite the way he did at times. Compare this Ruby code from his book:

["hello","good day","how do you do"].each{
    |s|
    caps( s ){ |x| x.capitalize!
        puts( x )
    }
}

. . . with this Smalltalk code from a discussion comment at TR by Mark Miller:

foo := #(1 2 3 4 5 6 7 8 9 10). "1-based array"
vals := #(0 1) asOrderedCollection.
foo do: [:n |
   Transcript
      show: n asString, ': ',
            (vals at: n
                  ifAbsentPut: [(vals at: n - 1) +
                                (vals at: n - 2)]) asString;
      cr]

Note the way Mark Miller did not for some reason orphan the block argument on its own line, did not place one expression or statement on the same line as a block argument and opening delimiter while placing others on their own lines, and tended to line up indentations with placement of associated tokens on preceding lines. The style in Mark Miller's code sample tends to match the idiomatic style of Smalltalk that I have seen in the past fairly well, in addition to matching some conventions that are typically observed across different languages. It is reasonably clear and readable, tailored to the linguistic context of Smalltalk syntax, and maintains conceptual clarity by accounting well for the semantic elements of the language.

Though Smalltalk clearly does not account for all of Huw Collingbourne's quirks in Ruby programming style (one might say it does not account consistently for most of them), I can see that Smalltalk might well have had some influence on his style. The fact he writes a lot of C# might account for some parts of it as well, such as the choice of four-space indents. The weird inconsistencies, however, still seem inexplicable to me.

For those of you who are not very familiar with Ruby code, the above Ruby sample would more idiomatically be formatted thusly:

["hello", "good day", "how do you do"].each do |s|
  caps(s) do |x|
    x.capitalize!
    puts x
  end
end

. . . though most Rubyists would probably rewrite parts of it as well. A quick rewrite to suit my preferences and Ruby idioms might look like this:

['hello', 'good day', 'how do you do'].each do |s|
  caps(s) {|x| puts x.capitalize }
end

We will just ignore for now that, in The Book of Ruby, the definition of the caps method is completely useless and makes it a redundant piece of code that adds nothing but complexity to the task. The implementation of caps was used as an example of how to implement a method that takes a block as an argument, and not as an example of good software design (I hope).
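
For anyone who wants to run the snippets above, here is a stand-in for caps. It is an assumption for illustration only, not the definition from The Book of Ruby: it simply hands its argument to the block.

```ruby
# Stand-in only, not the book's code: pass the argument straight to the block.
def caps(text)
  yield text
end

results = []
['hello', 'good day', 'how do you do'].each do |s|
  caps(s) { |x| results << x.capitalize }
end
puts results
```

Note that capitalize only upcases the first character of each string, so "good day" becomes "Good day", not "Good Day".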

In the end, I do not buy into his argument that his personal biases trump the stylistic idioms adopted by the language's community, for reasons I will touch on a bit more in a moment. Before we get there, though, I think it worth mentioning that even if I did buy his argument, I would certainly not buy his notion of the best-ever code formatting style to try to cram into the context of every language I use. I shudder at the thought of how Scheme would look using his preferred style.

On the subject of why I do not buy his argument, let us consider his objections to employing the idioms of the language as jumping-off points for arguments in favor of Ruby's idiomatic style when coding in Ruby, some of which come from Style and some from the introduction to The Book of Ruby:

  • He apparently objects to "fierce devotion", as indicated in the statement "I came across this (largely negative) review of my Ruby programming book recently, which reminded me of just how fiercely devoted to language-specific idioms some programmers are. The truth of the matter is, I am not." Whether adopting the idioms of a particular language is any form of "devotion" at all or not, this argument in no way addresses the benefits or detriments of employing such idioms. Even so, I for one have no blind devotion to empty conventions. I use the style that works -- to make collaboration with other programmers smoother and more efficient, to keep the look of my code in a given language consistent as an aid to rapid recognition, and to highlight by relative positioning important aspects of the code corresponding to key semantic relationships within the code.

  • He goes on to complain about differences in conventions for indentation in different languages, saying "I don't make a habit of setting different tab-settings when I switch from one language to another (4 spaces per tab in C#, say, but 2 in Ruby) . . .". Evidently, he has not heard of IDEs and editors that can recognize the language you are using, then automate selection of indentation standards according to language. Vim can do it; I'm pretty sure Visual Studio can as well, and I think that is what he is using.

  • He continues, with ". . . nor do I change my naming conventions for methods and variables any more than is absolutely required by the syntax of the language." There are certainly some things that, for the most part, should not be changed in labeling conventions from one language to the next -- descriptively encompassing the purpose of a variable or method, for instance. On the other hand, where the syntax of one language differs from another, different types of syntactic clutter arise that might distract the reader if naming conventions are not selected to clarify meaning.

  • He makes a curious claim, that "Some Ruby programmers have very fixed -- or even obsessive -- views on what constitutes a 'Ruby style' of programming. Some, for example, are passionately wedded to the idea that method_names_use_underscores while variableNamesDoNot." Of course, actual idiomatic Ruby style uses snake_case for both methods and variables. It seems odd that he would make a mistake like that in his complaints about Ruby idioms unless he simply never took the time to understand them before rejecting them.

  • He takes a surprisingly provincial view of stylistic idioms. "As far as I am concerned, the way in which you choose to write the names of identifiers in Ruby is of no interest to anyone but you or your programming colleagues." This is fine if you assume you will never have to deal with code outside of an insular little team of developers. On the other hand, taking it upon yourself to write a book about the language and distribute it widely -- a book that flies in the face of the language's idiomatic style -- is the polar opposite of dealing only within your own team of developers. The strong cultural involvement in open source development for the Ruby community places a necessary premium on consistent code style and maximizing compatibility with that style to the greatest extent reasonable. Even small commercial teams still need to be able to learn from outsiders, the foremost experts in the language and the best teachers in its community, and having to translate concepts between wildly disparate styles that are mutually nigh-incomprehensible makes that quite difficult at times.

  • He dismisses the notion that naming conventions have any value, saying that in his view "good programming style has nothing to do with naming conventions and everything to do with good code structure and clarity." Of course, given that policy, one must wonder why he does not just name his variables a, b, c, and eventually aa, ab, and so on. It is a self-defeating argument.

  • He sure does like parentheses, though. "Parentheses clarify code and avoid ambiguity that, in a highly dynamic language such as Ruby, can mean the difference between a program that works as you expect and one that is full of surprises (also known as bugs)." Of course, a rule so simple is bound to run afoul of exceptions. In fact, much of the beauty and clarity of Ruby code is tied to the fact that it uses special character punctuation much more sparingly than some other languages. The syntax of the language seems to have been especially well tuned to improve clarity with the reduction of such punctuation in cases where grouping is obvious. By contrast, close proximity of multiple layers of nested delimiters such as parentheses, brackets, and braces can make it difficult to recognize the current level of nesting. Delimiter matching functionality in your editor can help for very localized, precise efforts to untangle parenthetical nesting, but that provides very little help for quickly taking in code within the context of sorting out larger scale program flow. This is one reason why, in Ruby, it is common to leave off parentheses for method calls with relatively few arguments.
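
To make the naming and parentheses points concrete, here is a small sketch; the full_title method and its arguments are made up for illustration. It uses snake_case for both method and variable names, drops parentheses where the grouping is obvious, and keeps them where a chained call would otherwise bind to the wrong thing.

```ruby
# snake_case for the method name and for local variables alike.
def full_title(first_name, last_name)
  "#{first_name.capitalize} #{last_name.capitalize}"
end

# Few arguments, obvious grouping: parentheses are commonly left off.
display_name = full_title 'ada', 'lovelace'
puts display_name

# Chaining onto the result: parentheses are needed here, because without
# them .upcase would apply to 'lovelace' rather than to the return value.
shouted = full_title('ada', 'lovelace').upcase
puts shouted
```

So the idiom is not "never use parentheses"; it is to use them where they carry information about grouping and leave them out where they would only add clutter.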

Ultimately, I do not begrudge anyone the right or desire to do his or her own thing in the privacy of his or her editor of choice. I believe that programmers' lives can often be made easier by adopting good coding style idioms, but if they for some reason find those idioms limiting, let them do what they wish when it affects only their own circumstances. A problem arises when their choices affect others, however.

First, it is in your own best interest to employ a coding style calculated to maximize the comfort of readers and the clarity and rapid comprehension of the code for readers when you wish to get any kind of input from them. That input might involve code contributions to your open source project or help from a mailing list, for instance.

Second, it is in the best interests of any goals you may have for reuse of your code to employ such a coding style, because given a choice between two codebases -- one of which they find easily readable and another of which uses inconsistent, alien stylistic choices -- the would-be user of your code will generally choose the codebase he or she finds most readily comprehensible.

Third, it is in the best interests of whoever ends up using the software in any case where anyone else will have access to your code, and -- where those users are your customers -- it is again in your own best interests to use common, effective idioms. That is because readability is an important component of maintainability, and if other programmers are likely to have to maintain your code it needs to be readable to them.

Finally, anyone writing a book intended to introduce new Rubyists to the language should consider the maximum value such a book can provide, and make stylistic choices consistently and with the good of the reader in mind. If you must, make a case for particular deviations from idiomatic style in a limited part of the text; if the deviations are not too great, you can even use them, so long as you make it clear how and why you deviate from the norm. Overall, however, it is for the best to adhere to idiomatic style in any case where an alternate style is not the point of the book. To do otherwise is to do your readers the disservice of indoctrinating them in stylistic conventions that will set them at odds with the rest of the language's community.

. . . and that is really my biggest problem with the code style in The Book of Ruby. I may disagree with Huw Collingbourne's choice of coding style, but do not much care if he uses it for his own private purposes. I just care that he replaces idiomatic style in a book designed to impress good programming practice on new students of the Ruby language. Even that was not the reason I objected to the coding style in my review, though: ultimately, the reason I rated the coding style poorly in the context of that book is that it becomes less readable for me, thus dissuading me from buying it.

I was glad someone else mentioned the un-idiomatic style of the book before I bought it, increasing the likelihood I would give it a critical look for such problems. I figured someone else might benefit from my experience as well.

Reviews: The Book of Weird Ruby and Eloquent Ruby http://blogstrapping.com/?page=2011.194.11.46.40 http://blogstrapping.com/?page=2011.194.11.46.40 Sat, 01 Jan 2011 00:00:00 +0000

Reviews: The Book of Weird Ruby and Eloquent Ruby

No Starch Press is one of the better technical book publishers, like O'Reilly and The Pragmatic Programmers. It is offering a new book about Ruby called The Book of Ruby by Huw Collingbourne, the Technology Director for SapphireSteel Software (and, as far as I can tell, one of only two people in the company). SapphireSteel develops tools for Visual Studio, including a Ruby IDE extension to Visual Studio called Ruby in Steel, "the professional Ruby IDE" in the words of the SapphireSteel site. I'm a little skeptical of the value of Visual Studio integration for Ruby, but whatever floats your boat is good for you, I suppose.

Announcement and Availability

On the ruby-talk list, someone identified as Jon announced the publication of The Book of Ruby, something that has been included with the RubyInstaller for Windows for quite some time. Luis Lavena, the lead developer for the RubyInstaller project, informs me "'Jon' is one of the biggest contributors to the RubyInstaller project itself." For those unfamiliar with it, the RubyInstaller project has become the de facto "official" Ruby installation tool for MS Windows, and it works great for general usage. I have used RubyInstaller for occasions where I needed Ruby on an MS Windows system with no complaints, though until today I have not actually read any of The Book of Ruby; see my comments about reading PDFs on a laptop below for why.

While No Starch Press -- on its Book of Ruby page -- offers Chapter 11: Symbols as a free PDF download, Jon's announcement of Huw Collingbourne's book includes a URI for a free download of Chapter 10: Blocks, Procs, and Lambdas as well. There is a link to that download on the RubyInstaller site's homepage as well.

In addition to that, a discount code worth 30% off the cover price of $39.95 when you buy the book at the No Starch Press site is offered: RUNREADRUBY

As one response on the ruby-talk list pointed out, Amazon offers the same book for $26.37, which is about $1.60 lower than a strict 70% of the cover price, so it is a cheaper purchase at Amazon. I do not recall the shipping rates when buying from No Starch Press, but I do not think the rate is "free", whereas Amazon offers "Super Saver Shipping" (that is, free shipping) for most orders over $25. On the other hand, when I checked today, there was only one copy of the book left in stock at Amazon.

As a nice compensation for that $1.60ish plus shipping, though, No Starch Press offers free access to EPUB, Mobi, and PDF ebooks when buying the book there (and any other hardcopy book purchases there, I think). I own an ebook reader now, as you might know from An Open Letter To Barnes & Noble About Text Files -- a Barnes & Noble Nook Simple Touch Reader -- and as such, ebooks that come with the purchase of hardcopy are very tempting to me, especially that EPUB format. I would like plain text even more if my Nook supported plain text; see my Open Letter for more details about that travesty of technology design.

Even without the Nook, though, I would consider it worthwhile to have a searchable PDF on my computer in addition to the hardcopy on my shelf. Being able to carry it around on my smartphone, if I needed to quickly look up something while I'm away from both my Nook and my laptop, would be nice too. Offering it in all three of those formats is a nice touch of class from No Starch Press. I started to think I should get this book, even though I was not thinking of getting another Ruby book any time soon before that announcement on ruby-talk; I already have eight (or so) Ruby books, counting hardcopies and an ebook on my Nook, and not counting PDFs on my laptop because I find it very difficult to read a "book" on a laptop and big PDFs are annoyingly slow and sometimes slightly broken on my Nook. Big PDFs tend to serve only as reference on my laptop, rather than really readable books.

Because Amazon offers only a very limited number of copies of The Book of Ruby at a price that is not much cheaper than the No Starch Press store's price, and offers no ebook formats of it at all (not even for the Kindle, which I do not own anyway), and because of the discount code and free ebooks when buying from No Starch Press, I was about ready to pull out my wallet and buy it. There was another response to the ruby-talk list announcement that reminded me to be cautious, though. In it, Steve Klabnik said that the author's code style in the book was unlike anything one would be likely to see anywhere else in Ruby. I decided to investigate the matter by downloading one of the free chapters and forcing myself to overcome my distaste for PDF ebooks (or PDF echapters in this case).

The Book of Ruby Review Itself

After downloading and perusing Chapter 10, thinking at first "Well, it probably isn't that bad," I discovered that it is, in fact, that bad. This code is an example of what I found:

["hello","good day","how do you do"].each{
    |s|
    caps( s ){ |x| x.capitalize!
        puts( x )
    }
}

I find this (relatively) hard to read and, contrary to the explanation the author uses to justify his departures from idiomatic Ruby style, inconsistent. I spent about ten minutes thinking about, and refreshing my memory of, idiomatic styles for other languages, with Google as my guide; I was trying to figure out what language might have been the foundation for this author developing the style he uses. I thought maybe he was doing something like writing C++ in Ruby (as an example -- despite the wide range of styles considered "idiomatic" for C++, none of which I am aware exactly fit the bill). I have not yet come up with a language whose idiomatic style could explain this. If any readers can offer a suggestion, I would love to hear about it. Another clue to the origin of the author's style might be found in this code sample:

x = "hello world"

ablock = Proc.new { puts( x ) }

def aMethod( aBlockArg )
    x = "goodbye"
    aBlockArg.call
end

puts( x )
ablock.call
aMethod( ablock )
ablock.call
puts( x )

Notice the inconsistent use of camelCaseWithoutLeadingCaps, parentheses for all arguments in all circumstances, four-space indents (as opposed to Ruby's idiomatic two), and from the previous sample the tendency to use braces without preceding spaces while splitting blocks across multiple lines, bumping block arguments to the next line sometimes (but not always), and occasional inclusion of the first line of code in a block on the same line as the opening brace. I count at least four violations of idiomatic Ruby style just in the way the author uses braces for Ruby blocks.
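For comparison, here is a sketch of how the first sample might look in more conventional Ruby style. The excerpt does not show the definition of caps, so the version below is a hypothetical stand-in that simply yields its argument to the block; the variable names are also illustrative.

```ruby
# Hypothetical stand-in for the book's caps method, whose definition is
# not shown in the excerpt: it just passes the string to the block.
def caps(string)
  yield string
end

# Two-space indents, do...end for multi-line blocks, block arguments on
# the same line as the block opener, and no padding inside parentheses.
['hello', 'good day', 'how do you do'].each do |greeting|
  caps(greeting) do |text|
    puts text.capitalize
  end
end
```

Braces are still common for single-line blocks; the usual convention is only that a block spanning several lines uses do...end.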

The book does get some technical issues more precisely on-target than most other books (e.g., a definition of closures that is disguised as an off-hand, conversational throw-away commentary on closures) -- stuff that other books do not necessarily get wrong, but do tend to describe imprecisely in a way that ends up being misleading -- but it trades these items of relatively precise accuracy for others where terminology appears to be misused or at least inconsistently applied, such as the way the author uses the terms "block" and "method" incautiously.

If the code style works for the author and his collaborators, great. It is not, however, code that would encourage me to contribute to the project if I came across it on Bitbucket or GitHub, and within the context of the book I find it distracting. As Steve Klabnik pointed out, you are not likely to encounter this coding style in the wild; it is distinctly un-idiomatic, and does not appear to even conform to any reasonably consistent rules of style at all. It could easily lead the reader astray; most idiomatic stylistic choices in Ruby carefully emphasize things that need emphasis in source, make things more readable within the context of Ruby syntax, and encourage good coding practices, to say nothing of the fact that having a thoroughly alien style relative to the idiomatic style of the language's community makes one a less desirable and effective contributor to others' projects.

As for the content of the book itself, I do not find that it presents any perspectives on the topics it addresses that are new to me, from what I have read so far. In fact, everything that is not a superficial departure from norms without any good reason that I can find appears to be boringly conventional and uninteresting. This is fine for a beginner in the language who just wants to build a solid understanding of the language, but not for someone who has been using the language for a while and become comfortable with its basics already. Its possible value for a beginner, in my estimation, is badly undermined by its departures from idiom and even from the author's own explanations of his preferred style and the reasons he chooses to use that style.

I understand that the RubyInstaller project promotes The Book of Ruby to some extent because it was the best available option for distributing a Ruby book with RubyInstaller, and do not fault them for that. This does not mean I think it is the book a beginning Rubyist should use as an introduction to the language, however.

By Contrast, Eloquent Ruby

If you want a book well-suited to familiarizing the beginning Rubyist with the fundamentals of the language and common practice, as well as an introduction to the effective use of tools and techniques that are useful to Rubyists, I would recommend Russ Olsen's Eloquent Ruby, also available for the Nook at a very reasonable price. The prices are better for Barnes & Noble members, of course. Amazon might offer better prices for non-members.

I feel Eloquent Ruby does a better, more consistent job of explaining the concepts it addresses, covers things in a more approachable manner, and does not throw all idiomatic convention out the window for no (apparent) good reason. Also important, in my case, is the fact that even for a Rubyist with a couple of years or so under his belt already it still has something to teach the reader. It even provides a useful, if brief, introduction to test-driven development with both Test::Unit and RSpec.

Russ Olsen did a great job with his book, and I do not feel the same about Huw Collingbourne's effort at all -- even if the latter's book does have a cool ninja motif on the cover and come with free ebooks. In the case of Olsen's book Eloquent Ruby, I had to either choose between formats or spend more money than I was willing to spend on the text. Living with that choice, however, is much easier than living with the mistake of paying real money for a book whose content follows the patterns I saw in the sample chapters for The Book of Ruby.

Acknowledgment

Thanks to Luis Lavena for offering feedback on my review, allowing me to better clarify some of what I intended to say.

An Open Letter To Barnes & Noble About Text Files http://blogstrapping.com/?page=2011.183.05.19.03 http://blogstrapping.com/?page=2011.183.05.19.03 Sat, 01 Jan 2011 00:00:00 +0000

An Open Letter To Barnes & Noble About Text Files

Dear Barnes & Noble,

Your ebook reader can't even read text files. This is ridiculous.

I Finally Like Ebooks

For a very long time, I resisted the growing ebook craze, mostly because reading books in digital format was such a chore on my laptop that it would actually discourage me from reading. With the advent of the Kindle, some of that changed -- but Amazon's DRM scheme really rubbed me the wrong way, and most of the books I wanted were not available for the Kindle, or at least for any notable discount off hardcopy prices. Spending full price (incredibly even more than full price, in some cases) on a book that I do not even get to put on my shelf and loan to others on top of the exorbitant price of ebook reader devices might seem a bit unreasonable.

Even at a discount off the hardcopy price, though, I hesitate. I have finally developed a system for deciding whether it is worthwhile to buy an ebook: if it is cheap enough to make up for the fact I cannot sell it to a used book store if it turns out to not be good enough to keep, I will consider buying it. In fact, once it meets such criteria, the ebook becomes my preferred format for reading purposes if there is not a remarkably high chance that I will want a hardcopy on my shelf for the old-school love of paper pages. It turns out that it is really difficult to beat the portability of ebooks on a good ebook reader. Still, with the prices of hardcopy books on Amazon as low as they are, the price of an ebook tends to have to get really low (in comparison to cover price for the hardcopy) to make up for the loss of resale value. Luckily, it looks like things are getting to the point where something like a third of available ebooks for sale meet my criteria. An ebook reader is finally worth owning.

I Love My Nook

I have had a Barnes & Noble Nook Simple Touch Reader for about a week as of this writing.

The Art of Unix Programming (TAOUP) is an excellent example of what I like about my Nook. In the last half-dozen years or so, I have made several attempts to read TAOUP, using PDF, HTML, and plain text digital versions of it. In each case, I never quite got past Chapter 4: Modularity (the first chapter of Part II), and sometimes not even that far, because of the difficulty I have with reading books on a computer. I have even tried using an ebook reader application on my smartphone, and found that the lack of convenient physical buttons and the small size of the screen offer some annoying obstacles to easy reading.

In the last week, in addition to getting a decent chunk of the way into two other books and reading almost an entire magazine on my Nook, I have started from scratch on TAOUP and already gotten most of the way through Chapter 7: Multiprocessing, despite the file format problems I have encountered (more on that later). I might even be making better progress through it than if I had it in hardcopy. I also still have about 60% of my initial battery charge left, which is pretty damned nice. To put that in some kind of context, take note of the fact that I've had to charge my smartphone three times since I got the Nook.

It turns out that dedicated ebook reader devices are pretty nice. There are (unsurprisingly, I hope) still some books that I will always get in hardcopy. For instance, I plan to get another copy each of Stephen Mitchell's translation of the Tao Te Ching and Fritjof Capra's The Tao of Physics -- I gave away my previous copies of both, each to a different person -- and hardcopy is what I need for both, in part because of their giftability.

Format Problems

There are some minor issues with some typical ebook formats. For instance, a BSD Magazine PDF can be a little inconvenient when I try to read it on my Nook Simple Touch Reader. An EPUB format copy would be awfully nice. In cases where there are tools for translating ebooks from one format to another, however, the tools are often pretty flawed, and sometimes produce some garbled messes where everything seems fine until you get to Chapter 4 and discover that you are reading text from Chapter 1 again, or after finishing Chapter 2 you wonder why there is another copy of the Table of Contents in front of you. Even without translating formats, however, text sometimes ends up displayed out of order because of the difference between the way PDF layouts are designed and (evidently) the way an ebook reader without much layout awareness deals with the structure of the PDF file format. It's a bit like what often happens when you try to translate a PDF into a plain text file.

Readability has not been an issue, though waiting for the next page of a PDF to load or having to skip through a dozen pages half a page at a time because part of an earlier chapter has resurfaced due to a faulty format translation is kind of annoying from time to time. Hopefully this will not become a big problem once I start mining Project Gutenberg for reading material.

No Really -- Format Problems

You may recall when I said, a few paragraphs ago, that TAOUP is an excellent example of what I like about my Nook. Sadly, it is also an excellent example of what I find most egregiously, catastrophically awful about my Nook. The single most asinine, absurd, ridiculous thing about my Nook is its handling of text files. It's not that the Nook handles them badly, mind you. The problem is that the Nook does not recognize text files at all. The Nook Color does so, but then, it is basically just a tablet computer with some of its functionality gimped -- and my experience with both tablets in general and reading ebooks on LCD (and CRT) displays has taught me to avoid that option when I buy an ebook reader (to say nothing of the absurdity of paying double just to read plain text files).

Seriously, Barnes & Noble, plain text is unarguably the most universal digital data format in the world. In fact, every digital document format the Nook supports requires it to be able to parse text within those documents. Despite this, the plain text version of TAOUP in the form of a taoup.txt file is apparently an incomprehensible alien artifact from the point of view of the Nook Simple Touch Reader. This is not just a bad feature choice. It is perversely, stupidly, downright maliciously wrong. How can it not support plain text without someone in the Barnes & Noble executive staff saying "No, we don't want to support plain text," because of some kind of desire to make things difficult for customers? "It'll encourage them to pay for EPUBs from the Barnes & Noble store rather than get plain text from the Internet for public domain books," I imagine this hypothetical blow-dried douche wearing a suit that cost twice as much as my motorcycle saying in a mahogany-appointed conference room while sitting in a lushly cushioned chair upholstered in the flesh of baby seals and unbaptized children. It takes effort to fail to support plain text files while including support for at least six other, more complex digital document formats. Hell, you could at least have installed the more utility (or a thirty-minute minimally functional replacement for it) on the thing. How difficult is that?
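To give a sense of how little is being asked for, the core of such a minimally functional replacement might be sketched in Ruby as follows. This is purely illustrative: the method names and the 20-line page size are assumptions, and it has nothing to do with the Nook's actual software or with how the real more utility is implemented.

```ruby
# Assumed screen height in lines, chosen arbitrarily for this sketch.
LINES_PER_PAGE = 20

# Split text into pages of at most lines_per_page lines each.
def paginate(text, lines_per_page = LINES_PER_PAGE)
  text.lines.each_slice(lines_per_page).to_a
end

# Print one page at a time, waiting for Enter before showing the next.
def page_through(path)
  paginate(File.read(path)).each do |page|
    puts page
    $stdin.gets
  end
end

# Only runs when invoked directly with a file name, e.g. a text file path.
page_through(ARGV[0]) if __FILE__ == $PROGRAM_NAME && ARGV[0]
```

A real reader would need word wrapping, paging backward, and a saved position, but none of that requires parsing anything more complicated than lines of text.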

Because of the problems with PDFs, I decided to use an ebook format translator to turn my taoup.txt file into an EPUB file. The result is the hash made of the book's organization mentioned above. Remember when I mentioned tables of contents repeated throughout a book and chunks of early chapters duplicated within later chapters? Yeah, that was the result of translating TAOUP from a plain text file to the EPUB format. Congratulations, Barnes & Noble; you have made the incredibly easy task of paging through a text file into an ordeal on the order of translating a WordPress database of articles into a publishable book (not a task you should attempt if you have much of a choice in the matter).

I Still Love My Nook

I will keep reading, and keep enjoying the convenience and pleasant UI of the Nook Simple Touch Reader. Thanks for making this alternative to the Kindle available to me. Sure, the Kindle supports plain text, but its drawbacks are enough to make me prefer my Nook anyway (though not as much as I preferred it before it occurred to me that it might not support plain text). It really is a great device, and I am sure a great many people at Barnes & Noble deserve to be commended for their parts in designing it.

Whoever decided that supporting plain text was a bad idea, though, needs to get fired, preferably with real fire.

Thanks for listening to an appreciative customer,

Chad Perrin

PS: Thanks to apo of #suckless for suggesting the executive summary at the beginning of this letter.

No GNUs Is Good Gnews http://blogstrapping.com/?page=2011.152.16.25.24 http://blogstrapping.com/?page=2011.152.16.25.24 Sat, 01 Jan 2011 00:00:00 +0000

No GNUs Is Good Gnews

An old children's show called The Great Space Coaster, as described by the Wikipedia article:

The Great Space Coaster is a children's television show that ran from 1981 through 1986. The series was directed by Dick Feldman, and distributed by Sunbow Productions.

The most memorable part of the show by far, for me at least, was a muppet-like character named Gary Gnu, who reported the "gnews":

A gnu newscaster who does a show each episode and is well known for his catchphrase, "No Gnews is Good Gnews with Gary...Gnu". He would add a guttural "g" sound to the beginning of any word he spoke which normally began with an "n", such as "gnews" for "news" and "gnaturally" for "naturally". Whenever introduced by either Goriddle Gorilla or Knock Knock, the introduction is always, "And Now For Something Really Gnew, Here's Gary Gnu." The only difference is that Goriddle always says "WOW!" each time he introduces Gary Gnu. Gary always begins by saying, "This Is Gary Gnu. And The No Gnews Is Good Gnews Show. The Only TV Gnews Program Guaranteed To Contain No Gnews Whatsoever." Gary's unusual speaking style was inspired by the 1957 Flanders and Swann song, The Gnu, which told the story of a gnu in a zoo who spoke much as Gary did, adding a "g" sound to the beginning of various words. Gary actually sang the song in one episode. Gary Gnu's gnewscasts were punctuated by comments and jeers from the filming crew. Occasionally he would be set up for a practical joke as the crew would call him a "turkey", followed by the dropping of a paper turkey (with Gary's picture taped over the face) onto Gnu's head, with a gobbling sound effect.

His catchphrase -- "No gnews is good gnews, with Gary . . . Gnu!" -- is what I most remember about him, of course. Apart from nostalgia, one of the reasons I remember Gary Gnu with such fondness is how well that catchphrase suits my opinion of GNU software:

The first four links in that list don't lead to information specifically about the GNU Project, or GNU software, per se. They do, however, bear directly on issues related to the GNU Project's official preferred license, the GPL.

That last article isn't specifically about GNU software or the GNU Project, either. It is, however, about some software that is heavily influenced by the GNU Project, including anything related to GNOME, and it mentions the GNU Project as one of the first four "horsemen" (actually the first horseman). Anyway, the upshot is that I have a lot of reasons to dislike GNU software.

I'm working on a new article for TechRepublic about DRM -- which sucks, but the GNU Project's attitude toward it is asinine. That comes up in the article. I guess I'm not done writing about the failings of the GNU Project.

Minor Progress on Droll and Lump http://blogstrapping.com/?page=2011.117.21.38.53 http://blogstrapping.com/?page=2011.117.21.38.53 Sat, 01 Jan 2011 00:00:00 +0000

Minor Progress on Droll and Lump

I have two personal projects that are sorta primary for me right now. One of them is a die roller program that I hacked together in December 2010. The other is Lump, the ultralight CMS that I use for blogstrapping. I have been distracted by other little things for a while, including minor tweaks to some pretty big software that needs a major overhaul rather than minor tweaks (pertrack, a "persona tracker" primarily for RPGs but also potentially useful for managing the characters in a more static work of fiction). In addition to being distracted by little nothings, I have also simply not been writing much code of any substance in general.

I am trying to get back into hack mode, now -- doing more substantive work on both droll and Lump, and planning to do something in the near future with pertrack that actually addresses the need for a major overhaul. I have managed to make a little progress on droll and Lump lately.

droll

Droll is a die rolling library written in Ruby, with a command line utility interface built in. Bundled with it is drollbot, an IRC dicebot interface to the functionality of the droll.rb library.
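For readers unfamiliar with die codes, the following is a minimal sketch of what rolling a simple code such as 2d6+3 involves. It is not droll's code and handles none of droll's more sophisticated syntax; the DIE_CODE pattern and the roll_die_code method are hypothetical names invented for this illustration.

```ruby
# Illustrative only: matches simple codes like "1d20", "2d6+3", "3d8-1".
DIE_CODE = /\A(\d+)d([1-9]\d*)([+-]\d+)?\z/

# Roll the dice a code describes; returns [rolls, modifier, total].
# The rng parameter exists so a seeded generator can be passed in.
def roll_die_code(code, rng = Random.new)
  match = DIE_CODE.match(code)
  raise ArgumentError, "bad die code: #{code}" unless match

  count    = match[1].to_i
  sides    = match[2].to_i
  modifier = match[3].to_i # nil.to_i is 0 when there is no modifier
  rolls    = Array.new(count) { rng.rand(1..sides) }
  [rolls, modifier, rolls.sum + modifier]
end

rolls, modifier, total = roll_die_code('1d20+3')
puts "1d20+3: #{rolls.inspect} + #{modifier} = #{total}"
```

The output line imitates the general shape of droll's output (die code, raw rolls, modifier, total), but the formatting details here are a guess rather than a specification.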

In recent weeks I have declared two version 1.0 release candidates and created bzip tarballs for download for those who do not want to screw around with Mercurial, created a logo for droll, fleshed out some of the help documentation, smoothed out some of its functionality, done a little code organization, and added a needed feature or two.

The release candidate status for version 1.0 pertains to droll.rb, the droll library and command line utility, itself. Drollbot is being handled separately. The release candidate downloads currently available in the Bitbucket repository only contain the library. Whenever I get around to calling drollbot a version 1.0, I will create separate tarballs for that which will include both the drollbot wrapper and the droll library, along with the peripheral stuff needed to make drollbot work, so that only a single download is needed.

I have not tested any of this stuff on MS Windows yet, though I intend to do so at some point. I have, however, been testing it using three Ruby implementations: MRI 1.8, YARV 1.9, and Rubinius (last I checked, 1.8 only). I have been using whatever release versions of each implementation I happen to have handy, and nobody else is leaping up to use these things and report back on test results so far, so to the extent your setup differs from mine -- congratulations, you are a software tester.

As things currently stand, the few things currently sitting in the issue tracker on Bitbucket that have not been resolved are either features that I have decided I want to push back for v2.0 inclusion or drollbot-related, so droll v1.0 itself is as of this writing issue-free, as far as I know. Droll v2.0 is nonexistent; no work has yet begun on the next major version.

At some point, I think I will package this up as a gem and make it available for installation via the gem utility.

If your curiosity runs that way, you can check out some development statistics for the droll project on ohloh.

Lump

Lump is not, in contrast to droll, any kind of "release candidate" yet. It frankly needs some distinct fixing-up and reorganizing to be done before I start thinking about versioning that thing toward 1.0 status. In fact, as things currently stand, the only "version" information in it is the reference to what version of the Open Works License currently applies to the Lump project.

At the moment, there is a template file (Lump depends on an eruby implementation) and a library file, plus a configuration file. This setup is unlikely to change. There are a couple other things associated with it as well; that might well change. The main template file is kind of a mess, with a bunch of the program logic (such as it is) mixed in with markup via eruby syntax. Before I call it a v1.0 release candidate, I want the template file to be very clear, with little or no program logic represented in it more obviously than by method names like page_head and page_foot. I have got a ways to go before that point, though.
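The kind of separation described above might look something like the following sketch, which uses Ruby's standard ERB library as the eruby implementation. Everything here is hypothetical: the PageSketch class, the method bodies, and the template text are invented for illustration and are not Lump's actual code; only the method names page_head and page_foot come from the description above.

```ruby
require 'erb'

# Hypothetical sketch, not Lump's code: the logic lives in a class, and
# the template only calls descriptively named methods.
class PageSketch
  def initialize(title)
    @title = title
  end

  def page_head
    "<html><head><title>#{ERB::Util.html_escape(@title)}</title></head><body>"
  end

  def page_foot
    '</body></html>'
  end

  # Render an eruby template with this object's methods in scope.
  def render(template)
    ERB.new(template).result(binding)
  end
end

# The template contains markup and method calls, but no program logic.
TEMPLATE = "<%= page_head %>\n<p>Content goes here.</p>\n<%= page_foot %>\n"

puts PageSketch.new('blogstrapping').render(TEMPLATE)
```

With this arrangement, changing how a page header is built means editing the library file, while the template stays readable as plain markup.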

I am definitely making progress along those lines, though. Today, I reorganized some of the logic, pulling some out of the template file and putting it into the library file in what I think is a halfway decently organized form, within a Lump class definition.

I do have some ideas about what I will want to do after releasing a v1.0 -- stuff that will work toward a v2.0 -- including one of:

  • switching YAML libraries
  • switching from YAML to another common serialization format
  • making up my own config file syntax

One of the reasons I have felt motivated to get off my butt and do a little more work on Lump is the fact that a friend has expressed some interest in using Lump as a replacement for WordPress. As a result, I decided I should (to put it succinctly) make Lump suck less.

Another reason is the fact that I have registered a domain name (a while ago, now) for another Weblog that I want to set up, but I think I want to set up a single deployment of Lump with multiple interfaces to it using multiple content archives so I can minimize the work involved in maintaining multiple Weblogs. Before I try that, I figure I should do some more work on the Lump code to clean it up and so on -- so there I have more motivation to code quickly and well.

There is also an ohloh page for Lump.

blogstrapping

As I work on this stuff, I expect I will have more occasions to write something for blogstrapping from time to time. I know I have been slacking.

Project Activity != Project Health http://blogstrapping.com/?page=2011.065.16.43.41 http://blogstrapping.com/?page=2011.065.16.43.41 Sat, 01 Jan 2011 00:00:00 +0000

Project Activity != Project Health

Especially since the advent of DVCS-focused code sharing sites like GitHub and Bitbucket, but even before that point, people seem obsessed with the level of development activity for any given open source software project. I believe this is increasingly becoming a big deal for people because sites like these provide easily accessible measures of project activity. When metrics are easy to access and provide simple statistical data that appear to correlate strongly with desirable or undesirable characteristics of some situation, people tend to latch onto those metrics as if they provide the One True Measure of Quality.

This plays itself out everywhere. If two locales happen to have antithetical legal approaches to a given issue that is not itself synonymous with criminal activity (drug use, abortion, ownership of firearms, gay marriage, pick your poison), and those two locales also happen to differ substantially in crime rates, hordes of people will assume that one difference must necessarily be the cause of the other difference. Such people forget, or perhaps never realized, that simple correlation does not imply causation. Try telling them that, though, and they will accuse you of biases, stupidity, and anything else they can think of to dismiss your arguments rather than think about them.

While I have no direct proof of the fact, and am essentially just speculating, I believe quite strongly that in any case where an intuitively comprehensible metric is presented in a clear manner, people will overestimate the importance of that metric for making decisions about matters that may or may not actually be related. In short:

Easy access to metrics leads to overvaluation of metrics.

This has been amply illustrated by the way that corporate middle managers the world over have wasted uncounted man-years of effort trying to find the One True Metric for accurate measurement of programmer productivity. The canonical example, now thoroughly debunked amongst programmers who really know their stuff (but still held in the highest esteem of many petty bureaucrats), is the case of lines of code per time period. If you write five hundred lines of code per day, you must be a good programmer!

I touched on this before, in another venue. In lines spent, I quoted legendary computer scientist Edsger W. Dijkstra's On the cruelty of really teaching computing science, wherein he said:

From there it is only a small step to measuring "programmer productivity" in terms of "number of lines of code produced per month". This is a very costly measuring unit because it encourages the writing of insipid code, but today I am less interested in how foolish a unit it is from even a pure business point of view. My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.

Clearly, measuring programmer effectiveness in terms of "lines spent" alone is also a foolish approach, but Dijkstra's point is well-made. For a given block of functionality, assuming your code is well-written so as to be readable and maintainable, all else being equal, fewer lines of code is generally better. The simple words "lines spent" sum that up nicely; it is better to spend less, as long as you get optimal returns on your investment.
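As a small illustration of "lines spent", consider two Ruby methods that do exactly the same thing. Both are hypothetical examples written for this purpose; a lines-of-code metric would credit the first with several times the "productivity" of the second.

```ruby
# Sum the squares of the even numbers in a list, spending many lines.
def sum_even_squares_verbose(numbers)
  total = 0
  numbers.each do |number|
    if number.even?
      square = number * number
      total = total + square
    end
  end
  total
end

# The same behavior, spending one line in the method body.
def sum_even_squares_concise(numbers)
  numbers.select(&:even?).sum { |number| number * number }
end

sample = [1, 2, 3, 4]
puts sum_even_squares_verbose(sample)
puts sum_even_squares_concise(sample)
```

Both calls print the same result, which is the point: the line count differs while the functionality delivered does not.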

On the other hand, if you assume minimal lines of code for maximal functionality, all else being equal, you come back to a point where writing more lines of code means you are a more productive programmer. The problem is that there is no simple way to measure all of that. The only thing you can measure clearly in a single metric is lines of code -- and all those corporate middle managers are constantly looking for the simplest possible, single-factor metric they can use to judge what underlings to reward (or at least leave alone) and which to punish or fire. Thus, the moment "lines of code" enters discussion, many managers immediately latch onto that as the One True Measure of Quality. Once again, I find myself considering this simple statement:

Easy access to metrics leads to overvaluation of metrics.

The fact that sites like GitHub and Bitbucket make project activity immediately obvious to people browsing the site is no different from the ease of counting lines of code in a corporate setting, in this regard. People tend to leap all too quickly to the conclusion that the rapidity of project commits somehow has a nearly 1:1 relationship to project health and software quality.

I was tempted to title this essay thusly:

Project Activity Metrics Considered Harmful

Fortunately, I came to my senses. Pithy it may be, but it is also sensational and, strictly speaking, inaccurate. The true harm comes from the simplistic thought processes of people who do not think much about the fact that the world is usually not so one-dimensional. Even cartoons are two-dimensional, making the tendency of many GitHub users in particular to judge project quality and health on project activity metrics worse than cartoonishly wrong.

I recently commented at Hacker News, somewhat popularly (22 upvotes as of this writing), on a counter-example to the unsophisticated idea that Project Activity == Project Health:

Y'know, there are some projects that haven't had commits for a while because they do exactly what they're supposed to do, and don't need a bunch of commits. I consider it a good thing when a project gets to the point where nobody can find any bugs and they stop adding features because it has enough of them already. I sure as hell don't want my quick, clean, elegant, productivity enhancing window manager to turn into a featuritis infected monstrosity like OpenOffice.org, after all.

I then added to the comment a little while later:

edit: Actually, my window manager of choice has basically been abandoned by its developer several years ago, and it's still bug-free, stable, and lacking basically nothing. I finally came up with a feature enhancement I'd like it to have today, after using it for five years with no complaints or wants -- and that enhancement is really just an improvement of an existing feature.

More to the point, it's an enhancement that no window manager has, so I'm likely to need to write the code for this feature enhancement myself if I want it badly enough. I may pick up maintainership for the project to do just that.

Think about that a moment: one feature enhancement requested in [half] a decade, merely an extension of an already existing feature, and it's something entirely new to window managers as far as I'm aware. This thing has had no need for additional code in all that time, and it has been better as a result of all that.

Do not mislead yourself with simplistic metrics. Do not assume that because one project sees less activity than another it is "sick" or otherwise less worthy. The same goes for feature counts, the percentage of commits that come from one source as opposed to another, or even user base.

These factors can certainly be relevant, but no single metric (nor even two of them) will likely provide a clear picture of the health and quality of a particular project.

Beware the single metric mindset.

Code Reuse and Technological Advancement http://blogstrapping.com/?page=2011.060.00.28.21 http://blogstrapping.com/?page=2011.060.00.28.21 Sat, 01 Jan 2011 00:00:00 +0000

Code Reuse and Technological Advancement

One of the holy grails of software development is code reuse. This is not the kind of code reuse where you copy code from an old program into a new program because you have written this kind of code before. This is the kind of code reuse where you write code once, then every time you need to run some code like that, you use the code that has already been written, automatically, without duplicating it.

Object oriented programming is a set of techniques for code reuse that has become incredibly popular. By building a class, a prototype method, or some other kind of categorical definition of a particular type of code segment, you essentially create a button you can press to generate a new interface to the same code, with different data attached to it, allowing effectively infinite reuse of the same code for many different circumstances.
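As a minimal sketch of that "button", here is a tiny Ruby class. The Die class and its methods are hypothetical, invented only for illustration; the point is that the method bodies are written once and every instance reuses them with its own data.

```ruby
# Hypothetical example class; the name and methods are illustrative only.
class Die
  attr_reader :sides

  def initialize(sides)
    @sides = sides
  end

  # One method body serves every instance; only the value of sides differs.
  def roll(rng = Random.new)
    rng.rand(1..sides)
  end

  def describe
    "d#{sides}"
  end
end

# Pressing the "button" twice: same code, different data attached.
d6  = Die.new(6)
d20 = Die.new(20)

puts d6.describe
puts d20.describe
puts d20.roll
```

Nothing about roll or describe had to be duplicated to get a second kind of die.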

Libraries serve the purpose of another kind of code reuse. Write the code for a library once, and forever after you can get the functionality of that library by simply accessing it from a new program. This means that, even if the same object class (to draw on the previous example) is needed in many different programs, you only have to write it once. Of course, this depends on compatibility between the library on one hand and the environment and language constraints of the new program on the other. Portable libraries with APIs accessible from multiple languages can help ensure such reusability across widely divergent use cases, but they also require a bit more work than most people put into their libraries.

An operating system environment that is designed specifically for the purpose of allowing programs to interoperate can avoid the problems of library incompatibility; as long as the program runs on that platform, and is capable of interacting with that platform's interprocess operation model, code can be written once and used many times. An excellent example is Unix pipes, which allow the output of one program to be fed directly into the input of another program, chaining them together with a deceptively simple OS facility that can be used to construct complex processions of operations to produce sometimes astounding results.
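A rough sketch of that chaining, driven from Ruby with the standard library's Open3.pipeline_r. It assumes a Unix-like system with the usual printf, sort, and uniq programs on the PATH, and the word list is made up for the example.

```ruby
require 'open3'

# Equivalent to typing this at a shell prompt:
#   printf '%s\n' pear apple pear | sort | uniq -c
words = %w[pear apple pear]

counts = Open3.pipeline_r(
  ['printf', '%s\n', *words],  # program 1: emit one word per line
  'sort',                      # program 2: put the lines in order
  'uniq -c'                    # program 3: collapse duplicates and count them
) do |last_stdout, _wait_threads|
  last_stdout.read
end

# uniq -c pads its counts with spaces, so split each line on whitespace.
tally = counts.lines.map(&:split).map { |n, word| [word, n.to_i] }.to_h
p tally
```

None of the three programs knows anything about the others; the operating system does the plumbing between them.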

Code reuse is, in fact, merely a particularly modern example of something that has existed since the first time some prehistoric biped picked up a stick, bone, rock, or other object in its environment and used the object to gain leverage in the attempt to affect its environment, multiplying its effectiveness in that task. Give Archimedes a long enough lever and a place to stand, and he can move the world.

Every single substantive advancement in technology -- substantive, in that it both does something to increase the leverage we can apply to tasks of intentionally affecting the world around us, and comes with a greater understanding of that world or the effects of technology as productivity multipliers -- ultimately serves as the foundation for further technological advancement (at least in theory). Occasionally, some edge case may stand in the way of the use of some advancement to, in fact, apply leverage to the task of achieving the next advancement. When benighted religious zealots executed a forward thinker and burned all his books, an advancement might be lost forever, its discovery doomed to require rediscovery from first principles at some future time the same painstaking way it was originally discovered, for instance. Such cases are comparatively rare; usually the best such obstacles can do is slow us down a little.

As the rate of advancement increases, however, each delay has a more disproportionately significant effect. A delay of around twenty years thanks to an inconvenient patent a century ago could multiply the time between advances by a factor of two or three. The same delay at the turn of the millennium could multiply the time between advances by a factor of twenty or thirty. The same delay in ten years could multiply it by two or three hundred.

It's time to take a step back and explain the changing effects of such obstacles over time.

A Singular Trend

That incredibly strong, almost irresistible tendency for technology to inspire and enable the development of more advanced technology contributes to the exponential growth over time of technological advancement. Ray Kurzweil [0], author of The Singularity is Near -- technologist, visionary, futurist, and potential crackpot -- predicts the arrival of the technological singularity in or about 2045. This event, in vague, hand-wavy terms, is the point where technological advancement accelerates to such a rate that "it represents a rupture in the fabric of human history." In more precise terms, it is that point where all the rules of thumb we have adopted over the millennia for understanding social and technological change go flying out the window, because the development of disruptive technology will become so commonplace and frequent an occurrence that every time we blink the world will have changed in ways we could not have imagined moments before.

Chances are good we'll all die. Chances are about equally good we'll all become immortal. Ultimately, the defining fact of the technological singularity is that all the rules are broken; the world as we know it will end, by metamorphosing into something previously incomprehensible to us. More to the point, the metamorphosis will be ongoing, and will likely continue to accelerate at much the same rate. It is, in short, the moment when Everything Changes.

Whatever you may think of the seemingly outlandish descriptions of the technological singularity as a consequence of Kurzweil's calculations about the date of the prophesied event, the underlying calculation is simple, clean, and inescapable. All he did was calculate the approximate date that, given thousands of years of technological advancement at an exponential rate beginning with agriculture and carrying the trend forward, we will achieve artificial intelligence with computing power greater than the sum total of computing power represented by the brains of the entire human race for the equivalent of today's price of the laptop I used to write this [1]. I suppose I should have saved my money.

Given both the effectively unlimited potential benefit and effectively unlimited potential danger immanent to the technological singularity (and, more to the point, imminent -- given an ETA of approximately thirty-five years from now), the question of whether we should pursue or fight this onrushing event becomes incredibly important. Should we fear it as an existential threat or pursue it as the coming of an anticipated Utopian future where we achieve immortality and godlike power? Regardless of what it represents, what it is right now is a shockingly real and relevant possibility, or even probability. Computing technology today advances more in an hour than it did in the entire ninety-year period following Babbage's invention of the Analytical Engine. With that kind of headlong rush into the future, every second we waste erects new roadblocks to any attempts to alter or direct the course of that future yet to come. We had better make up our minds in a hurry. You have ten seconds to make a difference.

Are you done yet? Too bad. Too late.

Seriously, though . . .

We are faced by increasing tension between the conservative forces of modern sociopolitical authority and the progressive forces of pure, unfettered, headlong technological progress at breakneck speed. The efforts of humans to dictate the course of history by controlling the structure of the dominant social order often run afoul of technological progress. These conservative forces include the Catholic Church of the middle ages, Luddites and other labor movements of the industrial revolution, Keynesian and Chicago schools of economics, copyright and patent lobbies, environmental custodianship lobbies, stem cell research prohibition lobbies, the FCC, Net Neutrality advocates, and perhaps most visibly the dominant presence of market-manipulating, governmentally chartered legal entities known as "corporations" that attempt to ensure their financial solvency through the crushing of competition that gets too innovative (and, incidentally, oppose the Net Neutrality advocates as an equally conservative force for stagnation).

These corporations, like their mercantilist ancestors of centuries past, are only too happy to use the machinery of copyright and patent law to prohibit the development of new technologies that might compete against their own stodgy, outdated products. Just as steam engine patents stifled the continued development of engine technology for more than a decade, so too do software patents today have a chilling effect on the advancement of the software state of the art -- and software is the necessary twin to hardware in the advancement of computing power that brings the future rushing toward us at such an alarming rate.

If you are Bill Joy, former free-wheeling Berkeley hacker and producer of such productivity enhancers as vi and the Berkeley Software Distribution of Unix, you might be a neoluddite, fully cognizant of many of the implications of the accelerating advancement of technology and fearful of what it portends for the future in the form of the technological singularity. If you are Ray Kurzweil, you may be a reverent Singularitarian disciple eagerly awaiting the Rapture of the Nerds [2], almost radiant with a faith in the ultimate beneficence of the Second Coming of AI [3]. If you are more like me, you're a cynic: the glass is half empty, but it's probably better that way. Sure, billions might die, but maybe the rest of us will become immortal demigods.

Yes, the potential for the ultimate end of the human race with the coming of the technological singularity is substantial. No, we should not avoid it; we should, in fact, embrace it and pursue it with all due haste -- not at any cost, but at the cost of the "safety" we find in stagnation. It is better to try to grab the brass ring, to make the leap toward apotheosis and risk everything in the attempt, to gamble on the chance that we will avoid going out with a bang, than to accept the alternative: the absolute certainty that we will instead go out with a whimper a few years later down the line. Technology, at this point, is our only chance at salvation for the human species. Any attempts to interfere with that by banning or centrally controlling it will, at best, slow things down, throw them off balance, and make the ride rougher. At worst, they may actually grind the bullet train to a stop, and leave it to rust and crumble away to nothing.

Let us then build upon what came before, stand on the shoulders of giants to reach for the stars. Let us claw at the vault of heaven until we find purchase, tear a rent in the sky, and climb through to see what lies beyond. Let us brave the final frontier, our own human limitations, and explore the unknown regions beyond.

Let us look for ways to brush aside the obstacles set in the way of advancement by the petty machinations of small minds.

Code Reuse, Redux

If you want to hasten the advancement of technology, one thing should be clear: the reusability of technology is of paramount importance. The single most dire threat to the reusability of technology today is government. Its restrictions on the use, development, and distribution of technology affect almost every aspect of modern life, retarding such things as education about fertility modification technologies in school, research using embryonic stem cells scraped from a blob of largely undifferentiated protein soup that happens to carry human-compatible DNA, home construction of cellphones from kits, deployment of mesh networking repeaters in urban areas, and -- perhaps most harmfully -- the sharing of software and source code that could enhance a developer's productivity in the pursuit of ways to do things that are more advanced by orders of magnitude [4].

The religion of Intellectual Property has become so ingrained in the public consciousness that even the basic premises of singularity-oriented fiction, such as the roleplaying game Sufficiently Advanced, are to a significant degree centrally reliant on blind faith in the mythic creativity inspiring powers of copyright and patent law [5]. Those who dare to not only question such dogma, but demonstrate their principles by violating that dogma on a daily basis -- the founders of The Pirate Bay and WikiLeaks, for instance -- are figuratively burned at the stake for their heresy [6]. Security researchers are pilloried by corporate behemoths for the crime of sharing their research with other researchers and with the very users of those corporations' products who are most susceptible to the ill effects of the vulnerabilities they have discovered.

People are called "terrorist" for daring to point out that Microsoft often covers up the existence of critical vulnerabilities in its software rather than fixing them to protect its customers.

. . . and strict enforcement of End User License Agreements and mutually incompatible copyleft "free software" licenses force developers to reinvent the wheel daily, wasting uncounted millions of hours inventing what already exists when they could instead advance the state of the art. Every time someone distributes code under a license that is more restrictive than it needs to be, that does not attempt to approximate the conditions of a copyright-free world as much as it reasonably can, someone else will end up having to reinvent that wheel -- assuming the code in question is innovative and useful, and thus worth the effort of reusing it at all.

Programming paradigms come and go; open source licenses come and go; programming languages come and go [7]. A key benefit toward which all these things strive, and which encourages the ever-more rapid advancement of technology by building on existing technology, is code reuse. All these technical solutions to the problem of code reuse pale beside the vibrant truth of the single most effective step that could be taken toward easier, more effective code reuse:

Tear down the walls of the Church of Intellectual Property. Let my source code go.

Notes

0: Ray Kurzweil is today probably the most famous singularitarian in the world, and used a piano to play a song on national TV in the '60s that had been composed by a computer he built. Science fiction author Vernor Vinge coined the usage of the term "singularity" in the early '80s to refer to the future-historical critical acceleration of technological advancement. British mathematician I. J. Good used the phrase "intelligence explosion" to refer to the same concept -- also in the '60s -- and may be the first person on record to theorize about what would later come to be known as the technological singularity based on the very real factors in human action that compel us to work toward that event.

1: The reason the power of a computer exceeding the sum total of the computing power in the brains of the entire human race is so important is simply that this is the point at which a computer, without interference or help from a human, possesses the resources (if not at first the direction, though it seems likely someone would have come up with an answer for that problem by then) to advance the state of the technological art faster than the entire human race. The one type of advancement likely to receive the most attention from such a computer will, of course, be the improvement of computing power, thus magnifying the essentially incomprehensible (by today's standards) effect of computers on the accelerating rate of technological advancement. Ponder that for a moment, and your life may never be the same. Are you a singularitarian yet?

2: Ken MacLeod referred to the technological singularity, through the agency of a character in his novel The Cassini Division, as "the Rapture for nerds". Many singularitarians have adopted variations on that phrase as a tongue-in-cheek bit of self-deprecating humor, wearing it proudly like a badge and -- in the case of the most self-aware among them -- as a reminder to remain humble.

3: Maybe the First Coming of AI was the original 1958 specification of the (then theoretical) programming language LISP, which went on to become the go-to language for generations of artificial intelligence researchers. Maybe it was the design of the Analytical Engine. Maybe it was something else. Suggestions are welcome. If you send me a good one via my contact page, maybe the idea will end up in a novel some day, and I'll credit you by name -- or maybe the approach of the singularity will disrupt my life enough that the novel never gets written.

4: This, by the way, is one of the reasons I prefer open source Unix-like systems over closed source MS Windows systems as my development, writing, and even entertainment computing platforms. I like the fact that open source Unix-like systems, compared to more restrictive environments such as those offered by Microsoft, allow -- nay, encourage -- me to build tools that make the task of building more tools easier, faster, and more successful.

5: . . . though at least Sufficiently Advanced has been released to the world under the terms of a Creative Commons license. Sure, it's a noncommercial license, which immediately burdens it with a terrible, reuse-discouraging set of restrictions, but at least it's not as bad as "all rights reserved".

6: No, the original Napster does not count. The Pirate Bay and WikiLeaks took principled stands against copyright and other forms of censorship. Napster, on one hand, provided a network that facilitated copyright violation, and on the other hand rabidly defended its own copyright claims, proving itself hypocritical and perhaps a touch sociopathic rather than principled and perhaps a little quixotic.

7: That is, excepting perhaps LISP, which just keeps coming. I'll leave the crudely humorous analogy to the reader's imagination.

How A Server Side Language Achieves Popularity http://blogstrapping.com/?page=2011.029.07.44.45 http://blogstrapping.com/?page=2011.029.07.44.45 Sat, 01 Jan 2011 00:00:00 +0000

How A Server Side Language Achieves Popularity

The single most popular server side Web programming language on the Web is PHP. I measure this not only by the number of people writing applications in the language, else Java would put in a solid showing, as would a few others. Rather, I measure it also by deployments, and by how many people write code in the language merely to tweak things, to cobble together crappy one-offs, and so on. It is also worth noting that there are probably far more people learning PHP for their own purposes in Web development than Java, because the latter is so frequently learned for the primary purpose of servicing a daycoder career.

When one focuses on open source Web applications, the heavy weighting in favor of PHP gets even more distinct. Ignoring for the moment the people who focus on a single language while ignoring most else, get deeply involved in the community, and take note of every new development there, most people with an interest in Web development will have heard of more PHP-based CMSes than the next two most common languages used for such things put together. Drupal and WordPress easily arise in the minds of many. One of these is usually the first to come to mind, in fact.

There are two reasons generally cited as explanations for PHP's popularity:

  1. Accessibility: PHP is everywhere, on the servers maintained by pretty much every shared hosting provider that offers support for server side Web development above and beyond the humble SSI.

  2. Accessibility: PHP is incredibly easy to pick up and use for the simplest tasks, such that for many it turns simplistic Web development into unskilled labor. Thousands of "dynamic Websites" are floating around out there, built in PHP by people who are not even aware of the difference between include() and require() in the language.

In some respects, both of these factors are simply different applications of a single factor, as I tried to make abundantly clear above: accessibility.

I believe that accessibility is one of the most important factors in the likely popularity of Web development languages at this point. It is possible this may change, given how quickly development technologies at the bleeding edge evolve, but I find it difficult at this time to conceive of a more important factor.

For server side development, no other Turing complete server side development language is as available (first factor) as PHP and Perl. Perl is in fact still more available than PHP, via CGI scripting, but it falls short of PHP for accessibility because CGI is less approachable (the second factor) than markup-embedded templating. While markup-embedded templating options are available for Perl, they are not as widely available as for PHP, which offers such templating as its default mechanism for use on shared hosting providers.

For form's sake, let us deal with the two criteria I attempt to carefully include:

  • Eliminating "Turing complete" from our criteria, our field of languages is broadened to include such popularity heavyweights as SSI, SQL, (X)HTML, and CSS. I hope it would be reasonably easy for most developers to understand why all of these should be excluded from consideration here.

  • Eliminating "server side" from our criteria, our field of languages is broadened to include languages like ECMAScript and VBScript. VBScript, for the most part, is ignorable because of its more limited availability if nothing else -- especially given the waning dominance of the Internet Explorer browser even on MS Windows, a browser that also supports JScript (an ECMAScript implementation roughly identical to JavaScript).

ECMAScript's most widely known implementation is JavaScript, which is generally used as a client side scripting language with an interpreter embedded in almost every browser in general use. It is, in fact, the one Turing complete language more available than PHP for Web development. It is also more approachable in some respects, because little snippets of it -- such as the onload "event" -- are almost seamlessly integrated with the major markup languages of the Web. If any language is poised to take over the title of most popular server side Web development language from PHP, I believe it is ECMAScript.

ECMAScript is more widely deployed than you may think, in fact. It is not only the default, nigh-universal browser scripting language; it is also the language of Adobe Flash, Adobe Flex, and AIR (by way of Flash and Flex), implemented as ActionScript. Beyond scripting content delivered by the browser, the JavaScript implementation of ECMAScript is also increasingly used as the UI development language for browsers such as Chromium and Firefox. One can do some gnarly things to a browser's native interface using good ol' ECMAScript.

The language is even making its way onto the server, for server side Web development. One of the major steps in that direction was the development of Rhino, a stand-alone JavaScript engine written in Java; another was SpiderMonkey, written in C. Both of these serve as the basis of several server side Web development scripting engines. Others have arisen as well, including Google's V8 JavaScript engine. The increasingly popular event-driven Node.js environment is built on V8.

Thus far, however, widespread availability of server side ECMAScript offered by shared hosting providers eludes us. It is difficult to get buy-in for a new language on shared hosting platforms when PHP is so popular and already available. Hosting providers, for the most part, would prefer to provide as little as possible; PHP is available mostly because it is a dire necessity of general purpose hosting providers that want to have customers who do their own server side development. SSI and Perl/CGI just don't cut it any longer. That universal availability supports PHP's popularity; PHP's amazing levels of server side popularity support its universal availability. Server side ECMAScript has neither.

Remember, I said that if any language is poised to take over the title of most popular server side dev language right now, it is ECMAScript. These hurdles are tremendous, and make the likelihood of its widespread adoption very small. This does not mean I was wrong about ECMAScript; it just means that the hurdles faced by other languages are even more prohibitive. Ruby may be the next most likely, via the Rails framework -- but while that has had more success making inroads to widespread availability on servers maintained by hosting providers (where the shared hosting providers are the lowest denominator), it faces a certain amount of resistance to growth at the high end of popularity thanks to the dominating aspects of PHP. After all, with Ruby on Rails you need to learn a specific framework to use it.

Any general purpose Web hosting provider that offers Ruby on Rails and Perl/CGI also offers Ruby/CGI -- or, at least, enough of them that the exceptions are negligible for our purposes. This also provides for a simple means of getting arbitrary, largely PHP-style templating, by way of a clever little hack-around known as eRuby. The way eRuby works is simple enough, from the perspective of the technically proficient: place an eRuby program in your domain's cgi-bin directory, and make sure your server is configured to recognize certain file types as needing to be parsed by the eRuby program. This last step is normally accomplished by adding directives to the .htaccess file on Apache hosts.
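What such a template parser does with a page can be sketched with ERB, the eRuby implementation that ships in Ruby's standard library. This is not the eruby CGI program itself, and it skips the server configuration entirely; the template and the items list are invented for illustration.

```ruby
require 'erb'

# A made-up page: HTML with Ruby embedded between <% %> and <%= %> tags,
# much as PHP code is embedded between <?php ?> tags.
template = <<~PAGE
  <ul>
  <% items.each do |item| %>
    <li><%= ERB::Util.html_escape(item) %></li>
  <% end %>
  </ul>
PAGE

items = ['Drupal', 'WordPress', 'Q&A']

# trim_mode '<>' drops the newline after lines that consist only of a <% %> tag,
# so the loop lines leave no blank lines behind in the output.
html = ERB.new(template, trim_mode: '<>').result(binding)
puts html
```

The markup stays in charge and the code is sprinkled through it, which is the property that makes this style approachable for people coming from static HTML.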

The popularity of Rails is a phenomenon not easily dissected and reproduced. It seems almost to have arisen spontaneously, and this is the basis of everything that has followed for Web development popularity of the Ruby language, including eRuby implementations. Some of this popularity could be leveraged by other languages (such as ECMAScript) by jumping directly to a CGI implementation, bypassing the other factors that led Ruby to the point of having its eRuby tool. For that to work, of course, one needs to actually implement the entire language within CGI rather than just create a template parser in the cgi-bin directory that passes code off to a non-CGI interpreter. While this latter approach is how eRuby is becoming widely available, it only works because of the growing ubiquity of Rails; without that kind of fortuitous foot in the door, other languages have to bootstrap themselves via CGI.

This is, naturally, a lot of work. Those eRuby implementations I have looked at do not implement Ruby as a whole without relying on some outside interpreter or VM. If there is a complete, isolated eRuby implementation that comes with its own embedded Ruby interpreter or VM and relies on no outside software to work from the cgi-bin directory, I have not encountered it. That does not change the fact that such a tool would be a huge boon to early availability for lowest denominator Web developers, using shared hosting providers that offer cookie-cutter server side programming options.

There are quite a few JavaScript engines out there, and there are many server side development options based on these engines, but growth of the popularity of ECMAScript as a server side Web development language is still tentative at this point. Something as "simple" (for lack of a better term) as a pure-CGI implementation of ECMAScript, complete with markup embedded template syntax support, would be a gigantic leg up on popularity. The same goes for just about any other high level dynamic language that has aspirations of widespread server side Web development adoption.

Unfortunately, even the CGI implementation approach is suboptimal. It requires the installation of the interpreter needed to do your server side development, and many people have enough trouble getting to the point where they can write valid PHP (which is pretty sad, given PHP's anemic syntax). Even worse, anything that depends on an outside implementation of yet another language that is not widely deployed on shared hosting provider servers, such as Rhino, is doomed to obscurity -- relative to PHP, at least.

If ECMAScript could somehow gain at least nearly the same availability on the server side as PHP has, that -- coupled with client side ubiquity and the near-PHP approachability of the language itself (in some ways greater approachability in the case of its client side scripting) -- could be all that is needed to push server side ECMAScript into direct competition with PHP for its currently unassailable position as King of the Hill where general server side Web development popularity is concerned. Considering the general superiority of language design of ECMAScript over PHP, it would be awfully nice.

In the meantime, I will use Ruby. I like Ruby more than ECMAScript, I know it better than ECMAScript, and it is a lot more accessible as a server side scripting language than ECMAScript. It is also a darn sight more pleasant to use than PHP. Such is life.

Fast Die Roll Permutations Still Elusive http://blogstrapping.com/?page=2011.026.04.22.06 http://blogstrapping.com/?page=2011.026.04.22.06 Sat, 01 Jan 2011 00:00:00 +0000

Fast Die Roll Permutations Still Elusive

In How Can I Find the Number of Permutations per Sum?, I discussed a little programming problem I was trying to solve:

I started trying to automate the process of measuring the probabilities of achieving particular rolls. I wrote some code in Ruby to do so, and the code works -- as long as system resources hold up. The problem is that my code stores numbers in arrays and for any nontrivial example combination of dice the number of possible rolls quickly outstrips my computer's capacity to store all those numbers in arrays that rapidly expand to ridiculous sizes.

Since publishing that essay at blogstrapping, my little calculation problem has garnered some attention from others -- some of them people I know personally, some whom I know somewhat impersonally, and others that I do not know at all. Fellow TechRepublic contributor Justin James recently wrote a program in IronRuby to take a whack at solving my problem, and wrote an article about the exercise for TechRepublic. His article, My first IronRuby application, includes source code for his solution. Unfortunately, something awful happened to the formatting, and I am not feeling the urge to try untangling it right now. Luckily, he sent me some code prior to publication that I think is probably pretty close to what ended up in the article:

def calculate (iteration, low, high, currentsum, output)
    if (iteration == 1)
        low.upto(high) do |value|
            newsum = currentsum + value
            output[newsum] += 1
        end
    else
        low.upto(high) do |value|
            calculate(iteration - 1, low, high, currentsum + value, output)
        end
    end

    return output
end

diceInput = ARGV[0].to_i
lowInput = ARGV[1].to_i
highInput = ARGV[2].to_i

if (diceInput < 1)
    puts "You must use at least one dice."
    return
end

initResults = Hash.new

(lowInput * diceInput).upto(highInput * diceInput) do |value|
    initResults[value] = 0
end

results = calculate(diceInput, lowInput, highInput, 0, initResults)

results.each do |result|
    puts "#{result}"
end

puts "Press Enter to quit..."
gets

For the record, I think his "Press Enter" thing at the end might be an artifact of the way he wrote the code for IronRuby; I just delete that crap when I run it on my FreeBSD laptop from the shell prompt.

Justin's result is certainly faster than my first, naively brute-force, attempt. It just is not anywhere near enough faster. It starts getting unusably slow around the point where you are counting permutations for fifteen dice or so. This is because, while parts of the program are better optimized than mine, it suffers the same problem: it takes a brute force recursive approach to calculating permutations, where at least one operation has to be performed for half the total possible permutations.

By way of a friend, someone suggested using Pascal's Triangle as the basis of a function that would generate the correct number of permutations for each possible result of any given die roll. I beat my head against that problem for a little while, then gave up for a while. In the process, though, I ended up writing two or three different versions of a Pascal's Triangle class for Ruby. I expect to polish it up a little and stick it in BitBucket soonish.

The problem with the Pascal's Triangle approach was not speed. It was blazing fast. It was not that I could not generate the correct numbers for many cases (such as three six-sided dice). Before explaining the problem, I will explain the approach.

Pascal's Triangle is an infinite set of numbers canonically generated by starting with a one, then a row containing two ones below it; for every successive row, add together every pair of adjacent numbers from the preceding row, and add a one to each end. Thus, a twelve-row Pascal's Triangle looks like this:

                            1

                         1    1

                       1    2    1

                    1    3    3    1

                  1    4    6    4    1

               1    5   10   10    5    1

             1    6   15   20   15    6    1

          1    7   21   35   35   21    7    1

        1    8   28   56   70   56   28    8    1

     1    9   36   84  126  126   84   36    9    1

   1   10   45  120  210  252  210  120   45   10    1

1   11   55  165  330  462  462  330  165   55   11    1
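The construction rule is simple enough to sketch in a few lines of Ruby. This is only an illustration, not the Pascal's Triangle class mentioned above; the name pascal_rows is a hypothetical helper invented for this example.

```ruby
# Build the first `count` rows of Pascal's Triangle.
# pascal_rows is a hypothetical helper name, not part of any library.
def pascal_rows(count)
  return [] if count < 1

  rows = [[1]]
  (count - 1).times do
    previous = rows.last
    # Add together each adjacent pair from the previous row,
    # then put a one on each end.
    inner = previous.each_cons(2).map { |left, right| left + right }
    rows << [1, *inner, 1]
  end
  rows
end

pascal_rows(5).each { |row| puts row.join(' ') }
```

Calling pascal_rows(12) should reproduce the twelve rows shown above, minus the centering.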

The basic insight that makes this look like the source of a solution to the dice roll permutations problem is the fact that, for any die type with N faces, a rightward-sloping diagonal starting on the row corresponding with the number of dice rolled contains a series of numbers that matches the number of permutations for the first N possible rolls. For example, if you roll 3d6, count down to the third row (which reads 1 2 1); starting with the leftmost 1, follow the rightward-sloping diagonal and you will find numbers that match the number of permutations that will produce the first six results (3, 4, 5, 6, 7, 8). Those numbers of permutations are 1, 3, 6, 10, 15, and 21.
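Reading that diagonal does not require building the whole triangle, because each entry is a binomial coefficient. The sketch below assumes two hypothetical helpers, choose and diagonal_counts, written only for this illustration; it covers the first N results and nothing past them.

```ruby
# Binomial coefficient: the entry at 0-indexed position k of 0-indexed row n.
# choose is a hypothetical helper name used only for this sketch.
def choose(n, k)
  (1..k).inject(1) { |product, i| product * (n - k + i) / i }
end

# Permutation counts for the first `faces` results of a roll of `dice` dice,
# read off the rightward-sloping diagonal that starts at the leftmost 1 of
# the row matching the number of dice.
def diagonal_counts(dice, faces)
  (0...faces).map { |step| choose(dice - 1 + step, step) }
end

p diagonal_counts(3, 6)   # => [1, 3, 6, 10, 15, 21]
```

Asking for more entries than the die has faces, as in diagonal_counts(3, 8), simply keeps following the diagonal, which is where the numbers stop matching the real permutation counts.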

Past that point, of course, things change. For a roll of 3d6, the permutations look like the following:

 3       1
 4       3
 5       6
 6      10
 7      15
 8      21
 9      25
10      27
11      27
12      25
13      21
14      15
15      10
16       6
17       3
18       1

The curve is always symmetrical, so you only have to calculate the first half, then you can mirror that set of results. The problem is calculating the numbers in the first half (1, 3, 6, 10, 15, 21, 25, 27) that come after the first six: 25 and 27. These numbers do not match the next two numbers after 21 in the series taken from Pascal's Triangle: 28 and 36.
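The mirroring step, at least, is easy. A minimal sketch, assuming a hypothetical mirror_counts helper that is handed the first half of the curve and the total number of possible results:

```ruby
# Expand the first half of a symmetric set of permutation counts into the
# full list. `outcomes` is the number of distinct sums, which for dice
# numbered from one is dice * (faces - 1) + 1.
# mirror_counts is a hypothetical helper name used only for this sketch.
def mirror_counts(first_half, outcomes)
  # With an odd number of outcomes the middle value appears only once.
  tail = outcomes.odd? ? first_half[0...-1] : first_half
  first_half + tail.reverse
end

p mirror_counts([1, 3, 6, 10, 15, 21, 25, 27], 16)  # 3d6, sums 3 through 18
p mirror_counts([1, 2, 3, 4, 5, 6], 11)             # 2d6, sums 2 through 12
```

The hard part remains producing the first half in the first place.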

It turned out to be reasonably easy to figure out a function that will work for two or three dice. It works for two because, when you get through a number of results equal to the number of faces on a die, you're done with the first half of the "curve", which means you do not actually need to use anything other than the diagonal series from Pascal's Triangle. Unfortunately, it does not work for four. I screwed around looking for a generalized pattern that works for arbitrary numbers of dice, and failed to find one. At least I developed a better understanding of Pascal's Triangle, and maybe I will find a way to do this with Pascal's Triangle eventually. On the other hand, maybe there is no way to use Pascal's Triangle for this.

I received some feedback from various readers of blogstrapping via the contact form. Interesting suggestions included:

  • "Look up 'convolution sums'" (complete with source in a follow-up)
  • "looking for the sequence of numbers online in the sequence database. (http://oeis.org/Seis.html)"

The first was an interesting approach that differed from both mine and Justin's, but looked like yet another (reduced) brute force approach. Maybe I misread it; I will look at it in more depth later. I am still burned out on this after fighting with the Pascal's Triangle approach for a while.

The second included two specific links in the sequence database:

I have not had the inspiration to look at the two pages from the sequence database in any depth yet, but I'm listing them here in part to provide myself an easy sort of "bookmark" to remind me to check into it later.

Then . . . someone Justin knows -- Francisco Blanco-Silva -- produced an implementation in comments following his TechRepublic article. 100 100 in about a minute is the comment that holds the source. The claim is that it generates the numbers of permutations for all the possible outcomes of a 100d100 roll in about a minute -- later specifically described as seventy seconds.

Of course, my results showed about eleven or twelve minutes using MRI for Ruby 1.8.7, and about six minutes using YARV for Ruby 1.9.2 instead. Maybe IronRuby is faster at executing this specific algorithm than MRI and YARV, and maybe Francisco is using IronRuby; maybe Francisco is using a computer that totally outclasses my ThinkPad T60; maybe the stars just aren't aligning for me. All I know is that even seventy seconds is longer than I would like, and it will still end up taking longer periods of time very quickly as you increase the number of virtual dice.

Francisco wrote up an explanation of his approach and posted it to the Web: Unusual dice. He makes use of a pattern I had intuited while thinking about the problem early on, where listing the permutations for one die in a row, for a number of rows equal to the number of dice, each row shifted one column to the right from the placement of the row above it, provides the basis for simple addition to generate the needed permutation totals.
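A single step of that shifted-row addition can be sketched as follows. This is my own illustration of the pattern, not Francisco's code; add_die is a hypothetical helper name, and the sketch assumes dice numbered from one.

```ruby
# Given the permutation counts for some number of dice, return the counts
# after adding one more die with `faces` faces. One copy of `counts` is laid
# down per face, each copy shifted one column to the right of the last, and
# the columns are added together.
# add_die is a hypothetical helper name used only for this sketch.
def add_die(counts, faces)
  totals = Array.new(counts.length + faces - 1, 0)
  faces.times do |shift|
    counts.each_with_index do |count, column|
      totals[column + shift] += count
    end
  end
  totals
end

one_die  = Array.new(6, 1)
two_dice = add_die(one_die, 6)
p two_dice
p add_die(two_dice, 6)
```

The second line of output should match the 3d6 table above. Adding directly into one running array avoids building a separate row for every shift, though the amount of arithmetic is the same.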

It is faster than my really naive approach; it is faster than Justin's; it is still too slow (and resource heavy) for practical use in a relatively unconstrained context -- like providing a Webpage that people can use to generate their own permutation counts. Permutations for all results in 100d100 can at least be calculated within a period of time that is by some lights reasonable, with Francisco's approach. If my only need was to produce a set of permutations for 100d100, I would have been done the first time I spent more than ten minutes waiting for the program to run. What I want, though, is a generally usable application, so it kinda falls short. Even one minute is a bit much for something like that.

I did a minimal rewrite of Francisco's program -- renaming variables, replacing a for loop and an Array#cycle(1) message with Array#each iterators, and monkeypatching the Array class for aesthetic purposes -- to make it a little easier for me to follow. Performance does not really change with my rewrite. Francisco asked me to share my version of his code (I think), but code sample display is not one of the strengths of TechRepublic's discussion software. I decided to post it here, instead of there:

#!/usr/bin/env ruby

class Array
  def sum
    self.inject(0) {|total,num| total + num }
  end

  def fprint
    print self.join(' '), "\n"
  end
end

def build_struct(width, height)
  Array.new(width) { Array.new(height, 0) }
end

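def count_perms(faces, perms, dice)
  # Once the counter reaches one, perms holds the counts for all the dice.
  if dice == 1
    return perms
  else
    # Lay out one copy of perms per face, each copy shifted one column to
    # the right of the copy above it; unused cells stay at zero.
    perm_struct = build_struct(faces, faces + perms.length - 1)
    (0...faces).each do |key|
      perm_struct[key][key...perms.length+key] = perms
    end

    # Transpose so each column becomes a row, sum each former column to get
    # the counts with one more die, then recurse for the remaining dice.
    mirror_struct = perm_struct.transpose
    count_perms(
      faces,
      Array.new(faces + perms.length - 1) {|x| mirror_struct[x].sum },
      dice - 1
    )
  end
end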

def get_permutations(dice, faces)
  perms = Array.new faces, 1
  if dice == 1
    return perms
  else
    return count_perms(faces, perms, dice)
  end
end

if $0 == __FILE__
  dice  = ARGV[0].to_i
  faces = ARGV[1].to_i

  get_permutations(dice, faces).fprint
end

That's it for now.

To Be Continued . . .

Slate Invaders: Why You Should Use Two Spaces Between Sentences http://blogstrapping.com/?page=2011.014.10.05.57 http://blogstrapping.com/?page=2011.014.10.05.57 Sat, 01 Jan 2011 00:00:00 +0000

Slate Invaders: Why You Should Use Two Spaces Between Sentences

This may not look like much of a programming related piece, but it is, at least peripherally -- because of the matter of fixed width fonts, if nothing else. Programmers rely heavily on fixed width fonts to make reading, writing, and editing source code much easier and less error-prone. In addition, it is in part for the same reasons I am inclined toward thinking like a programmer that I think much of what I do about this subject.

So, without further ado . . .


Slate featured an opinion piece by Farhad Manjoo about spacing between sentences, Space Invaders: Why you should never, ever use two spaces after a period. If you are a programmer, a pedant, mildly prone to obsessive-compulsive behavior, thoughtful, or all of the above (like me), you might notice Farhad's second mistake right away: he does not consciously realize he is actually talking about spaces between sentences, and not spaces after periods per se.

He goes on a two-page rant about the reasons that everybody he knows other than journalistic publishing professionals is just dead-wrong about how many spaces should be used between sentences; only he and those in his immediate professional sphere know the Secret Mysteries of Proper Spacing Between Sentences (or After Periods, in his imprecise phrasing). Meanwhile, as I type this, my sentences are all separated by two spaces where they are not separated by paragraph breaks.

The points in his argument break down like this:

  1. Julian Assange is a blowhard with a "puffed-up personality" whose writing is "overwrought, self-important, and dorky," so we definitely do not want to be associated with him by using two spaces between sentences like that guy.

  2. Continuing with his personal assault on Assange, he points out that using two spaces between sentences is "antiquated":

    Here's a fellow who's been using computers since at least the mid-1980s, a guy whose globetrotting tech-wizardry has come to symbolize all that's revolutionary about the digital age. Yet when he sits down to type, Julian Assange reverts to an antiquated habit that would not have been out of place in the secretarial pools of the 1950s: He uses two spaces after every period.

  3. Monospace typewriters are to blame for two spaces between sentences becoming a fairly universal standard due to the technical limitations of the devices, and the oldness of typewriters apparently offends him. As a result, a strong mid-20th century tradition of inserting two spaces between sentences arose. Traditions like that (unless they are traditions that support his own preferences) are bad, and should be stamped out, because they are not modern enough:

    The only reason today's teachers learned to use two spaces is because their teachers were in the grip of old-school technology. We would never accept teachers pushing other outmoded ideas on kids because that's what was popular back when they were in school.

  4. While two spaces between sentences makes sense for monospace fonts (a rare admission from this man), "we've all switched to modern fonts" -- where "modern" apparently means "proportional" and not "monospace".

  5. Readability is better with only one space than two, Farhad asserts.

  6. . . . but really, there is no proof that either one space or two is better for readability in any case, because he cannot find any studies linking the number of spaces between sentences with effects on readability. It is all arbitrary. Since it is all arbitrary, we should just use his arbitrary preference.

  7. Using two spaces between sentences is the action of aesthetically challenged Philistines, of mere commoners without Farhad's rarefied and clearly superior tastes:

    Two-spacers are everywhere, their ugly error crossing every social boundary of class, education, and taste.

  8. Journalistic periodical typographers use single-spacing between sentences. Though he probably does not know many typographers personally, the typographers he does know are all the typographers he knows, and that must be equivalent to all typographers holding the same opinion. Of course, he may not actually know their opinions. Maybe Farhad is just assuming an opinion on the matter based on the fact that Slate is typeset with single spaces between sentences.

    This is a matter of a long and august tradition, in fact, dating back to the early 20th century at least -- aside from the fact that he also complains about the opposite being the tradition in the mid-20th century. Because of the tenure of this tradition, this respected way of doing things decreed by some small subset of people who care, it should be followed without question.

  9. Typing two spaces between sentences is too much work for his delicate hands.

After laying out these -- well, let's call them "compelling" for now -- arguments, he goes on to indoctrinate the mythical children who might waste their time reading something like this on Slate in the proper response to their teachers teaching them something:

So, kids, if your teachers force you to use two spaces, send them a link to this article. Use this as your subject line: "If you type two spaces after a period, you're doing it wrong."

Obviously, offending your teachers based on the say-so of some Slate writer you have never met is a fantastic approach to life. Before doing so, however, I exhort such children to please research the matter in a little more depth. Let us examine some counterarguments.

Point By Point Responses

1. Julian Assange is Wrong! Don't Be Him!

It does not matter one bit whether Julian Assange is a blowhard (he probably is, but the pot at Slate has little room to call that kettle black), a terrorist (because he embarrassed some politicians), a pedophile (because he was attracted to a nineteen year old, apparently), or anything else you might dislike, as long as none of that has anything to do with typography.

Ignore Farhad's implicit argument that using two spaces to separate sentences is bad because Julian Assange uses two spaces. It is far more important to recognize such logical fallacies than it is to use a single space, especially considering that a lot of modern technology will throw away the second space anyway (such as common word processing programs, or (X)HTML -- thus the proportionally modified single spacing between sentences on this page at the time of this writing).

I am pretty sure his main reason for mentioning Assange was better Google indexing, anyway.

2. Being Old is Bad! Don't Use Old Stuff!

Using obsolete technologies and techniques when there are better options available in the modern world is often counterproductive. Conflating mere age with obsolescence is dangerous, however. Arguing that Common Lisp or C is a "bad" programming language because the basics of it were developed decades ago, and that everybody should be using Java (which is also actually getting a little long in the tooth now) or even Erlang instead, is just asking for trouble. C is still where a lot of systems development happens, and for good reason. The LISP family of languages in general, and Common Lisp in particular, is still the source of language design features that are incorporated in new languages to make them fresh, exciting, and "innovative", even if Common Lisp does have a heck of a lot of parentheses and cryptically named functions in it. Make sure you know the difference between "old" and "obsolete" before abandoning an "old" technology or technique, even if both words start with the letter O.

3. Teachers are Wrong! Tradition is Bad!

I have no idea where Farhad gets the idea that "we" (as a society at large) "would never accept teachers pushing other outmoded ideas". Outmoded, patently ridiculous, political, and otherwise corrupt biases make up a substantial percentage of what teachers push on kids.

4. Monospace is Dead! Proportional is Everywhere!

Rumors of the demise of monospace typefaces are grossly exaggerated. Fixed width fonts are actually still in heavy use in a number of fields of endeavor, including the legal profession, manuscripts that are printed for editing, screenplays, and -- it will surely come as no surprise to readers of blogstrapping -- for programming, configuration files, and scads of other computer-related contexts where the column-width location of a given character might be important. Many mail user agents, text editor applications and input fields, and source viewers default to fixed width presentation because of the benefits provided when trying to read technical subject matter. In fact, while forcing lawyers, editors, and actors to deal solely in proportional typefaces might annoy them (perhaps worse than making readers of Slate read monospace), it would probably not have an apocalyptic effect on their work. Doing the same to programmers, though, would have disastrous consequences. I am convinced that initial bug rates in new software would increase significantly, perhaps as much as 50%.

In essence, just about any field of endeavor that requires any nontrivial technical attention to detail, rigorous expertise, or exacting presentation in textual works -- including a whole lot of mathematical and scientific notation -- uses fixed width fonts. Writing for Slate is obviously not one of those fields.

Aside from that, however, there are even cases where the simple, pure aesthetic appeal of something can be enhanced by the use of a monospace typeface.

5. Using Two Spaces Makes Holes! That's Hard to Read!

Farhad fails to really back up his assertion that readability suffers when using two spaces. He hand-waves about it by quoting someone talking about "pausing" between sentences while reading. He talks about how typographers think it diminishes readability (with their incisive and insightful reasons: because they say so), and complain about it. He quotes the utterly uncompelling argument of one, thus:

When I see two spaces I shake my head and I go, Aye yay yay.

I am pretty sure he means "ai ai ai", like the song. "Aye" means "yes", and "yay" is an exclamation of delight, and I am almost completely certain that is not what he meant to convey.

In any case, his very weak attempt at an argument from authority ignores the fact that this is a fallacious argument form, and that none of this fallacy's presentation actually shows any meaningful evidence or argument in support of his position. It just shows that a couple people agree with him.

6. All Arguments are Arbitrary! Mine are Best!

I hope I do not have to explain to anyone how ridiculous an argument this is.

The truth of the matter, however, is that the arguments are not all arbitrary. There are very real, practical arguments to be made, including the cost savings reasoning in favor of using only one space (please, let this not become yet another "green" initiative).

More to the point, though, the lack of studies on the subject is not due to a lack of practical effect the number of spaces between sentences has on readability. It is due to factors like the difficulty of isolating variables that can be used to measure the readability effects of different inter-sentence spacing schemes, and to the fact that for the most part the people most likely to have both the resources and the interest to pursue such studies are also even more strongly motivated to save their money. They want to save not only the money that would be spent on the studies, but also the money that would be spent on publishing costs if it turned out more than one character-width space would be conducive to easier reading. More on that later.

Regardless of studies, people who read a lot, and really enjoy reading (like me), agree that having more than a single (proportional or otherwise) character width of space between sentences provides a more effective cue to the reader where sentences end and following sentences begin. This provides improved reading comprehension by separation of concepts embodied in adjacent sentences, and speeds up the reading and information absorption process by reducing error and confusion levels when scanning quickly through text.

7. Typographers are Right! Tradition is Good!

Farhad accidentally admits that those whose motivations are more strongly oriented toward readability than saving page space, as contrasted with his own profession, prefer two spaces:

The public relations profession is similarly ignorant; I've received press releases and correspondence from the biggest companies in the world that are riddled with extra spaces.

One can probably safely assume this is because Farhad is unable to conceive of someone having different priorities than him, especially since he is apparently not aware of his own profession's priorities. After all, press releases are not subject to the same kinds of space limitations as a book, magazine, or newspaper on a paper budget; their limitations are based more on actual pithiness, memorability, readability, and general appeal. Slate, meanwhile, is all about making self-satisfied statements demonstrating the elite superiority of its readers over the readers of other periodicals, and of its writers over other periodicals' writers and even over Slate's readers.

Oh, yeah -- and Slate is about saving money on printing costs.

8. Don't Question Authority! Conform!

The basis of this argument is actually kind of interesting to examine.

According to Farhad, typographers made some kind of transition from nobody following any of the same standards to all using the same standard: one space. He glosses over two facts. One is the fact that typographers in some fields differ on the subject from typographers in other fields (namely, those with which he is familiar). While his favored fields are the more widely obvious, and his perspective on the matter might in that regard be somewhat understandable, there is still that other issue: the fact that his version of typographic history, making a clean, mass transition from "no standard" to a single-space "standard", is simply incorrect.

The truth is that using two spaces grew out of a typographic convention of using a 1.5-width space that was favored by typesetters for proportional typefaces. At the time, it was generally recognized that additional space to mark the separation between sentences was desirable as a guide for the eye to improve readability. The 1.5 convention arose in part because full space leads were used a lot, and half-space leads were not used so much, so two full spaces would use up more leads that could be put to better use elsewhere (such as between words within a sentence).

The advent of typewriters was, as Farhad asserts, instrumental in making the transition to using two full spaces. Typewriters were essentially rudimentary, fixed width typesetting machines, and there was no half-space key on the typical typewriter. Even if there had been a half-space key, the greater convenience of hitting the spacebar twice rather than having to remember to hit the spacebar once and the half-space key once would have proved discouraging for most typists, but the single space approach would have been worse for its effect on readability.

9. Laziness Beats Rigor! Don't Make Me Work!

Frankly, this guy could probably use a little more diligence and effort in his life.

Why One Space?

The real reason for using only one space is much simpler and far less idealistic than Farhad's favored self-justifications. Simply put, journalistic typesetters adopted the convention of using a single space to separate sentences at the insistence of accountants. Using more spaces between sentences means using more paper for the same number of non-space characters printed on the page. More paper means more money spent. As the gatekeepers of the most widely distributed publications, publishers who cared that much about saving paper had ample opportunity to propagandize the world toward the ends of making what was cheaper for them appear "correct" to everyone else, and Farhad Manjoo carries on that miserly tradition proudly, even if he does so with a wide range of arguments that are (in the best cases) unconvincing to anyone who knows something about the subject in general. Of course, as we increasingly move toward nearly-solely electronic distribution of journalistic content, even the cost savings argument is losing traction.

Why Two?

In addition to the other arguments already explained above, there are other reasons to prefer two spaces over one. For instance:

  • More space following a period character helps to differentiate between the period at the end of an initialism or abbreviation and the period at the end of a sentence.

  • More space separating sentences gives more certain cues to software such as screen readers for the vision impaired.

  • Even if you prefer looking at sentences separated by a single space, the kinds of people who never use fixed width fonts also typically do not read things other than in MS Word or on the Web, or in their own antiquated hardcopy periodicals, so it really does not matter. Farhad Manjoo is some kind of bizarre mutant, I guess.

  • The opinion article in Slate is wrong. See the above.

  • I said so.

What else do you need?

Notes

  • For all his talk about the elite superiority of Slate and the disdain-worthy status of outmoded technology (which, unlike me, he probably doesn't even try to understand), it's interesting that he is writing articles that Slate breaks up into multiple virtual "pages" on the Web, making it more difficult and annoying for readers to slog through. It is an especially bad problem when his opinion article is such a badly organized ramble with no journalistic value to it whatsoever. Yes, I am aware there is a single-page version but, if you think that is a convincing defense, I hope you can explain to me how clicking a link to get to the single-page version is any better than clicking the link to get to the second page of a two-page piece of drek like Space Invaders: Why you should never, ever use two spaces after a period.

  • For some reason, Microsoft has actually decided to make it difficult to use two spaces between sentences in MS Word. By default, it treats such pairing of spaces as a typographical error, and auto-incorrects it (at least in the versions I've used).

  • The technical term for a character width in the general sense is an "em". The name comes from the width of a capital M, which was traditionally used as the standard of character width for a font. It is worth noting that the smarter modern digital composition tools actually automatically adjust the spacing between sentences to something between one em and two em, resulting in a typical between-sentence spacing of 1.5em instead of either one or two character widths, returning to the original pre-typewriter standard when using proportional fonts. I applaud this advance, and wonder why the Slate writer is not aware of it.

  • Speaking of programming (We were -- right?), two spaces between sentences helps parsing algorithms operate more quickly and accurately, in case that was not obvious from allusions to this fact in the main text of the essay. For that reason, if no other, programmers should prefer two spaces.

  • This essay was written with two spaces between sentences in a vi-like editor, using a fixed width font, then rendered on the Web, primarily using a proportional font in a context that eats duplicate spaces between sentences. I guess you win, Farhad Manjoo.
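For the programmers still reading, the parsing point in the notes above can be illustrated with a small Ruby sketch. It assumes the text consistently uses two or more spaces between sentences, and split_sentences is a hypothetical helper name invented for this example.

```ruby
# Split text into sentences, treating sentence-ending punctuation followed by
# two or more spaces as the boundary. A period followed by a single space,
# as in an abbreviation, is left alone.
# split_sentences is a hypothetical helper name used only for this sketch.
def split_sentences(text)
  text.split(/(?<=[.!?]) {2,}/)
end

text = "Dr. Smith arrived at 5 p.m. sharp.  He left early.  Why?  Nobody knows."
split_sentences(text).each { |sentence| puts sentence }
```

With single spacing, the same job needs a list of known abbreviations or some other guesswork to decide whether the period in "Dr." ends a sentence.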

How Can I Find the Number of Permutations per Sum? http://blogstrapping.com/?page=2011.014.06.45.28 http://blogstrapping.com/?page=2011.014.06.45.28 Sat, 01 Jan 2011 00:00:00 +0000

How Can I Find the Number of Permutations per Sum?

It is days like this that I realize just how much math education I have missed.

I am a roleplaying gamer who actually delves into roleplaying game design on a semi-regular basis. The reason it is semi-regular is that, in addition to occasionally designing a game system from scratch, I am also inclined to fiddle with, and tweak, the systems of the games others have written for my own purposes. When playing (or even merely reading) something like D&D, Pathfinder RPG, Mutants and Masterminds, Sengoku, GURPS, Nephilim, Shadowrun, or any of the dozens of other game systems I've used over the years, I tend to find things I want to improve or just change.

I make changes ranging from something as minor as adding a new Ki Focus technique to the Sengoku system to something as major as reinventing the entire dice system used in Pathfinder RPG, making it an open-ended system involving multiple dice numbered from zero to five instead of a closed system involving a single die numbered from one to twenty.

It is while working on some details of this latter, more ambitious change to an existing system that I started trying to automate the process of measuring the probabilities of achieving particular rolls. I wrote some code in Ruby to do so, and the code works -- as long as system resources hold up. The problem is that my code stores numbers in arrays and for any nontrivial example combination of dice the number of possible rolls quickly outstrips my computer's capacity to store all those numbers in arrays that rapidly expand to ridiculous sizes.

I started trying to optimize the code a little bit to minimize memory use, by calculating possible permutations of possible combinations of die rolls in a given roll of dice one at a time, recording each unique permutation of each unique combination before "forgetting" it. This would obviate any need to store the complete set of permutations of combinations in an array. I got as far as only calculating the permutations for each combination, one combination at a time, while still having all combinations stored in their own array. My plan was to optimize it a piece at a time, but ultimately I realized that optimizing my brute force approach is probably just a case of far too much work for too little return on investment.

What I really need is some elegant mathematical expression I can use to translate arbitrary numbers of dice, of arbitrary numerical ranges, into a number of possible rolls. After searching around on the Internet for a while and thinking about it, I realized that there are three problems:

  1. Lots of people seem to be interested in a total number of combinations or total number of permutations possible, for various constraints on what constitutes an acceptable combination or permutation, but nobody seems to be interested in finding out how many permutations of combinations might add up to a given possible number. Thus, there are no examples that I have found so far that would tell me what number of possibilities in ten rolls of a ten-sided die would add up to the number 17.

  2. Almost nobody seems to consider the problem of using algorithms (such as the built-in Ruby methods Array#combination and Array#permutation) that fill up memory far too quickly by generating complete sets.

  3. My math background is insufficient to figure this all out without basically trying to independently rediscover solutions that have probably made people famous in ages past.

I had been using Ruby 1.8.7's already mentioned combination and permutation methods. I have also, just for the heck of it, toyed with Ruby 1.9.2's Array#repeated_permutation, with similarly depressing results. These things are designed to generate arrays, and not to generate statistics for what would be unreasonably resource-consuming operations with a brute force approach.

An example of the sort of input and output I would like my program to generate is as follows -- what the program I have written so far provides for a relatively small number of dice with a relatively small range of values:

> ruby permute.rb 5 1 5
     5       1      0.03%
     6       5      0.16%
     7      15      0.48%
     8      35      1.12%
     9      70      2.24%
    10     121      3.87%
    11     185      5.92%
    12     255      8.16%
    13     320     10.24%
    14     365     11.68%
    15     381     12.19%
    16     365     11.68%
    17     320     10.24%
    18     255      8.16%
    19     185      5.92%
    20     121      3.87%
    21      70      2.24%
    22      35      1.12%
    23      15      0.48%
    24       5      0.16%
    25       1      0.03%
total possible rolls = 3125

There is little point right now in showing the source for this. It is nothing more than generation of arrays and counting various things that can be gleaned from those arrays. It is a naive, brute force approach to the problem that any competent programmer should be able to accomplish in his or her sleep (and that I accomplished after agonizing over it for a little while before realizing Ruby had actually done all the hard work for me).

In case it is not obvious, however, this is what my naive, brute force implementation does:

It takes three arguments. The first is a hypothetical number of dice. The second and third are the low and high values shown on the faces of the die type used. Thus, if there were such a thing as a regular polyhedral five-sided die, the above would show, for each possible sum of values for five of those dice (first column), the number of permutations of all combinations of rolls that would produce that sum (second column) and the percentage chance of rolling it (third column). At the end of the list, then, it shows the total possible rolls -- the sum of all numbers from the second column. Because I was using a brute force approach to this, it took approximately two seconds to generate that list.
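A minimal sketch of that brute force behavior might look like the following. It is a reconstruction from the description above, not the actual permute.rb; the method names brute_force_counts and print_table are made up for illustration, and it assumes Ruby 1.9.2 or later for Array#repeated_permutation.

```ruby
# Hypothetical reconstruction of the brute force approach; not permute.rb.

# Count how many ordered rolls produce each sum by visiting every roll.
def brute_force_counts(dice, low, high)
  faces = (low..high).to_a
  counts = Hash.new(0)
  # With a block, repeated_permutation yields each ordered roll in turn,
  # so only the per-sum tallies are kept in memory.
  faces.repeated_permutation(dice) do |roll|
    counts[roll.inject(:+)] += 1
  end
  counts
end

# Print sum, count, and percentage columns, then the total number of rolls.
def print_table(dice, low, high)
  counts = brute_force_counts(dice, low, high)
  total = (high - low + 1) ** dice
  counts.sort.each do |sum, count|
    printf("%6d%8d%10.2f%%\n", sum, count, 100.0 * count / total)
  end
  puts "total possible rolls = #{total}"
end

print_table(5, 1, 5)
```

Even with the block form keeping memory use flat, the loop still visits every one of the (high - low + 1) ** dice rolls, so the running time remains the limiting factor.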

I did, as I said, add an optimization that reduced the amount of RAM usage, but it is a relatively trivial change to the basic brute force approach that did not buy me much. It is still effectively an unusable program for any remotely nontrivial die roll example.

Generating the total possible rolls is easy. I wrote a dead-simple method for it just to make the puts expression look prettier:

def total_rolls(dice,start,finish)
  (finish - start + 1) ** dice
end

. . . but generating the number of rolls for each possible sum of die results is a touch more challenging for me.

I guess I will just have to struggle with this for a while, and see what pops out of my head first -- a way to brute-force the problem without eating up all available RAM and swap space (dragging my laptop to a halt in the process), or figuring out an arithmetic expression I can translate to code to produce the needed numbers without actually calculating and storing craptons of permutations. I just need to be able to do something like this without hanging the process until it fills all available memory.

> ruby permute.rb 100 1 100

Once I have the per-sum number of possibilities, generating the percentages is definitely the easy part.
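One standard way to get the per-sum counts without generating any rolls at all, offered here as a sketch and not as anything permute.rb does, is to build the counts one die at a time: if the number of ways to reach each sum with n dice is known, adding one more die spreads each of those counts across the faces of the new die. The method name sum_counts is hypothetical.

```ruby
# Sketch: count ordered rolls per sum by adding one die at a time.
# Memory holds one tally per distinct sum, never the rolls themselves.
def sum_counts(dice, low, high)
  counts = { 0 => 1 }  # with zero dice there is one way to total zero
  dice.times do
    nxt = Hash.new(0)
    counts.each do |sum, ways|
      # Every way of reaching `sum` extends to `sum + face` for each face.
      (low..high).each { |face| nxt[sum + face] += ways }
    end
    counts = nxt
  end
  counts
end

counts = sum_counts(5, 1, 5)
total = counts.values.inject(:+)
counts.sort.each do |sum, ways|
  printf("%6d%8d%10.2f%%\n", sum, ways, 100.0 * ways / total)
end
puts "total possible rolls = #{total}"
```

The work here grows with the number of dice times the number of faces times the number of distinct sums, not with the number of possible rolls, and Ruby's integers grow as needed for the very large counts that many dice produce. The same method answers questions like the ten-sided example mentioned earlier via sum_counts(10, 1, 10)[17].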

A Droll Little Program http://blogstrapping.com/?page=2011.008.12.58.11 http://blogstrapping.com/?page=2011.008.12.58.11 Sat, 01 Jan 2011 00:00:00 +0000

A Droll Little Program

I have written a number of kludgey little dice roller programs over the years, mostly for the purpose of satisfying my desire to combine my enthusiasms for roleplaying game geekery and computer geekery. If I had to guess, I would say that I have probably written eight or ten of them, in three different languages.

At least half of them have been written in Ruby. Mostly, these programs have had very simplistic functionality, such as one that would take any die code that consisted of a(n optional) number, followed by a case-insensitive letter D, followed by another (mandatory) number. With that, then, it would check the last number against a set of predefined die types and, if the correct type existed, it would produce a pseudorandom (in the programming sense of the term) result. Its output was formatted using printf, and it showed only the die code and the end result total.
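A minimal sketch of a roller along those lines, assuming a made-up list of allowed die types and hypothetical method names (the original program's source is not shown here), could be:

```ruby
# Hypothetical sketch; not the original program.
DIE_TYPES = [4, 6, 8, 10, 12, 20, 100]  # assumed set of predefined die types

# Parse a code such as "d20" or "3D6"; returns [count, sides] or nil.
def parse_die_code(code)
  m = /\A(\d*)d(\d+)\z/i.match(code)
  return nil unless m
  count = m[1].empty? ? 1 : m[1].to_i  # the leading number is optional
  sides = m[2].to_i
  DIE_TYPES.include?(sides) ? [count, sides] : nil
end

# Roll the dice named by the code and return the total, or nil if invalid.
def simple_roll(code)
  count, sides = parse_die_code(code)
  return nil unless count
  total = 0
  count.times { total += rand(sides) + 1 }  # rand(n) yields 0...n
  total
end

code = '3d6'
total = simple_roll(code)
printf("%s: %d\n", code, total) if total
```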

I hacked that together, in its basic form, probably in about five minutes. I fiddled with it another half hour in total without really adding any particular interesting functionality. I used it for months -- maybe as much as a year. It is, however, basically a crap program, unworthy of redistribution for others to use.

I started using IRC to play online RPGs with friends in other states more than a year ago. In that group, we all trust each other, so we just rolled dice locally and posted our results in the channel. I used my simplistic dice rolling program most of the time so I would not have to take my lazy hands off the keyboard to roll physical plastic polyhedrons. It worked fine.

A couple months or so ago, particularly for a different group that was starting to play RPGs in IRC, I went searching for an existing dice roller IRC bot that would serve my needs. This group's needs for a dice rolling IRC bot were not as sophisticated as for the other group, because we were not using special die types outside the usual run of eight standard Dungeons and Dragons dice, so it was relatively easy to find a dice roller IRC bot that would work for us.

I specifically narrowed my search down to bots written in Ruby because I wanted something that would be easy and fun to hack, and was delighted to find something called Bones that actually seems to be among the better-known IRC dicebots on the Web. Using the Google search string irc bot dice roller, it shows up for me as the third result, and the text from the Google hit says it's open source! The number one and two results are for the same set of bots available from the same place (that is, each from the same place as the other, not the same place as Bones), with a somewhat obtuse syntax and somewhat game-specific functionality. Worse, it's specifically designed for the mIRC client, which is simply unacceptable for my purposes. Using the search string irc dicebot, Bones comes in first (and second).

I followed the link and read the page announcing the creation and release of Bones. On that page, in the list of features, I saw this bullet point:

  • Free: Like most IRC bots, Bones is open source and released free of charge.

Well, Bones it is, then! I looked around briefly for the open source license used for Bones, but did not find it right away. This is a lamentably common problem with open source software, I find; once the developers have told you it is open source, they then assume you do not want to be bothered with details of licensing unless and until you hack the source code to add features or eliminate bugs and want to contribute your changes back to them, so it never occurs to anyone to make the licenses very easily found. Of course, that's pretty annoying for people who care about licensing -- like me. I figured I'd get back to finding the license later, and would just try it out for now.

It turned out to be suitable for my needs at that time, so we used it. I ran the bot script on my laptop while playing RPGs in IRC with the group using pretty standard rules for its games, with no special die type requirements or exploding dice. We used it for a little while.

I started thinking about the idea of extending its functionality and possibly using it for the other group, which does need some of those fancy features for a passel of house rules applied to the Pathfinder RPG system. Before hacking the code, though, I figured I should find out what license is used for this program. I hunted around -- in vain. There was no licensing information anywhere in the code, bundled with it, or anywhere that I found on the site(s) run by the guy who wrote the program.

I contacted the author and asked about it. I found out, through a short email exchange, that he has no idea what "open source" means to people who actually write and/or use open source software. He apparently heard that term used as a buzzword somewhere and decided to use it without ever checking to see if there are any conventions for its use. To him, apparently "this code runs in an interpreter" means it's "open source", whereas the Open Source Initiative (which has the opensource.org domain and whose founders actually coined the term Open Source) defines the term somewhat differently. Check out the Open Source Definition page at the OSI site for details.

Really, what is needed to reasonably call something open source is a legal guarantee that you won't sue someone for using, modifying, and redistributing your source code -- even commercially if you like. That's the short version of the requirements for calling something "open source". That legal guarantee comes in the form of a license. What does the Bones author do? He calls his program "open source" because you can physically see the source if you want to, change it on your local computer, run the changed version, and offer patches to him. Woe betide anyone who shares modified (or even unmodified) copies of the program with anyone else, uses it as the back end of a Website, or otherwise does anything with it that might involve third parties in some way not foreseen by the copyright holder. If you do that, you have no guarantee you won't get legal threats in your email. I asked about licensing, and he said he didn't plan to license it.

Well, screw that.

I decided it was time to write my own IRC dicebot. A friend who sometimes uses the online monicker "n8" was eager to get something put together that would serve our needs, with virtual dice numbered 0-5, exploding die rolls, and so on. He wanted me to wrap an IRC bot around my dice roller, but I told him it sucked and needed to be rewritten from scratch first. I finally decided to tackle the rewrite, and in about ten minutes (give or take) had a totally new dice roller program written in Ruby. I used it for myself then, just as I had used the previous dice roller, but did not have an IRC bot yet. I named it "droll", for "die roll".

I started casually, lazily looking into options for Ruby libraries that provide IRC bot frameworks. Then I forgot about it for a little bit. Meanwhile, n8 decided to start hacking Bones, with the idea that he'd get some of our desired functionality built into it just to tide us over until I got my fourth point of contact in gear and actually wrote an IRC dicebot of my own. I think he must have been taking a lazy approach too, because I'm not sure he spent more than five or ten minutes on figuring out how Bones worked and trying to make some changes to it. I finally got inspired by his low level industriousness to do some work on an IRC bot, though, and got to work. It turns out n8 had actually added support for things like exploding 0-5 die rolls to Bones, but tossed it because he thought it was better to work on my program than Bones.

I used the Isaac library for Ruby as the basis of my IRC bot. It provides a DSL framework for writing IRC bots, offering a very clear, simple syntax that nicely abstracts away the heavy lifting. Thanks to Isaac, my dicebot took about five minutes or so to write in its initial, functional form. Of course my ten minute program, droll, was not written with being used as the basis of an IRC bot in mind; it was just written to get a working algorithm for rolling essentially arbitrary die types ranging from either 0 or 1 to some user-specified upper bound number. I needed to rewrite droll so that my IRC bot, named "drollbot", could call it as a library. For the moment, drollbot was actually shelling out to droll -- calling it as a separate program and capturing the results sent to STDOUT, then sending those results (with a little munging) to an IRC channel.

At that point, I kinda stopped paying attention to how much time I put into changes to the programs involved. The end result, though, is that I have a pretty decent dice roller program suite (droll itself works well as a command line program, and drollbot with droll as its back-end die rolling logic library works well as an IRC bot) if I do say so myself. It handles basic exploding die rolls, and it has an unreasonably high upper limit for how many times a given roll can explode, so the program itself will not explode thanks to resource exhaustion. Malicious input crashing your dicebot could be pretty annoying, depending on how you use it.

As you can easily discover by checking out the README file at the droll BitBucket repository, which also contains the drollbot IRC script, droll has been released under a copyfree and open source (by definition, if not by certification) license -- the Open Works License, which is the same license currently used for content here at blogstrapping, and for the Lump content management system that powers blogstrapping. The Isaac library used in the making of drollbot is released under another copyfree license, the MIT/X11 license.

At the moment, drollbot lacks exactly two features that Bones offers:

  1. It does not add together the results of multiple die codes in a single command.

  2. It does not produce the results of multiple die codes separately from a single command.

At least adding together multiple die codes is something I intend to add to the program. I am not entirely certain the other feature is actually necessary or even particularly desirable, but n8 is keen on implementing that functionality, so I will probably end up adding it to the IRC bot. How much the underlying library needs to be modified to accommodate this stuff remains to be seen.

Other features are also under consideration and/or development, as you can see in droll's issue tracker.

I have two articles pending at TechRepublic that talk about things related to droll and drollbot. One of them discusses the meaning of the term "open source", and uses droll and Bones as example and counterexample; it has been submitted to my editor for the Open Source column at TR. The other talks about writing IRC bots in Ruby with Isaac, mentions drollbot in that context, and offers code examples for using Isaac to produce (and test) a simple "hello world" type of IRC bot, in TR's Programming column. Neither has been published yet, but I anticipate at least one of them being published for public consumption in a week or so.

For the sake of including some code in this account, and perhaps satisfying the curiosity of those who like to see the code, this is the most-relevant chunk of code from the drollbot.rb file for demonstrating how an IRC bot works with the Isaac library:

on :channel, /^([0-9]*[dx][0-9]+[+-]?[0-9]*)\s*(.*)/ do
  result = roll_dice(match[0])
  result.each do |r|
    output = "#{nick} rolls " + r
    if match[1].length > 0
      output += ' (' + match[1] + ')'
    end
    msg channel, output
  end
end
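To see what the pattern in that handler captures, it can be tried in plain Ruby without Isaac. This is only an illustration; the constant name DIE_CODE is made up, and it assumes (as the handler's usage suggests) that Isaac's match[0] and match[1] correspond to the first and second capture groups.

```ruby
# The handler's pattern, applied directly to a few sample channel messages.
DIE_CODE = /^([0-9]*[dx][0-9]+[+-]?[0-9]*)\s*(.*)/

['2x05', 'd20+3', '2d10+7 With Comments!', 'hello'].each do |line|
  m = DIE_CODE.match(line)
  if m
    # m[1] is the die code; m[2] is any trailing comment (possibly empty).
    puts "code=#{m[1].inspect} comment=#{m[2].inspect}"
  else
    puts "no match: #{line.inspect}"
  end
end
```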

See the rest at the BitBucket page for the drollbot.rb file if you like. It's pretty simple. The code in droll.rb, used as a library in drollbot, is a little bit less simple -- but still quite simple in the grand scheme of things, I think. Using it looks something like this:

12:33 <@apotheon> 2x05
12:33 < drollbot> apotheon rolls 2x05: [4, 5, 3] + 0 = 12
12:33 <@apotheon> d20+3
12:33 < drollbot> apotheon rolls d20+3: [15] + 3 = 18
12:34 <@apotheon> 2d10+7 With Comments!
12:34 < drollbot> apotheon rolls 2d10+7: [8, 7] + 7 = 22 (With Comments!)

The x in that first die code tells droll(bot) to make dice "explode" when they return a result equal to the highest number possible for the die type -- that is, to add an extra die of the same type every time one of the virtual dice returns the maximum possible value for that die type. The zero at the beginning of the die type, 05, indicates that the range of possible die values starts at zero rather than one.

Pretty simple.
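The exploding rule and the zero-based range described above can be sketched as follows. This is not the code in droll.rb; the method name exploding_roll, the MAX_EXPLOSIONS value, and the injectable generator are assumptions made for illustration.

```ruby
# Hypothetical sketch of exploding dice; not taken from droll.rb.
MAX_EXPLOSIONS = 1000  # assumed cap; droll's actual limit is not stated here

# Roll `count` dice valued low..high. Each die that shows the maximum
# value adds one more die of the same type, up to MAX_EXPLOSIONS extras.
# The rng argument only needs a rand(range) method, which eases testing.
def exploding_roll(count, low, high, rng = Random.new)
  results = []
  pending = count
  explosions = 0
  while pending > 0
    pending -= 1
    value = rng.rand(low..high)
    results << value
    if value == high && explosions < MAX_EXPLOSIONS
      pending += 1
      explosions += 1
    end
  end
  results
end

rolls = exploding_roll(2, 0, 5)  # corresponds to the die code 2x05
puts "2x05: #{rolls.inspect} + 0 = #{rolls.inject(:+)}"
```

The cap matters for degenerate input: a die whose low and high values are equal would otherwise explode forever.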

Both droll and drollbot are tested using the MRI and Rubinius implementations of the Ruby language.

That Makes Five Programming Articles at TR http://blogstrapping.com/?page=2010.352.13.00.56 http://blogstrapping.com/?page=2010.352.13.00.56 Fri, 01 Jan 2010 00:00:00 +0000

That Makes Five Programming Articles at TR

As of this weekend, since I started writing articles for TechRepublic's Programming and Development column, TR has published five of my programming articles, at a rate of about two per month. TechRepublic gets twice as many articles for the Open Source column from me, and twice as many as that for the IT Security column, as part of my regular gig writing TechRepublic articles. There are certainly times that I wish the balance were different -- that, perhaps, I wrote four articles per month for each of the three columns, rather than eight for one, four for another, and two for the last. On the other hand, writing eight a month is probably a de facto requirement to keep my name on the top of the IT Security column, which might be a little more of a professional boost for me than if I'm "just another name".

Regardless, I am pretty pleased to get to write about my thoughts regarding programming for actual money, even if it is not a lot of money. So far, the articles in the P&D column appear to have been well-received. They include:

  1. Five Ruby greetings

    This article starts with a dead-simple Hello World example in Ruby, then progresses through four more examples, each of them using more of the language, as a means of demonstrating and an opportunity for explaining some basic features of the language.

  2. A development workflow for Mercurial

    This article explains a simple workflow that developers might use with the Mercurial distributed version control system to make the most of the DVCS' capabilities in small, open development projects.

  3. Learn by doing: seven ideas for learning how to program

    This article points out that doing helps with learning, and offers some ideas for how to find small projects and automation tasks to pursue so that the nascent programmer can gradually build development skill and experience.

  4. A skeptic's history of C++

    This article sparked a little bit of controversy, as I had thought it might. It is a less than perfectly flattering view of C++, mostly from the outside, and what makes it succeed (or not) in terms of popularity. My immediately previous blogstrapping entry, C++ Skepticism, Not Hating, was a response to someone else's lengthy disagreement with my Skeptic's history.

  5. Simple filters in Perl, Ruby, and Bourne shell

    This article explains the basics of writing filter programs in the Bourne shell, Perl, and Ruby. It serves in some ways as a follow-up to Learn by doing, which suggests writing small filter utilities on a Unix-like system as a way to practice the basics of the programming craft in a quick, easy, simple, and useful way.

Note that the titles listed here may not exactly match up to the titles on TechRepublic. The latter are subject to editorial alteration, and I tend to like to link to them via titles I like rather than those the editors like.

In the long run, I would like a nontrivial part of my contribution to the P&D column to serve as a set of useful ideas, guidelines, and simple tutorials for the programming autodidact to use to improve his or her coding (and other software development) skills. I will probably write a couple more articles relating specifically to different approaches to early programming practice that were mentioned in the Learn by doing article.

Another article that I have submitted, but that has not yet been published as of this writing, presents some basics of the Io language as a way to introduce programmers to prototype-based object oriented programming. JavaScript, as the article mentions, is a more popularly used prototype-based language, though many people who actually use JavaScript semi-regularly never really make use of the prototype-based object model to any notable degree.

I'm considering whether to write articles about subjects like Blowfish encryption and decryption with Io, or OpenPGP encryption, decryption, and signing with Ruby, for the P&D column or the IT Security column. I could even conceivably write such articles for the Open Source column, though that seems like the least suitable of the three at this time.

Blogstrapping has been live about as long as I have been writing for TechRepublic's P&D column. As of this entry, I have three times as many pieces of writing here as in TR's P&D column, and some of them could even have been written differently to be suitable for publication at TR. I guess I have more to write about programming than TechRepublic will pay me to write.

C++ Skepticism, Not Hating http://blogstrapping.com/?page=2010.327.23.36.35 http://blogstrapping.com/?page=2010.327.23.36.35 Fri, 01 Jan 2010 00:00:00 +0000

C++ Skepticism, Not Hating

In a colorful little piece of partisanship, Dean Michael Berris takes issue with a recent Programming and Development article of mine at TechRepublic. He seems to think I "hate" C++ with an intense passion, but the truth of the matter is that I have no passion in relation to the language at all. His error of presumption here seems to quite simply be a result of taking offense at my skepticism about its greatness, and my belief that it suffers some nontrivial design flaws and overuse in areas where other languages might be better suited.

My article under assault is A skeptic's history of C++, which attempts to give a bit of an overview of some of the aspects of the language's history that are often ignored elsewhere. It is surely no comprehensive history, and ignores a lot of the kind of history of the language you might find elsewhere -- but it does so specifically because that is information you might find elsewhere, and bears little relevance to the subject at hand: some of the reasons it might be better replaced, and some of the reasons it has not been effectively replaced.

Berris' soapbox is a Weblog called C++ Soup!, and the particular bit of writing that targets my article is titled C++ Hating. I feel like commenting on it, a bit.

C++ Hating is filled with "ha ha only serious" insults, and bears little of any concrete value in its assault on my article, but I will attempt to avoid commenting on all of that. It would be something of a waste of time, beyond simply pointing out that the problem exists. Let us focus instead on the concrete failings of Berris' response.

The first sentence does not say anything substantive (though it is a bit denigrating), but the second sentence gets something specific, provable, and clear quite horribly wrong:

The article -- like countless articles on the web already -- try to predict the demise of C++, this time in favor of other programming languages that are not as powerful.

I'm not talking about the grammar. I'm not even talking about the claim that all the other languages mentioned in my article as potential alternatives to C++ are "less powerful". I'm talking about the basic premise of the sentence -- that my skeptic's history suggests the imminent demise of C++. It does not. In fact, it says quite the opposite. My article's words:

I am inclined to believe that C++ will have a long, stable tenure in its niche for some time to come.

He goes on to reinforce the error after a lengthy quote from the article:

If you read through the article it is littered with anecdotes from which the author tries to make a case for the demise of C++.

He does make some good points about what would be needed to replace C++ as the implementation language for applications (and application suites) such as the Chromium browser (represented by the Google Chrome distribution in his list), Adobe Creative Suite, Microsoft Office, and World of Warcraft. I would not necessarily agree with everything qualitative he says about his software examples, but I do agree with his basic statements about what kind of capabilities an implementation language should offer to replace C++ in developing such applications. None of what he says about these applications directly challenges the statements in my article, but . . .

He then goes on to say this:

I challenge you to name one programming language other than C++ to which to write applications similar to these, and be able to deliver the same level of functionality, stability, extensibility, portability, and ubiquity that these applications are written in.

Okay, fair enough.

  • functionality: That's easy. Write code, get functionality. Note that he is talking about whether a language can be used to produce the functionality in these applications -- not what functionality the language itself (and its implementations) may provide. Any Turing complete language can provide the indicated functionality, some more easily than others (and that ease is what many would call a language's "power"). Perhaps he overlooked my reference to Objective Caml, wherein I said (and he quoted) that it offers "more succinct and well-organized source code" and provides "far cleaner and more interesting development models", thus directly addressing the provision of such functionality with relative ease. Proponents of D (another language I mentioned in my article) would argue similarly, though for reasons of different technical advantages of the language than in the OCaml example.

    The jury is still out on the other major non-C language I mentioned, Go, as developers are still sorting out the appropriate niches and major advantages of the language; it demands some different approaches to software development to make use of its most compelling functionality, so it offers fertile ground for debates over the trade-offs it offers for the next couple years or so.

    Another option, of course, is Haskell. It is sometimes derided as impractical by those who do not understand it, but it is every bit as capable as C++, and developers who work with it regularly tend to churn out nontrivial code in fractions of the time and linecount required for equivalent functionality. Haskell code, for an equivalently skilled developer, also tends to be more easily maintained thanks to its greater visual clarity and succinctness.

    Regarding C++ itself, as concurrency becomes increasingly important, the lack of standard threading functionality is a growing source of frustration and criticism for C++ programmers. The C++0x standard may eventually serve as the basis for addressing this problem in a portable, widely applicable manner, but for now this is still a bit of a worrisome failing of the language's capabilities.

  • stability: Right away, I wonder where Berris gets his ideas about C++. Stability is not really a feature for which C++ is known. In fact, people like Linus Torvalds refuse to let any C++ creep into their projects built primarily in C precisely because of stability concerns. This is above and beyond the stability concerns it shares in common with C, such as the ease and frequency of buffer overflow with unsafe core functions that are not meticulously checked and rechecked by painstakingly careful coding, and the fun of occasional programmer errors when pointer arithmetic enters the picture.

    Leaving C++ itself alone for the moment, I have yet to see any credible suggestions that Objective Caml, D, or Haskell produce unstable applications as a result of some feature of the language. Objective-C only shares the same shortcomings in this area that exist in both C++ and (because it is a true superset of the parent language, unlike C++) C itself. What criticisms I have seen regarding Go's contributions to software instability issues seem primarily to be similar to, but less egregious than, the example of C and C++ core functions that require special handling to avoid issues.

  • extensibility: I'm not sure what he's trying to say here. Surely Haskell is far more extensible in many ways than C++, Objective Caml's module system is incredibly flexible and easy to use, and both allow much less painful cost:benefit ratios for metaprogramming. I am intentionally holding off on mentioning Common Lisp here. My point is simply that bringing up "extensibility" seems like a losing play from the perspective of his position.

  • portability: It's true that Go is still in the process of achieving widespread portability, though it is already available for Linux-based systems, MacOS X, MS Windows, and the various BSD Unix systems. I am not quite well-versed enough in D lore to be able to offer useful insights in this realm. Objective Caml, Objective-C, and Haskell are available basically everywhere except certain fringe platforms, and binary compilation for such platforms would likely not be far behind the moment a demand arose.

    The most important factor for portability, given that all that is needed for binary compatibility is market penetration, is the ease with which code generic to many platforms can be written -- which is certainly easier at least in most cases using languages like OCaml and Haskell than for the C family of languages. It is this factor of portability that ensured C's early and enduring successes, and some of the lessons learned there have been utilized (and improved upon) in the design of these other languages.

  • ubiquity: What is this? Is he suggesting that a language's popularity is a purely winning factor? I disagree strenuously with such a positive assertion that language popularity is a key factor, as do a number of much more august and famed programmers than me. We might as well write our browsers in Java and PHP if language popularity is such a big deal. If that is what he means, he should explain why this is so important so I can enlighten myself. If that is not what he means, he should explain what he does mean when making such vague, hand-wavy statements.

    C++, by the way, is not a strict superset of C. Parts of C have been replaced in the form that C++ takes, so that some working C code will actually not work for a C++ compiler -- and I speak of more than mere boilerplate. Objective-C is a true superset of C in that sense (or, in the terms in which Berris puts it, C is a true subset of Objective-C), on the other hand. This is one of the reasons that some wish Objective-C had enjoyed the successes that have instead been granted to C++.

    Meanwhile, using different development philosophies than those that best suit C++ -- some would say far superior development philosophies, such as the Unix philosophy -- can suit C as an excellent development language where C++ might otherwise be used. Take for example the uzbl browser, written in C, with extended functionality provided by additional software written in a number of languages including Python where the near-metal performance of a C-family language's compiled binary executables is not needed. In fact, the C code in something like uzbl is much cleaner and more easily maintained than the equivalent C++ code in projects such as Mozilla Firefox.

I said above that I was intentionally not mentioning Common Lisp at that point. Some implementations of Common Lisp are quite performant, and its proponents can tell you that functionality, stability, extensibility, portability, and ubiquity are all quite well represented in that language. On the other hand, it attracts flame wars, so maybe we should just ignore it for now.

Let's move on to Berris' next major complaint:

Have you really thought about what C++ offers in reality before you corner it as simply an object-oriented programming language?

Have you stopped beating your wife, yet?

The reason I have not "really thought about what C++ offers in reality" before I "corner it as simply an object-oriented programming language" is that I never actually pigeonholed C++ as "simply" an OOP language.

C++'s strength is not its object oriented programming support -- it's the fact that you can write in many different paradigms in C++ that makes it its strength.

Objective Caml offers much more complete cross-paradigm capability than C++.

Also, if you're doing benchmarks, then everyone knows that micro-benchmarks are meant to highlight one aspect of the system/solution you're benchmarking. A true benchmark is one that accounts for real-life conditions and usage patterns.

I'm not "doing benchmarks". I don't know that "everybody" knows this, but Berris' statement about the proper use of micro-benchmarks is effectively true. It does not disprove anything I have said, nor does it prove anything he said, however.

He continues on in that vein for a bit, and finishes up with an attempt to discredit OCaml claims of performance:

If you're comparing a bubble sort in C++ with a radix sort in OCaml, then maybe the OCaml will have a better performance profile depending on the inputs.

Ultimately, however, all he manages to do (if we take him at his word) is eliminate the performance argument that many offer in favor of C++.

He addresses the matter of readability -- sort of:

Also, when you talk about aesthetics, this is a matter of taste and familiarity.

There is a big difference between subjective taste and practical readability issues, however. Greater succinctness without sacrificing clarity improves readability. Greater power in semantic constructs improves readability. Judicious use of language "punctuation" improves readability. These things are, for your average human, largely independent of personal taste and familiarity. Subjective taste and familiarity come into play with issues like preferring significant whitespace in Python, braces in Perl, and do/end blocks in Ruby -- and not so much issues like less significant support for composable source code organization, greater requirement for boilerplate code littering your source files, and cluttered semantic models.

He makes an excellent point here:

A large part of efficiency and proficiency is mainly attributed to familiarity with not only the technical matters, but also the idioms and shared understanding between/among a group of practitioners.

Of all the languages I have used, and whose communities I have encountered, Perl, Python, and Ruby do this the best. PHP mostly fails, Scheme's "community" is too fractious and provincial to achieve that level of idiomatic communion, and Java too bureaucratic for its otherwise strong culture to offer as much benefit as it should. C and C++ have legendary flame-wars between partisans for different cultures of idiomatic code, but if you pick a partisan ghetto and stick with it, you can surely achieve what Berris describes.

Unfortunately, it is largely irrelevant to any of the points I raised in my "skeptic's history".

Borrowing from the great John Dvorak, I'd say "this is baloney!". If you think being proprietary may hinder its widespread adoption, take a look at Java which the enterprise world is treating as the new Cobol.

If you note the word "may" in the statement of mine to which Berris objects so strenuously, you will surely see that his objection is baseless. I did not say it would necessarily hinder its adoption. I said it might. Standardized languages (like C++) and those with open reference implementations (like Ruby) tend to get much more friendly receptions, when lacking the massive marketing power of Sun Microsystems.

Think the x86 assembly language, that's proprietary.

It is well-documented and not proprietary in any way that matters. It has been reimplemented many times, in non-proprietary forms, not only without its originators' interference, but with their blessing and encouragement. In any case, x86 assembly language's widespread use is a product of its ties to hardware that achieved strong market success -- a benefit not enjoyed by languages that aspire to great portability like Java, C++, and D, so its inclusion here is a bit of a red herring.

He evidently believes that technical quality trumps all. I wish it were true, in practice, but I find I have difficulty swallowing this statement:

Unlike in the world of social interactions where investing in politics has an effect, this is not so true when it comes to hackers choosing the tools they use. If you're trying to sell a better mousetrap, your mousetrap better be a lot better than what I have for me to even start considering buying it.

Why then, I wonder, did he bring up Java? What about PHP? These are languages that, to a significant degree, enjoy continued success despite their technical characteristics and hacker-friendly capabilities (or the lack thereof).

Much as Microsoft does, Berris blames users for the failings of the tools he wants them to use:

As far as concurrency is concerned, the real problem is that the general population of programmers don't know how to write effective concurrent programs.

That is rarely more true than in C++. Meanwhile, Erlang provides safer, easier, and more performant concurrency than C++, at least the vast majority of the time. The fact he brings up Erlang (and Go) here with a sentence that begins "Even if" suggests that even he realizes C++ lies well behind Erlang (and, to a lesser extent, Go) in terms of its facility for concurrent development.

Let's revisit Common Lisp, this time at Berris' insistence:

I will be the first to admit that C++ is not as powerful as Lisp when it comes to the things you can do with Lisp, but if you know your Lisp you can approximate these with modern C++ -- and some even might say "voodoo" with C++ template metaprogramming.

Greenspun's Tenth Rule comes to mind:

Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.

The concept is more generalizable, in that it applies equally well to most programming languages, with only some of the more interesting -- and less widely used -- languages cutting through a lot of what that implies. Examples include languages like Haskell, Smalltalk, and Lisp spin-offs.

Stating that you can effectively reimplement significant portions of Common Lisp in C++ to achieve the same capabilities, as Berris essentially did here, is hardly a point in favor of C++. The fact you can effectively reimplement significant portions of C++ in any contenders for the same development niches comes to mind as an interesting counter-point, too, I think.

He endeavors to address the performance benefits of C++ more directly, at last, in an argument against (presumably Common) Lisp:

Also, Lisp is a non-starter when it comes to performance, at least unless you're prepared to tune your Lisp implementation for the specific workload you're encountering.

One must wonder what implementations of Common Lisp he has used to make this statement so categorically, and why it seems to disagree so directly with his arguments about OCaml above.

His wrap-up ends thusly:

Admittedly, there aren't a lot of well-written books about C++ programming for beginners -- the one I can point to is Programming: Principles and Practice using C++. Before you knock on C++ again, read this book first then tell me how not powerful C++ is -- it's alright, you can thank me afterward.

(Just to get this out of the way: I did not say C++ is not powerful.)

I must admit I have not read that book. I wonder if he has devoted similar effort to reading about the other languages he claims fall so short of C++ in so many ways. Regardless, I will make a point of checking out Programming: Principles and Practice using C++, though for now I'm a little more wrapped up in reading books about C, Clojure, Erlang, Haskell, Io, Prolog, Ruby, Scala, and Scheme. I think SQL (yes, I know it's not exactly a programming language), JavaScript, UCBLogo, and Common Lisp books are in the queue ahead of C++ at this point, and a book or two about Perl, and maybe some Objective-C, might sneak in there as well. I'll likely do some more C++ work before I get around to investigating Python, PHP, or Java any further, though.

You may gather that I'm a bit of a programming linguaphile. I am by no means an expert in all these languages, but I am at least minimally competent in several, and in fact C++ was the third (or fourth, if I'm forgetting one) language I have used, after BASIC and Logo in the '80s, roughly concurrently with my first encounter with C. Still, I would like to revisit C++ after mostly not using it for years and refresh my skills in that area. Maybe his book recommendation will prove valuable for me. I suppose it is always possible the book might even change my mind about the relative merits and flaws of C++ -- that is to say: my opinion of it, as a skeptic of the language's greatness [0].

Dean Michael Berris' screed certainly did not.

NOTES

0: The purpose of my opinion piece on the subject at TechRepublic was, in fact, to express the opinion that I hold as a skeptic of the language's greatness -- rather than journalism as he suggests when he refers to it as "yellow bloggerism".

[OT] Laptop Boots on Amazon http://blogstrapping.com/?page=2010.315.16.48.14 http://blogstrapping.com/?page=2010.315.16.48.14 Fri, 01 Jan 2010 00:00:00 +0000

[OT] Laptop Boots on Amazon

I have been thinking about buying a new laptop soon. I made the mistake of buying a refurbished laptop not too long ago, and while it still works pretty well the thing has started giving me subtle hints that the screen might go out soon. Rather than buy a parts laptop to cannibalize the screen for this thing, and pray something else (or maybe the "new" screen) will not die shortly after, I have decided to just shop around to see if I can get my hands on something new that would suit me well.

I have been looking into the possibility of getting a tablet/netbook device, one of those small form factor laptop things with a screen that can flip around and lie flat, display-out, so that the netbook becomes a tablet PC. I am a little concerned about some things, though:

  1. I tend to dislike touchpads quite a lot.
  2. I care about keyboard feel when typing, and a lot of laptops (let alone netbooks) have terrible keyboards. ThinkPad keyboards are a glaring exception to the norm.
  3. I need to be able to run FreeBSD or OpenBSD on it (maybe NetBSD, PC-BSD, or DesktopBSD, in a pinch), with good power management support (specifically suspend/resume that works properly) and all the standard hardware working (including the tablet display, the wireless card, et cetera). Does Flash work on OpenBSD?

I think that about sums it up. I suppose I could test out touchpads and keyboards in stores before buying anything, but it would be a little more difficult to test a BSD Unix install.

I guess the closest this comes to being on-topic for blogstrapping, aside from the generally technical subject matter, is that I want to be able to use it as a development platform, as well as a Web browser and a PDF reader. That should not require much, of course. I have no need for stuff like Visual Studio, Eclipse, or KDevelop; Unix is my IDE. Since I use Vim as my programming editor, though, I need a decent keyboard.

I received an email from Amazon yesterday with the subject line "Amazon.com: ASUS Eee PC T101MT-EU17-BK 10.1-Inch Convertible Tablet Netbook (Black)". I decided to check it out, so I used a URL in the email to go look at the page for that product. This image is what I saw:

At first, I was confused, but it all very quickly began to make sense: this is how a netbook looks when it boots . . . on (an) Amazon. It has all become clear to me now. Hallelujah.

I guess that is a good looking netbook, if you're into that kind of thing.

In case you want to see the page for yourself (and maybe buy the product): ASUS Eee PC T101MT-EU17-BK 10.1-Inch Convertible Tablet Netbook (Black). It may not look the same by the time you visit the page, unless you get to it soon, but if you are in the market for a tablet netbook it might still be a useful link for you.

Giving Rubinius a Try http://blogstrapping.com/?page=2010.312.16.10.07 http://blogstrapping.com/?page=2010.312.16.10.07 Fri, 01 Jan 2010 00:00:00 +0000

Giving Rubinius a Try

Contrary to some popular perceptions, "Ruby" is not actually slow. Ruby is a programming language, and a language is not in and of itself fast or slow. It is the language's implementation that is fast or slow, relative to other language implementations.

The best-known Ruby implementation is probably the implementation generally known as MRI, or Matz' Ruby Interpreter -- where "Matz" is Yukihiro Matsumoto, the creator of the Ruby language. MRI is the reference implementation for Ruby up through version 1.8.x, which means that it is the "standard" by which the Ruby compatibility of other implementations is judged. MRI is, indeed, quite slow for a lot of purposes. Version 1.9+, on the other hand, uses an implementation called YARV (or Yet Another Ruby VM), also known as KRI (or Koichi's Ruby Interpreter). YARV offers significant performance improvement, placing it on comparable footing with the reference implementations of Perl, PHP, and Python. Both of these implementations are distributed under a dual-license model -- the GPL and the Ruby License (which is unfortunately about as bad as the GPL).

Other implementations exist as well. There is one for the Java VM called JRuby, for instance -- also copyleft licensed. Ironically, an implementation for the .NET Framework called IronRuby is distributed under a copyfree license, the MIT/X11 License. So far, what we have is three copyleft implementations and one copyfree implementation that only runs in a Microsoft-designed environment. There is, however, another implementation that I have started using recently: Rubinius. (There's MacRuby too, but I frankly do not know anything much about it other than that it is a Ruby 1.9 implementation for MacOS X distributed under the Ruby license.)

Rubinius is in FreeBSD ports, which made it easy for me to install on the laptop I use as my primary development environment. Rubinius is a copyfree (BSD License) implementation of Ruby, primarily written in Ruby -- plus a little C++. It is, according to the Rubinius Website, 93% compatible with RubySpec, an executable specification for the Ruby language. I have started using it with my own Ruby software projects and had 100% success so far, though. Unfortunately, I do not presently have the option of running blogstrapping on Rubinius, because my Webhost does not support it.

Rubinius currently targets Ruby 1.8.x compatibility, though real progress is apparently being made on Ruby 1.9/2.0 compatibility for an upcoming Rubinius version 2. In addition to being distributed under a better license than MRI and YARV, it is also pretty fast. It is, at least, notably faster than MRI. It is reported to be faster for execution of plain ol' Ruby code than YARV, which is in turn about twice as fast as MRI in general, though the fine folks in #rubinius on freenode tell me it is a bit slower for things like array and hash operations.

One performance issue I have noticed is startup time. Obviously, this will vary depending on the computer hardware and other operating environment details. I have seen references to 0.3 seconds as "normal" for startup time for some users; for me (on my ThinkPad T60), it tends to vary between half a second and one second using the Unix time utility, though I have seen it squeak in under half a second at 0.45 seconds.

As @evan put it in #rubinius, the simple use case for the Ruby Benchmark module is easy:

puts Benchmark.measure { your code }
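As a minimal sketch of that one-liner in runnable form: the `sample_work` method below is a made-up stand-in for "your code", not anything from dscribe or Rubinius.

```ruby
require 'benchmark'

# Hypothetical stand-in workload; replace it with the code you want to time.
def sample_work
  (1..50_000).map { |n| n * n }.sum
end

# Benchmark.measure runs the block once and returns a Benchmark::Tms object.
timing = Benchmark.measure { sample_work }

puts timing       # user, system, total, and real (wall-clock) times
puts timing.real  # just the wall-clock seconds, as a Float
```

The same script should run unchanged under `ruby` or `rbx`, which makes it a convenient way to compare implementations on the same workload.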

In the midst of trying to get some benchmarks with that, I discovered two things:

  1. I stumbled across a parsing error in MRI. I thought at first that I was misusing the Benchmark module somehow, but after some discussion with people in #rubinius, they eventually managed to confirm that it is indeed a parsing bug. The same bug appears to exist in Rubinius and other 1.8 parsers.

  2. I stumbled across a bug in dscribe, the program I was trying to benchmark with the Benchmark module. This bug does not affect my most-common use cases for dscribe, but I still need to fix it.

Running time rbx -v (checking the Rubinius version number) shows 0.45 seconds, while time ruby -v (checking the MRI version number) shows 0.04 seconds. That is some significant overhead for the Rubinius VM as compared with MRI. That overhead all seems to be startup time, something that I have been told has not been much of a priority so far as compared with execution time for Rubinius development. After talking to the people in #rubinius, though, they told me they would add it to the queue of things to work on, so VM startup time might improve.
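A rough way to repeat that comparison from inside Ruby, assuming the interpreters you want to test are installed and on your PATH, is to time a child process that does nothing but print its version. The `startup_seconds` helper is hypothetical, and the figure it reports includes process spawn overhead, much as the Unix time utility's does.

```ruby
require 'benchmark'
require 'rbconfig'

# Hypothetical helper: launches the given interpreter with -v, discards its
# output, and returns the elapsed wall-clock time in seconds.
def startup_seconds(interpreter)
  Benchmark.realtime do
    system(interpreter, '-v', out: File::NULL, err: File::NULL)
  end
end

# RbConfig.ruby is the path of the interpreter running this script.
puts startup_seconds(RbConfig.ruby)
```

Passing `'rbx'` or `'ruby'` instead of `RbConfig.ruby` would time whichever implementation that name resolves to on your system.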

Startup times like that result in noticeable hesitation when running simple command line utilities, but that in itself should not have any particularly noticeable effect on long-running processes. Maybe I'll have some meaningful benchmarks to share in the near future, after fixing my dscribe bug. In the meantime, I'm satisfied that my code is running "fast enough" for my purposes with Rubinius, so I'll be using that instead of MRI for a while.

Why Close an Open Platform? http://blogstrapping.com/?page=2010.309.23.46.30 http://blogstrapping.com/?page=2010.309.23.46.30 Fri, 01 Jan 2010 00:00:00 +0000

Why Close an Open Platform?

When I wrote the first draft of this, I thought I was going to be writing for the Smartphones column at TechRepublic semi-regularly (once a month). As a result, I set this aside until I would decide whether to submit it to the editor for that column or publish it myself here at blogstrapping. I ended up getting doubled-up on the Programming and Development column, however -- writing two a month for that, rather than one for that and one for the Smartphones column. By then, of course, I had basically forgotten this draft existed, and it languished.

I stumbled across it again when writing Vimium and Other Chromium Extensions, and now I've decided to share.


Open source software developer Joe Hewitt comments on the nature of openness in software, in Android and Open Source. His purpose is not to say anything particularly innovative or insightful, per se, but rather to explain what he meant in some Twitter posts that he feels might have been misunderstood by readers.

The result is that he explains something that many of us who understand "open" to mean more than just "less closed than Apple" were already thinking: that Android is not really all that open, after all. Anyone who has tried executing the ls sbin command on a typical Android device knows that:

$ ls sbin
opendir failed, Permission denied

Hewitt's theory is that it is the service carriers who make a basically open foundation for an OS into an effectively closed platform. To some extent, he's right about that. The only question, really, is how much is "some extent". It is obvious that Google at least enables service carrier restrictions on the platform, of course.

For a counter example, and for a demonstration of how Hewitt's point about Google having to let the carriers impose those restrictions in order to gain some market share makes sense, look at Nokia's N900 device and its Maemo operating system. Nokia essentially told service carriers where to stick it when they demanded that Nokia let them heavily restrict the OS, and as a result the carriers told Nokia where to stick it when it came to marketing and device+service deals for customers.

It is pretty obvious what Apple gets out of its control freak behavior when it comes to the iPhone. Apple is in the business of selling image, and for that to work it has to maintain that image. Its customers are people who want that image. The same is not true of Android customers, however.

Android customers want something quite different from the iPhone. They are, to some extent, generally anti-iPhone. They want commodity software; they want flexibility; they want something free from Apple's control freak behavior. They get a little of that with Android devices but, thanks to the way service carriers try to restrict the platform, they do not get as much of it as they conceivably could.

The things that come closest to providing "image" for carriers other than AT&T, which is the only official carrier for the iPhone in the US, and for device vendors other than Apple, are hardware exclusivity and add-on software exclusivity (like Motorola's MotoBlur). These are the brand differentiation opportunities for Android. The Android platform itself, however, is necessarily the same across devices and carriers. How, exactly, does unnecessarily restricting that help them?

This is what I want to know. What is the value in imposing these restrictions on Android? The business case for some of these restrictions escapes me. I am not even talking about source code: I'm just talking about being able to do things like run the OpenSSH client from the command line without having to install a separate terminal application, for instance. We need to get past the point where we're allowed to actually manage our own devices' operating system's basic functionality before we worry about whether we can have the source code.

I wonder if, given the opportunity to speak candidly with the people who make these kinds of decisions for companies like Motorola and Verizon, I would find the answers to questions like "What's the benefit of restricting the Android platform so much?" surprising. Would they have interesting answers that I had not considered, or would they even have meaningful answers at all? I suppose if asked such questions by a journalist they would give canned answers like "security", and if speaking to MBA students they would give cockamamie answers like "vendor exclusivity", but I would like to know if they actually believe that nonsense.

If that is what they say when trying to be 100% honest, they do not have good reasons, and are not actually thinking. If there are other reasons that they do not share with the public, it would certainly be interesting to hear them.

Vimium and Other Chromium Extensions http://blogstrapping.com/?page=2010.308.12.13.14 http://blogstrapping.com/?page=2010.308.12.13.14 Fri, 01 Jan 2010 00:00:00 +0000

Vimium and Other Chromium Extensions

Last night, I decided to tackle learning how to write extensions for the Chromium browser. Writing extensions is something I never bothered learning how to do for Firefox, over something like seven years -- since before it was called Firefox, and even before it was called Firebird. Part of the reason for this is that by the time I started getting really interested in extensions there were far more of them than I had time to explore, and every time I thought of something I'd like an extension to do, there was already an extension for it.

In my (somewhat limited) experience using it on MS Windows, I quickly came to appreciate Chrome's design, and like it a heck of a lot more than any other browser on that platform. Ever since Google first announced its Chrome browser and released the open source Chromium project in 2008, though, I have been at least one step ahead of it in operating system selection. That changed only very recently, when Chromium finally made its way into the FreeBSD ports system as a stable piece of software. Because FreeBSD has been my primary OS of choice for a while now, the opportunity to migrate to Chromium as my primary browser has not really existed before this.

Even now, however, I have run into some problems switching to Chromium. You may note, if you read that article, that one of the major hang-ups is the lack of a good vi-like keybindings extension for Chromium that works on FreeBSD. I had high hopes for Vimium, but ran into some problems with getting it installed.

Last night, I was prompted to look into extension development for Chromium because of this problem. I got at least far enough to figure out some of what's going on with Vimium, and plan to learn more about extension writing for Chromium as well. Thanks to my investigations, I have figured out how to "fix" Vimium to get it to install on Chromium, and am now much happier using the browser on FreeBSD, but I have come to realize that Vimium is certainly not perfect. The Vimperator extension for Firefox is still clearly superior in a number of ways. Vimium, at least, should make using Chromium for a lot of tasks much more tolerable, though I do not know if I will find that enough to make it my primary browser any time soon. We'll see, I guess.

"Fixing" Vimium

The actual Vimium bug on FreeBSD appears to be something wrong with Chromium on FreeBSD, and not a problem with Vimium per se. Chromium extensions can be written using a pattern matching configuration to select which URLs loaded in the browser can be affected by the extension. The pattern matching system is quite simple, and allows matching of URL schemes, domain names, and path names, with very rudimentary support for wildcards. A special pattern, <all_urls>, can be used to match everything that Chromium will let you match.
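To make the idea concrete, here is a deliberately simplified sketch of that kind of matching. It is not Chromium's implementation: it treats every `*` as "any run of characters", whereas the real rules are stricter about where a wildcard may appear, and the scheme list for `<all_urls>` is an assumption based on the four schemes discussed in this post.

```ruby
# Assumed scheme list for <all_urls>; Chromium's actual list may differ.
ALL_URL_SCHEMES = %w[http https ftp file].freeze

# Turns a match pattern into a Regexp, treating "*" as a plain glob.
def pattern_to_regexp(pattern)
  if pattern == '<all_urls>'
    return Regexp.new("\\A(?:#{ALL_URL_SCHEMES.join('|')})://.*\\z")
  end

  # Escape everything, then turn each escaped "*" back into ".*".
  Regexp.new('\A' + Regexp.escape(pattern).gsub('\*', '.*') + '\z')
end

def url_matches?(pattern, url)
  pattern_to_regexp(pattern).match?(url)
end

puts url_matches?('http://*/*', 'http://example.com/index.html')
puts url_matches?('http://*/*', 'https://example.com/')
puts url_matches?('<all_urls>', 'chrome://extensions/')
```

Under these simplified rules the first check succeeds and the other two fail: the second because the scheme differs, the third because `chrome://` is not in the assumed scheme list.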

Unfortunately, for some reason it appears that anything that does not specify a particular URL scheme (such as http://) will not work on my FreeBSD install of Chromium, and I know of at least one other person -- Sterling Camden -- who has run into the same error when trying to install Vimium. It was through trial and error that I was able to figure out what does work in Vimium on FreeBSD.

By default, there was a line in the extension's manifest.json file that contained this code:

"matches": ["<all_urls>"],

In order to get it to install, I have had to get the extension from the Vimium project at GitHub (using git to clone the repository, naturally), then edit that code on my local copy. The new version looks like this:

"matches": [
  "http://*/*",
  "https://*/*",
  "ftp://*/*",
  "file://*/*"
],

As far as I've been able to determine so far, the only way to get Vimium to install on FreeBSD is to specify every URL scheme you want it to match -- and, of course, you can only specify those URL schemes that Chromium will allow. My investigations thus far lead me to believe that http://, https://, ftp://, and file:// are the only URL schemes that Chromium extensions can support, so I've specified all of them. Note, by the way, that a trailing comma on the last list item is invalid syntax, and will make the extension unloadable in Chromium, so if you copy this code you should not add a comma after "file://*/*" in the manifest.json file.
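One way to catch that kind of trailing comma before loading the extension is to run the file through a strict JSON parser. The sketch below uses Ruby's standard json library with its default options, and wraps the "matches" fragment in braces so that it is a complete JSON document. The constant and method names are made up for illustration, and Ruby's parser is not guaranteed to agree with Chromium's in every detail, so treat it as a rough pre-check only.

```ruby
require 'json'

# The edited "matches" list, wrapped in braces to form a complete document.
VALID_MANIFEST = '{"matches": ["http://*/*", "https://*/*", "ftp://*/*", "file://*/*"]}'

# The same list with a trailing comma after the last item.
TRAILING_COMMA_MANIFEST = '{"matches": ["http://*/*", "file://*/*",]}'

# Hypothetical helper: true if the text parses as JSON, false otherwise.
def parses_as_json?(text)
  JSON.parse(text)
  true
rescue JSON::ParserError
  false
end

puts parses_as_json?(VALID_MANIFEST)
puts parses_as_json?(TRAILING_COMMA_MANIFEST)
```

To check a real file, something like `parses_as_json?(File.read('manifest.json'))` would do the same job.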

(Note: Once you have the source for the Vimium extension modified on your local machine, you can install it from the local source by following instructions on the Tutorial: Getting Started (Hello World!) - Google Chrome Extensions page. You can quickly find the snippet of instructions you need by doing a text search on that page for the words "load the extension", if you do not want to take the time to learn how to write a simplistic extension.)

It is my suspicion that this problem is particular to the FreeBSD version of Chromium, or perhaps to the Linux version as well (from which the FreeBSD version is derived). Because of this, I doubt I will submit a patch to the maintainer of the Chromium extension himself, since the correct way to handle broad URL support is surely the way he handled it, and not the way I did. I only did this because the "correct" way is apparently broken in the FreeBSD port of Chromium.

Imperfect Vimium

There are other problems with the Vimium extension, at least on my FreeBSD install:

  • It appears to "lose" some pages from time to time, so that it will just stop working on them after I have switched to a different tab then back again. I do not yet understand the pattern that determines when this happens. It happened a lot when I first installed Vimium, but after restarting the browser it has become much less common.

  • Some URLs are entirely immune to Vimium, apparently. One of them is, perhaps ironically, the Google Chrome extension gallery. Chromium's own chrome:// URL scheme also appears to be exempt from support by Vimium, which I suspect is an intentional exception to the entire extension system -- and highly inconvenient.

  • There is significant overlap between Vimium keybindings and Chromium keybindings, and it seems that the default on my machine at least is for Vimium's and Chromium's keybindings to both work. This can get really annoying when I forget and try to use Ctrl-D to move down a page, resulting in both the page scrolling and a bookmark being added. I intend to look into this a bit more.

I have probably run across other issues as well, and forgotten. I may add to this list later if I remember them or discover more gotchas. I will see about reporting bugs to the appropriate parties as I figure out what exactly to report, and to whom.

Addendum: The Vimium Issue

I accidentally stumbled across something in Google's tutorials for extension building that led me to a page where it was indicated that the <all_urls> URL scheme wildcard was added in Chromium 6. A few minutes' investigation later, I discovered that though the stable version is 7.x, the FreeBSD ports maintainer has not committed anything later than 5.0.x, which would explain why I had to use that clunky URL scheme list to make the extension installable. Hopefully the port maintainer is actually planning to commit a new port soon.

Meta-Dynamic Link Generation http://blogstrapping.com/?page=2010.279.15.34.00 http://blogstrapping.com/?page=2010.279.15.34.00 Fri, 01 Jan 2010 00:00:00 +0000

Meta-Dynamic Link Generation

I have started a much-needed refactoring of Lump. Configuration-driven automatic markup generation is being put into a lump.rb library, which is then loaded by the index.rhtml template for the main page. In the process, I have also been cleaning up a little of the configuration file's format.

It has been going well so far, but I am quickly catching up with part of the task that will be interesting to try to figure out how to accomplish.

First, code:

require 'yaml'

class Lump
  def initialize(conf)
    @config = YAML.load_file(conf)
  end

  def page_title
    @config['head']['title']
  end

  def link_list
    @config['head']['link'].collect do |k,v|
      '<link rel="' + v['rel'].to_s + '" type="' + v['type'].to_s +
      '" title="' + v['title'].to_s + '" href="' + v['href'].to_s + '" />'
    end
  end

  def anchor_list(a_type)
    @config['body']['anchor'][a_type].collect do |v|
      '<li><a href="' + v['url'].to_s + '">' + v['name'].to_s + '</a></li>'
    end
  end
end

That is my new Lump class. So far, that's where all the magic happens. As you should be able to tell, if you know much Ruby, there is nothing magical about it yet.
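For a sense of what the class expects from its configuration file, here is a self-contained sketch. The class body is copied from the listing above so the example runs on its own. The YAML layout is inferred from the keys the methods read, and the sample values (the title, the "feed" link, the "nav" anchor list) are invented for illustration rather than taken from Lump's real configuration.

```ruby
require 'yaml'
require 'tempfile'

# Copy of the Lump class from the listing above.
class Lump
  def initialize(conf)
    @config = YAML.load_file(conf)
  end

  def page_title
    @config['head']['title']
  end

  def link_list
    @config['head']['link'].collect do |k, v|
      '<link rel="' + v['rel'].to_s + '" type="' + v['type'].to_s +
      '" title="' + v['title'].to_s + '" href="' + v['href'].to_s + '" />'
    end
  end

  def anchor_list(a_type)
    @config['body']['anchor'][a_type].collect do |v|
      '<li><a href="' + v['url'].to_s + '">' + v['name'].to_s + '</a></li>'
    end
  end
end

# Hypothetical configuration, shaped to match the keys the class reads.
SAMPLE_CONFIG = <<~YAML
  head:
    title: Example Site
    link:
      feed:
        rel: alternate
        type: application/rss+xml
        title: RSS feed
        href: /rss.xml
  body:
    anchor:
      nav:
        - url: /about
          name: About
YAML

# Write the sample to a temporary file, since Lump loads from a path.
config_file = Tempfile.new(['lump', '.yaml'])
config_file.write(SAMPLE_CONFIG)
config_file.close

lump = Lump.new(config_file.path)
puts lump.page_title
puts lump.link_list
puts lump.anchor_list('nav')

config_file.unlink
```

Note that `link_list` iterates over a hash (name to attributes) while `anchor_list` iterates over an array of hashes, which is why the two sections of the sample are shaped differently.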

I have so far managed to avoid thinking about it too much, but I suspect I will need to start adding some magic to this when it comes time to deal with lists of anchor tags that have to be constructed at least in part dynamically. An example would be something like this:

print  '<p><a href="http://news.ycombinator.com/submitlink?'
print  'u=http%3A%2F%2Fblogstrapping.com%2F%3Fpage%3D'
print  "#{cgi['page'].to_s}&t="
File.open(page, 'r') do |c|
  print c.readline.sub(/^#+\s+/, '').chomp
end
print  '" style="text-decoration: none">'
print  '<img src="http://blogstrapping.com/img/y18.gif"'
print  ' style="border: none"/> Discuss At Hacker News</a></p>'

While I could (relatively) easily write a new method to do this for me, I'd like to keep the anchor_list general enough to handle these links as well. The problem is that I have not yet figured out how -- and, if I am going to do so, I am beginning to suspect I will need to sprinkle a little metaprogramming magic dust in there to keep it from turning into a single method bigger than the entire rest of the class (so far) combined. After all, it is going to call for dynamically generating the pattern by which anchor tags are dynamically generated. Stuff like that is not as much a walk in the park as the straightforward coding I have done so far for Lump, but if I learn something I suppose the time and effort will prove to have been worth it.

. . . even if I end up getting rid of it later because I decide the code is too "magical" to really be a good idea. I suppose time will tell. For now, though, I have other things to do, so it will have to wait.
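One non-magical direction worth weighing first is plain placeholder substitution: let a configured URL carry tokens that get filled in per page. The sketch below is only an illustration of that idea, not something Lump does. The `{{token}}` syntax, the helper names `expand_tokens` and `dynamic_anchor_list`, and the `permalink` and `title` context keys are all assumptions made up for this example.

```ruby
require 'uri'

# Hypothetical helper: replace {{token}} placeholders in a configured
# string with URL-encoded values from a context hash. Tokens with no
# matching key are left exactly as written.
def expand_tokens(template, context)
  template.to_s.gsub(/\{\{(\w+)\}\}/) do |match|
    key = Regexp.last_match(1)
    context.key?(key) ? URI.encode_www_form_component(context[key].to_s) : match
  end
end

# Like anchor_list, but each URL is expanded against the per-page context
# before the list item is built.
def dynamic_anchor_list(anchors, context)
  anchors.collect do |v|
    %Q{<li><a href="#{expand_tokens(v['url'], context)}">#{v['name']}</a></li>}
  end
end

# Illustrative data: one static link and one that depends on the page.
anchors = [
  { 'name' => 'Home', 'url' => 'http://example.com/' },
  { 'name' => 'Discuss At Hacker News',
    'url'  => 'http://news.ycombinator.com/submitlink?u={{permalink}}&t={{title}}' }
]
context = {
  'permalink' => 'http://blogstrapping.com/?page=2010.279.15.34.00',
  'title'     => 'Meta-Dynamic Link Generation'
}

puts dynamic_anchor_list(anchors, context)
```

This keeps the method small, at the cost of inventing a tiny template syntax inside the configuration file. Whether that trade is better than a sprinkle of metaprogramming is exactly the open question.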

Show Me The Documentation http://blogstrapping.com/?page=2010.277.16.31.54 http://blogstrapping.com/?page=2010.277.16.31.54 Fri, 01 Jan 2010 00:00:00 +0000

Show Me The Documentation

Lump is poorly documented. Lump is open source. Lump is not a good example of poor open source software documentation.

Some jackass Microsoft-certified IT weenie who can't find his own ass with two hands, a flashlight, and the Microsoft Knowledgebase at his beck and call may some day use Lump as an example of how open source software has crap documentation and crap usability for users, all of whom demand Features! Features! Features! (cue Steve Ballmer dance). This jackass is making the wrong comparison.

Comparisons

Instead of comparing Lump with . . . I don't know. Does Microsoft have an equivalent? What about Adobe? No matter. Let us assume that there is some Weblog oriented CMS out there, closed source, that anyone gives a shit about. I know, it is a bit of a stretch, but let's say it exists. Maybe it is part of Steam's market diversification plan, and called Pile, so it's a Steaming Pile. My point is that a comparison between Lump and Pile is the wrong comparison to make.

Let us assume instead that Lump has a closed source software equivalent created by someone named Lee Peregrine, just so we have a proper comparison to make. Lump and its hypothetical equivalent have some things in common. The first thing they have in common is that each was created for the sole purpose of serving the needs of its creators. The second is that it lacks sophistication. The third is that it is poorly documented. The difference -- the only real difference for purposes of this discussion -- is that unlike the creation of the closed source equivalent's creator, I decided to make Lump's source code available to the public. Period.

Another example of a more-correct comparison than Lump and Pile here is something more like a comparison between FreeBSD and Microsoft Windows. As far as I've been able to determine thus far, FreeBSD is the single best user-documented modern, general purpose operating system in existence. Microsoft Windows, meanwhile, is one of the worst. While there is certainly a huge mass of documentation related to MS Windows, about half of it seems to be of dubious usefulness, and about half of it is either totally unuseful or useful only for extremely limited circumstances to solve extremely rare problems even though presented as a general solution to a very common problem.

I do not even want to consider the notorious horrorshow of Oracle documentation.

Part of the problem with Microsoft documentation is that it is developed the same way as its software -- with little or no input from users, a sad state of affairs that is applicable even to Microsoft's open source software. This bears repeating:

Many of the problems with Microsoft's documentation and programs arise because Microsoft does its development with a markedly dismissive attitude toward user input, even when the software is open source.

Open Source Not Immune

This is not to say that open source software is immune to bad documentation. As I pointed out in The Lure Of Features, Ruby sometimes runs afoul of poor documentation:

As it turns out, it was easier to write my RSS feed generator from scratch than to figure out how to use Ruby's RSS core module. Yes, really. The problem, you see, is that documentation for the RSS library is shit, and all the examples on the Web for how to use it to build an RSS feed are every bit as anemic and sad as the actual documentation. I'm not entirely sure most of the people writing these howtos have even used it. I think most of them have probably just copied the same old thing they found somewhere else, word for word, and changed a few details to thinly veil the plagiarism.

By contrast, I have seen indications that Microsoft's .NET libraries tend to be very thoroughly documented. Then again, if I were looking at Ruby from the outside and had only as much experience with it as I have with C# (next to nothing), I would have said the same thing about Ruby -- "I have seen indications that Ruby's core libraries tend to be very thoroughly documented."

An occasional acquaintance of mine, Curtis "Ovid" Poe, said:

Plenty of questionable assertions (ha!) are made about tests and some of these assertions are just plain wrong. For example, tests are not documentation. Tests show what the code does, not what it's supposed to do. More importantly, they don't show why your code is doing stuff and if the business has a different idea of what your code should do, your passing tests might just plain be wrong.

Despite this, there seems to be a (hopefully small) subculture within the Ruby community that claims that any library with complete test coverage is "well-documented" code. This is only slightly more offensive to someone who actually needs documentation than claiming that code with comments in it is "well-documented". As Ovid's commentary suggests, good documentation explains not only what it is, but how it is used -- and, in addition to that, why. By contrast, unit tests only explain what it isn't, and they do not explain it in plain English (or Japanese or whatever).

I find attitudes like those of people who regard a test suite as "documentation" particularly odious, not only because it's frustrating to be told something is well-documented when it so obviously is not, but also because it stands in the way of proper documentation being written.

This is not something I encountered when I spent a few minutes looking for documentation for the RSS library, but it is something I have seen in the past. It needs to be identified, exposed to the light of day, and thoroughly denounced so people will get on with the business of actually documenting their software and libraries. Anybody who claims that a test suite is sufficient documentation for a language's core library should be publicly embarrassed for that claim and voted off the island before (s)he influences some credulous newcomer.

Testing is great, mind you. I have no problem with it. I believe that, the vast majority of the time, software development projects should have extensive test suites. I believe, in fact, that even some rinky-dink project like Lump should have a test suite. I also believe that to truly appreciate the benefits of helpful software development processes and tools, we must be honest with ourselves about those benefits. There is No Silver Bullet, and if we kid ourselves that something is a silver bullet it will -- no matter how great it might be when used appropriately -- become more of a hindrance than help in many cases, as it is misapplied when some other approach to solving a given problem should be taken at the time.

By the same token, I will not turn a blind eye to the documentation weaknesses in the open source world, despite the fact that I still believe that open source software tends toward better documentation given similar conditions. MS Windows and FreeBSD have similar effective lifespans and are deployed across a similar range of use cases, but FreeBSD has far better documentation. That does not mean that the poor documentation practices for some open source software should be ignored. If we ignore problems, we cannot reasonably expect to fix them.

I expect to start doing a better job of documenting Lump just as soon as it becomes a bit more stable -- if not sooner. If you need it to be sooner, letting me know you need it to be sooner is a good way to make it sooner, because I tend to listen to that sort of thing. I recommend using the issue tracker at Lump's BitBucket project repository, and being specific about what you need when submitting an issue. For the moment, unless someone tells me otherwise, I am likely to believe that other priorities for Lump development rank higher than documenting it for other users that I do not believe exist. In the meantime, then, expect it to lack thorough documentation, and please refrain from using it as an example of how open source software is poorly documented. Its closed source equivalent would, of course, be even less well documented, to say nothing of being unavailable, because where I simply do not expect anyone else to want to use Lump right now, it would equivalently be unlikely that random people would be allowed to use the closed source equivalent.

Show Me The Docs

Linus Torvalds famously said "Talk is cheap. Show me the code."

I would like to extend that to documentation, here. If someone complains that there is a lack of documentation for something, there is obviously something wrong. It may be that the documentation does not exist. It may be that the documentation is insufficient for the person's needs. It may be that the documentation, no matter how complete and helpful, is not easily enough discoverable.

The solution to this is not to claim your documentation is at least better than something else's documentation, or that it is well documented because of a test suite, or that someone is stupid for not being able to find the documentation. The solution sure as hell is not to point to Lump as an example of bad documentation. The solution is to "show me the documentation."

Talk is cheap. Show me the documentation.

Hypocrisy?

I suppose someone might call me a hypocrite for making such a fuss about the importance of documentation while failing thus far to document Lump. I could perhaps defend myself by pointing out that Lump has, as of this writing, existed for less than a week -- but I do not consider that much of a defense, given the fact I just said "Talk is cheap. Show me the documentation."

Fair enough.

I do not consider it hypocrisy, personally. I have a better argument than that: I am not excusing a lack of documentation that others say they need, or that I expect others to need. If someone points out to me that Lump is poorly documented, I will readily admit that person's statement is correct. More to the point, if someone claims reasonable need for documentation, I would even be inclined to write it. If you go back and read the end of the previous section in this very blogstrapping entry, you will see that I even asked people to prompt me if they need documentation so I can revise my estimation of how important it is to document Lump.

This is why I do not consider myself a hypocrite in this matter. You may still think I am. If so, however, I would like you to keep this in mind:

Even if I am a hypocrite, that does not make what I said wrong.

The Lure Of Features http://blogstrapping.com/?page=2010.276.18.25.01 http://blogstrapping.com/?page=2010.276.18.25.01 Fri, 01 Jan 2010 00:00:00 +0000

The Lure Of Features

I am not immune to featuritis.

Things that I really need to do with Lump include:

  • refactor so that major routines -- especially reused code -- are shuffled out into at least one separate class

  • move discussion links into configuration

  • replace YAML as the configuration format with something simpler

What did I do instead of working on these things? In no particular order:

  • I added an RSS feed.

  • I moved head/link tag stuff into configuration.

  • I added a head/link tag for RSS autodiscovery.

  • I added a second off-site discussion/submission link.

  • I turned the title banner into a homepage link.

  • I added a TR Article List link to my lump.conf file.

  • I changed the name of a link from Bugs to Bug Reports.

  • I eliminated a Home link at the top of the page.

I've been bad. I really need to focus more on the stuff that matters: a design that makes it easy to maintain and extend (and test) the software. Oh, that's another thing I need to do, but haven't:

  • add a test suite

It turns out that featuritis is a shockingly contagious disease, and using the software publicly at the same time I develop it might increase my susceptibility to infection. I shall have to try to develop stronger resistance to it.

Creating an RSS Feed

As it turns out, it was easier to write my RSS feed generator from scratch than to figure out how to use Ruby's RSS core module. Yes, really. The problem, you see, is that documentation for the RSS library is shit, and all the examples on the Web for how to use it to build an RSS feed are every bit as anemic and sad as the actual documentation. I'm not entirely sure most of the people writing these howtos have even used it. I think most of them have probably just copied the same old thing they found somewhere else, word for word, and changed a few details to thinly veil the plagiarism.

Note that the problem with these howtos is not so much that they are all the same as it is that none of them explain anything. All we get is a cryptic set of examples with zero explanation of what each step does, at all. You have to guess, and hope for the best. Perhaps worse is that all of the examples are basically code snippets taken out of context. At least if an actual live code sample that includes the entire source file was presented there would be more to work with when trying to figure out for myself what the code means and how it works. Instead, there is nothing of the sort.

It is a bit like presenting someone with a "gentle" introduction to programming by showing a code sample that consists of nothing but a tree-recursive function in Common Lisp with no indications of how to compile the code or make it call an appropriate interpreter, nor of how to do anything with the function's return value, nor of what data it takes as input, nor of how the syntax works, nor of what each function does. Oh, yeah, and another thing: most of these howtos do not include any code indentation at all, so it looks like crap on top of being almost useless.

. . . so I just looked up some simple introduction to how an RSS feed is supposed to look, and cobbled together a way to produce that from scratch. It is really sickening that there is an open source library out there -- not just out there, but part of the standard distribution of Ruby -- that is so poorly documented that one basically has to already know how to use it to get any use at all out of what little documentation exists.

In general, open source software is pretty well documented, but Ruby has this problem that, for some reason, people seem to think that providing nothing but a skeleton for documentation with zero fleshing out, basically just a glorified function name list in an almost non-searchable form, is "documentation".
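To give a sense of how little is involved in producing a feed by hand, here is a minimal sketch of building an RSS 2.0 document from plain strings. This is not the actual Lump feed generator. The method names (`xml_escape`, `rss_date`, `rss_item`, `rss_feed`), the hash keys, the channel description, and the timestamp are all assumptions made up for illustration, and the output has not been run through a feed validator.

```ruby
# Characters that must be escaped in XML text, and their replacements.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Escape text so it is safe to place inside an XML element.
def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Format a Time in the RFC 822 style that RSS 2.0 uses, always in UTC.
def rss_date(time)
  time.getutc.strftime('%a, %d %b %Y %H:%M:%S +0000')
end

# Build one <item> from a hash with :title, :link and :date keys.
# The link doubles as the guid, which RSS treats as a permalink by default.
def rss_item(entry)
  [
    '    <item>',
    "      <title>#{xml_escape(entry[:title])}</title>",
    "      <link>#{xml_escape(entry[:link])}</link>",
    "      <guid>#{xml_escape(entry[:link])}</guid>",
    "      <pubDate>#{rss_date(entry[:date])}</pubDate>",
    '    </item>'
  ].join("\n")
end

# Build the whole document: a channel with its three required elements
# (title, link, description) followed by one <item> per entry.
def rss_feed(channel, entries)
  [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<rss version="2.0">',
    '  <channel>',
    "    <title>#{xml_escape(channel[:title])}</title>",
    "    <link>#{xml_escape(channel[:link])}</link>",
    "    <description>#{xml_escape(channel[:description])}</description>",
    *entries.map { |e| rss_item(e) },
    '  </channel>',
    '</rss>'
  ].join("\n")
end

# Illustrative data only; the description and timestamp are placeholders.
puts rss_feed(
  { title: 'blogstrapping', link: 'http://blogstrapping.com',
    description: 'Example channel description' },
  [{ title: 'The Lure Of Features',
     link: 'http://blogstrapping.com/?page=2010.276.18.25.01',
     date: Time.utc(2010, 10, 3, 18, 25, 1) }]
)
```

The whole thing is string assembly plus escaping, which is roughly why rolling it from scratch can beat deciphering an undocumented library.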

Bad News for RSS

My RSS feed does not validate. Maybe I will fix that soon. Sorry. Hopefully it at least works properly in your feed reader.

The Sourcecode Network http://blogstrapping.com/?page=2010.275.23.12.32 http://blogstrapping.com/?page=2010.275.23.12.32 Fri, 01 Jan 2010 00:00:00 +0000

The Sourcecode Network

I watched a movie today. Perhaps you have heard of it. This movie is called The Social Network, and it is the dramatized, fictionalized story of Facebook and one of its founders, Mark Zuckerberg. There is a lot I could say about this film, but none of it is really relevant to what I am about to say. There is a lot I could say about Facebook and Zuckerberg, too, but even Facebook itself (to say nothing of Zuckerberg) is not really relevant to what I am about to say. Facebook is just a metaphor, for now -- a better metaphor in this case than the all too common use of a car as a metaphor for anything related to information technologies. There is a lot I could say about source sharing sites like SourceForge and GitHub, too. I am here to talk about that.

GitHub is Facebook for coders. It is a social network in which your profile information is not your birthday, relationship status, and a photo; it is instead your contributor memberships, project status, and source code. I suppose you might say that SourceForge is MySpace for coders. I do not really know what that makes Bitbucket, which -- of SourceForge, GitHub, and the probably dozens (by now) of such sites -- is the only source hosting site where I currently have an account. Maybe I should rectify that at some point. In any case, maybe Bitbucket is Orkut or Friendster or Google Buzz or something like that. Actually, I think Google Code is Google Buzz, while Launchpad is probably the OkCupid of these sites.

I think the metaphor breaks down before I find an appropriate choice for Bitbucket, for some reason. Maybe that is a good thing. On the other hand, if Diaspora doesn't flop, maybe that will be the metaphor for Bitbucket's place in this code hosting site niche.

The point is that these are not just code hosting sites. To the extent that they encourage collaboration, they are code hosting social networks. People with accounts can "follow" or "friend" or otherwise monitor the accounts of people they like -- or, more precisely, people whose code they like. They can also create groups, or projects, that other people can then join or monitor. Messages can be sent. Status messages for your code can be posted for others to peruse. Conversations occur in issue trackers and private messages and even in the code itself. Have you ever stopped to consider just how social it really is to clone a software repository at Bitbucket, make some changes that solve a need for that project, then send a pull request to the original project's "owner"?

While you are at it, consider how much such code hosting sites tend to discourage antisocial coding. You have to pay real money to have projects that are not necessarily public! Well, that has slacked off a little bit since Bitbucket's recent acquisition by Atlassian, but even now you cannot give more than five people (yourself included) access to private repositories hosted by your Bitbucket account unless you have a paid account. Essentially, you have to pay money for the privilege of being even remotely antisocial on Bitbucket.

While GitHub seems to be the default social code network of the near future, there is nowhere near the kind of single-site dominance in that market as exists in Facebook's market, where things are getting so ridiculously dominated by a single provider that the whole of the Web is sprouting Facebook "Like" buttons and "Login Using Facebook" links in the most unlikely of places. Part of this reduced dominance of the player with the most buzz in code hosting, I think, has to do with the fact that sites like Bitbucket, GitHub, and Launchpad are each focused on a single version control system. In fact, these three are the de facto "official" code hosting sites for their respective DVCSes: Mercurial, Git, and Bazaar. If you are only just getting into the idea of code hosting on the Internet, and you use Mercurial, Bitbucket is the obvious place to do it. The same goes for Git/GitHub and Bazaar/Launchpad. This, of course, limits the number of people who might choose one or another of these sites as the primary choice for code hosting, because there is no consensus for what tool people should use as a VCS.

Sites that are less one-trick ponies, that support more than one VCS, lack the obvious initial draw of those that are regarded as some kind of "official" choice for a given VCS. They also, generally, do not have their shit together quite as well -- because they cannot specialize, and keep things simple, the way the one-trick ponies can, and users are still motivated to pick a single VCS for most purposes and stick to it. As things currently stand, a multi-VCS hosting site seems mostly to be stuck with either lacking a lot of the social networking capabilities of something like Bitbucket or suffering community disunity amongst the users of different VCSes. They also tend to be slow to adopt new VCSes, because of course they do not want to waste time and energy adding a new VCS to the system, and maintaining it, only to find that it falls by the wayside and never really picks up any user base to speak of, as happened with unfortunate examples like Codeville (whose Website at codeville.org has vanished). Meanwhile, the increasing sense that every VCS needs a code hosting site will pretty well ensure that someone will create a site devoted specifically to a new VCS early on, before it has "proven" itself in terms of popular appeal, effectively cock-blocking the multi-VCS hosting sites.

Ultimately, though, I think this fragmentation in the code hosting market is a good thing. Bitbucket, GitHub, and Launchpad thrive, and grow, and provide an incredible amount of value for their regular users. They encourage and foster open source development, innovation, and collaboration. They do for open source projects what closed source advocates might claim they would do to those projects -- they encourage forking, make it easy. This is a positive thing, because it keeps open source projects honest and, more to the point, it makes it easier to try out a new approach and ultimately merge it back into the original project, or at least share ideas back and forth. The only downside is the mutual incompatibility of the DVCSes that go with those sites, and that problem is being solved a little at a time by repository translation tools.

Take the examples of FreeBSD, NetBSD, and OpenBSD, which are three forks of a single original BSD Unix codebase. They are entirely separate projects, each managed in a very different way from its siblings. Despite this, they have all been sharing code fairly promiscuously for about as long as they have existed, and all of them benefit from this collaborative sharing. Imagine how wonderful it will be when they are all standardized on version control via some DVCS that allows such easy forking, merging, and patch sharing. They will probably give birth to some kind of special-purpose analog of sites like Bitbucket, in time, to ease the process of collaboration and sharing specifically between these three projects.

Now let us extend that idea to every little basement or off-time open source code project conceived and executed by one or two or maybe three people, all of whom can search out similar small projects on a common Web-based platform like Bitbucket. Consider as well the possibilities that come from becoming a known coder by the popularity of one's projects, or by the influence of code submitted to others' projects. This is the power of the social network. This is not the social network for Ivy League student bodies, for sales managers and petty bureaucrats, or for college sports has-beens and high school seniors looking for a hook-up, though. This is the social network for people whose social interactions online are oriented around actually getting something done.

This is the social network for coders, for people whose several hours a day spent in front of a computer outside of work time involves producing valuable innovation rather than posting brain-dead comments at YouTube. This is social networking as a multiplying force acting on the productivity of insomniac computer geeks with brilliant ideas, reducing the cost of innovation while increasing the ease of propagating ideas across the world.

This is also, ultimately, likely to really undermine the authoritarian control freaks' grip on the Internet even as they try to establish that grip in the first place. If you think Napster was a disruptive technology for the record industry, just wait until you see what can be done with distributed version control systems aided by the power of social networking. The most interesting part of the disruptive potential of social DVCS networks is that it may be that almost nobody will notice until long after the effects have already made their mark. Nobody really pays attention to the code people write on their own time, except maybe organizations like the NSA if it happens to be cryptography code.

It is said, often enough, that the Internet treats censorship like damage and routes around it. On the Internet, source code is power, and a distributed version control system puts the power of source code in the hands of the people who most agree with the idea that censorship is damage (at least where their code is concerned) -- who depend most on the effects of that idea put into practice. A distributed version control system turns the sharing of that code into an act that is perfectly natural, easy and rewarding. Turning DVCSes into a social network unifies that power in a way that cannot be easily combatted.

Eventually, social code hosting sites like Bitbucket will surely become obsolete. Something better will come along, just as sites like Bitbucket were "something better" compared to SourceForge when they came on the scene, and just as SourceForge was preferable to FTP and email and postal mail. I think that they will likely become obsolete by way of DVCSes being extended into, or replaced by, distributed social version control systems, so that centralized code hosting like Bitbucket provides will no longer even be necessary. I, for one, am chomping at the bit to see that happen.

In the meantime, though, you can have your Facebook if you want it. Sure, I have an account there, but I do not really use it or like it. I do not even particularly condone its use, given the sketchy, underhanded way Zuckerberg's monstrosity abuses the trust of its user base. Instead of using a Social Networking site, I will focus my time and effort on a Sourcecode Networking site; instead of Facebook, I will use a Sourcebook, to repurpose a term. If you want to babble about what you had for breakfast on your "wall" at Facebook, go ahead and waste the time. I, on the other hand, will share something useful that I created with my own faculties, something worth sharing, on Bitbucket.

You can use Facebook and share it with your two hundred and fifty "friends" who would not even loan you $50 each to pay your medical bills if you want to, but really, I would prefer you do something productive, innovative, and valuable, and share it with me on Bitbucket. Let's do something good with our time. Save your babbling for advocacy and contemplation on your Weblog.

By the way, if you like, you can use this Weblog CMS application for free. You can find it on Bitbucket. Most of you probably would not like it much, though; it's pretty unsophisticated.

Blog Strap Ping http://blogstrapping.com/?page=2010.274.17.39.40 http://blogstrapping.com/?page=2010.274.17.39.40 Fri, 01 Jan 2010 00:00:00 +0000

Blog Strap Ping

Maybe it is a little late for my Hello World entry at blogstrapping. Such a thing is usually the first entry in a new Weblog, rather than the fifth. Oh, well. Better late than never, I hope.

> ping -c 4 blogstrapping.com
PING blogstrapping.com (69.89.25.183): 56 data bytes
64 bytes from 69.89.25.183: icmp_seq=0 ttl=52 time=78.644 ms
64 bytes from 69.89.25.183: icmp_seq=1 ttl=52 time=83.022 ms
64 bytes from 69.89.25.183: icmp_seq=2 ttl=52 time=74.842 ms
64 bytes from 69.89.25.183: icmp_seq=3 ttl=52 time=81.643 ms

--- blogstrapping.com ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 74.842/79.538/83.022/3.139 ms

When I first conceived of this thing (and registered the domain name) lo these many moons ago, the idea for the name came from "bootstrapping". I think one of my friends suggested the name, or at least helped inspire it -- probably Sterling (edit: I've confirmed that it was indeed Sterling who suggested the name). Regardless of the source, though, the similarity of the name to "bootstrapping" was appropriate to the concept.

The idea was that this would start as two things in one: both a content management system development project and a devlog. I envisioned primarily talking about the CMS, at least at first, and gradually broadening the subject matter until it became a more general purpose development Weblog.

By making the Weblog dependent on the CMS, and working on them both simultaneously, development of the CMS would then be encouraged and driven by the needs of the blogstrapping Weblog itself. That development, in turn, would give me more material to discuss for the Weblog so that it would not be likely to languish for long periods without fresh content.

This all arose in part because I have a fairly strong belief that, all else being equal, we get better quality software when the developers have a very personal stake in the quality of that software. I do not mean just a financial stake for developers selling or supporting their software professionally; I mean the kind of personal stake that comes from having to use the software.

Taken to its logical extreme, this means that software is more likely to be high quality if the developer has to use it while creating it from scratch. I am my own guinea pig in this, developing a CMS while using it even though the CMS itself is not yet anywhere near what I would call complete (though more substantially complete already than I would have guessed I would accomplish in less than a week when I first conceived of the idea).

Of course, it has occurred to me that "blogstrapping" sounds like a BDSM term, and I have had that pointed out to me by a friend. Just overlook that for me, and try to stay on-target while reading this. It is more about software development quality being improved through the process of bootstrapping it into existence, where using the software serves as the bootstraps for developing it -- or, conversely, bootstrapping the use of the software into existence with development serving as the bootstraps for using it.

There it is. Welcome to blogstrapping.

Expedients Should Be Temporary http://blogstrapping.com/?page=2010.274.15.39.58 http://blogstrapping.com/?page=2010.274.15.39.58 Fri, 01 Jan 2010 00:00:00 +0000

Expedients Should Be Temporary

I started out using YAML as my Lump configuration file format, because I did not have anything specific in mind to use. YAML is easy to get started using. As long as the contents of the file conform to the YAML format, this is all it takes to get a single data structure containing all your configuration data into your program (in Ruby, the source language for Lump):

path =   File.dirname(__FILE__)
require 'yaml'

# more code here

config = YAML.load_file(path + '/lump.conf')

It's simple enough, and it gets the job done. The configuration data is even hierarchically arranged in config.

This does not mean all configuration data is well-suited to YAML compatible storage, however. For instance, the way lump.conf needs to be put together with its current philosophy of configuration management, I end up with code like this in Lump:

config['menu'].each do |a|
  print   %Q{        <li><a href="#{a[1]['url'].to_s}">}
  puts    a[0]['name'].to_s << '</a></li>'
end

I flinch at what appears, from this code, to be the inclusion of "magic numbers", namely the [0] and [1] parts of the code used to retrieve URLs and link names from config['menu']. I could probably figure out a better way to organize a YAML compatible data file so that this is not necessary, but that seems like both more work than it is worth and effort spent propping up a data format that is not well suited to the task at hand anyway. Far better, I think, to simply write a method or two of my own for getting configuration data out of lump.conf.
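For comparison, here is a minimal sketch of one way the YAML data could be organized so that positional indexes are not needed. The file contents, link names, and URLs are made up for illustration and are not Lump's actual lump.conf; the only change is that each menu item becomes a single mapping with name and url keys.

```ruby
require 'yaml'

# Hypothetical lump.conf contents: each menu item is one mapping with
# named keys, rather than a two-element list indexed by position.
conf_text = <<~CONF
  menu:
    - name: Home
      url: /index
    - name: Bugs
      url: /bugs
CONF

config = YAML.load(conf_text)

# Build one list item per menu entry, looking values up by key name.
menu_items = config['menu'].map do |item|
  %Q{        <li><a href="#{item['url']}">#{item['name']}</a></li>}
end

puts menu_items
```

Whether this is worth the trouble is a separate question; it only shows that the [0] and [1] lookups come from the layout of the data rather than from YAML itself.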

I do not know when I will get around to tackling that task, but I have come to the conclusion that it is something I want to do at some point. I want my config file to not only provide for cleaner looking code within the Lump source file(s), but also to be more "intuitively" human-manageable and less dependent on a large third-party library.
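As a rough idea of what "a method or two" might look like, the sketch below reads menu entries from a plain, line-oriented format using only core Ruby. The "Name | URL" layout and the parse_menu method are assumptions invented for this example, not a format Lump uses.

```ruby
# Hypothetical line-oriented format: "Link Name | /url", one menu entry per
# line. Blank lines and lines starting with '#' are ignored.
def parse_menu(text)
  lines = text.each_line.map(&:strip)
  entries = lines.reject { |line| line.empty? || line.start_with?('#') }
  entries.map do |line|
    name, url = line.split('|', 2).map(&:strip)
    { name: name, url: url }
  end
end

sample = <<~CONF
  # menu entries
  Home | /
  Bugs | /bugs
CONF

menu = parse_menu(sample)
menu.each do |entry|
  puts %Q{<li><a href="#{entry[:url]}">#{entry[:name]}</a></li>}
end
```

The calling code then refers to entry[:name] and entry[:url], with no third-party library involved.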

Using YAML was an expedient, rather than a conscious and well-considered design decision. Eventually, I will need to think this through a bit more and make a decision I know to be appropriate to the design philosophy of Lump.

Take Your Discussion Elsewhere http://blogstrapping.com/?page=2010.274.12.01.11 http://blogstrapping.com/?page=2010.274.12.01.11 Fri, 01 Jan 2010 00:00:00 +0000

Take Your Discussion Elsewhere

There are many different types of content management software. Some of them are designed as collaborative tools, for sharing documents between people working on some project together, allowing them to collectively edit those documents. Others are designed with discussion in mind, like a bulletin board or forum for a flamewar quorum. Still others are community sharing venues, where people present what they have found or made to the world -- or at least to each other -- for review and admiration.

Then, of course, there are the moderated or single-user publishing platforms. This category includes Weblogs. This is the kind of content management system Lump is meant to be.

Keepin' It Real

Real Simple, that is.

Adding a comment system to this Weblog would require quite a lot more code than it takes to make a Weblog. In fact, I am pretty sure that managing a commenting system is the primary reason that Weblog applications need RDBMSes on the back end, though of course I've never built my own Weblog CMS from scratch before, so I don't have quite the same experience with the bottlenecks and resource needs of such things that, say, the WordPress and Blogger developers have.

I want to avoid that kind of complexity in Lump itself, so for the time being at least there are no plans to integrate discussion with Lump.

Getting Social

The term Web 2.0 doesn't seem to get the same buzz it did just a couple of short years ago, at least among the technorati. I do not tend to hang out with a bunch of buzzword addicted middle-managers, though, so for all I know they might still be talking about the term as if it was the name of the Universal Solvent.

In any case, that Web 2.0 stuff has, as far as I can tell, faded into the background noise. What seemed shiny and new and special a couple of short years ago has, to a substantial degree, become the norm. I do not just mean "rounded corners" and "AJAX" and other front-end interface design features that had become associated with "Web 2.0", either; I mean the social web, leveraging the power of the masses to enhance the value of a Website, trading information for viewership and viewership for information in a virtuous circle like Hacker News, reddit, twitter, and Wikipedia.

I do not remember for sure who it was that said he thought that discussion should most likely be mediated through online social networks, offloading the responsibility for that kind of content management to "the cloud". I think it might have been Reg Braithwaite. In any case, the idea was that if people with something to say really want to say it, they can do so in their own Weblogs. For those who do not wish to manage their own Weblogs, they always have social link/news discussion sites like reddit to satisfy that need. At first, I was inclined to think that was a terrible idea, but I tried to keep an open mind.

I have finally come around to that way of thinking, at least for the purposes of blogstrapping -- and blogstrapping is at the moment the only site using Lump, and I am Lump's target userbase. As such, I have "added" my "comment system" to Lump already, in the form of a link to reddit. That link you should see at the bottom of every blogstrapping entry points to the submission page at reddit if the entry has not yet been submitted there, or to a discussion in progress if it has. I would like to have it list the number of comments (if any) already made there, but I have not yet figured out how to do that in a non-kludgey way. The good news is that clicking that link will not commit you to starting a discussion there; you can back out if you see that there is no discussion yet.
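A link of that sort can be built with nothing more than Ruby's standard URI library. This is a minimal sketch, not Lump's actual code: the reddit_submit_link method is a name made up here, and the /submit endpoint with url and title parameters is assumed.

```ruby
require 'uri'

# Build a link to reddit's submission page for one entry.
# Assumes reddit accepts "url" and "title" query parameters on /submit.
def reddit_submit_link(entry_url, title)
  query = URI.encode_www_form(url: entry_url, title: title)
  "https://www.reddit.com/submit?#{query}"
end

puts reddit_submit_link('http://blogstrapping.com/?page=2010.274.12.01.11',
                        'Take Your Discussion Elsewhere')
```

URI.encode_www_form takes care of escaping the entry's own query string, so the ?page= part of the address survives the trip intact.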

It is not a perfect solution, at least not yet, but it is something. If you have any ideas for how to improve it, do not hesitate to let me know via the issue tracker for the Lump project at BitBucket. You can get there via the "Bugs" link in the menu box on this page.

While I am at it, I do know for sure it was Mr. Braithwaite who said something else about not hosting comments that, for the nonce, rings true for me:

I lose a lot of good feedback, but I also shed myself of a damaging temptation to pander to the crowd. When blogs become "conversations," they also tend to converge on a common group-think.

I am increasingly inclined to agree with this estimation of the problem of comment discussions.

No Hassle

One of the things that convinced me to avoid running my own comment system is, of course, the fact that it needs constant management. It seems that every time I post some new entry at SOB, I get two or three or maybe even half a dozen or more legitimate comments -- following which I get a trickle of spam comments that lasts for a month or more.

This is only the smallest of security issues with a comment system. Of course, simplicity is security, and complexity is fairly antithetical to that. Worse yet, taking input from arbitrary strangers tends to increase the danger to security. I really do not want to have to deal with that, either on the back end when maintaining the design of the software or on the front end when trying to moderate and filter comments as the site administrator.

I Can't Fight City Hall

Finally, of course, there is the danger of legal hassle. Law and precedent protect Websites, at least somewhat, against civil and criminal responsibility for content that has been contributed by visitors to the site -- and rightly so. Even in a best-case interpretation of the legal state of things, though, content hosts must be prepared to take down content for which they have received complaints that could lead to litigation, which amounts to selective censorship on behalf of people the content hosts have probably never even met. Most of this kind of problem comes from copyright enforcement, of course, but there are other areas of law that could also cause problems.

Even worse, those protections for content hosts are constantly under attack, and they may ultimately be worn down to the point that it is simply not reasonable to host contributed content any longer without a huge, corporate stable of lawyers at one's beck and call. While one might argue that I should just host contributed content until that day comes, I have decided to take a more proactive and "safe" approach, and just farm out third-party content hosting to others.

After all, when rights and liberties are worn away, the change tends to occur by way of some "test case", where those wearing it all away attack some hapless schmuck to set precedent, and make an example of the unfortunate soul. I have no interest in being that target, and while the chances are slim, it is easier for me to avoid the problem than it is to try my luck anyway -- at least in the case of Lump and blogstrapping.

I still host contributed content elsewhere. I have not decided what I will do with that, yet, so I'll just keep on hosting it for now.

The End(?)

The "final" word -- at least for now -- is that anything here you feel like discussing will have to be discussed by visiting some third-party Website that hosts contributed content. The link to submit to reddit at the bottom of each entry, as of this writing, is a hint. I would like to add a similar Hacker News link, but have not yet bothered to sort out how to write a submission link for that site. Feel free to let me know via Lump's issue tracker or via reddit discussion (whichever is most appropriate to the approach you want to take to letting me know) if you have ideas for how to do that.
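For what it is worth, a Hacker News submission link might be written along these lines. This is only a sketch under an assumption: it relies on the submitlink endpoint used by the Hacker News bookmarklet, with u for the address and t for the title, and hn_submit_link is a hypothetical helper name.

```ruby
require 'uri'

# Build a Hacker News submission link for one entry.
# Assumes the "submitlink" endpoint with "u" (URL) and "t" (title) parameters.
def hn_submit_link(entry_url, title)
  query = URI.encode_www_form(u: entry_url, t: title)
  "https://news.ycombinator.com/submitlink?#{query}"
end

puts hn_submit_link('http://blogstrapping.com/?page=2010.274.12.01.11',
                    'Take Your Discussion Elsewhere')
```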

Lump Interface Philosophy http://blogstrapping.com/?page=2010.274.00.55.17 http://blogstrapping.com/?page=2010.274.00.55.17 Fri, 01 Jan 2010 00:00:00 +0000

Lump Interface Philosophy

One of the key characteristics of Lump at this time, and a characteristic I would like to maintain in the future, is a simplistic quality of its administration interface. No matter how slick the Web interface for the visitor might be, I want the administrative interface to require no more than two things:

  1. a text editor

  2. a file transfer tool

Neither of these requirements should be specific to Lump. At the time of this writing, the tools I use are:

  1. Vim

  2. OpenSSH

Writing a new entry for blogstrapping follows this process, as of this writing:

  1. Write it in Vim, using Markdown syntax so it is a human-readable, clear, basically plain-text file.

  2. Save it with a datestamp for its name (plus the .txt filename extension).

  3. Upload it with the scp utility, a part of the OpenSSH suite.
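The datestamp in step 2 can be generated rather than typed. The sketch below assumes the layout seen in blogstrapping's page addresses (year, day of the year, hour, minute, second); entry_filename is a name invented for the example.

```ruby
# Build an entry filename such as "2010.274.00.55.17.txt".
# Layout assumed from the page URLs: year, day of year, hour, minute, second.
def entry_filename(time = Time.now)
  time.strftime('%Y.%j.%H.%M.%S') + '.txt'
end

# 1 October 2010 is day 274 of the year.
puts entry_filename(Time.utc(2010, 10, 1, 0, 55, 17))
# prints "2010.274.00.55.17.txt"
```

Because every field is zero-padded, sorting the filenames as plain strings also sorts the entries chronologically.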

As for the Web interface for visitors, it should be simple to manage. At the time of this writing, it basically consists of two things as well:

  1. the RHTML template that makes up the main index.rhtml file that controls content structure

  2. a css file in the css directory that controls presentation

At present, all that appears here at blogstrapping is the main archive menu, headings, a footer, and Weblog entry content. Some menu-style detail should be added soon, in that blank right-hand margin area in the page design as of this writing. I have not decided on everything that will go there, but at least three things come to mind:

  1. a link or two that lead(s) to discussion (more on that in a later entry)

  2. a link to the BitBucket issue tracker for Lump

  3. a link to a contact page

At the moment, it seems that anything else is negotiable.

The Virtues of Lump http://blogstrapping.com/?page=2010.273.12.17.19 http://blogstrapping.com/?page=2010.273.12.17.19 Fri, 01 Jan 2010 00:00:00 +0000

The Virtues of Lump

The three key virtues of a programmer are laziness, impatience, and hubris. These may sound like vices rather than virtues, akin to sloth, wrath, and pride. Would those pillars of Catholic virtue -- diligence, patience, and humility -- not be better choices? Such is the apparent perversity of the person who loves programming in the eyes of the common folk, however, that notions of virtue and vice are turned on their collective head.

Larry Wall is at least one of the originators of the idea that laziness, impatience, and hubris are the chief virtues of a programmer. In the "bible" of Perl programmers, Programming Perl (also known as the "Camel book"), these virtues are described thusly:

  • Laziness - The quality that makes you go to great effort to reduce overall energy expenditure. It makes you write labor-saving programs that other people will find useful, and document what you wrote so you don't have to answer so many questions about it. Hence, the first great virtue of a programmer. Also hence, this book. See also impatience and hubris.

  • Impatience - The anger you feel when the computer is being lazy. This makes you write programs that don't just react to your needs, but actually anticipate them. Or at least pretend to. Hence, the second great virtue of a programmer. See also laziness and hubris.

  • Hubris - Excessive pride, the sort of thing Zeus zaps you for. Also the quality that makes you write (and maintain) programs that other people won't want to say bad things about. Hence, the third great virtue of a programmer. See also laziness and impatience.

Some time later, Larry Wall described his motivation for creating the Perl programming language in an interview with Marjorie Richardson, casting it in terms of these three virtues:

I could have solved my problem with awk and shell eventually, but I possess a fortuitous surplus of the three chief virtues of a programmer: Laziness, Impatience and Hubris. I was too lazy to do it in awk because it would have been hard to get awk to jump through the hoops I was wanting it to jump through. I was too impatient to wait for awk to finish because it was so slow. And finally, I had the hubris to think I could do better.

Software advocates like Eric Raymond and entrepreneurial programmer luminaries like Paul Graham assign the label "hacker" to the most creative and talented of computer programmers. It is this category of programmer that is most easily defined by laziness, impatience, and hubris, as the terms are described by Larry Wall and Programming Perl co-authors Randal L. Schwartz and Tom Christiansen. This conception of the motivations of software development stands in contrast to the typical motivations of what some call "daycoders": people who write code purely for the paycheck, who are not interested in improving the state of the art of computer software or the challenge of creation per se, but simply do so toward some other, fully pragmatic end, without any personal concern for the process itself.

A tale of three virtues affected me recently. Over the years, dealing with content management systems designed primarily for Weblog use, I have quickly become dissatisfied with many common characteristics of this category of software application. They tend to:

  • . . . be big, bloated, featuritis-afflicted, spaghetti-coded monstrosities that have essentially zero relationship to the concept of elegant design.

  • . . . break things during upgrades.

  • . . . be difficult to configure to meet my needs.

  • . . . load slowly.

  • . . . impose unnecessarily large resource demands on the Web server, such as the typical requirement for dozens of database hits for every page load in WordPress.

  • . . . behave as though they know what the user wants better than the user knows, thus making it difficult to do what I actually want to do.

  • . . . be highly prone to security vulnerabilities.

  • . . . be written in PHP, which is really a miserable language, or rely on frameworks that are largely unnecessary for my simple CMS needs.

Impatience

I have become increasingly impatient with all these shortcomings of the common CMS over time. Despite diligently searching for options that would satisfy my needs, and being willing to bend on some of those points of dissatisfaction with common CMSes suitable for Weblog use if it would mean solving the problem of other, more annoying points of dissatisfaction, I never actually found anything to suit my preferences.

This made impatience the first of the three virtues of a programmer to influence me with regard to CMSes.

Hubris

My hubris led me to consider the possibility of writing my own CMS. I gradually came to the conclusion that, if I want something done right in terms of developing a decent CMS for a Weblog, I would have to do it myself -- where "right" means "right for me". Given a little time and effort, I decided that I should be able to do so without too much trouble. All it would take was a little diligence.

This made hubris the second of the three virtues of a programmer to influence me with regard to CMSes.

Laziness

Did you notice the use of the term "diligence" above? My laziness served at first as a discouragement from writing my own CMS. That is, in fact, a good thing most of the time -- because when something else is serving your needs enough that your laziness does not overcome it, there is always the possibility that you should not bother putting in the effort. In this case, however, it was misapplied laziness. If ever there was a piece of software I should have written but did not do so, it was a CMS that satisfied my relatively simple needs much better than garbage heaps like WordPress.

For several years, I avoided writing my own CMS, out of laziness. Ultimately, only laziness itself would prove a powerful enough motivator to overcome that laziness. Fight fire with fire, as the saying goes.

I believe it was 27 September 2010 that I started formatting some IRC logs for display on the Web. I was hacking away at the logs in Vim, using its powerful editing capabilities to ease the process of eliminating extraneous text and adding markup, when I decided that I was doing far too much work. At least part of this should be handled automatically, especially given the way the BlueCloth library (a Markdown implementation) for the Ruby language mostly Does The Right Thing when handed plain text of the sort found in IRC logs (with minor edits like adding double-spacing).

In less than twenty minutes -- most of which time was spent deciding what I wanted to do and how I wanted to do it -- I wrote a relatively simple script that:

  • . . . works by way of eruby.

  • . . . produces a page formatted for the Web, using CSS for easily adjusted presentation.

  • . . . produces a menu of available logs by scanning for files whose names fit a particular pattern.

  • . . . generates pages with well-formatted content from those files, using Markdown syntax.

  • . . . does not have any of the common shortcomings of CMSes I described above.
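The menu-building part of such a script can be sketched with Ruby's standard library alone. This is not the original script: log_menu, MENU_TEMPLATE, and the ?page= link format are assumptions for illustration, and the Markdown step (BlueCloth in the original) is left out so the example runs without extra gems.

```ruby
require 'erb'
require 'tmpdir'

# Collect the base names of files matching a pattern, newest name first.
def log_menu(dir, pattern = '*.txt')
  paths = Dir.glob(File.join(dir, pattern))
  paths.map { |path| File.basename(path, '.*') }.sort.reverse
end

# Hypothetical menu template; a real page would also pass each file's text
# through a Markdown library before inserting it into the page.
MENU_TEMPLATE = ERB.new(<<~HTML, trim_mode: '-')
  <ul>
  <%- names.each do |name| -%>
    <li><a href="?page=<%= name %>"><%= name %></a></li>
  <%- end -%>
  </ul>
HTML

# Demonstrate with two throwaway files in a temporary directory.
Dir.mktmpdir do |dir|
  %w[2010.273.12.17.19 2010.274.00.55.17].each do |stamp|
    File.write(File.join(dir, "#{stamp}.txt"), "Entry #{stamp}\n")
  end
  puts MENU_TEMPLATE.result_with_hash(names: log_menu(dir))
end
```

Adding an entry then amounts to dropping another file with a matching name into the directory; the menu picks it up on the next page load.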

When I was done, I suddenly realized I had created the basis for that CMS I had been putting off for so many years. My laziness-fueled procrastination was finally overcome, not by willpower or determination, but by greater laziness.

Welcome to Lump

I came up with the term "lump" by running a simple script that randomly selects words from a computer's spelling dictionary. The first two words it selected were "interinsurer" and "juniorship", but I decided neither of them was really appropriate. Lump, then, became the name of my incredibly simple content management system.
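A script of that kind takes only a few lines of Ruby. The version below is a guess at the general idea rather than the script actually used; the default dictionary path is an assumption that holds on many Unix-like systems, and random_words is a made-up name.

```ruby
# Pick `count` random words from a dictionary file with one word per line.
# The default path is an assumption; it is common on Unix-like systems.
def random_words(count, path = '/usr/share/dict/words')
  words = File.readlines(path, chomp: true).reject(&:empty?)
  words.sample(count)
end

# Only run the demonstration when the assumed dictionary file is present.
puts random_words(3) if File.exist?('/usr/share/dict/words')
```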

Perhaps the first two letters stand for Lightweight and Unpretentious; thanks, n8, for making that observation. Maybe someday I will come up with a complete backronym for LUMP. Lightweight Unpretentious Murphy Protection -- because if it's simple enough, not much can go wrong -- might work; thanks for the suggestion, Sterling. In the meantime, though, I am satisfied with the name Lump.