December 31, 2011

Sprint still really sucks at this "internet" thing. When it works, it's great, but I'm sitting in Einstein's and WITHOUT MOVING the signal goes from three bars to nothing about twice a minute. When it has three bars I don't always have internet. I get bursts of updates, and then can't reply to them.

I left T-mobile for a good reason and stand by it, but _dude_. Sprint, please get your act together.


December 30, 2011

Updated the Funtoo Wiki with info about Aboriginal Linux and what Daniel Robbins and I were trying to make work when he was in Austin just before my trip to Florida.

Basically, it's the old gentoo-bootstrap stuff, except A) done on top of lfs-bootstrap instead of the basic system image, B) done by somebody who actually understands how portage _works_. (Doesn't get much better than the founder of Gentoo.)

Meanwhile, I'm poking at toybox stuff. I should get an aboriginal linux release out, but I still have a couple days of vacation left.


December 29, 2011

Home from Florida, recovering from the driving equivalent of jetlag, and poking at Aboriginal and Toybox.

On the Aboriginal Linux front, I've gotten sparc working, baseconfig-busybox checked in and building LFS, and various random cleanups. Basically all that's left before the release is doing the release notes, and possibly an upgrade to the 3.2 kernel since it's so close and doesn't require new patches. (If there are obvious regressions, I may not hold the release for it. Time to ship.)

On the toybox front, added TOYBOX=toybox to aboriginal so it can build toybox in host-tools and simple-root-filesystem. (This doesn't suppress the busybox build yet, it just installs over the equivalent busybox commands.)
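
(For the curious, the invocation looks something like this; I'm assuming you'd set it the way you set the other aboriginal config knobs, via the environment, and the target name is just the one I've been testing:)

TOYBOX=toybox ./build.sh i586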

Then I did a little "boxen.sh" script, which I haven't checked in because it doesn't _quite_ fit in either project but falls squarely between them. That goes through the record-commands.sh output and determines how many times each command is called during a build, and what it's linked to in build/host. For the current i586 build it gives me a display like this (trimmed a bit 'cuz it's long):

1 mksquashfs 
1 wc toybox
1 yes toybox
2 dd busybox
2 gzip busybox
2 head busybox
2 sh toybox
2 whoami busybox
3 chown busybox
3 id busybox
...
1424 gcc /usr/bin/gcc
2297 cc /usr/bin/cc
2440 mv busybox
2793 as /usr/bin/as
3467 expr busybox
7222 grep busybox
9748 echo toybox
23914 rm busybox
31238 cat toybox
32854 sed busybox

I.E. how many times was each command called, what command was it, and what was it symlinked to? So I can grep for busybox (35 commands total) or toybox (14 commands total), and see how often each one's used, and whether it's worth grepping the logs to see what command lines got called (things like dd may not need every corner case implemented yet to make the build happy) or not (23k rm instances are too much to look through).
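
The script itself is about as simple as it sounds; a rough sketch of the idea (the log filename and directory layout here are assumptions, this isn't the uncommitted version):

# count how many times each command shows up in the record-commands log,
# then report what build/host/$CMD is symlinked to (busybox, toybox...),
# falling back to a $PATH lookup for wrapped host binaries like gcc.
awk '{print $1}' build/cmdlines.txt | sort | uniq -c | sort -n |
while read COUNT CMD
do
  LINK=$(readlink "build/host/$CMD" 2>/dev/null || which "$CMD")
  echo "$COUNT $CMD $LINK"
done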

The top few most heavily used busybox apps are:

455 chmod busybox
506 basename busybox
606 egrep busybox
1188 mkdir busybox
1422 ln busybox
2440 mv busybox
3467 expr busybox
7222 grep busybox
23914 rm busybox
32854 sed busybox

Note how sed and rm are more calls than everything else combined. My first major busybox project was rewriting sed (although the code ownership there isn't nearly clean enough for me to just relicense, it was a derived work of what was already there so I have to do a new one). I already wrote a cp for toybox so rm and mv shouldn't be hugely different, it's all the dirtree stuff. (Which has a PATH_MAX limit in its current implementation, and should probably really be using openat() and friends, but that can change later.) I did an expr equivalent in java in 1998 for Quest Multimedia's precalculus tutorial applets. (That was great code, I was quite proud of it, too bad it was closed-source and died with the company that owned it.)

The "not quite sure what's involved" part is grep, since I dunno if I can rely on libc (bionic is crap) or if I should write my own regex engine (which I did years ago for OS/2 already; that code's long-lost but the principle's the same)... Then again sed's gotta use that to, so use libc now and supplement it later if necessary.

Basically, I can take most of the load off busybox during the build with the top few commands, and then work my way down the list: 35 commands isn't a whole lot. And then do the same analysis on the native LFS build, which shouldn't be hard since I copied the record-commands infrastructure to the native side and checked it in already. Then fill in my triaged SUSv4 list, then start on the other stuff busybox does that's worth doing.

I still have a few days off before work starts, and I'm rested and feeling better than I have in months. Let's see what I can get done...


December 28, 2011

The problem with "software as a service" is that it goes away. Not "oh noes, my site went down" away, I mean they keep "upgrading" things people used to like, without asking us, often into a form that isn't as appealing. We get no say in this, and usually we get no warning. We haven't got the option to keep using the old version we liked: it went away.

Twitter's done this a half-dozen times now, as I've ranted about here and various famous people (Wil Wheaton, Seanan McGuire, Diane Duane) have tweeted about their own annoyances with. Gmail's done everything from inflict "google buzz" on people (without asking, in the process leaking contact info of iranian dissidents when it auto-populated its social networking links) to the recent color changes making the site significantly less readable to many people. Livejournal has slowly strangled itself chasing whatever Facebook looks like this week, and now wordpress is doing it. Ho hum.

When the scrivener people screw up a new version, you have the option of sticking with the old version, because it's a piece of software you download and install on your own computer. With steam, not so much. My old phone's netflix client refuses to play any more shows until I upgrade to a new version, and the android marketplace app that it loads to perform the upgrade wants me to agree to Google's new "terms of service" to keep using a marketplace that worked just fine before without ever asking me to do this. (Google can die in a fire: the evidence suggests they broke the android marketplace for me because they're incompetent bastards taken over by lawyers who drove out all the engineering talent, I'm not signing anything).

With some things, like paypal, interactivity is an inherent part of what they do. But I've got a twitter app on my phone (tweetcaster) I've refused to upgrade because each new version got progressively more annoying (losing the ability to keep indefinite backscroll so I can READ ALL THE TWEETS, growing a "gap detected" thing you have to click on instead of automatically loading ALL THE TWEETS like it used to before twitter's servers refuse to serve them up anymore because they're 24 hours old and got flies on them or something...) and the version I stopped on still talks to twitter's servers just fine. (Ok, the one on my old phone lost the ability to send direct messages six months before I got a new phone, and didn't give me the extra character budget from URL shortening, but I was ok with that. It didn't rewrite "RT @user" retweets to look like Horrible Retweets with the WRONG ICON, like the version on my new phone inexplicably started doing shortly before I stopped upgrading it again.)

Yes, there are security implications of running old apps (the vaunted "java sandbox" is "still crazy after all these years"), but Linux containers may finally fix some of that, and when it's a tower defense game and you preferred the old textures? I want the option to keep the old version. (Sometimes they're good about making old versions available, but it's the exception. The one on my phone, all upgrades are permanent, take it or leave it, sight unseen.)

This probably comes across as "resistant to change, set in his ways", but from my point of view I am BUSY, I was COMFORTABLE with the old app, and adding an "accidentally report this user for spam" option with no confirmation dialog to the same menu as "copy this tweet to the clipboard" did NOT improve my comfort level using the thing. (They fixed that one eventually.) I want to upgrade on _my_ schedule, not yours, I want to try the new thing out before I decide whether to switch, and I want the _option_ to say no.

Some things I want to actually _own_.


December 26, 2011

Wow, busy couple of weeks.

Swamped at work leading up to the holidays: I got a bit burned out and fell behind. Mostly caught up before I had to leave, but not quite.

Daniel Robbins wandered through Austin and we spent an evening getting some of the basic Gentoo/Funtoo bootstrapping on top of aboriginal (on top of the Linux From Scratch build really). Basically we got portage to run but then hit the weird circular dependencies in the portage "system" packages, and Daniel went off to frown at them. Apparently package.provided was A) added after his time, B) doesn't work when USE flags are involved, which is always. But if you cd into the various portage tree directories and "emerge thingy.ebuild" directly, that seems to work. We're trying to get to the point you can run his Metro tool to create a new stage1, but there's lots of details to get right. For once, he came out of it with a bigger todo list than I did, which is something...
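
(I.E. something like this, with a made-up package and version:)

cd /usr/portage/sys-apps/baselayout
emerge baselayout-2.1.2.ebuild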

Lots of laundry and catsitting details elided over (and another brief visit from Camine, seeing the Master Pancake christmas show with Tryn which took two attempts because we got the date wrong the first time...), and then Fade and I drove to Florida to visit Ian (who is 11), Sean (who is something like 9), Carrie (who is 6) and Samantha (who is 4). My sister, who is behind all of this, normally lives in Minnesota but I flew her and the kids down to see my grandparents, plus an aunt and an uncle who all live in Florida. (A further cousin and two nieces were sick and we didn't get to see them.)

Tomorrow is the inevitable trip to Disneyland, or possibly world. If you bring small children into the state they won't let you out again unless you can show Proof of Disney.

I'd expected to get a lot of work on aboriginal linux and toybox done over the holidays. This has not happened.

If I sound somewhat disjointed and exhausted: yeah.


December 17, 2011

I keep thinking "toybox work seems harder than busybox was, am I really that burned out by sitting in a cubicle every day at work?"

The answer could easily be yes (something about cubicles just drains all the life out of me), but it's also that this is a harder job, because I'm trying to be reasonably standards compliant. Standards compliance keeps going beyond "nontrivial" and into the realm of "mind-bogglingly strange".

Take xargs, which was really low hanging fruit when I did a quick and dirty BusyBox version, but is incredibly fiddly in the standard.

What I initially implemented for busybox was basically just "each line of input becomes a separate argument, then you exec the rest of the command line with these arguments appended". My first stab at it didn't even support -n. But the standard requires a whole lot more than that.

To start with, newlines are not the only whitespace to separate arguments, any whitespace does, so if you "echo -e 'one\ntwo three\nfour' | xargs blah" you get four arguments to blah. So how _would_ you pass through an argument with whitespace in it? To do that, xargs is supposed to parse single and double quotes to collate arguments (but not across newlines). And when you parse quotes, backslash escapes have to work inside them so you can pass through the quote characters (and the escape character has to be able to escape itself). So there's a mess of quote parsing logic to churn through.
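
So the quoting behaves something like this (tested against stock xargs; mine should wind up matching):

echo 'one two three' | xargs -n 1 echo     # three arguments: one, two, three
echo '"one two" three' | xargs -n 1 echo   # two arguments: the quotes glue the space in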

And of course -n isn't the only option, there's eight flags (-ptxEILns) several of which take arguments, and some of which can't be combined. (Strangely, the "find -print0 | xargs -0" combo, far and away my most common option to xargs... isn't in posix. It's a Linux extension. Most of this complexity is to implement obsolete ways of dealing with the lack of the -0 option.)
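
(For anyone who hasn't met it, the extension in question has find emit NUL-terminated names so xargs doesn't care what spaces, quotes, or newlines are lurking in the filenames:)

find . -name '*.o' -print0 | xargs -0 rm -f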

And some of these arguments are NUTS. I've read the description for -I four times and still haven't figured out what they're trying to accomplish exactly: the man page implies it's replacing strings in the existing command line options, but it's _also_ changing the input parsing so it takes whole lines, except it strips leading spaces, except when the leading space is quoted (so it's still stripping quotes even though non-leading whitespace isn't significant...?), and turns trailing space into a magical line concatenation trigger (presumably still one argument). I _think_ that each occurrence of the string to be replaced (a literal, not a regex match) gets replaced with the SAME input, not with consecutive lines thereof. Then there are the weird limits: it needs to support "at least five arguments", and each argument can only grow to a certain size (different from how big normal arguments can be?)... plus it forces on -x.
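
The canonical example people give for -I is renaming things (each input line becomes one invocation, with the placeholder replaced wherever it occurs), which still doesn't explain most of the weirdness above:

ls | xargs -I {} mv {} {}.old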

Yeah, I don't care what the standard says, I think I'm skipping that one unless I find something that actually uses it...

Meanwhile, busybox has -r and uses -e instead of -E for the EOF string. Ok...

And then there's the question "how long can a command line be". According to the standard, you have to support at least 2k total command line length, which is why ARG_MAX in limits.h is 2k. But according to the "man 2 execve" page, Linux guarantees a minimum of 32 pages (at 4k page size that's 128k), and current versions allow up to 1/4 the stack size (which is one of those ulimit things you query at runtime with sysconf(_SC_ARG_MAX)). I'm not sure if that limit is "command line and environment variables combined" or what, current xargs is allowing exactly 131072 bytes of command line (environment variables aren't in here I guess, neither is the pointer array to store the char pointers), so I guess that's the expected behavior.
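
A quick way to see what the kernel's actually advertising (the 1/4-stack behavior means the first number should move if you grow the second; the stock xargs apparently caps itself at 128k regardless, which would explain the exactly-131072 above):

getconf ARG_MAX     # the sysconf(_SC_ARG_MAX) value
ulimit -s           # stack size in kilobytes; recent kernels allow about 1/4 of this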

The slightly fiddlier question is "how long can the char * array passed to exec() be", and a quick experiment (python -c "print 'o '*(131066/2)" | xargs echo) says "as long as it needs to be", I.E. 65536 entries for 131072 bytes of single character strings. (Because argv[0] is "echo" which is six bytes with the null terminator, 131072-6=131066, that's why.)

I kinda want to just use toybuf for this array, but on a 64 bit platform that limits it to 512 entries. Yeah, I can provide a default value to -n, but some things need everything in one invocation, such as the toybox build passing a bunch of *.c files to gcc so it can compile and link in one go. (This is why -x exists: fail if you can't fit it all in one command line.) In the above pathological test case, the storage for this char * array is actually _bigger_ than the command line data (even in 32 bit: 4*65536=262144, at 64 bits we're talking half a meg for the pointer array), so I don't wanna blindly malloc() the maximum possible on small systems, although the size of this array (which essentially _also_ has to be initialized in the resulting process's environment space) is apparently not causing a problem for Linux...

Here are a few fun tests that _do_ cause a problem for xubuntu 10.04's stock xargs implementation, producing error messages I didn't know it had:

echo "'hello world" | xargs echo
python  -c "print 'o'*131072" | xargs echo
echo | xargs -s 1 echo

In any case, the version I'm working on for toybox is a lot more complicated than the one I did for busybox. Possibly it needs an extra config option to chop out... most of it, really.

I should just dig up a clip of the "do a half-assed job" song from the Simpsons' Mary Poppins parody episode, bang out the _simple_ implementation and wait for somebody to complain. But I also want to do it _right_...


December 16, 2011

One of the co-founders of Slashdot linked to an interesting article titled GPL, copyleft use declining faster than ever, to which the obvious answer is "well _duh_". Of course it is. And as he surmises, GPLv3 killed it.

Six years ago I wouldn't have seriously considered working on BSD-licensed code without being paid to do so. Last month I changed the license of my Toybox to 2-clause BSD, and now I'm actively working to obsolete BusyBox, a project I devoted several years to and used to maintain.

The only part of the article I disagree with is the author's attempt to be "fair and balanced" towards the end, where he says "It is not fair, nor accurate, to lay all of the blame on the FSF's or GPLv3's feet" after spending half the article doing just that, and linking to other articles that do the same thing. It is entirely fair. This is 100% the fault of GPLv3. Android's "no GPL in userspace" policy can be traced to that, my leaving BusyBox was a direct result of GPLv3 license trolling, even Linus Torvalds himself has distanced himself from it, licensing his sparse tool under a different license because of unhappiness with Richard Stallman and a desire to explore alternatives. (Yes that's why he did it: I asked.) Unhappiness with gcc going GPLv3 is why LLVM, PCC, and Open64 are competing to replace it.

The FSF is its own worst enemy, and it has comprehensively fragmented and FUDded its greatest achievement. The graph in the above article estimates that GPL usage in open source software will fall below 50% next year, and anybody familiar with network effects can expect it to retreat to a niche pretty quickly after that.

The silver lining in all this is it reduces the FSF to complete irrelevance, allowing the open source developers to get on with their work without distraction from religious zealots.


December 14, 2011

The key to understanding the current economy is that people are hoarding cash. Everything else follows from that. Low demand? Nobody wants to part with cash. Low inflation? Nobody's bidding prices up.

What they _are_ bidding up is the price of US treasury bonds at auction, which smart people see as the same thing as cash, so the yields go way down since people are willing to buy them now for almost as much as they're worth at maturity. (In fact even our tiny amount of inflation is still higher than the current US treasury bond yield, but any yield is better than the zero yield cash gives you.) Stupid people see gold as a cash equivalent (even though it isn't, it's a commodity like beanie babies and tickle-me-elmos that could crash in value at any time if too many people tried to sell at once), and have thus bid gold prices way up. (In both cases the prices got bid up, but driving the price of treasury bills up drives the profits you get from holding them down, and interest rate "yields" are what people keep track of there since it's generally considered the point of owning them.)

Cash is not the same as wealth: you can be "cash poor but house rich", having no money in the bank but having a giant house worth millions (which you could sell or get a home equity loan against if you needed to). Most rich people don't actually have a lot of cash lying around, they own large chunks of profitable companies. (This used to be shares of publicly traded companies, but these days things like Wrigley's Gum have been completely bought up and taken off the market, owned outright by some billionaire somewhere and no longer subjected to public scrutiny or SEC reporting requirements, thanks to the relentless efforts of the Republicans to corner the market on wealth and power and re-establish the monarchy, or some such.)

So cash does not equal wealth, but wealth can be converted to cash. Cash provides "liquidity" allowing you to convert one type of asset into another: the purpose of cash is to facilitate the exchange of goods and services, and normally it's a TEMPORARY state for money to be in. Insufficient liquidity sucks: being "cash poor but house rich" above can cause you to lose the house to property taxes you can't pay, termites you can't afford to spray for, a leaking roof you can't fix. Businesses with unsold inventory worth millions may not have enough cash to pay their own rent and employee salaries, keep the lights on, or advertise about how they've got all this great inventory they'd like to sell cheap...

Money is an economic lubricant that facilitates the exchange of goods and services. That exchange is what drives the economy, people working and people receiving the results. The money is just there as a transmitter, to complete the circuit. By itself money is worthless, as demonstrated by confederate money after the civil war.

Hoarding money prevents it from serving its purpose in the economy: money has to circulate in order to do its job. Employment is what generates wealth: high unemployment means skilled workers are sitting idle, their capacity wasted. Obviously, this causes the whole economy to suffer, not that rich people care if it doesn't affect them personally, which they're very good at avoiding.

The federal reserve attempted to combat cash hoarding by essentially printing more money. Since 2008 they've literally TRIPLED the money supply in the united states, to no effect, because the new stuff just got hoarded too. The reason this failed is the federal reserve isn't allowed to give out money to people who don't already have it, so it couldn't drop the new cash from helicopters (which might have gotten to poor people who would have no choice but to spend it on goods and services they actually need from roof repairs to dental work, and thus the money would have circulated). Instead the fed loaned the new cash to banks (at just about zero interest rates), and this let rich people and corporations transfer more of their existing assets into cash, meanwhile reducing inventories, closing businesses, and laying off workers to avoid parting with any of that cash.

In theory hoarding cash is stupid because of the opportunity cost, but right now people don't see any opportunity. Nobody's buying anything, so hiring people to provide goods and services that won't sell is a waste of money. It's a self-fulfilling prophecy: nobody's going to buy until everybody _else_ starts buying, which is classic depression economics.

The other reason hoarding cash is stupid is because cash is a horrible investment, it constantly loses money to inflation. Even the treasury bills are yielding less than inflation, you just lose money more _slowly_ holding those. But investors are so incredibly risk-averse right now they're ok with that: the guarantee of losing a small amount of money is preferable to the risk of losing a large amount of money.

The obvious fix to this is to crank up inflation, so that the cash investors are sitting on loses value FASTER. If it gets painful enough to hold on to cash, they'll start spending it just to get rid of it before it can depreciate any further. There's an additional benefit: one big reason average people off the street aren't spending is that they're hip-deep in debt (the mortgage crisis, credit card bills, etc) and every spare dime goes to paying down those debts. Inflation would erode the cost of that debt.

But inflation would also erode the assets of rich people. In the short run higher inflation is good for the economy as a whole (gets money circulating again), and good for all the people with more debts than assets (shrinking the cost of their debts), but the Republicans will never allow it because they're a wholly owned subsidiary of the 1%. (Sadly, that's not hyperbole.)

So demand for all sorts of goods is down because people who have money aren't spending it. Everybody expects the economy to either stay bad or get worse, so they're looking for the safest places to park their money, and the safest place of all is cash. (After Enron and Worldcom and Bernie Madoff and Lehman Brothers, investing in stocks seems riskier than ever, so they're parking money in things like gold and treasuries (both of which have been bid through the roof) or just keeping it in cash.) Inflation is insanely low because nobody's bidding prices up, instead everybody's sitting on so much unsold inventory they're cutting production capacity, putting people out of work and thus reducing demand further. (Henry Ford intentionally paid his Model T assembly line employees enough they could afford to buy a Model T.)

The last time we had a demand-limited economy (the great depression), fixing it involved shouting down Republican objections too. Unfortunately, Obama turns out to be a republican...

Kinda sad, that.


December 7, 2011

Today's fun little bug: I hit it and wasted most of an hour going "huh?" before figuring it out.

I tarred up a source directory on one machine, unpacked it on another, did a "make clean", and it wouldn't rebuild. I needed to do a "make distclean", because the .depend crap had cached absolute paths from the old machine and didn't get sanely regenerated by make _or_ by make clean.
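
Roughly the failure mode, with hypothetical paths:

tar xvjf project.tar.bz2 && cd project
make clean                   # removes objects, leaves .depend alone
grep /home/olduser .depend   # absolute paths cached on the old machine
make                         # dies trying to open files that aren't there
make distclean && make       # nukes .depend so it finally gets regenerated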

The greybeards in the audience are going "of course". And _I'm_ going MAKE IS CRAP. Not mixing declarative and imperative code so order of operations is significant but you have no direct control over it, not the magic whitespace where "tab vs space" is significant: the fact it CAN'T PERFORM THE MAIN OPERATION IT WAS DESIGNED TO DO.

I've said it before and I'll say it again: make needs to die. The whole POINT of make was to determine these kinds of dependencies, and the fact it can't DO it is why people keep layering this brittle .depend crap on top of it, _and_ calling it recursively, and then most developers just do "make clean" between each build ANYWAY because they've encountered subtle bugs where make didn't rebuild something it needed to and they've got an SMP gigahertz 64 bit machine now and make was first implemented on a 16-bit PDP-11.

And even then you get this sort of bug, where asking for a "full rebuild except configuration" doesn't work, partly because make hasn't got the concept of configuration info. Really, "clean" and "install" targets are just convention. This is why calling "make distclean" in qemu has to call ./configure if you've done a git pull since the last build. This is why u-boot's distclean tries to call the cross compiler a half-dozen times. (I tend to notice if I run distclean without the $PATH adjusted yet.)

Make is as bad as CVS was, and we need a git-equivalent for building. (Well, a mercurial equivalent: git's UI is utterly horrible.)


December 6, 2011

Yesterday I said bureaucrats are needed to run billion dollar businesses. Why can't you just have a bunch of normal non-bureaucratic employees?

The real question is how do you make a company _modular_? The point of the diversified conglomerate is to have multiple independent business units. Going back to mid-90's General Electric, allowing the Vice President of Light Bulbs to reach over and make programming decisions for your broadcast network is unlikely to end well, but if that guy's a Vice President and his favorite show is run by a much lower-ranking manager at the company, why CAN'T he do that?

This is why the bureaucrat was invented. Their job is to say "no", and they're good at it. They enforce procedures, make people go through channels, act as a "rigid backbone" for the company, and above all keep the modular business units separate, even allowing them to be unplugged from this company and plugged into another company when they're bought and sold.

A non-modular company must have a core business. They can't be in wildly divergent business lines at the same time without tearing themselves apart. Of course, it doesn't stop them from trying, especially companies that get too big too fast for anyone to buy them, and are thus forced to diversify themselves instead of being acquired and plugged into an existing structure.

And this gets us into the first layer of what's wrong with google. Their core business is search, a market they essentially saturated years ago; at this point they're fighting to RETAIN market share. Their attempts at growth have, to an extent, made them a worse search engine: the google homepage is far more cluttered than it used to be, and their search pages keep growing strange pop-ups, slow-loading voice recognition plugins, and things like "instant search" that people have to search how to disable if they don't want it. Once upon a time "view source" on google.com provided a simple clean and readable page, now it's a giant unwordwrapped blob of illegible javascript that may or may not work on a given browser version.

They keep trying to expand into new areas, but those areas get bent through the lens of their core business. Google Maps searches the world, gmail searches your email, reader searches your RSS feeds, plus searches your friends. And without any separation between business units, new features like "buzz" and "wave" and "orkut" offend existing users who ACTIVELY DON'T WANT THEM but can't easily opt out.

There's far more to it than that, of course...


December 5, 2011

Still talking about where bureaucrats come from, and why. Let's go back to the end of stage 2, with a company's core business at market saturation.

Being king of the hill is really lucrative. They may not know what to DO with the profits, but clearly they want to stay on top of the mountain, and keep doing what they've been doing, for as long as possible. If it's all downhill from here, you want to STAY WHERE YOU ARE. You want to document what you were doing when you peaked, in as much detail as possible, so you remember what it is and can keep doing it, and can train new people to do it as well as the current crowd.

But if you can't get any better, how long can you keep doing the same thing over and over before you stagnate? The creative types who founded the company are long gone (this place is boring, there's nothing new for them to do), and the employees do the same thing every day until they stop paying attention. The company is in danger of going stale, the dancers can no longer hear the music, and you can't necessarily recapture what you've got if you lose it because nobody's left who can smell greatness. They all left to chase the next shiny thing, the people who remain are in it for the money...

So you reduce their job to a series of explicit, documented procedures, and make sure they follow them exactly. They'll never get any better, but you can try to stop them from getting _worse_ (and anyway, you've already peaked). If instinct won't keep the quality up, checklists will at least prevent it from falling below a certain point.

Another problem with being big is that tiny risks grow with the size of the company. The problem with "billions and billions served" is that million to one chances happen to you every day. No matter how obscure the problem, it will happen, probably more than once. And the deeper your pockets are, the juicier a target they make for lawsuits. Again: the way to fight back is to work safety into your procedures, standardize your process and get everyone to follow it exactly, the checklist documents your fiduciary responsibility, and the more you reduce variation in the product line so every customer gets exactly the same treatment, the less the exceptions can claim something unexpected happened to them when they sue.

Turning the company into a giant smoothly running machine raises another issue: factories that burn down can be replaced, lawsuits can be settled (and bad PR overcome with advertising campaigns), but what do you do if the only guy who really understands a $5 billion product line suddenly comes down with meningitis? The hobbyist stage was all about unique individuals, the employee stage was about training and experience making a valuable employee, but conglomerates want cogs for the machine. If a monkey can't follow the procedures, the problem is with the _procedures_, not with the monkey.

This is a complete reversal from the hobbyist starting point, the goals went from "change" to "growth" to "stability". The bureaucrat pursues safety, preventing change and reducing risk, and cannot abide anything unpredictable.

This is why hobbyists hate bureaucrats, and bureaucrats hate hobbyists: the empty suit vs the loose cannon. One side knows it could be a hundred times as productive if this moron would get out of the way, to the point where the time wasted in planning and coordination meetings is more time than it would take them to do the task all by themselves. The other side wants nothing to do with an uncontrollable, unpredictable source of repeated failure that has no idea how long anything will take (or even whether it can really be done until they've done it), who will never understand the machine is more important than any individual cog, and who would endanger the jobs of thousands of people if the company were to become dependent on them and they quit, died, got depressed, went off to "find themselves", or simply couldn't deliver because they were having a bad month.

The fact that hobbyists and bureaucrats mix like oil and water and fight like cats and dogs doesn't make one superior to the other, it means one creates change and the other prevents change. Hobbyists create billion dollar businesses, but bureaucrats are needed to run them.


December 4, 2011

Continuing my summary/update of the old three waves series I wrote for The Motley Fool over a decade ago.

I glossed quickly over hobbyists (because I am one, as are most people likely to read my blog) and employees (because they're fairly well understood in the wider world). There's a lot more to say about both topics, but a blog isn't it, and in theory I'm just providing enough background to explain why Google is screwed up.

But I need to spend a bit more time on the third stage, because it's not widely understood by most people I know. Explaining bureaucrats to hobbyists is actually a big enough topic it'll probably need multiple blog entries. (Really there's a book in this, I just haven't had time/energy to write it.)

Bureaucrats: stability.

Just as hobbyists want to change the world and employees want to grow their paycheck, bureaucrats are about preventing change and reducing risk. Why do you need bureaucrats? Initially you don't, but as the company grows scalability issues emerge, and when companies get really big the scalability issues come to dominate. Let's walk through this growth.

The switch from hobbyist to employee stage comes about when a company grows beyond what a single person can personally handle. But it also happens because the nature of the company changes when it finds a core business it can train anybody to do. The core business offers some profitable, stable, expandable thing it can hire more and more people to work on, which naturally grows to be more important to all those people than the founder's crazy idea du jour. And thus the nature of the company changes: yes the founder still owns lots of the company (although not necessarily a controlling interest if they brought in other investors who bought chunks of the business for cash), but the employees' paychecks come from the core business, not from the founder's new ideas. If the founder can't change with the company, they get "kicked upstairs", out of the way of all these busy employees filling orders.

In the second stage, the company's core business provides them with an identity, whether it's "11 secret herbs and spices" or "windows everywhere", all the employees are on the same team and know what they're supposed to do. A second wave company _is_ its core business, it knows deep in its bones "we do X", and their goal is to do more and more of X... until they hit market saturation.

Market saturation triggers a second great transition. The core business cannot grow anymore because they've filled up their world, so the employee mantra of "growth" stops working. Now what? They're still hugely profitable (if they're doing it right), but reinvesting those profits in the core business is now a waste of money because there's simply no more growth to be had. Beyond a certain point efforts to squeeze blood from a stone do more harm than good: do they increase income by pricing the product out of the market, or cut costs by damaging the product's quality? Whatever they try, they can't climb past the peak of the mountain.

The way to grow is to diversify: milk the cash cow of the original core business and invest those profits elsewhere. But to expand horizontally the company must give up their identity, which is tied around their original core business. They must stop being "the company that makes X", and become "a company that makes money".

Diversifying yourself is INCREDIBLY HARD. Eighty percent of all companies that try to convert from stage one to stage two (I.E. reorganize an experimental startup around a stable and profitable new core business) fail to do so. Converting from stage two to stage three is just as hard, but luckily they don't have to go it alone. They can instead allow themselves to be bought by an existing company that's already made the transition.

Third stage companies are composed of modular business units, as much mutual fund as corporation. Over the years General Electric made light bulbs, washing machines, and jet engines, had a financial arm that sold home mortgages, and they owned the NBC television network: ALL AT THE SAME TIME. No one business was "what GE did", and any of these individual businesses could be bought and sold without affecting the rest of the company.

Conglomerates grow by acquisition: each existing business produces a profit, it's collected together in a big pile at company HQ, and every once in a while they use it to buy a new business unit. Those business units are either modules from another existing conglomerate, or mature second stage companies approaching market saturation which haven't torn themselves apart trying to diversify away from What They Do. Mature second stage companies become acquisition targets for existing conglomerates, their core business becomes a business unit of the conglomerate, and then figuring out how to use their profits to grow the company becomes Somebody Else's Problem.


December 3, 2011

Banging on toybox, let's repeat some analysis from 2008 or so. According to more/record-commands.sh, the set of commands needed to build the i686 target is:

ar awk basename bzip2 cat cc chmod chown cmp cp cut date dd diff dirname echo egrep env expr find gcc grep gzip head hostname id install ld ln ls make mkdir mksquashfs mktemp mv nm od patch readlink rm rmdir sed sh sha1sum sleep sort tail tar touch tr true uname uniq wc which whoami xargs yes

This list isn't quite complete because gcc calls as and ld without going through the wrapper, #!/bin/bash doesn't use $PATH, ./download.sh didn't actually download the packages (just confirmed the cached versions) so wget isn't in there... but it's a good start.

Of that list, ar, cc, gcc, make, and nm are build tools. Toybox already provides reasonable versions of cat, cp, echo, patch, rmdir, sha1sum, sleep, sort, true, uname, wc, which, and yes.

That leaves:

awk basename bzip2 chmod chown cmp cut date dd diff dirname egrep env expr find grep gzip head hostname id install ld ln ls mkdir mksquashfs mktemp mv od patch readlink rm sed sh tail tar touch tr uniq wget whoami xargs

Actually a fairly manageable list. I need to check the Linux From Scratch build to see what that uses, but I should be able to get toybox doing all of the above in 2012. At which point, I could replace busybox in host-tools.sh and simple-root-filesystem.sh, and make a busybox.hdc to build it natively instead...
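
That triage is easy enough to automate; a quick sketch (the filenames are hypothetical, and it relies on toybox printing its command list when run with no arguments):

./toybox | tr ' ' '\n' | sort -u > have.txt
tr ' ' '\n' < needed.txt | sort -u | comm -23 - have.txt   # what's still missing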


December 2, 2011

Continuing from yesterday, we have:

Employees: growth

Employees get trained by somebody who already knows how to do whatever it is, and then they accumulate experience under the supervision of some kind of boss. They don't make it up as they go along, they have to know what their job _is_.

Employees work in order to get paid. Ideally, they work 9-5 and then go home to their "real lives" (unless they're paid overtime). They want to do their job well enough to get a steady paycheck, earn a bonus, earn a raise, earn time and a half when they work overtime, earn a promotion, and so on: that's their measure of "success". But they generally don't have ideas in the shower because they're not thinking about work in the shower: it's just their day job, not who they _are_. It may be their career, but it's something they look forward to retiring from someday. (If hobbyists ever retire it's so they can spend _more_ time doing the fun parts of their job without having to worry about paying for it.)

Employees get better with training and experience, but you could always train somebody else to do that job, or hire somebody with equivalent experience. True hobbyists are born, not made, but you can always hire more employees.

Transition from stage 1 to stage 2.

When Fred Brooks suggested constructing "surgical teams" around his most productive programmers in the Mythical Man-Month, he was also describing the normal organization of a healthy start-up as it transitions from hobbyist-driven stage 1 to an employee-driven stage 2. The rest of the team hasn't got the fire and zeal of the hobbyists, but you usually can't _get_ more hobbyists, and they're unpredictable anyway. So you surround a hobbyist with assistants who take the mundane load off (and ideally stabilize things a bit and keep the hobbyist on-topic by reminding them of the job they're SUPPOSED to be doing, which people depend on now). The hobbyist can delegate the "uninteresting" bits (stuff they understand well enough to explain how to do to somebody else) to these assistants, so they can focus on the "interesting" bits (stuff they don't know how to do yet).

Employees emerge once there's something for them to do: I.E. when the company has a core business the employees can work to grow doing "more of the same". A company without an existing profitable core business is still a start-up. A company _with_ an established core business that can be grown by doing more of the same is a second wave company that tends to say "that's not what we do" when its founder(s) come up with new tangents, until the founders get bored and leave.

Hobbyists tend to stop being interested in money at around the $100 million mark (which means if you invest it at 5% you get almost 100k/week and can buy a new house every month for the rest of your life: further money isn't going to make any difference to your standard of living). That's about when Mitch Kapor left Lotus, Paul Allen left Microsoft, Steve Wozniak left Apple, etc. A natural, maturing company becomes boring to hobbyists when the core business they created takes over, and the company grows to hire enough employees to form its own corporate culture that isn't interested in "look at this, it's so cool!"


December 1, 2011

Heh. Anybody who's read my old three waves series (about the relationship between hobbyists, employees, and bureaucrats) will probably recognize google's attempts to convert itself to stage three. I wonder if Google can navigate this transition without tearing itself apart? (The question isn't "will this be hard" but "how much damage will they take and is it survivable"?)

Alas, there's buckets of backstory to explain that. (They intentionally staffed themselves with hobbyists, turning the company into a giant think tank that continually spins off new tangents, but they established a core business years ago and hit market saturation in that business: those are the triggers for stage two and stage three right there. The _real_ problem is stage three and stage one are incompatible, they've worked around this by organizing the whole company as a giant think tank, but that just buys time without really fixing anything...)

I need to write up a better three waves thing than those ten year old articles. Alas, there's a bunch of material that's hard to organize concisely. Let's see...

Hobbyist: change.

The "hobbyist" stage is where somebody starts a business because they want to spend more time doing something they want to do anyway. A true hobbyist would do their thing for free if they didn't need to make a living, trying to get it to pay the bills is just a way to be able to afford to spend more time doing it. This could be anything from mail-order perfumes to Notch's business built around minecraft.

Hobbyists are the ones who invent new stuff, almost exclusively. A hobbyist "in the zone" can be dozens, hundreds, or in extreme cases thousands of times more productive than a trained, experienced, but otherwise average professional. Yes, a thousand times: a thousand professional physicists picked at random probably wouldn't fundamentally change physics the way Albert Einstein did as a hobby while working as a patent clerk. The television show "Gordon Ramsay's Kitchen Nightmares" is about somebody who lives and breathes cooking, fixing "professional" establishments more or less off the top of his head because he sees what they need to do. Here's an article on Fabrice Bellard. The list goes on. Finding hobbyists who are merely a HUNDRED times as effective as an average "professional" is a lot easier; they're everywhere if you know where to look. (Fred Brooks, in The Mythical Man-Month, found a factor of 30 difference between his average programmers and best programmers in 1960's IBM: a very hobbyist-unfriendly bureaucratic environment.)

But more productive doing _what_ is completely unpredictable: Hobbyists regularly perform unexpected feats of grand but dubious utility: "I wouldn't have thought it was possible to deep-fry kool-aid, but there it is. I'm still trying to figure out if I should sing your praises or set you on fire." They set out to do one thing, and accomplish three others instead.

Hobbyists are artists as much as engineers. They get writer's block. They have portfolio pieces showing cool things they've done, which are generally more interesting than their resume listing experience and training. Hobbyists go off on tangents ALL THE TIME, that's how they come up with new stuff nobody's ever done before, with sturgeon's law in full force: 90% of that new stuff is crap, but a tiny fraction of it is spectacular.

Hobbyists are attracted to novelty and challenge, the lure of the unknown, and not only have no fear of failure but _expect_ repeated failure along the way to any truly interesting achievement: writers have "trunk novels" (and mounds of fanfic), knitters have "practice socks" (and endless scarves). The above-mentioned Fabrice Bellard not only created QEMU but also this and this and a dozen other things that were cool but haven't (yet) changed the world in quite the same way.

A hobbyist generally responds to "What's it good for?" with "Let's find out!" Their todo lists runneth over because their normal working style spins off endless tangents, as they recognize and chase each new shiny thing that passes their way.


November 29, 2011

This article does a good job explaining yet another facet of "Why The Euro Was A Bad Idea And Is Gonna Go Bye-Bye", but it assumes you know what a "lender of last resort" is.

If you've ever seen "Mary Poppins" or "It's a Wonderful Life" you know what a bank run is. Everybody tries to withdraw their money at the same time because they think the bank's going to run out of money, and thus they won't get theirs back unless they take it out before everybody else does... Self-fulfilling prophecy.

As George Bailey explained, "Your money isn't here, it's loaned out as mortgages" and the bank needs to wait for it to be paid back. So it can pay interest on your deposits, but can't necessarily give everybody their money in cash at the same time because it can't just cash in those mortgages. (Or at least it couldn't before the invention of the mortgage bond, which didn't actually improve matters, but let's not go there just now.)

This isn't just a problem for banks: it's a problem for stock markets, government bond auctions, real estate... pretty much anything financial. If temporary cash flow problems can make the system _collapse_, the risk level is inherently higher, so lenders demand a higher return, which makes using credit cards for monthly bills (or the government/corporate equivalent) really suck. Even when you regularly pay them off, you now have an annual fee, the rate's 30 percent, and you still get charged interest even if you pay your bill on time because lending to you is considered RISKY.

The reason you see bank runs in OLD movies is it used to be a big problem, but isn't so much any more since we invented the "lender of last resort", an organization with a theoretically infinite amount of money that can loan a bank the cash to cover bank runs. Where do you get an infinite amount of money? From the people who print money, I.E. whoever issues the currency. You give them the job of guaranteeing that whatever happens, the system won't run out of money. (You may have INFLATION in a crisis, but that's not the same as winding up with NOTHING. And if you're just loaning out temporary money that gets paid back, that's not the same as printing money to pay the bills, it doesn't _persist_ in the economy after the crisis.)

In the US, the lender of last resort is the Federal Reserve. (Yeah, that thing idiots like Ron Paul keep trying to eliminate along with Social Security and Medicare, because it didn't exist in Ebeneezer Scrooge's day and is thus "newfangled".) It's an organization with a theoretically unlimited amount of money, that can cough up cash if the system really needs it.

If you don't understand how banking works, you'd think the obvious lender of last resort would be the US mint. But coins and small green pieces of paper aren't how most financial transactions are done these days, it's all checks and credit cards and direct deposit and other electronic funds transfers, meaning you exchange numbers in a computer somewhere. Even when you have a bank run, investors just move their money TO ANOTHER BANK. (Think about it, "we're out of twenties" does not mean "we're out of money", it means we need to order more twenties, come back tomorrow. Yeah yeah, "newfangled". As of 1913. We've been banking by check for a while now.)

Adding more dollars to a bank account doesn't even require printing money, it requires inserting electronic transactions into a network. That bank network's all double entry bookkeeping, so the money has to come FROM somewhere and go TO somewhere or the books don't balance. The exception is the Federal Reserve, which is the entity that issues dollars and thus has the authority to just create new electronic money with a few keystrokes, and loan it out to things like banks via electronic funds transfer. The "chairmen" in charge of the federal reserve (used to be Alan Greenspan, currently Ben Bernanke) tend to be important guys because of this; they regulate the money supply, prevent bank runs, monitor inflation and unemployment, and so on. (In theory, if they put out too much money inflation goes up, if they put out too little, unemployment goes up. Alas as Paul Volcker (the one before Greenspan) rediscovered in the 1970's, it's not quite that simple. I say rediscovered because FDR's guys knew this intimately during the great depression, but all the economists thought that was a one-off and forgot everything they'd learned once we were out of it. Sigh.)

The problem Europe has here is that the Euro's equivalent of the federal reserve is the European Central Bank (ECB), and the ECB is essentially the German central bank: run by germans, controlled by the german government, only cares about germany. And it refuses to act as a lender of last resort, because Germany had a bad bout of inflation in the 1920's that they think brought Hitler to power. (He actually came to power during the massive austerity that followed the inflation, as with our 1970's "stagflation" they tried slamming on the brakes and slamming on the gas and never found a workable balance that let them STEER, except in germany they started out bombed flat from World War I and during the massive "nobody has any money" period fascism apparently seemed like a good idea if it promised to put food on the table. Yeah, Godwin, I know, but this is what ACTUALLY HAPPENED.)

So the European Central Bank thinks that its ONLY job is keeping inflation under control, and unemployment can go hang. It's also acting in the interests of Germany, which isn't what Greece or Italy or Spain or Ireland need. In the US you can just take a bus from Detroit to San Francisco so a labor surplus and a labor shortage tend to cancel out, but in Europe they don't all speak the same language so the unemployed in Athens can't just move to Berlin and hit the ground running...

Anyway, that should be more than enough background to read the article.


November 27, 2011

I meant to get lots of toybox commands done this weekend, or at least triage of my various notes so I could put together a plan of attack. Instead I spent all weekend playing skyrim and refamiliarizing myself with the option parsing infrastructure.

I'm proud of the toybox option parsing infrastructure, but it has some bugs I _remember_ fixing, apparently in a version that didn't get checked in? (Or there were regressions since?) Weird. Oh well, that's what two years of mothballing gets you. Bit rot, apparently. Fixing it now...

On a side note, a few identifiers that are not keywords in C: and, catch, class, friend, namespace, new, or, private, protected, public, template, this, throw, try, typeid, using, virtual, and xor. Remember, C++ is not C, and your C code is under no obligation to compile in a C++ context. :)
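
A quick way to check whether your "C" is actually C (the filename's arbitrary, any gcc/g++ pair should do):

cat > cxxwords.c << 'EOF'
int main(void)
{
  int new = 1, class = 2;  /* perfectly good C identifiers */
  return new + class;
}
EOF
gcc cxxwords.c -o /dev/null           # compiles: it's C
g++ -x c++ cxxwords.c -o /dev/null    # errors: new and class are C++ keywords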


November 26, 2011

I'm always confused by people who idolize capitalism as some perfect system. Ok, confused is the wrong word: I'm amazed how clueless some morons can be.

Capitalism is full of people trying to "corner the market". It's how you get rich: not by supplying demand but by preventing anybody ELSE from undercutting your prices. Variations on this theme include "regulatory capture" which is basically paying off the police, and of course most of the lobbying going on in washington today.

Here's a blatant example.

There's buckets of these. Insurance regulations ensure that if you offer a policy you can pay it off, credit default swaps were invented to wriggle out of that regulation, and then organizations like AIG made bets they couldn't pay. Gamblers HATE being led away from the table the same way drunks hate being refused another drink.

Another common failure of capitalism is short-term thinking, everything from pyramid schemes through pump-and-dump, leveraged buyouts that loot the company's pension fund...

Charles Dickens wrote "Are there no workhouses, no prisons?" over a century ago. The pure capitalist idea that everything should have a dollar value and then let the market sort out what to do with it implies that you should be able to buy organs for transplant, so once the black market gets going fatal muggings have a built-in six figure profit...

The point is, capitalism has downsides. Big ones. We haven't even gotten to the complete inability to conserve free resources until they become scarce enough you _must_ buy them. (When the free resources are clean air and water, this is a problem for those who can't afford to buy them.)

And of course cornering the market means attacking organizations that _provide_ things you'd like to raise the price of. Public education, for example...

Capitalism is corrosive to democracy, by the way. Democracy is "one person one vote", capitalism is "one dollar one vote". One is built on "all men are created equal", the other keeps lobbying to eliminate estate taxes.


November 25, 2011

Thanksgiving gave me a chance to work on the proposed options parsing upgrades, but they turned out to be too fiddly to be worth it.

I wanted to leverage the existing option parsing infrastructure so I could grab non-option arguments cleanly. But the "left to right" vs "right to left" thing doesn't fit cleanly with the existing stuff, not just because of moving the arguments but moving the bitfield of seen entries. (If zero is a valid numeric entry, how else do you tell whether or not you got an argument, except that its bit is set?)

It turned into one of those "no clear superior choice" things, where I _could_ make it work as part of the generic option parsing, but the added overhead is required of every command: when you select just one or two commands to build it's better to have it in the individual commands, but when you build everything the overhead might get amortized enough to be worth it. Except that there are only maybe half a dozen commands total that really care about this either way...

As I said: fiddly. It's easy to go "well that didn't work" and move on, but some messy problems feel like there's a clever solution I'm not SEEING, so I get stuck on it...


November 21, 2011

More history showing how the great depression was caused by the same idiocy happening today. Some of it boils down to classic logical fallacies: "what applies to one part must apply to the whole", and the related "things that work in one context don't always work in another". (People who spent 60 years fiddling with interest rates to steer the economy get confused when they won't go below zero and the steering stops working, for the same reason you can't push on a rope.)

Most of the economic hiccups over the last 60 years or so were supply issues. For example the big 1970's oil crisis was because the Organization of Petroleum Exporting Countries (OPEC) formed a cartel to limit the supply of oil and thus drive the price up. (More recently, between peak production happening in 2005 and China becoming the #2 oil importer after the US and thus sucking up much of the world's supply, oil producers have been unable to keep up with demand, despite fracking, deep offshore drilling (anybody remember the BP disaster in the gulf?), throwing money at Canadian tar sands, etc. Most of that just offsets the productivity declines from the rapidly depleting Saudi and Kuwaiti fields.)

Rising food prices are a supply issue as well: China's importing more, biofuels compete for acreage, etc. We're FAMILIAR with supply side constraints on the economy.

But the Great Depression wasn't a supply side problem, it was a _demand_ side problem, leading to a persistent liquidity crisis. There was plenty of stuff for sale, but nobody could afford to buy it.

A liquidity crisis happens when everybody stops buying at once, so all the sellers hoard their cash and you wind up with a Mexican standoff where everybody is sitting on unsold inventory waiting for everybody else to go first. Giving the sellers more money (through bailouts) accomplishes nothing because that's not the _problem_: they already have money, just not _income_. If they can't replace what they'd spend, they'll just hoard the new stuff too. The shortage isn't capital (inventory, employees, equipment), it's _customers_.

The cause of a liquidity crisis is usually some kind of widespread economic bubble, which pops and leaves loads of average people with mountains of debt. Bubbles happen when people buy some rapidly appreciating asset (tulip bulbs, beanie babies, etc.) for no other reason than that the price keeps going up so it must be a good investment. You get a giant pyramid scheme: at some point enough people stop buying and start selling (to cash in on their profits) that the price starts going down... meaning it's not a good investment anymore. The bubble "pops" and then everybody sells and the price crashes.

In 1929 the trigger for The Great Depression was a stock market crash: during the roaring twenties the stock market kept going up, and everybody bought stock "on margin" (taking out loans against their stock to buy more stock) to magnify their gains... which was fine until the stock price crashed and dropped below the amount people had borrowed, so selling _all_ your stock still left you with debt.

In 2007 it was Bush II's mortgage crisis: housing prices kept going up... until they stopped. There were lots of other financial shenanigans going on -- it was the Bush Administration, which _started_ with Enron and Worldcom and such imploding and STILL CONSIDERED DEREGULATION OF EVERYTHING A GOOD IDEA. The invasion of a WMD-less Iraq because some other country entirely sheltered a terrorist who attacked us wasn't their only horrendously bad idea; pretty much everything that idiot did was mind-bogglingly stupid.

We've had bubbles before, of course. The dot-com crash was a big one. You may remember the Savings and Loan Crisis of 1990, that was Bush I's mortgage crisis. (Bush II was a clueless moron who surrounded himself with his father's advisors, who stopped having original ideas back when they all worked for Richard Nixon, and just kept repeating themselves ever since.)

The thing was, those smaller bubbles could be handled by lowering interest rates, thus encouraging spending and expanding the money supply. (I have a dollar in the bank, somebody uses their credit card to buy a pen: the retailer has the dollar, but I still have a bank account with a dollar in it, and the guy with the credit card has a pen worth a dollar. So my dollar now exists multiple times, that's the magnifying power of "leverage".)

It's the difference between being short of breath and having an asthma attack, between coughing and choking, between a painful electric shock and one that stops your heart. The severity of an issue can change the TYPE of issue you have: "if his heart was beating he'd be fine".

Unfortunately, if you crash the economy badly enough, you'd have to lower interest rates BELOW ZERO to compensate, which can't be done. You wind up with interest rates stuck near zero, staying there a long time, unable to fix the problem. This is the classic indication of a liquidity trap, which John Maynard Keynes figured out and described in detail back in the 1930s.

The thing is, people who are trying to pay down debts cut their spending. If they're not spending, they're not buying stuff, so _other_ people can't make money off of them. So if everybody tries to pay down debts at once, their INCOME goes down, so they not only can't pay down their debts but they have to cut spending MORE... If enough people stop spending all at once, the entire economy starts to spiral down a big drain, and you wind up in a depression.

The fix is to inject demand into the system, so that somebody's buying and thus there's money to be made, making it worthwhile to hire people and make stuff to satisfy that demand. Without any demand, it doesn't work: if nobody's buying, even if you _do_ hire more people you just have idle workers sitting next to unsold inventory, which is why the people who HAVE money are sitting on it rather than investing it in businesses that can't even sell what they've already got.

There are only four places demand comes from in an economy: individuals buying things, companies buying things, foreigners buying our exports, and the government buying things.

Let's go through them in order:

Individuals are swamped with debt. That's the original problem, reduced demand from your average consumer because they've got high levels of debt to pay off and can't afford to spend money, and then high levels of unemployment on top of that when the economy slows down in response.

Companies already cut expenses in response to lowered revenue, which made the problem worse. But raining money down on them with interest rates near zero (for large entities, borrowing is essentially _FREE_ right now), plus all this "tax cuts" nonsense, is a waste of time. The Fortune 500 had over a trillion dollars of cash and equivalents BEFORE we tried that; if they'd wanted to spend more they were quite capable of doing so already. They don't. They pointy-haired follow, they don't lead. They'll start expanding AFTER demand for what they're selling picks back up, not before.

More foreigners buying stuff means increasing our exports, but instead we mostly import stuff. Our balance of trade sends money to China, which they use to buy assets here. But when China spends money here they're not buying goods and services, they're buying ownership. They don't hire anybody to _do_ anything, they don't reduce piles of inventory that would need to be replenished by making new things, they're just transferring assets into their name. They buy the deed to the building and tell us to send the rent check to them from now on; that doesn't create jobs for anybody living in the building.

We could adjust the balance of trade by "devaluing the dollar", which means adjusting the international exchange rate to make imports more expensive and exports cheaper. Then people in the US would stop buying stuff from China (because it was more expensive in terms of dollars), and other countries would start buying stuff from us (because it was less expensive in terms of their currency). Unfortunately, this isn't feasible because we import so much oil: if we devalue the dollar against the currencies of foreign countries we buy oil from, the price of gas goes up, and our economy is like a heroin junkie on the stuff.

There are other reasons, the largest of which is that devaluing the dollar would SERIOUSLY piss off China, which created the massive trade imbalance by artificially depressing its _own_ currency in the first place, and which has funded our deficit spending for the past decade or so. Not only would devaluing the dollar deprive China of their trade income exporting stuff to us, but China financed much of the Bush II spending spree by buying buckets of US treasury bonds, and those are denominated in dollars. If each dollar were worth fewer yuan, that would erode the value of China's investment from their point of view. If we devalue the dollar, they lose money _twice_.

That just leaves one source that can pump extra demand into the economy: the government. The fed _can't_ run out of money: if all else fails, it can print more. Yeah this could cause inflation, but that's not necessarily a bad thing right now, since inflation would erode the debts that are preventing individuals from spending money. (People who have piles of cash hate inflation. People who have piles of debt love inflation. So once again, the super-rich are lobbying against what the larger economy needs.) Plus inflation is another way to devalue the dollar, which would piss off china and drive up the price of oil, but also increase exports.

Strangely, printing money doesn't always cause inflation: since 2008 the US monetary base has _tripled_ without any noticeable effect... because they basically gave it to big corporations which are hoarding it, as mentioned above.

But, the government doesn't have to print money, it can borrow it. The worse the economy gets, the more people hoarding all the money want to park it safely, and the safest place to put it is in US treasury bonds. (After all, the US is the entity that ISSUED the money in the first place, if it couldn't pay its debts the currency would rapidly become worthless anyway. The treasury bills are just as safe as dollars _by_definition_ because they come from the same place, and dollars don't pay interest. Yeah it's sucky interest like one half of one percent right now and you're losing to inflation by keeping your money there, but not losing AS MUCH as keeping it in cash which earns no interest.)

The Herbert Hoover and Barack Obama administrations both made the same mistake: the government doesn't "create demand" by giving money to corporations that won't spend it. The government creates demand by buying goods and services ITSELF. If unemployment is too high, that means lots of people are sitting around contributing nothing to the economy. The government can hire people and buy their stuff directly.

This was FDR's "new deal", where he hired millions of people to build roads and bridges (the Works Progress Administration), to wire up rural parts of the country with electricity and telephone service (the Tennessee Valley Authority and the Rural Electrification Administration), and so on. Those people got jobs (so they could pay down their debts and buy stuff from the rest of the economy), and the government got huge infrastructure improvements done really cheaply. (The interstate highway system was Eisenhower, twenty years later, but it was the same trick.)

Unfortunately, Republican dogma is that the government is useless, and thus they work hard to PREVENT the government from doing anything useful, because counter-examples would undermine their position. (They start with the conclusion, and then try to adjust the universe to fit.)

The "keep government out of the economy" folks are like the "keep government out of medicare" folks. (I.E. idiots.) The currency the economy is based on is issued by the government. The government is where money comes from in the first place. LOOK AT THE BILLS.

The WORST thing you can do in a demand-limited liquidity crisis is reduce demand further with "austerity" measures. If the government is the only remaining source of significant demand in the economy, and you eliminate that, you get the great depression.

Here in the US we're not in a full-blown depression, thanks to a stimulus which was insufficient to lift the economy out of this mess but at least cushioned the landing a bit, plus we still have a lot of federal "safety net" programs like welfare that automatically increase spending when times are bad, because more people fall into the safety net. The Republicans have been screaming for spending cuts, but luckily after eight years of Bush their current insane stupidity hasn't been blindly followed. Yet.

Europe, meanwhile, is having German-imposed austerity that's tearing it apart at the seams. They're probably going to lose their currency, which isn't going to help our exports any...

Really, if you want to follow all this, read Paul Krugman's Blog.


November 20, 2011

It is the weekend, meaning I can get some real programming done. Big long post about toybox, you have been warned.

First of all, I implemented "link", 'cuz it's there.

One problem with a close reading of SUSv4 is that the standard is inconsistent. For example, chgrp has the same -HLP options as cp and a few other commands (specifying when to follow symlinks and when to either ignore them or operate on the symlink itself) but chmod doesn't. You'd think it would since chmod has a -R option, but no.

Another fun corner case is the -m flags on mkdir and mkfifo and such. This sets the mode, and explicitly says it uses the same format as chmod. Except that "u-x" operates as a delta on an existing file (grab the permissions that are there, subtract the executable bit from the user tuple, write the permissions back), and the -m variants are creating new entries, so the delta is against... what, umask? It doesn't say. I tried the gnu version, and its behavior is REALLY STUPID.

landley@brillig$ mkdir walrus
landley@brillig$ ls -ld walrus
drwxr-xr-x 2 landley landley 4096 2011-11-20 12:05 walrus
landley@brillig$ rmdir walrus
landley@brillig$ mkdir -m u-x walrus
landley@brillig$ ls -ld walrus
drw-rwxrwx 2 landley landley 4096 2011-11-20 12:05 walrus

I.E. the permissions on the new directory default to 755 (I.E. my umask is 022), but when I say -m u-x (to remove the user executable bit), the permissions are 677! Where did those extra write bits come from? The umask is ignored if you do -m.

That's NOT RIGHT. The delta should be against the values _with_umask_applied_, not against 777!
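What I'd expect instead is something like this sketch, assuming a hypothetical string_to_mode() helper that applies a chmod-style delta string to a base mode:

#include <sys/stat.h>

mode_t string_to_mode(char *str, mode_t base);  // hypothetical helper

mode_t mkdir_mode(char *modestr)
{
    mode_t mask = umask(0);  // read the current umask...

    umask(mask);             // ...and put it back

    // Apply "u-x" and friends against 777 with the umask already
    // subtracted, not against a raw 777 the way the gnu version does.
    return string_to_mode(modestr, 0777 & ~mask);
}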

I'd like to figure out how to let the compiler chop out the dead code when none of the selected commands parse permissions, but it's a state machine operating on a string and there's no way the optimizer is smart enough to figure it out on its own. It's the old space vs complexity tradeoff: amortized over "make defconfig" having the common code shared between apps is definitely the right thing, but if you build one or two apps forcing extra overhead on them is bad. Unfortunately, detailed annotations of the "if (ENABLE_BLAH)" variety add complexity to the source code and a future maintenance burden. (If you ever add another command that needs option parsing requiring it to "select PERMISSIONS_PARSING" in its kconfig blob or else it silently fails to work for non-obvious reasons... that would be a bad thing.)
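For reference, the "if (ENABLE_BLAH)" style under discussion looks roughly like this (a sketch with a made-up CFG_MODE_PARSING config symbol, not actual toybox code):

// CFG_MODE_PARSING is 0 or 1 depending on the config, so when no selected
// command needs it the compiler constant-propagates the test and drops the
// whole state machine as dead code, without an #ifdef forest in the source.
mode_t parse_mode(char *str, mode_t base)
{
    if (!CFG_MODE_PARSING) return base;

    // ... octal and "u-x" style string parsing would go here ...
    return base;
}

The catch is exactly the maintenance burden described above: somebody has to keep the config symbol and the function's users in sync.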

*shrug* Every approach has its downsides, sometimes it's just a question of making the right tradeoffs to minimize the overall suck. (I keep thinking there should be a clever way to thread the needle so that everything just magically lines up into an elegant solution with no downsides... but that doesn't mean I can find it.)

Another fun issue is that some commands parse permissions as -m options, but chmod parses them as a positional argument. Permission parsing is fiddly enough (octal, or special letter and punctuation combos) that I want to add it to lib/args.c, but I'd need to make it so non-option arguments could reuse the parsing logic.

After thinking about it a bit (and working on "cal") it occurs to me that an option of type " " could grab a non-option argument and do the normal suffix processing. I.E. " #" could do the "parse as decimal integer and stick in the next global slot" thing a non-space option would do, except for the next non-option argument.

Of course looking at "cal", there's some obvious fiddliness here. What to do about optional arguments? In the case of an insufficient number of arguments, should the list be processed from the left or from the right? Hmmm... All the "[file...]" entries go left to right (and there's already loopfiles() for that).

Files that just have one optional argument (newgrp, nl, uudecode, id, tail...) don't care about order. The "tr" command seems to go left to right, as do basename, split, and uniq. But cal and uuencode go from right to left.

Dealing with a mandatory argument is just "<1 :" and then it complains if it's not there, grabs it if it is. The real advantage of auto-parsing args into globals this way is for non-string types, ala "<1 #", although stuff like chroot might get slightly simpler by using it, hard to tell...

And then there's "env", which is easy to parse by hand and really SUBTLE to get the automated infrastructure to do. It goes:

env [-i] [name=value...] [command [args...]]

And the trick here is that the generic command parsing infrastructure can't tell an assignment argument "x=y" from a non-assignment argument, but checking for "-i" has to stop at the first non-option argument, with name=value considered an option. What happens if you do:

env PATH=/bin TERM=ansi -i ls

Keeping in mind that ls can also have -i (to print inode numbers), so env can't eat an -i past the "ls". Huh. Just tried it on the gnu version and it complained about -i: "no such file or directory", which means I'm paying more attention to these sorts of issues than the FSF did. :)

Oh well, if they got away with it, I can ignore it too. The -i must come before the first blah=blah assignment in env. That makes life easier...
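A minimal sketch of that rule (assuming the setenv/exec work happens in the caller; nothing here is checked-in code):

#include <string.h>

// Scan env's arguments: -i only counts as an option before the first
// name=value assignment (matching what the gnu version turned out to do),
// and anything else marks the start of the command.
char **scan_env_args(char **argv, int *clear)
{
    *clear = 0;
    for (argv++; *argv; argv++) {
        if (strchr(*argv, '=')) break;   // assignments start here
        if (strcmp(*argv, "-i")) break;  // not -i: command starts here
        *clear = 1;
    }

    return argv;  // assignments (if any), then command and its arguments
}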

On the non-option argument grabbing order issue, I can always stick a prefix on the string indicating what order to grab arguments in (left to right or right to left) when there aren't enough for the spaces. It just seems a little more subtle than I'd like. (And actually, implementation-wise combined with stopping at the first non-option argument, left to right is much easier to implement... Hmmm. Well, not if I add a loop to move 'em and zero out the other entries...)
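That loop is only a few lines. A sketch, assuming the parser already dropped the values left to right into slot[0..got-1] out of "want" total slots:

// Shift the parsed arguments so the rightmost slots are filled and the
// leftover slots on the left get zeroed: for cal's "[month] year", one
// argument becomes the year (slot 1) instead of the month (slot 0).
void right_align_args(long *slot, int want, int got)
{
    int i;

    for (i = want-1; i >= 0; i--)
        slot[i] = (i >= want-got) ? slot[i-(want-got)] : 0;
}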

Of course I'm running low on punctuation characters to give special meanings to in option parsing. (Letters can be flags, so punctuation gets to indicate what to do with the flags.) So far, special meanings have already been assigned to ":*#@()|^+~![]<>?&", and for permissions I'm probably going with "%". Space makes sense as "invisible sandwich flag", but what says "reverse direction when there aren't enough arguments"? What's left... "\\\"/{};'`$,.=-"

I guess the minus sign is the logical choice here...

(Option parsing is the kind of generic infrastructure everything else has to take for granted. That makes it hard to do, because you've got to get it RIGHT...)


November 19, 2011

Toolbox has the following *.c files, which I'm going to assume are commands:

alarm exists powerd readtty rotatefb setkey syren toolbox cat chmod chown cmp date dd df dmesg getevent getprop hd id ifconfig iftop insmod ioctl kill ln log ls lsmod mkdir mount mv netstat newfs_msdos notify printenv ps r reboot renice rm rmdir rmmod route schedtop sendevent setconsole setprop sleep smd start stop sync top umount vmstat watchprops wipe

Really, I recognize about half that. The non-SUSv4 stuff of interest is probably dmesg, newfs_msdos, reboot, insmod/lsmod/rmmod, ifconfig/route, top, netstat, sync, vmstat, and mount/umount.


November 18, 2011

I triaged the SUSv4 utility list and got the list of commands to implement. The easiest way to look at it is in terms of commands NOT to implement, at least not right now, and the reasons why.

This same standard has been ratified by several different standards bodies: Posix 2008, Open Group Base Specifications issue 7, and the Single Unix Specification version 4. It's the most recent version, approved in 2008, and yet it is chock full of obsolete crap.

SUSv4 specifies an entire obsolete source control system called "sccs" which is so ancient it predates CVS. Unless some new project has inexplicably used this in the past 30 years, these commands can all be ignored:

sccs: admin delta get prs rmdel sact sccs unget val what

SUSv4 also specifies a bunch of commands for manipulating a batch processing queue. Maybe these would be worth implementing later, but batch processing really isn't a priority for me.

batch: batch qalter qdel qhold qmove qmsg qrerun qrls qselect qsig qstat qsub

Some commands are just plain obsolete; often replaced by newer commands like gzip, scp, and vi, or because infrastructure moved out from under it (printing is generally done in postscript or similar now, C replaced fortran). The biggest judgement call here is asa, which isn't hard to implement but does an obscure action only of use to fortran programs. (Gentoo includes it, Ubuntu doesn't.)

obsolete: asa compress ed ex fort77 pr uncompress uucp uustat uux

Now we're out of the "obsolete" category into the "not toybox's problem" area. Another batch is build tools, which are great but including a compiler and linker is not part of toybox's mandate. (I keep wanting to do a qcc, I.E. a C compiler based on tcg. Someday I'd love to bolt qemu's Tiny Code Generator to either sparse or tinycc's front-end... but not this week. It's really too bad the current tinycc is a moribund windows project that's no closer to building a bootable linux kernel than it was 5 years ago. If it would just DIE already I'd restart my fork, but being overshadowed by a project that's persistently less capable isn't a fun use of my time.)

build: ar c99 cflow ctags cxref gencat iconv lex m4 make nm strings strip tsort yacc

A bunch of other commands have to be built into a shell in order to work; they can't be implemented in a separate executable. (Generally they manipulate the shell's process context, and a child process can't access its parent process's context, just its own.) So a list of shell commands gets bumped until I get back to writing toysh:

shell: alias bg cd command fc fg getopts hash jobs kill read type ulimit umask unalias wait

A few more miscellaneous ones get grouped:

sysv-ipc: ipcrm ipcs

internationalization: iconv locale localedef

And that leaves me with the following general-purpose commands:

todo: at awk basename batch bc cal cat chgrp chmod chown cksum cmp comm cp crontab csplit cut date dd df diff dirname du echo env expand expr false file find fold fuser getconf grep head id join kill link ln logger logname lp ls mailx man mesg mkdir mkfifo more mv newgrp nice nl nohup od paste patch pathchk pax printf ps pwd renice rm rmdir sed sh sleep sort split stty tabs tail talk tee test time touch tput tr true tty uname uniq unlink uudecode uuencode vi wc who write xargs zcat

Of which, the following are already implemented in toybox:

done: cat cksum cp df dirname echo false nice patch pwd rmdir sleep sort tee true tty uname wc

These are low hanging fruit:

easy: basename cal cmp comm date du env expand fuser head id kill link ln logger logname mesg mkdir more mv nice nohup rm rmdir split tail touch uniq unlink who xargs

These are medium difficulty:

medium: chgrp chmod chown cut dd diff expr find fold join ls newgrp nl od paste pathchk printf ps renice sed stty tabs time tput tr uudecode uuencode

And these are each a rather complicated project in their own right:

fiddly: awk file sh vi

As an aside to show that SUSv4 isn't the end all and be all of the Linux command line, Toybox already implements the following commands which aren't in the SUSv4 utility list:

extra: bzcat catv chroot chvt count dmesg mkswap netcat oneit seq setsid sha1sum sync which yes

Now I need to triage android's toolbox, and see what the aboriginal linux build uses (including the full linux from scratch build).


November 17, 2011

Wow, guys in charge of both the US and Europe have really managed to crit-fail understanding of basic macroeconomics. They don't understand what money IS, let alone how it works, let alone how economies use it.

Unfortunately, every time I sit down to explain, I write twenty paragraphs of backstory and look up and the sun's gone down. Possibly I should podcast the explanation or something. So many things I want to do, but sitting in a cubicle just drains all my energy even if I do nothing but SIT there. (I like my job. The money's good, the co-workers are nice, the project's interesting... but the new building is all cubicles. They make me listless and uninspired even after I go home, and it's starting to bleed into weekends...)


November 15, 2011

Digging up ancient issues from toybox development, one of which is an interesting design issue with querying the terminal size.

When you're on a serial console (happens a lot in the embedded world), the local tty device doesn't know the width and height of the window at the far end, so ioctl(TIOCGWINSZ) can't report it to you.

If the term program at the other end supports ansi escape sequences (everything does, including xterms), there's an escape sequence you can use to ask it, but there are multiple levels of non-obviousness to the implementation.

The escape sequence itself is "\e[6n", to which the terminal program responds with another escape sequence, "\e[YY;XXR", where the "YY" part is a decimal number with the current cursor's Y location, and XX is the cursor X location. (Both decimal, top left is 1;1.)

Since what we want is the size of the screen, we wrap that in some more commands: saving the current position, moving the cursor 999 down and 999 to the right (which should stick it at the lower right corner), querying that position, and then jumping back to the saved location. The full escape sequence is therefore "\e[s\e[999C\e[999B\e[6n\e[u".

The problem is, the response comes back from the terminal program on stdin, along with whatever else is coming in on stdin. There could be a delay of a significant fraction of a second (especially through a serial port, even when you aren't overcommitted and swapping), and there's no guarantee the terminal will actually respond. So blocking and waiting for a response isn't the greatest idea, and can eat other input the user has queued up.

This means you need to do a nonblocking read, assemble an ansi sequence a piece at a time (luckily the term programs generate these atomically so you don't get part of an ansi sequence with user-typed keys in the middle of it), and keep any other data you get for use later. The logical place to do this is in the line editing code the shell has to have anyway, which responds to cursor keys, page up and down, and so on.
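Roughly what I have in mind, as a sketch: assume stdin is already in raw mode, and punt the "stash unrelated input for the line editor" part that real code needs.

#include <stdio.h>
#include <unistd.h>
#include <sys/select.h>

// Ask an ansi terminal where its lower right corner is: save cursor, jump
// way down and right, request the cursor position, restore the cursor.
int ansi_probe_size(int *xx, int *yy)
{
    char buf[16];
    int len = 0;
    fd_set fds;
    struct timeval tv = {0, 250000};  // give up after 1/4 second

    if (write(1, "\033[s\033[999C\033[999B\033[6n\033[u", 22) != 22) return 0;

    FD_ZERO(&fds);
    FD_SET(0, &fds);
    while (select(1, &fds, 0, 0, &tv) > 0) {
        int i = read(0, buf+len, sizeof(buf)-len-1);

        if (i < 1) break;
        len += i;
        buf[len] = 0;
        // Real code must cope with queued user input arriving first;
        // this sketch just hopes the response shows up by itself.
        if (sscanf(buf, "\033[%d;%dR", yy, xx) == 2) return 1;
    }

    return 0;  // no answer: fall back to TIOCGWINSZ, $COLUMNS, or 80x24
}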


November 14, 2011

Created a new mailing list for toybox, and pulled the trigger on the BSD relicensing.

Still nothing from the Sparc guys.


November 13, 2011

I'm trying to decide whether to relicense Toybox under the OpenBSD 2 clause license, or under Creative Commons Zero. The first is the simplest option, the second would maximally piss off "RMS lite" (I.E. Bruce Fscking Perens) in a "hey, Project Gutenberg predates the FSF by many years you irrelevant waste of oxygen" way. (Yes, I am still bitter.)

Tim Bird poked me a couple days ago wondering if I was interested in working on a competitor to busybox. I reminded him that I spent over a year doing that, and he went "oh".

The problem Tim's dealing with is Android's "no GPL in userspace" edict. Google and a bunch of other companies responded to GPLv3 the same way I did (DEATH FIRST). The Jar-Jar Binks of licenses overshadowed the original, the same way the second and third Matrix movies made the first one less memorable, even before the FSF and SFLC teamed up to drive Cisco/Linksys out of the Linux business (Mepis II) in what can only be described as a Tom Cruise jumping on a couch style "career limiting moment". All this had knock-on effects elsewhere (such as spawning the LLVM and PCC development projects, to replace gcc).

From a purely pragmatic perspective: I spent over a year doing busybox license enforcement, and a dozen lawsuits later I'm still unaware of a SINGLE LINE OF CODE added to the busybox repository as a result of this, unless you count this:

commit eb84a42fdd1d1c2e228dcd691a67b8ad5eeda026
Author: Rob Landley
Date:   Wed Sep 20 21:41:13 2006 +0000

    The Software Freedom Law Center wants us to add a copyright notice to the
    generated binaries, to make copyright enforcement easier.  Our liason with
    them (Bradley Kuhn) suggested the following text:

My only real concern about a BSD license is it lets for-profit corporations hire your developers away to work on a proprietary fork (as I ranted about here and here). This is why BSD the operating system has never amounted to anything: Sun looted them in 1982, BSDi looted them in 1989, and Apple looted them a third time around 1997 and never stopped. (None of which explains why Free, Open, and Net are separate projects.)

But honestly, I think the FSF has now made the GPL more of a liability than an asset. I've spoken on panels defending the GPL, but GPLv3 was a career limiting move. LLVM and PCC and Android are all organizations that were fine with GPLv2, until GPLv2 got painted with the same brush as GPLv3 and the contamination spread to cloud the older license.

Android is important because smart phones will take over the way the PC took over from mainframes and minicomputers (again, already ranted in detail about mainframe -> minicomputer -> micro/PC -> smartphone).

I am very interested in obsoleting GNU bloatware: it's crap. From a purely technical perspective, the FSF _sucks_at_writing_code_. They're horrible, and their stuff should die. That's why I got into BusyBox in the first place: I want GNU-less Linux.

If hobbyists had any interest in forming a real open source project around toolbox, it would have happened by now. I've _always_ thought Toybox had better engineering foundations than BusyBox, and I kept the code much cleaner than BusyBox has become. I managed to push a little upstream, but I no longer find BusyBox a pleasant work environment.

The problem with Toybox is it didn't have a niche, BusyBox was well established and "good enough" that incremental improvements wouldn't displace it...

But a "minicomputer to PC" style switch can.


November 12, 2011

I found the sparc thing, but don't have a fix for it yet. I bisected it to the commit that's _triggering_ the problem, but the actual problem is sparc's relocation code hitting a type it doesn't understand.

Sparc apparently has its own relocation fixup thingy, in arch/sparc/mm/btfixup.c, just like uClibc's dynamic linker. I need to sit down and go through all that crud and properly wrap my head around how it works. I know the theory, but the full chain of "data originates here, gets modified here, winds up here" is a bit fuzzy in places. Especially since there are multiple TYPES of relocations (function calls can be short and long and indirect, the third of which is basically a function pointer call; data access can be all sorts of crud with different sizes and structure members... And then there's REL vs RELA which I understand every time I look it up but never _remember_ afterwards.)
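(For my own future reference, the REL vs RELA difference is one field, lightly abbreviated here from elf.h: REL entries store their addend at the patch site itself, RELA entries carry it along explicitly.)

typedef struct {
    Elf32_Addr r_offset;   /* where to patch */
    Elf32_Word r_info;     /* relocation type plus symbol index */
} Elf32_Rel;

typedef struct {
    Elf32_Addr r_offset;
    Elf32_Word r_info;
    Elf32_Sword r_addend;  /* the explicit addend RELA adds */
} Elf32_Rela;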

In this case: where is this list of relocations coming from, and why is it being done? The actual instruction it's barfing on is restore, which is conceptually kind of horrible. I'm not sure if this is "we never hit this, so never bothered to implement it", or "you need to feed an -mtune to your compiler to not generate this instruction for this architecture variant, even though it's otherwise worked for years", or "the relocation table did something funky and is pointing at the wrong instruction"...

Hmmm... Confirmed that restore does NOT need to be relocated, so the relocation table is including a wrong instruction. Why is it doing that? Where's this table coming from, and why is an innocuous change to a totally unrelated part of the codebase corrupting it...


November 11, 2011

Ok, doing a git bisect on the kernel (to figure out why sparc stopped booting between 3.0 and 3.1), watching the daily show, and letting thunderbird do its incredibly inefficient thing (whatever it's doing) may be a _bit_ much for this poor little netbook.

United Mug has finally run out of pumpkin bagels. I has a sad.

Inexplicably, they have _not_ run out of Zombie Sinatra yet. He was a Justin Bieber style pretty boy of his day back in his 20's, who also got a lot of promotion due to his ties to organized crime. Both stopped mattering years ago.

In case you were ever wondering how LITTLE talent that idiot had, Seth MacFarlane (who does all the voices on "Family Guy") got the same training (from Sinatra's old voice coach) and put out an album that sounds BETTER (at least the snippets played on his Fresh Air interview). That's how little talent Frank Sinatra had.

That interview also explains where Lady Gaga got the theme of her song "Show me your teeth", it's apparently one of the things the voice coach tells you to do to change the quality of your singing.

In other news, still sick. Something like week three.


November 6, 2011

I meant to get an Aboriginal Linux 1.1.1 release out last weekend, but in addition to the 3.1 kernel I updated the Linux From Scratch native build from 6.7 to 6.8. (Yeah, 7.0 is out, but I can do that later.)

Let's see: I noticed that gettext-stub was installing files that could only be read by root. The native build is done by the root user in the emulated system (no root required on the host, but you get root in qemu), so this hadn't come up before, but it should still be fixed.

The util-linux package upgrade broke --disable-nls. Their header has an #ifdef block that makes their _() macro a NOP... but they call gettext() directly without the macro in several places. This leaks calls to the host's gettext, which is in theory ok since I installed gettext-stub so it should become a NOP at that level... but in the --disable-nls case they trim libintl.h out of the #included headers. So the util-linux release can't have TESTED --disable-nls, because it doesn't work, and yet it shipped. Wheee...
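A toy reproduction of the breakage, reconstructed from memory rather than quoted from their tree:

#include <stdio.h>

// Their nls header does something like this:
#ifdef HAVE_NLS
#include <libintl.h>
#define _(str) gettext(str)
#else
#define _(str) (str)  /* --disable-nls turns _() into a NOP */
#endif

int main(void)
{
    puts(_("this works either way"));

    // But a few call sites skip the macro and call gettext() directly.
    // With NLS disabled libintl.h was never included, so gettext() is an
    // undeclared function and the build breaks.
    puts(gettext("this one doesn't"));
    return 0;
}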

Now it's on to the perl build, which takes a while even natively, and it's about time for me to go back to work...


November 5, 2011

I wrote up, then deleted today's rant, which was all venting about a week long cold followed by having to work the weekend, while still sick.

I note that a neti pot is one of humanity's stranger inventions, but it does seem to help. I finally got some decent sleep last night, anyway. (Still have to go back to work at noon tomorrow, but with the end of daylight savings time I get an hour more of morning than I expected.)


November 1, 2011

Well, I think I can fairly safely say that Sprint uses fundamentally inferior technology to T-Mobile. (How can I have 3 bars of signal and no internet?) Still, they haven't jerked me around the way T-Mobile did.

So, using U-boot under QEMU involves fighting some of its design assumptions. Of course it's been done, in spite of the u-boot developers. So what's going on?

When you power on real hardware, a reset circuit switches off everything but the CPU (picking a single "boot processor" on SMP systems), and puts that CPU in a known state as soon as power levels stabilize and everything's charged up. (This is generally a simple state: its oldest backwards compatibility mode with optional features like the MMU switched off.) This means the instruction pointer is initialized to a known value, so the processor starts running code at a known location. The board generally maps ROM or flash memory there, something nonvolatile containing early boot code.

The first thing the early boot code has to do is initialize the DRAM controller: until that's supplying the right voltage and performing the periodic refresh at the right rate for whatever memory modules the board is currently using, reading from DRAM will return random garbage. (This can involve a binary search for the right values for a given set of chips, waiting for circuitry to stabilize so you need to fiddle with timers and delays...)

The _tricky_ part is that C code expects to have various chunks of memory (the stack, the heap, global variables) available, so for years DRAM initialization code had to be written in hand-coded assembly language, keeping all its state in processor registers until the DRAM controller was up. Then the LinuxBios project (now Coreboot) figured out that you can abuse most processors' L1 data cache to act as a very small stack with a few hand-crafted TLB entries. So the assembly setup does the cache-as-stack trick, and then jumps to C code to initialize the DRAM controller (generally using one big function with a few local variables, to keep stack usage down).

Once DRAM is initialized, the startup code goes into stage 2, where it takes inventory of the hardware, switches on the bits it needs to load the kernel into memory and attaches simple little drivers to them (could involve a serial console, network pxeboot, spinning up a hard drive, etc), hands extra data over to the kernel (kernel command line, flattened device tree, external initramfs, etc.), and then jumps to the start of the kernel. The rest of hardware setup (page tables for the MMU) is the kernel's job, plus the kernel has its own drivers for all the hardware it uses.

In traditional PCs, the program that did all this was called the "BIOS" (a name that came from the OS DOS copied). It was implemented in 16 bit 8086 compatible assembly language (see "starts in oldest compatibility mode", above), and was split into two parts. The early boot part (implemented by the open source Coreboot project today) just did the DRAM initialization, and then the second half (implemented by the open source "Seabios" project, which used to be called Bochs Bios) handles loading a single sector (512 bytes) from a designated "boot device" and jumping to the start of it, and then providing callbacks (via software interrupts, int 0x10 for the screen and int 0x13 for the disk) through which you can borrow its cheap plastic drivers to print status messages and load _more_ stuff from the boot device, and query what it knows about the hardware (so how much memory is installed in this box, anyway...), and so on.

The BIOS boot sector always contains a loop to call the "load this sector and put it here in memory" callback a bunch of times, because that's about all you can do in 512 bytes. Generally, these days what the boot sector actually loads is a more competent boot loader (grub, lilo, syslinux...) that can present a menu of boot options, supply a command line to the kernel, parse filesystem metadata and load actual _files_ instead of a numerical list of sectors...

Platforms other than x86 have their own traditional bootloaders. One of the big ones is "OpenBios", which the pointy-haired mainframe types passed around like a disease for many years. Its legacy is the device tree data format, for describing the hardware in a system. Other than that, don't go there. (Its callbacks were implemented in the "Forth" programming language.) Also, when Intel perpetrated the Itanic they felt embarrassed using 16 bit 8086 code from the 1970's to initialize the thing, so they came up with a huge overcomplicated THING called ACPI which was sort of like a Java VM except it wasn't compatible with Java, and then there's EFI and System Management Mode... It's a mess. (If you assume proprietary BIOS developers can't write working code to save their lives, you're still too optimistic.)

U-boot came out of the embedded world, when an ARM bootloader and a PowerPC bootloader were combined to form Voltron the Universal Bootloader. (Except in German.) It's a project that builds a ROM image that does it all: DRAM initialization, drivers for common hardware (mostly copied from the Linux kernel), a command line to control it with, some simple POST (Power On Self Test) code...

The problem with U-boot is it _does_ want to do it all. Getting it to do less than all confuses it. It wants to initialize a DRAM controller. Then it wants to copy itself out of (slow) FLASH into (fast) DRAM, before starting the second half of its tasks.

QEMU emulates processors and attached hardware, but doesn't go into details like being clock-cycle-accurate (it just goes as fast as it can), and it doesn't emulate esoteric details like a DRAM controller. The host hardware is already doing DRAM refresh for it, so it just allocates some memory out of that and uses it. (There _are_ types of memory, such as SRAM, that don't need periodic refresh. They're used in tiny quantities for things like CPU cache, but are too expensive, bulky, and power hungry to scale up anywhere near system memory size.)

So if you run U-Boot under QEMU, ideally you want to disable the DRAM init code and the copying. It's not that this is conceptually hard to do, it's that the developers think you're weird for wanting to.

The code seems to have gotten over this, maybe they haven't updated the FAQ for several years...


October 29, 2011

Blah. Totally out of it today, and most of yesterday. My sinuses have been killing me for days, weather changes I think. (Ok, and I ate a lot of sugar at thursday's halloween party. Plus Fade is just recovering from a nasty stomach bug that sidelined her for a couple days...)

Biked to chick-fil-a, which is probably more exercise than I've gotten in the rest of the week combined. That cleared my head a little, but not enough. The weather's beautiful, but it's in the halfway state where I'm tempted to wear a jacket, but then overheat biking with one on yet can't quite carry it comfortably when not wearing it...

I have many things I should do this weekend, but I'm not sure I'm up to any of them just now...


October 26, 2011

Funtoo's "emerge --sync" uses git instead of rsync to update its metadata, meaning instead of building rsync in the bootstrap I need to build git.

Building git under the aboriginal root filesystem isn't actually all that hard. Since kernel.org is still swiss cheese the code's hosted on a google site. Grab the 1.7.7.1 tarball, extract it, ./configure, and build.

The build breaks because it needs zlib. Note that ./configure didn't break, THE BUILD DID, but all together now, chant "autoconf is useless" and move on.

Really, it's understandable that git wants to store the repository in compressed format and considers this enough of a win it's not optional. As prerequisites go, zlib is one of the least painful, and I've already got a zlib build in both lfs-bootstrap.hdc and static-tools.hdc, so grab it and... it didn't find it because I have to specify --prefix=/usr or else the headers go into /usr/local/include which isn't in the search path. Ok.

Moving on... it wants lots of other packages. Reading the makefile I see NO_TCLTK and NO_PYTHON and NO_ICONV and NO_PERL all of which I can set on the MAKE command line... and then I have to set them again on the make install command line. Is there a way to avoid having to specify this twice? Grep says that configure can set them, so why didn't ./configure notice that they weren't there? Because it expects me to manually tell it they're not there, via --without-tcltk, --without-python, and --without-iconv.

Honestly, why does autoconf bother to run any tests?

There is a --without-perl but it complains "git cannot run without perl". (It is wrong.)

Alright, so the git build is:

./configure --without-tcltk --without-python --without-iconv
make NO_PERL=1
make NO_PERL=1 install

And now I can "git clone git://github.com/funtoo/portage-mini-2011.git" and it worked. Victory! (Late for my day job!)


October 24, 2011

Linux 3.1 is out. (Most of kernel.org isn't back yet, but they put the tarball up there where I can wget it, so it counts.) Meaning I need to do an Aboriginal Linux release soonish. Meaning I need to update the perl removal patches and find some time to regression test all the targets...


October 23, 2011

Daniel Robbins (founder of Gentoo, currently maintainer of Funtoo) was in Austin today, and had lunch with Tryn and me.

(Aside: Tryn was Mark, he changed his name, and I suck at names. Epically. I am so bad with names I still call Boston Chicken by the original name I learned even though they changed it over a decade ago. I _LEARNED_ONE_ and now I have to do it again and this is a problem for me. I note that when Fade changed her name it was to legally change it to the one she'd been going by since before I met her, so no action was required on my part there. I almost never refer to people by name in person because using it when I do know it would just highlight the times I _don't_. Speaking of which, I dunno if Daniel Robbins goes by "Dan" or not... Why don't I just refer to everybody by their irc/email handles for a bit?)

Mirell explained the original Gentoo From Scratch stuff to drobbins, and I showed drobbins the hdc build stuff in Aboriginal Linux. Drobbins in turn walked us through the new simplified profile format and his restart of portage currently written entirely in Bash (using features present in the old 2.x version I'm using). Highly educational. My brain's full.

We've largely been on the same page since I first talked to drobbins at the start of September, and since then he's added me to the core team mailing list. He's been doing a lot of work at his end to, for example, simplify the portage metadata so you can go KEYWORDS="*" and the individual ebuilds don't have to be annotated with each individual type of hardware they've been tested on so far. When I explain something to drobbins, he almost immediately gets it.

The limiting factor here is my ability to learn how Gentoo works. I was sort of up to speed on it a year ago, but not anymore. I tend to learn in a depth-first rather than breadth-first manner, which can be annoying at times. I need to do a Funtoo From Scratch on top of the Aboriginal Linux root filesystem, which means I need portage working, with a profile, an ebuild tree, and all the /etc/make.conf and such configuration. The new profile stuff collates a lot of that in exciting ways, but I still feel I only understand about half of it.

Still, I have yet to find a better way to learn something than to have the guy who created it sit down and explain it to you. (Mirell seems somewhat disgusted by the number of times I've pulled this off. He also seems somewhat interested in Funtoo now, which would be nice. When something catches his attention he finishes stuff way way faster than I do, and he DID get a fully-working Gentoo From Scratch where mine was only about 2/3 of a solution...)


October 19, 2011

Pondering doing a podcast, since my new phone can actually competently record audio. (I'm told I should get one of these anyway.) I used to teach night courses at ACC, because it's good to have an outlet for my otherwise annoying tendency to go into "lecture mode" on obscure topics. (That's why I keep poking at grad school, so I could make that my day job and get it out of my system, and focus my programming energies on the open source hobby stuff. It's vaguely a sort of retirement plan, but I'd have to spend a lot of time and money getting to that point...)

I also went through and added anchor links to my old blog entries, so if for some reason I need to link to an obscure busybox mount bug I fixed in 2005, I can do that. More importantly, since the date entries are now in the same format as the current stuff, I can run the RSS generator script against it, and use the resulting rss file to import the blog entries somewhere else.

Not quite sure where would be good. Fade gave me a dreamhost invite many moons ago, and I keep meaning to copy my old livejournal stuff there. Tumblr isn't quite right and "landley" is taken there. (Either my attempt to create an account glitched, somebody typoed, or my brother or father poked at it and then never used the result. Oh well, I don't exactly miss a service I never used.)


October 18, 2011

Linux git 86250b9d12caa1a added new system calls to powerpc, which is great in that it provides cleaner mechanisms that work more like the other platforms do, but it also means that the dropbearmulti-powerpc in the aboriginal extras directory won't run on kernels before about 2.6.36, dying with an "unimplemented syscall" error.

Easy enough to abuse USE_UNSTABLE to build against 2.6.34 or so (and I threw the resulting binary in extras as dropbearmulti-powerpc-old, although I doubt I'll make a habit of that), but this shows some fiddliness with USE_UNSTABLE.

For one thing, the naming is inconsistent. The tarball is prefixed with "alt-", the extracted source directory is prefixed with "alt-", the sources/patches files that get applied to it are prefixed with "alt-"... But the environment variables controlling it are "UNSTABLE". I should pick one, and the decision which to pick is made slightly easier by the fact that there's nothing _unstable_ about 2.6.34, it is in fact OLD. It's just an alternative version.

So, change the config entry to "USE_ALT", change the download.sh lines pointing at URLs from "UNSTABLE=" to "ALT=", and change the various plumbing and readme bits to use that.

NOW we get into dealing with the patch stack that applies to old kernels, and is needed _by_ old kernels. In theory I could just ALLOW_PATCH_FAILURE=1 but the partially applied patches break the build when I try that. (Alas, the gnu patch doesn't reject the WHOLE patch the way the one I wrote in busybox does, it allows _some_ hunks to apply and the result is not happy.)

The patches which break pretty much every package upgrade are the perl removal series (which you can bypass with HOST_EXTRA=perl, although the resulting build is no longer self-hosting; that's a side issue), and the arm versatile patch (not needed for powerpc). I really need to flush more of these patches upstream. And I need to test 3.1-rc10...

Yeah, not really "easy enough" yet. I need to file off some more rough edges...


October 17, 2011

Monday, new office, mostly unpacked, totally exhausted.

Adjusting back to the morning schedule (where I get up before work while it's still night, because I'll be too fried after work to do anything useful anyway) is a hard thing to do on Mondays. I think I got 4 hours of sleep last night. (But if I don't do this, I wind up staying up late all week and not getting any non-work programming time.)

Also, my caffeine tolerance has gotten high enough that a rockstar and two cans of diet mountain dew only lasted until about 1pm.

Just wasted a couple minutes trying to figure out how to turn on a board that wasn't plugged in. Right, time for can #3!

(Side note: this morning, work had essentially unlimited quantities of krispy kreme donuts, to welcome us to the new office. Then they added danishes for people who wanted their sugar in a less obvious form. Then at noon they had pizza. This is in addition to the free snack room work always has, and I used one of those monopoly piece things for a free Steak Egg and Grease Bagel at The Donald's this morning (it was tasty). I'm pretty sure I hit my calorie budget for the day before the end of lunch. Yeah, substituting food for sleep, although I felt FINE until the sun came up. Then I went "ZZZZzzzz...." and have been there since.)


October 14, 2011

Today T-mobile cancelled my data plan. The bill is paid, and I can still make calls, but the phone no longer has internet access.

You'd think a phone company might have phoned me about this, but halfway through downloading an MP3 it just stopped, and now any web page I go to gets intercepted by "web2go" and says "Subscription Upgrade Required".

So I called the 611 support line. It's an automated robot, which has no option for the problem I'm seeing, and will not give me a human. I spent half an hour trying to get a human: it hung up on me a dozen times.

So I go into a T-mobile office. After waiting half an hour THERE to speak to a human, they admitted bafflement and phoned a special magic tech support line that 611 no longer connects to. Yes, I had to PHYSICALLY GO to the T-mobile office so I could speak to a support person ON THE PHONE. I spent another ten minutes or so on hold waiting for that guy.

I've had the same data plan since 2008, but it went away today. They have another plan that's the same price but it's metered to 2 gigabytes per month.

Why would I be pissed about rate-limiting when I spent months stuck with only "edge" and am still lucky to get that in my office? And did I mention that their basic "no contract, unlimited everything" (but with rate limiting) is over $20/month cheaper than what I'm paying now?

Let's sum up: T-Mobile took my internet away without warning so that I could come in to their office to speak to someone on one of THEIR phones so I could agree to have my connection metered. They've also permanently removed my ability to get a human being on the phone when I have a problem.

It's really important to me to be able to SPEAK TO A HUMAN because I've had to call them five times this year to get them to TURN SMS OFF YET AGAIN when T-mobile itself keeps sending me SMS spam.

This evening I talked to a sprint guy. They're CDMA instead of GSM, so I'd have to buy a new phone, which means I would have to stop using Causal (the Nexus One Chris DiBona gave me at CELF last year). But really, I'm pretty pissed at T-Mobile right now. I was a happy Sprint customer for ten years (until 2008), and I didn't leave because they were stupid but because their internet plan was behind the times.

Now they're the only one offering flat rate unlimited internet with no bandwidth cap. And the ability to speak to a human through the phone.


October 12, 2011

Today I was told that a company is switching its (non-phone) hardware from Linux to Android because "everything is going that way" to the point where soon "no embedded devices will be based on vanilla Linux".

I wonder if the inevitable resulting crash will drive them back to a more vanilla Linux, or away from linux entirely? I also wonder how long it will take.

The fact that this _will_ blow up in the face of everybody doing it strikes me as inevitable as the 1980's fragmentation of proprietary Unix, or the failure of Monterey, the 86open project, OpenSolaris, and OpenDarwin. Sometimes, you can just see it coming. Google's freewheeling exercise in using "not invented here syndrome" to reinvent the world in the absence of peer review will collapse under its own weight, because they obviously don't understand how open source development _works_.

Sometime when I've got more time I need to write up why. It's the same old three waves, editorial function vs sturgeon's law thing I gave a talk about at Flourish in Chicago a year or two back, plus the "source under glass" trailing releases and abandonware stuff that means code non-Google people write doesn't get integrated upstream...

But today, I'm busy.


October 10, 2011

Wow, qemu-0.15 is slow. The loopback network is doing about 300k, and running "time tar tvjf linux-3.0.tar.bz2" says it took 13 minutes, 16.28 seconds. Running it on the host (same busybox tar binary) took 1 minute 17.47 seconds. That's roughly a tenfold slowdown.

Is it because I upgraded QEMU? Something about the 3.0 kernel not interacting well with the emulator? Because I switched from the ext3 to the ext4 driver? Or did I just never try it on my netbook (which has a really cheap processor that's uncomfortably CPU cache constrained, so MHZ aside it can slow way the heck down if the working set won't fit in L2)?

Alas, I'm at work today so I haven't got time to track it down right now. (The fact that Slackware is on day 3 of trying to install and reproduce the Aboriginal Linux build means it's not _just_ the root filesystem I put together being slow... but since they're using the 3.0 kernel on the same qemu, that doesn't eliminate any of my theories. In any case, this is not the performance I expected...)


October 6, 2011

Dear xubuntu developers: the beta 2 install of Oozing Ocelot, as converted into a bootable USB key by usb-creator-gtk, fills up the tiny 128 meg /home partition attempting to download updates. This causes the install to screw up before it's even finished asking the user questions.

Worse, after it's done this, attempting to reboot with the same key hangs after "mounting network filesystems" (of which there aren't any), presumably a side effect of /home being full.

Stop it,

Me.

P.S. After making a new key and telling it to use a ramfs to store the data, the installer crashes saying it can't lock "/target/var/cache/apt/archives/lock". It helpfully submitted a bug for me and told me I could comment on the bug in the web browser. The web browser says I need a launchpad login in order to do so. I'm not creating an account to comment on a bug your thing already filed for me: it didn't need the account to _submit_ the bug, and could have prompted me for my comments and contact info before submitting the thing. I very vaguely recall creating a launchpad account years ago, but have no idea what password I used and refuse to fiddle with it.

P.P.S. I logged onto the freenode #xubuntu channel and explained the issue (which took a while due to the tiny phone keyboard; they still have tmobile blacklisted and pidgin in 10.04 on my netbook doesn't know the magic dance to authenticate), and the one active guy said there are just users here, no developers.


October 5, 2011

Why is a unified editorial vision important for documentation? Here's an example.

Back when I maintained busybox, I wrote a FAQ entry about why open source projects only support reasonably current versions of their code. It used to look like this, a general resource any open source project could link to in order to explain why "that version's too old, upgrade and see if the bug still exists" was a standard reply to some bug reports. The same entry now looks like this, with a large busybox-specific digression inserted into the middle of it.

The reason for the change is that after I handed off BusyBox, I stopped maintaining the FAQ. (Partly because the method of updating the FAQ changed: the source control switched from svn to git and moved out of a subdirectory of the main repository into its own web-pages-only repository, and I didn't bother to set up the new configuration since I wasn't doing development there anymore.) Instead I would occasionally post long messages to the list answering some question or other, which Denys would sometimes splice into the FAQ.

So that new text is actually material I wrote... but it's not where I would have put it. The new addition makes that busybox FAQ entry less useful as a resource for open source development in general. There's nothing wrong with the busybox faq being busybox specific, but I had a larger goal in mind, the current maintainer did not, and the result is editorially inconsistent. And that's with basically two people updating it, now imagine dozens working at cross-purposes...

(A lot of what I wrote in that FAQ wasn't particularly busybox specific, most of the tips and tricks section is documentation for general programming topics I couldn't find existing stuff on. What I really need to do is add a lot of the old material I did for busybox to my current FAQ, or possibly split off a truly project independent educational resource on this stuff. Or maybe just poke the man-pages maintainer and go "here". Either way, it's vaguely on my todo list...)


October 4, 2011

Second day back at Polycom and MY BRAIN IS FULL. (A month of back email, much of it with attachments of powerpoint slides, PDFs, word documents...)

But the big timesink is the links to the wikis. I'm not a fan of wikis. I gave a talk once where "I am not a big fan of wikis" was an underlying theme I should have gone into more detail on. Alas, I still don't have time right now to go into proper detail, but a few highlights:

Wikis collect a slush pile, but suck at organizing it. The lack of linear navigation leads to "tabsplosion", which interferes with other things you're doing but can't easily be put aside and restarted from where you left off. Pages with nothing but links to other pages are _annoying_, the logical thing to do is collate them into a single unified non-repetitive index, but if you could do that you wouldn't be using a wiki. When perusing a wiki how do you know when you're done? It's easy to miss stuff, and the pages themselves often have lots of repetition (since there's no obvious single place to find related information) which adds extra work keeping the information accurate and up to date.

If you have a single unified editorial viewpoint, a wiki is the wrong tool. If you don't, attempts to unify only cause further fragmentation. It's the classic "too many cooks" problem, which open source deals with by having project maintainers with veto power, who fight off Sturgeon's Law by rejecting stuff of insufficient quality (often with rejection letters that suggest how the author can improve it and try again). This is the classic editorial role.

Letting everybody edit the wiki is like giving everybody commit access to source control: it takes away the veto power which is the only enforcement mechanism the maintainer has.

The way wikipedia[citation needed] handles this is with a team of editors coming along after each change, to review and clean it up (or revert it) after the fact. This requires A) an army of warm bodies to put in a huge amount of time, B) abandoning the very idea of a single unified vision of the project as a whole, in this case indexing the entries. (They can't even agree on what is and isn't "notable"; things like tvtropes and comixpedia were created due to wikipedia's inconsistent notability policies, based on the personal opinions of a large and changing group of editors publicly fighting "commit wars".)

Wikipedia is the best-case example of wikis, and you can't even easily get an alphabetically ordered list of articles from the live version. (You can from the special subsets which they prepare and polish by hand, but that one only contains about 2000 articles.)

The way you find stuff in wikipedia is via google, or following cross-links, not by having a way to read all of any given topic that doesn't fit neatly onto one page.


October 2, 2011

Aboriginal Linux 1.1.0 is out, building LFS 6.7 for 11 targets. Announcement in the usual place.


September 28, 2011

An email I received today:

hi rob,

i seem to recall you actually did a forensic analysis and rewrote a lot of busybox to make bruce perens go away, is that correct?

bruce perens is now "active" in the open source hardware community, and immediately caused some problems.

history is repeating itself, and he is now (trying to) fork the effort and had already started his own open hardware effort. it was after a series of confrontational and disrespectful emails on our mailing list.

do you have any advice with dealing with him, it's all seems like the same stuff from like 1999, the OSI, free vs open software, rms, etc...

thanks for any advice you can provide.

As Paul Simon sang, "some things never change". I am _so_ glad he's not my problem anymore...

On the bright side, it was a good excuse to re-watch the "How to Survive Poisonous People" video.


September 27, 2011

Back from a week in Arkansas where I helped a friend move into a new apartment (third floor, no elevator), got about a dozen chigger bites (which itch), and got little or no programming done. Oh well.

I need to bang on Funtoo before work starts up again. Where did I leave off...


September 17, 2011

I've now gotten powerpc, all the arm little endian targets, both endiannesses of mips, x86 and x86-64 to build lfs-bootstrap based on Linux From Scratch 6.7. That's pretty much all the "functional" build environments, so it's probably time to do a bit of polishing and tag a release.

The other targets are broken for various reasons that don't directly have to do with any of the Linux From Scratch packages being unwilling to build.

Speaking of device tree stuff, teaching the arm versatilepb board to take more than 256 megs isn't as simple as moving the I/O base address up by changing the #define VERSATILE_SYS_BASE in platform.h and then altering qemu to match. Well, actually altering qemu is easy: it doesn't have such a #define, but all the addresses you need to search and replace are in the same platform init function. Unfortunately, at least the serial driver has its own address hardwired in somewhere else, because with the old qemu at 0x10000000 I get a console with the "new" kernel, and with the new qemu I don't, which means the serial address isn't moving. I spent about an hour fiddling with it, then put it back on the todo list.

In theory, once there's device tree stuff I can come up with a reasonably generic PC-like board for all architectures, the way I have a baseconfig-linux and baseconfig-uClibc. No idea how that'll work in practice, but I have high hopes. Still, that's kernel and qemu stuff, not aboriginal linux stuff per se.

The last mips build break was util-linux-ng including a horrible old version of lsof which had an intentional #error statement kill the build on big endian mips, despite no obvious endianness issues at a glance through the source code. This command is a horrible legacy pile of crap, as demonstrated by its man page (large portions of which don't even RENDER right in xubuntu 10.04). Said man page starts with:

DESCRIPTION
  Lsof  revision 4.81 lists on its standard output file information about
  files opened by processes for the following UNIX dialects:

  AIX 5.3
  FreeBSD 4.9 for x86-based systems
  FreeBSD 7.0 and 8.0 for AMD64-based systems
  Linux 2.1.72 and above for x86-based systems
  Solaris 9 and 10

Which is pretty much "kill it with fire" territory right there: either be Linux-specific or be target agnostic, don't play whack-a-mole. Internally, this monster uses ptrace instead of "/proc/$PID/fd" like you'd expect it to on Linux, and I spent fifteen minutes reading the man page to come up with a test case to see if mips big endian did in fact work after I hit the build with:

sed -i '/#error /d' src/peekfd.c

Before giving up and deciding I don't care whether something that horrible actually does work or not, just that it builds. I'll await bug reports from the field before spending more attention on it.

That sed also made armv4l (OABI) build, which had another "oh noes" #error refusing to build there too. That target probably IS producing a broken lsof, given the way it's doing ptrace and peeking in registers (OABI vs EABI = different registers), but I don't care enough to try to fix it. This command needs to be rewritten from scratch, not patched. I no longer bother with the coding equivalent of "flying buttresses" unless paid to do so; I would much rather ACTUALLY FIX THINGS even when that's not the easy way.

I also tweaked more/buildall-native.sh to dump its output in "build/" instead of "build/native-static/", and that raised one fiddly little issue: native-build.sh is uploading filenames like "lfs-bootstrap.tar.bz2-armv5l", when it would be slightly cleaner for it to upload "lfs-bootstrap-armv5l.tar.bz2".

This is a design conflict between the control image being target agnostic and the host system needing to distinguish between files uploaded into the same directory.

For different versions of busybox, I need to distinguish them with a suffix. For files like "busybox-armv5l" I need a suffix because the multiplexer applet has to start with the name "busybox" plus an optional/arbitrary suffix in order to trigger the multiplexer behavior. (It tries to determine how to act based on its name; the name "busybox" is unique in that it doesn't have to be the FULL name, it can be followed by arbitrary garbage. This was the result of a long design discussion on the busybox mailing list, which you can get the highlights of in two messages or read the whole thread if you like. It was a good idea, wasn't mine, but I liked it and implemented it.)

Adding a unique postfix to uploaded files during native builds was really easy to implement: just have the upload function in the control image common code append a -$TARGET postfix on the filename it was uploading.
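
A sketch of the shape of it (hedged: the function and variable names here are illustrative, not the actual control image code, and I'm assuming a busybox ftpput style upload):

# Hypothetical upload helper: tack -$TARGET onto the remote filename so
# simultaneous target builds uploading into one directory don't collide.
# ($FTP_SERVER, $FTP_PORT, and $TARGET assumed exported by the build.)
upload()
{
  ftpput -P "$FTP_PORT" "$FTP_SERVER" "$(basename "$1")-$TARGET" "$1"
}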

But when you build tarballs conceptually similar to the build stages, it makes sense to have the names be thingy-$TARGET.tar.bz2 instead of thingy.tar.bz2-$TARGET. (For one thing, bash's horribly overcomplicated tab completion behavior won't autocomplete a tarball name that doesn't END in a recognized suffix, although it would in Red Hat 9 because the FSF loonies hadn't screwed it up yet.) And having a thingy-$TARGET.tar.bz2 name implies that they extract to a thingy-$TARGET directory rather than just a "thingy" directory, which means changing the directory name chroot-setup creates (or renaming it before tarring it up). Which is a more intrusive change.

It's always the little fiddly semi-aesthetic issues that eat disproportionate amounts of design time. When what to do is OBVIOUS, it's easy. Turning a grain of sand into a pearl takes forever to get comfortable.


September 16, 2011

Made time to track down the Mips breakage, and it's Perl being stupid. The perl Configure script has a forest of if statements, right after the comment "Half the following guesses are probably wrong", doing various heuristic tests to guess what operating system it might be building on. These targets include such modern gems as sco_xenix, dynix, unicos, next, svr4, and apollo.

One of them is apparently an operating system called "mips". It detects it by doing this:

$test -f /bin/mips && /bin/mips && osname=mips

The util-linux-ng package installs a "mips" alias to the "setarch" command, which is one of those commands (like nice, nohup, and strace) that expect to run a command line. When presented with no arguments, it runs "sh".

So on mips (and only on mips), the perl build spawns a shell that sits there awaiting input, and hangs forever. (Or until my build's timeout.sh wrapper kills it for taking too many seconds to produce the next chunk of output.)

Note that "not having to do cross compiling" is not the same as "every platform just working out of the box". It's an order of magnitude less complicated, but there is still breakage in packages that are not particularly tested on various targets, and are doing really stupid things. Perl is made up of wild heuristics, it's not surprising they catch on stuff.

In this case, the fix is to redirect input from /dev/null when running Configure, so the shell it shouldn't be launching dies immediately.
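
In script terms that's a one-line fix, something like this (the exact Configure flags vary, this is just the shape of it):

# With stdin redirected from /dev/null, the stray "sh" spawned by perl's
# /bin/mips probe reads EOF and exits instead of waiting for input.
./Configure -des -Dprefix=/usr < /dev/null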


September 10, 2011

Experimenting with the amount of memory boards can take isn't promising. I already knew mips was stuck, but the "versatile" board I'm using for arm maxes out at 256 megs too, as does the default board of qemu-system-sparc. The sh4 board completely ignores the -m option and gives _64_megs_ of physical ram, which is kind of hilarious these days. (That's still not a real target.)

I can trivially up the memory in i686, x86-64, and powerpc. Doing so for anything else would take a significant amount of work.

Sigh. This is where device trees come in handy. Everything except mips is stuck on board layout, which device trees could adjust dynamically, but hardwired kernel + board files have preallocated ranges and that's it. Ok, queue this up on the todo list behind device trees, and tweak the CPUS= down to physical memory divided by about 80000k (which gives you 3 processors on a 256 meg system, with a little cushion for the kernel having eaten some, so available physical memory is a bit below an even power of 2). And then see if I can plug in a 256 meg swap file, and whether that helps. It's still cramped, but _less_ cramped. (If I can stop some of this crud from being I/O bound, life improves...)
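
(The CPUS= arithmetic is back-of-the-envelope stuff, along these lines; a sketch, not the actual aboriginal scripts:)

# One emulated processor per ~80 megs of ram, minimum one. A 256 meg
# board (minus whatever the kernel ate) comes out to CPUS=3.
MEM_KB=$(awk '/MemTotal:/{print $2}' /proc/meminfo)
CPUS=$((MEM_KB/80000))
[ "$CPUS" -lt 1 ] && CPUS=1
echo CPUS=$CPUS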


September 9, 2011

Attempting to build Linux From Scratch on all targets via "FORK=1 more/buildall-native.sh lfs-bootstrap.hdc" (with udev commented out in /mnt/package-list in lfs-bootstrap.hdc), I'm reminded that quadrolith is only about half as powerful as securitybreach was. (4 processors vs 8, and those were xeons with insane amounts of CPU cache, and of course 8 gigs of ram vs 32.) This sort of makes sense given that I spent 1/4 as much money on it ($800 vs $3000) about one iteration of Moore's Law after we bought securitybreach.

Even extending the timeout to 300 seconds (5 minutes), half the platforms timed out at the start of the util-linux-ng build (where a grep and a sed on the source code produce no output until both commands complete, but for that to take any time at all says the box is totally I/O bound). A bunch more died either during perl configuration or perl installation (which is slow because the perl build is written in perl).

Part of the problem is that each target is allocating 256 megs of physical memory and then doing a -j 8 build in there (twice the number of CPUs on the host), and even with the actual compilation going through distcc, that's pretty tight. Just running that many preprocessor instances in parallel on a 256 meg system... yeah.

Can I allocate more memory? Some targets (such as mips) max out at 256 megs due to the layout of their physical address space, but most can go up to at least 512. I can also cap the SMP level at one SMP process for every 128 megs of target ram (even when they're doing distcc): it varies per package but generally the QEMU-side preprocessor work and emulated network I/O max out somewhere around -j 3 anyway. And I could add swap: in theory Linux supports swap files and if you're not actually going OOM it should be stable.
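
(The swap file experiment is only three commands inside the target, modulo paths; whether the extra emulated block I/O helps or hurts is the open question:)

# Create and enable a 256 meg swap file in the emulated system.
dd if=/dev/zero of=/swap bs=1M count=256
mkswap /swap
swapon /swap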

I dunno which of these options would actually help performance and which would hurt it, especially when buildall forkbombs the host. Tuning for a single native-build.sh and tuning for buildall regression testing across multiple architectures are different things.

Looks like I have some benchmarking in my future. But performance tuning comes after getting functionality to work, and I still need to see if LFS 6.7 works on targets other than i686 (and try to fix the ones that don't), then upgrade to 6.8 and maybe add a BLFS build control image, and of course bootstrap portage to get funtoo and metro working...


September 6, 2011

So udev uses autoconf, and thus probes for whether "stdint.h" exists, or whether the compiler supports the "-static" flag... but it's so tightly tied not just to the Linux kernel but to SPECIFIC VERSIONS of it that the udev out of Linux From Scratch 6.7 won't build under Linux 3.0 because /usr/include/linux/videodev.h went away. (Note that the ./configure stage finishes happily, then the build breaks.)

Autoconf is not a development tool, it's a disease projects catch.


September 5, 2011

The bug I spent so long tracking down is that SIGALRM happens in PID 1, that signal has a handler which does a longjmp which bypasses the signal return path and thus never clears the signal "blocked" mask, the stuck blocked bit is then inherited by all child processes, and hilarity ensues.

My /sbin/init.sh script is running as PID 1. When it detects a /mnt/init it pauses for 3 seconds waiting for the user to press a key (and thus get a command shell) before execing that /mnt/init. The pause is a shell command, "read -t 3 -n 1 X", which uses SIGALRM.
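
The logic looks roughly like this (a from-memory reconstruction, not a verbatim quote of the script):

# Reconstructed pause logic: read -t implements its timeout via
# SIGALRM, and this is running as PID 1.
if [ -e /mnt/init ]
then
  echo "Press any key for command line..."
  if read -t 3 -n 1 X
  then
    exec /bin/sh    # key pressed: SIGALRM never fires
  fi
  exec /mnt/init    # timed out: SIGALRM was delivered to PID 1
fi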

This is why when I hit space to get a shell (preventing SIGALRM from ever getting sent), and then exec /mnt/init from that shell, the hang doesn't happen. Delivering SIGALRM to PID 1 is what corrupts the blocked mask.

How is the blocked bit set? When SIGALRM triggers, do_signal() calls handle_signal() (both in arch/x86/kernel/signal.c), which sets the 1<<(SIGALRM-1) bit in current->blocked via sigaddset() and set_current_blocked().

Why doesn't it get cleared? Right after the call to handle_signal(), do_signal() has a comment:

/*
 * A signal was successfully delivered; the saved
 * sigmask will have been stored in the signal frame,
 * and will be restored by sigreturn, so we can simply
 * clear the TS_RESTORE_SIGMASK flag.
 */

Except that the sigreturn codepath never gets called: Linux puts it on the stack as the return vector from the signal handler, but bash 2.05b does a longjmp() out of the signal handler. Presumably this _used_ to work and the kernel changed out from under it. I can't quite find documentation on how this _should_ work, "man 7 signal" glosses over a lot. (Such as how to unblock a specific signal, given the signal number, or the fact you _can_ longjmp() out of a signal handler.)

I'm leaning towards calling this a bash bug, especially given how old the version I'm using is. It's easy enough to fix in bash: just unblock the signal before doing the longjmp, the magic incantation for which turns out to be:

sigset_t blah;

/* Build a signal set containing just SIGALRM... */
sigemptyset(&blah);
sigaddset(&blah, SIGALRM);
/* ...and knock it out of the blocked mask before the longjmp(). */
sigprocmask(SIG_UNBLOCK, &blah, NULL);

That should be commit 1435 once I finish testing and check it in.

(Then again, why the kernel bothers having TS_RESTORE_SIGMASK if it's just going to clear it here is an open question. Why does alarm() generate a non-reentrant SIGALRM in the first place, did it used to be reentrant and there was a problem? The answer's probably buried in the git archive somewhere, or perhaps on the mailing list, but would take a lot of digging to unearth...)


September 4, 2011

I'm aware that for a while I was trying to channel all this technobabble to the Aboriginal mailing list, but A) nobody ever replied to it there, B) it's easier for me personally to grep my blog than try to find something in the mailing list archives. (The gmail bug where it refuses to send me copies of my own messages remains in force, I need to set up a real mail server again.)

Reading through __hrtimer_start_range_ns() we start with a lock_hrtimer_base() which returns timer->base (which clock source is this timer driven by), then a remove_hrtimer() call that presumably makes sure this timer object isn't already queued. So who determined which clock source this was driven by? It's set in hrtimer_init(), but who called that...

Ah, back in itimer.c function do_setitimer(), we fetched tsk->signal->real_timer, so apparently this is a per-task timer object. Interesting. Where is that coming from, and who else uses it... It comes from fork.c, goes away in exit.c, and it's only used in a couple other places. Interesting.

Alright, let's add a new global "struct hrtimer *blah;" and initialize that to the timer pointer when alarm grabs it, and in __remove_hrtimer() if (timer == blah) printk() about it...

Ha! Five seconds after the alarm is armed, the timer is removed from the red-black tree but it does NOT kill the process! Who is doing it? Add a dump_stack() after that printk...

Call Trace:
[<c1029ce5>] ? __remove_hrtimer+0x20/0x4c
[<c102a12f>] ? hrtimer_run_queues+0x109/0x176
[<c101efe4>] ? run_local_timers+0x5/0xf
[<c101f2aa>] ? update_process_times+0x18/0x49
[<c10310a4>] ? tick_nohz_handler+0x6b/0xb2
[<c1036214>] ? irq_modify_status+0x87/0x87
[<c1003334>] ? timer_interrupt+0x10/0x18
[<c1034bb2>] ? handle_irq_event_percpu+0x23/0xf7
[<c1036214>] ? irq_modify_status+0x87/0x87
[<c1034c9f>] ? handle_irq_event+0x19/0x22
[<c1036286>] ? handle_level_irq+0x72/0x7a
<IRQ>  [<c1002f8a>] ? do_IRQ+0x2b/0x69
[<c11a8a69>] ? common_interrupt+0x29/0x30

Huh. That's educational. Ok, timer interrupt comes in, goes through a bunch of stages, and winds up in hrtimer_run_queues() which is in the same kernel/hrtimer.c as __remove_hrtimer()... but does not _contain_ a call to __remove_hrtimer(). Ok, something's a macro or getting inlined or tail-called or something, let's see... probably __run_hrtimer(), so add a printk() to that to see what timer->function is... c101a675. And I didn't save the vmlinux file, ooops. A quick "NO_CLEANUP=1 ./linux-kernel.sh i686" later, then "objdump -a build/temp-i686/linux/vmlinux" (because i686 is a subset of the host architecture anyway so I don't need to use build/simple-cross-compiler/bin/i686-objdump) and search for c101a675 and it's something called "it_real_fn". Which is in kernel/itimer.c, ok stick a printk in that. It's being called.
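
(For future reference, nm can do that address-to-symbol lookup in one step, assuming the vmlinux kept its symbol table:)

# Map the timer->function address back to a symbol name.
nm build/temp-i686/linux/vmlinux | grep c101a675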

Right, the alarm is getting registered, the timer is expiring, the expire function is getting called, but the signal isn't making it to do_signal().


September 3, 2011

Drilling into the timer issue. Robert Love's kernel book hasn't got anything on signals (not even mentioned in the index), but Google dug up an old Linux Journal article with some stuff on signal delivery, which let me know to stick a printk into do_signal() and it's not getting called for the alarm. (It's getting called for other stuff, like ctrl-c, so I know it's instrumented right.)

This means it's not a uClibc issue: yesterday strace showed rt_sigaction() and alarm() being called, and it's not winding up delivering a signal to do_signal(). The pthreads code doesn't enter into this, the kernel isn't _producing_ a signal for pthreads to mishandle. I need to drill down from alarm() and figure out _why_ the signal isn't getting produced.

In theory syscall "thingy" winds up calling a "sys_thingy()" function in the kernel, but grepping for that only occasionally finds one. Lots of times the sys_function() definition is hidden behind a magic macro you just have to know about, in this case:

SYSCALL_DEFINE1(alarm, unsigned int, seconds)

Which is in kernel/timer.c. (Unless you know exactly what to look for you can't even grep for the syscall's arguments out of the man page: "unsigned int seconds" has a comma in it. This has bit me more than once when I can't remember the magic incantation.)

That function's a trivial wrapper around alarm_setitimer() (and a printk shows we're getting there), which is in kernel/itimer.c (and printk says we're getting there too), going into the ITIMER_REAL branch and calling hrtimer_start() from kernel/hrtimer.c, which is a wrapper around __hrtimer_start_range_ns().

At this point, we're into the "high resolution timers" code, which is #ifdef salad with a gazillion options: CONFIG_SMP is not set, CONFIG_NO_HZ is, it's a 32 bit platform, and CONFIG_HIGH_RES_TIMERS is set. I vaguely recall reading something about "timer wheels" but this is an area of the kernel that Thomas Gleixner ripped a new one in 2006 and was still in flux last I looked, so I have a reasonable idea what the current code does, but not how it does it.

Right, I have some reading to do.


September 2, 2011

Spoke with Daniel Robbins (founder of Gentoo) on IRC, his new Funtoo project looks much easier to bootstrap than Gentoo, and best of all he understands how it works all the way down to the ground, which nobody left at Gentoo seems to.

Still chipping away at the uClibc alarm(). According to strace, it's doing:

rt_sigaction(SIGALRM, {SIG_DFL, [ALRM], SA_RESTORER|SA_RESTART, 0xb7ee40e7},
  {SIG_DFL, [], 0}, 8) = 0
alarm(5)

And then the alarm signal is never delivered. This seems quite like a kernel bug, but I drilled back to 2.6.29 and it was still failing. (That's about 3 years back, and shortly before that squashfs goes away so testing older versions gets unpleasant.)

I bumped into this with the Qualcomm Hexagon last year, using their port of uClibc 0.9.30 on the development board, so it's not a recent introduction in uClibc and it's not a QEMU glitch.

What it IS is fiddly. My native build infrastructure pauses for 3 seconds to let me hit space and get a shell prompt. If I do so, and then run /mnt/init (which is what it would hand off to if I don't interrupt it), the bug doesn't happen. (Even if I exec /mnt/init: the signal gets delivered and the m4 configure proceeds normally.)

I think next I have to dig into the kernel's signal delivery path, which isn't an obvious read in the kernel sources. Time to dig out Robert Love's book again...


September 1, 2011

I have a month off between the end of my contract and starting full-time at Polycom, and although there's a bunch of travel scheduled, I'd like to get Funtoo bootstrapped. (That's the distro the Gentoo founder went on to do after he took some time off and returned to find gentoo flooded by Debian developers who'd fled their flamefest during the interminable "debian stale" years, but brought the acrimony with them.)

First, I need to get Linux From Scratch working, which means I need to fix the strstr/alarm hang.

My whole Aboriginal Linux design is predicated around leveraging Moore's Law to throw CPU time instead of engineering time at the cross compiling problem. However, while my netbook is cute and convenient, it's admittedly a bit underpowered for what I do with it. Sometimes, you have to find a happy medium.

Currently, I'm tracking down changes in the sigaction() code in uClibc. The C library on the target comes from the native compiler (at least when you're not building with NO_NATIVE_COMPILER), so you don't need to rebuild the cross compilers but can just do:

rm build/native-compiler-i686.tar.bz2 &&
./build.sh i686

Unfortunately, rebuilding the whole native-compiler.sh stage means rebuilding binutils and gcc, which are FSF packages and thus pigs that take forever to build. If I just stuck some printfs into uClibc, I shouldn't need to wait for gcc to rebuild.

(Actually I stuck in sprintf() and write() calls so as not to perturb the unnecessary complexity of the ascii stdio layer from the middle of the pthreads implementation. Look at the fflush() and ungetc() man pages, and realize that the buffer structure associated with that has not just atexit() functions which sometimes fail interestingly, but also locking to make it thread-safe! Whee! Doing sprintf() on a local buffer and calling the write() system call shouldn't have that problem.)

So what I really want to do is rebuild _just_ uClibc, install it into the existing native compiler, then package that up into a system image. Aboriginal Linux isn't really designed for that, but it can be done. First I edit the copy of the source in build/packages/uClibc to have the changes I want, then:

NO_CLEANUP=1 STAGE_NAME=native-compiler more/test.sh i686 build_section uClibc &&
tar cvjfC build/native-compiler-i686.tar.bz2 build native-compiler-i686 &&
rm build/root-filesystem-i686.tar.bz2 &&
./build.sh i686

Unfortunately, NO_CLEANUP has slightly the wrong granularity, I want to blank build/temp-i686 (to make sure I always "make all" the package in question), but I don't want to blank the output directory (in which case I need to rebuild native-compiler-i686 or at least extract its tarball). Currently, there's no way to distinguish this, which is really a user interface issue. It's hard to figure out a reasonably clean way to specify a fiddly internal detail. (Luckily, in this case the other stages ./build.sh runs don't have NO_CLEANUP and thus blank the tempdir for me, but if I change my mind halfway through a uClibc build and want to ctrl-c and restart it, I might want to rm -rf build/temp-i686).

I note that even in this context, where I'm running the same build over and over to test minor variants, I'm not particularly interested in anything other than "build all" within a package, I'm just selecting _which_ packages to build. I still believe the make command has outlived its usefulness.


August 28, 2011

Armadillocon was loads of fun, handed out a six pack of Penguin Mints to various people, spent way too much money in the dealer's room... I'd do a con writeup but it would basically be "I attended a lot of panels". (Only a couple people there I knew, and those were fairly casual acquaintances of the "I went to a party at your house once, about three years ago" level.) Maybe later.

Now I'm hanging out at the UT student center with my laptop (Einstein's closed before I got there), meaning it's back to the horrors of autoconf.

The 31,000 line ./configure script autoconf generated intercepts stdout and stderr not just of the failing test itself, but of the shell script around it. How do you find the start of the snippet that runs the test? It's not easy: the horrible agglomeration of generated crap that is autoconf isn't indented consistently.

Take the trap handler that runs on exit to destroy the evidence (so if you add an "exit" right before the failing test program runs, it deletes the source and the generated file you're interested in looking at): it starts on line 2879 and ends on line 2974. That's 95 lines of shell script, incorporating four here documents, for a trap handler. It's not even a shell function, it's a giant inline blob. Code that's five if statements deep often isn't indented at _all_.

So if I make significant changes to the test code (such as splitting it out of ./configure and running it by hand), the hang doesn't happen. If I stick echo or printf calls into the thing, they get intercepted by ./configure. The files in question are deleted behind my back even if I break out of the script. And my guess is that the failure is in the uClibc pthread signal handling (although the hang also happens with NPTL), so debugging what's actually going on is going to be horrible even when I manage to drill down to that level.

I don't want to sit there for 10 minutes while autoconf grinds through its weirdness under qemu every time, so I need to strip down my test case to something that can reproduce the failure faster. I already chopped up my lfs-bootstrap image to _just_ have m4 in it. (It didn't depend on zlib, and that was the only package that built before it, so this wasn't hard.) It's set up so I can rebuild the squashfs via cursor up, and then re-run the native build via cursor up in another terminal.

Another piece of fun: tests set variables that are used by other tests, so if you comment out large chunks of them, the next test breaks with things like "if test $varname == yes" (if $varname isn't set and evaluates to nothing, and isn't in quotes, test has no left operand and throws an error). Things like "AWK" and "CC" and "SHELL" need to be set, but so does "ac_objext" (which is the variable that's actually used for object file extensions; yes, the ".o" extension is not considered standard enough to rely on).

I wasted an hour or so digging through the ./configure script trying to chop out time consuming chunks piecemeal, and made it to about line 4600, of which I'd managed to "if false; then ... fi" out about half, accumulating a half-dozen environment variables that needed to be set or the configure would break and abort. The test I want is a bit after line 24,400 so this is not a useful approach if I want to finish this part today.

Time to get out the flamethrowers. Basically what this thing is doing is 1) setting a bucket of environment variables, 2) accumulating a confdefs.h file. It reproduces the same state every time, so we don't need to waste so much CPU time doing it: search configure for "strstr works in", back up a few lines, and insert "set >&6". (The redirect is because stdout is now on filehandle 6, because the developers of autoconf need to be stopped.) This dumps the environment variable state just before the failure, in a format we can use to set everything back.
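
In sed terms the edit amounts to something like this (a sketch: I did it by hand in an editor, and the insertion point may need to move up a few lines so the dump happens before the test actually runs):

# Insert "set >&6" just before the strstr test, dumping the accumulated
# environment variables to the redirected stdout.
sed -i '/strstr works in/i set >&6' configure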

Now we run the build, piping the output of native-build.sh through tee so we can save it to a file, and letting it run until it hangs. Kill it, grab the log file, run dos2unix on it (because each line from the emulated serial console ends with a carriage return + linefeed and we only need the linefeed), and trim out everything before and after the environment variable dump. Copy it into the lfs-bootstrap directory as a shell script we can source.
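
(The capture step is just a pipe through tee, something like the following; argument details from memory:)

# Save the serial console output while watching it scroll by, then
# strip the carriage returns.
more/native-build.sh lfs-bootstrap.hdc | tee build.log
dos2unix build.log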

(Did I mention I spent several weeks, some time back, setting everything up so that ctrl-c would exit an aboriginal linux native build? This involved getting a controlling TTY which /dev/console isn't, moving the shell off of PID 1 which has most signals blocked so it would ignore ctrl-c anyway, and then calling shutdown() to exit the emulator. It's sources/toys/oneit.c and sbin/init.sh setting it up, and together they make it look like I didn't do anything because it just works.)

Anyway, repeat the above trick to snapshot "confdefs.h" as well (this time using the exotic "cat" command), chop out everything from where I left off in my earlier trawl to the start of the test I want and instead insert "source /mnt/blah.sh" and "cp /mnt/konftest.h conftest.h", and...

It dies telling me BASH_VERSINFO is a read only variable. Chop that out, and EUID, and... Ok, now it's not telling me what variables its dying on? Add some "echo here" to my environment variable setting script (redirecting to &6 again) and it's "GROUPS=()" of course, now it's back to claiming PPID is a read only variable, and SHELLOPTS, and...
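
(The cleanup amounts to grepping bash's magic variables back out of the dump, along the lines of:)

# Strip the read-only/magic bash variables out of the captured
# environment before sourcing it. (This is the list I tripped over;
# there may be more.)
sed -ri '/^(BASH_VERSINFO|EUID|GROUPS|PPID|SHELLOPTS)=/d' blah.sh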

Yes! I can now reproduce the hang in about a 15 second test!

Now to start debugging the actual _problem_. Starting at 8pm, and I have work in the morning...


August 27, 2011

Armadillocon!

During the first two panels I think I was the youngest person there, which is sad, but the con seems larger, more active, and more fun than the last couple times I was here, so I can't complain.

YOU GUYS! YOU GUYS! WORLDCON IS IN SAN ANTONIO IN 2013!

Must... SMOF!


August 26, 2011

I'd forgotten how frustrating trying to debug the strstr alarm hang is. It doesn't happen when you break out conftest.c, build the a.out, and run it from the command line. It requires ./configure to be doing its strange multi-layer nested shell instance thing, while the output is delayed slightly (either through a real serial console with its per-character delays, or through a virtual serial console piped through tee). Which is odd, because conftest itself doesn't produce any output other than "Alarm clock".

Of course the _real_ fun is that configure is pathologically badly designed. The project's _goal_ is to be layers of nested workarounds accumulated over the years for systems nobody can regression test anymore. Here is an actual quote from the ./configure in m4 1.4.14:

# Find a good install program.  We prefer a C program (faster),
# so one script is as good as another.  But avoid the broken or
# incompatible versions:
# SysV /etc/install, /usr/sbin/install
# SunOS /usr/etc/install
# IRIX /sbin/install
# AIX /bin/install
# AmigaOS /C/install, which installs bootblocks on floppy discs
# AIX 4 /usr/bin/installbsd, which doesn't work without a -g flag
# AFS /usr/afsws/bin/install, which mishandles nonexistent args
# SVR4 /usr/ucb/install, which tries to use the nonexistent group "staff"
# OS/2's system install, which has a completely different semantic

That package was released in 2010, at which point none of those operating systems had been particularly relevant for over a decade. Would the 2010 release actually work on any of them? (Hint: there's a test for the command line option to put the compiler in C99 mode.) But every build has to slog through pages of legacy historical crap anyway.

That's pretty much the FSF in a nutshell, and why nobody can ever fix their stuff. Even if you chop through their giant hairball to try to fix something, and even if you jump through the copyright assignment hoops to fill out and _mail_ the bureaucratic ink-and-paper forms (yes really!) to give them permission to take your patch, nobody can ever regression test your changes against all the various broken historical platforms they still inexplicably care about. All you can do is add one more layer for your special case.

The sad part is that they keep convincing Linux guys to drink their kool-aid. The lwn.net guys call this The platform problem: don't add workarounds to your package, fix the broken prerequisite. But the FSF has an enormous case of "not invented here", and thus is NOT DOING OPEN SOURCE RIGHT. If you actually find yourself writing a test "checking whether getopt is POSIX compatible", you are doing it wrong.

(When I was maintaining busybox, I responded "fix your build environment" to a number of bug reports. I was happy to explain to them how to do it, but I wouldn't add legacy crap to busybox to work around somebody else's bug. Instead I put up statically linked binaries for Linux, on a dozen hardware targets: if you can't competently manage to build this thing, here you go. But making it work against bionic was not my problem. Alas, Denys thinks that windows support is his problem. THAT is feature creep.)


August 25, 2011

The Linux Foundation has announced its keynotes for its European convention, and it's pointy hair all the way, with a couple of token exceptions: a vice president of Ixonos, a vice president of Oracle, _two_ executives from Red Hat (a vice president and a "president and general manager"), the executive director of the Linux Foundation itself, and of course Linus Torvalds. (Linus is highly technical, but that's not why he's here. Paying Linus' salary was the reason OSDL, which became the Linux Foundation, was formed in the first place. In exchange for the salary they get to parade him in front of investors twice a year. They're trotting him on stage in his capacity as Linux's other mascot.)

The wildcard is Dirk Hohndel, who was head geek at KDE and then head geek at SuSE, and now seems to have wound up at Intel. He's capable of geeking out big time, but it looks like he's doing an "in my day" history speech in honor of twenty years of Linux, rather than anything actually technical.

I am reminded of the Linux Foundation's previous conference in which the foundation's executive director's keynote was "What would the world be like without Linux", the Red Hat guy du jour pondered "The Next 20 Years? Who Knows?" (honest and truly these are the titles), Linus Torvalds got interviewed about Linux history, and the guy who used to be in charge of IBM's Linux Technology Center (but isn't anymore) talked about how Linux is a disruptive technology. (Here's an article I wrote a dozen years ago about how Linux is a disruptive technology.)

So we have achieved meta levels of "been there, done that". We don't seem to be trying to _learn_ from any of this history, this is about congratulating ourselves, with the Linux Foundation benevolently taking credit for things other people did (which is usually the FSF's job).

I'm aware there's an anniversary on, but there's _always_ an anniversary on: we just got over the 40th anniversary of Unix. Every year it's some even multiple of an anniversary of gcc, of X11, of the microchip, the internet...

What I would very much like is for the Linux Foundation to stop pretending to be remotely technical, or at least for people to stop falling for it. This is a pointy haired marketing organization attempting to represent Linux to the clueless the way AOL tried to represent the internet. "You can buy water from us" is not the same as "Water: we own it." Faceless and interchangeable executives who don't even represent a project but represent organizations selling multiple products? That is not a technical conference.

(P.S. Several people have been wishing Tux a happy birthday, but Tux's birthday is actually May 9, 1996. Well it is.)


August 24, 2011

The aforementioned regression testing is being crotchety. Running the old Linux From Scratch build (version 6.7) through the Aboriginal 1.0.3 i686 image is hanging in the m4 build (the second package in the list) with "checking if strstr is running in linear time".

I actually hit this last year at Qualcomm, doing the native LFS build on the Hexagon. It's because the test program is calling alarm() to kill the test after a few seconds (I.E. it's not running in linear time), and the signal is being ignored.

It's a funky race condition because under QEMU, it hangs when I pipe the output to "tee" but _doesn't_ hang when I just let stdout go straight to the xterm. The only thing that changes is timing, which means signal delivery is missing some locking or some such in uClibc.

At Qualcomm I just looked up the name of the hanging test process and ran a background process to killall that test process name every 15 seconds, allowing the ./configure to proceed. Workaround, not fix. This time, I need to fix uClibc, which means digging down into fiddly library guts. That's going to take a large block of time without distractions so I can concentrate and think through it, which says "weekend", but this weekend's armadillocon...
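
(That workaround was a one-liner along these lines, "conftest" being the name ./configure gives its test binaries:)

# Workaround, not fix: kick the hung test every 15 seconds so the
# ./configure can grind on. Kill the loop when the build finishes.
while true; do killall conftest 2>/dev/null; sleep 15; done &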

*shrug* I can at least narrow down the test environment in my bits and pieces of free time...


August 22, 2011

Aboriginal Linux 1.0.3 is out.

I almost called this 1.1 because it's got both Linux 3.0 and uClibc 0.9.32 (which _finally_ has NPTL)... but I'm not using NPTL, the uClibc .config is still doing pthreads, because with 5 years of uClibc fork finally merged back in and released, I need to regression test the REST of the library before swapping out the thread implementation.

Next release I can switch on NPTL, which means I should be able to build glibc under LFS, and that one should be my 1.1 release.


August 21, 2011

Various people have pointed out that the left hasn't been doing its job, and that this is a problem.

Traditionally conservatives defend the status quo, and liberals try to figure out how to improve on it. That's pretty much what liberal and conservative _mean_. Unfortunately, aging baby boomers have decided the best defense is a good offense, attacking liberals (largely with straw man arguments) to show that any alternative to what they want is unthinkably wrong.

The job of liberals is to attack sacred cows, and point out that even capitalism and democracy have inherent flaws which must be addressed. There is no human activity that can't be improved upon. (Even sex needed birth control and various disease cures.)

Democracy and capitalism flawed? Sure. Democracy has been famously called "four wolves and a lamb, voting on what to have for lunch", and the tyranny of the majority isn't the only failure mode: the electorate can be bribed with its own money, often makes short-term decisions at the expense of the future, often votes based on ignorance or superstition, and so on. Democracy is "the worst form of government, except for all the other ones".

The United States was the first stable democracy in over 2000 years because its founders set up a constitution (and then patched it with the Bill of Rights) which mitigates the worst excesses of democracy. Things like freedom of speech and separation of church and state; even when people vote for "Jim Crow" laws they can be declared unconstitutional and struck down. Athenian democracy didn't scale not just because the majority could vote anyone into exile (and often did), but because people had to vote directly on every issue, over and over again, and were fined for not participating. Representative democracy delegates authority to people who can (in theory) work on understanding complex issues full time, with a staff of experts to educate them, so they can make _better_ decisions than the people who elected them. The executive/judicial/legislative three branches of government, having two legislative bodies (the House and the Senate), the electoral college... we tweaked the HECK out of democracy to make it work for us.

Liberals are the people who go "um, the electoral college has probably outlived its usefulness", "the Lame Duck period between the election and the new guy taking office can be shorter now that we don't travel by horse", "wouldn't instant runoff voting work better now that we have the internet?" I.E. the system can still be improved upon. Democracy isn't perfect and must be extensively hacked around in order to work in the real world. It's possible for change to make things worse, but stagnation is not an improvement.

Capitalism isn't perfect either. Unregulated capitalism has at least as many problems as unregulated democracy.

Start with "cornering the market". You can't get rich by producing a commodity cheaper than the other guy, you have to dig a "moat" around your business that allows you to charge a premium. Every non-zero profit margin is a soft underbelly for some competitor to undercut, the way you protect your margins is by preventing competitors from doing what you do. This can be by innovating faster than they can copy, or it can be by preventing them from copying you in other ways: patents, contracts, copyrights, trademarks, and trade secrets. (In pure capitalism you send a team of goons around to your competitors' place to beat them up. Russia has a lot of that right now.)

In supply and demand terms, cornering the market is about restricting supply. Similar games exist on the "creating demand" side, from getting kids hooked on tobacco to "This is a nice army base, colonel, we wouldn't want anything to happen to it". Microsoft's history is full of examples of this, variants of "monopoly leverage bundling" such as the "CPU Tax", "Embrace and Extend" of various standards, and all the fun around declaring Internet Explorer part of the operating system.

Another fun failure mode of capitalism is its inability to deal with the idea that anything could be "free". The logic of capitalism is similar to linear algebra, attempting to find optimal price points via cost/benefit analyses... but terms with a zero coefficient drop out. If clean water and clean air are free, there's no need to conserve them until they run out. Using ten times as much water to save 1% of costs elsewhere is a great decision, and demand on any resource that's too cheap to meter keeps increasing without regard to efficient usage until it stops being too cheap to meter. I.E. capitalism is an excellent mechanism for regulating scarcity, but in the absence of scarcity it constantly works to create it. The constant attempts to charge per-kilobyte for the internet are just one example of this. (The boom and bust cycle of resource depletion vs unemployment is tangentially related: capitalism can't NOT overextend itself, over and over. Planning ahead is poor business; a stable well-run company is subject to a leveraged buyout.)

And of course maintaining currency is its own can of worms, which I've written about before. Here's a fun article on that.

Pretending that capitalism only provides what consumers want is far from reality: consumers don't want copyrights lasting "life of the author plus 95 years", but the public domain is the kind of "free" that capitalism is compelled to fence in, corner the market on, and then force you to buy from them. Want to sing "happy birthday" to your kid? We'll charge you for your own culture.

Again, like democracy, capitalism is the worst system except for all the other ones, and we hack and patch it to make something useful. This is the job of liberals: figure out how to improve what's there, by identifying its flaws and trying improvements. If you claim there ARE no flaws, you're not dealing with reality.

Patents were originally one such patch to capitalism: guilds used trade secrets to corner their markets. (They also used violence but that was easier to outlaw.) This led to a lot of scientific advances being lost as they failed to be passed down from master to apprentice on their deathbed, and instead died with the few people who knew them. A patent gave the owner a legally enforceable monopoly on some scientific process... for a limited amount of time. The point of patents was that they forced people to document their valuable knowledge in order to take advantage of them, and after a set amount of time they would expire and the knowledge would be in the public domain.

And now patents have come full circle: software patents are purely obstructionist, a huge problem, and the correct thing to do is probably abolish them. Software should never have BEEN patentable, but capitalism can't see anything free without working out how to charge for it...

The tug of war between conservatives and liberals was always "what should be preserved" vs "what should be changed". Unfortunately, as the baby boomers get older, more of them fall into black-and-white thinking where "what is" is "what must be", "our treasured institutions are perfect", and anyone who criticizes them in any way wants to destroy everything; in their day kids respected their elders, and everyone should get off their lawn.

The "capitalism is perfect" and "democracy is perfect" mindsets combine badly. People are voting for another idiot Texas governor who flunked economics (Rick Perry, I.E. George W. Bush only more so) based on the claim that a strategy to under-bid the other states and suck what jobs there were to Texas with low wages, shoddy environmental regulation, and no need to pay health insurance... obviously he can do that for the whole country, right? If the jobs move to Texas and the rest of the country stoops to Texas's level, then... they spread back out again, because your zero-sum game hasn't actually created anything, just moved it around. (Let's leave aside that people moved here for cheap housing during the housing bubble, and although their continued need to buy food created enough demand to bring a bunch of jobs with them, unemployment's still as high here as in the rest of the country. The PERCENTAGE of people with jobs didn't improve. So really, Perry was irrelevant to what happened, but what he TRIED to do is the same zero-sum game.)

A policy that helps one state at the expense of others does not scale, but if you don't understand that cornering the market helps the few at the expense of the many you can't see why it won't work.

Even after 8 years of an idiot trying it, and proving that it didn't work. But that's just science talking, learning from evidence involves learning, which the baby boomers have moved past.


August 20, 2011

I sometimes ponder WHY the Republicans do what they do. If they do "drown government in a bathtub", do they honestly believe we'll achieve Utopia instead of Somalia?

A few libertarians might. The nutso absolutists who believe that Microsoft being able to hire Blackwater would be a good thing because capitalism should never be restrained in any way from turning the poor into Soylent Green. But these guys are a small looney fringe, although I've actually met a few.

As always, "follow the money" to figure out who's pulling the strings. We have the best political system money can buy, and I think what the billionaires who are gaming the system hope is that if we revert to feudalism (which is pretty much what the Somali warlords have), then billionaires get to be kings. The poor become cheap labor: all the starving peasants will till the fields for them and they can staff their mansions with servants who work for room and board, just like in the 1800's. And this isn't an all-or-nothing transition: these days they have to go to other countries to find people to sew shoes for 12 cents an hour, but they don't want Spanish-speaking help, they want British au pairs as room-and-board live-in nannies. They want more of that; attacking the middle class's standard of living is a calculated effort to increase the pool of cheap skilled labor in an acceptable ethnic flavor.

Personally, I think we'd declare billionaires a game species first. London's already got riots, and their economic problems are just a skin rash in comparison to real trouble. (And of course the best way to increase pressure is to try to put a lid on it, the way that moron David Cameron keeps doing.)

Medieval peasants were so docile because they were not just illiterate and ignorant but brainwashed from birth to be obedient, addicted to "the opiate of the masses". They had Catholic dogma crammed down their throats to make the perfect slave class: "this world doesn't matter, only unquestioning obedience will get you into heaven." The "you are sheep, I am your shepherd" motif is pretty darn explicit in that theology. Of course the people telling them this were the rich of the time, the kings "appointed by god" and their bishops in velvet robes. (And of course the worst crime, punishable by death, was translating the bible from Latin into the local language so you could actually see for yourself what it said. That was anathema. People got excommunicated AND executed for that.)

These days, torches and pitchforks have been replaced with molotov cocktails and the Improvised Explosive Device, and the populace at large is far more willing to USE them when it feels seriously threatened. Plus we've got twitter, and while the various Arab dictators try to shut down the internet to stop angry people communicating with each other (just like the head of the Tory party in Britain proposed in response to the riots), people in general are more connected. Even without the internet, the fact we can all _read_ makes serfdom unstable.

(An interesting historical note is that Russia went straight from serfdom to the USSR, bypassing the enlightenment, and their populace continued to act like serfs until a few generations went by and education caught up with the populace. Then the USSR fell over due to subversive literature propagated by photocopier and distributed peer-to-peer: by hand.)

What kind of government people accept depends largely upon the _people_. The current "we can't be bothered to learn, any science we don't already understand must be wrong" movement is because the baby boom started 65 years ago (and even the youngest of them are pushing 50): they're old fogeys flustered by newfangled technology like smart phones, praying for an afterlife the same way they were investing for retirement in the 90's. They're moaning about "kids these days" and screaming at the world to get off their lawn the same way they wanted Sex, Drugs, and Rock and Roll back when they were teenagers. That's just a demographic blip (albeit a big one that gave us leisure suits and yuppies along the way). We're still a literate population surrounded by technology created through science. We're not medieval peasants, and are unlikely to suddenly start acting like them.

But why should Republican end goals be any more realistic than the rest of their uninformed anti-science ranting? It's the death spasms of the baby boom. Alas, according to Google US life expectancy's around 80 now, so we may have to weather another couple decades of this, although the Republicans' continuing efforts to screw up health care ("Can we afford grandma? Let the market decide!") may chop a few years off that.


August 18, 2011

What happened to history courses?

The people insisting that global warming can't possibly be man made don't remember the Dust Bowl. Eighty years ago we destroyed _millions_ of acres of farmland in a five year period because of bad agricultural practices. This wasn't "Goats rip up and eat the roots of plants, allowing topsoil to blow/wash away, which isn't a problem in their native mountains but is when humans take them down to the plains, thus the sahara desert was at least tripled in size by human activity in the past few thousand years". No, this was "rather a lot of Oklahoma is airborne and it's literally raining down on washington DC, displacing large numbers of US citizens who must migrate or starve".

It was too obvious to ignore, and we still have photographic evidence, but the people who actually experienced it have died so obviously human activity can't cause widespread environmental change such as global warming. Not when a hundred times as many people are doing it over a period of decades...

Why does nobody remember the phrase "yellow journalism"? Rupert Murdoch is just William Randolph Hearst warmed over.

And of course Keynesian economics is out of fashion, even when its conclusions are obvious. In a depression, the normal economic tools the government uses to steer the economy stop working because you can't push on a string. When interest rates hit zero and get stuck there (which is essentially where we are now), banks become useless. They have no incentive to lend because they make no profit off of it, but they have nothing ELSE to do with deposits. The government can't drive interest rates BELOW zero to get the economy moving, so they have to do something ELSE.

The great depression in the 1930's was a "demand-limited liquidity crisis", where nobody spent any money so nobody could earn any money. Because everybody was either unemployed or terrified of losing their job, people who did earn some money either saved it or paid off existing debt (which they had buckets of due to the stock market crash, the same way we do today due to the mortgage crisis). Successful businesses which had piles of cash also hoarded it: they didn't hire or expand because there were no new customers to be had, because nobody was spending any money. Because of this, attempts to rain money down on businesses with bailouts, tax cuts, or low interest rates (making borrowing essentially _free_) had no effect, they just sat on that money too. (And when every job opening is mobbed by hundreds of desperate unemployed people willing to lie through their teeth to get ANY JOB, hiring somebody qualified is actually kind of hard.)

The fix for the problem was president Franklin Delano Roosevelt hiring people _directly_, bypassing the banks and corporations and offering millions of people jobs as employees of the federal government. He set up programs to hire every unemployed person, figure out what they were best at, and pay them to do it. If they weren't good at anything, they got on-the-job training.

(Oh how the republicans screamed, and FDR's famous response, "I welcome their hatred", was the most kickass presidential thing since Teddy Roosevelt got shot in the chest by a would-be assassin and FINISHED HIS SPEECH before seeking medical treatment.)

If people were unemployed, FDR's government employed them building roads and bridges and schools, writing tourist brochures, documenting the history of every small town in the west, and every other job they could think of. They wired up the most rural towns with electricity and telephone wires and clean water and sewage systems, and set the stage for a century of prosperity with massive infrastructure investment. We're STILL USING all that stuff, even though it's crumbling after 70 years and dreadfully in need of replacement.

I.E. we know how to fix this. And we know what DIDN'T work. FDR took over from a guy named Herbert Hoover, who threw billions of dollars at banks and corporations (which didn't result in one extra job), and made speeches about the market's need to fix itself and how the government shouldn't interfere.

Obama's economic policies are the same "pushing on a string" waste of time and money Herbert Hoover tried, which actually made the problem worse by ensuring that large established institutions had plenty of cash to ride out the crisis without DOING anything. Whatever happened they'd be fine, they were Too Big To Fail, they could sit on their stockpiles of wealth and smile as everyone else starved.

I think every high school student should read this. Unfortunately, while I generally side with science in matters of evolution and global warming and so on, the Republicans are the most myopic group of willfully ignorant "anti-intellectual" morons since the mob that burned down the library at Alexandria.

Oh well. Self-limiting problem, they do enough damage the system will collapse on them too.


August 15, 2011

And the GPLv3 supporters are attempting to FUD the GPLv2 again, implying that android is going to have free software copyright trolls descend on it the way for-profit patent trolls have. Brilliant. "Stop using Linux, port a BSD kernel like the MacOS X people did" seems to be the message.

Bull. They're full of it.

Look: I _started_ the BusyBox lawsuits. I'm the one who hooked BusyBox up with the SFLC (No anchor tag, you'll have to scroll down to April 10, 2006) in the first place, to deal with the "Hall of Shame" of license violations Erik Andersen had accumulated. And I spent the next year doing license enforcement (which last came up in this blog here.) Before that, I spent a couple years defending IBM from SCO. Before THAT intellectual property law was a longstanding hobby of mine. The last GPL enforcement suit I was involved in finally wound its way to a close last month (it involved gpl-violations.org and FSF Europe vs a french company and lasted _forever_). I've spoken about GPL enforcement on a panel moderated by Eben Moglen, who co-authored GPLv2 and GPLv3. I have as much _practical_ experience in doing this as anybody.

So let's start with the Android thing: the BusyBox suit had the project's founder and its then-current maintainer as principals in order to establish standing: I.E. we did represent the project. Both of us had written large chunks of the thing (between the two of us, at the time, probably more than half). Linux has zillions of contributors but most of 'em you can deal with the way Linux dealt with the Unix allocation function SCO flipped out about: yank it and move on.

A large project like Linux has a couple dozen significant contributors whose code is not easily removed, but most of them don't spend months of their life pursuing lawsuits. The bigger a contributor they are, the LESS time they'll have for legal entanglements. Those who do, such as Harald Welte who founded gpl-violations.org, tend to stop coding. Notice that the busybox suits were mostly pursued by Erik and myself _after_ we'd retired from the project; we were way too busy when we were actually productively coding on the project.

In addition to standing, we had tens of thousands of dollars of legal time donated to us by the SFLC. (Initially, anyway, they then took the settlement money to pay themselves back, and turned into a self-funding lawsuit machine.) So a GPL troll has to find a sponsor to front the money for months of legal work.

Unfortunately, that self-funding lawsuit machine got hijacked by clueless morons, with the result that large companies stopped using Linux entirely. (I.E. the FSF's flaming stupidity is why Linksys is no more. It doesn't get any more counterproductive than that.)

From an engineering perspective was it worth it for busybox? Nope. We literally never got a single line of useful code out of any lawsuit I was involved in. The SFLC got the occasional large check (most of which went entirely to the SFLC to fund the next lawsuit), but the code releases we got were pointless drivel withheld by clueless developers who mostly couldn't code their way out of a paper bag, as the saying goes. When they weren't vanilla releases, they were generally a random source control snapshot plus backports of other patches out of our source control, a couple changed default values (hardwired into the code instead of properly done in the .config), and some debug printfs. It was uniformly useless to us.

I actually updated the busybox license page with a big long explanation of what we actually _wanted_, which pretty much covers what open source's engineering need is. (The SFLC then partnered with the pointy-haired Linux Foundation to come out with their official guide to license compliance. *shrug*.)

So lawsuits were a way to punish behavior we didn't like, but when we got them to stop that behavior it was because they stopped using Linux entirely. It did not result in us getting code the authors didn't want us to have.

The FSF is trying to shoehorn people into GPLv3, which is crap. Code is either "using the same license as the Linux kernel", or it's using Something Else That Is Not Compatible With The Linux Kernel (Sun had the CDDL, OSI had its own license, etc). GPLv3 is Not Compatible with The Linux Kernel.

The FSF is also a big proponent of copyright assignment, which is a big part of the reason they were the Cathedral in The Cathedral And The Bazaar. Yes, the paper that convinced netscape to open its source code was comparing the FSF vs Linux, not proprietary for-profit development vs Linux. The Free Software Foundation started trying to produce a usable system in 1983. It failed, and one reason was requiring copyright assignment which prevents drive-by bugfixes. Of course there were plenty of other reasons the FSF peaked in the mid-90's. The point here is GPLv3 was an attempt by the FSF to regain political relevance, NOT something the community actually needed for any reason.

Android has many things wrong with it, but this license FUD is complete and total BS.


August 10, 2011

So twitter's simple web interface was finally replaced by New Coke this week. I wouldn't mind so much if twitter hadn't gone out of its way to eliminate third party twitter apps. (They have their own official twitter app for Android, which sucks horribly. None for Linux I'm aware of.)

How broken is "new twitter"? When you reply, it reloads your tweet list. You can only scroll back about a day, so if you don't stay on top of reading your tweets you have more loaded than twitter is capable of showing you again... and responding to any of them will blank your cached list, replacing it with the twenty most recent tweets. You can't be completionist and read all your tweets, twitter _won't_let_you_. Twitter "knows" how you're "supposed" to use their service, and if you don't use it the way they want you to use it, they punish you for it. "No no, our tool can only be used THIS way, you can't use it any other way."

Basically twitter has horrible aesthetic judgement and an unflinching determination to shove that horrible aesthetic judgement down their users' throat. They had a simple idea (not a new one, "brevity is the soul of wit", although 256 characters was a more logical cutoff :) and took the high ground first, so they got the users. I use twitter to read what the people I follow have to say, I want twitter itself to stay the heck out of my way. They refuse to do that, and I am not the only person annoyed by this.

As with Horrible Retweets, twitter gives us no choice and simply doesn't care what we think. They're sure we can get used to anything if forced to do so. As someone who avoided Windows through Desqview, OS/2, and then Linux, and has abandoned Gnome for KDE and then KDE for XFCE when they got unpleasant to use, I may be unusual in not going along with this. (People are always going on about choice and comparison shopping and then acting stunned that I don't have a Facebook account. You mean you EXERCISE that choice rather than going along with the herd? What's wrong with you?)

The twitter app on my phone keeps going into a mode where it stops talking to the network. (Basically the first hanging network transaction never times out. Ever. At the moment, it's been stuck for a day and a half.) Whenever I kill the twitter app (which I have to do to fix the network hang, and also happens when the phone reboots due to using it as a USB network connection for my laptop which panics the android kernel regularly, or due to the battery running out, or for other reasons) I lose all my history, and twitter still only goes back a day or so. And of course, this being android, I can't start a second copy to look at current tweets while keeping the old one open to grind through the history.

I went through almost a dozen twitter apps on Linux before finding one that's up to date with the authentication mechanism du jour. (I.E. I installed many apps that couldn't connect to my account and show me my tweets.) It's called hotot, and it has Thunderbird Disease, meaning its network access is synchronous and while it's performing a network transaction it won't even redraw its window. (If you wipe another window over it, you get to stare at grey until it completes talking to the network.)

When the network is slow or the server it's talking to is busy (I.E. always), these hangs are anywhere from 15 seconds to 5 minutes. And there's no way to tell it NOT to start a network transaction while you're in the middle of typing something, I had hotot hang to talk to the network twice while composing the same 140-character tweet. And yes, this hang means it's not listening to what you type while it does this, it's fully hung until the network transaction completes.

Thunderbird you can at least throttle into submission by disconnecting from the network while you're composing an email, and then reconnecting to send it. (Or compose the email in mousepad and then cut and paste it into thunderbird's email composer.) Alas, hotot pops up a full-screen modal dialog about "unknown network error" you have to click on a button to dismiss. I haven't timed how often it does this, but somewhere between thirty seconds and a minute I'd say.

Hotot is written in a mix of python and javascript, which theoretically makes it easy to modify, but it's split over multiple directories that install into different places on the machine (/usr/share/pyshared, /usr/share/hotot, /usr/share/locale...); the UI uses some GTK extension library I don't understand, so control flow is via callbacks I'm not following... Worst of all, the project has no mailing list or email contact for the maintainer, and he doesn't respond to comments on his blog. (Not being that good with english seems to be part of it.) And yes, this is still the best of a bad lot.

Google Plus is looking better and better. They've shown some spikes of incompetence, but not actual _malice_ yet. (And my original objection now at least links to an explanation, so they seem more overwhelmed than actually bad at this.) They have at least never done the "bwahahaha you have no choice" thing twitter's all about these days. (And the "how dare you demand they not break stuff that used to work" defenders of the service.) I suppose I could also look at Tumblr, but that seems like a sequel to livejournal more than anything.

(No, I still don't care about identica. I just don't. The important thing about twitter was always the people I followed, and they're not on a preachy service that exists to complain about theoretical rights issues, and never will be. Sure that community being tied to a proprietary database that's bundled with horrible UI that keeps changing is a huge downside that's biting me right now: but infrastructure without users is pointless. "If you build it they will come" is naive and insufficient, and if making the rights issues your main draw even REMOTELY worked The Hurd would have beaten Linux for third place in the OS wars instead of being somewhere behind both OpenSolaris and NetBSD.)


August 8, 2011

I wonder if the UK could re-do their voting reform effort in the wake of the Murdoch scandal?

Instant runoff voting is great, and darn simple: you vote by ranking the available choices, and they count the votes by looking at everybody's top pick. If nobody's got 50% yet they kick the worst loser off the island, bumping each of their voters to the next choice on that voter's list, then recount. Repeat as necessary until somebody passes 50%.
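The counting loop is mechanical enough to sketch in a dozen lines of shell. (A toy, not an election system: it assumes a hypothetical ballots.txt with one ballot per line, candidates listed best-first, every ballot ranking every candidate, and no tie-break rule for the worst loser.)

while true
do
  # tally the current first choice on each ballot
  awk '{print $1}' ballots.txt | sort | uniq -c | sort -rn > counts.txt
  total=$(wc -l < ballots.txt)
  read votes leader < counts.txt
  # somebody passed 50%? done.
  [ $((votes*2)) -gt "$total" ] && echo "winner: $leader" && break
  # otherwise kick the worst loser off the island...
  loser=$(awk 'END {print $2}' counts.txt)
  echo "eliminated: $loser"
  # ...bumping each of their voters to the next choice on that voter's list
  sed -i -e "s/^$loser //" -e "s/ $loser\$//" -e "s/ $loser / /" ballots.txt
done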

Unfortunately nobody seems to be able to clearly EXPLAIN that, the best video on the subject is honestly this kindergarten-style disaster. The rest are _worse_ than that.

Why is instant runoff voting awesome?

It's all-around a much better system of voting. The voters give the system better information ("I would be happy with any of these people, they're all better than the people who aren't even on my list"), and the result is a system that's much harder to game, which is why politicians and lobbyists with a black belt in gaming the current system oppose it so fiercely.

Best of all, instant runoff eliminates the candidates whose election strategy is "you can't risk not voting for me because the other guy is worse". The mechanics change from "secure the nomination and the base will fall in line" to "vote for who you actually want, in the order you want them, and when you run out of those maybe end your list with a safe choice". This changes how people GOVERN if they want to get re-elected.

I.E. if we had instant runoff voting, Obama could not afford to take his base for granted, secure in the knowledge they have no other choice. And I wouldn't have to let a crazy person win to get rid of him.


August 7, 2011

It's official, Obama's a Republican. He's a Rockefeller Republican instead of the current "Clean Cup, Move Down" Party, but he's still to the right of both Ronald Reagan and Richard Nixon. And that was _before_ he threw Elizabeth Warren under a bus. His economic policies are straight out of the Herbert Hoover playbook.

This goes beyond being spineless and easily manipulated, and even beyond taking his base for granted. There's a point where he becomes part of the problem, and for me he's reached it.

I don't think I'm voting to reelect him. I know Obama's argument is that things will be worse if he doesn't get re-elected, but that means his MAIN SELLING POINT is doing damage more slowly. All he's doing is making sure the Democrats share the blame for right-wing insanity. Obama took my vote for granted, because he couldn't possibly be worse than the alternative, but I'll waste my vote on a third party before I reward being COMPLETELY IGNORED.

We'll never get an actual progressive until the Republicans get enough rope to hang themselves, and until the baby boomers die off. Sure not every baby boomer in the 1960's was a hippie who dropped acid at woodstock and protested vietnam, but the baby boomers en masse made that happen. They had a hugely disproportionate demographic impact as teenagers, and they have a hugely disproportionate demographic impact now, but 1960 was 51 years ago. The boomers are now in their 60's swearing about "kids these days" (apparently long-bearded muslims who don't want to work). The military quagmires in Iraq and Afghanistan (and now Libya) are the boomers wanting the whole world off their lawn.

Of COURSE the baby boomers are voting en masse for ultra-conservative gum-smacking old fogeys and smug self-righteous born again preachers: the boomers are facing retirement so they can live on a fixed income until they break their hip and die. Why do you think the Republicans fielded the oldest candidate EVER last presidential election, other than to appeal to this demographic, which is their base: Crazy old white people. The baby boomers are not burning their AARP membership cards the way they burned their draft cards; they can't march much anymore, but they can vote. They haven't forgotten they used to say "never trust anyone over 30", they've just learned to be hypocritical in the service of self-interest. "I know better now, sonny-boy. Better than you, whipper-snapper."

The baby boomers won't stop distorting politics until they die. Luckily, once the Tea Party destroys medicare and medicaid, this shouldn't take long.

The sad part is that as they got older and less flexible, they became set in their ways and stopped learning. This makes them about as good at running a government as the Mad Hatter was at repairing watches, for pretty much the same reason. Very few of them even understand what money is, but they're all sure they know how to fix a troubled economy, and double down whenever their ignorance is challenged by reality.

(Aside: putting rich people in charge of the federal budget is like staffing a hospital with athletes. A gold medal in the triathlon qualifies you to perform surgery, right?)


August 6, 2011

Warning: this post is an extremely long and appallingly technical dive into the horrors of git bisect, in my usual "chainsaw and flamethrower" programming style. Reading this will drive home the fact that Sturgeon's Law applies to software (especially the stuff I write), and that HTML really needs a "footnote" tag.

Once upon a time I wrote a git bisect howto, which covers the basics but doesn't go that deeply into the real problem with bisect: digging through a repository where you can't search for just ONE bug.

Case in point: between 2.6.39 and 3.0 (maiden name: 2.6.40) the sh4 platform once again broke (possibly because its upstream maintainer explicitly doesn't care about anybody who isn't a customer of his employer). The symptom of the bug is that ifconfig panics the kernel, so launching run-emulator.sh doesn't make it to a shell prompt (unless you do something like "KERNEL_EXTRA=init=/bin/ash ./run-emulator.sh", but then you still can't use the network).

Footnote: how do I know ifconfig is what panics the kernel? Well, in addition to booting with init=/bin/ash and running the init script commands myself one at a time, there's the panic message itself saying so:

Freeing unused kernel memory: 92k freed
Unable to handle kernel paging request at virtual address 6fe0c000
pc = 8c01ca52
*pde = 00000000
Oops: 0000 [#1]

Pid : 27, Comm: 		ifconfig
...
Process: ifconfig (pid: 27, stack limit = 8fe08001)
...

And so on. A userspace program should never be _able_ to panic the kernel, and what changed was the kernel not userspace (this is why it's important to upgrade one package at a time and run the full regression test after each one). End footnote.

So we bisect the kernel between the 2.6.39 and 3.0 releases, starting with a clean repository:

git bisect start && git bisect good v2.6.39 && git bisect bad v3.0

If your repository isn't clean, bisect will complain. You can zap it with "git clean -fdx && git checkout -f", but make sure you don't need any of your local files or changes first because they go bye-bye when you do that. That's what I call a "Ripley Repository Reset", I.E. "I say we take off and nuke the entire site from orbit, only way to be sure".

Footnote: we can bisect the kernel between two arbitrary release tags, but can't bisect uClibc this way, because they "branch then tag" instead of "tag then branch". I.E. the tag they give for each release is the first commit on a new branch, so if you try to bisect between two tags it'll complain they're on unrelated branches which haven't been merged, so there's no direct path between them. (This is because the uClibc maintainer doesn't know how to use git.) The workaround for this breakage is to bisect oldtag^1 against newtag^1, I.E. "the commit before this tag" in each case, which should be on the common development branch. Since the kernel repository isn't broken this way, we don't need a workaround here. End footnote.
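I.E. for a repository broken that way, the bisect invocation looks something like this (with oldtag and newtag standing in for whatever pair of release tags you're testing):

git bisect start && git bisect good oldtag^1 && git bisect bad newtag^1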

Bisecting for a cross compile build is a bit fiddly. The command for the sh4 kernel build we're doing is (deep breath):

patch -p1 -i /home/landley/aboriginal/ab2/sources/patches/linux-fixsh4.patch &&
cp ~/aboriginal/build/root-filesystem-sh4/usr/src/config-linux .config &&
yes "" | make ARCH=sh oldconfig &&
PATH=~/aboriginal/build/cross-compiler-sh4/bin:$PATH \
  make ARCH=sh CROSS_COMPILE=sh4- -j 3

Which is a bit of a pain, although the above is a single cut-and-paste command line because the line continuations glue it together. (I.E. you can cut and paste the whole lump to the command line and it should work, although you may have to edit the paths to match where stuff is on your system.)

This assumes we've built the sh4 target already, and can thus steal its cross compiler out of the build directory, and the kernel config out of the root filesystem. (You can also download the prebuilt binaries from the aboriginal linux website. We really are trying to make this as easy as we can; it's got plenty of complexity without worrying about setup.)

Why are we patching the source at the start of that command? Because the sh4 maintainer went crazy over a year ago, and this reverts that breakage. (There is good reason sh4 and its descendants are not a widely used Linux platform, although there are some other people trying to mitigate the damage.)

Footnote: The -j 3 is because my netbook claims to have 4 processors, although it's really 2 processors with hyper-threading. The real limiting factor is that the Atom chip in it has a grand total of 1 megabyte L2 cache, so 1/4 of that is only 256k, which gets thrashed pretty badly trying to compile anything significant. This is a SAD amount of cache for a modern 64-bit processor, even the 32-bit pentium M had 2 megs of L2 cache just for _one_ processor, but the whole netbook sells for $248 at Target and has 4 hours of battery life _while_compiling_stuff_, so I can't complain that much. The point is, doing 3-way builds is a reasonable compromise for my build environment, and actually slightly faster than -j 4 for me. Your mileage may vary. End footnote.

Anyway, the first step in the bisect builds a zImage, so we need to test it under qemu's sh4 emulation. Again, the command line for this is a bit complicated:

qemu-system-sh4 -M r2d -nographic -no-reboot -kernel arch/sh/boot/zImage \
  -hda ~/aboriginal/build/system-image-sh4/hda.sqf -append \
  "root=/dev/sda rw init=/sbin/init.sh panic=1 PATH=/sbin:/bin console=ttySC1 noiotrap HOST=sh4" \
  -monitor null -serial null -serial stdio

This qemu invocation is basically "run-emulator.sh" from the Aboriginal Linux sh4 system image, with the kernel and root filesystem paths substituted and all the environment variables resolved to blank (which by default they are when you just ./run-emulator.sh). We're borrowing the root filesystem squashfs out of the system image, using the kernel we just built, and the rest is configuration telling the kernel and the emulator what to do to possibly give us a usable shell prompt.

Footnote: Don't worry if this command seems like an impenetrable tangle: it is. I separated each platform's qemu invocation out into its own file not just to make it easy for other people, but because I have to look this stuff up myself. I laboriously worked this particular command line out over a period of several months, which involved looking at other people's examples, asking lots of questions on the qemu mailing list about both kernel .config and qemu command line, getting qemu developers to fix stuff and so on. That -monitor -serial -serial stuff at the end is black magic I don't understand myself, which some qemu developer gave me to get console=ttySC1 to line up with the right virtual serial chip to talk to qemu's stdin/stdout. For most platforms -nographic hooks up the serial console automatically and it Just Works without micromanaging it, but sh4 doesn't work, it's been like that for years, and so far nobody's bothered to fix it. End footnote.

The init script in the root filesystem (/sbin/init.sh in hda.sqf) calls ifconfig, so if it gives us a shell prompt instead of a panic the kernel is good. And it does, so we want to report success to bisect. First we need to type "exit" at the shell prompt (don't worry about it complaining on exit, it's just one more rough edge in sh4: the kernel can't quite shutdown qemu cleanly because there's no sh4 bios) to get back to our host system, then go:

git checkout -f && git bisect good

The checkout cleans out our patch, so if the bisect tries to edit the file it won't have a conflict. We can re-apply it again later. In fact the big build command above assumes we're applying this patch every time, since it's needed in both 2.6.39 and 3.0, and applies cleanly to both, thus it's probably the same for every version in between.

Footnote: The rationale for breaking everything before binutils 2.19 by applying this patch was that the decade-old code was unmaintainable otherwise, but two years later the code's changed so little that a simple revert of that patch still applies cleanly. See "the sh4 maintainer went crazy", above. End footnote.

Command history is our friend here: cursor up a few times and we get the big long build command back. That's why we're doing all this && stuff, if any of the commands fail (for example a patch doesn't apply), the command line stops there, but otherwise it's glued together into large chunks we can arbitrarily repeat without too much effort.

So we rerun the build, and this time it breaks:

In file included from include/linux/sched.h:68,
                 from include/linux/cgroup.h:11,
                 from include/linux/memcontrol.h:22,
                 from include/linux/swap.h:8,
                 from include/linux/suspend.h:4,
                 from arch/sh/kernel/asm-offsets.c:15:
/home/landley/linux/linux/arch/sh/include/asm/ptrace.h: In function 'profile_pc':
/home/landley/linux/linux/arch/sh/include/asm/ptrace.h:134: error: implicit declaration of function 'instruction_pointer'

And so on for quite a while. (The interesting part of error messages is always the first bit. You want to see the crash, not the pages of flying debris later on.)

At this point, we can't test the bug we're looking for, because there's an unrelated bug hiding it. Even though this bug was fixed sometime before v3.0 (which builds), we have to find a way to fix this second bug here before we can see through to the original bug we're looking for.

Footnote: There's another option called "git bisect skip", which is useless. In theory it tells bisect "I can't test this one, give me another one to test". Unfortunately, git is badly written here, and selects an _adjacent_ commit rather than, say, choosing a point 1/3 of the way from one end of the range (instead of halfway between the end points). In active projects like Linux, commits go upstream in large batches (a pull request is for a group of commits, and Linus pulls from multiple trees each day), so by the time a bug makes it upstream to Linus's tree (rather than being found and fixed in a subtree, in which case the history could be edited so we wouldn't even see the broken state), the commit that introduced it is surrounded by tens, hundreds, even thousands of unrelated commits, and there are usually large ranges of broken versions in the repository, all showing the same bug. So if "skip" just tests the commit immediately before or after this one, it's useless in large projects because you're going to see the same bug even after a dozen skips. This is why I've never had "git bisect skip" give me a useful result.

Also, if bug #2 is present for a long time (which is likely if it only affects the sh4 build, so nobody else would notice it), its build break could be rendering _thousands_ of unrelated commits untestable, and the larger the range the more likely the commit we're looking for is in there. So skip is doubly useless: it doesn't effectively get us away from the unrelated bug, and we may need to see through that other bug to find the one we're looking for anyway. End footnote.

So we're diverting from the original bug to look at bug #2. Let's save our position like so:

git bisect log > ../bug1.log

That saves a list of all the commits we've tried so far into the directory above the repository, so if we do a "git clean -fdx" it won't be deleted. (This is why I tend to have "~/project/project" directories: the inner one is repository files and the top one is non-repository files, often todo items and temporary junk.) To restore our state from the log, do "git bisect replay ../bug1.log". It's a text file you can edit by hand, removing lines from the end if you need to back up a few more steps.

When our ability to test is blocked by a range of commits with an unrelated bug, we can do one of three things:

1) Bisect forward (toward the newer end of the range) to find the commit that FIXED the blocking bug, and backport that fix onto each commit we need to test.

2) Bisect backward (toward the older end) to find the commit that INTRODUCED the blocking bug, and revert it from each commit we need to test.

3) Just guess which side of the untestable range the bug we're actually looking for is on, answer "good" or "bad" accordingly, and hope.

Alas, sometimes NONE of these options work, because the start or end of this bug's range could be another unrelated bug preventing you from testing this one. I.E. bisecting bug #2 can wind up finding bug #3: you can be prevented from finding the introduction of or fix for this bug because it also is hidden by yet another bug, and these stack arbitrarily deep.

On the bright side, the first of the three options is most _likely_ to work. We know this particular bug was fixed before 3.0 happened, so a fix is in the repository somewhere between where we are and 3.0. Thus bisecting forward is a good first guess.

Footnote: We also know this bug wasn't present in 2.6.39, so bisecting backwards to find where it was introduced is a good second guess. But attempts to fix bugs try to be minimal and self-contained, especially as the development cycle progresses and Linus becomes more cautious about what he merges, thus the fix is more likely to be portable back into the bug #1 range we're examining. The bug was probably _introduced_ as part of some larger change which other things may depend on later, so it's less likely to revert cleanly without unrelated breakage as we go forwards.

Guessing a direction is least likely to work, both because neither direction is more likely, and because the bug could be introduced within the range you can't look at. End footnote.

Footnote: Saying "forward" and "backward" helps me keep straight which way we're bisecting, but that's not how git works. Git insists you say "good" when you mean "forward" and "bad" when you mean "backward". This is confusing when "good" is showing one bug and "bad" is showing another, and WHICH bug you're looking for changes. (Have I mentioned that git has a horrible user interface?) If you get this wrong even once, you're now testing a range of commits that doesn't contain what you're looking for anywhere in it, the symptom of which is every test from then on goes the same way.

That's why if you end with a run of all the same direction (bad, bad, bad, bad, bad, done) it's a good idea to revert the commit you wind up on and see if that changes the result. If it does, it proves you actually found what you were looking for. If it DOESN'T, you answered a question wrong somewhere along the way. One way to deal with that is to back up to where they were still alternating and try again. I.E. look through "git bisect log" for the last decision that went the other way, delete that one and all the entries after that out of the log, "git bisect replay" the truncated log to reset the bisect to that point, then build and retest that commit, and resume from there. End footnote.
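For reference, that recovery dance from the footnote boils down to three commands (../oops.log being a scratch file name I just made up):

git bisect log > ../oops.log
vi ../oops.log
git bisect replay ../oops.log

In the editor, delete the suspect entry and everything after it, then rebuild and retest the commit the replay leaves you on.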

In this case, searching for the end of bug #2's commit range (saying "good" for commits that die with this particular build break, and "bad" for anything that doesn't) finds a fix that backports cleanly:

commit db7eba292e913390fa881272bfbc3da0a5380513
Author: Paul Mundt <lethal@linux-sh.org>
Date:   Tue May 31 14:39:49 2011 +0900

    sh: Fix up asm-generic/ptrace.h fallout.

Let's save the patch, pop bug #2 off the stack, and go back to looking at bug #1 again.

Save the commit we found to a patch file ("git show db7eba292e9133 > ../fix2.patch", because it's the fix to bug #2 we found), then replay the saved log ("git bisect replay ../bug1.log", getting us back to where we left off testing bug #1, and don't ask me why log lets you redirect its output to a file but replay refuses to read from stdin and requires a filename instead). Now apply fix2 to the result (we know that commit dies with bug #2), and run the build again.
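So the whole context switch is three commands (the hash and file names are the ones from above):

git show db7eba292e9133 > ../fix2.patch
git bisect replay ../bug1.log
patch -p1 -i ../fix2.patch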

At this point, the build dies with a THIRD error, which is not initially obvious because it scrolled off the top of the command window due to the -j 3 (which is really make not responding promptly to errors, it shouldn't spawn several dozen more child processes _after_ an error happens that's going to cause the whole build to exit). Since the build didn't produce a zImage at the end, we know it didn't work right and can scroll up to find:

  CC      arch/sh/mm/tlb-urb.o
In file included from arch/sh/mm/tlb-urb.c:14:
/home/landley/linux/linux/arch/sh/include/asm/tlb.h: In function '__tlb_remove_page':
/home/landley/linux/linux/arch/sh/include/asm/tlb.h:92: error: implicit declaration of function 'free_page_and_swap_cache'
make[1]: *** [arch/sh/mm/tlb-urb.o] Error 1
make: *** [arch/sh/mm] Error 2

Yes, we're still on the second step of bisecting the main bug we're really interested in, and we're now on our third bug. We need to push bug #1 back on the conceptual stack and find the end of bug #3. (This is, sadly, not unusual. The fact we didn't even get to test a new commit for bug #1 means we don't have to write out a new log, we can just reuse the existing bug1.log to get back here.)

Once again, we try to bisect forward ("good") to find the end of bug #3. We can try each build without the fix to #2 and apply it as needed, or we can try to save time by applying fix2.patch which gives a "reversed patch" error if it's already there and we can just hit ctrl-c without having done any damage we need to clean up.

Footnote: Jumping forward makes the clean "reversed patch" outcome more likely, if we were testing to find the start of the range we'd be more likely to have the patch apply when it wasn't needed, or to have some hunks apply but not others, which is a situation we'd have to clean up after.

Speculatively trying to apply the fix saves time not because restarting the build after seeing the failure and applying the patch is hard, but because we have to do a "git checkout -f" to undo all the applied patches before we can rerun our build command, or else our big build command saved in the command history will re-apply the always-needed fixsh4 patch from the Aboriginal build, and fail, and not continue.

We can of course edit the command line to remove the patch application, just like we can edit out the "-j 3" at the end if the build failed with a non-obvious error message that got buried a couple screens up and want to try again to see exactly one failure at the end of the output. But then we wind up with multiple versions of the build invocation in our command history, and we have to pay closer attention cursoring-up to rerun the build to make sure we trigger the right _one_. This process already involves keeping track of a dozen minor variants on the same darn thing, keeping the noise down where we can is nice. End footnote.
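Concretely, one speculative test cycle looks like this:

git checkout -f
patch -p1 -i ../fix2.patch

If patch says "Reversed (or previously applied) patch detected!  Assume -R? [n]" (that's GNU patch's prompt), this commit already has the fix merged: hit ctrl-c and just cursor up to rerun the build. End footnote-adjacent aside.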

Jumping outside the range of the second bug (thus not needing to apply fix2 to this version before building and testing it) happens a lot near the start of bisects. The binary search is still chopping off about 1/4 of the range this early on, and kernel versions have many thousands of commits in them.

Note that if applying a patch gives partial garbage you can always fix it with "git checkout -f", possibly followed by "git clean -fdx". The first reverts all currently tracked files, the second deletes any files that aren't tracked, I.E. stuff the patch added that's not in the current version. (The first does not require a complete rebuild, the second deletes all the .o files and thus does. Again, files you want to keep go outside the repo directory where "git clean -fdx" can't delete them.)

In this case I tried applying fix #2 and got the reversed patch error (meaning we don't need it), reran the build, and... it built!

It's been a while since we tested anything, so cursor up a lot until the qemu-system-sh4 invocation shows up... and we get the panic. (Yay, bug #1 rears its head again. Woo! If we were bisecting forward/good and got a shell prompt, we could skip this whole mess. Just as if we were bisecting backwards/bad and saw bug #1. But in this case, it doesn't quite tell us anything new. If for some reason we needed to start bisecting bug #1 over from scratch, we could use this revision as the initial "bad", this is the lowest "bad" we've seen so far. But we probably won't need to, and the very first step of a binary search eliminates half the repository so it's not a big deal.)

So to restate, we're bisecting to find bug #3, and we just saw a failure. Do a "checkout -f" for palate cleansing, "git bisect bad" (both because we saw bug #1 and because we DIDN'T see bug #3), try fix2 (still doesn't apply), cursor up until we can rerun the build... And boot to the bug #1 failure again: another "bad". Then we see bug #3 (good), then we don't (bad), then good... and then I screw up. Oops.

I just did a "git clean -fdx" instead of "git checkout -f" before that last "git bisect good", and it told me I had local changes to one of the files so it couldn't switch branches. But "git bisect log" shows that "good" at the end. I'm not quite sure what state the repository's in, so here's how to hit it with a VERY LARGE HAMMER.

git bisect log > ../temp.txt
git checkout -f
git clean -fdx
git bisect replay ../temp.txt

Back to bisecting. The bad/good pattern repeats for a while (reassuring, as long as it's alternating we know the commit we want is between our endpoints), and we get into a range that always needs fix2 so the tests become more regular.

Regular is nice because running a kernel build takes just long enough for you to forget what you were doing before you kicked it off (especially if you switch windows to do something else with the time, such as blogging the experience). This means you have to stop and think about whether to answer "bad" or "good", or type clean instead of checkout to reset the repo state, or forget which set of patches you need to apply... These minor derailments aren't fatal, but they're frustrating and cost you time. Leveraging command history to avoid screwing up the easy stuff is easier when you're repeating the same thing a few times.

Another minor data point is that when your "git bisect bad" case is seeing bug #1, that's a pretty good indicator that the bug you're really looking for is, indeed, hidden by the ancillary bugs you're trying to peel back and look under.

Huh, interesting. With four revisions left to test tracking down the end of bug #3, attempting to apply fix2 said:

patching file arch/sh/include/asm/ptrace.h
Hunk #1 FAILED at 41.
Hunk #2 succeeded at 132 (offset 1 line).

Ok, let's "git checkout -f && git clean -fdx" and try building without applying that?

*boggle* It built, and gave me a shell prompt. (I.E. I have a bug #1 _good_ case?) Ooookay.

Alright, look at git bisect log. The last thing in it was git bisect bad for commit 3f9b8520b060, which gave the 3.0 panic (I.E. the bug #1 "bad" condition), and "git show" says we're at commit 4e2b1084b0fb (which is showing the bug #1 "good" condition). And it's only a few hops from the bisect converging. The introduction of bug #1 is close enough you can smell it, but is it under bug #3 in a branch, or introduced right after bug #3 got fixed?

Here we gamble a bit and change course again, switching from "looking for the end of bug #3" and instead calling this good because of bug #1. This isn't 100% guaranteed to work, because of branch merges. We may have wandered into a branch that doesn't have this bug, but the introduction of this bug was in a branch we couldn't test because of bug #3, and the merge is "forward" in the tree from where we're looking at.

If it _doesn't_ work, we need to be able to get back here, find and save fix3, pop this bug off the stack by replaying bug1.log, and continue the bisect from there (probably finding bug #4 in the process). But at the moment, we're close enough to the bug #1 commit that going after it just might work, so let's save this log:

git bisect log > ../bug3.log

And call this commit good. Then we bisect our way to a conclusion, and make sure that reverting the patch we wind up on does indeed fix something (either bug #1 or bug #3; possibly it'll get confused enough by us changing the meaning of "good" midstream that it'll converge on something totally irrelevant, generally a merge commit). If it doesn't work, we replay bug3.log and call this commit bad (which, when looking for the fix for bug #3, it is), look for a fix for bug #3, and then resume the bisect from bug1.log.

The next one to test requires fix2 for the ptrace problem? Which... applies cleanly. Weird. (It's like we went backward past the beginning of bug #2, while looking for the _end_ of bug #3. I know git repository history can get really strangely nonlinear and linux in particular has a lot of branches, but this is weird.)

And this commit then dies with bug #3. Right, this probably isn't going to find anything useful, because saying "good" for showing bug #3 has a different meaning than saying "good" for not showing bug #1, so what (if anything) is it going to converge on? Still, it's close to the end of the bisect, so let's see it through. Call this one good... next one gives me a shell prompt. Call that one good...

And it says 3f9b8520b06013 is the first bad commit. What, if anything, did it find? Right, check out that commit (just because the bisect reported a result doesn't mean that's the current repository state). Build. It needs the ptrace fix (fix2.patch), but does _not_ die with bug #3. Interesting. Booting gives the panic that's the bug #1 bad condition. Now we revert the patch:

git show | patch -p1 -R

Reboot and... HA! FIX FOR BUG #1.

commit 3f9b8520b06013939ad247ba08b69529b5f14be1
Author: Paul Mundt <lethal@linux-sh.org>
Date:   Tue May 31 14:38:29 2011 +0900

    sh64: Move from P1SEG to CAC_ADDR for consistent sync.

    sh64 doesn't define a P1SEGADDR, resulting in a build failure. The proper
    mapping can be attained for both sh32 and 64 via the CAC_ADDR macro, so
    switch to that instead.

    Signed-off-by: Paul Mundt <lethal@linux-sh.org>

diff --git a/arch/sh/mm/consistent.c b/arch/sh/mm/consistent.c
index 40733a9..f251b5f 100644
--- a/arch/sh/mm/consistent.c
+++ b/arch/sh/mm/consistent.c
@@ -82,7 +82,7 @@ void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
 	void *addr;
 
 	addr = __in_29bit_mode() ?
-	       (void *)P1SEGADDR((unsigned long)vaddr) : vaddr;
+	       (void *)CAC_ADDR((unsigned long)vaddr) : vaddr;
 
 	switch (direction) {
 	case DMA_FROM_DEVICE:		/* invalidate only */

Footnote: Yes, all three bugs we hit here were sh4-specific. And the commit we finally found is a one-line change which apparently does nothing EXCEPT break ifconfig. Did I mention that the quality of this architecture is substandard? End footnote.

Ok, I got lucky there. If I hadn't, I could have continued by replaying bug3.log, grinding away finding a fix for bug #3 and then continuing from bug1.log once I had a fix3.patch...

But the "reliable" method isn't foolproof: the end of the range could have been hidden under a fourth bug, meaning I'd then have to bisect the end of THAT (possibly between where I found it and 3.0) so I could revert it to find a fix for bug #3... And there's no guarantee a fix will always backport cleanly.

But that's what bisecting a real-world problem is like: tricky, fiddly, full of constantly shifting goals you have to push/pop on a stack, and requiring you to double-check your results to make sure they mean what you think they do.
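Condensed into a recipe (shell-flavored pseudocode; bugN.log and fixM.patch follow the naming convention from above, and $FIXES_BUG_M stands for whatever commit hash the sub-bisect converges on):

# while bisecting bug N, an unrelated bug M blocks testing:
git bisect log > ../bugN.log            # push: save our place in bug N
# bisect forward to find the commit that fixes bug M, then:
git show $FIXES_BUG_M > ../fixM.patch   # save the backportable fix
git bisect replay ../bugN.log           # pop: resume bisecting bug N
patch -p1 -i ../fixM.patch              # apply as needed to see through bug M

Recurse when bug M's own range turns out to be blocked by bug M+1, and always revert whatever "first bad commit" you converge on to confirm it actually changes the symptom.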

And yes, this is what I do for fun.

P.S. I should have an Aboriginal Linux release out this weekend. Depends how much I want to delay the release to fix regressions like this. (All the x86, arm, and mips targets work, sparc works, just fixed sh4... but m68k doesn't build, and powerpc runs and natively builds a threaded hello world... which then segfaults when you run it. Sigh.)

I'm tempted to cut a lower quality release just to get 3.0 and the 5-years-coming new uClibc out there for people to play with, and fix it up next time around, but I really hate shipping regressions. Not only is that bound to be the one platform somebody out there is interested in, but as this exercise demonstrates they really pile up if you don't stay on top of them...


July 27, 2011

It's economics 101 time again.

The tea party economic plan is really stupid. So stupid even Reagan's budget director won't defend it. It's been dubbed "expansionary contraction" (yes that is a contradiction in terms, that's the point), and is literally the exact opposite of the economic policies that usually address the kind of situation we're in (a demand-limited liquidity crisis).

Nutshell summary: if you lose your job, you may have to live off credit cards for a while. Cancelling those cards to try to "live within your means" WHILE UNEMPLOYED means you wind up homeless. This is what all these right-wing "austerity" packages boil down to, both in Fox News's america and in Rupert Murdoch's europe. It's stupid, counterproductive, and makes things _worse_.

The government is literally capable of _printing_money_. Whether it does so via printing more dollar bills, stamping coins, or by having the federal reserve buy bonds that didn't used to exist in exchange for electronic dollars that didn't previously exist (depositing the bonds as "collateral" against the new money "loaned" to treasury through electronic funds transfer) is irrelevant. Yes, we prefer to "owe money to ourselves" rather than admit a growing economy requires us to print money to keep up with it and avoid a shortage of money. The point is the government is the one entity that can never run OUT of money no matter how the rest of the country is doing, because the government is WHERE MONEY COMES FROM. Our currency is US dollars, and the federal government manufactured every one of them out of nothing. It's an IOU. A promise the government made that when you pay your taxes, it will take these back instead of requiring payment in corn or chickens or diamonds.

I wrote two Motley Fool columns about this ages ago: No such Thing as Money and Oversimplified Economics.

The economy is currently demand-limited. The Fortune 500 is sitting on over a trillion dollars in cash on the balance sheets they've reported to the SEC, and the reason they're not spending it is nobody's buying their stuff. They won't hire more people if they can't find more customers. Giving them _more_ money isn't going to help, they'll just sit on that too.

The last time this happened was the great depression in the 1930's. Herbert Hoover focused on balancing the federal budget, which made the problem worse by removing the largest remaining source of demand from the economy. FDR hired people to build roads and bridges and schools, and got them built more cheaply than anyone else because he was hiring people who were otherwise unemployed. Here's an excellent, detailed explanation of what happened. If you want to understand our current economic problems, go read that.

(On a side note, look up Yellow Journalism sometime. Murdoch is this generation's Hearst. Owning large media outlets and using them to lie for political and economic gain is not new, but people seem to think it's unprecedented and therefore impossible. It's such old news, they taught it to us in grade school.)


July 26, 2011

Polycom's just cut the Austin site budget so they can divert money to staffing up the Israeli branch. Given the insanity in Washington (where the Tea Party clearly was not ready to learn Where Dollar Bills Come From and is throwing a tantrum until it isn't so anymore), I can't really blame them. But it does make them the second company to go from "we want to hire you full-time when this contract runs out" to "we haven't got the budget to renew your contract 30 days from now".

Not a fun economy. I'm sad Obama turned out to be Herbert Hoover rather than FDR, this has been going on since 2008 and it's getting really old.


July 22, 2011

Updated my "full history" git repo to 3.0 release, so there's now one repo that has the complete history from 0.0.1 to the present, put it in my directory on kernel.org/pub/linux/people, announced it on linux-kernel, and noted it on the kernel documentation page.

Yay 3.0 release. (Then again, I argued that 2.6 should have been called 3.0, so...)


July 20, 2011

The weird part about fasting (for me, anyway) is that if I do eat anything (such as an "ensure" brand chocolate flavored nutrition slime), I sugar crash immediately after. I'm fine if I _don't_ eat (sometimes a little spacey and uninspired, but I do that anyway), but if I eat an insufficient amount I do a sort of nuclear food coma afterwards, with full blown headache, for a couple hours.

I have two obvious non-food workarounds for low blood sugar: nap and exercise. Last night I went to bed early (something like 7-ish), turned off my "get up in the middle of the night" alarm, and slept through until 8am when I had to start getting ready for work. Can't nap in the middle of the day, but I can spend pretty much all my time off sleeping until I get back into the groove of this diet thing.

I want to exercise a lot more, but work a desk job I have to drive to. I can exercise before or after work, but what I really need is 15 minutes of biking when I can't concentrate, which isn't available around here. I'd have to drive just to find a decent place to walk any real distance.

Speaking of driving, Lamb's is trying to find a new "Throttle Snorkel" for my car. (Fade didn't believe this was really the name of the part, but it is.) Apparently it's the tube between the mass air flow sensor and the engine, and if air isn't going past the sensor the onboard computer refuses to feed enough fuel into the engine because it thinks it'll flood it, and thus it stalls instead. Generally ok while it's running, but when it first starts (especially if you run it for a while, stop it, and then start it again) it gets really unhappy. I've had to restart it twice backing out of a parking space.

Luckily it's not an "engine eating itself" kind of problem, just an "onboard computer from 1995 is confused" thing, but the real problem is my car is 16 years old and it's like trying to find replacement parts for a Type 40 tardis. You either get lucky rummaging through an old closet (note: not a sexual reference), or you have to cobble something together and make do (note: ditto).


July 18, 2011

So I'm back not quite fasting again. (I suck at regulating amount in the face of temptation but am pretty good at yes/no questions. Yes, I'm aware I'm channeling my inner anorexic, thanks.)

I thought this would be hard for two reasons: 1) Work has free junk food (who knew rice crispie treats were so tasty, this is why I NEED to start restricting my caloric intake again), 2) Fade is home and needs to eat on a very regular schedule to avoid The Unhappy. (The majority of my eating is social: going out to eat with other people, or because it's time to eat. Not because I'm necessarily hungry. My last two rounds of dieting ended because I prepared a meal for Fade and then ate some.)

But the big reason today's been hard turns out to be that my local McDonald's has just added the strawberry pie to its menu. This is THE TASTIEST THING EVER, and I can't have it. (I go to The Donald's for dollar salads, which with vinaigrette dressing come to something like 60 calories.) Now there's a big sign: yummiest pie ever, for a limited time only, and you can't have it. Sigh.

I'm doing this now because the other guy in my office has gone back to China for 3 weeks, so he's not eating in the office. I anticipated that low blood sugar fuzziness and grumpiness could get in the way of work the first day or two, but that's what diet energy drinks are for. (Rockstar lemon and orange are marvelous. The grape is a crime against humanity.)

Exercise also counteracts low blood sugar for me, but I can't do that at work. It's way too hot out to jog around the complex, and I've seen FIVE major injury accidents on 2222 since I started commuting on it, including two confirmed fatalities, so biking to work is out of the question.

I should probably exercise in the early mornings, but that's the only hobbyist computing time I have other than weekends. (I suppose I could bike somewhere with my laptop, but I have to be back by 8-ish so I couldn't go too far...)

I'd like to get back down to 145 pounds, which is what I weighed in college. (It's not so much the weight as the not looking pregnant anymore. That would be nice.) Alas, back of the envelope calculation says that 2500 calories per day times 21 days divided by 3500 calories per pound is only 15 pounds. It would take something like two full months for me to get back down to the weight I want to be at. (And I'm not going TOTALLY without food. In addition to the salads I have a six pack of "ensure" and a full bottle of flintstones chewable vitamins vaguely intended for the next week.)

Oh well, better than spending the time eating more rice crispie treats...


July 17, 2011

Finished installing the fourth netbook for my nieces and nephews. (Gyre, Gimbel, Wabe, and Mimsy.) Target is selling the new Aspire Ones (the sequel to the D255E) for $248 each, and the reason they don't sell that well is the Windows 7 Shareware Starter Edition. They're actually nice little machines if you put Xubuntu on them and tweak it a bit.

My sister's previous computer got so full of windows viruses that it not only became unusable, but people from Pakistan called her up to phish bank account details. (Luckily her bank is a local thing with five branches, all in Minnesota, and no call center outsourcing, so the accents gave them away. Still a huge pain changing her account numbers, and she's afraid they might do other identity theft stuff even though she hasn't really got anything to steal.)

Then she and all four kids were sharing a single netbook that had juice spilled on it so the keys stuck together. And since it was running Windows too, it was filling up with new viruses.

The new netbooks may not be the best secured things ever, but at least Linux doesn't get windows viruses. (They've got flash so who knows what that can let in, but the kids need to play games and watch youtube to consider the machines usable, so...)

Still saving up to get Kris her own machine (Slithy). In the meantime she can use Mimsy (which is theoretically for Sam, who is 3). (Kris has a home business ebaying vintage sewing patterns she gets at local yard sales.)


July 16, 2011

Busy month. A friend visited for a week (went home this morning), as a result of which another old friend's never speaking to me again.

Life lesson: there are times when you may be uncomfortable keeping someone in the dark "for their own good", but it may still be best to do so if they can't cope otherwise.

If I feel like posting details (not really) I should probably do so on the livejournal account I never use anymore. That's the place for drama.

Mostly I've been quiet here because work's taking up all my time even when I don't have people visiting...


July 9, 2011

Work continues to be full-time, and it fills up my brain. I'm learning all sorts of stuff about the guts of gigabit ethernet drivers, USB3, i2c... The downside is I get home too tired to do anything.

However, sometime last year I discovered a workaround for my "utterly not a morning person" problem where the sun coming up puts me to sleep. (Years of pulling all-nighters where I'd lose track of time until the sun came up and then I'd realize how long I'd been up, and fall over.) I have a really hard time getting up between 6 and 10am, when the bed gravity is at Jupiter levels. BUT: I can get up at 5am no problem because it's still night. Getting up in the middle of the night to go to a 24 hour coffee shop is old hat (I miss Metro), it's how I got my quiet programming time when everybody else was asleep and there were no distractions. Then when the sun eventually comes up, I have some momentum and am still well-rested, and can power through it.

So I can get up around 4:30 in the morning, maybe go skinny dipping in the pool for a bit (who's going to complain?) and head to one of the places that's open at 5am (Einstein's bagels, McDonald's, Whataburger) to get a couple hours alone with my laptop before work. I'm a night person, but I can get my night at the START of the day rather than the end of it.

Staying up late tends to rotate my schedule forward: it's tempting to stay up just another fifteen minutes, and then I get up late, which means I stay late at work to make up for it, and so on. Getting up in the middle of the night means I have a reasonably hard deadline when I need to leave for work.

It does involve going to bed around 8pm on weeknights, and means that Mondays really, really suck, but you can't have everything. (Where would you put it?)


July 5, 2011

And the gun nuts continue to lobby.

I've been vaguely following the whole "Arab Spring" thing (reasonable summary here as long as you ignore the silly melodramatic violin music NBC decided to put over it). The successful revolutions in Tunisia and Egypt stayed peaceful in the face of governments that weren't, and the people in the army declined to attack their own civilian populace. But Libya turned into a civil war, which is still ongoing.

The Berlin Wall (and the whole Soviet Union) fell without a shot fired, because we bankrupted them. Today, China isn't invading the United States, it's _buying_ it. Economic warfare works pretty well. Before nuclear weapons military force often trumped other kinds of political pressure, but World War II ended with Fat Man and Little Boy and _that_ is why there was no profit for anybody in World War III. (Vietnam was like Pickett's Charge, a learning experience teaching us the old rules no longer worked. Of course the whole of World War I was a big repeat of Pickett's Charge, pitting cavalry against machine guns. Some people take longer to learn than others. Our 43rd president never did.)

But the domestic gun nuts still insist that having lots of guns protects them from an oppressive government (as demonstrated at Ruby Ridge and the Branch Davidian compound). Because having your political outcomes determined by killing your opponents puts any democracy on a sound footing, as currently being demonstrated in Afghanistan and Iraq. After all, it worked so well for the south in the US civil war.

Why are guns special in this? The nut who blew up the Oklahoma City federal building was trying to cause political change. So was the Unabomber. It didn't _work_, but that's sort of the point. (What, political violence is different from political threat-of-violence? "Before you make your point, know that I have a gun?")

Killing the people who disagree with you means you couldn't persuade them to agree with your viewpoint. Thinking that your viewpoint will dominate a democracy resulting from this when it wasn't persuasive enough to get there in the first place... How is that supposed to work?

Every time I see "Look! Guns!" in politics, it says to me that side has no valid arguments other than fear. Nothing to fear but fear itself, hence the War on Terror. One of us is missing the point.


July 4, 2011

I'm mostly posting Aboriginal Linux commentary to the project's mailing list, but I note that I tried the uClibc 0.9.32 release and it worked much better than -rc3 did, and I've split out the control image stuff into a separate repository. Currently upgrading the LFS in there to the new 6.8 release.

The next release should also have the 3.0 kernel, so sometime after that comes out. (Presumably before August, if I want to sync back up with the quarterly kernel releases...) I do note that changing the version number to 3.0 is likely to drive lots of stuff NUTS, so I may do a 0.9.32 release with 2.6.39, get Linux From Scratch to build, and _then_ upgrade the kernel to 3.x next time around... Still pondering.


June 30, 2011

Saw XKCD's remarkably succinct summary of why to care about Google+, so I googled for it and tried to log in with my Google account. Got the exact same error message that blogger gave in my previous blog entry. And thus ended my interest.

I was later told that google plus is in invitation-only mode, which is fine. Did I need an invitation to comment on blogger too? The page I found via google said nothing about invitations, and I still suspect "Google Profiles are not available for your organization" is a completely different error, but just close enough that nobody else will look at it.

Plus seemed more promising than Google Wave, Google Buzz, and Orkut, but honestly isn't "hammer away until one sticks" the Microsoft approach to product development? So Google is trying to be Facebook AND Microsoft at the same time? Spam the problem with infinite funding and leverage an enormous userbase to muscle into adjacent niches, I'm sure that sounds familiar...


June 26, 2011

Bumped into a recent-ish reaction to the old World Domination 201 paper. I tried to leave a comment on his blog, but it requires a google account, and the one I have for gmail says that blogger is not an authorized service for my domain.

Rather than fiddle with it, I'll just post it here and point him at it on twitter. :)

I'm the co-author on that paper, and the one who came up with the time vs memory size graph with all the historical systems on it. Basically "transitions happen on a schedule, one is coming up" was my observation, the need to license proprietary codecs was Eric's, everything else was a mix.

One thing that happened is I missed the divergence between low-end and high-end systems caused by 56k modems in the 1990's being the internet access bottleneck. Since better hardware wouldn't let them access the internet any faster, users traded in two iterations of Moore's Law for "cheaper" instead of "faster" at the low end, and the factor of 4 difference turned into a factor of 16. (This caught Intel by surprise at the time, too. Hence the Celeron.)

This meant that 4 gigs didn't fall off the low end until 3 years after we predicted (basically now, although a continued march downmarket in $300 netbooks is keeping it alive).

The high end triggers the adoption of new hardware (which happened right on schedule, prompting us to write the paper as a hopeful wake-up call to the Linux community, after years of mulling over the issue). But the LOW end triggers the software switch, and the extra 3 years let microsoft crap out Vista and then recover with a "good enough for government work" Windows 7.

Just as OS/2 had a chance against DOS and Windows 3.1 until Windows 95 was finally "good enough" that its users stopped actively looking for something else: windows users stuck on XP or Vista were actively looking for a replacement, but users on Windows 7 (non-starter-edition anyway) mostly aren't. Now that people aren't forced to install a 9 year old 32-bit version on their new 64-bit hardware to get a usable system, Linux is no longer interesting to them.

However, these days the interesting action is smartphones. That's a bigger switch: mainframe -> minicomputer -> microcomputer -> smartphone. Our paper was about transitions _within_ the PC, this is about the transition that gave us a computer on our desk. Now it's a computer in our pocket.

How does the smart phone displace the PC? Start by hooking one up to a USB docking station which provides a full-sized keyboard, mouse, big monitor, gigabit ethernet, big speakers, extra storage, and of course charges your phone. The phone hardware is already powerful enough: my nexus one (named "causal" because I'm an old school Doctor Who fan) has a 1 gigahertz processor, half a gig of ram, and 4 gigs of storage. That was my laptop circa 2004, and I know from experience that can be a usable workstation. Add some basic amenities and a native compiler so they're self-hosting, and they'll kick the PC upstairs into the server space the way mainframes and minicomputers went before it. (This time around being kicked up into the server space is called "the cloud".)

So yeah, Microsoft is keeping its stranglehold on the PC the way cobol kept its dominance on mainframes, it's just ceasing to matter in a different way than we expected. (Tired old technologies held in place by an entrenched corporate monopoly generally find a way to die.) Linux on the desktop will never happen, but android on the smartphone already did.


June 20, 2011

I'm posting Aboriginal Linux stuff to the mailing list again. People seem to prefer it there, and this list looks like it's going to stay, so...

I'm really enjoying this day job. What I do there is debug/fix strange hardware issues under Linux, which turns out to be exactly the sort of thing I've been doing in Aboriginal Linux all these years trying to figure out what breakage the most recent kernel or qemu upgrade caused on the various supported targets. Only now I have access to a hardware lab full of people who can connect an oscilloscope to a trace and see what the hardware is ACTUALLY DOING, which makes things so much easier. (Easier than sticking printk() calls into qemu because I don't have to work out what bits to look at; the hardware guys already know. And can explain to me how things are SUPPOSED to work at that level. Yay learning experience!)

I was actually reluctant to go home because I'm merging two git branches and I fired off my miniconfig.sh to boil down the two .configs to compare, and I wanna look at the output, but it was after 5pm and it'd take an hour to run and I only got 5 hours of sleep last night...
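(For anyone playing along at home, that comparison boils down to something like this; the input file names here are invented, but mini.config is what the script emits:

ARCH=arm miniconfig.sh branch1.config && mv mini.config mini-1
ARCH=arm miniconfig.sh branch2.config && mv mini.config mini-2
diff -u mini-1 mini-2

Two runs because the script writes the same output file each time.)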

(Even the commute's only moderately annoying. 2222 isn't the parking lot Mopac is. And the hill country is PRETTY.)


June 19, 2011

Rice pudding! It's not April 15th*, but here's the recipe I use anyway:

Combine milk, rice, salt in saucepan. Cook just at barely boiling, stirring regularly, until it turns into glop. (Somewhere between 15 mins and half an hour.) Add cinnamon and vanilla to glop, cook about 5 more minutes, cool and serve.

*Deep Thought. Rice pudding and income tax. It's traditional.


June 18, 2011

I really hate git's user interface. Today's NEW hatred is "The merge base is bad".

Busybox 1.2.2 (the last version I released) still has BUILD_AT_ONCE mode. I wondered when it was taken out. So bisect between good (1_2_1 since 1_2_2 has no tag) and bad (1_18_5). It complains it needs to test a merge base (ok fine, the grep says the symbol isn't there, git bisect bad), and then it says:

The merge base f86a5ba510ef62ab46d14bd0761a1d88289a398d is bad.
This means the bug has been fixed between f86a5ba510ef62ab46d14bd0761a1d88289a398d
and [f23b96cebfe169eee7131efd8b879748587d1845].

It does not suggest a new revision. I try git bisect skip and it repeats the above message. It found one bad revision, and it STOPS BISECTING THERE.
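(For the record, the setup was more or less:

git bisect start 1_18_5 1_2_1
grep -rq BUILD_AT_ONCE . && git bisect good || git bisect bad

with the grep line repeated at each step, since "bad" here means the symbol is already gone.)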

So I google for this, and am told it's intentional.

As far as I can tell, the git developers actively hate any user who doesn't understand enough of the guts of git to reimplement it themselves from scratch, and actively try to torture them. Could just be a big persistent coincidence, of course...


June 15, 2011

Aboriginal Linux 1.0.2 is out, get it while it's digital.

Long day at work, but I accomplished something measurable. (Apparently it's electrically possible to connect two MACs together with no PHY, but the software needs to be lied to in four places to get gigabit ethernet out of it.)

Got home, Fade reminded me of the Simon Pegg signing at Bookpeople, so we went to that, but there was no talk (just a signing) and the line was hours long. You got one wristband per book purchase, and they were doing rainbow order of solid colors, then again in stripes, so our blue-stripe wristband meant that one of us would be called shortly after the zebra that sells chewing gum. Oh, and no photos with the celebrity, please. (Not his fault; apparently he's only doing 3 signings in the US.) So we didn't stay.

The car hiccuped and stalled on the way home, so I took it to Jiffy Lube (closed) then Lamb's (also closed). Left it in the Lamb's parking lot, taking the bus home. They open at 7am, hopefully they can fix it in time for me to get to work by 10am or so...


June 14, 2011

I just spent almost 3 hours writing up a big email, explaining to the most recent Google recruiter to contact me my 7 month experience with Google recruiters in 2009 and into early 2010, with many links into my blog.

Thunderbird crashed and ate it. I really hate Thunderbird. It's the most obnoxious piece of open source I've used since OpenOffice.

I miss kmail. It's too bad it was bundled with KDE, and I had to leave it behind (the way I left Microsoft behind) when the KDE developers made the gunk it was bundled with intolerable. Luckily Apple peeled Konqueror off and turned it into "webkit" (the brains behind Safari and Chrome), but nobody's cleaned the intentional KDE dependencies out of Kmail. Sad.

Oh well, this was a career limiting move on Thunderbird's part. Time to look around for a replacement.

Maybe I'll write up the Google saga and post it here, but not tonight. I should have been in bed hours ago...

(Oh, Aboriginal Linux 1.0.2 is uploading, I need to write up proper release notes and update the website tomorrow.)


June 12, 2011

I keep trying to polish and document and I keep finding one more loose end to tie up (which requires a rebuild on some architecture, and for releases I use "FORK=1 CPUS=1 more/buildall.sh" so it takes a while, even on quadrolith).

Currently, the powerpc-440fp target was broken by the powerpc target cleanup: it borrowed the other architecture's kernel config with a symlink. It's trivial to cut and paste the config from one settings file into another, but that's not the right fix. QEMU _seems_ to have grown the ability to emulate an actual bamboo board, which means I need a bamboo kernel config. The way to do that with the kernel's new defconfig infrastructure is:

make ARCH=powerpc defconfig KBUILD_DEFCONFIG=44x/bamboo_defconfig

Then build it and see if it boots under qemu with -M bamboo, and if it does cp .config tempname and run "ARCH=powerpc miniconfig.sh tempname" to generate the mini.config file, then filter out the baseconfig symbols to make the LINUX_CONFIG entry to append in settings.
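(Strung together, the whole dance is something like this; the cross compiler prefix and qemu invocation here are illustrative rather than cut and paste:

make ARCH=powerpc defconfig KBUILD_DEFCONFIG=44x/bamboo_defconfig
make ARCH=powerpc CROSS_COMPILE=powerpc-unknown-linux-
qemu-system-ppc -M bamboo -nographic -kernel vmlinux
cp .config tempname
ARCH=powerpc miniconfig.sh tempname

with the last step producing the mini.config to trim against baseconfig.)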

(I should make a script to filter out the baseconfig. I've been doing it by hand, and it's really tedious.)

And it wants uboot installed to make its default image type. I decline the additional build prerequisite, what else ya got. I can make vmlinux, which qemu tries to boot but says it can't load the device tree for... that's compiled from dts to dtb as part of the uboot script. Sigh.

I may just punt ppc440 to next release...


June 11, 2011

Ottawa Linux Symposium never did get back to me with contact info for submitting the paper (or said whether or not they wanted it if I couldn't be there to present it). I meant to finish it this weekend anyway and just post it to the net, but instead I've been trying to get an Aboriginal Linux release out by Monday.

Mips and armv6l are fixed. I merged each target's description file into settings, and I cleaned up powerpc to use baseconfig-linux. I dug up the sh4 .config and qemu invocation Khem Raj sent me after we talked at Scale, and they do indeed work together. (With init=/bin/ash I get a shell prompt! It then kernel panics if I try to ifconfig eth0 up, but you can't have everything.) Currently working on paring that down to something I can merge. (I also note that qemu _might_ actually emulate a bamboo board now, and if so I should adjust ppc-440 to use a real board and not 440 userspace on a 7xx kernel.)

I'm working my way to the point where each sources/target is a file instead of a directory (right now settings is usually the only thing in the directory). I have to figure out how to let targets supply their own linux and uClibc config override files, but possibly I can hijack the (mostly unused) hw-targets for this: instead of a meaningful prefix, treat directories as special? Haven't worked out a way to do this I'm comfortable with yet. The capability is important, but making the design simple takes a lot of work.

While I was at it, I also fixed the cron job on quadrolith, which hadn't been running for a while. Apparently, allowing gentoo to remove unused packages after an upgrade can delete the gcc support files and make it complain about an invalid profile if you try to build anything, which puts portage in a bad spot because it compiles to install. (Luckily it was a version skew issue: the gcc binary was 3.3.x and the support files lying around were 3.4.x, so strace showed it looking for a directory that wasn't there. I made a symlink to the one that was, and it was happy again. Reinstalled binutils and gcc just to be sure. It seems to be working again...?)

The uClibc guys released 0.9.32. Nobody anywhere is going to care until they release 1.0, so I'm not too worried about it. In -rc3 I couldn't get x86-64 to build NPTL because it tried to mix pic and non-pic code in the libc.so link (other targets built fine, previous version built fine...), and building pthreads didn't work for powerpc. The repo is a mess you can't sanely bisect in, so I haven't tracked it down yet; just shipping .31 one more time.


June 9, 2011

So, one week in, the new job's nice but exhausting. Not likely to get much open source development done except on weekends. Luckily, there's one of those coming up...


June 8, 2011

Attempted to add a quick feature to current BusyBox. This ended predictably. Sigh.

Xubuntu 10.04 has a "truncate" command to set the length of the file. I've been complaining about this lack for years, and added a -l option to toybox's touch back in 2007 to set the length of files. The FSF, being a political organization uncomfortable with software development, chose to add a whole new command in 2009 which duplicates touch's -c and -r functionality. Fine, whatever, at least it's THERE now.

Since the FSF is open source's microsoft, establishing de-facto standards that the rest of us must follow however stupid they may be, this means busybox needs a "truncate" command. The easy way to implement this is to make "truncate" another name for touch, with the length-setting option implied.

I don't even have to tell touch not to fiddle with the file date when called as truncate because setting the length fiddles with the file date. Yes, both commands will understand a few extra command line parameters: so what?

I'm not particularly interested in implementing truncate's -o option (treat the size as a count of blocks), because A) the shell can do math ala $((6*7)), and B) the size argument understands the suffixes K, M, G, T, P, E, Z, Y and an optional B suffix to flatten them all to decimal thousands instead of powers of two (that's right: B to _not_ be binary), so your most common reasons for wanting to specify a block size are built-in.
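(Quick illustration with the coreutils version, since the suffixes do the math for you:

truncate -s 1M file     # 1048576 bytes, binary megabyte
truncate -s 1MB file    # 1000000 bytes, the B makes it decimal
truncate -s $((6*7)) file   # or just make the shell do it

So -o solves a problem the suffixes and the shell already solved twice.)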

So, I need to modify coreutils/touch.c in busybox, to add a sub-config option for "touch", add help text, and add/implement the new option.

The fact the new //thingy: format busybox uses is so much less clean than toybox's format is merely an aesthetic issue, but it's one that causes a lot of typing. Since I'm still fairly typo-prone on this new netbook keyboard (which is great for a netbook but still smaller than I'm used to), this is tiring.

The fact that the code is #ifdef salad that's hard to parse is tiring and disappointing, but ok...

The fact that CONFIG_DESKTOP is chopping out the -d and -s options which are generally used for build scripts... what does "DESKTOP" mean here? The config symbol is badly designed. But nevermind...

Tracking down where current KMGTPEZY suffix support code might already be implemented by doing "find . -type f | xargs grep -i kmg" is a learning experience, but strangely does _not_ pull up the place where it actually lives. Instead, I started reading the dd.c code (which I know already implements this for bs=1M), which eventually drilled down to libbb/xatonum_template.c which IS AN ABOMINATION. It's implementing c++ style templates using C macros. In busybox. WHY IS IT DOING THIS?

Look, xstrto_blah() ends with a bb_error_msg_and_die() that includes %lld in its arguments, and typecasts said argument to a (long long). You've already got 64 bit math happening on the 32 bit platforms, so you might as well pipe all this crap through a single 64 bit core function and then provide smaller sized wrappers as needed. This whole macro templating thing is antithetical to everything busybox used to stand for: avoiding code duplication, avoiding unnecessary complexity, making the code READABLE with a simple and obvious path through it...

Denys is doing a great job, the project's more popular than ever, but I'm not finding any trace of the "simple is good" mindset that attracted me to it in the first place. There's FAST_FUNC on everything (hello compiler options?), needing to repeat each main() function twice with magic UNUSED_PARAM and MAIN_EXTERNALLY_VISIBLE stuff...

And once again, my desire to engage with busybox just drains away upon actual exposure to the current code. I really should finish up the truncate implementation, but what this makes me _want_ to do is revive toybox...

(I dunno, maybe part of it's that busybox was a younger project when I started banging on it (less than 5 years old), and now it's over a decade. It was small and simple. Now it's smaller and simpler than the alternatives, but not by any non-comparative metric.)


June 7, 2011

Finally got a chance to continue my arm bisect from Tuesday, and it's git f159f4ed55bb0fa5470800641e03a13a7e0eae6e which does indeed add:

+       tst     \tmp1, #HWCAP_TLS               @ hardware TLS available?
+       mcrne   p15, 0, \tp, c13, c0, 3         @ yes, set TLS register

to arch/arm/include/asm/tls.h. Dunno why I missed it in my grep earlier.

The commit changes hardware TLS support from something a config symbol sets to something it autoprobes based on the CPU type, meaning the 2.6.35 and later kernels are trying to use a hardware feature QEMU doesn't emulate.

Luckily, there's a workaround: "-cpu arm1136-r2" sets the cpu type to something that claims not to have that feature, so the kernel doesn't try to use it. Gotta update the versatile board patch to put armv6 back in (old patch doesn't apply to current kernels), and change the settings file to use -r2.

I need to poke the qemu guys about this anyway because the _proper_ fix is to add the missing feature to QEMU...


June 6, 2011

My apprentice Nik was visiting town this week (headed back this morning), and today I start a new contract at Polycom (day job, yay money). I.E. "been too busy to blog recently, likely to be too busy for a bit longer".

I wanted to get an Aboriginal Linux release done before this, but it sort of snuck up on me. (Ok, I was trying to get one done before OLS, then I got a "can you start monday" job offer which meant not going to OLS, but I'd already been considering that anyway.) End result: life happened and Aboriginal Linux got starved for cycles again.

It's pretty close to being ready for the next release. I've got mips fixed, the powerpc issue can be worked around and thus is not a show-stopper, and I've got a pretty good handle on the armv6 thing. That might be enough for a release: I still want to clean up sparc and sh4, but they've been broken for a while so I could presumably worry about that next time...

Hmmm, I'll try to put some time in this weekend and see what I can get done.


June 2, 2011

Finally somebody explained what's wrong with chroot[1]. (Thanks to Tim Chavez for pointing me at it.)

Basically, the "/ == ." test is crap, and you can work around it by chrooting a second time, moving / into a temp dir but intentionally leaking a reference into your ORIGINAL chroot and then going cd .. back up the tree from there.

Good point. It would have been nice if somebody had mentioned this when I asked, instead of pointing me at a Red Hat Enterprise thread about how the pivot_root syscall didn't do what containers wanted it to do, and thus they were patching it. (If I followed a link from there and went several pages down, there was example code demonstrating this technique: right after a paragraph talking about a dozen or so other unrelated things that containers already address.)

So neither pivot_root nor chroot did what they wanted: so they patched pivot_root. Sigh. Using an inappropriate syscall which ALSO needed to be hacked to work instead of fixing chroot really doesn't strike me as the right approach.

The obvious problem (to me) is that the "current directory equals root directory" test is not reliable, and easily circumvented. The less obvious problem is that the per-process / link predates per-process mount namespaces (which the mount(2) man page says showed up in 2.4.19; it's when /proc/mounts became a symlink to /proc/self/mounts). Once each process has its own mount tree, chroot should adjust the process-local mount tree instead of just acting like chdir() on the old "/" symlink. (You don't even leak mount points when you do this: if you prune the mount tree, the inaccessible mounts get unmounted by reference counting.) The fact it _isn't_ doing this is one of them "four hysterical raisins" things. The chroot() syscall predates bind mounts, the MNT_DETACH flag to umount(), and per-process mount namespaces.

So what _should_ happen is that chroot() should adjust the process's local mount tree so that the root of that tree is in this inode. If the new root is already a separate filesystem, this is relatively easy to do. If the new mount is a subdirectory of an existing mount point, then create a bind mount and splice _that_ in as the root of the tree. Doing this removes the old mount point naturally from your process's mount tree, and then if the usage count falls to zero it should get unmounted naturally the same way the MNT_DETACH flag works. (The old root of the filesystem you chroot into a subdir of can go into a detached state but the bind mount will pin it until that gets unmounted. The kernel already does all the hard parts for us here.)

All that stuff about evicting kernel threads from the old root is irrelevant to this. Manually unmounting the old mount point from your name space (and providing a transitional mount point to park it on) shouldn't even come up. The stuff about closing filehandles so fchdir() can't get you back out can be handled in userspace by lxc (which can mount /proc and examine it to see what filehandles this process has open).

My discomfort with the LXC developers' approach is that they patched switch_root() rather than fixing chroot(), and then were uninterested in discussing it on the list or in irc. As far as I can tell, they took the short but wrong approach instead of the longer correct fix. The actual _bug_ is that "mount --bind sub sub && unshare -m chroot sub" doesn't prevent the re-chroot exploit, because the process's mount tree doesn't actually get adjusted.
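(Spelled out, that's the test which ought to work but doesn't; a sketch, run as root with a sub directory that contains a shell:

mount --bind sub sub
unshare -m chroot sub /bin/sh

In theory the private mount namespace plus the bind mount gives the new process a mount tree with nothing above sub to climb into. In practice chroot just moves the per-process "/" reference and leaves the mount tree alone, so the double-chroot trick still gets out.)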

[1] Sorry, not trying to blame anybody specific, I just noticed somebody involved with containers also talking about this topic on the kernel list while trying to research the issue. This thing "everybody knows" was not a prominent google hit, and despite being explainable in one sentence (as demonstrated above) nobody bothered to do so until now, and even that blog post delegated the explanation elsewhere.


June 1, 2011

I just emailed Andrew Hutton to pull out of Ottawa Linux Symposium: I'm not going to Canada this year.

I feel bad about it, but money's tighter than I expected (putting one's wife through grad school is expensive, as are sudden medical and dental visits), and the idea of spending a couple thousand dollars traveling to a conference that I wouldn't bother to attend if I wasn't going to present at was kind of questionable even before I was offered a contract I'd have to turn down in order to attend. (I can start Monday: but not if I'm going to OLS.)

I could still afford it if I wanted to, but OLS isn't the draw it used to be. Once upon a time everybody was there, from Linus and Alan Cox on down. Now I only recognize two of the authors listed on their "presentations and tutorials" page: Jon Masters and Tim Riker. (Add in the keynotes and the total goes up to three: Maddog.) They're cool guys and I'd drive to Houston on a Saturday to see them, but a whole week in Canada is a higher threshold. The only session I want to _attend_ is Tim's Android tutorial, and there are plenty of others on that topic.

Still, I did propose a paper, and if I'd actually managed to _submit_ my paper yet I'd probably feel more obligated to go present it. I can still finish it up and email it in if they're interested, but on the 16th I got cc'd on the email about a new (unnamed) person being in charge of papers and a new paper submission process coming soon... and the month ended without further details. Two follow-up emails to Andrew got no response. I have no contact info for anybody other than Andrew. There's no email on the website (which was broken over the weekend, main page redirected to 2009 and forcing it to the 2011 page gave me a login prompt), just twitter. The conference starts in 11 days, and this sort of stuff isn't resolved yet.

Last year, they posted draft proceedings, and _never_ posted final proceedings. (I'm still trying to track down a final proceedings PDF to update my docs collection, more emails about that to Andrew have been ignored for months.) A number of papers listed in the 2010 draft proceedings are not actually included, so I doubt they'll miss mine.

(I had to make the decision over the weekend because I needed to know whether I could start work on Monday or not until the 20th, which is when I hit the website breakage mentioned above.)

But I still feel bad about it.


May 31, 2011

Posted about the ppc serial issue to the mailing list and switched to getting armv6 working. None of the non-versatile boards really seem like a good fit: they generally don't have a PCI bus but instead use flash for their block devices, and I'd rather not deal with flash filesystems or erase block sizes. (I can stick an initramfs in there for early testing and then attach something else through virtual USB, but it's not my first choice.)

Since they all suck, I picked one that I've seen in real life: the Nokia N8x0. Then spent rather a long time poking at various kernel configs trying to get the N810 board to talk to the QEMU serial console.

The problems with this include the defconfig mechanism having changed in the kernel recently, so all the old docs I google for that indicate which defconfig to use for N810 aren't talking about the current format, which doesn't seem to have an N810 board. Also, I dunno what magic name this serial device uses, I've seen ttyO0, ttyO2, ttyS0, and several others. Reading through the source has not proved illuminating.

The problem with booting a new board is you're not sure if your problem is on the kernel side, the qemu side, or both. If your attempt to boot it produces no output from the emulated serial console, this could mean ANYTHING. I'm using a known working toolchain, which is a luxury, but it could be a recently introduced driver bug (non-x86 doesn't get a tenth the testing of x86, no not even arm since x86 has a standard architecture subset providing minimal expected functionality and every arm system has its own weird system-on-a-chip set of attached drivers in one big GPIO blob). It could be a similar regression on the QEMU side. It could be (probably is) kernel .config. It could be kernel command line. It could be qemu command line options.

I can use qemu's built in gdb remote debugging capabilities (the -s option fakes having a jtag attached to the hardware so you can use gdb's "target remote" option to attach to the kernel), assuming you compile an armv6l version of gdb. Which I'll probably wind up doing.

But first, lemme revert back to a known working version of armv6l (back before the versatile board broke) and see what that was doing, at least to rule out obvious qemu weirdness. Let's see, maybe the 2.6.37 kernel in aboriginal commit 1328? It built, and booted to...

qemu: fatal: Unimplemented cp15 register write (c13, c0, {0, 3})

Which sounds familiar, yes. Time to build that gdb. I have the tarball for the last gplv2 release (6.6) hanging around in my mirror, so extract that, "./configure --disable-werror --target=armv6l-unknown-linux-gnueabi"... and qemu's syntax changed again, it's now "qemu -gdb tcp::9876 -S" so that becomes "QEMU_EXTRA='-gdb tcp::9876 -S' ./run-emulator.sh" and...
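(Assuming that gdb builds, the other side of the connection is then just:

armv6l-unknown-linux-gnueabi-gdb vmlinux
(gdb) target remote localhost:9876
(gdb) continue

pointed at the vmlinux the kernel build left lying around.)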

Of course, the error isn't a processor fault, it's QEMU ABORTING. And it even says in target-arm/helper.c line 1663 "/* ??? For debugging only. Should raise illegal instruction exception. */" So all gdb is telling me is that qemu exited. Ok, what do these registers do...

Oh yeah, there's a stable URL. Just designed to be bookmarked for posterity and be the same a year from now. Fills my heart with joy to see information indexed in such a clearly human readable way independent of the guts of some database implementation, that does.

So let's try googling the kernel source for "C15"... that's not it, try "c15"... Ah. There are two c13's paired with it:

arch/arm/mm/proc-feroceon.S:	mcr	p15, 5, r0, c15, c13, 0		@ D clean range start
arch/arm/mm/proc-feroceon.S:	mcr	p15, 5, r1, c15, c13, 1		@ D clean range top

but neither of them are getting an obvious "3" from anywhere.

Git bisect time! 2.6.35 used to work, actually booted the versatile board armv6l arch to a shell prompt. 2.6.36 has the same failure as 2.6.39 does. So bisecting between .35 and .36... Ah, helps to test the right file...

And out of time to work on it for the moment. Breadth first search (trying to find a way through to a working system) may not be cutting it here. I'm eventually going to have to depth-first debug one of these, with bolt cutters and a chainsaw. But so far, none of them leaps out at me as providing much traction for such an operation. Gimme something to WORK with, guys...


May 30, 2011

Mips is fixed now, and I'm on to powerpc. What's wrong with powerpc? Exposition time!

When you fire up a system image, you can get it to run your own code in several different ways: you can interactively type at the shell prompt, you can pipe a script into QEMU's stdin, you can create a custom initramfs for the kernel you boot, or you can create a control image for Aboriginal Linux to hand off control to.

Control images are the newest and most comprehensive mechanism. You feed a third disk image into qemu as /dev/hdc, the sbin/init.sh script mounts it on /mnt, and then at the end of the init script if /mnt/init exists and is executable, it runs that instead of launching a shell prompt. This is cool, but fairly heavyweight to set up: if all you want to do is build "hello world" to smoketest the native compiler in the system image, creating a control image is a bit fiddly for just that.
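(For the hello world case, a minimal control image might look something like this; genext2fs is just one way to make a filesystem image without loopback mounts, and the paths are invented for illustration:

mkdir control
cat > control/init << 'EOF'
#!/bin/sh
gcc /usr/src/hello.c -o /tmp/hello && /tmp/hello
EOF
chmod +x control/init
genext2fs -b 1024 -d control control.img

then control.img gets fed to qemu as that third drive.)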

The previous mechanism was to create a custom initramfs, which is like a control image except nothing is cleanly separated: your kernel image isn't generic, your root filesystem isn't generic... not fun. (And it takes more memory, since your initramfs has to be completely extracted into memory, uncompressed. On a platform like mips where 256 megs is a hard limit, or on my current netbook "brillig" which still only has 1 gig of ram on the host and can't spare too much for QEMU... yeah.) Control images more or less obsolete the need to do a custom initramfs, although it's still an option.

The original mechanism was to just pipe the script you want to run into qemu's stdin. If you just want to run two or three lines of script, this is pretty cool.

The problem is flow control: I.E. if you pipe data into stdin there isn't any. On most platforms the serial controller initialization eats a bit of data at the beginning, but if you start your script with a line of spaces it generally works pretty well.
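(In practice that looks something like:

{ echo "                "; echo "cat /proc/cpuinfo"; echo "exit"; } | ./run-emulator.sh

with the leading line of spaces as sacrificial padding for the serial init to eat.)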

And this is where PowerPC has an unhappy, specifically with the PMAC ZILOG serial port. For some reason, it's _really_ not happy when input and output cross. As in it hangs. If a flood of input data comes in all at once, as soon as userspace launches (I.E. interrupts are enabled), it manages to output about 16 bytes, slight pause, manages to output about 2 more bytes, and then hangs.

Alas, my normal approach for debugging this (stick printfs into the code, or in the kernel printk(KERN_ERR "blah")) is less helpful when what I'm trying to debug is /dev/console. For things like filesystems there's User Mode Linux (printf works just fine there :), but that's not helpful for inspecting specific non-x86 hardware drivers.

I suspect what I'll have to do is allocate a largeish chunk of memory in the ppc kernel, sprintf() into it, use the qemu -s option to attach gdb to the emulated system... This is not sounding like fun.

I was hoping I could git bisect this problem, but I haven't found a "good" yet. This is not a new bug, it failed the same way 2 years ago. (The zilog serial port works ok when it's not under simultaneous input and output load. It drops interrupts when it is, and does not recover. Maybe QEMU's emulation is imperfect, dunno, but something is not happy.)

I could make puppy eyes on the PPC mailing list, but this is not the first problem I've had with PMAC_ZILOG, and it took a while to get anybody's attention last time. Most likely this is fallout from that, and it was just never quite completely fixed. But Alan Cox's redo of the tty layer was like a truck hitting this code, and fixing fallout from that isn't "this three line change had an obvious bug" but "the way interrupts work changed, and figuring out what the new code should look like requires a much deeper understanding of how this obscure piece of hardware works than I have".

*shrug* As usual. But it would be easier to deal with if I could stick printfs into the code.


May 29, 2011

I'm getting increasingly ambivalent about attending OLS this year. I _still_ haven't got the info to actually submit my paper from the OLS guys. (It needs another couple hours of work plus the Tex conversion, but it's hard to motivate myself to finish it when there's still nowhere to _send_ it.)

I got an email two weeks ago saying that they had a new paper submission process in the works and details would follow. Details have yet to follow. My follow-up emails asking for clarification have been ignored. I note that I still haven't managed to track down the final papers from LAST year's OLS; I updated the kernel.org OLS 2010 page based on the "draft proceedings" which are missing a number of papers. (Repeated emails about this were also ignored by the OLS staff.) Is this sort of disorganization REALLY worth going all the way to Canada for?

I didn't bother submitting anything in 2009 or 2010, but mostly that was because they'd moved out of Ottawa. After following Atlanta Linux Showcase to California, I had no desire to go through that again. They're back now, but more apparently disorganized than ever, and I don't know if anybody ELSE I want to see is going. At previous OLS I had a marvelous time hanging out with Thomas Gleixner and David Mandala. This time, of the submitted papers, I recognize exactly two authors: Jon Masters and Tim Riker. They're nice, but not "drive from Texas to Canada if those are the only people I'm going to see" nice; Jon's in the US all the time and I think Tim lives in California. Thomas is German, I don't _get_ to see him unless I make an effort to.

Am I missing keynote speakers? Let's see... Darn it, OLS's 2011 page is stuck on proposal submission, asking me to log in. Its cookies are screwed up or something. And when I backed up to the main http://linuxsymposium.org page it redirected me to _2009_, and then asked me to log in THERE. Brilliant.

Not feeling good about this.


May 28, 2011

Hmmm, wonder if I should start pseudo-fasting again? I went a week where I ate a combined total of maybe 2000 calories, but I stopped when Fade got back on Thursday, because half my eating is social and I feel strange if other people are eating and I'm not. (Also, when food is in front of me, I eat it. I'm ok avoiding food that isn't there, but really bad at avoiding food that _is_.)

Fade and I had a really productive day going out to Fry's and not finding the right memory to upgrade brillig with. (It needs 256x8 DDR3-1333, which it turns out Fry's does not have in stock, and 128x16 isn't close enough.) On the way back we went clothes shopping, and I got some less ugly shorts, which do _not_ have an elastic waistband, which means they fit fine when I'm standing but when I sit down they're uncomfortably tight in a way that makes me nauseous after a while.

It's not as bad as before my week of fasting (I'd guess I lost around 7 pounds, but considering the scale was convinced I gained two pounds by NAPPING, it's all just a guess). But still, clearly not a full fix: somewhere between 1/3 and 1/5 of what I needed to be properly healthy.

The downsides to fasting are that I basically have to disassociate myself from Fade's eating entirely (not be around when she's doing it), and she tends to forget to eat herself if I don't remind her. The difference is she gets bad blood sugar crashes when that happens, which screw up her mood for hours. I just get tired and a bit headachey; my metabolism's actually pretty good with no food. It takes me half a day to notice, and then _exercise_ of all things restores my blood sugar level. (I also cheat and use diet energy drinks, the rockstar lemon and orange flavors are on sale at Several Elevens for 2 for $3. After a few days things like boiled cabbage start sounding really tasty, but I'm ok with this. :)

If I do another round, I should probably start soon. My schedule is unlikely to remain this flexible for much longer: my so far completely passive job search is starting to turn into phone interviews (well, threats of phone interviews so far), and in a week and change I need to drive to Canada for Ottawa Linux Symposium. (I have free frequent flyer flights from two different airlines, but neither is willing to cross the Canadian border. Driving is actually cheaper than getting an "international" plane ticket, and it would mean I have a car there and can stop by and see my sister in Minnesota. Of course I could just use one of those frequent flyer tickets to see her. And the budget for visiting Canada would buy quite a number of netbooks for her and my nieces/nephews...)


May 27, 2011

I heart my new netbook, even though I haven't actually installed the ram upgrade yet so it gets a bit unhappy at times with my workload. Next time I have disposable income I need to buy my sister five of these (one each for her and four children), and since this one's named "brillig" I'm almost obligated to call its successors "slithy", "gyre", "gimbel", "wabe", and "mimsy". (You have been warned.) Amazon's currently selling low-end D255Es for $240 each, plenty powerful enough for basic websurfing and email.

I'd install xubuntu on them because Windows 7 "sourdough starter edition" is a toy, and because they haven't got the admin skills to be left alone with windows: my sister's current windows desktop (an old steam-powered Dell) is paralyzed by viruses to the point she's stopped using it.

I'm also seriously tempted to file my nails so as not to dig grooves in brillig's pretty keyboard. After 3-4 years of banging on my previous laptop, the keys didn't just have the letters worn off but got multiple millimeter-deep gashes in the plastic. I'm trying to type lightly on brillig, which is a losing battle since I taught myself hunt and peck on a commodore 64 keyboard and then learned touch typing on manual typewriters where each keypress was like a scaled down brake pedal in your car. (And not ABS brakes, either: the key didn't just travel a long way but you had to press hard to make the mechanism work.)

If you think nobody was still using manual typewriters in the 1980's, you haven't met the New Jersey public school system. The class was in a "permanent temporary" classroom (I.E. a trailer installed to handle overflow on an emergency basis and still there 15 years later). Some of our textbooks dated back to the 60's. Not the edition, I mean the actual physical books. The experience did not leave me a fan of US public schools.

Anyway.

Ralf Baechle has diagnosed the mips bug in 2.6.39 although so far his attempted fix just turned it into a build break. Since I'm on a more or less night schedule (with a couple extra hours thrown in for texas vs the west coast), I may attempt to fix it up myself.

His patch description explains that the underlying problem is a variable is left at the wrong value in a certain config, thus causing the panic. The fix is a multipage patch in which that variable name never occurs once even in the context lines, so I'm reluctant to fiddle with what the patch is _doing_. But the build break is simple missing declarations, I can at least fix it up enough to test whether or not what he did was conceptually right.

My earlier gripes about kernel development remain in force, although it's still true about any large project you haven't drunk enough of the kool-aid for yet. Digging in to properly understand his patch could eat the weekend if it throws off enough tangents, and Mips isn't the only target I need to fix.

For example, the armv6l target just needs a new kernel .config. I've been abusing the "versatile" board because it _is_, but in the wild the real hardware is just armv5l, not armv6l or armv4tl. (Think of armv4t as arm's 486-DX, armv5 as i586, armv6 as i686, and armv7l as the shiny new hotness ala the core duo. Still 32-bit, but some of 'em are SMP. The "l" at the end just means little endian, there are "b" big endian versions too but they don't get used much.)

QEMU lets you plug all these processors into a Versatile board layout, but Linux doesn't. I patched it to be more flexible, but the kernel developers keep changing stuff and breaking my patch, and since one of my goals is to push patches upstream and these guys aren't interested, let's see what natural armv6 boards there are.

The problem is, I need a specific set of hardware out of each board: enough memory, a serial port for the console, a realtime clock chip because make gets unhappy if the clock isn't accurate (doing dependency checking against files with dates in the future blows its little mind), a network card, and three hard drives (for the root filesystem, /home for scratch space, and control images for automated builds).

The hard drives are the problem here. Lots of these boards only have flash, which has different semantics than normal hard drives. If virtio worked reliably I could just use that (either virtio block devices or virtfs), but it's not supported on all targets or by all qemu versions, and the syntax for using it is clunky. I could also use network block devices if I'd bothered to port the server side to busybox (maybe I'll do that at some point) but I'd still need at least an initramfs to set up and mount the rest. I could also use diod to do a 9p network mount, but I need a 1.0 release first, and it has the same initramfs bootstrapping problem.

All the current targets are using virtual IDE or SCSI drives hooked up through a virtual PCI bus. (Really most of them seem to have an Intel PIIX controller, since QEMU's had good emulation of that forever and it's a cheap part widely used in the real world. That can do both PCI and USB, and then qemu can slap devices into the bus from the command line as long as the board's got it.)
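(For concreteness, the shape of a current target's invocation is roughly this; flags vary per target and this is from memory rather than cut and paste:

qemu-system-arm -M versatilepb -nographic -kernel zImage \
  -hda image.ext2 -hdb home.img -hdc control.img \
  -append "root=/dev/sda console=ttyAMA0"

where the three drives are the root filesystem, scratch space, and control image.)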

And so we look at the cornucopia of boards in "qemu-system-arm -M ?", figure out which ones are armv6 (the arm926ej-s ones are v5, the cortex-a8/9 are v7...) and then build 'em and figure out if we can stick either PCI or USB hard drives in there...

(P.S. the mips bug was a missing #include, and then the patch worked.)


May 27, 2011

Not for the first time, I find myself wondering if the developers of an open source project's user interface have ever actually used their own project.

Here is a screenshot of my netbook desktop, with thunderbird maximized. Brand new hardware, that's actual size of the entire display. The little sliver in the bottom right is the amount of space actually devoted to showing the email body, in this case not quite managing to show two full lines of text.

This is because the header on the email message won't scroll with the message, it sits there taking up a large chunk of screen real estate which was annoying even on my old laptop. Then, when it doesn't load HTML elements in a message, it gives you a warning box with NO WAY TO DISMISS IT other than to load the remote content. No "I don't want to, make this box go away" option. Of course this box also doesn't scroll with the message either.

There's a header scroll extension which apparently last worked in 2009, and refuses to install on current thunderbird. I opened up the jar (it's just a zipfile) and tweaked the range of compatible versions to force it to install... but apparently it puts a scrollbar WITHIN the header window. In case you "show all headers". That needed an extension.

I am not the first person to want this. Kmail gets it right, netscape got it right back in the 90's, but Thunderbird's an email client that thinks the content of an email is not a worthwhile thing to devote any significant screen space to.

I has a sad.


May 26, 2011

I've got linux on my new netbook, but it's been a number of years since I last wrote down my full install checklist, and it takes a bunch of work to set the system up how I like it, so it's probably time for a new version.

The checklist for turning a fresh install into something useful:

I'm probably missing a bunch of steps, I'll come back and edit this entry as I notice stuff missing.

Memo: build new kernel, beat boot menu out of grub.


May 25, 2011

Ok, this new keyboard is going to take some getting used to.

Biked to Fry's this morning and spent a little over $300 on an Acer D255E-1802 netbook (plus memory upgrade). It's a really tiny laptop (10.2 inches by 7.3 inches, and just under an inch thick, weighing 2.8 pounds) with a surprisingly nice keyboard for its size, although it's still a little cramped: the escape key is _tiny_, and home/end are fn-pgup/fn-pgdn just above the somewhat squashed cursor keys. (Of course it wastes space on caps lock, windows key, and menu key.) I keep hitting page down when I want to cursor right, but for actual typing... not bad. (Touchpad instead of pointer, but IBM's patent has 5 more years to go.)

I first saw one at Office Max when I took Camine to buy colored pencils several days ago, and was impressed by the keyboard, general form factor, and price. All the D255E variants are externally the same, but the one Office Max had (and Walmart) was the cheaper single-processor version. I did a bit of research and found the SMP version, and I thought I might have to mail order it, but Fry's had them in stock.

The 1802 has an Intel Atom N550: 1.5 ghz dual core x86-64, hyperthreading so it fakes 4 processors (each of which claims to have 512k L2 cache even though the chip only has 1 meg total). No VT extensions so KVM isn't any faster than QEMU. Claims 8 hours of battery life with the 6 cell lithium ion (we'll see, although the power supply's only 40 watts so it sounds reasonable). 1024x600 display's on the small side but bright and clear, and a full 80x25 term fits comfortably at the default font. Comes with 1 gig of ram, upgradeable to 2. (The N550 only has 32 bits of physical address space, and once you carve I/O memory out of that, upgrading to 4 gigs isn't an option.) The N550 also has built-in GMA 3150 3D acceleration, which can apparently play 720p video and run World of Warcraft at 13 frames per second; more importantly, current X11 supports it. The hard drive is 250 gigs (5400 RPM SATA). 3 USB slots, both wireless and cat5 ethernet, vga out, headphone and microphone jacks, and a memory card reader. Oh, and a microphone and webcam.

All in all a reasonable machine, roughly as powerful as the 2007 laptop it replaces. Just about every ergonomic or performance decision is "cramped but bearable", for a price that's "wince but bearable". (I bought in-state so paid sales tax, but got 15 dollars off for buying a returned unit and didn't pay shipping... Averages out.)

The reviews warned that the memory upgrade procedure is a rube goldberg affair involving removing the keyboard and something like 7 screws. (I've bought the part but haven't performed the ceremony yet.) And that the built-in speakers are crap, but I mostly use headphones anyway. Otherwise, everyone who's bought one seems quite pleased with the hardware, including "dropped on concrete and it still works" praise.

The software, on the other hand, the reviewers completely pan. It comes with a shareware version of Windows 7 "32 bit starter edition" which can't even play DVDs. (The hardware can, the preinstalled OS requires you to insert coins to continue.) The box proudly notes that Office is "preinstalled" but you have to buy a registration key to activate it. (Meaning what, that downloading software is too hard for Microsoft?)

Luckily, I don't care about that because I'm putting Linux on it without ever even booting into the other thing.

I'm using Xubuntu (Ubuntu with the XFCE desktop, much lighter and cleaner than the default). Ubuntu makes releases every six months, but every fourth release is a stable "long term support" version maintained for 5 years, and the others are essentially beta test versions for the next LTS. The current LTS is 10.04 (meaning it was released in April 2010).

As with most ubuntu variants the install CD is a bootable "live CD" (an idea they copied from Knoppix) which boots up the system you're trying to install, and then you click on an icon to copy it to the hard drive. (That way you know it works on this hardware before formatting your drive.) But we need a bootable USB key rather than a CD, because the D255E hasn't got a CD drive.

The "usb-creator-gtk" program copies an ubuntu install CD onto a USB key, and makes it bootable. You can either boot the install CD and make a key from that, or you can download the .iso file onto an existing ubuntu system and run the command there. It's a gui program that should autodetect everything it needs, just click OK and let it do its thing.

So, the install steps are:

It's after midnight now, so I'll continue this in tomorrow's entry.


May 24, 2011

Still on a diet. When I open the fridge to see what's there (force of habit) I either pour myself diet soda or eat a flintstone's chewable vitamin. (I could get real vitamins, but I'll actually eat these. And the overdose danger is lower than with adult dosages.) I've gone through half a bottle of vitamins in the past 4 days, and I'm almost out of diet soda.

I'm not entirely fasting, because that can cause all sorts of fun health problems (from kidney and gall stones to sudden death due to severe electrolyte deficiency). But I'm trying to keep my daily calorie intake in the 100-500 calorie range, which is ridiculously low. I can walk or bike to The Donald's any time I like for one side salad with diet vinaigrette dressing (80 calories total, I burn that even walking to the McDonald's on MLK and back). Mostly I go there for a place to hang out with my laptop for a couple hours away from the cats: I like to walk or bike out late at night but I need a destination, and pretty much the only places open that late are food or coffee shops. (I miss metro. I'm deliberately not thinking about how many calories the "Big Train" spiced chai they used to make with steamed whole milk contained. When it was a 4 hour walk each way to get there, I'm pretty sure I came out ahead. :)

This is my second attempt at fasting, my first was just before Fade left for Maryland, and lasted 3 days: right up until I cooked her some potatoes au gratin for dinner and couldn't resist eating a whole bowl of 'em myself. (I'm related to both sides of the war in Ireland. Every St. Patrick's day I'm supposed to get drunk and beat myself up. Potatoes are my kryptonite. Cooked with half and half, butter, cheese, and sea salt. Did you know the Morton's people make iodized sea salt in the same gigantic containers they do regular salt in?)

No, I'm not worried about salt intake. Good grief, I first heard that the salt warnings were bad science back in the _80's_, then we had a biology lab _demonstrating_ it was bad science in college, and they keep rediscovering it as a new thing because they dismiss all the previous studies as not agreeing with the studies that say what they want to hear. This is why the CDC is still clinging to the Body Mass Index which is pure snake oil and always was. Bureaucracies are vulnerable to latching on to something that is easy to measure and manipulate, even when it has no real connection to the reality they're trying to influence. Once they do, CONVINCING them that they're wrong about something ("ulcers are caused by bacteria, not just stress") requires repeated sledgehammers to the face, and a lot of shot messengers...

Anyway. This is why I don't claim to be an expert, but don't pay all _that_ much attention to the people who do, either.

This time I again started out semi-fasting and just eating one side salad per day, and walking or biking to a _distant_ McDonald's to get it. But yesterday I had a can of corn (240 calories, although you have to multiply by 3 to figure that out, "servings" are such a scam). And in the evening I boiled an onion. (Sea salt, onion powder, bay leaf. Really tasty. Google guesses it was maybe 150 calories.)

Basically, I'm holding off on food until something presumably healthy sounds really tasty, and then half the time not eating that either. (I considered buying and eating, raw, a can of spinach at wal-mart this morning. Mostly because I didn't know there _was_ a popeye brand. I wound up getting one of the lemon "rockstar" energy drinks, which are 20 calories per can and have enough caffeine to suppress appetite for about 3 hours. It's made in the presence of real lemons!)

I'm paying attention to calories because that's what matters. Back in college I remember the "Stop the Insanity" lady on television talking about "fat makes you fat", then the Atkins diet came along and you needed to cut carbohydrates to force your body into fat-burning mode... Tried both, neither worked for me. (And of course there are other theories...) But if you go down to the physics of it, the bottom line is calories.

Today, I bought a package of fake crab meat (500 calories total in the bag, but it's a big bag and I'm eating it slowly). Reading the calories on things in the store is... crazy. A giant $5 hunk of smoked salmon is less than 200 calories, but a tiny yogurt drink is 260? The cracker packs Fade gets are 200 calories _each_. Peanut butter has over twice the calories per tablespoon as strawberry preserves. Don't even _ask_ about the baked goods.

Of course now is when they'd have steak on sale for $1.97 a pound. Bought four packs worth and froze them all.

Much of this makes no sense. The morning bagel with salmon cream cheese is 370 calories, which is almost as much as a McDonald's "McDouble" (390 calories according to their website). The small pack of cinnamon rolls I sometimes get on long walks is 380. The venti hot chocolates I got at starbucks were 500 calories each, for a beverage. All of those are more than Whataburger's fried strawberry pie (250 calories), which you'd assume would be deadly...

Anyway, I'm currently boiling a cabbage with a bit of sea salt and onion powder, because I'm hungry enough for that to sound good. No idea how many calories in that (how big is a cabbage?). Probably go for a bike ride afterwards.

Not so much a diet plan as my project du jour, really.


May 23, 2011

I was in the hospital the day before yesterday for my annual early morning "This can't be food poisoning, yet I'm going into shock, am I having a heart attack?" episode. This was year 3.

Since Fade's in Maryland (if I pass out nobody's likely to notice for a while, and nobody can drive me to the hospital), I called an ambulance, and the EMTs confirmed I'd had some kind of attack (still sweaty, and my color was off enough they were wondering if my liver was ok). So they took me to the hospital.

When I got there, I let them draw blood this time. (I have a phobia of needles, the previous two years I let them do everything _but_ draw blood.) They said my heart's fine, that my resting heart rate is so low (high 40's) that it keeps setting off the heart monitor alarm, they did an x-ray to rule out things like aortic aneurysm... and basically said they couldn't find anything wrong with me. By the time I got to the hospital, I'd recovered from whatever it was. Again.

Since it happens at the same time of year every year, my theory is it's probably allergies. I live a couple blocks downwind of one of Austin's ubiquitous parks, and we have the windows open at night. My friend Kelly told me back in the 90's that if I didn't already have allergies _Austin_would_provide_. (She then found out that what she'd been treating for years as asthma was actually congestive heart failure, which is deeply reassuring. I was pre-nursing in college long enough to develop plenty of ammunition for hypochondria.)

But since I now have medical professionals telling me my heart is fine... time to do something about my weight. Yes, I go on insanely long bike rides and walks, and I swim, but computer programming is a desk job and I eat too much. Even if my weight _wasn't_ the reason for this, it's what I was worried about, and I might as well do something about it now before it becomes an issue.

I started out fasting because I suck at moderating amounts but am pretty good at yes/no questions. I got out of debt by cancelling my credit cards. I've avoided a family history of alcoholism by not drinking at all. And I often forget to eat for half a day or so anyway, it's usually the presence of food that makes me hungry. I eat because it's tasty. I eat when other people _around_ me eat. Fade's in Maryland until Thursday, as long as I'm by myself I don't get reminded of food much.

We'll see how that goes...


May 22, 2011

Of course the promise that religious fundamentalists would evaporate en masse proved too good to be true. They seem to want it, and the rest of us wouldn't have to put up with idiots teaching creationism in schools, passing laws to define marriage as "one man, one woman, and the housekeeper", domestic terrorists assassinating abortion providers, pedophile priests, the whole "Okay to be Takei" thing... sounds win/win to me. Obviously reality doesn't actually work that way, but I'm sure scientists could come up with something to serve the purpose. Orbiting space fleet with disintegrator rays, maybe?

Yes, scientists. The people who examine reality to determine the truth and then figure out how to use it, producing antibiotics and air conditioning and automobiles and cell phones and so on. Real things that actually work, on a regular basis. Religion does not do this. Study your holy book du jour all you want, you won't invent the transistor. You'd think this would be obvious by now, but some people are afraid NOT to believe in Santa Claus just in case they'd be giving up the presents. Clap your hands for Tinkerbell while you're at it, otherwise you MIGHT be killing fairies. It's the exact same argument.

It's sad that we can't show stuff like this and this in school. If you want a balanced debate then show a balanced debate.

Personally I consider atheism a religion the way zero is a number, but I'm equally agnostic about Santa Claus, Tinkerbell, and the second coming: prove it. Quoting from the bible means as much to me as quoting from the koran, the iliad, the elder edda, the egyptian book of the dead, dianetics, the book of mormon, wookiepedia, the star trek TNG technical manual, or doctor who. Go read "God: A Biography" (it won a pulitzer, shouldn't be hard to find) or "Ken's Guide to the Bible" (which is a cliff's notes sort of thing that's full of chapter and verse citations so you can read along with it).

I wouldn't care except that organized religions exert massive political power in this country. They really shouldn't, but separation of church and state keeps leaking.

Anyway.

Gentoo continues to be subtly flaky. For example, this happened building the powerpc cross compiler:

gcc: Internal error: Killed (program as)
Please submit a full bug report.
See <http://bugs.gentoo.org/> for instructions.

It gives no more details than that. It was the host toolchain that died, not the cross compiler it was building. (I.E. it was gentoo's compiler, producing x86-64 code, that threw a wobbly.) And there's nothing in dmesg about the OOM killer triggering or anything. (This is on quadrolith, a box with 8 gigs of ram.)

Re-running the same build script, it worked fine. And on the original pass, powerpc-440 built just fine. Sigh. (The host-tools.sh step can provide known working versions of everything _except_ the host toolchain. Well, and the hardware...)


May 20, 2011

The glibc file sys/personality.h (installed in /usr/include/sys/personality.h in xubuntu 10.04) says "taken verbatim from Linux 2.6 include/linux/personality.h". The Linux file is GPLv2 (as is all the Linux code). The glibc file has a giant LGPLv2.1 boilerplate notice at the top, which is convertible to GPLv2, but so is BSD/MIT licensing. It's not convertible the OTHER way, which is what they did.

Isn't it fun the way the FSF can just arbitrarily relicense other people's code? Oh, and the refusal to #include <linux/file.h> (as if it's somehow CLEANER to copy linux-specific headers into glibc) is fun too.

Oh well, presumably it's all been worked out in smoke-filled rooms behind closed doors...


May 18, 2011

Some people don't know how to use chroot, and unfortunately, these people wrote LXC. And thus they insist that LXC has to use pivot_root() in order to securely chroot, which is crazy.

Sigh.

I myself had to ask Linus Torvalds about this, back in 2005 when I was writing switch_root for busybox, and it took multiple replies from him and others to set me straight. It turns out to be nonobvious, but also fairly simple.

Each process has three magic symlinks into the filesystem ("/", ".", and ".."), and all paths you construct start with one of those. (The default if you don't specify is ".", although you can be explicit and say "./blah" if you want to, which is useful for preventing $PATH lookups.)

The "." link is the main thing chdir() updates, it's your current working directory. The "/" link is what chroot() updates, it's the root directory of this process's view of the filesystem tree. If these were the ONLY links we had to worry about, people would still be confused by the chroot() system call, because chroot() only updates "/", it doesn't touch ".", so if you do "chroot()" to anything other than "." then most likely "." will point outside of your new chroot afterwards.

The ".." link is the one that confuses people, it's _also_ set by chdir() to point to the directory you came from to get to this ".". (No, not the old ".", but the inode representing the next to last path component in the path you cd'd through to get here: in "./subdir/otherdir" it would be subdir.) The full horror of this kludge is explained at length here, but basically it's an ugly hack added because symlinks and bind mounts let you descend via multiple paths into the same directory, so "where you came from" is nonobvious, and "cd dirname; cd .." dumping you in a different directory is considered bad form. Except that ".." just records _one_ level and not the whole path that got you here, so having it doesn't really fix anything ("cd dir1/dir2; cd ../.." still doesn't work reliably), yet it complicates things enough to convince otherwise smart people that you can't use chroot securely because this confuses them. (And now that I look at the code it's been cleaned up rather a lot recently, so this may be less of an issue than it was. I'm only seeing it actually _store_ "." and "/", and now it's mostly calculating ".." instead of storing it, see fs/nodei.c. The code's changed a lot since 2005. So some of the paranoia about this could easily be stale, but I'll continue to explain how it USED to work...)

An important point is that ".", "..", and "/" are independent of the mount tree. You traverse the mount tree when descending into subtrees to get from one directory to another, but each _starting_ location is a process-local cached link to some inode somewhere, and these links only change when explicitly updated.

This means they're _not_ updated by changes to the mount tree, so if you mount over the current directory, "." will still point to the old directory you mounted over, which doesn't have to be empty. Also, the path resolution logic only detects matches (and diverts you to another filesystem) when you DESCEND into a directory that's been mounted over. The directory link that LEADS here got spliced, but once you're in the directory that got mounted over it doesn't matter how you got there, and yes you can access the contents of subdirectories relative to an overmounted "." because you're not passing through a mount splice point to get there. In fact this was already a common trick script kiddies used to hide their warez directories back in the 80's: cd into a directory before it got mounted over, set up an ftp server that served "./pr0n/", and watch the sysadmin scratch their head to try to figure out where your files are.

Anyway: taking all this into account, the way to chroot properly was a three step process: chdir() to the new root directory, then chroot("."), then chdir(".") one more time.

The first chdir() sets "." to point to the new directory. (All your complicated path resolution gets done there, and done exactly once.) Then chroot() sets "/" to point to the new directory, copying the cached "." link that already got resolved in step 1. The third chdir() sets ".." to point to the same new directory, because when "." and "/" are the same then ".." gets pointed to the same thing. Each of those three syscalls sets one of the links for your process, and you have to do all three in order to get the result you expect.
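
In C, the dance is small enough to show; a minimal sketch (not the actual busybox code, and with error reporting reduced to passing back -1):

#include <unistd.h>

int enter_chroot(const char *path)
{
  if (chdir(path)) return -1;  /* step 1: point "." at the new root */
  if (chroot(".")) return -1;  /* step 2: point "/" at the same place */
  return chdir(".");           /* step 3: update ".." now that "." == "/" */
}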

The big long argument about "securely" using chroot was that the chroot system call _only_ sets the "/" link, it doesn't change "." or "..", and if you're not careful they'll point outside of the new root. That's why you need to sandwich it between two chdir() instances, the second one doing something subtle. (And possibly no longer required in 2.6.39.) But unless you do that dance, in older kernels at least, you can chdir("..") and reach a supposedly inaccessible directory. But it's _your_ mistake for thinking it's inaccessible. You can MAKE it inaccessible, but that's an extra step.
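
To see why the sandwich matters, here's roughly what the classic escape looked like on those older kernels (an illustrative sketch, assuming a root process inside a chroot() that was called without the chdir() steps):

#include <unistd.h>

/* "." was never moved inside the jail, so relative paths resolve from
   outside it, and repeated chdir("..") climbs to the real root: */
void escape(void)
{
  int i;

  for (i = 0; i < 64; i++) chdir("..");  /* 64 levels is plenty */
  chroot(".");                           /* point "/" back at the real root */
}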

I.E. the chroot syscall, _by_itself_, doesn't do everything you need to isolate processes within a new root filesystem. That doesn't mean chroot can't be used as part of this complete breakfast, it just means you have to understand what it actually DOES and do the rest yourself.

In theory you can also have open filehandles pointing outside the current directory. The way to fix that is to close them before you do the above three-step dance: just iterate through /proc/self/fd and close every filehandle above 2. (Having /proc properly mounted in the host context is your problem; it's trivial to notice when it isn't, and if root is conspiring against you to open a security hole you're already screwed. And of course knowing when you NEED to do this and when the caller of chroot actually _wants_ to pass in extra filehandles and has a legitimate reason to do so... again, not the syscall's problem, that's policy in userspace.)
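
A sketch of that loop (assuming /proc is mounted; the dirfd() check keeps us from closing the directory stream we're reading):

#include <stdlib.h>
#include <dirent.h>
#include <unistd.h>

/* Close every inherited filehandle above stdin/stdout/stderr. */
void close_extra_fds(void)
{
  DIR *dir = opendir("/proc/self/fd");
  struct dirent *d;

  if (!dir) return;
  while ((d = readdir(dir))) {
    int fd = atoi(d->d_name);   /* "." and ".." atoi to 0, which we skip */

    if (fd > 2 && fd != dirfd(dir)) close(fd);
  }
  closedir(dir);
}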

And of course if your new chroot lets you re-establish the mounts (mknod the /dev nodes, re-mount the network shares, etc) then losing access to the original mount instances doesn't mean much. Which is half the argument against chroot jails within which you run processes as root: the power to get into the situation is the same power needed to get out of it again. The chroot() syscall is _not_ a container, but it can be used to set _up_ a container... Except the LXC guys don't trust it.

By the way, the reason I had to learn all this is switch_root didn't do what I thought it did. "mount --move newdir /" made the mount on newdir _vanish_without_trace_. Going "cd /" gave me the old one. Why? Because the "/" symlink in my process still pointed to the old inode, it was like looking at "." after I'd overmounted the current directory: changes to the mount tree were orthogonal to which inodes my process's three location links referenced. What I needed to do, in order, was: chdir() into the new mount, then "mount --move . /", then chroot("."), then chdir(".") one more time.

That way "." pointed to the new directory, then do the "mount --move . /" to make /proc/mounts look right (which would render the mount point inaccessable except that "." points to it), then chroot(".") to point "/" to the same place as "." (I.E. updating "/" is orthogonal to the mount --move), then chdir(".") again to update the ".." link which will notice that "." and "/" are the same and thus ".." should equal "." because we've hit a boundary condition.

All this has nothing to do with pivot_root(). That syscall is an obsolete hack invented to clean up after initial ramdisks. (The things we used before initramfs was invented.) The ramdisks tied up large chunks of memory so it was nice to free them once we had our real root filesystem, but the problem was that lots of processes had their "/" symlinks (and maybe "." and "..") pointing to "/" which tied down the old initrd mount point, and as long as those references existed you couldn't unmount the old initrd because it was still in use. Some of those processes were kernel threads, which often daemonize(), which involves chdir("/") as part of their clean-up so they don't pin random sub-mounts like /home if that's where you ran the ifconfig that brought up a network interface and launched a kernel thread (yes, I hit that bug circa 2001 and reported it to Andrew Morton, who fixed it), meaning you couldn't even kill them from userspace as root because they're not normal processes. So the system call iterated through every process in the system, examined its three links, and moved any that pointed to the old filesystem to point to the new filesystem instead. (Yup, that meant this syscall did a chdir() and chroot() in other processes' contexts, behind their back, asynchronously. It is exactly that ugly.) It then did a mount --move of the old / into a subdir of the new one so you could umount it.

All this is TOTALLY UNNECESSARY in lxc. Looking at OTHER processes to move their links out of your directory is both expensive and silly, and fiddling with the mount tree is orthogonal to doing a secure chroot. But if you don't understand how it all works it's easy to just wave your hands and say "security" and pretend that your added complexity somehow improves matters. (Which is about the same logic by which they're using "linux capabilities". Secure is still analogous to watertight: as a great engineer once said, the more they overhaul the plumbing the easier it is to stop up the drain.)

But if chroot couldn't be used securely, then you could drill back up to initramfs after a switch_root. This dance is used elsewhere, and works elsewhere.

It's theoretically possible they have some other reason for avoiding chroot() and using a much more complicated interface with additional side effects, but if so they haven't said why...


May 17, 2011

Reading through the source of the LXC package, both to figure out how Linux containers are implemented under the covers in vanilla Linux, and in hopes of writing a busybox-ish version of the functionality without so much "big iron" thinking in the code.

The LXC package installs a bunch of commands (all with an "lxc-" prefix): attach, cgroup, checkconfig, checkpoint, console, create, destroy, execute, freeze, info, kill, ls, monitor, netstat, ps, restart, setcap, setuid, start, stop, unfreeze, unshare, version, and wait.

The shell scripts in this list are ps, netstat, ls, checkconfig, setcap, setuid, version, create, and destroy. Of those checkconfig, setcap, and setuid are install scripts that really shouldn't get deployed on the target system, and ps, netstat, and ls basically just run a command in the context of the container (although they do so from host context).

That leaves version, create, and destroy. The first is trivial, it just does echo "lxc version: $VERSION". The destroy script boils down to "rm -rf --preserve-root /path/to/$NAME" except it takes 79 lines of boilerplate shell script to get around to doing that. (See "big iron thinking", above. It's not quite as bad as FSF bloat, but it's up there.)

The create script is essentially a wrapper around the "template" scripts, I.E. "$TEMPLATE --path=/path/to/$NAME --name=$NAME". (Of course it takes 175 lines of shell script to get there.) To be fair, those 175 lines do two other things: copy a specified config file to the destination directory, and call the above rm -rf script if the template returns an exit code indicating failure.

A "template" script populates a chroot directory. There's a wrapper around debootstrap, a wrapper around busybox install, one for Pointy Hair Linux Red Hat Enterprise, etc. For some reason, they live under the "lib" directory (or in the case of the default install, "/usr/local/lib/lxc/templates"). Yes, they're still executable shell scripts, and they live in the lib directory.

Basically what a container _is_ to LXC is a directory somewhere (at a hardwired absolute path living under /var for some reason), containing a config file and a root filesystem subdirectory. (That's it.)

The remaining lxc commands (attach, unshare, stop, start, execute, monitor, wait, console, freeze, info, cgroup, unfreeze, checkpoint, restart, and kill) are implemented in C. There's also the lxc init task, which is run in a container and is responsible for initial setup before execing the real init.


May 15, 2011

I'd forgotten that in order to make dvd playback actually WORK you have to "sudo /usr/share/doc/libdvdread4/install-css.sh" after you install libdvdread4. (I.E. the library is useless, you have to run a shell script, as root, out of an obscure documentation directory. Wheee.)


May 10, 2011

I need to follow up on the device tree stuff. Possibly recording that link here will help me remember.


May 9, 2011

Git bisect is fiddly. It's really easy to flub a test (telling it "good" or "bad" wrongly) just due to the sheer monotony of compile/test cycles, since you're _going_ to get distracted by other things if the compile takes any amount of time.

I've taken to sanity checking each commit it winds up with. Checkout and test that commit to ensure that it _does_ show the problem (note: it may not be the _current_ commit, it may be one you tested earlier, so you'll have to do a fresh checkout and build to be sure), then either "git show BLAHBLAH > file.patch" the commit and apply it as a reverse patch, or "git checkout BLAHBLAH^1" to get the one before it, then test _that_ to make sure that it gives you the "before" scenario. If it doesn't confirm the transition (either both succeed or both fail), restart your bisect using the commit you found as your new good/bad end.


May 8, 2011

There's an old adage in computer science: writing code is easier than reading code. Not less valuable: just easier. If you're any good at programming, it takes less effort to write new code of your own than to fully understand other people's existing programs.

The amount of reinventing the wheel this has caused, the EPIC levels of "not invented here" syndrome, are just staggering. And ironically, this is something that can get WORSE as you become a better programmer.

When you write new code from scratch, you start with a mental model, and the code trails that mental model. First you work out what problem you're trying to solve, then you design a rough solution, then you code it up, then you test and fix it in a feedback loop (which goes on for a long time; I often refer to my code authoring process as "debugging an empty screen".) At every step, what's in your head is ahead of what's on the page.

Programmers usually refine their mental model as they go along, meaning the act of coding and the act of designing tend to get linked. The computer remembers details more accurately than we do, the compiler tells us when parts don't match up, and often our first attempt was just wrong and we figure out what we SHOULD have done after we've "written one to throw away". So we get used to starting with a basic idea and then fleshing it out as we code. We know our goal, what direction it's in, and _a_ path to get there, but the code on the screen helps the same way doing arithmetic on paper or writing down notes during a lecture helps: it came out of your head but having it there in front of you helps.

I.E. we get used to tinkering with code while thinking about design. Give it a few years and this can become a VERY STRONG habit, distracting us into modifying anything we look at, and we convince ourselves it means we're better than the guy who wrote this when in reality we're peeing on it so it smells like us.

When you're reading somebody else's code, everything goes the other way. You start with a confused and incomplete mental model and what you read adds unrelated bits onto it semi-randomly, often with nothing to attach them to yet. This is not comfortable. Programs of any reasonable size are not linear stories, they're choose your own adventure books that you have to read ALL of (or at least isolate a discrete chunk with X ways in and Y ways out), and figure out how it all fits together into a 3d structure later. You _assemble_ an understanding of other people's programs out of initially unconnected bits and pieces. (And if you're lucky, incomplete documentation that's only a few years stale.)

For small projects, this isn't a big deal. If the whole thing is 10,000 lines or less, you can just sit down and read it. You may need multiple windows open, and there may be a lot of flipping back and forth as you figure out what calls who or where that structure member got set, and personally I need a quick break every couple thousand lines because my BRAIN FILLS UP, but small programs are manageable.

Even with small programs, you'll wind up reading bits more than once, because it's possible to miss nuances, or "I didn't read that right". When you're the one who wrote it, it was in your head at one point, and at least you knew what your _intent_ was. Reading it doesn't provide the same guarantees: there WILL be bits you don't understand on the first pass, "what on EARTH are they doing here?"

Up to maybe 100,000 lines of code you can tackle it as a project. You may not be able to handle it in a single sitting, but you can still set aside a couple days and read everything (or at least the interesting bits). I tend to print this sort of project (tinycc, chunks of busybox) out onto dead trees and stick it in a "honkin' big" 3-ring binder, and carry it around with me like a 20th-century luddite. It reduces eyestrain, gives me something to physically bookmark, somewhere to scribble notes... but the main advantage of printing stuff out is that I CANNOT EDIT IT.

Programmers who are used to WRITING code tend to refactor as they go along. When we're writing, we're designing... which is output, not input. Remember the annoying kids in class who shouted out their guess about the next thing the teacher was going to say? (If you don't remember, you were probably that kid. I know I was.) Yeah, this is something you're supposed to GROW OUT OF. Go ahead and fix typos in the comments but THAT'S IT, don't try to fix what's there until you know what's there.

Then there are the really big projects, with millions of lines of code, so much you physically CAN'T read the whole thing. It would take years just to work through a snapshot in time, and in many cases new commits come in faster than you can comprehend existing code.

The reason I've been thinking about this topic recently is because I've been getting into serious linux kernel development, and it's HARD. Not because I don't understand approximately what it's trying to do, nor because the people who wrote it are smarter than me (a few individuals are, but I'd put myself in at least the top 20% even of this crowd). The problem is all the page faults.

When I see kernel code do an fgets() or a __getname(), I don't know what that _means_. So I have to go off on a research tangent: maybe google it, maybe grep for other users, maybe track down the implementation and work it out. For big stuff like "what is a workqueue" there are books and such (I still haven't finished Robert Love's third edition kernel book, which is slow going for its own reasons), but when the code goes "(long long)NFS_FILEID(req->wb_context->path.dentry->d_inode)", if what it's doing isn't already obvious from context in an "I don't need to touch this" sort of way, figuring out what that one line means involves following three indirections to figure out what the various fields are (what they contain, where they're modified and why), plus looking up a macro _and_ figuring out why the result needs to be typecast (dunno, just grepped for something with a lot of right arrows in it). And the "dig deep for data" ones are pretty easy, the "bounce off a function call we looked up out of table" ones mean that to follow the flow control you have to find where some structure was initialized and registered (generally during a module init function). People bitch about gotos all the time for making code hard to read, but function pointer callbacks are just as good at hiding where the code goes next. (They just have a better excuse.)

Stopping to read some OTHER bit of the code usually leads to more tangents, even when it's just a simple function call. And once you're about five levels deep it's easy to lose track of why you wanted to know any of this in the first place. (You can also wind up doing the human equivalent of swap thrashing, which is often a cue to take a break.)

Long-time developers on these projects don't have to stop when they see __getname() because they _know_ not only what it does but why it does it. They can read other kernel code quickly because they have buckets of domain-specific context: where I currently take 30 such "page faults" traversing a file, they'll take maybe 3. Or none. Eventually if I keep at it (and keep up with it), I'll have enough cached in my head to get work done (within specific subsystems) without the human equivalent of swap thrashing, but I have to work my way to that point. Which is slow going.

For new developers on a big project, even experienced ones, getting up to speed on the domain-specific context can take a while. For example, at Parallels they assigned me to containerize NFS. This involved learning about containers (which was loads of fun: openvz, lxc, namespaces, cgroups, I loved that part), and coming up to speed on kernel development (which was exhausting but is a cost of doing business), _and_ coming up to speed on NFS (its own horrible can of worms eclipsing the previous two COMBINED, and you can't imagine the smell), plus the various non-programming parts of a job working for people on the other side of the planet who've never had a US engineer telecommute for them before (which turned out to be the insurmountable part, but oh well).

Some advice for other people getting into kernel development: don't start with NFS. Actually, don't ever mess with NFS if you can avoid it, you really don't want to get any of it on you. NFS is the kind of thing nobody ever works on for fun, and my immediate reaction to its crawling horror was either "what are the _alternatives_" or "how do I kill this with fire?" depending on how polite you want to be, which led me to 9p. And now that I don't have to worry about what happens to NFS anymore, I'm poking at the 9p filesystem for fun.

Partly I want to help 9p succeed because I think it has great potential to be a cool and useful technology. Partly it's because I've already started so might as well follow through: I've set up my test environments (yay diod and virtfs), and read through include/net/9p and most of net/9p, and next up is fs/9p.

But a big part is it's a nice reasonably self-contained little cul-de-sac where I can work through the rest of my kernel development swap thrashing. I understand what v9fs itself is trying to accomplish, it's a little corner where I have traction and can work through the secondary issues like __getname() in peace. Yes, it's got buckets of its own local knowledge too, such as _dotl meaning functions that only apply to the 9p2000.L spec (obvious in retrospect but it took me a while to figure out), but that's self-contained and in the 10k-100k "manageable without special clothing or rituals" territory. I've beachcombed the edge of this ocean for years, I'd like to at least learn to dog paddle.

So: to get back to the "reading code is harder than writing code" thesis. Only AFTER I finish reading the 9p code do I get to redo the option parsing stuff, and make the tcp and socket transports properly layer on top of fd, and configure out when TCP and unix domain sockets are configured out, and add a NET_9P_LEGACY thing to remove the non-dotl stuff like the giant error string->errno conversion table, and everything else that's occurred to me (and got written down in my notes) while studying the existing code.

Just because it's harder doesn't make it any LESS VALUABLE. Quite the opposite: veterans like me have to go back and FORCE ourselves to read code, sitting on our hands to avoid modifying it as we go. (Or refactoring into a temp file and then THROWING IT OUT afterwards; been there, done that.)

When you start out as a newbie programmer, reading other people's code is how you learn all sorts of tricks, it's fun and rewarding because the code you're reading is better than the code you're writing. But after a while, the average quality of the code you read goes down relative to the stuff you're writing. You also spend your time focusing on the broken bits of existing code: usually the stuff that works gets left alone. You refactor code because it needs to be refactored. You upgrade or replace code that needs to be upgraded or replaced. You selectively read BAD CODE, and it stops being fun.

And over the years, you learn to design on the fly, code as fast as you can type, until you can bang out new code in seconds. When reading code, you see an endless series of things that need to be fixed, which is a distraction... and gets you out of the habit of reading large swaths of code at once even when you can.

And we _do_ read code. We read our _own_ code, during the endless debugging sessions where we compare our code with our mental model to see where they don't match up. It trains us that having a complete mental model is normal, and that differences between the code and that model are either flaws in the code or flaws in the mental model. Your own code is comfortable, it smells like you. Other people's code is strange and foreign and doesn't look right, and this half-formed mental model you get while reading it is crap so obviously the person who wrote it couldn't design their way out of a paper bag, the mental models you get writing your own code are so much clearer and more detailed...

This is a trap. Indulge in this way of thinking long enough and you'll go stale. For senior developers, the ability to read other people's code is MORE VALUABLE than the ability to write it, because you practice writing code all the time, we _assume_ you can do that. If your code reading skill gets rusty (which happens), FIX IT. When you get out of practice, go practice.

Open source tries to emphasize this, but it's hard. We say "code review patches", which implies reading and coming up to speed with the context those patches apply to. Yes it's slow: make the time. And no your review won't be perfect: you need to open yourself to rejection and ridicule. You're not an expert in other people's code the way you are in your own areas. You will make mistakes when you try something new. That's fine, part of the peer review process is beating a consensus out of the peer review. You're still learning: that's the point. If you ever STOP learning you're no longer any use.

Why do I tend to write documentation? It's my "sit on my hands" alternative to refactoring code as I go along. I take notes, in a separate window, and then I turn those notes into english. I have decades of practice producing OUTPUT as I refine my mental models (it's like moving your lips when you read), so I produce english output instead of code output. And usually the first pass is wrong, but I post it somewhere and people tell me I'm full of it and I fix it. The point is, when I don't know something, and I have to read code to learn it, I write up documentation as part of my learning process.

One corollary of Moore's Law is that 50% of what you know is obsolete every 18 months. Luckily in the Unix world it's mostly the SAME 50% cycling out over and over, and stuff like Posix sticks around for a while. But being afraid to admit ignorance, being afraid to make mistakes, are poisonous in this field.

(The point at which I knew the Parallels job wouldn't last was when my "supervisor" gave me my performance review criteria. They wanted me to look like an expert on the lists, which meant my ratio of "correct answers given" (me answering questions other people had asked) to "questions asked" (from me, to other people) was supposed to be 10 to 1. I had to give 10 answers for every question I was allowed to ask. It... doesn't work that way. I've been programming for almost 30 years, and I am _still_learning_.)


May 7, 2011

In debian, installing git installs perl, libx11, openssh, rsync, and gnu privacy guard. Bravo, distribution package dependency maintainers. Bravo. *golf clap*

I'm updating my lxc howto, because the debian guys shipped a release ("squeeze") which means the testing branch ("sid") has gone all unstable again. I was using testing because the previous debian stable didn't have lxc in its repo, but the current one does, so I'm redoing the instructions for the stable release.

Part of the reason for this is I need to write up my OLS paper, and part of it is so I can test containerizing p9's tcp transport. (I think I've done it, it compiled, now to test it.)


May 6, 2011

Feeling vaguely uninspired. There's a dozen things I could do, but I don't really feel like doing any of them. Probably some variant of a cold.

Ah. Not having eaten anything but a salad until 5pm might have something to do with it.


May 4, 2011

Twitter's "classic" web UI is once again broken, this time because http://twitter.com/statuses/replies.rss is "401 unauthorized".

(I'd twitter that, but they broke their web UI, and my phone's onscreen keyboard is really tedious to use.)

So: twitter clients for linux. "aptitude search twitter" brings up exactly one: "gtwitter", which google says is crotchety and buggy, and an attempt to install wants 27 packages (half of which are mono, ximian's clone of the Microsoft .net language). Uh-huh. Not my first choice.

This page lists 5, and is fairly recent (the first few hits were from 2008), and pino looks good. (I'm currently 4 days behind on twitter in my phone's scrollback buffer, so the apps _without_ a big scrollback buffer aren't interesting to me.) Pino uses a mercurial repository (yay!) but wants a dozen prerequisite packages installed (boo!) to build it from source, more than half of which I don't have. Installing the binary for ubuntu involves plugging _two_ new repositories into dpkg (then remembering to run aptitude update because for some reason adding a repo doesn't pull any headers from it), but that seems like the easy way. And it wants to install one package from each new repo, but ok...

It's thinking... Well, it managed to pull my user icon. Still thinking. (I admit that t-mobile's internet service has degraded to about 2k/second when it's working, which is intermittent. I've been in to poke them multiple times -- replaced the sim card and everything -- but I need to find one other than the Triangle location because they say only getting "edge" is normal there).

It's been thinking a while. Hit refresh: 401 unauthorized. Um, re-enter the password in the account? Still 401 unauthorized. Google, google... Maybe the switch to oAuth has something to do with it? But that was a year ago. And the date on /usr/bin/pino is... a year ago. Great. That's their most recent stable binary release. Bye-bye pino.

Right, I guess I stick with reading twitter on my phone for the moment, and only posting very intermittently.


May 3, 2011

Tweaking my miniconfig.sh script. I really need to do a C version of the thing, but that's a much bigger todo item and this is a brief diversion.

Miniconfig lists just the symbols you have to explicitly enable to get the desired configuration, by starting from "allnoconfig" and switching on symbols in menuconfig or similar, letting the dependency checker resolve whatever else it needs. So it doesn't list symbols that are forced on by something else, it's just the checklist of unique symbols that this config requires.

A big advantage of this approach is you can read the list and see what you're actually using, without dozens of pages of irrelevant symbols. My baseconfig for Aboriginal Linux is 42 lines. It's also somewhat resistant to version skew from upgrades, although there's a flaw in that (dependencies only resolve in one direction, so when they add new "guard" symbols on menus and such, it can ignore the attempt to switch on symbols in that menu if you haven't switched on the menu first. Part of redoing the C config infrastructure would involve fixing that flaw.)

The big disadvantage of this approach is that the process of _creating_ a miniconfig is really slow. I have a shell script that creates them experimentally, by iterating through and removing each line one at a time, and either saving it or keeping it depending on whether or not the resulting .config changes. This means it's running make allnoconfig thousands of times, which ain't cheap.

The easy way to optimize the script is to take advantage of the assumption that these files are sparse, so most of the lines aren't necessary. Even in a pretty tightly packed case like the current ubuntu kernel, you start with around 5500 lines of .config file and wind up with about 2600 lines of miniconfig, so even a distro kernel switches half the options off. Most actual configs are much smaller than that. Defconfig's around 250 lines.

I used to start by hitting the .config with a sed script at the beginning to remove all the blank lines and comment lines, but unfortunately THAT SOMETIMES CHANGES THE RESULTS. (Those "comment" lines aren't, they force something off that might otherwise default to on. Yes, even with "allnoconfig". This is a bit funky and probably deserves further investigation, but the todo list runneth over...)

The new heuristic is to add a "stride". If you've succeeded at removing a single unneeded line twice in a row, next time try removing 2 at a time. If that works, remove 3, then 4, then 5... until you get a failure, at which point you go back to a blank stride (which is treated like a 1, and then next time it's set to 1 and increments from there, which gives you the "two successes before stride kicks in" behavior). This allows you to skip large blank chunks in fewer tests.
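
In pseudo-C the stride logic looks something like this sketch, where try_removing() and delete_lines() are hypothetical stand-ins for "rerun make allnoconfig without these lines and compare the .config" and the actual file editing:

/* Return 1 if the .config is unchanged without lines [start, start+count). */
int try_removing(int start, int count);
void delete_lines(int start, int count);

void shrink_miniconfig(int nlines)
{
  int i = 0, stride = 0;

  while (i < nlines) {
    int n = stride ? stride : 1;   /* blank stride acts like 1 */

    if (n > nlines - i) n = nlines - i;
    if (try_removing(i, n)) {
      delete_lines(i, n);          /* none of those lines were needed */
      nlines -= n;
      stride++;                    /* two single successes, then 2, 3, 4... */
    } else if (n == 1) {
      i++;                         /* this line is needed, keep it */
      stride = 0;
    } else stride = 0;             /* downshift, retest this spot a line at a time */
  }
}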

The cost is needing to retest when you downshift. Testing with STRIDE>1 doesn't actually _prove_ anything about the resulting file, you may need all of those lines, or just one, or any combination thereof. So you've almost wasted a test. (Except not quite: there's one case where the failed test told you something, and that's if STRIDE=2 and the following test removes a line. Then you know that the second line was the one that was needed.)

The pathological case is "YNNYNNYNNY", just enough whitespace to upshift and then an immediate failure forcing a downshift and retest. With stride support this does 4 tests for every 3 lines, theoretically slowing it down by 33% in the worst case scenario. But that's a fairly specific pattern, and most files will actually be sped up, and the remaining ones mostly not slowed down (which is why stride waits for two removals before kicking in)...

That's probably about as much work as is worth doing before just chucking the script and writing something in C or Python.


April 30, 2011

I'm trying _not_ to rant. The Sam Vimes "wanting to arrest the gods for doing it wrong" thing is only useful to a point, and I think I've gone past that. But at the same time, I don't understand how to make this work:

Mark explained to me that the gmail "inbox" isn't a folder, it's a tag. Gmail has three actual folders email lives in: "All Mail", "Spam", and "Trash". The Inbox is messages in All Mail tagged as non-duplicate. So the vanishing email problem I've had (where if I'm cc'd on messages I only get one copy, so either I don't see it in my inbox or the threading is broken in my mailing list folders) is an artifact of me using gmail wrong.

So I tried to adjust thunderbird to run my filters against "All Mail" instead. (This means I'll have to flush my existing folders and let it refill them, but I'm ok with that.)

Here's the problem: thunderbird won't automatically run filters against any folder except "inbox". The "filters for" pulldown has account granularity, and it hardwires in which folder the account delivers mail to. (Drilling down through the layers of shell script used to launch this thing, /usr/lib/thunderbird-3.1.8/thunderbird-bin has four variants of the string "inbox" in its binary.) In the case of gmail, the folder they seem to have hardcoded is not the one I want.

There's probably a simple way to fix this, but I can't find it. (Apparently there used to be a way, but it was removed.)

I'm pretty sure rebuilding thunderbird from source is not the polite answer. Nor is installing fetchmail and trying to work around thunderbird's imap behavior.


April 29, 2011

Reading through the 9p filesystem code. On the 9p development list I proposed redoing the mount parsing so "1.2.3.4:/path/to/thing" and "1,2:/path" could be parsed as IPv4 transport and filehandle pair respectively. And then names it didn't recognize could get checked against available virtio channels and so on.

But when I got around to implementing that, it turned out the various transports were split into separate modules, meaning I can't write one function that identifies them and calls the transport, because the transport might not be loaded. Instead I need to iterate through the list of registered transports and call a probe function that sniffs the block string and parses it into "server ID" and "path to mount".
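
The shape of what I'm proposing is something like this (a userspace sketch with made-up names; the real version would hang a probe function off the kernel's registered transport list):

/* Hypothetical: each transport sniffs the mount device string and, if it
   recognizes it, splits it into server identifier and path to mount. */
struct transport {
  const char *name;
  int (*probe)(const char *devname, char **server, char **path);
};

struct transport *find_transport(struct transport *list, int count,
                                 const char *devname, char **server, char **path)
{
  int i;

  for (i = 0; i < count; i++)
    if (list[i].probe && !list[i].probe(devname, server, path))
      return list + i;  /* this transport claimed the string */
  return 0;             /* nobody recognized it */
}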

I also want to save this information persistently, at least in the ipv4 case, because if the connection with the server is lost I want to redial like samba does. (IP masquerading timeout, server reboot... It happens.)

This led me to reading through other parts of the code which are a bit tangled. The linuxdoc comments label both init_p9 and exit_p9 as "v9fs_init": is that a mistake or does it have a meaning I'm unaware of? (Looked at other filesystems and they don't bother labeling their init/exit functions at all.) I am a bit out of my depth, but what else is new? It's still a cakewalk compared to NFS.

The structure storing the per-transport options is "struct p9_fd_opts", which is a _little_ weird because it holds the read/write file descriptor pair _and_ the TCP port number. No ipv6 support I can see, I'll have to add that. (I don't personally care, but back in the 90's the whole of Korea was sharing a single class B address range, so they switched over way back. The US and Europe use IPv4, Asia uses IPv6.)
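
For reference, the struct is roughly this (paraphrased from net/9p/trans_fd.c from memory, so double-check it against your tree):

struct p9_fd_opts {
  int rfd;   /* file descriptor to read from */
  int wfd;   /* file descriptor to write to */
  u16 port;  /* TCP port, only meaningful for the tcp transport */
};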

Eventually I went "where does this start" and went to re-read the Linux Weekly News "libfs" article, which talks about get_sb() which was removed recently. So is there a version of this in Documentation that might be up to date? If so, I can't find it.

Reading through the kernel Documentation directory, I remember I wrote some docs on page tables and translation lookaside buffers a couple years ago, I should do something for the kernel. There's a thing on TLB apis but it doesn't really explain _why_. And the filesystems directory has docs for filesystem drivers and docs for filesystem apis, those should be split into two directories...

Busy day and I haven't actually done anything yet. :)


April 28, 2011

Rediscovering my 8 gazillion setup hacks, such as "shutupvim.so", which reliably stops vim's random fsync pauses with an LD_PRELOAD library that stubs out sync(), fsync(), and fdatasync(). (I'm aware there's a "correct" way to do this, but I could never get it to work reliably.)
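
The library itself is basically trivial, something like this (build with "gcc -shared -fPIC shutupvim.c -o shutupvim.so", then run "LD_PRELOAD=/path/to/shutupvim.so vim file"):

#include <unistd.h>

/* Stub out the sync family so the preloaded process can't stall on them. */
void sync(void) { }
int fsync(int fd) { return 0; }
int fdatasync(int fd) { return 0; }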

(Bricked my laptop for a bit trying to upgrade the kernel. Apparently grub2 doesn't fall back to displaying the menu when the default kernel file isn't found. Good to know. Also good to know where I left the xubuntu install disk I burned. The fact that xubuntu install disks are livecds is really, really cool.)

Ah, and the USB cable I have with me is the broken one. I should have thrown that out. (It's really not my day for accomplishing stuff on the laptop, is it?)

Interesting, software suspend-to-disk no longer seems to automatically detect the suspend partition, I guess I need the initrd in order to do the resume. (The kernel used to do it by itself, possibly there's a config option.)

The ubuntu tool to generate their initrd is "update-initramfs -v -c -k 2.6.38.4", and then tweak grub.cfg to have an initrd line for the new entry... (Yes, I'm aware there's a thing to generate grub.cfg but I'd rather just edit the file directly: it's 153 lines long and the stanza I added (which does all the work) is 7 lines. The problem that _cannot_ be solved by adding another layer of indirection is "too many layers of indirection".)


April 27, 2011

Equipment mailed, paperwork faxed, and new laptop mostly installed. (My install checklist has bit-rotted a bit: it's missing several packages (like git) that I now install as a matter of course, it's for kubuntu instead of xubuntu, and it's 2 years old.)

I should write up a new install checklist, but my old Dell (bought in 2007) is a bit cramped now and I expect to get a new one before too much longer, so I've got another reinstall sooner or later. I can redo my notes then.

In the meantime, I have a giant todo list (item 1: collate todo lists). Oh, and I should start looking for a new job. (I had two more random emails from recruiters in my inbox when I got my email set back up, but both of them wanted me to move to California.) But first, I need to do some FUN work on Linux to recalibrate and get the taste of NFS out of my mouth. (And I'd like to see my work used, I.E. committed upstream into vanilla packages.)


April 26, 2011

And that job is over, which comes as no surprise. It was a learning experience.

On the bright side, I got to learn a lot about containers, LXC and OpenVZ, the 9p filesystem (and diod and virtfs), and so on. I got the basis for an OLS paper out of it (still a lot of research to do there, though), and I still might add container support to busybox (or write a simple standalone "container" tool that works a lot more like chroot than lxc does). And I met Kir, the OpenVZ maintainer, who's a great guy I'd love to bump into at future Linux conferences. (I learned tons from Kir, and he's fun to hang around with.)

I also learned that telecommuting for a company with all its other engineers on another continent doesn't combine well with being "supervised" by an engineer (not Kir) whose idea of communication is one email a month. I installed skype to make communication easier, and adjusted my schedule to get up at 5am so I'd overlap with their timezone, but even though it showed my "supervisor" logging in and out, he never replied to my chat attempts.

You'd think he would at least review the patches I cc'd him on. Technically he reviewed two of them: In January he said "The netns_eq helper will handle that.", and in February he replied to another patch with "Thank you! I see you're doing well :)". Those were the entire non-quoted portions of all the patch review I got from him in the entire time I worked there.

At the end of March they flew me to Moscow again, and I stood outside my "supervisor's" cubicle; he still didn't want to talk to me. (Kir did, but his daughter was in the hospital during my second visit and he wasn't in his cube much. I still got to talk to him three times during the week.)

I suppose the first red flag was my supervisor setting a performance review goal for me of a 90% signal to noise ratio in my public posts, I.E. don't ask stupid questions on the list. How do you WORK without asking stupid questions on the list? (I never did answer that one, I just lived with a gag order. It's been lifted now. Feels good. I can stop worrying about "looking like an expert" and focus on GETTING THE JOB DONE.)

Now I have to send this laptop back, dig out and reinstall my old one, and copy files over from the server. May be offline for a bit while that happens. (Then I get to job hunt. And finish writing my OLS paper. And finally make progress on containerizing v9fs. And write documentation for virtio. And all the other stuff I didn't have time to do while staring into NFS.)


April 21, 2011

I hate the git UI. I really do. It's insane. For example, if you do a "git mv" to move files, the change still has to be committed. But if you do a "git diff" it won't show anything as having changed. So there's the current revision you have checked out of the tree, and the current state of the tree, and if you ask what's different between them it goes "nothing" even though you moved stuff around.

Now I have to dig around through the man pages to find yet another place where the developers' unquestioned assumptions mean this doesn't work normally: there's some magic command you'll never guess, and you just have to know it.

(P.S. It's "git diff --cached -M", no idea why. But when kernel developers have to explain to each other how to use the thing and then that's considered worthy of news coverage, something is wrong.)


April 19, 2011

Got my first performance review at work, from a guy who hadn't previously had any contact with me in a month and a half. (When I went to Moscow for a week, he never wanted to talk to me once. The guy I normally _do_ talk to isn't my supervisor and his daughter was in the hospital so he wasn't in his cube most of the week, and was frantically catching up with missed work when he was).

The guy nominally supervising me is convinced I suck because I've gotten so little traction with NFS. He wrote dismissively of the Livejournal posts I've written (which is the first I've heard that he's actually reading them, since he never commented on any). It seems kind of pointless to write more work related Livejournal posts, so I guess I'll just dump that into another textfile for my own reference, and talk about the interesting subset of the work here.

Containers are definitely interesting. Whatever else happens, I'm glad for the introduction to them and the opportunity to work to make them better. My OLS paper this year is on them (wrote the abstract when I got email about the deadline, still need to write the paper, but I have 2 reviewers lined up already). August 22-26 Canonical is organizing an LXC "development sprint" I intend to participate in. (Ok, it's here in my hometown of Austin, they're flying a bunch of guys in but I'm already here. Odd experience having Canonical offer to fly me to Austin. :)

And I'm reading through lxc and thinking of writing a busybox-ish variant of that technology, because it's typical IBM overengineered mainframe stuff where you could do a simple version in 1/10th the code. (Not an exaggeration. For example, "lxc-create" is a 175 line shell script wrapper that essentially boils down to a single line call to another script, and "lxc-destroy" is a 79 line wrapper around "rm -rf --preserve-root /some/path", where that path has two components: one hardwired into the script at install time and the other supplied on the command line.) Really, I see this as "chroot on steroids" and they see it as a port of the S390 VM management layer. To do a chroot you need a new root filesystem to run, but that's NOT A BIG DEAL.
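
To show what I mean by "chroot on steroids", here's a minimal sketch (my guess at the shape of such a tool, not lxc's actual code; config plumbing and most error handling omitted, and /sbin/init is just an example path). One clone() call with the namespace flags, then a plain old chroot:

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Child runs in fresh mount/UTS/PID/network namespaces.
static int contain(void *rootdir)
{
  chroot(rootdir);  // the "chroot" part
  chdir("/");
  mount("proc", "/proc", "proc", 0, 0);  // new PID namespace wants its own /proc
  execl("/sbin/init", "init", (char *)0);
  perror("exec");
  return 127;
}

int main(int argc, char *argv[])
{
  static char stack[16384];
  pid_t pid;

  if (argc != 2) {
    fprintf(stderr, "usage: contain rootdir\n");
    return 1;
  }

  // The "steroids": one syscall, four new namespaces.
  pid = clone(contain, stack + sizeof(stack),
              CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWPID|CLONE_NEWNET|SIGCHLD,
              argv[1]);
  if (pid < 0) {
    perror("clone");
    return 1;
  }
  waitpid(pid, 0, 0);
  return 0;
}

Needs root, and the root filesystem has to already exist, but that second part is just as true of lxc.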

Unfortunately, what I do _not_ have any enthusiasm left for is NFS. At the moment I'm fiddling with the diod 9p server, making sure there's a good way to use P9 in containers, in hopes of helping replace NFS. (I also made CIFS work in containers in a weekend, back in January.) I really really really really hate NFS, which I call "The Cobol of Filesystems" because it is. I studied this sucker for months in hope it would start to make sense, but it only got WORSE the more I learned about it.

I could write the definitive takedown of NFS (and maybe I will someday), but for now I'll summarize it as a bad design from the 1980's, victim of extensive premature optimization (the root of all evil) that gave it nested layers of conflicting caching and turned it into a giant layering violation that reimplements half the VFS. It has servers in kernel space (imagine if back in the 90's apache and zeus had been abandoned in favor of khttpd and Tux, and these days php was a kernel module). The code is scattered far and wide in the kernel source tree (off the top of my head, just the _client_ uses net/sunrpc, fs/nfs, fs/nfs_common, fs/lockd, and a bunch of include/linux files. That's not counting the basic infrastructure like the VFS layer and such. Not even the questionable bits like security/keys that probably don't belong in the kernel, but CIFS also uses it so eh.)

NFS has been through four major revisions, none of which addressed what was actually wrong with it, but instead made it bigger and more complicated. The current one ("pnfs", which would be NFSv5 if they weren't embarrassed to call it that) is once again entirely a performance hack. No attempt to make "(rm dir/file; rm -rf dir) > dir/file" work on NFS. No attempt to address the fact that cache coherency is theoretically impossible to get right if you maintain local caches without talking to the server and have multiple clients updating files. No acknowledgement that the idea of a "stateless filesystem server" (NFS design idea #1) is a CONTRADICTION IN TERMS, because what a filesystem server does is MAINTAIN STATE. No acknowledgement of the fact that 30 years of Moore's Law rendered all those horrible performance hacks into an IMPEDIMENT to achieving decent performance; these days having thousands of web clients or MMORPG clients talking to a single server is not that unusual, you just have to do it right, which NFS fundamentally does not...

It is long past the point where the only real way to address it is to throw it out and start over. And I hope that 9p can become a new standard network filesystem protocol. Currently it's simple and clean, has no caching (it instead relies on gigabit ethernet being fast), and avoids the horrible design problems of NFS. It also needs to be adapted to Linux semantics, and needs up-to-date servers that actually export the updated protocol.

QEMU/KVM has a built-in 9p server called virtfs but that uses virtio instead of TCP as its transport layer, and isn't really designed to be broken out into a standalone server. On the bright side, upcoming qemu versions should be able to export directories off the host system which the client can "mount -t 9p", so no more need to launch samba.
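
For reference, the guest side of that boils down to a single mount() call (sketch; "hostshare" is a made-up mount tag, it has to match whatever name the qemu command line gave the export):

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
  // Equivalent to: mount -t 9p -o trans=virtio hostshare /mnt/host
  // trans=virtio selects the virtio transport instead of TCP.
  if (mount("hostshare", "/mnt/host", "9p", 0, "trans=virtio")) {
    perror("mount");
    return 1;
  }
  return 0;
}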

(Note: p9 is the protocol, 9p is the filesystem, and the C file that implements the filesystem in the kernel is v9fs.c, so you see all three used almost interchangeably on the 9p mailing lists. P9 stands for "Plan 9", from outer space, the operating system Ken Thompson went on to design after he stopped working on Unix because the PDP-11 was no longer the state of the art and he wanted to start over designing something for modern hardware. Yes, this is a filesystem designed by the guy who invented Unix. This is the same OS that gave us UTF8 encoding. They tend to be good at designing things right.)


April 18, 2011

I've been sending my aboriginal linux ruminations into the mailing list, which I probably should have done years ago but there you have it.

FYI.


April 16, 2011

Met a homeless woman named "Heather" today. (I seem to meet a lot of women named Heather.) She's only been on the street about 4 months, she says, and already has sunburn scars (in April, they look like near third degree burns in places) and a face that looks ten years older than the rest of her.

Bought her lunch and talked. It's impressive how deeply screwed somebody can get really quickly when there's no living family to crash with. She moved to Austin for a job 7 months ago, but the job fell through and she couldn't find another due to the recession. She had 3 months of savings, and when that ran out her car was repossessed and she was evicted from her apartment. Her wallet was stolen a week later, and here's where it gets ugly.

The homeless shelters in Austin are overwhelmed, Fade phoned around and found they have a waiting period of 3-4 months. (That's assuming you can consistently show up to navigate the paperwork without transportation or a place to stay.)

When I met her, she was holding up a sign at a street corner. This is apparently called "flying", and is not very lucrative. On her best days she might occasionally make a little under half minimum wage, on the day I met her she'd made $4 total. But if you do this, you get tickets from the APD. Heather's gotten one a month.

Here's where the "screwed" comes in: just as with speeding tickets, in order to get a new driver's license you need to pay your tickets first. And you can't get a job without the two forms of identification (generally social security card and driver's license) needed to fill out the I-9 employment eligibility verification form.

Ah, but aren't there other forms of ID than a driver's license? Well, yes and no. To get the non-driver "government ID" it's from the same office and you still have to pay off your tickets first. To get a student ID you must submit photo ID, such as a driver's license. To get a passport you have to present photo ID (previous passport, immigration documents, driver's license). A passport lets you have secondary forms of ID, but unless somebody with photo ID has known you for 2 years or you have a mailing address, you're kind of screwed.

So in order to get photo ID, you must have photo ID. If it's stolen while you're in a strange city, the system seems actively designed to screw you over unless you have a certain amount of money to throw at the problem.

It's not easy to pay off the tickets even _with_ a job, and you can't get a job without paying off the tickets. In Heather's case a couple thousand dollars in fines (more than my ability to just reach in and fix it with random altruism), and though she offered to sit in jail Austin's jails are full, so they won't let you work off a class C misdemeanor that way. That leaves community service: 240 hours thereof in this case. That's 6 weeks of full-time work, which would be hard for ME to do, let alone someone with no way to reliably feed themselves, no transportation, no private place to sleep or bathe, nowhere to store anything you can't carry with you...

(Speaking of bathing, I asked how she managed to avoid smelling like some homeless people do. It is possible to give oneself a sponge bath with wet wipes, apparently.)

My city should not be doing this to people. Something is wrong.

I remember how my mother (a psychotherapist for emotionally disturbed adolescents) told me there used to be in-patient treatment programs for crazy people, but Reagan ended funding for them and turned all the crazies out on the street about the same time he fired the air traffic controllers back in the 80's. I also remember how I never saw anybody with a sign on a street corner under Clinton, and then in 2001 duh-byuh took over and suddenly they were everywhere...


April 14, 2011

Today has rocked. Got out early and biked around town, exercise I desperately needed and has helped clear my head a lot. (I have not been getting enough exercise, and it's affecting my concentration.)

So is not having noise cancelling headphones. I can't work at home (the cats pester me for attention every time I work up any sort of concentration; their creativity and willingness to inflict pain escalate the longer I ignore them), so I go out to coffee shops and fast food places a lot with my laptop. Which is noisy, and often has Zombie Sinatra or similar playing. My old headphones died just before my Moscow trip, and my backup pair just didn't cut it. (This morning, even playing Nirgaga at full volume didn't drown out Einstein's "Music to set your teeth on edge, a medley of the worst the 1970's had to offer", or the chatter at the neighboring table where somebody had found either Jesus, Waldo, or Carmen Sandiego. I was trying hard not to listen but they were insistent.)

Today I biked to Guitar Center and got my headphones replaced with a _better_ brand of headphones (even more passive noise cancelling, more durable earphone pads, and the cord is a separately replaceable thing that plugs in at both ends now). And the Guitar Center people printed out a receipt for the old ones so I could get a refund check for the headphones I beat to death in 9 months. (Bought the 2 year extended warranty. Just about pays for the new ones.)

Stopped by T-mobile (not yet a part of AT&T) and got a new USB cable (and that _was_ what's broken, yay having a phone again), and got my sim card upgraded, which essentially just means that "edge" is faster and less intermittent, but I'll take it. (I can go back to see if 3G can get reinstantiated, but not today.)

Yesterday I got my taxes done and it turns out I don't owe them money. (I did over $10k of untaxed consulting in the first half of the year, but it turns out specifying "0 deductions" for the rest of the year cancelled that out, what with Fade and me filing jointly, having a mortgage, and all.)

I am utterly, utterly sick of NFS. I've been grinding away at it more and more slowly ever since I got back from Russia, weekends included (because I feel like I haven't accomplished enough during the week). A fun quote from Al Viro sums up why: 'Know when to say "It can be mended, I shalt do that" and when to say "It is rotten beyond repair".' I consider NFS to be rotten beyond repair, and the thing to do is replace it completely with something like 9p. It really is the Cobol of filesystems, and that'll erode your motivation after a while.

Well guess what? Diod released a 9p userspace server that actually works and is up to date with the 9p2000.L extensions (which have apparently stabilized as of 2.6.38: they claim it's feature complete unless something new comes up). I got it to build, and am testing it. Next up, I need to make that (and FUSE) work in containers.

I've also got a couple other non-NFS plates spinning to divert me and let me recover my love of programming for a bit before going back and facing the horror that is NFS again. Mostly documentation. My OLS paper "Why Containers Are Awesome" got accepted, so I've been researching and writing that. (Trying to determine the scope of the thing, there's so MUCH to say on the topic, it's a question of picking the best subset. Alas, I have to figure out what material to cover in order to write the abstract. Working on it.)

I'm also writing up some Documentation/namespaces/net_ns.txt because there isn't one and there needs to be. Unfortunately, since there isn't I need to read a lot of code and then write up my best guess and wait to be corrected. (People often mistake this technique for expertise. Really I'm writing the documentation I need to read, and by definition am not qualified to do so, I just fake it well.)

And I'm chipping through the new section at the end of rbtree.txt (Augmented rbtrees: now with a drop of Retsin!) which is classic "the person who wrote this already knew it, and in order to understand your explanation you too already have to know it", with the added bonus of a non-native speaker of English. In brief, "we added a callback to tree balancing so parents get notified when their kids change". Great: why? There's an LWN article on the topic that completely glosses over what you'd use this for. There's a Wikipedia article on what the new section says it's used for ("interval trees": is this the _only_ thing it's ever used for?), which reads like a cross between stereo instructions and Vogon poetry. The example code the Intel guys put in rbtree.txt calls a lot of functions they don't define, and it REALLY looks like it's got the comparisons backwards, but it's been code reviewed multiple times so I probably just don't understand it yet. The example code in the wikipedia article is in bad java requiring a lot of mental translation from java to sanity. ("a.compare(b.getmember()) > 0" means "a > b", right?). Someone, somewhere, is being paid based on the size of their programs rather than what they do. (Again.)
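
For my own reference, here's my best guess at what the augmentation actually buys you, written as generic C rather than the kernel's API (so take it with salt and expect to be corrected): each node caches the largest endpoint anywhere in its subtree, the "callback on balancing" exists to keep that cache correct through rotations, and queries use the cache to skip subtrees that can't possibly overlap:

struct itnode {
  struct itnode *left, *right;
  long start, end;  // this node's interval
  long maxend;      // cached: largest "end" anywhere in this subtree
};

// What the balancing callback has to do: after an insert or rotation
// changes a node's children, recompute its cached maxend bottom-up.
void update_maxend(struct itnode *n)
{
  long max = n->end;

  if (n->left && n->left->maxend > max) max = n->left->maxend;
  if (n->right && n->right->maxend > max) max = n->right->maxend;
  n->maxend = max;
}

// The payoff: find an interval overlapping [start,end] without
// visiting subtrees whose maxend proves they can't contain one.
struct itnode *find_overlap(struct itnode *n, long start, long end)
{
  while (n) {
    if (n->start <= end && start <= n->end) return n;  // overlap found
    if (n->left && n->left->maxend >= start) n = n->left;
    else n = n->right;
  }
  return 0;
}

int main(void)
{
  struct itnode a = {0, 0, 10, 20, 20}, b = {0, 0, 40, 50, 50};
  struct itnode root = {&a, &b, 25, 30, 30};

  update_maxend(&root);  // root's cache now covers both children

  return !find_overlap(&root, 45, 47);  // finds b, returns 0
}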

And as I read through the code I come up with random cleanup patches.

Once I've recovered some of my love of programming enough to no longer consider turning my laptop _on_ to be a chore, I'll have to face and defeat NFS in single combat. But NOT TODAY.


April 8, 2011

Catching up on the Rachel Maddow podcasts is painful at the moment, because of all her BAD SCIENCE on nuclear power. Her "oh noes, the sky is falling" approach to covering this is like the Huffington Post's approach to vaccination. (I'm about half a month behind; in the podcast I'm listening to now she's going on about how the nuclear plants in California are going to kill us all, which has nothing to do with Japan, she's just gleefully exploiting the opportunity to fearmonger about nuclear power in general.)

She's never once mentioned the difference between radioactive material and radioactive emissions. If the glow gives you a sunburn that's bad, and breathing glowing material is worse, but they're NOT THE SAME THING. When talking about danger to workers in the plant: "standing in a microwave oven while it's on is bad for you" doesn't mean your microwave oven makes your food radioactive, or that getting a sun tan makes YOU radioactive. It doesn't work that way.

No comparison of natural radiation rates to the elevated background rates. She repeats "no amount of radiation is safe" over a dozen times: if it were true nobody would ever live in Denver or Glasgow, fly anywhere, or visit the dentist. There is massive variation in the natural background levels of radiation between cities, could we have some CONTEXT please? Are we talking half Denver, twice Denver, as much extra radiation over the course of the year as ten international flights, what? You are providing NO USEFUL INFORMATION here.

After a week of coverage, she's never mentioned the half life of anything. How long is this material dangerous FOR? Just assume it's forever, that's not important to this issue. Nobody lives in Hiroshima or Nagasaki today, right? And thus she's treating radioactive iodine as more dangerous than radioactive cesium, even though the iodine has a half life of 8 days and the cesium has a half life of 30 years. (I.E. you could dump a dozen truckloads of pure Iodine-131 in downtown Tokyo and a year later it would be safe to live there, because ~45 half-lives later only 0.00000000000001% of it would be left, which is only enough to kill people suffering from homeopathy. Iodine-129 has a half life of over 15 million years, so it doesn't emit enough radiation to be dangerous even to children; a concentration high enough to be a radiation danger would chemically poison you first just from being that much iodine in your system.)

Speaking of half life: nothing about how an active reactor of this type generates short-lived isotopes and after you shut it down you still have to cool it for about three consecutive days while those isotopes decay, and if you can manage to do that life gets a lot easier. (And if you can't, the reaction starts up again and generates fresh isotopes and resets your three day clock.) Without knowing that, it's kind of hard to understand what the reactor technicians are TRYING to do.

Nothing about the mobility of the various compounds, what can wind up in the air (and how, such as evaporating when it gets hot enough) vs what dissolves in water. Nothing about generation rates or dispersal. Never once mentioning that the plant was intentionally situated on the downwind edge of the island so most of any emissions go out to sea and get massively diluted very quickly while their half-life runs down. Never mentioning that the longest-lived isotopes CAME FROM NATURE, we dug them up out of the ground and refined them but we didn't _create_ them. We mined them. Leaking them back out into the world is certainly not ideal, but it's not exactly unprecedented either. The planet STARTED radioactive. BANANAS are radioactive enough to set off nuclear material smuggling detectors due to the potassium in them, and yes humans have potassium in them too: about 4400 nuclei per second decay in the human body, which is a lot of geiger counter clicks. We've also got carbon 14, remember radiocarbon dating? Radiation isn't all or nothing, it's how much and where.

Speaking of mobility, she keeps treating Chernobyl as a possible outcome in Japan, even though the amazingly stupid design of Chernobyl (your containment unit should not be flammable) is not present in Japan. The Soviet Union wasn't big on safety. That reactor exploded and burned for days, mixing nuclear material with burning graphite and spreading radioactive smoke far and wide from Chernobyl. The containment unit in Japan is steel and concrete; even if the rods did completely melt down they're presumably not going anywhere. (Would the steel be enough of a heat sink to keep molten fuel pooled at the bottom? I assume the designers intended something like that, but Rachel isn't working on that level, so that kind of question never gets asked, let alone researched. Establishing that every guest she's interviewed has been some kind of anti-nuclear activist is left as an exercise to the reader, but names like "the Union of Concerned Scientists", or her mention of a guest's involvement in the lawsuit against Three Mile Island, are a bit of a hint. The bias is really thick here.)

Speaking of context: the earthquake and tsunami killed over ten thousand people directly, and left far more homeless (while it's still snowing in Japan so _exposure_ is killing people) and with precarious supplies of food, drinking water, and medical care. The unstable reactors haven't killed anybody yet, but they get all the coverage.

And the SCOPE of the disaster that was required to shake Japan's control: the earthquake _and_ tsunami (which went a mile further inland than the evacuation zones did) _and_ human error putting the backup generators in the wrong place (because the tsunami went a mile further inland than the evacuation zones did). Large parts of Japan currently look like New Orleans after Katrina or south Florida after Hurricane Andrew, getting _to_ the damaged area is hard because the roads are gone and it's an island nation whose resources are tied up dealing with the devastation, and this is treated as an inherent problem with nuclear power. Does this mean we can't use any dangerous substances like arsenic or cadmium (massively carcinogenic) in semiconductor manufacturing, because if a comet impact vaporized the facility they could wind up in the drinking water? (Obviously nothing other industries do has any level of risk.)

In terms of long-term health effects, the BP oil spill in the gulf exposed large chunks of the US and Mexico to all sorts of carcinogens like benzene and toluene and whatever was in the dispersants, and paid people minimum wage to go stand on beaches covered in the stuff and shovel it into buckets. Instead of freaking out about that, we're freaking out about how a couple hundred pounds of metal on the OTHER SIDE OF THE PLANET might kill us all.

Rachel: I have a disappoint. I really expected more science out of a Rhodes scholar with a doctorate. Context is important. You've railed against the disparity between "this is scary" and "this is dangerous" in the war on terror and such for years, and now you are the one fearmongering. It's beneath you. Please stop.


April 5, 2011

What is WRONG with gmail? I don't mind its spam filter going nuts and eating messages, I can go through the spam folder and tell it "not spam" for each one. But it silently DISCARDS messages, including ones I SEND. That really annoys me, and there's no record they ever existed.

Whenever somebody cc's me on a mailing list message, I only ever get one copy. It shows up in my inbox, but does _not_ show up in the mailing list folder, meaning threading gets all screwed up, and I can't search my list archives to find out the last time I mentioned an issue. The only workaround for this is to subscribe to the list TWICE, and tell it not to send me copies for the address I actually send from. That way its duplicate detection can't trigger because it thinks it's sending copies to two different addresses.

But the REALLY annoying part is when it filters my outgoing email. People email me directly about aboriginal linux all the time, and when I get permission from 'em to cc the mailing list I go back and forward the old email to the list... and gmail discards it because I already sent a message similar to that one to somebody else? Obviously you never want to send the same message twice with a different cc: list. That's crazy!

The workaround to _that_ is to resend the email from my work address, meaning A) I have to subscribe that to the list, B) I'm bouncing email off a microsoft exchange server because it's _less_ broken than gmail.

Gmail is almost as helpful as microsoft's "clippy". And there's no email address to complain to, or way to file bug reports, so I'm seriously thinking of setting up my own mail server again.


April 1, 2011

I'm not convinced the kernel's Dynamic Debug stuff is a good idea. Sure, sticking printf() calls (or printk() or similar) in the code is the One True Debugging mechanism (where applicable anyway, but you'd be amazed where you can shoehorn it).

But those printk()s are transient. You don't check them into source control. They're never going to be useful to anybody other than the person who put them in, and it's easy enough to put them in again later. All they do when you're NOT tracking down a specific bug is clutter up the code and make it less readable.

I've seen this happen a lot. It's a common programming mistake, thinking that your debug printfs are A) ever going to be useful to anybody else, B) likely to be particularly useful to _you_ in the future.

Yeah, I'm aware of the toybox/busybox patch -x option. That's because the darn thing's been sprouting bugs every couple months for years now (the design of that command is more or less BUILT out of corner cases), and because every time I sit down to annotate the flow control again, _I_ get confused. Which is a sign the code is too darn complex and needs to be simplified, but I haven't done it yet. Not immediately obvious to me how.

But I agree: that's me being lazy and the code being bad. You'll notice I didn't even bother documenting the -x option because I don't expect it to be useful to anybody but me, and if the darn code is stable for a while I really should rip it out.

Adding infrastructure to SUPPORT that... it's really not something that should be encouraged.


March 31, 2011

I got fan mail yesterday for Toybox, which hasn't happened in a while. Somebody out there is using it (to chop the boot time of this down to 200ms apparently), which makes me want to dig it up and see if I should do something with it.

I'm ambivalent about continuing its development. On the one hand, I don't want to compete against busybox (which I spent years contributing to, used to maintain, and which has a dozen active developers and huge userbase).

On the other hand, I prefer my design, and can probably make it cleaner, clearer, and more straightforward if I try. I can't stand to work on busybox anymore because it's turned into a painfully bloated mass of #ifdefs that keeps trying to add windows support and such. I prioritized simplicity first, size second, and speed third. Denys has size first, speed second, simplicity third. Over time, this adds up in a big way, and it has.

I might wind up porting code _from_ busybox to toybox, which would involve substantial cleanup in each case. (Cleaning up busybox stuff in-situ is like trying to bail out the ocean, from the middle. Any space you clear fills up again as fast as you can clear it...)

I've pondered redoing it in lua. I'm trying to meet a Lua developer here in Moscow to ask his opinion about it, but we keep missing each other.

I suppose I could sic my apprentice on it as a learning exercise. :)


March 30, 2011

I'm staying at the "Iris Congress Hotel" in Moscow, which is insanely expensive (8100 rubles is around $300/night), but pretty nice. (The complimentary breakfast buffet is lovely.)

The televisions here are about half russian stations and half other languages. (I've identified french, german, and korean.) Of the english stations, they have BBC world service on channel 14, a korean news channel (CCTV or some such) up in the 30's somewhere, "Fashion TV" (supermodels doing endless catwalks in amazingly ugly clothing while some loud man screams bad hip-hop until you have to change the channel just to get them to shut up), and a couple of pointless financial services (CNBC and Bloomberg debating how Japan's radiation leak will affect the yen, yes seriously)...

And far and away the most interesting channel was Al Jazeera, channel 20. Covering the war in Libya, and the tsunami fallout in Japan (no pun intended but hey, I'll take it), and a nice late night debate thing filmed at NYU that had Clay Shirky as one of the panelists. Their english service is clearly inspired by BBC world service (even copying the "heartbeat" motif in the theme song), but they're doing it better.

And tonight, it's gone. Channel 20 is now a second english televangelism station. (I didn't count channel 13, Jesus Christ TeleVision, because it doesn't count. A special about archaeologists thinking they may have found Jericho? Just like Schliemann found Troy, which means he found the actual place where Athena goddess of wisdom fought Ares god of war, just like it says in the Iliad! And San Francisco, the place where a Romulan laser drill will one day threaten the earth with red matter! It's a real place too! We have pictures!)

Breakfast each morning has a big projection TV screen over it. On my trip in December, they were showing a russian news station, in Russian. This time they've been showing Al Jazeera english. I wonder what they'll be showing in the morning?

(Edit: They showed Bloomberg. Don't care about some Berkshire Hathaway middle manager quitting to join another company, thanks. You can shut up about it now.)


March 27, 2011

Much airplane.

Russia. It has snow. In March! Who'd a thunk?

Jetlag.


March 24, 2011

I recently listened to a good but frustrating radio program on the invention of money. It reminded me of a couple old articles I wrote about money back in my Motley Fool days (part one, part two), where I described it as "debt with a good makeup artist", and each bill as an IOU. The radio program was frustrating for two reasons.

A minor reason the program was frustrating (by far the smaller of the two) is that despite the name it didn't explain the history or evolution of money. (The segment on Yap island touched on it, but didn't go into detail.) Luckily, I already knew that one.

The major reason the program was frustrating to me is because it kept missing one central point that would have explained the bits they claimed to be bewildered by. Since I didn't make it explicit in my articles either, I should go into it here.

The radio program described paper money as a "fiction", but it would be more accurate to call it a "promise". In each of the program's segments, they almost said "here's where they made a new promise", "here's where they broke the promise", and so on... but they didn't. Instead they were baffled by what went on and how people behaved, because they missed the subtle distinction between "fiction" and "promise".

Half of modern society is based on promises. Laws essentially say "if you do this (or don't do this) we promise this response". Contracts are an exchange of promises. The deed to property is a promise to respect your claim to that property, and that if someone tries to take the property away from you some authority will support you in defending or recovering it, and punish whoever stole it. Money is the same kind of promise, to exchange this marker for a cheeseburger or wristwatch or movie ticket or hour of housecleaning. The promise is explicitly written on dollar bills, "this note is legal tender for all debts public and private". The promise is made by the US federal reserve, as explained in the radio program.

Yes, promises only work when people believe them. If you break promises, bad things happen. And somebody has to _make_ the promise, and enforce the terms. That somebody is generally the federal government, which creeps conservatives/libertarians out. "We don't trust you to make that promise, we want the promise to be made without anybody having to make it!" And thus they keep trying to discontinue the federal reserve, with no serious proposal for a replacement.

One of the big parts of the monetary promise is promising NOT to hand out more IOUs when the promise giver needs more money. The federal government is trillions of dollars in debt (primarily because Republicans since Reagan love cutting taxes while raising military spending); printing more money to pay that back (or just to keep up with new deficit spending) would cause so much inflation as to render the dollar essentially worthless fairly quickly.

Back in the early 1900's the politicians in Washington wisely handed the power to issue more money to a bunch of academics in New York, so _they_ wouldn't be tempted to misuse it. That was pretty much the point at which the US started to become an economic superpower. Idiots who don't understand how that works want to undo it, and ironically call themselves "conservatives" for pushing a plan to end what we've got and replace it with... who knows what really, that's not the important part. Why? Because they don't understand it, and thus it offends them. (They're the same way about global warming and teaching evolution in schools.)

Idiots.


March 23, 2011

A recent blog post about data mining the profile questions on the largest dating website found a number of fascinating results, one of which was that the question "Do you prefer the people in your life to be simple or complex" was the best indicator of political orientation. Twice as many liberals preferred other people to be complex, twice as many conservatives preferred other people to be simple.

It's kind of sad really: the data implies that conservatives are the people who watch the good guy shoot the bad guy in a movie and actually _want_ that guy to just be a faceless uniform who never had siblings or a dental appointment. The rest of us expect people to be human beings, and that the guy who delivers your mail might visit renaissance festivals on weekends, bake amazing pies, be earning a degree, or a thousand other things.

The conservative black-and-white "you're with us or you're against us" mindset tries to simplify the world into exactly two positions, "mine" and "evil". If your side is christian then it doesn't matter about pedophile priests or the KKK burning crosses, those are mere aberrations. But if your side isn't muslim then all muslims must be evil, everywhere, and you can't have a mosque near the World Trade Center.

On the liberal side you have the standard disclaimer "of course not every individual agrees with this position", and certainly there's a case to be made for that. There's an obvious tendency (at least among liberals) to expect similar variation from the other side... except that the Republicans had 40 "party of no" senators voting in unison for two years and were proud of this. They seem to see conformity and obedience as virtues, unity as strength. They don't just think "you are all X", they think "WE are all X" (even when they're not), and self-identify with simplistic, uniform positions because that's how they want the world to be. Solvable, like the "Da Vinci Code".

Black and white thinking interacts badly with the Dunning-Kruger effect, where incompetence breeds self-righteous certainty. People with no particular skills often think that anything they don't understand must be easy, because they can't see any reason why it wouldn't be. Playing the piano is just pressing a bunch of keys, same as writing novels or programming computers. And anyone can press keys...

Reagan created our modern massive federal debt. The first Bush spent more money in four years than Reagan did in eight. Then Clinton balanced the budget and started paying down the debt, a herculean task that seemed impossible until he did it (involving a bunch of tricks like introducing the Roth IRA so people converted their old retirement plans through a one time taxable transaction, allowing the feds to tap into the revenue surge to balance the books and then successfully argue that once balanced it should stay balanced). Then the second Bush explicitly discarded that (in a speech stating that the government running a surplus meant the people had been overcharged and he was "asking for a refund") and ran the deficit up to new record levels -- starting from a surplus!

Empirically, over the past three decades every Republican president has made the deficit significantly worse, and the Democratic presidents haven't, going back to Carter. And now the Republicans are _insisting_ that only they can balance the budget (which is like alcoholics giving advice on sobriety), and they're going to do it by cutting taxes so the government takes in less money... which is insane on the face of it, and which is a policy Reagan and both Bushes already tried (repeatedly) while getting us into this mess in the first place.

There's the Dunning-Kruger effect in a nutshell.

When did the phrase "conservative" replace the phrase "old fogey"? People who found a lungful of air they liked sometime in the 1950's and haven't changed it since. Wanting _other_ people to be simple sounds to me like an unwillingness to think. Reality is complex. People are complex. Some of us think this is a good thing.

And yet the baby boomers, who statistically speaking gave us "sex drugs and rock and roll" when they were teenagers and yuppiedom when they were in their 30's, now give us "get off my lawn" foreign policy and "kids these days" domestic policy. ("Kids these days" have _never_ been significantly different from the previous generation; if you think so then what's changed is YOU. You got old and slow and tired and let your reluctance to think make you stupid.) Plenty of individuals don't, but as a voting bloc they suck.

My political strategy at this point is to wait for the baby boomers to die off.


March 22, 2011

I'm reading the third edition of Robert Love's Linux Kernel Development book. I'd guess I already know about 80% of the material in it, which makes it a frustrating book to read because I'm tempted to skip stuff, but the bits I don't know are mixed in with the bits I do. I'm reading it to fill in gaps, to connect bits I didn't realize were related, and have the occasional "aha, so that's why" moment. Let's just say there's a lot of mining to get the ore.

But the other reason it's frustrating has nothing to do with what _I_ already know. The book is full of editorial glitches, where "that sentence is exactly backwards" or "that's the wrong word there", or the ever-fun "he never explained that bit, presumably because he already knew it". It makes reading it a bit of a minefield.

And then in chapter 6 he just starts editorializing and ranting about crazy stuff. He hates the naming of the kfifo functions: the kernel's First In First Out implementation has kfifo_in() and kfifo_out() functions, which he INSISTS should have been called enqueue and dequeue and is OFFENDED that they weren't. (Um, first in first out, I think I can guess where they got the _in and _out parts from, dude. Those aren't bad names when you're implementing FIFO under the name "kfifo".) Then he goes off on a rant about how the kernel's idr stuff (an acronym he never expands because its naming offends him) doesn't match the C++ substandard template library's "map" class. (I'm glad it doesn't, C++ is a horrible pile of bad ideas. Now what does idr stand for?) He doesn't explain how red/black trees actually work (luckily I wrote the kernel's Documentation/rbtree.txt), and only even MENTIONS the kernel containing any hash table implementations as an afterthought in the last paragraph of his "what data structure to use" summary.

I want to hand this book to my apprentice Nick when I'm done with it, but I may have to pass along some extensive warnings too. I hate inheriting books people have used highlighter markers on (it's one step beyond running your finger along the text and moving your lips when you read, except it DAMAGES THE BOOK), but it's possible this book would be improved by judiciously crossing things out.


March 21, 2011

Can we stop treating "The Cloud" like it's some big new thing? This is the third time the computer industry has been through this, you'd think we'd expect it by now.

Mainframe -> Minicomputer -> Microcomputer (PC) -> Smartphone

When mainframes were displaced by minicomputers, the old machines got kicked upstairs into the "server space". It happened again when minicomputers were displaced by PCs ("microcomputers"), and now it's happening again as smart phones displace the PC. The old category of computer doesn't go away (at least until the hardware wears out), it just gets kicked upstairs into a new role as "machines we only interact with indirectly, through another machine closer to us which we more regularly use".

Back when mainframes were introduced, everybody interacted with them as directly as possible. Using or programming computers involved punching a deck of cards, submitting it for processing, and waiting for the resulting printout and/or new card deck. That's what using a computer meant back then.

Then minicomputers were invented, and everybody got a terminal attached to a minicomputer, letting you type directly to the machine and see the result immediately. (You were still sharing the machine, but it was a much more interactive experience.) Minicomputers became the main computer you interacted directly with, and mainframes became a second computer that you interacted with at a distance, often through your minicomputer terminal, and only when you needed the extra power the big machine had that the smaller one didn't yet.

Then microcomputers were invented (which IBM's marketing department renamed "personal" computers), and everybody got one of those on their desk. (You no longer had to share your computer, it was all yours.) That became the computer everyone interacted directly with, which meant minicomputers were kicked up into the "server" space, just as mainframes had been.

Now we have smartphones, a computer you carry with you 24/7, and people are interacting directly with them, so of COURSE the PC is getting kicked up into the server space. Yeah there are some ergonomic issues but the minicomputer started out with those too, the sheer _availability_ of the new devices makes the transition inevitable. Your phone, mp3 player, game boy, and twitter checking device are the same piece of hardware. If you use it all the time, and always have it with you, it's familiar and comfortable. Your laptop can't displace that, but you _can_ connect a keyboard, monitor, mouse, and big speakers to a USB hub, and you can connect an android phone to that same USB hub. Getting them to talk to each other is just software, so why own a laptop when you can plug your phone into a keyboard and HDTV through its charger cable? Bog standard disruptive technology and sort of inevitable at this point: if the smart phone and laptop compete for the same ecological niche, but the smart phone has its own unassailable niche the laptop can't reach, then the smart phone will win.

So the PC is getting kicked upstairs by smart phones, just like the minicomputer and mainframe before it, but this doesn't mean they're going away any time soon. When mainframes were kicked upstairs we invented the "batch job". When minicomputers were kicked upstairs they invented the fileserver and print server. Eventually the older hardware got replaced, as Moore's Law eventually made smaller hardware powerful enough to handle the old use cases. (The modern batch protocol is called "HTTP" and its batch jobs are handled by things called "webservers", meaning batch processing is more popular than ever. The most common file/print server is probably samba.) But the old hardware often stayed in service until it wore out, and the use cases survived. And IBM continues to sell the most expensive mainframes imaginable to the pointiest hair it can find, the same way audiophiles still buy vacuum tubes.

PC hardware has been gradually consolidated into rackmounted 1U clusters for over a decade. This isn't new: we've had PC clusters for 15 years, from Beowulf to web server clusters with load balancers and high availability "heartbeat" software. (Those links are from the 1990's.) All "cloud" does is add a layer of software: installing virtual machines with live migration and flexible resource constraints. This is cool software, and it means that the virtual OS images (and the apps they're running) aren't tied to a specific piece of hardware, but can move from machine to machine without needing to be rebooted or even taken offline for more than a couple seconds. The virtualization layer also lets you add/remove memory/disk/CPU at will, possibly by migrating your image to a more powerful machine first and then adjusting the resource constraints of the emulated system. Of course IBM's been doing all this on their mainframes for decades, so it's not exactly new...

So "cloud" is nice, but it's just "cluster with an emulation layer". Yeah, the VM guys have made great strides in the past decade, with a half-dozen major projects (QEMU/KVM, VMWare, Xen, Virtualbox, I myself am woring for Parallels these days...) competing with each other, and others taking the containers approach (which is probably ultimately superior: chroot on steroids instead of faking hardware to boot a separate kernel). But that's really just implementation details.

We'll know the transition is complete when the pointy-haired marketing droids STOP calling it "cloud". You don't need special names for the plumbing when it works. The internet isn't "the information superhighway" anymore, because it actually works now.


March 20, 2011

Part 3: How C++ got so bad, a follow-up to parts one and two.

C++ started out as "C with classes", implemented by a wrapper (cfront) that converted C++ into C so it could be compiled. Originally, C++ was just a thin layer on top of C. Its big purpose, the design goal motivating the creation of the language, was object orientation.

Object orientation is a code refactoring technique that associates a set of code with a set of data. As with other design patterns, you can implement it in just about any language. For example, the Linux kernel, written in pure C, is object oriented; each filesystem is a subclass of the VFS, drivers are subclasses of the module infrastructure, and so on. Having the language support object orientation is just "syntactic sugar": you can write object oriented code without language support (and indeed cfront converted C++ to C, adding classname_ prefixes on member functions with an explicit "struct classname *this" as the first argument that was dereferenced to access member variables).
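
A sketch of that translation, in case it's not obvious (simplified; real cfront also mangled names and handled inheritance, and note that "this" isn't a reserved word in C):

// The C++ "class counter { int n; public: void add(int x); };"
// boils down to a struct plus a prefixed function taking an
// explicit this pointer:

struct counter {
  int n;
};

void counter_add(struct counter *this, int x)
{
  this->n += x;  // member access is just pointer dereference
}

int main(void)
{
  struct counter c = { 0 };

  counter_add(&c, 3);  // what "c.add(3);" compiles down to
  return !(c.n == 3);
}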

But the designers of C++ thought that this code refactoring technique was The Next Big Thing, and that it would change the world, so they went off on a major Object Orientation binge, adding things like operator redefinition (including the ability to redefine what commas do), and then redefining the C bit shift operators to perform I/O in the C standard library, just to give an example of what was considered "good taste" in the paradigm.

Unfortunately for them, OO wasn't the next big thing, scripting languages were. Instead of heading from the "fully static without abstraction" peak to the "fully dynamic opaque abstraction" peak, the C++ guys went way off on a tangent, adding features to the language that didn't really buy them much.

The central idea of C++ is classes as an organizing principle. C has global and static scopes, and that's enough if you know what you're doing. When C developers want more, they provide their own naming conventions for functions, often giving them a common prefix. But C++ gives each class its own scope, which is like giving a pack rat more containers and closet space: if you can't organize your stuff, giving boxes to put things in just encourages clutter to breed. The idea of classes as an organizing concept failed so spectacularly that the C++ designers had to introduce "namespaces" to the language, and even make those namespaces nest arbitrarily deep. (It's not that classes make things worse, it's that a visit to "the container store" doesn't magically organize your life. Object Orientation is nice, but good programmers don't really need language support for it and bad programmers aren't helped by it.)

The designers of C++ sold their language based on the strength of C. Their marketing pitch was "It contains the whole of C, so you can do anything you can do in C! All C compilers should become C++ compilers because they can still compile pure C!"

But failure to get rid of stuff was the fundamental flaw in C++, which snowballed into the horrible language we have today. C is a simple and elegant language; adding more stuff to it is like hiking with a heavy backpack. Even if the stuff in the backpack is useful, it doesn't get you where you're going any faster, nor does it make the experience more pleasant.

Because C++ includes the whole of C, it could never get rid of pointers. You can't build reliable abstractions on top of something designed to cut through all abstraction and give you access to the bare metal. C++ built its objects on top of static structs instead of dictionaries, with nominative typing resolved at compile time, and the resulting complexity exploded into "friend" functions and virtual base classes.

But C++ doesn't let C shine either, by adding complexity that it won't let C address. For example, class instances don't start with zeroed memory, you must create a constructor function that tediously assigns a zero to each individual member variable. The C way to deal with this is "memset(this, 0, sizeof(*this));", but you can't do that in C++ because the compiler puts data into the object behind your back, such as virtual function pointers and RTTI data. You _can_ use the C method, it's available to you... but you have to understand why it won't work, because of the added complexity to support C++'s abstractions.
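
In C the one-liner works because a struct contains exactly what you declared, nothing more (sketch, with a made-up struct):

#include <string.h>

struct thingy {
  int count;
  char name[32];
  struct thingy *next;
};

void thingy_init(struct thingy *t)
{
  // Safe in C: every byte of the struct is yours. A C++ compiler
  // hides a vtable pointer (and RTTI data) in objects with virtual
  // functions, so the same memset there stomps compiler internals.
  memset(t, 0, sizeof(*t));
}

int main(void)
{
  struct thingy t;

  thingy_init(&t);
  return t.count;  // 0
}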

When the designers of C++ realized that this new class of languages was emerging that really were the next big thing, they tried to layer these new dynamic concepts on top of their static language. They tried to implement dynamic concepts at compile time, which is a contradiction in terms. The result was things like templates, their answer to dynamic typing. Templates expand C's macro system into a full-blown automatic code generator, providing macro functions that make the compiler spit out a separate copy of the code for every type it's ever used with. Where scripting languages removed the idea of a macro preprocessor and generally don't have one, C++ extended theirs to be literally turing complete at compile time. (They didn't just slow down their compiler tremendously to cope with this new complexity: there's no guarantee that a C++ compile involving templates will _ever_ complete.)

That's why C++ implements variable length types on top of their object system, instead of the other way around. They started with C, added object orientation, and never looked back. The C++ designers never discarded existing features or existing design assumptions no matter how much they conflicted with the new ideas they wanted to copy from the scripting languages; instead they layered more and more new concepts on top of whatever they were already doing, hoping that if they built up high enough their flawed foundation would stop mattering. (The Leaning Tower of Pisa will straighten out if we just build it HIGH enough.)

But without _removing_ any of the old design elements of C -- pointers and static nominative typing with type data associated with each variable name and the offset of each variable determined at compile time -- none of their new abstractions could ever be opaque. The complexity of C++ exploded out of control, and you have to understand the nitty gritty of everything from template expansion to name mangling in order to debug your programs. Each new abstraction in C++ leaked implementation details, because the C language is designed to access the bare metal. C is the language you implement other abstractions _in_. (Note that the runtimes of all these scripting languages are implemented in C, not in C++.)

As scripting languages matured and displayed more attractive new features, the C++ developers got increasing scripting language envy, and tried harder to work around their own existing design to add more and more features. Their conviction that object orientation was the central idea of all programming led them to believe that they could add any new scripting language feature on top of their "C with classes" implementation, without ever discarding any existing features that would break backwards compatibility with C and undermine their "C is good, therefore C++ must be good" mud pie marketing advantage.

For example, in scripting languages, a natural thing to do if you reference a variable that isn't there is to throw an exception, so most scripting languages grew exception handling. (Not all: Lua just went "any key that isn't there has a value of nil" and it seems to work for them.) But scripting languages already have a "runtime" binary interpreting the code and handling garbage collection, so it's easy for them to do stack unwinding as well. And one nice thing about exceptions is if the code doesn't catch them, they stop the program and produce a nice stack dump letting the programmer know exactly what happened and where. In the absence of static typing checked by a compiler, runtime testing to produce exceptions becomes a primary debugging technique, and since scripting languages are interpreted, these stack dumps can easily list each source line the code traversed to get to that point.

C++ saw this and decided it needed exception handling, but it didn't have a runtime to attach it to, so it crapped checks into the code at the end of every scope. This could easily triple the size of the resulting binary, broke C's setjmp/longjmp and forced them to do stack unwinding too, and could easily leak memory since allocations weren't garbage collected. The new feature made no _sense_ in a statically compiled context. You never actually _needed_ it, it just provided an alternative to checking return codes. But C++ went to great lengths to implement it, because scripting languages had it and it didn't.
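
For comparison, what C already had is setjmp/longjmp (minimal sketch): it just jumps back, running no destructors and generating no per-scope cleanup tables in the binary:

#include <setjmp.h>
#include <stdio.h>

static jmp_buf bail;

void parse(char *s)
{
  if (!s) longjmp(bail, 1);  // the "throw": jump straight back
  printf("parsed %s\n", s);
}

int main(void)
{
  // The "catch": setjmp returns nonzero when longjmp lands here.
  if (setjmp(bail)) {
    fprintf(stderr, "parse failed\n");
    return 1;
  }
  parse("hello");
  parse(0);
  return 0;
}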

So that's how C++ wound up in a no-man's-land between C and scripting languages. It tried to be both static and dynamic at the same time. It tried to simultaneously provide access to the bare metal and thick layers of abstraction. It thought object orientation was a bigger deal than garbage collection. C++ made additional piles of _specific_ errors, but most of them trace back to the fundamental design problems with the language.

But by far, the biggest problem with C++ is it convinced a generation of teachers that teaching C++ was equivalent to teaching C. Colleges finally stopped trying to teach people Pascal... and started teaching them C++ instead.

(If you're curious about Java, the other language that wound up in the no-man's-land between C and scripting languages, read this excellent takedown of that language. Basically it added dynamic memory management, but kept C++'s static nominative typing and retained the idea of "compile time". When Java's designers realized how limiting static typing was, they punched holes in their type system with interfaces (I.E. creating even more types you could manually typecast an object to), and then created code generators to spit out reams of boilerplate code to manage these interfaces. Interfaces were to Java what templates were to C++, except this time the massive verbosity is at the source code level instead of the generated binary code. The result is that Java programs are regularly tens of millions, or even hundreds of millions of lines long, bigger than any human being could ever hope to read in a lifetime.

Due to Y2K happening during the peak of Java's popularity, lots of cobol programs got rewritten in Java, and thus Java has become modern cobol. The Fortune 500 loves it, most of the rest of us only care about it when paid to do so.)


March 19, 2011

Following up on Wednesday's Why C is a good language and C++ is a bad one, today's post addresses what C++ specifically did _wrong_ that made it a bad language. If there are two local peaks in language design space (the "native code" peak dominated by C and assembly, and the "scripting language" peak around which are clustered a couple dozen different reasonably interchangeable modern languages), why did C++ lose its way and wind up stuck in a no-man's land between them?

Let's start by talking about the cool new thing that really _did_ change the world: scripting languages. Languages that allow not only rapid prototyping but rapid debugging and deployment, languages in which you can write a web server or a game in 1/10th as many lines of code as compiled languages like C because they automate away so much tedious bookkeeping and provide an abstract programming environment you can rely on. How did python, javascript, ruby, php, perl, lua, and so on come about, and what makes them _different_?

Scripting languages do everything they possibly can at runtime. They replace static memory management with dynamic memory management, static typing with dynamic data types, compiled code with interpreters that run the source code directly. They eliminate memory leaks and use-after-free errors, eliminate typecasting, eliminate makefiles... the result is a fundamentally different _type_ of language than C or assembly.

Scripting languages diverged from C by getting _rid_ of concepts from C. They got rid of pointers and replaced them with references. They got rid of structs and replaced them with dictionaries. They got rid of typed variables. They got rid of compile time. C++ never got rid of anything, it just kept adding more stuff.

The most fundamental way scripting languages distanced themselves from C was to get rid of pointers. Instead scripting languages use references, which are sort of like a pointer with the safety catch on. A reference can only ever point to either the start of a valid object or to the special NULL value saying there's no object there.

References let you implement garbage collection sanely, because you know what they point to. Pointers can point to _anything_, which means your garbage collector has to worry about following pointers into unknown territory. A pointer can point to something that's already been freed, or into the middle of an array or structure that isn't a separate allocation, or to a variable on the stack, or to a chunk of memory that came from mmap() and belongs to the page cache or a video card. You can even typecast a pointer into an integer variable, so the fact you can't find a _pointer_ reference to a chunk of memory doesn't prove anything about whether that object is still in use. Pointers and garbage collection just don't mix, so if you want dynamic memory management you remove pointers from your language.
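(CPython's garbage collection, for example, is mostly simple reference counting, and you can watch it happen. A toy sketch, noting that sys.getrefcount's answer includes the temporary reference the function call itself makes:

import sys

a = [1, 2, 3]
b = a                       # a second reference to the same list
print(sys.getrefcount(a))   # typically 3: a, b, and the call's temporary
del b                       # drop one; when the count hits zero the
                            # interpreter frees the object for you

None of that bookkeeping shows up in your code, because every reference is something the runtime can safely follow and count.)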

So the first step from C to scripting languages wasn't adding more complexity to the language, it was simplifying the language by getting _rid_ of a fundamental part of C. The next few steps also swapped out static C concepts for new dynamic concepts, removing the old concept and its associated complexity from the new language.

Another concept to go was C's static container types (arrays and structs). Dynamic memory management allows scripting languages to allocate and free memory automatically behind the scenes, which let them implement resizeable containers. This let string types be based on resizeable vectors instead of fixed-length arrays, and vectors also provided stacks that you could pop and push to, or FIFO queues. Instead of C's "struct" containing a group of related variables, scripting languages used dictionaries, little key=value databases that can associate any arbitrary piece of data with any other piece. Some languages (such as Lua) provide a single container type that performs both functions.
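In python, for instance, one resizeable list type covers arrays, stacks, and queues, and one dictionary type stands in for structs (field names made up):

stack = []
stack.append("pancake")        # resizeable vector used as a stack
stack.append("waffle")
top = stack.pop()              # "waffle"

point = {"x": 10, "y": 20}     # dictionary standing in for a C struct
point["z"] = 30                # add a new "member" at runtime
del point["x"]                 # or remove one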

All of these containers let you add and remove members on the fly, and it doesn't matter whether it's implemented as an array behind the scenes, or a linked list, or a hash table, or some kind of a tree. You just use it and expect it to work, because the abstractions are opaque and you can rely on them.

Scripting languages also got rid of C's habit of associating types with variable names. References let you move the type of an object from the variable holding it to the object itself, since you can always safely follow a reference without worrying about segfaults, so you examine the memory it points to and see what's there. This means that the reference itself is typeless, every reference is like a void * that can hold anything. If it refers to a container object such as a vector or a dictionary, you can query that container to determine what its contents are at the moment. Of course this takes place at run time, since the answer may be different each time you query it.
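In python terms (another toy example), the name is just a name and the type travels with the object:

thing = 42
print(type(thing))              # says it's an int
thing = "now a string"          # same name, different object, different type
print(type(thing))              # says it's a str
thing = [1, 2, 3]               # containers answer questions at runtime
print(type(thing), len(thing))  # a list, currently holding 3 items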

The combination of typeless variables and dynamic containers let scripting languages completely eliminate variable and function declarations, by rethinking how local and global variables worked. In C each function's set of local variables is essentially a structure on the stack. Your globals are one big struct starting from the data segment. Just as your own structs are a pointer plus an offset, the local and global variables in C are also held within static containers, where the offset of each item within that container must be resolved at compile time. But in scripting languages, each of those containers is a dictionary, to which you can add a new entry (or remove an existing entry) at runtime.

In a scripting language, your local variable stack is a vector of dictionaries. Push or pop one on the end as you call or return from a function, let the garbage collector handle freeing the values. You can nest scopes as much as you like, as long as the variable lookup routine knows how many dictionaries to look at on the stack, and what other contexts to look at in what order. You can add more namespaces at will (classes, packages) and let the garbage collector free the ones you've stopped using.
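Python will happily show you these dictionaries (locals() and globals() are real builtins):

x = 1                      # assignment created the key "x" in the global dictionary

def fun():
    y = 2                  # and "y" in this call's local dictionary
    print(locals())        # {'y': 2}
    print("x" in globals(), "fun" in globals())    # True True

fun()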

This means you can add a new local variable, a new global variable, or a new class member, just by assigning a value to a name. You never have to declare variables, because the type information is attached to the object itself, not the variable name. When you assign to an existing name, the value currently attached to that key is replaced (and possibly freed by the garbage collector). When you assign to a new name, a new key is added to the dictionary with that value. You can assign an integer to a name that previously held a string, and later replace that string with a callable method. A function that takes an argument could receive an integer, or a string, or a dictionary in that argument, and it can check what type it's got this time and handle each differently, or just do actions like "print" or "add" and assume it'll all work out, or catch the exception if it doesn't. (Even though "add two numbers" and "add two strings" call very different code, it's not the function's problem to care. Whoever called the function is responsible for feeding it arguments it can work with.)
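A toy python example of that last bit:

def add(a, b):
    return a + b               # whatever "+" means for these two objects

print(add(1, 2))               # 3
print(add("some", "thing"))    # something
try:
    add(1, "thing")            # mixing them is the caller's mistake...
except TypeError as err:       # ...and the caller's to catch
    print("couldn't work with those arguments:", err)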

Eliminating the need to declare anything before you use it removes most of the need for #include files. Scripting languages generally haven't got a macro preprocessor, because they don't need one.

Scripting languages moved so much stuff from compile time to run time that they could get rid of compile time completely. Scripting languages are interpreted: you ship your source code with #!/usr/bin/python or similar at the start, and your "runtime" program reads and executes the source code. (Most don't actually interpret it line by line, for performance reasons they run the whole file through a quick bytecode compiler and then execute the bytecode, in a classic tradeoff of startup latency for execution speed. But since this is another opaque abstraction that just works you don't have to care about these implementation details if you don't want to.)
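You can even peek behind this particular abstraction if you're curious: python ships a dis module that dumps the bytecode a function got quietly compiled to.

import dis

def inc(x):
    return x + 1

dis.dis(inc)    # prints the bytecode the interpreter actually executes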

Getting rid of compilation eliminates a huge pile of complexity in the form of makefiles, and also renders most "open source" vs "closed source" arguments moot. To see the javascript behind a web page, "view source". They haven't got the option NOT to ship their source code, because the source code is what you run. With a scripting language, your source code is the form of the program you deploy and run.

The price scripting languages pay for all of this is speed and memory consumption: a Python implementation of a mathematical algorithm (such as gzip or md5sum) runs an order of magnitude slower than a C implementation. But computers today are literally a million times faster than they were 30 years ago (18 month doubling time of moore's law, 20 doublings, 2^20=1,048,576), so most code doesn't need to be as fast as C can make it any more than it needs to be as fast as hand coded assembly can make it. Being able to write and debug the program in 1/10th the time was the reason to move from assembly to C, and is the same reason to move from C to scripting languages for most application development. (Scripting languages are also designed to glue together C functions, the way C glues together inline assembly as necessary. If part of your program is a bottleneck, you can rewrite it in C and keep the rest in a scripting language.)
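The glue can be surprisingly thin. Python's ctypes module, for example, loads a C library and calls straight into it (a minimal sketch, assuming a unix system where the math library can be found):

import ctypes, ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m"))    # the C math library
libm.sqrt.restype = ctypes.c_double                  # declare the C signature by hand
libm.sqrt.argtypes = [ctypes.c_double]
print(libm.sqrt(2.0))                                # 1.4142135623730951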

Tomorrow: what C++ did instead of becoming a scripting language.


March 18, 2011

The ACLU of Texas is, apparently, completely useless.

So they emailed me about the next nutty thing from the Republican bastards the senior-citizen baby boomers keep putting in charge of Texas (especially now that they've gerrymandered it): a redo of the "Papers please" law from Arizona that Rachel Maddow covered at some length last year.

This was my reply to them:

I would have twittered about this issue, but the only links you provided are "track everything I do on the net" links, which I refuse to click on general principles. I googled for the ACLU of texas twitter feed: it was last updated in February. I googled for the national ACLU twitter feed; it makes no mention of this issue.

Either it can't be that important an issue to you, or you guys are terminally disorganized. But I did a quick check anyway: all the google links for "texas sb 9" bring up the one from the last legislative session in 2009. "texas hb 12" pulls up http://www.chron.com/disp/story.mpl/metropolitan/7430788.html which pairs hb 12 with "identically worded" sb 11, not sb 9.

In response, I got:

Delivery to the following recipient failed permanently:

     acluinfo@aclutx.org

Technical details of permanent failure: 
Google tried to deliver your message, but it was rejected by the recipient
domain. We recommend contacting the other email provider for further
information about the cause of this error. The error that the other server
returned was: 550 550 sorry, no mailbox here by that name. (#5.7.17)

I got on their mailing list by adding my name to some petition or other, I forget about what. I didn't ask to be put on a mailing list, but haven't bothered to spam filter it yet on the theory that it might have something interesting to say. That was my only contact with this group since the 90's. I got a fundraising phone call from them on Wednesday.

I think they go in my spam filter now. If it's an interesting topic, either The Daily Show, Rachel Maddow, the BBC world service, or my twitter feed will probably mention it.


March 17, 2011

Aw, I have at long last gotten confirmation that when Neil Gaiman was nice enough to say "By Grabthar's hammer, you shall be avenged" into a microphone at Penguicon 2, it did not get recorded. Alas. (These days there are many recordings of the man online, but at the time I was impressed by how much he sounds like a more laid back version of Alan Rickman.)

*shrug* I haven't been to Penguicon since 2008 (which oddly was the last time the Penguicon wikipedia page got updated too, although I don't personally find Wikipedia[citation needed] notable enough to edit), even though Tracy and I co-founded Penguicon together, and Tracy found great guys like Steve Gutterman and John Guest to hand it off to. I never wrote a follow-up to that post because year 3 is where the politics started. I should probably do such a write-up at some point...


March 16, 2011

When I switched web hosts I didn't enable the xbithack thing at the top level directory, so the left nav bar on my main page hasn't been there for a month. (Oops. My apprentice Nick pointed that out to me, and it's back now. This web host doesn't let me tweak the global apache config so I have to have .htaccess files per directory if I want anything non-default, such as XBitHack to give me server side includes. I forgot to copy the .htaccess file into the top level directory. My bad.)
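For reference (and so I can find it again later), the per-directory file just needs something like this, assuming the host loads mod_include and allows overrides:

Options +Includes
XBitHack on

With XBitHack on, any page with the execute bit set gets parsed for server side includes, which is how the nav bar gets glued in.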

Yes, I'm teaching someone programming, in my copious free time of which I have none. (The merge window is open, I gotta get this NFS crap WORKING THIS WEEK. Which is what I was doing until fielding the bug report about my web page.)

So what's involved in learning programming? If you're going to become a good programmer you need to A) learn low level programming in C and a bit of assembly so you understand what the box is doing, B) learn high level languages like Python so you can bang out code fast.

Possibly most importantly, you must C) Understand why C++ is not C, and why people who mentally lump them together are, pretty much by definition, bad programmers.

Back when I was still on speaking terms with Eric Raymond, we were working on a paper he wanted to call "Why C++ is not my favorite language". I should write up my own version of that material at some point.

In a nutshell: C++ sucks because C and scripting languages are each great, but when you try to combine the two their good parts cancel out and you wind up with a mess. There are two local peaks in language design space, and C++ is in the no-man's land between them with the disadvantages of both and the advantages of neither.

C is a simple language with very little abstraction, which doesn't automate away anything interesting and thus doesn't get between you and what the machine is doing. The downside is it forces you to deal with a lot of low-level bookkeeping (static memory management, static typing, type conversions by hand, strings are just arrays with some expected contents...). The upside is you have complete control, can write efficient code easily, and can do pretty much anything the hardware can do (and can sprinkle in assembly as necessary).

Scripting languages provide opaque abstractions you can rely on. They abstract away all those tedious details like memory management and type conversions, and you don't have to care how because they just work. They provide things like garbage collection and dictionaries and resizeable strings that you can just take for granted and rely on, and think and design at the higher level and let the language deal with the details.

C++ is halfway between the two: it provides semi-opaque abstractions that leak implementation details, yet still prevent you from easily seeing what the language is doing so you could understand and fix it. You regularly wind up digging down into C++ to understand implementation details behind the standard template classes and virtual functions and function overloading and so on. (Things like "friend functions" and "virtual base classes" are implementation detail leaks built right into the language.)

So on the surface C++ seems like an improvement over C because it offers so much more infrastructure bundled with it. But when you try to use these mountains of code, they're full of sharp edges and you have to be careful where you step. C is an unforgiving language, but when you screw up you did it to yourself, so you can fix it. When something goes wrong in C++, often it means hidden depths of the language's implementation are rising to the surface to bite you.

If C was pure water, C++ would be a mud pie made with that water. Yes, the mud pie contains water, but it's not a replacement for (let alone an improvement on) the original thing. If bookstores then fill up with mud-based cookbooks because it's so hard to make something edible with mud in it, the solution isn't to add more mud but to step back and rethink your assumptions. C++ is neither an improvement on nor replacement for C.


March 13, 2011

Yeah, I'm overdue for an Aboriginal Linux release. I'm fighting with the NFS code. I've been this close to getting something I can _submit_ for a couple weeks now, but I keep having to understand huge tangents in order to understand why it's doing some strange little behavior. (Sometimes just to _DISABLE_ that behavior.)

I've got a bunch of half-finished things queued for Aboriginal Linux that I'll check in once I get 'em to work...


March 6, 2011

Carving out a couple hours to work on Aboriginal Linux.

Problem: the "build all architectures" invocation that makes sense on my laptop is "FORK=1 CPUS=1 more/buildall.sh", but when I do that it gets to the static builds, launches them all at once, has an I/O storm, and they all time out due to inactivity (producing no output for timeout.sh to see) because they're all trying to create their hdb.img images at once.

There's a couple of ways around this, but they're all fiddly to implement. I could introduce a couple seconds of gap between launching each target, but launching is handled by more/for-each-target.sh, and how much of a gap makes sense in general? (It would fix things on my laptop, sure, and be a pointless delay on a machine like securitybreach.)

I could teach more/buildall.sh to create the HDB images itself, but the logic that currently does so in each system-image's dev-environment.sh (really sources/toys/dev-environment.sh) is just fiddly enough that I don't want to duplicate it. (The ugliest bit is adding /sbin and /usr/sbin to the $PATH, because on some distros that's where it lives and it's not in the $PATH of normal users.)

I could try to get the existing dev-environment.sh to create the hdb.img for me and not launch an image, except it would still do the distccd finding and launching dance. Making it not do so is unpleasant at a design level. Factoring out the hdb.img creation code into a separate script is unpleasant because I don't want to present the user with MORE THAN THREE entry points. (It's bad enough as it is.)

All of this is easy to do; the hard part is doing this one more thing while keeping the design simple (or at least not making it worse)... Sigh.


March 5, 2011

Huh. The "log me out from the desktop" bug turned out to be a side effect of using the OpenVZ kernel based on Red Hat Enterprise Linux (AKA R-HELL, AKA Pointy Haired Linux, the people who named their own developer conference after Fear, Uncertainty, and Doubt). Probably the initramfs it needed to copy from RHEL launched some background process.

This kernel also caused the bug where forking 15 processes would only use one processor on my quad core laptop. The other three resolutely sat there idle. No containers involved, no fiddling with the nice level... It was just a "FORK=1 CPUS=1 more/buildall.sh" run of aboriginal linux. That means each target should use 1 process, but all targets should build in parallel. And on a vanilla kernel it does.

Also, with the RHEL kernel any attempt to modify the sound was returning "Error: Success!" (I.E. the call failed but errno was left at 0 so no reason why it failed was reported.) Even aumix from the command line was doing this.

I booted back into 2.6.37 vanilla, and it's behaving properly again. I wonder if this will also fix the problems I've been having with the corporate VPN?


March 4, 2011

Still sick. Staring at the laptop all day trying to work up the focus to do something useful. Drank an entire energy drink and it never kicked in noticeably.

Try again tomorrow I guess. (Could be worse. Fade threw up twice today, and is radiating epic woe.)


March 2, 2011

Wrote up a big status thing and then just posted it to my livejournal on the theory it's day-job-related.

In brief: got informed last Wednesday (Feb 23) I would be going to Scale in Los Angeles over the weekend, flew out the next day, was insanely busy until I flew back Monday, and then got sick. Didn't even turn my laptop on yesterday. Still sick today but catching up on email.


February 22, 2011

The reason opendns.com needs to die is that they intercept failed DNS lookups and redirect them to their own page (which performs some strange non-google web search on random snippets of the URL).

It does this for transient name lookup failures, which apparently happen a lot since the service seems to be kind of overloaded and lookups for sites they haven't seen before are a 4-stage process. (Root nameservers point you to TLD servers, which point you to the domain's nameserver, and then you may have to do a subdomain lookup before getting the actual IP of interest).

Firefox caches this "successful" lookup, and for the next hour or so any attempt to access that page gives you the redirected URL.

Dear network people: let things fail. The "buffer bloat" problem exists because you wouldn't let packets get dropped, and by doing so you BROKE THE INTERNET. By not letting DNS resolution fail, you're also breaking the internet, you moronic bastards.

The OTHER fun thing is that chattr +i doesn't seem to work on ext4, so until further notice I can't nail a fixed /etc/resolv.conf in place in a way the dhcp client can't change.


February 21, 2011

The most annoying thing Xubuntu does is boot up to a desktop, wait anywhere from thirty seconds to ten minutes, and then suddenly kill the desktop (and everything on it) to present you with a login screen.

It gets me every time.

It doesn't do this when I resume from a suspend, but it does it after every reboot. (Which happens because suspend resumed to a solid black screen, because suspend resumed with the mouse and keyboard frozen, because the battery died without warning despite the thing presumably auto-suspending itself before that happens...)

Linux: Smell the usability.


February 19, 2011

I didn't _mean_ to spend most of today updating the kernel documentation page, but I did anyway.

There's actually less there than there used to be, but what _is_ there has been shined up a bit, and largely de-bit-rotted.

(Yes, I still need to finish indexing the OLS 2010 papers. Most likely you haven't read 2009 yet, so go read that first, then poke me.)

Opinions?


February 17, 2011

Remember how I said Meego was completely uninteresting? Apparently Nokia agrees now.

I remember how Microsoft gave Sun Microsystems buckets of money to distance itself from Linux and basically screw the open source guys over back during the SCO trial. Sun went out of business and its corpse was purchased by Oracle last year.

I remember Microsoft buying huge amounts of stupid from Novell, and Novell diligently delivered the stupid, and their corpse got sold to a private equity firm last year too.

Microsoft just bought a whole lotta stupid from Nokia. No matter how seriously Nokia brings the stupid, I really can't see it working out well for them.


February 13, 2011

Every time my laptop forcibly reboots itself (for whatever reason) I lose half a day's work. This time, it resumed from suspend with the mouse and keyboard frozen. (Sigh. I thought I'd left that behind with Ubuntu 8.04.)


February 12, 2011

I wonder if I should post to the aboriginal list instead of blogging about it here? It's easier for people to reply there. I'm posting to livejournal about day job work, but that's kernel development on containers and (regrettably) all NFS at the moment.

I note that I posted how mount works to my livejournal. I may fluff that up into something for the kernel's Documentation directory, if anybody would like to review it...


February 8, 2011

Everything's uploaded to dreamhost now. That's where impactlinux.com used to be hosted (and that domain redirects to landley.net now), so it has as much bandwidth as that website used to. A long overdue upgrade.

There's a new mailing list for Aboriginal Linux. I couldn't salvage the old subscriber list so people will have to resubscribe as they find out about it. I might be able to post my old mbox somewhere (closest thing I have to an archive), but probably not until this weekend.

Regarding uClibc-1.0, I have two bugs preventing me from building NPTL with a gcc 4.2.1 and binutils 2.17 toolchain, one of which is easy to work around and the other I haven't had time to track down yet. (Day job takes up most of my programming time/energy during the week, I'll have to poke at this over the weekend.)


February 7, 2011

Website was down a bit while I changed hosts. I'm on dreamhost now, which is where impactlinux.com used to be parked. I'll see if I can reclaim that domain and get it pointed here, since the aboriginal subdirectory and mercurial repo should be the same in case anybody follows an old link to that.

It may take a while for everything to upload...


February 2, 2011

The power is out at home. (Ok, technically it's gone down and come up again at least 4 times this morning, but it keeps going down.) When it does, the cell phone tower closest to me goes away, and I drop down to one bar of signal on my phone, which can just about intermittently check twitter. Apparently, most of Austin's heating is electric and the sudden "80F to 18F" temperature drop spiked demand to the point where Austin's electric company couldn't keep up, and when they restore power to an area everybody's heater is on at once...

Wheee.

Got tired of this a bit after noon and headed out to see if anybody else had power, and it turns out UT has its own generator. (Streetlights along the way are out, though.) The bagel place on the drag is _packed_. And has no internet access...

(Update: found out the reason for the power outages, in addition to demand, is that the pipes in the power plant froze. Hard to boil water to make steam when your pipes are frozen...)


January 30, 2011

I've been meaning to replace the kernel on my laptop with one I built myself for a month now. I boot kernels under KVM every day. My _day_job_ is now doing kernel development. And the battery died earlier today (low battery warning really shouldn't be a pop-under), so I lost all my open windows, so now would be a convenient time...

In theory, Ubuntu makes this easy by providing kernel config files in the /boot directory. In practice, Ubuntu has managed to make this just horrible enough that I haven't quite done it yet.

I remember lilo. It was sad, but it was simple. Here's a config file, run this program to parse that config file and write a new boot sector that could present a menu, and when you selected the menu entry it would load a list of sectors in order and jump to the start of them. Conceptually what it did was very straightforward, and you could learn how to use it in two minutes and learn everything it could do in about half an hour. But it was really easy to skip a step, and if you did so you needed a boot disk/CD to recover.

Then along came grub, and the big advance for grub was that the little boot program read the file _at_boot_time_. So editing the file _was_ the step that updated the boot menu. Internally, this made it much more complicated, instead of working out a list of sectors when the full OS was up it copied miniature device drivers into itself that could understand filesystems, lots of different types of filesystems. But it was slightly easier to use, so it took over.

Unfortunately, grub was donated to the FSF, which seldom produces anything new that works but loves to take credit for existing projects and do to them what Willy Wonka did to Violet Beauregarde. The FSF is a religious organization whose zealots seek to displace proprietary software by bloating any code the FSF has anything to do with until it takes up all the storage space in the entire universe, thus displacing every other implementation with the One Gnu Way. In that they are idiots, but worse from an engineering perspective they are INEFFICIENT idiots.

These days, grub is no longer a text file you edit by hand. At the start of /boot/grub/grub.cfg:

#
# DO NOT EDIT THIS FILE
#
# It is automatically generated by /usr/sbin/grub-mkconfig using templates
# from /etc/grub.d and settings from /etc/default/grub
#

The file is 157 lines long (not counting sourcing "grubenv"). It calls insmod to load filesystem device drivers (which are stored IN THE FILESYSTEM, no don't ask how that's supposed to work). The individual kernel version load entries now look like this:

menuentry 'Ubuntu, with Linux 2.6.32-27-generic' --class ubuntu --class gnu-linux --class gnu --class os {
        recordfail
        insmod ext2
        set root='(hd0,6)'
        search --no-floppy --fs-uuid --set 59f457c9-c387-4963-a18f-d6f19208e753
        linux   /boot/vmlinuz-2.6.32-27-generic root=UUID=59f457c9-c387-4963-a18f-d6f19208e753 ro   quiet splash
        initrd  /boot/initrd.img-2.6.32-27-generic
}

Horrible, isn't it? (Note: initrd was replaced with initramfs somewhere around 2005.) But it gets worse: The initrd line points to a separate file for each kernel, generated by yet _another_ script. And that funky (redundant?) UUID boot syntax doesn't work without the initrd.

When I run grub-mkconfig, it spits out a config file to stdout. It doesn't update the read-only grub.cfg file, it doesn't regenerate the initrd. So some OTHER infrastructure is updating all this crap, and there's no obvious way to backtrace to it. MORE LAYERS!

To review: the advantage of grub over lilo was that you got to skip a step, you could edit the text file directly and didn't have to remember to run the command to update the boot sector. Now that the FSF has taken over, you're not allowed to edit the text file yourself, that's done for you by nested layers of automation that you're not expected to understand and which take control away from you entirely unless you drink down their entire septic tank of complexity.

So, here's how _I_ make it work:

Clone the most recent kernel entry and rename the .32-blah to .37. Delete the weird --class stuff off the first line. Delete the initrd line, delete the search line, run fdisk to figure out which partition to set root= to (in this case "root=/dev/sda6" for my setup) and strip off the "quiet splash" at the end so we can see what we're DOING. Tweak the kernel .config to hardwire in the filesystem driver (ext4) and the sata driver (AHCI). I just disabled about 2/3 of their automation, and now I can boot a darn kernel.
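The surviving entry winds up looking something like this (reconstructed from memory, your partition and version numbers will vary):

menuentry 'Ubuntu, with Linux 2.6.37' {
        recordfail
        insmod ext2
        set root='(hd0,6)'
        linux   /boot/vmlinuz-2.6.37 root=/dev/sda6 ro
}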

WHY do people keep falling for the FSF bait and switch? They're a mirror image of microsoft, they want to impose their worldview everywhere and all these engineering tasks are a sideline to their real agenda, one which they're not very good at. Complexity is something you BUDGET, for the same reason that using more metal does not make a better car. Double the weight and it will be LESS fast and LESS fuel efficient. Complexity in software is a COST, and two things it can cost you are developers and users.

I wonder if software suspend works in the new kernel? I highly doubt it, but this could be a fun lever to pull... :)

Wow... That was fast. And... it worked? (Hasn't panicked yet...)


January 21, 2011

So I've been working on containers support for parallels, which is actually a really cool technology.

The problem with doing multiple virtual systems (ala KVM) is that it doesn't scale. It's really easy to do because your OS is already written and you just have to emulate hardware for that existing code to run on, but it's really inefficient, eating memory for redundant disk cacheing and eating CPU to translate data back and forth through device drivers for virtual devices and faking interrupts you then handle... The state of the art today is that if you manage to stack a couple dozen virtual systems on the same server, you're doing pretty good.

But you can have a couple _thousand_ chroot contexts on a server, without breaking a sweat. Each one takes a tiny amount of memory (because its resources are mostly shared), and literally _no_ CPU time when it isn't doing anything (such as a server blocked waiting for requests).

How do you make virtualization scale better? Well the first thing you can do is throw hardware at the problem, both more CPU (SMP!) and memory, and teach the hardware to perform some of the emulation tasks for you (especially the "VT" extensions offering nested page table support). But you pretty much had to do that to get _up_ to dozens of instances. It won't get you to hundreds, let alone thousands.

Some of it you can handle by making systems more efficient about quiescing themselves. The embedded crowd doing cell phones has spent the past ten years developing "tickless" systems (which power all the way down when they have nothing to do). But again, that's part of the state of the art today, and they're still only at a couple dozen instances on a fairly fire-breathing server.

Developers are attempting to stretch this further with "paravirtualization", essentially punching holes in their emulation and teaching the virtual systems that they're not really running on their own hardware. This is the realm of balloon drivers and virtio network devices. Unfortunately, this still sucks for a couple reasons, one petty and one fundamental.

The petty reason is that the main advantage of taking the virtualization approach in the first place was that you already had an existing OS you could run. Having to rewrite that existing OS defeats the purpose of doing it that way in the first place. Ok, it's nice that you started from something that worked, but going more than a certain distance from your original design assumptions means the changes you're making are no longer cosmetic, but serious surgery that leaves scars. When what you need to do involves completely discarding many of your original design assumptions, past a certain point it's easier to just start over.

The fundamental problem is they're reinventing the microkernel. Each virtual OS wants to manage its own resources, and without a central context with a global view of the system capable of seeing ALL the available resources and making intelligent resource allocation decisions, your performance is going to suck. You can't share resources across contexts that are designed from the ground up for exclusive access of those resources, and when you try you spend all your time copying data back and forth between the different contexts. We've been here before: the end game of paravirtualization is the sucky performance of microkernels. They're reinventing "the hurd". There's a fundamental design reason Minix lost out to Linux.

Which gets us back to containers, which are how a monolithic kernel design (like Linux) would handle virtualization. It takes the opposite approach, supporting virtual systems by building up from chroot and adding additional isolation so that groups of processes are fully independent of each other, yet the host kernel can still see everything and allocate/share resources intelligently.

Start by giving each container its own PID namespace (and thus its own init task which kills all the other processes in the container if it ever exits), add separate network routing, selective device visibility (and the occasional virtual device like PTYs or tun/tap), and resource accounting so you can limit how much memory or CPU time all the processes in a container can collectively use (and how many files or sockets they can open, how much I/O bandwidth they can use...). And the process local mounts from the shared subtree patches ages ago, and letting /proc and /dev/pts and such have multiple instances so each container can have its own that shows _its_ view of the world... And so on.

There's a lot of work that goes into it. But the thing is, it starts from something _efficient_ (chroot) and adds isolation, instead of starting with something fully isolated and trying to retroactively make it efficient. The container way of doing things turns out to be the easy approach, and playing security whack-a-mole with each new corner case (/proc/sys/vm/drop_caches shouldn't be global! A container calling shutdown should kill just its own PID 1 not power off the whole system!) isn't exactly something _new_ for Linux.
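If you want to feel the weight difference, the filesystem half of the trick has been a single system call since forever, and you can drive it from python (needs root; the rootfs path is made up, and this gives filesystem isolation only, nothing like the full container isolation described above):

import os

os.chroot("/path/to/rootfs")   # filesystem isolation and nothing else
os.chdir("/")
os.execv("/bin/sh", ["sh"])    # everything from here on sees only the new root

Everything the containers work adds (PID namespaces, network routing, resource accounting) is layered on top of a starting point about that cheap.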

There were several competing out-of-tree approaches to doing this (the most advanced of which was OpenVZ), and now that code has finally started to go upstream into Linus's tree, of course the result isn't compatible with any of them. (Sigh.) So there's a new Linux Container control package, LXC, which provides the userspace knobs to launch and interact with containers. I wrote some temporary HOWTOs for playing with them, part 1 and part 2, and I need to update those HOWTOs when the next LXC release comes out because they document some bug workarounds that I get to remove later.

(It's all very exciting, for the "live in interesting times" definition of exciting that involves a lot of it being unfinished. But it's progressing quite rapidly, and actually does demonstrable things now.)


January 15, 2011

I'm mostly using my new work laptop now (twice as much memory, twice as many CPUs, newer OS install, longer battery life), but my old work context isn't all on it so I can't update this blog from there. (Hence updating the livejournal more frequently.)

I should just bite the bullet and move everything over. This would involve buying a bigger hard drive for it, and pulling out my old USB hard drive adapter thingy to move over the previous context from both machines. (How one moves a Windows 7 partition without freaking it out is an open question. I've had to boot into that a couple times to handle the Windows files HR keeps sending me, which many other things can view but none seem to be able to actually edit. Now that Oracle's crushed Sun, OpenOffice is useless until the Libre guys get up to speed. It's a good thing I never particularly cared about MySQL and haven't cared about Java for a decade now.)

(Yeah, Aboriginal Linux development is similarly stalled, for the same reasons as the blog. Not lack of interest, just lack of convenient work context. My rob at landley dot net email is also on the old machine, I just caught up on a week's worth of back email there. That's why I haven't set up a new mailing list either...)

On the bright side, I'm very very slowly getting traction as a kernel developer. That's sort of terrifying, really...


January 8, 2011

Finally making a bit of progress on the Aboriginal Linux documentation redo. Fixing a couple bugs that should have been fixed in the previous release. Confirmed the armv6l thing is a change to qemu rather than the kernel. The usual. Still no mailing list. (Nobody's emailed me asking for one.)

I'm still mostly blogging over at my old livejournal, about work at parallels. That's what's taking up the majority of my time.


January 3, 2011

Finally got Aboriginal Linux 1.0.1 out. Woo.

Still no mailing list. I should fix that...


January 1, 2011

Happy new year.

Not sleeping well since I came back from Russia, and I seem to have a low level general systemic bug that won't quite become a full-fledged cold but won't go away either.


Back to 2010