Rob's Blog (Creative Commons License)


May 11, 2013

Thought I might get a couple hours on toybox or aboriginal while the niecephews were at the pool, but Sam doesn't turn 6 until October and in order for her to swim, I have to swim. (Cue Captain Picard shouting "There are four lifeguards".)

On the bright side, in dayjob-land the filesystem I'm backporting through 4 years of kernel development (Pointy Hair Linux is still using a 2.6.32 kernel) now compiles! And mounts! And segfaults when you list the directory contents!

So that's something.


May 6, 2013

Day job eating life right now. No brain left for anything else.

(Caught up on Dr. Who up through the C.S. Lewis tribute Christmas special though, via the Netflix app on my phone. Sadly, this required less brain than it should have, but then I still hold it to the standards of the period when Douglas Adams was script supervisor.)

(Also still vaguely annoyed that the picking at early canon Moffat's doing is completely ignoring Susan. The Doctor did not leave Gallifrey alone, he took someone away and hid her on a planet he's still guarding. The one time she returned to Gallifrey (in The Five Doctors) she pretended to be just another companion and left quietly with him; it was her homeworld and she said _nothing_. Anybody else see a red flag in that? Moffat could trivially say Susan founded the time agency Jack Harkness came from, but no, we get River Mary Sue Song...)

Ahem.


May 3, 2013

I need to extend the option parsing infrastructure to include FLAG_longopt #defines for each long option. Doing this in shell script with sed is horrible, so I'm pondering writing some C to include lib/args.c and leverage the existing C option parsing infrastructure to output its own macros.

Problem: what to do about flags that are configured out? Right now the shell script is #defining those to 0, which makes if (toys.optflags & FLAG_x) become if (0) and drop out. If I replace the shell script code doing that, the C code doesn't naturally do that either. Since USE_BLAH() macros are done by the preprocessor, it's actually really hard to get C code to deal with both at the same time.

Which means I need to build the C executable twice, with two different preprocessor configurations, and feed output from the first one into the second one...
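A minimal sketch of the "configured-out flags become 0" idea, not the real toybox generator. It assumes plain single-character option strings with no punctuation and no long options, at most as many options as an unsigned long has bits, and the rightmost enabled option getting bit 0. The function name emit_flag_macros() and both arguments are hypothetical: "all" stands in for the option string with every config symbol switched on, and "enabled" for the option string under the current config (the two preprocessor passes described above).

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write one "#define FLAG_x value" line per option
 * in "all" into "out". Options present in "enabled" get a single bit,
 * numbered from the right end of "enabled". Options missing from
 * "enabled" (configured out) are defined as 0, so that
 * if (optflags & FLAG_x) collapses to if (0).
 *
 * Returns 0 on success, -1 if "out" is too small.
 */
int emit_flag_macros(const char *all, const char *enabled,
                     char *out, size_t outlen)
{
  size_t used = 0, len = strlen(enabled);
  const char *opt;

  if (outlen) *out = 0;

  for (opt = all; *opt; opt++) {
    /* Is this option still present in the current configuration? */
    const char *hit = strchr(enabled, *opt);
    unsigned long value = 0;
    int n;

    /* Rightmost enabled option is bit 0, the one before it bit 1... */
    if (hit) value = 1UL << (len - 1 - (size_t)(hit - enabled));

    n = snprintf(out + used, outlen - used, "#define FLAG_%c %lu\n",
                 *opt, value);
    if (n < 0 || (size_t)n >= outlen - used) return -1;
    used += (size_t)n;
  }

  return 0;
}
```

Calling emit_flag_macros("abc", "ac", buf, sizeof(buf)) fills buf with FLAG_a defined as 2, FLAG_b as 0, and FLAG_c as 1. The hard part from the entry is still unsolved here: actually producing the two strings requires running the preprocessor under two different configurations.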


May 2, 2013

Sigh. Caught up on email before going to bed, and there's a mount.c submission, from someone using toybox in a product and who needs mount.

I've had an unfinished 100 lines of mount sitting around for months, with no progress on it. Instead, I've been spending what little coding time I can get doing things like cleaning up ifconfig.c (which I'm maybe halfway through). If you got the impression that cleaning up giant external submissions like that is slower than writing new code from scratch, you're not alone.

The new mount submission is over 1400 lines long. The one I wrote for busybox was 600 lines. This is a command I've already written before. It would be faster for me to ignore it and finish mine.

But I haven't had time to do that. The fair thing to do would be to merge this, discard the work I've done, and clean up this thing. Except I haven't had time to finish cleaning up the last giant code dump yet. Or the half-dozen before that...

Instead, I spent today backporting vanilla kernel.org code from an obsolete 3.0 kernel to an obsolete 2.6.32 kernel, because we need 9p working in CentOS 6.4 and it only supports 9p2000.u not 9p2000.L like SLES does. Nobody will care in 5 years, but we can't wait for CentOS 7.0 to ship, and until then Red Hat will continue to ship a kernel that's already seventeen releases out of date.

I'd love to take a year off and work on toybox full time, but I can't afford to. I got married, we moved to a real house with a real car, and she's getting a doctorate. That lifestyle comes with expenses...

I hate blocking other people. I also can't help but be envious that they get to work on toybox at least somewhat in the context of their jobs, and I don't...


May 1, 2013

Monday: two very attractive women sunbathing outside in bikinis. (Dunno if they've been doing this and I only noticed because I took the day off and the break room with internet access is surrounded by windows.)

Today, it is _snowing_. (Not sticking on the ground, wet grey and overcast all day and I'm not the only one to notice when it switched to flakes after sunset.)

Jonathan Coulton's "First of May" song might possibly have presented a very slightly idealized take on the situation, is what I'm trying to say here.


April 29, 2013

Taking the day off from work. The 3.9 kernel shipped and I need to catch up with open source stuff.

Plugged 3.9 into the aboriginal build, tweaked the patches, and built the quick and dirty version of all the targets, ala:

NO_NATIVE_COMPILER=1 more/for-each-target.sh './build.sh $TARGET'

That's a small enough subset my netbook can churn it out in a finite amount of time, and: arm works, mips works, sparc sort of works ("ls -l /dev" hung once, but ran on the reboot? Hmm...), and the x86 targets work.

That leaves powerpc, sh4, and mips64. (And m68k but qemu still doesn't support that well enough to test and I haven't bothered to set aranym back up yet.) Of those sh4 dies before producing any output, quite possibly a qemu issue.

I also need to get inittmpfs ready to resubmit. There's conceptually three parts: base inittmpfs support, rootrdflags=, and the ability to configure out the root= fallback code.

But mostly, spent the day banging on toybox...


April 28, 2013

Managed to screw up my local toybox repository again today. (Doing "hg import -f" will suck up changes to your working store, even in very different parts of a file that only has a one line modification. You'd _think_ it would bypass it and just apply changes to the database, but no. And if you do two of them at once, "hg rollback" won't undo the previous one, it's now a permanent part of that repository, because Matt believes that rollback should only do one level otherwise demons fly out of your nose.)

(The way to fix it is to clone the copy on the website into a new directory, do the hg import -f into _that_ copy, and then substitute the .hg directory out of that for the one in the old repository. Or at least that's what I did. So far no more nasal demons than usual.)

Meanwhile, I heard back from the first of the tinycc developers about qcc: Daniel Glockner read my old blog entry on the qcc triage todo list and he does _not_ want any of his code used under BSD license terms, so that's 295, 306, and 307 to remove from tinycc. (He added the ARM support, but the point is to replace all the code generation with qemu's TCG anyway, so not a big loss. The ELF plumbing and predefined #ifdefs need to be redone at some point, but it needs to be genericized with a table for all supported architectures.)

But that's after toybox 1.0, and presumably after I switch Aboriginal to musl, toybox, and llvm. And do my own distcc implementation that doesn't assume falling back to compile on the host is something to do on a regular basis.


April 26, 2013

Behind on everything, and this weekend I visit the niecephews again. In theory the 3.9 kernel drops this weekend, no idea when I'll get to catch up...


April 25, 2013

Why is distcc making zero speed difference building on my netbook? I kick off two parallel builds in two different windows and they advance at exactly the same rate. What, some kind of strange cache effects? Did I break distcc again in a way that it's totally refusing to tell me about because visibility into what distcc is doing has always sucked mightily? Has the emulated network become a bottleneck that's coincidentally slowing it down to EXACTLY the speed of the non-distcc version?

Sigh. I need to write a distcc replacement. Something that actually _forces_ compiles to be distributed and _never_ falls back to a local build, rather than invisibly deciding not even to try to distribute it for inscrutable reasons.

But not today...


April 24, 2013

I'm disappointed that our "Democratic" president is to the right of both Richard Nixon and Ronald Reagan, but until the baby boomers stop being 24% of the population I can't see it improving.

I respect people searching for the truth, but have a problem with people who claim to have found it. For one thing, the truth changes: we fought World War II against Germany and Japan, which are now our allies. Statements like "Our problem is X", "We must do X", "It's useful to know X", "The highest moral value is X"... they have a shelf life.

Let's take the current universal moral freakout: Traditional Marriage. Before you get all bothered about the fact the only people who actually seem to want to get married anymore are gay, remember that the institution of marriage predates paternity testing, contraceptives, the industrial revolution, electricity, literacy, and most women _not_ dying in childbirth before the age of thirty. "Till Death Do Us Part" averaged less than 20 years before antibiotics, vaccines, blood transfusions, ambulances, defibrillators, refrigerated and canned food, epipens and benadryl for anaphylactic shock, tylenol to bring down fever, inhalers for asthma, surgery for appendicitis, snakebite antivenom, splints and plaster casts so a broken leg doesn't leave you lame for life, prescription eyewear to prevent "legally blind" from being effective reality, and so on. People in developed countries don't die of heatstroke or freeze to death much anymore, are seldom eaten by wolves, and can generally avoid scurvy, beriberi, or going blind due to vitamin A deficiency. Rats, roaches, mosquitoes, fleas, and worms are things we actually notice rather than just expect about our persons.

Indoor plumbing all by itself is a huge deal: nobody in my neighborhood died of dehydration or cholera this year. I can't currently smell urine or feces. I can't smell myself either, and despite the snow outside did not risk hypothermia in washing any portion of my body.

The point is: this is a recent development. Men used to die of random banditry when they didn't get sucked en masse into a pointless border war or flattened by Genghis Khan or Xerxes or Napoleon or Stalin. (Not saying we've solved that last one but nuclear weapons have at least kept the scale down: US casualties in Vietnam were about 1/5th of the number of Armenians killed by the Ottoman Empire during World War I.) Women died in childbirth _a_lot_. And then a plague would come through and kill yet another quarter of the population. "Till death do us part" was in the ceremony because getting remarried after your spouse died (and then having more kids) was normal: fairy tales full of stepmothers, stepfathers, and orphans aren't just common because it's a useful plot element, they're common because it was a common condition.

So saying "How DARE you question the rites of marriage handed down to the ancient Greeks by Zeus himself! People divorcing after twenty years rather than dying after fifteen is clearly a sign of moral decline, Odin will strike us all with thunderbolts!" Yeah, not buying it. Used to serve a purpose. But we've moved on. Religion gave us dietary laws before we understood nutrition, parasitology, crop rotation, germ theory...

Now that we know what causes malaria, have public schools providing minimal day-care, can actually plan pregnancies and tell who the father is after the fact if we want to, and aren't generally inheriting a plot of land for generations to subsistence farm on (often as a peasant under a local warlord with a big knife), lots of people find marriage less useful. We've found new reasons to keep at it: tax advantages, joint bank accounts, hospital visits, and so on. But the real reason is cultural inertia, and tacking extensions on to a ceremony dating back to the stone age based around buying a woman from her father with sheep? Legitimately questionable if this is the best way to go about meeting society's current needs.

I also note that denialism is not searching for the truth, it's another way of being certain of an existing answer. Specifically it's saying "I know that this is wrong" and then changing your theory about how or why each time your old theory gets debunked. It's just another way of NOT questioning your existing beliefs to see if the world's moved on without you, as the world tends to do.

This is why science makes predictions. Waking up to find your swimming pool empty and guessing that invisible pixies drained it is a very different statement than saying "tonight, invisible pixies will drain the swimming pool" while it's still full. You can then wait and see what happens, and the statement is capable of being proven wrong. You have a way of noticing when the statement is not true, and that's the heart of science. If you only guess about what you already know, you can't tell when you're wrong.

So this is my problem with the baby boomers. If you search for the truth, there's a danger of thinking you've found it, at which point you stop questioning your beliefs, the world changes out from under you, and you slip into denialism about challenges to your existing belief. (This is without even getting into "smartphones have rendered wristwatches irrelevant for anyone under the age of 30, and you're telling me how great vinyl sounds" levels of inertia, but just sticking to the big important stuff.)


April 23, 2013

If VLC has focus and you type "make", you 1) mute the audio, 2) screw up the aspect ratio, 3) screw up the audio delay, 4) pause the playback by advancing one frame.

Typing "make" in the window again does not fix it.


April 22, 2013

My phone has bluetooth, but apparently the OS upgrade made the bluetooth file browser go away? Or did I only have that in the old Nexus One and not the Galaxy S?

Either way, it goes "Bluetooth! I can use that to have the laptop display sound, because its speakers are even worse than the phone's!" And that's about it.

Bravo, Google. Bravo.

(Ah, I have to download an app to do what I thought would obviously be built-in functionality. Hmmm... I wonder if this is enough to do podcasts now? Got a recording app, got something to get the files off the device, I need to dig up a screencasty thing...)


April 21, 2013

Passing comment from someone that Bill Gates dropped out of college so why can't they?

Apart from the fact his father was a lawyer and his mother was on the board of directors of the Red Cross. William H Gates III ("Trey" to his friends) really didn't have to worry about starving to death if he dropped out of the college his parents were paying for.

Just a thought. There's a _reason_ that Steve Jobs needed venture capital funding to launch Apple (costing him control of his company in 1984), and "Trey" didn't to launch Microsoft (he still owned over 40% of all Microsoft stock well into the 90's).

Yes, this sort of detail is important. As we saw when Jobs came back, when the two went head to head Jobs walked all over Gates. Repeatedly. But when Jobs said "Apple II is dead, Macintosh is the future" in 1984 (after the Lisa flopped), his board of directors rebelled and forbade him to divert resources away from the company's cash cow to invest in a sequel to the Lisa, eventually taking all authority away from him when he didn't listen. Gates's pet engineers didn't even BADLY clone the Macintosh until 1990 (and then only because an engineer working on his own initiative surprised them and forced them to change direction), and didn't catch up until 1995. Apple squandered a ten year head start and came crawling back to Jobs who handed over the Macintosh sequel he'd finished the previous decade, and renamed it the iMac. Meanwhile Jobs had bought Industrial Light and Magic's digital effects arm when George Lucas sold it cheap after the original Tron flopped in theatres, renamed it "Pixar", and turned himself into a Hollywood movie mogul who eventually wound up the largest shareholder in Disney.

Of such details was the computer industry forged...


April 20, 2013

Whenever I reboot my netbook I lose buckets of state: open command line windows with half-edited files, todo lists, just directories that remind me "oh, this thing you were doing". Balsa won't save its state so I lose windows where I hit "reply" on a message but haven't composed and sent it yet...

Unfortunately, during one of the "apply package updates", Ubuntu swapped out its upstream crypto certificates, deleted the old ones, and needs to reboot to use the new ones. (I think that's what happened.) So I can't do any more package upgrades until I reboot, and that includes installing "debootstrap" and "debuild" so I can once again dink at bootstrapping debian under Aboriginal.

(Well, I could tell it to install packages from untrusted sources, but if the reboot doesn't fix that it's reinstall time.)


April 19, 2013

Huh. If you google for "toybox linux" the first hit is the toybox about page and the second's the news page... but it says "toybox - Rob Landley". Which is odd, because the <title> tag just says toybox. Where is it getting that? (The top level landley.net page's title is "Hello world!". The only mentions of my name except in release notes are on the license page...)

Sigh. Google is trying to help out in the kitchen by dropping an egg on your shoes and getting flour everywhere. Google is helpful in ways you didn't ask it to be! How do I pat google on the head and try to distract it so it stops being helpful...


April 18, 2013

And _now_ Texas LinuxFest gets back to me, asking if I can do my proposed "why the GPL is dying" as a lightning talk instead of a full slot.

Alas, when they didn't get back to me by their own deadline (the 15th) I booked tickets to visit my family in Austin the previous weekend, where I get that Monday off anyway and can spend all the time at home.

No biggie, Eben Moglen just covered the topic, and I already did three minutes on it in the GPL section of my ELC talk. (Yes, you can link to a starting time in a youtube video. :)


April 17, 2013

I mentioned earlier how I needed to teach people how to clean up code to my standards, and I've been trying to do that. I've been cleaning up pending commands in stages, doing one evening's worth of work and checking it in, then posting a message explaining why I did that.

I started by explaining the start of ifconfig cleanup. Each post links to the mercurial commit so you can see the diff, and describes what the hunks of the diff do. I described what happened in commit 843, then commit 844, then the next one that touched ifconfig was commit 852, and so on.

The first cleanup I checked in in stages was uuencode. I went back and described those stages in three posts (one two three).

My descriptions since then are tagged with [CLEANUP] in the title, so if you want to follow along in the archive, those are the posts to read. (Date view may be the easiest way.)


April 16, 2013

Sleep. Sleep would be good. I should try that sometime.


April 10, 2013

Somebody asked if I'm still working with funtoo, and I answered negatively. But perhaps I should give more context.

I started trying to bootstrap Gentoo for Aboriginal Linux years ago, but unfortunately the Gentoo project was badly damaged by Debian refugees landing on it around 2006.

The Gentoo community got squashed by a flood of Debian developers fleeing their project's paralysis during the "debian stale" years (3.0 in 2002, 4.0 in 2007) and brought the acrimony with them. When Ubuntu bled off the pragmatists to a new fork, the FSF zealot idealists left behind fought each other to a standstill (pearl-clutching about iceweasel and such), the project nearly constipated itself to death, and a subset of its developers went "hey, gentoo's doing actual engineering without endless flamewars... well it was before _we_ got here, now it's another perpetual flamefest. Heh. Go figure." They bogged off after a few years (Ubuntu started sponsoring permanent Debian developer positions to get its parent project unclogged) but Gentoo development never really recovered because debian snuffed out the sense of community it used to have.

I was hoping Funtoo could provide the condensation nuclei for a new community, but it didn't work out. (For me. Your mileage may vary.)

On a technical level, I've poked at gentoo bootstrapping a number of times and found some deep technical problems with it. The immediate problem was that nobody in the surviving community really understood Stage 1 anymore, or was willing to explain it if they did. They'd reverted to a "gentoo builds under gentoo" model where setting up portage and a gentoo base build environment was black magic. What documentation they did have assumed you already knew what you were trying to look up.

But when you dug under that, every gentoo ebuild file explicitly lists every architecture that package supports. Meaning if you want to support a new target (as the qualcomm hexagon guys were doing), you have to touch every single ebuild file in the entire tree, to add your new architecture to all of them. (Or, as they did, just make the x86 architecture an ancestor of your non-x86 architecture. Which is cheating, but the build system was too broken for any clean solution other than a complete rewrite.)

This completely unnecessary design assumption runs directly counter to what I'm trying to get Aboriginal bootstrapping to do. I'm building packages for whatever the current host architecture happens to be. I want the build to be architecture-agnostic, but just resolving the portage manifest means following a symlink to an architecture definition file that #includes a bunch of sub-architecture definition files that eventually get around to including individual ebuild availability lists. They make a big deal about having a top level portage configuration file in /etc that you specify your architecture tuple and such in, but this turns out to be a thin layer of tuning on top of baked in assumptions throughout the portage tree.

Their entire build is designed around NOT letting me use it in any sort of flexible manner. Even _more_ so than rpm or dpkg based repositories where building from source wasn't a priority so they didn't put so much effort into imposing One True Way to do it.

By the way, Google's Chrome OS (the not-android thing they have fighting a civil war in house with the android guys) is Portage based. As outsourced to Canonical and then brought back in-house. I couldn't wrap my head around it when I tried, it was kind of horrid. But the reason it was horrid is forking gentoo and doing your own portage-based build with different packages was hugely non-obvious. (So they did a preprocessing layer that... let's just say the One True Way got defeated. By brute force leaving piles of debris everywhere.)

So yeah, I had high hopes for Funtoo throwing all that out and starting over, and Daniel Robbins was working on it and I was trying to help until the irc incident. Except the "throwing it out and starting over" never quite happened because their community wasn't big enough, so they decided to leverage the existing portage tree from gentoo. And then never got away from it, that I saw. Oh well. (Perhaps they've made great progress since I stopped paying attention, and I just hadn't heard.)


April 6, 2013

A longish recounting of Friday's adventure. Feel free to ignore if you don't want to hear about my aches and pains, but I had a stressful thing and need to vent.

Living in the habitrails of St Paul, everything's right next to each other. My commute to work is slightly longer vertically than horizontally, and my dentist is at the end of the hall going the other way.

Around eleven I had dental work, an hour and a half of prep work (cleaning and fluoride varnish and stuff) for all the cavity filling they have to do over the next few weeks. This involved an anesthetic mouth rinse that didn't anesthet as much as it should have, but then I have a history of weird reactions to anesthetics. (Nitrous oxide makes me feel like my entire body's being electrocuted, for example.)

This ended shortly before my 1pm meeting, and I still had a hard time talking in said meeting, then headed to the convenience store for some bananas and an energy drink (I'd missed lunch).

I stopped by my apartment's break room on the way back from the convenience store to check my email, so I was looking at black text on a white screen when my vision greyed out for a moment (like I'd stood up and wasn't getting enough blood to my brain), and when it came back a blob-shaped quarter of the screen was just white with no text on it. It moved when I changed where I was looking, and it was there symmetrically in both eyes, so I knew it was a brain thing and not an eye thing. (And that it was basically "I can't see this bit but my brain is treating it like the blind spot everybody has in each eye and editing out the bit I can't see, stretching the edges and doing a flood fill or something".) Deeply freaky; I stood up to look out the window and see what something well lit but non-computer would look like, and just walking around a bit made it clear up.

Even though the problem cleared itself up in maybe 30 seconds, I went next door and asked the dentist's office where the nearest hospital was. They showed me a building three blocks away out the window, and when I said I wanted to walk rather than take a cab one of them escorted me to the parking lot. (The habitrails do not go to the hospital, it involves going outside.)

I walked about a block, and the area of my vision that had been glitched before started to re-glitch, only this time it was a sort of crosshatch texture overlaid on what I was seeing, which got more opaque and expanded off to the left. Even though walking fixed it last time, this time it seemed like walking was making it worse. (I was thinking "is this a blood vessel is blocked kind of stroke, or a blood vessel is leaking kind of stroke".) At this point, I stopped under an intersection street sign and called 911. A thousand dollars to go two blocks vs chance of permanent blindness: I'm paying the money.

While waiting for the ambulance to show up (fire truck showed up first for some reason), the visual distortion got worse, growing into a giant crescent shape taking up the left 2/3 of my vision in both eyes. It was bad enough I couldn't read anything (not even the really big letters in the advertisement on the window across the street) except out of the right side of my eyes. And here I am thinking "if I can't read, I can't work", and that if it was a blocked blood vessel they could use the clot busting drugs any time in the next hour, and if it was the rupture sort of stroke lowering my blood pressure might slow the advance while I still had some vision left so I was sitting down and trying to stay as calm as possible.

So the ambulance shows up and starts asking me which hospital I want to go to. What on earth kind of question is this to someone who THINKS HE'S HAVING A STROKE? I told them I'm from out of town, don't know my options, that I'm aware of the research about hospitals in the same city costing ten times as much for the same thing but I am not an informed consumer here and just take me to the nearest one that can deal with the symptoms I'm describing please. (I might have been babbling at this point but _dude_.)

So we get to a hospital, and they wheel my gurney into a hallway. By this point, my vision has cleared up a bit, meaning I'm afraid to move because walking made it worse and lying still made it clear up and I had no idea what was going on and if this might just be a precursor to a sudden BIG stroke or what. Yet another person asks me what's going on and I describe my symptoms, trying not to use words like "fovea centralis" because that's their domain and not mine and I'm HAPPY to leave this to the professionals.

At this point, they do a bit of triage. The ambulance guys had checked my blood sugar and blood pressure (both fine), but a nurse came and checked my blood pressure again. (Yes, if I'm having a leaking blood vessel in my brain by all means constrict my arm to force more blood to my brain, thanks. I guess the ambulance guys didn't pass on any data?) Somebody did the "can you track my finger" test, which I probably could have done even if I'd still had just the 1/3 vision while awaiting the ambulance so I'm not sure what it proved. (Not completely blind yet, nope.) A receptionist came and took my driver's license and insurance card. And that was it: I sat there.

I spent the next half hour in that hallway while they tried to clear a room for me to go into. In the first room they told me to change into a hospital gown, then gave me three very small blankets because it was really cold in the gown. (Still Minnesota.) Then they moved me to a second room, because there was some unspecified thing wrong with the first one. (During this move my glasses got lost for about half an hour.)

After some more waiting in the new room, a nurse and then doctor finally showed up to actually look at me. (I'd been there about an hour at this point.) Since I was having a neurological problem that presented symmetrically in both eyes, they looked at my eyes. Checked for retinal detachment (none, good to know), later wheeled in a machine to check for glaucoma (none). Wanted very much to do the eye chart test with me but couldn't until my glasses turned up.

An early test the doctor did involved shining a bright light in my eyes, and the afterimages didn't want to go away, and after a couple minutes they started flickering and wobbling in a fairly disconcerting way and I went "oh no don't start again" and the doctor wanted to know why, and I said "I hope it's not doing it again". So she left. A minute or so later I hit the nurse call button because it was doing it again, but this time spreading out to the _right_ side of my vision. Luckily the left side had recovered to be nice and sharp and clear and high resolution again, so I wasn't as panicky as I could be because there was a reasonable chance this would recover the same way eventually. But I was trying to explain to the doctor that I could hold my fist up and not see it, and she kept checking the edges of my vision where I could start seeing her wiggling fingers, and I'm trying to explain "there's a large hole in my vision but it's not AT the edge, I can see around the side of it, the test you're doing is not helping" but she kept performing the useless test anyway. (That pretty much sums up the entire visit, actually.)

Sometime after that the nurse showed up with the glaucoma test machine, and around then I found my glasses (they got mixed up in the blankets) so they could do the eye test. (Reading the eye chart with "E" at the top which I try not to memorize but reading the same chart with the second eye is just SILLY.) During all this, the vision on my right side gradually cleared up the same way the left side had. Still no MRI or CAT scan in the offing looking at the bit going funky, they kept investigating my eyes, which even I could tell were clearly not the source of the problem.

At about this point I went from panicked to bored, and started trying to diagnose myself. I knew before I arrived that it was a brain problem not an eye problem (I have two different eyes and this was presenting exactly the same way in both at the same time), so every single test they'd done so far was useless. And it was seeming less like a stroke because strokes generally don't move around like that, the damaged blood vessel corresponds to a specific physical area of the brain. And this was all over the place, and then cleared up again fairly quickly and completely (again, not stroke-like. A blocked blood vessel can unblock, _maybe_ a clot was moving around and re-lodging somewhere else, but... not likely.)

About this time I started to get a headache. It was just like the headaches I've had on the right side of my head for years now (which I've been blaming on sinus troubles), except this one felt like something was stabbing the back of my _left_ eyeball, which was new.

This made me remember my friend Adrienne's tweets about having a "scintillating" something visual effect preceding her migraines. She had the scintillating whatsis once and had to pull over while driving. So I used my phone to look it up and the wikipedia picture actually looked a bit like what I'd been having. Which came as something of a relief at that point, because I'd BEEN THERE FOR THREE HOURS already, and they hadn't even run any tests that would be relevant to anything going on in my brain, and even if they started at this point if I HAD been having a stroke it would probably be too late to avoid permanent damage. Strokes are one of those "every minute counts" things, I started straight for the hospital when I noticed something wrong (walking three blocks is faster than waiting for a taxi or ambulance), I called an ambulance to go two blocks rather than exacerbate a problem that walking seemed to be making worse, and then when I got to the hospital I sat for three hours while they twiddled their thumbs and tested things I'd been able to rule out before heading for the hospital in the first place.

So at this point it didn't seem like a stroke anymore, and even if it had been, it would be too late to do much about it. (Stroke, like heart attack, is a "fix it before cells die" thing. Waiting three hours is bad. Although cell death through resource exhaustion is different from autolysis, so no idea what the actual time limits are.) The hospital itself was being completely useless, so I got back in my clothes and walked out to the receptionist to ask how I check myself out.

This made them get a senior doctor to come talk to me. He agreed that scintillating scotoma sounded like a reasonable interpretation of my symptoms. He said he'd like to run an MRI anyway, but agreed it didn't seem like a stroke. After three hours of stress I just wanted to go home, so he suggested I get a second opinion from a follow-up physician and see if they thought I should get an MRI. And he scheduled me a follow-up not with a neurologist, but with an ophthalmologist.

Sigh. IT IS NOT AN EYE PROBLEM. IT IS A BRAIN... right.

Adrienne picked me up from the hospital, and I slept on her spare bed for ten hours longer than I'd planned (we agreed I should be observed rather than alone, and I was _TIRED_), and the next day we went to visit my sister like I'd been planning to do before all this. I lost half a day of work, but as far as I'm concerned that was a marvelous outcome because I DIDN'T GO BLIND. Yay not going blind.

I am disappointed in the hospital. I have lots of random trivia passing for knowledge, just enough to be dangerous in all sorts of areas, but I am NOT a domain expert outside of some small niches and have great respect for domain experts. I'm often the guy you call to figure out where a computer problem lives so you can call in a specialist to actually fix it. Having to diagnose myself in a medical context is CREEPY and WRONG. The fact that I could was just lucky. The hospital being so woefully understaffed that if something had been going seriously wrong I'd have been screwed was not reassuring.


April 5, 2013

That was a far more interesting evening than I expected.

I had a thing called a "scintillating scotoma", which is a visual glitch that can precede a migraine, and makes you think you're going blind because half your visual field looks like television static viewed through plastic wrap. But it's not a stroke, just blood vessels dilating wrong, and thus transient. Thank goodness. (I had two of 'em, each lasting about half an hour, with an hour gap in between.)

It took about 3 hours to diagnose this. (Hospitals around here are _really_ understaffed.) Never had a migraine before, I'm guessing it was a reaction to the dental anesthetic but honestly it could be anything.


April 2, 2013

Got an Aboriginal Linux release out.

It was qemu. 1.2.0 built all the targets reliably. Sigh. Had to go back _two_ release versions to make the funky intermittent bug go away.


March 31, 2013

Sigh. Conflicted. On the one hand, I love having contributors to toybox. I am _grateful_ that people are interested enough in the project to contribute to it. On the other hand, my past week's hobby programming time (what little time and energy I have left over after work and life, plus the demands of Aboriginal Linux, kernel documentation, keeping vaguely informed about projects like qemu, musl, linux from scratch...) went to cleaning up the uuencode/uudecode submissions from Erich Plondke.

Erich is a good coder. (I met him working at Qualcomm, he's the lead architect of the Hexagon chip.) He's the kind of developer any project is lucky to attract even passing attention from. But I still took the submitted uuencode from 116 lines (2743 bytes) in 7 functions to 67 lines (1440 bytes) in 1 function, and uudecode from 175 lines (4534 bytes) in 9 functions to 107 lines (2300 bytes) in 1 function. I know I'm _not_ a better coder than him, so the obvious answer is that he simply didn't care about that sort of thing.

Of course the question is "should I care". The submitted uuencode and uudecode worked fine, and I've saved what, a single digit number of kilobytes in the generated binary? (I haven't checked.) I _think_ my version is easier to audit simply because there's less of it, but the previous one wasn't hard to. The obvious question is "was spending a week on that worth it?"

But the reason I'm doing toybox is to do a _better_ job than what's there. I want to produce the best implementation of each of these commands that I can. So how much of my definition of "best" is an illusion?

I just got an ifconfig submission, originally touching something like eight files (adding half of them to lib, two under toys, and touching some top-level headers). I spent the time to glue it together into one big file and threw it in pending, but I don't even know where to _start_ cleaning it up. Just like I haven't even started cleaning up xzcat. (It's not that I can't, it's that it'll take a couple days just to _triage_ this properly.)

But... presumably it works as-is? I could just... leave it? Cleaning up two small, simple, already fairly clean commands just took a week. Not a week of actual programming time, but a week of the time I had to work on it. I haven't gotten back to "mount" (which I've been meaning to find a week to knock out since last june), because every time I sit down there are other things to do. I have two contributed syslog implementations and that's an _easy_ one once I figure out the design issue (nothing specifies the log levels, are they actually stable on Linux or can I harvest 'em from the header during configure or what? I got the signal names dealt with...) The find submission is an easy cleanup. Somewhere I have links to clean gzip/gunzip implementations I should integrate. Then I've got to implement bc to clean up after Peter Anvin's ongoing quest to complicate the linux kernel build. I was working on test, _and_ got a submitted test that's probably only 3 times the size of the one I'd write myself...

Sigh. I'd like to delegate all this to somebody smarter than me. I know they're out there. I'd love for somebody else who's a better coder than I am to submit things that I honestly can't figure out how to improve because they're THAT GOOD. I don't want to discourage contributors. I don't want to be a bottleneck in development. I just spent a week taking code that worked fine and making changes to it most people will never even notice.

But I notice. After I left busybox, Denys dismissed this sort of thing as a difference in coding style, and maybe that's all it is. I thought I was talking about the "ifdef considered harmful" paper and the linux-kernel coding style guide about avoiding ifdefs in the C code, and that doing #if ENABLE defeats the purpose of the ENABLE macros I introduced. I guess the main difference between him and me is I thought that was important, or at least interesting. But what if all I'm doing is making the code aesthetically _pretty_ to exactly one person and nobody else?
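
To make the distinction concrete, here is a minimal sketch of the two styles side by side. ENABLE_DEMO_LOG and both function names are hypothetical, invented for this illustration; they are not symbols from busybox or toybox.

```c
#include <stdio.h>

// Hypothetical config symbol; assume the build system defines it as 0 or 1.
#define ENABLE_DEMO_LOG 0

// Style 1: preprocessor conditional. With the symbol set to 0 the compiler
// never sees the body, so a typo or stale variable name in there goes
// unnoticed until somebody switches the option on.
int twice_ifdef(int n)
{
#if ENABLE_DEMO_LOG
  fprintf(stderr, "twice_ifdef(%d)\n", n);
#endif
  return n * 2;
}

// Style 2: ordinary C conditional on a compile-time constant. The body is
// always parsed and type checked, then dropped by dead code elimination
// when the constant is 0.
int twice_plain(int n)
{
  if (ENABLE_DEMO_LOG) fprintf(stderr, "twice_plain(%d)\n", n);

  return n * 2;
}
```

Both versions should compile to the same thing when the option is off. The difference is that the second one keeps getting compile-tested in every configuration.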

But I can't _not_ do it and still have any sense of direction for the project. I guess if I'm wasting my time, it's mine to waste. The FSF zealots made sure I wouldn't get paid for any of this, and thus be able to afford to spend more time on it instead of working around a day job doing unrelated stuff. If the result's an idiosyncratic art project: oh well.


March 28, 2013

One of my perl removal patches replaced the kernel's mktimeconst.pl with a C implementation, using the same makefile infrastructure that generates CRC tables. The patch series has been doing that since 2010. Before that it replaced it with a shell script.

So of course once I finally started getting attention for the perl removal patches, H. Peter Anvin replaced his perl script with a "bc" implementation, because his life's goal is to insert extraneous dependencies into the kernel build and solving this problem with C just wouldn't _do_.

Nothing else uses bc. When I say "nothing" I mean the Linux From Scratch guys had to add "bc" to build the new kernel, because the rest of the LFS packages never once used bc for anything. Busybox is 15 years old and nobody's ever SUGGESTED it implement bc, that I'm aware of.

But technically it's posix. A turing complete math language that supports arbitrary precision fractional exponentiation is the _perfect_ thing to do a couple 64 bit divisions with. I mean it's such an obvious choice.

So now I'm trying to figure out if I should keep patching Peter Anvin's insane overcomplexity out, or implement bc in toybox. Hmmm...

Alas, doing a 64 bit math version of bc would solve the problem and probably nobody would ever notice the difference, but the _point_ of bc is to do arbitrary precision math. I know how to do it for the four basic math operations but implementing "raised to the power of .3176252" I need to look up how to do. (Wikipedia[citation needed] probably has an opinion. Might even be right, if I catch it at a lucky moment in the edit wars.)


March 27, 2013

Slowly cleaning up Erich Plondke's uuencode/uudecode submissions, and I hit a weird one. In uuencode, the "historical" algorithm says to basically chop the input into 6 bit chunks and add 0x20 to each one, so you get a character from 32 to 95. Except that ascii 32 is space, so the result is full of significant whitespace, which everything breaks.

In reality, everybody adds 0x40 to the space value to get 96 (a backtick), so the range is 33 to 96 and then you & the result with 0x3F during processing (which actually happens anyway internally as part of the mask and shift, so you get it for free). But... that's not what posix says.
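
A minimal sketch of that character mapping in C. The function names are made up for illustration and are not the code in toybox. A 6 bit value is 0 to 63; the historical form adds 0x20 to everything, while the common form sends 0 to a backtick (0x60) instead of a space, and the decoder's mask accepts both.

```c
// Historical mapping: add 0x20 to the 6 bit value, so 0 becomes a space.
char uu_enc_historical(unsigned v)
{
  return 0x20 + (v & 0x3f);
}

// Common mapping: same, except 0 becomes '`' (0x60) to avoid whitespace.
char uu_enc(unsigned v)
{
  v &= 0x3f;

  return v ? 0x20 + v : 0x60;
}

// Decode either form. Subtracting 0x20 turns '`' into 0x40, and the 0x3f
// mask folds that back down to 0.
unsigned uu_dec(char c)
{
  return (c - 0x20) & 0x3f;
}
```

Since the mask is part of the decoder either way, accepting the backtick form costs nothing extra on the decode side.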

I'm growing increasingly disappointed with posix. Some old farts on the Austin mailing list are going "the C locale can't possibly support unicode, that's blasphemy!" despite the C locale currently supporting unicode, they just hadn't noticed. They keep talking about "certified unix", like that means something in the 21st century (Irix! UnixWare! Ultrix!). And implying that Ken Thompson inventing something and Linus Torvalds calling it the only sane solution to the problem doesn't mean anything, because these old farts know better. Odd.


March 25, 2013

Ok, things that could be causing this weird intermittent aboriginal failure:

It's probably _not_ the binutils upgrade because that was a couple releases back (first in 1.2.1, and 1.2.2 is out since then). Admittedly I haven't been as diligent about redoing the full LFS build since my server died (dinky little netbook takes about a day per target to natively build Linux From Scratch under qemu), and this is an intermittent problem, but I still think I'd have noticed before now. Besides, it's mostly breaking during compiles and not links.

Roll back to qemu 1.4.0 release version and try armv5l (fails most reliably), and that reproduced the failure. Try going back to 1.3.0...

chmod 755 ../../lib/auto/POSIX/POSIX.so
cp POSIX.bs ../../lib/auto/POSIX/POSIX.bs
chmod 644 ../../lib/auto/POSIX/POSIX.bs
make[1]: *** [all] Segmentation fault
make[1]: Leaving directory `/home/perl/ext/POSIX'
Unsuccessful make(ext/POSIX): code=512 at make_ext.pl line 449.
make: *** [lib/auto/POSIX/POSIX.so] Error 25

So I'm guessing it's _not_ a qemu issue then? (Well, 1.3.0 was December, so maybe I wouldn't have noticed, let's try 1.2...)


March 24, 2013

The i586 and i686 LFS builds finished, but the i486 tar build failed with:

  GEN    wchar.h
  GEN    wctype.h
make  all-recursive
make[3]: Entering directory `/home/tar/gnu'
make[4]: Entering directory `/home/tar/gnu'
  CC     areadlink.o
distcc[19717] ERROR: compile areadlink.c on 10.0.2.2:9243/1 failed
distcc[19717] (dcc_build_somewhere) Warning: remote compilation of 'areadlink.c' failed, retrying locally
distcc[19717] Warning: failed to distribute areadlink.c to 10.0.2.2:9243/1, running locally instead
In file included from areadlink.c:32:
./stdlib.h:71: error: redefinition of 'struct random_data'
distcc[19717] ERROR: compile areadlink.c on localhost failed
make[4]: *** [areadlink.o] Error 1

It actually failed _twice_ (once via distcc and again locally), so it looks like the preprocessor consistently produced bad output? The i586 and i686 builds don't have the phrase "remote compilation" in their outputs, so it's not a transient problem. (Thank goodness; those suck to debug.)

Except... it _is_ a transient problem, in that I re-ran the i486 build and it completed, all the way through vim (last package). Same target, same software, different behavior.

Uninitialized variable? Weeeird.


March 23, 2013

In my toybox talk at CELF I mentioned that containers aren't quite ripe yet. The example I usually give of the tricky corner cases is how writing to drop_caches under /proc in a container shouldn't cause a systemwide latency spike, but that was actually an issue from 2010.

So far this month the container guys added a new feature that opened up a security hole where creating a container let you crack root on the host (due to a bad flag combination that let your container modify mount points in a shared filesystem, so bind mount /etc into the container where you're root and then bind mount /etc/passwd in there and su root on the host.) And they managed to create cross-linked directories in /proc which led to a fascinating post explaining why you can't hardlink directories (because lock ordering is based on ancestry, different path traversal to get to the same dentry means different lock order).

In other news, it looks like capability bits may finally be collapsing under their own weight, although sanity rather than doubling down is probably too much to hope for. (Security is hard, but bureaucracy is not the same as security.)


March 22, 2013

Sigh. Remember the weird intermittent failures I was getting in the native builds, not in any one package but randomly all over the place? And yet the package would build if I rebooted and tried it again, and a chroot build ran to completion three times in a row?

Bit early to be sure, but it really looks like it was a qemu bug.


March 21, 2013

On twitter somebody asked my opinion about an article titled Ramdisk Versus Ramfs - Memory usage issues, and my response was "That's a fairly extensive yet misguided analysis from someone who has no clue how the system works."

There are four types of filesystems: block, pipe, ram, and synthetic. These are respectively filesystems backed by a block device, filesystems backed by a protocol written over a pipe to some other process, filesystems that store their contents in ram, and filesystems that make up their contents at runtime.

A ramdisk is a block device that stores its contents in a fixed-size chunk of memory. Creating a ramdisk and mounting a ramdisk are two different things. Like all block devices, in order to be mounted a ramdisk has to be formatted with a filesystem and interpreted by a filesystem driver, which reads from it by copying data into the disk cache and writes to it by copying disk cache data back to the block device.

What ramfs does is it mounts the disk cache itself as a filesystem, storing its directory entries in the dentry cache and the file contents in the page cache. The disk cache memory is the _only_ copy of the data, which stays pinned in RAM with no place to go. (There's no "backing store" to flush things to when the system tries to free up memory, so it never expires. The ramfs derivative tmpfs is an extension that can use the swap partition if you've got one, like any other swappable memory.) So when you write files into ramfs it allocates more memory, and when you delete them it frees that memory, and this is mostly just a clever re-use of existing VFS infrastructure that every other filesystem needs anyway so there's very little overhead.

This ramdisk approach is less efficient because the ram block device uses a fixed-size chunk of memory that doesn't grow or shrink based on usage, plus wasted space due to formatting, then it needs two copies of anything you actually use (the copy in the block device and the copy in the disk cache), and to top it all off the filesystem driver you mount it with is a glue layer to parse the format and copy the data into and out of the page cache.

Once upon a time ramdisks were the only option, but now they're essentially obsolete. (If you want to make a filesystem image, use the loopback driver which treats a normal file as a block device.)

So that was my objection to the article: it was written without understanding that not all filesystems are block-device backed. There's no block device behind a ramfs any more than there's a block device behind /proc or /sys. (Mounting an ext2 partition on a hard drive requires two drivers: a block device driver such as SATA or USB to provide the block device, and a filesystem driver such as ext2 or ext4 to parse the format. When you mount sysfs the sysfs driver makes up the contents of the filesystem as it goes along, and the "device" field of the mount syscall is ignored.)
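
One place a running Linux system exposes this split is /proc/filesystems, which prefixes each registered filesystem type that doesn't want a block device with "nodev". Here is a rough sketch of reading it in C. The function names are invented for this example, and "nodev" lumps the pipe, ram, and synthetic categories together, so it only shows the block versus non-block distinction.

```c
#include <stdio.h>
#include <string.h>

// Parse one line of /proc/filesystems, which looks like "nodev\tramfs\n"
// or "\text2\n". Copies the filesystem name into name[] and returns 1 if
// the type is flagged nodev, 0 if it expects a block device, or -1 if the
// line has no usable name.
int parse_fs_line(const char *line, char *name, size_t len)
{
  int nodev = !strncmp(line, "nodev", 5);
  const char *s = line + (nodev ? 5 : 0);
  size_t n;

  while (*s == ' ' || *s == '\t') s++;
  n = strcspn(s, " \t\n");
  if (!n || n >= len) return -1;
  memcpy(name, s, n);
  name[n] = 0;

  return nodev;
}

// Print every registered filesystem type and whether it needs a block
// device. Returns 0 on success, -1 if /proc/filesystems can't be opened.
int list_filesystems(FILE *out)
{
  char line[256], name[64];
  FILE *fp = fopen("/proc/filesystems", "r");
  int rc;

  if (!fp) return -1;
  while (fgets(line, sizeof(line), fp))
    if ((rc = parse_fs_line(line, name, sizeof(name))) >= 0)
      fprintf(out, "%-12s %s\n", name, rc ? "no block device" : "block device");
  fclose(fp);

  return 0;
}
```

Calling list_filesystems(stdout) on a typical system should put ramfs, tmpfs, proc, and sysfs on the "no block device" side, with things like ext2 on the other.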


March 20, 2013

My talk from CELF/ELC went up. I'm fairly proud of this one. It's about how the smartphone is replacing the personal computer (the way the PC replaced minicomputers and mainframes before it), what that means, and what I'm trying to do about it. (The outline's here.)


March 19, 2013

Sore throat, runny nose. Yeah, I had multiple children crawling on me for 3 days.

Now I'm having weird problems where the native build is breaking, but not in the same _place_. I run it twice and get two different breaks. I _think_ what's happening is suspending my netbook while a build is happening confuses qemu, or possibly distcc. Unfortunately, my netbook takes more than 12 hours to build LFS under qemu, and it's hard to leave it unperturbed that long when it's my primary machine and I have to suspend it to move it. I suppose I could disable distcc but then it would take even longer to build. (Well, not the perl build. Which is probably about half of it, due to the complete lack of acceleration running native perl code to produce more perl code has.)

I upgraded toybox, busybox, and the kernel. There hasn't been a uClibc release this year and the toolchain's same as last release. So what's going on? Hmmm...


March 18, 2013

Snowed in at my sister's, but three niecephews are in school and the one who has a snowday is asleep on the couch.

Hardwired the darn irq to 59. There's a point where working out what the math is _supposed_ to be and why it used to work out one way but doesn't now is just too many compile and test cycles for this netbook.

Two different people submitted uuencode implementations at the same time. An embarrassment of riches. The pending directory is piling up a bit, I need to do all that review and polishing. My current head-scratcher is "who", the last "default n" thing outside the pending directory, which works fine for what it does but supports no options at present and posix says it should do "-abdHlmpqrstTu". Do I sit down and implement all of that before changing default y? Do I move it to "pending"? Do I document a variance from posix that we're not in the minicomputer world anymore so having multiple people logged into the same container is now fairly rare? (That's about the same logic by which I didn't bother with posix "talk", "mesg", and "write".)

And really, looking at the who spec, what's this "who am i" thing? There's a whoami command (already in) but when I do "am i" as arguments to Ubuntu's who I get no output. The "who -r" option prints the runlevel of init, which makes no sense for any init except the system V one. (Note: posix DOES NOT SPECIFY INIT.) "who -t" shows the last change to the system time clock, something Linux doesn't track (and on ubuntu shows nothing). "who -l" says to show lines on which the system is waiting for someone to log in, I.E. the serial terminals (modem pool) attached to this minicomputer. That's a no. "who -d" shows "processes that have expired and not been respawned by the init system process" which again requires a bit of knowledge of things like upstart. (Apparently my system has one, it's "pts/33 2013-03-07 20:56 13206 id=s/33 term=0 exit=0" and I have no CLUE what that means. I have no pid 13206 at present, not even a zombie.)

Ok, so who -lmpt don't produce any output on ubuntu, -bHq seem vaguely useful, -s is the default, -a is a chord of other options, -dr require knowledge of init, and -Tu produce the same info (both about tty7 which is the x11 process, and we have an "uptime" command).

Out of curiosity, I checked busybox who. The only option it has is "-a" which shows all the obsolete ctrl-alt-f1 style /dev/tty# text consoles, and pts33. (I have a dozen terminal windows open in six desktops each with buckets of tabs, there are WAAAAAAAY more ptys in use than that. Ok, what IS it with pts/33? Groveling around in /proc/[0-9]*/fd to see what points to that, it's the shell running a vi instance editing todo.txt. I note that I'm editing THIS file in vi and it's not showing up. What the...?)
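
The /proc/[0-9]*/fd search is easy to do from the shell, but for reference here is a rough C equivalent as a sketch. The function names are invented for this example, and it silently skips any process whose fd directory it lacks permission to read.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Returns 1 if name is all digits, like the per-process directories in /proc.
int is_pid_name(const char *name)
{
  if (!*name) return 0;
  while (*name) if (!isdigit((unsigned char)*name++)) return 0;

  return 1;
}

// Print the pid of every process that has a file descriptor pointing at
// target (for example "/dev/pts/33"). Returns the number of matching
// processes, or -1 if /proc can't be opened.
int who_has_open(const char *target, FILE *out)
{
  DIR *proc = opendir("/proc"), *fds;
  struct dirent *pe, *fe;
  char path[1024], link[1024];
  int count = 0;

  if (!proc) return -1;
  while ((pe = readdir(proc))) {
    if (!is_pid_name(pe->d_name)) continue;
    snprintf(path, sizeof(path), "/proc/%s/fd", pe->d_name);
    if (!(fds = opendir(path))) continue;
    while ((fe = readdir(fds))) {
      ssize_t len;

      // Each entry is a symlink to whatever the descriptor has open.
      // readlink() fails on "." and "..", which skips them for free.
      snprintf(path, sizeof(path), "/proc/%s/fd/%s", pe->d_name, fe->d_name);
      len = readlink(path, link, sizeof(link) - 1);
      if (len < 0) continue;
      link[len] = 0;
      if (!strcmp(link, target)) {
        fprintf(out, "%s\n", pe->d_name);
        count++;
        break;
      }
    }
    closedir(fds);
  }
  closedir(proc);

  return count;
}
```

Something like who_has_open("/dev/pts/33", stdout) would then list the pids holding that pty open.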

Yeah, switching it to "default y" with a comment.

And with school letting out early (cancelled), the horde returns...


March 16, 2013

Arm interrupt routing: it's even more broken than that.

Let's jump back to the 3.6 kernel release, where I was reverting all the irq routing stuff back to what it looked like in 3.3 or so, and both scsi and the ethernet controller worked. In that context, we got the following boot messages:

PCI: enabling device 0000:00:0d.0 (0100 -> 0103)
sym0: <895a> rev 0x0 at pci 0000:00:0d.0 irq 27
sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
sym0: SCSI BUS has been reset.
...
8139cp: 8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004)
PCI: enabling device 0000:00:0c.0 (0100 -> 0103)
8139cp 0000:00:0c.0 eth0: RTL-8139C+ at 0xc8874400, 52:54:00:12:34:56, IRQ 27

So both the scsi and ethernet were sharing IRQ 27.

After they moved the IRQ controller start from 0 to 32 (for no obvious reason), this means the corresponding IRQ is now 27+32=59. And this is the one that works.

Here are the new boot messages after the fix I checked in yesterday:

PCI: enabling device 0000:00:0d.0 (0100 -> 0103)
sym0: <895a> rev 0x0 at pci 0000:00:0d.0 irq 59
sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
sym0: SCSI BUS has been reset.
...
8139cp: 8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004)
PCI: enabling device 0000:00:0c.0 (0100 -> 0103)
8139cp 0000:00:0c.0 eth0: RTL-8139C+ at 0xd0874400, 52:54:00:12:34:56, IRQ 62

Spot anything? Like the fact that 62 != 59? And thus, when I run the native linux from scratch build and it tries to distribute compilation over distcc, we get:

=== zlib(2 of 48)
Checking for gcc...
irq 59: nobody cared (try booting with the "irqpoll" option)
Backtrace: 
[<c00113e8>] (dump_backtrace+0x0/0x110) from [<c001152c>] (dump_stack+0x18/0x1c)
 r6:00000000 r5:c02eabdc r4:c02eabdc
[<c0011514>] (dump_stack+0x0/0x1c) from [<c004d384>] (__report_bad_irq+0x28/0xb0)
... (mondo useless stack dump that just says the IRQ came from hardware)
Disabling IRQ #59
... (more stack dump)
8139cp 0000:00:0c.0 eth0: Transmit timeout, status  d   2b    5 80ff
8139cp 0000:00:0c.0 eth0: Transmit timeout, status  d   2b    5 80ff
8139cp 0000:00:0c.0 eth0: Transmit timeout, status  d   2b    5 80ff

Oddly, QEMU's scsi device still seems to be usable so at a guess they didn't actually disable IRQ 59? Dunno...

So let's annotate and see:

slot=12 pin=1 irq=62
slot=13 pin=1 irq=59

March 15, 2013

The arm breakage: arch/arm/mach-versatile/pci.c function versatile_map_irq() is failing to add 32 after the irq start moved. That's the same line they broke with the "swizzle" stuff that I've been reverting for several releases now. I checked the current kernel to see what the un-reverted line looks like, and... commit e3e92a7be693 fiddled with it again.

commit e3e92a7be6936dff1de80e66b0b683d54e9e02d8
Author: Linus Walleij <linus.walleij@linaro.org>
Date:   Mon Jan 28 21:58:22 2013 +0100

    ARM: 7635/1: versatile: fix the PCI IRQ regression
    
    The PCI IRQs were regressing due to two things:
    
    - The PCI glue layer was using an hard-coded IRQ 27 offset.
      This caused the immediate regression.
    
    - The SIC IRQ mask was inverted (i.e. a bit was indeed set to
      one for each valid IRQ on the SIC, but accidentally inverted
      in the init call). This has been around forever, but we have
      been saved by some other forgiving code that would reserve
      IRQ descriptors in this range, as the versatile is
      non-sparse.
    
    When the IRQs were bumped up 32 steps so as to avoid using IRQ
    zero and avoid touching the 16 legacy IRQs, things broke.

Yay. Where were you when the darn swizzle breakage happened 6 months ago?

Except, and this is hilarious, IT'S STILL BROKEN. Because now they're adding 64 instead of 32, so it's trying to allocate IRQ 92 which is once again way out of scope.

The SCSI controller's IRQ used to be "hardwired 27". That's what QEMU implemented, that's what worked for years. Then the kernel guys added this "swizzle" nonsense which did the math wrong and tried to allocate irq 28, which wasn't there. Now they've added 32 to the IRQ controller start and 64 to the mapping, so they're trying to allocate irq 92, which isn't even close. (Correct answer: 32+27+((slot+pin)&3) = 59. Except it's probably more like (32*(slot+1))+27+(pin&3) if you want to support both interrupt controllers but I DON'T CARE at this point because qemu doesn't emulate that and obviously nobody's got real hardware to test this on anymore.)

Note that the "swizzle" breakage (commit 1bc39ac5dab2) was really stupid. The heart of it was adding a new 'swizzle' function, and changing map_irq() like so:

-       irq = 27 + ((slot + pin - 1) & 3);
+       irq = 27 + ((slot - 24 + pin - 1) & 3);

Stop and ponder that new line for a while. They're subtracting 24, and then anding the result with 3. So the change to the line was actually a NOP, and it was the addition of an unnecessary "swizzle" function (whatever that is) adjusting the result that broke it.
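
The reason it's a NOP: 24 is a multiple of 4, so subtracting it can't change the low two bits that the & 3 keeps. A quick brute-force check in C, as a sketch. The function names are made up here, the 0 to 31 slot range and 1 to 4 pin range are assumptions about what PCI can hand the function, and it assumes the usual two's complement ints since slot - 24 goes negative for small slots.

```c
// The mapping line as it was before the swizzle commit.
int map_irq_old(int slot, int pin)
{
  return 27 + ((slot + pin - 1) & 3);
}

// The mapping line as the swizzle commit left it.
int map_irq_new(int slot, int pin)
{
  return 27 + ((slot - 24 + pin - 1) & 3);
}

// Returns 1 if both versions agree for every slot 0..31 and pin 1..4
// (assumed ranges), 0 at the first mismatch.
int lines_agree(void)
{
  int slot, pin;

  for (slot = 0; slot < 32; slot++)
    for (pin = 1; pin <= 4; pin++)
      if (map_irq_old(slot, pin) != map_irq_new(slot, pin)) return 0;

  return 1;
}
```

If lines_agree() returns 1, the two lines are interchangeable over that range and the behavior change has to come from somewhere else in the commit.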

They followed up the swizzle breakage with moving the IRQ controller start but not the map_irq(), so it was still requesting 28 but now that was outside the IRQ controller range. And then to fix it, made the map_irq() add 64 (not 32). That's three consecutive failures of simple arithmetic, which they never checked the result of even though qemu supports it and worked for years before they touched it. Their most recent patch does not even match its own description of the problem.

Welcome to embedded Linux development.


March 14, 2013

Years ago (late 90's? Early 2000's?) I read an article about a researcher pondering the general lack of nutrition in eucalyptus leaves, who worked out what a balanced diet for a creature with a Koala's general physical characteristics would actually _be_, and force fed a koala that diet (they won't voluntarily _not_ eat eucalyptus) for a few weeks to see what would happen.

The result was basically something out of a horror movie: a hyperactive, aggressive, extremely strong, territorial, vicious predator with nasty claws and a tendency to leap at anything that moved and several things that didn't. The researchers' conclusion was that koalas are a race of junkies, and that long ago a species of animals as vicious as anything else in australia got stoned eating eucalyptus leaves and was willing to live perpetually malnourished to stay that way, and after a lot of additional evolution to cope with being permanently stoned, if you remove the drugs they go psychotic.

I keep being reminded of this by @speedysays on twitter (the author avatar of A Girl and Her Fed is one of her characters, a vicious talking koala), but I can't find that article anymore. It was in the online version of some australian newspaper, I bookmarked it in the browser I had at the time (might have been back when I worked at IBM, which would be 1998?), but that's long gone. Google can't find it, I'm not quite sure what to search for. It's _probably_ in archive.org somewhere, but unless I had the exact link...

Another article I read (more recently, I think, mid to late 2000's? It was after I bought the condo; I remember WHERE I was when I read it) was a scientific article about the possible origin of life on earth being undersea zeolite deposits near volcanic vents. Zeolite is a mineral that naturally has cavities about the size and shape of living cells, and all living cells have a "membrane charge" that powers a lot of their internal processes (like a battery), and one of the functions of ATP and such is to replenish this charge. Zeolite deposits near volcanic vents would naturally attract all sorts of chemicals out of the water (they're natural filters), and with the right chemicals or thermal gradients in the water they would accumulate ion differentials. The point was you could get something pretty darn close to the inside of prokaryotic cells just from what's lying around in such an environment, which could send virus or spore-like bits of itself to adjacent zeolite pockets, and become quite complex before having to evolve a membrane to create its own shell to take with it into the larger world.

I was reminded of this by another random article about how the largest ecosystem on earth is bacteria miles under the ocean floor exploiting chemical reactions between rocks and ocean water (this exhausts the chemicals in the rocks but plate tectonics provides a constant fresh supply as the old stuff gets subducted and seafloor spreading replaces it).

The obvious conclusion is "oh, the chemicals in the charged zeolite pockets didn't need to evolve a membrane to spread, they just had to learn to carve out new pockets in other types of rock, membranes probably came much later". Given what we know from the fossil record, life arose on the planet 2 billion years before we got anything multicellular, and the nature of fossils is we don't have things like cell membranes recorded because they don't really fossilize. If the same set of chemicals "lived in caves", how would we tell? Especially if it wasn't something we were previously looking for...

I can't find this article either, but this one I can at least dig towards, via wikipedia. It seems I'm remembering some of the original coverage of Mike Russell's work, except he was using the word "Olivine" and I remember the word "Zeolite". (Apparently very similar minerals...) Possibly the article I read was something like this that was up free for a while and has gone behind a paywall now? I can find other interviews with the guy, but not the article I read. (If it was him.)

Still, I can at least find enough to prove I wasn't imagining it, but only after digging enough to find better search terms. I can't find any hint of the koala experiment, only notes that the koala population is crashing, and you can't get permits to do much of anything with them these days, you have to release them into the wild if they're healthy enough to possibly survive there.

It'll be a moot point once they're extinct, and like bananas, I'll miss them once they're gone. (Then again I apparently already missed the really good bananas. Somebody really ought to tell the mythbusters that their "can you slip on a banana peel" episode missed the _point_ and that the bananas in laurel and hardy and old bugs bunny cartoons were a different species that became commercially extinct in the 1950's).

I remember reading a great article on the history of bananas. Probably excerpts from either "Banana: the fate of the fruit that changed the world" or "Bananas: How the United Fruit Company Shaped the World" (yes, there are multiple books on the history of the banana), about how railroads were driven through central america to facilitate mining, and how the mining wasn't profitable but the bananas planted along the tracks to feed the workers _were_, and how during the great depression bulk banana imports provided cheap meals in their own wrappers which people littered the sidewalks with, since banana peels are biodegradable in small quantities but became a serious trash problem (hundreds on any given street during their initial popularity, before litter laws).

Of course I can't find that article anymore either. Too bad, it was fascinating.


March 13, 2013

Putting together a toybox 0.4.4 release. There's 8 gazillion half-finished things but I haven't had time to finish them. Aboriginal's overdue for a release of its own using the 3.8 kernel, and needs the cp bugfixes to work. So, sequencing...

Otherwise, head down at work trying to get a project done, eating all my time and energy right now.


March 9, 2013

The uClibc O_NOFOLLOW thing was why cp didn't work on powerpc. The qemu arm board breakage is fixable by telling the kernel to add 32 to the IRQ.

The device is reporting it's on IRQ 27, which was correct when the IRQ controller's range started at 0. Now that the range starts at 32, it has to be adjusted. I _think_ this should happen on the kernel side, since moving the starting IRQ was a mapping thing the kernel did, and is not a hardware thing? But possibly the IRQ controller should report adjusted numbers when queried? I'd much rather fix the kernel than qemu here, because I ship the kernel and don't ship qemu.

So let's figure out where that 27 came from. In the failure case the error message was "request irq 27 failure", and grepping for "request irq" pulled up a printk in the function sym_attach() in drivers/scsi/sym53c8xx_2/sym_glue.c pulling it out of pdev->irq. Doing an "if (pdev->irq == 27) pdev->irq += 32;" at the start of the function was the quick and dirty fix that got me to a shell prompt, so that's how I know that I guessed right about the problem. (I knew about shifting the IRQ space by 32 because that's what the patch I bisected it to earlier did; this is requesting an irq outside the controller's IRQ range, which immediately fails. So the value being an offset into the range rather than an absolute value is a reasonable guess about what the current right value is.)
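
The guess above (the device's number is an offset into the controller's range, not an absolute IRQ) can be sketched in plain C outside the kernel. This is only an illustration: IRQ_BASE, struct fake_pci_dev, and fixup_irq() are made-up names, not kernel APIs, and it generalizes the "== 27" hack to "anything below the base", which is an assumption on my part.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the controller's IRQ range now starts here (it used to be 0). */
#define IRQ_BASE 32

/* Hypothetical stand-in for the one field of struct pci_dev we care about. */
struct fake_pci_dev {
	unsigned int irq;
};

/* Treat a number below the base as an offset into the controller's range
 * and turn it into an absolute IRQ number. Numbers already at or above the
 * base are left alone, so calling this twice is harmless. */
unsigned int fixup_irq(struct fake_pci_dev *pdev)
{
	assert(pdev != NULL);
	if (pdev->irq < IRQ_BASE)
		pdev->irq += IRQ_BASE;
	return pdev->irq;
}
```

With this, a device reporting 27 ends up requesting 59, which is inside the shifted range. The real fix presumably belongs wherever the IRQ mapping for the bus is set up, not in each driver.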

Where does this get called from? Let's add a dump_stack() at the start of the function and run it again, and that says sym2_probe() (actually sym_attach() but it got inlined). That's sticking pdev into a structure and then fishing it back out again, but it's 27 at the start of that function. Next is local_pci_probe() which lives in drivers/pci/pci-driver.c and is a fairly uninteresting wrapper. That's called from pci_device_probe(), and there we have an interesting digression: a struct device * gets converted to a struct pci_dev * by calling to_pci_dev(). Where does that function live? Oh, it's just a #define in include/linux/pci.h doing a container_of() to get the existing enclosing struct that this is a member of.
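
For anyone who hasn't met the container_of() trick, here's a minimal userspace sketch of it. The macro below is a simplified version (the kernel's adds type checking), and struct toy_device, struct toy_pci_dev, and to_toy_pci_dev() are hypothetical stand-ins, not the real kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(): given a pointer to a member, subtract that
 * member's offset to get a pointer to the struct it is embedded in. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct device. */
struct toy_device {
	int id;
};

/* Hypothetical stand-in for struct pci_dev: the generic device is
 * embedded in the bus-specific one, not pointed to. */
struct toy_pci_dev {
	unsigned int irq;
	struct toy_device dev;
};

/* Same shape as to_pci_dev(): generic pointer in, enclosing struct out. */
struct toy_pci_dev *to_toy_pci_dev(struct toy_device *d)
{
	assert(d != NULL);
	return container_of(d, struct toy_pci_dev, dev);
}
```

The point is that no lookup happens: the conversion is pure pointer arithmetic, which only works because the caller knows the struct device it was handed really is embedded in a pci device.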

Ok, so back to the stack trace: that call was from driver_probe_device() which came from __driver_attach which came from bus_for_each_dev() and since the problem we have here is the IRQ controller for the bus got shifted, let's see what bus_for_each_dev has...

And out of time for the moment.


March 7, 2013

Dear uClibc developers: Requiring Linux people to #define GNU_DAMMIT in order to get posix-2008 constants is not cool. I refer to O_NOFOLLOW and friends, which vary by architecture, and which you have bits/fcntl.h variants of for each one ala (from powerpc):

#ifdef __USE_GNU
# define O_DIRECT       0400000 /* Direct disk access.  */
# define O_DIRECTORY     040000 /* Must be a directory.  */
# define O_NOFOLLOW     0100000 /* Do not follow links.  */
# define O_NOATIME      01000000 /* Do not set atime.  */
# define O_CLOEXEC      02000000 /* Set close_on_exec.  */
#endif

This is from the last release version, dated May 2012. That's years after Posix-2008, so the feature test macro is a bad idea _and_ a standards violation. Plus it was never a "gnu extension", it was a Linux kernel addition. The Linux kernel is not and never was part of the GNU project, that's why Linux succeeded where Gnu failed.
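
The workaround on the application side looks roughly like this sketch. The open_nofollow() wrapper is a made-up name for illustration. The only real requirement is that the define comes before the first #include, and with a libc that exposes these constants under plain Posix-2008 the define shouldn't be needed at all.

```c
/* Has to come before any #include so the libc headers see it. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

/* Open path read-only, refusing to follow a symlink in the final path
 * component. Returns a file descriptor, or -1 with errno set. */
int open_nofollow(const char *path)
{
	assert(path != NULL);
	return open(path, O_RDONLY | O_NOFOLLOW);
}
```

Without the define, the uClibc headers quoted above leave O_NOFOLLOW undeclared and this fails to compile, which is the whole complaint.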


March 6, 2013

Thanks to Twitter's continuing policy of destroying anything its users like (currently tweetdeck), I have a tumblr now. No idea what to do with it, but there it is.


March 4, 2013

Work could use the old inittmpfs todo item I've had for years (initramfs as tmpfs instead of ramfs). Unfortunately, the last time I brought it up, Peter Anvin cast fear uncertainty and doubt upon the idea without actually explaining what was wrong with it.

I should submit a patch on "don't ask questions, post errors" principles and see if I can make them explain why it shouldn't go in. (If it's a specific issue, it should be fixable.)

Let's see, working back from the cpio extract in init hit some magic initcall section thing, so grep -r '"rootfs"' linux (I.E. look for rootfs _in_quotes_, since that has to show up in /proc/mounts), and wow a lot of stuff cares about that, but the important one is fs/ramfs/inode.c has rootfs_mount() which calls mount_nodev() with ramfs_fill_super. Ok, the same search for '"tmpfs"' pulls up mm/shmem.c which has shmem_fill_super. In theory I can swap the two (with appropriate #ifdefs) and see what happens...


March 3, 2013

Weekend with the niecephews and Adrienne. Exhausting. Think I caught something.


February 28, 2013

Andrew Morton (the #2 guy in Linux's lack-of-hierarchy) asked why I'm bothering with perl removal. So I told him. There's a certain amount of tl;dr in that.

I'm also collecting other people's public comments about this because of the five people who responded to my submission, only one is actually an existing embedded developer who's been poking me about this on freenode and twitter and such. Two of the respondents (Andrew and Sam Ravnborg) are existing kernel guys who piped up this time to ask "why are you doing this" (although Sam previously acked one of the patches). The other two are supportive but have no experience with the issue.

Of course I get support from lots of nice people: Alan Cox, Jon Masters, and David Anders and it's _great_, but this doesn't translate to acks on the patch. (And yes, that Alan Cox post is over 4 years old. And the patches still aren't in.)

Sigh, all together: sing.

UPDATE: YAY! Andrew added two of the patches to his tree, and Michal Marek the other!

I fall over now. (Dentist appointment early in the morning anyway.)


February 27, 2013

Ok, the perl removal patches are reposted for 3.9-rc1, which means you need to "git pull" in order to apply them. For once I didn't post the 3.8 versions but the current, applies to git at the time I posted, just retested, cc'd everybody and their dog, my darn python script didn't eat the "from" line so my name showed up right, if this doesn't get applied I dunno why version of the patches.

If anybody who cares about this topic would like to test/review/ack them, now would be a good time.


February 26, 2013

A year ago, a certain FSF zealot made a big enough stink to ensure that I couldn't get any sponsorships to work on my open source hobby project full-time. I've been continuing to work on it anyway (as I have for years, and on busybox before that), but not nearly at the rate I could go if I didn't have to work it in (along with my other open source work) around a day job's demands on my time and energy.

So even though I'm not big into schadenfreude, I can't help but follow this thread with a certain... satisfaction. It's the same zealot, having his head handed to him _on_ precisely the same zealotry, repeatedly, by Linus Torvalds himself.

I should add Linus's microemacs to toybox. On general principles. (And hey, I gotta do a vi anyway because posix says so...)


February 25, 2013

Hang on, busybox maintainer Denys Vlasenko just said that the upstream xz compression code comes from a repository which has the following license statement:

Licensing of XZ Embedded
========================

    All the files in this package have been written by Lasse Collin
    and/or Igor Pavlov. All these files have been put into the
    public domain. You can do whatever you want with these files.

    As usual, this software is provided "as is", without any warranty.

So the people saying that "these files have been put into the public domain" is not a sufficient license statement are happy when Busybox incorporates that code and slaps GPLv2 on it?

How does that make any sense?


February 23, 2013

Travel day to go home. Sitting at the airport sad my netbook battery died overnight and I lost all my open windows. (The acer bios has this "feature" where if the power goes to 5% the suspended system wakes up, presumably so windows can suspend to disk. Ubuntu 12.04 does not suspend to disk in this instance, it just sits there rapidly draining the dregs of the battery, even with the lid closed. There is no way to tell the bios NOT to do this.)

Right, one of the people who attended my talk pointed out that "allyesconfig" toybox doesn't build. Oops. I've checked in some stubs like "sed" that I've never gotten around to finishing, and they've bit-rotted a bit it seems.

Let's see, mke2fs, mount, sed, stat, and umount all barf. I haven't checked mount and umount in yet, so I need to fix the other three so they at least compile.


February 22, 2013

Finally finished listening to Paul Krugman's End This Depression Now author talk, which was full of brilliant insights (7 minutes 40 seconds: treating the economic crisis as a morality play is wrong because the people suffering are not the people who sinned).

One of the things he said is that if you look at how the population of Canada is clustered along its southern border, "Canada is closer to the US than to itself". What this made me realize is that Canada is what the US would have looked like without slavery.

The book 1493 details (among other things) how the Jamestown colony introduced Malaria to the new world (which persisted until DDT finally eradicated it in the 1940's). Within a century it had spread up and down the east coast, but the mosquitoes that spread it lived much longer in the south than in the cold northern winters. Malaria made the southern half of the country almost uninhabitable, to the point where the practice of "seasoning" arose, importing the european poor to work as servants in america meant giving them nothing to do for the first year because half of them wouldn't survive the malaria they'd inevitably contract.

A 50% mortality rate within the first year made "getting good help" from europe very difficult (and extremely expensive), so plantation owners looked around for people with natural resistance to malaria. They found them where the disease originated; the native people of West Africa have evolved significant resistance to malaria. (In fact the gene that gives you Sickle Cell Anemia if you have two copies gives you near-immunity to malaria if you have one copy. That's why it's persisted in the population: the 1/4 of the population with two sickle-cell genes dies of sickle cell, the 1/4 with two "good" genes dies of malaria, the half with one copy of each gene survive. The massive death is mostly written down as "infant mortality", so they have lots of babies to make up for it. If you were wondering why religious people are so against teaching evolution: suppressing it is easier than trying to explain how a just and loving god would allow it.)

Of course west Africans had no cultural affinity with europeans, so getting them to voluntarily move to north america in large numbers wasn't feasible, so we sent ships over there to kidnap and enslave them. First in Caribbean island plantations (even harder-hit by malaria), and then on the mainland.

Since 1776 our politics has been dominated by a south full of ignorant bigots. Southern plantation owners were waited on hand and foot, and the rest of the south's population never needed to invent labor saving devices because slaves did the labor (agricultural and domestic). Even those who didn't own slaves grew up conditioned to treat other human beings as not just inherently inferior but as property.

Stop and think about the common practice of fathering children with female slaves, and then owning (even selling) your own children. This was common practice, considered "improving the breeding stock". (Founding father Thomas Jefferson did this, and refused to free even the subset of slaves he'd fathered, not even in his will after his death.) This requires epic hypocrisy and dissociation, so of course they turned to religion to provide it.

White southerners bent their religion around justifying slavery (bending the cain and abel story in the garden of eden to somehow say that whites _weren't_ descended from cain, or at least not as much as blacks), leading to the subtext of Southern Baptist and Evangelical denominations: "we are superior, the chosen people, justified in dominating all others because they aren't really human". Because their superiority was inherent there was no need for them to learn or improve themselves, and the idea of anyone _else_ doing so threatened their superiority which was against god's will. Ignorance became a virtue, all you needed to know was in the bible and if you start questioning whether slaves are people (or anything else) it needs to be whipped out of you. The industrial revolution passed them by, because the chosen people selling King Cotton didn't want any part of the learning required for it, because learning means questioning and we can't have that.

This _poisoned_ the culture of the south, and to this day they "cling to guns and religion" and insist "the south shall rise again" (because that worked out so well last time). An entire area of the country is culturally scarred.

The civil war didn't start this (bloody kansas) and it didn't end it either (jim crow). Since the first Republican president issued the emancipation proclamation, the bigot vote went to the democrats for the next century (the original "solid south"). This polarity reversed when a Democratic president (Lyndon B. Johnson, Kennedy's vice president who became president when JFK was assassinated) overcame this legacy and in the wake of Martin Luther King Jr.'s assassination signed the Civil Rights Act. This alienated the racist vote, and Richard Nixon got elected with his "southern strategy" of coded appeals to the south's ubiquitous racism. The "Rockefeller vs Goldwater" divide was about explicitly rebasing the party on southern racism, and although Goldwater lost his election he won the battle to reshape his party's identity. This turned the "solid south" into a GOP stronghold that elected Reagan, his VP Bush, and Bush's son. But more to the point, it elected all their congressmen and senators. The GOP became the party of rural ignorance, with subtle but pervasive racism as a core plank of the Republican party platform. A few plutocrats at the top steered a vast horde of ignorant racists, whose racism was the central unifying idea allowing them to be gathered and led. (The party would add a few more coalitions of people whose hot button issues allowed them to be led around by the nose: gun nuts, anti-abortionists, libertarians, and so on. People with a red flag that stopped all thought on any other topic. But by far racists were the largest group.)

Eventually the Democrats called them on it by running a black president, and when he got elected the GOP melted down into gibbering denial. They became "the party of NO" filibustering every bill (including ones they sponsored which he then supported), spinning insane conspiracy theories about how he couldn't possibly have actually gotten elected and must somehow not be real (birtherism), dedicating their entire agenda to nothing but preventing his reelection (triggering and then forcing the country to remain in an economic depression to make him look like a failure), willing to let the country default on its debt payments rather than vote for anything he'd sign...

This was far beyond rational hatred, after 40 years the racism of the party's base had seeped into the party's representatives. The ignorant racist bigots the party appealed to elected ignorant racist bigots to represent them, who went full "protecting our precious bodily fluids" nuts when confronted with a black president.

The plutocrats have been "riding a tiger". The goals of the billionaires who fund party activities are to reduce taxes on the wealthy (from 91% in 1963 to 28% under Reagan, 15% on "capital gains" income from owning things instead of working, and many large corporations can avoid taxes entirely), reduce government regulation that prevents for-profit corporations from exploiting "caveat emptor" to the fullest, and to increase the amount of influence money buys in politics (see "Citizens United"). But to achieve these goals, they need to rile up their audience and feed them the occasional win in areas the plutocrats don't care about.

The point of all this is Canada was too far north for Malaria to survive so had no need of a west african labor force resistant to the disease, and it had separate political representation where southern slaveowners didn't get to shape national politics.

So all this "conservative" mess with ultra-rich bastards cornering the market and treating the poor as inhuman because their culture says they should have slaves, and their religion says that other people deserve to be in bondage... Canada shows us what the USA itself might have looked like without the stain of slavery. Between federal government and population mobility, even the northern half of the USA is tainted with significant amounts of the south's legacy. Canada lives on the same continent, comes from the same "northern european dominator party mix"... they're what we might have been if we'd let the confederacy become another Mexico and washed our hands of the slaveholders.

(Mexico is a different story, mostly settled by the spanish instead of the english. Spain has its own racist past evicting the muslim Moors and then turning the inquisition on themselves when they ran out of rich foreigners to kill and take their stuff.)

If you hear about republicans switching from Rockefeller to Goldwater, it's about Goldwater's southern strategy explicitly rebasing the party on southern racism.


February 21, 2013

Ah. If we're up against the zero lower bound in short term interest rates, why aren't long term interest rates at zero too? Because we have about 2% annual inflation. So a zero short term interest rate should translate to about a 2% long term bond yield, and if long term bond yields are below the rate of inflation (which they are), people are willing to lose money to hold these bonds, they're just losing _less_ than if they were holding cash.

Right, that makes sense. (Took me a while to work that out. Listening to a Paul Krugman interview on youtube on my phone remains educational...)

On the CELF front: talk went well. The room seemed full, although it's hard to tell with a light in your face and there was only one question from the audience. My brain is now completely fried, of course, hence blogging about other things.


February 20, 2013

Giving my talk at CELF tomorrow. (Whatever they're calling it this year.) I never got around to doing slides, but I have my outline up.

The talk description is actually a small part of what I'm covering. The interesting part isn't what I'm doing, it's _why_...


February 19, 2013

California is 2 timezones west of minnesota. My flight on southwest was a bit over 2 hours late. It was the latest one I could get so I didn't have to leave work earlier than necessary. The BART trains stop running at midnight. (Luckily, the airport has shuttlecraft.)

Got to bed at 1am here, which is 3am where I got up, and I got up at 7am. First scheduled event tomorrow is 9am. May be a bit zombie-ish in the morning.


February 14, 2013

Blah: todo list overflow. What am I working on again?


February 13, 2013

Doing a toybox FAQ, which is as much about general open source issues as it is about toybox.

The trigger was Dave Jones tweeting about an issue that made me want to link him to an old screed I wrote on the topic, except that since then Denys has dropped a large busybox-specific digression into the middle of it which makes it much less useful as a general-purpose answer to why open source developers ask you to upgrade when diagnosing a bug.

Let's see, the last version of the FAQ I checked in was... Sigh. Long before someone who shall remain nameless split the website out into a repository that's mentioned nowhere on the website itself (lovely)... Aha, back in the main busybox git:

git blame 95718b309169 -- docs/busybox.net/FAQ.html

And just to be sure, the last version before the stuff I wrote (not counting the occasional change to the html markup) was:

git blame ef614ecca61c -- docs/busybox.net/FAQ.html

Ok, stuff I wrote (and can thus re-use) identified! (Well, unless I start to care about this sort of thing, which I currently don't.)

But I do wince at anyone who can say "advance freedom" with a straight face ("Citizen, you have nothing to hide from the happiness patrol!") and still refer to "the GPL" as if there is such a thing now the Linux smb filesystem driver and Samba server can't share code even though they implement two ends of the same protocol. There is no "the" GPL anymore, there are multiple incompatible GPLs with the FSF and Linux developers on opposite sides. Have fun with your factional infighting, I'll be over here releasing BSD/MIT licensed stuff as close to the public domain as lawyers still allow to exist. (The universal receiver is gone, so I've switched to universal donor in my quest for a simple easily understood legal position.)

Speaking of lawyers allowing things to exist: DARN IT! BRADLEY! Look, the point of that ENTIRE RANT was that I hate what the FSF did to Mepis and the busybox developers SHOULD NEVER SUE SOMEBODY JUST BECAUSE THEY USED UNMODIFIED VANILLA SOURCE AND DIDN'T BOTHER TO MIRROR THE TARBALL OF SOURCE THEY NEVER MODIFIED. I want them to let us know it's vanilla, not feel an obligation to mirror stale versions of stuff we've got hosted on osuosl plus archive.org has it plus you can fish it out of the git repository on every developer's machine!

He removed that. He's reserving the right to sue people into giving us our own binary-identical source tarball back, plus a giant payment (in the ballpark of the year of full-time minimum wage mentioned in last night's state of the union) he squeezes out of each one for legal fees for the privilege (and that's if they don't fight back).

I _object_ to this. I can't stop it, but I tried, and I'm very sorry it still happens. If you get sued, the above commit has the language that WAS up on the website until he removed it.

Sigh. Salvage what I can and move on...


February 12, 2013

Blah. The kernel broke QEMU's arm versatile board again. I _tested_ this, and it broke since then? Sigh...

The arm guys keep poking at the versatile board going "no, that can't be right" and making random changes. Except what qemu has emulated for the past few years is what the kernel was doing, not what random hardware nobody has anymore was doing, so they just break it. Over and over.

So the SCSI device isn't working. Bisect goes into a long range of commits that produces no output. Bisecting towards the end of that converges on commit f5565295892e, which moved the IRQ start to 32. The SCSI driver tries to grab interrupt 27, which is denied. (Sigh.) Moving the IRQ start down to 16 lets the driver bind, but then it times out awaiting a response from the device. Did they finally correct that "swizzle" thing I've been reverting for ages? No, comment that patch hunk out and it grabs irq 28 instead, and even with an IRQ start of 16 it still doesn't work.

Ok, bisect to where the dead zone _started_ and it converges on 07c9249f1fa which is "use irq_domain_add_simple()" which changes how device tree IRQ parsing is done. That change prevents the serial console from producing output. The commit before that works, and the boot messages say... Hmmm. Both say "sym0: <895a> rev 0x0 at pci 0000:00:0d.0 irq 27" but one works and one doesn't.

The one that _works_ says "sym0: unknown interrupt(s) ignored. ISTAT=0x5 DSTAT=0x80 SIST0x0" and then starts enumerating (virtual) hard drives. And then later says that the 8139cp device is at pci 0c.0 but on the same IRQ 27.

The failing one says "SCSI 0:0:0:0: ABORT operation started" and never recovers from that. That's _with_ the starting range adjusted to where irq 27 is an option and the driver binds.

Hmmm... It's getting the right interrupt number (at least with that "swizzle" nonsense reverted), but not the right... routing? Some sort of enable transaction? What's missing here at the hardware level? I may need to instrument qemu...

Hmmm, is it the _offset_ into the interrupt block that's important? So it needs to bind to 32+27? (Which is _not_ what the device tree is telling it, apparently...)


February 11, 2013

Minor logistical fiddliness with Aboriginal Linux: the build system uses release tarballs (which I mirror locally so the build can download them even if they disappear off the original site).

The perl removal patches change slightly every few releases, usually due to something utterly trivial. (Right now the -next tree has somebody adding or removing a "restrict" keyword in the perl, I forget which. It makes zero difference in the replacement code, but the way patch works I have to state the file being removed exactly, so I have to patch in order to have no change after it's applied. Sometimes they add an extra field to the regexes. That sort of thing.)

So the patches match a tarball version, and the tarballs aren't available until the final release gets made. Meaning... if I check in the new patches without updating download.sh to point to one of the -rc tarballs, I break the build. But I don't want to clutter my mirror with release candidate tarballs. Locally I'm testing against git snapshot du jour, but I'm not going to check that _in_. And things break all the time, right up until the very end. (Right now I'm bisecting arm breakage between 3.7-rc5 and 3.7-rc7, somewhere in there the root device on the versatilepb went away. Might just be a config tweak but I need to track it down to see.)

Which means the perl removal patch updates tend to get posted once I can check them in, which is _after_ the new release gets posted. I.E. during the merge window. (Often a few days into it because cutting a release on this netbook takes days to build everything, and that's assuming it works right the first time.) The kernel guys prefer patches to get posted _before_ the merge window.

I doubt this is the only reason they've ignored them for years, but it makes it easier...


February 9, 2013

Oh no. I'm looking at autoconf output again.

After upgrading the kernel, busybox, and toybox, Linux From Scratch isn't building to completion because the sed build is hanging. Digging into it, some random script Ulrich Drpepper wrote in 1995 is called with one argument, and loops endlessly doing a "shift" and comparing the second argument... which is blank and there are no more arguments. So it spins forever, and the command line it's called with is hardwired not just in the makefile, but the makefile template automake uses to generate that makefile.

I.E. the only reason the sed build _doesn't_ hang is this script is never called normally. But for some reason it's getting called now, so I have to dig in and find out what changed. So I'm diffing it against the build on the host (which works fine, because Drpepper's script is never called), and even looking at just the differences it's still full of crap like:

checking mcheck.h usability... yes
checking mcheck.h presence... yes
checking for mcheck.h... yes

Three checks for the same header, all testing pretty much the same thing. (Should we include this header: yes/no. Incredibly, pointlessly verbose and it does it for a dozen or more header files.)

Then there's this test:

-checking for a thread-safe mkdir -p... /bin/mkdir -p
-checking for a thread-safe mkdir -p... /bin/mkdir -p
+checking for a thread-safe mkdir -p... build-aux/install-sh -c -d
+checking for a thread-safe mkdir -p... build-aux/install-sh -c -d

The minus lines are test on host, the plus lines are same test on target. I dunno why it's run twice, it happens again earlier too (so this test happens at least three times, but this is the same test run twice in a row). What _is_ the test? This:

case `"$as_dir/$ac_prog$ac_exec_ext" --version 2>&1` in #(
  'mkdir (GNU coreutils) '* | \
  'mkdir (coreutils) '* | \
  'mkdir (fileutils) '4.1*)
     ac_cv_path_mkdir=$as_dir/$ac_prog$ac_exec_ext
     break 3;;

Translation: run "mkdir --version" and check if the output identifies itself with one of three specific strings it recognizes. If it's not one of those three versions, don't use mkdir -p but instead use a shell script in the sed source to do mkdir -p. What does it mean by "thread safe"? I have no idea, the test doesn't specify. So no matter how busybox or toybox behaves, unless we implement a "--version" and pretend to be some other package, the sed build won't use us.
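
To make it concrete how little that test actually checks, here's the same logic as a C sketch. The function name mkdir_is_recognized() is mine, and this is a restatement of the shell case statement above for illustration, not anything autoconf ships.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the "mkdir --version" output starts with one of the three
 * prefixes the configure test accepts, 0 otherwise. Nothing about the
 * actual behavior of mkdir -p is examined. */
int mkdir_is_recognized(const char *version_output)
{
	static const char *const prefixes[] = {
		"mkdir (GNU coreutils) ",
		"mkdir (coreutils) ",
		"mkdir (fileutils) 4.1",
	};
	size_t i;

	assert(version_output != NULL);
	for (i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++)
		if (!strncmp(version_output, prefixes[i], strlen(prefixes[i])))
			return 1;
	return 0;
}
```

So a mkdir that prints anything else gets rejected no matter how correct its -p is, and one that lies in its version string gets accepted no matter how broken.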

(The fix is, of course, to build in our own sed and let this package die. In fact this package's maintainer resigned recently because the FSF is crazy.)

But none of this is why the build is hanging. The build is trying to create stamp.vti, which version.texi depends on, which sed.info depends on... Ah. I never implemented -s in toybox cp. It accepts the flag, but "cp -sfR" is the same as "cp -fR" and that doesn't preserve date stamps and sed has decided that "configure" is newer than "sed.info" and thus it needs to rebuild a file that it can't rebuild.
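
The decision make is getting wrong here boils down to a timestamp comparison, roughly like the sketch below. needs_rebuild() is a hypothetical helper, and it only compares whole-second st_mtime values (real make implementations may use finer resolution), so treat it as an illustration of the idea rather than what make literally does.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Return 1 if target is missing or older than dep (so it would get
 * rebuilt), 0 if target looks up to date, -1 if dep can't be examined. */
int needs_rebuild(const char *target, const char *dep)
{
	struct stat t, d;

	assert(target != NULL && dep != NULL);
	if (stat(dep, &d))
		return -1;
	if (stat(target, &t))
		return 1;
	return d.st_mtime > t.st_mtime;
}
```

Copy a tree without preserving timestamps and the files land in whatever order cp walked them, so a shipped generated file like sed.info can easily end up looking older than configure.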

I hate make. Ok, easy enough fix: implement cp -s in toybox.


February 8, 2013

I generally ignore my Android phone's "Upgrade! Now! Full-screen-popups until you yield to this!" blather because I'd done it exactly once: and the _only_ thing that did was break tethering. (Sprint installed an oh-no-you-don't patch, disguised as a standalone upgrade. First one in the queue. So it worked fine _until_ I upgraded.)

Since then my phone gradually bit-rotted (and probably got who knows what viruses since it had no security patches) until it lost the ability to mount as a USB stick to get photos and video off. It still _charges_, but wouldn't present the "turn on USB storage" dialog. So I gave in and upgraded in hopes it would fix this.

Note: upgrading android has never fixed the specific thing that was so broken I gave in and upgraded. And this is no exception: plugging in as USB still doesn't let me copy data off anymore. Maybe the USB connector got damaged, or the cable went bad: I dunno, it won't tell me.

But it did fix the "app store", so I could upgrade the apps. The netflix client stopped working months ago, so I told it to upgrade everything. Including tweetcaster, which was now unusable because the Java Swing "look and feel" plugin got upgraded out from under it and now there were two layers of un-dismissable menu bars always there vertically along the left side of the screen (!?!?) taking up about 2/3 of the window and there was so little room that it couldn't display one whole tweet (one letter at a time vertically down the right edge, with the occasional "and" or "of" collated).

So I had to update tweetcaster too, and WOW the new tweetcaster sucks. Have the developers ever tried to use it?

Ok, dismissing the fact that every time I go to a new part of the thing I've been using for years it gives me a full-screen pop-up about what's changed which I dismiss without looking at because it's incredibly rude. It only does that once for every single page and mode in the entire program of which there are far more than there should be.

Half of what I use twitter for is opening links. Android has a built-in web browser, but the tweetcaster guys decided that calling an external app worked too well, so they did a built in sucky modal browser. There's no way to turn this off in "settings" (I looked _hard_). The browser does have an "open in browser" option in the menu which calls the external one, but if you do that tweetcaster will crash when you return and lose your place in the twitter stream. (Every time it boots it jumps to the most recent tweet, not the last one read. In fact it loses all that history, and you have to reload and hope twitter feels like letting you scroll that far back. This is because remembering 1024 or so 256-byte-ish data packets (140 chars plus metadata) would BLOW TWEETCASTER'S TINY LITTLE MIND. 256k of data? THAT'S INCONCEIVABLE!)

You can't pull up the browser menu before tweetcaster's built-in modal browser finishes loading, or it will crash. If you try to load a page with funky javascript (such as any links to the new york times) in the built-in modal browser, it will crash. You can't ever press the "back" button from tweetcaster (instead of the home button) or it will exit and lose all context (I.E. crash). To exit the built-in modal browser, you must press the back button (EXACTLY ONCE).

The "tap on a tweet" menu used to have maybe 3 entries. It's now 9 plus tweet-specific entries, so big you usually have to scroll the screen even though they made the entries narrower and harder to hit. None of these entries are "copy link to clipboard" (or "open in the real browser"), you can only copy the entire tweet to the clipboard, go into the browser, and edit it there. You can also let the tweet load in the built-in browser (which is modal so tweetcaster can do NOTHING until you finish with that page and go back, but only hit back _once_ or it'll lose your tweets), and from the built-in modal browser you can copy just the link. (Except not the expanded link, the t.co URL shortener version that gives you no info about what it points to.)

The Android built-in browser is less useful too. There used to be a "+" button to open a new tab, then hold over the URL part until a "paste" option came up. Now you have to hit the "tabs" button (once you figure out that's what that squiggle means, and it's only there when the URL bar is being displayed so you have to scroll up to make it appear) to see your tabs in funky macintosh-style hover windows and from _there_ you can hit a plus. This gives you a new page that auto-loads google. Hold on the URL bar brings you into edit mode. Delete that URL to get an empty URL bar and then press and hold again and THAT gives you a paste option. Now carefully delete the surplus tweet context that tweetcaster put around the URL you want, and note that the right edge you need to click at so you can remove that last character (because there's no delete to the right of the cursor "del" button, only a delete to the left of the cursor "backspace" button, and no cursor keys, and deleting text never frees up space at the right edge of the URL bar unless the URL is too short to fill up the bar) is exactly TWO PIXELS away from the "delete entire contents of the URL bar with no confirmation or undo" button.

This is just one example. Just about everything that used to be one click in the old android is now two or more, on the same tiny cramped screen. The average number of clicks to do anything has doubled, and they call this an advance. (Oh, and instead of the new page appearing, there's an animated swoop adding a quarter second delay between each page so if you mentally macro redundant clicks together and your thumb goes too fast: too bad, button's not there yet. Of _course_ there's no way to turn this gratuitous bling off, not that I've found yet. The stupid "collapse screen to a vertical white line as if it was a 1950's television when you hit the sleep button" is the most annoying so far, it gets old after about the third time, by the 100th its developer is your mortal enemy.)

Yet another regular installment of lateral progress. It's not an upgrade if the new software can't do what the old one could, or can't NOT do what the old one didn't make you do. (Proper open source development can sometimes address this, although not always, as Gnome 3 and KDE 4 demonstrated. But at least people could fork the old ones, or provide alternative projects. But android isn't open source: it's regularly updated abandonware. Periodically releasing stale source code into the wild is not the same as visibility and input into its development. Android allows no access to the actual design or implementation process, you first hear about it long after the fact and take it or leave it with no say in it. If you have comments... who cares about THAT old version, they're already several months into the one that will replace it and there are BIG CHANGES in store, just wait and see!)

What I really want is a fresh security-patched version of the old install. Not something the vendor will ever provide, even though I keep sending Sprint a monthly fee that includes paying for this phone on an installment plan (through October).

Just called them to confirm the "through October" thing, and ask how much extra they'd charge me per month to remove the tethering restraint. They say they no longer OFFER that option, instead tried to sell me a "wireless mobile hotspot".

Up yours, sprint. I know what the phone can do. Even if I wasn't an embedded Linux developer, I _did_ it for a month before you disabled it, and the menu entries to do it are still there (it's just the cellular signal strength instantly drops from 4 bars to 0 when you enable that, and jumps right back when you switch it off).

I'm going to go watch netflix videos on my phone. I may leave some long ones running while I do other things, for pleasant background noise...


February 5, 2013

I just spent the entire evening making my toybox todo heap even longer. It now includes analysis of what's in klibc, sash, sbase, beastiebox, and nash. Tomorrow I need to look at s6. And embutils, elkscmd, 9base, dracut...

I don't intend to use any of their code, but I want to see what list of commands they consider important.


February 4, 2013

Remember when Standard and Poor's (the S&P 500 people) downgraded US treasury debt?

Remember when Wall Street went all-in behind Romney and told Obama to suck it?

Remember when Obama's second term started with the federal government suing Standard and Poor's for their incompetent ratings being a big factor in the mortgage crisis that trashed the economy?

Actions have consequences. Payback's one of them schadenfreude things.


February 3, 2013

So somebody requested arp, and it's reasonably self-contained as network things go, in theory. In practice: the arp man page is horrible.

Writing a new command: not so hard. Figuring out what the old one can be told to do, and how to tell it to do it: hard. Oh well, start with the subset I can work out and wait for people to complain, I suppose.


February 1, 2013

For my birthday, my sister took me and three of the niecephews to Ikea, where I bought them clearance leopards (as you do) and got myself a $20 desk. It is much less well-built than the $20 desk I got at Ikea a few years back (the plastic fasteners used to be metal), but given sweden's exchange rate I'm surprised they can still remain price competitive at all.

I miss those damn toffee thingies. The minnesota store doesn't have the same range of swedish candy, maybe they stopped importing it. (Yeah, I could buy them from amazon but it seems silly somehow when I'm _at_ an Ikea and they're resolutely not carrying them.)


January 31, 2013

I'm vaguely amused at the zombie tinycc project, which is trying hard to get a release out after only 3 years of stagnation.

Lemme rephrase that: in early December two different people pestered the maintainer about having a release and he replied (and I quote): "Well, release would be fine for me. Honestly, even better if someone else could do it." (Yes, this is the "maintainer" who opened a "mob branch" anybody could check into, so he didn't even have to review commits.)

So two months later they're still talking about it, and the people who are not him have done some actual work. Not with an eye towards building the kernel or busybox or anything like that. Or building any specific packages at all, that I can tell. Just having a release to have one, it seems. (I haven't been reading too closely, but none of the reports were "I tried to build package X and this broke", because they're just not there yet. Instead they've got threads arguing about how you do a stack trace.)

This burst of effort means that they're scrubbing the tree and finally noticing that they've hardwired gcc into their makefiles, and there's still plenty of code in there they don't understand and are afraid to touch. Most recently, the maintainer suggested doing nothing for a week. (Why? No idea.)

Still, they got the website link to their web archive updated. That only took one year.

This is what they chose instead of my fork, and they're welcome to it. (Actually I'm not bitter, just amused. I've got my hands full with toybox at the moment, and when I do get back around to qcc I've got permission to license Fabrice's original code BSD so I've got triage work to do even on _my_ fork. I honestly couldn't use anything out of that tcc branch anyway, even if it did look interesting. I'm just sad a good codebase got a maintainer who openly disrespects it.)


January 30, 2013

Finally got a chance to fix the toybox cp command so it doesn't break the Aboriginal Linux build. Which means I need to cut a release, and that's currently the only thing in it, so I'm shaking the tree for low-hanging fruit which would be good to stick into a release. I'm partway through a gazillion things (where "partway" sometimes means I did the research to see what's involved in implementing it, but haven't started yet). The time command's trivial, at least if all you want is posix -p mode. Groups is just id -Gn... and that doesn't work right for any users but the current one. Right.

Sigh. It's hard to come up with automated tests for things that require root access and care about how your system is configured. There's no point in creating tests that only pass on my system and which nobody else can reproduce, and I refuse to make them distro-specific either. Maybe I need to put together an aboriginal linux test chroot.

Amusingly I was pondering putting out an out of sequence Aboriginal release because I had a good stopping point before the kernel was ready. (I actually tested -rc3 for once.) But I've been so busy with other things the kernel's catching up...


January 29, 2013

Paul Krugman was just on a podcast, which is great if you've been following him, or read his books, but there's some backstory if you haven't. Let's summarize:

Capitalism is about supply and demand: people sell things, other people buy them. Two major categories of things can go wrong in a capitalist economy: supply problems and demand problems. The great depression was a demand problem (1929 stock market crash), 1970's stagflation was a supply problem (OPEC oil embargo). Since 2008 we're having a demand problem again, but the current generation of leaders weren't alive during the great depression so they've been _treating_ it as a supply problem, sometimes described as "pushing on a rope".

Demand problems can be triggered when a commodity price bubble bursts leaving consumers with widespread debt. The 1929 stock market crash wiped out retirement portfolios and left excess margin loans, in 2008 housing prices collapsed and left underwater mortgages. Consumers diverted their income into paying down the debt backlog, didn't buy as many goods and services out in the economy, inventories piled up, companies laid off workers, unemployment shot up, unemployed people didn't buy stuff which reduced demand further, downward spiral into depression.

The main knob the government uses to control the economy is the interest rate at which the federal reserve loans money to banks. Lowering interest rates makes existing debt less painful (lower monthly payments) and encourages people to borrow more money, increasing demand. Higher interest rates do the opposite, reducing demand if supply can't keep up (thus avoiding inflation).

The 2008 crash was similar to the 1990's savings and loan crisis that took out BCCI under the first President Bush (because conservatives don't learn from experience and thus didn't actually fix anything). This time the crash was big enough (massive wave of foreclosures plus collapse of much bigger companies like Lehman Brothers, Arthur Andersen, and AIG) that even lowering interest rates all the way to zero wasn't enough to restore sufficient demand to keep the economy going, hence the rise in unemployment and unsold inventory.

Since you can't lower interest rates _below_ zero (except by raising inflation, which offends rich people whose fortunes erode at the same rate the debt backlog does), and since keeping the interest rates at zero isn't enough to restore the necessary level of demand to keep the economy operating, interest rates get stuck at zero in what is called "the liquidity trap".

This doesn't mean money vanished, it's just not moving. It's collected in the hands of rich people who sit on it. The same companies that have piles of unsold inventory also have large bank accounts, as do the owners of those companies, but they've become risk-averse, afraid to spend what they can't replace. Civilian consumers don't have enough money, so they can't buy things, pay their bills, or employ each other.

The fix is to increase demand, meaning somebody has to spend more money. But the economy is a closed system: all income is money somebody else spent, so if everybody cuts spending at once our incomes have to go down due to sheer math. There are only four sources of spending:

The GOP morons screaming about the federal debt (which is still lower than it was in world war II) want to cut off the last remaining source of spending. In most of Europe conservative politicians have succeeded in imposing massive "austerity", contracting government spending and contracting the countries' economies with it, causing massive unemployment, poverty, riots...

An aside on Europe's problems: the new currency, the Euro, is controlled by Germany, which is driven by exports. Germany sells more to other countries than they buy from them, treating the rest of europe as a captive market. Before the single currency, this persistent imbalance would eventually have priced their goods out of the market by driving up the Deutschmark vs the Lira; instead forcing them onto a common currency with a persistent trade imbalance has impoverished Germany's customers until they can't afford to import more _and_ the imports are still cheaper than what they can produce domestically. German politicians have literally suggested that the other countries in Europe could also redesign their economies around exports, and everybody could sell more than they buy. Presumably the surplus would be sold to Mars. During World Wars I and II Germany tried to take over Europe with tanks, in the 1990's they took over Europe with banks, yet the persistent failure of the rest of the world to be Germany continues to baffle them.

Back in the USA, the main reason 2008 wasn't like 1929 (and didn't trigger a full repeat of the great depression) is that the federal government stepped in to spend a lot of money when nobody else would, and kept the system afloat. A lot of it was automatic "stabilizers" such as unemployment and medicaid. The surge of Census related temp jobs was also well timed, and Obama did beat a modest stimulus package out of congress before the GOP opposition could organize itself into "the party of no". This was enough to avoid waves of evictions forcing homeless people into shantytowns ("Hoovervilles"), but not enough to deal with the backlog of consumer debt in a timely manner. 5 years on, the economy is still depressed.

In a depression like this, the Federal government is the only organization with the freedom to spend, providing income for the indebted masses to pay down that debt until they can start spending money on each other again. (We could also just cancel the debt, but the rich who lent out the money would rather see the country destroyed than their personal wealth threatened, and billionaires own the GOP outright.)

FDR dug us out of the great depression with the new deal. The same GOP morons opposed him back then, and kept him from doing enough until the run up to World War II scared them into line. Our economy recovered fully in 1940 and 1941 (december 7th 1941 was after 2 full years of spending to prepare for the war, via lend-lease and such). Unemployment went down when the government gave people jobs, and we also invested in infrastructure we're still using 75 years later (yes it's wearing out).


January 26, 2013

If you follow the GOP, you might get the impression that plutocrats actually want poor people to suffer, as a goal in itself. And you'd get that impression because it's true. It's actually a fairly common thread throughout history, the end of slavery and indentured servitude is why "you can't get good help these days". Without starving illiterate peasants, the giant manors of Gosford Park or Downton Abbey can't afford staff. Education ended country manors just as emancipation ended plantations.

The thing is, billionaires don't hire other billionaires to do menial tasks; they can't afford to. You need poor people to be rich relative to. Today's minimum-wage burger-flipper has access to prepaid cell phones, antibiotics and insecticides, fruits from the other side of the planet, meat every day if they want it (chicken's cheap and the McDouble is technically beef), the ability to keep food fresh for months or years, air conditioning, access to police and fire services, learning (libraries/Wikipedia/TED talks/Khan Academy)... We take for granted wealth that kings didn't have a couple centuries back, but a minimum wage burger flipper is not rich.

Being rich is meaningless unless other people are poor, because rich people hire less rich people to work for them. That's what money is all about: you don't pay farms or mines or factories, you pay people to build and run them. Money always ultimately goes to people, not things.

Rich people's ability to hire lots of poor people is what makes them rich. Cheap labor is the end goal of management everywhere, paying less for what you buy and being paid more for what you sell is the heart of business. The ultimate expression of this is slavery, which is as old as recorded history: buy a laborer and own them outright.

Explicit slavery is out of fashion in the USA (but ask child brides sold for dowries how the rest of the world's doing), as is the Jim Crow replacement system of the "company store" where workers were perpetually in debt to their employers and paid with "scrip" instead of cash (non-portable coupons worthless anywhere else) so they couldn't afford to leave. Economic slavery is the end goal of the upper management of most corporations, if you're lucky you get "golden handcuffs". (This is not the same as job security: slaves were sold all the time.)

The people running these systems haven't changed. Plantation owners are now CEOs, and engineer stuff like H1-B visas (leave this job and you're deported, try for citizenship on your own time). They would LOVE to reintroduce slavery or jim crow, they just want to do so out of sight with sweatshops in India or the Philippines where they can get away with it.

Yes, plutocrats want poor people to suffer, even if they don't admit it to themselves. It's inherent in what they do. If everybody is rich, nobody is rich. The ultra-rich aren't trying to expand the pie so _everybody_ gets a bigger slice, they're playing a zero sum game where they can't win unless they beat somebody else.

For the GOP to win, others must lose. Opposition to health care isn't about the cost any more than fighting wars or union-busting. It's about keeping the uppity poor in line so their labor remains cheap.


January 25, 2013

Hmmm, there's a shell syntax for assigning values to variables if they're not currently set: "${varname:=value}". I'm pondering using it to replace my export_if_blank shell function in aboriginal, but haven't done so because it's not really a cleanup. It looks like evaluating a variable but is actually modifying its contents. It sets locally rather than exporting so I'd need to list the variable name twice (once in the ${} and again in an export statement; exporting is necessary for child processes to inherit the value). And it needs a command context to be evaluated in; you can cheat and use the ":" command (which is basically a synonym for "true"), but you've still got to have that at the start of every line with one of these on it. Which is subtle. I don't like unnecessary subtlety in programs.
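For reference, here's roughly what the pattern looks like in practice. A minimal sketch: CROSS_TARGET_DEMO is a made-up variable name and the values are arbitrary, none of this is lifted from the aboriginal scripts.

```shell
# CROSS_TARGET_DEMO is a hypothetical variable, used only for this demo.
unset CROSS_TARGET_DEMO

# ":" does nothing, but its arguments still get expanded, and the expansion
# assigns the default as a side effect.
: "${CROSS_TARGET_DEMO:=i686}"

# A value that's already set (and non-empty) is left alone.
: "${CROSS_TARGET_DEMO:=armv5l}"

# The assignment is local to this shell, so the name has to show up a
# second time in an export before child processes can see it.
export CROSS_TARGET_DEMO
child_sees=$(sh -c 'echo "$CROSS_TARGET_DEMO"')
echo "parent=$CROSS_TARGET_DEMO child=$child_sees"
```

Three lines to do what one call to a shell function does, and the one that does the actual work looks like a no-op.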

So "new tool, very nice, doesn't help".


January 24, 2013

Ah, balance disease. Even in the UK, they can't cover something obvious without finding some crazy loon that disagrees with it and letting them bark into a microphone for a few seconds to show that nothing is ever actually true or false (after all, gravity is a social construct). Spineless twerps: "The united states government just did this thing". Full stop. Was that so hard?

I'm enjoying working at Cray. Learning all sorts of stuff I didn't expect to. We had a brown bag on ftrace the other day, and even though I sat through the CELF keynote on it it never quite "clicked" the way it did in that brown bag. (I need to write a good intro. I should work out how to screencast videos, actually.) Plus in shell scripting, "exec" with just redirects but no new command name can redirect stdin and such persistently for the current script, without resorting to elaborate parentheticals. (It's implicit in the susv4 description of exec, but easy to miss.)
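A quick sketch of the exec trick, assuming a scratch file from mktemp (the variable names log and line_count are just for the demo):

```shell
# Scratch file for the demo.
log=$(mktemp)

exec 3>&1          # stash the original stdout on fd 3
exec >"$log"       # from here on, stdout of *this script* goes to the file
echo "first line"
echo "second line"
exec >&3 3>&-      # put stdout back and close the spare descriptor

# Count what landed in the file while stdout was redirected.
line_count=$(wc -l <"$log")
echo "captured $line_count lines"
rm -f "$log"
```

No parentheses or braces wrapped around the redirected part, the redirect just stays in effect until another exec changes it.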

Fun Ubuntu 12.04 bug: if you close the laptop lid while switched to a text console, the suspend process gets a "not authorized" which nicely pops up on the desktop (when you switch back to it), but leaves the thing running with the lid closed.


January 23, 2013

Oh right, klibc has executables in it. I should add that to the toybox roadmap (yet another use case to replace). Let's see, do a quick "echo $(for i in $(find . -type f); do file $i | grep -q executable && basename $i; done | grep -v '[.]g$' | sort -u)", filter out the *.so entries and the *.shared duplicates of static commands, and...

cat chroot cpio dd dmesg false fixdep fstype gunzip gzip halt ipconfig kill kinit ln losetup ls minips mkdir mkfifo mknodes mksyntax mount mv nfsmount nuke pivot_root poweroff readlink reboot resume run-init sh sha1hash sleep sync true umount uname zcat

Huh, no switch_root? Weird... Oh right, klibc called it run-init and I couldn't put a dash in a busybox command name so as long as I was renaming it anyway I called it switch_root as an analogy to pivot_root. Right. (My computer history research taught me the importance of tracking sources, so I'm usually very good at remembering where things came from... unless it was from me. Anything I could independently reproduce can't be that hard, QED.)

Ok, what does toybox _not_ currently have? (They call sha1sum "sha1hash", and "nuke" is "rm -rf --"... What the heck is mknodes? It... generates a .c source file? What? Ah, it's part of the build infrastructure for dash: mkinit, mksyntax, mknodes, mksignames all "hostprogs-y". And fixdep is another one.)

Ok, looks like:

cpio dd fstype gunzip gzip halt ipconfig kinit minips mount mv nfsmount pivot_root poweroff reboot resume sh umount zcat

Not a bad list. I'm partway through mount/umount already (which should include nfs and cifs support, and fstype is related), init needs doing (which includes halt, poweroff, and reboot). I did a gzip implementation in java back in the 90's (back before 1.1 added it to the standard library), so that's gzip, gunzip, and zcat I'm comfortable doing. I have plans for cpio, dd, mv, pivot_root, and ps. Which leaves ipconfig, resume, and sh, two of which I know about and "resume" is... a strange tool that reads data from a swap partition and passes it to the kernel? I thought the kernel did that built-in? Weird.

Ok, when I get an hour I need to add all that to the roadmap properly. Ooh, and this (ENOENT is not an error when cleaning out initramfs). And if mounts exist in the old directory and there's a mount point for them in the new directory, --move the mount (the klibc code has special case handling of /dev, /proc, and /sys; sigh).


January 22, 2013

Things I've learned this morning, in the approximate order I learned them:

If you miss your stop on a minneapolis bus, it doesn't turn around at the end. It kicks you off in the middle of nowhere and goes back to the depot.

The google maps client sprint put on my phone does not do bus routes. The web version is unusable on phones due to Clever Javascript.

Google maps can't find a GPS signal in the habitrails. Navigating the habitrails is an acquired skill, and they inexplicably dead end a lot. Stairs down from them don't always clearly label the street level vs sub-basements.

If you ask google maps where two bus routes intersect, it shows you a "transit center" that is a building where bus employees work. No busses stop there.

The sportsball arena has big "transit centers" that people will helpfully direct you to. There is no mention of busses in them. They are deserted. If you go outside to see if busses are mentioned there, the door will lock behind you. This puts you on route 94, a large multilane highway. If you go along it, there are fences to prevent you from getting back out. Sprint does not get signal here. The sun does not reach here. The roadside contains a surprising amount of abandoned luggage. When a chunk of ice gets in your shoe it can be surprisingly stabby. Getting the shoe back on wearing gloves and balancing on one foot is, eventually, possible. This does not mean one has found and removed the piece of ice when it stuck to the bottom of the sock. Getting the shoe back on a second time is more difficult. It is possible not to be sure if one is laughing or crying, even while doing it. It is possible to be cold enough that getting angry just doesn't happen somehow. This does not stop one from developing a general loathing of the city of minneapolis.

The trick to getting out of route 94 on foot is to go back to the sportsball arena and find the one-foot-wide staircase up the side of the fifty foot concrete wall.

When a helpful person points and says "that's the bus you want", they don't mean that specific bus. That specific bus will sit there, locked, running, with nobody in it, for half an hour. They mean go a block away and wait at the stop for another bus with the same number.

Cans of energy drinks freeze and explode, but do so fairly silently in plastic bags, which then leak slowly down your leg and on your shoe, but if it's cold enough you won't notice, and it freezes there anyway. You notice when you get back in the warm and it starts to melt and drip.

The trip to St. Paul on the 61 bus goes through a surprising number of plowed fields. It has a scrolling LED display that says "keep conversations respectful by using appropriate volume and language".


January 21, 2013

First day of work. Brain filled up around 11am. Many accounts set up. Got a lecture about product architecture. Introduced to many people I couldn't pick out of a police lineup at this point. There were multiple wikis. Got a login to the build machine, got code checked out (from svn) and compiled (not through official build thing, just by typing "make"), and did a big diff of the local tree vs upstream: 1.9 megs before I start filtering. Tomorrow the emulator guy should be in for me to try to boot a thing with a printk() in it so I can tell it's running my code. Printed out more stuff to read overnight.

Found a broom closet so close to work the commute is farther vertically than horizontally. Of the chunk of paycheck peeled off "per diem" it's more than I get in a week but less than two, and I can't argue with the location. Might be able to move in on the 28th. Staying at Kris's friend's place until then, commuting via bus.

Explored the habitrails of downtown St. Paul a bit: found a local branch of the bank Fade uses and a place that sells familiar energy drinks. Eventually made it to 6th and Wabbajack where something or other eventually got turned into the right bus.

Lost one pair of gloves already, bought a replacement. These are electrostatic gloves that work with phone touchscreens! Modern freezing your face off technology. (The high today was -3 Fahrenheit.)


January 19, 2013

In Minnesota, at my sister's. Woke up at 5:30. My sister woke up at 6:30. (I'm totally a night person but can get up in "the middle of the night" and then power through sunrise if I got enough sleep.)

Apparently Minneapolis has a vacancy rate of 2%, so finding a place to stay for 6 months is challenging. My sister lives over an hour from work so apartment hunting from here is a bit awkward, but this is why the internet was invented.

Niecephews are _exhausting_. (Four of them, age range: 5-13.) Having the 5 year old and the 7 year old hang off me at the same time is something my back did not enjoy.


January 18, 2013

And lo, I get on a plane to Minnesota in a couple hours, off on a 6 month contract working at Cray in St. Paul.

I cut a toybox release. It built the i686 target of aboriginal with cp and readlink enabled (and the corresponding busybox versions switched off), so that's probably a good stopping point. There's losetup in there too and several nice third party contributions and general cleanups.

I've been on the fence about doing an interstitial aboriginal release (the ls bug is embarrassing) but the deciding factor is now simply that my poor netbook can't grind out a dozen targets in less than a full day, and I haven't got that much time left. The kernel's up to -rc4, I should have non-work internet again by the time that ships. (Since I've updated the perl patches before the merge window for once, I should post them to the list, but balsa whitespace-damages everything and my python script to send stuff is still getting the header info subtly wrong. More todo items...)

My talk proposal for CELF ELC (on command selection in toybox) got accepted, so I should be in San Francisco in late February. First CELF I've attended since the Linux Foundation katamari'd over it, and I'm not attending the other stuff they've bundled with it, but the kernel.org guys are still nuts about key signing (because physical proximity proves you're harmless, that's why large men in dark alleys are so reassuring), and I should really deal with that. Besides, the hallway track is generally fun.


January 16, 2013

With a bit more time to think about it, the mknod case of cp -p isn't as horrible as I thought, because I can just create the sucker with the right permissions in the first place. For files the suid bit goes away if you modify the contents of the file, but with mknod I'm not modifying the contents. (Having the suid bit on a device node is kinda strange in the first place, but it's not my place to judge.) So the initial mask is pretty much right, modulo setting umask(0) for -p.

For directories I already make sure the sucker's writeable or I can't create new entries in it. This is another facet of the same atomicity problem: for a file O_RDWR|O_CREAT gives me a writeable filehandle even if the permission bits of the file it creates don't have the write bit. (This doesn't work on an NFS filesystem, which is NFS's problem not mine. Don't use NFS, it sucks in many subtle ways.) Creating a directory doesn't give me a filehandle, and if I do have a filehandle to a directory it's just a reference, permission checks get redone each transaction. (I think.)

Awful lot of fiddliness for 220 lines of code, isn't it? If you were wondering why I held off doing it for so long, I didn't want to leave it half-finished again. I want to get it _right_.

Still gotta do -sHL flags, re-read the posix spec to see what fun corner cases I might have missed, and plug it into aboriginal to find real-world breakage.


January 15, 2013

So fchmodat(AT_SYMLINK_NOFOLLOW) always fails, which is a glibc bug that uClibc blindly copied. That took a while to track down.

Context: I finally reached the end of a longish rathole making "mknod b blah 0 5 && cp -rp blah blah2" work in toybox cp.

Last time I mentioned I had to fork the -p logic so it could handle cases where we could and couldn't get a filehandle. The ones where we naturally already have a filehandle to the entity we're trying to set permissions, ownership, and timestamps on are nice and secure: there's no gap between fiddling with the object and fixing up its permissions where somebody could swap in a different object.

But mknod() doesn't give us a filehandle to the node it just created. That's ok, mkdir() doesn't either, in that case we can just open it on the next line and perform a longish dance of verifying that we didn't follow a symlink and that what we have now is a directory. Except that if mknod created a node that refers to a device for which no driver is currently loaded, we can't open it. (Even with 0 in place of O_RDONLY because O_RDONLY already _is_ 0. You can't _not_ request read access.)

Well, that's not quite true: 2.6.39 added O_PATH which gives you a filehandle for use with fchdir() and friends. It's not in Ubuntu 12.04 libc headers yet, but I can look up the value and #define it myself, and then open() accepts it. But if I try to set permissions through the resulting filehandle with fchmodat() it returns "Bad file descriptor". (It doesn't mind opening something that isn't a directory, but then there's _nothing_ you can do with the result.)

So rather than follow the mknod with an open(O_NOFOLLOW) I set the filehandle to AT_FDCWD and down in the -p logic I test for that and use fchownat(), utimensat(), and fchmodat() in that case (which modify a name relative to a directory filehandle), and in the other case use the more secure fchown(), futimens(), and fchmod(). I'm tempted to collapse the two forks together and just always use the lookup-by-name versions, but the original three functions I was using DON'T re-look up the entity by name, and re-looking it up opens a race window where somebody can swap in a different one. If I still have the filehandle to the file or directory I was just operating on, from a security perspective I want to use those original ones wherever possible and only fall back to the other three where that isn't an option.

All that I figured out a couple days ago, what ate a lot of debugging time since is the AT_SYMLINK_NOFOLLOW flag, which all three of the new permissions fiddling functions mention in the man pages. That flag tells it not to follow symlinks when modifying this entry, which is the behavior I want: symlinks are already handled further up in the function and -p doesn't apply to them because they don't have their own permission bits. (Hang on, they've got their own ownership and date stamps, so 2/3 of -p does apply. Right, another todo item.)

Anyway, the "race condition" I'm concerned about for cp mknod -p is a malicious user finding some cron job or something where root is doing a cp -a on something that has the suid bit set, setting up an inotify on the destination directory, and right when the resulting file is created deleting it and replacing it with a symlink. If cp then sets the suid bit on that symlink, it can give the user a suid executable with arbitrary contents, and congratulations you've cracked root on the system.

And, as I said at the top of this entry, fchmodat(AT_SYMLINK_NOFOLLOW) always fails. It doesn't even make a syscall, it tests for the flag in libc and throws a temper tantrum if it finds it. (Wrongbot's uClibc code even tests and fails twice: if any other flag is set, fail. If this flag is set, fail. Brilliant.)

They're doing that because there's no syscall, and they can't be bothered to come up with a workaround.

There's also no obvious way to get a filehandle to a symlink, so that _must_ operate by name.

I did an strace on the gnu/dammit version of cp to see what syscalls it uses, and it's doing lchown() with a path from the current directory. That's even _less_ secure, somebody could replace one of the intermediate directories with a symlink.

What I want is for open(O_PATH) to give me a filehandle to a directory entry that lets me modify the metadata but not the contents, and which can open a filehandle to a symlink rather than one the symlink points to. (This way, I don't need two -p codepaths, one that securely operates on filehandles and the other that _insecurely_ operates on names.)

Failing that, I want fchmodat() to implement AT_SYMLINK_NOFOLLOW.

Given that I've got neither of those, I'm going to have to come up with some horrible workaround. Grrr.


January 14, 2013

I'm continually frustrated by the way people keep calling money an illusion. It's a promise of value. This promise is an "illusion" the same way a signed contract is, or laws are, or shares of stock in a corporation. The difference between a promise and an illusion is that promises are honored and enforced by somebody specific.

Society is built on promises. The bill of rights is a list of promises. Citizenship is a promise. The only reason you can walk around safely is everybody else has promised not to kill you, and if they break that promise there are people promised to defend you a phone call away (police), and people promised to defend THEM (swat teams, national guard, army).

Yes, printing money is making new promises. But so is borrowing money: when you borrow money from a bank, the bank treats your promise to pay the money back as an asset. Every time you use a credit card, you're giving the bank a fresh promise to pay the money back, and the bank records that new promise in its books AS A TYPE OF MONEY. Larger transactions involve a web of interconnected promises: taking out a mortgage loan promises to give them your house if you don't pay (and promises that a third party sheriff will come and evict you from that house on the bank's behalf if necessary). Any of these promises can be broken, and the point of courts and judges and juries is to resolve broken promises without trial by combat or lynching.

I've previously called money just debt with a good makeup artist, but actually it's a slightly different type of promise. Taking out a mortgage loan promises to give them your house if you don't pay (and promises that a third party sheriff will come and evict you from that house on the bank's behalf if necessary). Using a credit card is just the raw promise that you'll pay it back.

The government prints money for the same reason the government passes laws: it's the organization we've delegated the authority to make really BIG promises to, and which is in charge of enforcing those promises. They print money the same way they issue laws. Sometimes it's a bad idea, but sometimes NOT doing it is a bad idea. (There are people who think every law is a bad idea. If they really think so they should boycott dollars, which are backed by laws saying they're legal tender.)

The government can collect taxes as an alternative to printing money, and the reason to do that is to prevent inflation. Except that right now the economy has a shortage of money in circulation, and because nobody is willing to take a pay cut this manifests as high unemployment instead of massive deflation. (Instead of prices going down 10%, everybody who has a job gets zero pay raises for five years while ten percent of the people have no job for all that time. This is called "downward nominal wage rigidity".) A couple years of 5% inflation would actually fix what ails the economy, at the expense of making rich people poorer, and we have the best congress money can buy so they've been fighting to keep it broken because fixing _other_ people's problems doesn't interest rich people who don't need a job.

Promises are vitally important to a functioning society. Taxes are backed by a promise to confiscate and imprison if you don't pay. Stop signs and traffic lights and lines painted on roads are backed by a promise that police will arrest you if you don't obey them. Writing a check is a promise that there's money in the account. When you deposit money in the bank, the bank is promising to give it back.

People who don't understand this think commodity money is different. "Gold isn't a promise!" Stick that person on a desert island with all the gold they can eat and come back in a year. Commodity money was always a promise of value. People who think having gold is somehow better than having dollar bills miss the point that neither is useful unless people are willing to exchange them for things you want at a future date. If people stop being willing to exchange them (as happened to all sorts of commodity money: cocoa beans in south america, cowrie shells in the marshall islands) they lose their value.

Using metals as commodity money is no different, look at the wildly fluctuating price of silver: in 1840 it was $1.29/oz, but extensive silver mining drove down the price to 64 cents by 1900 and 34 cents by 1940. Inflation brought that back up to $1.63/oz by 1970, and then commodity speculators drove it up to $16.39/oz in 1980... before crashing back to $4.06/oz in 1990. The value of silver has historically been as volatile as beanie babies and tickle-me-elmo dolls.

The gold bugs try desperately to pretend that silver was always secondary to gold... except that Rome demanded tribute in talents of silver, the coin of Athens (the drachma, nicknamed the "owl") was 4.3 grams of silver from the mines at Laurion, the bible has Judas paid with thirty pieces of silver, "cross my palm with silver", and so on. In greco-roman times, gold was secondary to silver. (Sure gold was rarer and more valuable, but platinum was rarer and more valuable still, and who actually pays for anything with platinum bars? It doesn't get USED for anything outside of D&D campaigns and the recent attempt to exploit a legal loophole.)

The massive amounts of silver and gold brought back from the Americas changed gold's secondary status, because there was now enough of it to encounter regularly. The huge influx of precious metals also caused massive inflation throughout europe, which was GOOD. The increase in the money supply let the european economy expand without immediately going into the same kind of depression we're in now (a liquidity crisis: not enough money to let all the people buy all the things, so you have unemployed people sitting next to unsold things and no way to connect the two). When people got used to an expanding economy they invented paper money to keep the economy expanding after the extra gold supply dried up; a bigger economy needs more money or it slides into depression.

None of this is new, but it's worked so well for so long we forgot the difference between an aqueduct and a river. That we invented mechanisms to scale things beyond what "just happens naturally", and that those mechanisms need to be _operated_. The same idiots screaming that the melting glaciers are just happening without a cause, that evolution is a lie (and thus it's ok to use 80% of all antibiotics in cattle feed because antibiotic resistance is the will of Zeus instead of a trivial example of evolution in action), screaming to keep the government out of their medicare...

They're in control of the house of representatives and filibustering the senate, and they're trying to cut spending in a depressed economy. They're literally arguing for an end to unemployment benefits when there are three job seekers for every position nationwide so it mathematically CAN'T matter how motivated the job seekers are, there's still more pegs than holes.

It's really, measurably stupid. There are REASONS they are wrong, if you just bother to understand the system, which they don't because they think they already know everything. But driving a car does not make you a mechanic. Getting rich doesn't teach you how the economy works any more than a farmer whose crops get enough rain learns how to predict (let alone control) the weather. My first year in Austin we had a bumper crop of crickets an inch deep, if there was such a thing as cricket farming somebody would have been rich, and would forever after insist they had earned it.

The "end the fed, printing money is a sin" idiots are the exact same guys who want to drown the government in a bathtub (no new taxes, no new laws), and who insist everyone carry a gun because the promise of protection offered by the police can't possibly be trusted. These people do not believe anyone else's promises, because their own are easily broken. They're dishonest bastards whose word is worthless, so of course they don't trust anybody else, which means they can only function in society with the aid of massive piles of cash to cover their perpetual lapses. They either wind up rich (theft is profitable) or in jail. Society doesn't (can't) work the way they think it does, and when they try to change it to fit their prejudices they destroy whatever portion of it they've managed to steal. (They're great at figuring out what they can get away with. They've got the House due to gerrymandering, is that theft? Losing the popular vote but gaining the electoral vote is their stock in trade, gaming the system is what they do. Because they don't trust promises, their own or anyone else's.)

The whole trillion dollar coin thing was just an end-run around these morons, who've worked a contradiction into the law. The government promised to spend a certain amount of money (the budget congress passed), and the government promised not to tax, borrow, or print enough money to spend the amount in that budget. Paradox. The promise breakers are trying to force the government to break its own promises, to show that all those OTHER promises it's issued (from laws to dollar bills) are as worthless as those from the GOP. The platinum coin loophole let the feds print enough money to keep their promises without hoping the crazies had a moment of clarity.

For a while there, people thought Obama had a spine. In reality, the stiffening effect was a little blue pill Obama takes to sustain elections, and his spine goes all floppy again afterwards.


January 13, 2013

Back dinking at toybox cp: the mknod should be using st_rdev (device ID of special file) instead of st_dev (device ID of containing filesystem). Oops.
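For the record, a sketch of the corrected call, assuming a hypothetical helper named copy_node() (not the actual toybox function) that takes the stat result of the source node:

```c
#include <sys/stat.h>
#include <sys/types.h>

// Hypothetical helper: recreate the special file described by src at dest.
// Returns 0 on success, -1 with errno set on failure.
int copy_node(const char *dest, const struct stat *src)
{
  // st_mode carries the type (S_IFCHR, S_IFBLK, S_IFIFO...) plus the
  // permission bits. st_rdev is the major/minor the node refers to.
  // st_dev would be the filesystem the source node happens to live on,
  // which is the wrong number to hand to mknod().
  return mknod(dest, src->st_mode, src->st_rdev);
}
```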

Unfortunately, this doesn't fix the case where we just did a mknod for a major/minor pair with no associated driver, so any attempt to open it fails, so fchown() has no filehandle to operate on to change ownership. The reason is that O_RDONLY is 0, so you can't _not_ request read permission when opening a file. (Hello PDP-11 unix from 1970.) What I need is some kind of O_NOTHING bit that _just_ gives me a filehandle I can manipulate metadata through, but not contents. Some discussion wandered by on linux-kernel about this, but it's not in the open(2) man page of current ubuntu LTS.

For most things, implementing -p just means keeping the original filehandle open and doing some extra stuff on it at the end. This avoids race conditions where the file you copied data to or the directory you created files in gets replaced by another program between your first and second actions on it. (Which could be a security problem. Assume there's a suborned flash plugin running, but it shouldn't be able to crack root by dropping a symlink somewhere root's about to flip the suid bit. And this is why the multi-user nature of minicomputer Unix is still relevant on phones, it means your web browser can't install keylogging device drivers.)

Alas, mknodat() doesn't return a filehandle, so I have to go back and look it up by name to do further operations on it, with a gap that allows a different filesystem object to be substituted. I can feed all three functions AT_SYMLINK_NOFOLLOW, but I'm still uncomfortable.

I'm also uncomfortable with having two codepaths that do the same thing. If I'm going to have fchownat() instead of fchown() shouldn't I force everything through it? (Except mknod is rare, and the filehandle version is more straightforward for the common case of files and directories. And I still need the equivalent of the filehandle anyway to signal whether or _not_ to perform the modifications because you don't for hardlinks and symlinks. Grrr.)


January 12, 2013

Sigh. The balsa email client is really frustrating.

Right click and copy link doesn't work, so I cut around the link and copy that instead. (Why that works and right click copy link puts nothing in the clipboard, no idea.) Email filtering doesn't work (the rules never trigger), so I wrote a python wrapper that runs each time balsa exits and chops up the messages in the inbox mbox, shuffling them to the right folder.

This means the filtering only happens when I exit balsa, which I can't do if I have pending reply windows up, so I tend to read the raw unfiltered inbox a lot. (This means I'm reading a lot more of linux-kernel. It also means I'm days behind on email pretty often.)

So I'm seeing messages containing patches, which I need to collect and deal with later. Cut and paste converts tabs to spaces, but worse the balsa developers didn't bother to implement window scrolling during cut and paste, so I can only cut what's currently showing on the screen, and the patch never fits. So I want to right click and save the message: the balsa guys didn't implement that. So I want to right click and copy the message to a different folder (I can create an empty one to collect them, then forward later. Note that forwarding messages _as_attachments_ does whitespace damage). But the balsa developers didn't implement "copy message" either. (Move yes, copy no.) View source: same scroll problem, and it gives me the raw mime with = escapes and wordwrap breakage.

I eventually found out how to get it to do what I want, double click on the message in the message list to pop it up in a window, then "message" (pulldown menu #5), then save current part.

Ubuntu LTS hasn't updated balsa since it shipped, and building it from source still requires installing dozens of packages beyond my patience for dealing with gratuitous dependencies. (I don't want a spell checker. Three spell checker packages and it still wants more: no.)

Sigh. I should restart my search for a viable email client, but trying to find reasonable user interface stuff on linux is an enormous production. (So instead I've written a python script to send email in a way that won't whitespace damage patches, although I've still got to fix the "to" addressing logic.)


January 11, 2013

I am not a fan of SELinux and friends. Maybe I'm being too hard on them, but I doubt it.


January 10, 2013

Heh. According to the history of rome podcast I'm listening to, "senate", "senior", and "senile" come from the same root word: senate was "council of old men". Political offices in Rome were unpaid, thus "office" and "honor" come from the same root word. Running the government was a thing rich people did in their spare time.

"Release early release often" is good advice, and I want to do that with Toybox. When I checked in cp, I started poking at a release.

My process for cutting a toybox release is to stick it into aboriginal linux and build all the targets. That's how I get the static binaries. It's also a good smoketest that it _does_ build with the new toybox commands replacing yet more of busybox.

Alas, cp didn't. Or rather it _almost_ did, but the gcc source tarball keeps re-extracting itself every time it's set up, which screws up parallel builds (because they try to rm/extract/patch the same thing at the same time, this is why "EXTRACT_ALL=1 ./download.sh" can do it all up front before a parallel build, instead of the lazy binding variety build.sh defaults to.)

The problem boils down to "cp -lf", because -f only applies to the IS_REG() file copy, and -l needs the same "it didn't work, delete the destination and try again" -f behavior to avoid cp returning an error code.

So I need to shuffle the code some more. The _easy_ thing I could do is stick in another goto. (Right now the -p logic to adjust the file's metadata after creating it has a goto so the directory creation and file creation can share the same code.) The -f retry stuff either needs to be duplicated or it needs to be able to jump _back_ to retry. With a counter variable so it doesn't potentially get stuck in a loop. Sigh. (My main objection to adding a big loop around everything is I have to reindent it all and it makes massive diff noise due to the whitespace change. Oh well.)

Ok, what are the cases here: directory, hardlink, symlink, mknod (char, block, fifo, socket), file. Hardlink and symlink don't need -p, but the mknod variants do and currently aren't getting that right.

The -f unlink/retry logic is needed for all the cases, but only for failures to update the _target_, but not for failures to read the _source_. So copying a directory without -r, readlinkat() failing, openat() failing mean don't unlink/retry. (And, of course, not having -f.)

Sigh. Went through the work to recombobulate everything, and now it's giving me the 'src' is 'dest' complaint for a bunch of files. (And a different test with "sudo cp -rp /dev/null blah" is giving me a "no such device or address" error attempting to open the file for -p after the mknod.) More digging to do...


January 9, 2013

Listening to Terry Gross interview Lemony Snicket (who sounds remarkably like Wyatt Cenac on the daily show). He's saying he couldn't explain to his own kid (age 9) why the Beatles called themselves that. Terry said it might have been an homage to Buddy Holly's band "The Crickets".

I thought it was a pun on "beat", as in the beat goes on. Hence the spelling "The Beatles" instead of beetles. (But nobody ever explained that to me, I just guessed for myself...)

So I continue to hate git. If your repository is modified (such as having patches applied), "git show" appends a diff, which craps lots of text into build/MANIFEST that's just trying to show the version number. This broke my manifest generator in sources/functions.sh, so I go to fix it.

In mercurial if I don't remember how to ask for the current commit hash (and nothing else), I go "hg help" (83 lines), look at the list of commands, see "hg identify", do "hg help identify" (29 lines), and find out I can do hg identify -i or -n depending on whether I want the hash or commit number in the local repository. (The number is just the count of commits in the repository when this one was added, and since it's locally unique, a simple increasing number that's easy to understand, and my copy's the master for my projects, I use that to identify versions.)

Trying to ask git how to display the current commit hash, and NOT crap 12 random other things out so I have to dig what I want out with sed, is a PROJECT. Starting with "git help" just shows a subset of the commands, you need "git help git" (not the same output, and 901 lines) to get the full list. Spending several minutes reading that, I still have no clue where to go next. I tried "man git show" (416 lines), and that wasn't it. I vaguely recalled "git display" which doesn't exist, and dig through my old blog entries to find git describe. So "git help describe" (153 lines) seems to say it can't do it. (The closest is "git describe --exact-match" which spits out an error message if there's no tag for this commit.)

Eventually I tried beating something out of "git log", and on line 789 of the man page it finally told me enough to figure out "git log -1 --format=%H".

This took more than 15 minutes, and the answer was found more or less by trial and error. Compare that to thirty seconds or so mercurial took to figure out the same thing. This is why I hate git.


January 8, 2013

Heh. Adobe recently released a 10 year old version of photoshop as freeware so they could shut down the old license server, except they say you should only use it if you bought a copy.

This reminds me of the way pirating your own product is a time-honored tradition within the software industry, going all the way back to the creation of proprietary software by the Apple vs Franklin lawsuit in 1983. Pirating your own stuff is a way of both attracting new users and suppressing competition, plus any product that _isn't_ pirated is viewed as a dud because obviously nobody wants it unless forced by their employer. (If nobody _bothered_ to crack it, it must be really bad.)

This is something almost everybody in the industry does, they just won't admit to it (except off the record at tech conferences). I'm not talking about junior engineers sneaking out copies, but the actual owner of the intellectual property uploading their crown jewels to bulletin board systems (or these days, bittorrent) as an intentional part of their business strategy. Years ago an ex-Microsoft engineer told me he watched Steve Ballmer personally upload "cracked" versions of the chinese/korean/indian translation of a new Windows release to various pirate sites because he would rather they run a pirated version of Windows than a non-windows OS. Far easier to convince users of pirated versions to "go legit" next upgrade than to convince Linux (or OS/2, or BeOS, or...) users to pay to switch to a different OS. (Of course Microsoft was founded by a law school dropout whose father is a lawyer, so they sue reflexively. The purpose of Microsoft's "Business Software Alliance" sock puppet was to shake down people who are already using the software. If nobody bothers to pirate it, there's nothing to sue over. Of course this doesn't always work...)

So if you were wondering about the l33t hacking skillz behind all that "zero day warez" stuff... yes there are people who can do that, but their services generally aren't required.

Personally, I prefer the honesty of the open source guys. We're also giving it away for free, the difference is we happily admit it. :)

(And yes, this is the same kind of thing that got Viacom in trouble when they sued youtube... over video Viacom's own employees had uploaded to youtube as part of their jobs. Exact same kind of self-piracy is near-ubiquitous in the software industry. With the exact same kind of secrecy, because they don't want to give up the right to sue.)


January 7, 2013

Finally got cp checked in. Lotsa fun fiddly corner cases, probably still plenty more I haven't tested yet, but it's to the point where it's worth testing an Aboriginal Linux build against it.

Actually I haven't implemented the mknod() or mkfifo() bits. (Right now it handles those as cat > dest, which isn't going to work particularly well for /dev/zero.) Hmmm, when does that kick in? (For cp -a, probably for cp -p too?) According to "man 2 stat" a file can be regular, extra crispy, rotisserie style... Ahem. Regular, directory, char, block, fifo, symlink or socket. (That's unix domain sockets, "man 7 unix". I have no idea how cp is supposed to handle domain sockets. I wonder if the spec says...?)


January 4, 2013

I submitted three talk proposals to CELF (well, two talks and a BOF). One was "debugging your way out of a paper bag" which is just a bunch of war stories about debugging my way into the kernel and back again (and what happened after) with an eye towards never being intimidated by any problem you can reliably reproduce. (Cue Sark: "gdb can't help you now, my little program".) Another was about my experience rewriting the Linux command line from scratch twice (once in busybox and again in toybox: what a Linux system actually _needs_, the command selection criteria, what the actual standards are and why they're not good enough, etc). The third is a BOF about helping Android replace the PC, which is 5 minutes on my mainframe->minicomputer->micro/personal computer->smartphone rant, the need to be self-hosting, and how to get there. Then the rest of the time open to the floor (and unlike the darn compiler BOF, with it _NOT_ turning into me rambling for an hour while people ask me questions, although if Rich Felker showed up I'd totally hand the floor to him).

I doubt any of them will be approved because the Linux Foundation annexed the Consumer Electronic Linux Forum and all its functions now belong to the Master Control Program and together they are complete and so on, and those guys do not respond well to hobbyists. (I'm not sure they know what a hobbyist is, they wanted to know what company I represented in the application form.) But it seemed worth a try.

So I was really paranoid about testing the "rm" command, but testing "cp -r" turned out to be the dangerous one. Forgetting to discard "." and ".." while traversing: bad. Forgetting to add the check that source file and target file aren't the same device+inode: bad. (So it opens src/../blah and dest/../blah which are the same file, the latter with O_TRUNC, reads nothing from it, writes nothing to it, closes the now zero-length file.)

Having fairly recent backups: good. Still took a couple days to work through what got damaged and what needed to be restored.
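The two checks whose absence did the damage are each about one line. A minimal sketch, with helper names (same_file() and is_dot_or_dotdot()) invented for illustration rather than taken from the toybox source:

```c
#include <string.h>
#include <sys/stat.h>

// Nonzero if both stat results describe the same filesystem object.
// Compare these BEFORE opening the destination with O_TRUNC.
int same_file(const struct stat *a, const struct stat *b)
{
  return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}

// Nonzero for the "." and ".." entries readdir() hands back, which a
// recursive traversal has to skip or it wanders up out of the tree.
int is_dot_or_dotdot(const char *name)
{
  return !strcmp(name, ".") || !strcmp(name, "..");
}
```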

Right, moving on from that: I noticed that cp contained a repeated idiom "if (!(x=strrchr(blah, '/'))) x=blah; else x++;" which is x=basename(blah). I started writing my own basename() because the man page said the libc one could modify its argument and couldn't figure out WHY, but while writing it I started adding the code to truncate trailing slashes and went "oh, that's why". Which is a case that the existing repeated code gets wrong, and the test suite doesn't check for ("mkdir one two; ln -s one/ two" and ln complains that two/ exists.)
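A sketch of what the trailing-slash-aware version might look like. The name last_component() is hypothetical, and this is not the function that ended up in toybox. Like the libc one, it has to be allowed to modify its argument, because trimming "one/" down to "one" means writing a NUL into the string:

```c
#include <string.h>

// Hypothetical helper: return a pointer to the last component of path,
// trimming trailing slashes in place. A path of only slashes yields "/".
char *last_component(char *path)
{
  char *end = path + strlen(path), *s;

  // Chop trailing slashes, but never shorten the string below one char.
  while (end > path + 1 && end[-1] == '/') *--end = 0;

  // The old strrchr() idiom, plus a check so a lone "/" returns itself
  // instead of an empty string.
  if (!(s = strrchr(path, '/')) || !s[1]) return path;

  return s + 1;
}
```

With this, "one/" comes back as "one" and "a/b//" as "b", where the bare strrchr() idiom returns an empty string for both.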

So going through the tree and fixing that, and inserting some extra tests in the test suite. While doing that, I wound up cleaning up rmdir -p to deal with trailing and repeated slashes and spent the longest time trying to find a reversed test. (Isn't that always the way: I knew what I meant not what I wrote, so had to go through the whole song and dance of narrowing it down to THIS line is where it deviated from my expectations and why is... because there's a ! missing. Right.)

Meanwhile the -rc2 kernel shipped a couple days ago and that's about the point where I should start paying attention. Alas, my netbook is not much of a build machine and I goodwilled securitybreach (nowhere to set it up, it's LOUD, turns any closet into a sauna, and turning it on only when you need it is no fun because it takes several minutes to boot up with all the Dell server BIOS crap). An equivalent replacement system is about $600 these days (hyper-threaded 4-way with 32 gigs ram and a terabyte of disk), I might get one in Minnesota.


January 3, 2013

People are talking about the debt ceiling and how Obama is too spineless to print money (via the platinum coin loophole) to render the nihilist GOP irrelevant. So here's another thing Obama's too spineless to do.

If we hit the debt ceiling, state payments should go to those who voted to honor the government's obligations (I.E. raise the debt ceiling to tax, print, or borrow money they've already passed laws obligating the government to spend). Divide each state's payment by the number of representatives in the blocking body and award one share for each yes vote. So if a state has 7 representatives in the house and 3 voted yes, they get 3/7 of the money they'd otherwise be entitled to. This includes social security checks, medicaid, VA benefits, and all military contracts payable to entities in that state (which are collectively the majority of the budget).

Note this amount will be payable from tax revenues, because the percentage of "yes" votes that goes over the amount payable from tax is enough to raise the debt ceiling. It impartially focuses the default on those who voted to default.


January 2, 2013

WOW I've been more productive when I can actually _see_.

When you cp -R the destination could technically have a symlink where you expect a directory, and thus follow it to copy into areas it shouldn't. This is pretty clearly pilot error, though.

I need to have perror_msg() set TT.exitval = 1 and do a cleanup pass removing all the resulting unnecessary manual exit value setting. Ok, switch gears and deal with that...

Trimming some error messages while I'm here. The point of perror_msg("blah") is that it prints "commandname: blah: syserror output". So instead of realpath going "cannot access '%s'", if it just says "%s" you get "realpath: input/file/name: Discombobulated Inode" or whatever the problem you had was. The _big_ advantage of this is there's nothing to translate to other languages. The command name's mostly posix, and the error messages come from libc (which should be locale appropriate for us).
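
To make the message layout concrete, here is a rough sketch that builds "commandname: blah: syserror output" into a buffer. It is not the toybox perror_msg(): the name format_perror() and the explicit command and errno arguments are assumptions for illustration (the real function takes those from global state and prints to stderr).

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

// Hypothetical sketch: write "command: <formatted message>: strerror(err)"
// into buf, truncating if it does not fit. Returns the length written so
// far, snprintf()-style, or a negative value on a formatting error.
int format_perror(char *buf, size_t size, const char *command, int err,
                  const char *fmt, ...)
{
  va_list va;
  int len, more;

  assert(buf && size && command && fmt);
  len = snprintf(buf, size, "%s: ", command);
  if (len < 0) return len;

  // The caller's message: ideally nothing but "%s" and a filename.
  if ((size_t)len < size) {
    va_start(va, fmt);
    more = vsnprintf(buf+len, size-len, fmt, va);
    va_end(va);
    if (more < 0) return more;
    len += more;
  }

  // The only English-looking text comes from libc's strerror().
  if ((size_t)len < size)
    len += snprintf(buf+len, size-len, ": %s", strerror(err));

  return len;
}
```

Called as format_perror(buf, sizeof(buf), "realpath", errno, "%s", name), the only text that would need translating is whatever strerror() returns.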

Hmmm, there's a repeated idiom (head, cat, wc, cksum, dos2unix...) of:

for(;;) {
  int len = read();
  if (len < 0) error_msg();
  if (len < 1) break;
  do stuff;
}

Not quite sure how to factor that out into a library function. (Ok, I could trivially make a MACRO but that's not the point. That doesn't eliminate duplicate code, it just hides it.)
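
One possible shape for such a library function, sketched with a callback. The names read_chunks() and count_bytes() are hypothetical, and the EINTR retry is an addition the pseudocode above doesn't have. The catch is that "do stuff" now has to reach its state through a context pointer, which may be more awkward than the loop it replaces.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

// Hypothetical helper: read fd to end of file, handing each chunk to a
// callback. Returns 0 at EOF, -1 if read() failed (errno is left set).
int read_chunks(int fd, void (*chunk)(const char *data, size_t len, void *ctx),
                void *ctx)
{
  char buf[4096];

  assert(chunk);
  for (;;) {
    ssize_t len = read(fd, buf, sizeof(buf));

    if (len < 0) {
      if (errno == EINTR) continue;  // interrupted by a signal, try again
      return -1;
    }
    if (!len) return 0;              // end of file
    chunk(buf, len, ctx);            // the "do stuff" part
  }
}

// Example callback in the spirit of wc -c: add up the bytes seen.
void count_bytes(const char *data, size_t len, void *ctx)
{
  (void)data;
  *(size_t *)ctx += len;
}
```

A caller would write size_t total = 0; read_chunks(fd, count_bytes, &total); and report its own error if the return value is -1.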

Sigh. Part of the reason I'm doing a manual pass on all the "find toys -name *.c | xargs grep -A 3 error_msg" output is to find possible cases where error_msg _shouldn't_ set the error return code. I found one: in passwd.c it's doing error_msg("Success") and there is a REASON that command is in the "pending todo items" category in status.html. Need a big cleanup pass on that...

And now cp -p has to care about nanoseconds. (Standards are moving targets when you mothball a project for a few years...)
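
For reference, a sketch of what caring about nanoseconds looks like with the POSIX 2008 interfaces: struct stat carries st_atim and st_mtim as struct timespec, and utimensat() sets both. The helper name copy_times() is made up, and this is not necessarily how toybox cp does it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>

// Hypothetical helper: give dest the same access and modification times as
// src, including the nanosecond fields. Returns 0 on success, -1 on error.
int copy_times(const char *src, const char *dest)
{
  struct stat st;
  struct timespec times[2];

  assert(src && dest);
  if (stat(src, &st)) return -1;
  times[0] = st.st_atim;  // access time: tv_sec plus tv_nsec
  times[1] = st.st_mtim;  // modification time

  return utimensat(AT_FDCWD, dest, times, 0);
}
```

Whether the nanosecond part survives depends on the filesystem's timestamp granularity.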

Memo: when writing a new cp implementation, the first time you test cp -r is _not_ the best time to find out A) you forgot to filter out "." and ".." from the directory traversal, B) you forgot to check the source/target files for dev/inode equivalence before opening O_TRUNC and copying the contents.

Sigh. This is why I have backups...
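
The two checks from the memo above, sketched as small helpers. The function names are hypothetical; the comparisons themselves are the standard ones (a file's identity is its st_dev/st_ino pair, not its path).

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

// True for the "." and ".." entries readdir() returns, which a recursive
// traversal has to skip.
int is_dot_or_dotdot(const char *name)
{
  assert(name);
  return !strcmp(name, ".") || !strcmp(name, "..");
}

// True when two stat results refer to the same underlying file. Check this
// before opening the destination with O_TRUNC.
int same_file(const struct stat *a, const struct stat *b)
{
  assert(a && b);
  return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}
```

Paths like src/../blah and dest/../blah compare as different strings but produce identical stat results, which is why the check has to be on dev/inode.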


January 1, 2013

I'm not ready for it to be a different year yet. As usual.

Digging through the clutter in the guest room and moving bits to storage, I found my previous set of glasses in a box, and... wow. It's _so_much_better_ than my current set, even with the damaged lenses. (Apparently the lens coating was soluble in bug spray.) Distance viewing isn't the greatest, but up close I can FOCUS ON THE SCREEN.

I had no idea how much my ability to concentrate was screwed up by having to reread each line three times at enormous font size to see it, and getting massive eyestrain every ten minutes. (Well, I suspected, but _dude_.)

Right, cp. Dig dig... (I feel like Agatha Heterodyne dealing with Gilgamesh Wulfenbach's falling machine: that can go, that can go...) I need to redesign xreadlink() because I mostly need readlinkat() instead. Really it should do the "feed it NULL and it allocates its own damn buffer" trick.
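
A sketch of the "allocates its own buffer" idea on top of readlinkat(). The name readlinkat_alloc() is hypothetical, and unlike an xblah() wrapper this version returns NULL rather than exiting, so it shows only the buffer-growing part.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

// Hypothetical sketch: readlinkat() into a malloc()ed buffer, doubling the
// size until the link target fits. Returns a NUL-terminated string the
// caller must free(), or NULL on error.
char *readlinkat_alloc(int dirfd, const char *name)
{
  size_t size = 64;
  char *buf = NULL;

  assert(name);
  for (;;) {
    char *bigger = realloc(buf, size);
    ssize_t len;

    if (!bigger) break;
    buf = bigger;
    len = readlinkat(dirfd, name, buf, size);
    if (len < 0) break;

    // A result that fills the buffer may have been truncated: go around
    // again with more room. readlinkat() does not NUL terminate.
    if ((size_t)len < size) {
      buf[len] = 0;
      return buf;
    }
    size *= 2;
  }
  free(buf);

  return NULL;
}
```

Passing AT_FDCWD as dirfd gives plain readlink() behavior.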

All the xblah() wrappers exit the program if an error that should never happen actually happens. In the case of memory allocation, malloc() should never return NULL because all it does is allocate virtual address space which gets asynchronously populated later as we actually dirty the memory and generate page faults.

The page fault handler can reclaim physical pages from other users and write them to disk, and use the freed up page to satisfy the allocation. (This is called "page stealing". You give it back when the other process needs it, by stealing it back from _this_ process. Or get another one from somebody else. Or if you're lucky other allocations will be freed by then and there will be actually unused memory. And yes, you can steal pages from elsewhere in the same process. Happens all the time.)

If a page was allocated via mmap() on a file, the page can be written back to that file (or simply discarded if it hasn't changed since it was read from the file, this is why there's a "dirty bit" in the page tables). If it's an "anonymous" mapping (no associated file, technically mmap(NULL)) the page fault handler can write the page out to swap space. Of course what the fault handler _does_ is suspend the faulting program, schedule a DMA transfer, set some _other_ program running, and then when the DMA finishes it sends an interrupt to notify the fault handler the write is done and it can go reuse that page now, so it hooks it up to the faulting process's page tables, either zeroes it out or schedules a _second_ read to pull in the appropriate contents from mmap(file) or swap, and then when _that's_ done it unblocks the program and allows the attempt to access the now-restored page to continue. So you're overlapping processing with I/O wherever possible.

Anyway, the point is you're not really out of memory until you exhaust physical memory AND swap space AND all the mmap() pages. And of course disk cache, which is another big user of physical memory (usually about 1/3 of it). If you allocate all your pages for programs, disk access will slow to a crawl: every time you open /var/log/wtmp it has to read through the root directory to find "var", go open the file containing the contents, read through _that_ to find "log", go open the file containing its contents, read through to find "wtmp", look up where _that_ stores its data... We're talking dozens of disk accesses to track down where 1000 bytes actually live on disk. You don't notice because it caches all that after the first access.

It turns out that if you pin the system against a wall with allocations it starts thrashing. Long before it actually deadlocks itself (gets into a position where there are NO pages left to satisfy any allocations, and every running process has the next page it needs swapped out and everybody's blocked waiting on everybody else), the system runs out of good decisions to make when it needs to steal a page. So instead of stealing a page that won't be used for a while, it steals a page that the process will need soon. So every time the process is unblocked it's almost immediately re-blocked needing to swap in the next page, and the entire system slows to a crawl spending all of its time swapping out pages that immediately get swapped back in.

A badly thrashing system slows to a crawl, to the point where tasks that ordinarily take a fraction of a second take minutes to complete. If work is coming in externally (such as to a web server), every request starts timing out. A server that gets into this state may never clear itself without external intervention because instead of handling 10,000 requests a second it's handling twelve (and those twelve are erroring out and aborting but it takes 5 minutes of grinding for the system to come to that conclusion).

Note that thrashing means a _transient_ load spike will take a server down and keep it down. If your normal 500 requests per second suddenly spikes to 20,000 requests per second, and then goes back to normal after a minute or so... the server stays down. Because if it's only retiring 12 requests per second, the normal 500/second is still well above what it can handle.

This is why a good web server will respond to load spikes by dropping connections early. It'll back down a bit from the edge and cap its load at maybe 8000 simultaneous transactions it can handle safely and everybody else gets a static "sorry, try again" page (or just a dropped connection with no data). If they hit retry, they'll probably be in the 8000 this time around (or the time after that), and get their page in the same 1/2 second it takes everybody else. This means that transient load spikes clear immediately, and even during the spike people pounding "retry" aren't actually making things worse. And nobody's waiting 3 minutes to see _if_ the page could load.
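
The cap itself is a very small mechanism. A single-threaded sketch, with made-up names (struct limiter, limiter_admit(), limiter_release()); a real server would need atomic or locked counters, and would pick the limit by measurement:

```c
#include <assert.h>

// Hypothetical admission control: cap the number of in-flight requests.
struct limiter {
  int active;  // requests currently being served
  int limit;   // cap chosen safely below the point where thrashing starts
};

// Returns 1 if the request may proceed, 0 if it should get the cheap
// "sorry, try again" response instead.
int limiter_admit(struct limiter *lim)
{
  assert(lim);
  if (lim->active >= lim->limit) return 0;
  lim->active++;

  return 1;
}

// Call when an admitted request finishes, freeing its slot.
void limiter_release(struct limiter *lim)
{
  assert(lim && lim->active > 0);
  lim->active--;
}
```

Rejected requests cost almost nothing, so a retry storm during a spike does not push the admitted work over the edge.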

The linux OOM killer is similar: if the system deadlocks, kill processes until you've freed up enough memory for the survivors to continue. Unfortunately, this is a nasty problem. Ignoring for the moment the decision of WHICH processes to kill (which ain't easy), how do you tell when the system's deadlocked? It goes through thrashing first, and thrashing can literally take DAYS to resolve into an actual deadlock. I remember the 2.4.7 kernel with a memory manager prone to thrashing when I opened too many browser windows. More than once I left a system thrashing when I went to lunch (no more requests, just finish the tasks you're already processing), and when I got back sometimes it had recovered, sometimes it had deadlocked, and more than once it was STILL THRASHING. The difference from a user's perspective between thrashing and deadlock isn't that big: "I move the mouse and cursor doesn't respond for 30 seconds" is equally useless from both causes.

This _infuriates_ the mathematicians, of course. People who want perfect behavior from the system, and believe that intentionally discarding a user request is just offensive. If you're wondering where the phrase "the perfect is the enemy of the good" comes from, it's these guys.

These are the same people who insist that the return value of malloc() should tell you whether or not there was enough memory. If you just leave an "emergency pool" of enough memory idle and only used in emergency situations... except defining an "emergency situation" is a subset of "figuring out when we're thrashing", and every time you try to work out how big the pool has to be to guarantee you can recover from thrashing you wind up doubling the amount of memory in the system and leaving half of it idle and STILL not being sure it's enough...

Why so much? Because memory is shared, because clean memory can be dirtied, because when a gigabyte-sized pig like firefox or openoffice calls fork() the new copy might dirty all its memory, but most of the time it will just exec() a new program and discard its existing mappings... and you can't tell. Predicting a program's future memory access patterns is PREDICTING THE FUTURE.

The relationship between virtual and physical address space is illusory. The easiest way for a 32 bit system to run out of virtual address space is to attempt to mmap() large files, so when you _do_ run out it's not because you ran out of physical memory. At the other end, running 1000 copies of "bash" involves each process mapping the bash executable into its address space, and the way dynamic linking works (self-modifying-code!) every page of that mapping _could_ be dirtied. The operating system can't tell. The bash executable is about a megabyte, and bash forks a new short-lived copy of itself every time you use parentheses or pipes. In the real world, how does vetoing process creation halfway through a shell script (because the number of _potential_ writeable pages in the bash executable mapping goes over the physical memory installed in the system) differ from the OOM killer? By the time you ask how many pages of buffer two ends of a pipeline might have in flight between them, how much disk cache a process is allowed to dirty writing files to disk before you block the process and let the buffer drain... the ivory tower burns to the ground before you even get to the _hard_ questions.

So let's get back to "checking for a NULL return from malloc and attempting to actually _recover_ is useless". If this ever actually happens, the system is probably hosed to the point of needing a reboot. Your recovery code will never get any non-artificial testing. All the page fault handler can do is send the program a signal to kill it (the program has moved on by the time the system _actually_ runs out of memory), so the point of the wrapper is to make it so nobody _else_ ever has to check. If it triggers, it's like disk errors coming back from rotating media: the system needs medical attention.

(Note: disk cache means actual problems writing the files to disk often happen after the file is closed and the process has exited. The write() just put data into disk cache, if the program already exited since then there's nobody TO notify of an error. And the _reason_ we do this is waiting around for the results is orders of magnitude slower. The chef does not stand next to the diners to see how big the resulting tip is before starting to prepare the next meal, real life doesn't WORK that way.)

(Yes, nommu systems have no virtual address space, and their allocations can fail due to fragmentation instead of just resource exhaustion. But again, "kill the program if this happens" is a pretty reasonable reaction.)


December 30, 2012

Ok, got losetup done-ish. (Needs more of a test suite, but various things worked from the command line.)

Next up is umount (which uses the losetup code, and is smaller/simpler than mount) but for a change of pace I think I should clean up cp first.

Which gets me back to where I left off in cp: figuring out what counts as an illegal recursive copy. The gnu/dammit version vetoes "cp .. ." but allows "cp ../.. ." and I'm not sure _why_. The case I'm trying to avoid is "mkdir -p temp/temp/temp/temp; cd temp/temp; cp -R ../../temp ." and it...

Duh, all you have to do is stat the source, remember that dev/inode pair, and if you see it again you've gone recursive. Ok, that's actually clever. (No way the FSF came up with it.) Here I was mucking about with creating absolute paths...

I note that I try not to look at gnu source code if I can help it (it's not so much a license contamination thing as a "life's too short to spend my hobby time trying not to vomit"), but I do run tests against it, up to and including strace.


December 27, 2012

The losetup implementation is _so_ much easier now that I've figured out how to properly encapsulate the code. And the losetup command definition is _so_ imprecisely defined: losetup -d can take multiple command line arguments (devices to deallocate). I should also test "losetup -da" and "losetup -dj filename".

Let's see: display current status when device given on command line (error if device isn't associated), display all devices with -a (silently skip non-associated devices), display all devices associated with file -j (is it an error if there aren't any? No, it is not.)

Associate one device. Associate device with offset. Associate with size limit. Change size of file and recheck size with losetup -c. Find first device with -f...

I haven't yet made -d actually _delete_ loop devices because I'm not sure how to tell how many precreated devices there are. (There's a /sys/module/loop/parameters/max_loop but it's 0 on my host and that's got 8 precreated loop devices.)


December 26, 2012

Figured out how I should probably handle losetup (run the losetup main() function from mount and have it leave the found device be at the start of toybuf; avoids more than one file needing to mess with linux/loop.h directly _or_ parsing TT from the wrong context).

Didn't get to play with it today because people sent me bug reports. Lots, and lots of bug reports. Here's the first half of the one I'm currently debugging...


December 25, 2012

So "mount -o loop" needs to do losetup -f (which is best implemented with /dev/loop-control these days), which basically means it needs most of the guts of losetup.c. If I move said guts out into lib/lib.c they depend on the contents of linux/loop.h and I don't want lib.c to depend on linux/*. (Even if I make a compile time probe, there's either #ifdefs in the C code or conditionals in the build script, and so far I've avoided both. All that sort of thing belongs in individual commands that already have config symbols compile time probes can switch off. (These config symbols work on a simple generic rule: name of C file under toys/* matches name of config symbol.) The lib stuff is just lib/*.c which gets pared down by the linker discarding unused symbols.)
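
For reference, the /dev/loop-control part is a single ioctl. A minimal sketch, assuming a Linux system with linux/loop.h available; the function name find_free_loop() is made up, and this leaves out everything that makes the real losetup fiddly (binding the file, -r, -j, retrying on races):

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/loop.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

// Ask /dev/loop-control for the first unused loop device and write its name
// into buf. Returns the device number, or -1 on error. Note the kernel may
// allocate a new loop device as a side effect if none are free.
int find_free_loop(char *buf, size_t size)
{
  int ctl, num;

  assert(buf && size);
  ctl = open("/dev/loop-control", O_RDWR);
  if (ctl < 0) return -1;
  num = ioctl(ctl, LOOP_CTL_GET_FREE);
  close(ctl);
  if (num < 0) return -1;
  snprintf(buf, size, "/dev/loop%d", num);

  return num;
}
```

Nothing reserves the returned number: another process can grab the same device before the caller binds a file to it.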

Even if I bit the bullet and did factor it out, a theoretical loopback_setup() function cares about the layout of the GLOBALS block. I can sort of abstract out the arguments into char **device (which could start NULL and be set to the allocated "/dev/loopX" string for -f) and char *file (which could be NULL indicating we don't associate but instead display the results of an existing association, although it needs a flag value to indicate whether an existing but unassociated loop device is an error, since it isn't for "losetup -a" but is for "losetup /dev/loopX", but I can just use (char *)1.).

But the problem is the function needs more than that. It needs to know whether to open stuff read only or read/write (flag -r). It needs to know the device and inode of a comparison file for -j searches. So I can get it down to about 5 arguments... Although signalling we're NOT doing -j by setting dev and ino both zero is dubious, since lanana implies that NFS breaks everything, as usual. If I can't check toys.optflags the argument values get uncomfortably magic. (And I can't check losetup's optflags from mount, they're generated values and mount hasn't got losetup's macros defined in its namespace.)

Sequencing-wise, the -j stuff is in the middle between finding a device and binding or reporting the device, you filter out whether or not the device is one you want after you open it. To split that out I have to either provide a callback function (not an improvement) or split the one guts-of-losetup function in half with the first part returning a loop_info64 pointer out of linux/loop.h, meaning the linux header would have to be #included in multiple places because the users of the function need it.

Ick.

It really looks like what I'm going to have to do is have mount xexec() losetup and parse the output through a pipe, which is almost as disgusting as the other alternatives.

(The reason this wasn't a problem for the busybox version is it shared less code and was less clean. I didn't have generic infrastructure doing option parsing, I didn't care about #including linux/*.h from anywhere I felt like it, I threw #ifdefs around with abandon, and there was no POSSIBLE way to make the makefiles I inherited any uglier than they already were. Making it work isn't a problem, making it _pretty_ is the problem. It takes an awful lot of work to get a simple result that looks obvious.)


December 24, 2012

Poking at losetup again. The tricky bit is I want to use the code as a standalone command, and reuse the code in mount and umount.

For mount I need essentially losetup -f, but it has to communicate back to the parent program which device it grabbed. This is fiddly because the option flags specifying -f and the global block are command-specific. If I write the functions the obvious way they're not portable, and if I write them to be portable (marshalling buckets of arguments into and out of two function arguments when they're already there in the global block) there are only two users.

One idiosyncrasy of /dev/loop-control is that the naive approach (check /dev/loop0 then loop1 and stop at the first one that isn't there) no longer works because the ability to create and delete devices means the set active at any given time can be sparse. So I need to list the contents of /dev and parse the loop[0-9]* entries, so dirtree and a callback. Which is why data needs to be in the global block, because the callback isn't set up to pass things like FLAG_f. (I've got existing data structures for per-node context and global context, and adding a third layer is just awkward.)
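
A sketch of the scan using plain opendir()/readdir() rather than toybox's dirtree and callback, just to show the name filtering. Both function names are hypothetical. The filter has to accept "loop7" while rejecting "loop-control" and partition-style names such as "loop3p1".

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

// Return N if name is exactly "loopN" (decimal digits only), else -1.
long loop_number(const char *name)
{
  char *end;
  long num;

  assert(name);
  if (strncmp(name, "loop", 4) || !isdigit((unsigned char)name[4])) return -1;
  num = strtol(name+4, &end, 10);

  return *end ? -1 : num;
}

// Count the loop devices currently present under dir (normally "/dev").
// Devices can be created and deleted, so the numbers may be sparse and
// readdir() returns them in no particular order.
int count_loop_devices(const char *dir)
{
  DIR *dp = opendir(dir);
  struct dirent *entry;
  int count = 0;

  if (!dp) return -1;
  while ((entry = readdir(dp))) if (loop_number(entry->d_name) >= 0) count++;
  closedir(dp);

  return count;
}
```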

I need to do a test suite, which requires root access to work. Unfortunately, while it's easy to come up with tests:

losetup -f -s
losetup -f file
losetup -fs file
losetup -af (fail!)
losetup -j file -s
losetup -j file file (fail!)

It's much harder to make the results reproducible. The output of querying a device includes dev and inode numbers that aren't reproducible, the paths of the associated device are absolute (and thus include the directory you ran the test in), and the order that losetup -a finds devices when it's doing a directory scan is kind of arbitrary (in my tests, it finds them in the reverse order devices were created).

Also, losetup -f is inherently racy. It finds or creates a device, then tries to use it as a second step, and another instance could allocate the device in between those. I'm trying to figure out if this should report an error or if there should be retry logic in there...

Maybe I should break down and have the losetup device scan sort the devices before trying to look at them, but this widens a similar race window with the ability to create/remove devices.

Sigh. If I wanted to do a half-assed job I'd be done by now. It's being stroppy at the design level.


December 23, 2012

How does one test a dog for vampirism? Apparently garlic is bad for dogs anyway, so it's the same as the "If you drive a stake through its heart, it dies" test not really being a good way to determine whether something is or isn't vampiric in nature.

Very time consuming dog. Very energy consuming dog. Very separation anxiety sleep deprivation dog.


December 21, 2012

Why does nobody in washington understand basic economics? This "fiscal cliff" nonsense _can't_ raise interest rates because that's not how interest works.

Interest rates are the return on investment you can get when you loan out money. When the economy sucks, interest rates go down because nobody is making any money, so nobody can afford to make the payments if they borrow more money at high rates. (The people willing to do so don't qualify for the loan. Yes, lenders can always get higher rates by accepting more risk, but beyond a certain point you're just gambling and lose more money to defaults.)

The current economy is stuck in a type of stall we haven't seen since the 1930's: nobody's making any money because nobody's spending any money, and nobody's spending money because nobody's making any money. Living off savings is terrifying if you can't replace them. This problem is easy enough to fix with something like FDR's "new deal" where the entity that can print money (and thus can never run out) buys a bunch of everything to push the economy back up to speed. But we've got a really really BIG economy and it takes a LOT of spending to get it unstuck, and the "stimulus package" of 2009 was maybe a third of what we needed. (Enough to turn "hoovervilles" and the "bonus army" into "occupy wall street".)

Ordinarily when demand goes down below where we need it to be to avoid layoffs, the federal reserve will offer to loan money at even lower interest rates until people are willing to borrow again (sometimes just to refinance their existing debts and lower their monthly minimum payments, thus freeing up new money to spend each month on goods and services). Unfortunately if you get a big enough shock the interest rate you'd need to offer to get monthly spending back up to a rate capable of keeping everybody employed is BELOW 0%, and the federal reserve can't offer that. And because this knob normally works so well at controlling the amount of monthly spending people do, the feds no longer have a backup plan for what to do when the knob hits its end stop. (Asking them to pull out the techniques FDR used in the 1930's is like asking people to pull out candles during a blackout: how quaint, how uncivilized, we don't DO that anymore...)

Unfortunately the old fogies stuck "fighting the last war" are treating this as a supply-side crisis ala the OPEC oil embargo of the 1970's, so they're busy giving water to a drowning man and blocking any attempts to drain the swamp because they cannot CONCEIVE of a problem where customers simply don't buy products producers are selling. The problem MUST be at the producer end, it's not like customers have any CHOICE in the matter, they're just sheep behaving mathematically without volition, right? Only business owners are actually _people_...

Rich people looking for places to park their savings DESPERATELY want rates to go up because right now they're losing to inflation. They don't understand why rates are so low, they're convinced it's a conspiracy on the part of the federal reserve to punish rich people and prevent compound interest from making them richer. They've invented a "bond vigilante" fantasy whereby any day now rates will MAGICALLY go up, and suddenly their vast fortune will be earning 3%, 4%, 5% above inflation instead of losing money to inflation every year. How will interest rates go up without any increase in people's ability to qualify for new loans or make additional monthly payments? Well, they just HAVE to. Because the alternative would be unthinkable!

Of course that's not how it works, but it turns out that people who made their money by modern white collar piracy ("leveraged buyouts") don't have to understand how the economy works any more than sports players who win via steroids (and retire at 30 with epic health problems) really understand anatomy, biochemistry, neurology... An olympic medal does not qualify you to perform surgery.

Speaking of inflation: it turns out the federal reserve could fake negative interest rates by raising the rate of inflation, because 5% inflation and 1% interest is essentially -4% interest. Go ahead and borrow money: by the time you have to pay it back it won't be worth as much anyway. In fact your existing debts get slowly eroded and less troublesome. But rich people HATE inflation eroding their existing fortune, and will fight to the death to stop this from happening.

P.S. a leveraged buyout is where you borrow a bunch of money to buy a profitable company, often using that company's assets as collateral. (Just like when you get a mortgage on a house, the house you're buying is the collateral.) Once you're in charge not only can you drain the company's bank accounts and pocket the money, but you can transfer your loans into the company's name so the debts you ran up buying the company are no longer your problem. If you haven't maxed out the company's credit rating yet you can have the company borrow MORE money (and pocket it). Next chip off any large assets (buildings, profitable division) and sell them, pocketing that money. Laying off employees can reduce expenses and allow the company to qualify for more loans. Rewrite the employee contracts so any retirement benefits are no longer based on savings and will be paid from future revenues, so you can pocket any existing pension fund. When there's nothing left to loot, sell the desiccated husk of the company and pick a new target.

This is how people like Mitt Romney made their money. Yes, it starts with the ability to borrow millions of dollars, which is easier to do if your daddy used to be the governor of Michigan. If this sounds utterly evil, you obviously don't understand the realities of business where corporations are people and employees are "resources, comma, human".

That said, stealing the Mona Lisa from the Louvre still doesn't make you Leonardo Da Vinci. Obtaining is not making. This is why the correct response to calling rich people "job creators" is to point and laugh.


December 20, 2012

Downloaded the new PC BSD 9.1 release and fired it up under qemu 1.3.0. It hung endlessly on something like PCI bus scanning, with and without ACPI, so I told it to boot in "safe mode" (what is this, windows? That would run under qemu...) and it panicked saying it couldn't find a time source. So much for this year's interest in BSD.

New dog is very time consuming dog.


December 19, 2012

I uninstalled my irc client after someone on there insisted that the http://landley.net/aboriginal/bin directory (which contains nothing but symlinks into aboriginal/downloads) was confusing them, and that I needed to remove it so their brain could cope. (This was the same person who said I should grab aboriginal.org or ab.org despite both of those already being taken: the whois command is a thing that exists.) Really: I have other things to do with my time.

Part of my short temper is due to my normal tendency to switch to a night schedule and Fade's morning-person tendencies wanting me to be on the same schedule as her, so when I do shift to a night schedule these days she wakes me up every couple hours to see if I want to get up now, and then I'm groggy but not sleepy all night.

Plus still recovering from injuries, which the tetanus shot more or less qualifies as at this point. (My arm is swollen, and the irritation is somehow maintaining the outlines of the band-aid days later.)

Fade and I are getting a dog. The cats are just going to be _thrilled_. We've spent about 4 hours a day all week looking at various dogs, and have gone through a half dozen where we decided "ok, we want this dog" and then either it's adopted out from under us (we were told they couldn't be reserved before pickup, except that the one Fade wanted most was reserved when we came to pick it up), or "we forgot to mention that this dog was part of a bonded pair that can't be separated" or "now, about this dog's medical problems..." But Fade really wants a dog, so we keep at it. Amazingly time consuming, dog hunting.

I need to finish filling out paperwork for the job I start a little over a month from now up in Minnesota. It's a six month contract: as much fun as toybox is, I'm still paying a mortgage on a place three times bigger than I used to live in, and putting my wife through college. (I'd pondered doing a kickstarter or something to see if anybody wanted to sponsor some full-time toybox work, but Fade wasn't enthused about the idea.)

I need to repost the perl removal patches, and even though it's the merge window, and I've posted them to the list a half-dozen times over the past three years, and they don't actually change the generated files, I should probably try to feed them through the linux-next tree because the kernel development clique is ossifying a bit in its old age, developing ever-more layers of procedure and ritual. I downloaded the linux-next tree and read a bit of the wiki, but so far there's nothing about actually submitting patches to it. Possibly it's in Documentation somewhere...

Friend visiting from out of town this weekend. (She used to run Mensa games night before retiring to Maryland).


December 18, 2012

I have a needle-phobia. Today, I got a tetanus shot. That was pretty much my day.

I put it off as long as possible but after the weekend's incident with the wire hoop and the picture of bleeding I posted to twitter with the "do not click on this link" warning... it's a piece of metal lying on a neighbor's lawn, out there long enough to corrode a bit despite being some variant of stainless steel. And the last time I _might_ have had a tetanus shot was 9 years ago.

Then Fade took me to look at dogs, I got home, played skyrim, and fell asleep on the couch until almost midnight. I have a bunch of things I _should_ do, but really wasn't up to any of them.


December 17, 2012

If anybody cares about the patches removing perl from the linux kernel build, I just posted 3.7 versions to the mailing list: 0/3, 1/3, 2/3, and 3/3.

My direct mail sending script mangled them slightly (the archive sort of has a long name for me, but not quite?), and I'm still waiting for the list to send copies back to me so I can see how they came through, but I did the canonical patch format with diffstat, sent them to the get_maintainers cc: list, and it's at least not whitespace damaged (unlike Balsa). With several days of merge window left.

Just like the last half-dozen times...


December 16, 2012

I got a toybox release out, and an Aboriginal Linux release out.

And I tripped over a wire hoop in the neighbor's yard and re-injured my darn foot. Much bleeding. Really annoyed. Probably need a tetanus shot.


December 14, 2012

My giant build finally completed sometime after midnight (takes more than a full day to build all targets on this netbook, and that doesn't include the native compiles). And the 3.7 kernel broke arm, mips, and i686. That's above average collateral damage for a single package upgrade.

I dealt with i686 day before yesterday. Bisecting arm gives me commit 387798b37c8d which added multiplatform support and changed the Arm platform type default from ARCH_VERSATILE to ARCH_MULTIPLATFORM. Ok, add the explicit config symbol to LINUX_CONFIG in the arm target files... and it builds and boots. Right, rebuild the kernel on all those targets...

I have GOT to get a faster laptop. Or at least a server I can connect to and knock out some fast compiles just to show there aren't any OTHER problems...

Let's look at mips:

arch/mips/lib/delay.c:24:5: warning: "__SIZEOF_LONG__" is not defined

Lovely. This is another toolchain version thing, isn't it? Do a quick "gcc -dM -E - < /dev/null | grep SIZEOF" on both toolchains and... yes. Yes it is.
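A rough sketch of that comparison in Python, for anyone who wants to script it. The helper names (parse_defines, predefined_macros, missing_in_old) are invented for illustration, and it assumes each toolchain's gcc accepts "-dM -E -" the same way the one-liner above does.

```python
import subprocess

def parse_defines(text):
    """Turn '#define NAME VALUE' lines into a {NAME: VALUE} dict."""
    macros = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) >= 2 and parts[0] == "#define":
            macros[parts[1]] = parts[2] if len(parts) > 2 else ""
    return macros

def predefined_macros(compiler):
    """Ask a compiler for its builtin macros (gcc -dM -E - < /dev/null)."""
    out = subprocess.run([compiler, "-dM", "-E", "-"],
                         stdin=subprocess.DEVNULL, stdout=subprocess.PIPE,
                         universal_newlines=True, check=True).stdout
    return parse_defines(out)

def missing_in_old(old, new, substring="SIZEOF"):
    """Names containing substring that 'new' defines and 'old' does not."""
    return sorted(name for name in new if substring in name and name not in old)
```

Calling missing_in_old(predefined_macros(old_gcc), predefined_macros(new_gcc)) with the two compiler paths would list the SIZEOF macros only the newer toolchain provides.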

Ok, looks like it's time to update the kernel's README and Documentation/Changes because gcc 3.2 and ld 2.12 ain't gonna work no more. I'm having to patch gcc 4.2.1 and binutils 2.17 to get them to build this sucker, and this is no longer "the sh4 maintainer is an asshole", this is two different architectures breaking in the same release, even with the toolchain versions I test (much newer than the documented requirements).

On the bright side, fix that and it seems to be working again.

Gotta test the native builds. Gotta cut a toybox release. Gotta send the perl removal patches upstream (possibly into linux-next). I should check the armv4teb target to see if I can finish it in a reasonable amount of time. I should see if powerpc-440 can actually work with qemu's new -M bamboo board emulation. I should dig up the qemu-m68k branch and make puppy eyes at Laurent...

But first thing after cutting this release: get back to the ccwrap rewrite so I can switch to musl.


December 13, 2012

I need to send the perl removal patches upstream again, and deal with a backlog of documentation patches I've tagged, but Balsa is crap. There is NO WAY to get it to avoid whitespace damaging patches. Even forwarding a message AS AN ATTACHMENT did whitespace damage.

I checked the kernel's Documentation/email-clients.txt and it doesn't mention Balsa (unsurprising) and specifically says that the gmail web front end does not and cannot be made to work. (Longish list of reasons, including it converts tabs to spaces, period.)

Meanwhile, I've got the list of marked patches in balsa that I need to extract _from_ balsa. Right click on the message and... there's no "save as" option. Great. None of the icons when I've got a message selected does it. The file, edit, view, mailbox pulldown menus have nothing. If I "view source" on a message and cut and paste from that window, it turns tabs into "\t". (What were they smoking?) I tried creating a new mbox file to copy the messages to (although undoing the mime encoding is a pain but it's something), but right click has no copy! It has move, but there's no option to leave the original message in place instead of marking it deleted. (What is this, some kind of DRM enforcement? There can be only one copy?)

I eventually found something usable under the Message pulldown menu, called "Save current part", which can deal with my flagged messages. But in terms of sending _out_ new messages containing non-whitespace-damaged patches, Balsa simply can't do it.

So for a third time, I'm writing python code to fling mail around, using the builtin packages in the python standard library that do this stuff with only a couple lines of code from me, the meat of which is:

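import smtplib, sys

# sender, password, recipient and subject are set up earlier in the script.
recipient = recipient.split(",")

# First recipient goes in To:, any others in Cc:.
headers = ["From: " + sender, "Subject: " + subject, "To: " + recipient[0]]
if len(recipient)>1: headers.append("Cc: " + ",".join(recipient[1:]).lstrip())
headers.extend(["MIME-Version: 1.0", "Content-Type: text/html", "", ""])
# The message body comes from stdin; SMTP wants CRLF line endings.
headers.extend(sys.stdin.read().split("\n"))
body = "\r\n".join(headers)

# Gmail's submission port with STARTTLS, printing each server response.
session = smtplib.SMTP("smtp.gmail.com", 587)
print(session.ehlo())
print(session.starttls())
print(session.ehlo())
print(session.login(sender, password))
print(session.sendmail(sender, recipient, body))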

So I hardwire sender to my email address, pass a comma separated list of recipients and a subject string on the command line, and redirect a file to stdin containing the body of the message.
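The command line handling described here could be sketched roughly as follows. This is an illustration rather than the actual script: SENDER is a placeholder address, build_message and main are invented names, and the text/plain content type is this sketch's own choice.

```python
SENDER = "me@example.com"  # placeholder; the real script hardwires an actual address

def build_message(sender, recipients, subject, text):
    """Return (recipient_list, message) ready to hand to smtplib's sendmail()."""
    rcpt = [r.strip() for r in recipients.split(",") if r.strip()]
    headers = ["From: " + sender, "Subject: " + subject, "To: " + rcpt[0]]
    if len(rcpt) > 1:
        headers.append("Cc: " + ", ".join(rcpt[1:]))
    # The empty string produces the blank line separating headers from body.
    headers += ["MIME-Version: 1.0", "Content-Type: text/plain", ""]
    return rcpt, "\r\n".join(headers + text.split("\n"))

def main(argv, stdin):
    """argv[1] is the comma separated recipient list, argv[2] the subject."""
    if len(argv) != 3:
        raise SystemExit("usage: %s recipient[,recipient...] subject < body" % argv[0])
    return build_message(SENDER, argv[1], argv[2], stdin.read())
```

Keeping the message construction separate from the SMTP session means the headers can be checked without sending anything.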

This means I have now written a POP receiver, an SMTP sender, and an mbox filter in python, because in each case it was EASIER THAN FIXING BALSA. If I could decide whether to pursue the gtk or qt bindings (or some other gui library), I'd just write a front end and be done with it. (I can compose messages in "mousepad".)

But I don't _want_ to write an email client. I just want to _use_ an email client. One that isn't crazy. I have other things to do...


December 12, 2012

So 3.7 added "static __initconst u64 p6_hw_cache_event_ids" to arch/x86/kernel/cpu/perf_event_p6.c and it's breaking my i686 toolchain. What the heck is __initconst? It's defined in include/linux/init.h as "#define __initconst __constsection(.init.rodata)" and right below that is an #ifdef CONFIG_BROKEN_RODATA for toolchains that don't handle this. Which is only set for parisc right now, but apparently applies to anything still using gcc 4.2.

One way to fix this is to default BROKEN_RODATA to y (which works), but I don't want to maintain yet another patch against the kernel that has no chance of going upstream. Instead I should probably figure out how to patch gcc. I've been meaning to do a similar upgrade like I did with binutils, where I move to the gcc repository commit right before they went GPLv3 and then fix whatever's broken in that random snapshot, on the theory this might provide ARMv7 support. That would be a good target to support. (The new 64-bit ARMv8 will definitely require a non-gcc toolchain.)

Unfortunately, the gcc repository is crap. As far as I can tell the project is _still_ maintained in subversion and they just mirror it in git, and there are no tags, or even obvious commit comments announcing releases. I have yet to figure out where the release I'm _using_ is. A git log on the COPYING3 file found the commit that introduced that, and fairly extensive grepping of the commit before that (c473c074663de) didn't find any references to GPLv3 (or "[Vv]ersion[ \t]*3" that's actually about the license instead of libstdc++, or several other variants...) However this commit requires MPFR and GMP to build meaning it's off into 4.3 territory, and according to my notes 4.2.2 was GPLv3, so it looks like tags weren't the only history that got lost in the repository conversion. Sigh.

And then when I installed mpfr and gmp on the host just to see what would happen, the build broke in a hilarious way:

In file included from /usr/include/stdio.h:28,
from ../../.././libgcc/../gcc/tsystem.h:90,
from ../../.././libgcc/../gcc/libgcc2.c:33:
/usr/include/features.h:324:26: error: bits/predefs.h: No such file or directory
/usr/include/features.h:357:25: error: sys/cdefs.h: No such file or directory
/usr/include/features.h:389:23: error: gnu/stubs.h: No such file or directory

Translation: they did the -nostdinc thing with a lot of -I and prevented the standard headers from finding themselves, because gcc is _special_. It can't cope with building like a normal program, no, it has to micromanage the host compiler. (Even though all it DOES is parse input files and produce output files which is NOT HARD. Except for the FSF.)

Broke down and just did the kernel workaround for the moment, which got i686 building. Set a buildall.sh going to see what other targets break...


December 11, 2012

The 3.7 kernel dropped last night, so today is patch update day. My hacks to the arm "versatile" board for better qemu support don't apply cleanly anymore. (The reversion of the IRQ changes that break qemu still applies cleanly, but all the menuconfig symbol stuff to stick different processor types in the same board had context change around them.) I don't need to apply the ext4 stability fix since that's upstream.

Tryn's old "make BOOT_RAW a selectable menuconfig option" had another context change but I yanked it rather than rediff it because I'm not actually using it. (Possibly I should give that one more submission. The real value there is the help text...)

And then there's the perl removal patches. That's not just version skew, upstream had several changes: different #ifdef guards in the generated headers, and the UAPI changes finally went upstream so the kernel headers that get exported to userspace are now split out and kept in a different directory. Instead of chopping out "ifdef KERNEL" blocks while exporting them, the kernel's private headers #include the uapi versions. (They still need a pass to remove __user annotations and put underlines around __asm__ and so on. The "don't use u8 as a type in anything userspace sees" appears to be a coding convention rather than something the scripts clean out, another reason to separate the files rather than have different coding conventions inside and outside special #ifdefs.)

With stuff this fiddly, the best way to see what's changed and make sure you don't miss anything is to "git log" the old perl files and "git show" each commit that touched them so you see the patch and explanation, then make the corresponding changes to the shell script version. When I wrote the shell script I sat down and worked through everything it was doing and diffed the resulting generated files and it took _days_. But now that I've got equivalent behavior, I just want to see what new things showed up.

Which brings us to the new requirements, such as removing the _UAPI prefixes from the #ifdef guards preventing multiple inclusion in these files. Why do they need to do that? Git commit 56c176c9cac9 explains:

Strip the _UAPI prefix from header guards during header installation so that any userspace dependencies aren't affected. glibc, for example, checks for linux/types.h, linux/kernel.h, linux/compiler.h and linux/list.h by their guards - though the last two aren't actually exported.

I.E. "FSF code is buggy crap full of brittle assumptions about internal implementation details of the headers linux exports to userspace, and if we change those magic details glibc breaks, so work around this bug." The description talks about glibc, but the example breakage they cut and pasted was libtool. (I note that _one_ of the things the header export script has done for years is chop out all references to linux/compiler.h. But glibc and libtool are explicitly looking for it.)
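To make the guard rewriting concrete, here is a minimal Python sketch of the two cleanups mentioned: stripping the _UAPI prefix from include guards and dropping the linux/compiler.h include. The regular expressions and function names are this sketch's own guesses at a reasonable pattern, not a copy of what the kernel's export script (or the shell replacement) does.

```python
import re

# Guard lines look like "#ifndef _UAPI_FOO_H" or "#define _UAPI_FOO_H".
GUARD = re.compile(r"^(\s*#\s*(?:ifndef|define)\s+)_UAPI(\w+)")
# The closing line usually repeats the guard in a comment.
ENDIF = re.compile(r"^(\s*#\s*endif\s*/\*\s*)_UAPI(\w+)")
COMPILER_H = re.compile(r"^\s*#\s*include\s+<linux/compiler\.h>\s*$")

def export_line(line):
    """Rewrite one header line for userspace, or return None to drop it."""
    if COMPILER_H.match(line):
        return None
    line = GUARD.sub(r"\1\2", line)
    return ENDIF.sub(r"\1\2", line)

def export_header(text):
    """Apply export_line() to every line of a header file's contents."""
    lines = (export_line(l) for l in text.split("\n"))
    return "\n".join(l for l in lines if l is not None)
```

Note this would also rename any non-guard #define that happens to start with _UAPI, which a real script might need to treat more carefully.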

The FSF wants _desperately_ to be the microsoft of the open source world, and they seem to think the way to get there is to produce code as bad as Redmond excretes. Hence the second paragraph of Documentation/CodingStyle in the Linux kernel, which Linus wrote in the very first version of that file back in 1996:

First off, I'd suggest printing out a copy of the GNU coding standards, and NOT read it. Burn them, it's a great symbolic gesture.

And you wonder why I'm following the #musl channel on freenode (after years of trying to use uClibc)?

Anyway, with the perl removal patches updated, now it's time to try a test build, and generally i686 is safest:

CC arch/x86/kernel/cpu/perf_event_p6.o
arch/x86/kernel/cpu/perf_event_p6.c:22: error: p6_hw_cache_event_ids causes a section type conflict
make[3]: *** [arch/x86/kernel/cpu/perf_event_p6.o] Error 1

But not in this instance. Right, what's going on here... it's complaining either about building with gcc 4.2 or 32-bit, because building the host version with the same .config (64-bit, gcc 4.6) happily proceeds. And 3.6 happily built this file. Ok, git bisect time...

Wow. A clean bisect (when's the last time that happened?) to commit e09df47885d7 which I'll have to look at in the morning because it's 5am.


December 10, 2012

Various people are surprised that when 1/1000th of the population skims off about half the wealth, everybody else is poorer. Sigh.

Did nobody pay attention to how capitalism _works_? That all profit is inefficiency in the market (it means there wasn't a competitor selling at closer to cost), and the way rich people get rich is generally by some variant of "cornering the market", I.E. fencing out competitors and selling to a captive audience. Whether it's a natural winner-take-all niche due to economies of scale or an artificial one due to patenting algebra, sustained profits require what Warren Buffett called "a moat around the business".

This is magnified by compound interest, the fact that earning 10% interest on a billion dollars gives you 100 million dollars for sitting at home doing nothing (and the historical rate of return of the US stock market over the past century averages out to a bit over 10% annually, including both world wars and the great depression). This is why the rich get richer, and if the system doesn't balance itself out you wind up with the French Aristocracy saying that if the starving peasants outside have no bread "let them eat cake" instead.

For a long time what the US did to counter this was tax the hell out of the rich, both to keep their share of the pie from crowding out everyone else and to pay for things like interstate highways, public schools, and tracking down Typhoid Mary.

Fifty years ago the top tax rate was 91% on individuals and 52% on corporations, and we used that money to put people on the moon and invent the transistor. This is why we didn't have to worry about "Citizens United" because the rich didn't have more money than the rest of us combined. The rest of us got together and voted to tax them into submission. Not just to raise revenue but to keep society balanced.

In 1964 President Johnson lowered the top tax rate from 91% to 70%, but it was really Ronald Reagan who screwed everything up: In 1981 President Reagan lowered the top tax rate from 70% to 50%, and then in 1986 lowered it again from 50% to 28% (and also raised the _bottom_ tax rate from 11% to 15%: yes he took from the poor to give to the rich). A quarter century of compound interest later, concentrating more and more wealth into fewer and fewer hands, and "the 1%" own the GOP outright.

Of course the math of Reagan's tax plan didn't _work_, and our enormous national debt is the result, as is lingering economic weakness. The whole "oh noes, Japan is eating our lunch, now India is eating our lunch, now China is eating our lunch" mess is because Ronald Reagan and two Bushes screwed up a good thing. These days most of the money the US economy churns out is skimmed off by well-connected parasites. It no longer goes into fixing crumbling bridges, upgrading our ancient and decrepit national electrical grid, or putting a fiber optic cable to every home (like South Korea did a decade ago), or any of the other important things we "can't afford" to do.

(The other way the US dealt with monopoly profits is by breaking up monopolies using the Sherman Antitrust Act. They broke up "Standard Oil" and as a result 4 of the 5 largest companies today are oil companies. They _didn't_ break up Ma Bell (their 1957 action resulted in a consent decree allowing them to continue but not expand outside the phone business) and the resulting company stagnated so badly over the next quarter century it changed its mind and allowed itself to be broken up in 1984, the breakup giving rise to cell phones and turning modems from a shameful semi-illegal abuse of their phone network (hooking up unofficial equipment to the phone lines, for shame!) into ubiquitous home internet access. Unfortunately, the Party of Reagan also gutted Sherman antitrust enforcement, so for example the 1995 and 1998 actions against Microsoft came to nothing.)

Add it all up and the weakness of the US economy, where parasites have sucked out half the blood and wonder why the beast's health is failing, is starting to hurt the rich. They keep treating near-zero interest rates as a plot to prevent them from continuing to compound their wealth. But loaning out money through credit cards doesn't work when the cards are maxed out because the cardholder is unemployed. Loaning out money in home mortgages doesn't work when a wave of repossessions has trashed everybody's equity and they can't afford the down payment on a new one.

Low interest rates won't make poor people living from paycheck to paycheck borrow more if they have mountains of existing debt, all they'll do is refinance. Even if they want to borrow more, the first thing they'll do with the money is pay off their existing high interest loans, lowering their overall interest rate without increasing their overall level of debt. To avoid that, the rich can't let them "qualify" for new loans; even tightly controlled store credit cards that can ONLY be used for new spending mean they might put the groceries on that and use the grocery money to pay down the other high rate credit cards.

The problem rich people have trying to park their money in a depressed economy is that compound interest isn't a mathematical abstraction, it's loaning or investing money in people who create new value by doing work. If the people aren't working, value doesn't get created: you can't take what doesn't exist. If the people's work earns less and less money, they can't buy stuff, and your big business has a shortage of buyers to sell stuff to.

Sometimes I ponder the distance between Occupy Wall Street and the French Revolution and try to work out how much pain this country would actually need before declaring billionaires a game species. But mostly I console myself with the knowledge that the people who screwed this stuff up, and the people who profited from it, are in their 80's now. They and the baby boomers behind them will all die soon, and then a new generation gets a crack at it. Preferably with at least a 50% inheritance tax.


December 9, 2012

Got in a long walk today for the first time since I hurt my foot: walked to Dragon's Lair and back. Got to see Randy and several other people (several of whom I was not prepared to meet but said hi anyway).

Stopped at The Donald's along the way, and got some programming in!

Ok, todo items, otherwise known as "procrastinating about the linux 3.7 update". Test current toybox in aboriginal, fix aboriginal's native-build.sh so it actually finds netcat after the busybox->toybox switch, check baseconfig-busybox for more commands toybox provides that I can switch off in busybox (looks like cut, rm, touch, hostname, and switch_root).

Try the lfs-bootstrap build: it breaks in m4 hanging on an rm prompt about deleting a ro file without -f, which is odd because the previous rm didn't and all the rm instances in configure look like they have -f. Try a chroot splice version so I can more easily track down where that's called from, and chroot-splice is saying that the read only bind mount is writeable. And an strace on mount shows it's passing through both the bind and ro flags and the result is still writeable. Is that a regression in Ubuntu 12.04's kernel... no, apparently the read only attribute can only be applied on a remount, not on the initial bind mount, which is CRAZY. (Easily fixed, but still crazy of the kernel to require.)
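The "easily fixed" two step version looks roughly like this, sketched in Python. The function names are invented here; the mount invocations follow the form documented in mount(8), where the read-only flag is applied by a second remount of the bind mount. Actually running it needs root.

```python
import subprocess

def readonly_bind_commands(source, target):
    """The two mount invocations needed for a read-only bind mount."""
    return [
        ["mount", "--bind", source, target],                    # starts out writeable
        ["mount", "-o", "remount,bind,ro", source, target],     # now make it ro
    ]

def readonly_bind(source, target, run=subprocess.check_call):
    """Run both steps in order; 'run' is swappable so this can be dry-run."""
    for cmd in readonly_bind_commands(source, target):
        run(cmd)
```

Passing run=print gives a dry run that just shows the two commands.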

Ok, the delete is actually from build-one-package.sh in the control image bootstrap infrastructure, which is deleting config.guess and substituting a tiny "gcc -dumpmachine" instead (why config.guess doesn't do that itself...)

This call to rm doesn't have a -f on it, which is fine because it's a symlink (to a read-only bind mount, but the symlink itself is in a writeable directory). The problem seems to be that faccessat() is not honoring AT_SYMLINK_NOFOLLOW. In fact, strace says:

faccessat(AT_FDCWD, "build-aux/config.guess", W_OK) = -1 EROFS (Read-only file system)

And that's not even showing the fourth argument to the syscall. Is there a kernel version limitation? Let's see, cd to linux source, 'find . -name "*.c" | xargs grep sys_faccessat' and it's in fs/open.c and... I don't even have to do a git annotate, it only has 3 arguments. So either the man page is wrong or libc implements wrapper glue that uClibc is getting wrong.

Sigh. I can check the link status in the stat info dirtree is already giving me (symlinks are always chmod 777 in linux), the problem is what if the directory it's in is read only? faccessat() should tell me if the path to here doesn't let me fiddle with it. Then again, if that is the case I can't delete it anyway so prompting is kinda moot...
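A small Python sketch of the "check the stat info instead" idea, under the assumption that the prompt should be skipped for symlinks and otherwise depends on whether the file is writeable. The name wants_prompt is made up, and this does not address the read-only parent directory question raised above.

```python
import os
import stat

def wants_prompt(path):
    """Guess whether rm (without -f) should ask before deleting path."""
    st = os.lstat(path)                 # lstat looks at the link, not its target
    if stat.S_ISLNK(st.st_mode):
        return False                    # the link itself is never "read-only"
    return not os.access(path, os.W_OK)
```

With a symlink pointing at a read-only file, lstat() reports the link itself, so no prompt is requested even though an access check that follows the link would fail.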


December 8, 2012

And ubuntu crashed, for the first time in a while. (It was X11: chrome tabs kept freezing, then _any_ chrome tab would freeze after a few seconds, then the whole of X froze so badly the mouse pointer wouldn't move and ctrl-alt-F1 wouldn't give me the text console. Remember, the system is SO MUCH MORE STABLE if your only way of interacting with it is through a userspace process.)

This means my 8 desktops full of open windows all went bye-bye. Probably a sign I was swap-thrashing, but it's back to my todo lists to try to figure out what I was working on...

Stopped by Dragon's Lair's "webcomics rampage" thing shortly after lunchtime, but apparently they don't start until 6pm Saturday this year. Ok...


December 7, 2012

Saying "Since the server breakin we've deployed SELINUX" roughly translates to "Since that boat sank we've sprayed WD-40 on everything". Not helping. Really, not helping.

(See also: thinking WD-40 will turn a boat into a submarine, arguing about the merits of WD-40 vs scotchguard for stopping a dripping faucet, allowing people who sell undercoating and extended warranties to design IT security "solutions"...)


December 6, 2012

Sometimes, posix is so nuts _nobody_ implements it properly.

The posix rm spec, section 2, says how to handle interactive prompts for recursive deletion of directories. Section 2(b) says to prompt before descending into a directory, and 2(d) says to prompt before deleting the now empty directory.

This is not what the gnu/dammit implementation of rm does:

$ mkdir -p temp/sub
$ rm -ri temp
rm: descend into directory `temp'? y
rm: remove directory `temp/sub'? y
rm: remove directory `temp'? y

It only prompts before descending into a non-empty directory. But the spec doesn't say anything about the directory being empty, it says you prompt for 2(b) and you prompt again for 2(d).
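For comparison, here is a toy Python model of the prompt sequence under the reading of 2(b) and 2(d) described above, assuming every answer is "y". It works on an in-memory description of the tree rather than a real filesystem, and the function name and prompt wording are invented for illustration.

```python
def interactive_prompts(name, children):
    """Yield the prompts a literal reading of 2(b) and 2(d) would produce.

    children maps each directory path to the list of names inside it;
    any path not in the mapping is treated as a regular file.
    """
    if name in children:
        yield "descend into directory '%s'?" % name        # 2(b)
        for child in children[name]:
            for prompt in interactive_prompts(name + "/" + child, children):
                yield prompt
        yield "remove directory '%s'?" % name              # 2(d)
    else:
        yield "remove file '%s'?" % name
```

For the temp/sub example this yields four prompts (descend and remove for each of the two directories), one more than the session shown above.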

Also, section 4 is just awkward. The bits dealing with directories should be 2(e) (because you can't get there unless you made it through 2(a)), and the bits dealing with files should be 3(b).

Oh, and posix requires infinite directory traversal depth (even though filesystems have a finite number of inodes), and explicitly says you can't keep one filehandle open per directory level. This means that A) you have to traverse ".." to get out of the directory you're in, pretty much guaranteeing that two parallel rm -rf instances on the same tree cannot both complete, B) you have to consume an unbounded amount of memory caching the directory contents because you can't keep the filehandle open and restarting directory traversal with -i would re-query about files you already said "n" to at the prompt.

Somebody really didn't think this one through, but even _trying_ to make it compliant means I have to start over and write a lot of bespoke code that only applies to rm. Not sure it's worth it. (I'll probably break down and do it, but I'm going to sulk a bit first and call the standards guys names.)


December 5, 2012

Updated the dirtree infrastructure to feed parent to dirtree_add_node() so it can print the full path to errors. The _other_ thing I need to work out how to do is notify a parent node that one of the child nodes had an error.

I'm going to have to have multiple negative values in parent->data for a COMEAGAIN callback, aren't I?

Oh well, could be worse. The two uses for that field are dirfd for directories and length for symlinks. (Should be zero for normal files.) Symlink length should never be negative and the only negative fd is AT_FDCWD (which is -100, and that's hardwired into the linux ABI at this point).

No, that doesn't work because I can't just reach out and set parent->data at failure time because it's using that filehandle to iterate through a directory and there may be more valid entries after the failing one. So I'd have to defer setting it, which means I need another place to store it which means it should just _stay_ there. I'm going to have to allocate another variable in struct dirtree, aren't I? (I keep being tempted to overload fields in struct stat, but they're just not well defined enough.)
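The "separate field" approach can be modelled in a few lines of Python. This is a toy stand-in, not toybox's struct dirtree: the Node class, the child_failed field and report_failure are all hypothetical names, and the real code is C with different constraints.

```python
class Node(object):
    """Toy directory tree node with a dedicated error flag."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.data = 0              # stays free for dirfd or symlink length
        self.child_failed = False  # the extra field, never overloaded

    def path(self):
        """Full path from the top of the tree down to this node."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))

def report_failure(node, errors):
    """Record the failing node's full path, then flag every ancestor."""
    errors.append(node.path())
    parent = node.parent
    while parent is not None:
        parent.child_failed = True
        parent = parent.parent
```

Because the flag lives in its own field, a directory can keep iterating with its filehandle after one child fails, and only ancestors (not siblings) end up marked.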

Actually getting the functionality right took a couple hours. Getting error handling/reporting right is coming up on day 3.


December 4, 2012

I wonder why "rm" cares about the files it's deleting being read-only? The "unlink" command doesn't. Oh well.

Yeah, finally working on The Most Dangerous Command, which I've held off on not because it's hard but because it's most likely to screw up my system if I get it wrong.

The yesno() stuff is wrong, it's checking stdin, stdout, and stderr (in order) to find a tty and using the first one it finds, meaning it's trying to write a prompt to stdin. I can special case my way past this, but in general working out when "yes y | thingy" should feed answers to yesno and when "zcat file | tar" should bypass stdin and bother the tty... I think the caller has to specify the behavior it wants. Gonna have to rewrite that function, but I probably need more than 2 users to work out the right semantics.
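One possible shape for "the caller specifies the behavior", sketched in Python rather than C. The signature is an assumption made for illustration (toybox's real yesno() takes different arguments): the caller passes in where to read the answer and where to write the prompt, instead of the function hunting for a tty.

```python
import sys

def yesno(prompt, default=False, infile=None, outfile=None):
    """Ask a yes/no question on outfile, read the answer from infile."""
    infile = sys.stdin if infile is None else infile
    outfile = sys.stderr if outfile is None else outfile
    outfile.write("%s (%s): " % (prompt, "Y/n" if default else "y/N"))
    outfile.flush()
    answer = infile.readline().strip().lower()
    if answer.startswith("y"):
        return True
    if answer.startswith("n"):
        return False
    return default              # EOF or anything unrecognized
```

A "yes y | thingy" style caller would leave infile as stdin, while a "zcat file | tar" style caller could open /dev/tty and pass that for both streams.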

The ubuntu man page for rm has a --no-preserve-root option, which is a technical violation of posix but probably a good idea anyway. Except that longopts with no corresponding short opts are kinda tricky. (I _sort_ of have support for them, you can put parentheticals before all the other options and it'll parse them and set a flag. But there's no FLAG_ macro for those, and teaching the shell script to do that sounds painful. It would wind up being something like FLAG_no_preserve_root anyway.)

I could trivially just ignore "/" (and have the error message be "rm /. if you mean it"), but posix doesn't require it and it conflicts with simplicity of implementation. If you log in as root and "cd /; rm -rf *" or "rm -rf /*" you're equally screwed without hitting the special case. Doing a realpath check for "/" might catch a couple more ".." than you expected, but how is taking out "/home" by accident much better?
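For what it's worth, the realpath flavor of the check is tiny. A Python sketch, with an invented function name, that shows how spellings other than a literal "/" get caught too:

```python
import os

def is_root_dir(path):
    """True if path resolves to "/" once symlinks and ".." are followed."""
    return os.path.realpath(path) == "/"
```

This treats "/..", "/./" and any symlink pointing at "/" the same as "/" itself, which is the "couple more .. than you expected" effect; it does nothing for "rm -rf /*", where the shell has already expanded the glob.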

Another fun little piece of special case behavior:

mkdir -p sub/other
touch sub/other/file
chmod 400 sub/other
rm -rf sub

This complains about being unable to delete "sub/other/file". It does _not_ complain about being unable to delete sub/other, or sub, so error reporting is suppressed as it works its way back up the tree. (But only for parent directories, not siblings and aunts and such.)

My first pass at the code complained about being unable to open ".", "..", and "file", with no paths. Even though . and .. are discarded by the callback what it was complaining about is inability to stat and thus generate a dirtree node, because the directory has r but not x, meaning you can list the contents but not stat the files in it. (How is that useful, you say? No clue.) The dirtree infrastructure notes the stat failure and doesn't construct a node, thus can't call the callback. I can trivially filter out . and .. from error reporting, but giving a path to file means calling dirtree_path() which means having a node to call it on.

I think what I need to do is move the error handling to the callback, which means have it make a node with zeroed stat info. I can probably detect that by looking at st_nlink which should never be zero, but occasionally is. Sigh, inband signalling vs wasting space in every node. And either way I need to go fix the existing users in ls and chgrp and such... Not liking this. What's the alternative? Making generic infrastructure too clever is unpleasant, but having it display the full path to a file it couldn't access is probably the right thing...

Also, really hard to make a test suite that captures the error output in a way the test passes on multiple implementations producing slightly different error messages. Maybe I can just count the number of lines or wash stderr through grep or something...
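The "count the lines" idea might look something like this in Python. It is only a sketch with an invented helper name: it runs a command, throws away the wording of the error messages, and reports how many non-empty stderr lines there were along with the exit status.

```python
import subprocess

def stderr_line_count(argv):
    """Run argv, return (exit status, number of non-empty stderr lines)."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    _, err = proc.communicate()
    text = err.decode("utf-8", "replace")
    lines = [line for line in text.splitlines() if line.strip()]
    return proc.returncode, len(lines)
```

A test could then assert "fails, with exactly one complaint" without caring how a given implementation phrases the complaint.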


December 3, 2012

The losetup command is more complicated now than when I wrote one for busybox, mostly because the kernel keeps growing new features. You can set the _length_ of an association now, not just the starting offset. It's got a "capacity check" thing that updates the loopback device to match a changed file size (which is _not_ the same plumbing as setting the length of the association because the people writing this aren't big into code reuse). You can iterate through all loop devices associated with a given file...

The fun one is /dev/loop-control meaning loop devices come and go now, so the -a, -f, -j, -d, and possibly even the basic association options are more complicated, in a way I'm not sure how to avoid race conditions for, but only after July of last year. (xubuntu 12.04 has this, but 10.04 didn't.) And looking at the log of drivers/block/loop.c in the kernel there's a LOT of activity going on in what you'd think would be an ancient stable system. Partition support (tweaked August 2011) with LO_FLAGS_PARTSCAN (no support in the userspace losetup xubuntu's using). And of course the kernel parameter to set the number of pregenerated loop devices (potentially to zero, so you _must_ use loop-control to request new ones).
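As an illustration of the loop-control interface, here is a minimal Python sketch that asks the kernel for a free loop device. The ioctl number is LOOP_CTL_GET_FREE from linux/loop.h; the helper names are invented, and running it needs a kernel with /dev/loop-control plus permission to open it.

```python
import fcntl
import os

LOOP_CTL_GET_FREE = 0x4C82   # from <linux/loop.h>

def loop_path(index):
    """Device node name for loop device number 'index'."""
    return "/dev/loop%d" % index

def get_free_loop(control="/dev/loop-control"):
    """Ask the kernel for the index of an unused loop device."""
    fd = os.open(control, os.O_RDWR)
    try:
        # The ioctl's return value is the device index.
        return loop_path(fcntl.ioctl(fd, LOOP_CTL_GET_FREE))
    finally:
        os.close(fd)
```

Nothing here reserves the device, so another process could claim it between this call and the actual association, which is the kind of race mentioned above.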

It's a bit like moving from the old static PTY devices to /dev/pts, except not quite cleanly separated.


December 2, 2012

One of the musl guys expressed interest in a big endian arm target for aboriginal, so I'm taking a stab at that. It's... tetchy.

Ok, the old armv4eb target built something but qemu couldn't boot it. The main problem here is it's oabi, which is obsolete. Nothing uses OABI anymore, and EABI requires thumb extensions so we need to bump up a notch in processor version to support this.

New target config, armv4teb, based on an unholy union of armv4tl and armv4eb. Diff the two and see what changes armv4tl needs. The gcc/binutils tuple is derived from the filename, so that should be ok. In the uClibc config there's an "ARCH_WANTS_BIG_ENDIAN", set that.

Next problem: the kernel doesn't want the versatile board to be big endian. There's some big endian plumbing support but the key is declaring an ARCH_SUPPORTS_BIG_ENDIAN symbol that currently only the ancient mach-ixp4xx declares, more or less what the armv4eb oabi stuff from before was aimed at. (Apparently you can declare the _existence_ of symbols inside conditional blocks testing on symbols which may be modified at runtime. Wheee.) This big endian plumbing has further derived symbols for armv6 and armv7 processors, but nothing sets it. Leaked infrastructure to support out of tree boards, looks like. The plumbing's there but no board definition uses it.

So, patch the kernel kconfig so the versatile board is even MORE versatile, and then set CONFIG_CPU_BIG_ENDIAN. Now try to build it...

And the sanity test at the end of simple-cross-compiler fails because libc.so is big endian but the compiler is trying to build a little endian hello world. And it's doing that because uClibc feeds a CFLAG to force big endian but the default in the compiler is still little endian. Why is the default wrong? Dig, dig... gcc's libbfd is testing the host compiler to set the target endianness. That's just SAD. Ok, in the target config "export ac_cv_c_bigendian=yes" and try again... and that test is coming out right but it made no difference to the smoke test.
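When chasing this kind of mismatch, it can help to check which endianness each binary actually claims. A small Python sketch (function names invented here) that reads the data-encoding byte of the ELF identification header, where 1 means little endian and 2 means big endian:

```python
def elf_endianness(header):
    """Return "little" or "big" from the first six bytes of an ELF file."""
    if len(header) < 6 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 5 of e_ident is EI_DATA: 1 = ELFDATA2LSB, 2 = ELFDATA2MSB.
    return {1: "little", 2: "big"}.get(header[5], "unknown")

def file_endianness(path):
    """Convenience wrapper that reads the header from a file on disk."""
    with open(path, "rb") as f:
        return elf_endianness(f.read(6))
```

Running file_endianness() on libc.so and on the compiled hello world would show the disagreement directly.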

Right, the armv4eb test bit-rotted in current releases but it worked in 1.1.1 so check that out and build a working big endian arm oabi toolchain to compare against... Right, got something that can run hello world under qemu-armeb. Now to look at its build logs.

This is gonna take a while.

