Rob's Blog


May 15, 2012

The gnu/dammit implementation of the chmod command is _weird_. It rejects "x=g" but thinks that "g+" and "g=t" are perfectly valid file modes.

And the spec actually seems to require some of this: ugo+ and ugo- are NOPs, and "=" by itself clears _all_ the bits (even the sticky bit)...

How crazy is this? Let's see "touch blah; chmod = blah; chmod u+s blah; chmod g=u blah" and it does _not_ copy the suid bit to the sgid bit. Ok, that simplifies the implementation _slightly_.

Ok, lemme try to come up with a coherent usage message. Something like:

usage: chmod [-R] PERMISSIONS FILE...

Set read, write, execute, suid, sgid, and sticky bits for user, group, and other.

PERMISSIONS are an octal bit pattern or one or more (comma-separated) [ugoa][+-=][rwxstXugo] stanzas.

For each category (u = user, g = group, o = other, a = all), set (+), clear (-), or copy (=) the permissions (r = read, w = write, x = execute).

Special permissions:

Octal top+bottom bit patterns (7777 is all bits set):

ug uuugggooo
sstrwxrwxrwx

Hmmm... Not _entirely_ coherent, but it fits on one screen...
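For reference, here's a minimal Python sketch of the stanza handling described above. It isn't any real chmod's code: it assumes one operator per stanza, treats a missing [ugoa] as "a", and ignores umask entirely. The names apply_mode, CLAUSE, and AFFECTS are made up for illustration.

```python
import re

# One stanza: optional who letters, one operator, then either permission
# letters or a single category to copy from. (Assumption: one operator only.)
CLAUSE = re.compile(r"([ugoa]*)([-+=])([rwxXst]*|[ugo])")

# Bits each category may touch: its special bit plus its rwx triplet.
AFFECTS = {"u": 0o4700, "g": 0o2070, "o": 0o1007, "a": 0o7777}

def apply_mode(spec, mode, is_dir=False):
    """Apply a comma-separated symbolic mode string to an integer mode."""
    for clause in spec.split(","):
        m = CLAUSE.fullmatch(clause)
        if not m:
            raise ValueError("bad mode: " + clause)
        who, op, perm = m.groups()
        affected = 0
        for w in who or "a":          # assumption: no who means "a", no umask
            affected |= AFFECTS[w]
        if perm in ("u", "g", "o"):
            # Copy only the rwx triplet of the source category, so the
            # suid/sgid/sticky bits never come along for the ride.
            shift = {"u": 6, "g": 3, "o": 0}[perm]
            value = ((mode >> shift) & 7) * 0o111
        else:
            value = 0
            for p in perm:
                if p == "r":
                    value |= 0o444
                elif p == "w":
                    value |= 0o222
                elif p == "x":
                    value |= 0o111
                elif p == "X":
                    if is_dir or mode & 0o111:
                        value |= 0o111
                elif p == "s":
                    value |= 0o6000
                elif p == "t":
                    value |= 0o1000
        value &= affected             # only touch bits the who list covers
        if op == "+":
            mode |= value
        elif op == "-":
            mode &= ~value
        else:                         # "=": clear the affected bits, then set
            mode = (mode & ~affected) | value
    return mode
```

Under this model "x=g" is rejected by the pattern, "ugo+" changes nothing, "=" alone clears everything, and "g=u" after "u+s" leaves the sgid bit off, which lines up with the experiments above.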


May 14, 2012

Last week I emailed Fabrice Bellard:

Hello Mr. Bellard, I'd like to run a kickstarter to hire you to:

1) Adapt qemu's Tiny Code Generator to work as the back-end for your old Tiny C Compiler, to create a new qcc (QEMU C Compiler) that can produce output for the various targets qemu supports.

2) Resurrect tccboot with the result, and get it to boot a current (3.x) kernel to a shell prompt. (Another "modified subset" is fine, as long as it boots to a shell prompt.)

3) Release the result under a BSD license.

Does this sound doable? If so, how much would you charge (so I know how much to ask the kickstarter for), how long do you think it might take (ballpark), and when might you be available to start (if we can get you the money by then)?

(I.E. "it would take me a dozen fortnights, cost my weight in canadian 'toonie' coins, and the next open slot in my schedule is 37 years from now.")

--- Optional details:

My notes on this project, from when I tried to do it myself, are at:

http://landley.net/code/tinycc/qcc/todo.txt

I can maintain this after it works, I just don't know enough to make it work in the first place, and have been trying to find time to learn for years now but keep growing _other_ projects instead (toybox, aboriginal linux, I accidentally became linux-kernel Documentation maintainer...)

I have no particular interest in the current "no releases in 3 years" tcc mob branch, and am just as happy for you to start with your old code if you prefer. If you want anything out of my old tcc fork, I hereby grant it to you under the same BSD license as tcc/tcg.

It doesn't need multilib, being able to build "arm-tcc" and similar would be fine, and probably the common case given the need for libc, libtcc, crtbegin, and so on. (Being able to specify code generation with the same granularity as qemu's -cpu option would be nice, but not a huge deal in the absence of any real optimization.)

Eventually I'd like to "busyboxify" tcc/qcc, I.E. make it so the front-end recognizes whether it's called as cc/cpp/ld/as/strip and reacts accordingly. But I can handle that part later, and make its command line parsing understand more gcc-isms if necessary. I wrote some notes about that years ago here:

http://landley.net/code/tinycc/qcc/commands.txt

I don't care about C++. The missing C99 bits from your old tccboot notes would be really nice, though.

Simple dead code elimination would be really nice. (Busybox depends on it to avoid linker calls to undefined functions.) Just detecting if (0) constructs after constant propagation and suppressing output (or diverting output to a ram buffer that gets discarded) would be plenty. But if that sounds out of scope, I could probably tackle that after the fact too...

Thanks for your time,

Rob

Today, I heard back:

Hi,

I had the same idea when I was working on TCC and QEMU. The code generator of QEMU is not generic enough to do it, but at that time I began to modify it to handle the missing bits. Unfortunately it is a large project and I lost interest in it. Maybe someday I'll be interested again in compilers (perhaps to do a mix between C & Javascript), but now I have other projects which have a higher priority, so I cannot help you now.

Best regards,

Fabrice.

Hmmm. I wonder if Paul Brook or Anthony Liguori or one of the other codeweavers guys would be up for this?
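The "busyboxify" idea from the email boils down to looking at the name the binary was invoked under. A rough Python sketch of that dispatch follows (the real thing would be C inside the compiler; TOOLS, tool_for, and the prefix stripping for names like "arm-tcc" are illustrative assumptions, not anything tcc does today):

```python
import os

# Personalities the single binary would answer to (the list from the email).
TOOLS = ("cc", "cpp", "ld", "as", "strip")

def tool_for(argv0):
    """Pick a personality from the invocation name, defaulting to the compiler."""
    name = os.path.basename(argv0)
    # Drop a target prefix such as "arm-" so "arm-ld" still maps to "ld".
    name = name.rsplit("-", 1)[-1]
    return name if name in TOOLS else "cc"
```

A main() would call tool_for(sys.argv[0]) and branch on the result, with symlinks named cc, ld, and so on all pointing at the one binary.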


May 13, 2012

An article wandered by on twitter about friday's rant, about how clean energy is really popular in all the polls, it's just that one political party is adamantly opposed to it and the other has no spine, so what the vast majority of people actually _want_ is going unserved. Oh, and republicans are trying to stop the US military from using biofuels.

(Sigh. "And to everyone else out there, the secret is to bang the rocks together guys.")

Elsewhere, Paul Krugman points out that inequality is one of the main causes of the current bad economy, I.E. if .01% of the populace has half the wealth, everybody else is significantly poorer than if the resources were distributed more equally.

The current "demand-limited liquidity crisis" is due to people who have money not spending it, I.E. a small number of billionaires who got rich by "cornering the market" on everything including our political system have set themselves up as choke points, and this means a few dozen scared old men hoarding cash can throttle the entire country's economic activity.

Meanwhile, here's the best summary I've seen of the situation in Greece, which roughly jibes with Krugman's guesstimate that Greece's exit from the euro could happen within the next month or so.


May 12, 2012

Meant to do programming today. Instead slept in, then mucked out the old condo some more. (The upstairs is clear, the downstairs, including the kitchen, is not.)

I need to get the server set back up so I can debug the weird Fedora build break, which still isn't happening anywhere else. I need to get an Aboriginal Linux release out before the _next_ kernel release ships. (I refuse to miss two in a row.)

And dalias (the musl maintainer) poked me about xargs not following the insane quoting semantics that predated the invention of "find -print0 | xargs -0", and seems to think it's important. Sigh... I suppose a config option since it _is_ in the spec...
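To make the difference concrete, here's a Python sketch of the two input formats. split_posix is my reading of the quoting rules in the spec (single quotes, double quotes, and backslash, with quotes not allowed to span a newline), and split_nul is the -0 behavior; both function names are invented for this example and neither is toybox code.

```python
def split_posix(data):
    """Split xargs input using the spec's quoting rules (a sketch)."""
    args, cur, started = [], [], False
    i = 0
    while i < len(data):
        c = data[i]
        if c in "'\"":
            # A quoted run ends at the matching quote and may not cross a newline.
            end = data.find(c, i + 1)
            if end < 0 or "\n" in data[i + 1:end]:
                raise ValueError("unterminated quote")
            cur.append(data[i + 1:end])
            started = True
            i = end + 1
            continue
        if c == "\\" and i + 1 < len(data):
            # Outside quotes, backslash makes the next character literal.
            cur.append(data[i + 1])
            started = True
            i += 2
            continue
        if c in " \t\n":
            if started:
                args.append("".join(cur))
                cur, started = [], False
        else:
            cur.append(c)
            started = True
        i += 1
    if started:
        args.append("".join(cur))
    return args

def split_nul(data):
    """Split "find -print0" style input: NUL ends each name, nothing is special."""
    items = data.split("\0")
    if items and items[-1] == "":
        items.pop()                   # drop the empty piece after the final NUL
    return items
```

The quoting version chokes on a perfectly ordinary filename like "it's a file" (unmatched quote), while the NUL version passes it through untouched, which is the whole reason -print0 and -0 exist.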


May 11, 2012

And now the Koch Brothers are mounting an Astroturf campaign against wind power.

Dear Barack Obama: ARE YOU BLIND???

I've come to the conclusion that if the democrats WEREN'T spineless and clueless, they'd tax the hell out of the fossil fuel industry and use the money to subsidize the alternatives. It is the OBVIOUS political move.

There's no real political downside because the fossil fuel companies already exclusively back the Republicans, AND SET THE REPUBLICAN AGENDA. Follow the money: this is why they keep electing "oil men" president. It's why the US invaded Iraq the first time (because Kuwait had oil), and the second time (because Iraq did, especially after a decade of sanctions put it way behind the rest of the middle east's peak oil depletion curve). It's why they support the Keystone oil pipeline and love fracking. It's why they _apologized_ to BP during the gulf oil spill, congress was biting the hand that fed half of it.

All the global warming denialism is funded by oil (and coal) companies desperate to avoid paying for the consequences of their actions, using the Tobacco Industry's model to distract, delay, and deny. If they have to work to undermine the country's belief in science itself in order to deny what the science is clearly and unambiguously saying, that's what they'll do.

Peak Oil happened in 2005 and has been really good to the fossil fuel industry, because over the past decade prices have quadrupled. OPEC doesn't have to restrict supply any more; production can't keep up with demand even with massive offshore drilling looking at the only areas we haven't sucked dry yet: the deep ocean.

That's why this year's Fortune 500 is topped by Exxon, Wal-Mart, Chevron, and Conoco: three of the four largest companies in the US are oil companies, and like the Tobacco industry did they side with the party that can be bought, the one that credits the rich for their success and blames the poor for their poverty, even when both are inherited for generations.

Obviously, continuing to depend on oil _after_ world production has peaked, with China and India rapidly growing their imports (competing for the remaining supply), is financial suicide for the country. But hugely lucrative for the companies that supply an ever-more-precious commodity to a captive market, for as long as they can keep that market captive.

The Democrats try to help people, and the poor need the most help. Social Security, Medicare, affordable housing, affordable education. When they see a problem, they're likely to try to do something about it, and they believe government _can_ do good things. The Republicans believe in self-interest, and getting the government out of the way of people's self-interest. I.E. the Democrats naturally want to stop pollution and dependence on fossil fuels, and the Republicans can naturally be bought.

So massive amounts of oil money are funneling into the Republican party, working to undermine alternatives to oil, roll back environmental regulation, open the arctic wildlife refuge to drilling, and control oil-producing countries with our military. (Notice how we effect "regime change" in countries like Iraq and Libya that have a lot of oil but don't give us preferential pricing. Sanctions work, but they mean we can't just buy that oil cheaply right now. Why restrict the oil supply when we have laser-guided bombs?) And, of course, stop the Democrats from doing anything about it.

The Koch brothers are oil billionaires (investing oil profits in spin-offs the way Philip Morris owned the Kraft food company, but Oil's the heart of it). That's also how the Bushes made their money. Dick Cheney was CEO of Halliburton (an oil company), and since leaving office he's already partnered with Rupert Murdoch in another oil company. And of course Rupert Murdoch's justification for invading Iraq back before it happened was (and I quote): "The greatest thing to come out of this for the world economy would be $20 a barrel for oil. That's bigger than any tax-cut in any country." These are not coincidences.

So of COURSE the Republicans attacked and mocked Solyndra. Obama subsidized solar electricity generation, they were terrified it might succeed. And you can tell Obama's an amateur because when they howled in pain he BACKED OFF.

Every time this guy hits a nerve he stops and apologizes instead of following up. Obama is NOT strongly pushing a huge carbon tax and pouring the money into massive R&D and subsidies for solar and wind and batteries and hydrogen and treating the whole thing as a giant jobs program and preventing the country's money from draining away to the middle east. You want to fix our trade deficit, it's obvious where to start. He's not highlighting the pattern of "Big Oil says jump republicans say how high" and calling them out. Instead he's treading lightly so as not to offend the oil companies, because he's completely spineless. He's trying preemptive unilateral compromise with the successor to the Tobacco institute funding the Party of No.

So yeah, it's nice that Biden cornered Obama into declaring gay marriage to be a states' rights issue like abortion was _before_ Roe vs Wade and like interracial marriage was back when he was born (his parents' marriage was illegal in 16 states). Somehow, this is counted as progress from mister half-measures. He (and I quote) "hesitated on gay marriage in part because I thought civil unions would be sufficient". Yes, the first black president endorsing separate but equal. If marriage is really a religious ceremony why can christians marry hindus, or an atheist marry anybody? It's a tax status with insurance benefits, and nobody who's been divorced more than once can talk about the "sanctity of marriage" (basically all the republicans).

But I wouldn't exactly consider myself "energized" by this. If we had primaries I'd happily give Hillary a chance. Oh, what do you know, early voting opens monday...


May 10, 2012

Wow, Linux Documentation maintainership means being cc'd on MASSIVE quantities of irrelevant crap. Most patch series touch something in Documentation (or _should_), and so do things like device tree format changes, and scripts/get_maintainer.pl will now add my email address to the cc: list of any patch series that touches anything in there. Meaning I get cc'd on the entire patch series, and all ensuing discussion.

Oh well, I'd been meaning to get back into reading linux-kernel anyway, but _dude_.

Speaking of which, I pulled up last week's linux-kernel web archive threaded view (on the theory that it's stopped updating so I won't have to go _back_ to check for additions I missed), and skimmed the first half of it. (Interesting topics, interesting users, basically what caught my eye.) This took 2 hours. It looks like just _skimming_ linux-kernel is now about 4 hours a week. Not as bad as I feared, but a kernel-traffic or kernel podcast style summary would still save me a lot of time. Alas, the people who did both stopped because it was too time consuming.

(In the past few years I've tried to hire two different people to do it for me, since it's a great learning experience for a computer-interested high school or college student and a heck of a portfolio piece when you're ready to get a real job. But both fell through for different reasons and it turns out I don't really KNOW a whole lot of high school or college students at the moment. They all graduated.)


May 9, 2012

Finally got a couple hours to bang on open source again. Went through one more of Georgi's patch backlog, removing strndupa from mdev, which was kinda moot since dirtree had changed out from under it and the command was never finished in the first place (no hotplug support). Did the basic cleanup so it compiles again, anyway.

I really need to get an aboriginal linux release out but the server isn't set back up so I can't track down the Fedora build break. (I tried the build on the new Ubuntu LTS and everything worked there, so it's not some newer package version, it's Red Hat being screwy.)

The amusing part is that last night I spent half an hour looking for a Generic Power Cable. The kind that's so bog standard that my commodore 64's disk drive used it in 1982? The TV uses it. The printer uses it. The server uses it. The replacement power adapter I got for my netbook uses it. I have a spare bundle of _5_ of the suckers back at the condo. And I apparently haven't got ONE of them in the new place.

Ironically, the XBox360 uses a nonstandard version. It looks like the normal one but only has 2 of the 3 wires: no ground pin. Yes, it's the most generic piece of hardware in the past 30 years and The Law Offices of Small and Limp Esquire did a nonstandard version I couldn't repurpose.


May 5, 2012

Moving to the new house. Everything's in boxes.


April 29, 2012

So one of the problems Fedora has is it installs ccache by default, and ccache support turns out to be broken in record-commands. I added infrastructure to support this almost three years ago, but for some reason the fallback directories are being created but not included in the $PATH that more/record-commands.sh is using to call $CC.


April 28, 2012

I felt mostly recovered from the cold except for interminable coughing (which gives me a headache and a raspy throat, but those are effects of the coughing). Then I tried packing some boxes, which involved a couple hours of breathing dust...

That didn't end well. Lungs not efficiently clearing themselves at present.

Programming! I'm trying to debug the m68k target in Aboriginal Linux. The m68k support in qemu is unfinished, but current progress is going on out of tree:

git clone git://gitorious.org/qemu-m68k/qemu-m68k && cd qemu-m68k && git checkout remotes/origin/q800 -b q800 && ./configure --target-list=m68k-softmmu && make -j 3

(You have to re-checkout fairly regularly because this branch rebases against upstream every time anything new shows up in it. No point in pulling.)

So using that, I get as far as:

ABCFGHIJK
Linux version 3.3.0 (landley@brillig) (gcc version 4.2.1) #1 Sat Apr 28 20:26:59 CDT 2012
bootconsole [early0] enabled
Detected Macintosh model: 35
VIA1 at 50f00000 is a 6522 or clone
VIA2 at 50f02000 is a 6522 or clone
Apple Macintosh Quadra 800
qemu: fatal: Illegal instruction: 7f45 @ 00000000

And then it dumps registers. So the kernel is booting, and then dereferencing a null pointer long before it's time to run userspace.

My rinse/repeat command line here is:

./linux-kernel.sh m68k && ./system-image.sh m68k && KERNEL_EXTRA=ignore_loglevel PATH=/home/landley/qemu/qemu-m68k/m68k-softmmu:$PATH more/run-emulator-from-build.sh m68k

The first two stanzas rebuild the kernel and repackage the system image, then the "ignore_loglevel" thing is a kernel command line argument that tells the kernel to show _every_ printk on the console.

Beyond that, it's a question of descending into build/packages/linux and groveling around for strings from that output (they're mostly in arch/m68k/mac/config.c so far) to find recognizable points _before_ it crashed, and then follow it forward to isolate the crash.

So mac_identify() is where the last identifiable chunk of output comes from, and that's called from config_mac() (which makes it all the way to the end as determined by a printk() there producing output).

That gets called from setup_arch() which turns out to be in arch/m68k/kernel/setup_mm.c, and... it goes into paging_init() and never comes out again. Ok, this is an mmu emulation bug.

Oddly, if I enable DEBUG in that file (which just switches on a bunch of printk() calls) it turns into a hang. And the hang refuses to narrow down to a specific line with additional printks, it seems the kernel does something and some amount of time _later_ QEMU goes into a tight loop that kill won't take down with -9.

Weird...


April 26, 2012

Back at work for a half day. Still coughing, but at least not sneezing twice a minute, probably not contagious. Mostly stayed in my cube anyway, and did the "self-assessment" part of my annual review (which is due friday, and that was the extended deadline).

Hanging out at Fry's coffee shop after work, which means no internet, which means I can't watch The Daily Show (no RSS feed to download it), so I'm catching up on the Rachel Maddow episodes I downloaded but haven't watched yet. April 2: The Abortion Show. April 4: The Abortion Show. But both had enough of a "war on women" slant to at least make sense in a larger context.

April 6: she found an excuse to do a Bad Science About Nuclear Power segment through a really strained segue about an abandoned ship floating around the Pacific for a year after the tsunami (totally tipped her hand since the segment was titled "half life of the ghost ship" so I was going "how is this going to turn into Bad Science About Nuclear Power" for several minutes before it actually did). The ship actually had nothing to do with Fukushima, it was just an excuse to look back at the tsunami and go "I still don't understand nuclear power but am terrified of it!" I mean, at least she's not going on about homeopathy or anti-vaccination stuff, but _dude_... And the interview is about that too. (Control-right arrow skips 60 seconds forward in VLC. Shift-right arrow is a five second skip.)

Sigh. She has some great segments towards the _end_ of each show. She just has to vent her fixations at length first. The need for 2/3 votes for "immediate effect" in Michigan sounds really important, but you have to sit through the rest of the show to get there...

(I say this as someone who tweeted earlier today the observation that if you look at the numbers, Planned Parenthood is the most effective anti-abortion organization in the USA because all the contraception it distributes reduces unwanted pregnancy, which is a prerequisite for abortion. Obviously I care about the issue, but the sheer broken record hammering on it gets old, then gets annoying, then gets to "please stop talking to me about your religion" levels...)


April 24, 2012

Still sick. I am so ready for this cold to be over.

I need to get back to poking at thread/nptl/sysdeps/unix/sysv/linux/x86_64/pthread_once.S to PIC-ify the __fork_generation symbol. Basically declare a global int, do an |= assignment to it in main(), and then compile that with -fPIC and disassemble it to see what it does.

Alas, I haven't got the concentration right now...


April 23, 2012

Home sick from work. Thought I'd get some work done, but mostly wound up napping or sitting on the couch catching up with twitter or playing skyrim. No brain at all today.


April 22, 2012

Bleah. Sick today. Scratchy throat, voice sounds weird, headache. Thought some exercise might help fight it off, but a half hour into biking I got nauseous.

Fade says she had a mild version of this and fought it off with those massive 1000 milligram vitamin C packets. Took a couple of Flintstones chewable "with extra C", for the placebo effect if nothing else.

Trying to concentrate either on fixing toybox ls (directory handling is all wrong) or aboriginal x86_64 (an assembly hunk is missing an #ifdef __PIC__ stanza and dies with an impossible relocation error: apparently the uClibc guys don't really test x86-64 much). Unfortunately, my brain is going "a 4 hour nap might be a good idea", and I'm an hour bike ride from home at present.

An energy drink counts as hydrating, right?


April 21, 2012

Yay, it's the weekend: I can do real work again!

Poking at aboriginal to get armv4l building again. It looks like they didn't bother to implement nptl for arm-oabi. Can't really say I blame 'em. I also need to fix x86-64, and figure out why fedora host breaks the build. I'm reinstalling the upstairs server with Fedora, since gentoo ate itself. (If you only tell your gentoo server to --sync and upgrade itself every 3 months or so, it gets really unhappy. These guys do not have releases, and their idea of legacy support seems to be "a whole month".)

Poking at toybox to make ls -l with no arguments work: all those fiddly decisions about when to recurse into directories, and command line arguments being special. Although I think "ls symlink" vs "ls symlink/" is handled for me by the OS because the second one is implicitly "ls symlink/.".
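A quick way to see the trailing slash thing, sketched in Python rather than C: lstat() on the bare name reports the link itself, but with a trailing slash the path gets resolved through to the directory before lstat() has a say. This assumes a POSIX system (it's what Linux does); the kind() helper and the "real" and "link" names exist only for this demo.

```python
import os
import stat
import tempfile

def kind(path):
    """Classify a path without asking to follow a final symlink."""
    st = os.lstat(path)
    if stat.S_ISLNK(st.st_mode):
        return "symlink"
    if stat.S_ISDIR(st.st_mode):
        return "directory"
    return "other"

# Build a scratch directory holding a real directory and a symlink to it.
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "real"))
    link = os.path.join(top, "link")
    os.symlink("real", link)
    results = (kind(link), kind(link + "/"))
```

So an ls that always calls lstat() on its arguments gets the "symlink" versus "symlink/" distinction without any special casing.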

The Documentation/ discussion continues apace. Converging on a reasonable solution, sounds like. The thing to remember is "this wastes a megabyte of permanent storage for every upload" is a touch eroded by Moore's Law these days. And if I rsync to my server and have _it_ do the git checkin (with its 1.5 terabyte drive), if it gets big enough to bother them upstream it's still not my problem: they're the ones who felt git push was preferable to rsync for generated files.

But mostly, packing boxes to move stuff into the storage space we rented several days ago and have yet to actually put anything in. The "dust makes you cough" thing turns out _not_ to be a myth.


April 20, 2012

Poked at the users at kernel.org list to see about getting my kernel.org account reinstantiated so I can update kernel.org/doc again. I need to create a gpg key and get it signed by people I don't know, because getting it signed by people I do know (who are or have been in the kernel MAINTAINERS file) doesn't count, because they don't have active kernel.org accounts and their current process is remarkably insular.

Getting an account means I get access to a "kup" tool which is designed around the assumption that you're doing git. Each file you upload must be individually gpg signed, although it has built-in support for telling a git tree on the server to create a patch file or a tarball, and store it. I have a python build script that takes half an hour to generate a directory full of files, which I then rsync to a server. Using kup to do this would be hilariously awkward, and involve writing a script to drive it and leaving it to run overnight. Every time.

I asked the kernel.org guys if they could maybe redirect docs.kernel.org to a server I maintain, and bypass all this. Instead, they're trying to figure out a git-based rsync so I can update kernel.org/doc again. Apparently, if all you have is git, everything looks like a repository. (Imagine if every .o file during a kernel build had to be checked into git to produce a vmlinux. Every time. Oh well, it's their disk space.)

I'm grateful they're trying to adapt to meet my use case, but amazed at the blinders their mindset has. Especially since kernel.org predates git.


April 18, 2012

Catching up on Aboriginal Linux. I've got the kernel updated, busybox updated, uClibc all the way up to 0.9.33.1, and everything using NPTL. Now grinding my way through the Linux From Scratch (still 6.8) build...

Also poking the kernel.org users list, to see about getting kernel.org/doc updating again. (I think the kup program is what you'd get if the TSA wrote software. The security vs usability tradeoff is skewed pretty far to "maybe if we make the system useless, nobody will try to crack it". I really don't want to try to reinvent rsync on top of somebody's perl script.)


April 16, 2012

A police state is the legislative equivalent of training wheels. Needing secret police to govern means you suck at governing.

Back when El Shrubbo perpetrated the Department of Homeland Security, TSA, and Guantanamo Bay, I visualized him with a life preserver and flippers crouched in a row of olympic swimmers at the edge of a pool. The idiot kept calling a timeout and demanding water wings, a shower cap, and noseplugs on top of that because he was completely unprepared to do his job. Every demand for additional power was one more sign he was profoundly ineffective with what he already had.

That's part of the reason I was so disappointed when Obama signed off on FISA and kept Guantanamo open. Until then I thought he was good at his job.


April 14, 2012

Checked in the dirtree and ls changes, and three or four other pending little fiddlibits in my toybox tree. For the first time in ages, "hg diff" shows no changes. Feels good.

I fall over now. Tomorrow I need to do linux/Documentation and Aboriginal stuff.


April 13, 2012

Friday! Went home exhausted around noonish. Dunno if I had a stomach bug or if I was just so utterly drained by sitting in a cubicle that I was sweating and my vision was greying out. (My subconscious is _not_ happy with my recent life choices in regard to what I do 9-5. Wearing a suit and tie would be _less_ demanding, at least that's blues brothers/tenth doctor cosplay. Cubicles are just dumb.)

Slept for several hours before feeling human again. I've only managed to do my "get up at 5 am and be productive before work" thing once this week. It's sad, I miss it, but I'm just so _tired_ all the time. Of course now that the sun's gone down I'm perking up. (It's also been 8 hours since I've had to sit in a cubicle, and I don't have to do it again for 2 full days. That really helps too.)

So, trying to finish up ls this weekend, and get Aboriginal Linux and the kernel Documentation directory under control. I've fallen _way_ behind on all the work I actually care about.

I'm most of the way through ls, but hitting strange behavioral corner cases. Currently, when to print the directory name header. You _don't_ do this when you're listing just files on the command line, including directories with -d. You also don't do this when you're listing just one directory (either "ls dirname" or the implicit "." you get with no arguments), except that if you say -R then the first directory gets a header. If you say "ls dir1 dir2" then both get the header.

I read the ls spec, which said:

If more than one directory, or a combination of non-directory files and directories are written, either as a result of specifying multiple operands, or the -R option, each list of files within a directory shall be preceded by:

"\n%s:\n", <directory name>

If this string is the first thing to be written, the first <newline> shall not be written. This output shall precede the number of units in the directory.

Except that if I go:

mkdir sub
cd sub
ls -R

I get ".:" instead of no output, even though the bit in the spec about -R was a conditional about _why_ you might list more than one directory, and this is just one directory without even any files in it. (I.E. the behavior of the gnu/dammit version of ls isn't quite what the spec requires, but what else is new?)

And of course "ls -lR file1 file2 dir" still lists those two files without a "dirname:" prefix, then gives a dirname prefix on the dir. In fact those files don't have a "total:" either.

In a way the pathological case for doing this the way the spec implies is "mkdir -p sub/sub; ls -R sub" because you can't tell you're listing more than one directory until after you descend into it, so you either do an arbitrary amount of readahead (the directory could have 5000 files and no subdirectories, in which case it should retroactively have no prefix) or you defer the decision until you descend into it (so the test is in the recursive instance of the display function).

Some more fun ls behavior: "ls -l todo.txt todo.txt todo.txt" treats them as three separate files (understandable), and "ls -l doesnotexist sub" gives a "sub:" label on the directory even though the other argument doesn't exist. So what it's _testing_ is that the number of command line arguments <= 1.

Sigh. I think I need to implement -R forcing the dir label, if only because it's what everybody expects, regardless of what the standard says...
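Boiled down, the observed behavior (not what the spec says) is: print headers if -R was given or there was more than one command line argument, and drop the leading newline on the very first thing printed. A Python sketch of just that rule, assuming the directory contents have already been gathered; format_listing and its arguments are invented for illustration and this is not the toybox code:

```python
def format_listing(files, dirs, nargs, recursive=False):
    """Lay out ls output for a set of command line arguments.

    files: non-directory names from the command line
    dirs:  (name, entries) pairs to list, already in output order
    nargs: how many arguments were given, including ones that don't exist
    """
    out = []
    if files:
        out.append("  ".join(files))  # plain files never get a header
    # Observed rule: -R forces headers, otherwise only with 2+ arguments.
    show_header = recursive or nargs > 1
    for name, entries in dirs:
        if show_header:
            # "\n%s:\n" from the spec, minus the newline if nothing came first.
            out.append(("\n" if out else "") + name + ":")
        if entries:
            out.append("  ".join(entries))
    return "\n".join(out)
```

Counting nargs before weeding out nonexistent names is what makes "ls -l doesnotexist sub" still label the directory, and passing recursive=True for an empty directory gives the lone ".:" line.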


April 11, 2012

Added All the Flags Ever to ls, haven't implemented most yet.

I think the right way to handle -d is actually display-time filtering, which implies the command line parsing shouldn't create separate files and dirs lists but should just make one big one, and yank/free the non-directory entries as it displays them. (Actually I can filter them out as it assembles the table for sorting, although figuring out what to free when gets a bit tricky, but not too bad.)
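As a rough illustration of the one-big-list idea (in Python, so the "what to free when" problem conveniently vanishes), the display pass prints and drops everything that doesn't need descending into, and whatever is left gets its contents listed afterwards. The display_pass name and the (name, is_dir) tuples are assumptions made for this sketch:

```python
def display_pass(entries, dir_only=False):
    """Split one combined argument list at display time.

    entries: (name, is_dir) pairs built from the command line
    Returns (names to print now, entries still to descend into).
    """
    shown, remaining = [], []
    for name, is_dir in entries:
        if dir_only or not is_dir:
            shown.append(name)        # printed as a plain entry, then dropped
        else:
            remaining.append((name, is_dir))  # contents get listed later
    return shown, remaining
```

With -d everything lands in the first list and nothing is left to recurse into, so the flag never has to be special-cased in the argument parser.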


April 10, 2012

The Daily Show's commercials have gotten weirdly misogynistic. There's a Junk-in-a-box commercial where somebody marries a hamburger, and the wedding ceremony ends with "you may eat the bride". Then there's a commercial where some guy is followed by a cloud of nanites coming out of his hair which disintegrates three women (leaving behind piles of clothes), and then the women are reconstituted in his bathroom at home by The Product.

They spent money filming these things. I'm not sure why. Possibly I'm just "not watching television for the past decade", so these seem unusually weird to me. Then again I've yet to figure out what the woman in the pink and white striped dress has to do with t-mobile, so...


April 9, 2012

Swamped. I've been dinking away at an ls rewrite to test the new dirtree stuff, and it's about 80% done, but this is one of those "breaks stuff that used to work" patches that it's hard to check in just part of. (I must admit, loading my old Red Hat 9 image and comparing the ls -l output with what Ubuntu 10.04 is producing, it's changed a _lot_ over the years. And of course SUSv4 is remarkably vague.)

Fade's family is visiting. Went to see Miyazaki at the Drafthouse. Fade's off with them at their hotel tonight, I helped Tryn pack boxes then came home to bang on software.

Signed paperwork to buy a new house on Thursday, although we won't get to move into it for a while. Currently two different places want to be declared our legal residence for mortgage purposes, and the "I'll claim one and Fade claim the other" idea turns out not to work if you're married (for some strange reason) so we'll probably just wind up selling the old one sooner than we intended. So that's more paperwork to look forward to.

Work remains crushing: I sit in a cubicle, under fluorescent lighting. It's a _cubicle_, combining claustrophobia with a complete lack of privacy in a way that doesn't seem possible, and yet. The old building didn't have cubicles.

Might poke my boss to see if I can get time to work on the kernel's Documentation/ directory as part of my day job duties. I'm not doing toybox or aboriginal work on company time because those are my projects and the company may decide "we own him, therefore we own them" (which isn't true but is an easy enough mistake to make that I'm keeping some clear lines here to avoid unpleasantness). But curating a library of kernel documentation is a small part of a larger project that predates me, and something they're unlikely to get grabby about...


April 5, 2012

User @vixy on twitter linked to "An excellent theory as to why men keep trying to make laws about vaginas", an article which titles itself Birth Control -- and why we'll still be fighting about it 100 years from now.

Its thesis is that when a privileged class has arranged a concentration of wealth and power, not only will they die before they give it up but their descendants have to die too. Previous revolutions often killed off the old guard, but the printing press didn't, and thus took centuries to not just establish new norms but make them stick. (The fact we're still arguing over teaching evolution in schools implies we still haven't gotten through this one.)

Another tweet from @mattyglesias is The entire future of the American economy in three paragraphs, with important Deep Space 9 allusions. This one points out that our economy is eliminating the need for manual labor, which would result in everyone at leisure if our society was set up that way, but is instead winding up with everyone unemployed because nobody needs them to do anything. The world _is_ changing fundamentally, but we aren't changing to take advantage of it.

The trend I'm noticing here is that concentration of wealth and power requires there to be poor people to extract tribute from. What good is money if you can't pay people to work for you? The existence of rich people, who can command hordes of peons at will, requires hordes of peons, each willing to work full-time for a tiny fraction of the money the rich person has at their disposal. You can't have an english manor without a servant class to staff it.

Part of it is that being a slumlord still pays, and with pawn shops, prepaid credit cards, and "payday loans" it pays more than ever. But also if we're all rich then we've merely raised the standard of living. To be rich is to stand out from the crowd and be BETTER than everyone else. You can't be better than everyone else without someone to be better _than_. Lots of someones.

The most fundamental servant class in history was "women", and the conservatives attempting to re-establish their gloriously hallucinated past are trying to put women back in the servant class by taking away birth control and equal pay and domestic violence protections and so on.

The words "conservative" and "liberal" are actually kind of interesting, from an etymology perspective. When something is liberally applied it means you use a lot of it, free-flowing. Liberals are the experimenters who try everything to see what works, the Johnny Appleseed types who spread their influence far and wide and let successes snowball and failures recede on their own merits. To be a liberal means trying new things. Conservatives are conservators and conservationists: they retain the past. Their job is to prevent change, and when they do strive for something different than today it's by going back to the past: researching a perceived golden age and attempting to recreate it, albeit often via mythology so they're often trying to "rebuild" a camelot that never really existed and doesn't work in practice.

All this gets us back to the baby boomers, still the dominant force in my lifetime. The rise of conservatism is because the baby boomers are shriveled up old fogies unable to cope with change. As teenagers they were willing to try anything once, and thus were liberals: Free love, hippies, woodstock, protesting vietnam, the space age, the works. All before I was born. Now they've stopped trying new things, and are thus ultra-conservative. The _baby_boomers_ are the ones who made "liberal" a dirty word. What they know is all there is. They don't want anything new, because they couldn't adapt to it. They just want to relive their glory days and return to an imagined past, not the _actual_ past when they were hippies or yuppies, but when they were the fine upstanding straight-laced citizens they never actually were. And in pursuing this fantasy they're providing an army of votes and funding for the concentrators of wealth and power to elevate the 1% of the 1% so far above everyone else they're as untouchable as Marie Antoinette.

Then there's the libertarians, insisting that Alexander the Great couldn't be great if he wasn't free to conquer the world, and thus allowing Genghis Khan to rampage across asia makes us all more free somehow. The greatest defenders of the 1% are those who insist we can't restrain them: you can't have a government because that would stop us from having kings.

Kind of annoying, really.


April 4, 2012

So Randy Dunlap is retiring as maintainer of the linux-kernel Documentation/ directory and he asked me if I wanted to do it, so this happened. I need to reinstantiate my kernel login, but probably not before signing giant stacks of house-related paperwork on thursday, and then Fade's relatives visit over the weekend, and I'm trying to get the toybox and aboriginal stuff caught up...

And I think I may need a caffeine detox. The energy drinks have stopped working again...

Busy month.


March 29, 2012

Why do I think busybox has lost the plot?

I needed a tftp daemon for work, and turned to busybox as the trivial solution to the problem. According to the --help "./busybox tftpd /path", right?

Wrong. It kept spitting the help back out at me, without ever saying what was actually wrong. Eventually I dug into the source code and found this (in tftp.c, not tftpd.c, and the source remains an unreadable forest of #ifdefs as usual):

our_lsa = get_sock_lsa(STDIN_FILENO);
if (!our_lsa) {
     /* This is confusing:
      *bb_error_msg_and_die("stdin is not a socket");
      * Better: */
     bb_show_usage();
     /* Help text says that tftpd must be used as inetd service,
      * which is by far the most usual cause of get_sock_lsa
      * failure */
}

No, the help text says it "should" be used as an inetd service, not that daemon mode has been completely removed. The old message was better.

Not having daemon mode is annoying: Ubuntu is upstart based and I haven't looked into adding services to that without rebooting, nor do I want to try to install inetd alongside it and hope they don't fight.

But fundamentally: the point of inetd was to have a single running binary that used a small amount of memory, which spawned short-lived instances of the other daemons to service each request, then they exited while inetd waited for new connections. (This was back in the 1970's, before the widespread use of swap space and dynamic linking with lazy binding, or even faulting in pages as-needed for static binaries.) Having a daemon running pinned memory, so inetd made sense. These days daemons like Samba and Apache are much larger than their historical counterparts, but we've got way more memory and we have swap space to flush them into when they haven't done anything recently.

Now let's look at inetd in the context of busybox: if inetd is in busybox, and tftpd is in busybox, you have busybox launching another instance of itself, just like daemons do. Having it be two applets accomplishes what, exactly? It's just an excuse not to factor out the accept() code into libbb and make daemon mode cheap.

Even worse: the busybox help text gives two examples, one for inetd and one for udpsvd, both of which are in busybox. I.E. it's got redundant implementations of the same darn functionality, which has no business being a separate app. (Note: netcat server mode can do this too. Netcat server mode at least has a _reason_ to be able to do this, being a general-purpose tool and all.)

As for my "simple, quick and dirty solution", I just downloaded tftpd-hpa which has daemon mode and doesn't rely on any external packages in the ubuntu repository, and made a note to write a tftpd in toybox someday.


March 28, 2012

Huh. I think Ulrich "death to static linking" Drepper is no longer Glibc maintainer. Not only did he move to Goldman Sachs, but according to Roland McGrath the glibc steering committee just dissolved.


March 27, 2012

The Fedora bug appears to involve gcc building with "--target=i686-unknown-linux", "--build=i686-walrus-linux", and "--host=i686-unknown-linux". Since the machine it runs on and the machine it produces binaries for are the same, it's linking stuff against the target library and trying to run it on the host, even though I told it the _build_ machine is different.

Note: this is a bug. We're cross compiling a program that runs on machine X (the conventional "target", you supply an existing cross compiler that produces these binaries, and the _only_ way we should have to specify this is by supplying the cross compiler). The program happens to be a compiler, which produces output for target Y. You should never have to tell the build what kind of machine the build host is: you can check compile-time macros if you really care, but mostly you shouldn't.

The fact that gcc doesn't think this way is because the designers of its build system were insane, and created a giant pile of overcomplicated crap that serves no purpose.

So when building gcc, --host actually means target, --build means host, and --target means output type for binaries generated by the new compiler. I told it --build is not the same as --host, I.E. host != target, so the fact it's trying to link host binaries against target libraries is a bug.

Now to dig into why it's doing that...


March 26, 2012

Catching up on The Rachel Maddow Show: the Friday March 16 show was "All Abortion All The Time", at least until I gave up and skipped to monday, more than halfway through the episode.

Monday the 19th was a special report on "Bad Science About Nuclear Power". At the 6 and a half minute mark: she showed a picture of Osama Bin Laden and Some Other Guy with A Big Beard, while explaining how Eisenhower's "atoms for peace" thing from the 1950's was somehow responsible for them wanting nuclear weapons. (I thought the US dropping bombs on hiroshima and nagasaki sort of clued the rest of the planet in that nukes were a thing.)

I think Rachel's point is that if Eisenhower had kept nuclear power a secret, nobody else would ever have discovered it, or something? Because obviously after we used nukes to end World War II, nobody else ever would have followed up on that. "Gee, all this radiation, just like Marie Curie was researching last century, I wonder if that's involved somehow."

It's sad that whenever nuclear power comes up Rachel's brain seems to shut down and go into the same kind of "the evil must be destroyed without question, don't look at it or you'll turn into a pillar of salt" mode the right wing pundits spend all their time in. I really used to like this show...

Ok, let's skip ahead to the most recent show... and her first piece is on the Gabby Giffords shooting. Well, at least it's a new topic. Not sure it's news at this point, but ok... And it looks like it's working up to another "and there's nothing you can do about it" ending.

I miss Keith Olbermann. He was the lightning rod for all the "mad as hell" impotent rage, and Rachel could spend her time being informative. Rachel trying to rile up her audience is no fun to watch. (Too bad the other new spinoff shows haven't got podcast.msnbc.com feeds.)

Nope, didn't make it through the gun segment. After she'd gone on for 400 years about the inevitability of our lovecraftian demise with "stand your ground" legislation spreading through our precious bodily fluids, I closed the window and deleted the files I'd downloaded. She's just gotten too depressing to watch.

At least there's still Jon Stewart and Colbert, but I can't download those and watch 'em offline. (Well, not as easily.)


March 25, 2012

Somebody left a newspaper on a table at McDonald's, which mentioned that Cheney had a heart transplant. This implies Bush got a brain and Obama may have gotten some courage. We can only hope.

Biked to Chick-fil-a yesterday, today I biked to the coffee shop in Fry's. I need waaaay more exercise. (According to the time and temperature signs I passed, it was 55 Fahrenheit. In direct texas sun, it was more like eighty, but that's still better than summer. Stopped twice to apply sunblock.)

I got Aboriginal to the point where I can build an x86-64 image on my laptop, and then build all the targets under that on quadrolith, and they pretty much work. (Or at least smoketest-all.sh gives the expected failures, which are generally qemu things. Half of 'em are that qemu's serial initialization on powerpc and sparc and such eat an unlimited amount of info, so feeding in a script as a here document, even with significant whitespace padding at the start, doesn't work. Once upon a time I was writing an expect implementation in shell, I should get back to that at some point...)

(The fiddliness with expect is that shells don't have the idea of a circular pipe: you can't have the input and output of process X go to the output and input of process Y. The closest I've come is to mkfifo and connect the start and end of the pipeline together via the FIFO, but that leaves trash in the filesystem.)
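For contrast, here is a rough sketch of the same wiring done from C, where two pipe() calls give each direction its own channel and nothing is left in the filesystem. The helper name spawn_coprocess() is invented for illustration, and it assumes file descriptors 0 and 1 are already open in the caller.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: run argv[] with its stdin and stdout both connected
 * back to the caller. On success returns the child's pid and fills in
 * *to_child (write end) and *from_child (read end); returns -1 on error. */
pid_t spawn_coprocess(char *const argv[], int *to_child, int *from_child)
{
  int in[2], out[2];
  pid_t pid;

  if (pipe(in)) return -1;
  if (pipe(out)) {
    close(in[0]);
    close(in[1]);
    return -1;
  }

  pid = fork();
  if (pid == -1) {
    close(in[0]);
    close(in[1]);
    close(out[0]);
    close(out[1]);
    return -1;
  }

  if (!pid) {
    /* Child: the "in" pipe becomes stdin, the "out" pipe becomes stdout. */
    dup2(in[0], 0);
    dup2(out[1], 1);
    close(in[0]);
    close(in[1]);
    close(out[0]);
    close(out[1]);
    execvp(argv[0], argv);
    _exit(127);
  }

  /* Parent: keep the opposite end of each pipe, drop the child's ends. */
  close(in[0]);
  close(out[1]);
  *to_child = in[1];
  *from_child = out[0];

  return pid;
}
```

The caller writes to *to_child, reads replies from *from_child, and should waitpid() on the returned pid when done.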

Meanwhile, I set quadrolith building Linux From Scratch under all those targets (the server is called quadrolith because the first one was monolith and its successor was duolith... the fact it's 4xSMP is actually coincidental). And since I haven't got net access at Fry's I can't bang on the Fedora bug from here either (which _might_ have been fixed by fixing the brown-paper-bag bug slashbeast found, or at least I kicked off a rebuild of the native-compiler stuff to see if that's the case), I'm back banging on toybox for the moment.

So toybox: I've got like three major logjams trying to digest Georgi's patch pile, and the most tractable of them is dirtree. Having seen the fts functions and the scandir() stuff, I now have a reasonable idea of what dirtree needs to look like. I've got to wean it off of toybuf and PATH_MAX in general, and switch to something that uses openat() and fdopendir(). I worked out some quick sample code to confirm I'm using the suckers right:


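#include <stdio.h>
#include <fcntl.h>
#include <dirent.h>

int main(int argc, char *argv[])
{
  // Open the directory named on the command line (or "." if none given).
  int fd = open(argc > 1 ? argv[1] : ".", O_RDONLY);
  DIR *dir;
  struct dirent *d;

  // Wrap the existing file descriptor in a DIR stream.
  dir = fdopendir(fd);
  printf("dir=%p\n", (void *)dir);
  if (!dir) return 1;
  while ((d = readdir(dir))) printf("name=%s\n", d->d_name);
  closedir(dir);

  return 0;
}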

I added a close(fd) at the end there and it returned -1 so the man page was right when it implied that fdopendir() took custody of the filehandle. Specifically, closedir() closes it so I'm not leaking them if I don't. (Recursing into big directory trees, I need to care about this sort of thing.)

The other thing I'm adding is dirtree_path() which takes a struct dirtree *node argument and traverses up to the root of the tree to assemble a full path. The fact we only assemble this path on demand (and then if we need it to be absolute we feed it to realpath()) is one of the big advantages this has over the fts stuff: if you tell that to assemble a tree for a big directory it's gonna eat buckets of memory with all the paths.
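Roughly what assembling a path on demand could look like. This is a sketch under assumptions, not the code that went into toybox: the two-field struct layout (a parent pointer plus the entry's own name) is invented here to keep the example small.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout: each entry stores only its own name plus a
 * pointer to the directory that contains it (NULL at the root). */
struct dirtree {
  struct dirtree *parent;
  const char *name;
};

/* Walk up to the root to size the result, then fill it in back to front.
 * Returns a malloc()ed string the caller frees, or NULL on failure. */
char *dirtree_path(struct dirtree *node)
{
  struct dirtree *dt;
  size_t len = 0;
  char *path;

  if (!node) return NULL;

  /* Each name needs strlen+1 bytes: the extra byte is the '/' in front of
   * it, or (for one of them) the NUL at the end of the whole string. */
  for (dt = node; dt; dt = dt->parent) len += strlen(dt->name) + 1;
  path = malloc(len);
  if (!path) return NULL;

  path[--len] = 0;
  for (dt = node; dt; dt = dt->parent) {
    size_t n = strlen(dt->name);

    len -= n;
    memcpy(path + len, dt->name, n);
    if (dt->parent) path[--len] = '/';
  }

  return path;
}
```

The tree itself only pays for one short name per node, and the full string exists just long enough for the caller to use and free it.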

Hmmm... there's no lstatat(). There's an fstat(), but I need to open the thing first which means symlink resolution's already done. Either I can call open twice (once with NOFOLLOW), or I can use readlink first... Ah, there's a readlinkat(). And readlink() returns EINVAL if it's not a symbolic link, so I can use that to test whether or not it is. Ok, let's do that then.

It's slightly racy since the readlink() and open() could have something happen between, but... Hmmm. Ah! Do openat() with the NOFOLLOW flag, and then only readlinkat() if we couldn't open. This means the stat buffer isn't filled out when we read link info, but the only thing we're missing there is the date stamp on the symlink, which we can't _get_ in the absence of lstatat() anyway...

Hang on, posix 2008 has included openat() and readlinkat() so there's no way they'd have left this hole. (If _I_ can spot it... there's a difference between Microsoft and IBM paying off the committee to leave holes large enough to drive NT and OS/360 through, and missing this sort of thing.) So let's look at posix's lstat() page...

There is an fstatat(), with an AT_SYMLINK_NOFOLLOW flag. Beautiful.

Grrr. "ls -a" shows "." and ".." entries, and I'm filtering them out because recursing becomes _insane_ otherwise, and 99.9% of the time you don't care. (Modern filesystems don't actually store them anymore, they're handled at the VFS level. I need to think about how to deal with that...)


March 24, 2012

Bunch of weird stability issues in Aboriginal Linux, which I'm cleaning up. The sanitize_environment stuff (to unset every environment variable that isn't recognized, because they can break the build) works by assembling a big long comma-separated list of allowed variable names in the environment variable "TEMP", and then iterates through the environment variables and unsets every one it doesn't recognize.

Somebody on freenode ("slashbeast", I'd credit him in the changelog if I knew his actual name) had the build break for him because TEMP wasn't in the whitelist. Oops. Brown paper bag bug, that. (It worked if TEMP wasn't an existing variable, and was thus at the end of the list. If TEMP _was_ an existing variable that got replaced, when sanitize_environment saw it, it would blank it... and every variable after it since the whitelist was now empty. If TEMP was before PATH in the environment variable list, bad things would happen in download.sh, which is what happened to slashbeast.)

Another really weird one is that when building the kernel, "make allnoconfig KCONFIG_ALLCONFIG=filename" works, but "make allnoconfig KCONFIG_ALLCONFIG=<(shellfunction)" only works SOMETIMES. It might have to do with /dev/pts being mounted, or a bash 2.x vs 4.x thing... Sigh. I think I'll just write the tempfile and not worry about trying to make the guts of bash reliable. (I can revisit this when I write my own shell in toybox and make the build use it: fixing bugs in an obsolete version of bash isn't interesting.)

Oh, and then there's EXTRACT_PACKAGE. It runs tar in a subshell, so it can pipe the output to dotprogress (a shell function that reads in filenames and outputs one period for every 5 filenames. This turns a "scrolls by so fast you can't see it, for pages and pages" tar output into a reasonable if not ideal progress indicator. Making an actual progress bar out of that would involve knowing ahead of time how many entries to expect so you know where 100% is, which is a maintenance nightmare in the presence of package upgrades).

The problem with running tar in a subshell is if it _fails_ (due to a truncated archive or some such; yes we're checking sha1sum but that doesn't handle ALT packages or a packages directory populated by something other than download.sh, and what if the disk fills up during extract, or the user kills tar with killall or ctrl-C?), the subshell needs to propagate the failure up to the parent. Except the subshell is in a pipeline. A subshell in a pipeline that needs to pass out exit status is kinda funky, and "set -o pipefail" didn't quite cover it for some reason.

My solution involved caching $$ and then calling kill on it from the dienow function. Guess what I forgot to include in baseconfig-busybox because it's not used in the normal flow of operations and thus didn't show up in record-commands.sh output? That's right, the kill command! So it blithely continued on and wrote the sha1sum file, which was then accepted as gospel later on even though the archive had only extracted halfway, and it didn't correct itself on future runs until I did an "rm -rf build/packages" to zap the corrupted package cache.

Obviously I want to put kill back in, but I'd also like to make the failure path more naturally detected. So I only want to create the sha1sum file for the extracted package within the subshell. If tar exits successfully, touch the file in the subshell, and then once we exit the subshell the parent can test that it exists. (Yes, it's IPC through the filesystem, but it works. And before you ask: using NFS ever for anything is pilot error.)

Problem: extract supports parallel operation (FORK=1 EXTRACT_ALL=1 ./download.sh), and is version-independent (so it can't depend on any _specific_ filename tarballs extract to). So each tarball extract happens in its own temporary directory, and then the directory created under that gets renamed to the expected destination name in the package cache. We don't know what the name of that new directory created by extracting the tarball actually is (it contains version information in formats that vary all over the place, every 2.4 linux kernel was just "linux", squashfs doesn't have a dash between name and version number, gcc-core extracts into "gcc", etc). So we wildcard it away: as long as there's just ONE subdirectory in the temporary directory, we don't care what it's called. If extracting the tarball creates more than one thing in the temp directory (and thus the wildcard expands to multiple things), then the fact the destination we try to rename the wildcard to isn't a directory (because it doesn't exist yet) breaks the build in an early and obvious way, and we know "hey, you forgot to say --prefix=linux/ when you did your git archive versionname | bzip2" or some other way you got a bad archive.

Anyway, so I want to touch a new file "$TEMPDIR/*/sha1sumsfile" from the subshell, which would actually have to be "$TEMPDIR"/*/sha1sumsfile because wildcards aren't expanded in quotes but we need the quotes in case the absolute path in $TEMPDIR contains spaces.

The reason this doesn't work is that even though $TEMPDIR/* expands to a unique directory, $TEMPDIR/*/doesnotexistyet does not exist. (Because we're trying to create it.) So the wildcard isn't expanded, and touch complains it's trying to create a file in a nonexistent directory named "*".

Wheee.

The fix seems easy enough:

FILE="$TEMPDIR"/*
touch "$FILE"/doesnotexistyet

Except wildcards aren't expanded in variable assignments either, we need FILE="$(readlink -f "$TEMPDIR"/*)" which is well into the realm of black magic at this point. (I'd happily use echo instead of readlink -f, but I don't trust it. What does "echo *" do if one of the files in the current directory is named "-n"? Did you notice that echo doesn't support -- to end argument parsing? Oh, and $() trims trailing newlines even when you put quotes around it, presumably because $(echo hello) actually has a newline on the end. But I'm just going to assume no package tarball is crazy enough to create a directory name that ends with a newline.)

WHY all this has to be that way is sadly non-obvious, other than "it's what I need to do to make it work". Half this blog entry was written so I could figure out how to phrase the darn comment, because oh boy does that need one.

If you were wondering about the difference between "mature" software and "it worked for me"... it's all this sort of stuff. Never seems to run out. The sanitize_environment stuff was added to guard against variables that broke the build for people who weren't me. The immune system code broke the build in other ways, even though it also worked for me. Trimming down busybox instead of building defconfig protected against build breaks in the ever-growing katamari of busybox, but we're still adding back stuff seldom-used error paths need. And the KCONFIG_ALLCONFIG works as a file but not a pipe is something I hit when rebuilding the code under itself (because my gentoo server has emerged its way into a corner and the compiler internal-compiler-errors all over the place now, and rather than spend the weekend reinstalling it I decided to just grab a root-filesystem-x86_64 and build under that: which turned out not to work. Been a few months since I tried it. Linux From Scratch still builds, but the kernel didn't...)

Imagine the regressions if I was doing active development on this thing. Speaking of which, I wonder if m68k works in qemu yet...


March 22, 2012

I am informed that Ulrich Drepper no longer works at Red Hat. I'm not sure his new employer is a step up.

And for future reference, running qemu with display on a remote server means doing this on the server:

qemu-system-i386 -m 2048 -hda fc16.img -vnc 127.0.0.1:1

And then locally going:

ssh user@server -CX vncviewer 127.0.0.1:1

The client side can be killed and restarted as many times as you like, it's just a view into the server's framebuffer (with attached keyboard and mouse).


March 21, 2012

Installing Fedora 16 under qemu to reproduce Denys's bug turns out to be a bit of a pain. It didn't like 768 megs of ram, so I tried on the server where I can give it 2 gigs. Then it managed to have three different bugs until somebody on the #fedora IRC channel pointed me to an A) updated B) xfce ISO image. That installed much more easily (Gnome 3 sucks mightily).

Then I had to install glibc-devel, glibc-static, gcc, and patch to get a reasonable development environment. (Remember, Ulrich Drepper works for Red Hat, therefore static linking got chopped out into an optional package because he really hates the concept.)

In theory, once host-tools.sh finishes the build gets a little easier. In practice, it died with a "No space left on device" error because I only gave Fedora EIGHT GIGABYTES of disk space and let it use its defaults for the install. (What a pig.)

And attempting to figure out how the disk space is USED is hard, because fdisk says there's one partition, which is a device mapper entry.

Ok, let's play this game: df /dev/dm-0 says it's a 1 gig devtmpfs, same with /dev/dm-1. (Why are there two devtmpfs instances? How do you have a PARTITION be devtmpfs? That's a synthetic filesystem derived from tmpfs which is derived from ramfs! That's like a partition coming back procfs or sysfs.) Let's try "ls -l /dev/disk/by-label", which returns one entry: "ext4" which is a symlink to "../../sda2" (which fdisk didn't list), and "df /dev/sda2" says it's /boot and half a gig (bit wasteful: I let it auto-partition). This implies there might be a /dev/sda1 (even though fdisk says there wasn't), and df says it's another (the same?) 1 gig devtmpfs. Same with /dev/sda3.

Results of investigation: Fedora is insane.

Looks like I have to try again, giving qemu a 16 gig image. (The host has a terabyte drive, presumably Fedora can only waste so much space before becoming usable or nobody would be _able_ to use it.)


March 20, 2012

Spent last night at the coffee shop in the middle of Fry's reviewing the mode parser Daniel Walter contributed to toybox (getting it about halfway cleaned up). I am also hugely behind on Georgi's patch stack, but working on it. (As I posted to the list: the work Georgi's doing keeps raising design issues, and I need to resolve those. For example, we have three different directory traversal function sets: cp uses my readdir() based lib/dirtree.c, ls uses scandir(), and Georgi's patch stack has commands using the fts_open() family.)

Meanwhile the 3.3 kernel just dropped, so it's time to push out Aboriginal 1.1.2, meaning I need to test that on all targets, and install Fedora Core 16 to debug Denys's weird build issue (more host/target confusion leaking through because the host distro changed), and debug the mdadm build issue (meaning I need to download the mdadm build script that's having the issue and reproduce it from source). Plus I have pending todo items about wrapping cpp and distcc not being reliable, which I've sadly neglected far too long.

Oh, and the more/record-commands.sh stuff (which I'm using again for testing toybox) turns out to always have been subtly screwed up trying to record the host-tools.sh build (which itself edits the $PATH, sort of the point of that stage), and when a toybox change broke host-tools I went down the rathole of cleaning _that_ up. (The problem there is really that $OLDPATH is used to mean two different things, which used to coincide but it was a coincidence...)

I haven't even _looked_ at the new uClibc release yet, but that should go in too.

Wound up being up until midnight last night, meaning I slept through my alarm (well, went back to sleep) and didn't get any morning programming time in. Spent an hour dealing with financial paperwork, now back to daylight jobbery...


March 17, 2012

Fried enough that I took saturday to recover and didn't do any programming at all.


March 16, 2012

"GNU/Linux" is an oxymoron: Linux is GPLv2 only, GNU is GPLv3 or later, the two projects cannot share code. Either it's "mere aggregation", or it's a license violation. Pick one.


March 15, 2012

I've got an extremely active contributor to toybox (Georgi) who I basically can't keep up with. They've tested toybox in various build environments I can't currently reproduce (macosx, android bionic, musl) and sent various patches.

This discussion resulted in me yanking the "#define _GNU_DAMMIT" from lib/portability.h (because the FSF has no claim on anything I'm doing and if I can _break_ compatibility with the Hurd while remaining standards compliant I will do so, the same way I sprinkle c++ keywords into my C99 code as variable names and such. I refuse to #define _ALL_HAIL_RECHARD_STALLMAN in code that not just ISN'T GNU, but is actively ANTI-GNU.)

I only had the macro in there as a temporary hack to get dprintf() (which I can live without) back before I mothballed the code, and I replaced it with the feature test macros that SUSv4 actually recommends. (Note that SUSv4 doesn't _require_ feature test macros, this incredibly stupid idea only contaminated the standard for "strictly conforming" applications, whatever that means. It's like the "-pedantic" compiler option, I think.) Of course, this breaks stuff.

Let's back up: feature test macros. The glibc headers nominally require you to #define various things in order to get all the function prototypes and constants the headers actually define. Except they don't _actually_ require it: if you NEVER MENTION ANY FEATURE TEST MACROS in your program, then the headers detect this (with a couple of big #ifndef blocks in features.h) and it gives you _BSD_SOURCE, _SVID_SOURCE, _POSIX_SOURCE, and sets _POSIX_C_SOURCE to 200809L, all automatically.

I.E. most people will NEVER HAVE TO CARE ABOUT THIS. But if you _do_ define any feature test macros yourself, it switches all that stuff _off_ unless you do it manually. I.E. touch the knobs and you disable the autopilot and crash horribly. So in general, people don't do this.

There's one special feature test macro, _GNU_SOURCE, which is the Great Big Button That Switches On Everything. In features.h switching that on automatically #defines ten other feature test macros. Everybody who ever bothers with feature test macros just presses the big Go Away button. In fact things like the unshare(2) man page say "#define _GNU_SOURCE" before including sched.h, even though on an Ubuntu 10.04 LTS development system the actual header guard (in bits/sched.h) is __USE_MISC which gets #defined by _either_ __USE_BSD or __USE_SVID (both of which you get for free if you never mention any feature test macros in your source). I.E. the man page tells you to #define something you don't need to define, which would be the wrong symbol to #define _anyway_, just because it's the big Go Away button for feature test macros.

The unshare() function is part of Linux container support, implementing things like network namespaces. The kernel developers invented unshare() in response to rejecting the OpenVZ patches: Linus and company designed a different interface for the kernel to do container support, and the namespaces bit is via a brand new system call.

The FSF has nothing to do with this, and is not even aware of it. The Hurd does not support anything remotely like this.

Needing to say "_GNU_SOURCE" to get unshare() is _really_ _stupid_. The man page there was simply wrong: if you used unshare() without #defining anything, it worked fine in Ubuntu 10.04.

In current ubuntu, they "fixed" that, so you DO need to define _GNU_SOURCE in order to get unshare(). They took the bug in the man page, and changed glibc to match. Right, I can just #include directly and include the unshare() system call prototype directly in my code to work around the glibc regression.

So now we come back to MacOS X (which hasn't got _GNU_SOURCE), and toybox, which is explicitly opposed to the FSF (the project's license is BSD because GPLv3 was that bad).

And then there's musl, which ONLY IMPLEMENTED _GNU_SOURCE (not any of the other feature test macros like _POSIX_SOURCE which are actually _mentioned_ by SUSv4) because nobody ever tries to _use_ granular feature test macros: they just hit the big Go Away Button.

Both musl and uClibc require you to #define _GNU_SOURCE to get stpcpy. That link is to stpcpy in posix 2008: it ain't GNU and requiring _any_ #define to get it is a bug.

tl;dr: Feature test macros are a bad idea.


March 14, 2012

Well of course Ubuntu is displacing Red Hat. Red Hat abandoned the developer workstation in favor of focusing on the enterprise market, and developers are going to deploy on what they developed on.

Red Hat abandoned its original workstation distro after Red Hat 9 because it figured out how Sun made money. Government and Fortune 500 procurement contracts often cap a bidder's profit at a fraction of the cost of materials, so these bidders spec the most expensive materials they can. They'd rather make a $500 profit off a $5000/seat Solaris license than a $3 profit off a $29.95 retail boxed copy of Red Hat. The engineers would rather use Linux, but the sales force and management insisted on the more expensive component to pad the bottom line. (And of course convinced the clueless _customers_ that obviously Solaris must be superior, since it was so much more expensive. For more on large institutions "protecting" themselves in the foot with both barrels, see this excellent article by Joel Spolsky.)

When Red Hat figured this out (You mean if we raise the price they'll buy _more_ copies?) they came out with the ridiculously expensive "Red Hat Enterprise", everybody switched to the better technology and suddenly their "enterprise" division made ten times as much money as the rest of the company combined. This "tail wagging the dog" situation sucked all Red Hat's attention away from the developer workstation market, which they decided to unload on the open source community in the form of Fedora.

When Red Hat did this, they had a little over 50% of the Linux installed base: desktops, servers, everything. It's what Linus Torvalds himself ran. The default Linux was Red Hat. And then they abandoned the developer workstation.

Unfortunately for Red Hat, Fedora didn't work. They were sort of aiming at something like Debian (without the endless flamewars), and clearly hoped that the community would pitch in and do lots of work for free as unpaid Red Hat employees, without asking for any significant voice in controlling the project. But this is not how open source development works: developers push projects in the direction those developers want the project to go. People are most likely to get out and push when they're trying to steer.

The first year of Fedora development was well-summarized in a parody IRC log: the tension between Red Hat's employee engineers and the would-be volunteers eroded much of the initial momentum. In year five Red Hat vetoed the demand for an independent Fedora Foundation (and in doing so crushed most remaining third party interest). In explaining why no Fedora Foundation would be forthcoming, the main point was "Red Hat has veto power over decisions" and would not be giving it up:

Red Hat *must* maintain a certain amount of control over Fedora decisions, because Red Hat's business model *depends* upon Fedora. Red Hat contributes millions of dollars in staff and resources to the success of Fedora, and Red Hat also accepts all of the legal risk for Fedora. Therefore, Red Hat will sometimes need to make tough decisions about Fedora.

This is a bit like Mozilla back when AOL/Netscape couldn't let go of it, which Jamie Zawinski post-mortemed in his resignation letter. The whole thing's an excellent read, but let me quote a little bit:

The truth is that, by virtue of the fact that the contributors to the Mozilla project included about a hundred full-time Netscape developers, and about thirty part-time outsiders, the project still belonged wholly to Netscape -- because only those who write the code truly control the project.

A similar problem occurred with every open source project Sun ever tried (Java, OpenSolaris, OpenOffice... often exacerbated by license issues and copyright assignment). The company wanted to retain control, but attract volunteer contributors. They wanted obedient galley slaves, free to row in unison alongside their employees but not allowed to nudge the project off course. Open source doesn't work that way.

Red Hat's utter failure to allow outsiders any control over Fedora reduced Fedora to "Red Hat Enterprise Rawhide", I.E. a beta release of the next Red Hat Enterprise, nothing more. In fact, Fedora became _so_ uninteresting that the CentOS project was launched to do an open tracking fork of the more interesting Red Hat Enterprise.

(How uninteresting is Fedora? Wikipedia considers 70 Ubuntu-based distributions "notable", compared to 19 Fedora based and 11 Red Hat Enterprise based. Those 19 Fedora distros include Fuduntu (combining Ubuntu and Fedora), Yellow Dog Linux (a decade old PowerPC distribution from the days before Red Hat enterprise), sponsored projects (K12LTSP is based on Fedora because Red Hat gave them money), corporate-based distributions such as Intel's defunct Moblin project (merged with Maemo to form Meego, and then superseded by either Yocto or Tizen depending on which order you rank the Linux Foundation's quest for relevance), and of course Red Hat Enterprise itself.)

Another failure mode illuminating Fedora's loss of market share was the Solaris x86 debacle. Sun Microsystems' management alienated their development community by repeatedly killing the x86 version of Solaris, because why would anyone deploy that on a server? But Sun workstations cost tens of thousands of dollars, so developers didn't write software on them. Instead they bought cheap x86 workstations, installed Solaris x86, and then ported the results to Sparc hardware when it was done.

The loss of the workstation market shrank the Sun developer pool. The repeated attacks on Solaris x86 strangled it. But Red Hat voluntarily _abandoned_ the Linux developer workstation market.

Red Hat's enterprise distro is not appealing to developers, it's a deployment environment for large conservative institutions. When Ubuntu came along and sucked the developer workstation market away from Fedora (not even bothering to field a server version for its first few releases), Red Hat gradually became a less appealing deployment environment, because it's not what Linux developers write software for.

Red Hat is under fairly standard disruptive attack. Its brand equity among developers has eroded, it's "pointy hair linux" now. It's the distro management uses to run Cobol programs. Small projects deploy something else, and when they grow to be big projects they stay with that something else. Red Hat is still milking brand equity from a dozen years ago.


March 9, 2012

Downloaded yesterday's Rachel Maddow Show to see if it had stopped being "All Abortion All The Time" (with interludes of Bad Science About Nuclear Power). Alas, no. The first 20 minutes or so of the show: solid abortion coverage, then a guest she could interview about abortion. Sigh.

I'm pro choice, but somewhere between the 3-hour special on Dr. Tiller and the entire episode devoted to the history of the Kansas anti-abortion movement, I got kind of tired of listening to Rachel go on about it, and stopped watching. Keith Olbermann's job was to be mad as hell and not take it anymore, and he was a lightning rod that let Rachel be calm and intellectual. Now she's trying to rile up her base, and it's exhausting to watch.

Luckily, vlc has a "playback->faster" option that makes the show chipmunk its way through topics, and her last interview of the show was on an interesting topic, which they danced around but didn't quite directly address:

Rich people get rich by cornering the market: not just selling stuff but preventing _other_ people from selling the same thing, since competitors would drive down the price and take away their volume. Even the relatively good guys like Warren Buffett talk about how great it is to have a "moat around a business".

The current republican "party of the rich" is all about cornering the market. All the New Jim Crow laws are about cornering the market on voting. Taking away rights from poor/gay/female/young people is about cornering the market on rights, so elderly white men are disproportionately represented. The republicans are trying to corner the market on power.

If you don't have the metaphor of "cornering the market", you can't understand the current Republican strategy. They think they're playing a zero-sum game, where "I can't succeed unless I fence you out". I.E. "for me to win, you must lose."

Their attacks on contraception or freakout about gay marriage make no sense unless you understand this fundamental tenet of their worldview: anybody else doing well in any way is an existential threat to the mindset of this group of elderly white men. They must be surrounded by suffering to feel good about themselves. Only then are they better than everyone else.

This leads to horribly stupid policies. If a meteor was hurtling towards the planet republicans wouldn't try to _stop_ it, they'd try to build domes so the chosen few could survive. The Tories are going out of their way to dismantle an existing successful health care system because tearing other people down is part of their definition of progress. If the world is a zero sum game, all you have to do is attack your enemies and rewards for you must follow; simple math. The last man on earth must be a trillionaire.


March 8, 2012

Somebody asked me if replicant is relevant to my interests. It is not. I see it as about on par with Ututo or maybe gNewSense.

Let's start with pragmatism: ReactOS is doing an open clone of windows (essentially porting Wine to the bare metal). Ever heard of it? Know anybody who's used it?

I'm an ex-OS/2 developer: preinstalls matter. Google licensees are shipping something like 100 million android devices per quarter, and providing those guys with a native development environment is a huge opportunity. Replicant _might_ manage 10,000 installs over the project's entire lifetime, all aftermarket.

Cyanogenmod is upgrading Android to be more useful, and even they've got a tiny fraction of stock Android's market share. Replicant is offering _less_ than stock Android, removing working code they disagree with the license terms of. Nobody who just wants to USE their phone has any reason to do this.

Again, BusyBox predates Android. Toybox is not competing with BusyBox on Android: Google has had over five years to start shipping BusyBox in Android and hasn't done so. The shortest path to getting android users more command line functionality than Android's toolbox provides is to write new code compatible with Android's existing license policies, as issued by Google and adhered to by Android device manufacturers. This is not based on "how an ideal world should work". This is starting with existing reality and plotting a course through it.

(Yes, the toolbox/toybox name thing is confusing, but I'd like to point out I named my project before the first Android phone shipped. :)


March 7, 2012

The Linux Foundation is a really strange bureaucracy. They hold some random invitation-only conference, and then they send out spam emails inviting everyone to "request an invitation". No, really!

One of us is unclear on the concept of "invitation-only". Actually, I think one of us is unclear on the concept of "open source". (And I see having bronze, silver, and gold sponsors wasn't enough, they needed platinum too, because that's the _important_ part.)

Sigh.


March 6, 2012

I got interviewed by h-online about Toybox.


March 5, 2012

My old /usr/bin vs /bin rant has been cleaned up slightly and published by Hacker Monthly.

They integrated my corrections about the actual drive sizes: / was a fast but tiny 0.5 meg disk, and RK05 disk packs (on /usr and eventually /home) were very slow 2.5 meg external beasties (not 1.5 megs). So the "3 whole megabytes" line was right, but I had the mix wrong in my 2010 post. (Dennis Ritchie's website has primary source material on this if you know where to look, I pointed 'em at a couple citations.)

I need to get back to working on my computer history book. I need to tackle my aboriginal linux todo list. I need to get toybox finished. I need to start qcc. I need to get hexagon linux booted on my old nexus one. I need to fix chroot in the kernel. I need to do the "hello world" kernel refactoring. I need to start another penguicon/linucon-style convention. I need to start a podcast to get rid of unused ideas. I need more hours in the day...


March 3, 2012

toybox 0.2.1 is out.

I screwed up the tarball slightly (didn't pregenerate generated/help.h) so it requires python on the host to build. I'll probably do a 0.2.2 next weekend to catch up with the patch backlog and fix that.


March 1, 2012

Still haven't _quite_ got a toybox release out, too much other stuff eating my time. Thinking this weekend...

Got the roadmap updated a bit though. That should replace most of the random todo list stuff, _and_ be the place listing currently supported commands each release. That's also the place to stick this sort of thing...

I feel bad about the contributors whose patches have gone unmerged this week, though. Sorry, I'm catching up as fast as I can! (The roadmap has a list of "probably done" commands, and of course I'm re-auditing them all to make sure they _are_. Well a lot of 'em I haven't looked at in 3-4 years...)


February 27, 2012

Heh. Forbes asks "Is the cloud catching up with mighty Oratroll?" and never once mentions in-memory databases or the NoSQL movement.

My reaction to the article is "Couldn't happen to a nicer patent troll", but a quick check of my blog didn't find an explanation of what in-memory databases _are_, and wikipedia is useless here. Since that's what's really killing Oracle, here's a quick primer:

An in-memory database is what you get when all your tables fit in RAM at once. IBM's Structured Query Language (SQL) came from its mainframe System R database of the 1970's, which was designed around the assumption you only have enough memory to load one record from each table at a time. So it would load one record from disk out of each table, compare them together, and immediately write the results back to disk. All that "stream" and "join" stuff is based on the idea that you can't possibly have more than one record from each table in memory at once, and seeks are bad for performance!

This stopped being relevant about 20 years ago. Moore's Law kept doubling memory sizes, and by about 2000 it was starting to become feasible for small databases to keep the entire thing in memory at once and index everything with hash tables, providing literally a 1000 times speedup over disk access.

In-memory databases are not just three orders of magnitude _faster_, they're also correspondingly _simpler_. Suddenly, your "tables" were just python dictionaries, and the entire database program became a thousand lines of python (800 of which implemented the SQL parser; I saw such a program on sourceforge in 2001).

The obvious implementation goes like this: start with a snapshot of your database (a gzipped Python "pickle" file of your hash tables, for example). Every "write" transaction (a query which updates any records anywhere) gets appended to a log file. (You can fsync() the log, it's linear streaming writes so should be pretty fast.) If the power fails, reload the snapshot and replay the log.
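Here's a minimal sketch of that snapshot-plus-log idea in Python. Everything named here (the TinyDB class, the JSON-lines log format, the "set"/"delete" operations) is made up for illustration, not taken from any real database. It only shows the recovery path: load the snapshot if there is one, then replay the log on top of it.

```python
import gzip
import json
import os
import pickle


class TinyDB:
    """Hypothetical dictionary-backed store: gzipped pickle snapshot plus write log."""

    def __init__(self, snapshot_path, log_path):
        self.tables = {}
        # Recovery: load the last snapshot (if any), then replay the log on top.
        if os.path.exists(snapshot_path):
            with gzip.open(snapshot_path, "rb") as f:
                self.tables = pickle.load(f)
        if os.path.exists(log_path):
            with open(log_path) as f:
                for line in f:
                    self._apply(json.loads(line))
        # Append mode: new write transactions go on the end of the existing log.
        self.log = open(log_path, "a")

    def _apply(self, op):
        table = self.tables.setdefault(op["table"], {})
        if op["kind"] == "set":
            table[op["key"]] = op["value"]
        elif op["kind"] == "delete":
            table.pop(op["key"], None)

    def write(self, kind, table, key, value=None):
        op = {"kind": kind, "table": table, "key": key, "value": value}
        # Log first and force it to disk, then update the in-memory tables.
        self.log.write(json.dumps(op) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self._apply(op)

    def read(self, table, key):
        # Reads never touch the disk: just two dictionary lookups.
        return self.tables.get(table, {}).get(key)
```

A real implementation would also have to cope with a torn final log record after a power failure, which this sketch ignores.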

When the log gets uncomfortably long: fork the database. The parent process closes the old log file, opens a new one, and continues about its business. The child starts writing a new snapshot, freeing its memory as it does so.

If the power fails before the child finishes writing the new snapshot, just read the old snapshot and replay both logs in sequence. When the child finishes writing, it can archive/delete the old snapshot and old log because the next shutdown replay doesn't need them.

The interesting bit of the above is that due to the way "fork" works, this doesn't take twice as much physical memory. The parent and child share copy-on-write mappings of all the underlying physical memory, and as long as the child is freeing its copies as fast as the parent is dirtying pages, the system shouldn't run out of memory. Making this take advantage of SMP is bog standard threading/locking.
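The fork trick can be sketched like so, assuming a POSIX system (os.fork() doesn't exist on Windows). The checkpoint() function and its arguments are hypothetical names for illustration. The child sees the tables frozen as they were at the moment of the fork, and writes the snapshot to a temporary file before renaming it into place, so a crash mid-write leaves the old snapshot intact.

```python
import gzip
import os
import pickle


def checkpoint(tables, snapshot_path):
    """Fork; the child writes a snapshot of `tables` as of the fork."""
    pid = os.fork()
    if pid == 0:
        # Child: works on a copy-on-write view frozen at fork time.
        status = 1
        try:
            tmp = snapshot_path + ".tmp"
            with gzip.open(tmp, "wb") as f:
                pickle.dump(tables, f)
            os.replace(tmp, snapshot_path)  # atomic swap to the new snapshot
            status = 0
        finally:
            os._exit(status)  # leave without running the parent's cleanup code
    # Parent: returns immediately and can keep applying writes.
    return pid
```

The parent would rotate its log right before calling this, and os.waitpid() on the returned pid later to learn when the old snapshot and log can be deleted. One caveat: CPython's reference counting writes to object headers even on reads, so the page sharing is less effective in Python than it would be in C.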

This is why the department of defense spent millions of dollars buying a 2.4 terabyte ramdisk back in 2004. It was also a big driver for early adoption of 64-bit systems a year later: people wanted more than 4 gigs of memory to fit their databases in; they could afford the chips but needed more than 32 bits of address space to use it all. Now there's plenty.

The entire "nosql" movement boils down to "Hey, if 80% of our remaining database code is implementing SQL, why _bother_ with that? Why not just use the darn hash tables directly, via shared library or something?" SQL itself is an artifact of the System R stream-and-join design assumption; it doesn't fit well with randomly seekable hash tables that return results almost instantly.

If you're familiar with disruptive technologies, this is a textbook one. It started with databases too small to interest the existing players; if your database fit entirely in memory in 2000 it simply wasn't worth Oracle's time. Then medium-sized businesses could do it. Now all Oracle's got left are the largest fish like credit card processing companies and stock exchanges (plus the out-of-touch old fogies who Never Got Fired For Buying Oracle), and when those switch over (or in the latter case die off), what's left?

Oracle's problem is that its decades of "stream and join" design optimization are built on top of an obsolete design assumption, that memory is a transitional state your data passes through because memory is tiny and precious; records must be evicted back to disk as soon as possible. This is no longer the case, and the obsolete read/process/write loop at the heart of stream-and-join means Oracle has the best buggy whips in the world, decades of optimization into hitting the horses _just_right_... and nobody cares anymore.

(The stream-and-join metaphor also doesn't translate well to clusters, so very high end data warehousing really isn't their thing either. Like DEC, they're sandwiched between a relentlessly rising foe and a hard ceiling they can't chip through very fast.)

I've developed a theory that dying business models explode into a cloud of IP litigation, like drowning victims climbing on top of anyone who can swim, pushing them under to buy a few more seconds. SCO did it, now Oracle's doing it, the RIAA/MPAA are infamous for it... This is not a sign of health.


February 26, 2012

Finally got killall.c and kill.c cleaned up and merged.

Fun little corner case: kill lets you do "-s signal" or just "-signal" and lots of signal names begin with s! Thus "kill -stop 12345" gets interpreted as "kill -s top 12345" and it complains "top" is an unknown signal.

To fix that, I added a new option to lib/args.c where you can stick a space after a command letter and it requires its argument to be a separate command line argument, not the remains of the current one.
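To illustrate the ambiguity (this is a Python sketch with a made-up parse_kill() function, not the actual lib/args.c code, which is C and works differently), here's the same argument list parsed under both rules:

```python
def parse_kill(args, separate_s=True):
    """Return (signal_name, pids) for a kill-style argument list.

    separate_s=False mimics the old rule where "-sFOO" means "-s FOO".
    separate_s=True requires the name after -s to be its own argument,
    so "-stop" falls through to the "-SIGNAL" form.
    """
    signal_name = "TERM"
    pids = []
    it = iter(args)
    for arg in it:
        if arg == "-s":
            name = next(it, None)
            if name is None:
                raise ValueError("-s needs a signal name")
            signal_name = name.upper()
        elif arg.startswith("-s") and not separate_s:
            # Old behavior: the rest of this argument is the signal name.
            signal_name = arg[2:].upper()
        elif arg.startswith("-") and len(arg) > 1:
            # "-SIGNAL" form: everything after the dash is the name.
            signal_name = arg[1:].upper()
        else:
            pids.append(int(arg))
    return signal_name, pids


print(parse_kill(["-stop", "12345"], separate_s=False))  # ('TOP', [12345])
print(parse_kill(["-stop", "12345"]))                    # ('STOP', [12345])
print(parse_kill(["-s", "stop", "12345"]))               # ('STOP', [12345])
```

The sketch doesn't validate signal names or handle negative pids, it just shows which string each rule hands to the signal lookup.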

So now I've stuck a toybox snapshot into aboriginal and am building "TOYBOX=toybox CROSS_COMPILER_ARCH=i686 ./build.sh i686". I had to disable ls because it wants "-di" and "-tc" and who knows what else. And it's doing the xargs segfault thing again. Sigh.

Probably not going to get a release out tonight...


February 24, 2012

People keep asking me about Qualcomm's Hexagon in email, and I keep writing up long explanations and then never using them again. So here's the most recent email I wrote on the topic, for posterity. It's been a year and a half, the basics of what we worked on shipped already, the story can come out now.

Keep in mind my contract to work on this stuff expired in October 2010, so these are my vague recollections from over a year ago:

On 02/24/2012 02:28 AM, [REDACTED] wrote:

> Thanks for info. I have already seen this page, but i have tried all
> branches on the 
> https://www.codeaurora.org/contribute/projects/hexagon-project/ and have
> not found any simulator.

Ah.

They had an in-house simulator that was some horrible proprietary thing they contracted out to a third party to produce, and if I recall right qualcomm's lawyers went out of their way to make sure they _didn't_ get source code because there was a nest of "proprietary is obviously better, duh". I expect that bit them in the ass, because the sucker was useless to the Linux port.

Mostly we just used real hardware. We had these things called "comet" boards that had a snapdragon SOC and ~256M of memory which you could boot and run code on. (No local storage to speak of, I wound up implementing nbd-client in busybox to get some.)

There was an aborted attempt to add hexagon support to qemu (Scott somebody did it, google for "quic qemu" and I think his post to the qemu mailing list is the first hit). Unfortunately, the guy who tried it couldn't wrap his head around TCG (I think he was mostly a manager), and it never went anywhere. :(

Hexagon is a six stage pipeline running at 600mhz, and the clever thing they did was create six different register profiles and round-robin them down the pipeline, so each pipeline stage is totally independent of the others and they don't need any pipeline interlocks. So it _looks_ like a 6-way SMP chip, and they send NOPs down the pipeline when there's nothing to do (which power the circuitry down completely for that clock cycle). It doesn't do any branch prediction or speculative execution or anything because it's designed _not_to_waste_power_, instead each register profile is a separate thread (a bit like hyper-threading, only much simpler). It has ridiculous amounts of parallelism, in addition to having up to six thread profiles in flight at once, instructions are bundled into VLIW "packets", of up to 4 instructions dispatched to 4 execution units. Each execution unit has slightly different capabilities (the last two have the big vector and floating point ops doing the SIMD thing, the first one handles branching, I forget what the second does. They've all got a largeish general-purpose set of instructions they all do too. I had a giant booklet on this explaining the architecture and such.)

As far as Linux is concerned, it's a 6-way SMP chip running at 100mhz, but each cycle it can dispatch up to 4 instructions so you get back closer to 300mhz performance, and then there's the SIMD stuff which makes multimedia things fly. Also, most of the prefetch delays and such happen during the 6 "idle" stages between each batch of 4 instructions getting executed, so it's _really_ good at fairly hard realtime. I'm told that the next generation is going to have a 4-stage pipeline (and thus they're clocking it down to 500 mhz, so it looks like a 4-way SMP running 125 mhz). The lower the clock speed the better the power consumption to performance ratio is, and what they _already_ had was beating the then-current ARM stuff in battery life at a given performance level.
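The clock arithmetic above, as a quick Python sketch (the function names are mine). The peak figure assumes every packet carries all 4 instructions, which is an upper bound: the "closer to 300mhz" number above is my rough estimate of what you actually get, presumably because packets aren't always full.

```python
def per_thread_mhz(core_mhz, hw_threads):
    """Clock each hardware thread sees when threads round-robin down the pipeline."""
    return core_mhz / hw_threads


def peak_issue_mhz(core_mhz, hw_threads, packet_width=4):
    """Upper bound on instructions per microsecond per thread, with full packets."""
    return per_thread_mhz(core_mhz, hw_threads) * packet_width


print(per_thread_mhz(600, 6))  # 100.0 (current 6-stage part)
print(per_thread_mhz(500, 4))  # 125.0 (the rumored 4-stage part)
print(peak_issue_mhz(600, 6))  # 400.0 (theoretical peak per thread)
```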

> When i use gdb from windows binaries i see this:
> 
>     /(hexagon-gdb) file DSP2.mbn/
> 
>     /Reading symbols from
>     D:\SHARED\qdsp6\Hexagon_Tools__4.0_windows\gnu\bin/DSP2.mbn...(no
>     debugging symbols found)...done./
> 
>     /(hexagon-gdb) run/
> 
>     /Starting program:
>     D:\SHARED\qdsp6\Hexagon_Tools__4.0_windows\gnu\bin/DSP2.mbn/
> 
>     /hexagonsim_exec_simulation: *Unable to execute hexagon-sim*!/
> 
>     /(hexagon-gdb)/

Sounds like they didn't hook it up all the way.

The snapdragon system-on-chip (which is what you find in the Nexus One and such) actually contains four processors:

1) An ARMv7 "Scorpion" processor qualcomm licensed from ARM and then optimized (at the Raleigh campus, they're protective of their turf, internal politics at qualcomm).

2) A QDSP6 "Hexagon" processor qualcomm developed internally (in Austin). In android this is used as a "multimedia coprocessor", but is actually a powerful 4-issue VLIW general purpose processor with a lot of vector instructions.

3) A QDSP4 (old DSP, does signal processing for the radio side of things). This was an ancestor of QDSP6 the way the 8080 was an ancestor of the Core 2 Duo: it ain't gonna run Linux.

4) An ancient ARMv5 that's the "boot processor". It runs code from flash at power-on, does the DRAM controller init, and then hands off control to either the Scorpion or the Hexagon. Having a different chip take control is a question of running different boot code on this ARMv5. (Afterwards in Android it goes off and does signal processing for the radio side of things just like the QDSP4.)

Yes, "snapdragon" and "scorpion" are confusingly similar. (To me, anyway.) Snapdragon = SOC with 4 processors one of which is a Hexagon, Scorpion = yet another ARM implementation.

For the Linux port we powered the other 3 processors down after the Hexagon came up. I really really really wanted a bootloader I could run as an Android app on my Nexus One to boot Linux on the hexagon in that (copy the uboot and kernel+initramfs blobs into memory, kick the ARMv5 boot processor to run that uboot, halt the ARM), but I could never get Richard Kuo or anybody to write one. (And it wouldn't have been useful unless I got at least a usb-to-serial adapter working to get me a serial console, which turned out to be nontrivial. All the Snapdragon peripheral drivers were in the Android tree, but under arch/arm. One of the big things the Linutronix guys were looking at was fishing them out and moving them up to the generic architecture stuff so Hexagon could use 'em. No idea if they ever actually did this.)

Last I checked nobody had written any code to let the Scorpion and Hexagon share main memory: who is accessing what without stomping each other was a thing, so for the Linux port they just powered the Scorpion down completely. (The Android hexagon binary blob can share memory to act as a "multimedia coprocessor", but it does its best to _look_ like a hardware coprocessor, and essentially has dedicated memory buffers handed off the way you hand off texture memory to a 3D chip or sound data to a sound card. All static mappings, I expect. As I said: I didn't really look at that. You could try to objdump -d it if you like. :)

There was occasional talk among the engineers of doing a Snapdragon revision without a Scorpion in it, which would save power and licensing money and die size and so on. But we were careful not to say this where anybody outside the team could hear it because we didn't want the Raleigh guys to start quoting the "From Hell's Heart I Stab At Thee" speech from Star Trek II. (Note: my management mostly shielded me from the politics, so I got this stuff second hand. My exposure to the politics was basically a long list of things I wasn't supposed to talk about outside the department, which meant close your office door because you never know who might be listening...)

We had enough trouble with the lawyers, who were TERRIFIED that this open source stuff was going to undermine their precious patent revenue (the most lucrative of which expire this year anyway, I think). From an engineering perspective "not doing android" simply wasn't an option, and we convinced senior management of this, but the Lawyers insisted that the Qualcomm guys who worked on open source had to be moved to a separate corporate shell ("Qualcomm Innovation Center", I.E. quicinc.com: new email addresses but everything else was the same, they didn't even annotate the names on our office doors with the distinction), and then a SECOND corporate shell was set up (the "code aurora foundation") which was a partnership between Qualcomm and Qualcomm with some random lip service from Intel or somebody, so we had a second layer of indirection to wash everything through protecting Qualcomm's patents from Linux. It was _ridiculous_. Oh, and the lawyers Doomed to Exile in Quic were the junior guys who drew the short straw and couldn't _avoid_ getting the GPL all over them. (They tried not to panic in front of us. Again, this was over a year ago and I have no idea if this recollection is accurate or just me projecting onto them based on incomplete understanding of things that were happily Not My Problem.)

> I also have tried to compile these sources with Cygwin. This was very
> hard, because makefile for gdb has errors with looking of some
> libraries.

I did everything under Linux. We were trying to port Gentoo to it, but hit some unnecessary complexity adding a new architecture to their profiles (undocumented stage 1 black magic, and the need to annotate every single package in the tree with every single architecture it supports, which is _crazy_), and wound up just doing Linux From Scratch plus a bunch of Beyond Linux From Scratch packages.

I'm unaware of anybody ever trying this stuff under Cygwin. I was using Ubuntu, the server was running Debian, and I think a couple developers were using Fedora.

> I have manually downloaded and compiled these libraries, set
> direct path for libraries in make file, successfully make all
> *gnutools*, but have another problem again:
> 
>     Administrator@PC6 ~/Hexagon/bin/gdb
>     $ ls DSP2*
>     DSP2.MBN
...
>     (hexagon-gdb) target hexagon-sim
>     (hexagon-gdb) run
>     Starting program: /home/Administrator/Hexagon/bin/gdb/DSP2.MBN
>     bailing from child.
>     execvp: No such file or directory
>     Switching to remote protocol
>     :15097: Connection timed out.

I know nothing about cygwin. I don't do windows. :)

When the basic Hexagon support got into Linux 3.1, did they set up a linux-hexagon mailing list? (I told them they really _should_, but I honestly haven't been keeping track since I left. Busy with other things...)

> Also i have seen https://www.codeaurora.org/patches/quic/hlk/ page, but
> there not exist even a gdb files.

Sigh.

Qualcomm outsourced the Linux upstreaming part to Thomas Gleixner's linutronix. (I may have strongly recommended them: pay one of Linus's lieutenants to do the first code review pass in private so you can fix up the code to kernel standards without embarrassment or bikeshedding). Thomas is tied up in various NDAs, but now that most of this has gone upstream he might be able to help you more than I can. I'm a year out of date on all this, and left my qualcomm proprietary stuff at work when I left. (I had Aboriginal Linux building a hexagon native development environment, and then building Linux From Scratch and a chunk of Beyond Linux From Scratch natively on a comet board under that. It booted X11 (remotely, the X server was on another machine because comet had a network card but no graphics hardware) and ran a half-dozen X apps: a terminal, xchess, xeyes, and so on. Left it all behind when my contract expired, I no longer have that code. Oh well.)

Hexagon is a _very_ interesting chip and I'd _hugely_ love to see it succeed, but as long as Qualcomm's lawyers are steering the company the engineering team has a giant ball and chain dragging it down. They do great stuff and you never hear about it, because "secret is good". Nobody talks about Qualcomm's secret shame, that they actually have a bunch of really smart guys working for them who do cool stuff. (I'm often frustrated at TI being closed and tangled, but it's a lot less trouble finding out their stuff _exists_...)

> I have found /*80-NB419-1_A_hexagon_v2_programmers_ref.pdf*/ document
> with hexagon command representation and looks like my ELF file
> (DSP2.mbn) has Hexagon-based architecture.

I do remember that "qdsp6-objdump -d" works in the toolchain they shipped. (Even the really old ones.)

Note: the qdsp6 name is because Qualcomm's 6th generation digital signal processor grew legs and became a general purpose CPU, _while_ still being good at all the DSP multimedia stuff. The hexagon rebranding is fairly recent (the name's due to the 6-stage pipeline and thus 6-way SMP, even though the newer ones are gonna be 4-way. :)

The big upgrades they did to the toolchain to support Linux were:

1) Add dynamic linking support,

2) Port uClibc and glibc to it,

3) Forward port from gcc 3.4 to something reasonably current.

If all you want is a binary blob to run on the chip, the old toolchain should be fine.

A note on the MMU: the chip hasn't actually got one. Instead it has a set of Translation Lookaside Buffer slots loaded by software. They made a binary blob that acts as an MMU, and their snapdragon port of u-boot (running in the armv5 boot processor, I think) loads this blob and hooks it up to the page fault interrupt so it acts as a software mmu (which their Linux port then depends on).

The lawyers have ADAMANTLY refused to release the code for that because they've got multiple patents in it that will _never_ stand up to scrutiny because software MMUs like this were commonplace back in the 1980's, but as long as you can't examine it you can't collapse the quantum state of the patents and thus they MIGHT be enforceable. (The 3D accelerator guys do something similar: everybody has patents that cover everybody else's chips, but as long as their drivers are closed enough you can't examine them and _show_ that they haven't got some clever way to program the chips that _doesn't_ violate your patents. The open source guys reverse engineering your chips and writing open drivers for them that violate a dozen patents is no problem because that doesn't show that the _closed_ drivers didn't have a workaround for that patent, and the patent-violating open source drivers are obviously programming the chip wrong. Plausible deniability! Also an insane waste of time so trolls can leech off the work of others to perpetuate concepts left over from Gutenberg's original hand-cranked printing press, as greedy rich people come up with yet another way to corner a market.)

Also note that the GPL doesn't bite unless you distribute binaries, meaning this code being GPLed translates to an awful lot of stuff that would be trivially easy to reverse engineer if they just shipped a binary never gets let out of the company because it requires an Enormous Legal Review Process to do so, and the political capital to push it through just ain't there. So "we got this to work, but you'll never see it, because it's GPL". (Yes my employment contract can say I can't release this binary outside the company. GPL is about copyright law, not contract law. The _license_ can't require it, but I can personally agree to an individual restriction before they give me the code. You want to argue that with Qualcomm's lawyers, be my guest. I'll be over here.)

The main problem with the Hexagon variant in the existing Snapdragon chips (um, QDSP6v2 I think) is that they don't have enough TLB slots. If you run the full 6-way SMP doing gcc compiles and such, it thrashes the hell out of the cache and slows itself way down. The performance "sweet spot" for that turned out to be around -j3 or -j4 in my testing. I don't think we found this out soon enough to fix QDSP6v3 (although since that's the 4-stage 500MHz variant, it puts less pressure on the TLB anyway so is more or less in the sweet spot of what they _do_ have). But QDSP6v4 (in development when I left) adds lots more TLB slots which should greatly improve performance under Linux. Assuming anybody ever uses it under Linux.

The existing multimedia codec binary blob that Android runs uses a lot of static buffers with hugepage mappings, so it doesn't hit the TLB slot limitation. It's all hand-crafted assembly, really. (I assume so, I never looked at it.)

> I have succesfully decompiled
> a little peace of code first with a paper and pencil :), then have
> decompiled elf sections with objdump, but it's very hard to reverse
> entire algo without simulator.

"objdump -d" - disassemble.

> Please, help me, if you can. Maybe, you have any another tools,
> or useful advice about these tools.

I'd love to see Hexagon succeed. I spent half a year working to get Linux on it. The engineers in Austin also greatly want to see it succeed, but they're at the mercy of A) Qualcomm's legal department, B) politics with the Raleigh guys behind the Scorpion yet-another-ARM-chip, (those guys are very proud of their yet-another-ARM-chip. I forget why...), C) quarterly budget renewals. (My contract ended because they couldn't get the funding to renew it without a 2-month gap, so I went off and did other things.)

The reason the lawyers are in charge at Qualcomm is due to an accounting trick: all the patent licensing revenue is credited to the legal department, but all the R&D costs of coming up with that technology in the first place are billed to engineering. So even though engineering makes way more gross revenue than licensing, it looks on paper like licensing brings in 3x the net revenue that engineering does. And thus senior management listens to the lawyers three times as much as it listens to the engineers. Sad really.

Rob


February 23, 2012

Over the years I've wasted a lot of time talking to people whose minds are already made up, and are merely trying to figure out how to let you have their way. People who don't seem to believe there _is_ a legitimate alternate viewpoint, and are just trying to find a way to explain to us how we're wrong that we'll accept. People who are not there to _be_ convinced of anything.

I'm always trying to figure out "how am I wrong this time". Sometimes it's because I haven't explained myself properly, but often it's because I'm taking the wrong approach, have the wrong goals... I am wrong a lot.

This is why I expect I was wasting my time once more commenting on the recent lwn article where a lawyer reports on his meeting with a conference organizer, and thus the busybox gpl stuff is all resolved now. (I guess if we could get a pizza delivery guy to talk to a random janitor somewhere in rural Oklahoma they could bring about arab-israeli peace? People who ship BusyBox still get sued because of it, Android's still excluding GPL code from their userspace... What's changed, exactly?)

I admit commenting on "the story is over" with "why was this ever news" is sort of counterproductive anyway, but this whole thread was irritating because the original blog post that set this whole mess off was based on more than one false premise:

  • Tim isn't behind busybox, I am. Implying Tim (or Sony) is some sort of puppetmaster is both untrue and insulting.

  • Busybox is way older than Android, so if Android's 4th release still isn't shipping BusyBox, it's unlikely to start. If some combination of GPLv3, the busybox lawsuits, and the FSF's insanity has rendered GPL in userspace unpalatable to android (which is Google's _official_policy_ here), and waiting around for 5 years didn't make them change their minds, and they're shipping a BILLION devices: there just _might_ be a viable niche which a new project can legitimately address.

  • I'm writing new code to obsolete my old code, and people who didn't contribute any code to _either_ project are freaking out about it. So what? Why is this news? It's pure FUD trying to bury the new project and hold out a little longer just in case 5 years isn't enough to confirm the Android guys are serious about not shipping busybox. (Even though there are already 3-4 other projects trying to fill this market vacuum.)

  • How would it be bad if Sony (or whomever) _was_ involved? "If you don't like it, write your own code" is exactly what toybox is doing, and it's exactly what's pissing these people off. That's not "standing for freedom", that's insisting on control over the actions of others. These guys are hypocrites.

  • Patent trolls and copyright trolls are roughly equivalent. Everybody just assumes the lawsuits were a good thing, nobody's questioning whether or not they might have been a net negative.

  • How is "let's switch away from the obvious lawsuit factory" any different whether it's BusyBox or SCO or Oracle you're switching away from? Deciding to _stop_ doing business with somebody is not synonymous with violating their IP, even if "we have patents on other stuff, we'll still sue!" Great. If you want to sue people over the kernel, then do so. If the kernel guys prefer _not_ making "Linux" synonymous with "Lawsuit", it could be because they're _very_smart_.

That last bit I find really annoying. I was the guy who initiated the BusyBox lawsuits. Me, personally. I set the process in motion, I recruited the other people into the action, without me they would not have happened. (Erik was busybox maintainer for years, notice how the lawsuits started _after_ I took over?)

I no longer believe those lawsuits were a good idea, for a number of reasons. (One of which was that the SFLC assured me they had distanced themselves from the FSF... then got back in bed with them to sue Cisco in 2008.) There are excellent conversations to be had around that, but it's apparently not a conversation anybody (other than me) wanted to have.

Instead you get a bunch of hysterical armchair admirals with no copyrights worth enforcing (because they haven't written any interesting code anybody feels the need to use), screaming about how will the poor defenseless kernel survive without busybox to protect it. The kernel literally has a THOUSAND times more contributors than busybox (which hasn't even got one full-time employee working on it: nope, not even Denys). Not only are they far more _capable_ of suing people if they chose to, but Linus still hasn't put his foot down on binary only modules.

The big success in busybox releasing code was the 2003 negotiations with Linksys that spawned OpenWRT, but that wasn't a lawsuit. GPLv2 gave them leverage, but it was all "walk softly and carry a big stick". Hitting lots of people with the stick broke the stick, and made people distance themselves from your stick. It was a _bad_thing_. (GPLv3 is a board with nails in: I'm not carrying that anywhere, nor going near it.)


February 22, 2012

Sigh. Ok, I know it's a public wiki, but dude.

So the entire first page of what was the toybox roadmap is now an advertisement for some BSD project, and you have to scroll down to see Toybox is even mentioned. They did no new requirements analysis other than strongly implying they're perfect for the job, despite BSD's ffs and berkeley packet filter being completely irrelevant to Android. They're "proven capable to replace busybox, in general", with 4 fewer commands than Toybox's last release. And now they're the only thing you see when you load that page unless you scroll down a lot, apparently attempting to leverage the work I did while simultaneously obscuring it.

I wonder if they know that gluing together a bunch of existing command implementations into a single binary is not where you _stop_? That the whole _point_ of the exercise is the extensive cleanup and refactoring to simplify and maximize code sharing?

Sigh. It's an open wiki, obviously they have the right to promote their own project there, and it's not my place to remove stuff other people wrote. But it's also obviously unsuitable to use that page as the toybox roadmap. Let's see...

Needs more cleanup.


February 21, 2012

Working on release notes for _another_ toybox release, because the amount of progress since the last release is kind of impressive. (Very little of it done by me...)


February 20, 2012

I am behind on everything...


February 12, 2012

The "inbox" on gmail accumulates a lot of bounce notifications for spam sent to Japan in my name for some reason, and every couple weeks I remember to go in and clear them out. (Email that's actually to my address gets moved to another folder, this is the bcc: stuff that's not to a recognized list, I.E. almost entirely spam.)

Thunderbird's stupidity continues to amaze me. This is an imap folder (not a local folder filters copy stuff to), so when I go in and delete messages it A) freezes for seconds at a time, B) manages to do about two deletes at once even when I hit five (about as many as I can see on the screen at once with their amazingly horrible space-wasting layout and the font size I use).

But those aren't the weird bit. The weird bit is: C) download 2767 message headers, over and over and over. The number isn't even going down, it just sits there, downloading the same message headers, over and over. It takes about 10 seconds each time, and then it does it again. I assume that since I queued up about 300 message deletions, it's going to do this pointless activity 300 times. At 10 seconds each, this is most of an hour of UTTERLY useless network thrashing.

That's a pretty good summary of thunderbird, really. Some weekend, I need to install a real mail program. (It's moving my email filters over I don't look forward to...)

Oh, it also loses track of folder locking. Even after it finishes with the network (such as me switching the network card _off_, perhaps by having suspended the laptop before its hour of pointless thrashing was through), it insists something is still using that folder and therefore messages I read don't switch from "unread" to "read" and I can't delete any of them.

The fix for that is to kill thunderbird and restart it. Really, the "and restart it" is negotiable, I need to install a real mail program. This one has SO many things wrong with it, you'd think nobody'd ever actually _used_ it before.

(Perhaps it's meant to only be used with fetchmail or something, and the imap functionality is vestigial? It certainly isn't _tested_...)

(Alas, the kill thunderbird and restart it also kills any half-finished replies you may have been composing, which I tend to have several of up at any given time. I keep forgetting this. Kmail would save them and pop them back open when it restarts. Thunderbird doesn't consider email worth reliable delivery or archiving; it treats it the way Twitter treats tweets. "A mere sideline to what we actually do, whatever that is.")


February 11, 2012

It's always hard to resist correcting people being wrong on the internet, especially when it's personal. (I point out that pressure put on Cisco in 2003 _without_ a lawsuit was far more effective than the lawsuit in 2008, and the reply is "You're wrong, they did sue, in 2008". No really. Apparently the 2008 lawsuit reached back in time several years to retroactively provide the basis for OpenWRT and such. Good to know.)

But I'm not feeding the trolls over there anymore. I've decided to shut up and show them the code. Have a toybox release.


February 10, 2012

If you wonder why I think the FSF zealots do more harm than good, the lwn.net threads on toybox are exhibit A. The loonier members there are literally taking the same position as SCO, that a program which replaces another program is automatically "infringing".

Whether it's just FUD or they actually believe it, the public statements are that you _cannot_ legally compete with any existing piece of software using a fresh implementation. "We like the old thing enough that the new thing must somehow be illegal." Under this theory, FireFox on Windows infringes Internet Explorer, and OpenOffice infringes Word, Cyanogenmod must be a "circumvention device" for phone company lockdowns, and so on.

These guys are "on the side" of Linux developers the way BMI and ASCAP are "on the side" of musicians. In the more mature content distribution industries like music and books and video, we have a wealth of evidence (link link link link link link (and link link) link link) that attempting to "protect" content on the internet tends to be hugely counterproductive.

Luckily: SCO lost. Not to the FSF: to _IBM_.

The point of copyleft was to turn copyright against itself. Getting comfortable with copyright enforcement suits to the point where you miss them when they're gone means YOU HAVE TURNED INTO WHAT YOU FIGHT. You're defining yourself by what you hate. That's _sad_.

Proprietary software didn't _exist_ prior to 1983, when the Apple vs Franklin decision extended copyright to apply to the binary ROM images Franklin's Apple II clone had copied verbatim. (Before that, copyright didn't apply to binaries, here's a 1980 audio interview with Bill Gates (transcript) about his efforts to lobby congress to change the law.) Once the law _did_ change, for-profit companies everywhere jumped on the new status quo, from IBM announcing from now on it would distribute Object Code Only (which was not well received), through AT&T closing up Unix, to the Xerox printer driver that drove Stallman to start the FSF.

The FSF was a conservative reactionary attempt to defend the status quo against changes in the industry, and wasn't even the only _unix_ based attempt to do so (BSD, Minix, Linux...) The GPL was an attempt to take the bull by the horns and steer. But that was 30 years ago: the disadvantages of proprietary software (abandonware, unfixable bugs, version skew, winner-take-all markets) became apparent to most players in the industry even _before_ the rise of the internet. In a lot of ways, proprietary software was an experiment that ran its course. Javascript isn't open because of the FSF, it's freely viewable because THAT'S WHAT WORKS. Do we really still need to be riding a bull?

When the DMCA passed, everybody thought it was a bad thing, now FSF advocates are trying to not just wield it but advocate aggressive interpretations of it. I'm sure if SOPA passed, a decade later these guys would do the same. That's going in the wrong direction.

The internet has fundamentally eroded the concept of copyright, as I wrote about a dozen years ago. Copyright arose with the printing press, which rendered irrelevant the church scribes' hand-written illuminated manuscripts. This broke their monopoly on literacy, and during the transition many people were put to _death_ for translating the bible into local dialects so people could read it themselves.

Now we've got the internet, which is the printing press all over again. If the printing press was as big a deal as electricity, the internet is room temperature superconductors powered by cold fusion. It changes everything _again_, and copyright _no_longer_makes_sense_ in the new context. The entrenched corporate interests are staging a new inquisition to stop the world from spinning, but all they can do is delay the inevitable.

Most people under the age of 50 take for granted that the RIAA and MPAA are ludicrously misguided dinosaurs struggling to slow their descent into the tarpits. But the people who want stronger GPL and more lawsuits, and think they NEED this in order to prosper? They are on THE WRONG SIDE of that exact argument.

In moving from GPL to BSD, I'm asking: do we really _need_ to live in gated communities? Is the outside world really so terrifying we must fence ourselves off from it? GPLv3 is a concrete bunker full of canned goods because the previous gated community wasn't _secure_ enough, and I don't want to live there. Same way I put a creative commons tag at the top of this page (copy what you like, it's polite to attribute it).

Because dude: link link link link link link, link link... Why are we still arguing about this?


February 9, 2012

Debugging is frustrating. I'm trying to track down the xargs segfault that happens when I build aboriginal with toybox in host-tools (all the defconfig commands overriding the busybox commands), ala:

cd ~/aboriginal/aboriginal
hg clone http://landley.net/hg/toybox
rm -rf build
mkdir -p build/packages
ln -sf ../../toybox build/packages/toybox
TOYBOX=toybox ./host-tools.sh
more/record-commands.sh
CPUS=1 CROSS_COMPILER_HOST=i686 TOYBOX=toybox \
  ./build.sh i586 2>&1 | tee out.txt

When I do this, I get:

[8426193.532368] xargs[4542]: segfault at 0 ip 00007f4e3f21487c sp 00007fff4cb7a238 error 4 in libc-2.12.2.so[7f4e3f107000+15d000]

So in another terminal I ran:

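# Poll the kernel log; as soon as a segfault shows up, kill the build
# so the tail of build/logs points at whatever was running.
while true
do
  sleep .2
  [ -n "$(dmesg | tail | grep -E '(segfault|protection)')" ] &&
    killall make configure
done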

And then looked at build/logs. This helped me track down the sort bug, but the xargs one _seems_ to be the xargs pipeline at the start of sources/sections/uClibc.build, and when I extracted that and ran it I _think_ I got a segfault once... but not on subsequent runs.

Heisenbugs! Always a pain...
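One way to chase an intermittent crash like this is to rerun the suspect command many times and stop at the first run that dies from a signal. The following is only a sketch, not what was actually done here: the helper name repeat_until_crash is made up for illustration, and it assumes the shell reports a signal death as an exit status above 128 (139 for SIGSEGV on Linux).

```shell
# Hypothetical helper (not from the original post): run a command up to N
# times and stop at the first run that is killed by a signal.
# Assumption: the shell reports signal deaths as exit status > 128.
repeat_until_crash() {
  tries=$1
  shift
  i=1
  while [ "$i" -le "$tries" ]
  do
    # Discard the command's own output; only the exit status matters here.
    "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -gt 128 ]
    then
      echo "run $i died with status $status"
      return 1
    fi
    i=$((i + 1))
  done
  echo "no crash in $tries runs"
}
```

With the xargs pipeline saved into a script (say a hypothetical extracted-pipeline.sh), something like "repeat_until_crash 1000 sh ./extracted-pipeline.sh" would either name the first run that crashed or report that none did.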


February 8, 2012

Finally found where archived messages live in the Thunderbird UI. It's not in any of the menus (pull-down or pop-up); that's a red herring.

There are left pointing triangle and right pointing triangle buttons between "All Folders" and "Quick Filter" in the UI. (Under the "Tabs? Why would it have tabs?" level, which are under the "green arrow pointing into a drawer", "inverted caret", "paper with a pencil", "some kind of book maybe", and "dogtag with another caret" icons, which are under the pulldown text menus. (Cluttered UI much?))

If you hit the _left_ triangle you get a different set of folders, one of which is "archive". Which contains, in a flat view, every message ever downloaded by the system, including the spam. Plus _extra_ copies of messages that have been accidentally thrown in there. You can distinguish the extra copies because they actually have bodies when you select them, as opposed to an "oh no, there wouldn't be hot water _today_" message on all the others.

Thunderbird: they put an amazing amount of EFFORT into sucking this badly.

I miss kmail. Too bad it was glued to a desktop that became unusable, and got sucked up into a katamari with calendaring and rss feed reading software I didn't want to use....


February 7, 2012

I really should thank the guy who blogged about how upset toybox made him. I have contributors now! I'm actually having a hard time keeping up with all the code review I need to do...

If I had a book, I'd encourage him to publicly burn a copy. Closest I can think of is that I'm responsible for maybe half the material in "The Art of Unix Programming", if that's of interest...


February 6, 2012

I really need to find a better mail program than thunderbird, because its developers are crazy.

The random 4 minute hangs while it decides to go contact the network and refuse to even repaint are annoying enough (even though I told it to NEVER periodically fetch mail, it does so at random intervals anyway. Usually giving it focus reminds it to do so, and thus it

And of course if it's interrupted (such as switching the network _off_ so whatever it's doing HAS TO FAIL so I eventually get control back), it drops empty messages in all my folders with no title and no body.

Contributing to this delay is the fact that A) google's gmail imap server is slow, B) it's got some insane O(N^2) algorithm on folder size, probably doing a quicksort on already sorted content or some other equally "we don't understand how this actually works and are using it in a naive manner" nonsense. Yes, I have 144,500 unread messages in linux-kernel. I hardly ever GO into that folder because I haven't got the time for this mail program to grind away trying to open it. But YOU SHOULD NOT RANDOMLY RESORT IT WHEN I'M NOT IN IT AND YOU'RE NOT FETCHING MAIL! (I can see the CPU spike and stay at 100% for minutes at a time. I've got a little bar graph that shows it to me. Stop it.)

There's a workaround for this: "file->offline->work offline". Then it only OCCASIONALLY randomly decides to talk to the network. Without that, thunderbird would be completely unusable.

But the thing that's _dangerous_ is the "archive message" option in the right click menu. If I right click on something, it will pop up a menu, usually after about a 3 second delay for thunderbird to be slow and bloated. UNLESS it's too far to the right on the screen, so that the cursor is over the menu when it finally gets around to processing the "release" part of the right click. Then it'll randomly select whatever menu item the cursor was over and execute it.

When it starts randomly popping up a reply window or something when I wanted "mark this as unread", that's easy enough to undo. But "archive" means "hide this message from me so I can never find it again". Where do archived messages _go_? I've googled, but nobody seems to know the answer. The only way I've ever found them again was to remember some snippet of text from the message and do a find | xargs | grep on the thunderbird data directory, where I can find the raw text of the message and copy it out by hand.

I rant about this because it just ate another message, and I've given up trying to find it again.
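The find | xargs | grep search mentioned above can be sketched roughly as follows. The function name find_snippet is invented for illustration, the exact command used is not given in this entry, and the -print0 / -0 pair is an assumption that find and xargs both support NUL-separated names (GNU and BSD versions do), which keeps folder names with spaces intact.

```shell
# Hypothetical helper (not from the original post): list every file under a
# directory that contains a remembered snippet of text.
# Assumption: find supports -print0 and xargs supports -0.
find_snippet() {
  dir=$1
  snippet=$2
  # -F treats the snippet as a fixed string, -l prints matching file names only.
  find "$dir" -type f -print0 | xargs -0 grep -l -F -e "$snippet"
}
```

Usage would be along the lines of "find_snippet ~/.thunderbird 'some phrase from the lost message'", where ~/.thunderbird is assumed to be the data directory; the matching mailbox file can then be opened by hand.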


February 5, 2012

Alas, did not get a toybox release out this weekend. Prepping for a release found too many fiddly little bugs I want to fix first.


February 3, 2012

Corporations are not people, corporations are machines.

A corporation is a machine the same way an aircraft carrier is a machine: many people must show up to work to operate the machine every day, and the machine can't do anything without them. An airplane and an airline are just different kinds of machines.

Treating a corporation as a person is no different than treating a car or a building as a person. "Don't blame me, it was the car that ran over that pedestrian. Don't blame me, it's the building's doors that were locked when it burned down with all those workers inside."

People drive the machine, and part of the machine is a uniform that people can put on to act on behalf of the machine. Some people who wear the uniform are "just following orders", and will do anything as part of their job.

But the people driving are the ones who rose to the top, often because they want to win at all costs and are willing to say or do anything to get what they want. "Of course I'm not HIV positive, baby, come to bed." Some people will say anything to close the deal, to get someone's money, foreclose on their homes, strip-mine national parks, all while eating whale sushi with a side of bald eagle... because they can.

We wind up with crap like "Citizens United" not just because the people steering these machines find them more useful if they're treated as people, shielding the driver from responsibility for who the car hits. It's that the people arguing for this honestly see no difference between people and machines.

You don't have to be a sociopath to become rich, but it does eliminate the temptation to give your money away to others less comfortable than you are, so there's a bit of selection pressure involved at the billionaire level. And being a sociopath means you can hire lawyers, who are just following orders, to argue that machines and people are exactly the same.


February 2, 2012

Crazy busy with work and the second half of 40th birthday celebrations tonight at the Drafthouse and friends having emergencies du jour... Probably won't get a toybox release out before the weekend.

(And of course uClibc 0.9.33 ships the day after I release Aboriginal 1.1.1. Yay, but now I have to go test it...)


February 1, 2012

Happy birthday to me, just turned 40. I am old.

Somebody blogged about how my Toybox project is obviously a plot by Sony, even though I've been doing it on and off since 2006, been publicly disgusted with GPLv3 just as long, and repeatedly blogged about how either Android or iPhone is going to replace the PC and I'd much rather it be Android. (For the record Sony hasn't paid me a dime for Toybox, although it would be nice if they did.)

I attempted to explain to him that he was simply _wrong_ in lots of comments on his blog, and on the lwn.net story on it (well, the first one, anyway), but by the time the h-online story came out about it, I decided to just shut up and show them the code. Working on cleanup for a release now.

I am sorry I lost my temper (again) when forcibly reminded that The Failure of Open Source still exists. I get annoyed when people who don't write open source software try to tell those of us who do how to go about it, and when it's all somebody does for a decade and change? Gets old.

Like me, apparently. Birthdays...


January 29, 2012

SUSv4 continues to be full of subtle assumptions and missing pieces. For example, with xargs the -L option works on "non-empty" lines but doesn't specify whether a line containing whitespace is "empty", or only zero length lines are. (In this case it mentions trailing whitespace indicates continuation, so I guess a line with only whitespace has to be empty due to the continuation rule. I presume trailing whitespace on the last line is not an error.)
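A small probe makes it easier to see how a given xargs actually groups lines under -L. This is a sketch only: show_groups is a made-up name, and it assumes the system xargs supports -L at all (GNU findutils does; the toybox xargs of this period is described elsewhere in this log as having only -n, -s and -0).

```shell
# Hypothetical probe (not from the original post): run one command per
# logical input line and bracket its arguments so the grouping is visible.
# Assumption: the installed xargs implements -L.
show_groups() {
  # The trailing "sh" becomes $0 of the inner shell; "$*" is the argument list.
  xargs -L 1 sh -c 'echo "[$*]"' sh
}
```

Feeding it "printf 'a\n   \nb\n' | show_groups" (a whitespace-only line) or "printf 'a \nb\n' | show_groups" (a trailing blank) shows what a particular implementation does with the cases the spec leaves vague. No expected output is given for those, since that is exactly the part that may vary.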


January 28, 2012

Finally got xargs checked in to toybox. It's only got the basic -ns0 options (yes, -0 is a basic option, had to be designed into the tokenizing), but at this point filling out the test suite is the hard part.

What bit me in the test suite? I wanted to use "ls -w" as part of one of the tests, which turns out to be a gnu/dammit extension, and thus doesn't hold weight. Like so much the Free Software Foundation does, ls -w turns out to be utterly useless because they didn't think it through.

When ls's output is a tty it detects the screen width and wraps lines, as required by SUSv4. The -w option is described in the man page as "assume screen width instead of current value", so I should be able to test ls -w against xargs -s and check that the wrapping decisions match, right?

Here's the stupid part: -w only works if you have a tty. If you don't have a tty, the -1 option (list one file per line) is implied. So if you go "ls -w 80 > filename" it does one entry per line just as it would if you hadn't specified -w in the first place. I.E. the most obvious use for -w is exactly where it DOESN'T WORK.
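The behaviour described above is easy to reproduce in a scratch directory. A minimal sketch, assuming GNU ls (on other systems -w may mean something else entirely); the function name count_lines_with_w is hypothetical:

```shell
# Hypothetical check (not from the original post): count the output lines of
# "ls -w WIDTH DIR" when stdout is a pipe rather than a tty.
# Assumption: GNU ls, where -w takes a column width.
count_lines_with_w() {
  ls -w "$1" "$2" | wc -l
}
```

For a directory holding three short file names, "count_lines_with_w 80 somedir" reports one line per entry even though all three names would fit within 80 columns, which is the one-per-line fallback the paragraph above complains about.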

The FSF still sucks at engineering, because it's not what they do. The FSF is somewhere between a religious organization without an invisible friend to venerate, and a lobbying group that never buys appointments with politicians. The common element of those is fundraising, and the Linux Foundation's got them beat there. (Both organizations sponsor a bit of engineering development, and presumably budget it as a marketing expense.)


January 27, 2012

I've been collapsing /bin and /usr/bin together since forever, and now it's a thing, so I linked to one of my old off-topic busybox posts about it in the LWN discussion comments, from where it got scooped up by Lennart Poettering, and from there became the top "story" on y-combinator, where I do not have an account so can't comment on the fascinating discussion of my offhanded historical blathering.

I usually have to go to great lengths to track down and mirror computer history stuff, always nice when it comes to me. Although if somebody's going to say I "got many other details wrong" it'd be nice to list them. Good to know that Sun specifically was to blame for /opt, I need to research Sun's "project Lulu" which is going to be hard given Robert Young's Lulu occluding the search term. I remember the bit in "Under the Radar" about it, and the Larry... (Wall? Page? the bitkeeper guy) treatise it links to. I should re-read them.

The backstory is that I started symlinking /bin /sbin and /lib to their /usr counterparts in the yellowbox days back at WebOffice (2001 or thereabouts) because it made read-only vs read-write tracking easier. (All the read-only root filesystem stuff was under /usr, all the writeable stuff was under /var, and everything else at the top level was a symlink or mount point for a non-block backed filesystem.) Keep in mind I'm the guy who wrote up the first initramfs documentation: the first clear /bin vs /usr/bin explanation I got was in the tutorial workbook from Atlanta Linux Showcase 1999 (which I've still got), but clearly initrd obsoleted that split even before initramfs existed. (Linux 0.0.1 had kernel/blk_drv/ramdisk.c already.)

But I also did it because computer history is a hobby of mine and I learned the backstory of _why_ /usr/bin and sbin and lib happened: Their first hard drive was only half a megabyte, they added a bigger but slower 2.5 megabyte RK05 disk pack on /usr for the home directories, but the root disk was so small it leaked into /usr, and when they got a third disk (another RK05 disk pack I think, I need to track down the references) they mounted it on /home and gave /usr over to the system. Keeping commonly used binaries in /bin and /sbin was because the first disk was _faster_.

All of this was an implementation detail of their original PDP-11 system circa 1972. It made perfect sense for Ken and Dennis to do, it _never_ applied to Linux running on PC hardware in any way. When I got taught it at ALS in 1999 I went "huh", because I am weird. (This is also why I'm so slow learning new tools: I question assumptions down to the bedrock on a regular basis, and am not COMFORTABLE unless I understand WHY we're doing stuff. Sometimes it's good, sometimes it's incredibly inconvenient. Oh well.)

Mirell's now informed me that I need to actually _write_ the computer history book I've been meaning to do forever, rather than just blogging about it. I have boxes of unscanned magazines, books to read, so many links to collate... I need to dig up the old proposed topic index I wrote up over 10 years ago. (Great thing about computer history: your old todo lists don't go stale. It's still history, now even more so.)


January 25, 2012

Why Linux on the Desktop will never happen, part eight thousand, four hundred, and seventy two:

Finally rebooted my netbook today (instead of just suspend/resume) because too many things had stopped working. The network fails in low memory situations with a panic in dmesg saying it can't allocate memory. (Why the network card is trying to _allocate_ memory for each packet instead of having some static buffers is one of those unanswerable questions.) The fix is "sudo insmod -r iwlagn && sleep 1 && sudo insmod iwlagn" repeated several times because for the first couple dmesg says a watchdog timer dies after 4 seconds of trying to bring the card up.

The sound died too, but that's something like 15 separate modules with a dependency hierarchy I've never managed to work out, so I can't just rmmod and insmod that. So I did without sound for a while... until the network rmmod/insmod trick stopped working last night. Now it was giving me the timeout message _twice_ on each cycle, something was horked, so... reboot.

On reboot, it prompts me to log in, and won't let me. I note that I selected "shutdown" from the menu rather than just holding the button down; I was _nice_ to the system, which was probably my mistake.

So I ctrl-alt-F1 over to a text console, log in there and dig up .xsession-errors, and it says:

/etc/gdm/Xsession: Beginning session setup...
Setting IM through im-switch for locale=en_US.
Start IM through /etc/X11/xinit/xinput.d/all_ALL linked to /etc/X11/xinit/xinput.d/default.
/usr/bin/startxfce4: X server already running on display :0
<stdin>:1:3: error: invalid preprocessing directive #Those
<stdin>:2:3: error: invalid preprocessing directive #or
<stdin>:3:3: error: invalid preprocessing directive #Xft
<stdin>:4:3: error: invalid preprocessing directive #Xft
xrdb:  "Xft.hinting" on line 13 overrides entry on line 6
xrdb:  "Xft.hintstyle" on line 14 overrides entry on line 7
xfce4-session: Unable to access file /home/landley/.ICEauthority: Permission denied
XIO:  fatal IO error 104 (Connection reset by peer) on X server ":0.0"
      after 38 requests (37 known processed) with 0 events remaining.

Obviously the solution is "sudo mv .ICEauthority .ICEauthority.bak", we can totally expect any random end-user to know how to do that. Just as they'd know about ctrl-alt-F1 (which the kernel developers keep threatening to remove).

I note that even I have no idea what .ICEauthority is _for_. It's bound to be more of that unnecessary selinux/dbus/hal style crap Linux has been growing. Lennart Poettering recently published part 12 of his "systemd for administrators" series on LWN. Anything that requires a 12 part series to explain should not be on my system.

If you wonder why I'm not particularly worried about Android obsoleting vanilla Linux: from a usability perspective they honestly can't make it much worse.


January 24, 2012

The Obama administration wasn't content to take away Habeas Corpus but recently removed the fifth amendment as well, which combined with the "oh yeah torture was fine" stuff means this is no longer funny.

This is why leaving war crimes unprosecuted was a bad idea: he didn't cauterize the wound, and so the infection continues to spread...


January 20, 2012

Banging on the long-delayed Aboriginal Linux release again, which is in part blocked by the fact that mips' network connection went away, so the emulated mips system thinks it can use distcc but fails when it tries (and then overloads because it hasn't got enough memory for a fully local -j 3 build without the OOM killer going off). The problem actually isn't the kernel upgrade, the problem turns out to be the QEMU upgrade. It worked in qemu 0.15.1, and didn't work in 1.0, and git bisect tracked that down to commit 5632ae46: "mips_malta: move i8259 initialization after piix4 initialization".

Not quite sure how to deal with that: "use qemu 1.0 for x86 because the emulator command name changed, but use 0.15.1 for mips because there's a blocking bug". Hmmm...


January 18, 2012

Darn it, musl is lgpl, and thus pointless. I was looking at it as a potential bionic replacement to complement toybox as a toolbox replacement, but it won't. Android allows no GPL in userspace.

I'm excited by the opportunity for toybox to replace toolbox on a billion machines and become the de-facto standard when the smartphone repeats the mini->micro transition and becomes the new default computing platform.

But I mothballed Toybox because it was pointless back when it was just fighting for a fraction of BusyBox's market share, which was fighting for a fraction of glibc's market share, which was fighting for a fraction of Windows' market share.

Musl is fighting for a fraction of uClibc's market share, which is fighting for a fraction of glibc's market share, which is fighting for a fraction of Windows' market share. Good luck with that, I have other things to do with my time.

Toolbox and Bionic are both weak-ass stubs Google did just enough of to run Java, and no more. They are crying OUT for replacement, and since I was one of the big reasons for BusyBox's success (not the only one, but I turned an embedded-only toy into a general purpose program), I'm in an excellent position to do a Toolbox replacement.

But that replacement won't be GPL. The Free Software Foundation has poisoned the GPL (I commented about that on lwn.net recently), which is why GPL use is declining faster than ever. Android's "no GPL in userspace" policy explicitly includes LGPL.

And that isn't just a Google thing, it's everybody building systems around Android too. (This includes my day job: previous versions of their product included stuff like BusyBox, but that stopped last year and none of the new stuff we're working on is allowed to. They're pretty typical.)

Back when Eric Raymond and I wrote the 64 bit transition paper, we focused on the wrong transition, a mistake I acknowledged last year while explaining some of the things I'd gotten wrong in that paper. Yes, the switch from 32 to 64 bit PCs was our chance to break into the PC desktop, and we blew it. (If incremental change between transitions was possible either OS/2 or the first 20 years of Linux would have at _least_ clawed their way above 2% market share.) But the PC desktop is going away in favor of the smartphone desktop, which is scaling up via tablets and USB docking stations to displace the PC the way the PC displaced minicomputer terminals in the 70's and 80's. That transition is a race between iPhone and Android, with vanilla Linux and the Gnu/dammit stuff so far back you can't even see it.

Comparing the PC to phone transition to the earlier minicomputer to PC transition, the smartphone has just emerged from the Commodore 64 vs Amiga vs Atari 800 scramble. The macintosh and PC of this generation have now emerged. Google is the IBM of this era, somewhat locked down and surrounded by a cloud of followers that innovate a bit but not so far they lose compatibility. If I can convince this generation's equivalents of Compaq and HP and Gateway to all do the same thing, then Google might take it up the same way Linus Torvalds finally merged squashfs when it became ubiquitous. (Basically, because his reluctance to do so had ceased to matter to anybody but him: it was universally used anyway.)

Sigh. It wasn't the Musl developers' fault I got excited without checking the details. It's a bit painful to see an enormous missed opportunity right next to a corresponding waste of time, effort, and talent, like watching someone dying of thirst crawl right past an oasis. But as they say, "you can lead a horse to water"...


January 17, 2012

Didn't get an Aboriginal release out this weekend, instead I wrote up documentation on toybox's argument parsing logic and finally did a first pass at collating the zillion toybox todo snippets into the main todo file.

Tried to play with musl (which seems to be trying to replace uClibc the way toybox is trying to replace busybox), but the git repository has been down all evening. (The web page is up, but the git repo isn't?) Oh well, I tried.

Honestly, Github exists, mirroring git is _trivial_ and he didn't _bother_. The website isn't down, just the git repo; I take that to mean the project's author doesn't think the repository is worth anybody else's time to pay attention to (or he would have made it possible for us to do so). And thus the project goes back down my todo list to the "cat flossing" levels. Oh well.


January 13, 2012

Resisted updating my notes.html symlink to point to the notes-2012.html file for a week and change because I'm writing an updated version of the python rss generator that'll also split the big file into individual files with prev/next links and an index. (To make linking to individual entries a lot easier.)

Alas, approaching 2 weeks into the new year and not having finished it... time to set the symlink anyway.

I also need to get an aboriginal linux release out. And update the toybox todo. Maybe this weekend.

Day jobs. They are time consuming.


January 11, 2012

The FSF deleted the last GPLv2 release of binutils (2.17) off their website, and replaced it with a binutils 2.17a. I downloaded that and diffed it against the real 2.17, and the first difference was:

--- binutils-2.17/cgen/cpu/fr30.cpu     1969-12-31 18:00:00.000000000 -0600
+++ binutils-2.17.new/cgen/cpu/fr30.cpu 2011-08-24 06:40:39.000000000 -0500
@@ -0,0 +1,1863 @@
+
+; -*- Scheme -*-
+; Copyright 2011 Free Software Foundation, Inc.
+;
+; Contributed by Red Hat Inc;
+;
+; This file is part of the GNU Binutils.
+;
+; This program is free software; you can redistribute it and/or modify
+; it under the terms of the GNU General Public License as published by
+; the Free Software Foundation; either version 3 of the License, or
+; (at your option) any later version.

The new file is GPLv3. Those bastards at the FSF deleted the last GPLv2 release of binutils off their website, replaced it with a GPLv3 version, and REDIRECTED THE OLD FILENAME TO POINT TO THE NEW FILE WHICH IS UNDER A DIFFERENT LICENSE.

Wow that's evil. (They keep using the word "freedom", but they seem to think it means we should do only what they want us to, be happy with what they give us without question, and act in obedience to their whims. I do not think it means what they think it means. We are apparently NOT free to consider GPLv3 a bad idea and want no part of it; they'll take away our old GPLv2 stuff and inflict the new license on us by stealth if necessary. GPLv3 hadn't been released when binutils 2.17 shipped, but apparently that's the license it's under now. At least if you get it from their website.)

Luckily Aboriginal Linux's sha1sum check caught it, and automatically fell back to my mirror location, which still has the real 2.17. I noticed all this while trying to figure out why that had happened.
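
The check-then-fall-back logic can be sketched roughly like this. This is not Aboriginal Linux's actual download script, just an illustration in Python; `fetch_verified` and the idea of passing in a list of fetch callables are assumptions made up for the example.

```python
import hashlib

def fetch_verified(sources, expected_sha1):
    """Try each source in order; return the first payload whose SHA-1 matches.

    `sources` is a list of zero-argument callables returning bytes
    (hypothetical stand-ins for "download from upstream" and
    "download from my mirror").
    """
    for fetch in sources:
        try:
            data = fetch()
        except OSError:
            continue  # source unreachable, try the next one
        if hashlib.sha1(data).hexdigest() == expected_sha1:
            return data
        # Hash mismatch: the file was changed or corrupted, so fall through.
    raise ValueError("no source produced a file matching the expected sha1")
```

The point of the design is that the expected hash lives with the build scripts, not with the server, so a silently swapped upstream tarball looks the same as a corrupted download and the mirror gets used instead.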


January 10, 2012

Sigh. Me and my big mouth.

The tl;dr version: somebody was an asshole on IRC and I responded in kind for about 15 seconds before /ignoring them, randomly rolled a critical hit on the parting shot, and thus got kicked out of a position in Funtoo development I'd never asked for in the first place. Oh well.

So hanging out on the #funtoo channel, one of the devs suddenly started going off on a random unprovoked rant about how "If welfare worked, I'd lose weight when thin people exercise", and so on.

This pissed me off, since my sister and her four kids have been on food stamps and living in heavily subsidized housing ever since her husband left her for the wife of some military guy deployed to Iraq. I've sent her tens of thousands of dollars over the years, in the past year alone I bought each niecephew a netbook, and flew all five of them to florida to see their great grandparents with us over christmas. My father and grandfather have also each sent her five figure amounts, but I haven't got the money to actually _support_ her, and if she moves she'd lose custody of her kids.

Another friend was recently homeless (she moved to Austin for a job right as the recession hit and the job wasn't here anymore when she got here because the _company_ went away, then she was tied to an apartment lease in a strange city but couldn't find a new job during her three months of saved living expenses, had her car repossessed shortly before she was evicted from her apartment, had her purse stolen her first week on the street with all her photo ID in it, and spent about nine months camped in a park before I found her). She's currently unemployed again (I helped her put her life back together enough to get a minimum wage clerical job but it only lasted a couple months), and now she's trying to survive a bad infection of both kidneys without health insurance (this is apparently why she hasn't been sleeping well since before christmas; as with all poor people she only went to the doctor when the problem wouldn't go away on its own for a long time, so it's pretty advanced by the time it's diagnosed). She's on 1500 milligrams of antibiotics a day, which I gave her most of the money to pay for (she borrowed the rest from a friend, another dancer she met dancing at a strip club; which was the only job she could get without photo ID and turns out to pay horribly when four different managers want to be "tipped out" to the tune of $50 each every night; she wound up _losing_ money half the time, especially since her skin's a bit sun-damaged from holding a sign on street corners, which might make maybe half minimum wage on a good day and literally nothing on a bad day).

The clinic she went to (the emergency room just tried to give her Vicodin and send her away) says she really _needs_ to be on intravenous antibiotics, but she can't afford them, and the next appointment to see if she can qualify for medicaid isn't until friday. Last I heard one of her kidneys had already shut down. I'm seriously worried she's going to die, of something entirely treatable, because she hasn't got health insurance.

This is not "some anecdote". Her name is Heather, her 26th birthday is a week from Friday. I was hoping to get her dental work for her birthday (living on the streets is really hard on the teeth, first place she went to said a dozen teeth can't be saved and need to come out), but after the other medical bills I can't afford it.

Another friend is dead broke and moved back in with her father because she can't find a job (her degree is in mortuary science, kinda specific), and can't really search for one from his suburban house without a car. (She previously lived with her mother in NYC, and thus never got a driver's license. She's terrified of having to move back in with her mother "because of the roaches".) Her father recently had his _second_ sudden massive abdominal surgery (the details are horrific, but he survived and is doing well enough to have the colostomy bag removed soon), so now she's taking care of him. He's been underemployed since the dot-com crash caused his business to fold, and back on the market in his 50's he hit the age discrimination our industry's full of, so he had to start social security early (hence is getting less of it), then he tried to take up the post-mortgage collapse "refinancing" plans that basically meant the bank told him to stop paying his mortgage until the process completed, and then tried to repossess his house when he complied. (Ongoing legal battle there is in something like its third year, involving every form of malfeasance on the part of the bank you can imagine, including up through the "robo-signers".)

Another friend is currently broke and unemployed after high school, due to a motorcycle accident that hospitalized her (her knee is still screwed up due to severed nerves) that prevented her from starting college on time, and thus cost her a scholarship. She moved from Arkansas to Louisiana recently (more or less couch surfing), and tried to check into a mental hospital (feeling suicidal) which wouldn't take her.

Another friend's been hospitalized a couple times for stress (related to his father winding up in jail basically for life) and a back injury (he's in his 20's but fell down the stairs in his apartment), but luckily he had health insurance. Except now he's unemployed and paying something like $1000/month for Cobra, which has eaten through his savings and is currently going on credit cards (along with his rent and other living expenses) while he job hunts. He's a fairly recent college graduate with only ~3 years experience on his resume, so even though he's probably smarter than me it's a lot harder for him to find a job without moving to a strange city where he has no friends.

This is not an exhaustive list. My ex-roommate Reese was homeless for a while, she got back on her feet after I let her "rent" a room from me for a year and only pay for one month of that. Back when I dabbled in real estate in the late 90's I rented a place to a guy named Tim who had been homeless for a while before I met him (dug himself out of it waiting tables, eventually saved up and bought a trailer in a trailer park). My brother tried his own dabbling in real estate and the mortgage crisis happened while he was trying to fix up and sell four properties (one of which he was living in), they all wound up getting repossessed, and he lost his job, so he moved back in with his father for a while.

So when this guy on the Funtoo list started randomly mouthing off about how safety net programs didn't help _him_ and were thus worthless, I told him to "die in a fire" before slash-ignoring him. This was my mistake.

A couple hours later Daniel Robbins informed me that the guy's father _did_ die in a fire (about five years ago), and the police suspect him of arson and raided his house last week, so it was the most hurtful possible thing I could have said to him (which I honestly didn't know), and I "couldn't represent the Funtoo project" anymore, which actually comes as something of a relief.

I never actually asked to be on the Funtoo core team, and warned him I probably wouldn't have time to do much with it, I was just working on a technology he found useful. I've wanted to get bootstrap-gentoo working for _years_ and still do, and it's not actually that _hard_ since he and I did about the first third of it in a single evening when he visited Austin last month. But the bits I don't know are in _gentoo_, and learning the guts of gentoo just isn't all that interesting to me, so it's been on my todo list for years and hasn't quite made it to the top yet.

The reason not being on this core team anymore comes as a relief is I no longer feel _guilty_ about not spending enough time on it. Funtoo wanted me to maintain wiki pages and regression test their "metro" image builder tool (which is another highly integrated thing like catalyst that's almost useless for the bootstrapping I want to do: it's simply not compatible with anything other than itself). These are all interesting todo items, but I have no time! (Heck, I never did set up a standalone funtoo system, and deleted my funtoo chroot while I was in Florida because my fiddling around had glitched it and my netbook was out of space anyway. Now I don't have to feel guilty about not finding time to set it back up.)

That said, yeah I shouldn't have said that to whoever that guy was, wouldn't have if I'd known the context, and would happily apologize to him. Turnabout is fair play and all, but I also had the option to be _better_ than him, and probably should have taken it. To do otherwise makes Wil Wheaton sad.

(I think the real lesson I'm taking away from all this is I shouldn't let it get to be afternoon without having eaten anything all day if there's any chance I'll have to interact with people.)


January 8, 2012

Other bugs in aboriginal linux:

1) Powerpc threading segfaults immediately on application launch. (Non-threaded stuff seems to work fine.)

2) Sparc threading hangs, and the g++ "hello world" build fails with an unknown link type.

3) If the network card config fails the CPUS count is still set to 3, but without distcc the builds don't get distributed, and boards with only 256 megs of ram can't do -j 3 builds locally without dying.

All of these were there in the 1.1.0 release (and in fact sparc dynamic linking didn't work at all) so they're not _regressions_ and thus don't hold up the release, but I should fix them next time around. (And the proper fix to #3 is really to fix the network cards.)

Still debugging the 3.2 kernel. Apparently everybody's network drivers went bye-bye and need a vendor symbol now (not just the intel stuff), and one of my long term todo items has been to switch more board emulations over to gigabit ethernet (which has less overhead and thus works faster than 100baseT or 10baseT emulations).

Unfortunately, enabling the intel E1000 driver on armv5l panics the kernel at some point after it loads. (Sometimes. Other times, it boots up but the interface still doesn't work, I don't think the device probe actually finds one even when I tell qemu to provide one using the same arguments that work on i686. Possibly the qemu code for that is target-specific, even though it's a PCI device.)


January 7, 2012

I got an ssh account on Daniel Robbins' fire-breathing 16-way server with 48 gigs of ram (actually an openvz container), and managed to wget the 3.2 kernel over there yesterday. Today I can't download it to my laptop from kernel.org, so I copied the one from that server.

Yes, the 3.2 kernel is out and even today kernel.org is melting under the strain of serving it. I'm told the local mirrors (eu.kernel.org and such) aren't back up yet. I never got a release out with 3.1 (upgraded the repository to that right after cutting 1.1.0 in October), so I waited for 3.2 to come out and now I'm trying to get 3.2 out promptly to make staying in sync with kernel releases a bit easier.

One of the perl removal patches needed to be rediffed, but that was just to remove fuzz; otherwise still works just fine. The sparc relocation fix for 3.1 is already upstream, so yank that.

In 3.2, the network card config symbols have been deeply screwed up for some reason: now you need to specify which vendor the cards belong to. If you want the E1000 driver, you have to add CONFIG_ETHERNET and CONFIG_NET_VENDOR_INTEL, for no apparent reason. I don't know why they did this. Ask commit dee1ad47f2ee7.
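
A quick way to spot a config that got bitten by this is to check for the new symbols. The sketch below is just an illustration: CONFIG_ETHERNET and CONFIG_NET_VENDOR_INTEL come from the text above, CONFIG_E1000 as the driver's own symbol is my assumption, and `missing_symbols` is a made-up helper, not part of any kernel tooling.

```python
REQUIRED = ["CONFIG_ETHERNET", "CONFIG_NET_VENDOR_INTEL", "CONFIG_E1000"]

def missing_symbols(config_text, required=REQUIRED):
    """Return the required symbols not set to y or m in a .config-style text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # "# CONFIG_FOO is not set" lines count as disabled
        name, _, value = line.partition("=")
        if value in ("y", "m"):
            enabled.add(name)
    return [sym for sym in required if sym not in enabled]
```

Feed it the contents of a pre-3.2 miniconfig that only lists the driver and it reports the two wrapper symbols that now have to be switched on as well.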


January 4, 2012

I'm mucking about with the early boot code of a Texas Instruments board at work, which means I'm reading through the guts of U-boot.

Bootable kernels tend to make an ELF image, and then run it through "objcopy -O binary elffile binfile" to create a fully linked runnable binary blob of raw machine language, with a fixed load address and start address, that can actually run unmodified on the hardware.

In Linux, this ELF file is called "vmlinux", and in U-boot it's "u-boot", and both are still around after the build if you want to play with them. (QEMU can cheat slightly: it contains an ELF loader that does what objcopy does, and in the process it can determine what load address and start address to use from the ELF metadata. So "qemu -kernel vmlinux" (or u-boot) works on some targets. One of my todo items is to make it work on all of them, the fiddliness is how to feed in the kernel command line.)

The start address generally corresponds to the symbol "_start" in the ELF file (it's actually a field in the ELF header, but the location of _start is the default that field gets set to by the ELF tools), which is some assembly function that does the really low level setup necessary before it can jump to C code. You have to set the processor into the right mode, make sure problematic hardware we haven't configured yet is switched off (disable interrupts and Memory Management Unit), and set the stack pointer to a chunk of memory so C has local variables and so the first C function can call other functions and return from them.
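
Since the start address is just a field in the ELF header, you can pull it out by hand. A minimal sketch, assuming a well-formed header; `elf_entry_point` is a name invented for this example, and the offsets follow the standard ELF header layout.

```python
import struct

def elf_entry_point(header):
    """Return e_entry from the first bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = header[4] == 2                  # EI_CLASS: 1 = 32 bit, 2 = 64 bit
    endian = "<" if header[5] == 1 else ">"    # EI_DATA: 1 = little, 2 = big
    # e_entry follows e_ident (16 bytes), e_type (2), e_machine (2), e_version (4).
    fmt = endian + ("Q" if is_64bit else "I")
    (entry,) = struct.unpack_from(fmt, header, 24)
    return entry

if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        print(hex(elf_entry_point(f.read(64))))
```

Run it against vmlinux or u-boot and the number should match the entry point that "readelf -h" reports.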

The first thing the C code generally does is set up the DRAM controller to start refreshing memory at the correct rate, so data written to DRAM actually stays there. Until the DRAM controller is set up (which is nontrivial, since the same hardware has to work with different brands/speeds/sizes of memory that get refreshed in different ways), all reads from DRAM return random garbage.

This raises a couple of interesting questions. The first is "How does u-boot run before the DRAM init happens?" By running out of something other than DRAM. Your board has to have some non-volatile memory such as ROM or Flash to provide a boot program when you switch it on, which has the disadvantage of being insanely slow compared to DRAM, but once the DRAM is initialized you can copy the rest of your code into DRAM and run from there.

The other question is: where do you get memory for the C stack? In the olden days, C's need for a stack led BIOS vendors to write the DRAM controller init routines in assembly language using no storage other than the CPU registers, which was horrible and is why BIOS development used to be black magic nobody understood. Then the "coreboot" project came up with the trick of repurposing the processor's data cache via a static TLB entry, so a chunk of address ranges were stored in cache instead of DRAM, providing enough stack space to run a DRAM controller setup function in C. (And there was much rejoicing.) You only need to do it the old way on processors with no L1 or L2 cache (which are obsolete these days because the latency of DRAM fetch from chips a couple inches away does not mix well with modern clock speeds: just sending a signal down that much wire is several clock cycles' round trip, let alone having the DRAM circuitry actually look stuff up).

In general, having more assembly code than necessary raises maintenance problems: it's extremely awkward to write and debug assembly compared to C, and every processor has its own slightly different assembly language which few people know all that well. You generally want to jump to C code as fast as possible, where the potential number of people who can review/maintain your code increases by an order of magnitude and you have many more tools at your disposal.

Alas, somebody needs to tell Texas Instruments this.

In this case, Texas Instruments went with an overcomplicated 3 layer approach, plus extra hardware. It includes a ROM in its system-on-chip and builds U-boot twice. The ROM loads the first instance of U-boot (called "MLO" for some reason) from an SD card into 256k of dedicated on-chip memory (which is not the same as the L2 cache, and is thus mostly wasted after boot time). The ROM has to have a device driver to talk to the SD card through an MMC bus and parse the FAT filesystem to find and read the MLO file. The MLO instance of U-boot has to have its own MMC+SD+FAT driver to load a bigger U-boot into DRAM after it's initialized the DRAM refresh circuitry. And then that U-boot has to have a third instance of the same controller in order to load Linux from the SD card.

Yes, really.

In the u-boot board I'm working on right now, _start is defined in the file "arch/arm/cpu/arm_cortexa8/start.S", and it does _not_ do the minimal work necessary to jump to C. Instead, Texas Instruments wrote extensive setup code in assembly language.

The _start code begins with a branch to the label "reset:", jumping past some memory used to store variables. The reset routine sets the cpu to SVC32 mode, and then has a large blob to "copy vectors to mask ROM indirect addr" inside an #ifdef CONFIG_OMAP34XX that I really hope doesn't apply to the board I'm using. (I doubt Wolfgang Denk would allow such a horror upstream into his code, most likely it's only present in the vendor fork I'm using.)

Next it optionally calls cpu_init_crit. This disables the processor cache and MMU (in case something left them on), and then calls lowlevel_init which is board-specific init (in this case in arch/arm/cpu/arm_cortexa8/ti81xx/lowlevel_init.S). What does "optionally" mean here? It's in an #ifndef CONFIG_SKIP_LOWLEVEL_INIT, but that doesn't seem to be set in our board's include/config.h or the files sourced from it.

Lots of plumbing in lowlevel_init.S involves NOR flash, which our board config doesn't seem to have enabled either. The lowlevel_init: label is at line 383 in the source I'm looking at, and minus the NOR bits it does two things: 1) Set the stack pointer to the end of the first block of physical memory (SRAM0_START+SRAM0_SIZE-4). 2) Try to figure out if we're running from DRAM already or not.


January 3, 2012

Bootable kernels tend to make an ELF image, and then run it through "objcopy -O binary elffile binfile" to create a fully linked runnable binary blob.

The rest of this blog entry just explains that statement.

The ELF file format is a container format (like zip/tar/cpio, or the one the "ar" tool uses for *.a static libraries). Executable files, shared libraries, and the *.o files created by compiling source code are all different kinds of ELF files.

Instead of a bunch of separate files, this format is optimized to contain chunks of program code. The archive's metadata describes what kind of code it contains (armv7 little endian 32 bit using the Thumb2 extensions...), and lists a bunch of named "symbols" describing the chunks of data in the archive. This includes "sections" of memory (with attributes like read-only or zero filled), plus the variables and functions that live in those sections. Each variable or function includes a size in bytes, the section name it lives in, the offset within that section it starts at, and so on. A bunch of tools can show this data, but "objdump -x" and "readelf -a" are the big ones.

The Linux kernel contains an ELF loader that can parse ELF files and load them into memory, sometimes delegating to a dynamic linker (another program listed in the ELF headers as responsible for dealing with this program).

But running hardware needs raw machine language, so at boot time you need a fully linked binary blob, a "load address" to copy it into memory at, and a "start address" to jump to. (Sometimes the start address is the same as the load address. Sometimes a jump to the real start address is inserted at the very start of the file.)

Bootable kernels tend to make an ELF image, and then run it through "objcopy -O binary elffile binfile" to create a fully linked runnable binary blob with a known load address and start address. In the case of Linux, the ELF file is called "vmlinux". In the case of u-boot, the ELF file is called "u-boot". They're still around at the end of the build, and you can play with 'em if you like. (If you ever hook a debugger up to a JTAG interface to debug a running kernel, loading the ELF file into the debugger gives it all the symbol information to provide symbol and function names instead of raw memory addresses for everything.)
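Conceptually, flattening an ELF image into a raw blob means laying each loadable chunk out at its address relative to the lowest one and padding the gaps. Here's a toy sketch of that idea. It is NOT how objcopy is implemented, the flatten() name and the (address, bytes) input format are mine, and it takes the chunks as a plain list rather than parsing a real ELF file.

```python
def flatten(chunks):
    """Lay (load_address, data) chunks out into one contiguous blob.

    Returns (base_address, blob), where base_address is the lowest
    load address and any gaps between chunks are zero filled.
    """
    base = min(addr for addr, _ in chunks)
    end = max(addr + len(data) for addr, data in chunks)
    blob = bytearray(end - base)  # starts out all zeroes
    for addr, data in chunks:
        offset = addr - base
        blob[offset:offset + len(data)] = data
    return base, bytes(blob)


if __name__ == "__main__":
    # Two made-up chunks with a two byte gap between them.
    base, blob = flatten([(0x80008000, b"\x01\x02"), (0x80008004, b"\xaa")])
    print(hex(base), blob.hex())
```

The base address that falls out of this is the "load address" the blob expects to be copied to, since the blob itself no longer records it anywhere.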

A "relocatable" program can be loaded and run from any memory location, but it either has to be REALLY carefully programmed (only using relative addresses that are a given distance forward or backward from the current location), or some sort of wrapper has to run on it to adjust all its addresses according to where it's been loaded this time around. The relocation code has to be one big function using entirely local variables: it can't access any globals or call any other functions because those would have to be relocated, which is a chicken and egg problem.
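Here's a minimal sketch of what "adjust all its addresses" amounts to, assuming the simplest possible scheme: the image carries a list of offsets where 32-bit little-endian absolute addresses are stored, and the wrapper adds the difference between where the image was linked to run and where it actually got loaded. The relocate() name and the fixup-list format are made up for illustration; real relocation formats have more record types than this.

```python
import struct


def relocate(image: bytes, fixups, link_base: int, load_base: int) -> bytes:
    """Patch each 32-bit absolute address listed in fixups.

    fixups is a list of byte offsets into image; every word at one of
    those offsets gets (load_base - link_base) added to it.
    """
    delta = load_base - link_base
    out = bytearray(image)
    for offset in fixups:
        (value,) = struct.unpack_from("<I", out, offset)
        struct.pack_into("<I", out, offset, (value + delta) & 0xFFFFFFFF)
    return bytes(out)


if __name__ == "__main__":
    # Word 0 is an instruction (left alone), word 1 is an absolute pointer.
    image = struct.pack("<II", 0xE1A00000, 0x80008010)
    moved = relocate(image, [4], link_base=0x80008000, load_base=0x90000000)
    print([hex(word) for word in struct.unpack("<II", moved)])
```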

Another common kind of wrapper decompresses the program, so the binary you actually load can be compressed and thus much smaller. Both kinds of wrappers run first, and then jump to the real starting location once they've done their work on the runnable image. Decompression is a form of relocation, but the wrapper is just copying the output to a known location so it doesn't actually have to patch all the jump and read/write instructions that use an address: those can still be fixed at build time.
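A toy model of the decompressing wrapper, using zlib and a bytearray standing in for physical memory. The pack() and boot() names are mine, and real kernels ship their own decompressor in the wrapper rather than calling a library, so treat this as a sketch of the data flow only.

```python
import zlib


def pack(image: bytes) -> bytes:
    """Build-time step: compress the runnable image."""
    return zlib.compress(image, 9)


def boot(payload: bytes, memory: bytearray, dest: int) -> int:
    """Wrapper step: unpack payload to a fixed address, return the jump target."""
    image = zlib.decompress(payload)
    if dest + len(image) > len(memory):
        raise ValueError("image does not fit in memory")
    memory[dest:dest + len(image)] = image
    # No address patching needed: the image was linked to run at dest.
    return dest


if __name__ == "__main__":
    memory = bytearray(64)
    image = b"\x90" * 16
    start = boot(pack(image), memory, 32)
    print(start, memory[32:48] == image)
```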


January 2, 2012

Sprint is too officially stupid to live. I think breaking the contract is probably worth it. I actually want them to go out of business now, because they've EARNED it.

This morning Sprint's horrible network got astonishingly bad. From my house, with three bars signal, I managed to load _one_ web page (no graphics) after half an hour of trying. Switching to airplane mode and back (force-reinitializing the radio) didn't help, so I rebooted, and when that didn't help I finally broke down and allowed my phone to do the "upgrade" it's been pestering me about since I got it.

This "upgrade" did exactly one thing: it disabled tethering in my phone. That's it. Tethering worked fine out of the box. Now the instant I switch on USB or Wireless tethering, the 3g icon goes away and the bars turn from green to gray. As soon as I switch it off, it comes back. 100% reliable. (I'm not saying the green 3g icon actually passes PACKETS, I'm just saying whether or not it's got a data connection associated with the tower.)

I called Sprint's tech support, and they said:

  • Their network has been having 3g outages since the 25th, still ongoing. This week of bad service is apparently not noteworthy.
  • They want to charge me an extra $30 a month to restore my ability to connect my laptop to my phone, which WORKED FINE OUT OF THE BOX until they "upgraded" it away. It's fine for me to repeatedly download DVD images onto my phone and delete them (it's got something like 12 gigabytes of storage, I can queue up a 4 gig image each day entirely in the phone with about 30 seconds effort on my part), so clearly this isn't a _bandwidth_ issue. This is them being greedy.

They already charge me $100/month for the phone. T-Mobile is advertising literally half that ($50/month) and their network is reasonably reliable, which Sprint's isn't.

This means T-Mobile is $70/month cheaper for better service, and breaking the contract with Sprint (one month into it) would only cost $350, so I'd come out ahead after 5 months.

I'm sorry, Sprint, but given the above you DESERVE to go out of business.


January 1, 2012

And a new year. Introspection time.

Still employed. I've been doing the "work half the year, do open source half the year" thing for ages, and now that I dunno when/if the Polycom job will end (and hope it won't, but the future's hard to predict), I am either totally falling behind on my open source stuff with a 9-5 job, or falling behind AT THE JOB if I do the "get up at 5am to have some open source programming time before work" thing. It works out ok if I get to bed by 9pm, but I wind up hanging out with Fade instead and going to work on less than 6 hours of sleep, which doesn't end well. My _ideal_ job would be half time, I probably wouldn't even mind sitting in a cubicle for that, but most employers just aren't set up that way.

Still in the tiny condo across the street from a fraternity. My trip to Florida cleared up my sinuses so tremendously that I'm reluctant to buy a bigger place in Austin: Kelly warned years ago that if I didn't have asthma or sinus problems, Austin would provide. And it has. If it's due to being a block downwind of Pease Park's Pollen, moving would help. If the condo is full of black mold or something (which I haven't seen any sign of but who knows), moving would help. But if it's because Austin is at the intersection of four different ecosystems (hills, desert, plains, ocean) and gets allergens from all of 'em, moving somewhere else in town wouldn't help.

Four cats is still too many cats. I love the kiggies but I don't want any more kiggies after these: the smell is incredible, they threw up on the couch like five times while we were out, I suspect half the reason my sinuses are so screwed up is incessant cat dander in a confined space, it's hard to travel even when we have time off because of cat care arrangements, and I can't work at home because they constantly pester me for attention (even when we HAVEN'T just been away). Fade and I started accumulating the current batch in 2003 so they're coming up on 8 years old, and going by how long my previous cats have lived this means we've got another decade or so of overwhelming cattitude.

Fade and I still want to have kids, but it hasn't happened and doesn't seem likely to. We looked into adopting, but it's a morass of regulation and hoops to jump through with a multi-year wait that seems designed to guarantee you get the kid just in time to send them off to college, or some such. Oh well. I suppose four cats pretty much ate the household's caring for small ones bandwidth anyway.

I have various unhappy friends and relatives, most of whom need jobs. I try to help out where I can but even with as much money as I'm making it's not enough to address half their needs. (It's nice to be able to help, but buying Heather an extra bottle of ibuprofen is no substitute for the couple thousand dollars worth of dental work she needs. Yes, real example.) I turn 40 later this year. I should probably be focusing on saving for retirement, since the republicrats will have destroyed social security and medicare by the time I get there. (The republicans are evil, the democrats are dishrags hoping that Zeno's paradox will protect everything they hold dear as they endlessly meet the unmoving opposition halfway. Not a good combination.)

I need to redo my blog's rss feed generator to create individual pages for each blog entry, with forward/back links, so actually linking to specific blog entries (or chaining together a few of them on a given topic) is a bit more feasible.

