Hobbyist tech writers: need two skillsets. Ability to program, ability to write. (I program for a hobby, and I write for a hobby; in college I got a computer science major and an English minor.) Hobbyist _editors_...

Merging documentation into the kernel tarball doesn't magically keep it up to date, because documentation is not the same as source code. Documentation doesn't compile, it never breaks the build, it's not checked by sparse. Keyword search and replace can trigger on it, but a Chinese translation?

Two goals get confused: "so people can find it" and "so it's kept up to date".

Talk points:

1) Stop thinking of documentation in the kernel tarball as special.
  - Internally fragmented: Documentation/, menuconfig, htmldocs (RFC, README, make help).
  - They serve different purposes, and have different structural constraints, so none of them merge easily.
  - Even man-pages is a separate package, living on kernel.org.
  - The internet has developer web pages, wikis, blog entries, magazine articles, online books, conference papers, audio and video recordings of talks, standards documents, mailing list archives, git commit messages, and more.
  - I am here to disabuse you of the notion that Documentation/* is special.
  - If you take nothing else away from this talk but that, I'll be happy.
  - I'll return to this topic at the end.
  - So let's look at documentation on the internet.

2) Problems with documentation on the internet:
  - Can't find it. (May or may not exist.)
  - Poor quality: out of date/inaccurate.
  - Need to find the best existing documentation and identify the holes.
  - Needs an index!

3) Sturgeon's law: 90% of everything is crap.
  - When fighting off Sturgeon's law, throwing away 90% of your input is normal.
  - Publishing has "the slush pile". Unsolicited submissions are of HORRIBLE average quality. This is a universal constant.
  - Editorial approach: freelance writers submit to a section editor, who submits to the editor in chief, who assembles and puts out the next issue.
    At each level, articles may be rejected, bounced back with comments, merged, polished, or even accepted as-is. But more is rejected than accepted.
  - A response telling you to change stuff is PRAISE. Even a personal rejection means the editor saw potential. Takes a while to learn this.
  - The documentation problem is fundamentally an editorial problem. There's tons of crap out there, need to find it, filter it, stitch it together in some kind of coherent order.
  - Open source projects use the same editorial approach. In Linux, the normal submission chain is developer, maintainer, lieutenant, Linus.
  - Layers were added for scalability reasons. Quote from Alan Cox: "A maintainer's job is to say no."
  - Distros (Ubuntu, Red Hat, etc) are also editors, at the next level up. They stitch together a release from packages the way a newspaper editor stitches together the next issue from the AP newswire and a little independent reporting.
  - The documentation problem is also an editorial problem. There's tons of crap out there, need to find it, filter it, stitch it together in some kind of coherent order.
  - Editors are special people. Project maintainers. They must have taste, but must be willing to hold their nose and wade through the slush pile. Must have enough knowledge of the subject to know a good article when they see it, and be able to tighten it up. It's a different skill from writing (code/prose), and it takes time _away_ from writing.
  - Veto power is the main power an editor has, especially in open source.
  - Even if they can afford to commission specific articles/patches along the lines they want, there's no guarantee the result will need less editing than the freelance submissions to get it into shape to merge.
  - They can't just say "I don't like these submissions, I'll write it myself" very often, because that doesn't scale.
  - Stephen King's editor is not a better writer than Stephen King. It's an adjacent but different skill set.
    The editor must be a decent writer/programmer, but isn't necessarily the best.
  - Alternatives to the editorial approach.
    - Online democratic ranking. Doesn't work.
    - The "wisdom of crowds" tends to select for LOLcat pictures. Voting says "yes". Saying "no" is not the same as failing to say "yes". Slashdot-style comments can't give guidance for how to change a patch to get it in, because Sturgeon's law applies to that guidance!
    - Digg can help filter, but it's not a replacement for an editor.

2) Realize that documentation sources are dynamic.
  - Mailing lists are informative but ephemeral. If you didn't read it at the time, you're unlikely to stumble across just the right thread from 2003 in the archives.
  - Magazines have new issues regularly.
  - Some commit messages (or 0/23 patchbomb posts) should be memorialized.

3) The big missing piece is indexing.
  - Google doesn't tell you what questions to ask.

4) Writing new documentation can be a form of indexing. You could always learn what you needed by reading source code, reading list archives, asking questions, trying experiments. You _have_ to learn it the hard way before you can write about it. The problem with that approach is that answering a simple question can take weeks. The information is out there; writing documentation saves others the time and effort of finding it themselves. But if there's other documentation out there, are you adding to the noise or reducing it?

5) Indexing it: Giant bookmarks list of answers to questions you didn't know to ask. Sorting it into order so the answers make sense.

6) Keeping it up to date: Memorializing old answers so you can still find 'em later. Is this answer still correct? This is a secondary issue. Wrong information will inspire feedback. "Don't ask questions, post errors." http://linux-man-pages.blogspot.com

Softnet in 2.3 multithreaded the network stack and let us do interrupt mitigation (fallback to polling).

Collating indexes!
Matthew Wilcox's keynote collected together things he's worked on. Basics (red/black trees, blocking issues) vs current implementation details vs API.