Notes |
(0011484)
Vladimir
09-15-08 22:35
|
Looks like system resource starvation.
Try:
while true
do
ls -l
free
done
Is free memory decreasing? |
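For a cross-check that does not depend on the `free` applet or on the console staying responsive, free RAM can also be read directly with the Linux-specific sysinfo(2) call. This is a minimal sketch; `free_ram_kb()` is a hypothetical helper name, not part of BusyBox:

```c
#include <sys/sysinfo.h>

/* Hypothetical helper: return free RAM in KiB as reported by the kernel,
 * or -1 if sysinfo() fails. mem_unit scales the freeram field to bytes. */
static long free_ram_kb(void)
{
	struct sysinfo si;

	if (sysinfo(&si) != 0)
		return -1;
	return (long)((unsigned long long)si.freeram * si.mem_unit / 1024);
}
```

Calling it once per second and appending the value to a file on disk would show whether free memory trends downward before the hang, even if the serial console stops responding.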
|
(0011494)
vda
09-16-08 14:34
|
Regarding your "killall init" strace: it is definitely working correctly. It signaled init with TERM:
# /strace -fF killall init
execve("/usr/bin/killall", ["killall", "init"], [/* 12 vars */]) = 0
...
close(3) = 0
kill(1, SIGTERM) = 0
exit_group(0) = ?
So the reason init does not reboot is somewhere in the init code (or in the script run by init when it gets TERM). |
|
(0011504)
vda
09-16-08 14:38
|
While you run this:
while true; do ls -l; done
can you determine which telnetd process is responsible for this session, and run "strace -tt -o telnetd.log -p <telnetd's_pid>" until the hang happens? Attach the resulting log to this bug. (You may want to daemonize the strace process so that it does not require a separate telnet connection to stay open.) |
|
(0011514)
xmuchgw
09-16-08 18:51
edited on: 09-16-08 19:02
|
1) When I run the script as follows:
while true
do
ls -l
free
done
I found that free memory is not decreasing.
2) I think halt.c has sent the SIGTERM signal to the init process.
But the init process cannot receive the signal, because my debug messages in init.c are not printed.
3) HyperTerminal/minicom is connected to the development board's ttyS0. I run the script in HyperTerminal/minicom, not in a telnet terminal. When HyperTerminal/minicom hangs, I open a telnet terminal, run "strace -tt -o telnetd.log -p <telnetd's_pid> &" and then run reboot. Please look at the attached telnetd.log.
|
|
(0011614)
vda
09-18-08 15:31
|
> Because my debug messages in init.c are not printed.
Can you show what messages you added there? (Simply attach a patch to this bug.) |
|
(0011624)
vda
09-18-08 15:36
|
> 3) HyperTerminal/minicom is connected to the development board's ttyS0. I run the script in HyperTerminal/minicom, not in a telnet terminal.
Aha, so the session over the serial link hangs, not the telnet session. What program is used to initiate this session? getty or something else?
> When HyperTerminal/minicom hangs, I open a telnet terminal, run "strace -tt -o telnetd.log -p <telnetd's_pid> &" and then run reboot.
Then strace the serial session, not telnetd: "strace -tt -o serial.log -f -p <ash's pid> &". And start strace BEFORE you run the "while" loop, because you need to capture the moment it hangs.
> Please look at the attached telnetd.log.
Because you straced your own telnetd session, I just see "reboot" being run there. Not useful. |
|
(0011654)
xmuchgw
09-18-08 23:26
|
>Aha, so the session over the serial link hangs, not the telnet session. What program is used to initiate this session? getty or something else?
My serial driver is 8250.c, based on Linux kernel 2.6.23. My /etc/inittab is as follows:
# Note: BusyBox init works just fine without an inittab. If no inittab is
# found, it has the following default behavior:
::sysinit:/etc/init.d/rcS
::ctrlaltdel:/sbin/reboot
::shutdown:/sbin/swapoff -a
::shutdown:/bin/umount -a -r
::restart:/sbin/init
# Start an "askfirst" shell on the console (whatever that may be)
::askfirst:-/bin/sh
>Then strace the serial session, not telnetd: "strace -tt -o serial.log -f -p <ash's pid> &". And start strace BEFORE you run the "while" loop, because you need to capture the moment it hangs.
Because there is too much output, I only included the last messages of the serial.log file. Please look at the attached serial.log.
>Can you show what messages you added there? (Simply attach a patch to this bug.)
Please look at the init.c.patch. |
|
(0011694)
vda
09-19-08 16:22
|
> My serial driver is 8250.c, based on Linux kernel 2.6.23. My /etc/inittab is as follows:
> ::askfirst:-/bin/sh
So it's a bare shell. Hmm. Try "exec getty <baudrate> -" and log in from the resulting login prompt. Now run the test: does it still hang? (getty does some tty setup; maybe that would help.)
>> strace serial session
> Because there is too much output, I only included the last messages of the serial.log file. Please look at the attached serial.log.
It shows that the last write() syscall has blocked, as if the kernel thinks the buffer is full or something like that.
> Please look at the init.c.patch.
static void halt_reboot_pwoff(int sig)
{
const char *m;
int rb;
+ printf(__FILE__" [line: %d]\n", __LINE__);
+ message(L_CONSOLE | L_LOG, "[chgw] in function halt_reboot_pwoff!");
Not reliable if the serial line is already botched: it may block.
+#ifdef R_DBG
+ printf(__FILE__" [line: %d] enable R_DBG\n", __LINE__);
+ if((dbg_fd=open("debug.log", O_WRONLY |O_TRUNC | O_CREAT | O_SYNC)) < 0)
Good idea to write a file to disk.
You forgot the mode (3rd parameter): open(name, O_xxx, 0666).
BTW, in the strace I see that "debug.log" indeed exists. So the signal handler in init.c _is reached_.
+ if(5 != write(dbg_fd, "11111", 5))
+ {
+ printf("open write 11111 error\n");
+ }
Don't bother with error checks. Just use write(dbg_fd, "11111", 5); it's only debugging after all.
Add more such write()s to find out where it stops in init.c. |
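The file-marker approach suggested here can be wrapped in two tiny helpers so that each checkpoint in init.c is a one-liner. This is a sketch under stated assumptions: `dbg_open()`, `dbg_mark()` and the log path are hypothetical names, not BusyBox code, and errors are ignored on purpose, as suggested above. open() and write() are on the POSIX list of async-signal-safe functions, whereas printf() is not, which matters inside a signal handler such as halt_reboot_pwoff().

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical debug descriptor; -1 means "not opened yet". */
static int dbg_fd = -1;

/* Open (or create) the debug file once. O_SYNC makes each write() reach
 * the disk before returning, so markers survive a hang or power cycle.
 * The mode argument (0666) is required whenever O_CREAT is used. */
static void dbg_open(const char *path)
{
	if (dbg_fd < 0)
		dbg_fd = open(path, O_WRONLY | O_TRUNC | O_CREAT | O_SYNC, 0666);
}

/* Append a marker line. Results are deliberately not checked:
 * this is throwaway debugging code. */
static void dbg_mark(const char *msg)
{
	if (dbg_fd >= 0) {
		write(dbg_fd, msg, strlen(msg));
		write(dbg_fd, "\n", 1);
	}
}
```

Usage: call dbg_open("/debug.log") at the top of the handler, then dbg_mark("1"), dbg_mark("2"), ... between steps; after the hang, the last marker in the file shows the last step that completed.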
|
(0012334)
vda
09-28-08 16:19
|
xmuchgw, any news? |
|
(0012514)
xmuchgw
09-29-08 19:20
|
I don't have any progress.
I think it's a problem in my serial driver, but I don't have any evidence to prove that yet. I am investigating it right now.
I will let you know if I make any progress.
Thanks a lot for your support. |
|
(0013684)
xmuchgw
10-16-08 02:44
|
I cannot fix this bug.
I am sorry, I am giving up on it. |
|
(0013694)
vda
10-16-08 03:13
|
You were doing just fine in comment "09-18-08 23:26"; I thought you would continue the same procedure: inserting write()s to a debug file in the init code. It is a relatively simple way to find where it is stuck. And if you find that place, it's possible you will just need to remove the code there which tries to write to stdout/stderr and blocks on serial output.
That might fix the bug.
If not, at least you will be able to show where exactly it hangs (blocks). |
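If the hang does turn out to be a console write blocking on a stuck serial line, one alternative to deleting the message (not something proposed in this thread) is to attempt the write without blocking and drop the message when the output buffer is full. A minimal sketch, assuming the message goes to an already-open descriptor such as stdout; `try_write_nonblock()` is a hypothetical helper. Note that O_NONBLOCK belongs to the open file description, so while it is set it is also visible to other processes sharing that descriptor (for example the shell on the same tty).

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to write msg to fd without ever blocking.
 * Returns 1 if the whole message was written, 0 otherwise (for example
 * when the tty or pipe buffer is full, or only part was written). */
static int try_write_nonblock(int fd, const char *msg)
{
	size_t len = strlen(msg);
	int flags = fcntl(fd, F_GETFL);
	ssize_t n;

	if (flags < 0)
		return 0;
	/* Temporarily switch the descriptor to non-blocking mode. */
	if (!(flags & O_NONBLOCK))
		fcntl(fd, F_SETFL, flags | O_NONBLOCK);
	n = write(fd, msg, len);
	/* Restore the original flags so normal users of fd are unaffected. */
	if (!(flags & O_NONBLOCK))
		fcntl(fd, F_SETFL, flags);
	return n == (ssize_t)len;
}
```

A pipe makes a convenient stand-in for a wedged tty when trying this out: once the pipe is full, the helper returns 0 immediately instead of hanging.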
|