ID | Category | Severity | Reproducibility | Date Submitted | Last Update
0001013 | [uClibc] Posix Threads | major | always | 08-31-06 09:04 | 10-11-08 03:07
Reporter | jalaber | View Status | public
Assigned To | khem
Priority | normal | Resolution | open
Status | assigned | Product Version |
Summary | 0001013: pthread_cancel/pthread_join sequence hangs when using select in another thread
Description |
Hello,

I have found a very strange bug in uClibc involving pthread_cancel/pthread_join. My test program launches one thread, which basically makes a select call with a struct timeval set to 600 ms. The main thread then calls pthread_cancel and pthread_join, followed by a printf. The program hangs. However, if you remove the printf call, the program terminates normally.

I tried replacing the select call with a sem_wait call, and everything works fine with or without the printf. So the problem seems to happen only with select.

I use buildroot with kernel 2.4.28 and uClibc 0.9.28. I have attached the program to reproduce the problem. If you comment out the printf("join OK\n"), it works for me.

Thank you for your time and help,
Philippe. |
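The attachment itself is not reproduced on this page, so the following is only a rough sketch of the kind of program described above, not the reporter's actual file. The names select_worker and run_cancel_join are made up for illustration, and the endless loop around select() and the one-second delay before cancelling are assumptions. On a libc where select() is a cancellation point, the function is expected to return 0 after printing "join OK".

```c
#include <pthread.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical worker: waits in select() with a 600 ms timeout, forever.
 * The loop is an assumption; the report only mentions the select call. */
static void *select_worker(void *arg)
{
    (void)arg;
    for (;;) {
        struct timeval tv;
        tv.tv_sec = 0;
        tv.tv_usec = 600 * 1000;              /* 600 ms */
        select(0, NULL, NULL, NULL, &tv);     /* should act as a cancellation point */
    }
    return NULL; /* not reached */
}

/* Hypothetical driver: cancel the worker, join it, then print.
 * Returns 0 if the thread was joined as cancelled, -1 otherwise. */
int run_cancel_join(void)
{
    pthread_t th;
    void *res = NULL;

    if (pthread_create(&th, NULL, select_worker, NULL) != 0)
        return -1;
    sleep(1);                                 /* let the worker reach select() */
    if (pthread_cancel(th) != 0)
        return -1;
    if (pthread_join(th, &res) != 0)          /* the reported hang would show up here */
        return -1;
    printf("join OK\n");
    return res == PTHREAD_CANCELED ? 0 : -1;
}
```

Calling run_cancel_join() from main() and linking with -pthread should be enough to try it against a given libc.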
(0001715) dwagner 10-23-06 11:43 |
I think this issue is the reason the LIRC driver of directfb RC1 does not terminate. The driver uses select() and pthread_cancel()/pthread_join(). Please fix that. |
(0001744) vapier 11-16-06 23:07 |
here's a tip ... saying things like "Please fix that." makes people think "Fix it your goddamn self." the hang may be because of the IO mutex being held by the canceled thread ... if you turn on PDEBUG in libpthread/linuxthreads/debug.h, that may give you helpful output |
(0002526) chombourger 06-27-07 12:58 |
I tried to reproduce this issue on uclibc 0.9.29 running on a PC with a 2.6 linux kernel, and your program is still running as I am typing these lines! Is that what you are getting? Running the same program on the host (compiled and linked against glibc) worked. I will now try to enable the debug traces to see if that helps. |
(0002527) chombourger 06-27-07 13:15 |
traces with debug enabled in linuxthreads.old:

26294 : __pthread_initialize_manager: manager stack: size=8160, bos=0x804a150, tos=0x804c130
26294 : __pthread_initialize_manager: send REQ_DEBUG to manager thread
26294 : pthread_create: write REQ_CREATE to manager thread
26294 : pthread_create: before suspend(self)
26295 : __pthread_manager: before poll
26295 : __pthread_manager: after poll
26295 : __pthread_manager: before __libc_read
26295 : __pthread_manager: after __libc_read, n=148
26295 : __pthread_manager: got REQ_CREATE
26295 : pthread_handle_create: cloning new_thread = 0xbf1ffe20
26295 : pthread_handle_create: new thread pid = 26296
26295 : __pthread_manager: restarting -1208466944
26294 : pthread_create: after suspend(self)
26295 : __pthread_manager: before poll
26296 : pthread_start_thread: step 0 step 1 step 2
26295 : __pthread_manager: after poll
26295 : __pthread_manager: before poll step 3 cancel th...
26294 : pthread_cancel: sending cancel signal to 26296
26294 : pthread_cancel: kill returned 0 |
(0002528) chombourger 06-27-07 13:26 edited on: 06-30-07 14:22 |
It seems that the created thread has no jmpbuf when pthread_handle_sigcancel() is called in it, so the signal handler just returns and the thread is not rerouted. select() does not behave like a cancellation point (although it should). Could it be because select() is simply a syscall5, so we never reach the sigwait() function of the pthread library?

If I modify the select() call as follows, the thread is indeed canceled:

    r = select(0, NULL, NULL, NULL, &tv);
    if ((r == -1) && (errno == EINTR))
        pthread_testcancel();

I eventually found where linuxthreads.old defines cancellable system calls, wrapsyscall.c, and added an entry for select(2).

Note: select() was previously listed as a cancellation point, but it was removed by Ulrich Drepper and I don't know why:

    CVSROOT: /cvs/glibc
    Module name: libc
    Changes by: drepper@sourceware.org 2002-12-15 13:43:25
    Modified files: linuxthreads : wrapsyscall.c
    Log message: Remove creat, poll, pselect, readv, select, sigpause, sigsuspend, sigwaitinfo, waitid, and writev wrappers.

I have attached to this report a patch re-introducing select(2). |
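For anyone who wants to try the application-side workaround without patching the library, here is a minimal sketch of it wrapped in a helper. The name cancellable_sleep_ms is invented for this example and is not part of uClibc or the attached patch. It assumes select() is used purely as a timer (no file descriptors) and that the cancel signal makes select() fail with EINTR, as observed above.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: wait up to `ms` milliseconds in select().
 * If the call is interrupted by a signal, give a pending cancellation
 * request a chance to take effect before returning to the caller. */
int cancellable_sleep_ms(long ms)
{
    struct timeval tv;
    int r;

    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;

    r = select(0, NULL, NULL, NULL, &tv);
    if (r == -1 && errno == EINTR) {
        pthread_testcancel();   /* does not return if a cancel is pending */
        errno = EINTR;          /* keep errno meaningful for the caller */
    }
    return r;                   /* 0 on timeout, -1 on error */
}
```

A worker thread would call cancellable_sleep_ms(600) in place of the bare select(). This only papers over the problem in one application; it does not make select() itself a cancellation point.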
(0002712) hmoffatt 09-04-07 21:25 |
I have an application which is hanging with uClibc 0.9.29. The main process is regularly calling fork() and exec(). There is also a thread which is doing a select() with a 1ms timeout in an endless loop. The application sometimes hangs just after the fork/exec; the exec has happened (there is a zombie process left around). I tried the patch in this bug report; now the program segfaults instead of hanging. So I don't think the patch is the correct solution. |
(0002713) chombourger 09-05-07 00:30 |
Three questions: (a) have you tried running this program on a glibc system and does it work? (b) is your app. making any use of pthread_cancel() and pthread_join()? (c) can you provide us a stripped down version of your application so that we can reproduce the bug/segfault with it? |
(0002714) hmoffatt 09-05-07 02:58 |
My application is in Python. At the time my hang occurs I am not using pthread_join or cancel; a finite set of threads have been created some time earlier and should continue to exist for the life of the program. Hence the original problem in this report doesn't describe my situation. Nonetheless the patch had some impact which suggests it may not be right.

My original process (not one of the threads) is regularly calling fork() and exec() (via python wrappers). It appears that the process hangs somewhere after exec(), before returning to my interpreted program. strace shows that the program did get SIGCHLD as the last thing that happened, meaning that the child has exited. Then it seems to be sleeping waiting for something to happen.

There is another thread which is calling select() with a 1ms sleep indefinitely. I think it hangs also though I will have to retest to be sure.

The thread manager thread seems to be still running ok, calling poll() with a 1 second timeout. strace shows it is still running.

When I put in the patch from this report, the select() thread dies with SIGSEGV.

I am trying to build glibc for the embedded system now. I will also try to run it on my desktop with glibc and reproduce it. |
(0002721) hmoffatt 09-05-07 21:58 |
I have now tested this with glibc 2.3.6 with linuxthreads (not NPTL). It segfaults just the same as with your patch for uClibc. I guess that could be considered a positive sign.. probably meaning that my bug is somewhere else. |
(0002722) chombourger 09-06-07 01:14 |
OK, interesting point. Do you have a backtrace from when it crashes? I wonder whether the segfault occurs within the libc. |
(0002723) hmoffatt 09-06-07 07:45 |
My cross-gdb insists on trying to load the host libraries rather than the target ones so I can't get a meaningful back trace even though I have the build and a core file. :( Looks like it is reading the absolute paths from the core file; I really need to be able to prepend a path to those when loading into the cross-gdb. Is that possible? I have had even less success doing a live cross-gdb against gdbserver. |
(0002724) chombourger 09-06-07 07:51 |
You could use the gdb setting 'set solib-absolute-prefix PATH' to tell gdb the base prefix of your target file-system. That used to work for me. Hope this helps! |
(0002725) hmoffatt 09-06-07 16:27 |
Great, solib-absolute-prefix was exactly what I needed. I got the following when tracing against glibc; I don't have a build with the latest uclibc ready to test at the moment.

Core was generated by `/usr/bin/python /opt/calyptech/lib/webserver/server.py'.
Program terminated with signal 11, Segmentation fault.
#0 0x4021b8ec in sem_wait () from /home/hamish/work/robots/glibc-romfs/lib/libpthread.so.0
(gdb)
(gdb) bt
#0 0x4021b8ec in sem_wait () from /home/hamish/work/robots/glibc-romfs/lib/libpthread.so.0
#1 0xbe5ff54c in ?? ()

gdb is insisting that my core file does not match the python binary though so it may be confused. I'm sure that it does (I copied the binary back out of the embedded system). I'm not sure if this is related to threads or not. |
(0009924) thomask 07-23-08 06:32 |
Is there any update on this bug? Looks like it's still there. |