Wednesday, June 12, 2013
I have a daemon running on CentOS 4.3. An operator told me that the server it lives on had to be moved to another rack, so I decided to upgrade the OS to CentOS 6.3 with gcc-4.4.6 as part of the migration. I moved the daemon and recompiled it, but it stopped after several minutes.
The daemon spawns many threads, and each thread has a socket connected via boost::asio. Each thread passes its connected socket to libssh2_session_handshake() to set up an ssh connection. libssh2 in turn uses the libgcrypt library to generate 16 bytes of random numbers. However, gdb didn't show the call stack once execution went into the library routines (it showed function names like ???()). To find out where it stopped, I downloaded each library (libssh2, libgcrypt, libgpg-error) and compiled them again myself.
Gdb finally showed that libgcrypt tries to acquire a mutex lock to access the random pool when generating random numbers, and that it stops because it fails to get the lock and hits an assert. The same code works fine in a single-threaded environment. This isn't a typical way to use the libssh2 library, and I found no helpful answer on Google... TT.
After more googling and binging, I found some interesting information. Actually, both libssh2 and libgcrypt are designed to work in a multithreaded environment. Nothing needs to be done for libssh2 to run in multithreaded code, but something must be done for libgcrypt. Because it doesn't know which threading mechanism the host application uses, it provides two sets of mutex structures and functions, each wrapped in a macro: one for pthread and the other for pth. So the application code I'm managing has to select which one to use by adding the corresponding macro line to its source before it spawns any threads (I selected pthread).
It also has to call gcry_control() so that the library assigns those callbacks to the mutex-related functions it uses.
Several days have passed, and the server has been working well so far. Hooray! :)