- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- None
While investigating HELP-25377, I noticed that _mongoc_get_rr_search passes h_errno to strerror to build its error message. h_errno is set when the earlier res_nsearch or res_search call fails. However, h_errno holds resolver error codes, not errno values, so strerror produces a message unrelated to the actual failure. In HELP-25377 in particular, the error message seen was "Interrupted system call", which is the errno text for the value 4:
#define EINTR 4 /* Interrupted system call */
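The errors section for h_errno in the manual shows that this is not what is happening. As an h_errno code, the value 4 is NO_DATA on common platforms (glibc, for example), not an interrupted call: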
Errors
The variable h_errno can have the following values:
HOST_NOT_FOUND
The specified host is unknown.
NO_ADDRESS or NO_DATA
The requested name is valid but does not have an IP address.
NO_RECOVERY
A nonrecoverable name server error occurred.
TRY_AGAIN
A temporary error occurred on an authoritative name server. Try again later.
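The mismatch can be reproduced with a small sketch. The helper names below are hypothetical and exist only for illustration. The sketch assumes a platform where NO_DATA and EINTR are both 4, which holds for glibc as far as I know but is not guaranteed everywhere.

```c
/* Assumption: glibc hides the h_errno constants in strict POSIX modes,
 * so request the default feature set before including any header. */
#define _DEFAULT_SOURCE
#include <errno.h>
#include <netdb.h>
#include <string.h>

/* Hypothetical helper: repeats the flawed pattern of handing an
 * h_errno value to strerror(), which expects an errno value. */
const char *
misread_h_errno (int h_err)
{
   return strerror (h_err);
}

/* Hypothetical helper: nonzero when the resolver code shares its
 * numeric value with EINTR, which explains the misleading message. */
int
collides_with_eintr (int h_err)
{
   return h_err == EINTR;
}
```

On such a platform, misread_h_errno (NO_DATA) returns the same text as strerror (EINTR), which matches the message seen in HELP-25377.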
<netdb.h> also declares hstrerror to retrieve the error string for a given h_errno code, but this function has been marked obsolete. With that in mind, I'd suggest adding a _mongoc_hstrerror helper that returns an error string for a given h_errno value, using the descriptions listed above.
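A minimal sketch of what such a helper could look like follows. This is not the driver's actual implementation: the signature, the fallback string, and the decision to handle only NO_DATA (glibc defines NO_ADDRESS as an alias for it, so listing both would produce a duplicate case) are assumptions.

```c
/* Assumption: glibc hides the h_errno constants in strict POSIX modes,
 * so request the default feature set before including any header. */
#define _DEFAULT_SOURCE
#include <netdb.h>

/* Sketch only: maps an h_errno code to the description given in the
 * manual. The signature and the fallback text are assumptions. */
const char *
_mongoc_hstrerror (int code)
{
   switch (code) {
   case HOST_NOT_FOUND:
      return "The specified host is unknown.";
   case NO_DATA: /* NO_ADDRESS is an alias for NO_DATA on glibc */
      return "The requested name is valid but does not have an IP address.";
   case NO_RECOVERY:
      return "A nonrecoverable name server error occurred.";
   case TRY_AGAIN:
      return "A temporary error occurred on an authoritative name server. "
             "Try again later.";
   default:
      return "An unknown error occurred.";
   }
}
```

With a helper like this, the failure in HELP-25377 would have been reported as a name without an address record rather than as an interrupted system call.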
I'll also note that, whether on purpose or by oversight, the function does not treat TRY_AGAIN specially and never retries the lookup. One could argue that "Try again later" does not suggest retrying right away, and a retry would not have fixed the problem in HELP-25377, where h_errno was set to NO_DATA. However, retrying might still be beneficial as protection against transient failures.
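If retrying were considered, the idea could be sketched roughly as below. Everything here is hypothetical: lookup_fn, lookup_with_retry, and flaky_lookup are illustrative names, the lookup is abstracted behind a callback instead of calling res_nsearch directly, and there is no delay between attempts, which a real implementation would probably want.

```c
/* Assumption: glibc hides the h_errno constants in strict POSIX modes,
 * so request the default feature set before including any header. */
#define _DEFAULT_SOURCE
#include <netdb.h>

/* Hypothetical callback: returns >= 0 on success, or -1 on failure
 * with the resolver error code stored in *h_err_out. */
typedef int (*lookup_fn) (void *ctx, int *h_err_out);

/* Sketch: retries the lookup only while it fails with TRY_AGAIN,
 * up to max_attempts calls in total. Other errors return at once. */
int
lookup_with_retry (lookup_fn lookup, void *ctx, int max_attempts, int *h_err_out)
{
   int rc = -1;
   int attempt;

   for (attempt = 0; attempt < max_attempts; attempt++) {
      *h_err_out = 0;
      rc = lookup (ctx, h_err_out);
      if (rc >= 0 || *h_err_out != TRY_AGAIN) {
         break;
      }
   }
   return rc;
}

/* Illustrative stub: fails with TRY_AGAIN until the counter that
 * ctx points to reaches zero, then succeeds. */
int
flaky_lookup (void *ctx, int *h_err_out)
{
   int *failures_left = (int *) ctx;

   if (*failures_left > 0) {
      (*failures_left)--;
      *h_err_out = TRY_AGAIN;
      return -1;
   }
   return 0;
}
```

A bounded attempt count keeps a persistently failing name server from stalling the caller. A NO_DATA result, as in HELP-25377, would still be returned after the first attempt.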
- related to: CDRIVER-4249 Undeclared DNS constants and symbols when building with POSIX 2008 (Closed)