Notes |
(0009494)
vda
07-16-08 16:25
|
Nope.
unsigned long FAST_FUNC bb_strtoul(const char *arg, char **endp, int base)
{
unsigned long v;
char *endptr;
if (!isalnum(arg[0])) return ret_ERANGE(); <==========!!!
errno = 0;
v = strtoul(arg, &endptr, base);
return handle_errors(v, endp, endptr);
}
We definitely know that arg[0] is a digit. Therefore strtoul will not set
endptr to &arg[0], at the very worst, it will be &arg[1]. And thus endptr[-1]
is valid (it references arg[0]). |
|
(0009504)
vda
07-16-08 16:28
|
> Is that still needed? On my libc, calling strtol("-", &endptr, 10) sets endptr as expected to the beginning of the buffer storing "-"...
Yes. This is not a problem. The problem is, strtol returns 0 and does not set errno. I refuse to believe that string "-" is a valid representation of number 'zero', and I don't want busybox tools to accept that as a valid numeric input. |
|
(0009514)
cristic
07-16-08 17:17
|
> if (!isalnum(arg[0])) return ret_ERANGE(); <==========!!!
> ...
> We definitely know that arg[0] is a digit. Therefore strtoul will not set
> endptr to &arg[0], at the very worst, it will be &arg[1]. And thus endptr[-1]
> is valid (it references arg[0]).
Sure, but isalnum() tests if the character is alphanumeric (digit _or_ letter), not if it's a digit. But I agree that if you replace isalnum with isdigit you solve the problem; quite a short fix. |
|
(0009564)
vda
07-17-08 01:45
edited on: 07-17-08 01:45
|
Oh. indeed, its isalnum! :(
replacing isalnum with isdigit is not good - it will break conversions to base 16.
I think the working fix would be swapping "if (endptr[0])" and "if (endptr[-1] == '-')" checks in handle_errors(). If strtoul indeed stopped on the very first char, if (endptr[0])... will trigger and in the if body it checks for isalnum() and returns. "if (endptr[-1]..." won't be executed.
|
|
(0009674)
cristic
07-17-08 13:01
|
I think the handle_errors() functions should not be more complicated than this:
static unsigned long long handle_errors(unsigned long long v, char **endp, char *endptr)
{
if (endp) *endp = endptr;
/* errno is already set to ERANGE by strtoXXX if value overflowed */
if (!errno && endptr[0]) {
/* good number, just suspicious terminator */
errno = EINVAL;
}
return v;
}
I tested it in conjunction with bb_strtol() and it seems to work
fine. I think the confusion is about the contract for
strto[u][l]. The contract for these functions is to convert the
prefix of the given string to a number, not the entire number.
And it looks the way EINVAL is set is not portable across
platforms. So, strtol("-", &endptr, 10) does indeed return "0"
and sets errno to 0, but so does strtol("w", &endptr, 10). But
endptr[0] is zero if and only if the entire string is valid, so I
think testing it is enough. |
|
(0009684)
vda
07-17-08 14:46
edited on: 07-17-08 14:47
|
Try:
busybox printf '%d\n' -
For me it prints "-0", and this is WRONG. "-" is not 0 or -0, it's not a number.
> strtol("-", &endptr, 10) does indeed return "0" and sets errno to 0, but so does strtol("w", &endptr, 10)
But in case of "w" examining endptr[0] and seeing 'w' there clearly says us that conversion failed, whereas with "-" endptr[0] is NUL. Many programs conclude that if endptr[0] is NUL and errno is 0 it means that entire string was a valid number. With "-" it is not true! I want to detect that case.
And btw, printf fails to do it not because bb_strtol is buggy, but because printf fails to test for errno!=0. Here is the instrumented output:
sh-3.2# ./busybox printf '%d\n' -
printf: bb_strtol('-'):0 errno:22
-printf: my_xstrtol('-'):0 errno:22
printf: bb_strtol('-'):0 errno:22
-0
And for comparison, this is what shell and coreutils do:
sh-3.2# printf '%d\n' -
sh: printf: -: invalid number
0
sh-3.2# /usr/bin/printf '%d\n' -
/usr/bin/printf: -: expected a numeric value
0
|
|
(0009694)
cristic
07-17-08 15:06
edited on: 07-17-08 15:06
|
>But in case of "w" examining endptr[0] and seeing 'w' there clearly says us that
>conversion failed, whereas with "-" endptr[0] is NUL. Many programs conclude that
>if endptr[0] is NUL and errno is 0 it means that entire string was a valid
>number. With "-" it is not true! I want to detect that case.
Are you sure? What libc are you using? On my libc (GNU libc 2.6), if I run strtol("-", &endptr, 10) and then examine endptr[0], I see "-", not NUL. So with the version of handle_errors() that I proposed, bb_strtol() sets errno to EINVAL, as desired. Maybe this is a bug in uclibc?
|
|
(0009744)
vda
07-18-08 11:10
|
You are right. Please test 5.patch.
NB: I dont want to set EINVAL on "-", I want to set ERANGE. ERANGE is considered to be "serious" error here - "this is not a number at all!" |
|
(0009774)
cristic
07-18-08 17:25
edited on: 07-18-08 17:26
|
OK, I tested it and it looks like the overflow is gone now.
But I have two other comments:
1. I'm not really sure I understand when you want to return EINVAL and when
ERANGE. My understanding was that you want to return ERANGE when the number is
valid, but out of range, and EINVAL when you encounter an invalid character.
But this is definitely not the case. So for example:
bb_strtol("123456123456", &endptr, 10) -> sets ERANGE (correct number, but too large)
bb_strtol("123x", &endptr, 10) -> sets ERANGE too (invalid number)
2. In bb_strtol, you might want to replace
if (!isalnum(arg[0])) return ret_ERANGE();
by something like
if (!isalnum(first)) { *endp = (char*) arg; return ret_ERANGE();}
so that callers don't read uninitialized memory if they inspect **endp.
|
|
(0009784)
vda
07-19-08 01:20
edited on: 07-19-08 01:23
|
bb_strtonum.c says it all:
/* On exit: errno = 0 only if there was non-empty, '\0' terminated value
* errno = EINVAL if value was not '\0' terminated, but othervise ok
* Return value is still valid, caller should just check whether end[0]
* is a valid terminating char for particular case. OTOH, if caller
* requires '\0' terminated input, [s]he can just check errno == 0.
* errno = ERANGE if value had alphanumeric terminating char ("1234abcg").
* errno = ERANGE if value is out of range, missing, etc.
* errno = ERANGE if value had minus sign for strtouXX (even "-0" is not ok )
* return value is all-ones in this case.
*/
I found out that with this convention bb_strtoXXX() are easier to use than strtoXXX. In many cases I don't need to analyse endptr, and thus I don't need to _have_ endptr, I just pass NULL.
> - if (!isalnum(arg[0])) return ret_ERANGE();
> + if (!isalnum(first)) { *endp = (char*) arg; return ret_ERANGE();}
Adds more code. ERANGE according to the above is "hard error". No point in examining anything, the number is *bogus* anyway, and you cannot "make it right" by examining endptr.
|
|
(0009814)
cristic
07-19-08 18:11
edited on: 07-19-08 18:13
|
Thanks for the explanation. BTW, it might make more sense to use isxdigit (0-9, a-f, A-F) instead of isalnum, but this is a minor issue.
|
|
(0009824)
vda
07-20-08 05:45
|
think bb_strtou(str, NULL, 36) |
|
(0009834)
cristic
07-20-08 12:16
|
> think bb_strtou(str, NULL, 36)
Makes sense, my assumption was that base <= 16.
We can finally close it then, thanks again for the patch. |
|
(0009844)
vda
07-20-08 15:12
|
Thanks for your bug spotting. Please file more bugs! :) |
|
(0009854)
vda
07-20-08 15:13
|
Fixed in svn. |
|