| Commit message (Expand) | Author | Age | Files | Lines |
| * | cleanup use of visibility attributes in pthread_cancel.c•••applying the attribute to a weak_alias macro was a hack. instead use a
separate declaration to apply the visibility, and consolidate
declarations together to avoid having visibility mess all over the
file.
| Rich Felker | 2015-04-14 | 1 | -8/+9 |
| * | fix inconsistent visibility for internal syscall symbols | Rich Felker | 2015-04-14 | 1 | -0/+5 |
| * | consistently use hidden visibility for cancellable syscall internals•••in a few places, non-hidden symbols were referenced from asm in ways
that assumed ld-time binding. while these is no semantic reason these
symbols need to be hidden, fixing the references without making them
hidden was going to be ugly, and hidden reduces some bloat anyway.
in the asm files, .global/.hidden directives have been moved to the
top to unclutter the actual code.
| Rich Felker | 2015-04-14 | 11 | -30/+96 |
| * | fix inconsistent visibility for internal __tls_get_new function•••at the point of call it was declared hidden, but the definition was
not hidden. for some toolchains this inconsistency produced textrels
without ld-time binding.
| Rich Felker | 2015-04-14 | 1 | -3/+2 |
| * | remove remnants of support for running in no-thread-pointer mode•••since 1.1.0, musl has nominally required a thread pointer to be setup.
most of the remaining code that was checking for its availability was
doing so for the sake of being usable by the dynamic linker. as of
commit 71f099cb7db821c51d8f39dfac622c61e54d794c, this is no longer
necessary; the thread pointer is now valid before any libc code
(outside of dynamic linker bootstrap functions) runs.
this commit essentially concludes "phase 3" of the "transition path
for removing lazy init of thread pointer" project that began during
the 1.1.0 release cycle.
| Rich Felker | 2015-04-13 | 4 | -11/+5 |
| * | allow i386 __set_thread_area to be called more than once•••previously a new GDT slot was requested, even if one had already been
obtained by a previous call. instead extract the old slot number from
GS and reuse it if it was already set. the formula (GS-3)/8 for the
slot number automatically yields -1 (request for new slot) if GS is
zero (unset).
| Rich Felker | 2015-04-13 | 1 | -1/+5 |
| * | remove mismatched arguments from vmlock function definitions•••commit f08ab9e61a147630497198fe3239149275c0a3f4 introduced these
accidentally as remnants of some work I tried that did not work out.
| Rich Felker | 2015-04-11 | 1 | -2/+2 |
| * | apply vmlock wait to __unmapself in pthread_exit | Rich Felker | 2015-04-10 | 1 | -0/+4 |
| * | redesign and simplify vmlock system•••this global lock allows certain unlock-type primitives to exclude
mmap/munmap operations which could change the identity of virtual
addresses while references to them still exist.
the original design mistakenly assumed mmap/munmap would conversely
need to exclude the same operations which exclude mmap/munmap, so the
vmlock was implemented as a sort of 'symmetric recursive rwlock'. this
turned out to be unnecessary.
commit 25d12fc0fc51f1fae0f85b4649a6463eb805aa8f already shortened the
interval during which mmap/munmap held their side of the lock, but
left the inappropriate lock design and some inefficiency.
the new design uses a separate function, __vm_wait, which does not
hold any lock itself and only waits for lock users which were already
present when it was called to release the lock. this is sufficient
because of the way operations that need to be excluded are sequenced:
the "unlock-type" operations using the vmlock need only block
mmap/munmap operations that are precipitated by (and thus sequenced
after) the atomic-unlock they perform while holding the vmlock.
this allows for a spectacular lack of synchronization in the __vm_wait
function itself.
| Rich Felker | 2015-04-10 | 5 | -30/+18 |
| * | optimize out setting up robust list with kernel when not needed•••as a result of commit 12e1e324683a1d381b7f15dd36c99b37dd44d940, kernel
processing of the robust list is only needed for process-shared
mutexes. previously the first attempt to lock any owner-tracked mutex
resulted in robust list initialization and a set_robust_list syscall.
this is no longer necessary, and since the kernel's record of the
robust list must now be cleared at thread exit time for detached
threads, optimizing it out is more worthwhile than before too.
| Rich Felker | 2015-04-10 | 2 | -6/+5 |
| * | process robust list in pthread_exit to fix detached thread use-after-unmap•••the robust list head lies in the thread structure, which is unmapped
before exit for detached threads. this leaves the kernel unable to
process the exiting thread's robust list, and with a dangling pointer
which may happen to point to new unrelated data at the time the kernel
processes it.
userspace processing of the robust list was already needed for
non-pshared robust mutexes in order to perform private futex wakes
rather than the shared ones the kernel would do, but it was
conditional on linking pthread_mutexattr_setrobust and did not bother
processing the pshared mutexes in the list, which requires additional
logic for the robust list pending slot in case pthread_exit is
interrupted by asynchronous process termination.
the new robust list processing code is linked unconditionally (inlined
in pthread_exit), handles both private and shared mutexes, and also
removes the kernel's reference to the robust list before unmapping and
exit if the exiting thread is detached.
| Rich Felker | 2015-04-10 | 2 | -26/+27 |
| * | block all signals (even internal ones) in cancellation signal handler•••previously the implementation-internal signal used for multithreaded
set*id operations was left unblocked during handling of the
cancellation signal. however, on some archs, signal contexts are huge
(up to 5k) and the possibility of nested signal handlers drastically
increases the minimum stack requirement. since the cancellation signal
handler will do its job and return in bounded time before possibly
passing execution to application code, there is no need to allow other
signals to interrupt it.
| Rich Felker | 2015-03-16 | 1 | -1/+2 |
| * | add aarch64 port•••This adds complete aarch64 target support including bigendian subarch.
Some of the long double math functions are known to be broken otherwise
interfaces should be fully functional, but at this point consider this
port experimental.
Initial work on this port was done by Sireesh Tripurari and Kevin Bortis.
| Szabolcs Nagy | 2015-03-11 | 4 | -0/+69 |
| * | fix regression in pthread_cond_wait with cancellation disabled•••due to a logic error in the use of masked cancellation mode,
pthread_cond_wait did not honor PTHREAD_CANCEL_DISABLE but instead
failed with ECANCELED when cancellation was pending.
| Rich Felker | 2015-03-07 | 1 | -0/+1 |
| * | fix signed left-shift overflow in pthread_condattr_setpshared | Rich Felker | 2015-03-04 | 1 | -1/+1 |
| * | make all objects used with atomic operations volatile•••the memory model we use internally for atomics permits plain loads of
values which may be subject to concurrent modification without
requiring that a special load function be used. since a compiler is
free to make transformations that alter the number of loads or the way
in which loads are performed, the compiler is theoretically free to
break this usage. the most obvious concern is with atomic cas
constructs: something of the form tmp=*p;a_cas(p,tmp,f(tmp)); could be
transformed to a_cas(p,*p,f(*p)); where the latter is intended to show
multiple loads of *p whose resulting values might fail to be equal;
this would break the atomicity of the whole operation. but even more
fundamental breakage is possible.
with the changes being made now, objects that may be modified by
atomics are modeled as volatile, and the atomic operations performed
on them by other threads are modeled as asynchronous stores by
hardware which happens to be acting on the request of another thread.
such modeling of course does not itself address memory synchronization
between cores/cpus, but that aspect was already handled. this all
seems less than ideal, but it's the best we can do without mandating a
C11 compiler and using the C11 model for atomics.
in the case of pthread_once_t, the ABI type of the underlying object
is not volatile-qualified. so we are assuming that accessing the
object through a volatile-qualified lvalue via casts yields volatile
access semantics. the language of the C standard is somewhat unclear
on this matter, but this is an assumption the linux kernel also makes,
and seems to be the correct interpretation of the standard.
| Rich Felker | 2015-03-03 | 9 | -16/+18 |
| * | suppress masked cancellation in pthread_join•••like close, pthread_join is a resource-deallocation function which is
also a cancellation point. the intent of masked cancellation mode is
to exempt such functions from failure with ECANCELED.
| Rich Felker | 2015-03-02 | 1 | -1/+5 |
| * | fix namespace issue in pthread_join affecting thrd_join•••pthread_testcancel is not in the ISO C reserved namespace and thus
cannot be used here. use the namespace-protected version of the
function instead.
| Rich Felker | 2015-03-02 | 1 | -1/+2 |
| * | factor cancellation cleanup push/pop out of futex __timedwait function•••previously, the __timedwait function was optionally a cancellation
point depending on whether it was passed a pointer to a cleaup
function and context to register. as of now, only one caller actually
used such a cleanup function (and it may face removal soon); most
callers either passed a null pointer to disable cancellation or a
dummy cleanup function.
now, __timedwait is never a cancellation point, and __timedwait_cp is
the cancellable version. this makes the intent of the calling code
more obvious and avoids ugly dummy functions and long argument lists.
| Rich Felker | 2015-03-02 | 7 | -24/+21 |
| * | fix failure of internal futex __timedwait to report ECANCELED•••as part of abstracting the futex wait, this function suppresses all
futex error values which callers should not see using a whitelist
approach. when the masked cancellation mode was added, the new
ECANCELED error was not whitelisted. this omission caused the new
pthread_cond_wait code using masked cancellation to exhibit a spurious
wake (rather than acting on cancellation) when the request arrived
after blocking on the cond var.
| Rich Felker | 2015-02-27 | 1 | -1/+1 |
| * | fix breakage in pthread_cond_wait due to typo•••due to accidental use of = instead of ==, the error code was always
set to zero in the signaled wake case for non-shared cv waits.
suppressing ETIMEDOUT (the only possible wait error) is harmless and
actually permitted in this case, but suppressing mutex errors could
give the caller false information about the state of the mutex.
commit 8741ffe625363a553e8f509dc3ca7b071bdbab47 introduced this
regression and commit d9da1fb8c592469431c764732d09f7756340190e
preserved it when reorganizing the code.
| Rich Felker | 2015-02-23 | 1 | -1/+1 |
| * | simplify cond var code now that cleanup handler is not needed | Rich Felker | 2015-02-22 | 1 | -86/+63 |
| * | fix pthread_cond_wait cancellation race•••it's possible that signaling a waiter races with cancellation of that
same waiter. previously, cancellation was acted upon, causing the
signal to be consumed with no waiter returning. by using the new
masked cancellation state, it's possible to refuse to act on the
cancellation request and instead leave it pending.
to ease review and understanding of the changes made, this commit
leaves the unwait function, which was previously the cancellation
cleanup handler, in place. additional simplifications could be made by
removing it.
| Rich Felker | 2015-02-22 | 1 | -5/+38 |
| * | add new masked cancellation mode•••this is a new extension which is presently intended only for
experimental and internal libc use. interface and behavior details may
change subject to feedback and experience from using it internally.
the basic concept for the new PTHREAD_CANCEL_MASKED state is that the
first cancellation point to observe the cancellation request fails
with an errno value of ECANCELED rather than acting on cancellation,
allowing the caller to process the status and choose whether/how to
act upon it.
| Rich Felker | 2015-02-21 | 2 | -10/+16 |
| * | prepare cancellation syscall asm for possibility of __cancel returning | Rich Felker | 2015-02-20 | 5 | -11/+32 |
| * | make pthread_exit responsible for disabling cancellation•••this requirement is tucked away in XSH 2.9.5 Thread Cancellation under
the heading Thread Cancellation Cleanup Handlers.
| Rich Felker | 2015-02-16 | 2 | -3/+2 |
| * | use the internal macro name FUTEX_PRIVATE in __wait•••the name was recently added for the setxid/synccall rework,
so use the name now that we have it.
| Szabolcs Nagy | 2015-02-09 | 1 | -1/+1 |
| * | fix missing memory barrier in cancellation signal handler•••in practice this was probably a non-issue, because the necessary
barrier almost certainly exists in kernel space -- implementing signal
delivery without such a barrier seems impossible -- but for the sake
of correctness, it should be done here too.
in principle, without a barrier, it is possible that the thread to be
cancelled does not see the store of its cancellation flag performed by
another thread. this affects both the case where the signal arrives
before entering the critical program counter range from __cp_begin to
__cp_end (in which case both the signal handler and the inline check
fail to see the value which was already stored) and the case where the
signal arrives during the critical range (in which case the signal
handler should be responsible for cancellation, but when it does not
see the cancellation flag, it assumes the signal is spurious and
refuses to act on it).
in the fix, the barrier is placed only in the signal handler, not in
the inline check at the beginning of the critical program counter
range. if the signal handler runs before the critical range is
entered, it will of course take no action, but its barrier will ensure
that the inline check subsequently sees the store. if on the other
hand the inline check runs first, it may miss seeing the store, but
the subsequent signal handler in the critical range will act upon the
cancellation request. this strategy avoids adding a memory barrier in
the common, non-cancellation code path.
| Rich Felker | 2015-02-03 | 1 | -0/+1 |
| * | overhaul __synccall and fix AS-safety and other issues in set*id•••multi-threaded set*id and setrlimit use the internal __synccall
function to work around the kernel's wrongful treatment of these
process properties as thread-local. the old implementation of
__synccall failed to be AS-safe, despite POSIX requiring setuid and
setgid to be AS-safe, and was not rigorous in assuring that all
threads were caught. in a worst case, threads late in the process of
exiting could retain permissions after setuid reported success, in
which case attacks to regain dropped permissions may have been
possible under the right conditions.
the new implementation of __synccall depends on the presence of
/proc/self/task and will fail if it can't be opened, but is able to
determine that it has caught all threads, and does not use any locks
except its own. it thereby achieves AS-safety simply by blocking
signals to preclude re-entry in the same thread.
with this commit, all known conformance and safety issues in set*id
functions should be fixed.
| Rich Felker | 2015-01-15 | 2 | -44/+137 |
| * | suppress EINTR in sem_wait and sem_timedwait•••per POSIX, the EINTR condition is an optional error for these
functions, not a mandatory one. since old kernels (pre-2.6.22) failed
to honor SA_RESTART for the futex syscall, it's dangerous to trust
EINTR from the kernel. thankfully POSIX offers an easy way out.
| Rich Felker | 2015-01-15 | 1 | -1/+1 |
| * | fix __aeabi_read_tp oversight in arm atomics/tls overhaul•••calls to __aeabi_read_tp may be generated by the compiler to access
TLS on pre-v6 targets. previously, this function was hard-coded to
call the kuser helper, which would crash on kernels with kuser helper
removed.
to fix the problem most efficiently, the definition of __aeabi_read_tp
is moved so that it's an alias for the new __a_gettp. however, on v7+
targets, code to initialize the runtime choice of thread-pointer
loading code is not even compiled, meaning that defining
__aeabi_read_tp would have caused an immediate crash due to using the
default implementation of __a_gettp with a HCF instruction.
fortunately there is an elegant solution which reduces overall code
size: putting the native thread-pointer loading instruction in the
default code path for __a_gettp, so that separate default/native code
paths are not needed. this function should never be called before
__set_thread_area anyway, and if it is called early on pre-v6
hardware, the old behavior (crashing) is maintained.
ideally __aeabi_read_tp would not be called at all on v7+ targets
anyway -- in fact, prior to the overhaul, the same problem existed,
but it was never caught by users building for v7+ with kuser disabled.
however, it's possible for calls to __aeabi_read_tp to end up in a v7+
binary if some of the object files were built for pre-v7 targets, e.g.
in the case of static libraries that were built separately, so this
case needs to be handled.
| Rich Felker | 2014-11-22 | 1 | -4/+0 |
| * | overhaul ARM atomics/tls for performance and compatibility•••previously, builds for pre-armv6 targets hard-coded use of the "kuser
helper" system for atomics and thread-pointer access, resulting in
binaries that fail to run (crash) on systems where this functionality
has been disabled (as a security/hardening measure) in the kernel.
additionally, builds for armv6 hard-coded an outdated/deprecated
memory barrier instruction which may require emulation (extremely
slow) on future models.
this overhaul replaces the behavior for all pre-armv7 builds (both of
the above cases) to perform runtime detection of the appropriate
mechanisms for barrier, atomic compare-and-swap, and thread pointer
access. detection is based on information provided by the kernel in
auxv: presence of the HWCAP_TLS bit for AT_HWCAP and the architecture
version encoded in AT_PLATFORM. direct use of the instructions is
preferred when possible, since probing for the existence of the kuser
helper page would be difficult and would incur runtime cost.
for builds targeting armv7 or later, the runtime detection code is not
compiled at all, and much more efficient versions of the non-cas
atomic operations are provided by using ldrex/strex directly rather
than wrapping cas.
| Rich Felker | 2014-11-19 | 1 | -12/+1 |
| * | manually "shrink wrap" fast path in pthread_once•••this change is a workaround for the inability of current compilers to
perform "shrink wrapping" optimizations. in casual testing, it roughly
doubled the performance of pthread_once when called on an
already-finished once control object.
| Rich Felker | 2014-10-20 | 1 | -8/+12 |
| * | eliminate global waiters count in pthread_once | Rich Felker | 2014-10-13 | 1 | -9/+13 |
| * | fix missing barrier in pthread_once/call_once shortcut path•••these functions need to be fast when the init routine has already run,
since they may be called very often from code which depends on global
initialization having taken place. as such, a fast path bypassing
atomic cas on the once control object was used to avoid heavy memory
contention. however, on archs with weakly ordered memory, the fast
path failed to ensure that the caller actually observes the side
effects of the init routine.
preliminary performance testing showed that simply removing the fast
path was not practical; a performance drop of roughly 85x was observed
with 20 threads hammering the same once control on a 24-core machine.
so the new explicit barrier operation from atomic.h is used to retain
the fast path while ensuring memory visibility.
performance may be reduced on some archs where the barrier actually
makes a difference, but the previous behavior was unsafe and incorrect
on these archs. future improvements to the implementation of a_barrier
should reduce the impact.
| Rich Felker | 2014-10-10 | 1 | -2/+6 |
| * | add C11 thread creation and related thread functions•••based on patch by Jens Gustedt.
the main difficulty here is handling the difference between start
function signatures and thread return types for C11 threads versus
POSIX threads. pointers to void are assumed to be able to represent
faithfully all values of int. the function pointer for the thread
start function is cast to an incorrect type for passing through
pthread_create, but is cast back to its correct type before calling so
that the behavior of the call is well-defined.
changes to the existing threads implementation were kept minimal to
reduce the risk of regressions, and duplication of code that carries
implementation-specific assumptions was avoided for ease and safety of
future maintenance.
| Rich Felker | 2014-09-07 | 9 | -7/+82 |
| * | add C11 condition variable functions•••Because of the clear separation for private pthread_cond_t these
interfaces are quite simple and direct.
| Jens Gustedt | 2014-09-06 | 6 | -0/+57 |
| * | add C11 mutex functions | Jens Gustedt | 2014-09-06 | 6 | -0/+69 |
| * | add C11 thread functions operating on tss_t and once_flag•••These all have POSIX equivalents, but aside from tss_get, they all
have minor changes to the signature or return value and thus need to
exist as separate functions.
| Jens Gustedt | 2014-09-06 | 5 | -0/+42 |
| * | use weak symbols for the POSIX functions that will be used by C threads•••The intent of this is to avoid name space pollution of the C threads
implementation.
This has two sides to it. First we have to provide symbols that wouldn't
pollute the name space for the C threads implementation. Second we have
to clean up some internal uses of POSIX functions such that they don't
implicitly drag in such symbols.
| Jens Gustedt | 2014-09-06 | 14 | -28/+73 |
| * | make non-waiting paths of sem_[timed]wait and pthread_join cancelable•••per POSIX these functions are both cancellation points, so they must
act on any cancellation request which is pending prior to the call.
previously, only the code path where actual waiting took place could
act on cancellation.
| Rich Felker | 2014-09-05 | 2 | -0/+3 |
| * | refrain from spinning on locks when there is already a waiter•••if there is already a waiter for a lock, spinning on the lock is
essentially an attempt to steal it from whichever waiter would obtain
it via any priority rules in place, and is therefore undesirable. in
the current implementation, there is always an inherent race window at
unlock during which a newly-arriving thread may steal the lock from
the existing waiters, but we should aim to keep this window minimal
rather than enlarging it.
| Rich Felker | 2014-08-25 | 5 | -5/+5 |
| * | spin before waiting on futex in mutex and rwlock lock operations | Rich Felker | 2014-08-25 | 3 | -0/+20 |
| * | spin in sem_[timed]wait before performing futex wait•••empirically, this increases the maximum rate of wait/post operations
between two threads by 20-150 times on machines I tested, including
x86 and arm. conceptually, it makes sense to do some spinning because
semaphores are intended to be usable as a notification mechanism
between threads, not just as locks, and low-latency notification is a
valuable property to have.
| Rich Felker | 2014-08-25 | 1 | -0/+5 |
| * | sanitize number of spins in userspace before futex wait•••the previous spin limit of 10000 was utterly unreasonable.
empirically, it could consume up to 200000 cycles, whereas a failed
futex wait (EAGAIN) typically takes 1000 cycles or less, and even a
true wait/wake round seems much less expensive.
the new counts (100 for general wait, 200 in barrier) were simply
chosen to be in the range of what's reasonable without having adverse
effects on casual micro-benchmark tests I have been running. they may
still be too high, from a standpoint of not wasting cpu cycles, but at
least they're a lot better than before. rigorous testing across
different archs and cpu models should be performed at some point to
determine whether further adjustments should be made.
| Rich Felker | 2014-08-25 | 2 | -2/+2 |
| * | fix false ownership of stdio FILEs due to tid reuse•••this is analogous commit fffc5cda10e0c5c910b40f7be0d4fa4e15bb3f48
which fixed the corresponding issue for mutexes.
the robust list can't be used here because the locks do not share a
common layout with mutexes. at some point it may make sense to simply
incorporate a mutex object into the FILE structure and use it, but
that would be a much more invasive change, and it doesn't mesh well
with the current design that uses a simpler code path for internal
locking and pulls in the recursive-mutex-like code when the flockfile
API is used explicitly.
| Rich Felker | 2014-08-23 | 1 | -0/+2 |
| * | fix fallback checks for kernels without private futex support•••for unknown syscall commands, the kernel produces ENOSYS, not EINVAL.
| Rich Felker | 2014-08-22 | 4 | -4/+4 |
| * | fix use of uninitialized memory with application-provided thread stacks•••the subsequent code in pthread_create and the code which copies TLS
initialization images to the new thread's TLS space assume that the
memory provided to them is zero-initialized, which is true when it's
obtained by pthread_create using mmap. however, when the caller
provides a stack using pthread_attr_setstack, pthread_create cannot
make any assumptions about the contents. simply zero-filling the
relevant memory in this case is the simplest and safest fix.
| Rich Felker | 2014-08-22 | 1 | -0/+2 |
| * | further simplify and optimize new cond var•••the main idea of the changes made is to have waiters wait directly on
the "barrier" lock that was used to prevent them from making forward
progress too early rather than first waiting on the atomic state value
and then attempting to lock the barrier.
in addition, adjustments to the mutex waiter count are optimized.
previously, each waking waiter decremented the count (unless it was
the first) then immediately incremented it again for the next waiter
(unless it was the last). this was a roundabout was of achieving the
equivalent of incrementing it once for the first waiter and
decrementing it once for the last.
| Rich Felker | 2014-08-18 | 1 | -29/+21 |
| * | simplify and improve new cond var implementation•••previously, wake order could be unpredictable: if a waiter happened to
leave its futex wait on the state early, e.g. due to EAGAIN while
restarting after a signal handler, it could acquire the mutex out of
turn. handling this required ugly O(n) list walking in the unwait
function and accounting to remove waiters that already woke from the
list.
with the new changes, the "barrier" locks in each waiter node are only
unlocked in turn. in addition to simplifying the code, this seems to
improve performance slightly, probably by reducing the number of
accesses threads make to each other's stacks.
as an additional benefit, unrecoverable mutex re-locking errors
(mainly ENOTRECOVERABLE for robust mutexes) no longer need to be
handled with deadlock; they can be reported to the caller, since the
unlocking sequence makes it unnecessary to rely on the mutex to
synchronize access to the waiter list.
| Rich Felker | 2014-08-18 | 1 | -40/+22 |