grovel - Unnamed repository; edit this file 'description' to name the repository.

	Commit message	Author	Age	Files	Lines
*	cleanup use of visibility attributes in pthread_cancel.c•••applying the attribute to a weak_alias macro was a hack. instead use a separate declaration to apply the visibility, and consolidate declarations together to avoid having visibility mess all over the file.	Rich Felker	2015-04-14	1	-8/+9
*	fix inconsistent visibility for internal syscall symbols	Rich Felker	2015-04-14	1	-0/+5
*	consistently use hidden visibility for cancellable syscall internals•••in a few places, non-hidden symbols were referenced from asm in ways that assumed ld-time binding. while there is no semantic reason these symbols need to be hidden, fixing the references without making them hidden was going to be ugly, and hidden reduces some bloat anyway. in the asm files, .global/.hidden directives have been moved to the top to unclutter the actual code.	Rich Felker	2015-04-14	11	-30/+96
*	fix inconsistent visibility for internal __tls_get_new function•••at the point of call it was declared hidden, but the definition was not hidden. for some toolchains this inconsistency produced textrels without ld-time binding.	Rich Felker	2015-04-14	1	-3/+2
*	remove remnants of support for running in no-thread-pointer mode•••since 1.1.0, musl has nominally required a thread pointer to be set up. most of the remaining code that was checking for its availability was doing so for the sake of being usable by the dynamic linker. as of commit 71f099cb7db821c51d8f39dfac622c61e54d794c, this is no longer necessary; the thread pointer is now valid before any libc code (outside of dynamic linker bootstrap functions) runs. this commit essentially concludes "phase 3" of the "transition path for removing lazy init of thread pointer" project that began during the 1.1.0 release cycle.	Rich Felker	2015-04-13	4	-11/+5
*	allow i386 __set_thread_area to be called more than once•••previously a new GDT slot was requested, even if one had already been obtained by a previous call. instead extract the old slot number from GS and reuse it if it was already set. the formula (GS-3)/8 for the slot number automatically yields -1 (request for new slot) if GS is zero (unset).	Rich Felker	2015-04-13	1	-1/+5
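The slot formula in the entry above can be checked in C. This is a minimal sketch, not the i386 assembly from the commit: the helper names are hypothetical, and it assumes the usual selector layout of index*8 + 3 (GDT, privilege level 3). Note that the formula needs floor division; plain C `(0-3)/8` truncates toward zero and gives 0, not -1.

```c
/* Floor division by 8, written out so the result does not depend on how
 * the compiler rounds negative quotients or shifts negative values. */
int floor_div8(int x)
{
    return x >= 0 ? x / 8 : -((-x + 7) / 8);
}

/* Hypothetical helper: recover the GDT entry index from a GS selector
 * value of the assumed form index*8 + 3. An unset (zero) selector maps
 * to -1, which is the "allocate a new slot" request. */
int gdt_slot_from_gs(unsigned short gs)
{
    return floor_div8((int)gs - 3);
}
```

With these definitions, a zero selector maps to -1 and a selector built from slot 6 (6*8 + 3) maps back to 6.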
*	remove mismatched arguments from vmlock function definitions•••commit f08ab9e61a147630497198fe3239149275c0a3f4 introduced these accidentally as remnants of some work I tried that did not work out.	Rich Felker	2015-04-11	1	-2/+2
*	apply vmlock wait to __unmapself in pthread_exit	Rich Felker	2015-04-10	1	-0/+4
*	redesign and simplify vmlock system•••this global lock allows certain unlock-type primitives to exclude mmap/munmap operations which could change the identity of virtual addresses while references to them still exist. the original design mistakenly assumed mmap/munmap would conversely need to exclude the same operations which exclude mmap/munmap, so the vmlock was implemented as a sort of 'symmetric recursive rwlock'. this turned out to be unnecessary. commit 25d12fc0fc51f1fae0f85b4649a6463eb805aa8f already shortened the interval during which mmap/munmap held their side of the lock, but left the inappropriate lock design and some inefficiency. the new design uses a separate function, __vm_wait, which does not hold any lock itself and only waits for lock users which were already present when it was called to release the lock. this is sufficient because of the way operations that need to be excluded are sequenced: the "unlock-type" operations using the vmlock need only block mmap/munmap operations that are precipitated by (and thus sequenced after) the atomic-unlock they perform while holding the vmlock. this allows for a spectacular lack of synchronization in the __vm_wait function itself.	Rich Felker	2015-04-10	5	-30/+18
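The shape of the redesigned vmlock can be sketched as a user count plus a wait function that takes no lock of its own. This is an illustration under assumptions, not the code from the commit: the names are hypothetical, the GCC/Clang `__sync` builtins stand in for the library's internal atomics, and the wait spins where a real implementation would block on a futex.

```c
/* Number of threads currently inside a vmlock-protected unlock sequence. */
volatile int demo_vm_users;

/* Taken by unlock-type primitives around their atomic unlock. */
void demo_vm_lock(void)
{
    __sync_fetch_and_add(&demo_vm_users, 1);
}

void demo_vm_unlock(void)
{
    __sync_fetch_and_sub(&demo_vm_users, 1);
}

/* Called on the mmap/munmap side. It holds nothing itself; it only
 * waits until the users that were present have left. This sketch
 * busy-waits; a real implementation would sleep instead. */
void demo_vm_wait(void)
{
    while (demo_vm_users)
        ;
}
```

Because the waiting side never holds anything, lock users never have to wait for mmap/munmap, which is the asymmetry the entry describes.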
*	optimize out setting up robust list with kernel when not needed•••as a result of commit 12e1e324683a1d381b7f15dd36c99b37dd44d940, kernel processing of the robust list is only needed for process-shared mutexes. previously the first attempt to lock any owner-tracked mutex resulted in robust list initialization and a set_robust_list syscall. this is no longer necessary, and since the kernel's record of the robust list must now be cleared at thread exit time for detached threads, optimizing it out is more worthwhile than before too.	Rich Felker	2015-04-10	2	-6/+5
*	process robust list in pthread_exit to fix detached thread use-after-unmap•••the robust list head lies in the thread structure, which is unmapped before exit for detached threads. this leaves the kernel unable to process the exiting thread's robust list, and with a dangling pointer which may happen to point to new unrelated data at the time the kernel processes it. userspace processing of the robust list was already needed for non-pshared robust mutexes in order to perform private futex wakes rather than the shared ones the kernel would do, but it was conditional on linking pthread_mutexattr_setrobust and did not bother processing the pshared mutexes in the list, which requires additional logic for the robust list pending slot in case pthread_exit is interrupted by asynchronous process termination. the new robust list processing code is linked unconditionally (inlined in pthread_exit), handles both private and shared mutexes, and also removes the kernel's reference to the robust list before unmapping and exit if the exiting thread is detached.	Rich Felker	2015-04-10	2	-26/+27
*	block all signals (even internal ones) in cancellation signal handler•••previously the implementation-internal signal used for multithreaded set*id operations was left unblocked during handling of the cancellation signal. however, on some archs, signal contexts are huge (up to 5k) and the possibility of nested signal handlers drastically increases the minimum stack requirement. since the cancellation signal handler will do its job and return in bounded time before possibly passing execution to application code, there is no need to allow other signals to interrupt it.	Rich Felker	2015-03-16	1	-1/+2
*	add aarch64 port•••This adds complete aarch64 target support including bigendian subarch. Some of the long double math functions are known to be broken; otherwise interfaces should be fully functional, but at this point consider this port experimental. Initial work on this port was done by Sireesh Tripurari and Kevin Bortis.	Szabolcs Nagy	2015-03-11	4	-0/+69
*	fix regression in pthread_cond_wait with cancellation disabled•••due to a logic error in the use of masked cancellation mode, pthread_cond_wait did not honor PTHREAD_CANCEL_DISABLE but instead failed with ECANCELED when cancellation was pending.	Rich Felker	2015-03-07	1	-0/+1
*	fix signed left-shift overflow in pthread_condattr_setpshared	Rich Felker	2015-03-04	1	-1/+1
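For context on the entry above: with a 32-bit int, shifting a signed 1 into bit 31 is undefined behavior because the result is not representable, while the same shift on an unsigned operand is well-defined. The sketch below shows the unsigned form on a hypothetical attribute struct; the struct, the function name, and the choice of bit 31 as the process-shared flag are assumptions for illustration, and it assumes unsigned is at least 32 bits wide.

```c
#include <errno.h>

/* Hypothetical attribute object: bit 31 of attr is the pshared flag. */
struct demo_condattr {
    unsigned attr;
};

int demo_condattr_setpshared(struct demo_condattr *a, int pshared)
{
    if (pshared != 0 && pshared != 1)
        return EINVAL;
    a->attr &= 0x7fffffffu;              /* clear the old flag */
    a->attr |= (unsigned)pshared << 31;  /* unsigned shift, no overflow */
    return 0;
}
```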
*	make all objects used with atomic operations volatile•••the memory model we use internally for atomics permits plain loads of values which may be subject to concurrent modification without requiring that a special load function be used. since a compiler is free to make transformations that alter the number of loads or the way in which loads are performed, the compiler is theoretically free to break this usage. the most obvious concern is with atomic cas constructs: something of the form tmp=p;a_cas(p,tmp,f(tmp)); could be transformed to a_cas(p,p,f(p)); where the latter is intended to show multiple loads of p whose resulting values might fail to be equal; this would break the atomicity of the whole operation. but even more fundamental breakage is possible. with the changes being made now, objects that may be modified by atomics are modeled as volatile, and the atomic operations performed on them by other threads are modeled as asynchronous stores by hardware which happens to be acting on the request of another thread. such modeling of course does not itself address memory synchronization between cores/cpus, but that aspect was already handled. this all seems less than ideal, but it's the best we can do without mandating a C11 compiler and using the C11 model for atomics. in the case of pthread_once_t, the ABI type of the underlying object is not volatile-qualified. so we are assuming that accessing the object through a volatile-qualified lvalue via casts yields volatile access semantics. the language of the C standard is somewhat unclear on this matter, but this is an assumption the linux kernel also makes, and seems to be the correct interpretation of the standard.	Rich Felker	2015-03-03	9	-16/+18
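The `tmp=p;a_cas(p,tmp,f(tmp));` pattern from the entry above can be sketched as follows. This is a minimal illustration, not the library's code: `demo_cas` is an assumed stand-in for a_cas built on the GCC/Clang `__sync_val_compare_and_swap` builtin, and the other names are hypothetical. The point is that the volatile qualifier forces exactly one load per iteration, so the value compared and the value passed to f are the same.

```c
/* Assumed stand-in for a_cas: returns the value *p held before the call. */
int demo_cas(volatile int *p, int expected, int desired)
{
    return __sync_val_compare_and_swap(p, expected, desired);
}

/* Atomically replace *p with f(*p); returns the previous value. */
int demo_fetch_apply(volatile int *p, int (*f)(int))
{
    int old;
    do {
        old = *p;  /* single load: *p is accessed through a volatile lvalue */
    } while (demo_cas(p, old, f(old)) != old);
    return old;
}

/* Example transformation used to exercise the loop. */
int demo_double(int x)
{
    return 2 * x;
}
```

If the object were not volatile, a compiler could in principle reload `*p` for each use of `old`, and the compare value and the input to f could then differ under concurrent modification.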
*	suppress masked cancellation in pthread_join•••like close, pthread_join is a resource-deallocation function which is also a cancellation point. the intent of masked cancellation mode is to exempt such functions from failure with ECANCELED.	Rich Felker	2015-03-02	1	-1/+5
*	fix namespace issue in pthread_join affecting thrd_join•••pthread_testcancel is not in the ISO C reserved namespace and thus cannot be used here. use the namespace-protected version of the function instead.	Rich Felker	2015-03-02	1	-1/+2
*	factor cancellation cleanup push/pop out of futex __timedwait function•••previously, the __timedwait function was optionally a cancellation point depending on whether it was passed a pointer to a cleanup function and context to register. as of now, only one caller actually used such a cleanup function (and it may face removal soon); most callers either passed a null pointer to disable cancellation or a dummy cleanup function. now, __timedwait is never a cancellation point, and __timedwait_cp is the cancellable version. this makes the intent of the calling code more obvious and avoids ugly dummy functions and long argument lists.	Rich Felker	2015-03-02	7	-24/+21
*	fix failure of internal futex __timedwait to report ECANCELED•••as part of abstracting the futex wait, this function suppresses all futex error values which callers should not see using a whitelist approach. when the masked cancellation mode was added, the new ECANCELED error was not whitelisted. this omission caused the new pthread_cond_wait code using masked cancellation to exhibit a spurious wake (rather than acting on cancellation) when the request arrived after blocking on the cond var.	Rich Felker	2015-02-27	1	-1/+1
*	fix breakage in pthread_cond_wait due to typo•••due to accidental use of = instead of ==, the error code was always set to zero in the signaled wake case for non-shared cv waits. suppressing ETIMEDOUT (the only possible wait error) is harmless and actually permitted in this case, but suppressing mutex errors could give the caller false information about the state of the mutex. commit 8741ffe625363a553e8f509dc3ca7b071bdbab47 introduced this regression and commit d9da1fb8c592469431c764732d09f7756340190e preserved it when reorganizing the code.	Rich Felker	2015-02-23	1	-1/+1
*	simplify cond var code now that cleanup handler is not needed	Rich Felker	2015-02-22	1	-86/+63
*	fix pthread_cond_wait cancellation race•••it's possible that signaling a waiter races with cancellation of that same waiter. previously, cancellation was acted upon, causing the signal to be consumed with no waiter returning. by using the new masked cancellation state, it's possible to refuse to act on the cancellation request and instead leave it pending. to ease review and understanding of the changes made, this commit leaves the unwait function, which was previously the cancellation cleanup handler, in place. additional simplifications could be made by removing it.	Rich Felker	2015-02-22	1	-5/+38
*	add new masked cancellation mode•••this is a new extension which is presently intended only for experimental and internal libc use. interface and behavior details may change subject to feedback and experience from using it internally. the basic concept for the new PTHREAD_CANCEL_MASKED state is that the first cancellation point to observe the cancellation request fails with an errno value of ECANCELED rather than acting on cancellation, allowing the caller to process the status and choose whether/how to act upon it.	Rich Felker	2015-02-21	2	-10/+16
*	prepare cancellation syscall asm for possibility of __cancel returning	Rich Felker	2015-02-20	5	-11/+32
*	make pthread_exit responsible for disabling cancellation•••this requirement is tucked away in XSH 2.9.5 Thread Cancellation under the heading Thread Cancellation Cleanup Handlers.	Rich Felker	2015-02-16	2	-3/+2
*	use the internal macro name FUTEX_PRIVATE in __wait•••the name was recently added for the setxid/synccall rework, so use the name now that we have it.	Szabolcs Nagy	2015-02-09	1	-1/+1
*	fix missing memory barrier in cancellation signal handler•••in practice this was probably a non-issue, because the necessary barrier almost certainly exists in kernel space -- implementing signal delivery without such a barrier seems impossible -- but for the sake of correctness, it should be done here too. in principle, without a barrier, it is possible that the thread to be cancelled does not see the store of its cancellation flag performed by another thread. this affects both the case where the signal arrives before entering the critical program counter range from __cp_begin to __cp_end (in which case both the signal handler and the inline check fail to see the value which was already stored) and the case where the signal arrives during the critical range (in which case the signal handler should be responsible for cancellation, but when it does not see the cancellation flag, it assumes the signal is spurious and refuses to act on it). in the fix, the barrier is placed only in the signal handler, not in the inline check at the beginning of the critical program counter range. if the signal handler runs before the critical range is entered, it will of course take no action, but its barrier will ensure that the inline check subsequently sees the store. if on the other hand the inline check runs first, it may miss seeing the store, but the subsequent signal handler in the critical range will act upon the cancellation request. this strategy avoids adding a memory barrier in the common, non-cancellation code path.	Rich Felker	2015-02-03	1	-0/+1
*	overhaul __synccall and fix AS-safety and other issues in setid•••multi-threaded setid and setrlimit use the internal __synccall function to work around the kernel's wrongful treatment of these process properties as thread-local. the old implementation of __synccall failed to be AS-safe, despite POSIX requiring setuid and setgid to be AS-safe, and was not rigorous in assuring that all threads were caught. in a worst case, threads late in the process of exiting could retain permissions after setuid reported success, in which case attacks to regain dropped permissions may have been possible under the right conditions. the new implementation of __synccall depends on the presence of /proc/self/task and will fail if it can't be opened, but is able to determine that it has caught all threads, and does not use any locks except its own. it thereby achieves AS-safety simply by blocking signals to preclude re-entry in the same thread. with this commit, all known conformance and safety issues in set*id functions should be fixed.	Rich Felker	2015-01-15	2	-44/+137
*	suppress EINTR in sem_wait and sem_timedwait•••per POSIX, the EINTR condition is an optional error for these functions, not a mandatory one. since old kernels (pre-2.6.22) failed to honor SA_RESTART for the futex syscall, it's dangerous to trust EINTR from the kernel. thankfully POSIX offers an easy way out.	Rich Felker	2015-01-15	1	-1/+1
*	fix __aeabi_read_tp oversight in arm atomics/tls overhaul•••calls to __aeabi_read_tp may be generated by the compiler to access TLS on pre-v6 targets. previously, this function was hard-coded to call the kuser helper, which would crash on kernels with kuser helper removed. to fix the problem most efficiently, the definition of __aeabi_read_tp is moved so that it's an alias for the new __a_gettp. however, on v7+ targets, code to initialize the runtime choice of thread-pointer loading code is not even compiled, meaning that defining __aeabi_read_tp would have caused an immediate crash due to using the default implementation of __a_gettp with a HCF instruction. fortunately there is an elegant solution which reduces overall code size: putting the native thread-pointer loading instruction in the default code path for __a_gettp, so that separate default/native code paths are not needed. this function should never be called before __set_thread_area anyway, and if it is called early on pre-v6 hardware, the old behavior (crashing) is maintained. ideally __aeabi_read_tp would not be called at all on v7+ targets anyway -- in fact, prior to the overhaul, the same problem existed, but it was never caught by users building for v7+ with kuser disabled. however, it's possible for calls to __aeabi_read_tp to end up in a v7+ binary if some of the object files were built for pre-v7 targets, e.g. in the case of static libraries that were built separately, so this case needs to be handled.	Rich Felker	2014-11-22	1	-4/+0
*	overhaul ARM atomics/tls for performance and compatibility•••previously, builds for pre-armv6 targets hard-coded use of the "kuser helper" system for atomics and thread-pointer access, resulting in binaries that fail to run (crash) on systems where this functionality has been disabled (as a security/hardening measure) in the kernel. additionally, builds for armv6 hard-coded an outdated/deprecated memory barrier instruction which may require emulation (extremely slow) on future models. this overhaul replaces the behavior for all pre-armv7 builds (both of the above cases) to perform runtime detection of the appropriate mechanisms for barrier, atomic compare-and-swap, and thread pointer access. detection is based on information provided by the kernel in auxv: presence of the HWCAP_TLS bit for AT_HWCAP and the architecture version encoded in AT_PLATFORM. direct use of the instructions is preferred when possible, since probing for the existence of the kuser helper page would be difficult and would incur runtime cost. for builds targeting armv7 or later, the runtime detection code is not compiled at all, and much more efficient versions of the non-cas atomic operations are provided by using ldrex/strex directly rather than wrapping cas.	Rich Felker	2014-11-19	1	-12/+1
*	manually "shrink wrap" fast path in pthread_once•••this change is a workaround for the inability of current compilers to perform "shrink wrapping" optimizations. in casual testing, it roughly doubled the performance of pthread_once when called on an already-finished once control object.	Rich Felker	2014-10-20	1	-8/+12
*	eliminate global waiters count in pthread_once	Rich Felker	2014-10-13	1	-9/+13
*	fix missing barrier in pthread_once/call_once shortcut path•••these functions need to be fast when the init routine has already run, since they may be called very often from code which depends on global initialization having taken place. as such, a fast path bypassing atomic cas on the once control object was used to avoid heavy memory contention. however, on archs with weakly ordered memory, the fast path failed to ensure that the caller actually observes the side effects of the init routine. preliminary performance testing showed that simply removing the fast path was not practical; a performance drop of roughly 85x was observed with 20 threads hammering the same once control on a 24-core machine. so the new explicit barrier operation from atomic.h is used to retain the fast path while ensuring memory visibility. performance may be reduced on some archs where the barrier actually makes a difference, but the previous behavior was unsafe and incorrect on these archs. future improvements to the implementation of a_barrier should reduce the impact.	Rich Felker	2014-10-10	1	-2/+6
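The fast path with an explicit barrier can be sketched as below. This is a simplified illustration under assumptions, not the pthread_once implementation from the commit: the names and state values are hypothetical, `__sync_synchronize` (a GCC/Clang builtin) stands in for the barrier operation from atomic.h, and contended callers spin where a real implementation would block.

```c
enum { DEMO_ONCE_NEW, DEMO_ONCE_RUNNING, DEMO_ONCE_DONE };

int demo_init_calls;

/* Example init routine: counts how many times it was run. */
void demo_init(void)
{
    demo_init_calls++;
}

int demo_once(volatile int *control, void (*init)(void))
{
    /* Fast path: plain load, then a barrier so the caller also observes
     * everything init() wrote before the state became DONE. */
    if (*control == DEMO_ONCE_DONE) {
        __sync_synchronize();
        return 0;
    }
    for (;;) {
        int old = __sync_val_compare_and_swap(control, DEMO_ONCE_NEW,
                                              DEMO_ONCE_RUNNING);
        if (old == DEMO_ONCE_NEW) {
            init();
            __sync_synchronize();  /* publish init's writes before DONE */
            *control = DEMO_ONCE_DONE;
            return 0;
        }
        if (old == DEMO_ONCE_DONE)
            return 0;  /* the compare-and-swap builtin is a full barrier */
        /* Another thread is running init: retry (sketch only spins). */
    }
}
```

Calling `demo_once` twice on the same control object runs the init routine once; the second call takes the fast path and pays only for the load and the barrier.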
*	add C11 thread creation and related thread functions•••based on patch by Jens Gustedt. the main difficulty here is handling the difference between start function signatures and thread return types for C11 threads versus POSIX threads. pointers to void are assumed to be able to represent faithfully all values of int. the function pointer for the thread start function is cast to an incorrect type for passing through pthread_create, but is cast back to its correct type before calling so that the behavior of the call is well-defined. changes to the existing threads implementation were kept minimal to reduce the risk of regressions, and duplication of code that carries implementation-specific assumptions was avoided for ease and safety of future maintenance.	Rich Felker	2014-09-07	9	-7/+82
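The two casts described in the entry above can be shown without creating a thread. This is a sketch with hypothetical names, not the code from the commit: the start function is stored under the POSIX-shaped pointer type, cast back to its real type before the call, and its int result is carried through a pointer to void via intptr_t (which assumes, as the entry does, that pointers to void can represent all int values).

```c
#include <stdint.h>

typedef int (*demo_c11_start_t)(void *);      /* shape of a C11 start function */
typedef void *(*demo_posix_start_t)(void *);  /* shape pthread_create expects */

/* Hypothetical trampoline: receives the function under the "wrong" type,
 * casts it back to its true type, and only then calls it, so the call
 * itself is well-defined. */
void *demo_run(demo_posix_start_t stored, void *arg)
{
    demo_c11_start_t fn = (demo_c11_start_t)stored;
    int res = fn(arg);
    return (void *)(intptr_t)res;
}

/* Recover the int result on the joining side. */
int demo_result(void *ret)
{
    return (int)(intptr_t)ret;
}

/* Example start function with the C11 signature. */
int demo_worker(void *arg)
{
    return *(int *)arg + 1;
}
```

A caller would pass `(demo_posix_start_t)demo_worker` to `demo_run` and read the outcome with `demo_result`.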
*	add C11 condition variable functions•••Because of the clear separation for private pthread_cond_t these interfaces are quite simple and direct.	Jens Gustedt	2014-09-06	6	-0/+57
*	add C11 mutex functions	Jens Gustedt	2014-09-06	6	-0/+69
*	add C11 thread functions operating on tss_t and once_flag•••These all have POSIX equivalents, but aside from tss_get, they all have minor changes to the signature or return value and thus need to exist as separate functions.	Jens Gustedt	2014-09-06	5	-0/+42
*	use weak symbols for the POSIX functions that will be used by C threads•••The intent of this is to avoid name space pollution of the C threads implementation. This has two sides to it. First we have to provide symbols that wouldn't pollute the name space for the C threads implementation. Second we have to clean up some internal uses of POSIX functions such that they don't implicitly drag in such symbols.	Jens Gustedt	2014-09-06	14	-28/+73
*	make non-waiting paths of sem_[timed]wait and pthread_join cancelable•••per POSIX these functions are both cancellation points, so they must act on any cancellation request which is pending prior to the call. previously, only the code path where actual waiting took place could act on cancellation.	Rich Felker	2014-09-05	2	-0/+3
*	refrain from spinning on locks when there is already a waiter•••if there is already a waiter for a lock, spinning on the lock is essentially an attempt to steal it from whichever waiter would obtain it via any priority rules in place, and is therefore undesirable. in the current implementation, there is always an inherent race window at unlock during which a newly-arriving thread may steal the lock from the existing waiters, but we should aim to keep this window minimal rather than enlarging it.	Rich Felker	2014-08-25	5	-5/+5
*	spin before waiting on futex in mutex and rwlock lock operations	Rich Felker	2014-08-25	3	-0/+20
*	spin in sem_[timed]wait before performing futex wait•••empirically, this increases the maximum rate of wait/post operations between two threads by 20-150 times on machines I tested, including x86 and arm. conceptually, it makes sense to do some spinning because semaphores are intended to be usable as a notification mechanism between threads, not just as locks, and low-latency notification is a valuable property to have.	Rich Felker	2014-08-25	1	-0/+5
*	sanitize number of spins in userspace before futex wait•••the previous spin limit of 10000 was utterly unreasonable. empirically, it could consume up to 200000 cycles, whereas a failed futex wait (EAGAIN) typically takes 1000 cycles or less, and even a true wait/wake round seems much less expensive. the new counts (100 for general wait, 200 in barrier) were simply chosen to be in the range of what's reasonable without having adverse effects on casual micro-benchmark tests I have been running. they may still be too high, from a standpoint of not wasting cpu cycles, but at least they're a lot better than before. rigorous testing across different archs and cpu models should be performed at some point to determine whether further adjustments should be made.	Rich Felker	2014-08-25	2	-2/+2
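A bounded spin before a blocking wait, combined with the "do not spin when there is already a waiter" rule from the more recent entry above, might look like the sketch below. All names are hypothetical; `demo_block` is a stub standing in for a futex wait and only counts calls, and the limit of 100 is the general-wait count quoted in the entry. A real wait would loop around the blocking call rather than call it once.

```c
int demo_blocking_waits;

/* Stub for a blocking futex wait; here it only records that it was called. */
void demo_block(volatile int *addr, int val)
{
    (void)addr;
    (void)val;
    demo_blocking_waits++;
}

/* Wait for *addr to change from val. waiters may be a null pointer. */
void demo_wait(volatile int *addr, volatile int *waiters, int val)
{
    int spins = 100;

    /* Spin only while nobody else is already waiting. */
    while (spins-- && (!waiters || !*waiters)) {
        if (*addr != val)
            return;  /* value changed while spinning: no syscall needed */
    }
    if (waiters)
        __sync_fetch_and_add(waiters, 1);
    if (*addr == val)
        demo_block(addr, val);
    if (waiters)
        __sync_fetch_and_sub(waiters, 1);
}
```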
*	fix false ownership of stdio FILEs due to tid reuse•••this is analogous to commit fffc5cda10e0c5c910b40f7be0d4fa4e15bb3f48 which fixed the corresponding issue for mutexes. the robust list can't be used here because the locks do not share a common layout with mutexes. at some point it may make sense to simply incorporate a mutex object into the FILE structure and use it, but that would be a much more invasive change, and it doesn't mesh well with the current design that uses a simpler code path for internal locking and pulls in the recursive-mutex-like code when the flockfile API is used explicitly.	Rich Felker	2014-08-23	1	-0/+2
*	fix fallback checks for kernels without private futex support•••for unknown syscall commands, the kernel produces ENOSYS, not EINVAL.	Rich Felker	2014-08-22	4	-4/+4
*	fix use of uninitialized memory with application-provided thread stacks•••the subsequent code in pthread_create and the code which copies TLS initialization images to the new thread's TLS space assume that the memory provided to them is zero-initialized, which is true when it's obtained by pthread_create using mmap. however, when the caller provides a stack using pthread_attr_setstack, pthread_create cannot make any assumptions about the contents. simply zero-filling the relevant memory in this case is the simplest and safest fix.	Rich Felker	2014-08-22	1	-0/+2
*	further simplify and optimize new cond var•••the main idea of the changes made is to have waiters wait directly on the "barrier" lock that was used to prevent them from making forward progress too early rather than first waiting on the atomic state value and then attempting to lock the barrier. in addition, adjustments to the mutex waiter count are optimized. previously, each waking waiter decremented the count (unless it was the first) then immediately incremented it again for the next waiter (unless it was the last). this was a roundabout way of achieving the equivalent of incrementing it once for the first waiter and decrementing it once for the last.	Rich Felker	2014-08-18	1	-29/+21
*	simplify and improve new cond var implementation•••previously, wake order could be unpredictable: if a waiter happened to leave its futex wait on the state early, e.g. due to EAGAIN while restarting after a signal handler, it could acquire the mutex out of turn. handling this required ugly O(n) list walking in the unwait function and accounting to remove waiters that already woke from the list. with the new changes, the "barrier" locks in each waiter node are only unlocked in turn. in addition to simplifying the code, this seems to improve performance slightly, probably by reducing the number of accesses threads make to each other's stacks. as an additional benefit, unrecoverable mutex re-locking errors (mainly ENOTRECOVERABLE for robust mutexes) no longer need to be handled with deadlock; they can be reported to the caller, since the unlocking sequence makes it unnecessary to rely on the mutex to synchronize access to the waiter list.	Rich Felker	2014-08-18	1	-40/+22