summaryrefslogtreecommitdiffhomepage
path: root/src/chunklets/README-fastspin
diff options
context:
space:
mode:
authorMichael Smith <mikesmiffy128@gmail.com>2023-07-29 14:32:06 +0100
committerMichael Smith <mikesmiffy128@gmail.com>2023-08-02 21:02:31 +0100
commit9a0d8730fa977f666b5c12e4c5901e7d0391e245 (patch)
tree87eebcdcef04ae1e7348ef80e972c08aa4783649 /src/chunklets/README-fastspin
parentd337b09936ecd90bad07b28b48b7103395d97ce5 (diff)
Make various preparations for upcoming features
A lot of this is random WIP from a while back, at least a month ago, and is being committed now to get it out of the way so that other patches can be brought in and integrated against it without causing headaches. Also rolled into this commit is a way to distinguish plugin_unload from exiting the game. This is required for another soon-to-be-integrated feature to avoid crashing on exit, and could in theory also be used to speed up unloading on exit in future. While we're at it, this also avoids the need to linearly scan through the plugin list to do the old branch unloading fix, because we can. Rough summary of the other smaller stuff I can remember doing: - Rework bitbuf a bit - Add some cryptographic nonsense in ac.c (not final at all) - Introduce the first couple of "chunklets" libraries as a sort-of subproject of this one - Tidy up random small bits and bobs - Add source for a small keypair generation tool - Rework democustom to be very marginally more useful
Diffstat (limited to 'src/chunklets/README-fastspin')
-rw-r--r--src/chunklets/README-fastspin109
1 files changed, 109 insertions, 0 deletions
diff --git a/src/chunklets/README-fastspin b/src/chunklets/README-fastspin
new file mode 100644
index 0000000..8052415
--- /dev/null
+++ b/src/chunklets/README-fastspin
@@ -0,0 +1,109 @@
+fastspin.{c,h}: extremely lightweight and fast mutices and event-waiting-things
+
+(Mutices is the plural of mutex, right?)
+
+== Compiling ==
+
+ gcc -c -O2 [-flto] fastspin.c
+ clang -c -O2 [-flto] fastspin.c
+ tcc -c fastspin.c
+ cl.exe /c /O2 /std:c17 /experimental:c11atomics fastspin.c
+
+In most cases you can just drop the .c file straight into your codebase/build
+system. LTO is advised to avoid dead code and enable more efficient calls
+including potential inlining.
+
+NOTE: On Windows, it is necessary to link with ntdll.lib.
+
+== Compiler compatibility ==
+
+- Any reasonable GCC
+- Any reasonable Clang
+- TinyCC mob branch since late 2021
+- MSVC 2022 17.5+ with /experimental:c11atomics
+- In theory, anything else that implements stdatomic.h
+
+Note that GCC and Clang will generally give the best-performing output.
+
+Once the .c file is built, the public header can be consumed by virtually any C
+or C++ compiler, as well as probably most half-decent FFIs.
+
+Note that the .c source file is not C++-compatible, only the header is. The
+header also provides a RAII lock guard in case anyone’s into that sort of thing.
+
+== API usage ==
+
+See documentation comments in fastspin.h for a basic idea. Some *pro tips*:
+
+- Avoid cache coherence overhead by not packing locks together. Ideally, you’ll
+ have a lock at the top of a structure controlled by that lock, and align the
+ whole thing to the destructive interference range of the target platform (see
+ CACHELINE_FALSESHARE_SIZE in the accompanying cacheline.h).
+
+- Avoid putting more than one lock in a cache line. Ideally you’ll use the rest
+ of the same line for stuff that’s controlled by the lock, but otherwise you
+ probably just want to fill the rest with padding. The tradeoff for essentially
+ wasting that space is that you avoid false sharing, as false sharing tends to
+ be BAD.
+
+- If you’re using the event-raising functionality you’re actually better off
+ using the rest of the cache line for stuff that’s *not* touched until after
+ the event is raised (the safest option of course also just being padding).
+
+- You should actually measure this stuff, I dunno man.
+
+Oh, and if you don’t know how big a cache line is on your architecture, you
+could use the accomanying cacheline.h to get some reasonable guesses. Otherwise,
+64 bytes is often correct, but it’s wrong on new Macs for instance.
+
+== OS compatibility ==
+
+First-class:
+- Linux 2.6+ (glibc or musl)
+- FreeBSD 11+
+- OpenBSD 6.2+
+- NetBSD ~9.1+
+- DragonFly 1.1+
+- Windows 8+ (only tested on 10+)
+- macOS/Darwin since ~2016(?) (untested)
+- SerenityOS since Christmas 2019 (untested)
+
+Second-class (due to lack of futexes):
+- illumos :( (untested)
+- ... others?
+
+* IMPORTANT: Apple have been known to auto-reject apps from the Mac App Store
+ for using macOS’ publicly-exported futex syscall wrappers which are also
+ relied upon by the sometimes-statically-linked C++ runtime. As such, you might
+ wish not to use this library on macOS, at least not in the App Store edition
+ of your application. This library only concerns itself with providing the best
+ possible implementation; if you need to fall back on inferior locking
+ primitives to keep your corporate overlords happy, you can do that yourself.
+
+== Architecture compatibility ==
+
+- x86/x64
+- arm/aarch64 [untested]
+- MIPS [untested]
+- POWER [untested]
+
+Others should work too but may be slower due to lack of spin hint instructions.
+Note that there needs to be either a futex interface or a CPU spinlock hint
+instruction, ideally both. Otherwise performance will be simply no good during
+contention. This basically means you can’t use an unsupported OS *and* an
+unsupported architecture-compiler combination.
+
+== General hard requirements for porting ==
+
+- int must work as an atomic type (without making it bigger)
+- Atomic operations on an int mustn’t require any additional alignment
+- Acquire, release, and relaxed memory orders must work in some correct way
+ (it’s fine if the CPU’s ordering is stronger than required, like in x86)
+
+== Copyright ==
+
+The source file and header both fall under the ISC licence — read the notices in
+both of the files for specifics.
+
+Thanks, and have fun!
+- Michael Smith <mikesmiffy128@gmail.com>