Mysterious Files PH

Thursday, April 9, 2026

Printed Sleeve Gives Keys Some Grip

April 09, 2026

[Enginerd]’s chonky key handle is a beautiful use of 3D printing that helps people help themselves. The large wings, indented faces, and beefed-up grip make a typical house key much easier for someone with arthritis or difficulty gripping those brass slivers. Bright filaments in different colors can also help someone with vision limitations. The thing that will not improve is the space in your pocket or purse.

The design requires only a tiny bit of plastic and prints without supports. What sets it apart from similar models is that it needs no double-sided tape or bolts, only a keyring, though someone may have to assemble it for the user. The author is clever enough to use an uncut blank in the project photo so that no one will be decoding and copying their house key. We would wager they have read Hackaday if they are so prepared.

Some of the people who purchased early consumer 3D printers already need these kinds of builds, and there is no shortage of intelligent people creating remarkable open-source designs.


TurboQuant: Reducing LLM Memory Usage With Vector Quantization

April 09, 2026

Large language models (LLMs) aren’t actually giant computer brains. Instead, they are effectively massive vector spaces in which the probabilities of tokens occurring in a specific order are encoded. Billions of parameters, times N bits per parameter, equals N-billion bits of storage required for a full model. Since increasing the number of parameters makes the models appear smarter, most of the effort on reducing the storage they require has gone into reducing the size of the parameters themselves.

Vector quantization (VQ) is a method that can compress the vectors calculated during inference to take up less space without significant loss of data. Google’s recently published pre-print paper on TurboQuant covers an LLM-oriented VQ algorithm that’s claimed to provide up to a 6x compression level with no negative impact on inference times.

The tokens aren’t directly encoded in the vector space, but their associated key and value projections are, and because inference produces a single token at a time, this creates the need for a key-value (KV) cache, the size of which scales with the size of the model and the length of the context. Compressing the KV cache with VQ thus reduces its size and correspondingly speeds up look-ups, since less memory has to be traversed. The catch is that quantization, by its nature, loses some accuracy; the trick is to apply VQ in such a way that the loss is not noticeable.

Other aspects the TurboQuant algorithm had to take into account were fast computation, to keep up with real-time requirements, and compatibility with so-called ‘AI accelerator’ hardware.

Key-Value Cache

A basic way to look at the KV cache in LLMs is that it caches the results of previous inference cycles. An in-depth explanation can be found, for example, in this article by Sebastian Raschka. In the case of generating a phrase of three words starting with the word ‘Time’, we can see the following repeated computations:

Repeated computations in an LLM without KV cache. (Credit: Sebastian Raschka)

Considering that inference is rather expensive computation-wise, you really want to cache these calculated values. This provides a massive boost in performance and a much lower CPU load, but because there’s no such thing as a free lunch, the catch is rapidly increasing memory usage.
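As a toy illustration of why the cache matters (this is not a real transformer, it just counts work), here is a sketch of how many per-token key/value computations are saved; the function names are ours:

```python
# Toy illustration: count key/value projection computations with and
# without a KV cache when generating a sequence of tokens one at a time.

def generate(num_tokens: int, use_cache: bool) -> int:
    """Return how many per-token K/V projections are computed."""
    cache = []          # stands in for the stored K/V pairs of earlier tokens
    projections = 0
    for step in range(1, num_tokens + 1):
        if use_cache:
            # Only the newest token needs its K/V computed; the rest are cached.
            cache.append(step)
            projections += 1
        else:
            # Every step recomputes K/V for all tokens seen so far.
            projections += step
    return projections

# For the article's three-word example ('Time', plus two more tokens):
assert generate(3, use_cache=False) == 6   # 1 + 2 + 3 recomputations
assert generate(3, use_cache=True) == 3    # one projection per new token
```

The quadratic-versus-linear growth is the whole story: without the cache the work per generated token keeps climbing, with it the work stays constant, at the price of storing every earlier key and value in memory.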

Correspondingly, we now have a big in-memory cache to manage, along with memory management routines to make sure that the KV cache doesn’t exceed its allocated memory pool:

KV cache schematic with memory pool management. (Credit: NVIDIA)

As covered in a December 2025 NVIDIA Developer article, KV cache optimization has been a topic for a while, with the article in question covering NVFP4. This is a VQ approach that reduces the precision of the KV cache from 16-bit floating point to 4-bit (FP4). Meanwhile, production systems already employ 8-bit quantization, also using a floating-point format (FP8).

An additional cost here is that FP4 has to be dequantized back to FP8, which would seem to be an implementation detail in the current version. Compared to FP8 quantization, FP4 reduces latency by up to 3 times and halves the memory required, while accuracy is negatively impacted by ‘less than’ 1% compared to FP8 due to quantization error.
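To make the quantization error concrete, here is a hedged sketch of block-wise E2M1-style 4-bit quantization as described for NVFP4; the helper names are ours, and a plain float stands in for the shared FP8 scale, so treat this as an illustration rather than NVIDIA's implementation:

```python
# Sketch of block-wise 4-bit quantization: each block of 16 values shares
# one scale factor, and each value is rounded to the nearest representable
# E2M1 magnitude. Real NVFP4 stores the scale as FP8; a float stands in here.

FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # E2M1 positives

def quantize_block(block):
    """Quantize then dequantize one 16-value block, returning the result."""
    scale = (max(abs(x) for x in block) / 6.0) or 1.0  # map block max to 6.0
    out = []
    for x in block:
        # Snap the scaled magnitude to the nearest 4-bit representable value.
        mag = min(FP4_MAGNITUDES, key=lambda m: abs(abs(x) / scale - m))
        out.append(mag * scale * (1.0 if x >= 0 else -1.0))
    return out

block = [0.1 * i for i in range(16)]
deq = quantize_block(block)
err = max(abs(a - b) for a, b in zip(block, deq))  # worst-case rounding error
```

Running this on a simple ramp of values shows the trade-off in miniature: every value comes back close to, but not exactly, its original, and that rounding error is exactly what the accuracy figures above are measuring.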

Accuracy here is important as it factors into the next auto-complete step when the LLM’s probability vector space is once again rummaged through for the next statistically most likely follow-up token. KV cache VQ compression is thus always a trade-off between memory use and accuracy. In short, the same issues apply as with all implementations of quantization-based compression, including the tragic absence of any free lunch.

Turbo Quantization

So what magic did Google’s intrepid engineers pull off to improve on NVIDIA’s NVFP4 approach? The key is in how the quantization is performed, as it isn’t simply a matter of truncating or throwing away data and rounding to the nearest available value. Instead, a series of steps is applied that seeks to minimize the quantization error, which in the case of TurboQuant is (confusingly) an algorithm called PolarQuant followed by the QJL (quantized Johnson-Lindenstrauss) algorithm.

Annoyingly for the non-mathematically gifted/educated among us, Google didn’t simply provide a straightforward visualization like that for NVFP4, which is understandable even for us software developers and other casuals. For NVIDIA’s format we can see that it takes the form of a single sign bit, two exponent bits, and one mantissa bit (E2M1), as well as a shared FP8 scale per block of 16 values.

One step where TurboQuant appears to differ is the PolarQuant algorithm, which applies a polar-coordinate transformation to the vectors, after which a typical normalization step can apparently be skipped.

Overview of recursive polar transformation procedure. (Credit: Insu Han et al., 2026)

This polar transformation is preceded by the application of a random projection matrix, a type of preconditioning that pushes the data toward a normal distribution, with proof and the full algorithm provided in the PolarQuant arXiv paper for those who desire more detail.
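For a rough intuition, here is a much-simplified 2D sketch of the polar idea; the real PolarQuant works recursively on high-dimensional vectors, so treat this only as an illustration of why polar form is friendly to quantization, not as the paper's algorithm:

```python
# Simplified 2D illustration of quantizing in polar form: represent a
# vector as (radius, angle) and snap the angle to a small codebook. The
# angle is bounded regardless of the vector's scale, which is what makes
# it pleasant to quantize with few bits.
import math

ANGLE_LEVELS = 8  # a 3-bit angle codebook

def quantize_direction(x: float, y: float):
    """Quantize the direction of (x, y) to 3 bits, keeping the radius."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x)
    step = 2 * math.pi / ANGLE_LEVELS
    q = round(theta / step) * step          # snap angle to the 3-bit grid
    return r * math.cos(q), r * math.sin(q)

qx, qy = quantize_direction(1.0, 0.1)
# The reconstruction error is bounded by the angular step size times the
# radius, no matter how large or small the vector's components are.
```

This boundedness is the appealing property: in Cartesian form each component can range freely, while in polar form the angles live on a fixed interval, so a small fixed codebook covers them with a known worst-case error.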

Of note is that PolarQuant employs the Johnson-Lindenstrauss lemma, which Google researchers used as the basis for a JL-based transform called QJL. From reading the blog post it’s not immediately clear whether QJL is directly integrated into PolarQuant or is an additional step, due to the muddled messaging on Google’s end. From the benchmarking results it does appear that QJL is an additional step.

What we know is that the final format TurboQuant ends up with is a three-bit value, which would logically be 1 bit smaller than NVFP4, or an approximately 25% smaller KV cache for the same amount of data.
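A back-of-the-envelope sizing shows where that 25% comes from; the model dimensions below are invented for illustration and are not taken from the paper, and per-block scale overhead is ignored:

```python
# Rough KV-cache sizing: 2 (one K and one V per position) * layers *
# KV heads * head dimension * sequence length * bytes per stored value.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bits):
    return 2 * layers * kv_heads * head_dim * seq_len * bits // 8

# Hypothetical mid-size model at an 8192-token context:
args = dict(layers=32, kv_heads=8, head_dim=128, seq_len=8192)
fp16 = kv_cache_bytes(bits=16, **args)  # 1 GiB at full 16-bit precision
fp4 = kv_cache_bytes(bits=4, **args)    # 256 MiB at NVFP4-style 4-bit
tq3 = kv_cache_bytes(bits=3, **args)    # 192 MiB at a 3-bit format

# 3-bit storage is 25% smaller than 4-bit, and FP16 is 5.33x larger
# than the 3-bit cache, before counting any shared-scale overhead.
```

The exact model dimensions wash out of the comparison: whatever the cache holds, moving from 4 bits to 3 bits per value trims a quarter of it.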

Judging On Merits

Comparison and benchmark data in the Google blog post and associated papers do not provide direct comparisons with NVFP4, and the few numbers that are thrown out are rather inconsistent or left unspecified. Take the claim of ‘at least 6x smaller memory size’, for example: the blog text does not clearly specify what this is relative to, while it then tosses out an 8x performance increase for 4-bit TurboQuant compared to FP32.

Although with some more digging and poking of the available data it might be possible to glean some actual performance information from the provided files, it’s rather vexing how vague Google’s messaging is kept. Not to mention the lack of direct benchmarking against what would be the biggest competitors in the space.

It is definitely true that VQ is a thing for LLM KV cache compression, as we have seen, and NVIDIA ‘accelerator cards’ provide hardware acceleration for the feature, so this is the reality that TurboQuant would have to compete with. Based on the few clear facts that we do have, it doesn’t appear to be quite the revolution that the hype machine has made it out to be; more likely it is a modest bump over NVFP4 that NVIDIA will probably trump again with its next quantized format.

It will of course be most interesting to see how this will play out once TurboQuant makes its way out of the laboratory into the wider world and we start seeing independent benchmarking performed.


The Brits Made a Rocket. What Happened To It?

April 09, 2026

Like many long-established broadcasters, the BBC has put out a selection of its archive material for us all to enjoy online. Their most recent release may be of interest to Hackaday readers, and has more than a bit of personal interest to your scribe, as it visits the Spadeadam rocket test range on the occasion of its closure in 1973. This marked the final chapter in the story of Blue Streak, the British intercontinental missile project that later became part of the first European space launcher.

It’s possible citizens of every country see their government as uniquely talented at throwing away taxpayers’ money, but the sad story here isn’t Blue Streak itself, which was obsolete as a missile by the time it was finished. Instead it lies in the closure of the test range as part of the ill-advised destruction of a nascent and successful space industry, just as it had made the UK the sixth nation to successfully place a satellite in orbit.

We normally write in the third person in our daily posts here at Hackaday, but for now there’s a rare switch into the first person. My dad spent a large part of the 1950s working as a technician for de Havilland Propellers, later part of Hawker Siddeley, and then British Aerospace. He was part of the team working on Blue Streak at Spadeadam and the other test site at RAF Westcott in Buckinghamshire, and we were brought up on hair-raising tales of near-disasters in the race to get British nukes flying. He’s not one of the guys in the video below, as by that time he was running his metalwork business in Oxfordshire, but I certainly recognise the feeling of lost potential they express. Chances are I’ll never visit what remains of the Spadeadam test stands in person as the site is now the UK’s electronic warfare test range, so the BBC film represents a rare chance for a closer look.

In a related story, the trackers for the same program in Australia were saved from the scrapheap.


Wednesday, April 8, 2026

Variable-Pitch Propellers for More Efficient Quadcopter

April 08, 2026

Quadcopters tend to have very poor efficiency because of their high disk loading. High disk loading – that is, how much weight each square meter of area swept by the propellers must carry – is almost unavoidable with conventional quadcopters, which are controlled by throttling the four props. Make the propellers too big, and their inertia slows down that control loop, leading to stability problems. [rctestflight] had an idea to solve this by borrowing a technology from the world of fixed-wing aviation: variable-pitch propellers.
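To put some numbers on disk loading, here is a quick comparison using made-up but plausible figures: the same mass lifted by four small props versus one large helicopter-style rotor.

```python
# Disk loading = weight carried per square meter of swept rotor area.
# The masses and prop sizes below are illustrative, not [rctestflight]'s.
import math

def disk_loading(mass_kg: float, prop_diameter_m: float, num_props: int) -> float:
    """Return disk loading in kg per square meter of swept area."""
    swept_area = num_props * math.pi * (prop_diameter_m / 2) ** 2
    return mass_kg / swept_area

quad = disk_loading(1.5, 0.127, 4)   # 1.5 kg on four 5-inch props
heli = disk_loading(1.5, 0.40, 1)    # same 1.5 kg on one 40 cm rotor
```

Even though the single rotor has a smaller total diameter than four props laid end to end, its swept area is far larger, so its disk loading comes out well under half that of the quad; lower disk loading means less power needed per unit of thrust.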

In aircraft use, they are nothing new, dating back to the end of the First World War. They’re made for everything from the largest turboprops to the 75 kW (100 hp) Rotax 912. By varying the propeller pitch, you can keep the engine turning in its ideal RPM range but still vary thrust by taking a deeper or shallower ‘bite’ out of the air with each sweep of the prop. You can probably see how this applies to the quadcopter: a well-designed pitch-change mechanism is going to be much quicker than throttling a big prop with lots of rotational inertia. That’s the theory.

To test it, [rctestflight] builds some large 3D-printed variable-pitch props and hooks them up to regular drone motors via a belt drive, before going on – you guessed it – an RC test flight. To make that work, he has the pitch servo driven from what would normally be the flight controller’s thrust output to each motor. Aside from the vibrations from imperfect balance on the 3D-printed props, it flies quite well – and much better with pitch control than by trying to vary the RPM of those heavy props. He’s even able to reverse the propeller pitch, making this perhaps the first quadcopter capable of autorotation. Well, almost, given that it lost control and came apart when he cut the throttle.

As for efficiency, it is exactly what you’d expect from the lower disk loading – higher than a conventional quad – even with losses from the belt drive and the high-friction surface of a 3D print. Speaking of 3D prints, the props did hold up to the maximum RPM he could throw at them, so no ‘kaboom’ in this video. There is a fun rotary subwoofer bonus at the end, though.

Overall, [rctestflight] thinks his variable-pitch quadcopter proves the concept, but that if you’re going to all this effort you may as well build a helicopter and have fewer points of failure. We kind of have to agree. That is how it worked out historically, after all.

This isn’t the first time we’ve seen hackers trying to improve drone efficiency– there was the hybrid ‘giant propeller’ drone a while back, and the ‘slap a wing on it’ technique featured more recently.


Dodging a 60-Year-Old Design Flaw In Your RAM

April 08, 2026
A stick of DDR4 in DIMM format held by some alligator clips

Modern computers use dynamic RAM, a technology that allows very compact bits in return for a refresh pause of about 400 nanoseconds every 3-4 microseconds. But what if you can’t afford even such a tiny holdup? [LaurieWired] goes into excruciating detail about how to avoid this delay.

But first, why do we care? It once again comes down to high-frequency trading; a couple of nanoseconds of latency can be the difference between winning and losing a buy order. You’ll likely miss all the caches and need to fetch data from the remote land of main memory. And if you get unlucky, you’ll be waiting on that price for a precious 400+ nanoseconds! [Laurie] explains the problems faced in trying to avoid this penalty; the trick is to keep two copies of the data on independent refresh timers. That’s easier said than done: not only does the operating system hide physical addresses from you, but the memory controllers themselves also scramble the addresses going to the underlying RAM!
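A toy model of the dodge [Laurie] describes (two copies on staggered refresh schedules, always reading the one that isn't busy) might look like this; the timing numbers echo the article but are purely illustrative, and real hardware gives you nothing like this clean an interface:

```python
# Toy scheduler: two copies of the data live in DRAM regions whose refresh
# windows never overlap, so at any instant at least one copy is readable
# without stalling. Timings are illustrative: a 400 ns refresh every 4 us.

REFRESH_PERIOD_NS = 4000
REFRESH_BUSY_NS = 400
# Stagger the second copy's refresh schedule by half a period.
OFFSETS = (0, REFRESH_PERIOD_NS // 2)

def is_refreshing(now_ns: int, offset_ns: int) -> bool:
    """True while this copy's DRAM region is busy with a refresh."""
    return (now_ns - offset_ns) % REFRESH_PERIOD_NS < REFRESH_BUSY_NS

def pick_copy(now_ns: int) -> int:
    """Return the index of a copy that can be read without waiting."""
    for i, offset in enumerate(OFFSETS):
        if not is_refreshing(now_ns, offset):
            return i
    raise RuntimeError("both copies refreshing (offsets poorly chosen)")

# With a half-period stagger, a stall-free copy exists at every instant.
assert all(pick_copy(t) in (0, 1) for t in range(0, 8000, 50))
```

The hard part, as the post explains, is the addressing: getting two copies into regions with genuinely independent refresh timing means fighting both the OS's virtual memory and the controller's address scrambling.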

For the real computer architecture nerds, there’s a lot more to it, and [Laurie] goes over it in meticulous detail in the video after the break.

Thanks to [Keith Olson] for the tip!


Bending Faux-Neon LEDs Makes for Animations Glass Tubes Can’t Match

April 08, 2026

Odds are, if you like neon lights, you’re not thrilled with the LED faux-“neon” strips that are supposed to replace them. They’ve got their advantages, but the light quality of RGB LEDs lacks something compared to the emission spectrum of a noble gas, at least to purists. On the other hand, you cannot create an animation by bending glass tubes, as [David Hamp-Gonsalves] has demonstrated with his Neon Animated Eye.

Back in the day, you’d have needed dozens of tubes for a flickery animation, but [David] figured that since these LED strips are flexible, why not flex them? He’s using addressable LEDs — WS2812s, specifically — so activating and deactivating the pupil of the eye is easy-peasy. Opening and closing the lid is accomplished with a geared motor driven by a TB6612 driver turning a barrel cam. The ends of the stiff LED strip being brought together and pulled apart produce the blinking effect here, but as [David] points out, you’re hardly limited to that specific motion. There’s a whole world of Tron-like glowing animatronics that can be created with this technique. Code and STLs are available on GitHub, though, if you want to replicate the eye exactly.

[David] says he’d like to see this in a storefront someday, but given that fatigue life is a thing, it might be something to keep in your back pocket for seasonal displays like Christmas and Halloween rather than something that’s going to run 24/7. On the other hand, if you’re careful about limiting flexion and which faux-neon strip you buy, you might be able to create an animation that can last for years.

This is hardly the first time we’ve seen these faux-neon strips, but it is the first time we’ve seen them animated. We can’t help but think the Hauntimator software we featured before would be a good pairing with this hack.


2001: An Air Quality Odyssey

April 08, 2026

2001: A Space Odyssey not only pushed the boundaries of filmmaking, but introduced us to one of the most enduring villains in all of media. The HAL 9000 artificial intelligence was human-like but inhuman, a singular uncanny red light on a wall, tasked not only with control of a spaceship and its inner workings but also with being a companion for its occupants. It’s gone on to be the inspiration and basis of many projects around here, where it is generally given much less scope than control of a spaceship and instead is tasked with something like monitoring air quality in a home.

Called the PAL 8000 by its creator [Arnov], this build uses a Raspberry Pi Pico 2 at its core, which monitors a volatile organic compound (VOC) sensor to take air quality measurements. The device features a custom 3D printed enclosure with glowing LEDs and plays contextual audio responses based on air quality levels, completing the HAL 9000 theme. The project also includes a local web dashboard which reports on its data, allowing users to see information in real time rather than relying on HAL’s voice reports alone.
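As a sketch of the kind of mapping such a build might use from VOC readings to contextual responses, here is a minimal example; the thresholds and phrases are invented for illustration and are not [Arnov]'s actual firmware:

```python
# Hypothetical air-quality-to-response mapping for a HAL-themed monitor.
# VOC index thresholds (on the 0-500 scale many sensors report) and the
# reply strings are made up for this sketch.

def hal_response(voc_index: int) -> str:
    """Map a VOC index reading to a contextual spoken reply."""
    if voc_index < 100:
        return "All systems nominal."
    if voc_index < 250:
        return "I'm detecting elevated VOC levels. Perhaps open a window."
    return "I'm sorry, I'm afraid I can't ignore this air quality."
```

On real hardware the reading would come from the sensor driver and the string would be fed to an audio playback routine, but the core of the "contextual responses" feature is just a threshold table like this one.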

For those looking to build other HAL-inspired projects, [Arnov] has made many of the printing files available on the project’s site. It’s a well-polished build faithful to the source material and could be a great addition to any home automation system for many other tasks beyond air quality monitoring. Perhaps something like a more general-purpose voice assistant, minus the megalomania.