4/21/2026 at 9:51:56 PM
So I've definitely played around with a lot of these ideas, because the thing that I'm trying to emulate/port/reverse engineer is a 16-bit Windows application, which coincidentally means that a lot of the existing infrastructure just keels over and dies in supporting it.I originally started the emulation route, but gave that up when I realized just how annoying it was to emulate the libc startup routine that invoked WinMain, and from poking around at the disassembly, I really didn't need to actually support anything before WinMain. And then poking further, I realized that if I patched the calls to libc, I only had to emulate a call to malloc at the, well, malloc level, as opposed to implementing the DOS interrupt that's equivalent to sbrk. (And a side advantage to working with Win16 code is that, since every inter-segment call requires a relocation, you can very easily find all the calls to all libc methods, even if the disassembler is still struggling with all of the jump tables.)
After realizing that the syscalls to modify the LDT still exist and work on Linux (they don't on Windows), I switched from trying to write my own 386 emulator to actually just natively executing all the code in 16-bit mode in 64-bit mode and relying on the existing relocation methods to insert the thunking between 16-bit and 64-bit code [1]. The side advantage was that this also made it very easy to just call individual methods given their address rather than emulating from the beginning of main, which also lets me inspect what those methods are doing a lot better.
Decompilation I've tried to throw in the mix a couple of times, but... 16-bit code really just screws with everything. Modern compilers don't have the ontology to really support it. Doing my own decompilation to LLVM IR runs into the issue that LLVM isn't good at optimizing all of the implemented-via-bitslicing logic, and I still have to write all of the patterns manually just to simplify the code for "x[i]" where x is a huge pointer. And because it's hard to reinsert recompiled decompiled code into the binary (as tools don't speak Win16 file formats anymore), it's difficult to verify that any decompilation is correct.
[1] The thunking works by running 16-bit code in one thread and 64-bit code in another thread and using hand-rolled spinlocks to wait for responses to calls. It's a lot easier than trying to make sure that all the registers and the stack are in the appropriate state.
by jcranmer
4/21/2026 at 10:58:31 PM
[post author] I went down some similar paths in retrowin32, though 32-bit x86 is likely easier.I was also surprised by how much goop there is between startup and main. In retrowin32 I just implemented it all, though I wonder how much I could get away with not running it in the Theseus replace-some-parts model.
I mostly relied on my own x86 emulator, but I also implemented the thunking between 64-bit and 32-bit mode just to see how it was. It definitely was some asm but once I wrapped my head around it it wasn't so bad, check out the 'trans64' and 'trans32' snippets in https://github.com/evmar/retrowin32/blob/ffd8665795ae6c6bdd7... for I believe all of it. One reframing that helped me (after a few false starts) was to put as much code as possible in my high-level language and just use asm to bridge to it.
by evmar
4/22/2026 at 3:23:31 AM
Yeah, 32-bit x86 is somewhat easier because everything's in the same flat address space, and you at least have a system-wide code32 gdt entry that means you can ignore futzing around with the ldt. 16-bit means you get to deal with segmented memory, and the cherry on top is that gdb just stops being useful since it doesn't know anything about segmented memory (I don't think Linux even makes it possible to query the LDT of another process, even with ptrace, to be fair).As for trying to ignore before main... well, the main benefit for me was being able to avoid emulating DOS interrupts entirely, between skipping the calls to set up various global variables, stubbing out some of the libc implementations, and manually marking in the emulator that code page X was 32-bit (something else that sends tools in a tizzy, a function switching from 16-bit to 32-bit mid-assembly code).
16-bit is weird and kinda fun to work with at times... but there's also a reason that progress on this is incredibly slow for me.
by jcranmer
4/22/2026 at 2:37:45 AM
Interesting work. I’m also doing the same thing with a Win16 Visual Basic app (with some third party libraries).Eventually I just decided to run the app inside an accurate emulator with a full Windows 3.1 environment since nothing else was stable.
by trollbridge
4/21/2026 at 9:55:45 PM
Have you tried just asking an LLM to reverse it to C and then work from there?by charcircuit
4/22/2026 at 2:38:52 AM
I tried that, and ran into roadblocks since the app under test is old Visual Basic (which is half compiled and half interpreted) and then it used a third party library which has quite sophisticated anti-decompilation features.by trollbridge