{"id":22,"date":"2022-11-29T12:14:00","date_gmt":"2022-11-29T12:14:00","guid":{"rendered":"https:\/\/thebrokenpipe.com\/blog\/?p=22"},"modified":"2026-04-28T11:03:14","modified_gmt":"2026-04-28T11:03:14","slug":"reverse-engineering-pc-dos-1-00s-bios-and-boot-sector","status":"publish","type":"post","link":"https:\/\/thebrokenpipe.com\/blog\/reverse-engineering-pc-dos-1-00s-bios-and-boot-sector\/","title":{"rendered":"Reverse Engineering PC-DOS 1.00&#8217;s BIOS and Boot Sector"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">I wanted to get familiar with the IBM PC <code>INT 1xH<\/code> BIOS interrupts and explore how they&#8217;re actually used in practice, all in preparation for a challenge project. Reverse engineering the BIOS of PC-DOS seemed like the perfect exercise &#8211; the DOS BIOS handles all input and output for the DOS kernel and applications, so it naturally relies heavily on the PC BIOS <code>INT 1xH<\/code> interrupts. Plus, reverse engineering tends to give a much deeper understanding than just reading documentation online. Since I was already going to be digging into the BIOS, I figured I might as well reverse engineer the boot sector too.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So, which version of the PC-DOS BIOS and boot sector should I go with? To keep things simple, it made sense to start with the earliest version &#8211; PC-DOS 1.00. Conveniently, there was already a fully annotated disassembly of its <a href=\"https:\/\/www.pagetable.com\/?p=184\">BIOS<\/a> and <a href=\"https:\/\/www.pagetable.com\/?p=165\">boot sector<\/a> by <a href=\"http:\/\/www.michael-steil.de\">Michael Steil<\/a>. That said, this was primarily a learning exercise for me, so I avoided referring to his work while doing my own. 
As an added challenge, I wanted my disassemblies to produce binaries identical to the originals when assembled using the original assembler.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"bios\">Reversing the BIOS<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The first step was extracting the DOS BIOS from the diskette image. I opened the PC-DOS 1.00 disk image in a hex editor and noticed there&#8217;s no BIOS Parameter Block (BPB) seen in later FAT filesystems. I could&#8217;ve added a BPB, but I took the simpler route and extracted the BIOS directly using the hex editor. It&#8217;s the first file after the root directory. This also saved me from having to deal with system and hidden file attributes. I saved the extracted file as <code>IBMBIO.COM<\/code> and loaded it into IDA Pro.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1001\" height=\"339\" src=\"https:\/\/thebrokenpipe.com\/blog\/wp-content\/uploads\/2022\/11\/bios_ida_load.png\" alt=\"\" class=\"wp-image-268\" style=\"width:508px;height:auto\" srcset=\"https:\/\/thebrokenpipe.com\/blog\/wp-content\/uploads\/2022\/11\/bios_ida_load.png 1001w, https:\/\/thebrokenpipe.com\/blog\/wp-content\/uploads\/2022\/11\/bios_ida_load-300x102.png 300w, https:\/\/thebrokenpipe.com\/blog\/wp-content\/uploads\/2022\/11\/bios_ida_load-768x260.png 768w\" sizes=\"auto, (max-width: 1001px) 100vw, 1001px\" \/><\/figure>\n<\/div>\n\n\n<!--more-->\n\n\n\n<p class=\"wp-block-paragraph\">IDA identified the file as an MS-DOS <code>.COM<\/code> binary and set the processor to MetaPC, which handles all x86 opcodes. 
Since it&#8217;s an 8086 binary, I switched the processor to Intel 8086 to make sure the disassembly was accurate.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"406\" src=\"https:\/\/thebrokenpipe.com\/blog\/wp-content\/uploads\/2022\/11\/bios_disasm_base_100h-1024x406.png\" alt=\"\" class=\"wp-image-267\" style=\"width:612px;height:auto\" srcset=\"https:\/\/thebrokenpipe.com\/blog\/wp-content\/uploads\/2022\/11\/bios_disasm_base_100h-1024x406.png 1024w, https:\/\/thebrokenpipe.com\/blog\/wp-content\/uploads\/2022\/11\/bios_disasm_base_100h-300x119.png 300w, https:\/\/thebrokenpipe.com\/blog\/wp-content\/uploads\/2022\/11\/bios_disasm_base_100h-768x305.png 768w, https:\/\/thebrokenpipe.com\/blog\/wp-content\/uploads\/2022\/11\/bios_disasm_base_100h-1536x610.png 1536w, https:\/\/thebrokenpipe.com\/blog\/wp-content\/uploads\/2022\/11\/bios_disasm_base_100h.png 1920w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">Here&#8217;s where it got interesting &#8211; IDA only recognised about 30% of the file as code (brown for code outside of functions, blue for functions), while the rest was marked as data. That seemed off. Notice the <code>org 100h<\/code> directive? That&#8217;s standard for all <code>.COM<\/code> binaries, but why?<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In DOS, a 256-byte Program Segment Prefix (PSP) occupies the start of each program&#8217;s segment, so <code>.COM<\/code> files are loaded at offset <code>100H<\/code> within their segment. However, the PSP is set up by DOS and the Command Interpreter &#8211; both of which run after the BIOS. That meant the BIOS should have been loaded at segment <code>60H<\/code> offset <code>0<\/code>, not <code>100H<\/code>, since it neither needs a PSP nor has anything to create one for it. 
I rebased the program and re-ran the analysis.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"382\" height=\"441\" src=\"https:\/\/thebrokenpipe.com\/blog\/wp-content\/uploads\/2022\/11\/bios_disasm_rebase.png\" alt=\"\" class=\"wp-image-266\" style=\"width:236px;height:auto\" srcset=\"https:\/\/thebrokenpipe.com\/blog\/wp-content\/uploads\/2022\/11\/bios_disasm_rebase.png 382w, https:\/\/thebrokenpipe.com\/blog\/wp-content\/uploads\/2022\/11\/bios_disasm_rebase-260x300.png 260w\" sizes=\"auto, (max-width: 382px) 100vw, 382px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">Anything unusual? Definitely. At the start of the code, there are 10 long intra-segment jumps. Assuming execution begins at <code>0060:0000<\/code>, the first jump would go straight to <code>0060:0165<\/code>, leaving the other 9 jumps unused.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"994\" height=\"558\" src=\"https:\/\/thebrokenpipe.com\/blog\/wp-content\/uploads\/2022\/11\/bios_disasm_base_0h.png\" alt=\"\" class=\"wp-image-265\" style=\"width:463px;height:auto\" srcset=\"https:\/\/thebrokenpipe.com\/blog\/wp-content\/uploads\/2022\/11\/bios_disasm_base_0h.png 994w, https:\/\/thebrokenpipe.com\/blog\/wp-content\/uploads\/2022\/11\/bios_disasm_base_0h-300x168.png 300w, https:\/\/thebrokenpipe.com\/blog\/wp-content\/uploads\/2022\/11\/bios_disasm_base_0h-768x431.png 768w\" sizes=\"auto, (max-width: 994px) 100vw, 994px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">I suspected those other 9 jumps are for DOS to call via the far inter-segment <code>CALL<\/code> instruction. But what exactly do they do? 
Luckily, there&#8217;s a fascinating document called <a href=\"http:\/\/www.bitsavers.org\/pdf\/seattleComputer\/Customizing_MS-DOS_1.23_and_Later.pdf\">&#8220;Customizing MS-DOS Version 1.23 and Later&#8221;<\/a> available on <a href=\"http:\/\/www.bitsavers.org\">Bitsavers<\/a>. On the very first page, I found this listing:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro padding-disabled\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:8;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#002339;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>0000\tJMP\tINIT\t; System initialization\n0003\tJMP\tSTATUS\t; Console status check\n0006\tJMP\tCONIN\t; Console input\n0009\tJMP\tCONOUT\t; Console output\n000C\tJMP\tPRINT\t; Printer output\n000F\tJMP\tAUXIN\t; Auxiliary input\n0012\tJMP\tAUXOUT\t; Auxiliary output\n0015\tJMP\tREAD\t; Disk read\n0018\tJMP\tWRITE\t; Disk write\n001B\tJMP\tDSKCHG\t; Return disk change status\n001E\tJMP\tSETDATE\t; Set current date\n0021\tJMP\tSETTIME\t; Set current time\n0024\tJMP\tGETDATE\t; Read time and date\n0027\tJMP\tFLUSH\t; Flush keyboard input buffer\n002A\tJMP\tMAPDEV\t; Device mapping<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" 
stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki slack-ochin\" style=\"background-color: #FFF\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #174781\">0000<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">JMP<\/span><span style=\"color: #002339\">\tINIT\t<\/span><span style=\"color: #357B42; font-style: italic\">; System initialization<\/span><\/span>\n<span class=\"line\"><span style=\"color: #174781\">0003<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">JMP<\/span><span style=\"color: #002339\">\tSTATUS\t<\/span><span style=\"color: #357B42; font-style: italic\">; Console status check<\/span><\/span>\n<span class=\"line\"><span style=\"color: #174781\">0006<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">JMP<\/span><span style=\"color: #002339\">\tCONIN\t<\/span><span style=\"color: #357B42; font-style: italic\">; Console input<\/span><\/span>\n<span class=\"line\"><span style=\"color: #174781\">0009<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">JMP<\/span><span style=\"color: #002339\">\tCONOUT\t<\/span><span style=\"color: #357B42; font-style: italic\">; Console output<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">000C\t<\/span><span style=\"color: #7B30D0\">JMP<\/span><span style=\"color: #002339\">\tPRINT\t<\/span><span style=\"color: #357B42; font-style: italic\">; Printer output<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">000F\t<\/span><span style=\"color: #7B30D0\">JMP<\/span><span style=\"color: #002339\">\tAUXIN\t<\/span><span style=\"color: #357B42; font-style: italic\">; Auxiliary input<\/span><\/span>\n<span class=\"line\"><span style=\"color: #174781\">0012<\/span><span style=\"color: 
#002339\">\t<\/span><span style=\"color: #7B30D0\">JMP<\/span><span style=\"color: #002339\">\tAUXOUT\t<\/span><span style=\"color: #357B42; font-style: italic\">; Auxiliary output<\/span><\/span>\n<span class=\"line\"><span style=\"color: #174781\">0015<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">JMP<\/span><span style=\"color: #002339\">\tREAD\t<\/span><span style=\"color: #357B42; font-style: italic\">; Disk read<\/span><\/span>\n<span class=\"line\"><span style=\"color: #174781\">0018<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">JMP<\/span><span style=\"color: #002339\">\tWRITE\t<\/span><span style=\"color: #357B42; font-style: italic\">; Disk write<\/span><\/span>\n<span class=\"line\"><span style=\"color: #174781\">001B<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">JMP<\/span><span style=\"color: #002339\">\tDSKCHG\t<\/span><span style=\"color: #357B42; font-style: italic\">; Return disk change status<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">001E\t<\/span><span style=\"color: #7B30D0\">JMP<\/span><span style=\"color: #002339\">\tSETDATE\t<\/span><span style=\"color: #357B42; font-style: italic\">; Set current date<\/span><\/span>\n<span class=\"line\"><span style=\"color: #174781\">0021<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">JMP<\/span><span style=\"color: #002339\">\tSETTIME\t<\/span><span style=\"color: #357B42; font-style: italic\">; Set current time<\/span><\/span>\n<span class=\"line\"><span style=\"color: #174781\">0024<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">JMP<\/span><span style=\"color: #002339\">\tGETDATE\t<\/span><span style=\"color: #357B42; font-style: italic\">; Read time and date<\/span><\/span>\n<span class=\"line\"><span style=\"color: #174781\">0027<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: 
#7B30D0\">JMP<\/span><span style=\"color: #002339\">\tFLUSH\t<\/span><span style=\"color: #357B42; font-style: italic\">; Flush keyboard input buffer<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">002A\t<\/span><span style=\"color: #7B30D0\">JMP<\/span><span style=\"color: #002339\">\tMAPDEV\t<\/span><span style=\"color: #357B42; font-style: italic\">; Device mapping<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">This document was meant for customising the BIOS of MS-DOS 1.23. Since that version only slightly predates IBM PC-DOS 1.10 (equivalent to MS-DOS 1.24), it was no surprise that it included more I\/O functions than the PC-DOS 1.00 BIOS. Looking at the disassembly, the last function is at offset <code>001B<\/code>, meaning none of the functions after <code>DSKCHG<\/code> are present in PC-DOS 1.00. Interestingly, a quick look at the <code>DOSIO.ASM<\/code> source from 86-DOS 1.14 showed that, contrary to popular belief, IBM PC-DOS 1.00 isn&#8217;t just 86-DOS 1.14, as the three time and date functions present in 1.14&#8217;s BIOS are missing in PC-DOS 1.00.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">With those functions identified using the aforementioned guide, it was time to dive into the actual reverse engineering.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The next step was the tedious but essential part &#8211; reading the disassembly. Doing so gave me valuable insights into the usage of PC BIOS interrupts, exactly what I wanted to learn. 
I spent two days grinding through the disassembly, reading each instruction and commenting on almost every line.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Binary-Exact Disassembly<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Now for the exciting part &#8211; making the disassembly re-assemble into the original binary!<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Identifying the Assembler<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">To recreate the original binary exactly, I first needed to figure out which assembler was used to build the original source code. The machine code produced by that assembler was remarkably clean &#8211; there weren&#8217;t any unnecessary <code>NOP<\/code>s.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Typically, when writing x86 assembly, you&#8217;ll use <code>JMP<\/code> instructions. For intra-segment jumps, there are two common types &#8211; the 2-byte short jump (<code>JMP SHORT LABEL<\/code>) and the 3-byte long jump (<code>JMP LABEL<\/code>). The 2-byte short jump can only reach targets within -128 to +127 bytes of the instruction that follows it, and people don&#8217;t usually bother calculating the exact jump distance themselves while writing code. Instead, people just write <code>JMP LABEL<\/code> and let the assembler figure out whether to use a short or a long jump.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">MASM, the most commonly used 8086 assembler on DOS, reserves 3 bytes for every <code>JMP<\/code> to a label it hasn&#8217;t seen yet (a forward reference). If the target turns out to be within range for a short jump, MASM uses the first 2 bytes for a short jump and fills the third with a <code>NOP<\/code>. If the distance exceeds the short jump range, it uses all 3 bytes for a long jump. Since the PC-DOS BIOS doesn&#8217;t have any <code>NOP<\/code>s, it&#8217;s unlikely that MASM (or IBM&#8217;s version of it) was used.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">What other 8086 assemblers were there in the early 1980s? Intel&#8217;s ASM86 and Seattle Computer Products&#8217; ASM\/ASM-86. 
DOS was originally developed by SCP and assembled using SCP&#8217;s ASM, and IBM assembled the IBM PC BIOS with Intel&#8217;s ASM86. So, which assembler did Microsoft use for the PC-DOS BIOS? The answer is SCP&#8217;s ASM, with the key clue being MSB-terminated strings.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">MSB-Terminated Strings<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">You&#8217;ve probably heard of NUL-terminated strings used in C, $-terminated strings used by CP\/M and DOS, or Pascal strings with the length embedded at the start. But there&#8217;s another lesser-known type &#8211; MSB-terminated strings.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Both character-terminated and Pascal strings come with storage overhead. Character-terminated strings need an extra character to mark the end, while Pascal strings require at least one byte to store the length.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">MSB-terminated strings get rid of this overhead by taking advantage of the unused bit in ASCII characters. ASCII is a 7-bit encoding, but each character is stored in an 8-bit byte, leaving the most significant bit (MSB) unused. This spare bit can be used to flag the final character of the string.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If the MSB is 0, the byte is part of the string and more characters follow.<\/li>\n\n\n\n<li>If the MSB is 1, the byte is the last character of the string.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">This cleverly uses the otherwise wasted bit for efficient string termination without any extra storage. The downside, of course, is that it only works with 7-bit encodings such as ASCII.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">SCP ASM Assembler<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">The likely version used was SCP&#8217;s ASM 2.24, the version included with the <a href=\"https:\/\/thestarman.pcministry.com\/DOS\/ibm090\/index.html\">internal pre-release of PC-DOS 1.00<\/a>. 
However, IDA Pro generates assembly files compatible with MASM, while SCP&#8217;s ASM uses a different syntax. After reading the <a href=\"http:\/\/www.bitsavers.org\/pdf\/seattleComputer\/Z80_8086_Cross_Assembler_Preliminary.pdf\">documentation for SCP&#8217;s ASM-86<\/a> (the CP\/M-80 version of ASM), I identified several key syntax differences, outlined below.<\/p>\n\n\n\n<figure class=\"wp-block-table aligncenter\"><table class=\"has-fixed-layout\"><thead><tr><th>MASM<\/th><th>SCP ASM<\/th><\/tr><\/thead><tbody><tr><td><br>MOV&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;CS:[Var1],AX<\/td><td>SEG&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;CS<br>MOV&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[Var1],AX<\/td><\/tr><tr><td>MOV&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;BYTE PTR Var2,3<\/td><td>MOV&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;B,[Var2],3<\/td><\/tr><tr><td>MOV&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;WORD PTR Var3,5<\/td><td>MOV&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;W,[Var3],5<\/td><\/tr><tr><td>MOV&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;AX,OFFSET Var4<\/td><td>MOV&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;AX,Var4<\/td><\/tr><tr><td>Func1&nbsp;&nbsp;&nbsp;PROC&nbsp;&nbsp;&nbsp;&nbsp;FAR<br>RET<br>Func1&nbsp;&nbsp;&nbsp;ENDP<\/td><td><br>RET&nbsp; &nbsp; &nbsp;L<br><\/td><\/tr><tr><td>RETF<\/td><td>RET&nbsp; &nbsp; &nbsp;L<\/td><\/tr><tr><td>JP&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;LABEL1<\/td><td>JPE&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;LABEL1<\/td><\/tr><tr><td>JMP&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;SHORT LABEL2<\/td><td>JP&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;LABEL2<\/td><\/tr><tr><td><br>REP&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;MOVSW<\/td><td>REP<br>MOVW<\/td><\/tr><tr><td>DIV&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;BX<\/td><td>DIV&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;AX,BX<\/td><\/tr><tr><td>SHL&nbsp; &nbsp; &nbsp;AX,1<\/td><td>SHL&nbsp; &nbsp; &nbsp;AX<\/td><\/tr><tr><td>XCHG&nbsp; &nbsp; AX,BX<\/td><td>XCHG&nbsp; &nbsp; 
BX,AX<\/td><\/tr><tr><td>DB&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&#8220;Hell&#8221;,&#8221;o&#8221;+80H<\/td><td>DM&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&#8220;Hello&#8221;<\/td><\/tr><tr><td>DB&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;10 DUP(?)<\/td><td>DS&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;10<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">ASM doesn&#8217;t support procedures or functions, so <code>RET<\/code> always performs a near return (<code>RETN<\/code>), while <code>RET L<\/code> is used for a far return (<code>RETF<\/code>). All prefixes and segment overrides must be on the line above the instruction. Memory access uses square brackets &#8211; simply referencing a label operates on the pointer itself, not the byte or word that it points to. For shifts and rotates, only the register to shift or rotate is used. For multiplication and division, the destination register must be included, which in this case is always <code>AX<\/code>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">One important difference is the <code>JP<\/code> (jump-if-parity) instruction. In Intel and MASM syntax, both <code>JP<\/code> and <code>JPE<\/code> refer to this instruction. But in SCP&#8217;s ASM syntax, <code>JP<\/code> means a short intra-segment jump (<code>JMP SHORT<\/code>), and <code>JPE<\/code> is the only name for jump-if-parity. Naming short jumps <code>JP<\/code> probably came from Z80. SCP ASM also includes a pseudo-op called <code>DM<\/code>, which defines a message (string). Strings defined with <code>DM<\/code> are MSB-terminated.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Mismatching Binaries \ud83d\ude41<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">After getting comfortable with the SCP ASM syntax, I exported the IDA database to a MASM <code>.ASM<\/code> file and converted it to SCP ASM syntax. I tweaked the code until the error count hit zero, then used <code>HEX2BIN<\/code> to convert the Intel HEX object into raw binary. 
Finally, I compared the original binary with mine to see if they matched. Unfortunately, they didn&#8217;t :(.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I opened both binaries in a hex editor and compared them side by side. One of the mismatched instructions was <code>ADD BH,2<\/code>. In the original binary, the machine code is <code>80 C7 02<\/code> (in hex), but in my re-assembly it came out as <code>82 C7 02<\/code>. Both opcodes perform the same operation, which made me question whether SCP&#8217;s ASM was really the assembler Microsoft had used. Assemblers typically produce deterministic output &#8211; they should generate the same machine code for the same instruction every time. So, did Microsoft actually use SCP&#8217;s ASM? Or maybe they had their own internal assembler? I scoured the web for answers, and just when I was about to give up, I stumbled upon <a href=\"http:\/\/cini.classiccmp.org\">Rich Cini<\/a>&#8216;s <a href=\"http:\/\/cini.classiccmp.org\/hal.htm\">incredible research on the 20HAL program<\/a>!<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\">The one remaining problem relates to operand size in a single line \u2014 it compiles but produces the incorrect byte sequence. For example, code just after the loc44 label is:<br><br>&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;&nbsp;&nbsp;&nbsp;[bx+1],ch<br><br>In the original program, this codes as 88\/AF\/01\/00 yet when recompiled, it comes out as 88\/6F\/01. Even if I use the &#8220;W&#8221; (word) modifier, it doesn&#8217;t change the output. Arrrgh. One of my friends from the VCFE board mentioned a feature of the SCP assembler in which you can force a 16-bit reference by using a forward equate that&#8217;s not &#8220;near&#8221; (so -127 or +128 from the PC). 
So, the above would be&#8230;<br><br>&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;&nbsp;&nbsp;&nbsp;[bx+ONE],ch<br><br>&#8230;and then at the bottom of the source file I added:<br><br>&nbsp;&nbsp;&nbsp;&nbsp;ONE:&nbsp;&nbsp;&nbsp;&nbsp;equ&nbsp;&nbsp;&nbsp;&nbsp;1<br><br>That fixed it! Not sure if that&#8217;s how it would have actually been coded, but at least it causes ASM to emit the right bytes. There is one additional similar mis-coding in the FlushBuf routine (cmp cl,0) which requires using the same forward equate trick (cmp cl,ZERO) to get the right bytes (80\/F9\/00 rather than 82\/F9\/00).<\/p>\n<\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\">Amazing! With this trick, I managed to make my re-assembly of the disassembled BIOS identical to the original binary! Of course, instead of generic names like <code>ZERO<\/code> and <code>ONE<\/code>, I gave them more meaningful names.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"\/uploads\/2022\/11\/IBMBIO.ASM\">View\/Download PC-DOS 1.00 BIOS Disassembly<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Source Code Walkthrough<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Now we have a fully working and commented copy of the IBM PC-DOS 1.00 BIOS source code! Here&#8217;s how it works:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>BIOS Function<\/th><th>Description<\/th><\/tr><\/thead><tbody><tr><td><code>INIT<\/code><\/td><td>This is the entry point of the BIOS, executed only once.<br><br>It starts by setting up the stack &#8211; interrupts are disabled, <code>SS<\/code> is set to <code>CS<\/code>, and <code>SP<\/code> is pointed to a temporary area within the BIOS. Interrupts are then re-enabled. 
The reason for disabling interrupts during stack setup is a bug in the original 8086\/8088 CPUs (this bug was <a href=\"https:\/\/www.righto.com\/2022\/11\/a-bug-fix-in-8086-microprocessor.html\">fixed<\/a> in later versions) &#8211; interrupts could corrupt memory if they fire between changing <code>SS<\/code> and <code>SP<\/code>.<br><br>After that, it resets the disk system with <code>INT 13H<\/code>, configures 8N1 2400-baud serial I\/O with <code>INT 14H<\/code>, initialises the printer via <code>INT 17H<\/code>, and checks the number of floppy drives using <code>INT 11H<\/code>. Once hardware initialisation is done, the divide-by-zero handler is set up and debugging interrupts are disabled.<br><br>The BIOS then moves DOS to the end of its space, saving about a sector or two. The original DOS segment is calculated as RoundUp(SizeOfFile(<code>IBMBIO.COM<\/code>), SizeOfSector()) \/ <code>16<\/code> + <code>60H<\/code>, but the new segment is simply the paragraph right after the last byte of the BIOS, saving 752 bytes in total. After moving DOS, it launches the DOS kernel and <code>COMMAND.COM<\/code>, wrapping up the boot process.<\/td><\/tr><tr><td><code>STATUS<\/code><\/td><td>Returns the last character and sets or clears the <code>ZF<\/code> flag depending on whether a new character is ready. Notably, <code>Ctrl + PrtScr<\/code> is converted to <code>Ctrl + P<\/code>.<\/td><\/tr><tr><td><code>CONIN<\/code><\/td><td>Waits for user input and returns the character as soon as it arrives. If the character returned by <code>INT 16H<\/code> is <code>0<\/code>, it retries until a non-zero value comes back. Again, <code>Ctrl + PrtScr<\/code> is converted to <code>Ctrl + P<\/code>.<\/td><\/tr><tr><td><code>CONOUT<\/code><\/td><td>Outputs a character to the console using <code>INT 10H<\/code>. 
The page number is set to <code>0<\/code>, and the text colour is set to light grey.<\/td><\/tr><tr><td><code>PRINT<\/code><\/td><td>Sends a character to printer 0 using <code>INT 17H<\/code>. If an error occurs (except for out-of-paper, which fails immediately), it retries the operation. If it fails again, an error message is displayed on the console.<\/td><\/tr><tr><td><code>AUXIN<\/code><\/td><td>Receives and returns a character via <code>INT 14H<\/code>. If overrun, parity, or framing errors are detected, an error message is displayed. Only port 0 is used.<\/td><\/tr><tr><td><code>AUXOUT<\/code><\/td><td>Outputs a character to the auxiliary output device on port 0. An error message is displayed if the error bit is set.<\/td><\/tr><tr><td><code>READ<\/code>\/<code>WRITE<\/code><\/td><td>These functions are nearly identical, differing only in the operation parameter passed to <code>INT 13H<\/code>. They convert the data pointer to a linear address and check whether the transfer would cross a 64 KiB physical boundary, which the PC&#8217;s floppy DMA cannot do. If the data area doesn&#8217;t cross the boundary, the transfer happens in one fell swoop. If it does, the sectors up to the boundary are transferred first, followed by the sector straddling the boundary using a temporary area, and then the remaining sectors. Each read\/write operation can be retried up to 5 times, and the BIOS translates disk error codes into DOS error codes, setting the <code>CF<\/code> flag on fatal errors (i.e., if all 5 attempts fail).<\/td><\/tr><tr><td><code>DSKCHG<\/code><\/td><td>A stub that just returns 0. The IBM PC does not appear to be able to detect disk changes.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"bios-bugs\">String Termination &#8211; Revisited<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">It&#8217;s worth noting that the internal string-printing function used for displaying error messages clears the MSB of each character before sending it to the console, since the final character of each MSB-terminated string has its MSB set. 
Interestingly, all error messages were also <code>NUL<\/code>-terminated on top of having their MSBs set for the last character. This is proof that Microsoft used the <code>DM<\/code> (Define Message) pseudo-op instead of <code>DB<\/code> (Define Byte), even though they still wanted NUL-termination in their strings.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro padding-disabled\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:8;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#002339;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>HELLO:\tDB\t\"Hello, World!\",0\t;NUL-terminated string \"Hello, World!\"\n\nHELLO:\tDM\t\"Hello, World!\"\t\t;MSB-terminated string \"Hello, World!\"\n\nHELLO:\tDM\t\"Hello, World!\"\t\t;MSB- and NUL-terminated string \"Hello, World!\"\n\tDB\t0\t\t\t;All strings are defined this way in the BIOS!<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki slack-ochin\" style=\"background-color: #FFF\" tabindex=\"0\"><code><span 
class=\"line\"><span style=\"color: #7EB233\">HELLO:<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #0991B6\">DB<\/span><span style=\"color: #002339\">\t&quot;Hello, World!&quot;,<\/span><span style=\"color: #174781\">0<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #357B42; font-style: italic\">;NUL-terminated string &quot;Hello, World!&quot;<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #7EB233\">HELLO:<\/span><span style=\"color: #002339\">\tDM\t&quot;Hello, World!&quot;\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;MSB-terminated string &quot;Hello, World!&quot;<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #7EB233\">HELLO:<\/span><span style=\"color: #002339\">\tDM\t&quot;Hello, World!&quot;\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;MSB- and NUL-terminated string &quot;Hello, World!&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #0991B6\">DB<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">0<\/span><span style=\"color: #002339\">\t\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;All strings are defined this way in the BIOS!<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">My guess is the Microsoft developers who wrote the BIOS didn&#8217;t realise they could define regular strings using the <code>DB<\/code> pseudo-op, so they went with <code>DM<\/code> for everything involving strings.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Segment Override Prefix<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Another interesting find is a strange segment override prefix lurking in the binary. 
Take a look at the console output function:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro padding-disabled\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:8;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#002339;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>;\n; CONOUT - Console output\n;\n; AL contains the character to output to the console, all registers\n; must be preserved.\n;\nCONOUT:\n\tPUSH\tBP\t\t;Save all necessary registers\n\tPUSH\tAX\n\tPUSH\tBX\n\tPUSH\tSI\n\tPUSH\tDI\n\tMOV\tAH,0EH\t\t;Function = write char\n\tSEG\tCS\n\tMOV\tBX,7\t\t;Light gray and page number = 0\n\tINT\t10H\t\t;Call video BIOS service\n\tPOP\tDI\t\t;Restore all saved registers\n\tPOP\tSI\n\tPOP\tBX\n\tPOP\tAX\n\tPOP\tBP\n\tRET\tL<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki slack-ochin\" style=\"background-color: #FFF\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #357B42; font-style: italic\">;<\/span><\/span>\n<span class=\"line\"><span 
style=\"color: #357B42; font-style: italic\">; CONOUT - Console output<\/span><\/span>\n<span class=\"line\"><span style=\"color: #357B42; font-style: italic\">;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #357B42; font-style: italic\">; AL contains the character to output to the console, all registers<\/span><\/span>\n<span class=\"line\"><span style=\"color: #357B42; font-style: italic\">; must be preserved.<\/span><\/span>\n<span class=\"line\"><span style=\"color: #357B42; font-style: italic\">;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #7EB233\">CONOUT:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">PUSH<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">BP<\/span><span style=\"color: #002339\">\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;Save all necessary registers<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">PUSH<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">AX<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">PUSH<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">BX<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">PUSH<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">SI<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">PUSH<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">DI<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">MOV<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">AH<\/span><span style=\"color: #002339\">,<\/span><span style=\"color: 
#174781\">0EH<\/span><span style=\"color: #002339\">\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;Function = write char<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\tSEG\t<\/span><span style=\"color: #174781\">CS<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">MOV<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">BX<\/span><span style=\"color: #002339\">,<\/span><span style=\"color: #174781\">7<\/span><span style=\"color: #002339\">\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;Light gray and page number = 0<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">INT<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">10H<\/span><span style=\"color: #002339\">\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;Call video BIOS service<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">POP<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">DI<\/span><span style=\"color: #002339\">\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;Restore all saved registers<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">POP<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">SI<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">POP<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">BX<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">POP<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">AX<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span 
style=\"color: #7B30D0\">POP<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">BP<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">RET<\/span><span style=\"color: #002339\">\tL<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">See anything unusual? Look at the <code>SEG CS<\/code> line right above <code>MOV BX,7<\/code>. That effectively turns the instruction into <code>MOV CS:BX,7<\/code>, which isn&#8217;t really a valid 8086 instruction. With no memory operand for the prefix to act on, the CPU simply ignores it. SCP&#8217;s assembler didn&#8217;t care though &#8211; it treated the segment override prefix as a separate instruction.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">But why was there a segment override when you&#8217;re just moving a constant into a register? My first guess was that at least one line of code between <code>SEG CS<\/code> and <code>MOV BX,7<\/code> had been removed, and whoever removed it forgot about the segment override on the line before. I&#8217;ve actually made this exact mistake myself, twice, when modifying boot sector code for an upcoming project (which I&#8217;ll write about in a <a href=\"#114-pc-port\">future post<\/a>).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So, what was actually removed? Technically&#8230; nothing! The line after <code>SEG CS<\/code> was changed from <code>MOV BX,[TXTCOLOR]<\/code> to <code>MOV BX,7<\/code>. The <code>TXTCOLOR<\/code> variable held a constant value of <code>7<\/code>, so they simply replaced the memory read with an immediate and left the now-pointless prefix behind. How did I figure this out? Well, I didn&#8217;t; I took a peek at the BIOS from the PC-DOS 1.00 pre-release from early June 1981.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Unbuggy Bug<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Now, onto the most interesting&#8230; bug! 
When the BIOS couldn&#8217;t find a valid copy of <code>COMMAND.COM<\/code>, it&#8217;s supposed to display the error message <code>Bad or missing Command Interpreter<\/code> and then stall the machine in an infinite loop. This BIOS does exactly that, but in a rather peculiar way.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"930\" src=\"https:\/\/thebrokenpipe.com\/blog\/wp-content\/uploads\/2022\/11\/brokenpipe.png\" alt=\"\" class=\"wp-image-264\" style=\"width:212px;height:auto\" srcset=\"https:\/\/thebrokenpipe.com\/blog\/wp-content\/uploads\/2022\/11\/brokenpipe.png 600w, https:\/\/thebrokenpipe.com\/blog\/wp-content\/uploads\/2022\/11\/brokenpipe-194x300.png 194w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><figcaption class=\"wp-element-caption\">The reason why the code works.<\/figcaption><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">Here&#8217;s the code responsible for printing the error message and putting the CPU into an infinite loop:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro padding-disabled\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:8;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#002339;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>COMERR:\n\tMOV\tDX,BADCOM\t;Load bad or missing message\n\tCALL\tOUTPSTR\t\t;Print it out\n\nSTALL:\n\tJPE\tSTALL\t\t;Do nothing forever<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" 
stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki slack-ochin\" style=\"background-color: #FFF\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #7EB233\">COMERR:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">MOV<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">DX<\/span><span style=\"color: #002339\">,BADCOM\t<\/span><span style=\"color: #357B42; font-style: italic\">;Load bad or missing message<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">CALL<\/span><span style=\"color: #002339\">\tOUTPSTR\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;Print it out<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #7EB233\">STALL:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">JPE<\/span><span style=\"color: #002339\">\tSTALL\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;Do nothing forever<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">How does <code>JPE<\/code> (jump-parity-even) stall the machine when it&#8217;s a conditional jump? Shouldn&#8217;t an unconditional jump be used, to guarantee the infinite loop? 
Actually, in this specific case, <code>JPE<\/code> does the job.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The <code>PF<\/code> (parity flag) reflects the result of the last flag-affecting operation &#8211; it&#8217;s set to <code>1<\/code> if the low byte of that result has an even number of 1 bits, <code>0<\/code> otherwise. Let&#8217;s look at the instructions that run just before the <code>JPE<\/code>:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro padding-disabled\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:8;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#002339;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>;\n; Output string to console\n;\nOUTPSTR:\n\tXCHG\tDX,SI\t\t;Swap DX and SI (for LODB)\n\nOUTPSLOOP:\t;Character output loop\n\tSEG\tCS\n\tLODB\t\t\t;Load byte at SI to AL\n\n\tAND\tAL,7FH\t\t;Clear MSB, we used to set MSB for last\n\t\t\t\t;char to terminate strings to save\n\t\t\t\t;space, now we have moved to zero-\n\t\t\t\t;terminated strings but we still need to \n\t\t\t\t;handle MSB-terminated strings because \n\t\t\t\t;ASM still sets MSB for strings\n\n\tJZ\tOUTPSDONE\t;Reached end of string, break out of \n\t\t\t\t;loop\n\tCALL\tCONOUT,BIOSSEG\t;Call BIOS CONOUT function\n\tJP\tOUTPSLOOP\t;Back to loop beginning\n\nOUTPSDONE:\n\tXCHG\tDX,SI\t\t;Swap DX and SI back\n\tRET<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 
2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki slack-ochin\" style=\"background-color: #FFF\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #357B42; font-style: italic\">;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #357B42; font-style: italic\">; Output string to console<\/span><\/span>\n<span class=\"line\"><span style=\"color: #357B42; font-style: italic\">;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #7EB233\">OUTPSTR:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">XCHG<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">DX<\/span><span style=\"color: #002339\">,<\/span><span style=\"color: #174781\">SI<\/span><span style=\"color: #002339\">\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;Swap DX and SI (for LODB)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #7EB233\">OUTPSLOOP:<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #357B42; font-style: italic\">;Character output loop<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\tSEG\t<\/span><span style=\"color: #174781\">CS<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\tLODB\t\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;Load byte at SI to AL<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">AND<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">AL<\/span><span style=\"color: 
#002339\">,<\/span><span style=\"color: #174781\">7FH<\/span><span style=\"color: #002339\">\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;Clear MSB, we used to set MSB for last<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t\t\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;char to terminate strings to save<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t\t\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;space, now we have moved to zero-<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t\t\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;terminated strings but we still need to <\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t\t\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;handle MSB-terminated strings because <\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t\t\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;ASM still sets MSB for strings<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">JZ<\/span><span style=\"color: #002339\">\tOUTPSDONE\t<\/span><span style=\"color: #357B42; font-style: italic\">;Reached end of string, break out of <\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t\t\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;loop<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">CALL<\/span><span style=\"color: #002339\">\tCONOUT,BIOSSEG\t<\/span><span style=\"color: #357B42; font-style: italic\">;Call BIOS CONOUT function<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">JP<\/span><span style=\"color: #002339\">\tOUTPSLOOP\t<\/span><span style=\"color: #357B42; font-style: italic\">;Back to loop 
beginning<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #7EB233\">OUTPSDONE:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">XCHG<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">DX<\/span><span style=\"color: #002339\">,<\/span><span style=\"color: #174781\">SI<\/span><span style=\"color: #002339\">\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;Swap DX and SI back<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">RET<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">The instruction right before the <code>JPE<\/code> is the <code>RET<\/code> from <code>OUTPSTR<\/code>. <code>RET<\/code> doesn&#8217;t affect any flags. Before <code>RET<\/code> is an <code>XCHG<\/code>, which also doesn&#8217;t touch the flags. And how does one reach that <code>XCHG<\/code>? The only path is through the <code>JZ OUTPSDONE<\/code> instruction, and conditional jumps don&#8217;t modify the flags either.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">What&#8217;s the condition for <code>JZ<\/code>? The result of the previous flag-setting operation, here <code>AND AL,7FH<\/code>, being zero. How many ones are there in zero? None. Is zero even? Yes. So the <code>PF<\/code> is guaranteed to be set. Since neither <code>XCHG<\/code> nor <code>RET<\/code> changes the flags, the <code>PF<\/code> is still set when the <code>JPE<\/code> executes, so the jump is always taken and the machine stalls as required.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So we&#8217;re back to the question: why didn&#8217;t Microsoft use an unconditional jump here? This is a wild guess, but most likely they did in the original source code. 
However, they then modified SCP&#8217;s assembler, changing <code>JP<\/code> (short jump) to match Intel&#8217;s definition of &#8220;jump-if-parity&#8221;, and likely came up with a new name for short unconditional jump (maybe <code>JMPS<\/code>, supported by later versions of ASM). The code for loading <code>COMMAND.COM<\/code> was copied almost verbatim from 86-DOS&#8217; <code>DOSIO.ASM<\/code> (you can compare them yourself). Perhaps they overlooked replacing <code>JP<\/code> with their new name. As a result, <code>JP<\/code> in the source code got assembled into <code>JPE<\/code>, and they never noticed it because it always worked. Again, this is just a wild guess.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><del>Stealing<\/del> Borrowing More Code<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Here&#8217;s another example of Microsoft copying SCP&#8217;s code. Check out this snippet:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro padding-disabled\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:8;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#002339;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>;Load 86-DOS\n\tMOV\tSI,INITTAB\t;Load drive list pointer\n\tCALL\t0,DOSSEG\t;Call DOS init\n\tSTI\t\t\t;Enable interrupts\n...\n...\n;Make all segment registers the same\n\tMOV\tDS,BX\n\tMOV\tES,BX\n\tMOV\tSS,BX\n\tMOV\tSP,40H\t\t;Set stack to 64 bytes<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" 
stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki slack-ochin\" style=\"background-color: #FFF\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #357B42; font-style: italic\">;Load 86-DOS<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">MOV<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">SI<\/span><span style=\"color: #002339\">,INITTAB\t<\/span><span style=\"color: #357B42; font-style: italic\">;Load drive list pointer<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">CALL<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">0<\/span><span style=\"color: #002339\">,DOSSEG\t<\/span><span style=\"color: #357B42; font-style: italic\">;Call DOS init<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">STI<\/span><span style=\"color: #002339\">\t\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;Enable interrupts<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #357B42; font-style: italic\">;Make all segment registers the same<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">MOV<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">DS<\/span><span style=\"color: 
#002339\">,<\/span><span style=\"color: #174781\">BX<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">MOV<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">ES<\/span><span style=\"color: #002339\">,<\/span><span style=\"color: #174781\">BX<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">MOV<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">SS<\/span><span style=\"color: #002339\">,<\/span><span style=\"color: #174781\">BX<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">MOV<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">SP<\/span><span style=\"color: #002339\">,<\/span><span style=\"color: #174781\">40H<\/span><span style=\"color: #002339\">\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;Set stack to 64 bytes<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Why didn&#8217;t they disable interrupts when changing the stack segment? Simple &#8211; they copied this straight from SCP. DOS disables interrupts during initialisation and doesn&#8217;t re-enable them. Since SCP didn&#8217;t do an <code>STI<\/code> after DOS initialisation, it was safe for them to change <code>SS:SP<\/code> without worrying about memory corruption from interrupts firing midway. Microsoft, however, re-enables interrupts right after DOS initialisation. This makes changing <code>SS:SP<\/code> a risky move: early 8086\/8088 chips didn&#8217;t hold off interrupts after a <code>MOV<\/code> to <code>SS<\/code>, so one could fire between the two instructions and push onto a half-updated stack. 
That said, this wasn&#8217;t a major issue at all given that all IBM PCs shipped with processors that had the memory corruption issue fixed.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Microsoft also didn&#8217;t just copy the <code>COMMAND.COM<\/code> loading code from SCP&#8217;s <code>DOSIO.ASM<\/code> &#8211; they also made two notable changes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Reduced the stack size by changing the initial stack pointer from <code>5CH<\/code> to <code>40H<\/code>.<\/li>\n\n\n\n<li>Changed the Disk Transfer Address (DTA) from a constant value (<code>80H<\/code>) to the value stored at offset <code>80H<\/code> in the <code>COMMAND.COM<\/code> segment (<code>[80H]<\/code>).<\/li>\n<\/ul>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro padding-disabled\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:8;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#002339;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>\tMOV\tDX,&#91;80H&#93;\t;Transfer address\n\tMOV\tAH,1AH\t\t;Function = set disk transfer address\n\tINT\t21H\t\t;Call DOS interrupt<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 
00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki slack-ochin\" style=\"background-color: #FFF\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">MOV<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">DX<\/span><span style=\"color: #002339\">,&#91;<\/span><span style=\"color: #174781\">80H<\/span><span style=\"color: #002339\">&#93;\t<\/span><span style=\"color: #357B42; font-style: italic\">;Transfer address<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">MOV<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">AH<\/span><span style=\"color: #002339\">,<\/span><span style=\"color: #174781\">1AH<\/span><span style=\"color: #002339\">\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;Function = set disk transfer address<\/span><\/span>\n<span class=\"line\"><span style=\"color: #002339\">\t<\/span><span style=\"color: #7B30D0\">INT<\/span><span style=\"color: #002339\">\t<\/span><span style=\"color: #174781\">21H<\/span><span style=\"color: #002339\">\t\t<\/span><span style=\"color: #357B42; font-style: italic\">;Call DOS interrupt<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Reducing the stack size is somewhat understandable, though it increases the risk of stack overflow (and doesn&#8217;t actually save any RAM here). But changing the DTA from <code>80H<\/code> to <code>[80H]<\/code> is a real issue &#8211; it introduces undefined behaviour which could cause bugs and memory corruption. Clearly, whoever made this change didn&#8217;t know what they were doing. 
Both changes got reverted by the next release (PC-DOS 1.10).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"boot-sector\">Reversing the Boot Sector<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Now, the boot sector. Having written a boot sector game before, I was already familiar with how boot sectors work and that the base address is always <code>7C00H<\/code>. I saved the boot sector as a <code>.BIN<\/code> file, loaded it into IDA Pro and set the base address to <code>7C00H<\/code>. Compared to the BIOS, the boot sector was much simpler to reverse since it only does the bare minimum needed to boot DOS.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"\/uploads\/2022\/11\/BOOT.ASM\">View\/Download PC-DOS 1.00 Boot Sector Disassembly<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Walkthrough<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Here&#8217;s a breakdown of what the boot sector does:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Set up the stack at <code>0000:7C00<\/code>, just below the boot sector in memory.<\/li>\n\n\n\n<li>Reset the disk system.<\/li>\n\n\n\n<li>Check for the presence of system files.\n<ul class=\"wp-block-list\">\n<li>Read the first sector of the root directory into memory at <code>0060:0000<\/code> (where the BIOS gets loaded later).<\/li>\n\n\n\n<li>Convert the first two filenames to lowercase and compare them against the hardcoded filenames.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>If the disk is a valid system disk&#8230;\n<ul class=\"wp-block-list\">\n<li>Read 20 sectors starting from the first data sector into memory at <code>0060:0000<\/code>.<\/li>\n\n\n\n<li>If disk read succeeds\u2026\n<ul class=\"wp-block-list\">\n<li>Jump to <code>0060:0000<\/code>.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Otherwise&#8230;\n<ul class=\"wp-block-list\">\n<li>Print an error message and boot to ROM BASIC using <code>INT 18H<\/code>.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Otherwise\u2026\n<ul class=\"wp-block-list\">\n<li>Prompt the user to 
insert another disk and try again.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">String Termination &#8211; Revisited Again<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Just like in the BIOS, the error messages in the boot sector are MSB-terminated, suggesting Microsoft used <code>DM<\/code> for string definitions. But the most interesting part is how the system filenames are terminated &#8211; they&#8217;re also&#8230; zero-terminated, as in they end with the ASCII character <code>'0'<\/code> (with the MSB set).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This is again strong evidence that Microsoft didn&#8217;t know strings could also be defined with <code>DB<\/code>, and felt forced to use <code>DM<\/code>. Using <code>DM<\/code> makes string comparison awkward, because the stored last character no longer matches the one in the directory entry. For instance, you&#8217;d need to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Strip the MSB of the last character before using <code>REPE CMPSB<\/code>, or<\/li>\n\n\n\n<li>Compare the first 10 characters of the 11-character filename and handle the last one separately.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Instead, Microsoft&#8217;s workaround was to append an extra character (<code>'0'<\/code>) to the end of the strings, so the MSB would be set on this extra character instead of the actual last character, making comparisons straightforward.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Lowercase Filenames<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Another intriguing oddity in the boot sector is that filenames are converted to lowercase before being compared to the hardcoded lowercase filenames. Given that filenames in the root directory could only be uppercase, why would they waste time and space on something seemingly pointless?<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Here&#8217;s the reason &#8211; initially, system files lived in the reserved system area, similar to CP\/M. But as DOS grew, the reserved area was no longer big enough to hold the entire system. 
To fix this, the system files were moved from the reserved area into standard disk files. The downside was that users could now easily modify or delete these files, making the disk unbootable. To protect them, the system files were then given lowercase filenames in the root directory, since DOS only recognised uppercase filenames. This hack effectively prevented DOS from touching the system files, and as a result, boot sectors from that time compared filenames against lowercase ones.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So, why are the files <code>IBMBIO.COM<\/code> and <code>IBMDOS.COM<\/code> uppercase in PC-DOS 1.00? Good question. In mid-1981, file attributes were introduced to mark files as system and hidden. This prevented users from tampering with the system files while keeping filesystem abuse to a minimum. Once this was in place, the filenames were reverted to uppercase. According to a document from the PC-DOS 1.00 pre-release disk, this change happened shortly before 1981\/06\/05:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\">2. The system files have reverted to upper case letters again, but will not be included in any directory searches because of a new byte (attribute) in the directory entry (they won&#8217;t show on a DIR command, and can&#8217;t be erased, copied, folded, spindled or mutilated). The FORMAT ans SYS commands both can be used to put these files on a diskette (if SYS is used, the disk must have already had the system on it). In addition, FORMAT writes the boot record and copies COMMAND.COM.<\/p>\n<\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\">PC-DOS 1.00&#8217;s boot sector dates back to 1981\/05\/07, so it makes perfect sense that it used lowercase filenames.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Reversing these binaries was an absolute blast! 
Along the way, I learned a ton about the 8086 architecture, the IBM PC, and DOS itself. I spent about 10 hours researching the quirks, inconsistencies, and bugs in the code, dove into over 250 pages of documentation (from <a href=\"http:\/\/bitsavers.org\/pdf\/seattleComputer\/86-DOS_0.3_Programmers_Manual_1980.pdf\">86-DOS 0.3 Programmer&#8217;s Manual<\/a>, <a href=\"http:\/\/bitsavers.org\/pdf\/seattleComputer\/Customizing_MS-DOS_1.23_and_Later.pdf\">Customizing MS-DOS 1.23 and Later<\/a>, <a href=\"https:\/\/bitsavers.org\/pdf\/ibm\/pc\/pc\/6025008_PC_Technical_Reference_Aug81.pdf\">IBM 5150 Technical Reference<\/a>, <a href=\"https:\/\/bitsavers.org\/pdf\/ibm\/pc\/dos\/6172220_DOS_1.0_Jan82.pdf\">PC-DOS 1.0 Manual<\/a>, and the <a href=\"http:\/\/www.bitsavers.org\/pdf\/seattleComputer\/Z80_8086_Cross_Assembler_Preliminary.pdf\">Z80\/8086 Cross Assembler Manual<\/a>), examined five different DOS versions (86-DOS 1.00, 86-DOS 1.14, PC-DOS 1.00 pre-release, PC-DOS 1.00, and PC-DOS 1.10), and spent over six hours on the actual reverse engineering.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To put all the stuff I&#8217;ve learned into use, I went and <a href=\"#114-pc-port\">fully ported<\/a> 86-DOS 1.14 to the IBM PC, making sure all features, including <code>INIT<\/code>, <code>RDCPM<\/code> and <code>SYS<\/code> worked as expected. It was quite the journey, and I had a fantastic time diving into the inner workings of these systems. I&#8217;ll be talking about that in the next post.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">See you later!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Notes<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li id=\"114-pc-port\"><a href=\"porting-86-dos-1-14-to-the-ibm-pc\">Porting 86-DOS 1.14 to the IBM PC<\/a><\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>I wanted to get familiar with the IBM PC INT 1xH BIOS interrupts and explore how they&#8217;re actually used in practice, all in preparation for a challenge project. 
Reverse engineering the BIOS of PC-DOS seemed like the perfect exercise &#8211; the DOS BIOS handles all input and output for the DOS kernel and applications, so [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5,6],"tags":[],"class_list":["post-22","post","type-post","status-publish","format-standard","hentry","category-dos","category-reverse-engineering"],"_links":{"self":[{"href":"https:\/\/thebrokenpipe.com\/blog\/wp-json\/wp\/v2\/posts\/22","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/thebrokenpipe.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/thebrokenpipe.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/thebrokenpipe.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/thebrokenpipe.com\/blog\/wp-json\/wp\/v2\/comments?post=22"}],"version-history":[{"count":44,"href":"https:\/\/thebrokenpipe.com\/blog\/wp-json\/wp\/v2\/posts\/22\/revisions"}],"predecessor-version":[{"id":272,"href":"https:\/\/thebrokenpipe.com\/blog\/wp-json\/wp\/v2\/posts\/22\/revisions\/272"}],"wp:attachment":[{"href":"https:\/\/thebrokenpipe.com\/blog\/wp-json\/wp\/v2\/media?parent=22"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/thebrokenpipe.com\/blog\/wp-json\/wp\/v2\/categories?post=22"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/thebrokenpipe.com\/blog\/wp-json\/wp\/v2\/tags?post=22"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}