Notes on Timescale Patches
A couple of people have asked me for copies of my notes and my initial (partly broken) server-side patches to the T2 netcode that eliminate timescale and other network manipulation tricks.
I see no reason I shouldn't make them public.
Patch 1 with inline notes:
// Timescale Exploit Tick Test Fix
// Written by Thyth
// Version 0.1 -- 2013/03/05

// The Tribes 2 networking model was designed to be fairly robust, and assure server
// authority absolutely over clients. This design was largely successful in minimizing
// the presence of exploitable networking flaws in the game over a long lifetime.
// However, clients could manipulate the state of the "authoritative" server model
// in a limited way by delaying or slowing their packet flow. While the protocol is
// robust against all other types of manipulation (including speedhacks), this packet
// flow delay would trigger freezing of the client controlled objects in the server
// simulation that would be propagated to all other players. These freezes interact
// poorly with simulation interpolation performed by other players, and would result
// in the player "warping" if intentionally manipulated.

// Perhaps this is an unintended effect of the decision to freeze the client simulation
// in brief network interruptions, or perhaps this is a bug that was masked by the
// expected client behavior, and intentional triggers were not anticipated. Regardless,
// after review of the Torque source code, the source of the problem was discovered.

// The server simulation proceeds at one tick every 32 milliseconds. Connected clients
// would transmit their desired control moves in a constant stream every timestep. At
// each server tick, these enqueued moves would be applied to the controlled object.

// Relevant function from Torque Game Engine (version 2004/08/15)
// engine/game/gameProcess.cc
// [200] void ProcessList::advanceObjects()
// [201] {
// [202]    PROFILE_START(AdvanceObjects);
// [203]
// [204]    // A little link list shuffling is done here to avoid problems
// [205]    // with objects being deleted from within the process method.
// [206]    GameBase list;
// [207]    GameBase* obj;
// [208]    list.plLinkBefore(head.mProcessLink.next);
// [209]    head.plUnlink();
// [210]    while ((obj = list.mProcessLink.next) != &list) {
// [211]       obj->plUnlink();
// [212]       obj->plLinkBefore(&head);
// [213]
// [214]       // Each object is either advanced a single tick, or if it's
// [215]       // being controlled by a client, ticked once for each pending move.
// [216]       if (obj->mTypeMask & ShapeBaseObjectType) {
// [217]
// [218]          ShapeBase* pSB = static_cast<ShapeBase*>(obj);
// [219]          GameConnection* con = pSB->getControllingClient();
// [220]
// [221]          if (con && con->getControlObject() == pSB) {
// [222]             Move* movePtr;
// [223]             U32 m, numMoves;
// [224]
// [225]             con->getMoveList(&movePtr, &numMoves);
// [226]
// [227]             for (m = 0; m < numMoves && pSB->getControllingClient() == con; )
// [228]                obj->processTick(&movePtr[m++]);
// [229]
// [230]             con->clearMoves(m);
// [231]
// [232]             continue;
// [233]          }
// [234]       }
// [235]       if (obj->mProcessTick)
// [236]          obj->processTick(0);
// [237]    }
// [238]    PROFILE_END();
// [239] }

// In the event that the controlling client's move list is empty [225], processTick [228]
// will not be called on the controlled object during the server's simulation timestep.
// This effectively freezes the controlled object in time with its previous simulation values.

// Addressing this problem may be as simple as assuring that each object processes a tick
// during every server tick, even if no move input has been transmitted on time by the client.
// Instead of using a client supplied move input, if a NULL or 0 is passed to the processTick
// function, the server will substitute a "NullMove", corresponding to completely unset
// keyboard/mouse triggers.

// If source code was available to Tribes 2, the simplest way to fix this problem would be to
// add "if (!numMoves) obj->processTick(0);" at line 231. Unfortunately, with only the binary,
// the patch becomes more complex.

// The ProcessList::advanceObjects function has not changed significantly in TGE-2004 when
// compared to Tribes 2, and a binary version of this function can be found in Tribes2.exe
// at address 0x602720. This function is 331 bytes in length, and is followed by 7 bytes of
// unused padding (at 0x602859). There is a further 15 bytes of padding at 0x6028a1 following
// the function starting at 0x602860.

function timetick_fix_advanceObjects() {
    // size of a connection move list is stored at [ebp+var_10]

    // con->clearMoves(m); continue; concludes with
    // 60280b: jmp short loc_60282B
    // and follows with 3 bytes of (occupied?) pad

    // obj->processTick(NULL) is 4 instructions from 60281f through 60282b:
    // 60281f: mov ecx, eax
    // 602821: mov ebx, [ecx]
    // 602823: push 0
    // 602825: call dword ptr [ebx+0D8h]
    // However, this is required beforehand, to load obj into eax: mov eax, [ebp+var_2A0]

    // function patch (2 of 2 used):
    // 60280b: jmp short loc_602859 -- jump +78 bytes (2 byte instruction)
    // eb1e -> eb4c

    // 7 byte pad (6 of 7 used):
    // 602859: jmp short loc_6028a1 -- jump +72 bytes (2 byte instruction)
    // 60285b: jmp short loc_60282b -- jump -48 bytes (2 byte instruction)
    // 60285d: jmp short loc_60281f -- jump -62 bytes (2 byte instruction)
    // 0000 0000 0000 -> eb46 ebce ebc0

    // 15 byte pad (15 of 15 used):
    // 6028a1: xor eax,eax (2 byte instruction)
    // 6028a3: cmp eax, [ebp+var_10] (3 byte instruction)
    // 6028a6: jnz short loc_60285b -- jump -74 bytes (2 byte instruction, jump to loc_60282b, doable directly?)
    // 6028a8: mov eax, [ebp+var_2A0] (6 byte instruction)
    // 6028ae: jmp short loc_60285d -- jump -80 bytes (2 byte instruction, proxy jump to loc_60281f)
    // 0000 000000 0000 000000000000 0000 -> 31c0 3b45f0 75b3 8b8560fdffff ebad

    // we add the new code to the pad space first, and only then do we patch the jmp location
    // in the function to activate the fix -- this should prevent crashing the game on partial-update
    // since the jmp replacement is one instruction (of the same type), it should be close enough to atomic
    memPatch("6028a1", "31c03b45f075b38b8560fdffffebad");
    memPatch("602859", "eb46ebceebc0");
    memPatch("60280b", "eb4c");
}
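For reference, here's roughly what patch 1 amounts to at the source level, shown as a sketch against the TGE-2004 listing above (this is not the actual Tribes 2 source, just the controlled-object branch with the one-line fix folded in):

if (con && con->getControlObject() == pSB) {
   Move* movePtr;
   U32 m, numMoves;

   con->getMoveList(&movePtr, &numMoves);

   // apply every move the client sent for this tick, as before
   for (m = 0; m < numMoves && pSB->getControllingClient() == con; )
      obj->processTick(&movePtr[m++]);

   // patch 1 addition: if the client sent no moves for this tick, advance the
   // object anyway with a NullMove (processTick(0)) so it can't be frozen
   if (!numMoves)
      obj->processTick(0);

   con->clearMoves(m);
   continue;
}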
Patch 2 (based on faulty assumptions that server processing jitter was the main factor in the problems introduced in patch 1):
// Version 0.2 -- 2013/03/06

// the goal of the patch to the advanceServerTime function is to set a "first pass" flag
// globally so that multiple "catch up" calls to advance objects within a tick can be
// distinguished from the first call -- on the first call to advanceObjects, the ordinary
// (patched) processing occurs: applying any client moves (or a null move if empty), and then
// clearing the move list. subsequent calls must not apply a null move, even if the list is
// empty, so this flag is cleared on first pass, and then guards against further tick processing
// until the next pass, and the flag is reset

function timetick_fix_advanceServerTime() {
    // 13 byte pad before the function at 0x602343
    // unused 4 byte pad in data segment at 0x9e873c => 0x9e871c

    // function patch (2 of 2 used):
    // initial:
    // 6023a2: jmp short loc_6023bd
    // new: (jump to pad)
    // 6023a2: jmp short loc_602343 -- jump -95 bytes
    // eb19 -> eb9f

    // #0 -- 13 byte pad (11 of 13 used):
    // 602343: xor ecx,ecx (2 byte instruction)
    // 602345: inc ecx (1 byte instruction)
    // 602346: mov [0x9e871c], ecx (6 byte instruction)
    // 60234c: jmp short loc_6023bd -- jump +113 bytes (2 byte instruction)
    // 0000 00 000000000000 0000 -> 31c9 41 890d1c879e00 eb6f

    // -------------------------------------------
    // executed mempatches on ProcessList::advanceServerTime
    // -------------------------------------------
    memPatch("602343", "31c941890d1c879e00eb6f");
    memPatch("6023a2", "eb9f");
}

function timetick_fix_advanceObjects2() {
    // 7 byte pad at 602859
    // 15 byte pad at 6028a1
    // 13 byte pad at 6028c3
    // 10 byte pad at 6028e6
    // 12 byte pad at 602b04 (out of short jump range)

    // clear guard:
    // xor eax,eax (2 byte instruction)
    // mov [0x9e871c], eax (5 byte instruction)
    // 31c0 a31c879e00

    // check guard:
    // mov eax,[0x9e871c] (5 byte instruction)
    // test eax,eax (2 byte instruction)
    // a11c879e00 85c0

    // -------------------------------------------
    // implementation of the variable guarded NullMove insertion
    // -------------------------------------------
    // function patch (2 of 2 used):
    // 60280b: jmp short loc_602859 -- jump +78 bytes (2 byte instruction)
    // eb1e -> eb4c

    // #1 -- 7 byte pad (6 of 7 used):
    // 602859: jmp short loc_6028a1 -- jump +72 bytes (2 byte instruction)
    // 60285b: jmp short loc_60282b -- jump -48 bytes (2 byte instruction)
    // 60285d: jmp short loc_60281f -- jump -62 bytes (2 byte instruction)
    // 0000 0000 0000 -> eb46 ebce ebc0

    // #2 -- 15 byte pad (9 of 15 used):
    // 6028a1: xor eax,eax (2 byte instruction)
    // 6028a3: cmp eax, [ebp+var_10] (3 byte instruction)
    // 6028a6: jnz short loc_60285b -- jump -74 bytes (2 byte instruction, proxy jump to loc_60282b)
    // 6028a8: jmp short loc_6028c3 (2 byte instruction, jump to pad #3)
    // 0000 000000 0000 0000 -> 31c0 3b45f0 75b3 eb19

    // #3 -- 13 byte pad (13 of 13 used):
    // 6028c3: mov eax,[0x9e871c] (5 byte instruction)
    // 6028c8: test eax,eax (2 byte instruction)
    // 6028ca: jz short loc_60285b (2 byte instruction, proxy jump to loc_60282b)
    // 6028cc: jmp short loc_6028e6 (2 byte instruction, jump to pad #4)
    // 6028ce: jmp short loc_60285d (2 byte instruction, proxy jump to loc_60281f)
    // 0000000000 0000 0000 0000 0000 -> a11c879e00 85c0 748f eb18 eb8d

    // #4 -- 10 byte pad (8 of 10 used)
    // 6028e6: mov eax, [ebp+var_2A0] (6 byte instruction)
    // 6028ec: jmp short loc_6028ce -- (2 byte instruction, proxy jump to loc_60285d->loc_60281f)
    // 000000000000 0000 -> 8b8560fdffff ebe0

    // -------------------------------------------
    // implementation of the guard variable clearing, and end of function
    // -------------------------------------------
    // function patch (5 of 5 used):
    // 602854: jmp dword 0x602b04 (5 byte instruction)
    // 5f 5e 5b 5d c3 -> e9ab020000

    // #5 -- 12 byte pad (12 of 12 used):
    // 602b04: xor eax,eax (2 byte instruction)
    // 602b06: mov [0x9e871c], eax (5 byte instruction)
    // 602b0b: pop edi (1 byte instruction)
    // 602b0c: pop esi (1 byte instruction)
    // 602b0d: pop ebx (1 byte instruction)
    // 602b0e: pop ebp (1 byte instruction)
    // 602b0f: retn (1 byte instruction)
    // 0000 0000000000 00 00 00 00 00 -> 31c0 a31c879e00 5f 5e 5b 5d c3

    // -------------------------------------------
    // executed mempatches on ProcessList::advanceObjects
    // -------------------------------------------
    // Pads 1, 2, 3, 4, 5
    memPatch("602859", "eb46ebceebc0");
    memPatch("6028a1", "31c03b45f075b3eb19");
    memPatch("6028c3", "a11c879e0085c0748feb18eb8d");
    memPatch("6028e6", "8b8560fdffffebe0");
    memPatch("602b04", "31c0a31c879e005f5e5b5dc3");
    // Function patches 1, 2
    memPatch("60280b", "eb4c");
    memPatch("602854", "e9ab020000");
}
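And here, in the same spirit, is a source-level sketch of what patch 2 is doing. The variable name is hypothetical (the binary patch uses a spare data-segment dword at 0x9e871c), the signatures are abbreviated, and the bodies the patch doesn't touch are elided:

static U32 gNullMoveArmed = 0;   // lives at 0x9e871c in the actual patch

void ProcessList::advanceServerTime(/* ... */)
{
   // patched entry (pad at 0x602343): arm the flag before objects are advanced
   gNullMoveArmed = 1;
   // ... original body, which calls advanceObjects() (possibly more than once
   // when catching up) ...
}

void ProcessList::advanceObjects()
{
   // ... original loop; inside the controlled-object branch, the guarded
   // NullMove insertion becomes:
   //
   //    if (!numMoves && gNullMoveArmed)
   //       obj->processTick(0);
   //
   // ... rest of the loop unchanged ...

   // patched epilogue (pad at 0x602b04): disarm, so catch-up passes within the
   // same advance don't hand out extra NullMove ticks
   gNullMoveArmed = 0;
}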
Both of these patches produce the same result: timescale is ineffective (timescaling players will advance forward in time at the correct rate); however, due to ordinary network latency fluctuations, non-timescaling players will experience jitter and slips because their movement commands aren't evenly bucketed into each server simulation timestep. Feel free to load either of these patches onto a server (run the functions containing the calls to memPatch) to experience what I'm talking about in more detail.
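To make the bucketing effect concrete, here's a toy calculation (not engine code; the delays are made up). A client emits one move per 32 ms step, but each packet sees a slightly different one-way delay, so some server ticks receive two moves and others receive none -- which is exactly the jitter/slip behavior these patches expose:

#include <cstdio>

int main()
{
    const int tickMs = 32;                    // T2 server timestep
    const int delayMs[10] = {                 // hypothetical one-way delays (ms)
        40, 44, 70, 38, 41, 72, 36, 45, 39, 42
    };

    int movesPerTick[12] = {0};
    for (int i = 0; i < 10; ++i) {
        int sentAt    = i * tickMs;           // client sends one move per step
        int arrivesAt = sentAt + delayMs[i];  // arrival time at the server
        movesPerTick[arrivesAt / tickMs]++;   // bucketed into a server tick
    }

    // with these delays, ticks 3 and 6 get 0 moves while ticks 4 and 7 get 2
    for (int t = 1; t <= 10; ++t)
        std::printf("server tick %2d: %d move(s)\n", t, movesPerTick[t]);
    return 0;
}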
Likewise, both of these patches make use of inter-function padding as code caves. Machine code patches like these are pretty horrifying to read or write, but I thought I could get away with only writing the first one.
Comments
I got as far as enumerating a large region of free code space (so that the code cave tricks inside the inter-function padding would be unnecessary), and writing the patches that increased the size of the memory allocation for GameConnection objects by 8 bytes. These are actually the only technically challenging aspects beyond what was already learned from patch 1.
This listing also contains the disassembly dump from IDA Pro of the relevant Torque network function as manifested in Tribes 2.
The incomplete version of patch 3:
If you'd like to pursue the completion of patch 3, I wish you good luck. There's nothing too hard about it if you're familiar with x86 assembly programming and have enough tenacity to write this kind of meticulously boring patch.
http://web.archive.org/web/20060528030757/http://www.garagegames.com/articles/networking1/
Server particulars here:
http://web.archive.org/web/20070109164100/http://www.garagegames.com/articles/networking1/simulationlayer.html
T2 has an enhanced version of the T1 netcode.
Ideal cross-country network jitter will be somewhere around 16 ms. T2 has an engine timestep of 32 ms. If an event is not in the queue by the time a step is processed, it will be delayed until the next step. Assuming no additional buffer bloat between network and event processing, you're looking at up to 48 ms of effective jitter just on the client->server messaging leg, and comparable additional jitter on the return path (or up to 96 ms total). Depending on when your event messages hit the server, you could experience a tenth of a second of deviation in where that event will actually be realized in the authoritative simulation and propagated back to you. I suspect it's actually worse than this on the server->client messaging leg: because the server uses a priority queue around simulation update messaging, there's the possibility of slipping another simulation cycle (~128 ms jitter).
So, I suspect at best, you can expect an event to land predictably only within a 1/8th of a second window in the authoritative simulation. This is probably why hitting people with hitscan laser weapons is so difficult.
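The arithmetic behind that 1/8 s figure, as a sketch (these are the assumed numbers from above, not measurements):

#include <cstdio>

int main()
{
    const int networkJitterMs = 16;  // assumed one-way cross-country jitter
    const int serverTickMs    = 32;  // T2 engine timestep

    // client -> server: network jitter plus up to one full tick of bucketing
    // delay if the move misses the step it was aimed at
    int oneLeg = networkJitterMs + serverTickMs;      // 48 ms

    // comparable jitter again on the return path
    int roundTrip = 2 * oneLeg;                       // 96 ms, ~1/10 s

    // plus possibly one more simulation cycle of slip if the update loses out
    // in the server's priority queue for state updates
    int pessimistic = roundTrip + serverTickMs;       // 128 ms, ~1/8 s

    std::printf("one leg: up to %d ms\n", oneLeg);
    std::printf("round trip: up to %d ms\n", roundTrip);
    std::printf("with priority-queue slip: up to %d ms\n", pessimistic);
    return 0;
}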
You might be able to use patch #2 to quantify the amount of engine processing jitter on a LAN (where ping and latency fluctuations are very small), because non-timescaling players will get "extra" move processing (and thus move further) predictably, in proportion to the number of simulation steps in which they have no move event queued. Make sure you run the game using HPET timer mode or lock it to a single CPU core on both ends, though; otherwise these effects will be even worse.
The protocol design was probably fine when everyone was on dialup with crappy pings, and it is a fairly secure protocol (outside of the issue that led to these experiments in the first place), but it has never worked particularly well at ensuring events land in real time with low variance. A lot of other game engines don't even bother to try to solve this problem intelligently, and instead place utter faith in clients to give them data about the state of the simulation when a particular action was taken -- Unreal Engine comes to mind as a particularly terrible example from the speedhack/security standpoint, but they do have better latency behavior.
From the Tribes networking whitepaper:
"1. Nonguaranteed data is data that is never retransmitted if lost.
2. Guaranteed data is data that must be retransmitted if lost, and delivered to the client in the order it was sent.
3. Most Recent State data is volatile data of which only the latest version is of interest.
4. Guaranteed Quickest data is data that needs to be delivered in the quickest possible manner."
This is figuring in an environment where there is packet loss and/or great delay, or where the server or client aren't allowed full bandwidth as coded into the game. Also, I've yet to see a T2 server that outputs 32 pps -- usually around 20 max as seen at a given client, even when set to 32 in serverprefs -- though they do fulfill the 450 byte packet size if set to do so. Also, the server will give an individual client whatever that client asks for as far as packet rates and sizes, up to the serverprefs limits of 32/450. I also remember reading somewhere that the server actually calculates gamestate at half the 32 ms step, meaning we get 16 pps of actual server processing, not 32. Maybe that was for some other Torque game, dunno. It's a mess, but it's the mess we love.