For those interested, here are some details on the steps I took to reverse engineer the CLE266 source code (http://www.ivor.it/cle266) in just a week.
You'll need a decent disassembler to work with. I use IDA-Pro (http://www.datarescue.com), which is the best disassembler I've ever seen. If you want to do any reverse engineering or code analysis, I strongly recommend you get a copy, or at least get an eval and try it out.
Firstly I needed to get a feel for the code, and work out what was involved. This was a simple matter of disassembling the library and browsing through it a few times.
Doing so it was possible to see the number and length of functions in the library. This made it clear that the library was a fairly thin API layer onto the underlying hardware. There were no particularly complicated functions and most appeared to simply be moving memory from one place to another.
This stage is no longer necessary with later versions of IDA Pro.
Ok, this was the least rewarding part of the process. There may be a better way to do this, but I didn't know it at the time. If you know what it is, tell me! :-)
The library is a shared object (.so) file which contains address relocation information. In the assembly there are jumps/addresses that contain null values, which result in disassembly such as:
push ds:0
loc_2D70: ; Call Procedure
call loc_2D70+1
mov ds:0, 0
Calling objdump -R libddmpeg.so shows us the relocation table for the library in which we can see:
00002d50 R_386_PC32 setbuffer
00002d71 R_386_PC32 fclose
00002d97 R_386_32 gBuf
So we can select the "call loc_2D70+1" and label that "call fclose".
The same applies to the data values "ds:0", again objdump -R libddmpeg.so shows us:
00002d6c R_386_32 gFlog
00002d77 R_386_32 gFlog
So we can now fix the assembly to be:
push gFlog
call fclose
mov gFlog,0
And the operation becomes clear...
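Why the unresolved call shows up as "call loc_2D70+1", and why that address matches the fclose entry, can be sketched in C. This is only a sketch under assumptions: that the call is the 5-byte E8 rel32 form, and that before relocation its operand holds the usual R_386_PC32 placeholder addend of -4. The table just restates the objdump -R entries quoted above; the names reloc_entry, call_target and reloc_symbol_at are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of the `objdump -R` output quoted above (illustrative type). */
struct reloc_entry {
    uint32_t    offset;  /* address of the operand that gets patched */
    const char *type;
    const char *symbol;
};

static const struct reloc_entry relocs[] = {
    { 0x2d50, "R_386_PC32", "setbuffer" },
    { 0x2d6c, "R_386_32",   "gFlog"     },
    { 0x2d71, "R_386_PC32", "fclose"    },
    { 0x2d77, "R_386_32",   "gFlog"     },
    { 0x2d97, "R_386_32",   "gBuf"      },
};

/* Target of a 5-byte `E8 rel32` call: next instruction address + rel32. */
uint32_t call_target(uint32_t insn_addr, int32_t rel32)
{
    return insn_addr + 5u + (uint32_t)rel32;
}

/* Return the symbol whose relocation patches `addr`, or NULL if none. */
const char *reloc_symbol_at(uint32_t addr)
{
    for (size_t i = 0; i < sizeof relocs / sizeof relocs[0]; i++)
        if (relocs[i].offset == addr)
            return relocs[i].symbol;
    return NULL;
}
```

Under those assumptions call_target(0x2D70, -4) gives 0x2D71, which is the address of the call's own operand, and reloc_symbol_at(0x2D71) gives "fclose", matching the manual relabelling.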
The next stage is to recode the assembly into C. Firstly function calls are identified in the assembly and a pseudo C file is written. So let's take a look at one of the nice simple functions:
public MPGCloseDebugFile
MPGCloseDebugFile proc near
push ebp
mov ebp, esp
sub esp, 8 ; Integer Subtraction
cmp gFlog, 0 ; Compare Two Operands
jz short loc_2D7F ; Jump if Zero (ZF=1)
sub esp, 0Ch ; Integer Subtraction
push gFlog
loc_2D70: ; Call Procedure
call fclose
mov gFlog, 0
loc_2D7F: ; CODE XREF: MPGCloseDebugFile+D↑j
mov esp, ebp
pop ebp
retn ; Return Near from Procedure
MPGCloseDebugFile endp
Which becomes:
MPGCloseDebugFile()
{
cmp gFlog, 0 ; Compare Two Operands
jz short loc_2D7F ; Jump if Zero (ZF=1)
sub esp, 0Ch ; Integer Subtraction
push gFlog
call fclose
mov gFlog, 0
loc_2D7F:
}
Now we can look at the function and easily see what the intention is, and turn it into:
void MPGCloseDebugFile()
{
if (gFlog)
{
fclose(gFlog);
gFlog = 0;
}
}
The remainder of the code is then processed in the same way. This now gives us something half decent to work with, although at the moment complex branches have been left in the C source, so we still have stuff like:
if (ebx <= 0) goto loc_36fd;
goto loc_373b;
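How a fragment like this eventually collapses into ordinary C can be sketched as follows. The bodies at the two labels are invented for illustration (the real code at loc_36fd and loc_373b is not shown here), and the function names are hypothetical; only the shape of the branch matters.

```c
/* First pass: branches copied straight from the disassembly. */
int branch_with_gotos(int ebx)
{
    int result;
    if (ebx <= 0) goto loc_36fd;
    goto loc_373b;
loc_36fd:
    result = 0;      /* hypothetical body */
    goto done;
loc_373b:
    result = ebx;    /* hypothetical body */
done:
    return result;
}

/* After restructuring: same behaviour written as a plain if/else. */
int branch_structured(int ebx)
{
    if (ebx <= 0)
        return 0;
    return ebx;
}
```

Both versions return the same value for every input, which is the property to check each time a goto is removed.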
Working through the code I add in "// CHECK" comments wherever I've got any doubts about the code.
Luckily the library from VIA comes with pretty comprehensive header files with plenty of structures defined. The next step was to get the parameters to the functions defined correctly, and then, by tracing how the parameters are used, to determine the data types for all internal variables.
Also some of the logic could be determined by decoding binary flag fields. For example:
push gVIAGraphicInfo
push 805476C3h
push fVideo
call ioctl
Can now be decoded into the rather more readable:-
ioctl(fVideo,
_IOR('v', //118
192+3,
VIAGRAPHICINFO), //0x805476C3
&gVIAGraphicInfo )
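The decoding of 805476C3h can be reproduced with a few shifts and masks. The sketch below assumes the standard Linux request-number layout used on x86 (8 bits of number, 8 bits of type, 14 bits of size, 2 bits of direction); the struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Field layout of a Linux ioctl request number on x86:
 * bits 0-7 nr, bits 8-15 type, bits 16-29 size, bits 30-31 direction. */
struct ioctl_fields {
    unsigned dir;   /* 2 means read, i.e. the _IOR() form */
    unsigned size;  /* size in bytes of the argument structure */
    unsigned type;  /* the "magic" character */
    unsigned nr;    /* command number */
};

/* Split a raw request value into its four fields. */
struct ioctl_fields decode_ioctl(uint32_t request)
{
    struct ioctl_fields f;
    f.nr   =  request        & 0xFFu;
    f.type = (request >> 8)  & 0xFFu;
    f.size = (request >> 16) & 0x3FFFu;
    f.dir  = (request >> 30) & 0x3u;
    return f;
}
```

For 0x805476C3 this gives direction 2 (read), type 0x76 (the character 'v', decimal 118), number 0xC3 (192+3) and size 0x54, which would make VIAGRAPHICINFO 84 bytes long if the layout assumption holds.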
Now's the time to start knocking the code into shape. With IDA-Pro you can generate flow charts for the more complicated routines that need structuring. I find it easiest to print out the flows on A3 (or multiple A3) pages, lay them out on the floor, then follow them through.
Here's an example: this is the chart for VIADisplayControl. As you can see, I've been doodling in the bottom right corner what I think the C nesting is.
Ok, one of the easiest steps now. Start compiling and correcting/checking the code where variables need defining, or scope re-organising. I started by "#if 0"-ing the entire code and then adding each procedure in one at a time to work on.
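The one-procedure-at-a-time approach might look like the sketch below, which uses the #if 0 preprocessor form to hide everything that has not been worked on yet. Both function names and bodies are hypothetical.

```c
/* Already worked on: compiled normally. */
int checked_function(void)
{
    return 1;
}

#if 0   /* not yet worked on: skipped by the compiler */
int unchecked_function(void)
{
    eax = [ebx+4];   /* leftover pseudo code that would not compile */
}
#endif
```

Moving the #if 0 line down past each function as it is cleaned up keeps the file compiling at every stage.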
The next step is verifying the code function by function, for example to make sure the pointer arithmetic looks sensible. Where possible, I compare the compiled assembly code for each function with the disassembled library code, which tells me that my code is very close to the original source code, and I re-order 'if' statements as necessary.
Check the code makes sense now, and that the right variables are defined static. At this point some of the pointer arithmetic got fixed and tidied up. For example:
mov eax, lpMPGDevice
mov eax, [eax+84h]
shr eax, 3 ; Shift Logical Right
mov edx, lpMPEGMMIO
mov [edx+2Ch], eax
which in step 3 was temporarily coded to:
lpMPEGMMIO->0x2C = lpMPGDevice->0x84 >> 3; //CHECK
then in step 6 to:
*(unsigned long*)(lpMPEGMMIO+0x2c) = *(unsigned long*)(lpMPGDevice+0x84)>>3;
now becomes:
*(unsigned long*)(lpMPEGMMIO+0x2c) = lpMPGDevice->dwMPEGYPhysicalAddr[1] >> 3;
Where the casting was tidied and the PhysicalAddr array subscripts filled in.
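Matching a raw offset such as 0x84 to a named structure member can be checked mechanically with offsetof. The structure below is a hypothetical stand-in, not the layout from the VIA headers: the only detail taken from the text is that dwMPEGYPhysicalAddr[1] sits at byte offset 0x84, and 32-bit elements are assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the real device structure: only the placement
 * of dwMPEGYPhysicalAddr[1] at byte offset 0x84 comes from the text. */
struct mpg_device_sketch {
    uint8_t  reserved[0x80];          /* fields not modelled here */
    uint32_t dwMPEGYPhysicalAddr[2];  /* [0] at 0x80, [1] at 0x84 */
};

/* Raw form: read 32 bits at a byte offset, as in the early translation. */
uint32_t read_u32_at(const void *base, size_t offset)
{
    uint32_t value;
    memcpy(&value, (const uint8_t *)base + offset, sizeof value);
    return value;
}

/* Byte offset of dwMPEGYPhysicalAddr[1] within the sketch structure. */
size_t y_addr1_offset(void)
{
    return offsetof(struct mpg_device_sketch, dwMPEGYPhysicalAddr)
         + sizeof(uint32_t);
}
```

If y_addr1_offset() equals 0x84, then read_u32_at(dev, 0x84) and dev->dwMPEGYPhysicalAddr[1] name the same memory, and the raw cast can be replaced by the member access.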
Finally, start running the code.
Boom, machine crash.
Fix bugs.
Boom.
A few bugs were spotted and fixed....
and then a garbled picture...
a few more bugs spotted...
then a picture but didn't update until the mouse moved... a few more bugs...
then....
PING. Moving video!
Many thanks to Hugo Mills and Michiel van Noort for spotting my typos in the source code.
Regards,
Ivor.
[www.ivor.it | www.ivor.org | www.difo.com ]