Magic Lantern Firmware Wiki

ARM firmware analysis console

This will contain all my firmware analysis scripts which are now floating around.

Download

Git repo [1]:

git clone https://github.com/alexdu/ARM-console.git

Zip:

wget http://github.com/alexdu/ARM-console/zipball/master

Preparing to run

Requirements

  • Python (I use 2.6 under Linux, but it should run under any major operating system)
  • some Python libraries:
sudo apt-get install python python-dev python-scipy python-tk python-profiler graphviz libpng12-dev
sudo apt-get install python-setuptools python-matplotlib
sudo easy_install pydot easygui cheetah ahocorasick profilestats
  • IPython 0.10 (0.10.2 works). Versions 0.11 and 0.12 are not compatible with the current ARM-console, so downgrade if you have a newer one.
  • sympy 0.6.7. Sympy 0.7.1 causes problems when decompiling.
  • arm-elf-gcc in your PATH (see Build instructions/550D for how to do that)
  • at least 4 GB of RAM (or skills to optimize the script).

(The memory is the price of speed: my scripts run 10-100 times faster than in IDAPython because I've cached lots of stuff in Python dictionaries.)
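Since only IPython 0.10.x is reported to work, a start-up check can fail early with a readable message instead of a confusing traceback. The sketch below is not part of ARM-console; `ipython_is_supported` is a hypothetical helper that only compares the major and minor version numbers.

```python
def ipython_is_supported(version):
    """Hypothetical check: True only for IPython 0.10.x (e.g. "0.10", "0.10.2")."""
    parts = version.split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        return False  # unparsable version string: treat as unsupported
    return (major, minor) == (0, 10)

# Try the versions mentioned in the requirements list.
for v in ["0.10", "0.10.2", "0.11", "0.12"]:
    status = "ok" if ipython_is_supported(v) else "not compatible"
    print("{0}: {1}".format(v, status))
```

In a real session the version string would typically come from IPython.__version__.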

Step by step setting up on OS X 10.6.7 (by coutts). *** Confirmed working on 10.6 & 10.7 ***

Input and source files

Prepare a working directory where you will put the input files. You will need:

  • Some dumps, with the .bin extension. Include the load address in the dump name.
  • Some databases, in IDC or Stubs (*.S) format. Try to give them names similar to the dumps, to help the autodetection.
  • Unzip the scripts in the same folder

Example of contents of the working folder:

scripts <dir>
main.py
README.md

autoexec.0x8A000.bin                       [2]
5d2.204.0xff810000.bin
550d.108.0xff010000.bin
550d.109.0xff010000.bin

5d2.204.AJ.idc
550d.108.20101116_indy_ROM0.idc

autoexec.S
stubs-5d2.204.S                            [3]
stubs-550d.108.S                           [4]
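To illustrate why the load address belongs in the dump name, here is a minimal sketch of how it can be recovered from names like the ones listed above. `load_address_from_name` is a hypothetical helper, not the function ARM-console actually uses; it simply takes the first 0x... token it finds.

```python
import re

def load_address_from_name(filename):
    """Hypothetical helper: return the load address embedded in a dump name, or None."""
    m = re.search(r"0x([0-9a-fA-F]+)", filename)
    return int(m.group(1), 16) if m else None

# File names taken from the example working folder.
for name in ["autoexec.0x8A000.bin", "5d2.204.0xff810000.bin"]:
    print("{0}: {1:X}".format(name, load_address_from_name(name)))
```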

Running in interactive mode

Start the program with:

python main.py

and you should get this prompt:

ARM firmware analysis console ready.
In [1]:

This is the IPython prompt; here you can browse the dumps, find/verify matches between firmware versions, and lots of other cool stuff.

If you are new to IPython, be sure to skim this tutorial:

http://ipython.scipy.org/doc/nightly/html/interactive/tutorial.html

Hex numbers

Python uses decimal format by default. If you know how to change it to hex for integers, please leave a message. Until then, you'll have to use these:

In [1]: hex(100)
Out[1]: 64

In [2]: hex(-1)
Out[2]: FFFFFFFF

In [3]: int("babe", 16)
Out[3]: 47806
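Note that the output above differs from Python's built-in hex(), which returns '0x64' and '-0x1' for these inputs, so the console evidently provides its own version. A minimal sketch of that kind of formatting (unsigned 32-bit, no prefix) follows; `hex32` is a name made up for this example and not necessarily how the console implements it.

```python
def hex32(n):
    """Format an integer as unsigned 32-bit uppercase hex, without the 0x prefix."""
    return "{0:X}".format(n & 0xFFFFFFFF)  # the mask wraps negative numbers

print(hex32(100))
print(hex32(-1))
print(int("babe", 16))  # the reverse direction uses the built-in int()
```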

Loading the dumps

You can select the dumps to load with a regex:

In [4]: D = load_dumps("(108|204|autoexec)")
===============================================================================
           Binary dump (*.bin)     LoadAddr     IDC database (*.idc)    
===============================================================================
       550d.108.0xff010000.bin     FF010000     550d.108.20101116_indy_ROM0.idc
          autoexec.0x8A000.bin        8A000     n/a
        5d2.204.0xff810000.bin     FF810000     5d2.204.AJ.idc
===============================================================================
...

In [5]: D
Out[5]: 
[Dump of 550d.108.0xff010000.bin,
 Dump of 5d2.204.0xff810000.bin,
 Dump of autoexec.0x8A000.bin]
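The selection step can be pictured as a regex search over the .bin file names, followed by a sort. This is only a sketch under that assumption; `select_dumps` is a hypothetical stand-in and the real load_dumps may filter differently.

```python
import re

def select_dumps(filenames, pattern):
    """Hypothetical: keep the *.bin names that match the regex, sorted by file name."""
    return sorted(f for f in filenames
                  if f.endswith(".bin") and re.search(pattern, f))

# Contents of the example working folder (subset).
files = ["autoexec.0x8A000.bin", "5d2.204.0xff810000.bin",
         "550d.108.0xff010000.bin", "550d.109.0xff010000.bin",
         "5d2.204.AJ.idc", "autoexec.S"]
print(select_dumps(files, "(108|204|autoexec)"))
```

With the pattern from the session above, the 550d.109 dump is left out and the three remaining dumps come back in the same order as in Out[5].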

You will want to assign a short name to each dump. Hint: they are sorted by the .bin file name.

In [6]: t2i, mk2, ml = D

Or, if you loaded only a single binary file:

In [6]: ml = D[0]

The script will auto-detect IDC files with similar filenames, and load some info from them.

You'll have to load stubs (*.S) files manually:

In [7]: ml.load_names("stubs-550d.108.S")
Found 80 stubs in stubs-550d.108.S.

In [8]: ml.load_names("autoexec.S")
Found 8 stubs in autoexec.S.
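For readers unfamiliar with stubs files, the sketch below shows one way to read NSTUB(address, name) entries into a dictionary. It assumes the entry format shown in the mychanges.S example further down this page; `parse_stubs` is a hypothetical function and not the parser behind load_names.

```python
import re

# One stub per entry: NSTUB(0xADDRESS, name), with optional spaces.
STUB_RE = re.compile(r"NSTUB\(\s*(0x[0-9A-Fa-f]+)\s*,\s*(\w+)\s*\)")

def parse_stubs(text):
    """Return {address: name} for every NSTUB entry found in text."""
    return dict((int(addr, 16), name) for addr, name in STUB_RE.findall(text))

sample = """
NSTUB(0xFF0673EC, DebugMsg)
NSTUB( 0xFF06AFC0 , MEM_GetSizeOfMaxRegion )
// lines without a stub are ignored
"""
stubs = parse_stubs(sample)
print("Found {0} stubs.".format(len(stubs)))
```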

Automatic function guess

This will try to find function calls and identify functions inside the firmware. Experimental, but should be harmless. I prefer to run it before generating the HTML.

In [9]: guessfunc.run(ml)
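One plausible ingredient of such a guess is collecting the targets of BL (branch with link) instructions, since each target is likely the start of a function. The sketch below decodes the standard ARM-mode BL encoding; it is not the actual guessfunc code, and `bl_target` is a name invented for this example. The two instruction words come from the disassembly listing shown later on this page.

```python
def bl_target(addr, insn):
    """Return the target of an ARM-mode BL at addr, or None if insn is not a BL."""
    if (insn >> 28) == 0xF:           # condition 1111 is not a conditional BL
        return None
    if (insn >> 24) & 0x0F != 0x0B:   # bits 27..24 must be 1011 for BL
        return None
    imm24 = insn & 0x00FFFFFF
    if imm24 & 0x800000:              # sign-extend the 24-bit word offset
        imm24 -= 0x1000000
    # In ARM mode the PC reads as the instruction address + 8.
    return (addr + 8 + (imm24 << 2)) & 0xFFFFFFFF

for addr, insn in [(0xff0534c4, 0xeb005a69), (0xff0534d8, 0x1bff00cf)]:
    print("{0:x}: calls {1:x}".format(addr, bl_target(addr, insn)))
```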

Browsing the firmware: HTML

Run this to create browsable HTML like this example:

In [10]: html.quick(ml)

In [11]: html.quick(t2i)

and when it's ready, open index.html in a WebKit-based browser (Firefox is too slow, sorry!)

If you want a more thorough analysis of the firmware, like this one, run:

In [12]: html.full(ml)

A full analysis of the ML firmware takes 1-2 minutes. The same analysis of the 550D firmware takes around 1 day, or less if you help me optimize the algorithms :)

If you can leave the computer on for a week, just run this to analyze all your dumps:

In [13]: html.full(D)


!!! THE HTML FILES WILL CONTAIN CANON COPYRIGHTED MATERIAL !!!

!!! DO NOT SHARE THEM WITH ANYONE !!!

(see the FAQ for details: [5])

Of course, if you disassemble the Magic Lantern firmware (autoexec.bin), no Canon code will be in the output files.

Browsing the firmware: plain text

If you prefer to browse the disassembly in your favorite text editor, just export the disassembly to a file:

In [14]: t2i.save_disasm("550d.108.dis")
Saving disassembly to 550d.108.dis...

The format is somewhat similar to the one obtained with disassemble.pl from CHDK (it uses objcopy/objdump).

Main advantage over HTML: easy full-text search.
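If you would rather stay in Python than switch to an editor, the same full-text search can be sketched in a few lines. `grep_disasm` is a hypothetical helper, and the sample lines are copied from the disassembly listing on this page.

```python
import re

def grep_disasm(lines, pattern):
    """Return the disassembly lines that match a case-insensitive regex."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [ln.rstrip("\n") for ln in lines if rx.search(ln)]

sample = [
    "ff0534c0:\te3a01000 \tmov\tr1, #0\t; 0x0\n",
    "ff0534c4:\teb005a69 \tbl\t@TakeSemaphore\t\n",
    "ff0534d8:\t1bff00cf \tblne\t@assert_0\t\n",
]
for hit in grep_disasm(sample, "takesemaphore"):
    print(hit)

# With a real export, pass the open file instead of a list:
#   with open("550d.108.dis") as f:
#       hits = grep_disasm(f, "takesemaphore")
```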

Browsing the firmware: IPython console

First, select a dump:

In [15]: sel t2i

For quick browsing, use the g magic command, which works somewhat like the G key in IDA:

In [16]: g 0xff053490+40
ff0534b8:	e3a05000 	mov	r5, #0	; 0x0
ff0534bc:	e5940058 	ldr	r0, [r4, #88]
ff0534c0:	e3a01000 	mov	r1, #0	; 0x0
ff0534c4:	eb005a69 	bl	@TakeSemaphore	
ff0534c8:	e3100001 	tst	r0, #1	; 0x1
ff0534cc:	159f2108 	ldrne	r2, [pc, #264]	; 0xff0535dc: pointer to 0x4ce
ff0534d0:	128f10dc 	addne	r1, pc, #220	; *'SoundDevice\\SoundDevice_CODEC.c'
ff0534d4:	128f0f41 	addne	r0, pc, #260	; *'!IS_ERROR( TakeSemaphore( m_hSemTask, FOREVER ))'
ff0534d8:	1bff00cf 	blne	@assert_0	
ff0534dc:	e1d400d0 	ldrsb	r0, [r4]

In [17]: g DebugMsg
// Start of function: DebugMsg
NSTUB(DebugMsg, ff0673ec):
ff0673ec:	e92d000f 	push	{r0, r1, r2, r3}
ff0673f0:	e92d41f0 	push	{r4, r5, r6, r7, r8, lr}
ff0673f4:	e59f812c 	ldr	r8, [pc, #300]	; 0xff067528: pointer to 0x2b6c
ff0673f8:	e1a04001 	mov	r4, r1
ff0673fc:	e5981000 	ldr	r1, [r8]
ff067400:	e24dd088 	sub	sp, sp, #136	; 0x88
ff067404:	e3510000 	cmp	r1, #0	; 0x0
ff067408:	135000ff 	cmpne	r0, #255	; 0xff
ff06740c:	1591200c 	ldrne	r2, [r1, #12]
ff067410:	13520000 	cmpne	r2, #0	; 0x0

You may search for strings using a regex:

In [18]: s purple
finding strings...
ff56c718: 'purple'
ff56c5bc: 'mediumpurple'

In [19]: s (mvr|set).*filter
ff541dd0: '***** DlgMnPictureStyleDetail.c SetDataToStorage IDC_DPM_FILTER(%d)'
ff541ee0: '***** DlgMnPictureUserDetail.c SetDataToStorage IDC_DPM_FILTER(%d)'
ff064cbc: 'SetFilterRec'
ff064f98: 'SetFilterOff'
ff1aa558: 'mvrSetDeblockingFilter (alpha = %d, beta = %d)'
ff1aab24: 'mvrSetDefDBFilter (A = %d, B = %d)'
ff1aabd4: 'mvrSetDeblockingFilter'
ff1aacd8: 'mvrSetDefDBFilter'
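The matches above ignore case (the lowercase pattern "set" finds 'SetFilterRec'). A string search of this kind can be sketched as: scan the raw dump for runs of printable ASCII, then filter them with the regex and add the load address to each offset. `find_strings` and the tiny fake ROM below are made up for illustration; the console's own implementation may differ.

```python
import re

def find_strings(data, load_addr, pattern, min_len=4):
    """Return (address, text) for printable-ASCII runs in data that match pattern."""
    printable = re.compile(("[\\x20-\\x7e]{%d,}" % min_len).encode("ascii"))
    wanted = re.compile(pattern, re.IGNORECASE)
    hits = []
    for m in printable.finditer(data):
        text = m.group().decode("ascii")
        if wanted.search(text):
            # offset in the dump + load address = address in the firmware
            hits.append((load_addr + m.start(), text))
    return hits

# Synthetic bytes standing in for a dump; "ab" is too short to count as a string.
rom = b"\x00\x01purple\x00\xffmediumpurple\x00ab\x00"
for addr, text in find_strings(rom, 0xFF56C000, "purple"):
    print("{0:x}: {1!r}".format(addr, text))
```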

Or search for references to some names / values:

In [20]: r additional_version
GUI_GetFirmVersion+24:
ff20cbc8:	e59f019c 	ldr	r0, [pc, #412]	; 0xff20cd6c: pointer to 0x15094 (additional_version)
0x15094 (additional_version)

GUI_GetFirmVersion+72:
ff20cbf8:	e59f116c 	ldr	r1, [pc, #364]	; 0xff20cd6c: pointer to 0x15094 (additional_version)
0x15094 (additional_version)

sub_FF1FBA24+14344:
ff1ff22c:	159f0118 	ldrne	r0, [pc, #280]	; 0xff1ff34c: pointer to 0x15094 (additional_version)
0x15094 (additional_version)

In [21]: r 0x1234
sdSetRelativeAddress+28:
ff3f229c:	e59f01f0 	ldr	r0, [pc, #496]	; 0xff3f2494: pointer to 0x1234
0x1234
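The "pointer to" comments can be checked by hand. In ARM mode the PC reads as the instruction address plus 8, so `ldr r0, [pc, #412]` at ff20cbc8 loads from ff20cbc8 + 8 + 412 = ff20cd6c, and the word stored there is the referenced value. The sketch below redoes that arithmetic. The helper names are invented, the dump bytes are synthetic, and little-endian byte order is an assumption.

```python
import struct

def ldr_literal_address(insn_addr, offset):
    """Address read by `ldr rX, [pc, #offset]` in ARM mode (PC = insn_addr + 8)."""
    return (insn_addr + 8 + offset) & 0xFFFFFFFF

def read_word(data, load_addr, addr):
    """Read the 32-bit little-endian word at addr from a dump loaded at load_addr."""
    return struct.unpack_from("<I", data, addr - load_addr)[0]

# The two references to additional_version listed above share one literal.
print("{0:x}".format(ldr_literal_address(0xff20cbc8, 412)))
print("{0:x}".format(ldr_literal_address(0xff20cbf8, 364)))

# Synthetic 12-byte "dump" loaded at 0x1000 with the value 0x15094 at 0x1008.
fake_dump = b"\x00" * 8 + struct.pack("<I", 0x15094)
print("{0:x}".format(read_word(fake_dump, 0x1000, 0x1008)))
```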

From now on, TAB completion and quick help are your friends:

t2i.<TAB>
funcs           refs            strings            strrefs     ...etc...
In [22]: t2i?
Base Class:	scripts.disasm.Dump
Docstring:
    Contains all the info about a dump: ROM contents, disassembly, references, function names...
...

Most functions that output lots of text (like disasm, strings, refs) can display their output in a codebox (from easygui). To enable that, just pass gui=1 as the last argument:

In [23]: t2i.refs("sounddev", gui=1)
[Image: Refs-sounddev]

If you want the gui boxes enabled by default, edit disasm.py (you'll find the setting there).

Annotating addresses in the firmware

You can use some functions whose names are inspired from IDAPython / IDC:

In [24]: t2i.MakeName(0xFF06AFC0, "MEM_GetSizeOfMaxRegion")

To delete a name, pass None (or an empty string "") instead of the name:

In [25]: t2i.MakeName(0x4, None)
Deleting name 4 -> GUI_GetMWBCaption

To create a function, you can specify the start address and let it guess the end address:

In [26]: t2i.MakeFunction(0xFF06A0F4)
Size: 72

Of course, you can specify both the start and end addresses:

In [27]: t2i.MakeFunction(0xFF06A0F4, 0xFF06A138)

Right now, things may go wrong if you try to remove a function or to change an existing one, so... don't!

After you annotate some addresses in the firmware, you may want to see the new names in the HTML version. Just run:

In [28]: html.update(t2i)

It will (try to) update only the files which reference the newly annotated addresses.

Loading and saving names

If you want to load some names from another file, other than the auto-guessed ones, use this:

In [29]: t2i.load_names("stubs-550d.108.S")
Found 80 stubs in stubs-550d.108.S.
Overwriting name prop_cleanup
Overwriting name free
...

You can also pass an IDC file (it's autodetected).

If you want to export your names, use:

In [30]: t2i.save_names("mynames.S")
Saved 56800 names out of 56800.

In [31]: t2i.save_names("mynames.idc")
Saved 56800 names out of 56800.
Deleted 1 names.

What if you want to export only your changes? No problem:

In [32]: t2i.save_new_names("mychanges.S")
Saved 1 names out of 56800.

In [33]: t2i.save_new_names("mychanges.idc")
Saved 1 names out of 56800.
Deleted 1 names.
In [34]: cat mychanges.idc
#include <idc.idc>
static main() {
  MakeName(0xFF06AFC0, MEM_GetSizeOfMaxRegion)
  MakeName(0x4, "")
}

In [35]: cat mychanges.S
NSTUB(0xFF06AFC0, MEM_GetSizeOfMaxRegion)

It will save only the names which were not loaded from a file. Deleted names are only saved in IDC format.
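To make the two export formats concrete, here is a small sketch that writes the same names both ways. The function names are hypothetical and this is not the save_names code. Unlike the listing above, the IDC lines here quote the name and end with a semicolon, which is what IDC syntax normally expects. An empty name stands for a deletion and is therefore left out of the stubs output.

```python
def format_stubs(names):
    """names: {address: name}. Deleted names ("") cannot be expressed as stubs."""
    return "\n".join("NSTUB(0x{0:X}, {1})".format(addr, names[addr])
                     for addr in sorted(names) if names[addr])

def format_idc(names):
    """names: {address: name}. An empty name deletes the name at that address."""
    lines = ["#include <idc.idc>", "static main() {"]
    for addr in sorted(names):
        lines.append('  MakeName(0x{0:X}, "{1}");'.format(addr, names[addr]))
    lines.append("}")
    return "\n".join(lines)

changes = {0xFF06AFC0: "MEM_GetSizeOfMaxRegion", 0x4: ""}
print(format_stubs(changes))
print(format_idc(changes))
```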

Functions can't be exported yet, so for now it's better to use IDA for this. The demo version of IDA can import/export IDC files.

Matching functions and addresses between different firmware versions

See GPL Tools/match.py.

NumPy'ing the firmware

If you like the idea of doing numerical analysis on the camera's firmware, then this may be for you.

If you already know Matlab or Octave, take a look here: http://www.scipy.org/NumPy_for_Matlab_Users

Let's try a histogram of the values referenced in the code:

In [36]: r = array([a[1] for a in t2i.REFLIST])

In [37]: hist(r, 100)

In [38]: show()
[Image: Hist-refs]

There are two big peaks, and we can't see anything else next to them. Let's try a log histogram:

In [39]: cla()

In [40]: hist(r, 100, log=1)
[Image: Hist-refs-log]

Let's zoom in a bit:

In [41]: slice = r[(r>1000) & (r < 10000)]

In [42]: cla()

In [43]: hist(slice, 100)
[Image: Hist-smallrefs]

There are some peaks: they seem to be at 1024, 2048, 4096 and 8192 (since those are round numbers). Let's look at them:

In [44]: b = bincount(slice.astype(int32))

In [45]: o = argsort(-b)

In [46]: o[:10]
Out[46]: array([2048, 1024, 8192, 4096, 6464, 2112, 1104, 2080, 1280, 1776])

The next peak after those round numbers is 6464=0x1940. What could this be?

In [47]: t2i.refs(6464, gui=1)
[Image: Refs-0x1940]

Want to see more lines before and after each reference?

In [48]: t2i.refs(6464, context=5, gui=1)
[Image: Refs-0x1940-context]

So if you can figure out from this what 0x1940 is, you are a genius!
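The ranking step above (bincount followed by argsort) amounts to counting how often each value occurs and listing the most frequent ones first. For readers without NumPy at hand, the same idea can be sketched with the standard library. `top_values` is a made-up helper and the reference list below is synthetic, not taken from a real dump.

```python
from collections import Counter

def top_values(values, lo, hi, n=10):
    """Return the n most frequent values v with lo < v < hi, most frequent first."""
    counts = Counter(v for v in values if lo < v < hi)
    return [v for v, _ in counts.most_common(n)]

# Synthetic referenced values: 500 and 20000 fall outside the zoomed range.
refs = [2048] * 5 + [1024] * 4 + [8192] * 3 + [6464] * 2 + [500] * 9 + [20000] * 7
print(top_values(refs, 1000, 10000, 3))
```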

Running in non-interactive mode

Don't like the interactive mode? Start from "main.py" and create your own scripts. For example:

from scripts import *
D = load_dumps()
print D

Save it as myscript.py and run it like a normal Python script:

/home/user$ python myscript.py

Hint: when debugging, try to test your script with a smaller dump, like autoexec.bin.

API Reference

Don't miss this if you really want to use the script :)

What not to do

  • Do not publish files which contain copyrighted code! (from Canon or from any other third party). If you do, you'll cause lots of trouble to the Magic Lantern community.
  • Do not load too many dumps at once! The script is VERY memory hungry, and IT CAN CRASH LINUX IN SECONDS!!! If the system starts swapping, you'll have to reboot your machine! Or disable the swap (like I did), and instead, the script (or other memory-hungry program) will be killed when it asks for too much memory.
  • Do not change the working directory! The scripts use relative paths and won't find their required files.
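As an alternative to disabling swap, the memory available to the Python process itself can be capped, so that a runaway analysis typically stops with a MemoryError instead of dragging the whole machine into swapping. This is a sketch for Unix-like systems only (the resource module is not available on Windows); `cap_memory` is a hypothetical helper and not part of ARM-console.

```python
import resource

def cap_memory(max_bytes):
    """Lower the soft address-space limit to at most max_bytes (never raises it).

    Returns the soft limit that is in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    new_soft = max_bytes
    for limit in (soft, hard):
        if limit != resource.RLIM_INFINITY:
            new_soft = min(new_soft, limit)  # stay within existing limits
    resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))
    return new_soft

# Example (not executed here): call once at the top of your script,
# before load_dumps(), for instance with a 6 GiB cap:
#   cap_memory(6 * 1024 ** 3)
```

Pick a cap somewhat below your physical RAM; the limit applies to the address space of this process and its children only.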

Enjoy!

--Alexdu 18:48, December 1, 2010 (UTC)
