From CSV to XML to JSON, humans sure love their structured data. Computers like it too. If you think about it, x86 assembly is not much more than a structured data format. The same is true for ELF, DWARF, protobuf…
PNGs, JPEGs and even MySQL database files are all structured binary formats. Such files can get corrupted or carry hidden data, and sometimes you simply need to patch a few bytes without pulling in heavy tools built for that particular file format.
To explore a binary file at a glance, we suggest using:
hexdump -C
to get the classic side-by-side view: offsets on the left, the hex representation of the bytes in the middle, and a best-effort ASCII rendering of the bytes on the right;
xxd -b
in case you prefer to look at the binary representation instead of hexadecimal;
od -S4
or
strings -n4
to automatically search for ASCII strings that are 4 bytes or longer.
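To make the three columns of that side-by-side view concrete, here is a minimal Python sketch that roughly imitates the layout. It is an illustration only, not how hexdump is implemented; the name hexdump_c is made up for this example, and the real tool additionally puts a gap after the eighth byte of each row.

```python
def hexdump_c(data: bytes, width: int = 16) -> str:
    """Render bytes as offset, hex and ASCII columns (hypothetical helper)."""
    pad = width * 3 - 1  # width of a full hex column, so short rows stay aligned
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is, everything else becomes a dot.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  |{ascii_part}|")
    return "\n".join(lines)


print(hexdump_c(b"doma{w4rm357_w3l\x00\xff"))
```

The dot substitution is the "best-effort" part: any byte outside the printable ASCII range has no sensible character to show.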
But what about editing?!
Hex editors are less-than-optimal tools
There is no clear “winner” among hex editors. The options are abundant, yet the most popular hex editors tend to be the worst in terms of stability and handling of large files. Besides, hex editors tend to be strictly byte-oriented, so for proper bitwise processing you have to look elsewhere.
Our choice is wxHexEditor. Here’s an example of an attempt to unscramble some 7-bit ASCII with it:
As you can see, it’s easy to lose track of where you are, especially while trying to edit binary quickly with a hex editor.
That’s why in this post we are providing a step-by-step guide on how to use GNU poke to do binary transformations!
GNU poke, step by step
Step 1: Installation on Ubuntu 20.04 LTS (WSL2-compatible)
Sadly, right now there is no easy way to install the main branch of poke without nix. Luckily, the poke authors have recently released v1:
sudo apt install tcl-dev libgc-dev \
libjson-c-dev libreadline-dev # (1)
wget https://ftp.gnu.org/gnu/poke/poke-1.0.tar.gz
tar xzvf ./poke-1.0.tar.gz
mkdir poke-1.0/build && cd poke-1.0/build
../configure --prefix="$(pwd)" # (2)
make
make install
(1): These are the libraries that were required for a no-GUI installation on a reasonably fresh Ubuntu installation.
(2): It’s very important to always override the default prefix with the project directory so that you don’t mess up your system. We symlink the binaries we want to use into ~/.local/bin after installing them in the project directory.
Step 2: Prepare output file
Let’s say you need to work with a file called file.in. If you change something while working in poke, the changes will be written to file.in right away, so you should write the result to a separate, sufficiently large file.out instead. The command below creates one that is twice the size of file.in:
cat file.in > file.out; cat file.in >> file.out
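As a rough check that doubling is enough for the format handled in the following steps (7 data bits plus 1 hidden bit per input byte), the small Python sketch below estimates the required output size. The function name output_size is made up for this example, and the input size of 35 bytes is an assumption inferred from the 70-byte dump shown in Step 6.

```python
import math


def output_size(n_atoms: int) -> int:
    """Bytes needed: one host byte per atom plus guest bits packed 7 per byte."""
    return n_atoms + math.ceil(n_atoms / 7)


n = 35  # assumed size of file.in in bytes (half of the 70-byte dump in Step 6)
print(output_size(n), "bytes needed,", 2 * n, "bytes available")
# prints "40 bytes needed, 70 bytes available"
```

Since the packed guest bits never take more bytes than the input itself, a doubled file always has room for this particular transformation.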
Step 3: Describe input and output
First check if your file format is already described in the standard library, also known as “pickles”:
$ ls -1 pickles/ | grep pk
argp.pk
bmp.pk
bpf.pk
btf-dump.pk
btf.pk
color.pk
ctf.pk
dwarf-common.pk
dwarf-frame.pk
dwarf-pubnames.pk
dwarf-types.pk
dwarf.pk
elf.pk
id3v1.pk
leb128.pk
mbr.pk
pktest.pk
rgb24.pk
time.pk
ustar.pk
Now, describe the structure of your input and your output (we don’t have a relevant pickle, so we don’t load anything):
type InAtom = struct {
  uint<7> host;
  bit guest;
};
type Input = InAtom[];
type Output = struct {
  byte[] hosts;
  bit[] guests;
};
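For readers who want to cross-check the InAtom layout outside of poke, here is a hedged Python sketch of the same split. It assumes the first struct field occupies the most significant bits of each byte, which is consistent with the leftover input bytes visible in the Step 6 dump; parse_atoms is a made-up name for this example.

```python
def parse_atoms(data: bytes) -> list[tuple[int, int]]:
    """Split each byte into (host, guest): the upper 7 bits and the lowest bit."""
    return [(b >> 1, b & 1) for b in data]


# 0xef, 0x68 and 0xe5 appear in the Step 6 dump as leftover input bytes.
for host, guest in parse_atoms(bytes([0xEF, 0x68, 0xE5])):
    print(chr(host), guest)
# prints "w 1", "4 0" and "r 1" on separate lines
```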
Step 4: Transform input to output
It is possible to map transformations of data to files straight away, but it’s also possible to do more conventional iterative conversions between Input and Output:
fun solve = (Input xs) Output: {
  var resultHosts = byte[]();
  var resultGuests = bit[]();
  var bitsWrote = 0;
  var bytesWrote = 0;
  for (i in xs) {
    if (bitsWrote % 8 == 0) {
      /* Start each guest byte with a 0 bit, then add the next guest bit. */
      resultGuests += [(0 as bit), i.guest];
      bitsWrote += 2;
    } else {
      resultGuests += [i.guest];
      bitsWrote += 1;
    }
    /* Pad the 7-bit host to a full byte with a leading 0 bit. */
    resultHosts += [ (0 as bit):::i.host ];
    bytesWrote += 1;
  }
  return Output {
    hosts = resultHosts,
    guests = resultGuests
  };
};
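The same regrouping can be sketched in Python as a reference to compare against. This is not the poke implementation, just a cross-check under stated assumptions: the first five bytes of FILE_IN are reconstructed from the INPUT and OUTPUT values printed in Step 6, the remaining thirty are copied from the tail of the Step 6 dump, and a short final group of guest bits is assumed to be left-aligned with zero padding.

```python
def solve(data: bytes) -> bytes:
    """Return the host bytes followed by the guest bits packed seven per byte."""
    hosts = bytes(b >> 1 for b in data)  # leading 0 bit + 7-bit host
    guest_bits = [b & 1 for b in data]
    guests = bytearray()
    for i in range(0, len(guest_bits), 7):
        chunk = guest_bits[i:i + 7]
        value = 0
        for bit in chunk:
            value = (value << 1) | bit
        # Assumption: a short final chunk sits right after the leading 0 bit.
        value <<= 7 - len(chunk)
        guests.append(value)
    return hosts + bytes(guests)


FILE_IN = bytes.fromhex(
    "c9dfdac2f6"                        # reconstructed from the printed values
    "ef68e5db666b6fbe"                  # copied from the dump, offset 0x28 onward
    "ee66d9c7deda66bebfbfe860bfbfbfe9"
    "d1636bbfbebf"
)
print(solve(FILE_IN).decode("ascii"))
# prints "doma{w4rm357_w3lcom3___t0___th15___bl0g}"
```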
Step 5: Write output to a file
Now let’s write something like a “main” function that reads file.in, processes it and produces file.out:
fun writeSolution = (string basename) bit:
{
  var fin = open(basename + ".in", IOS_F_READ | IOS_F_WRITE);
  var fout = open(basename + ".out", IOS_F_READ | IOS_F_WRITE);
  var input = Input @ fin : 0#B;
  var output = solve(input);
  printf("INPUT: %v\nOUTPUT: %v\n", input, output);
  /*** This doesn't work in 1.0 release for some reason: ***/
  /* Output @ fout : 0#B = output; */
  /*********************************************************/
  byte[] @ fout : 0#B = output.hosts;
  bit[] @ fout : (output.hosts'size) = output.guests;
  close(fin);
  close(fout);
  return 0;
};
The interesting bit here is the mapping of values onto the IO space fout. When the mapping expression is on the left-hand side of an assignment and a value is on the right-hand side, poke unwraps the contents of the value onto the IO space. For files, this means the contents of the value are written to the file immediately.
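As a loose analogy in Python (not a description of how poke works internally), writing a mapped value behaves like seeking to an offset in an already existing file and overwriting bytes in place, leaving everything else untouched. The helper write_at, the scratch file and the sample payloads are all made up for this sketch.

```python
import os
import tempfile


def write_at(path: str, offset: int, payload: bytes) -> None:
    """Overwrite bytes at the given offset of an existing file, keeping the rest."""
    with open(path, "r+b") as f:  # r+b: read/write without truncating
        f.seek(offset)
        f.write(payload)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.out")  # hypothetical scratch file
    with open(path, "wb") as f:
        f.write(b"." * 12)                # pre-sized output file
    hosts, guests = b"doma", b"bl"        # stand-ins for output.hosts / output.guests
    write_at(path, 0, hosts)              # analogous to mapping at 0#B
    write_at(path, len(hosts), guests)    # analogous to mapping at output.hosts'size
    with open(path, "rb") as f:
        result = f.read()

print(result)  # b'domabl......'
```

The untouched trailing bytes are the reason the output file has to be large enough beforehand, and also why old content remains visible after the written data.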
Step 6: Run it!
$ cat file.in > file.out && cat file.in >> file.out && ./poke/poke
_____
---' __\_______
______) GNU poke 1.0
__)
__)
---._______)
...For help, type ".help".
Type ".exit" to leave the program.
(poke) .load solve.poke
(poke) writeSolution ("file");
INPUT: [InAtom {host=100U,guest=1U},InAtom {host=111U,guest=1U},InAtom {host=109U,guest=0U},...,InAtom {host=95U,guest=0U},InAtom {host=95U,guest=1U}]
OUTPUT: Output {hosts=[100UB,111UB,...,95UB,95UB],guests=[0U,1U,1U,...,1U,0U,1U]}
(uint<1>) 0
(poke) .file file.out
(poke) dump
76543210 0011 2233 4455 6677 8899 aabb ccdd eeff 0123456789ABCDEF
00000000: 646f 6d61 7b77 3472 6d33 3537 5f77 336c doma{w4rm357_w3l
00000010: 636f 6d33 5f5f 5f74 305f 5f5f 7468 3135 com3___t0___th15
00000020: 5f5f 5f62 6c30 677d ef68 e5db 666b 6fbe ___bl0g}.h..fko.
00000030: ee66 d9c7 deda 66be bfbf e860 bfbf bfe9 .f....f....`....
00000040: d163 6bbf bebf .ck...
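And you got the secret message! It occupies the first 40 bytes of file.out; the bytes after it are leftovers from the copies of file.in that were used to pre-size the file.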
Now go poke something!
This post gives you enough techniques to start editing binary data without worrying about your hex editor crashing.
If you want to take a look at a functional programming approach, you’re welcome to read our blog on the doma.dev website, which covers using Erlang as a binary editor to perform the same binary transformation.