Build Your Own String Viewer: Step-by-Step Tutorial
A string viewer is a small tool that reads binary or text files and displays sequences of readable characters (strings) alongside their offsets and encoding. This tutorial walks through building a simple cross-platform String Viewer in Python that extracts ASCII and UTF-8 strings, shows their byte offsets, and offers a basic GUI. We'll use Python 3.10+ and only the standard library, with Tkinter for the interface.
What you’ll build
- Command-line parser to open files
- String extraction supporting ASCII and UTF‑8
- Display with offsets and extracted strings
- Simple GUI to open files and view results
- Save/export feature for extracted strings
Prerequisites
- Python 3.10 or newer installed
- Basic Python knowledge (file I/O, bytes/str handling)
- Optional: pip to install additional packages (not required here)
1. Project layout
- stringviewer/
  - main.py
  - viewer.py
  - gui.py
  - utils.py
  - README.md
2. String extraction logic (utils.py)
This module takes a file's contents as bytes and extracts contiguous runs of printable characters, returning each run together with its byte offset.
```python
# utils.py
import re
from typing import List, Optional, Tuple

MIN_LEN = 4


def extract_ascii_strings(data: bytes, min_len: int = MIN_LEN) -> List[Tuple[int, bytes]]:
    """Return (offset, raw bytes) for each run of printable ASCII (0x20-0x7E)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group(0)) for m in pattern.finditer(data)]


def _decode_char(data: bytes, i: int) -> Tuple[Optional[str], int]:
    """Decode one UTF-8 character at offset i; return (char, size) or (None, 1)."""
    lead = data[i]
    # The lead byte determines how many bytes the encoded character occupies.
    if lead < 0x80:
        size = 1
    elif 0xC2 <= lead <= 0xDF:
        size = 2
    elif 0xE0 <= lead <= 0xEF:
        size = 3
    elif 0xF0 <= lead <= 0xF4:
        size = 4
    else:
        return None, 1  # continuation byte or invalid lead byte
    try:
        return data[i:i + size].decode("utf-8"), size
    except UnicodeDecodeError:
        return None, 1  # truncated, overlong, or otherwise malformed


def extract_utf8_strings(data: bytes, min_len: int = MIN_LEN) -> List[Tuple[int, str]]:
    """Return (byte offset, text) for each run of printable UTF-8 characters."""
    results = []
    i, n = 0, len(data)
    while i < n:
        start, chars = i, []
        # Consume characters until one is invalid or not printable.
        while i < n:
            ch, size = _decode_char(data, i)
            if ch is None or not (ch.isprintable() or ch.isspace()):
                break
            chars.append(ch)
            i += size
        if len(chars) >= min_len:
            results.append((start, "".join(chars)))
        i += 1  # step past the byte that ended the run
    return results
```
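To see what the ASCII scan reports, it can help to run the same idea on a tiny hand-built buffer. The snippet below is a standalone sketch for experimenting in a REPL: `ascii_runs` is a hypothetical helper that mirrors the regex approach rather than importing the tutorial's module, and the sample bytes are made up for illustration.

```python
import re


def ascii_runs(data: bytes, min_len: int = 4):
    """Standalone regex scan: (offset, raw bytes) for printable ASCII runs."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group(0)) for m in pattern.finditer(data)]


# Two readable runs separated by control bytes; "ab" is shorter than min_len.
sample = b"\x00\x01hello\x00ab\x02world!\xff"
runs = ascii_runs(sample)

for offset, raw in runs:
    # Each offset indexes straight back into the original buffer.
    assert sample[offset:offset + len(raw)] == raw
    print(f"0x{offset:08X} {raw.decode('ascii')}")
```

The two-character run `ab` is skipped because it is shorter than the minimum length, and every reported offset can be used to slice the original bytes and recover the same run.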
3. Command-line interface (viewer.py)
A simple CLI to load a file and print offsets and strings.
```python
# viewer.py
import argparse

from utils import extract_ascii_strings, extract_utf8_strings


def format_hex_offset(offset: int) -> str:
    """Format a byte offset as a fixed-width hex string, e.g. 0x0000001F."""
    return f"0x{offset:08X}"


def main():
    p = argparse.ArgumentParser(description="String Viewer")
    p.add_argument("file", help="File to scan")
    p.add_argument("--utf8", action="store_true", help="Also scan for UTF-8 strings")
    p.add_argument("--min", type=int, default=4, help="Minimum string length")
    args = p.parse_args()

    with open(args.file, "rb") as f:
        data = f.read()

    for off, b in extract_ascii_strings(data, args.min):
        print(f"{format_hex_offset(off)} ASCII {b.decode('ascii', errors='replace')}")
    if args.utf8:
        for off, s in extract_utf8_strings(data, args.min):
            print(f"{format_hex_offset(off)} UTF-8 {s}")


# The guard keeps main() from running when gui.py imports this module.
if __name__ == "__main__":
    main()
```
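One thing the CLI leaves open is what happens with `--min 0` or a negative value: a minimum length of zero makes a regex-based scan match empty strings at every position. A minimal sketch of one way to guard against that is shown below. The names `positive_int` and `build_parser` are hypothetical and not part of the tutorial's code; passing a list to `parse_args` is simply a convenient way to try the parser without a shell.

```python
import argparse


def positive_int(text: str) -> int:
    """argparse type function (hypothetical helper): accept integers >= 1."""
    value = int(text)
    if value < 1:
        raise argparse.ArgumentTypeError("minimum length must be at least 1")
    return value


def build_parser() -> argparse.ArgumentParser:
    """Same options as the CLI above, with validation on --min."""
    p = argparse.ArgumentParser(description="String Viewer")
    p.add_argument("file", help="File to scan")
    p.add_argument("--utf8", action="store_true", help="Also scan for UTF-8 strings")
    p.add_argument("--min", type=positive_int, default=4, help="Minimum string length")
    return p


# parse_args accepts an explicit argument list, which is handy for testing.
args = build_parser().parse_args(["sample.bin", "--utf8", "--min", "6"])
print(args.file, args.utf8, args.min)
```

With this in place, an invalid value such as `--min 0` makes argparse print a usage error and exit instead of starting the scan.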
4. GUI with Tkinter (gui.py)
A minimal GUI to open files and display results in a scrollable table.
```python
# gui.py
import tkinter as tk
from tkinter import filedialog, messagebox, ttk

from utils import extract_ascii_strings, extract_utf8_strings
from viewer import format_hex_offset


class StringViewerApp(tk.Tk):
    def __init__(self):
        super().__init__()
        self.title("String Viewer")
        self.geometry("800x600")
        self._build_ui()

    def _build_ui(self):
        toolbar = tk.Frame(self)
        toolbar.pack(fill=tk.X)
        open_btn = tk.Button(toolbar, text="Open File", command=self.open_file)
        open_btn.pack(side=tk.LEFT, padx=4, pady=4)
        self.utf8_var = tk.BooleanVar(value=False)
        utf8_cb = tk.Checkbutton(toolbar, text="UTF-8", variable=self.utf8_var)
        utf8_cb.pack(side=tk.LEFT, padx=4)
        self.min_len = tk.IntVar(value=4)
        tk.Label(toolbar, text="Min length:").pack(side=tk.LEFT)
        # "from" is a Python keyword, so Tkinter spells the option "from_".
        tk.Spinbox(toolbar, from_=1, to=32, textvariable=self.min_len, width=4).pack(side=tk.LEFT)

        cols = ("Offset", "Encoding", "String")
        self.tree = ttk.Treeview(self, columns=cols, show="headings")
        for c in cols:
            self.tree.heading(c, text=c)
            self.tree.column(c, anchor="w")
        scrollbar = ttk.Scrollbar(self, orient="vertical", command=self.tree.yview)
        self.tree.configure(yscrollcommand=scrollbar.set)
        # Pack the scrollbar first so the expanding tree does not crowd it out.
        scrollbar.pack(side=tk.RIGHT, fill=tk.Y)
        self.tree.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)

    def open_file(self):
        path = filedialog.askopenfilename()
        if not path:
            return
        try:
            with open(path, "rb") as f:
                data = f.read()
        except OSError as e:
            messagebox.showerror("Error", str(e))
            return
        # delete() takes item IDs as separate arguments, hence the unpacking.
        self.tree.delete(*self.tree.get_children())
        min_len = self.min_len.get()
        for off, b in extract_ascii_strings(data, min_len):
            self.tree.insert("", "end", values=(format_hex_offset(off), "ASCII", b.decode("ascii", errors="replace")))
        if self.utf8_var.get():
            for off, s in extract_utf8_strings(data, min_len):
                self.tree.insert("", "end", values=(format_hex_offset(off), "UTF-8", s))


if __name__ == "__main__":
    StringViewerApp().mainloop()
```
5. Export feature (append to gui.py)
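Add a button that exports the results to CSV; a plain-text export can follow the same pattern.
```python
# gui.py (additions)
import csv  # add alongside the other imports at the top of gui.py

# In _build_ui, after the tree is created:
export_btn = tk.Button(toolbar, text="Export CSV", command=self.export_csv)
export_btn.pack(side=tk.RIGHT, padx=4)

# New method on StringViewerApp:
def export_csv(self):
    path = filedialog.asksaveasfilename(defaultextension=".csv", filetypes=[("CSV", "*.csv")])
    if not path:
        return
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(("Offset", "Encoding", "String"))
        for iid in self.tree.get_children():
            # The writer quotes fields and doubles embedded quotes as needed.
            writer.writerow(self.tree.item(iid, "values"))
```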
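CSV quoting is easy to get wrong when an extracted string itself contains commas or double quotes, which is common in strings pulled from binaries. The sketch below uses only the standard `csv` and `io` modules to show how a row with such a string survives a write and read round trip. The sample row is invented for illustration, and an in-memory buffer stands in for the exported file.

```python
import csv
import io

# One made-up result row whose string contains both a comma and quotes.
rows = [("0x00000010", "ASCII", 'say "hi", ok')]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(("Offset", "Encoding", "String"))
writer.writerows(rows)
text = buf.getvalue()
print(text)

# Reading the text back recovers the original fields unchanged.
parsed = list(csv.reader(io.StringIO(text)))
assert parsed[1] == list(rows[0])
```

The writer wraps the third field in double quotes and doubles the embedded ones, so a spreadsheet or `csv.reader` sees three columns rather than four.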
6. Testing and usage
- Run CLI: python viewer.py sample.bin --utf8 --min 6
- Run GUI: python gui.py
- Test with known text files and binary files (e.g., executables) to verify extraction.
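For repeatable checks, one option is to generate a small binary fixture where the expected offsets are known in advance. The following is a sketch under stated assumptions: `build_fixture` is a hypothetical helper, the words and padding bytes are arbitrary, and the inline regex is only a stand-in scanner so the snippet runs on its own. In practice you would swap in your own extractor at the marked line.

```python
import os
import re
import tempfile


def build_fixture(words, pad=b"\x00\x01\x02"):
    """Join ASCII words with non-printable padding; return (data, expected)."""
    data = bytearray()
    expected = []
    for word in words:
        data += pad
        expected.append((len(data), word))  # offset where this word starts
        data += word.encode("ascii")
    data += pad
    return bytes(data), expected


data, expected = build_fixture(["alpha", "beta-42", "gamma"])

# Round-trip through a real file, the way the viewer would read it.
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
    tmp.write(data)
    path = tmp.name
try:
    with open(path, "rb") as f:
        loaded = f.read()
finally:
    os.remove(path)

# Stand-in scanner; replace with the extractor you want to test.
found = [(m.start(), m.group(0).decode("ascii"))
         for m in re.finditer(rb"[\x20-\x7e]{4,}", loaded)]
assert found == expected, (found, expected)
print("fixture OK:", found)
```

Because the fixture records each offset as it is built, a mismatch points directly at an off-by-one or minimum-length bug in the scanner.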
7. Improvements and next steps
- Add hex + ASCII side-by-side view.
- Support additional encodings (UTF-16LE/BE).
- Allow regex filters and highlighting.
- Add background scanning for large files and progress bar.
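Of the ideas above, UTF-16LE support is a small step from a regex-based ASCII scan. The sketch below is one possible starting point, not a complete implementation: the function name `extract_utf16le_strings` is hypothetical, it only recognizes printable ASCII characters stored as two-byte little-endian units (each character followed by a zero byte), and it does not check that a match starts on an even offset.

```python
import re


def extract_utf16le_strings(data: bytes, min_len: int = 4):
    """Return (byte offset, text) for runs of ASCII-range UTF-16LE characters.

    Assumption: only code points 0x20-0x7E are considered, so text outside
    the ASCII range is not detected by this simplified scan.
    """
    # Each character is one printable byte followed by a 0x00 high byte.
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [(m.start(), m.group(0).decode("utf-16-le"))
            for m in pattern.finditer(data)]


# "Hello" as UTF-16LE, surrounded by bytes that are not part of any string.
sample = b"\x01\x02" + "Hello".encode("utf-16-le") + b"\xff\xff"
wide = extract_utf16le_strings(sample)
print(wide)
```

Note that `min_len` counts characters here, so a five-character match spans ten bytes of the file. Big-endian data would need the zero byte before each character instead of after it.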
This gives a working, readable String Viewer you can extend.