Context
In our last post, we fought FFmpeg's filter graph parser over single-quote escaping in our video subtitle system. That battle was about parsing. This one is about rendering.
Since then, our subtitles evolved from per-sentence overlays to per-word overlays — 257 individual drawtext filters per video, each with its own animated entry, opacity dimming, and precisely computed X/Y coordinates. Per-sentence subtitles could use x=(w-tw)/2 and let FFmpeg center the text. Per-word subtitles can't. We need to measure every word's pixel width in PHP, lay them out into lines, and tell FFmpeg exactly where each one goes.
The measurement tool: PHP's imagettfbbox(). The target renderer: FFmpeg's drawtext. Both use FreeType under the hood. How far off could they be?
35%.
Problem 1: Words That Float
The first render looked like ransom-note typography. Some words sat higher than their neighbors. "a" floated above "the". "on" floated above "big".
The Cause
FFmpeg's drawtext positions text by the top of each word's individual bounding box, not by a shared baseline. At y=100:
- text=the: the bounding box top (including the ascenders t and h) sits at y=100
- text=a: the bounding box top (no ascenders, ~20px shorter) sits at y=100
Same Y value, different baselines. Words without ascenders appear to float.
The Obvious Fix (That Didn't Work)
Measure each word's ascent with imagettfbbox, compare to a reference ascent, push short words down by the difference:
$fontAscent = $this->measureAscent('Hh', $fontPath, $fontSize); // 48px
$wordAscent = $this->measureAscent('a', $fontPath, $fontSize); // 28px
$yOffset = $fontAscent - $wordAscent; // 20px
Where measureAscent uses the upper-corner Y values from imagettfbbox:
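public function measureAscent(string $text, string $fontPath, int $fontSize): int
{
    $bbox = imagettfbbox((float) $fontSize, 0.0, $fontPath, $text);
    // Indices 5 and 7 are the Y values of the upper-right and upper-left corners.
    // They are negative above the baseline, hence abs(min(...)).
    return (int) abs(min($bbox[5], $bbox[7]));
}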
We applied the 20px offset. Too much — "a" now sat below the baseline of "the".
The Calibration Factor
We rendered a fine-grained offset grid: the word "a" at offsets +12 through +18, each next to "the" at offset 0. The correct value was +16px, not +20px.
16/20 = 0.8.
GD and FFmpeg use different FreeType hinting configurations. Their tight bounding box ascents diverge by ~20%. The fix:
$yOffset = (int) round(max(0, $fontAscent - $wordAscent) * 0.8);
Is 0.8 a magic number? Yes. Does it produce pixel-perfect baselines across 257 words? Also yes.
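As a minimal sketch, the calibrated offset can be pulled into a small pure function so it is easy to test without a font file. The name baselineOffset and its parameter names are our own illustration, not part of the original class, and the 0.8 default is just the empirical value from the offset grid above:

```php
<?php
// Hypothetical helper: the name and signature are illustrative only.
function baselineOffset(int $referenceAscent, int $wordAscent, float $calibration = 0.8): int
{
    // Words at least as tall as the reference need no downward push.
    $difference = max(0, $referenceAscent - $wordAscent);

    // Scale the GD-measured difference by the empirical calibration factor.
    return (int) round($difference * $calibration);
}

// Ascents as measured by GD in the example above: "Hh" = 48px, "a" = 28px.
$offsetForThe = baselineOffset(48, 48); // 0
$offsetForA   = baselineOffset(48, 28); // 16
```

The returned value is added to the word's y coordinate before it goes into the drawtext filter.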
Problem 2: Words That Drift Apart
With baselines fixed, the horizontal spacing was wrong. Longer words had proportionally larger gaps after them — like someone cranked the word spacing to 150%.
The Rabbit Hole: Advance Width
Our first theory: imagettfbbox returns the visual bounding box width (tight around visible pixels), but text renderers advance the cursor by the advance width (which includes an invisible right-side bearing). We were using the wrong measurement.
We built a way to extract the advance width from GD using a doubling method:
public function measureAdvanceWidth(string $text, string $fontPath, int $fontSize): int
{
    $singleBbox = imagettfbbox((float) $fontSize, 0.0, $fontPath, $text);
    $doubleBbox = imagettfbbox((float) $fontSize, 0.0, $fontPath, $text . $text);
    // Indices 2 and 4 are the X values of the lower-right and upper-right corners.
    return (int) max($doubleBbox[2], $doubleBbox[4])
        - (int) max($singleBbox[2], $singleBbox[4]);
}
The logic: in "braverybravery", the second "b" starts at exactly the advance width of the first word. Subtract single from double and you get the advance width that imagettfbbox doesn't directly expose.
The advance widths were perfectly consistent within GD. The sum of per-word advance widths matched a single-string measurement with zero error. We committed the fix, ran a render, and... the spacing looked unchanged.
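For reference, a GD-to-GD consistency check of this kind might look roughly like the sketch below. It is an assumption about the shape of the check, not our actual test code: advanceSumError is a hypothetical name, and $advance stands in for any callable that returns an advance width in pixels (in practice it would wrap measureAdvanceWidth). The stub used here just pretends every character is 10px wide.

```php
<?php
// Hypothetical check: sum per-word advances (plus spaces) and compare the
// total against the same sentence measured as one string.
function advanceSumError(array $words, callable $advance): int
{
    // One space advance between each pair of adjacent words.
    $sum = $advance(' ') * max(0, count($words) - 1);
    foreach ($words as $word) {
        $sum += $advance($word);
    }

    return $sum - $advance(implode(' ', $words));
}

// Stub measurer for illustration only: a fake fixed-width font.
$fixedWidth = fn(string $text): int => 10 * strlen($text);
$error = advanceSumError(['and', 'bravery', 'to'], $fixedWidth);
```

A zero here only says the source engine agrees with itself, which is exactly the trap we walked into.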
The Actual Problem
We'd been comparing GD measurements against GD measurements. We never checked whether GD's numbers matched what FFmpeg actually rendered. Time for a cross-engine comparison.
We wrote an FFmpeg filter that rendered individual words at our GD-computed positions (yellow) next to the same sentence as a single drawtext string (cyan). The result was immediately damning:
Yellow (GD positions): and bravery to make a new friend.
Cyan (FFmpeg native): and bravery to make a new friend.
The GD-positioned line was about 35% wider. Not a subtle drift — a completely different scale.
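A comparison filter like this can be assembled as a chain of drawtext filters. The sketch below is a simplified illustration rather than our production code: buildComparisonFilter is a hypothetical name, the y values are arbitrary, and it assumes the words, sentence, and font path contain no characters that need filter graph escaping (no single quotes, colons, or backslashes).

```php
<?php
// Hypothetical helper: one yellow drawtext per word at its GD-computed x,
// plus one cyan drawtext that lets FFmpeg lay out the whole sentence.
function buildComparisonFilter(array $positions, string $sentence, string $fontPath, int $fontSize): string
{
    $common = "fontfile={$fontPath}:fontsize={$fontSize}";
    $filters = [];

    foreach ($positions as [$word, $x]) {
        $filters[] = "drawtext={$common}:text='{$word}':fontcolor=yellow:x={$x}:y=20";
    }

    // Reference line: FFmpeg decides the spacing itself.
    $filters[] = "drawtext={$common}:text='{$sentence}':fontcolor=cyan:x=0:y=120";

    return implode(',', $filters);
}

// Illustrative positions only, not real measurements.
$filter = buildComparisonFilter(
    [['and', 0], ['bravery', 162]],
    'and bravery',
    '/path/to/font.ttf',
    64
);
```

The resulting string goes into -vf. If both lines start at x=0, any scale mismatch shows up as the yellow line running longer or shorter than the cyan one.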
Measuring FFmpeg's Actual Widths
To quantify it, we rendered each word individually with FFmpeg to a PNG, loaded the output in GD, and scanned for the rightmost non-black pixel:
// Render one word in white on a black 600x100 canvas.
$process = new Process([
    'ffmpeg', '-y',
    '-f', 'lavfi', '-i', 'color=c=black:s=600x100:d=1',
    '-vf', "drawtext=fontfile=$fontPath:text=$word:fontsize=64:fontcolor=white:x=0:y=0",
    '-frames:v', '1', $outputPng
]);
$process->run();

$img = imagecreatefrompng($outputPng);
$ffmpegWidth = 0; // stays 0 if nothing was rendered

// Scan columns right to left; the first column with a non-black pixel is the right edge.
for ($x = imagesx($img) - 1; $x >= 0; $x--) {
    for ($y = 0; $y < imagesy($img); $y++) {
        if ((imagecolorat($img, $x, $y) & 0xFFFFFF) > 0) {
            $ffmpegWidth = $x + 1;
            break 2;
        }
    }
}
The ratios were remarkably consistent:
| Word | GD Width | FFmpeg Width | Ratio |
|---|---|---|---|
| "and" | 141px | 106px | 0.752 |
| "bravery" | 309px | 226px | 0.731 |
| "to" | 81px | 61px | 0.753 |
| "make" | 204px | 151px | 0.740 |
| "a" | 45px | 33px | 0.733 |
| "new" | 158px | 116px | 0.734 |
| "Allison" | 266px | 197px | 0.741 |
| "the" | 126px | 95px | 0.754 |
| space | 21px | 16px | 0.762 |
Every measurement: 0.73–0.76. Average: 0.74.
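The summary numbers can be reproduced directly from the table. This snippet only re-derives the ratios from the widths listed above; the variable names are ours:

```php
<?php
// [GD width, FFmpeg width] in pixels, copied from the table above.
$measurements = [
    'and'     => [141, 106],
    'bravery' => [309, 226],
    'to'      => [81, 61],
    'make'    => [204, 151],
    'a'       => [45, 33],
    'new'     => [158, 116],
    'Allison' => [266, 197],
    'the'     => [126, 95],
    ' '       => [21, 16],
];

// Ratio of FFmpeg's rendered width to GD's measured width, per word.
$ratios = array_map(fn(array $m): float => $m[1] / $m[0], $measurements);
$meanRatio = array_sum($ratios) / count($ratios);

printf("min %.3f, max %.3f, mean %.2f\n", min($ratios), max($ratios), $meanRatio);
```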
Why 0.74?
GD's imagettfbbox defaults to 96 DPI. FFmpeg's drawtext behaves as if it ran FreeType at 72 DPI (its fontsize appears to be interpreted as a pixel size, and one point equals one pixel at 72 DPI). The ratio 72/96 = 0.75 is close to our measured 0.74, with the small deviation probably coming from hinting and rounding differences.
The point-size is the same. The font file is the same. The FreeType library is the same. But the DPI setting is different, and that scales every pixel measurement proportionally.
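The arithmetic behind that ratio is the standard points-to-pixels conversion (one point is 1/72 inch). A short worked example, with pointsToPixels as an illustrative helper name:

```php
<?php
// Convert a point size to pixels at a given resolution (1pt = 1/72 inch).
function pointsToPixels(float $points, float $dpi): float
{
    return $points * $dpi / 72.0;
}

$gdPixels     = pointsToPixels(64, 96); // about 85.33px
$ffmpegPixels = pointsToPixels(64, 72); // 64px
$dpiRatio     = $ffmpegPixels / $gdPixels; // 0.75
```

So a nominal size of 64 yields glyphs roughly a third larger in GD than in FFmpeg, which is the 35% we saw on screen.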
The Fix
Apply a configurable scaling factor to all GD width measurements:
$widthScale = (float) config('video.text_width_scale', 0.74);
$spaceWidth = (int) round($rawSpaceWidth * $widthScale);
foreach ($words as $word) {
    $widths[$word] = (int) round($this->measureText($word, ...) * $widthScale);
    $advances[$word] = (int) round($this->measureAdvanceWidth($word, ...) * $widthScale);
}
After the fix, individual word positions matched FFmpeg's native single-string rendering almost exactly.
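Once the widths are scaled, turning them into x coordinates is simple accumulation. The sketch below is a simplified illustration, not our layout code: layoutLine is a hypothetical name, it centers a single line in the frame, and it uses advance widths for every word, including the last one.

```php
<?php
// Hypothetical helper: returns the x coordinate for each word on one
// horizontally centered line, given already-scaled advance widths.
function layoutLine(array $advances, int $spaceWidth, int $frameWidth): array
{
    $gaps = max(0, count($advances) - 1);
    $lineWidth = array_sum($advances) + $spaceWidth * $gaps;

    // Left edge of the first word so that the whole line is centered.
    $x = intdiv($frameWidth - $lineWidth, 2);

    $positions = [];
    foreach ($advances as $advance) {
        $positions[] = $x;
        $x += $advance + $spaceWidth;
    }

    return $positions;
}

// Scaled widths for "and bravery to" from the table, in a 1280px-wide frame.
$xs = layoutLine([106, 226, 61], 16, 1280); // [427, 549, 791]
```

Each value becomes the x= option of that word's drawtext filter.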
The Red Herring Postmortem
The advance width theory was internally correct — visual bounding box width does differ from advance width by 1-3px per word, and using visual widths does cause cumulative drift in GD's coordinate space. But that 11px-over-7-words problem was invisible next to the 35% DPI scaling error.
We spent hours perfecting GD-to-GD measurements before doing the one test that mattered: rendering in the target engine and comparing.
Lesson: When you're computing positions in one engine for use in another, validate against the target engine first. Internal consistency in the source engine tells you nothing about cross-engine accuracy.
Quick Reference
| What | GD Says | FFmpeg Does | Fix |
|---|---|---|---|
| Word widths | ~96 DPI | ~72 DPI | Multiply by 0.74 |
| Space widths | Inflated for " " alone | Natural kerning | Measure via bbox("n n") - bbox("nn"), then scale by 0.74 |
| Ascent difference | Per GD hinting | Per FFmpeg hinting | Multiply difference by 0.8 |
| Visual vs advance width | max(bbox[2], bbox[4]) vs doubling | Native advance | Use advance widths for cursor positioning, visual for last word on line |
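The space-width row can be sketched as follows. This is a hedged illustration: scaledSpaceWidth is a hypothetical name, $measureWidth stands in for any callable that returns GD's width for a string (in practice it would wrap measureText), and the stub values are made up so the example runs without a font file.

```php
<?php
// Hypothetical helper: derive the width of a space in context by measuring
// "n n" and "nn" and taking the difference, then scale it for FFmpeg.
function scaledSpaceWidth(callable $measureWidth, float $widthScale = 0.74): int
{
    $rawSpace = $measureWidth('n n') - $measureWidth('nn');

    return (int) round($rawSpace * $widthScale);
}

// Stub measurer with made-up widths: "n n" = 93px, "nn" = 72px.
$stub = fn(string $text): int => $text === 'n n' ? 93 : 72;
$spaceWidth = scaledSpaceWidth($stub); // round(21 * 0.74) = 16
```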
The Code
Three measurement functions and one scaling step. The raw GD functions return unscaled values — the caller applies WIDTH_SCALE to everything horizontal:
private const WIDTH_SCALE = 0.74;

public function measureText(string $text, string $fontPath, int $fontSize): int
{
    $bbox = @imagettfbbox((float) $fontSize, 0.0, $fontPath, $text);
    return $bbox !== false ? (int) max($bbox[2], $bbox[4]) : (int) (strlen($text) * $fontSize * 0.55);
}

public function measureAdvanceWidth(string $text, string $fontPath, int $fontSize): int
{
    $single = @imagettfbbox((float) $fontSize, 0.0, $fontPath, $text);
    $double = @imagettfbbox((float) $fontSize, 0.0, $fontPath, $text . $text);
    if ($single !== false && $double !== false) {
        return (int) max($double[2], $double[4]) - (int) max($single[2], $single[4]);
    }
    return (int) (strlen($text) * $fontSize * 0.55);
}

public function measureAscent(string $text, string $fontPath, int $fontSize): int
{
    $bbox = @imagettfbbox((float) $fontSize, 0.0, $fontPath, $text);
    return $bbox !== false ? (int) abs(min($bbox[5], $bbox[7])) : (int) ($fontSize * 0.8);
}

// Usage:
$ffmpegWidth = (int) round($this->measureText($word, $fontPath, $fontSize) * self::WIDTH_SCALE);
$yOffset = (int) round(max(0, $fontAscent - $wordAscent) * 0.8);
TL;DR
- imagettfbbox uses 96 DPI. FFmpeg drawtext uses ~72 DPI. All GD widths are 35% too large. Scale by 0.74.
- drawtext positions text by per-word bounding box top, not a shared baseline. Scale GD ascent differences by 0.8.
- Validate against FFmpeg early. GD-to-GD consistency means nothing if the target engine disagrees.
- Advance width matters (use the doubling method), but it's a 1-3px refinement — don't mistake it for the main problem.