Why Your FFmpeg Drawtext Words Don't Line Up

Len Woodward

PHP's GD library and FFmpeg both use FreeType to render fonts. They produce different pixel measurements for the same text. Nobody warns you about this.

Context

In our last post, we fought FFmpeg's filter graph parser over single-quote escaping in our video subtitle system. That battle was about parsing. This one is about rendering.

Since then, our subtitles evolved from per-sentence overlays to per-word overlays — 257 individual drawtext filters per video, each with its own animated entry, opacity dimming, and precisely computed X/Y coordinates. Per-sentence subtitles could use x=(w-tw)/2 and let FFmpeg center the text. Per-word subtitles can't. We need to measure every word's pixel width in PHP, lay them out into lines, and tell FFmpeg exactly where each one goes.

The measurement tool: PHP's imagettfbbox(). The target renderer: FFmpeg's drawtext. Both use FreeType under the hood. How far off could they be?

35%.


Problem 1: Words That Float

The first render looked like ransom-note typography. Some words sat higher than their neighbors. "a" floated above "the". "on" floated above "big".

The Cause

FFmpeg's drawtext positions text by the top of each word's individual bounding box, not by a shared baseline. At y=100:

  • text=the: the bounding box top (including ascenders t, h) sits at y=100
  • text=a: the bounding box top (no ascenders, ~20px shorter) sits at y=100

Same Y value, different baselines. Words without ascenders appear to float.
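
To make the geometry concrete, here is a minimal sketch. It assumes the behavior described above (the top of each word's own bounding box lands on y), and the ascent values are illustrative. The helper name baselineY is ours, not part of FFmpeg or GD.

```php
<?php
// Hypothetical helper: where the baseline ends up when the renderer puts the
// TOP of the word's own bounding box at $y.
function baselineY(int $y, int $wordAscent): int
{
    return $y + $wordAscent;
}

$y = 100;
$ascents = ['the' => 48, 'a' => 28]; // illustrative ascents in pixels

foreach ($ascents as $word => $ascent) {
    // Same $y, different ascents, so the baselines disagree.
    printf("%-4s baseline at y=%d\n", $word, baselineY($y, $ascent));
}
```

The shorter word's baseline lands higher on screen by exactly the ascent difference, which is the "floating" effect.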

The Obvious Fix (That Didn't Work)

Measure each word's ascent with imagettfbbox, compare to a reference ascent, push short words down by the difference:

$fontAscent = $this->measureAscent('Hh', $fontPath, $fontSize);  // 48px
$wordAscent = $this->measureAscent('a', $fontPath, $fontSize);   // 28px
$yOffset = $fontAscent - $wordAscent;                            // 20px

Where measureAscent uses the upper-corner Y values from imagettfbbox:

public function measureAscent(string $text, string $fontPath, int $fontSize): int
{
    $bbox = imagettfbbox((float) $fontSize, 0.0, $fontPath, $text);
    return (int) abs(min($bbox[5], $bbox[7]));
}

We applied the 20px offset. Too much — "a" now sat below the baseline of "the".

The Calibration Factor

We rendered a fine-grained offset grid: the word "a" at offsets +12 through +18, each next to "the" at offset 0. The correct value was +16px, not +20px.

16/20 = 0.8.

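GD and FFmpeg disagree about tight bounding-box ascents by roughly 20%. At this stage we attributed that to different FreeType hinting configurations; the DPI mismatch uncovered in Problem 2 (72/96 = 0.75) probably accounts for most of it. The fix: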

$yOffset = (int) round(max(0, $fontAscent - $wordAscent) * 0.8);

Is 0.8 a magic number? Yes. Does it produce pixel-perfect baselines across 257 words? Also yes.
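
For anyone repeating the calibration with a different font, the arithmetic can be sketched as follows. The function names are hypothetical, and the inputs are the numbers from this section: a 20px ascent difference according to GD, and a 16px offset that looked correct in the rendered grid.

```php
<?php
// Hypothetical helper: derive the correction factor from one calibrated pair.
function calibrationFactor(int $observedOffset, int $gdAscentDiff): float
{
    if ($gdAscentDiff <= 0) {
        throw new InvalidArgumentException('GD ascent difference must be positive.');
    }
    return $observedOffset / $gdAscentDiff;
}

// Same formula as the fix above, with the factor passed in instead of hard-coded.
function baselineOffset(int $fontAscent, int $wordAscent, float $factor): int
{
    return (int) round(max(0, $fontAscent - $wordAscent) * $factor);
}

$factor = calibrationFactor(16, 20);          // 0.8
$offsetA = baselineOffset(48, 28, $factor);   // offset for "a"
$offsetThe = baselineOffset(48, 48, $factor); // a word with full ascent needs none

printf("factor=%.2f a=+%dpx the=+%dpx\n", $factor, $offsetA, $offsetThe);
```

One calibrated pair is a thin basis for a constant, so it is worth repeating the grid test for a second short word before trusting the factor.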


Problem 2: Words That Drift Apart

With baselines fixed, the horizontal spacing was wrong. Longer words had proportionally larger gaps after them — like someone cranked the word spacing to 150%.

The Rabbit Hole: Advance Width

Our first theory: imagettfbbox returns the visual bounding box width (tight around visible pixels), but text renderers advance the cursor by the advance width (which includes an invisible right-side bearing). We were using the wrong measurement.

We built a way to extract the advance width from GD using a doubling method:

public function measureAdvanceWidth(string $text, string $fontPath, int $fontSize): int
{
    $singleBbox = imagettfbbox((float) $fontSize, 0.0, $fontPath, $text);
    $doubleBbox = imagettfbbox((float) $fontSize, 0.0, $fontPath, $text . $text);

    return (int) max($doubleBbox[2], $doubleBbox[4])
         - (int) max($singleBbox[2], $singleBbox[4]);
}

The logic: in "braverybravery", the second "b" starts at exactly the advance width of the first word. Subtract single from double and you get the advance width that imagettfbbox doesn't directly expose.
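
Why the subtraction works can be shown with a toy model. This is an assumption for illustration, not GD's internals: it treats the tight width of a repeated word as "copies times advance, minus the final right-side bearing", and it ignores any kerning between the last glyph of one copy and the first glyph of the next. The numbers are hypothetical.

```php
<?php
// Toy model: the tight (visual) width of $copies repetitions of one word.
// Every copy contributes a full advance except that the trailing bearing
// of the final copy is not part of the visible box.
function visualWidth(int $advance, int $rightBearing, int $copies): int
{
    return $copies * $advance - $rightBearing;
}

// Hypothetical word: 312px advance, 3px of which is invisible right bearing.
$single = visualWidth(312, 3, 1); // what imagettfbbox would report for the word
$double = visualWidth(312, 3, 2); // what it would report for the word doubled

// The bearing cancels out, leaving exactly one advance.
$recoveredAdvance = $double - $single;

printf("single=%d double=%d advance=%d\n", $single, $double, $recoveredAdvance);
```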

The advance widths were perfectly consistent within GD. The sum of per-word advance widths matched a single-string measurement with zero error. We committed the fix, ran a render, and... the spacing looked unchanged.

The Actual Problem

We'd been comparing GD measurements against GD measurements. We never checked whether GD's numbers matched what FFmpeg actually rendered. Time for a cross-engine comparison.

We wrote an FFmpeg filter that rendered individual words at our GD-computed positions (yellow) next to the same sentence as a single drawtext string (cyan). The result was immediately damning:

Yellow (GD positions):   and    bravery      to    make     a    new     friend.
Cyan (FFmpeg native):    and bravery to make a new friend.

The GD-positioned line was about 35% wider. Not a subtle drift — a completely different scale.
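
A comparison filter along these lines can be built with a few lines of PHP. This is a sketch rather than our production code: buildComparisonFilter is a hypothetical helper, the font path is a placeholder, and the text is assumed to contain no characters that need filtergraph escaping. The X positions are cumulative raw GD widths plus a 21px GD space.

```php
<?php
/**
 * Hypothetical helper: one yellow drawtext per word at its GD-derived X,
 * plus one cyan drawtext for the whole sentence on a second row.
 * Assumes no filtergraph special characters in $fontPath or the text.
 *
 * @param array<int, array{0: string, 1: int}> $placedWords [word, x] pairs
 */
function buildComparisonFilter(string $fontPath, int $fontSize, array $placedWords, string $sentence): string
{
    $filters = [];
    foreach ($placedWords as [$word, $x]) {
        $filters[] = sprintf(
            'drawtext=fontfile=%s:text=%s:fontsize=%d:fontcolor=yellow:x=%d:y=20',
            $fontPath, $word, $fontSize, $x
        );
    }
    // Reference row: let FFmpeg lay out the full sentence itself.
    $filters[] = sprintf(
        'drawtext=fontfile=%s:text=%s:fontsize=%d:fontcolor=cyan:x=0:y=120',
        $fontPath, $sentence, $fontSize
    );

    return implode(',', $filters);
}

$filter = buildComparisonFilter(
    '/path/to/font.ttf', // placeholder path
    64,
    [['and', 0], ['bravery', 162], ['to', 492]],
    'and bravery to'
);
echo $filter, "\n";
```

If the two rows line up, the GD measurements can be trusted; if the yellow row is stretched, the measurements are in a different scale from the renderer.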

Measuring FFmpeg's Actual Widths

To quantify it, we rendered each word individually with FFmpeg to a PNG, loaded the output in GD, and scanned for the rightmost non-black pixel:

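use Symfony\Component\Process\Process; // assumes Symfony's Process component

// Render one word in white on a black 600x100 canvas and keep a single frame.
// $fontPath and $word are assumed to need no filtergraph escaping.
$process = new Process([
    'ffmpeg', '-y',
    '-f', 'lavfi', '-i', 'color=c=black:s=600x100:d=1',
    '-vf', "drawtext=fontfile=$fontPath:text=$word:fontsize=64:fontcolor=white:x=0:y=0",
    '-frames:v', '1', $outputPng
]);
$process->run();

// Scan columns from right to left; the first column that contains a
// non-black pixel marks the right edge of the rendered word.
$ffmpegWidth = 0; // stays 0 if nothing was drawn
$img = imagecreatefrompng($outputPng);
for ($x = imagesx($img) - 1; $x >= 0; $x--) {
    for ($y = 0; $y < imagesy($img); $y++) {
        if ((imagecolorat($img, $x, $y) & 0xFFFFFF) > 0) {
            $ffmpegWidth = $x + 1;
            break 2;
        }
    }
}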

The ratios were remarkably consistent:

Word        GD Width   FFmpeg Width   Ratio
"and"       141px      106px          0.752
"bravery"   309px      226px          0.731
"to"        81px       61px           0.753
"make"      204px      151px          0.740
"a"         45px       33px           0.733
"new"       158px      116px          0.734
"Allison"   266px      197px          0.741
"the"       126px      95px           0.754
space       21px       16px           0.762

Every measurement: 0.73–0.76. Average: 0.74.
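
The summary figures can be reproduced from the table with a few lines. The only inputs are the measured widths listed above; the variable names are ours.

```php
<?php
// [GD width, FFmpeg width] in pixels, copied from the table above.
$measurements = [
    'and'     => [141, 106],
    'bravery' => [309, 226],
    'to'      => [81, 61],
    'make'    => [204, 151],
    'a'       => [45, 33],
    'new'     => [158, 116],
    'Allison' => [266, 197],
    'the'     => [126, 95],
    ' '       => [21, 16],
];

$ratios = [];
foreach ($measurements as $word => [$gdWidth, $ffmpegWidth]) {
    $ratios[$word] = $ffmpegWidth / $gdWidth;
}

$mean = array_sum($ratios) / count($ratios);

printf("min %.3f, max %.3f, mean %.3f\n", min($ratios), max($ratios), $mean);
// prints "min 0.731, max 0.762, mean 0.744"
```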

Why 0.74?

GD's imagettfbbox takes its size argument in points and renders at 96 DPI. FFmpeg's drawtext (likely) sizes text as if at 72 DPI, where one point equals one pixel. The ratio 72/96 = 0.75 is close to our measured 0.74, with the small deviation probably coming from hinting and rounding differences.

The point-size is the same. The font file is the same. The FreeType library is the same. But the DPI setting is different, and that scales every pixel measurement proportionally.
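
The arithmetic behind that ratio is the standard points-to-pixels conversion (one point is 1/72 of an inch). The sketch below takes the 72 DPI figure for FFmpeg as an assumption, as stated above; pointsToPixels is a hypothetical helper name.

```php
<?php
// Convert a point size to pixels at a given resolution (1pt = 1/72 inch).
function pointsToPixels(float $points, float $dpi): float
{
    return $points * $dpi / 72.0;
}

$gdPixels = pointsToPixels(64, 96);     // size 64 as GD rasterizes it
$ffmpegPixels = pointsToPixels(64, 72); // size 64 under the 72 DPI assumption
$ratio = $ffmpegPixels / $gdPixels;

printf("GD: %.2fpx, FFmpeg: %.2fpx, ratio: %.2f\n", $gdPixels, $ffmpegPixels, $ratio);
// prints "GD: 85.33px, FFmpeg: 64.00px, ratio: 0.75"
```

In other words, asking both engines for "64" yields glyphs about 85px tall in GD's coordinate space and 64px tall in FFmpeg's.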

The Fix

Apply a configurable scaling factor to all GD width measurements:

$widthScale = (float) config('video.text_width_scale', 0.74);

$spaceWidth = (int) round($rawSpaceWidth * $widthScale);

foreach ($words as $word) {
    $widths[$word]   = (int) round($this->measureText($word, ...) * $widthScale);
    $advances[$word] = (int) round($this->measureAdvanceWidth($word, ...) * $widthScale);
}

After the fix, individual word positions matched FFmpeg's native single-string rendering almost exactly.
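
Once the widths are in FFmpeg's scale, placing the words on a line is simple bookkeeping. The following is a simplified sketch, not our production layout code: layoutLine is a hypothetical helper, it uses one advance per word for everything (no special case for the last word), and the widths and the 1920px frame are illustrative.

```php
<?php
/**
 * Hypothetical helper: center one line of words in the frame and return
 * the X coordinate for each word. All widths are assumed to be already
 * scaled to FFmpeg pixels.
 *
 * @param array<int, array{word: string, advance: int}> $words
 * @return array<int, array{word: string, x: int}>
 */
function layoutLine(array $words, int $spaceWidth, int $frameWidth): array
{
    $lineWidth = array_sum(array_column($words, 'advance'))
        + $spaceWidth * max(0, count($words) - 1);

    $x = intdiv($frameWidth - $lineWidth, 2); // left edge of the centered line
    $placed = [];
    foreach ($words as $w) {
        $placed[] = ['word' => $w['word'], 'x' => $x];
        $x += $w['advance'] + $spaceWidth; // move the cursor past word and space
    }

    return $placed;
}

$placed = layoutLine(
    [
        ['word' => 'and', 'advance' => 106],
        ['word' => 'bravery', 'advance' => 226],
        ['word' => 'to', 'advance' => 61],
    ],
    16,   // scaled space width
    1920  // illustrative frame width
);

foreach ($placed as $p) {
    printf("%-8s x=%d\n", $p['word'], $p['x']);
}
```

Each x value then goes straight into that word's drawtext filter.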


The Red Herring Postmortem

The advance width theory was internally correct — visual bounding box width does differ from advance width by 1-3px per word, and using visual widths does cause cumulative drift in GD's coordinate space. But that 11px-over-7-words problem was invisible next to the 35% DPI scaling error.

We spent hours perfecting GD-to-GD measurements before doing the one test that mattered: rendering in the target engine and comparing.

Lesson: When you're computing positions in one engine for use in another, validate against the target engine first. Internal consistency in the source engine tells you nothing about cross-engine accuracy.


Quick Reference

What                      GD Says                              FFmpeg Does          Fix
Word widths               ~96 DPI                              ~72 DPI              Multiply by 0.74
Space widths              Inflated for " " alone               Natural kerning      Measure via bbox("n n") - bbox("nn"), then scale by 0.74
Ascent difference         Per GD hinting                       Per FFmpeg hinting   Multiply difference by 0.8
Visual vs advance width   max(bbox[2], bbox[4]) vs doubling    Native advance       Use advance widths for cursor positioning, visual for last word on line
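
The space-width row is the least obvious, so here is a sketch of it. measureSpaceWidth is a hypothetical helper that takes the measuring function as a parameter; in real use that callable would wrap imagettfbbox and return max($bbox[2], $bbox[4]). The stand-in widths for "n n" and "nn" are made up so the example runs without a font file.

```php
<?php
// Hypothetical helper: width of a space as it appears between two glyphs,
// measured as bbox("n n") - bbox("nn") and converted to FFmpeg's scale.
function measureSpaceWidth(callable $measure, float $scale = 0.74): int
{
    $raw = $measure('n n') - $measure('nn');

    return (int) round($raw * $scale);
}

// Stand-in measurer with made-up GD widths.
$fakeMeasure = fn (string $text): int => ['n n' => 95, 'nn' => 74][$text];

$spaceWidth = measureSpaceWidth($fakeMeasure);
printf("space width: %dpx\n", $spaceWidth);
```

Passing the measurer in also makes the scaling step easy to unit test without GD installed.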

The Code

Three measurement functions and one scaling step. The raw GD functions return unscaled values — the caller applies WIDTH_SCALE to everything horizontal:

private const WIDTH_SCALE = 0.74;

public function measureText(string $text, string $fontPath, int $fontSize): int
{
    $bbox = @imagettfbbox((float) $fontSize, 0.0, $fontPath, $text);
    return $bbox !== false ? (int) max($bbox[2], $bbox[4]) : (int) (strlen($text) * $fontSize * 0.55);
}

public function measureAdvanceWidth(string $text, string $fontPath, int $fontSize): int
{
    $single = @imagettfbbox((float) $fontSize, 0.0, $fontPath, $text);
    $double = @imagettfbbox((float) $fontSize, 0.0, $fontPath, $text . $text);
    if ($single !== false && $double !== false) {
        return (int) max($double[2], $double[4]) - (int) max($single[2], $single[4]);
    }
    return (int) (strlen($text) * $fontSize * 0.55);
}

public function measureAscent(string $text, string $fontPath, int $fontSize): int
{
    $bbox = @imagettfbbox((float) $fontSize, 0.0, $fontPath, $text);
    return $bbox !== false ? (int) abs(min($bbox[5], $bbox[7])) : (int) ($fontSize * 0.8);
}

// Usage:
$ffmpegWidth = (int) round($this->measureText($word, $fontPath, $fontSize) * self::WIDTH_SCALE);
$yOffset = (int) round(max(0, $fontAscent - $wordAscent) * 0.8);

TL;DR

  • imagettfbbox uses 96 DPI. FFmpeg drawtext uses ~72 DPI. All GD widths are 35% too large. Scale by 0.74.
  • drawtext positions text by per-word bounding box top, not a shared baseline. Scale GD ascent differences by 0.8.
  • Validate against FFmpeg early. GD-to-GD consistency means nothing if the target engine disagrees.
  • Advance width matters (use the doubling method), but it's a 1-3px refinement — don't mistake it for the main problem.
