Text diff looks like "mark differences red and green"—the hard part is: after inserting one line, every line below shifts—how do you know they are unchanged, just offset? Diff's core is not character comparison but finding a minimum-cost edit path from A to B.

What Problem Is Diff Actually Solving?
Diff asks for the shortest edit sequence from text A to text B: the fewest insert and delete operations to turn A into B. Why not compare character by character? Insert a line at row 3 and naive comparison misaligns everything below—each line looks "changed" while a human sees only a downward shift with identical content.
Modeling "minimum edits" lets the algorithm recognize "these lines are unchanged, just pushed down by the insert." That pairs with longest common subsequence (LCS): the longer the LCS of A and B, the fewer edits needed—two views of the same coin.
How Does LCS Locate the Unchanged Parts?
LCS is the longest subsequence appearing in both texts in original order, not necessarily contiguous. It is what A and B share unchanged; A-only parts are deletes, B-only parts are adds. Classic approach: dynamic programming—
- Table
dp[i][j]= LCS length of first i units of A and first j of B; - If units match,
dp[i][j] = dp[i-1][j-1] + 1; else max of cell above and left; - Backtrack the table to recover common vs add/delete regions.
Intuitive and easy to implement—but time and space are O(N×M) for lengths N, M. Large texts blow up the table—that is the main weakness.
Why Is Myers Faster and More Common?
Myers treats diff as shortest path on an edit graph—often much faster than naive DP when differences are small, which is why Git uses it. Key insight: real versions usually differ in few places—difference count D is small. Myers runs in roughly O(ND) (N total length, D edit distance); when D ≪ N, behavior is nearly linear.
| Dimension | LCS dynamic programming | Myers (O(ND)) |
|---|---|---|
| Idea | Fill 2D table for LCS | Shortest edit path on edit graph |
| Time | O(N×M) | ~O(ND), D = diff count |
| Space | O(N×M) (optimizable to O(N)) | Linear-space variants O(N) |
| Small diffs | Still fills full table | Very fast (near linear) |
| Large diffs | Stable but slow | Degrades toward O(N²) |
| Typical use | Teaching, short text | Git, most diff tools |
Practical takeaway: Myers wins when D is small; both slow when diffs dominate.
Line-Level vs Character-Level Diff—How to Choose?
Diff granularity matters—wrong unit hurts readability. Three common levels:
- Line-level: compare whole lines—default for code, config, logs. Clean structure; tells you a line changed, not which character inside.
- Word-level: words or tokens—better for prose; shows which word changed.
- Character-level: finest; long text produces noisy micro-fragments and hurts readability.
Common engineering pattern: layered—line diff for structure, then finer diff on paired "modified" lines.
How Is Inline Highlighting Implemented?
Inline highlight (exact characters changed within one line) is usually two-level diff. Level one: line diff yields delete/add operations; when a delete and add pair nearby with enough similarity, classify as modified line. Level two: character- or word-level diff on that pair to mark changed fragments precisely.
Inline highlight is not separate magic—it is the same algorithm at finer granularity—which is why it costs more: every modified line pair runs diff again.
Limits and Known Boundaries
Diff is not universal. Common boundaries:
- Large diffs degrade: when texts are almost entirely different (D ≈ N), Myers approaches O(N²)—large inputs may stutter;
- No semantic equivalence: only character sequences—rename-with-same-logic or moved code blocks read as delete + add, not "moved unchanged";
- Line endings and whitespace: CRLF vs LF, trailing spaces, tab vs space indent all count as diffs—normalize before compare;
- Granularity vs readability: character diff on long paragraphs fragments output—word or line level often clearer.
A useful diff result depends on granularity matching text type and sensible newline/whitespace normalization—not raw algorithm speed alone.
Summary
Text diff is not "compare every character" but find a shortest edit path: LCS finds shared unchanged content; Myers uses edit-graph shortest path for near-linear behavior when D is small. Line, word, and character levels are granularity choices; inline highlight is line diff then character diff on modified pairs. Understand shortest edit + granularity + normalization and you can judge what diff implementation fits which text and scale.