Does a diff algorithm compare character by character?

No. Naive character-by-character comparison fails when inserting one line shifts everything below—every following line looks 'changed.' Mainstream diff models the problem as the shortest edit sequence from text A to B, using longest common subsequence (LCS) or equivalent shortest edit script to locate real adds and deletes—not mechanical position-by-position comparison.

What is the relationship between Myers' algorithm and LCS?

Shortest edit script and longest common subsequence are equivalent problems. Myers (1986) solves via shortest path on an edit graph, roughly O(ND) time (N = length, D = number of differences)—very fast when D is small. Git and most diff tools default to Myers.

How is inline highlighting (which characters changed in a line) implemented?

Usually two-level diff: line-level diff first finds 'modified' lines (paired delete + add); then character- or word-level diff on each paired line highlights the exact changed fragments. Inline highlight is the same algorithm run again at finer granularity.

How Do Text Diff Tools Find Changes? A Look at Diff Algorithms

Text diff looks like "mark differences red and green"—the hard part is: after inserting one line, every line below shifts—how do you know they are unchanged, just offset? Diff's core is not character comparison but finding a minimum-cost edit path from A to B.

Finding the shortest edit path between two texts on an edit graph grid

What Problem Is Diff Actually Solving?

Diff asks for the shortest edit sequence from text A to text B: the fewest insert and delete operations to turn A into B. Why not compare character by character? Insert a line at row 3 and naive comparison misaligns everything below—each line looks "changed" while a human sees only a downward shift with identical content.

Modeling "minimum edits" lets the algorithm recognize "these lines are unchanged, just pushed down by the insert." That pairs with longest common subsequence (LCS): the longer the LCS of A and B, the fewer edits needed—two views of the same coin.

How Does LCS Locate the Unchanged Parts?

LCS is the longest subsequence appearing in both texts in original order, not necessarily contiguous. It is what A and B share unchanged; A-only parts are deletes, B-only parts are adds. Classic approach: dynamic programming—

Table dp[i][j] = LCS length of first i units of A and first j of B;
If units match, dp[i][j] = dp[i-1][j-1] + 1; else max of cell above and left;
Backtrack the table to recover common vs add/delete regions.

Intuitive and easy to implement—but time and space are O(N×M) for lengths N, M. Large texts blow up the table—that is the main weakness.

Why Is Myers Faster and More Common?

Myers treats diff as shortest path on an edit graph—often much faster than naive DP when differences are small, which is why Git uses it. Key insight: real versions usually differ in few places—difference count D is small. Myers runs in roughly O(ND) (N total length, D edit distance); when D ≪ N, behavior is nearly linear.

Dimension	LCS dynamic programming	Myers (O(ND))
Idea	Fill 2D table for LCS	Shortest edit path on edit graph
Time	O(N×M)	~O(ND), D = diff count
Space	O(N×M) (optimizable to O(N))	Linear-space variants O(N)
Small diffs	Still fills full table	Very fast (near linear)
Large diffs	Stable but slow	Degrades toward O(N²)
Typical use	Teaching, short text	Git, most diff tools

Practical takeaway: Myers wins when D is small; both slow when diffs dominate.

Line-Level vs Character-Level Diff—How to Choose?

Diff granularity matters—wrong unit hurts readability. Three common levels:

Line-level: compare whole lines—default for code, config, logs. Clean structure; tells you a line changed, not which character inside.
Word-level: words or tokens—better for prose; shows which word changed.
Character-level: finest; long text produces noisy micro-fragments and hurts readability.

Common engineering pattern: layered—line diff for structure, then finer diff on paired "modified" lines.

How Is Inline Highlighting Implemented?

Inline highlight (exact characters changed within one line) is usually two-level diff. Level one: line diff yields delete/add operations; when a delete and add pair nearby with enough similarity, classify as modified line. Level two: character- or word-level diff on that pair to mark changed fragments precisely.

Inline highlight is not separate magic—it is the same algorithm at finer granularity—which is why it costs more: every modified line pair runs diff again.

Limits and Known Boundaries

Diff is not universal. Common boundaries:

Large diffs degrade: when texts are almost entirely different (D ≈ N), Myers approaches O(N²)—large inputs may stutter;
No semantic equivalence: only character sequences—rename-with-same-logic or moved code blocks read as delete + add, not "moved unchanged";
Line endings and whitespace: CRLF vs LF, trailing spaces, tab vs space indent all count as diffs—normalize before compare;
Granularity vs readability: character diff on long paragraphs fragments output—word or line level often clearer.

A useful diff result depends on granularity matching text type and sensible newline/whitespace normalization—not raw algorithm speed alone.

Summary

Text diff is not "compare every character" but find a shortest edit path: LCS finds shared unchanged content; Myers uses edit-graph shortest path for near-linear behavior when D is small. Line, word, and character levels are granularity choices; inline highlight is line diff then character diff on modified pairs. Understand shortest edit + granularity + normalization and you can judge what diff implementation fits which text and scale.