Row Segmentation QA: Catch Merged Lines Before They Break Bank Reconciliation

A row-segmentation QA workflow to detect merged, split, and page-break rows before they get into your export files.

May 2, 2026 · 8 min read

When a statement row gets merged, everything downstream starts lying in a very calm voice.

Your CSV still exports. Your XLSX still opens. Your JSON still validates. But the data is wrong because two statement lines got glued together into one row, or one row got split into two. That breaks reconciliation in ways totals-only checks often miss.

Row segmentation QA is the method for detecting that problem before import. It focuses on structure, not just values:

does each row correspond to exactly one transaction line?
are wrapped descriptions getting merged into neighboring rows?
do date/amount/description patterns still look like a statement table, not OCR soup?

This post gives you a practical workflow to catch row merges, line splits, and boundary drift before they reach your ledger.

Why row segmentation is its own problem

People lump row segmentation into OCR quality, but that’s too vague.

A statement can have excellent OCR and still fail row segmentation because of:

wrapped descriptions that cross line boundaries
page breaks that continue a transaction onto the next page
table layouts with thin grid lines or no grid lines
bank-specific formatting where a fee or adjustment block uses a different structure

The key point: a row can be textually readable and still structurally wrong.

That’s why segmentation QA needs its own gates.

What row segmentation QA must guarantee

A safe transaction export should guarantee:

One transaction per row

no accidental merges of adjacent lines
no accidental splits of one transaction into multiple rows

Stable column shape

each row should preserve the same field pattern (date, description, amount, maybe balance)

Boundary consistency

when a statement wraps text or breaks across pages, the row boundary rule must remain consistent

Export parity

the same row structure should be present in CSV, XLSX, and JSON
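As a rough illustration of the export parity guarantee, the sketch below compares row tuples from a CSV export and a JSON export. The field names (date, description, amount) and the assumption that JSON amounts are serialized as strings are hypothetical choices for this example, not a description of any specific exporter. XLSX is left out because reading it needs a third-party library.

```python
import csv
import io
import json

# Hypothetical field names; adjust to whatever your exports actually use.
FIELDS = ("date", "description", "amount")

def csv_rows(csv_text):
    """Read a CSV export into a list of (date, description, amount) tuples."""
    return [tuple(r[f] for f in FIELDS) for r in csv.DictReader(io.StringIO(csv_text))]

def json_rows(json_text):
    """Read a JSON export (a list of objects) into the same tuple shape."""
    return [tuple(str(r[f]) for f in FIELDS) for r in json.loads(json_text)]

def parity_mismatches(csv_text, json_text):
    """Return row indexes where the two exports disagree."""
    a, b = csv_rows(csv_text), json_rows(json_text)
    mismatches = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(a) != len(b):
        # Flag the first index that exists in only one of the exports.
        mismatches.append(min(len(a), len(b)))
    return mismatches

csv_export = "date,description,amount\n04/28,Coffee Shop,8.50\n04/28,Metro,2.75\n"
json_export = '[{"date": "04/28", "description": "Coffee Shop Metro", "amount": "2.75"}]'
mismatches = parity_mismatches(csv_export, json_export)
```

Here the JSON export contains a merged row, so both the content check and the row-count check fire.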

If row segmentation fails, dedupe, merchant normalization, and sequence QA all get noisier.

A simple segmentation signal set

You don’t need a big model. You need a few strong signals.

Signal 1: date presence

For most transaction rows, a date appears at the start or within a predictable field. If a row suddenly loses the date while keeping a long description, that’s a merge candidate.
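A minimal sketch of a date-anchor check, assuming dates appear as MM/DD at the start of the row, as in the examples later in this post. The pattern and the function name are illustrative; real statements will need bank-specific date formats.

```python
import re

# Assumption: dates look like MM/DD at the start of the row.
DATE_ANCHOR = re.compile(r"^\s*\d{2}/\d{2}\b")

def has_date_anchor(row_text: str) -> bool:
    """Return True when the row starts with an MM/DD date."""
    return DATE_ANCHOR.match(row_text) is not None

rows = [
    "04/28 Coffee Shop 8.50",
    "04/29 ONLINE PAYMENT AUTHORIZED",
    "BY MERCHANT X 42.00",
]
missing_date = [r for r in rows if not has_date_anchor(r)]
```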

Signal 2: amount presence

Nearly every transaction row should include exactly one amount. If an export row has two amounts, or contains text only, investigate.
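One way to check this is to count amount-like tokens per row. The sketch below assumes amounts have exactly two decimal places, with an optional leading minus sign and optional thousands separators; that format is an assumption and will not hold for every bank.

```python
import re

# Assumption: amounts look like 8.50, -42.00, or 1,250.00 (two decimals).
AMOUNT = re.compile(r"(?<![\d.,])-?\d[\d,]*\.\d{2}(?!\d)")

def count_amounts(row_text: str) -> int:
    """Count amount-like tokens in a row."""
    return len(AMOUNT.findall(row_text))

merged = count_amounts("04/28 Coffee Shop 8.50 Metro 2.75")   # two amounts: merge candidate
text_only = count_amounts("04/29 ONLINE PAYMENT AUTHORIZED")  # no amount: split candidate
normal = count_amounts("04/28 Metro 2.75")
```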

Signal 3: description length distribution

Healthy descriptions usually live in a normal range.

too short: possible truncation or split row
too long: possible merge or wrapped line absorbed into the row
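A small sketch of the length check, using the statement median as the baseline. The 2× upper ratio mirrors the scoring template later in this post; the 0.25 lower ratio is an arbitrary assumption you would tune per layout.

```python
from statistics import median

def length_flags(descriptions, short_ratio=0.25, long_ratio=2.0):
    """Label each description as ok, too_short, or too_long relative to the median length."""
    med = median(len(d) for d in descriptions)
    flags = []
    for d in descriptions:
        if len(d) > long_ratio * med:
            flags.append("too_long")
        elif len(d) < short_ratio * med:
            flags.append("too_short")
        else:
            flags.append("ok")
    return flags

descriptions = [
    "Coffee Shop",
    "Metro",
    "Grocery Mart",
    "Coffee Shop Metro Pharmacy Plus Refund",
    "X",
]
flags = length_flags(descriptions)
```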

Signal 4: balance consistency

If the statement includes running balances, a merged row often causes a mismatch in the balance step pattern.
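The balance step check can be sketched as below. It assumes signed amounts (debits negative) and a known opening balance, and it uses Decimal to avoid floating-point rounding noise; the function name and the input shape are hypothetical.

```python
from decimal import Decimal

def balance_breaks(rows, opening_balance):
    """Return indexes of rows where previous balance + amount != stated balance.

    rows: list of (amount, balance) string pairs, amounts signed (debits negative).
    """
    breaks = []
    prev = Decimal(opening_balance)
    for i, (amount, balance) in enumerate(rows):
        if prev + Decimal(amount) != Decimal(balance):
            breaks.append(i)
        # Continue from the stated balance so one break does not cascade.
        prev = Decimal(balance)
    return breaks

# The second row swallowed a neighboring 10.00 debit, so its step does not add up.
rows = [("-8.50", "91.50"), ("-2.75", "78.75"), ("-5.00", "73.75")]
breaks = balance_breaks(rows, "100.00")
```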

Signal 5: page boundary anomalies

Rows near page breaks are high risk. Any row with unusual continuation patterns should be scored higher.

The segmentation QA workflow

Use this order.

Step 1: capture the statement table shape

Before any row-level logic, detect:

number of visible columns
where date, description, amount, and balance fields usually land
whether the page has a repeated header or repeated summary block

This is the row equivalent of layout fingerprinting.

Step 2: score every candidate row

For each candidate row, compute a basic segmentation score from:

date confidence
amount confidence
description length
field alignment with prior rows

Rows with unusually low scores are merge/split candidates.

Step 3: group suspect rows into break regions

Don’t investigate row-by-row in isolation.

If one bad row appears, check the rows immediately before and after it. Merges and splits almost always create local neighborhoods of weirdness.
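A minimal sketch of that grouping: expand each suspect row by a small radius and merge overlapping or touching windows into break regions. The radius of one row is an assumption; widen it for layouts with long wrapped descriptions.

```python
def break_regions(suspect_indexes, row_count, radius=1):
    """Group suspect row indexes into inclusive (start, end) regions with neighbors."""
    regions = []
    for idx in sorted(set(suspect_indexes)):
        start = max(0, idx - radius)
        end = min(row_count - 1, idx + radius)
        if regions and start <= regions[-1][1] + 1:
            # Overlaps or touches the previous region: extend it.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions

regions = break_regions([4, 5, 12], row_count=20)
```

Rows 4 and 5 collapse into one region to review together, while row 12 gets its own.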

Step 4: classify the failure

Classify as:

merged row
split row
page-break carryover
layout drift for a statement variant
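The four classes can be approximated with a small rule-of-thumb classifier. The signal names and the rule order below are assumptions made for illustration; treat the output as a triage label, not a verdict.

```python
from dataclasses import dataclass

@dataclass
class RowSignals:
    has_date: bool
    amount_count: int
    description_too_long: bool
    at_page_boundary: bool
    matches_layout: bool  # row shape agrees with the detected table shape

def classify(signals: RowSignals) -> str:
    """Map row signals to one of the failure classes, or 'clean'."""
    if not signals.matches_layout:
        return "layout_drift"
    if signals.amount_count >= 2 or signals.description_too_long:
        return "merged_row"
    if not signals.has_date or signals.amount_count == 0:
        # A fragment at a page edge is more likely a carryover than a plain split.
        return "page_break_carryover" if signals.at_page_boundary else "split_row"
    return "clean"

label = classify(RowSignals(has_date=True, amount_count=2,
                            description_too_long=True,
                            at_page_boundary=False, matches_layout=True))
```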

Step 5: repair the smallest block

Fix the affected segment only, then re-run QA.

A practical table of segmentation failure patterns

Pattern	What it looks like	Likely cause	Repair lever
Merged row	one row has unusually long text and maybe two transaction ideas inside it	OCR glued adjacent lines together	re-segment the affected block
Split row	one transaction is spread across two rows, with one row missing amount/date	OCR broke a wrapped line	merge based on adjacency and column continuity
Page break carryover	the last row of a page and the first row of the next page look connected	page continuation not handled	stitch page boundary using continuation rules
Layout drift	same bank, different statement template variant changes row shape	fingerprint mismatch	select the correct layout mapping

This table is the whole game. If you can classify the pattern, you can fix it.

Worked example 1: a merged row that still looks “parseable”

The failure

You have two adjacent transactions:

04/28 Coffee Shop 8.50
04/28 Metro 2.75

OCR merges them into:

04/28 Coffee Shop 8.50 Metro 2.75

This row still looks parseable if your extractor finds a date and a final amount. But it’s wrong.

What QA catches

description length spikes far above normal
there are two merchant ideas in one row
row count is lower than expected for the page

If running balances exist, the row’s balance step also won’t line up with a two-transaction sequence.

Repair

split the row using the same table shape that produced the merge
preserve both amounts as separate transactions
re-run segmentation QA and sequence QA
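For a row like the one above, a text-level fallback is to treat each amount as the end of a transaction. This is a simplification rather than a re-segmentation from table geometry, and it assumes the MM/DD and two-decimal formats used in this example. It also assumes the second transaction shares the leading date, since the merge dropped its own date; verify that against the source page.

```python
import re

# Assumed formats: MM/DD date at row start, two-decimal amounts.
DATE = re.compile(r"^\s*(\d{2}/\d{2})\s+")
AMOUNT = re.compile(r"(?<![\d.,])-?\d[\d,]*\.\d{2}(?!\d)")

def split_merged_row(row_text):
    """Split a merged row into one record per amount, reusing the leading date."""
    date_match = DATE.match(row_text)
    date = date_match.group(1) if date_match else None
    body = row_text[date_match.end():] if date_match else row_text
    records, cursor = [], 0
    for m in AMOUNT.finditer(body):
        description = body[cursor:m.start()].strip()
        records.append({"date": date, "description": description, "amount": m.group()})
        cursor = m.end()
    return records

records = split_merged_row("04/28 Coffee Shop 8.50 Metro 2.75")
```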

Worked example 2: wrapped descriptions split into two fake rows

The failure

A transaction description continues onto the next line:

Row 1: 04/29 ONLINE PAYMENT AUTHORIZED
Row 2: BY MERCHANT X 42.00

If the parser treats these as separate transactions, you create a fake transaction.

What QA catches

Row 1 has no amount, only description
Row 2 has amount but no date
the pair shares a continuation pattern

Repair

merge the continuation line back into a single transaction row
keep the date from the first row and the amount from the final amount-bearing line
re-run export parity to confirm CSV/XLSX/JSON all match the same merged record
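The merge step can be sketched as follows, assuming rows are already parsed into dicts with date, description, and amount keys (None when a field is absent). That record shape is hypothetical. A row counts as a continuation when it has no date and the previous row is still waiting for its amount.

```python
def merge_continuations(rows):
    """Fold dateless continuation lines into the preceding amount-less row."""
    merged = []
    for row in rows:
        is_continuation = bool(
            row["date"] is None and merged and merged[-1]["amount"] is None
        )
        if is_continuation:
            prev = merged[-1]
            prev["description"] = f'{prev["description"]} {row["description"]}'.strip()
            # Keep the first row's date; take the amount from the amount-bearing line.
            prev["amount"] = row["amount"]
        else:
            merged.append(dict(row))
    return merged

parsed = [
    {"date": "04/29", "description": "ONLINE PAYMENT AUTHORIZED", "amount": None},
    {"date": None, "description": "BY MERCHANT X", "amount": "42.00"},
]
merged_rows = merge_continuations(parsed)
```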

Boundary rules that reduce false positives

A good segmentation system needs a few hard rules.

Rule 1: one date anchor per transaction row

If a row has no date of its own and sits directly below a dated row, it is probably a continuation, not a new transaction.

Rule 2: one amount anchor per transaction row

If a row carries an amount but the previous row already supplied the amount for the same logical transaction, suspect a merge or split.

Rule 3: continuation text should be explicitly marked

When a line clearly continues a description, keep it as continuation, not as a row.

Rule 4: rows near page breaks get special scrutiny

Continuations often happen there.

These rules are simple, but they stop a lot of garbage from slipping through.

How row segmentation affects other QA gates

Row segmentation failure doesn’t stay in one lane.

It contaminates:

Merchant normalization, because merchants get glued to the wrong transaction
Deduplication, because duplicated fragments can look like separate rows
Sequence QA, because row order and balance steps become inconsistent
Export parity, because formats may split or merge differently

That’s why segmentation QA belongs near the top of the pipeline.

A quick scoring template you can operationalize

Score each row 0–100.

Start with 100 and apply these penalties:

-30 if there is no date anchor where one is expected
-25 if the amount is missing or duplicated
-20 if description length is more than 2× the statement median
-15 if the row occurs at a page boundary and lacks a clear continuation marker
-10 if field alignment differs from nearby rows

Then use buckets:

80–100: likely clean
50–79: inspect nearby rows
< 50: high-risk segmentation break

This is not perfect, but it gives operators a triage path.
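The template translates directly into a few lines of code. The penalty values and bucket edges come from the lists above; the field names on the checks object are hypothetical, and you would fill them from whatever signal detectors you run.

```python
from dataclasses import dataclass

@dataclass
class RowChecks:
    missing_date: bool = False
    amount_missing_or_duplicated: bool = False
    description_over_2x_median: bool = False
    unmarked_page_boundary: bool = False
    misaligned_fields: bool = False

# Penalties from the scoring template, keyed by the check that triggers them.
PENALTIES = {
    "missing_date": 30,
    "amount_missing_or_duplicated": 25,
    "description_over_2x_median": 20,
    "unmarked_page_boundary": 15,
    "misaligned_fields": 10,
}

def segmentation_score(checks: RowChecks) -> int:
    """Start at 100 and subtract the penalty for each failed check."""
    penalty = sum(p for name, p in PENALTIES.items() if getattr(checks, name))
    return max(100 - penalty, 0)

def bucket(score: int) -> str:
    if score >= 80:
        return "likely clean"
    if score >= 50:
        return "inspect nearby rows"
    return "high-risk segmentation break"

merged_row = RowChecks(amount_missing_or_duplicated=True, description_over_2x_median=True)
score = segmentation_score(merged_row)
triage = bucket(score)
```

A merged row that trips only the amount and length checks lands in the middle bucket, which is a reminder to review its neighbors instead of trusting it.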

FAQ

1) Can segmentation QA replace OCR QA?

No. OCR QA catches character-level issues. Segmentation QA catches row-structure issues. You need both.

2) What if the statement has no running balances?

You can still catch merges/splits using date, amount, description length, and page boundary rules.

3) What if a row is legitimately very long?

That happens. The fix is not to ban long rows, it’s to compare against layout-specific baselines and nearby row shapes.

4) Should dedupe happen before segmentation QA?

No. Segment first, then normalize merchants, then dedupe.

5) What’s the most common real-world mistake?

Treating a wrapped line as a new transaction row. That error causes fake rows and missed rows at the same time.

Bottom line

Row segmentation QA catches the structural mistakes that value checks miss.

If you can detect merged rows, split rows, and page-break carryover early, your exports stop lying about transaction counts. Then reconciliation becomes a math problem again, not a detective job.
