Row Segmentation QA: Catch Merged Lines Before They Break Bank Reconciliation

A row-segmentation QA workflow to detect merged, split, and page-break rows before they get into your export files.

May 2, 2026 · 8 min read

When a statement row gets merged, everything downstream starts lying in a very calm voice.

Your CSV still exports. Your XLSX still opens. Your JSON still validates. But the data is wrong because two statement lines got glued together into one row, or one row got split into two. That breaks reconciliation in ways totals-only checks often miss.

Row segmentation QA is the method for detecting that problem before import. It focuses on structure, not just values:

  • does each row correspond to exactly one transaction line?
  • are wrapped descriptions getting merged into neighboring rows?
  • do date/amount/description patterns still look like a statement table, not OCR soup?

This post gives you a practical workflow to catch row merges, line splits, and boundary drift before they reach your ledger.


Why row segmentation is its own problem

People lump row segmentation into OCR quality, but that’s too vague.

A statement can have excellent OCR and still fail row segmentation because of:

  • wrapped descriptions that cross line boundaries
  • page breaks that continue a transaction onto the next page
  • table layouts with thin grid lines or no grid lines
  • bank-specific formatting where a fee or adjustment block uses a different structure

The key point: a row can be textually readable and still structurally wrong.

That’s why segmentation QA needs its own gates.


What row segmentation QA must guarantee

A safe transaction export should guarantee:

  1. One transaction per row
     • no accidental merges of adjacent lines
     • no accidental splits of one transaction into multiple rows
  2. Stable column shape
     • each row should preserve the same field pattern (date, description, amount, maybe balance)
  3. Boundary consistency
     • when a statement wraps text or breaks across pages, the row boundary rule must remain consistent
  4. Export parity
     • the same row structure should be present in CSV, XLSX, and JSON
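
The export parity guarantee can be checked mechanically. Here is a minimal sketch that compares a CSV export with a JSON export row by row, using only the standard library. The column names in FIELDS and the assumption that the JSON export is a list of objects with string amounts are illustrative, not a description of any particular exporter. XLSX is left out because reading it needs a third-party library, but the same comparison would apply.

```python
import csv
import io
import json

# Assumed export column names; adjust to match your own files.
FIELDS = ("date", "description", "amount")


def rows_from_csv(text: str) -> list[tuple[str, ...]]:
    """Parse CSV text into a list of (date, description, amount) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [tuple(row[f].strip() for f in FIELDS) for row in reader]


def rows_from_json(text: str) -> list[tuple[str, ...]]:
    """Parse a JSON array of objects into the same tuple shape.

    Assumes amounts are stored as strings, so "8.50" compares equal
    to the CSV value instead of becoming the float 8.5.
    """
    return [tuple(str(row[f]).strip() for f in FIELDS) for row in json.loads(text)]


def parity_mismatches(csv_text: str, json_text: str) -> list[int]:
    """Return row indices where the two exports disagree."""
    a, b = rows_from_csv(csv_text), rows_from_json(json_text)
    mismatches = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    # Rows that exist in only one export are mismatches too.
    mismatches.extend(range(min(len(a), len(b)), max(len(a), len(b))))
    return mismatches
```

An empty result means both exports carry the same rows in the same order. A non-empty result at or near one index usually points to one format merging or splitting a row that the other kept intact.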

If row segmentation fails, dedupe, merchant normalization, and sequence QA all get noisier.


A simple segmentation signal set

You don’t need a big model. You need a few strong signals.

Signal 1: date presence

For most transaction rows, a date appears at the start or within a predictable field. If a row suddenly loses the date while keeping a long description, that’s a merge/split candidate: most often a continuation line that was cut off from its transaction.

Signal 2: amount presence

A transaction row should almost always include exactly one amount. If an export row has two amounts, or no amount at all, investigate.
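
Signals 1 and 2 can be sketched as a pair of anchor counters. The regular expressions below assume MM/DD dates (optionally with a year) and amounts printed with two decimals; both patterns and the flag names are illustrative assumptions, and real statements will need their own patterns.

```python
import re

# Assumed formats: "04/28" or "04/28/2026" dates, "8.50" or "1,234.56" amounts.
DATE_RE = re.compile(r"\b\d{2}/\d{2}(?:/\d{2,4})?\b")
AMOUNT_RE = re.compile(r"(?<![\d/.,])-?\d[\d,]*\.\d{2}(?!\d)")


def anchor_counts(row_text: str) -> tuple[int, int]:
    """Return (date-like token count, amount-like token count) for one row."""
    return len(DATE_RE.findall(row_text)), len(AMOUNT_RE.findall(row_text))


def anchor_flags(row_text: str) -> list[str]:
    """Flag rows whose date or amount anchors look wrong."""
    dates, amounts = anchor_counts(row_text)
    flags = []
    if dates == 0:
        flags.append("no_date")
    if amounts == 0:
        flags.append("no_amount")
    if amounts > 1:
        flags.append("multiple_amounts")
    return flags


print(anchor_flags("04/28 Coffee Shop 8.50 Metro 2.75"))  # ['multiple_amounts']
print(anchor_flags("BY MERCHANT X 42.00"))                # ['no_date']
```

If the statement has a running balance column, a clean row will legitimately contain two amount-like tokens, so the "multiple_amounts" threshold would need to move from one to two for that layout.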

Signal 3: description length distribution

Healthy descriptions usually live in a normal range.

  • too short: possible truncation or split row
  • too long: possible merge or wrapped line absorbed into the row
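
One rough way to put numbers on "too short" and "too long" is to compare each description with the statement median. The ratios below (under 0.3× and over 2× the median) are assumed starting points, not calibrated thresholds.

```python
from statistics import median


def length_outliers(
    descriptions: list[str],
    low_ratio: float = 0.3,   # assumed cutoff for "too short"
    high_ratio: float = 2.0,  # assumed cutoff for "too long"
) -> dict[int, str]:
    """Return {row index: 'too_short' | 'too_long'} relative to the median length."""
    lengths = [len(d.strip()) for d in descriptions]
    if not lengths:
        return {}
    mid = median(lengths)
    if mid == 0:
        return {}
    flagged = {}
    for i, n in enumerate(lengths):
        if n < low_ratio * mid:
            flagged[i] = "too_short"
        elif n > high_ratio * mid:
            flagged[i] = "too_long"
    return flagged
```

The median is used rather than the mean so that one very long merged row does not drag the baseline up and hide itself.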

Signal 4: balance consistency

If the statement includes running balances, a merged row often causes a mismatch in the balance step pattern.
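
The balance step pattern is simple arithmetic: each printed balance should equal the previous balance plus the row amount. A minimal sketch, assuming amounts are signed (debits negative) and that both fields arrive as decimal strings:

```python
from decimal import Decimal


def balance_breaks(rows: list[tuple[str, str]], opening_balance: str) -> list[int]:
    """Return indices of rows whose balance != previous balance + amount.

    rows is a list of (signed_amount, printed_balance) strings.
    """
    breaks = []
    prev = Decimal(opening_balance)
    for i, (amount, balance) in enumerate(rows):
        if prev + Decimal(amount) != Decimal(balance):
            breaks.append(i)
        # Resync on the printed balance so one bad row does not
        # flag every row after it.
        prev = Decimal(balance)
    return breaks


# Two clean rows: 100.00 -> 91.50 -> 88.75
print(balance_breaks([("-8.50", "91.50"), ("-2.75", "88.75")], "100.00"))  # []
# The same rows merged into one that kept only the last amount
print(balance_breaks([("-2.75", "88.75")], "100.00"))  # [0]
```

Decimal is used instead of float so that cent-level comparisons are exact.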

Signal 5: page boundary anomalies

Rows near page breaks are high risk. Any row with unusual continuation patterns should be scored higher.


The segmentation QA workflow

Use this order.

Step 1: capture the statement table shape

Before any row-level logic, detect:

  • number of visible columns
  • where date, description, amount, and balance fields usually land
  • whether the page has a repeated header or repeated summary block

This is the row equivalent of layout fingerprinting.

Step 2: score every candidate row

For each candidate row, compute a basic segmentation score from:

  • date confidence
  • amount confidence
  • description length
  • field alignment with prior rows

Rows with unusually low scores are merge/split candidates.

Step 3: group suspect rows into break regions

Don’t investigate row-by-row in isolation.

If one bad row appears, check the rows immediately before and after it. Merges and splits almost always create local neighborhoods of weirdness.
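
Grouping can be as simple as widening each suspect row into a small window and merging windows that touch. This is a sketch; the function name and the default radius of one neighboring row are assumptions.

```python
def break_regions(
    suspect_indices: list[int],
    row_count: int,
    radius: int = 1,  # assumed: include one row before and after each suspect
) -> list[tuple[int, int]]:
    """Merge suspect rows and their neighbors into inclusive (start, end) ranges."""
    regions: list[tuple[int, int]] = []
    for i in sorted(set(suspect_indices)):
        start = max(0, i - radius)
        end = min(row_count - 1, i + radius)
        if regions and start <= regions[-1][1] + 1:
            # Overlaps or touches the previous region: extend it.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


print(break_regions([4, 5, 12], row_count=20))  # [(3, 6), (11, 13)]
```

Each returned range is one block to inspect and, if needed, repair as a unit.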

Step 4: classify the failure

Classify as:

  • merged row
  • split row
  • page-break carryover
  • layout drift for a statement variant
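
A first-pass classifier can be a handful of ordered rules over the signals already collected. The sketch below is one hypothetical arrangement: the parameter names, the rule order, and the 0.5 misalignment threshold are all assumptions, and anything it cannot place is left for manual review.

```python
def classify_break(
    has_date: bool,
    amount_count: int,
    at_page_boundary: bool,
    misaligned_fraction: float,  # share of nearby rows whose columns do not line up
) -> str:
    """Map row signals to one of the four failure classes, or 'unclassified'."""
    if misaligned_fraction > 0.5:
        # Most of the neighborhood is off, so the layout mapping is the suspect.
        return "layout drift"
    if at_page_boundary and (not has_date or amount_count == 0):
        return "page-break carryover"
    if amount_count > 1:
        return "merged row"
    if not has_date or amount_count == 0:
        return "split row"
    return "unclassified"


print(classify_break(True, 2, False, 0.0))   # merged row
print(classify_break(False, 1, False, 0.0))  # split row
```

Layout drift is checked first because, when the whole region is misaligned, the per-row signals are not trustworthy enough to separate merges from splits.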

Step 5: repair the smallest block

Fix the affected segment only, then re-run QA.


A practical table of segmentation failure patterns

| Pattern | What it looks like | Likely cause | Repair lever |
|---|---|---|---|
| Merged row | one row has unusually long text and maybe two transaction ideas inside it | OCR glued adjacent lines together | re-segment the affected block |
| Split row | one transaction is spread across two rows, with one row missing amount/date | OCR broke a wrapped line | merge based on adjacency and column continuity |
| Page break carryover | the last row of a page and the first row of the next page look connected | page continuation not handled | stitch page boundary using continuation rules |
| Layout drift | same bank, different statement template variant changes row shape | fingerprint mismatch | select the correct layout mapping |

This table is the whole game. If you can classify the pattern, you can fix it.


Worked example 1: a merged row that still looks “parseable”

The failure

You have two adjacent transactions:

  • 04/28 Coffee Shop 8.50
  • 04/28 Metro 2.75

OCR merges them into:

  • 04/28 Coffee Shop 8.50 Metro 2.75

This row still looks parseable if your extractor finds a date and a final amount. But it’s wrong.

What QA catches

  • description length spikes far above normal
  • there are two merchant ideas in one row
  • row count is lower than expected for the page

If running balances exist, the balance check fails too: the printed balance moved by 11.25 (both transactions), while the extracted row reports only 2.75.

Repair

  • split the row using the same table shape that produced the merge
  • preserve both amounts as separate transactions
  • re-run segmentation QA and sequence QA
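
For this specific failure, a repair sketch can cut the merged text at each amount. It assumes the layout has no balance column (so every amount ends a transaction) and that the row starts with an MM/DD date. The second transaction's own date was lost in the merge, so the code reuses the first date and marks it as inherited rather than pretending it was read from the page.

```python
import re

# Assumed formats, as in the example rows: "04/28" dates and "8.50" amounts.
LEADING_DATE_RE = re.compile(r"^\s*(\d{2}/\d{2}(?:/\d{2,4})?)\s+")
AMOUNT_RE = re.compile(r"(?<![\d/.,])-?\d[\d,]*\.\d{2}(?!\d)")


def split_merged_row(row_text: str) -> list[dict]:
    """Split a merged row into one record per amount found."""
    m = LEADING_DATE_RE.match(row_text)
    date = m.group(1) if m else None
    rest = row_text[m.end():] if m else row_text
    parts = []
    cursor = 0
    for amount in AMOUNT_RE.finditer(rest):
        parts.append({
            "date": date,
            "description": rest[cursor:amount.start()].strip(),
            "amount": amount.group(0),
            # Only the first record's date was actually printed on this row.
            "date_inherited": bool(parts),
        })
        cursor = amount.end()
    # Any text after the last amount is ignored in this sketch.
    return parts


for record in split_merged_row("04/28 Coffee Shop 8.50 Metro 2.75"):
    print(record)
```

Records with date_inherited set to True are good candidates for a quick manual check against the original page.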

Worked example 2: wrapped descriptions split into two fake rows

The failure

A transaction description continues onto the next line:

  • Row 1: 04/29 ONLINE PAYMENT AUTHORIZED
  • Row 2: BY MERCHANT X 42.00

If the parser treats these as separate transactions, you create a fake transaction.

What QA catches

  • Row 1 has no amount, only description
  • Row 2 has amount but no date
  • the pair shares a continuation pattern

Repair

  • merge the continuation line back into a single transaction row
  • keep the date from the first row and the amount from the final amount-bearing line
  • re-run export parity to confirm CSV/XLSX/JSON all match the same merged record
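
The merge step above can be sketched as a single pass over parsed rows. The row shape (a dict with "date", "description", and "amount", using None for missing fields) is an assumption for illustration. A dateless row is attached to the open transaction only while that transaction still lacks an amount; anything else is kept as its own row for review.

```python
def merge_continuations(rows: list[dict]) -> list[dict]:
    """Fold dateless continuation lines into the transaction they belong to."""
    merged: list[dict] = []
    pending: dict | None = None  # the transaction currently being assembled
    for row in rows:
        if row["date"] is not None:
            # A date anchor starts a new transaction.
            if pending is not None:
                merged.append(pending)
            pending = dict(row)
        elif pending is not None and pending["amount"] is None:
            # Continuation: extend the description, take the amount if present.
            pending["description"] = f'{pending["description"]} {row["description"]}'.strip()
            pending["amount"] = row["amount"]
        else:
            # Orphan line with nothing to attach to: keep it for manual review.
            if pending is not None:
                merged.append(pending)
                pending = None
            merged.append(dict(row))
    if pending is not None:
        merged.append(pending)
    return merged


rows = [
    {"date": "04/29", "description": "ONLINE PAYMENT AUTHORIZED", "amount": None},
    {"date": None, "description": "BY MERCHANT X", "amount": "42.00"},
]
print(merge_continuations(rows))
```

This keeps the date from the first row and the amount from the last amount-bearing line, and it also handles descriptions that wrap across more than two lines.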

Boundary rules that reduce false positives

A good segmentation system needs a few hard rules.

Rule 1: one date anchor per transaction row

If a row has no date but depends on a nearby date line, it might be a continuation, not a new transaction.

Rule 2: one amount anchor per transaction row

If a row carries two amounts, it is probably a merge. If a row carries an amount but the previous row already looks like the start of the same logical transaction, it is probably the tail of a split.

Rule 3: continuation text should be explicitly marked

When a line clearly continues a description, keep it as continuation, not as a row.

Rule 4: rows near page breaks get special scrutiny

Continuations often happen there.

These rules are simple, but they stop a lot of garbage from slipping through.


How row segmentation affects other QA gates

Row segmentation failure doesn’t stay in one lane.

It contaminates:

  • Merchant normalization, because merchants get glued to the wrong transaction
  • Deduplication, because duplicated fragments can look like separate rows
  • Sequence QA, because row order and balance steps become inconsistent
  • Export parity, because formats may split or merge differently

That’s why segmentation QA belongs near the top of the pipeline.


A quick scoring template you can operationalize

Score each row 0–100.

Start with 100 and subtract:

  • -30 if no date anchor where one is expected
  • -25 if the amount is missing or duplicated
  • -20 if description length is at least 2× the statement median
  • -15 if row occurs at a page boundary and lacks a clear continuation marker
  • -10 if field alignment differs from nearby rows

Then use buckets:

  • 80–100: likely clean
  • 50–79: inspect nearby rows
  • < 50: high-risk segmentation break

This is not perfect, but it gives operators a triage path.
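
The template translates almost directly into code. In this sketch the RowSignals fields are hypothetical names for signals you would compute upstream, and "no date anchor where one is expected" is simplified to a plain has_date flag.

```python
from dataclasses import dataclass


@dataclass
class RowSignals:
    """Per-row inputs to the score; field names are illustrative."""
    has_date: bool
    amount_count: int
    description_length: int
    median_description_length: float
    at_page_boundary: bool = False
    has_continuation_marker: bool = False
    aligned_with_neighbors: bool = True


def segmentation_score(s: RowSignals) -> int:
    """Start at 100 and subtract the penalties from the template."""
    score = 100
    if not s.has_date:
        score -= 30
    if s.amount_count != 1:  # missing or duplicated amount
        score -= 25
    if s.median_description_length > 0 and s.description_length >= 2 * s.median_description_length:
        score -= 20
    if s.at_page_boundary and not s.has_continuation_marker:
        score -= 15
    if not s.aligned_with_neighbors:
        score -= 10
    return score


def bucket(score: int) -> str:
    """Map a score to the triage buckets."""
    if score >= 80:
        return "likely clean"
    if score >= 50:
        return "inspect nearby rows"
    return "high-risk segmentation break"


merged = RowSignals(has_date=True, amount_count=2, description_length=30, median_description_length=12)
print(segmentation_score(merged), bucket(segmentation_score(merged)))  # 55 inspect nearby rows
```

The penalties add up to exactly 100, so a row that fails every check scores 0 and the score never goes negative.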


FAQ

1) Can segmentation QA replace OCR QA?

No. OCR QA catches character-level issues. Segmentation QA catches row-structure issues. You need both.

2) What if the statement has no running balances?

You can still catch merges/splits using date, amount, description length, and page boundary rules.

3) What if a row is legitimately very long?

That happens. The fix is not to ban long rows but to compare them against layout-specific baselines and nearby row shapes.

4) Should dedupe happen before segmentation QA?

No. Segment first, then normalize merchants, then dedupe.

5) What’s the most common real-world mistake?

Treating a wrapped line as a new transaction row. That error creates a fake row and leaves the real transaction incomplete at the same time, because the date ends up on one row and the amount on the other.


Bottom line

Row segmentation QA catches the structural mistakes that value checks miss.

If you can detect merged rows, split rows, and page-break carryover early, your exports stop lying about transaction counts. Then reconciliation becomes a math problem again, not a detective job.
