Production-Grade MT940 Parsing in Python for Automated Financial Reconciliation
Automated financial reconciliation and ledger matching require deterministic ingestion of SWIFT MT940 bank statements. While the MT940 specification defines a rigid field structure, real-world implementations exhibit significant vendor-specific deviations in narrative formatting (:86:), date encoding (YYMMDD vs YYYYMMDD), transaction code mapping (:61:), and currency placement. For FinOps engineers and accounting technology developers, building a resilient parser demands a state-machine architecture that enforces strict schema validation, maintains cryptographic audit trails, and integrates seamlessly into modern Core Architecture & Bank Feed Ingestion paradigms. This guide provides an implementation-ready blueprint for parsing MT940 files in Python, optimized for high-throughput batch processing, multi-currency normalization, and production-grade fault tolerance.
Ingestion Strategy & Secure Pipeline Configuration
MT940 ingestion operates primarily in batch mode due to end-of-day bank statement generation cycles, though streaming architectures can approximate near-real-time processing by polling SFTP endpoints or consuming webhook-triggered payloads. Regardless of the ingestion cadence, secure credential handling remains non-negotiable. API tokens, SSH keys, and SFTP credentials must be injected via environment variables or a secrets manager (e.g., AWS Secrets Manager, HashiCorp Vault) and never persisted in configuration files or version control. Implement a rotating credential strategy with automated lease renewal to prevent pipeline failures during token expiration windows. When designing the ingestion layer, align your parser architecture with established OFX & MT940 Parser Design principles to ensure idempotent file consumption, duplicate statement detection via the :20: transaction reference, and atomic commit patterns that prevent partial ledger updates.
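As a minimal sketch of the duplicate-detection step, the snippet below fingerprints a raw payload by its first :20: reference and a SHA-256 digest of the full content. The names `statement_fingerprint` and `SeenStatements` are hypothetical, and the in-memory sets stand in for whatever durable store the pipeline actually uses.

```python
import hashlib
import re
from typing import Optional, Set, Tuple

# Matches the :20: transaction reference line of an MT940 message.
REFERENCE_TAG = re.compile(r"^:20:(.+)$", re.MULTILINE)


def statement_fingerprint(raw_content: str) -> Tuple[Optional[str], str]:
    """Return (first :20: reference or None, SHA-256 digest of the raw payload)."""
    match = REFERENCE_TAG.search(raw_content)
    reference = match.group(1).strip() if match else None
    digest = hashlib.sha256(raw_content.encode("utf-8")).hexdigest()
    return reference, digest


class SeenStatements:
    """In-memory stand-in for a durable deduplication store (illustrative only)."""

    def __init__(self) -> None:
        self._references: Set[str] = set()
        self._digests: Set[str] = set()

    def register(self, raw_content: str) -> bool:
        """Record the payload; return True if it is new, False if it is a duplicate."""
        reference, digest = statement_fingerprint(raw_content)
        if digest in self._digests:
            return False
        if reference is not None and reference in self._references:
            return False
        self._digests.add(digest)
        if reference is not None:
            self._references.add(reference)
        return True
```

If a bank does not issue a unique :20: value per statement, the reference alone is not a safe key; in that case a composite of account, statement number, and content digest is a more cautious choice.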
State-Machine Architecture & Deterministic Parsing
Naive line-by-line splitting or regex-only extraction fails under production conditions because of multiline narratives, optional fields, and vendor-specific whitespace handling. A production MT940 parser should implement a tag-aware finite state machine (FSM) that processes SWIFT field tags (for example :20:, :61:, :86:) sequentially, maintains context across line breaks, and enforces strict type coercion. Financial precision mandates decimal.Decimal rather than IEEE 754 floating-point arithmetic to prevent rounding drift during reconciliation. Date parsing must explicitly handle the two-digit-year YYMMDD format. ISO 8601 does not define a century inference rule, so the parser needs an explicit, documented pivot (the implementation below maps 00-49 to 20xx and 50-99 to 19xx) and should emit ISO 8601 dates downstream. The FSM tracks three primary states: HEADER, STATEMENT_LINES, and FOOTER, transitioning only upon valid tag recognition.
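The two coercion rules can be illustrated in isolation. This is a small sketch rather than the parser itself: `parse_yymmdd` and `parse_swift_amount` are hypothetical helper names, and the pivot of 50 is an assumption that should be set per deployment.

```python
from datetime import date
from decimal import Decimal


def parse_yymmdd(value: str, pivot: int = 50) -> date:
    """Parse a YYMMDD string, mapping years below the pivot to 20xx and the rest to 19xx."""
    if len(value) != 6 or not (value.isascii() and value.isdigit()):
        raise ValueError(f"Invalid date format: {value!r}")
    yy, mm, dd = int(value[:2]), int(value[2:4]), int(value[4:6])
    century = 2000 if yy < pivot else 1900
    return date(century + yy, mm, dd)  # date() rejects impossible dates such as month 13


def parse_swift_amount(value: str) -> Decimal:
    """Convert a comma-decimal amount such as '1234,56' or '100,' into an exact Decimal."""
    whole, separator, fraction = value.partition(",")
    whole_ok = whole.isascii() and whole.isdigit()
    fraction_ok = fraction == "" or (fraction.isascii() and fraction.isdigit())
    if separator != "," or not whole_ok or not fraction_ok:
        raise ValueError(f"Invalid amount format: {value!r}")
    return Decimal(f"{whole}.{fraction or '0'}")


# Binary floats cannot represent most decimal fractions exactly; Decimal can.
float_sum_is_exact = (0.1 + 0.2) == 0.3
decimal_sum_is_exact = (
    parse_swift_amount("0,10") + parse_swift_amount("0,20") == Decimal("0.30")
)
```

Here `float_sum_is_exact` evaluates to False while `decimal_sum_is_exact` evaluates to True, which is the rounding drift the paragraph above warns about.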
Implementation: Tag-Aware FSM with Cryptographic Audit Hooks
The following Python implementation utilizes compiled regular expressions for deterministic tag extraction, embeds cryptographic audit hooks for reconciliation traceability, and isolates parsing logic from I/O operations. It adheres to strict financial engineering standards, including explicit debit/credit resolution, comma-to-period normalization, and structured error boundaries.
import re
import hashlib
import logging
from datetime import datetime
from typing import List, Optional
from decimal import Decimal
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
logger = logging.getLogger("mt940_parser")

# Precompiled SWIFT tag patterns for deterministic parsing
TAG_PATTERN = re.compile(r"^:(\d{2}[A-Z]?):(.*)$")
DATE_PATTERN = re.compile(r"^\d{6}$")
# Amounts carry a mandatory decimal comma; fractional digits may be absent ("100,")
AMOUNT_PATTERN = re.compile(r"^\d+,\d*$")
# :60a:/:62a: balances are packed without separators: D/C mark, YYMMDD, currency, amount
BALANCE_PATTERN = re.compile(r"^([DC])(\d{6})([A-Z]{3})(\d+,\d*)$")
# :61: is also packed: value date, optional MMDD entry date, D/C mark (RC/RD mark
# reversals), optional funds code, amount, 4-character transaction type, references
STATEMENT_LINE_PATTERN = re.compile(
    r"^(?P<value_date>\d{6})(?P<entry_date>\d{4})?(?P<dc>R?[DC])(?P<funds>[A-Z])?"
    r"(?P<amount>\d+,\d*)(?P<tx_code>[A-Z][A-Z0-9]{3})(?P<refs>.*)$"
)


@dataclass
class MT940Transaction:
    value_date: datetime
    entry_date: Optional[datetime]
    debit_credit: str
    amount: Decimal  # signed: credits positive, debits negative
    transaction_code: str
    reference: str
    narrative: str
    raw_line: str


@dataclass
class MT940Statement:
    transaction_ref: str
    account_id: str
    statement_number: str
    opening_balance: Decimal
    closing_balance: Decimal
    currency: str
    transactions: List[MT940Transaction] = field(default_factory=list)
    audit_hash: str = ""


class MT940AuditHook:
    """Cryptographic audit and reconciliation validator."""

    @staticmethod
    def compute_hash(statement: MT940Statement) -> str:
        payload = (
            f"{statement.transaction_ref}|{statement.account_id}|"
            f"{statement.opening_balance}|{statement.closing_balance}"
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class MT940Parser:
    """Production-grade, tag-aware MT940 state machine."""

    def __init__(self, strict_mode: bool = True):
        self.strict_mode = strict_mode
        self._state = "HEADER"
        self._current_statement: Optional[MT940Statement] = None
        self._pending_narrative: List[str] = []
        self._current_tx: Optional[MT940Transaction] = None

    def parse(self, raw_content: str) -> List[MT940Statement]:
        statements: List[MT940Statement] = []
        # Reset all per-run state so a parser instance can be reused safely
        self._state = "HEADER"
        self._current_statement = None
        self._current_tx = None
        self._pending_narrative.clear()
        for line in raw_content.splitlines():
            stripped = line.strip()
            # Skip blank lines and the "-" / "-}" markers that commonly end a message
            if not stripped or stripped == "-" or stripped.startswith("-}"):
                continue
            tag_match = TAG_PATTERN.match(stripped)
            if tag_match:
                self._commit_pending_narrative()
                tag, content = tag_match.groups()
                self._process_tag(tag, content, statements)
            else:
                # Continuation line: belongs to the preceding :61: or :86: field
                self._pending_narrative.append(stripped)
        self._commit_pending_narrative()
        self._finalize_statement(statements)
        return statements

    def _fail(self, message: str) -> None:
        """Raise in strict mode; otherwise log the problem and keep parsing."""
        if self.strict_mode:
            raise ValueError(message)
        logger.warning(message)

    def _finalize_statement(self, statements: List[MT940Statement]) -> None:
        if self._current_statement:
            self._current_statement.audit_hash = MT940AuditHook.compute_hash(self._current_statement)
            statements.append(self._current_statement)
            self._current_statement = None

    def _commit_pending_narrative(self) -> None:
        if self._pending_narrative and self._current_tx:
            existing = [self._current_tx.narrative] if self._current_tx.narrative else []
            self._current_tx.narrative = "\n".join(existing + self._pending_narrative)
        self._pending_narrative.clear()

    def _resolve_date(self, date_str: str) -> datetime:
        if not DATE_PATTERN.match(date_str):
            raise ValueError(f"Invalid date format: {date_str}")
        yy, mm, dd = int(date_str[:2]), int(date_str[2:4]), int(date_str[4:6])
        # Century pivot: 00-49 -> 20xx, 50-99 -> 19xx
        year = 2000 + yy if yy < 50 else 1900 + yy
        return datetime(year, mm, dd)

    def _resolve_entry_date(self, mmdd: Optional[str], value_date: datetime) -> Optional[datetime]:
        if not mmdd:
            return None
        month, day = int(mmdd[:2]), int(mmdd[2:])
        year = value_date.year
        # The entry date has no year; assume it falls close to the value date
        if value_date.month == 12 and month == 1:
            year += 1
        elif value_date.month == 1 and month == 12:
            year -= 1
        return datetime(year, month, day)

    def _resolve_amount(self, dc_mark: str, amount_str: str) -> Decimal:
        if not AMOUNT_PATTERN.match(amount_str):
            raise ValueError(f"Invalid amount format: {amount_str}")
        # Comma-to-period normalization; the bank's value is kept exact (no rounding)
        amount = Decimal(amount_str.replace(",", "."))
        # Credits and reversed debits increase the balance; debits and reversed credits reduce it
        return amount if dc_mark in ("C", "RD") else -amount

    def _process_tag(self, tag: str, content: str, statements: List[MT940Statement]) -> None:
        content = content.strip()
        if tag == "20":
            self._finalize_statement(statements)
            self._current_statement = MT940Statement(
                transaction_ref=content,
                account_id="", statement_number="",
                opening_balance=Decimal("0"), closing_balance=Decimal("0"),
                currency="",
            )
            self._current_tx = None
            self._state = "HEADER"
            return
        if self._current_statement is None:
            self._fail(f"Tag :{tag}: encountered before :20:")
            return
        stmt = self._current_statement
        if tag == "25":
            stmt.account_id = content
        elif tag == "28C":
            stmt.statement_number = content
        elif tag in ("60F", "60M", "62F", "62M"):
            match = BALANCE_PATTERN.match(content)
            if not match:
                self._fail(f"Malformed :{tag}: balance: {content}")
                return
            dc_mark, _balance_date, currency, amount_str = match.groups()
            balance = self._resolve_amount(dc_mark, amount_str)
            if tag.startswith("60"):
                stmt.currency = currency
                stmt.opening_balance = balance
            else:
                stmt.closing_balance = balance
                # Any :86: after the closing balance is statement-level, not transaction-level
                self._current_tx = None
                self._state = "FOOTER"
        elif tag == "61":
            self._current_tx = None  # a rejected line must not inherit the next :86:
            match = STATEMENT_LINE_PATTERN.match(content)
            if not match:
                self._fail(f"Malformed :61: line: {content}")
                return
            value_date = self._resolve_date(match.group("value_date"))
            self._current_tx = MT940Transaction(
                value_date=value_date,
                entry_date=self._resolve_entry_date(match.group("entry_date"), value_date),
                debit_credit=match.group("dc"),
                amount=self._resolve_amount(match.group("dc"), match.group("amount")),
                transaction_code=match.group("tx_code"),
                # Customer reference only; the bank reference follows "//" when present
                reference=match.group("refs").split("//", 1)[0].strip(),
                narrative="",
                raw_line=content,
            )
            stmt.transactions.append(self._current_tx)
            self._state = "STATEMENT_LINES"
        elif tag == "86":
            # Buffered here and attached to the current transaction on the next commit
            self._pending_narrative.append(content)
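One reconciliation check that the parsed fields make possible, and that is not part of the parser itself, is verifying that the opening balance plus the signed transaction amounts equals the closing balance. The sketch below takes plain Decimal values so it stays self-contained; `balance_discrepancy` is a hypothetical helper name.

```python
from decimal import Decimal
from typing import Iterable


def balance_discrepancy(
    opening: Decimal, closing: Decimal, signed_amounts: Iterable[Decimal]
) -> Decimal:
    """Return closing - (opening + sum of signed amounts); zero means the statement ties out."""
    # Start the sum from Decimal("0") so an empty statement never falls back to int 0
    movement = sum(signed_amounts, Decimal("0"))
    return closing - (opening + movement)


# Credits are positive and debits negative, matching the signed amounts described above.
example = balance_discrepancy(
    opening=Decimal("1000.00"),
    closing=Decimal("1349.75"),
    signed_amounts=[Decimal("-150.25"), Decimal("500")],
)
```

A non-zero result is a reasonable trigger for routing the statement to manual review instead of posting it.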
Multi-Currency Normalization & Ledger Mapping
MT940 does not consistently embed ISO 4217 currency codes across all fields. Currency is typically defined in :60F: and :62F:, but transaction-level amounts (:61:) inherit this context implicitly. For multi-entity or cross-border FinOps operations, implement a deterministic currency normalization layer that:
- Validates currency codes against the official ISO 4217 Currency Codes registry.
- Applies mid-market FX conversion using timestamped rate snapshots to prevent reconciliation drift.
- Maps transaction codes (:61: subfields) to general ledger accounts via a versioned mapping table, ensuring auditability when bank codes change.
Maintain precision to four decimal places during FX conversion, rounding only at the final ledger commit using ROUND_HALF_EVEN to comply with standard accounting practices.
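The rounding policy can be sketched as follows. The currency table is a small illustrative subset and not the ISO 4217 registry, the rates are made-up values, and `RateSnapshot`, `convert`, and `round_for_ledger` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal, ROUND_HALF_EVEN

# Illustrative subset: currency code -> number of minor-unit decimal places
MINOR_UNITS = {"EUR": 2, "USD": 2, "GBP": 2, "JPY": 0}
FX_PRECISION = Decimal("0.0001")  # four decimal places kept during conversion


@dataclass(frozen=True)
class RateSnapshot:
    base: str
    quote: str
    rate: Decimal
    as_of: datetime  # timestamp of the rate, stored with the converted amount


def convert(amount: Decimal, source_currency: str, snapshot: RateSnapshot) -> Decimal:
    """Convert into the quote currency, keeping four decimal places."""
    if source_currency not in MINOR_UNITS or snapshot.quote not in MINOR_UNITS:
        raise ValueError(f"Unknown currency: {source_currency}/{snapshot.quote}")
    if source_currency != snapshot.base:
        raise ValueError(f"Snapshot base {snapshot.base} does not match {source_currency}")
    return (amount * snapshot.rate).quantize(FX_PRECISION, rounding=ROUND_HALF_EVEN)


def round_for_ledger(amount: Decimal, currency: str) -> Decimal:
    """Round once, at commit time, to the currency's minor unit."""
    exponent = Decimal(1).scaleb(-MINOR_UNITS[currency])  # 2 places -> 0.01, 0 places -> 1
    return amount.quantize(exponent, rounding=ROUND_HALF_EVEN)


snapshot = RateSnapshot("EUR", "USD", Decimal("1.08375"), datetime(2023, 1, 2, tzinfo=timezone.utc))
intermediate = convert(Decimal("100.00"), "EUR", snapshot)  # Decimal("108.3750")
posted = round_for_ledger(intermediate, "USD")               # Decimal("108.38")
```

Rounding only in `round_for_ledger` keeps intermediate values from accumulating error when several conversions or allocations are chained before the commit.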
Data Normalization Pipelines & Fault Tolerance
Production reconciliation pipelines require structured validation, retry logic, and dead-letter queue (DLQ) routing. Validate parser output against a schema, for example a Pydantic model or dataclasses with explicit __post_init__ checks (plain dataclasses do not validate field types at runtime), to reject malformed payloads before they reach the ledger. Implement exponential backoff with jitter for transient I/O failures, and route unparseable statements to a DLQ together with the raw payload and the error context.
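A minimal sketch of both mechanisms is shown here, assuming transient failures surface as OSError subclasses and parse failures as ValueError. The function names are hypothetical, and the list used as a DLQ stands in for a real queue.

```python
import random
import time
from typing import Callable, Dict, List, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry an I/O operation, sleeping a random time up to an exponentially growing cap."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            if attempt == attempts - 1:
                raise  # retries exhausted: surface the last error
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))  # "full jitter" variant
    raise RuntimeError("unreachable")


def parse_or_dead_letter(
    raw_payload: str,
    parse: Callable[[str], T],
    dead_letters: List[Dict[str, str]],
) -> Optional[T]:
    """Parse a payload; on failure keep the payload and the error for later inspection."""
    try:
        return parse(raw_payload)
    except ValueError as exc:
        dead_letters.append({"payload": raw_payload, "error": str(exc)})
        return None
```

Passing `sleep` as a parameter is a design choice that lets tests run without real delays.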
Leverage Python’s built-in decimal module for all monetary arithmetic to guarantee deterministic results across distributed nodes. Attach structured JSON logging to every parsed transaction, including the raw line hash, parsed values, and reconciliation status. This observability layer enables rapid root-cause analysis when vendor-specific deviations break downstream matching algorithms.
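One possible shape for such a log record is sketched below. The field names are illustrative assumptions and not a fixed schema, and the amount is serialized as a string so that no float conversion occurs.

```python
import hashlib
import json
import logging
from decimal import Decimal

logger = logging.getLogger("mt940_parser")


def transaction_log_record(
    raw_line: str, amount: Decimal, value_date: str, status: str
) -> str:
    """Build a JSON log line holding the raw line hash, parsed values, and match status."""
    record = {
        "raw_line_sha256": hashlib.sha256(raw_line.encode("utf-8")).hexdigest(),
        "amount": str(amount),  # keep the exact decimal representation
        "value_date": value_date,
        "reconciliation_status": status,
    }
    return json.dumps(record, sort_keys=True)


def log_transaction(raw_line: str, amount: Decimal, value_date: str, status: str) -> None:
    """Emit the record through the standard logging module."""
    logger.info(transaction_log_record(raw_line, amount, value_date, status))
```

Logging the hash instead of the raw line keeps account details out of log storage while still allowing a record to be matched to its source line.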
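For high-throughput environments, decouple ingestion from parsing using message brokers (Kafka, RabbitMQ). Process MT940 payloads in parallel worker pools, ensuring each worker maintains an isolated state machine instance. Commit parsed statements to the ledger using idempotent upserts keyed on :20: reference + :61: value date + amount. Brokers typically deliver at least once, so it is the idempotent write, not the broker, that makes redelivered messages harmless: the ledger ends up with effectively-once posting and no duplicates. Note that this key cannot distinguish two genuine transactions with the same value date and amount in one statement; where that can occur, extend the key with the :61: reference or the line position. Together with the audit hash and structured logs, this design supports SOX/GDPR audit requirements.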
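The idempotent write can be sketched with the standard-library sqlite3 module. The table and column names are hypothetical, a production ledger would use its own database and schema, and amounts are normalized to two decimal places here purely as an assumption so that equal values produce equal keys.

```python
import sqlite3
from decimal import Decimal

SCHEMA = """
CREATE TABLE IF NOT EXISTS ledger_entries (
    statement_ref TEXT NOT NULL,
    value_date    TEXT NOT NULL,
    amount        TEXT NOT NULL,
    narrative     TEXT NOT NULL,
    PRIMARY KEY (statement_ref, value_date, amount)
)
"""


def post_entry(
    conn: sqlite3.Connection,
    statement_ref: str,
    value_date: str,
    amount: Decimal,
    narrative: str,
) -> bool:
    """Insert the entry unless its key already exists; return True if a row was written."""
    canonical_amount = str(amount.quantize(Decimal("0.01")))  # assumed two-decimal ledger
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(
            "INSERT OR IGNORE INTO ledger_entries VALUES (?, ?, ?, ?)",
            (statement_ref, value_date, canonical_amount, narrative),
        )
    return cursor.rowcount == 1
```

Replaying the same message calls `post_entry` again with the same key, which leaves the table unchanged and returns False, so a worker can safely retry after a crash.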