Bridging Financial Statements and IFRS Using NLP

April 25, 2025

Recently, I had the opportunity to work on a challenging yet rewarding task—mapping line items in financial statements to their corresponding IFRS (International Financial Reporting Standards) concept names. While this may sound straightforward, the nuances of financial terminology and the variety in reporting formats make it anything but trivial.

In this post, I’ll walk through the different approaches we explored, the trade-offs we encountered, and the final solution we implemented.


🧩 The Challenge

Financial statements from various companies often use slightly different terminology for line items that conceptually map to the same IFRS concept. For instance:

  • “Accounts receivable” vs. “Trade receivables”
  • “Cash in hand” vs. “Cash and cash equivalents”

Our goal was to automatically map such variations to standardized IFRS concept names.


🔍 Approaches We Considered

1. Dictionary-Based Mapping (with Levenshtein Distance)

We started with a straightforward method: a dictionary of known mappings from line items to IFRS concepts. To handle minor variations in wording, we used Levenshtein (edit) distance to score how close a new term was to each known entry.

Pros:

  • Simple and quick to implement.
  • Works well for small or known datasets.

Cons:

  • Doesn’t scale well to large or ambiguous datasets.
  • May miss terms that are semantically equivalent but textually distant (e.g., “Cash in hand” vs. “Cash and cash equivalents”).

Example:

  • “Cash and cash equivalents” → “Cash and cash equivalents” (exact match)
  • “Accounts receivable” → “Trade receivables” (closest match)
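
Here’s a minimal sketch of what this step can look like. The concept list, the distance threshold, and the helper names are illustrative, not our actual mapping table:

```python
# Illustrative dictionary lookup with a Levenshtein fallback.
# IFRS_CONCEPTS and max_distance are placeholders, not our real mapping table.

IFRS_CONCEPTS = [
    "Cash and cash equivalents",
    "Trade receivables",
    "Inventories",
    "Property, plant and equipment",
]

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            replace_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, replace_cost))
        previous = current
    return previous[-1]

def closest_concept(line_item: str, max_distance: int = 10) -> str | None:
    """Return the IFRS concept with the smallest edit distance, or None if all are too far."""
    normalized = line_item.strip().lower()
    best, best_dist = None, max_distance + 1
    for concept in IFRS_CONCEPTS:
        dist = levenshtein(normalized, concept.lower())
        if dist < best_dist:
            best, best_dist = concept, dist
    return best if best_dist <= max_distance else None

print(closest_concept("Cash & cash equivalents"))  # "Cash and cash equivalents" (small edit distance)
print(closest_concept("Cash in hand"))             # None: too far from any concept, despite meaning the same thing
```

The second call illustrates the weakness above: edit distance has no notion of meaning, so a semantically obvious match falls outside any reasonable threshold.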

2. Supervised Machine Learning

Another approach was to train a classification model using a labeled dataset of line items and their IFRS concepts.

Pros:

  • Potentially high accuracy with enough data.
  • Can learn semantic and contextual cues.

Cons:

  • Requires a clean, labeled dataset (which we didn’t have).
  • Longer setup and training time.
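
We never built this model, but a sketch of the idea, assuming a small labeled set and a TF-IDF + logistic regression pipeline in scikit-learn (both purely illustrative choices), might look like this:

```python
# Hypothetical supervised baseline: character n-gram TF-IDF + logistic regression.
# The tiny training set below is made up for illustration only.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_items = [
    "Accounts receivable",
    "Trade receivables, net",
    "Cash in hand",
    "Cash and bank balances",
]
train_labels = [
    "Trade receivables",
    "Trade receivables",
    "Cash and cash equivalents",
    "Cash and cash equivalents",
]

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # character n-grams tolerate spelling variants
    LogisticRegression(max_iter=1000),
)
model.fit(train_items, train_labels)

print(model.predict(["Receivables from customers"]))  # likely ['Trade receivables'] given the shared n-grams
```

With enough labeled examples this kind of model can pick up contextual cues, but without a clean dataset it was a non-starter for us.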

3. Embedding-Based Semantic Matching with SBERT

This method used a pretrained NLP model like Sentence-BERT (SBERT) to generate vector embeddings for both the line items and the IFRS concept names. We then used cosine similarity to find the closest semantic match.

Pros:

  • Captures the meaning of the sentence, not just the words.
  • Handles more complex variations effectively.

Cons:

  • Requires additional setup.
  • Slightly more compute-heavy.
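
A minimal sketch of this step, using the sentence-transformers library with the general-purpose all-MiniLM-L6-v2 model (my assumption here, not necessarily the exact model we deployed):

```python
# Embedding-based matching: encode concepts once, then rank by cosine similarity.

from sentence_transformers import SentenceTransformer, util

IFRS_CONCEPTS = [
    "Cash and cash equivalents",
    "Trade receivables",
    "Inventories",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
concept_embeddings = model.encode(IFRS_CONCEPTS, convert_to_tensor=True)

def semantic_match(line_item: str) -> tuple[str, float]:
    """Return the IFRS concept with the highest cosine similarity to the line item."""
    item_embedding = model.encode(line_item, convert_to_tensor=True)
    scores = util.cos_sim(item_embedding, concept_embeddings)[0]
    best_idx = int(scores.argmax())
    return IFRS_CONCEPTS[best_idx], float(scores[best_idx])

print(semantic_match("Cash in hand"))         # should land near "Cash and cash equivalents"
print(semantic_match("Accounts receivable"))  # should land near "Trade receivables"
```

Unlike edit distance, the embeddings capture that “Cash in hand” and “Cash and cash equivalents” describe the same thing even though the strings barely overlap.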

4. Hybrid Approach

In real-world scenarios, no single method is perfect. So, we also considered combining methods to improve accuracy and flexibility.


🛠️ What We Did

Due to time constraints, training an ML model wasn’t feasible. So, we implemented a hybrid approach combining the dictionary and SBERT methods:

  • We used exact string matches and Levenshtein distance for quick wins.
  • For complex cases, we generated sentence embeddings using SBERT and computed cosine similarity between line items and IFRS concept names.
  • We combined the results from both approaches and selected the best match based on the highest confidence score.

To ensure quality, we added a human-in-the-loop step for manual validation—critical for financial applications where precision matters.
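
To make the pipeline concrete, here is a simplified sketch of how the two scores might be combined and how low-confidence matches get routed to a reviewer. It assumes the semantic_match function from the SBERT sketch above is in scope, uses difflib from the standard library as a stand-in for the fuzzy matcher, and treats the 0.85 threshold as purely illustrative:

```python
# Simplified hybrid combination: take the more confident of the two matchers,
# and flag anything below an (illustrative) threshold for human review.
# Assumes semantic_match() from the SBERT sketch above is already defined.

from difflib import SequenceMatcher

IFRS_CONCEPTS = [
    "Cash and cash equivalents",
    "Trade receivables",
    "Inventories",
]

def fuzzy_match(line_item: str) -> tuple[str, float]:
    """Dictionary-style match: the concept with the best string-similarity ratio in [0, 1]."""
    scored = [
        (concept, SequenceMatcher(None, line_item.lower(), concept.lower()).ratio())
        for concept in IFRS_CONCEPTS
    ]
    return max(scored, key=lambda pair: pair[1])

def map_line_item(line_item: str, accept_threshold: float = 0.85) -> dict:
    """Combine both matchers and flag low-confidence results for manual validation."""
    fuzzy_concept, fuzzy_score = fuzzy_match(line_item)
    semantic_concept, semantic_score = semantic_match(line_item)  # from the SBERT sketch

    # Keep whichever method is more confident about its match.
    if fuzzy_score >= semantic_score:
        concept, score, method = fuzzy_concept, fuzzy_score, "dictionary"
    else:
        concept, score, method = semantic_concept, semantic_score, "sbert"

    return {
        "line_item": line_item,
        "ifrs_concept": concept,
        "score": round(score, 3),
        "method": method,
        "needs_review": score < accept_threshold,  # human-in-the-loop validation
    }

print(map_line_item("Cash in hand"))
```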


💡 Key Takeaways

  • Text matching in financial data isn’t just about string similarity—it’s about semantic understanding.
  • Combining traditional methods with NLP can yield better results than either in isolation.
  • Human validation remains essential when working with sensitive financial mappings.

This project was a great learning experience in applying natural language processing to solve practical finance problems.