Recently, I had the opportunity to work on a challenging yet rewarding task: mapping line items in financial statements to their corresponding IFRS (International Financial Reporting Standards) concept names. While this may sound straightforward, the nuances of financial terminology and the variety in reporting formats make it anything but trivial.
In this post, I’ll walk through the different approaches we explored, the trade-offs we encountered, and the final solution we implemented.
Financial statements from various companies often use slightly different terminology for line items that conceptually map to the same IFRS concept. For instance:
Our goal was to automatically map such variations to standardized IFRS concept names.
We started with a straightforward method: a dictionary of known mappings from line-item text to IFRS concept names. To handle spelling and wording variations, we used Levenshtein (edit) distance to measure how close an unseen term was to a known one.
Pros:
Cons:
Example:
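A minimal sketch of the dictionary-plus-Levenshtein idea, using only the Python standard library. The mappings, the `match_line_item` helper, the length-normalized score, and the 0.8 threshold are all assumptions made for illustration; the post does not give the actual dictionary or cutoff.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mappings for illustration only; not the project's real dictionary.
KNOWN_MAPPINGS: Dict[str, str] = {
    "revenue": "Revenue",
    "cost of sales": "CostOfSales",
    "profit or loss": "ProfitLoss",
}


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def match_line_item(line_item: str, threshold: float = 0.8) -> Tuple[Optional[str], float]:
    """Return (concept, score); concept is None when no known term is close enough."""
    query = normalize(line_item)
    if query in KNOWN_MAPPINGS:
        return KNOWN_MAPPINGS[query], 1.0

    best_concept, best_score = None, 0.0
    for term, concept in KNOWN_MAPPINGS.items():
        longest = max(len(query), len(term))
        # Normalize the distance to a 0..1 similarity (assumed scoring choice).
        score = 1.0 if longest == 0 else 1.0 - levenshtein(query, term) / longest
        if score > best_score:
            best_concept, best_score = concept, score

    if best_score >= threshold:
        return best_concept, best_score
    return None, best_score


print(match_line_item("Revenues"))   # close spelling of a known term
print(match_line_item("Turnover"))   # a synonym, which edit distance cannot catch
```

In this sketch "Revenues" is matched because it is one edit away from a known term, while "Turnover" is not matched at all even though it means the same thing. That is the core limitation of a purely character-based comparison.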
Another approach was to train a classification model using a labeled dataset of line items and their IFRS concepts.
Pros:
Cons:
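To make the classification idea concrete, here is a small sketch of a text classifier trained on labeled line items. The post does not say which model or features were considered, so this uses a hand-written multinomial naive Bayes over word tokens purely as a stand-in; the class name, the training examples, and the labels are hypothetical. A real attempt would more likely rely on an established ML library and far more data.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable, List, Optional


class NaiveBayesLineItemClassifier:
    """Toy multinomial naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.class_counts: Counter = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocabulary = set()

    @staticmethod
    def tokenize(text: str) -> List[str]:
        return text.lower().split()

    def fit(self, line_items: Iterable[str], concepts: Iterable[str]):
        for text, concept in zip(line_items, concepts):
            self.class_counts[concept] += 1
            for token in self.tokenize(text):
                self.token_counts[concept][token] += 1
                self.vocabulary.add(token)
        return self

    def predict(self, line_item: str) -> Optional[str]:
        tokens = self.tokenize(line_item)
        total_examples = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_concept, best_log_prob = None, -math.inf
        for concept, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            log_prob = math.log(count / total_examples)
            total_tokens = sum(self.token_counts[concept].values())
            denominator = total_tokens + self.alpha * vocab_size
            for token in tokens:
                numerator = self.token_counts[concept][token] + self.alpha
                log_prob += math.log(numerator / denominator)
            if log_prob > best_log_prob:
                best_concept, best_log_prob = concept, log_prob
        return best_concept


# Hypothetical labeled examples; a usable model would need many more.
training_items = [
    "revenue", "total revenue", "sales revenue",
    "cost of sales", "cost of goods sold",
    "profit for the year", "net profit",
]
training_concepts = [
    "Revenue", "Revenue", "Revenue",
    "CostOfSales", "CostOfSales",
    "ProfitLoss", "ProfitLoss",
]

classifier = NaiveBayesLineItemClassifier().fit(training_items, training_concepts)
print(classifier.predict("net sales revenue"))
print(classifier.predict("cost of goods"))
```

The sketch also shows why this route depends on labeled data: the model can only predict concepts it has seen examples of, and its quality is bounded by how many examples each concept has.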
The third approach used a pretrained NLP model such as Sentence-BERT (SBERT) to generate vector embeddings for both the line items and the IFRS concept names. We then used cosine similarity between those vectors to find the closest semantic match for each line item.
Pros:
Cons:
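The matching step of the embedding approach can be sketched independently of the model that produces the vectors. In the code below, `closest_concept` and `toy_embed` are hypothetical names, and `toy_embed` is a deliberately crude word-count stand-in so the example runs without downloading a model; it is not SBERT and captures word overlap only. With the `sentence-transformers` package installed, the `encode` method of a `SentenceTransformer` model could be passed as `embed` instead (which checkpoint to use is not stated in the post).

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def closest_concept(
    line_item: str,
    concept_labels: Dict[str, str],
    embed: Callable[[List[str]], Sequence[Vector]],
) -> Tuple[str, float]:
    """Embed the line item and every concept label, return the best (concept, score)."""
    names = list(concept_labels)
    vectors = embed([line_item] + [concept_labels[name] for name in names])
    query, candidates = vectors[0], vectors[1:]
    scored = [
        (name, cosine_similarity(query, vector))
        for name, vector in zip(names, candidates)
    ]
    return max(scored, key=lambda pair: pair[1])


def toy_embed(texts: List[str]) -> List[List[float]]:
    """Stand-in for a sentence encoder: word-count vectors over the batch vocabulary."""
    tokenized = [text.lower().split() for text in texts]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    return [[float(tokens.count(word)) for word in vocabulary] for tokens in tokenized]


# Hypothetical concept names and labels, for illustration only.
concept_labels = {
    "Revenue": "revenue",
    "CostOfSales": "cost of sales",
    "ProfitLoss": "profit or loss",
}

best_name, best_score = closest_concept("total revenue from sales", concept_labels, toy_embed)
print(best_name, round(best_score, 3))
```

Keeping `embed` as a parameter means the similarity logic can be tested on its own, and the encoder can be swapped without touching the matching code.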
In real-world scenarios, no single method is perfect. So, we also considered combining methods to improve accuracy and flexibility.
Due to time constraints, training an ML model wasn’t feasible. So, we implemented a hybrid approach combining the dictionary and SBERT methods:
To ensure quality, we added a human-in-the-loop step for manual validation, which is critical for financial applications where precision matters.
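One way such a hybrid pipeline could be wired together is sketched below. The ordering (exact dictionary lookup first, semantic matcher as a fallback), the `MappingResult` structure, the 0.75 review threshold, and the rule that only low-scoring semantic matches are flagged for review are all assumptions for illustration; the post does not specify these details. The semantic matcher is passed in as a function, and the stub used here returns made-up scores in place of a real SBERT lookup.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class MappingResult:
    line_item: str
    concept: Optional[str]
    source: str          # "dictionary" or "semantic"
    score: float
    needs_review: bool   # True when a person should confirm the mapping


def hybrid_map(
    line_item: str,
    dictionary: Dict[str, str],
    semantic_match: Callable[[str], Tuple[Optional[str], float]],
    review_threshold: float = 0.75,
) -> MappingResult:
    """Try the curated dictionary first, then fall back to the semantic matcher."""
    key = " ".join(line_item.lower().split())
    if key in dictionary:
        return MappingResult(line_item, dictionary[key], "dictionary", 1.0, False)

    concept, score = semantic_match(line_item)
    # Assumed policy: weak or missing semantic matches go to manual validation.
    needs_review = concept is None or score < review_threshold
    return MappingResult(line_item, concept, "semantic", score, needs_review)


def stub_semantic_match(line_item: str) -> Tuple[Optional[str], float]:
    """Placeholder for the embedding-based matcher; scores are invented."""
    canned = {
        "Turnover": ("Revenue", 0.82),
        "Other items": ("ProfitLoss", 0.41),
    }
    return canned.get(line_item, (None, 0.0))


dictionary = {"revenue": "Revenue", "cost of sales": "CostOfSales"}

for item in ["Revenue", "Turnover", "Other items"]:
    print(hybrid_map(item, dictionary, stub_semantic_match))
```

Recording the `source` and `score` alongside each mapping gives reviewers context for their decision, and confirmed mappings could later be added back to the dictionary.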
This project was a great learning experience in applying natural language processing to solve practical finance problems.