June 17, 2021

Understanding Documents thanks to RPAI (Part 1)

Document Understanding is a pretty wide area and there is a lot to cover. In this post, I want to focus on the basics. In the coming posts, I will talk about the framework that helps you build solutions, and about ML models and how to train them, with some examples and videos. So stay tuned!

Every company has documents to process. Processing them is a challenge, especially when the work is done manually: it is prone to human error, takes time, costs money and, most importantly, is repetitive, which makes it a perfect candidate for automation.

An automated Document Understanding technology powered by AI can address those challenges and save both cost and time while reducing the risk of mistakes. The data trapped in those documents can then be extracted and processed successfully.

Depending on the type of documents, you can choose whether or not to use AI. That choice brings its own challenges, which I will come back to below.

But first, let's go through the different types of documents:

  • Structured Documents are the easiest ones to tackle since their format is fixed, like passports, driving licenses and time sheets.
  • Semi-structured Documents contain both fixed and variable parts, such as tables. Invoices, receipts and purchase orders fall into this category.
  • Unstructured Documents are the most challenging ones since analysing and extracting data from them is a complex process. Emails, contracts and agreements are all unstructured documents.

Now that we have seen what kinds of documents we might need to "understand", we can decide which of them require AI and which do not.

As mentioned above, structured documents are the easiest ones to manage because their format does not change. A passport always has the same fields, and most of the time they are in the same place on the document.


For structured documents, you can choose not to use AI (an ML model) in the first place, since those documents can be handled with rules. If an ML model has already been built, you can use that instead, but simply referring to a template you have already defined gives very good results and extracts all the data you require.
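
To make the template idea a bit more concrete, here is a minimal Python sketch of rule based extraction. It is only an illustration under my own assumptions, not the way the Document Understanding framework is implemented: the field names, the regular expressions and the sample text are all hypothetical, and the input is assumed to be text that has already been read from the document (for example by OCR).

```python
import re

# Hypothetical template for a fixed-layout document: one rule (regex) per field.
# Field names and patterns are illustrative assumptions only.
PASSPORT_TEMPLATE = {
    "passport_number": re.compile(r"Passport No[.:]?\s*([A-Z0-9]{6,9})"),
    "surname": re.compile(r"Surname[.:]?\s*([A-Z][A-Za-z\-]+)"),
    "date_of_birth": re.compile(r"Date of birth[.:]?\s*(\d{2}\.\d{2}\.\d{4})"),
}


def extract_with_template(text, template):
    """Apply each field's rule to the text; fields that are not found map to None."""
    result = {}
    for field, pattern in template.items():
        match = pattern.search(text)
        result[field] = match.group(1) if match else None
    return result


# Made-up sample text standing in for the OCR output of a structured document.
sample_text = """Passport No: X1234567
Surname: DOE
Date of birth: 01.02.1990"""

fields = extract_with_template(sample_text, PASSPORT_TEMPLATE)
print(fields)
```

Because the layout never changes, a handful of rules like these is enough. The same property is what makes the approach brittle as soon as the layout starts to vary.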

When it comes to processing semi-structured and unstructured documents, things can easily get challenging, because it is difficult to create templates for every kind of invoice, receipt and purchase order. If your company receives thousands of invoices from hundreds of different suppliers, creating templates for all of them one by one is an almost impossible task, so you should consider a more feasible way to manage them. As you might guess, that way is to use a pre-trained ML model, which extracts the correct data regardless of where it is located in the document. I will discuss this in more detail in the coming blog post.

The good thing is that you can also mix different types of documents in the same flow. You can extract data from invoices, contracts and passports, if you need to, by using either the model based or the rule based approach explained above.
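
As a rough sketch of what such a mixed flow could look like, the Python snippet below routes each document to an extractor based on its type. Everything in it is an assumption made for illustration: the document type names, the routing table and the two placeholder extractor functions are hypothetical and only stand in for a real template lookup and a real ML model call.

```python
def rule_based_extractor(text):
    # Placeholder for a template lookup (hypothetical).
    return {"approach": "rule_based", "chars_read": len(text)}


def model_based_extractor(text):
    # Placeholder for a call to a pre-trained ML model (hypothetical).
    return {"approach": "model_based", "chars_read": len(text)}


# Assumed routing table: which approach handles which document type.
EXTRACTORS = {
    "passport": rule_based_extractor,   # structured
    "invoice": model_based_extractor,   # semi-structured
    "contract": model_based_extractor,  # unstructured
}


def process(documents):
    """Run every (doc_type, text) pair through the extractor configured for its type."""
    results = []
    for doc_type, text in documents:
        extractor = EXTRACTORS.get(doc_type)
        if extractor is None:
            raise ValueError(f"No extractor configured for document type: {doc_type}")
        results.append((doc_type, extractor(text)))
    return results


# Made-up batch mixing all three kinds of documents in one flow.
batch = [
    ("passport", "Passport No: X1234567"),
    ("invoice", "Invoice INV-001 Total 120.50"),
    ("contract", "This agreement is made between the parties"),
]
results = process(batch)
for doc_type, data in results:
    print(doc_type, data["approach"])
```

The point of the design is that the flow itself stays the same; only the entry in the routing table decides whether a document goes through rules or through a model.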

In an earlier post, I mentioned the "human-in-the-loop" concept and how a person could work together with the robot. It makes perfect sense here: a robot that extracts data with an ML model can ask a human to verify the result, and the human's input can then be used to retrain the model.

Pretty brilliant, right?
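
The human-in-the-loop idea can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions: the confidence score, the 0.8 threshold, the ExtractedField class and the ask_human callback are all hypothetical, and in a real solution the human would of course work in a validation screen rather than a function call.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical ML model


CONFIDENCE_THRESHOLD = 0.8  # assumed cut-off; would be tuned per use case


def review(fields, ask_human):
    """Accept confident values; send the rest to a human and keep the corrections."""
    accepted = {}
    training_examples = []
    for field in fields:
        if field.confidence >= CONFIDENCE_THRESHOLD:
            accepted[field.name] = field.value
        else:
            corrected = ask_human(field)
            accepted[field.name] = corrected
            # (field name, model output, human answer) can later feed retraining.
            training_examples.append((field.name, field.value, corrected))
    return accepted, training_examples


def fake_human(field):
    # Stand-in for a person validating the value; the correction is made up.
    corrections = {"total": "120.50"}
    return corrections.get(field.name, field.value)


fields = [
    ExtractedField("invoice_number", "INV-001", 0.97),
    ExtractedField("total", "12O.50", 0.55),  # OCR read the letter O instead of zero
]
accepted, training_examples = review(fields, fake_human)
print(accepted)
print(training_examples)
```

Only the low-confidence field reaches the human, so the person's time is spent where the robot is unsure, and every correction becomes a labelled example for the next training round.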

Deciding whether or not to use AI can be a challenge in itself, as the comparison below shows. (Please keep in mind that in some cases you should use an ML model.)




In the following video clip, you can see a rule based approach that tackles three different kinds of documents and extracts data from them. In the development tool, Studio, you can also get a small taste of the Document Understanding framework, which I will talk about more in Part 2.




Until next time, have a great summer break 😎...









