I am aware that it has been some time since I published my last blog entry. As of this year, my role at UiPath has changed to focus solely on the AI products of the platform, and for that reason I wanted to revive the blog. In the coming weeks, I will be sharing some use cases, my views on AI in general, and new AI functionalities and capabilities of the UiPath Business Automation Platform. Stay tuned. I hope you like my first article below. Please share your views.
AI is no longer hype; it is everywhere.
It is no longer a technology of the future; we are surrounded by it. Look at what happened with ChatGPT, which was officially released on 30 November 2022. It crossed the 100 million user milestone in January 2023, and in its first month it already had more than 57 million monthly users. It was a real game changer and a great step forward toward AGI, Artificial General Intelligence. And we can all be sure there is more to come.
RPA and AI are also closely related technologies (I call the combination RPAI), and when used together, they bring huge benefits to the business.
Looking at the UiPath Business Automation Platform, we can easily see that it embraces AI in different ways to meet different business challenges. Studio, for instance, uses AI capabilities in an embedded way, such as suggesting the next best activity, and it enables the computer vision technology that is used while recording actions before converting them into activities. Simply put, thanks to computer vision you can create a whole RPA flow without needing to build it manually, activity by activity.
The same applies to Test Manager, since it can also use computer vision technology while creating test cases. This even applies to Task Mining, especially the Unassisted variant, which uses an ML model to identify repetitive tasks and reveal automation and process optimization opportunities.
As process automation expands to meet the needs of digital transformation and digitization initiatives, the ability to rapidly discover and analyze existing processes, objectively and at scale, becomes an imperative, according to Forrester. UiPath Task Mining exists as part of the platform to embrace this challenge. It uses a data-driven approach to gain a deeper understanding of the business processes happening on employees' desktops and to identify process improvement areas and automation candidates. It helps the business accelerate its automation pipeline: it mines previously unexplored areas to identify repetitive tasks with AI-powered analysis, using an ML model deployed in AI Center, and it gives you a clear picture of the extracted data.
When talking about AI and RPA, we should also mention AI Center, which is a very important component of the platform. It helps businesses orchestrate all the moving pieces of AI: deploying, consuming, managing and improving machine learning models. It really bridges the gap between RPA and data science teams and enables you to instantly apply the cognitive power of AI to any software currently being automated by RPA. AI Center includes some ready-made ML models (we call them OOB, out-of-the-box, models), which you can use and train according to your needs. You can also bring in your own ML models and deploy them. You can think of AI Center as a container for ML models, some of which are also used by other components of the platform, like Task Mining and Document Understanding.
Having mentioned Document Understanding (DU), I would also like to elaborate on it. There is no company that does not deal with documents in one way or another, and with information trapped in those documents, there are real challenges in finding and extracting it. Processing documents is a challenge, especially if the work is done manually: it is prone to human error, it is pretty repetitive and sometimes (if not always) even boring. An automated DU solution powered by AI can wipe out those challenges and lead to cost and time efficiency by removing the risk of mistakes, which might otherwise have big consequences. In a DU framework, you can also involve humans to verify the findings and train the ML models with their input so the models become smarter. AI plays an important role within those processes. Let's consider invoices and assume you want to extract the invoice number, the total sum and the company name from them. As we all know, no two invoices are identical in terms of template, since they come from different suppliers who are of course free to create their own invoice templates.
An ML model, deployed via AI Center, can help identify those fields regardless of where they are located in the documents. Over time, ML models improve and learn where to look for those fields regardless of the template, and even regardless of the language the invoices are written in.
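To make the idea concrete, here is a minimal sketch (not UiPath code, just an illustration) of what typically happens after such a model runs: each extracted field comes back with a confidence score, and the automation keeps the confident ones and flags the rest for human validation. The field names and scores below are made up.

```python
# Hypothetical post-processing of ML extraction results: keep the fields
# the model is confident about and flag the rest for human validation.
CONFIDENCE_THRESHOLD = 0.8

def triage_fields(extracted_fields, threshold=CONFIDENCE_THRESHOLD):
    """Split extraction results into accepted fields and fields
    that should be routed to a human validator."""
    accepted, needs_review = {}, []
    for field in extracted_fields:
        if field["confidence"] >= threshold:
            accepted[field["name"]] = field["value"]
        else:
            needs_review.append(field["name"])
    return accepted, needs_review

# Example: results an ML extractor might return for one invoice.
results = [
    {"name": "invoice_number", "value": "INV-1042", "confidence": 0.97},
    {"name": "total_sum",      "value": "1,250.00", "confidence": 0.91},
    {"name": "company_name",   "value": "Acme AB",  "confidence": 0.62},
]

accepted, needs_review = triage_fields(results)
print(accepted)      # invoice_number and total_sum pass the threshold
print(needs_review)  # company_name goes to a human
```

The threshold itself is a business decision: the lower you set it, the fewer human touches you need, but the more extraction errors slip through.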
Classifying emails is also a huge task, especially when tons of them come in, for instance to customer helpdesks. Managing this manually costs too much, both in terms of money and time. Once classified, moving them to the right folders is an even bigger challenge, since different rules might apply to different folders. AI can also help here. A subset of AI, NLP (Natural Language Processing), is the technology that enables machines to understand human-written text. With the acquisition of Re:infer, UiPath added a very important component to the platform to achieve this. The Re:infer platform is now integrated with the UiPath platform and has been rebranded as Communications Mining, which you can read more about here.
All in all, AI is a great helper to automation when it comes to adding more intelligence, but we should also consider the execution, which is just as important. You can extract information from invoices and you can classify emails, but when it comes to moving the extracted invoice data into a CRM system or moving those classified emails into the correct folders, you need RPA acting as hands and feet.
To conclude, I would like to refer to a Gartner analysis from 2019, which predicted that by 2022, 80% of organizations that deployed RPA would introduce artificial intelligence (AI), including machine learning and natural language processing algorithms, to improve business processing activities. I am sure this transition will gain good momentum this year and onwards with the introduction of generative AI models, as there will be more use cases combining the two to achieve RPAI. This is where the real strength of intelligent automation surfaces.
The summer vacation was great, but it is also exciting to be back and kick off the first blog entry after the break...
In the first part of the Document Understanding (DU) series, I mainly focused on the rule-based approach and promised to go into detail on the Document Understanding Framework.
Let's start with it!
The DU framework makes it simpler to structure the DU flow you build in Studio. You can get more details about it here, under "Framework Components".
I will not go into detail about the components, since they are explained on the link above, but I still want to point out the two important loops seen in the picture.
The first one is the validation of the classification: you can help the robot do the classification via the Classification Station if you get an escalation/task about it through Action Center.
Assume that you want to process three different documents in your workflow:
Since you will most probably want to fetch different fields from those documents, the DU automation should be able to separate them; in short, they need to be classified correctly. You can configure this so that the robot creates a task if it is not 100% sure about the classification. Once you get the notification, you can go to Action Center and help the robot by dragging and dropping the documents into the right areas, should the robot have failed to do so. If you used an ML Classifier in your workflow, you can train the model with your input and make sure it does not make the same mistakes again.
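The logic of that first loop can be sketched in a few lines. This is an illustration only, not a UiPath API: the classifier returns a score per document type, and anything below the configured threshold becomes a human task whose answer is queued up as new training data.

```python
# Sketch of the classification-validation loop: if the classifier is not
# confident enough, a human task is created and the human's answer becomes
# new training data. All names here are illustrative, not UiPath APIs.
TRAINING_QUEUE = []  # (document_id, correct_label) pairs for later retraining

def ask_human(doc_id, scores):
    # Stand-in for the Classification Station; here we simply accept
    # the model's top guess as the human's answer.
    return max(scores, key=scores.get)

def classify_with_validation(doc_id, scores, threshold=0.99):
    label, confidence = max(scores.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return label
    # Escalate: in a real flow this would create an Action Center task.
    correct_label = ask_human(doc_id, scores)
    TRAINING_QUEUE.append((doc_id, correct_label))  # feed the ML Classifier later
    return correct_label

print(classify_with_validation("doc-1", {"invoice": 0.995, "receipt": 0.005}))
print(classify_with_validation("doc-2", {"invoice": 0.60, "contract": 0.40}))
print(TRAINING_QUEUE)  # only doc-2 needed human help
```

The point of the sketch is the routing, not the classifier: confident documents flow straight through, uncertain ones detour via a human and come back as labeled examples.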
The second one I would like to refer to is the second loop in the framework picture, the validation of the extraction. This is where you help the robot again through Action Center when you get a new task via the Validation Station about the fields retrieved from the document. Again, the input you give back to the ML model (assuming you have built the flow with one) can be used to train it so that it fetches the fields correctly next time.
Within the ML context, there is also one important component that is not part of the framework above, called Data Manager. This is where you label data and pre-train a DU-related ML model. You can read more about it on this link.
Hopefully this gives you an idea on how you can build a DU solution by using automation. Feel free to add comments or questions below this entry.
In the next blog post, we will dive deep into how to create an ML model and how to train it.
Document Understanding is a pretty wide area and there is a lot to cover. In this post, I want to focus on the basics. In the coming blog posts, I will talk about the framework, which helps while building solutions, and also about ML models and how to train them, with some examples and videos. So stay tuned!
There is no company without documents to process. Processing documents is a challenge, especially if the work is done manually: it is prone to human error, takes time, costs money and, most importantly, is a repetitive task, which makes it perfect for automation.
An automated Document Understanding solution powered by AI can wipe out those challenges and lead to cost and time efficiency by removing the risk of mistakes. The data trapped in those documents can then be extracted and successfully processed.
Based on the type of the documents, one can choose whether or not to use AI, since the choice brings along some other challenges to consider, which I will list out below.
But first, let's go through the different types of documents:
Structured Documents are the easiest ones to tackle, since their format is fixed, like passports, driving licenses and time sheets.
Semi-structured Documents contain fixed and variable parts like tables. To give some examples; invoices, receipts and purchase orders fall into this category.
Unstructured Documents are the most challenging ones since analysing and extracting data from them is a complex process. Emails, contracts and agreements are all unstructured documents.
Now that we have seen what kind of documents we might need to "understand", we can decide which of them require AI and which do not.
As mentioned above, structured documents are the easiest ones to manage and their formats do not change. A passport always has the same fields, most of the time in the same place in the document.
For structured documents, you can choose not to use AI (an ML model) in the first place, since those documents can be handled with rules. If an ML model is already built, it can of course be used instead, but just by referring to a template you have already defined, you can get very good results and extract all the data you require.
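To show how far rules alone can take you on a structured document, here is a tiny sketch: a fixed layout means plain regular expressions (the template, in effect) are enough. The passport text and the field patterns are invented for illustration.

```python
import re

# Rule-based extraction for a structured document: because the layout is
# fixed, plain regular expressions are enough; no ML model is needed.
# The passport text below is made up for illustration.
PASSPORT_TEXT = """
Passport No: AB1234567
Surname: ANDERSSON
Given names: ERIK JOHAN
Date of birth: 12 MAR 1985
"""

RULES = {
    "passport_no":   r"Passport No:\s*(\S+)",
    "surname":       r"Surname:\s*(.+)",
    "date_of_birth": r"Date of birth:\s*(.+)",
}

def extract(text, rules):
    # Keep only the fields whose pattern actually matched.
    return {name: m.group(1).strip()
            for name, pattern in rules.items()
            if (m := re.search(pattern, text))}

print(extract(PASSPORT_TEXT, RULES))
```

The moment the layout stops being fixed, this approach breaks down, which is exactly the transition to the semi-structured case discussed next.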
When it comes to processing semi-structured and unstructured documents, you can easily end up in challenging situations, since you might have difficulties creating templates for all kinds of invoices, receipts and purchase orders. Assuming your company receives thousands of invoices from hundreds of different suppliers, you should consider a more feasible way to manage them than trying to create templates for each of them one by one, which is an almost impossible task. This more feasible way is, as you might guess, to use a pre-trained ML model that extracts the correct data regardless of where it is located in the documents. More about this will be discussed in the coming blog post.
The good thing is that you can blend different types of documents in the same flow as well. You can extract data from invoices, contracts and passports, if you need to, by using either the model-based approach or the rule-based approach explained above.
In an earlier post, I mentioned the "human-in-the-loop" concept and how a person can work together with the robot. That concept makes perfect sense here: a robot using an ML model to extract data might want a human to verify the results, and the model can then use that input to train itself.
Pretty brilliant, right?
Choosing whether or not to use AI can also be a challenge in itself, as the comparison below shows. (Please consider that in some cases, you should use an ML model.)
In the following video clip, you can see a rule-based approach that tackles three different kinds of documents and extracts data from them. In the development tool, Studio, you can also get a small flavour of the Document Understanding framework, which I will talk about more in Part 2.
Conversational AI makes it possible to talk to chatbots: it acts as a bridge between humans and bots and lets humans ask questions and get answers in a more natural way. It is a set of technologies that allows computers to understand human language. The main technology behind this is of course AI...
Conversational AI uses natural language processing techniques to respond and can even anticipate something that has not been asked yet. It can be manifested in many different ways, but the use of chatbots, mostly via companies' webpages, has soared lately, since people really like to engage with AI in a more human-like way. We tend to interact with AI instead of filling out forms or searching for the information ourselves, even when the information can be found on a company's website.
Now consider integrating chatbots with RPA and AI. Think about giving an intent to a chatbot, which in turn passes it to an RPA bot that performs an action based on that intent. There are numerous scenarios where this kind of approach is relevant: you can check your order status, post data to your CRM system, book a service for your car, upload an ID to verify yourself to your bank... and these are just a few of them. With a chatbot at the core, it is as simple to automate conversations between people and robots as it is to automate any other process. By connecting cognitive services, like translation, into the solution, the language barrier can be removed easily; we even discussed this when talking about email classification in an earlier blog entry. Especially for contact centre agents, chatbots working in conjunction with RPA bots can take a huge load off them by taking care of customer requests and performing automations behind the scenes. A chat can also be forwarded to a live agent if the chatbot, by running sentiment analysis continuously during the chat, concludes that the customer is no longer satisfied with the provided answers. The agent who takes over can see the whole chat history, so unnecessary questions to the customer, like their contact details or what kind of help they need, can be skipped entirely so as not to annoy them.
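The chatbot-to-RPA handoff described above boils down to a dispatch table: the chatbot resolves an intent and a dispatcher starts the matching RPA process, falling back to a live agent for anything unknown. A minimal sketch, where the process names and the `start_process` stub are purely illustrative:

```python
# Hedged sketch of the chatbot-to-RPA handoff. In a real solution,
# start_process would call an orchestrator API to start a job; here it
# only returns a string so the routing logic can be shown end to end.
def start_process(process_name, arguments):
    return f"started {process_name} with {arguments}"

INTENT_TO_PROCESS = {
    "check_order_status": "OrderStatusLookup",
    "book_car_service":   "CarServiceBooking",
    "update_crm_record":  "CrmUpdate",
}

def dispatch(intent, arguments):
    process = INTENT_TO_PROCESS.get(intent)
    if process is None:
        return "handover_to_live_agent"  # unknown intent: escalate to a human
    return start_process(process, arguments)

print(dispatch("check_order_status", {"order_id": "12345"}))
print(dispatch("tell_me_a_joke", {}))  # no matching process -> live agent
```

New automations then plug in by adding one entry to the table, without touching the chatbot side at all.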
One interesting scenario I tested was to let a chatbot (from Druid) trigger an RPA bot so that it could get the answer to a question from another bot. So I let three bots communicate with each other, which I found pretty interesting 😀. Let me describe it below:
Ever since I first heard the name GPT-3 (you can read more about it in the OpenAI blog, here), I have been really excited, as I see it as one step closer to AGI, Artificial General Intelligence. I will not describe it in depth, as you can read up on it and find more information yourself. If you are really interested in the topic and want to dive deeper, you can even read this document. I joined the waitlist to get API access so that I could test it, and I got my API key a couple of weeks back. One of the best days ever... 😂
Testing it via the OpenAI Playground and seeing the answers I was receiving blew my mind! When you enter a text like "Once upon a time there was a girl...", the algorithm completes it with this:
"who loved to read.
She was a child who loved books so much, she would read in the car.
In the cinema.
At the dinner table.
In the bath.
She would take a book and read it everywhere.
She would read during school."
It can also answer questions like "What is RPA?" or "How do you say 'Good morning' in Swedish?" (or in Turkish), and it even interacts with you in any language you want. (To be honest, I have not tested all the languages yet.) A question like "Quel est le couleur du ciel?" in French is answered with "bleu." I have also integrated it into my Teams channel so that I can chat with it via Teams.
Have a look at this:
As you can see above, I can ask my questions in different languages and on any topic. If this is not mind-blowing, then what is?
In the video below, you can also see how I interact with it via the Druid portal. Notice as well how the answer to the question "What is RPA?" differs between sessions, like the Teams chat session above and the Druid chatbot session.
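For the curious, calling GPT-3 behind such an integration is just an HTTP POST. Here is a minimal sketch of what the request looks like; the endpoint and model name reflect the completions-style API as I understand it and may have changed since, and nothing is actually sent here, we only build the request.

```python
import json
import os

# Illustrative only: the URL and model name are assumptions based on the
# completions-style OpenAI API; check the current API reference before use.
API_URL = "https://api.openai.com/v1/completions"

def build_completion_request(prompt, model="text-davinci-003", max_tokens=64):
    """Build the headers and JSON body for a completion call."""
    headers = {
        "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return headers, json.dumps(payload)

headers, body = build_completion_request("Once upon a time there was a girl...")
print(body)  # the JSON payload a chatbot integration would POST to API_URL
```

From a Druid or Teams integration, the same payload is simply posted with the user's message as the prompt and the completion text is relayed back to the chat.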
I hope you share the joy I got while building this. If you are interested, I can also share the details in another blog entry. I would really appreciate some feedback from you...
So far, we have seen how RPA can bring more value when integrated with AI. The level of that value depends, of course, on the business model and on how you have embedded AI into your automation. Be it a machine learning model that predicts an outcome, a translation service that translates content, or another cognitive service that performs sentiment analysis.
Regardless of what model you choose, the question remains... How can you trust AI? Or can you trust AI fully?
I think the answer is still far from being an "OF COURSE, WHY NOT?"
For sure, the RPA and AI combination brings huge value and makes our lives easier. But even though the AI models we use are "narrow" and are expected to take care of the tasks they were trained for, we cannot let them handle our business-critical automation until they prove they can do the job flawlessly. "Flawlessly" is only possible if the model is well trained, and until we see the outcome, we should keep AI under control.
By control, I mean bringing a human into the process and letting the silicon work together with the carbon, while the carbon trains the silicon in the meantime...
This means you can still embed AI into your solution and set some rules. The confidence level can be one of those rules. The robot can, for instance, read a text, analyse it and classify it as positive or negative only if it is really certain of its finding, say with a score of 80%, meaning it is at least 80% sure that the text is positive or negative. For an ML model, providing a confidence score is a no-brainer.
Here comes a tricky question... What happens if the score is below 80%? The simple answer is that you can program the robot to escalate and ask for help from a human. This is what we call the "human-in-the-loop" state. When a robot ends up in this situation, it can send an email, warn someone with a pop-up message, or find another way to ask for help. Regardless of the method, the receiver of the escalation is a human. The input from the human can then be used to train the robot so that it does not need to escalate the next time the same case occurs.
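The 80% rule above fits in a handful of lines. This is a sketch, not UiPath code: `escalate_to_human` stands in for whatever channel the robot uses (an email, a pop-up, an Action Center task), and here it simply returns a canned answer so the routing can be demonstrated.

```python
# Sketch of the 80% rule: the robot decides when confident, a human
# decides otherwise. escalate_to_human is a placeholder for the real
# escalation channel and, for the demo, always answers "positive".
def escalate_to_human(text):
    # Imagine this blocks until a person responds via Action Center.
    return "positive"

def classify_review(text, model_score, model_label, threshold=0.80):
    """Return (label, decided_by) for one piece of text."""
    if model_score >= threshold:
        return model_label, "robot"
    answer = escalate_to_human(text)
    return answer, "human"  # the answer can also be fed back as training data

print(classify_review("A masterpiece!", 0.95, "positive"))    # robot decides
print(classify_review("Hmm, not sure...", 0.55, "negative"))  # human decides
```

Everything above the threshold flows straight through; everything below it detours via a person, and that detour is exactly where the retraining data comes from.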
You might also want to ask another question... "What happens if the human does not answer in a timely manner?" This is a very relevant one, since it might take a while until someone gets back to the robot. Should it then wait until it receives an answer? Of course not. The robot should be able to put the process on hold until the answer comes and continue with another one; otherwise it gets locked unnecessarily, as it is impossible to know when the input will arrive.
In short, the robot can run another process once it has escalated the earlier one to a human; as soon as it receives the input, it takes it from there and continues with the rest of the process.
Assume that I have an Excel sheet with some movie critiques in it and I want my robot to read the content and give me feedback on every critique: positive or negative. I want it to provide this information only if it is at least 80% confident; otherwise, I expect it to escalate and ask for human input.
Let's see how it works...
Using an ML model, the robot reads the content and classifies it if the confidence score is above 80%. If not, it asks for human feedback and uses it to train itself, so the next time it does not need to involve a human for the same critique. Note that my input gets a confidence score of 100%, since the robot fully respects my entry :)
This is a good example of human-robot cooperation, where you can feel more comfortable because you are in control and you train the robot to become smarter.
If we look from a scenario perspective, it fits the sixth one in the picture below:
After sharing the email classification video, I got a couple of requests to show the details of the flow.
I have now recorded the video below:
It all starts by checking for messages arriving with a specific subject and from a specific email address. I added those requirements so that the automation reacts only to those emails and nothing else. With the delay and the loop, I let the process run all the time and wake up when those specific emails arrive in the mailbox.
Then, using the Microsoft Translator cognitive service, I translate the emails into English, since the ML model was built to classify only English emails. Adding this activity removes that limitation.
Using the "emailclassification" ML skill, the system evaluates the content of the translated email (if the original was not in English), checks the confidence score (which I deliberately set pretty low), classifies the email and moves it to the corresponding folder.
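The whole flow so far (filter, translate, classify, move) can be summarized in a short sketch. This is illustration, not the actual workflow: `translate` and `classify` are stubs standing in for the Microsoft Translator call and the "emailclassification" ML skill, and the inbox data is invented.

```python
# End-to-end sketch of the flow in the video: filter the incoming mail,
# translate non-English messages, classify, then "move" into a folder.
def translate(body, source_lang):
    # Stub for the Microsoft Translator cognitive service.
    return body if source_lang == "en" else f"[{source_lang}->en] {body}"

def classify(body):
    # Stub for the "emailclassification" ML skill: (label, confidence).
    return ("complaint", 0.72) if "problem" in body.lower() else ("other", 0.66)

def process_mailbox(emails, subject_filter="Support", threshold=0.5):
    folders = {}
    for mail in emails:
        if subject_filter not in mail["subject"]:
            continue  # the robot reacts only to matching emails
        english_body = translate(mail["body"], mail["lang"])
        label, confidence = classify(english_body)
        folder = label if confidence >= threshold else "needs_review"
        folders.setdefault(folder, []).append(mail["subject"])
    return folders

inbox = [
    {"subject": "Support: leverans", "lang": "sv", "body": "Ett problem med min order"},
    {"subject": "Support: merci",    "lang": "fr", "body": "Tout est parfait"},
    {"subject": "Newsletter",        "lang": "en", "body": "Weekly digest"},
]
print(process_mailbox(inbox))
```

Swapping the stubs for the real translator and ML skill leaves the routing logic unchanged, which is exactly why the flow in Studio stays so compact.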
Alphabet.Workflow.Activities gives me the possibility to update the subject of the email with the translation. For the moment, the activity only updates the subject and not the body, which is why I do the update at the subject level.
The message boxes used in the flow are there just for demo purposes; removing them makes the flow run quicker and without human interaction.
Hope you find this useful.
Next week, I will focus on the human-interaction part in more detail, as robots might need to get confirmation from time to time, especially when they are not sure about the outcome, for instance when reading a document. You can instruct the robot to use the Document Understanding framework and set a confidence threshold if an ML model is involved. For any score below that threshold, you might want the robot to contact a human to get his or her approval. This is the recommended way to proceed when using the framework, especially in the beginning. The good thing is that you can use the human input to train the ML model.
This setup, I mean using RPA, AI and a human together, opens new doors and gives you the possibility to build very complex flows.
As promised in my earlier blog entry, I would like to show you how AI and RPA can cooperate to classify emails landing in a contact centre's mailbox.
Let's have a look:
As you can see, there are four emails in four different languages (Japanese, Turkish, Swedish and French) in the mailbox. The ML model works only in English, which is why the non-English emails first need to be translated into English so that the model can understand the content.
First of all, a robot constantly monitors the mailbox to fetch the right emails. The right emails can be separated from the others via some filters, like a specific word in the subject, the sender, or whether they are unread. Once the conditions are met, the process kicks in. A translation cognitive service is used to translate the emails. Once an email is translated, the ML model looks into the content, understands it and moves the email to the correct folder. For a contact centre receiving thousands of emails, this automation helps the agents enormously by taking this mundane task off their hands so that they can spend more time with their customers. Just for the sake of the demo, I show both the translation and the confidence score on the screen, but those parts of the automation can easily be removed to get an uninterrupted, continuous process.
Providing the ability to understand the content of non-English emails also makes life easier for the agents who then respond to the customers. To make the automation even more capable, the agents could respond in English and an automation could translate the responses into the emails' original languages before sending them to the senders. That part of the process can also be automated: the responses can either be sent by the robots without involving people (although this brings the challenge of imperfect translations, even though these AI services are becoming more intelligent day by day), or a human can be involved before they are sent. We call this the "human-in-the-loop" concept, which I will talk about later.
For the next blog entry, please provide some feedback. I can continue with more examples and even go deeper and discuss how I created this automation.
According to "AI: Built to Scale" by Accenture, 75% of executives believe they risk going out of business by 2025 if they don't scale AI across their organizations. Yet only 16% of them have managed to move their AI projects into production. It is good that the awareness is there when it comes to adopting AI in the lines of business, but why are they failing, then?
Let's consider the following (dream) scenario:
Assume that a company has already implemented RPA and has fully understood and embraced all its benefits. Their efficiency has increased; they have saved 20,000 hours a year by automating many mundane tasks. Their employees are happier than ever because they have robot colleagues taking care of all the boring parts of their jobs. The management has now started to think about the next step in their automation journey, since they also see huge potential in automating processes that are not rule based. (As you know, rule-based work is mainly RPA's job 😉.) Who will then take care of the automations where there are exceptions, uncertainties, variabilities and unstructured data to deal with?
This is when AI comes into the picture, but how can we make sure that AI and RPA can meet on one common platform?
This is a technological challenge, for sure, as data scientists and RPA developers live in two different worlds. They use different platforms for developing automations and for training ML models.
So having AI in mind is perfectly fine, but you should also have a platform to consume AI and ensure that it works hand in hand with RPA.
AI Center is a good example of this. (You can read more about it by clicking here.) It is not a platform for developing ML models, but one where you can deploy the ML models you have developed. You can also use the models provided with the platform, called OOB (out-of-the-box) models. You can create datasets and then run pipelines (training, evaluation or full) using an ML model and a dataset. The product of a pipeline then becomes an ML skill, which RPA developers can consume in their automation flows. You can of course create an ML skill without running a pipeline. Making an ML skill more intelligent is also possible via the platform, as you can train those skills.
The following picture summarizes pretty well what I have talked about:
Now that you have brought those two different worlds together, there is almost no limit to what you can automate. You can blend both rule-based and model-based automations in the same flow to tackle really challenging and complex tasks...
In my next blog entry, I will discuss some examples where different ML skills are embedded into RPA flows to manage difficult automations, like language translation, text understanding, email classification, sentiment analysis, etc.
While you can automate many processes with "basic" RPA, there are always cases where you need more than this. You can of course collect data from disparate sources automatically and process it, but you cannot find patterns and insights to make complex decisions. To achieve this, you need AI.
When you have both of them in place, you get the most benefit out of your automations since then you can easily overcome the barriers. AI enables automation of processes that include:
Uncertainty, when for instance you cannot determine an outcome with 100% certainty (like loan defaults, property valuation)
High Variability, when there is too much variability to apply any rules (like language translation, resume matching)
Unstructured Data, since it is pretty challenging to process it (like extracting information from articles, images and videos)
Let's pretend we are contact centre managers and our agents receive thousands of emails every day. We want to know how many of them are complaints, and also what kind of complaints. We then want to categorize them and distribute them to agents with the right skills so that they are answered. This is a very typical scenario where you can, and must, use automation to take the unnecessary load off your agents so that they can focus on answering your customers instead of classifying those emails. To make the scenario even more complex, assume that you receive emails in 10 different languages...
Without AI, which can read the emails, translate them, look into the content and understand whether the sentiment in an email is positive or negative, you cannot achieve much. And by using it, you get grateful, happy agents and even happier customers, since their emails are answered quicker...
In the end, it all comes down to letting you focus on the things you need to do, which is the more enjoyable part of your job, especially when you can become more creative by doing the tasks that also make you more productive. The things that make you "you" and unique. Marry AI and RPA the next time you think of automation... You will not regret it :)!
The End-to-End Automation process consists of three base concepts:
The software robots need to be instructed somehow so that they can perform the tasks we want them to. We need to BUILD and program them by giving them those instructions before they can fulfil any requirements from us. Different design tools can help us with this: you can either choose to write programming code or, if the tool permits, use drag-and-drop functionality to teach the robots what to do, which certainly makes your life easier.
When programmed correctly, the robots can RUN 24/7, without taking breaks or making mistakes. If a robot makes a mistake, either the environment where it operates has changed (like an updated webpage causing a link the robot was trained to click on to disappear) or the developer has not programmed it correctly. So you know what, and whom, to blame!
The component in between is about how to MANAGE the robots. Through programming, the robots learn exactly how to tackle different tasks, but how are they made aware of when to run, when to stop and which of them will run which task? In more complex scenarios, a queue mechanism may also come into the picture if there are too many actions and they need to be prioritized. The MANAGE part helps with those challenges.