This week, WorkFusion Senior Solutions Consultant Ryan Peters offered a webinar on email intake processes. Since the manual work of classifying and extracting data from email is a chore that practically every business has to deal with, there was a lot of audience engagement. We received tons of great questions, and we’re answering some of the most interesting ones here:
Which OCR tool do you use to handle attachments?
WorkFusion includes ABBYY’s OCR engine as part of our technology stack. This is useful for email intake processing because emails will typically contain attachments (e.g. PDFs) or embedded images (e.g. TIFF, JPG, GIF). When processing an email, it’s important to digitize all the attachments and images, so their text can be combined with the body of the email for classification and extraction.
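To illustrate the idea of combining digitized attachment text with the email body, here is a minimal sketch using only Python's standard `email` library. It is not WorkFusion's implementation: the function name `collect_email_text` and the `ocr` callable are hypothetical, and `ocr` stands in for whatever OCR engine is actually used (it is assumed to take the raw bytes and the content type and return the recognized text).

```python
from email.message import EmailMessage
from typing import Callable


def collect_email_text(msg: EmailMessage, ocr: Callable[[bytes, str], str]) -> str:
    """Return the plain-text body followed by OCR text from PDFs and images.

    `ocr` is a hypothetical stand-in for a real OCR engine.
    """
    parts = []

    # Start with the plain-text body, if the message has one.
    body = msg.get_body(preferencelist=("plain",))
    if body is not None:
        parts.append(body.get_content().strip())

    # Walk every MIME part and digitize PDFs and embedded images.
    for part in msg.walk():
        content_type = part.get_content_type()
        if content_type == "application/pdf" or content_type.startswith("image/"):
            raw_bytes = part.get_payload(decode=True)
            parts.append(ocr(raw_bytes, content_type).strip())

    # One combined text blob, ready for classification and extraction.
    return "\n\n".join(p for p in parts if p)
```

The combined string can then be passed to a classifier or extraction model as a single document, so text that only appears in an attachment still influences the result.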
Is the classification model provided out of the box or is it set up by a data scientist during implementation?
WorkFusion’s AutoML provides generic classification models that automatically classify text by analyzing its parts (tokens) and their combinations (features). No data scientist is required. A Machine Learning Engineer feeds training data into a WorkFusion business process and lets AutoML determine the best combination of tokens and features. This can be accomplished using historical data or manual training tasks, like in the example below:
Example training task for classification
Results after training
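To make the idea of tokens and features more concrete, here is a minimal sketch of a text classifier trained on labeled emails. It is an illustration only, not WorkFusion's AutoML: it assumes a simple Naive Bayes model with add-one smoothing, treats single words as tokens and adjacent word pairs as features, and the class name, labels, and training sentences are all made up for the example.

```python
import math
import re
from collections import Counter, defaultdict


def features(text):
    """Lowercased word tokens plus adjacent-token pairs (bigrams)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


class NaiveBayesClassifier:
    """Tiny multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()               # emails seen per label
        self.feature_counts = defaultdict(Counter)  # feature counts per label
        self.vocab = set()                          # every feature seen in training

    def train(self, examples):
        for text, label in examples:
            self.label_counts[label] += 1
            for f in features(text):
                self.feature_counts[label][f] += 1
                self.vocab.add(f)

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for f in features(text):
                if f in self.vocab:  # ignore features never seen in training
                    score += math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data, standing in for historical emails
# or the output of manual training tasks.
training_emails = [
    ("Please find the attached invoice for payment", "invoice"),
    ("Invoice overdue, payment due this week", "invoice"),
    ("I want to change my mailing address", "address_change"),
    ("Please update the address on my account", "address_change"),
]

classifier = NaiveBayesClassifier()
classifier.train(training_emails)
print(classifier.predict("Attached is the invoice awaiting payment"))
```

A production system would train on far more data and choose among many candidate feature sets, but the principle is the same: labeled examples go in, and the model learns which tokens and token combinations signal each email type.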
What is the typical POC duration?
A typical email intake POC will take 4–6 weeks. Duration depends on the complexity of the RPA integration, the number of email types for classification, the number of extraction models, etc. I typically recommend that our customers start small for a POC, classify 3–6 different types of emails, and select one email type for an extraction model.
Here’s an example of a high-level POC project plan:
Are the POCs typically done on-site or in the cloud?
It depends on the data we will be using for the POC. If there is no PII or confidential data, WorkFusion can host a POC in our AWS cloud. The benefit is the speed of implementation and not needing to onboard our resources. If there are confidential data concerns, customers will set up our infrastructure on-premise prior to the POC kickoff. The other key concern is integrations with back-end systems. If there is RPA involved to integrate with enterprise applications behind the firewall, it is easier to set up our infrastructure on-premise instead of opening ports for a short POC.
The guidance above is for POCs. For production implementations, almost all our customers deploy in their data center or their private cloud.
How do you handle inaccurate results?
WorkFusion SPA offers Automatic Quality Control (AutoQC) to ensure that the automation and models meet the accuracy thresholds set by the business. AutoQC is an out-of-the-box feature that uses statistical methods to monitor and maintain data quality. In a WorkFusion automation workflow, the AutoQC sub-process chooses the most cost-effective combination of automated machines and human workers that delivers at or above the acceptable quality level.
The main concept of AutoQC is to take a sample from a defined batch of items and verify each item in that sample. After the sample is verified, the whole batch is accepted or rejected depending on the Rejection Limit.
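The sample-then-decide logic described above can be sketched in a few lines of Python. This is a simplified illustration, not the AutoQC implementation: the function name `auto_qc_batch` and its parameters are hypothetical, `verify` stands in for a human or automated check of a single item, and the Rejection Limit is assumed here to mean the maximum number of failed items the sample may contain before the whole batch is rejected.

```python
import random


def auto_qc_batch(batch, verify, sample_size, rejection_limit, seed=None):
    """Verify a random sample and accept or reject the whole batch.

    Assumption: the batch is rejected when the number of failed
    items in the sample exceeds `rejection_limit`.
    """
    rng = random.Random(seed)

    # Draw the sample without replacement (capped at the batch size).
    sample = rng.sample(batch, min(sample_size, len(batch)))

    # Verify each sampled item and count the failures.
    errors = sum(1 for item in sample if not verify(item))

    return {
        "sampled": len(sample),
        "errors": errors,
        "accepted": errors <= rejection_limit,
    }
```

For example, with a batch of 500 extracted records, a sample size of 50, and a rejection limit of 2, the batch passes only if at most two of the 50 checked records turn out to be wrong. A rejected batch would then typically be routed for full review or rework.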
Also published on Medium.