End-to-end information extraction from documents

The Attend, Copy, Parse architecture is a deep neural network model trained directly on end-to-end data, bypassing the need for word-level labels

By Rasmus Berg Palm, Florian Laws, Ole Winther

Document information extraction tasks performed by humans create data consisting of a PDF or document image input and extracted string outputs.

This end-to-end data is consumed and produced as a natural part of performing the task, because it is valuable in and of itself. It is therefore available at no additional cost.

Unfortunately, state-of-the-art word classification methods for information extraction cannot use this data, instead requiring word-level labels, which are expensive to create and consequently not available for many real-life tasks.
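To make the distinction concrete, the sketch below contrasts the two kinds of training data for a single invoice. It is illustrative only: the class names, field names, file name, and values are all hypothetical and do not come from the paper or its dataset.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EndToEndExample:
    """Document in, extracted strings out. No word positions are recorded."""
    document_path: str
    fields: Dict[str, str]


@dataclass
class LabeledWord:
    """One word of the document with a class label attached by an annotator."""
    text: str
    label: str  # hypothetical tag set: "total", "date", or "other"


def count_annotations(words: List[LabeledWord]) -> int:
    """Number of per-word labeling decisions the annotator had to make."""
    return len(words)


# End-to-end data: what a person produces anyway when doing the task.
end_to_end = EndToEndExample(
    document_path="invoice_0001.pdf",  # hypothetical file name
    fields={"total": "1250.00", "date": "2018-03-01"},
)

# Word-level data: every word in the document needs its own label.
word_level = [
    LabeledWord("Total", "other"),
    LabeledWord("1,250.00", "total"),
    LabeledWord("March", "date"),
    LabeledWord("1,", "date"),
    LabeledWord("2018", "date"),
]

print(len(end_to_end.fields), "extracted strings vs",
      count_annotations(word_level), "word labels")
```

Even in this toy case the word-level version needs one decision per word, while the end-to-end version only records the strings that the task already produces.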

In this paper, we propose the Attend, Copy, Parse architecture, a deep neural network model that can be trained directly on end-to-end data, bypassing the need for word-level labels. We evaluate the proposed architecture on a large, diverse set of invoices and outperform a state-of-the-art production system based on word classification.

We believe our proposed architecture can be used on many real-life information extraction tasks where word classification cannot be used due to a lack of the required word-level labels.
