Omurice's Memo

Notes on mathematics, statistics, machine learning, and programming.

[Multimodal] Optical Character Recognition / OCR

Optical Character Recognition / OCR

Optical Character Recognition (OCR).

Also called Text Recognition.

A multimodal task that extracts textual information from images.

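Algorithm structure

OCR systems are commonly split into two stages: text detection, which localizes text regions in the image, and text recognition, which transcribes the text in each region. The methods below are grouped by these stages, plus end-to-end approaches.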

Text Detection

General-purpose architectures such as U-Net and Mask R-CNN are also used.

TextSnake / 2018

  • TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

Pixel Aggregation Network / PANet / 2019

  • Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

Progressive Scale Expansion Network / PSENet / 2019
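PSENet predicts several segmentation "kernels" for each text instance at different shrink ratios and merges them in a progressive scale expansion step: connected components of the smallest kernel separate text instances that touch each other, and breadth-first search then grows each component outward through progressively larger kernels. The following is a simplified pure-Python sketch of that expansion step only, assuming the kernels have already been binarized; the function names are hypothetical, and practical implementations typically run this step in compiled code.

```python
from collections import deque

# 4-neighbourhood offsets (up, down, left, right)
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def connected_components(mask):
    """Label 4-connected foreground regions of a binary mask (0 = background)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 1
    for y in range(h):
        for x in range(w):
            if mask[y][x] and labels[y][x] == 0:
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and labels[ny][nx] == 0:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
                next_label += 1
    return labels


def progressive_scale_expansion(kernels):
    """kernels: equal-sized binary masks, ordered from the most shrunk kernel to the full text region."""
    # Seed one text instance per connected component of the smallest kernel
    labels = connected_components(kernels[0])
    h, w = len(labels), len(labels[0])
    for kernel in kernels[1:]:
        # Breadth-first expansion from every already-labelled pixel into the next, larger kernel.
        # A pixel is claimed by whichever instance reaches it first (first come, first served).
        queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
        while queue:
            cy, cx = queue.popleft()
            for dy, dx in NEIGHBOURS:
                ny, nx = cy + dy, cx + dx
                if 0 <= ny < h and 0 <= nx < w and kernel[ny][nx] and labels[ny][nx] == 0:
                    labels[ny][nx] = labels[cy][cx]
                    queue.append((ny, nx))
    return labels


smallest = [[1, 0, 0, 0, 1]]  # two separate seeds
full = [[1, 1, 1, 1, 1]]      # full regions touch each other
print(progressive_scale_expansion([smallest, full]))  # [[1, 1, 1, 2, 2]]
```

Because the two instances are already separated in the smallest kernel, they keep distinct labels even though their full regions touch; the middle pixel goes to instance 1 only because its search front reaches it first.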

Differentiable Binarization Net / DBNet / 2019

  • Real-time Scene Text Detection with Differentiable Binarization
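The "differentiable binarization" in the DBNet title replaces the hard threshold of standard binarization with an approximate step function. With probability map P, a fixed threshold t, and a threshold map T predicted by the network, the paper contrasts the two as follows (k is an amplifying factor, set to 50 in the paper):

```latex
B_{i,j} =
\begin{cases}
1 & \text{if } P_{i,j} \ge t, \\
0 & \text{otherwise,}
\end{cases}
\qquad
\hat{B}_{i,j} = \frac{1}{1 + e^{-k\,(P_{i,j} - T_{i,j})}}
```

The hard step B has zero gradient almost everywhere, so it cannot be trained through; the sigmoid-shaped approximation is differentiable, which lets the threshold map T be learned jointly with the segmentation network.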

DBNet++ / 2022

  • Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion

Deep Relational Reasoning Graph / DRRG / 2020

  • Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection

FCENet / 2021

  • Fourier Contour Embedding for Arbitrary-Shaped Text Detection

Text Recognition

General-purpose architectures such as MobileNet and ResNet are also used (e.g., as backbones).

Convolutional Recurrent Neural Network / CRNN / 2015

  • An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
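CRNN stacks convolutional layers, bidirectional LSTM layers, and a CTC (Connectionist Temporal Classification) transcription layer, so the network outputs one class distribution per horizontal frame rather than per character. A minimal sketch of lexicon-free best-path (greedy) CTC decoding: take the most probable class at each frame, merge consecutive repeats, then drop blanks. The function name and the convention that index 0 is the blank are assumptions for illustration; the paper also describes lexicon-based decoding, which is not shown here.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Best-path (greedy) CTC decoding.

    frame_probs: one list of class probabilities per time step (frame)
    alphabet:    characters indexed like the probability vectors; index `blank` is the CTC blank
    """
    # 1. Most probable class at each time step
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded = []
    prev = None
    for idx in best_path:
        # 2. Merge consecutive repeats and 3. drop blanks
        if idx != prev and idx != blank:
            decoded.append(alphabet[idx])
        prev = idx
    return "".join(decoded)


alphabet = ["-", "c", "a", "t"]  # index 0 = blank (assumed convention)
probs = [
    [0.1, 0.8, 0.05, 0.05],   # c
    [0.2, 0.7, 0.05, 0.05],   # c (repeat, merged)
    [0.9, 0.05, 0.03, 0.02],  # blank
    [0.1, 0.1, 0.7, 0.1],     # a
    [0.1, 0.1, 0.1, 0.7],     # t
]
print(ctc_greedy_decode(probs, alphabet))  # cat
```

A blank between two identical labels keeps them distinct (the path c, blank, c decodes to "cc"), which is how CTC represents doubled letters.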

NRTR / 2018

No-Recurrence Sequence-to-Sequence Text Recognizer

  • NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition

SAR / 2018

Show, Attend and Read

  • Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition

Self-Attention Text Recognition Network / SATRN / 2019

  • On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention

MASTER / 2019

  • MASTER: Multi-Aspect Non-local Network for Scene Text Recognition

RobustScanner / 2020

  • RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition

ABINet / 2021

  • Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

End-to-End

Easter2.0 / 2022

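Easter2.0 is a new CNN-based architecture for handwritten text recognition (HTR). By combining 1D convolutional layers with squeeze-and-excitation (SE) modules, it aims to capture global context while keeping the speed and efficiency of CNNs. The paper also introduces Tiling and Corruption Augmentation (TACo), a new data augmentation technique for improving the quality of the training data, and its experiments show strong performance even on limited datasets.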

  • Easter2.0: Improving convolutional models for handwritten text recognition

  • Researchers Propose Easter2.0, a Novel Convolutional Neural Network CNN-Based Architecture for the Task of End-to-End Handwritten Text Line Recognition that Utilizes Only 1D Convolutions
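To make the SE module mentioned above concrete, here is a generic squeeze-and-excitation block (as introduced in "Squeeze-and-Excitation Networks", Hu et al.) applied to a 1D feature map of shape channels × time steps. It is a sketch of the standard SE formulation, not Easter2.0's actual implementation: the function name, the weight shapes, the reduction ratio, and the omission of biases are all assumptions, and it uses pure Python lists for readability.

```python
import math


def squeeze_excitation_1d(x, w1, w2):
    """Generic SE block on a 1D feature map (hypothetical helper, biases omitted).

    x:  C x T nested list (channels x time steps)
    w1: (C // r) x C weights of the reduction layer
    w2: C x (C // r) weights of the expansion layer
    """
    # Squeeze: global average pooling over time -> one summary value per channel
    z = [sum(channel) / len(channel) for channel in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one gate in (0, 1) per channel
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: reweight every time step of each channel by that channel's gate
    return [[g * v for v in channel] for g, channel in zip(gates, x)]


x = [[1.0, 3.0], [2.0, 2.0]]  # 2 channels, 2 time steps
out = squeeze_excitation_1d(x, w1=[[1.0, 0.0]], w2=[[10.0], [-10.0]])
print(out)  # channel 0 is kept almost unchanged, channel 1 is pushed close to 0
```

Because each gate is computed from a summary of the whole sequence, every time step is rescaled using information from the entire line, which is the sense in which SE adds global context to otherwise local 1D convolutions.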

Nougat / 2023

  • Nougat: Neural Optical Understanding for Academic Documents

Libraries / APIs

MMOCR

Japanese OCR implementations

Tesseract

References

  • MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding

Websites

Books