International Research Journal of Management Science and Technology

  ISSN 2250 - 1959 (online) ISSN 2348 - 9367 (Print) New DOI : 10.32804/IRJMST

Impact Factor* - 6.2311



A COMPOSITE APPROACH THAT COMBINES OPTICAL CHARACTER RECOGNITION (OCR) AND NATURAL LANGUAGE PROCESSING (NLP) TO EXTRACT INFORMATION FROM UNSTRUCTURED DATA

Author(s): PROF. RUTUJA VILAS KOTKAR, DR. SHUBHANGI M. POTDAR

Vol. 14, Issue 7, Page(s): 100-106 (2023). DOI: https://doi.org/10.32804/IRJMST

Abstract

The extraction of information from unstructured data poses a significant challenge in various domains such as document analysis, information retrieval, and data mining. In this paper, we propose a hybrid approach that combines Optical Character Recognition (OCR) and Natural Language Processing (NLP) techniques to tackle this problem effectively.
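
To make the idea of chaining OCR and NLP concrete, the following is a minimal sketch and not the authors' implementation. It assumes the OCR stage has already produced raw text, and it stands in for the NLP stage with simple pattern-based field extraction. The function names (`normalize_ocr_text`, `extract_fields`, `ocr_then_nlp`) and the choice of date and amount fields are hypothetical, chosen only for illustration.

```python
import re

# Assumption: an OCR engine has already turned the scanned page into raw text.
# Both stages below are illustrative stand-ins, not the method of the paper.

def normalize_ocr_text(raw: str) -> str:
    """Clean typical OCR layout noise before any language processing."""
    # Rejoin words hyphenated across a line break, e.g. "In-\nvoice" -> "Invoice".
    text = re.sub(r"(?<=[A-Za-z])-\s*\n\s*(?=[A-Za-z])", "", raw)
    # Collapse runs of spaces and newlines into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()

# Hypothetical target fields: ISO-style dates and two-decimal amounts.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
AMOUNT_RE = re.compile(r"\b\d+\.\d{2}\b")

def extract_fields(text: str) -> dict:
    """Very small stand-in for an NLP extraction step (pattern matching only)."""
    return {
        "dates": DATE_RE.findall(text),
        "amounts": AMOUNT_RE.findall(text),
    }

def ocr_then_nlp(raw_ocr_text: str) -> dict:
    """Chain the two stages: clean the OCR output, then extract structured fields."""
    return extract_fields(normalize_ocr_text(raw_ocr_text))

if __name__ == "__main__":
    sample = "In-\nvoice  date: 2023-07-01\nTotal due:   149.50"
    print(ocr_then_nlp(sample))
```

In a fuller system the pattern matching would presumably be replaced by a trained named entity recognizer, but the two-stage shape (clean the recognized text, then extract structure from it) stays the same.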

