Get Started Quickly with the Aspose.OCR Optical Character Recognition Component: How to Extract Text from PDF

Translation | Tutorial | Editor: Yan Xin (顏馨) | 2023-05-16 10:09:01

Overview: This article explains how to OCR PDF documents and extract text from PDFs in C#.

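Aspose.OCR is a character recognition component that lets developers add OCR functionality to their ASP.NET web applications, web services, and Windows applications. It provides a simple set of classes for controlling character recognition. Aspose.OCR is designed for developers who need to work with images (BMP and TIFF) in their own applications. It lets them extract text from images quickly and easily, saving the time and effort of building an OCR solution from scratch.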

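Aspose APIs support processing of popular file formats and allow various documents to be exported or converted to fixed-layout file formats and the most commonly used image and multimedia formats.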

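PDF files are among the most common business documents. In some cases, we may need to read scanned PDF documents programmatically. Because extracting text from scanned PDFs is difficult, tools have been developed that make it easier to read and retrieve text from such documents. Depending on a document's content, extracting its text can be useful for many reasons. In this article, we will learn how to OCR PDF documents and extract text from PDFs in C#.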

OCR PDF 到文本 C# API

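We will use the Aspose.OCR for .NET API to perform OCR on PDF documents. It can recognize scanned images, smartphone photos, screenshots, and areas of an image. The API returns the recognized text in the most popular document and data-exchange formats. Besides converting images to text, it can also create searchable PDFs from scans, and it can automatically correct spelling errors in the recognized text.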

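The API provides the AsposeOcr class, which offers various methods for performing OCR operations. Its RecognizePdf(string, DocumentRecognitionSettings) method recognizes text in the supplied PDF document. The DocumentRecognitionSettings class provides settings for the PDF recognition process, and the RecognitionResult class represents the result of image recognition.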

OCR PDF and Extract Text from PDF in C#

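We can perform OCR on a PDF document and extract the recognized text by following these steps:

First, create an instance of the AsposeOcr class.
Next, initialize an object of the DocumentRecognitionSettings class.
Then, specify the language to use for OCR.
After that, get the recognition results by calling the RecognizePdf() method. It takes the PDF file path and the DocumentRecognitionSettings object as parameters.
Finally, loop through the list of recognition results and display the recognized text.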

The following sample code shows how to OCR a PDF document and extract the recognized text in C#.

// This code example demonstrates how to OCR PDF documents and extract the recognized text.
using System;
using System.Collections.Generic;
using Aspose.OCR;

// Initialize the OCR engine
AsposeOcr recognitionEngine = new AsposeOcr();

// Initialize recognition settings
DocumentRecognitionSettings recognitionSettings = new DocumentRecognitionSettings();

// Specify language for OCR. Multi-language by default
recognitionSettings.Language = Language.Eng;

// Recognize text from PDF
List<RecognitionResult> results = recognitionEngine.RecognizePdf("C:\\Files\\sample.pdf", recognitionSettings);

// Show the recognized text
foreach (RecognitionResult result in results)
{
    Console.WriteLine(result.RecognitionText);
}
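The loop above prints the pages one after another with no visible boundary. If you need to know which text came from which page, a small helper can label each page. This is a sketch that assumes RecognizePdf returns one RecognitionResult per page, in page order; the OcrText class and its JoinPages method are hypothetical and not part of Aspose.OCR. It takes plain strings so that it does not depend on the Aspose types.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of Aspose.OCR.
// Assumes each string is the recognized text of one page, in page order.
public static class OcrText
{
    // Joins per-page OCR text into one string, prefixing each page with a
    // "--- Page N ---" marker (N is 1-based).
    public static string JoinPages(IEnumerable<string> pageTexts)
    {
        if (pageTexts == null) throw new ArgumentNullException(nameof(pageTexts));

        var sb = new StringBuilder();
        int pageNumber = 1;
        foreach (string text in pageTexts)
        {
            sb.Append("--- Page ").Append(pageNumber).Append(" ---").Append('\n');
            // Trim trailing line breaks so every page ends with exactly one.
            sb.Append((text ?? string.Empty).TrimEnd('\r', '\n')).Append('\n');
            pageNumber++;
        }
        return sb.ToString();
    }
}
```

With the results from the example above, you could call it as Console.Write(OcrText.JoinPages(results.Select(r => r.RecognitionText))); after adding using System.Linq;.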

OCR PDF and Save Text in C#

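We can perform OCR on a PDF document and save the recognized text by following these steps:

First, create an instance of the AsposeOcr class.
Next, initialize an object of the DocumentRecognitionSettings class.
Then, specify the language to use for OCR.
After that, call the RecognizePdf() method to get the recognition results. It takes the PDF file path and the DocumentRecognitionSettings object as parameters.
Finally, save the text using the SaveMultipageDocument() method. It takes the output file path, a SaveFormat value, and the list of RecognitionResult objects as parameters.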

The following sample code shows how to OCR a PDF document and save the recognized text in C#.

// This code example demonstrates how to OCR PDF documents and save the recognized text as plain text.
using System.Collections.Generic;
using Aspose.OCR;

// Initialize the OCR engine
AsposeOcr recognitionEngine = new AsposeOcr();

// Initialize recognition settings
DocumentRecognitionSettings recognitionSettings = new DocumentRecognitionSettings();

// Specify language for OCR. Multi-language by default
recognitionSettings.Language = Language.Eng;

// Recognize text from PDF
List<RecognitionResult> results = recognitionEngine.RecognizePdf("C:\\Files\\sample.pdf", recognitionSettings);

// Save the recognized text
AsposeOcr.SaveMultipageDocument("C:\\Files\\OCR_result.txt", SaveFormat.Text, results);
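The examples here hard-code both the input and the output paths. If you would rather write the output next to the source PDF under the same name, the standard .NET Path.ChangeExtension method is enough. The OcrPaths.OutputPathFor wrapper below is a hypothetical convenience, not an Aspose API; it only adds argument checks.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Aspose.OCR.
public static class OcrPaths
{
    // Returns the input path with its extension replaced, so a .pdf path
    // becomes the matching .txt, .docx or .json path in the same folder.
    public static string OutputPathFor(string pdfPath, string extension)
    {
        if (string.IsNullOrEmpty(pdfPath))
            throw new ArgumentException("A PDF path is required.", nameof(pdfPath));
        if (string.IsNullOrEmpty(extension))
            throw new ArgumentException("An extension is required.", nameof(extension));

        // Path.ChangeExtension accepts the new extension with or without a leading dot.
        return Path.ChangeExtension(pdfPath, extension);
    }
}
```

For example, OcrPaths.OutputPathFor("C:\\Files\\sample.pdf", ".txt") yields the path that the example above passes to SaveMultipageDocument.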

OCR PDF and Convert Scanned PDF to Word in C#

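We can perform OCR on a scanned PDF document and save the recognized text in a Word document by following the steps described earlier. The only difference is that we specify SaveFormat.Docx in the last step.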

The following sample code shows how to OCR a PDF and save the recognized text as a Word document in C#.

// This code example demonstrates how to OCR PDF documents and save the recognized text as DOCX.
using System.Collections.Generic;
using Aspose.OCR;

// Initialize the OCR engine
AsposeOcr recognitionEngine = new AsposeOcr();

// Initialize recognition settings
DocumentRecognitionSettings recognitionSettings = new DocumentRecognitionSettings();

// Specify language for OCR. Multi-language by default
recognitionSettings.Language = Language.Eng;

// Recognize text from PDF
List<RecognitionResult> results = recognitionEngine.RecognizePdf("C:\\Files\\sample.pdf", recognitionSettings);

// Save the recognized text as DOCX
AsposeOcr.SaveMultipageDocument("C:\\Files\\OCR_result.docx", SaveFormat.Docx, results);

OCR PDF and Convert PDF to JSON in C#

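We can perform OCR on a PDF document and save the recognized text in a JSON file by following the steps described earlier. The only difference is that we specify SaveFormat.Json in the last step.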

The following sample code shows how to OCR a PDF and save the recognized text as a JSON file in C#.

// This code example demonstrates how to OCR PDF documents and save the recognized text as JSON.
using System.Collections.Generic;
using Aspose.OCR;

// Initialize the OCR engine
AsposeOcr recognitionEngine = new AsposeOcr();

// Initialize recognition settings
DocumentRecognitionSettings recognitionSettings = new DocumentRecognitionSettings();

// Specify language for OCR. Multi-language by default
recognitionSettings.Language = Language.Eng;

// Recognize text from PDF
List<RecognitionResult> results = recognitionEngine.RecognizePdf("C:\\Files\\sample.pdf", recognitionSettings);

// Save the recognized text as JSON
AsposeOcr.SaveMultipageDocument("C:\\Files\\OCR_result.json", SaveFormat.Json, results);
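This article does not show the structure of the JSON that SaveMultipageDocument writes, so code that consumes the file should not assume particular field names. A schema-agnostic first check is to confirm that the file parses as JSON at all, which System.Text.Json.JsonDocument (built into .NET Core 3.0 and later) can do. The OcrJson.IsWellFormed name below is hypothetical.

```csharp
using System.Text.Json;

// Hypothetical helper, not part of Aspose.OCR.
public static class OcrJson
{
    // Returns true if the text parses as a single JSON value, false otherwise.
    // It does not validate any particular schema.
    public static bool IsWellFormed(string json)
    {
        if (string.IsNullOrWhiteSpace(json)) return false;
        try
        {
            // Parse and immediately dispose; only success or failure matters here.
            using (JsonDocument.Parse(json)) { }
            return true;
        }
        catch (JsonException)
        {
            return false;
        }
    }
}
```

A typical call would be OcrJson.IsWellFormed(File.ReadAllText("C:\\Files\\OCR_result.json")) (with using System.IO;) before handing the file to further processing.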

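That covers the detailed steps for performing OCR on PDF documents and extracting text from PDFs in C#. We hope it helps. If you have other questions, you are welcome to join our technical exchange group or follow us.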

For more information, please contact us or join the Aspose technical exchange group (761297826).