Hi,
Do you have any idea how I could get the font and size of text read from a PDF with iText in C#? The available versions are 7 and 8. Or if you have another idea, please share it.
Thank you very much!
I just posted your query to ChatGPT, and the solution below is from there!
    using System;
    using iText.Kernel.Pdf;
    using iText.Kernel.Pdf.Canvas.Parser;
    using iText.Kernel.Pdf.Canvas.Parser.Listener;

    class Program
    {
        static void Main(string[] args)
        {
            string filePath = "path_to_your_pdf_file.pdf";
            using (PdfReader pdfReader = new PdfReader(filePath))
            {
                using (PdfDocument pdfDocument = new PdfDocument(pdfReader))
                {
                    for (int pageNumber = 1; pageNumber <= pdfDocument.GetNumberOfPages(); pageNumber++)
                    {
                        PdfPage page = pdfDocument.GetPage(pageNumber);
                        ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
                        string text = PdfTextExtractor.GetTextFromPage(page, strategy);
                        // Now let's iterate through the text and get font and size information
                        // Note: GetResultantTextRenderers() is not an iText method, so this loop
                        // does not compile (see the follow-up reply below).
                        foreach (TextRenderInfo renderInfo in strategy.GetResultantTextRenderers())
                        {
                            var font = renderInfo.GetFont();
                            var fontSize = renderInfo.GetFontSize();
                            Console.WriteLine($"Text: {renderInfo.GetText()}, Font: {font.PostscriptFontName}, Size: {fontSize}");
                        }
                    }
                }
            }
        }
    }
In this example:
- Replace "path_to_your_pdf_file.pdf" with the actual path to your PDF file.
- The code iterates through each page of the PDF, extracts text, and then iterates through each text element to get font and size information.
- TextRenderInfo.GetFont() returns the font information for the current text element.
- TextRenderInfo.GetFontSize() returns the font size for the current text element.
Make sure you have iText installed via NuGet for your project. Use the itext7 package; iTextSharp is the older iText 5 library and does not provide the iText.Kernel namespaces used here. This example uses iText 7. If you're using iText 8, the general approach would be similar, but some of the APIs might have changed, so consult the iText 8 documentation for any necessary adjustments.
Hope this will resolve your query!
Regards,
Ajay Mishra
Hi, but this method, GetResultantTextRenderers, is not available. Is there any alternative?
Yes @anamariavioleta.dinca, use the GetResultantText() method!
    using System;
    using iText.Kernel.Pdf;
    using iText.Kernel.Pdf.Canvas.Parser;
    using iText.Kernel.Pdf.Canvas.Parser.Listener;

    class Program
    {
        static void Main(string[] args)
        {
            string filePath = "path_to_your_pdf_file.pdf";
            using (PdfReader pdfReader = new PdfReader(filePath))
            {
                using (PdfDocument pdfDocument = new PdfDocument(pdfReader))
                {
                    for (int pageNumber = 1; pageNumber <= pdfDocument.GetNumberOfPages(); pageNumber++)
                    {
                        PdfPage page = pdfDocument.GetPage(pageNumber);
                        ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
                        string text = PdfTextExtractor.GetTextFromPage(page, strategy);
                        // Now let's iterate through the text and get font and size information
                        // Note: GetResultantText() takes no arguments and returns a string, so this
                        // loop does not compile (see the follow-up reply below).
                        foreach (TextRenderInfo renderInfo in strategy.GetResultantText(page))
                        {
                            var font = renderInfo.GetFont();
                            var fontSize = renderInfo.GetFontSize();
                            Console.WriteLine($"Text: {renderInfo.GetText()}, Font: {font.PostscriptFontName}, Size: {fontSize}");
                        }
                    }
                }
            }
        }
    }
Regards,
Ajay Mishra
That method returns a string. How could I iterate over TextRenderInfo objects?
@anamariavioleta.dinca Okay, just try another method:
In iText 7, you can achieve this by implementing a custom ITextExtractionStrategy that extends LocationTextExtractionStrategy and overrides the EventOccurred method. (iText 7 has no RenderText method; that name belongs to the older iTextSharp 5 API.) Here's how you can modify the code to accomplish this:

    using System;
    using iText.Kernel.Font;
    using iText.Kernel.Pdf;
    using iText.Kernel.Pdf.Canvas.Parser;
    using iText.Kernel.Pdf.Canvas.Parser.Data;
    using iText.Kernel.Pdf.Canvas.Parser.Listener;

    class FontSizeExtractionStrategy : LocationTextExtractionStrategy
    {
        public override void EventOccurred(IEventData data, EventType type)
        {
            // Let the base class keep collecting text for GetResultantText()
            base.EventOccurred(data, type);
            if (type != EventType.RENDER_TEXT)
            {
                return;
            }
            var renderInfo = (TextRenderInfo)data;
            PdfFont font = renderInfo.GetFont();
            float fontSize = renderInfo.GetFontSize();
            // PdfFont has no PostscriptFontName property; the name comes from the font program
            string fontName = font.GetFontProgram().GetFontNames().GetFontName();
            Console.WriteLine($"Text: {renderInfo.GetText()}, Font: {fontName}, Size: {fontSize}");
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            string filePath = "path_to_your_pdf_file.pdf";
            using (PdfReader pdfReader = new PdfReader(filePath))
            {
                using (PdfDocument pdfDocument = new PdfDocument(pdfReader))
                {
                    for (int pageNumber = 1; pageNumber <= pdfDocument.GetNumberOfPages(); pageNumber++)
                    {
                        PdfPage page = pdfDocument.GetPage(pageNumber);
                        ITextExtractionStrategy strategy = new FontSizeExtractionStrategy();
                        PdfCanvasProcessor processor = new PdfCanvasProcessor(strategy);
                        processor.ProcessPageContent(page);
                        // You can access extracted text if needed
                        // string text = strategy.GetResultantText();
                    }
                }
            }
        }
    }

- I created a custom FontSizeExtractionStrategy class that extends LocationTextExtractionStrategy. Inside the EventOccurred override, we retrieve font and size information for each text-rendering event.
- In the main loop, we instantiate this custom strategy and pass it to PdfCanvasProcessor to process each page's content.
- You can optionally retrieve the extracted text using the GetResultantText() method of the strategy if needed.
This way, you’ll be able to extract font and size information for each text element in the PDF.
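If you would rather iterate over the results afterwards instead of printing inside the listener, here is a minimal sketch, assuming iText 7 for .NET. The TextRun and TextRunCollector classes are hypothetical helper names made up for this example; they are not part of iText. The collector copies the values out during the event rather than keeping the TextRenderInfo objects themselves.

```csharp
using System;
using System.Collections.Generic;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Data;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

// Hypothetical holder for one chunk of text (not an iText type).
class TextRun
{
    public string Text;
    public string FontName;
    public float FontSize;
}

// Hypothetical listener that stores each text chunk instead of printing it.
class TextRunCollector : IEventListener
{
    public List<TextRun> Runs { get; } = new List<TextRun>();

    public void EventOccurred(IEventData data, EventType type)
    {
        if (type != EventType.RENDER_TEXT)
        {
            return;
        }
        var info = (TextRenderInfo)data;
        Runs.Add(new TextRun
        {
            Text = info.GetText(),
            // Assumes the font exposes a font program with names
            FontName = info.GetFont().GetFontProgram().GetFontNames().GetFontName(),
            FontSize = info.GetFontSize()
        });
    }

    public ICollection<EventType> GetSupportedEvents()
    {
        // Only text-rendering events are of interest here
        return new HashSet<EventType> { EventType.RENDER_TEXT };
    }
}

class Program
{
    static void Main(string[] args)
    {
        string filePath = "path_to_your_pdf_file.pdf";
        using (var pdfDocument = new PdfDocument(new PdfReader(filePath)))
        {
            for (int pageNumber = 1; pageNumber <= pdfDocument.GetNumberOfPages(); pageNumber++)
            {
                var collector = new TextRunCollector();
                new PdfCanvasProcessor(collector).ProcessPageContent(pdfDocument.GetPage(pageNumber));

                // The collected runs can now be iterated like any other list
                foreach (TextRun run in collector.Runs)
                {
                    Console.WriteLine($"Page {pageNumber}: '{run.Text}' Font: {run.FontName}, Size: {run.FontSize}");
                }
            }
        }
    }
}
```

One caveat, as far as I know: GetFontSize() reports the size set in the content stream, so if the PDF scales text through the text matrix, the size you see on the page may differ from the value printed here.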
Hope this will resolve your issue!
Regards,
Ajay Mishra