Tools · Jun 28, 2026

AWS describes an MCP-based server for real-time PDF text extraction from Amazon S3

A new AWS how-to shows how to build an MCP-based server that lets AI assistants query PDFs stored in S3 interactively, with a cost comparison to Amazon Textract.


TL;DR

AWS published a technical how-to for building an MCP-based server that extracts text from PDFs in Amazon S3 in real time.
The solution targets text-based PDFs and interactive, on-demand queries rather than batch processing or OCR.
AWS contrasts the approach with Amazon Textract, providing indicative cost comparisons for a 10,000-page-per-month proof-of-concept workload.

AWS’s Machine Learning Blog published a technical how-to describing a server that extracts text from PDF files stored in Amazon S3 in real time.

The solution uses the Model Context Protocol (MCP), an open standard, to provide programmatic access to documents. It targets text-based PDFs and interactive queries rather than batch pipelines or OCR workflows.

The post outlines four components: a command-line interface for users, the MCP layer for communication, a custom MCP server for PDF processing, and Amazon S3 for document storage secured by AWS IAM.
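To make the MCP layer concrete, here is a minimal sketch of the kind of message a client could send to such a server. MCP is built on JSON-RPC 2.0 and invokes server tools through the `tools/call` method. The tool name `extract_pdf_text`, its argument names, and the bucket and key values are hypothetical placeholders, not details from the AWS post.

```python
import json


def build_tool_call(request_id: int, bucket: str, key: str) -> dict:
    """Build a JSON-RPC 2.0 request that invokes an MCP tool via tools/call."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            # Hypothetical tool and argument names, chosen for illustration only.
            "name": "extract_pdf_text",
            "arguments": {"bucket": bucket, "key": key},
        },
    }


# Placeholder bucket and object key.
request = build_tool_call(1, "example-docs-bucket", "reports/q1.pdf")
print(json.dumps(request, indent=2))
```

In this sketch the CLI would be responsible for building and sending the request, while the custom server would decide how to map the tool name to its PDF-processing code.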

Authors Phani Parcha and Saibal Gosh position the MCP-based approach as a fit for compliance, legal, financial services, and executive teams that need on-demand access to document text during time-sensitive reviews or meetings.

The post contrasts this approach with Amazon Textract, noting that Textract is purpose-built for document processing at scale, including scanned pages, handwriting, forms, tables, and complex layout analysis.

For a proof-of-concept workload of 10,000 text-based PDF pages per month, AWS provides indicative monthly costs. The Textract-based pipeline comes to about $23–$28: roughly $15 for Textract processing, $2 for S3 storage, $1 for Lambda compute, and $5–$10 for LLM token processing. The MCP server approach comes to about $2.50: roughly $2 for S3 storage and $0.50 for data transfer.

All cost figures are illustrative and subject to change; AWS directs readers to official pricing pages for current rates.
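The totals can be checked with a few lines of arithmetic. The sketch below uses only the indicative figures quoted in this article; the dictionary keys and the `total_range` helper are illustrative names, and the numbers are not current AWS prices.

```python
# Indicative monthly costs in USD for 10,000 text-based PDF pages,
# as quoted in the article. Each entry is a (low, high) estimate.
TEXTRACT_PIPELINE = {
    "textract_processing": (15.00, 15.00),
    "s3_storage": (2.00, 2.00),
    "lambda_compute": (1.00, 1.00),
    "llm_token_processing": (5.00, 10.00),
}

MCP_SERVER = {
    "s3_storage": (2.00, 2.00),
    "data_transfer": (0.50, 0.50),
}


def total_range(items: dict) -> tuple:
    """Sum the low and high estimates across all line items."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return low, high


textract_low, textract_high = total_range(TEXTRACT_PIPELINE)
mcp_low, mcp_high = total_range(MCP_SERVER)

print(f"Textract pipeline: ${textract_low:.2f} to ${textract_high:.2f} per month")
print(f"MCP server:        ${mcp_low:.2f} per month")
```

The Textract and LLM token line items account for most of the gap, which fits the article's framing: the MCP approach targets text-based PDFs and does not pay for OCR or layout analysis.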

The end-to-end workflow starts when an AI client requests extraction via the CLI. The MCP server then retrieves the PDF from S3, extracts the text, and returns it to the client.
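The retrieve, extract, and return steps can be sketched as a single handler with its two dependencies injected. This is an illustration under stated assumptions, not the code from the AWS post: `handle_extract_request`, the in-memory `FAKE_S3` store, and the stand-in extractor are all hypothetical. A real server would more likely fetch the object with boto3's `get_object` and pass the bytes to a PDF parsing library.

```python
from typing import Callable, Dict, Tuple


def handle_extract_request(
    bucket: str,
    key: str,
    fetch_pdf: Callable[[str, str], bytes],
    extract_text: Callable[[bytes], str],
) -> Dict[str, str]:
    """Retrieve a document, extract its text, and return it to the caller."""
    pdf_bytes = fetch_pdf(bucket, key)   # step 1: retrieve the object from storage
    text = extract_text(pdf_bytes)       # step 2: extract the text
    return {"bucket": bucket, "key": key, "text": text}  # step 3: return the result


# In-memory stand-in for S3 so the sketch runs without AWS credentials.
# The stored value is plain text, not a real PDF.
FAKE_S3: Dict[Tuple[str, str], bytes] = {
    ("example-docs-bucket", "reports/q1.pdf"): b"Quarterly revenue rose 4%.",
}


def fake_fetch(bucket: str, key: str) -> bytes:
    """Stand-in for an S3 read such as boto3's get_object."""
    return FAKE_S3[(bucket, key)]


def fake_extract(data: bytes) -> str:
    """Stand-in for a PDF text extractor; it simply decodes the bytes."""
    return data.decode("utf-8")


result = handle_extract_request(
    "example-docs-bucket", "reports/q1.pdf", fake_fetch, fake_extract
)
print(result["text"])
```

Passing the fetch and extract functions in as arguments keeps the handler testable without network access; swapping in real S3 and PDF implementations would not change the control flow.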

Sources

1. AWS Machine Learning Blog: Build interactive PDF text extraction from Amazon S3
