Jina Reader API: Convert Web Pages into LLM-Friendly Markdown


One Line Turns Any URL Into AI-Ready Text — Jina Reader, the Easiest Way to Convert Web Pages to Markdown


Know what happens when you tell an AI to "read this website"? HTML tags, ad scripts, navigation bars, footers, cookie banners... all this junk gets mixed in with the actual text and shoved in whole. Tokens get wasted, and AI response quality tanks.

Prepend one short prefix to the URL and the problem goes away: r.jina.ai/anysite.com. That's it.

3-Second Summary

Any URL → r.jina.ai/ prefix → HTML noise removed → Clean markdown → Feed directly to AI
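The prefix step in the summary above can be sketched in a few lines of Python. This is only an illustration: the helper name `to_reader_url` is hypothetical, and defaulting to `https://` when the scheme is missing is an assumption of this sketch, not documented Reader behavior.

```python
READER_PREFIX = "https://r.jina.ai/"


def to_reader_url(target_url: str) -> str:
    """Return the Reader URL for a page by prepending the r.jina.ai prefix."""
    if not target_url.startswith(("http://", "https://")):
        # Assumption of this sketch: default to HTTPS when no scheme is given.
        target_url = "https://" + target_url
    return READER_PREFIX + target_url


print(to_reader_url("https://github.com/jina-ai/reader"))
```

The function only builds a string. Opening the resulting URL in a browser or an HTTP client is what actually triggers the conversion.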

What Is This?

Jina Reader is a free API that converts any web page URL into clean markdown that LLMs can directly digest. Built by Jina AI, a Berlin-based AI infrastructure company, it has crossed 10,300 GitHub stars since its 2024 launch and quickly established itself in the developer community.

The usage is absurdly simple. Just prepend https://r.jina.ai/ to whatever URL you want to read. No sign-up, no API key needed. Type it straight into your browser address bar and markdown comes right out. Simon Willison (Django co-creator) called it "one of their most instantly useful" products.

Under the hood, it runs Puppeteer (headless Chrome) to render JavaScript-heavy SPAs, uses Mozilla's Readability.js to extract the main content, and then converts the result to markdown with the Turndown library. On top of that rule-based pipeline, Jina AI built ReaderLM-v2, a dedicated 1.5-billion-parameter language model that converts HTML to markdown by learning the structure of a page rather than following hand-written rules. It supports 29 languages and is reported to be 20% more accurate than its predecessor.
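To make the "extract the content, then convert it" idea concrete, here is a toy sketch using only Python's standard `html.parser`. It is not the Reader's implementation and uses none of Puppeteer, Readability.js, Turndown, or ReaderLM-v2. The class name, the list of noise tags, and the handful of supported elements are all assumptions chosen to keep the example short.

```python
from html.parser import HTMLParser

# Tags whose contents count as noise in this sketch (an assumption, and far
# cruder than the content extraction a real reader-mode library performs).
NOISE_TAGS = {"script", "style", "nav", "footer", "aside"}
HEADING_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### "}
BLOCK_TAGS = set(HEADING_PREFIX) | {"p"}


class TinyMarkdownExtractor(HTMLParser):
    """Collect headings and paragraphs, skipping anything inside noise tags."""

    def __init__(self) -> None:
        super().__init__()
        self.noise_depth = 0     # greater than 0 while inside a noise element
        self.current_tag = None  # block tag currently being collected, if any
        self.buffer = []         # text pieces of the current block
        self.blocks = []         # finished markdown blocks

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1
        elif tag in BLOCK_TAGS and self.noise_depth == 0:
            self.current_tag = tag
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS:
            self.noise_depth = max(0, self.noise_depth - 1)
        elif tag == self.current_tag:
            # Collapse runs of whitespace and emit the block as markdown.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.blocks.append(HEADING_PREFIX.get(tag, "") + text)
            self.current_tag = None

    def handle_data(self, data):
        if self.noise_depth == 0 and self.current_tag is not None:
            self.buffer.append(data)


def html_to_markdown(html: str) -> str:
    parser = TinyMarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)
```

Even this crude version shows why the step saves tokens: navigation, scripts, and footers never reach the output. A production pipeline additionally has to render JavaScript and handle links, lists, tables, and malformed markup.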

Reading is not the only mode. Prefix a query with s.jina.ai/ to get the top 5 web search results as markdown, which suits RAG systems and AI agents that need web grounding.
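A search URL can be built the same way as a read URL. The sketch below is a hypothetical helper, not part of any Jina client library. It encodes spaces as `+`, which matches the form of the example URL used in this article; other encodings may work as well.

```python
from urllib.parse import quote_plus

SEARCH_PREFIX = "https://s.jina.ai/"


def to_search_url(query: str) -> str:
    """Return the search-mode URL for a free-text query."""
    # quote_plus turns spaces into "+" and percent-encodes reserved characters.
    return SEARCH_PREFIX + quote_plus(query)


print(to_search_url("Jina Reader tutorial"))
```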

GitHub Stars: 10.3K
Base Cost: Free
Supported Languages: 29
Tokens Processed (Last 30 Days): 9.3T

What Actually Changes?

Let's compare the existing ways of feeding web content to AI.

| Aspect | Copy-Paste | Custom Scraping | Jina Reader |
| --- | --- | --- | --- |
| Setup Time | Manual every time | Build per-site parsers | 0 seconds (URL prefix only) |
| HTML Noise | Manual cleanup needed | Maintain per-site selectors | Automatically removed |
| JS Rendering | Not possible | Selenium/Puppeteer setup | Built-in headless Chrome |
| PDF Support | Separate tool needed | Separate library | Just feed the URL |
| Image Captions | Not possible | Separate vision model | Auto-generated (optional) |
| Cost | Free | Infrastructure costs | Free (basic) |

Let's also compare with similar services.

| Tool | Method | Free Tier | License | Strength |
| --- | --- | --- | --- | --- |
| Jina Reader | URL prefix | 10M tokens | Apache 2.0 | Zero barrier, commercial-friendly |
| Firecrawl | API | 500 credits | AGPL-3.0 | Large-scale crawling, JS automation |
| Crawl4AI | Local install | Fully free | Apache 2.0 | Self-hosted, LLM chunking |
| Diffbot | API | Trial | Commercial | Automatic entity classification |

Bottom line: Jina Reader for the fastest start, Firecrawl for large-scale crawling, Crawl4AI for full control. According to an Apify blog analysis, Firecrawl is 4-5x cheaper at 100K pages/month, but for small-scale use or prototyping, Jina Reader is overwhelmingly more convenient.

Key Takeaway

Jina Reader's real value is that you can feed clean web data to AI without writing a single line of code. Even non-developers can create AI input data just by prepending r.jina.ai/ in the browser address bar.

Getting Started: The Essentials

1. Test in your browser right now
   Type https://r.jina.ai/https://github.com/jina-ai/reader in your address bar. Clean markdown appears within seconds. No installation, no sign-up needed.
2. Use it with AI chat
   When asking ChatGPT or Claude to "analyze this page," paste the r.jina.ai/URL output instead of the raw URL. You'll notice a clear difference in response quality.
3. Try Search mode
   Enter https://s.jina.ai/Jina+Reader+tutorial and you'll get the full text of the top 5 results as markdown. A great starting point for research automation.
4. Get an API key (optional)
   A free key raises the rate limit from 20 to 500 RPM and cuts the average response time from 7.9s to 2.5s. It comes with 10 million free tokens, so there is little downside.
5. Connect to automation
   In code, curl https://r.jina.ai/URL is all you need. It works with any automation tool (Python, Node.js, n8n): a single HTTP GET pulls web content as markdown.
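For the automation step, the same HTTP GET can be issued from Python's standard library. The sketch below is a minimal example under stated assumptions: the function names are hypothetical, and passing the optional API key as an `Authorization: Bearer` header is an assumption to verify against the official Reader API page.

```python
import urllib.request
from typing import Optional

READER_PREFIX = "https://r.jina.ai/"


def build_reader_request(target_url: str,
                         api_key: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for the Reader endpoint."""
    request = urllib.request.Request(READER_PREFIX + target_url, method="GET")
    if api_key:
        # Assumption: the key is sent as a bearer token.
        request.add_header("Authorization", f"Bearer {api_key}")
    return request


def fetch_markdown(target_url: str,
                   api_key: Optional[str] = None,
                   timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    request = build_reader_request(target_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_markdown("https://github.com/jina-ai/reader")` performs the network request and returns the page as markdown text. Splitting request construction from sending keeps the first function easy to test without network access.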

Note

Some sites may block Jina Reader through their bot protection policies. In those cases, try adding the x-with-proxy: true header or using the cookie forwarding feature.

Want to Go Deeper?

Jina Reader API Official Page

Live demo, pricing, and full parameter docs all in one place

jina-ai/reader GitHub Repository

Source code, issue tracker, and self-hosting guide

Reader-LM: Dedicated HTML-to-Markdown Model

Deep dive into ReaderLM-v2 architecture and benchmarks

Jina AI vs. Firecrawl Detailed Comparison

Price, performance, and license comparison for real-world use

Google Cloud Blog — Jina AI Infrastructure Case Study

How they built a 100-billion-token web grounding system

FAQ

How strict are the rate limits without an API key?

Without an API key, you get 20 requests per minute with an average response time of 7.9 seconds. A free API key bumps that to 500 RPM with 2.5-second responses. Signing up is simple and comes with 10 million free tokens, so it is worth getting a key after initial testing.
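One way to stay under these limits in a script is to space requests evenly: 20 RPM works out to one request every 60 / 20 = 3 seconds, and 500 RPM to one every 0.12 seconds. The throttle below is a minimal client-side sketch. The class name is hypothetical, and even spacing is stricter than a per-minute quota may actually require.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Enforce a minimum gap between calls, derived from a requests-per-minute limit."""

    def __init__(self,
                 requests_per_minute: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Block until the next request is allowed and return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self._last_call + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        # Record the moment the caller is released, not the moment it arrived.
        self._last_call = now + delay
        return delay
```

Create `MinIntervalThrottle(20)` for keyless use or `MinIntervalThrottle(500)` with a key, then call `wait()` before each request. The injectable `clock` and `sleep` arguments exist only so the behavior can be checked without real delays.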

Does it handle JavaScript-rendered SPA sites properly?

Yes, it uses Puppeteer-based headless Chrome to render pages, so SPAs work fine. You can use the x-wait-for-selector header to wait for specific DOM elements to load, and x-timeout to control wait duration.
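The two headers named above can be attached to a request as follows. This is a sketch only: the helper names are hypothetical, and treating the `x-timeout` value as a number of seconds is an assumption to confirm in the official parameter docs.

```python
import urllib.request
from typing import Dict, Optional

READER_PREFIX = "https://r.jina.ai/"


def spa_headers(wait_for_selector: Optional[str] = None,
                timeout_seconds: Optional[int] = None) -> Dict[str, str]:
    """Build the optional headers for pages that finish rendering late."""
    headers: Dict[str, str] = {}
    if wait_for_selector:
        # CSS selector the renderer should wait for before extracting content.
        headers["x-wait-for-selector"] = wait_for_selector
    if timeout_seconds is not None:
        # Assumption: the value is interpreted as seconds.
        headers["x-timeout"] = str(timeout_seconds)
    return headers


def build_spa_request(target_url: str,
                      wait_for_selector: Optional[str] = None,
                      timeout_seconds: Optional[int] = None) -> urllib.request.Request:
    """Prepare (but do not send) a Reader request carrying the SPA headers."""
    return urllib.request.Request(
        READER_PREFIX + target_url,
        headers=spa_headers(wait_for_selector, timeout_seconds),
    )
```

For example, `build_spa_request("https://example.com/app", "#content", 20)` asks the Reader to wait for an element with the id `content`. Both the URL and the selector here are placeholders.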

Does it work well with non-English content?

ReaderLM-v2 supports 29 languages including Korean, Japanese, Chinese, and German. According to the Google Cloud blog, Jina AI has invested significantly in multilingual processing.

Can it convert PDF files too?

Just feed it the PDF URL directly. It handles PDFs with images too, powered by PDF.js. However, scanned image PDFs requiring OCR may yield limited results.

What advantages does it have over alternatives like Firecrawl or Crawl4AI?

The biggest difference is the barrier to entry. Firecrawl requires setup and Crawl4AI needs local installation, while Jina Reader just needs a URL prefix. The Apache 2.0 license also means no restrictions on commercial use. That said, at scale (100K+ pages/month), Firecrawl can be 4-5x cheaper.

Written by Rush

Tracking where business meets AI.
