joeseesun/markdown-proxy

markdown-proxy

Convert any URL to clean Markdown, with built-in support for login-required pages (X/Twitter, WeChat, Feishu/Lark docs, etc.)


English | 中文


English

Features

Send Claude any URL and it automatically fetches the full content as Markdown. Certain platforms get dedicated extraction; all other URLs go through a proxy cascade:

| URL Type | Method | Why |
|----------|--------|-----|
| WeChat Articles (mp.weixin.qq.com) | Built-in Playwright script | Anti-scraping protection requires a headless browser |
| Feishu/Lark Docs (feishu.cn, larksuite.com) | Built-in Feishu API script | Requires API authentication; auto-converts to Markdown |
| YouTube | Dedicated YouTube skill | Video content has its own toolchain |
| All other URLs | Proxy cascade: r.jina.ai → defuddle.md → agent-fetch | Free, no API key needed |
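
The routing in the table above can be sketched as a small host-based dispatcher. This is only an illustration, not the skill's actual implementation: the function name `pick_method`, the returned labels, and the YouTube host names (`youtube.com`, `youtu.be`) are assumptions; the WeChat and Feishu/Lark domains come from the table.

```python
from urllib.parse import urlparse


def pick_method(url: str) -> str:
    """Return a label for the extraction method a URL would be routed to.

    Hypothetical helper: the labels are illustrative, not names used by the skill.
    """
    host = (urlparse(url).hostname or "").lower()

    def matches(domain: str) -> bool:
        # Match the domain itself or any of its subdomains.
        return host == domain or host.endswith("." + domain)

    if matches("mp.weixin.qq.com"):
        return "wechat-playwright"
    if matches("feishu.cn") or matches("larksuite.com"):
        return "feishu-api"
    # Assumed YouTube hosts; the README only says "YouTube".
    if matches("youtube.com") or matches("youtu.be"):
        return "youtube-skill"
    return "proxy-cascade"


if __name__ == "__main__":
    print(pick_method("https://mp.weixin.qq.com/s/example"))
```

Matching on the parsed hostname rather than on a substring of the URL avoids misrouting a URL that merely mentions one of these domains in its path or query string.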

Prerequisites

  • Claude Code installed
  • curl (built-in on macOS/Linux)
  • (WeChat scraping) Python 3.8+ with playwright
    pip install playwright beautifulsoup4 lxml
    playwright install chromium
  • (Proxy fallback) agent-fetch — local fallback when online proxies fail
    npx agent-fetch --help  # No pre-install needed, npx auto-downloads
  • (Feishu docs) Environment variables FEISHU_APP_ID and FEISHU_APP_SECRET
    echo $FEISHU_APP_ID  # Verify configured
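
As a rough preflight check for the prerequisites above, a script could report which command-line tools and environment variables are missing. This is a sketch under assumptions: `missing_prerequisites` is a hypothetical helper, not part of the skill, and it only checks for `curl`, `npx`, and the two Feishu variables.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which):
    """Return the names of tools and environment variables that are not available.

    `env` and `which` are injectable so the check can be exercised without
    touching the real environment.
    """
    env = os.environ if env is None else env
    missing = []
    for tool in ("curl", "npx"):
        if which(tool) is None:
            missing.append(tool)
    for var in ("FEISHU_APP_ID", "FEISHU_APP_SECRET"):
        if not env.get(var):
            missing.append(var)
    return missing


if __name__ == "__main__":
    print(missing_prerequisites())
```

The Feishu variables matter only when fetching Feishu/Lark documents, so a missing entry there is not a problem for other URL types.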

Installation

npx skills add joeseesun/markdown-proxy

Verify:

ls ~/.claude/skills/markdown-proxy/SKILL.md

Usage

Just send Claude a URL:

Proxy Priority

  1. r.jina.ai — Most complete content, preserves image links
  2. defuddle.md — Cleaner output with YAML frontmatter
  3. agent-fetch — Local tool, no network proxy needed
  4. defuddle CLI — Local CLI, good for standard web pages
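
The priority order above amounts to "try each source in turn and keep the first non-empty result". A minimal sketch of that fallback logic follows. The helper names are hypothetical, and the URL patterns are assumptions: both proxies are treated here as accepting the target URL appended to their base address. The local tools (agent-fetch, defuddle CLI) are left out; they could be added as further entries that shell out to the respective command.

```python
import urllib.request
from typing import Callable, Iterable, Optional, Tuple

Fetcher = Callable[[str], str]


def jina_url(url: str) -> str:
    # Assumed pattern: prefix the target URL with the r.jina.ai base address.
    return "https://r.jina.ai/" + url


def defuddle_url(url: str) -> str:
    # Assumed pattern: same prefix convention for defuddle.md.
    return "https://defuddle.md/" + url


def http_get(url: str, timeout: float = 30.0) -> str:
    """Fetch a URL and return the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_with_fallback(
    url: str, fetchers: Iterable[Tuple[str, Fetcher]]
) -> Optional[Tuple[str, str]]:
    """Try each (name, fetcher) in order; return the first non-empty result."""
    for name, fetch in fetchers:
        try:
            text = fetch(url)
        except Exception:
            continue  # A failed source falls through to the next one.
        if text and text.strip():
            return name, text
    return None


DEFAULT_FETCHERS = [
    ("r.jina.ai", lambda u: http_get(jina_url(u))),
    ("defuddle.md", lambda u: http_get(defuddle_url(u))),
]
```

Calling `fetch_with_fallback("https://example.com", DEFAULT_FETCHERS)` would return the name of the source that answered together with its Markdown, or `None` if every source failed or returned empty content.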

Feishu/Lark Document Support

The built-in fetch_feishu.py script fetches documents via the Feishu Open API and converts them to Markdown:

  • Supports new docs (docx), legacy docs (doc), and wiki pages
  • Auto-parses document blocks into Markdown format
  • Supports headings, lists, code blocks, quotes, todos, equations, images, etc.
  • Requires FEISHU_APP_ID and FEISHU_APP_SECRET environment variables
  • App needs docx:document:readonly permission
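
Before calling the API, a script has to work out which kind of document a URL points to. The sketch below shows one way to do that, assuming the document type is the first path segment (`/docx/`, `/docs/`, `/wiki/`) and the document token is the segment after it. `parse_feishu_url` is a hypothetical helper and is not taken from fetch_feishu.py.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse

# Path prefix -> document kind, following the three kinds listed above.
_KINDS = {"docx": "docx", "docs": "doc", "wiki": "wiki"}


def parse_feishu_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (kind, token) for a Feishu/Lark document URL, or None if unrecognized."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) >= 2 and segments[0] in _KINDS:
        return _KINDS[segments[0]], segments[1]
    return None


if __name__ == "__main__":
    print(parse_feishu_url("https://example.feishu.cn/docx/TOKEN123"))
```

Keeping the kind separate from the token lets the caller branch on it, for example to handle wiki pages differently since they need the additional wiki:wiki:readonly permission.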

Troubleshooting

| Issue | Solution |
|-------|----------|
| WeChat scraping fails | Run playwright install chromium to install the browser |
| Feishu returns permission error | Check the FEISHU_APP_ID and FEISHU_APP_SECRET env vars, and confirm the app has document read permission |
| Feishu wiki page fails | Confirm the app has the wiki:wiki:readonly permission |
| r.jina.ai returns empty | Auto-falls back to defuddle.md (no action needed) |
| All proxies fail | The URL may have strict auth restrictions; try npx agent-fetch |

Credits


中文

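Features

Send Claude a URL and it automatically fetches the full content and converts it to Markdown. Certain platforms get dedicated extraction; all other URLs go through a proxy cascade:

| URL Type | Method | Why |
|----------|--------|-----|
| WeChat Official Account articles (mp.weixin.qq.com) | Built-in Playwright script | Official Accounts use anti-scraping measures, so a headless browser is required |
| Feishu docs (feishu.cn/docx/, /wiki/, /docs/) | Built-in Feishu API script | Requires API authentication; auto-converts to Markdown |
| YouTube | Dedicated YouTube skill | Video content has its own toolchain |
| All other URLs | Proxy cascade: r.jina.ai → defuddle.md → agent-fetch | Free, no API key needed |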

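Prerequisites

  • Claude Code installed
  • curl (built-in on macOS/Linux)
  • (WeChat scraping) Python 3.8+ with playwright
    pip install playwright beautifulsoup4 lxml
    playwright install chromium
  • (Proxy fallback) agent-fetch: local fallback tool for when all online proxies fail
    npx agent-fetch --help  # No pre-install needed, npx auto-downloads
  • (Feishu docs) Environment variables FEISHU_APP_ID and FEISHU_APP_SECRET
    echo $FEISHU_APP_ID  # Verify configured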

Installation

npx skills add joeseesun/markdown-proxy

Verify:

ls ~/.claude/skills/markdown-proxy/SKILL.md

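Usage Examples

Just send Claude a URL:

Proxy Priority

  1. r.jina.ai: most complete content, preserves image links
  2. defuddle.md: cleaner output, with YAML frontmatter
  3. agent-fetch: local tool, no network proxy needed
  4. defuddle CLI: local CLI, good for standard web pages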

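Feishu Document Support

The built-in fetch_feishu.py script fetches document content via the Feishu Open API and converts it to Markdown:

  • Supports new docs (docx), legacy docs (doc), and wiki pages (wiki)
  • Auto-parses document blocks and converts them to Markdown
  • Supports headings, lists, code blocks, quotes, todos, equations, images, etc.
  • Requires the Feishu app's FEISHU_APP_ID and FEISHU_APP_SECRET environment variables
  • The app needs the docx:document:readonly permission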

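Troubleshooting

| Issue | Solution |
|-------|----------|
| WeChat Official Account scraping fails | Run playwright install chromium to install the browser |
| Feishu doc returns a permission error | Check the FEISHU_APP_ID and FEISHU_APP_SECRET environment variables, and confirm the app has document read permission |
| Feishu wiki page fails | Confirm the app has the wiki:wiki:readonly permission |
| r.jina.ai returns empty content | Auto-falls back to defuddle.md (no action needed) |
| All proxies fail | The URL may have strict auth restrictions; try npx agent-fetch |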

Credits


Follow the Author

  • X (Twitter): @vista8
  • WeChat Official Account: 「向阳乔木推荐看」

