PDF scientific paper translation and bilingual comparison.
- 📊 Retain formulas and charts.
- 📄 Preserve table of contents.
- 🌐 Support multiple translation services.
Feel free to provide feedback in the issues or the user group.
Requires Python >=3.8, <=3.12.
pip install pdf2zh
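To confirm that the active interpreter falls inside the supported range, a small check along these lines may help. It is only a sketch: `is_supported` is a hypothetical helper name, and the bounds are copied from the requirement stated above.

```python
import sys

# Bounds copied from the stated requirement: >=3.8, <=3.12.
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 12)


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) pair lies within the supported range."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    status = "supported" if is_supported() else "not supported"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```

Only the major and minor numbers are compared, so any patch release of 3.12 counts as supported.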
Run the translation command from the command line to generate the translated document example-zh.pdf and the bilingual document example-dual.pdf in the current directory. Google is used as the default translation service.
Please refer to ChatGPT for how to set environment variables.
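The output naming described above can be sketched in a few lines of Python. This is an illustration, not the tool's own code: `expected_outputs` is a hypothetical helper, and applying the `-zh` and `-dual` suffixes to inputs other than example.pdf is an assumption based on the single example given.

```python
from pathlib import Path


def expected_outputs(input_pdf, out_dir="."):
    """Guess the translated and bilingual output paths for an input PDF.

    Assumption: the suffixes follow the example-zh.pdf / example-dual.pdf
    pattern shown for example.pdf.
    """
    stem = Path(input_pdf).stem
    out = Path(out_dir)
    return out / f"{stem}-zh.pdf", out / f"{stem}-dual.pdf"


if __name__ == "__main__":
    translated, bilingual = expected_outputs("example.pdf")
    print(translated, bilingual)
```

A script that post-processes the results could use such a helper to check that both files exist after the command finishes.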
- Entire document
pdf2zh example.pdf
- Part of the document
pdf2zh example.pdf -p 1-3,5
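A page specification such as `1-3,5` mixes ranges and single pages. The sketch below shows one way such a string could be expanded into a list of page numbers. It is not pdf2zh's actual parser: `parse_page_spec` is a hypothetical name, and treating ranges as inclusive is an assumption.

```python
def parse_page_spec(spec):
    """Expand a spec like "1-3,5" into a list of page numbers.

    Assumption: ranges are inclusive on both ends.
    """
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    return pages


if __name__ == "__main__":
    print(parse_page_spec("1-3,5"))
```

Under these assumptions, `1-3,5` selects pages 1, 2, 3, and 5.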
See Google Languages Codes and DeepL Languages Codes for the codes accepted by -li (source language) and -lo (target language).
pdf2zh example.pdf -li en -lo ja
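When the tool is driven from a script, the command line can be assembled as an argument list. The sketch below uses only the flags shown above (`-p`, `-li`, `-lo`); `build_command` is a hypothetical helper, and combining these flags in one invocation is assumed to be valid.

```python
import shlex


def build_command(pdf, pages=None, lang_in=None, lang_out=None):
    """Assemble a pdf2zh argument list from optional settings."""
    cmd = ["pdf2zh", pdf]
    if pages:
        cmd += ["-p", pages]
    if lang_in:
        cmd += ["-li", lang_in]
    if lang_out:
        cmd += ["-lo", lang_out]
    return cmd


if __name__ == "__main__":
    cmd = build_command("example.pdf", lang_in="en", lang_out="ja")
    # Print the shell form; pass `cmd` to subprocess.run(cmd, check=True)
    # to actually execute it (requires pdf2zh to be installed).
    print(shlex.join(cmd))
```

Passing a list rather than a single string avoids shell quoting problems with file names that contain spaces.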
- DeepL
See DeepL.
Set environment variables to construct an endpoint like {DEEPL_SERVER_URL}/translate:
DEEPL_SERVER_URL (optional), e.g., export DEEPL_SERVER_URL=https://api.deepl.com
DEEPL_AUTH_KEY, e.g., export DEEPL_AUTH_KEY=xxx
pdf2zh example.pdf -s deepl
- DeepLX
See DeepLX.
Set environment variables to construct an endpoint like {DEEPLX_SERVER_URL}/translate:
DEEPLX_SERVER_URL (optional), e.g., export DEEPLX_SERVER_URL=https://api.deeplx.org
DEEPLX_AUTH_KEY, e.g., export DEEPLX_AUTH_KEY=xxx
pdf2zh example.pdf -s deeplx
- Ollama
See Ollama.
Set environment variables to construct an endpoint like {OLLAMA_HOST}/api/chat:
OLLAMA_HOST (optional), e.g., export OLLAMA_HOST=http://localhost:11434
pdf2zh example.pdf -s ollama:gemma2
- LLM with OpenAI-compatible schemas (OpenAI / SiliconCloud / Zhipu)
See SiliconCloud and Zhipu.
Set environment variables to construct an endpoint like {OPENAI_BASE_URL}/chat/completions:
OPENAI_BASE_URL (optional), e.g., export OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_API_KEY, e.g., export OPENAI_API_KEY=xxx
pdf2zh example.pdf -s openai:gpt-4o
- Azure
The following environment variables are required:
AZURE_APIKEY, e.g., export AZURE_APIKEY=xxx
AZURE_ENDPOINT, e.g., export AZURE_ENDPOINT=https://api.translator.azure.cn/
AZURE_REGION, e.g., export AZURE_REGION=chinaeast2
pdf2zh example.pdf -s azure
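The service options above follow two patterns: a `service:model` value for `-s`, and an endpoint built from a base-URL environment variable plus a fixed path. The sketch below illustrates both, along with a check for the three required Azure variables. It is not pdf2zh's implementation: the helper names are hypothetical, and the fallback base URLs are assumptions modeled on the examples above (the README does not state the actual defaults).

```python
import os

# service name -> (base-URL variable, assumed fallback base URL, path suffix)
ENDPOINTS = {
    "deepl": ("DEEPL_SERVER_URL", "https://api.deepl.com", "/translate"),
    "deeplx": ("DEEPLX_SERVER_URL", "https://api.deeplx.org", "/translate"),
    "ollama": ("OLLAMA_HOST", "http://localhost:11434", "/api/chat"),
    "openai": ("OPENAI_BASE_URL", "https://api.openai.com/v1", "/chat/completions"),
}

# Variables listed as required for the Azure service.
AZURE_REQUIRED = ("AZURE_APIKEY", "AZURE_ENDPOINT", "AZURE_REGION")


def split_service(spec):
    """Split a value such as "ollama:gemma2" into (service, model or None)."""
    name, _, model = spec.partition(":")
    return name, (model or None)


def endpoint_for(spec, env=None):
    """Build the endpoint URL for a service from its base-URL variable."""
    env = os.environ if env is None else env
    name, _ = split_service(spec)
    variable, fallback, path = ENDPOINTS[name]
    base = env.get(variable) or fallback  # fallback is an assumption
    return base.rstrip("/") + path


def missing_azure_vars(env=None):
    """Return the required Azure variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in AZURE_REQUIRED if not env.get(name)]


if __name__ == "__main__":
    print(split_service("openai:gpt-4o"))
    print(endpoint_for("ollama:gemma2"))
    print(missing_azure_vars())
```

Stripping a trailing slash from the base URL before appending the path avoids a doubled slash when a variable is set with one.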
Use regular expressions to specify the formula fonts (-f) and characters (-c) that need to be preserved.
pdf2zh example.pdf -f "(CM[^RT].*|MS.*|.*Ital)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"
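The two patterns in the command above can be tried out in Python before running a translation. The sketch below assumes the patterns are applied with `re.match` semantics (anchored at the start of the string), which the text does not confirm; the font names are illustrative, and the helper names are hypothetical.

```python
import re

# Patterns copied from the example command.
FONT_PATTERN = r"(CM[^RT].*|MS.*|.*Ital)"
CHAR_PATTERN = r"(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"


def is_formula_font(font_name):
    """True if the font name matches the formula-font pattern."""
    return re.match(FONT_PATTERN, font_name) is not None


def is_formula_char(char):
    """True if the character matches the preserved-character pattern."""
    return re.match(CHAR_PATTERN, char) is not None


if __name__ == "__main__":
    for font in ["CMMI10", "CMR10", "MSBM10", "Times-Italic", "Helvetica"]:
        print(font, is_formula_font(font))
    for char in ["(", "+", "7", "α", "a"]:
        print(repr(char), is_formula_char(char))
```

Under that assumption, `CM[^RT].*` accepts Computer Modern fonts such as CMMI10 but rejects CMR10, because the third letter must not be R or T, and the character class `[\u0080-\ufaff]` covers non-ASCII symbols such as Greek letters.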
See the GUI documentation for more details.
Document merging: PyMuPDF
Document parsing: Pdfminer.six
Document extraction: MinerU
Multi-threaded translation: MathTranslate
Layout parsing: DocLayout-YOLO
Document standard: PDF Explained, PDF Cheat Sheets