Summra Book – Pengyao Notes

Classic literature, rewritten sentence-by-sentence into modern English.

The problem

The classics are full of ideas worth reading. But Victorian sentence structure, archaic vocabulary, and paragraphs that stretch for half a page keep most readers away. People who would love Wuthering Heights or Moby Dick never make it past chapter two.

Existing “modernized” editions either butcher the original or skip the hard parts entirely. I wanted something different: the full original text, preserved — and right next to it, a faithful modern translation, paragraph for paragraph, entirely for free leveraging the power of AI.

The solution: side-by-side reading

Wuthering Heights Original (left) and the modern translation (right). Paragraphs align exactly.

Books on Summra has both versions. You can read the original alone, the modern alone, or both side by side and jump between them. Paragraph boundaries match exactly, so you never lose your place when you switch.

The translation isn’t a summary or a paraphrase. Every sentence in the original has a corresponding sentence in modern English, and the meaning, tone, and structure are preserved.

The translation rules

“Modern English” is vague, so the pipeline holds every chapter to a concrete target: Flesch-Kincaid grade 7–9. That’s the level most U.S. news sites aim for, and it’s the sweet spot where prose stays readable without sounding dumbed down.

The translation prompt enforces a specific set of rules:

Replace archaic vocabulary — thee, thou, wherefore, betwixt, countenance, visage become you, why, between, face.
Break up long sentences — a 60-word Victorian sentence with three semicolons becomes two or three shorter ones.
Modernize idioms and references — “go to the Deuce” becomes “go to hell”; obscure cultural references get the closest modern equivalent.
Preserve meaning, tone, and structure — paragraph boundaries stay exactly where they were, names and dialogue stay intact, and the narrator’s voice carries through.
No summarizing, no skipping — every sentence in the original gets a corresponding sentence in modern English. Nothing is dropped because it was “implied.”
Keep proper nouns and period detail — characters still ride in carriages, not Ubers. The translation modernizes the language, not the setting.

The validator checks all of this automatically: paragraph counts have to match, character ratio has to stay between 50% and 110% (anything shorter is probably a summary, anything much longer is probably padded), and the output has to end on a sentence terminator. Anything that fails goes back through the self-healing pipeline.

Features

Side-by-side reading — original and modern translation, paragraph-aligned.
Kindle-style reading controls — font size, font family, light/dark/sepia themes.
AI-generated summaries — per chapter and per book, for orientation before you dive in.
Audio summary — listen to book summaries before committing.
Visual guides — character maps and plot summaries for dense classics.

By the numbers

90 classic books ingested from Project Gutenberg.
38 with full plain-English translations (and growing).
4,159 chapters parsed and aligned.
~16 million characters of plain-English prose generated — roughly 2.6 million words.
Titles include Pride and Prejudice, Frankenstein, Moby Dick, The Odyssey, Jane Eyre, Dracula, The Picture of Dorian Gray, The Origin of Species, Thus Spake Zarathustra.

How it’s built

Backend — Python + Flask, SQLite for storage.
Translation — Google Gemini, with a multi-pass pipeline that enforces paragraph-by-paragraph alignment, strips decorative dividers, and re-runs chapters where the output is truncated or summary-shaped.
Chapter parsing — custom parser that handles single-level (CHAPTER I..N), two-level (PART I / CHAPTER I..N), and verse structures from raw Project Gutenberg texts.
AI-driven quality control — every chapter passes a deterministic validator (paragraph count match, character ratio, sentence-terminator endings) with an AI-driven self-healing orchestrator backed by Claude that diagnoses failures and dispatches the right fix — mechanical for title-misses and divider noise, regeneration for truncations, sub-agent splits for paragraph merges.
Frontend — vanilla JS reader with Playwright-based end-to-end tests.

The hardest part wasn’t the translation itself — it was getting paragraph boundaries to match exactly across all 4,000+ chapters. LLMs love to merge or split paragraphs silently, and even a single off-by-one breaks the side-by-side view. The pipeline has six steps just for paragraph alignment, including mechanical fixes for the most common failure modes (missing chapter titles, decorative scene dividers, hard-wrapped source text).

Try it

summrabook.com

Try Wuthering Heights or Moby Dick if you want to see the translation earn its keep.