Adds Wind and Truth (till Ch9)

other changes:
1. drop wkhtmltopdf
2. switch to xhtml2pdf
3. drop calibre; mobi isn't used anywhere now

want to get rid of pdftk also, soon
Nemo 2024-08-28 17:50:40 +05:30
parent dd39b9037a
commit 4411b032ed
10 changed files with 112 additions and 36 deletions

.gitignore (+1)

@@ -12,3 +12,4 @@ darkone/
*.cbz
/mythwalker/*.html
lost-metal/
wat/*.html


@@ -12,15 +12,13 @@ WORKDIR /src
RUN apt-get update && \
apt-get update && apt-get install -y --no-install-recommends \
build-essential \
calibre \
pandoc \
pdftk-java \
ruby \
ruby-dev \
wget \
wkhtmltopdf \
xvfb \
zlib1g-dev \
python3-xhtml2pdf \
&& gem install bundler \
&& bundle install \
&& apt-get remove -y --purge build-essential \


@@ -1,11 +1,11 @@
GEM
remote: https://rubygems.org/
specs:
humanize (2.5.1)
nokogiri (1.16.6-x86_64-linux)
humanize (3.1.0)
nokogiri (1.16.7-x86_64-linux)
racc (~> 1.4)
paru (1.2.1)
racc (1.8.0)
paru (1.3)
racc (1.8.1)
PLATFORMS
x86_64-linux
@@ -16,4 +16,4 @@ DEPENDENCIES
paru
BUNDLED WITH
2.3.22
2.5.11


@@ -8,15 +8,13 @@ If you are interested in just generating the books, follow the guide on the READ
- Nokogiri gem installed (`gem install nokogiri`)
- `pandoc` installed and available (for all 3 formats)
- Paru gem installed (`gem install paru`)
- (mobi only): `ebook-convert` (from calibre) available to generate the mobi file
- (pdf) `wkhtmltopdf` for converting html to pdf
- (pdf) `xhtml2pdf` for converting html to pdf
- (pdf) `pdftk` to stitch the final PDF file
### Notes
- The final 2 tools can be skipped if you don't care about the PDF generation.
- You can also skip calibre if you only want the EPUB file.
- Edit the last line in `*.rb` to `:epub` / `:mobi`, `:pdf` to only trigger the specific builds
- Edit the last line in `*.rb` to `:epub` / `:pdf` to only trigger the specific builds
- Windows users need wget. Download the latest wget.exe from https://eternallybored.org/misc/wget/ and add its directory to the PATH environment variable or put it directly in C:\Windows.
## Generation
@@ -27,7 +25,7 @@ After downloading the repo and installing the requirements, just run
ruby oathbringer.rb
All the generated files will be saved with the filename `Oathbringer.{epub|pdf|mobi|html}`
All the generated files will be saved with the filename `Oathbringer.{epub|pdf|html}`
### Way of Kings Reread
@@ -35,7 +33,7 @@ To generate the book:
ruby wok-reread.rb
All the generated files will be saved with the filename `wok-reread.{epub|pdf|mobi|html}`
All the generated files will be saved with the filename `wok-reread.{epub|pdf|html}`
### Words of Radiance Reread
@@ -43,7 +41,7 @@ To generate the book:
ruby wor-reread.rb
All the generated files will be saved with the filename `books/wok-reread.{epub|pdf|mobi|html}`. This generation might take a while because it contains a lot of images. It doesn't have the best possible index either, but is still pretty readable.
All the generated files will be saved with the filename `books/wok-reread.{epub|pdf|html}`. This generation might take a while because it contains a lot of images. It doesn't have the best possible index either, but is still pretty readable.
### Edgedancer Reread
@@ -51,25 +49,25 @@ To generate the book:
ruby edgedancer-reread.rb
All the generated files will be saved with the filename `books/edgedancer-reread.{epub|pdf|mobi|html}`. This generation might take a while because it contains a lot of images. It doesn't have the best possible index either, but is still pretty readable.
All the generated files will be saved with the filename `books/edgedancer-reread.{epub|pdf|html}`. This generation might take a while because it contains a lot of images. It doesn't have the best possible index either, but is still pretty readable.
### Warbreaker Prime: Mythwalker
ruby mythwalker.rb
All the generated files will be saved with the filename `books/mythwalker.{epub|pdf|mobi|html}`. This generation might take a while because the script attempts to strip out unnecessary HTML.
All the generated files will be saved with the filename `books/mythwalker.{epub|pdf|html}`. This generation might take a while because the script attempts to strip out unnecessary HTML.
### Oathbringer Reread
ruby oathbringer-reread.rb
All the generated files will be saved with the filename `books/oathbringer-reread.{epub|pdf|mobi|html}`. This generation might take a while because the script attempts to strip out unnecessary HTML.
All the generated files will be saved with the filename `books/oathbringer-reread.{epub|pdf|html}`. This generation might take a while because the script attempts to strip out unnecessary HTML.
### Skyward
ruby skyward.rb
All the generated files will be saved with the filename `books/skyward.{epub|pdf|mobi|html}`. This generation might take a while because the script attempts to strip out unnecessary HTML.
All the generated files will be saved with the filename `books/skyward.{epub|pdf|html}`. This generation might take a while because the script attempts to strip out unnecessary HTML.
### Defending Elysium


@@ -6,6 +6,7 @@ Scripts to generate books from the [Cosmere](https://coppermind.net/wiki/Cosmere
**Books**
1. Wind and Truth (being serialized till Chapter 30, In Progress)
1. Rhythm of War (Serialized till Chapter 18) (with Annotations and Illustrations)
1. Oathbringer (Serialized till Chapter 32)
1. Warbreaker Prime: Mythwalker
@@ -59,15 +60,29 @@ As an example, you'd like to get a ebook for Rhythm of War, run the following co
For directions specific to your OS, see above.
## Wind and Truth
> Reactor is serializing the new book from now until its release date on
December 6, 2024. A new installment will go live every Monday at 11 AM ET,
along with read-along commentary from Stormlight beta readers and Cosmere
experts Lyndsey Luther, Drew McCaffrey, and Paige Vest.
## Rhythm of War
>The chapter-by-chapter serialization of Rhythm of War, Brandon Sanderson's fourth volume in The Stormlight Archive series. New chapters go live every Tuesday up to the November 17, 2020 release date.
>The chapter-by-chapter serialization of Rhythm of War, Brandon Sanderson's
fourth volume in The Stormlight Archive series. New chapters go live every
Tuesday up to the November 17, 2020 release date.
This supports the annotations that Brandon is publishing on Reddit along with [3 illustrations from Part 1](https://www.17thshard.com/forum/topic/92967-some-new-illustrations-from-row/). This covers the entire Part 1 of the book.
## Oathbringer
Tor.com published Oathbringer in serialized form till Chapter 32. This script downloads all of these posts and converts them into a publishable format, including epub, mobi, pdf and html. You can find the tor.com announcement at https://www.tor.com/2017/08/15/brandon-sanderson-oathbringer-serialization-announcement/. This covers the entire Part 1 of the book.
Tor.com published Oathbringer in serialized form till Chapter 32. This
script downloads all of these posts and converts them into a publishable
format, including epub, mobi, pdf and html. You can find the tor.com
announcement at
https://www.tor.com/2017/08/15/brandon-sanderson-oathbringer-serialization-announcement/.
This covers the entire Part 1 of the book.
## Way of Kings Reread

covers/wat.jpg (binary, 287 KiB, not shown)

covers/wat.pdf (binary, not shown)

metadata/wat.xml (+6)

@@ -0,0 +1,6 @@
<dc:identifier id="epub-id-1" opf:scheme="ISBN-10">1250319188</dc:identifier>
<dc:identifier id="epub-id-2" opf:scheme="ISBN-13">978-1250319180</dc:identifier>
<dc:title id="epub-title-1">Wind and Truth: Book Five of the Stormlight Archive</dc:title>
<dc:date>2024-12-06</dc:date>
<dc:language>en-US</dc:language>
<dc:creator id="epub-creator-1" opf:role="aut">Brandon Sanderson</dc:creator>


@@ -58,13 +58,22 @@ def gen_epub(name, format)
end
end
def gen_mobi(name, format)
if command?('ebook-convert') && format_match(format, :mobi)
# Convert epub to a mobi
`ebook-convert books/#{name}.epub books/#{name}.mobi`
puts '[mobi] Generated MOBI file'
else
puts "[error] Can't generate MOBI without ebook-convert"
def gen_pdf_pandoc(name, format)
return unless format_match(format, :pdf)
begin
require 'paru/pandoc'
Paru::Pandoc.new do
from 'html'
to 'pdf'
pdf_engine 'xelatex'
metadata title: name
data_dir Dir.pwd
output "books/#{name}.pdf"
end.convert File.read("books/#{name}_pdf.html")
puts '[pdf] Generated PDF file'
rescue LoadError
puts "[error] Can't generate PDF without paru"
end
end
@@ -77,18 +86,13 @@ rescue Errno::ENOENT
end
def gen_pdf(name, format)
if commands?(%w[pandoc convert wkhtmltopdf pdftk]) && format_match(format, :pdf)
if commands?(%w[pandoc convert xhtml2pdf pdftk]) && format_match(format, :pdf)
# Generate PDF as well
# First, lets make a better css version of the html
`pandoc books/#{name}.html -s -c ../epub.css -o books/#{name}_pdf.html`
puts '[pdf] Generated html for pdf'
# Print the pdf_html file to pdf
if inside_docker?
`xvfb-run wkhtmltopdf books/#{name}_pdf.html books/#{name}-nocover.pdf`
else
`wkhtmltopdf books/#{name}_pdf.html books/#{name}-nocover.pdf`
end
`xhtml2pdf books/#{name}_pdf.html books/#{name}-nocover.pdf`
puts '[pdf] Generated PDF without cover'
@@ -102,6 +106,5 @@ end
def generate(name, format = :all)
gen_epub(name, format)
gen_mobi(name, format)
gen_pdf(name, format)
end
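`gen_pdf` and the other generators call `command?`, `commands?` and `format_match`, but their definitions fall outside the hunks shown here. The following is a minimal hypothetical sketch of what such helpers could look like, assuming a POSIX-style PATH lookup (no automatic `.exe` suffix on Windows) and that `:all` selects every output format. It is not the repository's actual implementation.

```ruby
# Hypothetical helpers: methods.rb calls these, but their real definitions are
# not shown in this diff.

# True if an executable file called +name+ exists in some PATH directory.
# Assumes POSIX-style names (no automatic ".exe" suffix on Windows).
def command?(name)
  ENV.fetch('PATH', '').split(File::PATH_SEPARATOR).any? do |dir|
    path = File.join(dir, name)
    File.file?(path) && File.executable?(path)
  end
end

# True only if every tool in +names+ is available.
def commands?(names)
  names.all? { |name| command?(name) }
end

# :all selects every output format; otherwise only the requested one matches.
def format_match(requested, target)
  requested == :all || requested == target
end

# Example: the PDF step runs only when all four tools are present.
# commands?(%w[pandoc convert xhtml2pdf pdftk]) && format_match(:pdf, :pdf)
```

With this shape, `generate('wat', :epub)` would skip the PDF step even when pandoc, convert, xhtml2pdf and pdftk are all installed, which matches the README note about editing the last line of each `*.rb` script.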

wat.rb (+55)

@@ -0,0 +1,55 @@
# frozen_string_literal: true
require 'date'
require 'fileutils'
require 'nokogiri'
require_relative './methods'
FileUtils.mkdir_p('wat')
BASE = 'https://reactormag.com/read-wind-and-truth-by-brandon-sanderson-'
links = [
'preface-and-prologue/',
'chapters-1-and-2/',
'chapters-3-and-4/',
'chapters-5-and-6/',
'chapters-7-8-and-9/',
]
# Download each listed installment, skipping files already on disk
puts 'Downloading all found links'
episode = 1
links.each do |link|
url = BASE + link
puts "Download #{url}"
unless File.exist? "wat/#{episode}.html"
`wget --no-clobber "#{url}" --output-document "wat/#{episode}.html" -o /dev/null`
end
episode += 1
end
# Now we have all the files
html = ''
(1..(links.length)).each do |i|
page = Nokogiri::HTML(File.read("wat/#{i}.html")).css('article-content')
start = ending = false
page.children.each do |e|
if e.name == 'h3'
e.name = 'h1'
start = true
end
ending = true if e.text.include?("Excerpted") && start
e.remove if !start || ending
end
chapter_html = page.inner_html.sub(/<h1/, "<h1 id='chapter-#{i-1}'")
html += chapter_html
end
File.open('books/wat.html', 'w') { |file| file.write(html) }
puts '[html] Generated HTML file'
generate('wat', :all)
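The trimming loop in wat.rb keeps everything from the first `<h3>` (renamed to `<h1>`) up to, but not including, the first later element whose text contains "Excerpted", then tags the first heading with a `chapter-N` id. Below is a gem-free sketch of that rule, using a hypothetical `Node` struct in place of Nokogiri elements and hypothetical `trim_chapter`/`tag_chapter` helpers; it illustrates the logic and is not code from the repository.

```ruby
# Hypothetical sketch: Node stands in for a Nokogiri element (tag name + text).
Node = Struct.new(:name, :text)

# Mirrors the wat.rb loop: every <h3> becomes <h1>, nothing is kept before the
# first heading, and the first "Excerpted" element after it ends the chapter
# (that element and everything after it are dropped).
def trim_chapter(nodes)
  start = ending = false
  nodes.each_with_object([]) do |node, kept|
    if node.name == 'h3'
      node = Node.new('h1', node.text) # copy instead of mutating the input
      start = true
    end
    ending = true if start && node.text.include?('Excerpted')
    kept << node if start && !ending
  end
end

# Like the String#sub call in wat.rb: only the first "<h1" gets the anchor id.
def tag_chapter(html, index)
  html.sub(/<h1/, "<h1 id='chapter-#{index}'")
end
```

Under this rule a page laid out as promo text, chapter heading, chapter body, "Excerpted from..." notice and footer reduces to just the heading and body. Because wat.rb passes `i-1`, the first downloaded page (preface and prologue) would get `chapter-0`.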