[dep] Crystal Upgrade to 1.10.1

The Crystal Debian repo has moved, so we move with it. Debian 10 is still supported there, so keep using it for now.
[dep] Dependency Upgrade
2023-10-31 23:44:52 +05:30 · 2023-10-31 23:42:35 +05:30 · 2022-05-30 14:50:06 +05:30 · 2021-06-04 13:56:51 +05:30 · 2020-07-01 18:29:44 +05:30 · 2020-07-01 18:29:22 +05:30
31 changed files with 5522 additions and 132 deletions
--- a/.github/FUNDING.yml
+++ b/.github/FUNDING.yml
@@ -0,0 +1,3 @@
+ko_fi: captn3m0
+liberapay: captn3m0
+github: captn3m0
--- a/.travis.yml
+++ b/.travis.yml
@@ -12,9 +12,9 @@ install:
 script:
  - crystal spec
  - crystal tool format --check
-  - git ls-files --exclude='Dockerfile*' --ignored | xargs --max-lines=1 ${HADOLINT}
+  - git ls-files --exclude='Dockerfile*' --ignored | xargs --max-lines=1 ${HADOLINT} --ignore DL3008

 addons:
  apt:
    packages:
-      - pdftk
+      - pdftk
--- a/24
+++ b/24
@@ -5,29 +5,33 @@ WORKDIR /build
 COPY . .

 # Add the key for the crystal debian repo
-ADD https://keybase.io/crystal/pgp_keys.asc /tmp/crystal.gpg
+ADD https://download.opensuse.org/repositories/devel:/languages:/crystal/Debian_10/Release.key /tmp/crystal.key

 # See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=863199 for why mkdir is needed
 RUN mkdir -p /usr/share/man/man1 && \
 	apt-get update && \
 	apt-get install  --yes --no-install-recommends \
 	# Install gnupg for the apt-key operation
-	gnupg=2.2.12-1+deb10u1 \
+	gnupg \
 	# libssl for faster TLS in Crystal
-	libssl-dev=1.1.1d-0+deb10u2 \
+	libssl-dev \
 	# pdftk as a dependency for muse-dl
 	pdftk=2.02-5 \
 	# ca-certificates for talking to crystal-lang.org
-	ca-certificates=20190110 \
+	ca-certificates \
 	# git to let shards install happen
-	git=1:2.20.1-2+deb10u1 \
+	git \
+	# needed by myhtml crystal shard
+	make \
 	# build --release
-	zlib1g-dev=1:1.2.11.dfsg-1 && \
+	zlib1g-dev && \
 	# See https://crystal-lang.org/install/
-	apt-key add /tmp/crystal.gpg && \
-	echo "deb https://dist.crystal-lang.org/apt crystal main" > /etc/apt/sources.list.d/crystal.list && \
+	echo "deb http://download.opensuse.org/repositories/devel:/languages:/crystal/Debian_10/ /" | tee /etc/apt/sources.list.d/crystal.list && \
+	gpg --dearmor /tmp/crystal.key && \
+	mv /tmp/crystal.key.gpg /etc/apt/trusted.gpg.d/crystal.gpg && \
+	rm /tmp/crystal.key && \
 	apt-get update && \
-	apt-get install --no-install-recommends --yes crystal=0.33.0-1 && \
+	apt-get install --no-install-recommends --yes crystal && \
 	# Cleanup
 	apt-get clean && \
 	rm -rf /var/lib/apt/lists/*
@@ -40,4 +44,4 @@ RUN apt-get --yes remove git gnupg
 WORKDIR /data
 VOLUME /data

-ENTRYPOINT ["/usr/bin/muse-dl"]
+ENTRYPOINT ["/usr/bin/muse-dl"]
--- a/5
+++ b/5
@@ -7,4 +7,7 @@ release:
 	# Then extract the image | extract the layer.tar file (we only have one layer) | extract the muse-dl-static file
 	docker image save muse-dl-static | tar xf - --wildcards "*/layer.tar" -O | tar xf - "muse-dl-static"
 	# And move it to the bin/ directory
-	mv -f muse-dl-static bin/
+	mv -f muse-dl-static bin/
+
+test:
+	crystal spec
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# muse-dl ![Travis (.org)](https://img.shields.io/travis/captn3m0/muse-dl) ![GitHub issues](https://img.shields.io/github/issues/captn3m0/muse-dl) ![GitHub issues by-label](https://img.shields.io/github/issues/captn3m0/muse-dl/bug?color=red&label=open%20bugs) ![GitHub](https://img.shields.io/github/license/captn3m0/muse-dl) ![GitHub top language](https://img.shields.io/github/languages/top/captn3m0/muse-dl) [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](http://makeapullrequest.com) 
+# muse-dl ![Travis (.org)](https://img.shields.io/travis/captn3m0/muse-dl) ![GitHub issues](https://img.shields.io/github/issues/captn3m0/muse-dl) ![GitHub issues by-label](https://img.shields.io/github/issues/captn3m0/muse-dl/bug?color=red&label=open%20bugs) ![GitHub](https://img.shields.io/github/license/captn3m0/muse-dl) ![GitHub top language](https://img.shields.io/github/languages/top/captn3m0/muse-dl) [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](http://makeapullrequest.com) ![Docker Cloud Automated build](https://img.shields.io/docker/cloud/automated/captn3m0/muse-dl) ![Docker Cloud Build Status](https://img.shields.io/docker/cloud/build/captn3m0/muse-dl) ![Docker Image Size (latest semver)](https://img.shields.io/docker/image-size/captn3m0/muse-dl)

 Download PDFs from Project MUSE and stitch them together into a single-file using [`pdftk`](https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/).

@@ -28,15 +28,26 @@ A docker image is available at `captn3m0/muse-dl` on Docker Hub. The working dir

 ```
 # Download the book, and put it in your Downloads directory
-docker run -it /home/nemo/Downloads:/data captn3m0/muse-dl https://muse.jhu.edu/book/875
+docker run -it -v /home/nemo/Downloads:/data captn3m0/muse-dl:edge https://muse.jhu.edu/book/875

-# If you have a list.txt file in your Downloads directory, then you can run 
-docker run -it /home/nemo/Downloads:/data captn3m0/muse-dl /data/list.txt
+# If you have a list.txt file in your Downloads directory, then you can run
+docker run -it -v /home/nemo/Downloads:/data captn3m0/muse-dl:edge /data/list.txt

 # If you want to keep the temporary files with your host, and not delete them
-docker run -it /home/nemo/Downloads:/data /tmp:/musetmp --tmp-dir /musetmp --no-cleanup https://muse.jhu.edu/book/875
+docker run -it -v /home/nemo/Downloads:/data -v /tmp:/musetmp captn3m0/muse-dl:edge --tmp-dir /musetmp --no-cleanup https://muse.jhu.edu/book/875
 ```

+Replace `edge` with the latest version number if you'd like to run a tagged release.
+
+### Docker Images
+
+The following images are available:
+
+- `edge`: Run `muse-dl` built from the latest master.
+- `edge-static`: Get the pre-built static binary built from the latest master.
+- `v1.3.1`: Run `muse-dl` built from that specific release.
+- `v1.3.1-static`: Get the pre-built static binary built from that specific release.
+
 ## Requirements

 Please ensure you have `pdftk` installed, unless you're running via docker.
@@ -53,8 +64,8 @@ INPUT_FILE: Path to a file containing a list of links
    --tmp-dir PATH                   Temporary Directory to use
    --output FILE                    Output Filename
    --no-bookmarks                   Don't add bookmarks in the PDF
-    --input-pdf INPUT                Input Stitched PDF. Will not download anything
-    --clobber                        Overwrite the output file, if it already exists. Not compatible with input-pdf
+    --clobber                        Overwrite the output file, if it already exists.
+    --dont-strip-first-page          Disables first page from being stripped. Use carefully
    --cookie COOKIE                  Cookie-header
    -h, --help                       Show this help
 ```
@@ -74,4 +85,4 @@ And it will download all the links in that file.

 ## License

-Licensed under the [MIT License](https://nemo.mit-license.org/). See LICENSE file for details.
+Licensed under the [MIT License](https://nemo.mit-license.org/). See LICENSE file for details.
--- a/shard.lock
+++ b/shard.lock
@@ -1,14 +1,22 @@
-version: 1.0
+version: 2.0
 shards:
  crest:
-    github: mamantoha/crest
-    version: 0.24.1
+    git: https://github.com/mamantoha/crest.git
+    version: 1.3.12

  http-client-digest_auth:
-    github: mamantoha/http-client-digest_auth
-    version: 0.3.0
+    git: https://github.com/mamantoha/http-client-digest_auth.git
+    version: 0.6.0
+
+  http_proxy:
+    git: https://github.com/mamantoha/http_proxy.git
+    version: 0.10.1

  myhtml:
-    github: kostya/myhtml
-    version: 1.5.1
+    git: https://github.com/kostya/myhtml.git
+    version: 1.5.8
+
+  webmock:
+    git: https://github.com/manastech/webmock.cr.git
+    version: 0.14.0+git.commit.42b347cdd64e13193e46167a03593944ae2b3d20

--- a/shard.yml
+++ b/shard.yml
@@ -1,5 +1,5 @@
 name: muse-dl
-version: 1.1.0
+version: 1.3.1

 authors:
  - Nemo <muse.dl@captnemo.in>
@@ -15,4 +15,9 @@ dependencies:
  myhtml:
    github: kostya/myhtml
  crest:
-    github: mamantoha/crest
+    github: mamantoha/crest
+
+development_dependencies:
+  webmock:
+    github: manastech/webmock.cr
+    branch: master
--- a/spec/fetch_spec.cr
+++ b/spec/fetch_spec.cr
@@ -1,7 +1,12 @@
 require "./spec_helper"
+require "webmock"
 # require "errors/muse_corrupt_pdf.cr"

 describe Muse::Dl::Book do
+  headers = {"Content-Type" => "text/html"}
+  WebMock.stub(:get, "https://muse.jhu.edu/chapter/2379787/pdf")
+    .to_return(body_io: File.new("spec/fixtures/chapter-2379787.html"), headers: headers)
+
  it "should notice the unable to construct chapter PDF error" do
    f = "/tmp/chapter-2379787.pdf"
    File.delete(f) if File.exists? f
--- a/spec/fixtures/chapter-2379787.html
+++ b/spec/fixtures/chapter-2379787.html
@@ -0,0 +1,359 @@
+<style>
+.page404 {
+	display: table;
+	width: 100%;
+	padding: 60px 4em;
+	min-height: 350px;
+}
+.page404 .int {
+	display: table-cell;
+	vertical-align: middle;
+	text-align: left; 
+}
+.page404 h4 {
+	margin-bottom: 10px;
+	font-weight: 700;
+}
+.page404 .logo {
+	display: table-cell;
+	width: 23%;
+	vertical-align: middle;
+	padding-right: 30px;
+}
+.page404 blockquote {
+	border: none;
+	padding-left: 0;
+}
+</style>
+
+
+<!DOCTYPE html>
+<html lang="en">
+	<head>
+		<!-- Global site tag (gtag.js) - Google Analytics -->
+		<script async src="https://www.googletagmanager.com/gtag/js?id=UA-58347753-2"></script>
+		<script>
+		  window.dataLayer = window.dataLayer || [];
+		  function gtag(){dataLayer.push(arguments);}
+		  gtag('js', new Date());		
+		  gtag('config', 'UA-58347753-2');
+		</script>
+		<meta charset="utf-8">
+		<meta name="viewport" content="width=device-width, initial-scale=1.0">
+
+		<meta property="og:image" content="/images/muselogo_dark.jpg" />
+
+		
+		
+		<title>Project MUSE</title>
+		<link rel="search" type="application/opensearchdescription+xml" title="Search Project MUSE from your browser's Searchbar" href="/plugins/muse-opensearch.xml" />
+		
+		
+		<link rel="stylesheet" type="text/css" href="/css/normalize.css"/>
+		<link href="/css/jquery.qtip2.css" rel="stylesheet" type="text/css" />
+<!-- 		foundation 6.4.1 custom float/typ/vis 250rem max width 30col float grid -->
+		<link href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:300,400,400i,600,600i,700,700i" rel="stylesheet">
+		<link rel="stylesheet" type="text/css" href="/css/foundation.min.css"/>
+		<link rel="stylesheet" type="text/css" href="/css/style_home2.css?031820"/>
+		
+		
+		
+		
+		
+
+		<script type="text/javascript" src="/js/jquery3.js"></script>		
+		<script type="text/javascript" src="/js/pre.js"></script>
+		<script type="text/javascript" src="/js/core/head.js?new"></script>
+		
+		<script type="text/javascript" src="https://s7.addthis.com/js/250/addthis_widget.js#pubid=ra-4ecb5479089cb81a"></script>
+
+		
+		
+		
+		<title>Article</title>
+	</head>
+	<body>
+		<a id="skip" href="#skip_target">[Skip to Content]</a>
+		<span id="top"></span>
+		<div id="header" role="banner" aria-label="header">
+			<div class="row wrap" id="institution_banner">
+				<div class="content">
+					<div id="institution_wrap" class="columns small-15 medium-text-left">
+						<div id="institution" class="img_text_col">
+							<div class="img_contain_left"><img src="/images/institution.png" alt="institution icon" /></div>
+							<div class="text_contain_left"><span class="small"><a href='/account' class='color_white login_status'>Institutional Login</a></span></div>
+						</div>
+					</div>
+					<div id="person_wrap" class="columns small-15">
+						<div id="person" class="img_text_col">
+							<div class="img_contain_right"><img src="/images/person.png" alt="account icon" /></div>
+							<div class="text_contain_right"><span class="small"><a href="/account/" class="color_white login_status" onclick="gtag('event', 'click', {'event_category': 'Account link', 'event_label': 'account name link - header'});">LOG IN</a></span></div>
+						</div>
+					</div>
+				</div>
+			</div>
+			
+			
+			
+			<div class="row wrap" id="search_banner">
+				<div class="content">
+					<div class="medium-4 small-4 columns" id="header_logo_wrap">
+						<div id="header_logo">
+							<a href="/"><img src="/images/muselogo.png" alt="Project MUSE" class="show-for-large"/>
+							<img src="/images/muselogo_notext.png" alt="Project MUSE" class="hide-for-large"/></a>							
+						</div>
+					</div>
+					<div class="medium-21 small-22 columns" id="search_bar_wrap">				
+						<div class="row"> 
+							<div id="browse_button_wrap">
+								<a id="browse_button" href="/browse" onclick="gtag('event', 'click', {'event_category': 'Browse link', 'event_label': 'browse button - header'});"><span class="small">browse</span></a>
+							</div>
+							<div id="or_text_wrap" class="show-for-medium">
+								<div id="or_text">
+									<span class="small">or</span> 
+								</div>
+							</div>
+							<div id="search_input_wrap" class="small-30">
+								<div id="search_input">
+  								
+  								<noscript>
+										<form method="post" action="/search/">
+										<input name="no_js_header_query"/>
+										<input type="hidden" name="action" value="search"/>
+										<input type="hidden" name="t" value="header"/>
+										<a id="search_button">
+										
+										<input type="image" src="/images/search_white.png" alt="Search icon"/>
+										
+                    </a>
+										</form>
+									</noscript>
+								
+									<script>document.write('<input name="search_input_header" id="search_input_header" aria-label="search input"/>');</script>									
+									
+									<script>document.write('<a id="search_button"><img src="/images/search_white.png" alt="Search icon"/></a>');</script>
+									
+									
+								</div>
+							</div>
+						</div>				
+					</div>
+				
+					<div class="medium-5 small-4 columns" id="menu_wrap">
+						<div id="menu" class="menu-btn">
+	<div class="nav-toggle">
+		<div class="nav-toggle-btn">
+			<a href="#" class="menu-icon-wrap">
+				<span class="icon"></span>
+				<span class="small show-for-large">menu</span>
+			</a>
+		</div>
+
+		<div class="nav-mobile">
+			<a href="/search">Advanced Search</a>
+			<a href="/browse">Browse</a>
+			<script>
+				document.write('<div class="accordion">');
+			</script>
+			<noscript>
+				<div class="accordion noscript">
+			</noscript>
+				<a href="#" class="acc_trig open"><span>MyMUSE Account</span></a>
+				<div class="acc_block">
+					<a href="/account">Log In / Sign Up</a>
+					<a href="/account/change">Change My Account</a>
+					<a href="/account/user_settings">User Settings</a>
+					<a href="/account/">Access via Institution</a>
+					<a href="/account/saved_items">MyMUSE Library</a>
+					<a href="/account/search_history">Search History</a>
+					<a href="/account/view_history">View History</a>
+					<a href="/account/purchase_history">Purchase History</a>
+					<a href="/account/alerts">MyMUSE Alerts</a>
+																
+				</div>
+			</div>									
+			
+			<div class="nav-mobile-footer">
+				<!--<a class="modal_trigger">Contact Support</a>-->
+				<a href="/contact">Contact Support</a>
+			</div>
+		</div>
+	</div>
+</div>
+		
+ 
+					</div>
+				</div>
+			</div>
+			
+			
+			
+		</div>
+
+
+
+<div class="page404" id="main">
+	<div class="logo">
+		<img src="/images/muselogo_notext.png" alt="MUSE logo">
+	</div>
+	<div class="int">
+		<html><head><title>Error</title></head><body>Unable to construct chapter PDF</body></html>
+
+	</div>
+</div>
+
+
+		<div id="footer_block" role="banner" aria-label="footer">
+			<div class="content">
+				<div class="wrap row" id="about_wrap">
+					<div id="about">
+						<h3>Project MUSE Mission</h3>
+						<p>Project MUSE promotes the creation and dissemination of essential humanities and social science resources through collaboration with libraries, publishers, and scholars worldwide. Forged from a partnership between a university press and a library, Project MUSE is a trusted part of the academic and scholarly community it serves.</p>
+					</div>
+					<div id="about_logo" class="columns medium-10 show-for-large">
+						<img src="/images/muselogo_notext.png" alt="MUSE logo"/>
+					</div>
+				</div>
+			</div>
+			
+			<div class="footer_main">
+				<div class="footer_item_color wrap">
+					<div class="footer_item_left">
+						<div class="group">
+							<div class="footer_item_about cont_sub">
+								<h5 class="small">about</h5>
+								<ul>
+									<li><a href="https://about.muse.jhu.edu/publishers">Publishers</a></li>
+									<li><a href="https://about.muse.jhu.edu/about/discovery-partners/">Discovery Partners</a></li>
+									<li><a href="https://about.muse.jhu.edu/about/advisory-board/">Advisory Board</a></li>
+									<li><a href="https://about.muse.jhu.edu/about/journal-subscribers/">Journal Subscribers</a></li>
+									<li><a href="https://about.muse.jhu.edu/about/book-customers">Book Customers</a></li>
+									<li><a href="https://about.muse.jhu.edu/about/at-conferences/">Conferences</a></li>
+								</ul>
+							</div>
+							<div class="footer_item_res cont_sub">
+								<h5 class="small">resources</h5>
+								<ul>
+									<li><a href="https://about.muse.jhu.edu/resources/news/">News & Announcements</a></li>
+									<li><a href="https://about.muse.jhu.edu/resources/promotional-materials">Promotional Material</a></li>
+									<li><a href="https://about.muse.jhu.edu/resources/alerts">Get Alerts</a></li>
+									<li><a href="https://about.muse.jhu.edu/resources/muse-presentations">Presentations</a></li>
+								</ul>
+							</div>
+							<div class="clear"></div>
+						</div>
+						<div class="group">
+							<div class="footer_item_what cont_sub">
+								<h5 class="small">what's on muse</h5>
+								<ul>
+									<li><a href="https://about.muse.jhu.edu/muse">Open Access</a></li>
+									<li><a href="https://about.muse.jhu.edu/pub/journals">Journals</a></li>
+									<li><a href="https://about.muse.jhu.edu/pub/books">Books</a></li>
+								</ul>
+							</div>
+							<div class="footer_item_info cont_sub">
+								<h5 class="small">information for</h5>
+								<ul>
+									<li><a href="https://about.muse.jhu.edu/publishers">Publishers</a></li>
+									<li><a href="https://about.muse.jhu.edu/librarians">Librarians</a></li>
+									<li><a href="https://about.muse.jhu.edu/individuals">Individuals</a></li>
+								</ul>
+							</div>
+							<div class="clear"></div>
+						</div>
+					</div>
+					<div class="footer_item_right">
+						<div class="group">
+							<div class="footer_item_social cont_sub">
+								<h5 class="small">Contact</h5>
+								<ul>
+									<li class="clear"><a href="/contact">Contact Us</a></li>
+									<li><a href="https://about.muse.jhu.edu/resources/help-overview">Help</a></li>
+								</ul>
+								<ul>
+									<li>
+										<ol class="social_icons">
+											<li class="list_h"><a href="https://www.facebook.com/ProjectMUSE"><img src="/images/footer_icon_fb.png" alt="Facebook" /></a></li>
+											<li class="list_h"><a href="https://www.linkedin.com/company/projectmuse/"><img src="/images/footer_icon_linkedin.png" alt="Linkedin" /></a></li>
+											<li class="list_h"><a href="https://twitter.com/ProjectMUSE"><img src="/images/footer_icon_twitter.png" alt="Twitter" /></a></li>
+										</ol>
+									</li>
+								</ul>
+							</div>
+							<div class="footer_item_policy cont_sub">
+								<h5 class="small">Policy & Terms</h5>
+								<ul>
+									<li><a href="https://about.muse.jhu.edu/about/accessibility/">Accessibility</a></li>
+									<li><a href="/privacy_policy">Privacy Policy</a></li>
+									<li><a href="/terms_use">Terms of Use</a></li>	
+								</ul>
+							</div>
+							<div class="clear"></div>
+						</div>
+						<div class="group">
+							<div class="footer_item_addr cont_sub">
+								<p class="address"><span>2715 North Charles Street<br/>Baltimore, Maryland, USA 21218</span></p>
+								<p class="phone"><span><a href="tel:1-410-516-6989">+1 (410) 516-6989</a></span><br>
+								<span><a href="mailto:muse@press.jhu.edu">muse@press.jhu.edu</a></span></p>
+								<p class="footer_text_sm copy color_oxfordblue hide-for-small"><span>&copy;2020 Project MUSE. Produced by Johns Hopkins University Press in collaboration with The Sheridan Libraries.</span></p>
+							</div>
+							<div class="footer_item_logo cont_sub">
+								<p class="show-for-medium"><span class="semiboldit footer_text_sm">Now and always,<br/>The Trusted Content Your Research Requires.</span></p>
+								<p><span><a href="https://muse.jhu.edu">
+								
+								<img class="show-for-medium" src="/images/muselogoblack.png" alt="Project MUSE logo" />
+								
+								<img class="hide-for-medium" src="/images/muselogo.png" alt="Project MUSE logo" /></a></span></p>
+								<p class="hide-for-medium"><span class="semiboldit footer_text_sm">Now and always, The Trusted Content Your Research Requires.</span></p>
+								<p class="hide-for-small"><span class="footer_text_sm">Built on the Johns Hopkins University Campus</span></p>
+							</div>
+							<div class="clear"></div>
+						</div>
+					</div>
+					<div class="clear"></div>
+				</div>
+			</div>
+			<div class="footer_item_sub wrap hide-for-medium">
+				<p><span class="footer_text_sm">Built on the Johns Hopkins University Campus</span></p>		
+				<p class="footer_text_sm copy color_oxfordblue"><span>&copy;2020 Project MUSE. Produced by Johns Hopkins University Press in collaboration with The Sheridan Libraries.</span></p>			
+			</div>		
+		</div>
+		
+		
+		
+		
+		<div id="btn_top">
+			<a href="#top"><span>Back To Top</span></a>
+		</div>
+		
+		
+		
+		  <input type="hidden" name="cookie_acknowledgement_type"  id="cookie_acknowledgement_type" value="cookie_acknowledgement">
+		
+		
+		
+			<div id="cookies_msg">
+				<p>This website uses cookies to ensure you get the best experience on our website. Without cookies your experience may not be seamless.</p>
+				<script>document.writeln('<a href="javascript://" class="btn_accept" id="accept_cookie_msg">Accept</a>');</script>
+				<noscript>	
+						
+								<form method="post" action="/account/set_attribute_no_ajax/cookie_acknowledgement/1">
+						
+						<input type="submit" class="btn_accept" value="accept">
+						</form>
+				</noscript>
+			</div>
+		
+		
+		<script type="text/javascript" src="/js/lightbox.js"></script>
+		<script type="text/javascript" src="/js/jquery.qtip2.min.js"></script>
+		<script type="text/javascript" src="/js/post.js"></script>
+		
+		<script type="text/javascript" src="/js/footnotes.js"></script>
+		
+		
+		<script type="text/javascript" src="/js/references.js"></script>
+		
+	</body>
+</html>
+
--- a/spec/fixtures/issue-35852.html
+++ b/spec/fixtures/issue-35852.html
--- a/spec/fixtures/issue-41793.html
+++ b/spec/fixtures/issue-41793.html
--- a/spec/fixtures/journal-159.html
+++ b/spec/fixtures/journal-159.html
--- a/spec/fixtures/ratelimit.html
+++ b/spec/fixtures/ratelimit.html
@@ -0,0 +1,65 @@
+<!DOCTYPE html>
+<html lang="en">
+    <head>
+        <meta charset="utf-8">
+        <meta name="viewport" content="width=device-width, initial-scale=1.0">
+        <title>Too Many Free PDF Requests</title>
+        <style>
+    body {
+      margin: 0;
+      padding: 0;
+    }
+    .page429 {
+        display: table;
+        width: 100%;
+        padding: 60px 30px;
+        box-sizing: border-box;
+        min-height: 350px;
+        font-family: sans-serif;
+    }
+    .page429 .int {
+        display: table-cell;
+        vertical-align: middle;
+        text-align: left;
+        padding-left: 30px;
+    }
+    .page429 h4 {
+        margin-bottom: 10px;
+        font-weight: 700;
+        font-size: 24px;
+    }
+    .page429 .logo {
+        display: table-cell;
+        width: 23%;
+        max-width: 182px;
+        vertical-align: middle;
+    }
+    .page429 .logo img {
+      max-width: 100%;
+      height: auto;
+    }
+    .page429 p {
+        font-weight: normal;
+        line-height: 1.3;
+    }
+    .page429 a {
+      text-decoration: none;
+      color: #284f84;
+    }
+    </style>
+    </head>
+    <body>
+
+    <div class="page429" id="main">
+        <div class="logo">
+            <a href="https://muse.jhu.edu"><img src="/images/muselogo_notext.png" alt="MUSE logo"></a>
+        </div>
+        <div class="int">
+            <h4>Too Many Free PDF Requests</h4>
+                <p>Your IP has requested too many free PDFs too quickly.</p>
+                <p>Please wait before you continue downloading, and if possible slow down the rate of your requests.</p>
+        </div>
+    </div>
+
+    </body>
+</html>
--- a/spec/issue_spec.cr
+++ b/spec/issue_spec.cr
@@ -0,0 +1,85 @@
+require "../src/issue"
+require "./spec_helper"
+require "webmock"
+
+describe Muse::Dl::Issue do
+  WebMock.stub(:get, "https://muse.jhu.edu/issue/41793")
+    .to_return(body: File.new("spec/fixtures/issue-41793.html").gets_to_end)
+
+  issue = Muse::Dl::Issue.new "41793"
+  issue.parse
+
+  it "should initialize correctly" do
+    issue.id.should eq "41793"
+    issue.url.should eq "https://muse.jhu.edu/issue/41793"
+  end
+
+  it "should parse info correctly" do
+    issue.info["ISSN"].should eq "1530-7131"
+    issue.info["Print ISSN"].should eq "1531-2542"
+    issue.info["Launched on MUSE"].should eq "2020-02-05"
+    issue.info["Open Access"].should eq "No"
+    issue.title.should eq "Volume 20, Number 1, January 2020"
+  end
+
+  it "should parse title correctly" do
+    issue.volume.should eq "20"
+    issue.number.should eq "1"
+    issue.date.should eq "January 2020"
+  end
+
+  it "should parse summary" do
+    issue.summary.should eq <<-EOT
+    Focusing on important research about the role of academic libraries and librarianship, portal also features commentary on issues in technology and publishing. Written for all those interested in the role of libraries within the academy, portal includes peer-reviewed articles addressing subjects such as library administration, information technology, and information policy. In its inaugural year, portal earned recognition as the runner-up for best new journal, awarded by the Council of Editors of Learned Journals (CELJ). An article in  portal, "Master's and Doctoral Thesis Citations: Analysis and Trends of a Longitudinal Study," won the Jesse H. Shera Award for Distinguished Published Research from the Library Research Round Table of the American Library Association.
+    EOT
+  end
+
+  it "should parse publisher" do
+    issue.publisher.should eq "Johns Hopkins University Press"
+  end
+  it "should parse the journal title" do
+    issue.journal_title.should eq "portal: Libraries and the Academy"
+  end
+
+  it "should parse non-numbered issues" do
+    WebMock.stub(:get, "https://muse.jhu.edu/issue/35852")
+      .to_return(body: File.new("spec/fixtures/issue-35852.html").gets_to_end)
+    issue = Muse::Dl::Issue.new "35852"
+    issue.parse
+
+    issue.volume.should eq "1"
+    issue.number.should eq "2"
+    issue.date.should eq "2016"
+
+    issue.info["ISSN"].should eq "2474-9419"
+    issue.info["Print ISSN"].should eq "2474-9427"
+    issue.info["Launched on MUSE"].should eq "2017-02-21"
+    issue.info["Open Access"].should eq "Yes"
+    issue.title.should eq "Volume 1, Issue 2, 2016"
+    issue.journal_title.should eq "Constitutional Studies"
+
+    expected_pages = [
+      [1, 22],
+      [23, 40],
+      [41, 58],
+      [59, 80],
+      [81, 95],
+      [97, 116],
+    ]
+
+    expected_titles = [
+      "The Limits of Veneration: Public Support for a New Constitutional Convention",
+      "Secession and Nullification as a Global Trend",
+      "Challenging Constitutionalism in Post-Apartheid South Africa",
+      "Democracy by Lawsuit: Or, Can Litigation Alleviate the European Union’s “Democratic Deficit?”",
+      "Private Enforcement of Constitutional Guarantees in the Ku Klux Act of 1871",
+      "Sober Second Thoughts: Evaluating the History of Horizontal Judicial Review by the U.S. Supreme Court",
+    ]
+
+    issue.articles.each_with_index do |a, i|
+      a.start_page.should eq expected_pages[i][0]
+      a.end_page.should eq expected_pages[i][1]
+      a.title.should eq expected_titles[i]
+    end
+  end
+end
--- a/spec/journal_spec.cr
+++ b/spec/journal_spec.cr
@@ -0,0 +1,28 @@
+require "./spec_helper"
+
+describe Muse::Dl::Journal do
+  html = File.new("spec/fixtures/journal-159.html").gets_to_end
+  j = Muse::Dl::Journal.new html
+
+  it "should parse the infobox for 159" do
+    j.info["ISSN"].should eq "1530-7131"
+    j.info["Print ISSN"].should eq "1531-2542"
+    j.info["Coverage Statement"].should eq "Vol. 1 (2001) through current issue"
+    j.info["Open Access"].should eq "No"
+  end
+
+  it "should parse summary" do
+    j.summary.should eq <<-EOT
+    Focusing on important research about the role of academic libraries and librarianship, portal also features commentary on issues in technology and publishing. Written for all those interested in the role of libraries within the academy, portal includes peer-reviewed articles addressing subjects such as library administration, information technology, and information policy. In its inaugural year, portal earned recognition as the runner-up for best new journal, awarded by the Council of Editors of Learned Journals (CELJ). An article in  portal, "Master's and Doctoral Thesis Citations: Analysis and Trends of a Longitudinal Study," won the Jesse H. Shera Award for Distinguished Published Research from the Library Research Round Table of the American Library Association.
+    EOT
+  end
+
+  it "should parse publisher" do
+    j.publisher.should eq "Johns Hopkins University Press"
+  end
+
+  it "should return issues" do
+    j.issues[0].id.should eq "41793"
+    j.issues[-1].id.should eq "1578"
+  end
+end
--- a/spec/parser_spec.cr
+++ b/spec/parser_spec.cr
@@ -13,7 +13,6 @@ describe Muse::Dl::Parser do
    parser = Muse::Dl::Parser.new(["https://muse.jhu.edu/book/68534"])
    parser.bookmarks.should eq true
    parser.cleanup.should eq true
-    parser.tmp.should eq "/tmp"
    parser.output.should eq "tempfilename.pdf"
    parser.url.should eq "https://muse.jhu.edu/book/68534"
  end
--- a/spec/util_spec.cr
+++ b/spec/util_spec.cr
@@ -0,0 +1,9 @@
+require "../src/util"
+require "./spec_helper"
+
+describe Muse::Dl::Util do
+  it "should sanitize filenames properly" do
+    fn = Muse::Dl::Util.slug_filename("Hello world - \" :A$3, a story; a poem|chapter")
+    fn.should eq "Hello world - - -A-3, a story- a poem-chapter"
+  end
+end
--- a/src/article.cr
+++ b/src/article.cr
@@ -0,0 +1,19 @@
+require "./infoparser.cr"
+require "./issue.cr"
+
+module Muse::Dl
+  class Article
+    getter id : String, :start_page, :end_page, :title
+    setter title : String | Nil, start_page : Int32 | Nil, end_page : Int32 | Nil
+
+    def initialize(id : String)
+      @id = id
+      @url = "https://muse.jhu.edu/article/#{id}"
+    end
+
+    # TODO: Fix this
+    def open_access
+      return false
+    end
+  end
+end
--- a/src/errors/download_error.cr
+++ b/src/errors/download_error.cr
@@ -0,0 +1,4 @@
+module Muse::Dl::Errors
+  class DownloadError < Exception
+  end
+end
--- a/src/errors/missing_chapter.cr
+++ b/src/errors/missing_chapter.cr
@@ -1,4 +0,0 @@
-module Muse::Dl::Errors
-  class MissingChapter < Exception
-  end
-end
--- a/src/errors/missing_file.cr
+++ b/src/errors/missing_file.cr
@@ -0,0 +1,4 @@
+module Muse::Dl::Errors
+  class MissingFile < Exception
+  end
+end
--- a/src/errors/pdf_operation_error.cr
+++ b/src/errors/pdf_operation_error.cr
@@ -0,0 +1,4 @@
+module Muse::Dl::Errors
+  class PDFOperationError < Exception
+  end
+end
--- a/src/fetch.cr
+++ b/src/fetch.cr
@ -4,7 +4,8 @@ require "myhtml"

 module Muse::Dl
  class Fetch
-    USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
+    USER_AGENT            = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
+    DOWNLOAD_TIMEOUT_SECS = 60

    HEADERS = {
      "User-Agent"      => USER_AGENT,
@ -13,6 +14,10 @@ module Muse::Dl
      "Connection"      => "keep-alive",
    }

+    def self.article_file_name(id : String, tmp_path : String)
+      "#{tmp_path}/article-#{id}.pdf"
+    end
+
    def self.chapter_file_name(id : String, tmp_path : String)
      "#{tmp_path}/chapter-#{id}.pdf"
    end
@ -22,9 +27,83 @@ module Muse::Dl
      File.delete(fns) if File.exists?(fns)
    end

-    def self.save_chapter(tmp_path : String, chapter_id : String, chapter_title : String, cookie : String | Nil = nil, add_bookmark = true)
+    def self.cleanup_articles(tmp_path : String, id : String)
+      fns = article_file_name(id, tmp_path)
+      File.delete(fns) if File.exists?(fns)
+    end
+
+    def self.save_url(url : String, referer : String, file_name : String, tmp_path : String, cookie : String | Nil = nil, bookmark_title : String | Nil = nil, strip_first_page = true)
+      tmp_pdf_file = "#{file_name}.tmp"
+      if File.exists? file_name
+        puts "#{file_name} already downloaded"
+        return
+      end
+
+      uri = URI.parse(url)
+      http_client = HTTP::Client.new(uri)
+      # Raise an IO::TimeoutError after DOWNLOAD_TIMEOUT_SECS (60) seconds.
+      http_client.read_timeout = DOWNLOAD_TIMEOUT_SECS
+
+      headers = HEADERS.merge({
+        "Referer" => referer,
+      })
+
+      if cookie
+        headers["Cookie"] = cookie
+      end
+
+      request = Crest::Request.new(:get, url, headers: headers, max_redirects: 0, handle_errors: false)
+
+      begin
+        response = request.execute
+      rescue ex : IO::TimeoutError
+        raise Muse::Dl::Errors::DownloadError.new("Error downloading #{url}. Download took longer than #{DOWNLOAD_TIMEOUT_SECS} seconds.")
+      end
+
+      # TODO: Add validation for the downloaded file (should be PDF)
+      if !response.success?
+        raise Muse::Dl::Errors::DownloadError.new("Error downloading chapter. HTTP response code: #{response.status}")
+      end
+
+      content_type = response.headers["Content-Type"]
+      if content_type.is_a? String
+        if /html/.match content_type
+          response.body.each_line do |line|
+            # https://muse.jhu.edu/chapter/2383438/pdf
+            # https://muse.jhu.edu/book/67393
+            # Errors are Unable to determine page runs / Unable to construct chapter PDF
+            if /Unable to/.match line
+              raise Muse::Dl::Errors::MuseCorruptPDF.new("Error: MUSE is unable to generate PDF for #{url}")
+            end
+            if /Your IP has requested/.match line
+              raise Muse::Dl::Errors::DownloadError.new("Error: MUSE Rate-limit reached")
+            end
+          end
+        end
+      end
+
+      File.open(tmp_pdf_file, "w") do |file|
+        file << response.body
+        if file.size == 0
+          raise Muse::Dl::Errors::DownloadError.new("Error: downloaded file size is zero. Response Content-Length header was #{response.headers["Content-Length"]?}")
+        end
+      end
+
+      pdftk = Muse::Dl::Pdftk.new tmp_path
+
+      pdftk.strip_first_page tmp_pdf_file if strip_first_page
+
+      if bookmark_title
+        # Run pdftk and add the bookmark to the file
+        pdftk.add_bookmark tmp_pdf_file, bookmark_title
+      end
+
+      # Now we can move the file to the proper PDF filename
+      File.rename tmp_pdf_file, file_name
+    end
+
+    def self.save_chapter(tmp_path : String, chapter_id : String, chapter_title : String, cookie : String | Nil = nil, add_bookmark = true, strip_first_page = true)
      final_pdf_file = chapter_file_name chapter_id, tmp_path
-      tmp_pdf_file = "#{final_pdf_file}.tmp"

      if File.exists? final_pdf_file
        puts "#{chapter_id} already downloaded"
@ -33,49 +112,22 @@ module Muse::Dl

      # TODO: Remove this hardcoding, and make this more generic by generating it within the Book class
      url = "https://muse.jhu.edu/chapter/#{chapter_id}/pdf"
-      headers = HEADERS.merge({
-        "Referer" => "https://muse.jhu.edu/verify?url=%2Fchapter%2F#{chapter_id}%2Fpdf",
-      })
+      referer = "https://muse.jhu.edu/verify?url=%2Fchapter%2F#{chapter_id}%2Fpdf"

-      if cookie
-        headers["Cookie"] = cookie
-      end
+      save_url(url, referer, final_pdf_file, tmp_path, cookie, chapter_title, strip_first_page)

-      # TODO: Add validation for the downloaded file (should be PDF)
-      Crest.get(url, max_redirects: 0, handle_errors: false, headers: headers) do |response|
-        # puts response.headers["Content-Type"]
-        content_type = response.headers["Content-Type"]
-        if content_type.is_a? String
-          if /html/.match content_type
-            puts response
-            response.body_io.each_line do |line|
-              if /Unable to construct chapter PDF/.match line
-                raise Muse::Dl::Errors::MuseCorruptPDF.new
-              end
-            end
-          end
-        end
-        File.open(tmp_pdf_file, "w") do |file|
-          IO.copy(response.body_io, file)
-        end
-      end
-
-      pdftk = Muse::Dl::Pdftk.new tmp_path
-
-      pdftk.strip_first_page tmp_pdf_file
-
-      if add_bookmark
-        # Run pdftk and add the bookmark to the file
-        pdftk.add_bookmark tmp_pdf_file, chapter_title
-      end
-
-      # Now we can move the file to the proper PDF filename
-      File.rename tmp_pdf_file, final_pdf_file
      puts "Downloaded #{chapter_id}"
    end

-    def self.get_info(url : String) : Muse::Dl::Thing | Nil
-      match = /https:\/\/muse.jhu.edu\/(book|journal)\/(\d+)/.match url
+    def self.save_article(tmp_path : String, article_id : String, cookie : String | Nil = nil, article_title = nil, strip_first_page = true)
+      file_name = article_file_name article_id, tmp_path
+      url = "https://muse.jhu.edu/article/#{article_id}/pdf"
+      referer = "https://muse.jhu.edu/article/#{article_id}"
+      save_url(url, referer, file_name, tmp_path, cookie, article_title, strip_first_page)
+    end
+
+    def self.get_info(url : String)
+      match = /https:\/\/muse.jhu.edu\/(book|journal|issue|article)\/(\d+)/.match url
      if match
        begin
          response = Crest.get(url).to_s
@ -84,12 +136,16 @@ module Muse::Dl
            return Muse::Dl::Book.new response
          when "journal"
            return Muse::Dl::Journal.new response
+          when "issue"
+            return Muse::Dl::Issue.new match[2], response
+          when "article"
+            return Muse::Dl::Article.new match[2]
          end
        rescue ex : Crest::NotFound
-          raise Muse::Dl::Errors::InvalidLink.new
+          raise Muse::Dl::Errors::InvalidLink.new("Error - could not download url: #{url}")
        end
      else
-        raise Muse::Dl::Errors::InvalidLink.new
+        raise Muse::Dl::Errors::InvalidLink.new("Error - url does not match expected pattern: #{url}")
      end
    end
  end
--- a/src/infoparser.cr
+++ b/src/infoparser.cr
@ -34,6 +34,18 @@ module Muse::Dl
      myhtml.css("#book_about_info .title").map(&.inner_text).to_a[0].strip
    end

+    def self.issue_title(myhtml : Myhtml::Parser)
+      begin
+        myhtml.css(".card_text .title").map(&.inner_text).to_a[0].strip
+      rescue
+        nil
+      end
+    end
+
+    def self.journal_title(myhtml : Myhtml::Parser)
+      myhtml.css("#journal_about_info .title").map(&.inner_text).to_a[0].strip
+    end
+
    def self.author(myhtml : Myhtml::Parser)
      myhtml.css("#book_about_info .author").map(&.inner_text).to_a[0].strip.gsub("<BR>", ", ").gsub("\n", " ")
    end
@ -50,9 +62,13 @@ module Muse::Dl
      myhtml.css("#book_about_info .pub a").map(&.inner_text).to_a[0].strip
    end

+    def self.journal_publisher(myhtml : Myhtml::Parser)
+      myhtml.css(".card_publisher a").map(&.inner_text).to_a[0].strip
+    end
+
    def self.summary(myhtml : Myhtml::Parser)
      begin
-        return myhtml.css("#book_about_info .card_summary").map(&.inner_text).to_a[0].strip
+        return myhtml.css(".card_summary").map(&.inner_text).to_a[0].strip
      rescue e : Exception
        STDERR.puts "Could not fetch summary"
        return "NA"
--- a/src/issue.cr
+++ b/src/issue.cr
@ -0,0 +1,97 @@
+require "./thing.cr"
+require "./fetch.cr"
+require "./article.cr"
+
+module Muse::Dl
+  class Issue
+    getter id : String,
+      title : String | Nil,
+      articles : Array(Muse::Dl::Article),
+      url : String,
+      summary : String | Nil,
+      publisher : String | Nil,
+      info : Hash(String, String),
+      volume : String | Nil,
+      number : String | Nil,
+      date : String | Nil,
+      journal_title : String | Nil
+
+    setter :journal_title
+
+    def initialize(id : String, response : String | Nil = nil)
+      @id = id
+      @url = "https://muse.jhu.edu/issue/#{id}"
+      @articles = [] of Muse::Dl::Article
+      @info = Hash(String, String).new
+      parse(response) if response
+    end
+
+    def open_access
+      if @info.has_key? "Open Access"
+        return @info["Open Access"] == "Yes"
+      end
+      false
+    end
+
+    def parse
+      html = Crest.get(@url).to_s
+      parse(html)
+    end
+
+    def parse(html : String)
+      h = Myhtml::Parser.new html
+      @info = InfoParser.infobox(h)
+      @title = InfoParser.issue_title(h)
+      @summary = InfoParser.summary(h)
+      @publisher = InfoParser.journal_publisher(h)
+      parse_title
+      parse_contents(h)
+    end
+
+    def parse_title
+      t = @title
+      unless t.nil?
+        @volume = /Volume (\d+)/.match(t).try &.[1]
+        @number = /Number (\d+)/.match(t).try &.[1]
+        @number = /Issue (\d+)/.match(t).try &.[1] unless @number
+        @date = /((January|February|March|April|May|June|July|August|September|October|November|December|Spring|Winter|Fall|Summer) (\d+))/.match(t).try &.[1]
+        @date = /(\d{4})/.match(t).try &.[1] unless @date
+      end
+    end
+
+    def parse_contents(myhtml : Myhtml::Parser)
+      unless @journal_title
+        journal_title_a = myhtml.css("#journal_banner_title a").first
+        if journal_title_a
+          @journal_title = journal_title_a.inner_text
+        end
+      end
+      myhtml.css(".articles_list_text ol").each do |ol|
+        link = ol.css("li.title a").first
+        title = link.inner_text
+
+        pages = ol.css("li.pg")
+        if pages.size > 0
+          p = pages.first.try &.inner_text
+          matches = /(\d+)-(\d+)/.match p
+          if matches
+            start_page = matches[1].to_i
+            end_page = matches[2].to_i
+          end
+        end
+
+        ol.css("a").each do |l|
+          url = l.attribute_by("href").to_s
+          matches = /\/article\/(\d+)\/pdf/.match url
+          if matches
+            a = Muse::Dl::Article.new matches[1]
+            a.title = title
+            a.start_page = start_page if start_page
+            a.end_page = end_page if end_page
+            @articles.push a
+          end
+        end
+      end
+    end
+  end
+end
--- a/src/journal.cr
+++ b/src/journal.cr
@ -1,6 +1,44 @@
-require "./thing.cr"
+require "./infoparser.cr"
+require "./issue.cr"

 module Muse::Dl
-  class Journal < Muse::Dl::Thing
+  class Journal
+    getter :info, :summary, :publisher, :issues, :title
+    @info = Hash(String, String).new
+    @summary : String
+    @publisher : String
+    @issues = [] of Muse::Dl::Issue
+    @title : String
+
+    private getter :h
+
+    def initialize(html)
+      @h = Myhtml::Parser.new html
+      @info = InfoParser.infobox(h)
+      @summary = InfoParser.summary(h)
+      @publisher = InfoParser.journal_publisher(h)
+      @title = InfoParser.journal_title(h)
+      parse_volumes(h)
+    end
+
+    def open_access
+      if @info.has_key? "Open Access"
+        return @info["Open Access"] == "Yes"
+      end
+      false
+    end
+
+    def parse_volumes(myhtml : Myhtml::Parser)
+      myhtml.css("#available_issues_list_text a").each do |a|
+        link = a.attribute_by("href").to_s
+
+        matches = /\/issue\/(\d+)/.match link
+        if matches
+          issue = Muse::Dl::Issue.new matches[1]
+          issue.journal_title = @title
+          @issues.push issue
+        end
+      end
+    end
  end
 end
--- a/src/muse-dl.cr
+++ b/src/muse-dl.cr
@ -4,16 +4,23 @@ require "./fetch.cr"
 require "./book.cr"
 require "./journal.cr"
 require "./util.cr"
+require "file_utils"

 module Muse::Dl
-  VERSION = "1.1.0"
+  VERSION = "1.3.1"

  class Main
    def self.dl(parser : Parser)
      url = parser.url
+      puts "Downloading #{url}"
      thing = Fetch.get_info(url) if url
      return unless thing

+      if (thing.open_access) && (parser.skip_oa)
+        STDERR.puts "Skipping #{url}, available under Open Access"
+        return
+      end
+
      if thing.is_a? Muse::Dl::Book
        unless thing.formats.includes? :pdf
          STDERR.puts "Book not available in PDF format, skipping: #{url}"
@ -24,34 +31,30 @@ module Muse::Dl

        # If file exists and we can't clobber
        if File.exists?(parser.output) && parser.clobber == false
-          STDERR.puts "File already exists: #{parser.output}"
+          STDERR.puts "Skipping #{url}, File already exists: #{parser.output}"
          return
        end
        temp_stitched_file = nil
        pdf_builder = Pdftk.new(parser.tmp)

-        unless parser.input_pdf
-          # Save each chapter
-          thing.chapters.each do |chapter|
-            begin
-              Fetch.save_chapter(parser.tmp, chapter[0], chapter[1], parser.cookie, parser.bookmarks)
-            rescue e : Muse::Dl::Errors::MuseCorruptPDF
-              STDERR.puts "Got a 'Unable to construct chapter PDF' error from MUSE, skipping: #{url}"
-              return
-            end
+        # Save each chapter
+        thing.chapters.each do |chapter|
+          begin
+            Fetch.save_chapter(parser.tmp, chapter[0], chapter[1], parser.cookie, parser.bookmarks, parser.strip_first)
+          rescue e : Muse::Dl::Errors::MuseCorruptPDF
+            STDERR.puts "Got a 'Unable to construct chapter PDF' error from MUSE, skipping: #{url}"
+            return
          end
-          chapter_ids = thing.chapters.map { |c| c[0] }
-
-          # Stitch the PDFs together
-          temp_stitched_file = pdf_builder.stitch chapter_ids
-          pdf_builder.add_metadata(temp_stitched_file, parser.output, thing)
-        else
-          x = parser.input_pdf
-          pdf_builder.add_metadata(File.open(x), parser.output, thing) if x
        end
+        chapter_ids = thing.chapters.map { |c| c[0] }
+
+        # Stitch the PDFs together
+        temp_stitched_file = pdf_builder.stitch chapter_ids
+        pdf_builder.add_metadata(temp_stitched_file, parser.output, thing)

        temp_stitched_file.delete if temp_stitched_file
-        puts "Saved final output to #{parser.output}"
+        puts "--dont-strip-first-page was on. Please validate PDF file for any errors." unless parser.strip_first
+        puts "DL: #{url}. Saved final output to #{parser.output}"

        # Cleanup the chapter files
        if parser.cleanup
@ -59,20 +62,97 @@ module Muse::Dl
            Fetch.cleanup(parser.tmp, c[0])
          end
        end
+      elsif thing.is_a? Muse::Dl::Article
+        # No bookmarks are needed since this is just a single article PDF
+        begin
+          Fetch.save_article(parser.tmp, thing.id, parser.cookie, nil, parser.strip_first)
+        rescue e : Muse::Dl::Errors::MuseCorruptPDF
+          STDERR.puts "Got a 'Unable to construct chapter PDF' error from MUSE, skipping: #{url}"
+          return
+        end
+
+        # TODO: Move this code elsewhere
+        source = Fetch.article_file_name(thing.id, parser.tmp)
+        destination = "article-#{thing.id}.pdf"
+        # Needed because of https://github.com/crystal-lang/crystal/issues/7777
+        FileUtils.cp source, destination
+        FileUtils.rm source if parser.cleanup
+      elsif thing.is_a? Muse::Dl::Issue
+        # Will have no effect if parser has a custom title
+        parser.force_set_output Util.slug_filename "#{thing.journal_title} - #{thing.title}.pdf"
+
+        # If file exists and we can't clobber
+        if File.exists?(parser.output) && parser.clobber == false
+          STDERR.puts "Skipping #{url}, File already exists: #{parser.output}"
+          return
+        end
+        temp_stitched_file = nil
+        pdf_builder = Pdftk.new(parser.tmp)
+
+        thing.articles.each do |article|
+          begin
+            Fetch.save_article(parser.tmp, article.id, parser.cookie, article.title, parser.strip_first)
+          rescue e : Muse::Dl::Errors::MuseCorruptPDF
+            STDERR.puts "Got a 'Unable to construct chapter PDF' error from MUSE, skipping: #{url}"
+            return
+          end
+        end
+        article_ids = thing.articles.map { |a| a.id }
+
+        # Stitch the PDFs together
+        temp_stitched_file = pdf_builder.stitch_articles article_ids
+        pdf_builder.add_metadata(temp_stitched_file, parser.output, thing)
+
+        # temp_stitched_file.delete if temp_stitched_file
+        puts "--dont-strip-first-page was on. Please validate PDF file for any errors." unless parser.strip_first
+        puts "DL: #{url}. Saved final output to #{parser.output}"
+
+        # Cleanup the issue files
+        if parser.cleanup
+          thing.articles.each do |a|
+            Fetch.cleanup_articles(parser.tmp, a.id)
+          end
+        end
+      elsif thing.is_a? Muse::Dl::Journal
+        thing.issues.each do |issue|
+          begin
+            # Update the issue
+            issue.parse
+            parser.url = issue.url
+            Main.dl parser
+          rescue e
+            puts e.message
+            puts "Faced an exception with previous issue, continuing"
+          end
+        end
      end
    end

    def self.run(args : Array(String))
      parser = Parser.new(args)

+      delay_secs = 1
      input_list = parser.input_list
      if input_list
        File.each_line input_list do |url|
-          # TODO: Change this to nil
-          parser.reset_output_file
-          parser.url = url.strip
-          # Ask the download process to not quit the process, and return instead
-          Main.dl parser
+          begin
+            # TODO: Change this to nil
+            parser.reset_output_file
+            parser.url = url.strip
+            # Ask the download process to not quit the process, and return instead
+            Main.dl parser
+            if delay_secs >= 2
+              delay_secs //= 2
+            end
+          rescue ex
+            puts ex.message
+            puts ex.backtrace.join("\n    ")
+            puts "Error. Skipping book: #{url}. Waiting for #{delay_secs} seconds before continuing."
+            sleep(delay_secs)
+            if delay_secs < 256
+              delay_secs *= 2
+            end
+          end
        end
      elsif parser.url
        Main.dl parser
--- a/src/parser.cr
+++ b/src/parser.cr
@ -6,16 +6,19 @@ module Muse::Dl
    @bookmarks = true
    @tmp : String
    @cleanup = true
+    # Whether to strip the first page
+    @strip_first = true
    @output = DEFAULT_FILE_NAME
    @url : String | Nil
-    @input_pdf : String | Nil
    @clobber = false
    @input_list : String | Nil
    @cookie : String | Nil
+    @h : Bool | Nil
+    @skip_oa = false

    DEFAULT_FILE_NAME = "tempfilename.pdf"

-    getter :bookmarks, :tmp, :cleanup, :output, :url, :input_pdf, :clobber, :input_list, :cookie
+    getter :bookmarks, :tmp, :cleanup, :output, :url, :clobber, :input_list, :cookie, :strip_first, :skip_oa
    setter :url

    # Update the output filename unless we have a custom one passed
@ -23,6 +26,10 @@ module Muse::Dl
      @output = output_file unless @output != DEFAULT_FILE_NAME
    end

+    def force_set_output(output_file : String)
+      @output = output_file
+    end
+
    def reset_output_file
      @output = DEFAULT_FILE_NAME
    end
@ -38,7 +45,6 @@ module Muse::Dl

    def initialize(arg : Array(String) = [] of String)
      @tmp = Dir.tempdir
-      @input_pdf = nil

      parser = OptionParser.new
      parser.banner = <<-EOT
@ -48,23 +54,25 @@ module Muse::Dl
      INPUT_FILE: Path to a file containing a list of links

      EOT
+
      parser.on(long_flag = "--no-cleanup", description = "Don't cleanup temporary files") { @cleanup = false }
      parser.on(long_flag = "--tmp-dir PATH", description = "Temporary Directory to use") { |path| @tmp = path }
      parser.on(long_flag = "--output FILE", description = "Output Filename") { |file| @output = file }
      parser.on(long_flag = "--no-bookmarks", description = "Don't add bookmarks in the PDF") { @bookmarks = false }
-      parser.on(long_flag = "--input-pdf INPUT", description = "Input Stitched PDF. Will not download anything") { |input| @input_pdf = input }
-      parser.on(long_flag = "--clobber", description = "Overwrite the output file, if it already exists. Not compatible with input-pdf") { @clobber = true }
+      parser.on(long_flag = "--clobber", description = "Overwrite the output file, if it already exists.") { @clobber = true }
+      parser.on(long_flag = "--dont-strip-first-page", description = "Disables first page from being stripped. Use carefully") { @strip_first = false }
      parser.on(long_flag = "--cookie COOKIE", description = "Cookie-header") { |cookie| @cookie = cookie }
-      parser.on("-h", "--help", "Show this help") { puts parser }
+      parser.on(long_flag = "--skip-open-access", description = "Don't download open access content") { @skip_oa = true }
+      parser.on("-h", "--help", "Show this help") { @h = true; puts parser }

      parser.unknown_args do |args|
        if args.size != 1
-          puts parser
+          # Prevent showing helptext twice
+          puts parser unless @h
          exit 1
        end
        if File.exists? args[0]
          @input_list = args[0]
-          @input_pdf = nil
        else
          @url = args[0]
        end
--- a/src/pdftk.cr
+++ b/src/pdftk.cr
@ -28,14 +28,23 @@ module Muse::Dl
    def execute(args : Array(String))
      binary = @binary
      if binary
-        Process.run(binary, args)
+        status = Process.run(binary, args, output: STDOUT, error: STDERR)
+        if !status.success?
+          puts "pdftk command failed: #{binary} #{args.join(" ")}"
+        end
+        return status.success?
      end
    end

    def strip_first_page(input_file : String)
      output_pdf = File.tempfile("muse-dl-temp", ".pdf")
-      execute [input_file, "cat", "2-end", "output", output_pdf.path]
-      File.rename output_pdf.path, input_file
+      is_success = execute [input_file, "cat", "2-end", "output", output_pdf.path]
+      if is_success
+        File.rename output_pdf.path, input_file
+      else
+        puts ("Error stripping first page of chapter. Maybe try using --dont-strip-first-page")
+        exit 1
+      end
    end

    def add_bookmark(input_file : String, title : String)
@ -48,16 +57,19 @@ module Muse::Dl
      BookmarkPageNumber: 1
      END
      File.write(bookmark_text_file.path, bookmark_text)
-      execute [input_file, "update_info", bookmark_text_file.path, "output", output_pdf.path]
+      is_success = execute [input_file, "update_info", bookmark_text_file.path, "output", output_pdf.path]

      # Cleanup
      bookmark_text_file.delete
-      File.rename output_pdf.path, input_file
+      if is_success
+        File.rename output_pdf.path, input_file
+      else
+        raise Muse::Dl::Errors::PDFOperationError.new("Error adding bookmark metadata to chapter.")
+      end
    end

    def add_metadata(input_file : File, output_file : String, book : Book)
      # First we have to dump the current metadata
-      metadata_text_file = File.tempfile("muse-dl-metadata-tmp", ".txt")
      keywords = "Publisher:#{book.publisher}, Published:#{book.date}"

      # Known Info keys, if they are present
@ -67,43 +79,94 @@ module Muse::Dl
        end
      end

-      text = <<-EOT
+      metadata_text = gen_metadata(book.title, keywords, book.summary.gsub(/\n\s+/, " "), book.author)
+      write_metadata(input_file, output_file, metadata_text)
+    end
+
+    def gen_metadata(title : String, keywords : String, subject : String, author : String | Nil = nil)
+      metadata = <<-EOT
      InfoBegin
      InfoKey: Creator
-      InfoValue: Project MUSE (https://muse.jhu.edu/)
+      InfoValue:
      InfoBegin
      InfoKey: Producer
-      InfoValue: Muse-DL/#{Muse::Dl::VERSION}
+      InfoValue:
      InfoBegin
      InfoKey: Title
-      InfoValue: #{book.title}
+      InfoValue: #{title}
      InfoBegin
      InfoKey: Keywords
      InfoValue: #{keywords}
      InfoBegin
-      InfoKey: Author
-      InfoValue: #{book.author}
-      InfoBegin
      InfoKey: Subject
-      InfoValue: #{book.summary.gsub(/\n\s+/, " ")}
+      InfoValue: #{subject}
      InfoBegin
      InfoKey: ModDate
      InfoValue:
      InfoBegin
      InfoKey: CreationDate
      InfoValue:
+
      EOT

+      unless author.nil?
+        metadata += <<-EOT
+        InfoBegin
+        InfoKey: Author
+        InfoValue: #{author}
+        EOT
+      end
+
+      return metadata
+    end
+
+    def write_metadata(input_file : File, output_file : String, text)
+      metadata_text_file = File.tempfile("muse-dl-metadata-tmp", ".txt")
      File.write(metadata_text_file.path, text)
-      execute [input_file.path, "update_info_utf8", metadata_text_file.path, "output", output_file]
+
+      is_success = execute [input_file.path, "update_info_utf8", metadata_text_file.path, "output", output_file]
+      if !is_success
+        raise Muse::Dl::Errors::PDFOperationError.new("Error adding metadata to book.")
+      end
      metadata_text_file.delete
    end

+    def add_metadata(input_file : File, output_file : String, issue : Issue)
+      # Build the keyword list from the issue's known metadata.
+      # No temp file is needed here: write_metadata creates (and deletes) its own.
+      keywords = "Journal:#{issue.journal_title}, Published:#{issue.date},Volume:#{issue.volume},Number:#{issue.number}"
+      ["ISSN", "Print ISSN", "DOI", "Language", "Open Access"].each do |label|
+        if issue.info.has_key? label
+          keywords += ", #{label}:#{issue.info[label]}"
+        end
+      end
+
+      # TODO: Move this to Issue class
+
+      s = issue.summary
+      unless s.nil?
+        summary = s.gsub(/\n\s+/, " ")
+      else
+        summary = "NA"
+      end
+
+      t = issue.title
+
+      unless t.nil?
+        title = t
+      else
+        title = "NA"
+      end
+      # TODO: Add support for all authors in the PDF
+      metadata = gen_metadata(title, keywords, summary)
+      write_metadata(input_file, output_file, metadata)
+    end
+
    def stitch(chapter_ids : Array(String))
      output_file = File.tempfile("muse-dl-stitched-tmp", ".pdf")
      # Do some sanity checks on each Chapter PDF
      chapter_ids.each do |id|
-        raise Muse::Dl::Errors::MissingChapter.new unless File.exists? Fetch.chapter_file_name(id, @tmp_file_path)
+        raise Muse::Dl::Errors::MissingFile.new unless File.exists? Fetch.chapter_file_name(id, @tmp_file_path)
        raise Muse::Dl::Errors::CorruptFile.new unless File.size(Fetch.chapter_file_name(id, @tmp_file_path)) > 0
      end

@ -111,9 +174,35 @@ module Muse::Dl

      chapter_files = chapter_ids.map { |id| Fetch.chapter_file_name(id, @tmp_file_path) }
      args = chapter_files + ["cat", "output", output_file.path]
-      execute args
+      is_success = execute args

      # TODO: Validate final file here
+      if !is_success
+        raise Muse::Dl::Errors::PDFOperationError.new("Error stitching chapters together.")
+      end
+
+      return output_file
+    end
+
+    # TODO: Merge with stitch
+    def stitch_articles(article_ids : Array(String))
+      output_file = File.tempfile("muse-dl-stitched-tmp", ".pdf")
+      # Do some sanity checks on each Article PDF
+      article_ids.each do |id|
+        raise Muse::Dl::Errors::MissingFile.new unless File.exists? Fetch.article_file_name(id, @tmp_file_path)
+        raise Muse::Dl::Errors::CorruptFile.new unless File.size(Fetch.article_file_name(id, @tmp_file_path)) > 0
+      end
+
+      # Now let's stitch them together
+      article_files = article_ids.map { |id| Fetch.article_file_name(id, @tmp_file_path) }
+      args = article_files + ["cat", "output", output_file.path]
+      is_success = execute args
+
+      # TODO: Validate final file here
+      if !is_success
+        puts args
+        raise Muse::Dl::Errors::PDFOperationError.new("Error stitching articles together.")
+      end

      return output_file
    end
--- a/src/thing.cr
+++ b/src/thing.cr
@ -19,6 +19,13 @@ module Muse::Dl

    private getter :h

+    def open_access
+      if @info.has_key? "Open Access"
+        return @info["Open Access"] == "Yes"
+      end
+      false
+    end
+
    def initialize(html : String)
      @h = Myhtml::Parser.new html
      @info = InfoParser.infobox(h)
--- a/src/util.cr
+++ b/src/util.cr
@ -2,7 +2,7 @@ module Muse::Dl
  class Util
    # Generates a safe filename
    def self.slug_filename(input : String)
-      input.strip.tr("\u{202E}%$|:;/\t\r\n\\", "-")
+      input.strip.tr("\u{202E}%$|:;/\"\t\r\n\\", "-")
    end
  end
 end
Author	SHA1	Message	Date
Nemo	2a35f3c68c	[dep] Crystal Upgrade to 1.10.1 The Crystal Debian repo has moved, so we shift as well. Debian 10 is still supported, so use it for now	2023-10-31 23:44:52 +05:30
Nemo	dc43331609	[dep] Dependency Upgrade Tested against 1.10.1	2023-10-31 23:42:35 +05:30
Nemo	24f4bb10c8	Create FUNDING.yml	2022-05-30 14:50:06 +05:30
Nemo	5fd0056d77	Dependency and version bump	2021-06-04 13:56:51 +05:30
Nemo	1e57857a4e	Version Bump (1.3.0)	2020-07-01 18:29:44 +05:30
Nemo	ba0a47038d	Remove input-pdf from README and help	2020-07-01 18:29:22 +05:30
Nemo	a4f5c03912	Merge pull request #8 from captn3m0/journal-support Adds Journal Support	2020-07-01 18:27:39 +05:30
Nemo	a05a1253db	Keep going with next issue	2020-07-01 18:26:48 +05:30
Nemo	03fccde754	Adds support for final journal downloads	2020-06-30 18:36:01 +05:30
Nemo	3a2d45fb6e	Adds a skip-open-access flag	2020-06-30 18:09:38 +05:30
Nemo	62e6a21c84	Finishes support for downloading complete issues	2020-06-30 17:36:44 +05:30
Nemo	38db0dd000	Adds tests for page detection	2020-06-30 16:50:49 +05:30
Nemo	919c8ac43f	Fixes parser for issue HTML This also adds .journal_title as an attribute to the Issue object	2020-06-30 15:19:12 +05:30
Nemo	870ed3080d	Modular code in fetch to support both chapters and articles	2020-06-30 14:47:51 +05:30
Nemo	f04e9b799e	Removes input_pdf and initial work on article download	2020-06-30 14:18:19 +05:30
Nemo	04a2fe52ec	Minor fixes, parse contents for issues	2020-06-30 14:08:28 +05:30
Nemo	aa392eaa64	Adds support for parsing title to volume/number/date of a journal issue	2020-06-16 19:27:11 +05:30
Nemo	c01e071328	[make] Adds tests to Makefile	2020-06-16 19:13:52 +05:30
Nemo	3e56efed52	Parses summary for issueS	2020-06-16 18:52:29 +05:30
Nemo	7b48731afe	Parse title and publisher for issues	2020-06-16 18:52:29 +05:30
Nemo	6b278531fd	Infobox is parsing for an issue now	2020-06-16 18:52:29 +05:30
Nemo	f11f64b9d5	Adds webmock	2020-06-16 18:52:29 +05:30
Nemo	ff225b12c6	Fix filenames with double-quotes	2020-06-16 18:52:29 +05:30
Nemo	4a358d0cb0	Journal parser now parses all issues	2020-06-16 18:52:29 +05:30
Nemo	d8702b2fcb	Initial work on parsing the journal page	2020-06-16 18:52:29 +05:30
Nemo	fcc4f0c48b	Clear out the Producer/Creator on the PDF	2020-06-16 18:52:28 +05:30
Nemo	a23bd52ffa	Fix Crystal and DL3008 issues	2020-05-14 03:40:42 +05:30
Nemo	3de4053037	[docker] Remove pinned versions	2020-05-14 01:31:38 +05:30
Nemo	487b222d79	Adds support for --dont-strip-first-page	2020-05-14 01:04:15 +05:30
Nemo	d245538e33	Version bump	2020-04-22 18:32:37 +05:30
Nemo	c3722430e1	Adds a check for rate-limit	2020-04-22 18:31:37 +05:30
Nemo	a2db89ddf7	[docs] Fix docker badges	2020-04-21 19:34:39 +05:30
Prad Nelluru	5e5158fe1c	Don't backoff for more than 256 seconds (~4 min) (#13)	2020-04-21 17:56:25 +05:30
Nemo	ebf1b57e22	Merge pull request #12 from pradn/better-errors: Improve error handling	2020-04-20 03:23:24 +05:30
Prad Nelluru	2206c41228	Use response.body, not response.body_io, which is nil when you pass in HTTPClient for some reason.	2020-04-19 17:50:06 -04:00
Prad Nelluru	4e435dd3ab	Add 60s timeout to downloads. Do backoff for all errors.	2020-04-19 17:44:21 -04:00
Prad Nelluru	9659c0ef5e	Trim chapter titles to ensure bookmarks are valid in PDF (#11)	2020-04-20 02:03:30 +05:30
Prad Nelluru	762164e223	more descriptive error messages	2020-04-19 15:18:05 -04:00
Prad Nelluru	77201bda85	Fix download issue - revert to using body_io	2020-04-19 15:00:59 -04:00
Prad Nelluru	db2d86c1a8	Also add exception message to top-level rescue	2020-04-19 14:49:41 -04:00
Prad Nelluru	1d2f53bad0	forgot to git-add new error files	2020-04-19 14:46:26 -04:00
Prad Nelluru	26d96d3f7d	Remove assert that temp path be tmp. It has been changed to an actual random temp path so we can't test for it easily.	2020-04-19 02:40:42 -04:00
Prad Nelluru	5d9d951c9a	Write backtrace in top-level rescue blocks.	2020-04-19 02:24:09 -04:00
Prad Nelluru	483f838d24	Report pdftk and download errors. Add exponential backoff to downloading after download failures. Add top-level rescue block to improve forward progress.	2020-04-19 01:58:20 -04:00
Nemo	d52b06377d	Version bump (1.1.2)	2020-04-05 18:58:28 +05:30
Nemo	b7aad7a3c2	Add link to download message	2020-04-05 18:58:02 +05:30
Nemo	380f1f03f8	Put URL when skipping a file	2020-04-05 18:57:24 +05:30
Nemo	61005ab405	fix docker image to edge	2020-04-05 04:41:49 +05:30
Nemo	5ce11df239	[docker] Install Make	2020-04-05 03:08:57 +05:30
Nemo	449be5e554	Version bump	2020-04-05 02:55:35 +05:30
Nemo	c08b8b7284	Show version in help	2020-04-05 02:55:19 +05:30
Nemo	1d95cce3f8	Catch another PDF error	2020-04-05 02:14:50 +05:30
Nemo	aec6d853b3	Use latest release tag in docs	2020-04-04 03:53:03 +05:30
Nemo	78043e81a2	[docs] Adds list of docker images	2020-04-04 03:51:18 +05:30