Create FUNDING.yml

Add README badges
update changelog
18 changed files with 307 additions and 57 deletions
--- a/.github/FUNDING.yml
+++ b/.github/FUNDING.yml
@@ -0,0 +1,3 @@
+ko_fi: captn3m0
+liberapay: captn3m0
+github: captn3m0
--- a/.github/workflows/tests.yml
+++ b/.github/workflows/tests.yml
@@ -0,0 +1,29 @@
+name: Run Tests
+on: push
+jobs:
+  python:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python: ["3.7", "3.8", "3.9", "3.10"]
+    env:
+      PYTHON_VERSION: ${{matrix.python}}
+    steps:
+      - uses: actions/checkout@v2
+      - name: Set up Python ${{matrix.python}}
+        uses: actions/setup-python@v2
+        with:
+          python-version: ${{matrix.python}}
+      - name: Install deps
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e .[testing]
+      - name: Run pytest
+        run: |
+          pytest --cache-clear --cov=./ --cov-report=xml --cov-report=html
+      - name: Upload coverage to Codecov
+        uses: codecov/codecov-action@v1
+        with:
+          token: ${{ secrets.CODECOV_TOKEN }}
+          files: ./coverage.xml
+          env_vars: RUNNER_OS,PYTHON_VERSION,CI,GITHUB_SHA,GITHUB_RUN_ID
--- a/CHANGELOG.rst
+++ b/CHANGELOG.rst
@@ -2,6 +2,20 @@
 Changelog
 =========

+Version 1.0.4
+=============
+- Switched from `html5` to `html5lib` as a dependency, since the former is unmaintained
+- Python 3.10 is now supported
+- Python 3.6 is no longer supported
+
+Version 1.0.3
+=============
+- Added tests and code coverage
+- PDFs can be directly fetched from Remote URLs
+- PDFs can be filtered to have start and end pages
+- Support for Python 3.6-3.8
+- Removed --cleanup argument, since cleanup is the default
+
 Version 1.0.2
 =============
 - Adds support for rotating PDFs
--- a/README.rst
+++ b/README.rst
@@ -2,8 +2,35 @@
 pystitcher
 ==========

+.. image:: https://img.shields.io/pypi/v/pystitcher
+    :target: https://pypi.org/project/pystitcher/
+    :alt: PyPI Version
+
+.. image:: https://img.shields.io/pypi/l/pystitcher
+    :target: LICENSE.txt
+    :alt: Repository License
+
+.. image:: https://img.shields.io/github/checks-status/captn3m0/pystitcher/main
+    :target: https://github.com/captn3m0/pystitcher/actions?query=branch%3Amain
+    :alt: GitHub branch checks status
+
+.. image:: https://img.shields.io/codecov/c/gh/captn3m0/pystitcher
+    :target: https://app.codecov.io/gh/captn3m0/pystitcher/
+    :alt: Codecov
+
+|
+
 pystitcher stitches your PDF files together, generating nice customizable bookmarks for you using a declarative input in the form of a markdown file. It is written in pure python and uses `PyPDF3 <https://pypi.org/project/PyPDF3/>`_ for reading and writing PDF files.

+Installation
+============
+
+You can install it easily using `pipx <https://pypa.github.io/pipx/>`_::
+
+	pipx install pystitcher
+	
+The Wiki has `Alternative Installation Instructions <https://github.com/captn3m0/pystitcher/wiki/Installation>`_.
+

 Description
 ===========
@@ -38,8 +65,8 @@ Given this input::

 	# The Bills

-	- [Personal Data Protection Bill, 2019](1.a.pdf)
-	- [Personal Data Protection Bill, 2018](1.b.pdf)
+	- [Personal Data Protection Bill, 2019](https://example.com/2019-bill.pdf)
+	- [Personal Data Protection Bill, 2018](https://example.com/2018-bill.pdf)

 	# Other key reading material

@@ -88,15 +115,29 @@ Configuration options can be specified with Meta data at the top of the file.
 |                     | for more details.                                                        |
 +---------------------+--------------------------------------------------------------------------+

-Additionally, PDF links specified in markdown can have attributes to alter the PDFs before merging::
+Additionally, PDF links specified in markdown can have attributes to alter the PDFs before merging. The below attribute will rotate the second PDF file by 90 degrees clockwise before merging::

 	[Part 1](1.pdf)
 	[Part 2](2.pdf){: rotate="90"}

-The above will rotate the second PDF file by 90 degrees clockwise before merging. List of attributes:
+And the below attribute will merge only pages 2 to 5, both inclusive, from the second PDF file::

-+---------------------+---------------------------------------------+
-| Attribute           | Notes                                       |
-+=====================+=============================================+
-| rotate              | Rotate the PDF. Valid values are 90,180,270 |
-+---------------------+---------------------------------------------+
+	[Part 1](1.pdf)
+	[Part 2](2.pdf){: start=2 end=5}
+
+The available attributes are:
+
+---------------------+-----------------------------------------------+
+| Attribute           | Notes                                         |
+=====================+===============================================+
+| rotate              | Rotate the PDF. Valid values are 90, 180, 270 |
+---------------------+-----------------------------------------------+
+| start               | Start page number for PDF page selection      |
+---------------------+-----------------------------------------------+
+| end                 | End page number for PDF page selection        |
+---------------------+-----------------------------------------------+
+
+Documentation
+=============
+
+Additional documentation is maintained on the `project wiki <https://github.com/captn3m0/pystitcher/wiki>`_ on GitHub.
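
Review note on the page-selection attributes documented above: `start` and `end` are 1-indexed and inclusive, while PDF readers address pages from 0. A minimal sketch of that conversion follows. The helper name `select_pages` and its range validation are assumptions made for illustration; the diff itself performs no such validation.

```python
def select_pages(total_pages, start=1, end=None):
    """Return 0-indexed page indices for a 1-indexed, inclusive start..end range.

    Hypothetical helper for illustration only; it is not part of pystitcher.
    """
    if end is None:
        # Mirrors the documented default: run to the final page.
        end = total_pages
    if not 1 <= start <= end <= total_pages:
        # Assumed behaviour: reject ranges that fall outside the document.
        raise ValueError(
            "invalid page range %d-%d for a %d-page PDF" % (start, end, total_pages)
        )
    return list(range(start - 1, end))


# Pages 2 to 5 of a 10-page file map to indices 1 through 4.
selected = select_pages(10, start=2, end=5)
```

With `{: start=2 end=5}` the selection holds four pages, which is the `(end - start) + 1` count used when advancing bookmark positions.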
--- a/setup.cfg
+++ b/setup.cfg
@@ -36,16 +36,18 @@ package_dir =
    =src

 # Require a min/specific Python version (comma-separated conditions)
-python_requires = >=3.6
+python_requires = >=3.7

 # PyPDF3: Read and write PDF files
 # Markdown: Render input markdown file to HTML
-# html5: Parse HTML file to generate bookmarks
+# html5lib: Parse HTML file to generate bookmarks
+# validators: Validate URL for fetching external PDF
 install_requires =
    importlib-metadata; python_version<"3.8"
    PyPDF3>=1.0.4
    Markdown>=3.3.4
-    html5>=0.0.9
+    html5lib>=1.1
+    validators>=0.18.1

 [options.packages.find]
 where = src
@@ -80,9 +82,9 @@ console_scripts =
 # in order to write a coverage file that can be read by Jenkins.
 # CAUTION: --cov flags may prohibit setting breakpoints while debugging.
 #          Comment those flags to avoid this py.test issue.
-addopts =
-    --cov pystitcher --cov-report term-missing
-    --verbose
+addopts = --verbose
+    # --cov pystitcher --cov-report term-missing
+    
 norecursedirs =
    dist
    build
--- a/src/pystitcher/skeleton.py
+++ b/src/pystitcher/skeleton.py
@@ -52,9 +52,10 @@ def parse_args(args):
    )

    parser.add_argument(
-        '--cleanup',
-        action=argparse.BooleanOptionalAction,
+        '--no-cleanup',
+        action='store_false',
        default=True,
+        dest='cleanup',
        help="Delete temporary files"
    )
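
Review note on the flag change above: `argparse.BooleanOptionalAction` was added in Python 3.9, so replacing it with a `store_false` flag keeps the parser usable on older interpreters. A minimal, standalone sketch of the resulting behaviour follows. The parser name `example` and the help text are placeholders, not the project's actual values.

```python
import argparse


def build_parser():
    """Build a tiny parser that mirrors the --no-cleanup flag from the diff."""
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument(
        "--no-cleanup",
        action="store_false",  # passing the flag stores False
        default=True,          # omitting the flag leaves cleanup enabled
        dest="cleanup",        # value is exposed as args.cleanup
        help="Keep temporary files instead of deleting them",
    )
    return parser


parser = build_parser()
default_args = parser.parse_args([])
kept_args = parser.parse_args(["--no-cleanup"])
```

Here `default_args.cleanup` is `True` and `kept_args.cleanup` is `False`, which is what `tests/test_cli.py` asserts. The help string in the diff still reads "Delete temporary files", which describes the default behaviour rather than the effect of passing the flag.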

--- a/src/pystitcher/stitcher.py
+++ b/src/pystitcher/stitcher.py
@@ -1,12 +1,17 @@
 import os
-import markdown
-from .bookmark import Bookmark
+import logging
+import shutil
+import tempfile
+import urllib.request
+import validators
+
 import html5lib
+import markdown
+
 from PyPDF3 import PdfFileWriter, PdfFileReader
 from PyPDF3.generic import FloatObject
 from pystitcher import __version__
-import tempfile
-import logging
+from .bookmark import Bookmark

 _logger = logging.getLogger(__name__)

@@ -24,6 +29,10 @@ class Stitcher:
        DEFAULT_FIT = '/FitV'
        # Do not rotate by default
        DEFAULT_ROTATE = 0
+        # Start at page 1 by default
+        DEFAULT_START = 1
+        # End at the final page by default
+        DEFAULT_END = None

        # TODO: This is a hack
        os.chdir(self.dir)
@@ -34,11 +43,27 @@ class Stitcher:
        self.attributes = md.Meta
        self.defaultFit = self._getAttribute('fit', DEFAULT_FIT)
        self.defaultRotate = self._getAttribute('rotate', DEFAULT_ROTATE)
+        self.defaultStart = self._getAttribute('start', DEFAULT_START)
+        self.defaultEnd = self._getAttribute('end', DEFAULT_END)

        document = html5lib.parseFragment(html, namespaceHTMLElements=False)
        for e in document.iter():
            self.iter(e)

+    """
+    Check if file has been cached locally and if
+    not cached, download from provided URL. Return
+    download filename
+    """
+    def _cacheURL(self, url):
+        if not os.path.exists(os.path.basename(url)):
+            _logger.info("Downloading PDF from remote URL %s", url)
+            with urllib.request.urlopen(url) as response, open(os.path.basename(url), 'wb') as downloadedFile:
+                shutil.copyfileobj(response, downloadedFile)
+        else:
+            _logger.info("Locally cached PDF found at %s", os.path.basename(url))
+        return os.path.basename(url)
+
    """
    Get the number of pages in a PDF file
    """
@@ -92,11 +117,17 @@ class Stitcher:
            self.currentLevel = 3
        elif(tag =='a'):
            file = element.attrib.get('href')
-            rotate = element.attrib.get('rotate', self.defaultRotate)
+            if(validators.url(file)):
+                file = self._cacheURL(file)
            fit = element.attrib.get('fit', self.defaultFit)
+            rotate = int(element.attrib.get('rotate', self.defaultRotate))
+            start = int(element.attrib.get('start', self.defaultStart))
+            end = int(element.attrib.get('end', self._get_pdf_number_of_pages(file)
+                                         if self.defaultEnd is None else self.defaultEnd))
+            filters = (rotate, start, end)
            b = Bookmark(self.currentPage, element.text, self.currentLevel+1, fit)
-            self.files.append((file, self.currentPage, rotate))
-            self.currentPage += self._get_pdf_number_of_pages(file)
+            self.files.append((file, self.currentPage, filters))
+            self.currentPage += (end - start) + 1
        if b:
            self.bookmarks.append(b)

@@ -133,7 +164,7 @@ class Stitcher:
        self.bookmarks = bookmarks

    """
-    Gets the last bookmkark level at a given page number
+    Gets the last bookmark level at a given page number
    on the combined PDF
    """
    def _get_level_from_page_number(self, page):
@@ -190,13 +221,14 @@ class Stitcher:
    """
    def _merge(self, output):
        writer = PdfFileWriter()
-        for (inputFile,startPage,rotate) in self.files:
+        for (inputFile,startPage,filters) in self.files:
            assert os.path.isfile(inputFile), ERROR_PATH.format(inputFile)
            reader = PdfFileReader(open(inputFile, 'rb'))
            # Recursively iterate through the old bookmarks
            self._iterate_old_bookmarks(reader, startPage, reader.getOutlines())
-            for page in range(1, reader.getNumPages()+1):
-                writer.addPage(reader.getPage(page - 1).rotateClockwise(int(rotate)))
+            rotate, start, end = filters
+            for page in range(start, end + 1):
+                writer.addPage(reader.getPage(page - 1).rotateClockwise(rotate))

        writer.write(output)
        output.close()
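
Review note on `_cacheURL` above: the cache filename is taken from `os.path.basename(url)`, so a URL carrying a query string would produce a filename that includes it. The sketch below shows one possible variation that parses the URL first. It is not the project's implementation, and the function name `cache_name` and the example URL are hypothetical.

```python
import os
from urllib.parse import urlparse


def cache_name(url):
    """Derive a local cache filename from the path component of a URL.

    Hypothetical variation on the diff's approach: the query string and
    fragment are dropped before the basename is taken.
    """
    return os.path.basename(urlparse(url).path)


url = "https://example.com/docs/2019-bill.pdf?download=1"
direct = os.path.basename(url)  # approach used in the diff
parsed = cache_name(url)        # variation sketched here
```

For this URL, `direct` is `2019-bill.pdf?download=1` while `parsed` is `2019-bill.pdf`. Two different URLs that end in the same filename would still share one cache entry under either approach.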
--- a/tests/book-clean.md
+++ b/tests/book-clean.md
@@ -1,5 +1,6 @@
 existing_bookmarks: remove
 author: Wiki, the Cat
+title: Super Jelly Book
 subject: A book about adventures of Wiki, the cat.
 keywords: wiki,potato,jelly
 # Super Potato Book
--- a/tests/book-external-url.md
+++ b/tests/book-external-url.md
@@ -0,0 +1,17 @@
+existing_bookmarks: remove
+author: Wiki, the Cat
+subject: A book about adventures of Wiki, the cat.
+keywords: wiki,potato,jelly
+# Super Potato Book
+
+# Volume 1
+
+[Part 1](1.pdf)
+
+# Volume 2
+
+[Part 2](https://unec.edu.az/application/uploads/2014/12/pdf-sample.pdf)
+
+# Volume 3
+
+[Part 3](https://juventudedesporto.cplp.org/files/sample-pdf_9359.pdf)
--- a/tests/book-flatten.md
+++ b/tests/book-flatten.md
@@ -1,5 +1,4 @@
 existing_bookmarks: flatten
-title: Super Jelly Book

 # Super Potato Book

--- a/tests/book-headings.md
+++ b/tests/book-headings.md
@@ -0,0 +1,10 @@
+# Heading 1
+[Part 1](1.pdf)
+
+## Heading 2
+
+[Part 2](1.pdf)
+
+### Heading 3
+
+[Part 3](1.pdf)
--- a/tests/book-min.md
+++ b/tests/book-min.md
--- a/tests/book-page-select.md
+++ b/tests/book-page-select.md
@@ -0,0 +1,18 @@
+existing_bookmarks: keep
+# Super Potato Book
+
+# Volume 1
+
+[Part 1](1.pdf){: start=1 end=2}
+
+# Volume 2
+
+[Part 2](2.pdf){: start=2}
+
+# Volume 3
+
+[Part 3](1.pdf){: end=2}
+
+# Volume 4
+
+[Part 4](2.pdf){: start=1 end=3 rotate="90"}
--- a/tests/book-rotate.md
+++ b/tests/book-rotate.md
@@ -1,7 +1,4 @@
 existing_bookmarks: remove
-author: Wiki, the Cat
-subject: A book about adventures of Wiki, the cat.
-keywords: wiki,potato,jelly
 # Super Potato Book

 # Volume 1
--- a/tests/book-title.md
+++ b/tests/book-title.md
--- a/tests/test_cli.py
+++ b/tests/test_cli.py
@@ -0,0 +1,15 @@
+from pystitcher.skeleton import parse_args
+import logging
+
+def test_default_args():
+    args = parse_args(['tests/book-clean.md', 'o.pdf'])
+    assert args.loglevel is None
+    assert args.cleanup is True
+
+def test_loglevel():
+    args = parse_args(['-v', 'tests/book-clean.md', 'o.pdf'])
+    assert args.loglevel == logging.INFO
+
+def test_cleanup():
+    args = parse_args(['--no-cleanup', 'tests/book-clean.md', 'o.pdf'])
+    assert args.cleanup is False
--- a/tests/test_integration.py
+++ b/tests/test_integration.py
@@ -0,0 +1,96 @@
+import os
+import io
+
+import PyPDF3
+from pystitcher.stitcher import Stitcher
+from pystitcher import __version__
+
+import pytest
+from contextlib import redirect_stdout
+
+ROOT_DIR = os.path.dirname(os.path.abspath(__file__)) + "/../"
+
+"""
+Fixtures for the integration tests. Each test is a tuple consisting of 4 things:
+- input name (used as book-{name}.md)
+- total expected page count
+- A dictionary of expected metadata. Leave empty if nothing is set
+- A flattened list of expected bookmarks, with each bookmark as a tuple containing:
+  - Title
+  - Destination Page Number
+  - Bookmark Level (default = 0)
+Each of the above 4 is passed to test_book as an argument
+"""
+TEST_DATA = [
+    ("clean",6, {'Author': 'Wiki, the Cat', 'Title': 'Super Jelly Book', 'Subject': 'A book about adventures of Wiki, the cat.', 'Keywords': 'wiki,potato,jelly'}, [('Super Potato Book', 0, 0), ('Volume 1', 0, 0), ('Part 1', 0, 1), ('Volume 2', 3, 0), ('Part 2', 3, 1)]),
+    ("keep",6, {'Title': 'Super Potato Book'}, [('Super Potato Book', 0, 0), ('Volume 1', 0, 0), ('Part 1', 0, 1), ('Chapter 1', 0, 2), ('Chapter 2', 1, 2), ('Scene 1', 1, 3), ('Scene 2', 2, 3), ('Volume 2', 3, 0), ('Part 3', 3, 1), ('Chapter 3', 3, 2), ('Chapter 4', 4, 2), ('Scene 3', 4, 3), ('Scene 4', 5, 3)]),
+    ("flatten", 6, {}, [('Super Potato Book', 0, 0), ('Volume 1', 0, 0), ('Part 1', 0, 1), ('Chapter 1', 0, 2), ('Chapter 2', 1, 2), ('Scene 1', 1, 2), ('Scene 2', 2, 2), ('Volume 2', 3, 0), ('Part 3', 3, 1), ('Chapter 3', 3, 2), ('Chapter 4', 4, 2), ('Scene 3', 4, 2), ('Scene 4', 5, 2)]),
+    ("rotate", 9, {}, [('Super Potato Book', 0, 0), ('Volume 1', 0, 0), ('Part 1', 0, 1), ('Volume 2', 3, 0), ('Part 2', 3, 1), ('Volume 3', 6, 0), ('Part 3', 6, 1)]),
+    ("min",3, {}, [('Part 1', 0, 0), ('Chapter 1', 0, 1), ('Chapter 2', 1, 1), ('Scene 1', 1, 2), ('Scene 2', 2, 2)]),
+    ("page-select", 9, {}, [('Super Potato Book', 0, 0), ('Volume 1', 0, 0), ('Part 1', 0, 1), ('Chapter 1', 0, 2), ('Chapter 2', 1, 2), ('Scene 1', 1, 3), ('Volume 2', 2, 0), ('Part 2', 2, 1), ('Scene 2', 2, 2), ('Chapter 3', 2, 2), ('Chapter 4', 3, 2), ('Scene 3', 3, 3), ('Volume 3', 4, 0), ('Part 3', 4, 1), ('Scene 4', 4, 2), ('Chapter 1', 4, 2), ('Chapter 2', 5, 2), ('Scene 1', 5, 3), ('Volume 4', 6, 0), ('Part 4', 6, 1), ('Scene 2', 6, 2), ('Chapter 3', 6, 2), ('Chapter 4', 7, 2), ('Scene 3', 7, 3), ('Scene 4', 8, 3)]),
+    ("headings", 9, {'Title': 'Heading 1'}, [('Heading 1', 0, 0), ('Part 1', 0, 1), ('Heading 2', 3, 1), ('Part 2', 3, 2), ('Heading 3', 6, 2), ('Part 3', 6, 3)])
+]
+
+def pdf_name(name):
+    return "tests/%s.pdf" % name
+
+def render(name, cleanup=True):
+    input_file = open("tests/book-%s.md" % name, 'r')
+    output_file = "%s.pdf" % name
+    stitcher = Stitcher(input_file)
+    stitcher.generate(output_file, cleanup)
+    # Switch back to main directory
+    os.chdir(ROOT_DIR)
+    return pdf_name(name)
+
+def flatten_bookmarks(bookmarks, level=0):
+    """Given a list, possibly nested to any level, return it flattened."""
+    output = []
+    for destination in bookmarks:
+        if isinstance(destination, list):
+            output.extend(flatten_bookmarks(destination, level+1))
+        else:
+            output.append((destination, level))
+    return output
+
+def get_all_bookmarks(pdf):
+    """ Returns a list of all bookmarks with title, page number, and level in a PDF file"""
+    bookmarks = flatten_bookmarks(pdf.getOutlines())
+    return [(d[0]['/Title'], pdf.getDestinationPageNumber(d[0]), d[1]) for d in bookmarks]
+
+@pytest.mark.parametrize("name,pages,metadata,bookmarks", TEST_DATA)
+def test_book(name, pages, metadata, bookmarks):
+    output_file = render(name)
+    pdf = PyPDF3.PdfFileReader(output_file)
+    assert pages == pdf.getNumPages()
+    assert bookmarks == get_all_bookmarks(pdf)
+    info = pdf.getDocumentInfo()
+    identity = "pystitcher/%s" % __version__
+    assert identity == info['/Producer']
+    assert identity == info['/Creator']
+    for key in metadata:
+        assert info["/%s" % key] == metadata[key]
+
+def test_rotation():
+    """ Validates the book-rotate.pdf with pages rotated."""
+    output_file = render("rotate")
+    pdf = PyPDF3.PdfFileReader(output_file)
+    # Note that inputs to getPage are 0-indexed
+    assert 90 == pdf.getPage(3)['/Rotate']
+    assert 90 == pdf.getPage(4)['/Rotate']
+    assert 90 == pdf.getPage(5)['/Rotate']
+    assert 180 == pdf.getPage(6)['/Rotate']
+    assert 180 == pdf.getPage(7)['/Rotate']
+    assert 180 == pdf.getPage(8)['/Rotate']
+
+def test_cleanup_disabled():
+    f = io.StringIO()
+    with redirect_stdout(f):
+        output_file = render("min", False)
+    temp_filename = f.getvalue()[29:-1]
+    assert os.path.exists(temp_filename)
+    pdf = PyPDF3.PdfFileReader(temp_filename)
+    assert 3 == pdf.getNumPages()
+    assert [] == pdf.getOutlines()
+    # Clean it up manually to avoid cluttering
+    os.remove(temp_filename)
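
Review note on the `page-select` fixture in `TEST_DATA` above: its expected total of 9 pages and the `Volume` bookmark destinations 0, 2, 4 and 6 can be checked by hand from the `(end - start) + 1` arithmetic in `stitcher.py`. The sketch below assumes that `1.pdf` and `2.pdf` each have three pages, which the other fixtures' page counts suggest but the diff does not state. The helper name `merged_offsets` is hypothetical.

```python
def merged_offsets(selections):
    """Compute where each selection starts in the merged PDF.

    selections: list of (start, end) pairs, 1-indexed and inclusive.
    Returns (offsets, total), where offsets are 0-indexed first pages.
    Hypothetical helper for illustration; not part of pystitcher.
    """
    offsets = []
    current = 0
    for start, end in selections:
        offsets.append(current)
        current += (end - start) + 1  # same page count as in the diff
    return offsets, current


# book-page-select.md with defaults filled in, assuming 3-page inputs:
#   Part 1: start=1 end=2, Part 2: start=2 (end=3),
#   Part 3: (start=1) end=2, Part 4: start=1 end=3
offsets, total = merged_offsets([(1, 2), (2, 3), (1, 2), (1, 3)])
```

Under that assumption `offsets` is `[0, 2, 4, 6]` and `total` is `9`, matching the fixture.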
--- a/tests/test_skeleton.py
+++ b/tests/test_skeleton.py
@@ -1,25 +0,0 @@
-import pytest
-
-from pystitcher.skeleton import fib, main
-
-__author__ = "Nemo"
-__copyright__ = "Nemo"
-__license__ = "MIT"
-
-
-def test_fib():
-    """API Tests"""
-    assert fib(1) == 1
-    assert fib(2) == 1
-    assert fib(7) == 13
-    with pytest.raises(AssertionError):
-        fib(-10)
-
-
-def test_main(capsys):
-    """CLI Tests"""
-    # capsys is a pytest fixture that allows asserts agains stdout/stderr
-    # https://docs.pytest.org/en/stable/capture.html
-    main(["7"])
-    captured = capsys.readouterr()
-    assert "The 7-th Fibonacci number is 13" in captured.out
Author	SHA1	Message	Date
Nemo	bb122223fd	Create FUNDING.yml	2022-05-27 07:22:57 +00:00
Vonter	2c386a3f2f	Add README badges	2022-01-26 11:10:00 +05:30
Nemo	e62284f3b0	update changelog	2021-12-31 13:15:36 +05:30
Nemo	55bfb6e26b	Merge pull request #20 from captn3m0/python-upgrade	2021-12-30 17:29:41 +05:30
Nemo	be985dd40b	[dep] switch from html5 to html5lib	2021-12-30 17:18:03 +05:30
Nemo	c614de7efc	[ci] Run tests on python3.10	2021-12-30 17:02:30 +05:30
Nemo	f617c6fde5	Add Installation instructions Closes #19	2021-07-21 18:17:00 +00:00
Nemo	5167dd4c8a	Merge pull request #18 from captn3m0/old-python Support older python releases	2021-07-16 17:07:24 +05:30
Nemo	dd8129aa2d	Fix for older Python	2021-07-16 17:05:27 +05:30
Nemo	3ea18ff01b	[tests] Add tests for argument parser	2021-07-16 16:57:09 +05:30
Nemo	2db41250f6	docs: Update docs to mention remote URL support	2021-07-05 13:34:45 +05:30
Nemo	cc2a58bddc	Add Tests (#13) Basic functional tests that cover 90% of the use cases. Doesn't cover zoomlevel, remote fetch yet.	2021-07-04 07:27:18 +00:00
Vonter	af4752bee1	Merge pull request #11 from captn3m0/feature/external_url Add basic implementation of external URL fetching of PDFs	2021-06-27 20:51:10 +05:30
Vonter	052060d256	Fix setup.cfg Included validators	2021-06-27 17:57:38 +05:30
Vonter	e70166efc2	Fix logged filename for locally cached file	2021-06-27 17:43:09 +05:30
Vonter	31faa1a36c	Add external URL fetching of PDFs Also changed import order according to PEP8	2021-06-27 17:33:49 +05:30
Vonter	ebc9c1e0cf	Update README.rst Fixed attribute table	2021-06-27 00:15:26 +05:30
Vonter	1324c2e4aa	Merge pull request #10 from Vonter/feature/page_filter Add PDF page selection/filter	2021-06-27 00:12:17 +05:30
Vonter	487e1002d4	Make defaultEnd correspond to absolute page number	2021-06-27 00:03:57 +05:30
Vonter	096b1f6be2	Add PDF page selection/filter	2021-06-26 22:56:38 +05:30
Nemo	4f505efde2	Add link to wiki	2021-06-26 18:05:47 +05:30