Improve fetch script, ditch the serial number

2022-04-25 12:38:09 +05:30 · 2022-04-25 12:38:09 +05:30 · 4553266282
parent 31c1a080ab
commit 4553266282
3 changed files with 31 additions and 15 deletions
--- a/.github/workflows/update.yml
+++ b/.github/workflows/update.yml
@@ -22,10 +22,19 @@ jobs:
        format: "YYYY.M.D"
    - name: Update data
      run: ./fetch.sh
+    # Only tag if we're running on the scheduled job
    - uses: stefanzweifel/git-auto-commit-action@v4
+      if: ${{ github.event_name == 'schedule' }}
      with:
        commit_message: Update ISIN Data
        commit_author: 'github-actions[bot] <github-actions[bot]@users.noreply.github.com>'
        file_pattern: "*.csv"
        status_options: '--untracked-files=no'
        tagging_message: "v${{ steps.current-time.outputs.formattedTime }}"
+    - uses: stefanzweifel/git-auto-commit-action@v4
+      if: ${{ github.event_name == 'push' }}
+      with:
+        commit_message: Update ISIN Data
+        commit_author: 'github-actions[bot] <github-actions[bot]@users.noreply.github.com>'
+        file_pattern: "*.csv"
+        status_options: '--untracked-files=no'
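The two `git-auto-commit-action` steps above differ only in their `if:` conditions and in `tagging_message`, so scheduled runs get a dated tag while push runs just commit the CSVs. A rough shell sketch of that decision follows. The `tag_for_event` helper is hypothetical, and GNU `date` is assumed for the unpadded `%-m`/`%-d` fields that mirror the `YYYY.M.D` format.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the workflow's two commit steps:
# print the tag a run would create, or nothing if it would not tag.
# Assumes GNU date: %-m and %-d drop the zero padding, like YYYY.M.D.
tag_for_event() {
  local event=$1 day=${2:-today}
  # Only the scheduled job tags; push runs commit the CSVs without a tag
  if [[ $event == schedule ]]; then
    date -d "$day" +v%Y.%-m.%-d
  fi
}

# GitHub Actions sets GITHUB_EVENT_NAME; treat an unset value as "push"
tag_for_event "${GITHUB_EVENT_NAME:-push}"
```

For example, `tag_for_event schedule 2022-04-24` prints `v2022.4.24`, while `tag_for_event push 2022-04-24` prints nothing.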
--- a/README.md
+++ b/README.md
@@ -2,23 +2,25 @@

 ISIN Data from various public securities.

-Source: NSDL provides an ISIN Search at <https://nsdl.co.in/master_search.php>.
+Source: [NSDL Website Detailed ISIN Search][nsdl].

 Automatically updated every Sunday using GitHub Actions.

 Currently tracked:

-|File|Issuer|
-----|-----
-`INA.csv`|Central Government
-`INB.csv`|State Government
-`INE.csv`|Company, Statutory Corporation, Banking Company
-`INF.csv`|Mutual Funds
-`IN9.csv`|Partly paid up shares
+|File|Issuer|Tracked|
+-----|-----|----|
+`INA.csv`|Central Government|No
+`INB.csv`|State Government|No
+`INE.csv`|Company, Statutory Corporation, Banking Company|Yes
+`INF.csv`|Mutual Funds|Yes
+`IN9.csv`|Partly paid-up shares|Yes
+
+**Note**: The [NSDL Website][nsdl] returns zero valid results for `INA` and `INB`, so those classes are not tracked.

 # Code

-You can run the `fetch.sh` script to generate all the files from scratch. Dependencies:
+You can run the `fetch.sh` script to generate the tracked files from scratch. Dependencies:

 - https://github.com/ericchiang/pup
 - https://stedolan.github.io/jq/
@@ -27,9 +29,13 @@ You can run the `fetch.sh` script to generate all the files from scratch. Depend

 # Structure

-See https://www.basunivesh.com/how-your-dmat-mutual-funds-and-shares-isin-structured/
+- https://www.basunivesh.com/how-your-dmat-mutual-funds-and-shares-isin-structured/
+- https://theindianstockbrokers.com/what-is-isin-number-and-how-to-find-it/
+

 # Alternative Sources

 - https://nsdl.co.in/downloadables/html/hold-mutual-fund-units.html
- [The Kuvera Mutual Fund Details API](https://stoplight.captnemo.in/docs/kuvera/reference/Kuvera.yaml/paths/~1mf~1api~1v4~1fund_schemes~1%7Bcodes%7D.json/get) returns ISIN codes.
+- [The Kuvera Mutual Fund Details API](https://stoplight.captnemo.in/docs/kuvera/reference/Kuvera.yaml/paths/~1mf~1api~1v4~1fund_schemes~1%7Bcodes%7D.json/get) returns ISIN codes.
+
+[nsdl]: https://nsdl.co.in/master_search.php
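The links under Structure describe how an ISIN is put together: a two-letter country code (`IN` here), nine alphanumeric characters, and a final check digit. Below is a minimal sketch of validating that check digit, assuming the usual ISO 6166 scheme (letters mapped to 10 through 35, then a Luhn mod-10 check). `isin_valid` is a hypothetical helper, not part of `fetch.sh`, and Bash 4 or later is assumed for `${1^^}`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the ISIN's check digit is valid.
# Letters become two-digit numbers (A=10 ... Z=35), then the Luhn
# mod-10 check runs over the resulting digit string.
isin_valid() {
  local isin=${1^^} digits="" ch i d sum=0 double=0
  # 2-letter country code, 9 alphanumerics, 1 numeric check digit
  [[ $isin =~ ^[A-Z]{2}[A-Z0-9]{9}[0-9]$ ]] || return 1
  for (( i = 0; i < ${#isin}; i++ )); do
    ch=${isin:i:1}
    if [[ $ch == [0-9] ]]; then
      digits+=$ch
    else
      # printf "'X" prints the character code of X; 'A' (65) maps to 10
      digits+=$(( $(printf '%d' "'$ch") - 55 ))
    fi
  done
  # Luhn: starting from the rightmost digit, double every second digit
  for (( i = ${#digits} - 1; i >= 0; i-- )); do
    d=${digits:i:1}
    if (( double )); then
      d=$(( d * 2 ))
      if (( d > 9 )); then d=$(( d - 9 )); fi
    fi
    sum=$(( sum + d ))
    double=$(( 1 - double ))
  done
  (( sum % 10 == 0 ))
}

for isin in INE002A01018 INE002A01019; do
  if isin_valid "$isin"; then echo "$isin valid"; else echo "$isin invalid"; fi
done
```

Changing only the last digit (`...018` to `...019`) breaks the checksum, which is the kind of single-digit error the check digit is meant to catch.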
--- a/fetch.sh
+++ b/fetch.sh
@@ -21,9 +21,7 @@ function fetch_page() {
    --connect-timeout 10 \
    --retry-max-time 30 \
    --data cnum=$1 \
-    --data "page_no=$2" |
-  # for each row
-  $PUP_BINARY '#nsdl-tables tr json{}' | \
+    --data "page_no=$2" | $PUP_BINARY '#nsdl-tables tr json{}' | \
 # generate 5 lines per row (second column has a link, so parse that) with raw output
  jq --raw-output '.[] | [.children[1].children[0].text, .children[2].text, .children[3].text,.children[4].text,.children[5].text]|.[]' | \
  # and create a CSV from every 5 lines
@@ -47,11 +45,14 @@ function fetch_class() {
  done
 }

-for i in A B E F 9; do
+for i in E F 9; do
  total=$(fetch_total_pages "IN$i")
  echo "::group::IN$i (Total=$total)"
+  rm -f "IN$i.csv"
  fetch_class "IN$i" $total
  echo "::endgroup::"
+  # Sort the file in place
+  sort -o "IN$i.csv" "IN$i.csv"
 done

 sem --wait
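The jq step above emits five values per table row, one per line, and the pipeline then turns every five lines into one CSV row; the new `sort -o` call then orders each file in place. Here is a hedged sketch of those two steps with stand-in data. It assumes `paste` for the grouping (the actual command after the jq step is not shown in this hunk) and fields that never contain commas; `to_csv` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: group one-value-per-line output into 5-column CSV rows,
# then sort the resulting file in place.
# Assumption: no value contains a comma, quote, or newline (no CSV quoting).
set -euo pipefail

# paste takes one line from stdin for each "-", so five "-" give five columns
to_csv() {
  paste -d, - - - - -
}

out=$(mktemp)
trap 'rm -f "$out"' EXIT

# Stand-in data: two records, five values each, deliberately out of order
printf '%s\n' \
  row2-a row2-b row2-c row2-d row2-e \
  row1-a row1-b row1-c row1-d row1-e | to_csv > "$out"

# -o may name one of the input files (POSIX permits this)
sort -o "$out" "$out"
cat "$out"
```

`sort -o FILE FILE` is safe, unlike `sort FILE > FILE`, where the shell truncates the file before `sort` ever reads it. If `fetch_class` queues its page downloads through `sem` (the trailing `sem --wait` suggests so), a sort only sees the complete file once those jobs have finished.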