Improve fetch script, ditch the serial number

2022-04-25 12:38:09 +05:30
parent 31c1a080ab
commit 4553266282
3 changed files with 31 additions and 15 deletions
--- a/.github/workflows/update.yml
+++ b/.github/workflows/update.yml
@@ -22,10 +22,19 @@ jobs:
        format: "YYYY.M.D"
    - name: Update data
      run: ./fetch.sh
    # Only tag if we're running on the scheduled job
    - uses: stefanzweifel/git-auto-commit-action@v4
      if: ${{ github.event_name == 'schedule' }}
      with:
        commit_message: Update ISIN Data
        commit_author: 'github-actions[bot] <github-actions[bot]@users.noreply.github.com>'
        file_pattern: "*.csv"
        status_options: '--untracked-files=no'
        tagging_message: "v${{ steps.current-time.outputs.formattedTime }}"
    - uses: stefanzweifel/git-auto-commit-action@v4
      if: ${{ github.event_name == 'push' }}
      with:
        commit_message: Update ISIN Data
        commit_author: 'github-actions[bot] <github-actions[bot]@users.noreply.github.com>'
        file_pattern: "*.csv"
        status_options: '--untracked-files=no'
--- a/README.md
+++ b/README.md
@@ -2,23 +2,25 @@
 ISIN Data from various public securities.
-Source: NSDL provides a ISIN Search at <https://nsdl.co.in/master_search.php>.
+Source: [NSDL Website Detailed ISIN Search][nsdl].
 Automatically updated every Sunday using GitHub Actions.
 Currently tracked:
-|File|Issuer|
+|File|Issuer|Tracked|
-----|-----
+-----|-----|----|
-`INA.csv`|Central Government
+`INA.csv`|Central Government|No
-`INB.csv`|State Government
+`INB.csv`|State Government|No
-`INE.csv`|Company, Statutory Corporation, Banking Company
+`INE.csv`|Company, Statutory Corporation, Banking Company|Yes
-`INF.csv`|Mutual Funds
+`INF.csv`|Mutual Funds|Yes
-`IN9.csv`|Partly paid up shares
+`IN9.csv`|Partly paid up shares|Yes
 **Note**: The [NSDL Website][nsdl] returns zero valid results for `INA, INB`, so those are not tracked.
 # Code
-You can run the `fetch.sh` script to generate all the files from scratch. Dependencies:
+You can run the `fetch.sh` script to generate the tracked files from scratch. Dependencies:
 - https://github.com/ericchiang/pup
 - https://stedolan.github.io/jq/
@@ -27,9 +29,13 @@ You can run the `fetch.sh` script to generate all the files from scratch. Depend
 # Structure
-See https://www.basunivesh.com/how-your-dmat-mutual-funds-and-shares-isin-structured/
+- https://www.basunivesh.com/how-your-dmat-mutual-funds-and-shares-isin-structured/
 - https://theindianstockbrokers.com/what-is-isin-number-and-how-to-find-it/
 # Alternative Sources
 - https://nsdl.co.in/downloadables/html/hold-mutual-fund-units.html
- [The Kuvera Mutual Fund Details API](https://stoplight.captnemo.in/docs/kuvera/reference/Kuvera.yaml/paths/~1mf~1api~1v4~1fund_schemes~1%7Bcodes%7D.json/get) returns ISIN codes.
+- [The Kuvera Mutual Fund Details API](https://stoplight.captnemo.in/docs/kuvera/reference/Kuvera.yaml/paths/~1mf~1api~1v4~1fund_schemes~1%7Bcodes%7D.json/get) returns ISIN codes.
 [nsdl]: https://nsdl.co.in/master_search.php
--- a/fetch.sh
+++ b/fetch.sh
@@ -21,9 +21,7 @@ function fetch_page() {
    --connect-timeout 10 \
    --retry-max-time 30 \
    --data cnum=$1 \
-    --data "page_no=$2" |
+    --data "page_no=$2" | $PUP_BINARY '#nsdl-tables tr json{}' | \
  # for each row
  $PUP_BINARY '#nsdl-tables tr json{}' | \
  # generate 6 lines (second column has a link, so parse that) with raw output
  jq --raw-output '.[] | [.children[1].children[0].text, .children[2].text, .children[3].text,.children[4].text,.children[5].text]|.[]' | \
  # and create a CSV from every 5 lines
@@ -47,11 +45,14 @@ function fetch_class() {
  done
 }
-for i in A B E F 9; do
+for i in E F 9; do
  total=$(fetch_total_pages "IN$i")
  echo "::group::IN$i (Total=$total)"
  rm "IN$i.csv"
  fetch_class "IN$i" $total
  echo "::endgroup::"
  # Sort the file in place
  sort -o "IN$i.csv" "IN$i.csv"
 done
 sem --wait
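
Note on the row-building step: the comments in `fetch_page` describe flattening each table row into five raw lines and then building a CSV "from every 5 lines", but the command that does the grouping falls outside the hunk. The sketch below shows one way this could work, assuming `paste` is used for the grouping. The `rows_from_fields` helper, the sample values, and the temporary file are all hypothetical and are not taken from `fetch.sh`. It finishes with the same in-place `sort -o` call that the loop uses.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in fetch.sh): join every five input lines
# into one comma-separated row. Each "-" makes paste read one more
# line from stdin, so five dashes means five fields per row.
rows_from_fields() {
  paste -d, - - - - -
}

# Made-up sample values standing in for the jq --raw-output stream.
out="$(mktemp)"
printf '%s\n' \
  INE002A01018 "Example Two Ltd" EQUITY ACTIVE 2001 \
  INE001A01036 "Example One Ltd" EQUITY ACTIVE 1999 |
  rows_from_fields > "$out"

# Sort the file in place, as the loop in fetch.sh does.
sort -o "$out" "$out"
cat "$out"
```

This simple join does no quoting, so a field that itself contains a comma would produce a malformed row. A real pipeline would need to escape such values before joining.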