A small, automatically updated subset of the Microsoft Knowledge Base metadata, useful for mapping KB IDs to publication dates or URLs.

Microsoft Knowledge Base metadata

This repository hosts a small subset of the Microsoft Knowledge Base metadata. The data in data.json contains the following for each KB:

  1. Date of the KB publication
  2. KB UUID
  3. KB Slug
  4. KB URL

The list of KB IDs in the dataset is scraped from the URLs listed in discovery.txt. The primary use case of the dataset is to provide a KB:DATE mapping to other projects.
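
As a rough illustration of the KB:DATE use case, the sketch below builds a KB ID to publication date lookup. The layout of data.json is not documented here, so the sample is an assumption: a JSON object keyed by KB ID, with hypothetical field names `date`, `uuid`, `slug`, and `url` and placeholder values. Check the real file and adjust the keys before relying on this.

```python
import json

# Hypothetical sample only: the real data.json layout and field names may differ.
SAMPLE = """
{
  "5000001": {
    "date": "2024-07-09",
    "uuid": "00000000-0000-0000-0000-000000000000",
    "slug": "example-update",
    "url": "https://example.invalid/kb/5000001"
  }
}
"""


def kb_date_map(raw: str) -> dict[str, str]:
    """Return a {KB ID: publication date} mapping from JSON text.

    Assumes the top level is an object keyed by KB ID and that each
    entry has a "date" field (both are assumptions, not confirmed).
    """
    records = json.loads(raw)
    return {kb_id: entry["date"] for kb_id, entry in records.items()}


if __name__ == "__main__":
    print(kb_date_map(SAMPLE))
```

To use it against the real file, read data.json into a string and pass it to `kb_date_map`, after confirming the field names match.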

WIP

Discovery notes. I need to check the sitemaps more thoroughly before automating this.

See https://learn.microsoft.com/_sitemaps/sitemapindex.xml

curl --silent https://learn.microsoft.com/_sitemaps/officeupdates_en-us_1.xml | yq -p xml -o c '.urlset.url[]|.loc' >> discovery.txt
curl --silent https://learn.microsoft.com/_sitemaps/security-updates_en-us_1.xml | yq -p xml -o c '.urlset.url[]|.loc' >> discovery.txt
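
If this step gets automated, the curl and yq pipeline above could be approximated in Python using only the standard library. This is a minimal sketch, not the code in update.py: the function names are hypothetical, and it assumes the sitemap files use the standard sitemaps.org namespace on their `urlset`, `url`, and `loc` elements.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: the sitemap uses the standard sitemaps.org 0.9 namespace.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(xml_data: bytes) -> list[str]:
    """Return the text of every <url><loc> element in a sitemap document."""
    root = ET.fromstring(xml_data)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def fetch_sitemap(url: str, timeout: float = 30.0) -> bytes:
    """Download a sitemap file and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def append_to_discovery(urls: list[str], path: str = "discovery.txt") -> None:
    """Append one URL per line, mirroring the '>> discovery.txt' redirect."""
    with open(path, "a", encoding="utf-8") as fh:
        for url in urls:
            fh.write(url + "\n")


if __name__ == "__main__":
    sitemap = "https://learn.microsoft.com/_sitemaps/officeupdates_en-us_1.xml"
    append_to_discovery(extract_locs(fetch_sitemap(sitemap)))
```

Like the shell commands, this only appends, so running it repeatedly will add duplicate lines to discovery.txt unless they are removed separately.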

License

Data and code in this repository are licensed under Creative Commons Zero v1.0 Universal.