| author | diogo464 <[email protected]> | 2025-07-21 15:02:48 +0100 |
|---|---|---|
| committer | diogo464 <[email protected]> | 2025-07-21 15:02:48 +0100 |
| commit | 8c8dabd0ed20679a2dad43a5c239f9fcfe1c1ad7 (patch) | |
| tree | 55abbcfbbff19efa3aaf6cf36540ac7651c54973 /README.md | |
init
Diffstat (limited to 'README.md')
| -rw-r--r-- | README.md | 40 |
1 files changed, 40 insertions, 0 deletions
diff --git a/README.md b/README.md new file mode 100644 index 000000000..dc718ceeb --- /dev/null +++ b/README.md | |||
| @@ -0,0 +1,40 @@ | |||
| 1 | # portugal-running-data | ||
| 2 | repo with a scraper for the portugal running calendar data | |
| 3 | |||
| 4 | | Filename | Source Script | Optional | Description | | ||
| 5 | | ------------------------------------------------------------- | ------------------------------------------------------------- | ------------------------------------------------------------- | ------------------------------------------------------------- | | ||
| 6 | | `lastmod` | `setup-directories` | no | last modification time extracted from the sitemap file | | ||
| 7 | | `page.html` | `fetch-page` | no | event page from portugalrunning.com | | ||
| 8 | | `id` | `extract-id` | no | event numeric id from wordpress | | ||
| 9 | | `data.json` | `fetch-data` | no | json file with some event data | | ||
| 10 | | `ics` | `fetch-ics` | no | calendar file with location, date and other event information | | ||
| 11 | | `location` | `fetch-location` | yes | location data for the event | | ||
| 12 | | `image` | `fetch-image` | yes | cover image for the event | | ||
| 13 | | `date` | `extract-date` | no | event date extracted from the ics file | | ||
| 14 | | `oneline-description` | `fetch-oneline-description` | yes | ai generated one line description | | ||
| 15 | | `categories` | `extract-categories` | no | event categories | | ||
| 16 | | `circuits` | `extract-circuits` | no | event circuits | | ||
| 17 | |||
| 18 | ## `fetch-sitemap` | ||
| 19 | this script fetches the sitemap that contains a list of event page urls and the last modification date | ||
| 20 | |||
| 21 | ## `fetch-pages` | ||
| 22 | this script fetches any missing or outdated pages by looking at the `lastmod` file. | |
| 23 | |||
| 24 | ## `extract-ids` | ||
| 25 | this script extracts the event id from the `page.html` file. the id is used later to fetch other data related to the event. | |
| 26 | |||
| 27 | ## `fetch-ics` | ||
| 28 | this script uses the event id to fetch the event's ics file. | |
| 29 | |||
| 30 | ## `fetch-data` | ||
| 31 | this script uses the event id to fetch some event data in json format. | ||
| 32 | |||
| 33 | ## `fetch-images` | ||
| 34 | some events have a main image listed in the json data file; this script fetches that image. | |
| 35 | |||
| 36 | ## `extract-organizer` | ||
| 37 | this script extracts the organizer from the class list in the json data file, if one exists. | ||
| 38 | |||
| 39 | ## `extract-categories` | ||
| 40 | this script extracts a list of categories from the class list in the json data file. | ||
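
The `extract-organizer` and `extract-categories` steps above both read values out of the class list in `data.json`. A minimal sketch of that idea follows; it is not the repository's actual script. It assumes the class list is a JSON array of strings stored under a `class_list` key and that the values of interest share a common prefix. The key name, the `category-` and `organizer-` prefixes, the sample data and the helper `slugs_with_prefix` are all hypothetical.

```python
import json


def slugs_with_prefix(class_list, prefix):
    """Return the part after `prefix` for every class name that starts with it."""
    return [name[len(prefix):] for name in class_list if name.startswith(prefix)]


# Hypothetical data.json contents: the real key name and class prefixes may differ.
sample = json.loads(
    '{"id": 123, "class_list": '
    '["post-123", "category-trail", "category-10km", "organizer-example-club"]}'
)

# Assumed prefixes, chosen only for illustration.
categories = slugs_with_prefix(sample["class_list"], "category-")
organizers = slugs_with_prefix(sample["class_list"], "organizer-")

# The organizer is optional, so fall back to None when no class matches.
organizer = organizers[0] if organizers else None

print(categories, organizer)
```

Keeping the prefix as a parameter lets one helper serve both steps, and an event without an organizer class simply yields `None`.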
