# portugal-running-data
repo with a scraper for the portugal running calendar data

| Filename                                                      | Source Script                                                 | Optional                                                      | Description                                                   |
| ------------------------------------------------------------- | ------------------------------------------------------------- | ------------------------------------------------------------- | ------------------------------------------------------------- |
| `lastmod`                                                     | `setup-directories`                                           | no                                                            | last modification time extracted from the sitemap file        |
| `page.html`                                                   | `fetch-page`                                                  | no                                                            | event page from portugalrunning.com                           |
| `id`                                                          | `extract-id`                                                  | no                                                            | numeric event id from WordPress                               |
| `data.json`                                                   | `fetch-data`                                                  | no                                                            | json file with some event data                                |
| `ics`                                                         | `fetch-ics`                                                   | no                                                            | calendar file with location, date and other event information |
| `location`                                                    | `fetch-location`                                              | yes                                                           | location data for the event                                   |
| `image`                                                       | `fetch-image`                                                 | yes                                                           | cover image for the event                                     |
| `date`                                                        | `extract-date`                                                | no                                                            | event date extracted from the ics file                        |
| `oneline-description`                                         | `fetch-oneline-description`                                   | yes                                                           | AI-generated one-line description                             |
| `categories`                                                  | `extract-categories`                                          | no                                                            | event categories                                              |
| `circuits`                                                    | `extract-circuits`                                            | no                                                            | event circuits                                                |

## `fetch-sitemap`
this script fetches the sitemap, which lists the event page urls and the last modification date of each page.
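
a minimal python sketch of reading such a sitemap, assuming it follows the standard sitemaps.org format (`<url>` entries with `<loc>` and an optional `<lastmod>`). the function name `parse_sitemap` is hypothetical and is not taken from this repo.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return a list of (url, lastmod) tuples; lastmod is None when absent.

    Hypothetical helper: assumes a standard <urlset> sitemap document.
    """
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default=None, namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        if loc:
            entries.append((loc.strip(), lastmod.strip() if lastmod else None))
    return entries
```

each returned pair could then be written to one directory per event, with the second element stored as the `lastmod` file.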

## `fetch-pages`
this script fetches any missing or outdated pages by looking at the lastmod file.
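
one way the missing-or-outdated check could look, as a sketch only. it assumes each event directory holds `page.html` next to a `lastmod` file containing an ISO 8601 timestamp, and that a page counts as outdated when its file modification time is older than that timestamp. the actual script may decide this differently; `needs_fetch` is a hypothetical name.

```python
from datetime import datetime, timezone
from pathlib import Path


def needs_fetch(event_dir):
    """Return True when page.html is missing or older than the stored lastmod.

    Assumption: `lastmod` holds an ISO 8601 timestamp such as
    2024-05-01T10:00:00+00:00 (a trailing "Z" needs Python 3.11+).
    """
    event_dir = Path(event_dir)
    page = event_dir / "page.html"
    lastmod_file = event_dir / "lastmod"

    if not page.exists():
        return True  # missing page: always fetch
    if not lastmod_file.exists():
        return False  # nothing to compare against, keep the existing page

    lastmod = datetime.fromisoformat(lastmod_file.read_text().strip())
    if lastmod.tzinfo is None:
        # Assume UTC when the timestamp carries no offset.
        lastmod = lastmod.replace(tzinfo=timezone.utc)

    fetched = datetime.fromtimestamp(page.stat().st_mtime, tz=timezone.utc)
    return fetched < lastmod
```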

## `extract-ids`
this script extracts the event id from each page.html file. the id is used later to fetch other data related to the event.
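
a sketch of pulling a numeric id out of the html. it assumes the page carries the `postid-<n>` class that WordPress's `body_class()` adds on single-post pages; whether the real script relies on this marker is not stated here, and `extract_id` is a hypothetical name.

```python
import re

# Assumption: the <body> tag carries WordPress's "postid-<n>" class.
POSTID_RE = re.compile(r"\bpostid-(\d+)\b")


def extract_id(html):
    """Return the numeric post id found in the html, or None if absent."""
    match = POSTID_RE.search(html)
    return int(match.group(1)) if match else None
```

returning `None` instead of raising lets a caller skip pages without an id and carry on with the rest.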

## `fetch-ics`
this script uses the event id to fetch the event's ics file.
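
the table above lists a `date` file extracted from the ics file. a rough sketch of that extraction follows, assuming the date of interest is the `DTSTART` of the first `VEVENT`; `extract_date` is a hypothetical name and the real `extract-date` script may work differently.

```python
import re
from datetime import date

# Matches DTSTART with optional parameters, for example:
#   DTSTART:20240914T090000Z
#   DTSTART;VALUE=DATE:20240914
#   DTSTART;TZID=Europe/Lisbon:20240914T090000
DTSTART_RE = re.compile(
    r"^DTSTART(?:;[^:\r\n]*)?:(\d{4})(\d{2})(\d{2})", re.MULTILINE
)


def extract_date(ics_text):
    """Return the start date of the first VEVENT, or None if there is none."""
    # Start at the first VEVENT so DTSTART lines inside a VTIMEZONE
    # block earlier in the file are not picked up by mistake.
    start = ics_text.find("BEGIN:VEVENT")
    if start == -1:
        return None
    match = DTSTART_RE.search(ics_text, start)
    if not match:
        return None
    year, month, day = map(int, match.groups())
    return date(year, month, day)
```

only the calendar date is kept; time of day and timezone are ignored in this sketch.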

## `fetch-data`
this script uses the event id to fetch some event data in json format.

## `fetch-images`
some events have a main image in the json data file; this script fetches that image.

## `extract-organizer`
this script extracts the organizer from the class list in the json data file, if one exists.

## `extract-categories`
this script extracts a list of categories from the class list in the json data file.
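
a sketch of reading values out of a class list by prefix, which could serve both the categories and the organizer. the json key `class_list` and the prefixes `category-` and `organizer-` are assumptions for illustration only; the real key and prefixes depend on the site's json and are not documented here.

```python
import json


def values_with_prefix(class_list, prefix):
    """Return the suffix of every class that starts with prefix, in order."""
    return [name[len(prefix):] for name in class_list if name.startswith(prefix)]


def extract_from_data(data_json_text, prefix="category-", key="class_list"):
    """Parse the json text and return the class suffixes for the given prefix.

    Both `key` and `prefix` are hypothetical defaults, not confirmed values.
    """
    data = json.loads(data_json_text)
    return values_with_prefix(data.get(key, []), prefix)
```

for an organizer, a caller would pass a different prefix and take the first element of the result, if any.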