Changelog 5

Here is what I worked on since the last changelog.

Tried out typst

I tried a new typesetting system: typst. It is so awesome.

I love the command line interface. I love how simple it is in comparison with LaTeX. I love that it has a scripting language embedded. And I love that it invites you to separate layout from data.

There are lots of interesting templates, but I didn’t find anything that suited me well enough, so I just wrote my own. This would be a big undertaking in LaTeX, but in typst I could just learn how to do it on the fly.

My document is not completely done yet, but it is mainly a question of content at this point. I will definitely use typst again, especially when creating documents programmatically.

Reactivated the Scraper

Two years ago I wrote a web scraper and automated it to run once a day. The program …

  1. scrapes all the job offers on a special website
  2. transforms them from HTML to org-mode
  3. saves the resulting files into a git repository
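
For illustration, step 2 could look roughly like the following. This is a minimal sketch using only Python's standard library, not the actual scrape.py, which is not shown here. The names OrgConverter and html_to_org are hypothetical, and the sketch assumes that only headings, paragraphs, list items and links matter.

```python
from html.parser import HTMLParser


class OrgConverter(HTMLParser):
    """Turn a small HTML subset (h1-h3, p, li, a) into org-mode text."""

    HEADINGS = {"h1": "* ", "h2": "** ", "h3": "*** "}

    def __init__(self):
        super().__init__()
        self.lines = []       # finished org-mode lines
        self.current = []     # text fragments of the line being built
        self.in_link = False  # True while inside an <a> element
        self.href = None
        self.link_text = []

    def _flush(self, prefix=""):
        # Collapse whitespace and emit the buffered text as one line.
        text = " ".join("".join(self.current).split())
        if text:
            self.lines.append(prefix + text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS or tag in ("p", "li"):
            self._flush()  # emit loose text that preceded this block
        elif tag == "a":
            self.in_link = True
            self.href = dict(attrs).get("href")
            self.link_text = []

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self._flush(self.HEADINGS[tag])
        elif tag == "li":
            self._flush("- ")
        elif tag == "p":
            self._flush()
        elif tag == "a" and self.in_link:
            label = "".join(self.link_text)
            # org-mode link syntax: [[target][description]]
            self.current.append(f"[[{self.href}][{label}]]" if self.href else label)
            self.in_link = False

    def handle_data(self, data):
        (self.link_text if self.in_link else self.current).append(data)


def html_to_org(html):
    parser = OrgConverter()
    parser.feed(html)
    parser.close()
    parser._flush()  # emit trailing text outside any block element
    return "\n".join(parser.lines) + "\n"
```

Calling html_to_org on the HTML of one job offer returns a string that can be written to an .org file. Tags the sketch does not know about are skipped and only their text is kept.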

I managed to implement all of this in a GitLab CI job. The repo of the scraper is also the very repo where the data ends up, which I find elegant. Even more elegant: this means that the CI job creates a new commit in its own repository. It wasn’t even that hard; here is the .gitlab-ci.yml:

image: python

stages:
  - collect_data
  - commit_and_push

collect_data:
  stage: collect_data
  before_script:
    - pip install -r requirements.txt
  script:
    - python scrape.py
    - find scraped_data | sort    # show off what we created
  artifacts:
    paths:
      - scraped_data
    expire_in: 1 week

# Take the artifacts from the previous job and commit them as new data.
commit_and_push:
  stage: commit_and_push
  only:
    - schedules
  script:
    - git checkout data                 # make sure to use the right branch
    - rm -rf data                       # remove current data
    - rm -rf scraped_data/html          # remove stuff I don't want in the repo
    - mv scraped_data data              # put the new data in place
    - git add --no-ignore-removal data  # put all the changes in data on the index
    - git -c user.email="$GITLAB_USER_EMAIL" -c user.name="$GITLAB_USER_NAME" commit --allow-empty --message="Added data_from $(date +%FT%T)"
    - git push "https://gitlab-ci-token:$GITLAB_PUSH_TOKEN@gitlab.com/$CI_PROJECT_PATH.git" "HEAD:$CI_COMMIT_BRANCH"

The result of running this regularly is a versioned, plain-text collection of job offerings. This is awesome if you are interested in questions like:

I find these insights fascinating, especially because the website itself answers none of these questions, yet scraping it over some time provides the answers seamlessly.

…unless the pipeline breaks, that is. I had not bothered to fix the CI for some time. But now I did, and the fix was pretty trivial.
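
To illustrate how the versioned data can answer questions over time, here is a hedged sketch in Python. The question it answers (how long did each file exist in the repository?) is a hypothetical example, and the function names lifetimes and read_log are made up for this sketch. It assumes one file per job offer under data/, that each file is added once and deleted at most once, and that paths contain no characters that git would quote.

```python
import subprocess
from datetime import datetime


def lifetimes(log_text):
    """Map each path to (added, deleted) datetimes parsed from git log output.

    Expects the output of
        git log --diff-filter=AD --name-status --format=@%cI
    where every commit starts with an '@<ISO date>' line followed by
    'A<TAB>path' or 'D<TAB>path' lines. `deleted` is None for files
    that were never removed.
    """
    added, deleted = {}, {}
    when = None
    for line in log_text.splitlines():
        if line.startswith("@"):
            when = datetime.fromisoformat(line[1:])
        elif line.strip():
            status, path = line.split("\t", 1)
            # git log lists the newest commits first, so the last 'A' seen
            # is the oldest one and the first 'D' seen is the newest one.
            if status == "A":
                added[path] = when
            elif status == "D":
                deleted.setdefault(path, when)
    return {path: (start, deleted.get(path)) for path, start in added.items()}


def read_log(repo=".", subdir="data"):
    """Run git in `repo` and return the log text that lifetimes() expects."""
    result = subprocess.run(
        ["git", "log", "--diff-filter=AD", "--name-status",
         "--format=@%cI", "--", subdir],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Run inside a clone of the data branch, lifetimes(read_log()) gives one entry per file. Subtracting the two datetimes of an entry yields how long that file stayed in the repository.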

Improved my desktop

I had some time to improve my desktop.