Changelog 5
Here is what I worked on since the last changelog.
Tried out typst
I tried a new typesetting system: typst. It is so awesome.
I love the command line interface. I love how simple it is in comparison with LaTeX. I love that it has a scripting language embedded. And I love that it invites you to separate layout from data.
There are lots of interesting templates. But I didn’t find anything that suited me well enough, so I just wrote my own. This would be a big undertaking in LaTeX, but in typst I could just learn how to do it on the fly.
My document is not completely done yet, but it is mainly a question of content at this point. I will definitely use typst again, especially when creating documents programmatically.
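To illustrate what "separating layout from data" and "creating documents programmatically" can look like, here is a minimal Python sketch. It is not my actual document: the file names main.typ and data.json, the keys title and points, and both helper functions are made up for this example. It relies only on typst's built-in json() loader and the `typst compile` command.

```python
import json
import shutil
import subprocess
from pathlib import Path

# Hypothetical typst template: the layout lives here, the content in data.json.
TEMPLATE = """\
#let data = json("data.json")

= #data.title

#for point in data.points [
  - #point
]
"""


def write_document(folder, data):
    """Write data.json and main.typ into `folder` and return the path to main.typ."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "data.json").write_text(json.dumps(data, indent=2), encoding="utf-8")
    main = folder / "main.typ"
    main.write_text(TEMPLATE, encoding="utf-8")
    return main


def compile_if_available(main):
    """Run `typst compile` if the CLI is installed; return True if it ran."""
    if shutil.which("typst") is None:
        return False
    subprocess.run(["typst", "compile", str(main)], check=True)
    return True
```

Calling write_document("out", {"title": "Demo", "points": ["one", "two"]}) and then compile_if_available on the returned path should leave a PDF next to main.typ, assuming typst is on the PATH. Swapping the data then never touches the layout.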
Reactivated the Scraper
Two years ago I wrote a web scraper and automated it to run once a day. The program …
- scrapes all the job offers on a special website
- transforms them from html to org-mode
- saves the resulting files into a git repository
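The middle step, turning HTML into org-mode, could look roughly like the following Python sketch. It is not the real scrape.py (which is not shown here): the names OrgConverter and html_to_org are made up for illustration, and the converter only understands headings, paragraphs and list items, using nothing but the standard library.

```python
from html.parser import HTMLParser


class OrgConverter(HTMLParser):
    """Tiny HTML-to-org converter for headings, paragraphs and list items."""

    # Block-level tags we understand and the org prefix each line gets.
    PREFIXES = {"h1": "* ", "h2": "** ", "h3": "*** ", "li": "- ", "p": ""}

    def __init__(self):
        super().__init__()
        self.lines = []    # finished org lines
        self.current = []  # raw text fragments of the line being built
        self.prefix = ""   # org prefix for the line being built

    def flush(self):
        """Finish the current line (if it has any text) and start a new one."""
        text = " ".join("".join(self.current).split())  # normalise whitespace
        if text:
            self.lines.append(self.prefix + text)
        self.current, self.prefix = [], ""

    def handle_starttag(self, tag, attrs):
        if tag in self.PREFIXES:
            self.flush()
            self.prefix = self.PREFIXES[tag]

    def handle_endtag(self, tag):
        if tag in self.PREFIXES:
            self.flush()

    def handle_data(self, data):
        # Inline tags such as <b> are ignored, their text is simply kept.
        self.current.append(data)


def html_to_org(html):
    """Convert an HTML fragment to org-mode text."""
    converter = OrgConverter()
    converter.feed(html)
    converter.close()
    converter.flush()
    return "\n".join(converter.lines) + "\n"
```

Writing the result of html_to_org to one file per offering gives plain text that diffs nicely, which is what makes the git history useful later on.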
I managed to implement all of this in a gitlab CI job.
Also, the repo of the scraper is the very repo where the data ends up, which I find elegant.
Even more elegant: this means that the CI job creates a new commit on its own repository.
It wasn’t even that hard. Here is the .gitlab-ci.yml:
image: python

stages:
  - collect_data
  - commit_and_push

collect_data:
  stage: collect_data
  before_script:
    - pip install -r requirements.txt
  script:
    - python scrape.py
    - find scraped_data | sort # show off what we created
  artifacts:
    paths:
      - scraped_data
    expire_in: 1 week

# Take the artifacts from the previous job and commit them as new data.
commit_and_push:
  stage: commit_and_push
  only:
    - schedules
  script:
    - git checkout data # make sure to use the right branch
    - rm -rf data # remove current data
    - rm -rf scraped_data/html # remove stuff I don't want in the repo
    - mv scraped_data data # put the new data in place
    - git add --no-ignore-removal data # put all the changes in data on the index
    - git -c user.email="$GITLAB_USER_EMAIL" -c user.name="$GITLAB_USER_NAME" commit --allow-empty --message="Added data_from $(date +%FT%T)"
    - git push "https://gitlab-ci-token:$GITLAB_PUSH_TOKEN@gitlab.com/$CI_PROJECT_PATH.git" "HEAD:$CI_COMMIT_BRANCH"
The result of running this regularly is a versioned, plain-text archive of job offerings. This is awesome if you are interested in questions like:
- Which offerings contain a certain word (like python or architect)? (just use grep / git grep / ripgrep)
- How long has an offering been online? (just use git log)
- How many offerings are added/removed from the website in a certain time frame? (just use git diff --stat)
- Do offerings change after their initial posting? (git log again)
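For scripted analysis, the same questions can be asked from Python by shelling out to git. This is only a sketch under assumptions: the helper names are made up, the offerings are assumed to live under the data directory on the checked-out branch, and "online since" is approximated by the oldest commit that touched a file, which only makes sense for offerings that are still online.

```python
import subprocess

DAY = 86400  # seconds per day


def git(*args, repo=".", ok_codes=(0,)):
    """Run git inside `repo` and return its stdout as text."""
    done = subprocess.run(
        ["git", "-C", repo, *args], capture_output=True, text=True
    )
    if done.returncode not in ok_codes:
        raise RuntimeError(done.stderr.strip())
    return done.stdout


def offerings_containing(word, repo="."):
    """Tracked files under data/ that mention `word`, ignoring case."""
    # git grep exits with 1 when nothing matches, which is not an error here.
    out = git("grep", "-l", "-i", "-e", word, "--", "data",
              repo=repo, ok_codes=(0, 1))
    return out.splitlines()


def days_since_first_seen(log_output, now):
    """Whole days between the oldest timestamp in `git log --format=%ct` output and `now`."""
    stamps = [int(stamp) for stamp in log_output.split()]
    if not stamps:
        raise ValueError("no commits found for this path")
    return (now - min(stamps)) // DAY


def offering_age_in_days(path, now, repo="."):
    """How many days ago `path` first showed up in the history."""
    log = git("log", "--follow", "--format=%ct", "--", path, repo=repo)
    return days_since_first_seen(log, now)


def churn_since(date, repo="."):
    """`git diff --stat` for data/ between the last commit before `date` and HEAD."""
    old = git("rev-list", "-1", f"--before={date}", "HEAD", repo=repo).strip()
    if not old:
        raise ValueError(f"no commit before {date}")
    return git("diff", "--stat", old, "HEAD", "--", "data", repo=repo)
```

For example, offerings_containing("python") lists the matching files, and churn_since("1 week ago") prints the usual diff summary for the last week. Passing `now` explicitly (for instance int(time.time())) keeps the date arithmetic easy to test.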
I find these insights fascinating. Especially because the website itself answers none of these questions, yet scraping it over some time provides the answers seamlessly.
…unless the pipeline breaks, that is. I had not cared to fix the CI for some time. But now I did, and the fix was pretty trivial.
Improved my desktop
I had some time to improve my desktop.
- use a new nerdy greeter: tuigreet on greetd
- implement a workflow where I can edit screenshots right after taking them. I am still looking for a great tool to edit screenshots with. Perhaps the upcoming gimp 3? But gimp has no good arrow support :(
- allow mouse scrolling in tmux
- use more features of waybar
- add a shortcut to switch keyboard layouts