More progress, thanks AI

I keep on trucking along with the port to Django, and coding has been remarkably smooth these days.

Here's the "to-do" list from yesterday:





Things to do:



  • Get scraper to work with pagination (for now, it just scrapes the first 20 jobs that appear on the first page).
  • After pagination works, make the scraper run on a schedule with Celery.
  • Re-use the code from the Language Detector from the last project to build one and connect it to this new project. Make it work as a Celery worker.
  • Re-use the code from the Job Categorizer from the last project and incorporate it into this project, and make it work as a Celery worker also.
  • Port as much of the frontend as possible.
  • Buy a server from Hetzner and bring the project live, first into a password-protected staging environment.

Here are the things I have managed to do so far:

  • Get scraper to work with pagination (for now, it just scrapes the first 20 jobs that appear on the first page).

I didn't have to fiddle with getting pagination to work after all: when inspecting the URL structure, I noticed that including a page parameter would do the trick :) For now I am triggering the scraper manually. In theory it will keep scraping page after page until it fails, but I won't let it run through everything; out of roughly 100 pages, I only care about the first half on this initial run to populate the DB. On subsequent runs, it will scrape all jobs on the first page and compare them with those in the DB: if there are new ones, they get saved and it moves on to the next page; if not, the scraping job stops.
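
In rough outline, the loop looks something like this (a minimal sketch: the page query parameter is the real trick, but scrape_page(), the Job model, and the field names are hypothetical stand-ins for the actual code):

```python
# Sketch of the pagination loop. Assumes a Job model and a
# scrape_page(url) -> list[dict] helper; both are placeholders.
def scrape_all_pages(base_url, max_pages=None):
    page = 1
    while max_pages is None or page <= max_pages:
        jobs = scrape_page(f"{base_url}?page={page}")  # [] once past the last page
        if not jobs:
            break
        new_jobs = [j for j in jobs if not Job.objects.filter(url=j["url"]).exists()]
        if not new_jobs:
            break  # nothing new on this page, so stop the scraping job
        for data in new_jobs:
            Job.objects.create(**data)
        page += 1
```

On the initial run I can cap it with max_pages to grab only the half of the ~100 pages I care about; on later runs the early-exit check does the work.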

Moreover, I expanded the scope of the scraping to cover all regions in Denmark (previously it was just the Greater Copenhagen area). All major regions are scraped by default, and when running the scraper manually you can pass a --region flag to the run_scraper.py command, followed by the region name, for localised scraping. Neat.
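
For reference, the --region flag is just an optional argument on the management command, roughly like this (a sketch: the region slugs and the scraper call are placeholders, not the real code):

```python
# Sketch of the run_scraper management command with a --region flag.
# Region slugs and the scrape call are hypothetical placeholders.
from django.core.management.base import BaseCommand

REGIONS = ["hovedstaden", "sjaelland", "syddanmark", "midtjylland", "nordjylland"]

class Command(BaseCommand):
    help = "Scrape job listings, optionally for a single region"

    def add_arguments(self, parser):
        parser.add_argument("--region", choices=REGIONS,
                            help="only scrape this region")

    def handle(self, *args, **options):
        regions = [options["region"]] if options["region"] else REGIONS
        for region in regions:
            self.stdout.write(f"Scraping {region}...")
            scrape_all_pages(region_url(region))  # region_url() is a placeholder
```

So `python manage.py run_scraper --region nordjylland` scrapes just that region, and leaving the flag off scrapes everything.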

  • Re-use the code from the Language Detector from the last project to build one and connect it to this new project. Make it work as a Celery worker.

This one works already. Job description languages are being detected and saved to the language attribute in the DB. Detection currently runs during job creation as the scraper works, and also in response to a signal sent when a job is saved to the DB. In the not-too-distant future I will change this behaviour so that the language detection happens within the model rather than in a separate signal.py file.
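
The signal version looks roughly like this (a sketch assuming the langdetect package and a Job model with description and language fields; the import path is a placeholder):

```python
# Sketch of the current signal-based detection. The app path and
# field names are placeholders; langdetect is the assumed detector.
from django.db.models.signals import post_save
from django.dispatch import receiver
from langdetect import detect

from jobs.models import Job  # placeholder app path

@receiver(post_save, sender=Job)
def set_language(sender, instance, created, **kwargs):
    if created and not instance.language:
        instance.language = detect(instance.description)
        # update_fields keeps the second save cheap; the created/language
        # guard above stops this from looping
        instance.save(update_fields=["language"])
```

Moving it into the model would just mean running the same detect() call inside Job.save() before calling super().save().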

Things to do:

  • Get scraper to work with pagination (for now, it just scrapes the first 20 jobs that appear on the first page). Done, see above.
  • After pagination works, make the scraper run on a schedule with Celery. This one is almost done, but I'll hold off on finishing it until the categorizer is done (there's a sketch of the plan at the end of this post).
  • Re-use the code from the Language Detector from the last project to build one and connect it to this new project. Make it work as a Celery worker. Done for now (see above), though it isn't running as a Celery worker yet.
  • Re-use the code from the Job Categorizer from the last project and incorporate it into this project, and make it work as a Celery worker also.
  • Port as much of the frontend as possible.
  • Buy a server from Hetzner and bring the project live, first into a password-protected staging environment.
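
And for when I do get to the Celery scheduling, the plan is roughly this (a sketch; the app name, broker URL, task module, and schedule are all placeholders):

```python
# Sketch of the planned Celery beat schedule. Everything here
# (app name, broker, task path, interval) is a placeholder.
from celery import Celery
from celery.schedules import crontab

app = Celery("jobsite", broker="redis://localhost:6379/0")

@app.task
def scrape_jobs():
    # would call into the pagination loop from earlier
    scrape_all_pages("https://example.com/jobs")

app.conf.beat_schedule = {
    "scrape-jobs-nightly": {
        "task": "tasks.scrape_jobs",  # assumes this module is tasks.py
        "schedule": crontab(hour=3, minute=0),  # 03:00 every night
    },
}
```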