I am pretty darn frustrated after having spent more than an hour trying to clean up my mess from a 5-second code change. At work we are now supposed to squash (rebase?) multiple commits in a single branch into a single commit. Before starting at this job, I had done exactly zero rebases in […]
Interviewing Data Engineer Candidates
At my current employer, as a part of the interviewing process we have a technical challenge & team interview (after an initial phone screen interview). One thing that I really appreciate about where I work now is that when it comes to considering data engineer candidates, we absolutely will bring in people that lack any […]
9 Rules To Interviewing For Data Engineer Jobs
This post is the first of two covering interviewing from 2 different perspectives: as the interviewee & the interviewer. It seems like there’s probably (at least!) one post a week on reddit’s /r/dataengineering asking for advice about getting a DE job, and many of the posts seem to hover around dealing with/getting past impostor syndrome. […]
Explaining My Recent Silence…
The reason for the lack of Airflow and/or PostgreSQL posts is simple: I took a new job. I left my old data engineering (DE) job a few days before Christmas & started a new job at the beginning of January. While I greatly enjoyed many aspects of the old job, company, & people that I […]
Another Way to Dynamically Create New Tasks: Factory Methods
New project, new challenge. This time I wanted to create very similar tasks (in this case to drop indexes); however, I didn’t want to be locked into doing those similar tasks at the same time (read: in parallel with help from an iterable). So while I will not even pretend to be the originator […]
PostgreSQL Foreign Data Wrappers: file_fdw
If there’s one thing that I love about PostgreSQL, it’s foreign data wrappers (FDW). Being able to create a connection to another source–usually another database–can make the seemingly impossible possible. However, it’s essential to understand what they do–and may not/don’t do–before you go creating them on your production database. Before I get any deeper into […]
Testing Task Dependencies In Your DAG
While I have been using a simpler version in the past, it is only recently that I wrapped my head around how to simplify the creation–and really the maintenance–of upstream & downstream task dependencies in a DAG. The way I had been doing it involved writing these massive tuples with the task names over & […]
Marking Tests as “Happy” or “Sad” with pytest
I was watching the video of a talk on Advanced pytest from EuroPython 2019 where the presenter (also a pytest core developer) used some interesting decorators in his slides (actually his blog). Being rather familiar with @pytest.mark.parametrize(), it wasn’t entirely obvious at first what @pytest.mark.wip, @pytest.mark.happy, & @pytest.mark.sad were doing or what functionality they provided. However, […]
Why Setting an SLA for Your DAGs Might Be a Very Good Idea
While I feel like I’d already taken plenty of steps to be notified if there’s a problem when a DAG fails, I recently discovered that I hadn’t done anything to catch those scenarios where a DAG takes longer–or in my case, a lot longer–to complete (or fail) than normal. Thus, the need to learn how […]
Complex Formatting of Results in List or Dictionary Comprehensions (or Generators)
I love the power that comes with list (or dictionary) comprehensions or generators in Python. I also despise the code pattern of iterating over an object and appending the results to another list. Let’s keep an example simple with a list comprehension performing a single calculation on each value in the original list: In either bits […]
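The excerpt cuts off before its example, so here is a minimal sketch of the two patterns it contrasts. The input values and the doubling calculation are assumptions chosen for illustration, not the code from the original post.

```python
# Hypothetical input; the original post's data is not shown in the excerpt.
values = [1, 2, 3, 4]

# The loop-and-append pattern the excerpt complains about.
doubled_loop = []
for value in values:
    doubled_loop.append(value * 2)

# The same single calculation as a list comprehension.
doubled = [value * 2 for value in values]

# A dictionary comprehension keyed on the original value.
doubled_by_value = {value: value * 2 for value in values}

# A generator expression computes the results lazily, one at a time.
doubled_lazy = (value * 2 for value in values)
```

All three comprehension forms apply the same calculation; the generator version only produces values as something iterates over it.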