Another day, another interesting data source to perform ETL on. This time, it’s a CSV file that’s downloaded from a remote server. Not going to focus on the part about using requests
to download the file, but jumping right into using PostgresHook.copy_expert()
to dump the CSV file into a staging table.
Even the actual code_export
method code doesn’t offer much insight, so I had to resort to looking through the psycopg2 docs (scroll to bottom of the page) to find out how to use this method!
Here’s what makes it work: you provide the method with a SQL statement (starting with COPY …) but instead of passing the file name–which just so happens to also be the second parameter you pass to copy_expert()
–you use STDIN
with COPY FROM
(or STDOUT
with COPY TO
)!
Pretty trivial in hindsight but it tripped me up for a few minutes as to why I don’t need to pass the file name twice (once as part of SQL statement & again as a parameter).