Learning to effectively test data pipelines
A Talk by Jacqueline Bilston (Software Developer, Yelp)
About this Talk
This talk is a story, using examples in Python and PySpark, about testing ETL pipelines efficiently. I won’t try to convince you that you need unit tests or automated tests – that’s up to you. If you already have unit tests for your ETL pipelines, or want them, it is worth making sure they verify what’s important. I’ll describe how a practical (non-pyramid-shaped) heuristic helps me avoid duplicate test coverage, so that I test only the code needed for the feature I’m building.