Testing notebooks (Draft)
5 min read
June 10, 2022

Why you need to write tests, even when you work with notebooks

Testing is a process that is often underestimated and undervalued, yet it is one of the most important parts of creating high-quality software. The goal of testing is to enable the sustainable growth of your software project. As data analysts, engineers, and scientists, we are constantly writing code and improving the code we already have. To keep up with that growth, we need proper tests that tell us existing functionality keeps working.

The problem with notebooks

In today's world, notebooks are becoming the standard way of writing code for data projects. Data scientists and engineers write their code interactively on platforms like Jupyter and Databricks. These tools make it extremely easy to quickly prototype, do research, or perform a data analysis.

The problem with notebooks is that they are not optimized for testing. It is quite frustrating that these great tools do not give us proper built-in ways to test our code. We cannot simply create a test suite, import our functions, and run the tests with Pytest. We do not have a debugger. Instead, we have to build complex automated executions that call other notebooks, or resort to something like inline assertions.
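For contrast, this is roughly the workflow we get outside of a notebook: the code lives in an ordinary module, and Pytest picks up a test file next to it automatically. A minimal sketch, with purely illustrative file and function names:

```python
# transformations.py -- an ordinary Python module (illustrative name)
def add_vat(price: float, rate: float = 0.21) -> float:
    """Return the price including value-added tax, rounded to cents."""
    return round(price * (1 + rate), 2)


# test_transformations.py -- Pytest discovers files and functions named test_*
from transformations import add_vat

def test_add_vat_uses_default_rate():
    assert add_vat(100.00) == 121.00
```

Running `pytest` in the repository executes every such test; this is exactly the loop that plain notebooks make hard.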

Borrowing good software practices

High-quality software products are built using proper engineering tools and best practices. We need our software to be modular and cohesive, with a good separation of concerns, abstraction, and information hiding. Working in notebooks encourages you to put code into cells and run them interactively. But developing highly maintainable software means breaking your solution down into smaller, more cohesive components that can be composed and reused. In practice, this means putting our code inside modules, classes, and functions. Version control is also difficult when you work with notebooks.

Quality software is also built with proper tooling, such as Git and an IDE. An IDE helps us debug and format our code, and surfaces errors before the code even runs.

Proper software also has something like a continuous deployment pipeline. This is the process where new code in version control triggers automated builds, which are then unit and integration tested. After passing these tests, a deployment artifact is created and either released via an automated, orchestrated workflow or saved to a shared repository. Once the software is released, further tests can be applied. When all tests pass, we can be confident that our software meets its requirements; if we lack tests, how can we be sure that our code will work in production? And when tests fail, we can quickly reject the release.

How to make them work if notebooks are your only choice

If notebooks are the only option for deploying your code, you should think about modularizing your code into components. This can be done in different ways, depending on the experience level of the team and how far along the project is. The most important part is to start with small steps and keep improving over time.

The first approach is to modularize our current code into components. We can do this by splitting our long transformation into smaller functional steps and putting each step into its own function.
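As a sketch (the column names and logic are made up for illustration), a single long pandas transformation cell could be broken up like this:

```python
import pandas as pd

# Instead of one long cell that cleans, enriches, and aggregates in one go,
# each step becomes a small function that takes and returns a DataFrame.

def drop_incomplete_orders(df: pd.DataFrame) -> pd.DataFrame:
    return df.dropna(subset=["order_id", "amount"])

def add_order_month(df: pd.DataFrame) -> pd.DataFrame:
    return df.assign(order_month=pd.to_datetime(df["order_date"]).dt.to_period("M"))

def total_per_month(df: pd.DataFrame) -> pd.DataFrame:
    return df.groupby("order_month", as_index=False)["amount"].sum()

# The cell that runs the pipeline stays short and readable.
def run_pipeline(raw: pd.DataFrame) -> pd.DataFrame:
    return total_per_month(add_order_month(drop_incomplete_orders(raw)))
```

Each step now has a name, a clear input, and a clear output, which is exactly what makes it testable.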

The next step is to create some tests for those functions. In the same cell as the function, we can write a test that checks certain assertions. When an assertion fails, the rest of the notebook execution stops, and we know exactly where the problem is. This makes failures much easier to debug.
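A minimal sketch of such a cell, reusing one of the hypothetical functions from above:

```python
import pandas as pd

def drop_incomplete_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows that are missing an order id or an amount."""
    return df.dropna(subset=["order_id", "amount"])

# Inline test in the same cell: if the assertion fails, the notebook run
# stops right here, so we immediately see which function is broken.
_sample = pd.DataFrame({"order_id": [1, None], "amount": [10.0, 5.0]})
assert len(drop_incomplete_orders(_sample)) == 1, "row without order_id should be dropped"
```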

Unfortunately, when we put the tests below our function definitions, the notebook becomes very long and hard to read. A better approach is therefore to move those functions into their own notebook file. The functions can then be imported into the main notebook, and if a test fails while importing, the execution stops.
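On Databricks, for example, the `%run` magic executes another notebook in the current session and makes its definitions available; the notebook path and function names below are the hypothetical ones from the earlier sketches:

```python
# First cell of the main notebook: execute the helper notebook.
# Its function definitions and inline assertions run immediately, so a
# failing test stops the main notebook before any real work starts.
%run ./transformations

# The functions defined in ./transformations are now available here.
clean_orders = drop_incomplete_orders(raw_orders_df)  # raw_orders_df: assumed input
```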

To improve on that further, we can borrow the better software practices described earlier and put those functions into their own package. We build that package with proper software tooling: a repository where the code sits next to a corresponding unit test suite, and releases that go out via an automated deployment pipeline. We can then import the package into our notebook, knowing that the functions are properly tested, and call them in our main pipeline.
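A sketch of what that could look like, with a made-up package name (`etl_utils`): the repository keeps code and tests side by side, the pipeline runs the test suite on every change, and the notebook only installs and imports the released package.

```python
# Repository layout (illustrative):
#   etl_utils/
#       __init__.py
#       transformations.py       <- drop_incomplete_orders, add_order_month, ...
#   tests/
#       test_transformations.py  <- the Pytest suite run by the pipeline

# tests/test_transformations.py
import pandas as pd
from etl_utils.transformations import drop_incomplete_orders

def test_rows_without_order_id_are_dropped():
    raw = pd.DataFrame({"order_id": [1, None], "amount": [10.0, 5.0]})
    assert len(drop_incomplete_orders(raw)) == 1

# In the main notebook, only the install and import remain, e.g.:
#   %pip install etl_utils==1.2.0   (or attach the package as a cluster library)
#   from etl_utils.transformations import drop_incomplete_orders
```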

The last and most demanding option is to put all your transformation logic into a code package, usually built as a Python wheel. When you arrive at this stage, you no longer work in a notebook at all; instead, you create an artifact that can be deployed, for example as a Databricks Python wheel task.
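A minimal sketch of what such a package could declare, again with illustrative names; the entry point is what the Databricks Python wheel task invokes (the exact configuration depends on your setup):

```python
# setup.py (illustrative) -- builds the wheel that the pipeline deploys.
from setuptools import find_packages, setup

setup(
    name="etl_utils",
    version="1.2.0",
    packages=find_packages(exclude=["tests"]),
    install_requires=["pandas"],
    entry_points={
        # The wheel task is configured with the package name and this entry point.
        "console_scripts": ["run-monthly-totals=etl_utils.jobs:main"],
    },
)
```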

Another reason why you don't write tests

Another common reason for not writing unit tests is that you don't yet possess the necessary skills.

Practice makes progress

Writing tests is a skill that must be learned, and however boring and unimportant it may seem, learning to write good tests is an investment. If you think no-code solutions are the future, think again: writing good code is here to stay. And to enable sustainable growth for your code, you must write proper tests. Not writing them is simply not an option.

Another good skill to learn is knowing when to write tests. Achieving maximum value at minimum maintenance cost is the most difficult part of unit testing.

It doesn't stop with unit tests

Unit testing is only one part of the picture; to properly test the whole system, other kinds of tests must be included. Integration tests, end-to-end tests, and quality tests are different checks that can be added to the deployment pipeline. These tests are just as important as unit tests; only the proportions vary. The more code such tests require and the more complex they are to maintain, the fewer of them you should have.

How to use these different types of tests is a topic for another blog post.

Thanks for reading, and I hope you found this helpful!