I worked with a U.S.-based healthcare startup for 7 years. The startup developed a software product that sent appointment reminders, via email, text, and IVR, to the patients of healthcare facilities. The startup, still operational, has been modestly successful: it has raised capital and currently serves thousands of customers.

I penned a lot of the startup’s backend code, all of which was written in Python (2.x). Python proved to be a great choice thanks to its overall ecosystem: the community was helpful, and the built-in modules and third-party packages helped the startup move fast. Almost every time we had a feature to implement, either a built-in module came to the rescue or we found a decent library that solved the problem.

Here are the 15 Python libraries that played a pivotal role in our quest to grow the startup:

Paramiko

The first thing we had to do was get the appointment data stored at healthcare facilities into our system for further processing. Most healthcare facilities agreed to send us appointment data as CSV (Comma-Separated Values) files, over a secure channel using SFTP (Secure File Transfer Protocol). To help them send us data using SFTP, we used Paramiko.

With Paramiko, we developed and deployed a small Python program on our customers’ machines. The program would send CSV files, containing appointment data, to our servers over SFTP. Paramiko took care of all the SFTP implementation details. The code was simple, and it worked out well for our purposes. The initial program was roughly 100 lines and worked with minimal maintenance for the first 4-5 years.

Project home: http://www.paramiko.org/

The built-in CSV module

A CSV file is supposed to be simple: each row represents a record, and fields are separated by commas.

But, in actuality, CSV files are rarely simple. In real life, we deal with all kinds of odd-looking ones. I have seen CSVs with delimiters such as a tab, a space, and a colon. CSVs without a header are also quite common.

Things can get worse. Sometimes you are asked to parse a CSV from an important potential customer, who could increase your revenue manifold, in which the header starts on a random line below a random number of blank lines. Imagine doing this parsing under a tight deadline.

Since the healthcare startup I worked for received appointment data in CSVs from healthcare facilities, CSV parsing was at the heart of the system. All of the data had to be correctly retrieved from CSVs for further processing.

To process the incoming CSVs, we wrote the backend code on top of the built-in CSV module. The code evolved over time, with the built-in CSV module playing an important role.

This module provides many effective utilities. We relied on the flexibility of its reader object, which can be configured with various formatting parameters (e.g. delimiter, quotechar, doublequote, escapechar, quoting, lineterminator). Each medical facility was assigned appropriate formatting parameter values in the database. This, coupled with a reasonably designed object-oriented hierarchy of healthcare facilities, gave us the flexibility to handle differently formatted CSVs with relative ease.
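The idea can be sketched as follows. The facility names and dialect values are invented, and the real system looked parameters up in the database rather than a dict, but the `csv.reader` usage is the same.

```python
import csv
import io

# Per-facility formatting parameters, as they might be stored in the database.
# The facility names and dialect values here are hypothetical.
FACILITY_DIALECTS = {
    "facility_a": {"delimiter": ",", "quotechar": '"'},
    "facility_b": {"delimiter": "\t", "quotechar": "'"},
}


def read_appointments(raw_text, facility):
    """Parse one facility's CSV using that facility's formatting parameters."""
    params = FACILITY_DIALECTS[facility]
    reader = csv.reader(io.StringIO(raw_text), **params)
    return [row for row in reader if row]  # skip blank lines


# facility_b sends tab-delimited files in this sketch.
rows = read_appointments("id\tname\n1\tAlice\n", "facility_b")
# rows == [["id", "name"], ["1", "Alice"]]
```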

The code was sensitive. A single error could disrupt appointment processing for many medical facilities, affecting thousands of patients at a time. Lost information could not easily be backfilled: an appointment, once missed, was gone unless the facility re-sent it. Thus, to avoid regressions, we covered our CSV-processing code with solid unit tests. The effort paid off: few regressions popped up over time.

SQLAlchemy

The startup had a web application as well, which our customers used on a daily basis. The legacy web application was written in PHP, but we chose to convert it into a Python-based application. This led us to a decision point: since the backend already used Python, we wanted a common object-relational mapping (ORM) layer that both the backend and the web application could share, to avoid duplication.

For the ORM, we settled on SQLAlchemy. It did not disappoint us.

Using SQLAlchemy, the overall code remained maintainable and reusable because of the way we used mixins, classes, and magic methods. Each database table was represented by a model class built on mixins. The models also made judicious use of magic methods and evolved over time with short methods containing the relevant business rules. In addition, great unit test coverage around this code meant it scaled nicely and was easy to refactor when needed.
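A toy version of that mixin-plus-model pattern, using an in-memory SQLite database (the table, columns, and `__repr__` are invented for illustration; the real schema was of course richer):

```python
import datetime

from sqlalchemy import create_engine, Column, DateTime, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class TimestampMixin:
    # Shared column that every model picks up by inheriting the mixin.
    created_at = Column(DateTime, default=datetime.datetime.utcnow)


class Appointment(TimestampMixin, Base):
    __tablename__ = "appointments"
    id = Column(Integer, primary_key=True)
    patient_name = Column(String)

    def __repr__(self):  # a "magic method" giving readable debug output
        return f"<Appointment {self.id}: {self.patient_name}>"


engine = create_engine("sqlite://")  # in-memory database for the sketch
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add(Appointment(patient_name="Alice"))
session.commit()
count = session.query(Appointment).count()
```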

One caveat, though. In the initial days, some features were implemented after only a cursory reading of the SQLAlchemy documentation. That was a mistake we should have avoided. We learned to always consult the documentation of a major system component in detail, so as to understand it in depth, before releasing software. Shallow knowledge of such a core component can delay the fixing of critical bugs as well as the release of important features, and such delays can prove costly for startups.

Project home: https://www.sqlalchemy.org/

Requests

I really don’t need to say anything nice about this library, because everyone already knows how nice Requests is. It’s easy to use and results in readable, Pythonic code.
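For completeness, here is what that readability looks like in practice. The endpoint is hypothetical; only the Requests calls themselves are real API.

```python
import requests


def fetch_status(url, timeout=5):
    """Fetch a JSON status document (hypothetical endpoint, real Requests API)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of failing silently
    return response.json()


if __name__ == "__main__":
    print(fetch_status("https://api.example.com/status"))
```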

On a side note, Kenneth Reitz, creator of Requests, is a great engineer; reading his code is always a treat. I really like his idea of Readme-driven development — i.e., writing the README file, filled with examples, before actually writing any code. I have used this technique and found it very useful.

Project home: https://2.python-requests.org/en/master/

BeautifulSoup

Just like Requests, BeautifulSoup is another very useful library. We had to parse XML-based appointment data for a few healthcare facilities, and BeautifulSoup worked flawlessly. In many instances we also successfully used it to parse HTML documents.
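A small sketch of the kind of parsing involved. The XML structure here is invented (real facility feeds varied), and the stdlib `html.parser` backend is used so the example runs without lxml.

```python
from bs4 import BeautifulSoup

# A hypothetical XML appointment feed; real feeds differed per facility.
xml = """
<appointments>
  <appointment><patient>Alice</patient><time>2019-05-01 09:00</time></appointment>
  <appointment><patient>Bob</patient><time>2019-05-01 10:30</time></appointment>
</appointments>
"""

soup = BeautifulSoup(xml, "html.parser")  # the "xml" backend would need lxml
patients = [a.patient.get_text() for a in soup.find_all("appointment")]
# patients == ["Alice", "Bob"]
```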

Project home: https://www.crummy.com/software/BeautifulSoup/

TestScenarios

Unit testing was always taken seriously. The backend code had a large suite of unit tests and end-to-end tests, and no critical code was shipped to production without extensive coverage. This coverage also helped the backend team maintain its cadence.

The system had a lot of configurations and settings. For each feature, a lot of combinations of input data needed to be tested against expected outcomes. A test case for each of the input combinations would have resulted in a lot of redundant code.

So in order to write more compact, maintainable code we looked for other options that we could use on top of the built-in unittest module.

Eventually, we settled on a cool library called TestScenarios. For each feature, all of the different input combinations and their corresponding expected outcomes were defined in scenarios, which would then be executed by test code. Essentially, one piece of feature-specific test code executed all the combinations, one by one.

A major benefit of this approach was that it scaled nicely as the code was maintained. Whenever a feature was tweaked, all an engineer had to do was add another scenario (input data plus expected output). In some cases, no Python code needed to be added at all: just add another scenario and we were done.
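TestScenarios applies each scenario's attributes to a test-case subclass; the same scenario-driven idea can be approximated with only the standard library's `subTest`, which is what this sketch does (the business rule and scenarios are invented stand-ins):

```python
import unittest


def reminder_channel(has_mobile, has_email):
    """Toy business rule standing in for the real reminder logic."""
    if has_mobile:
        return "sms"
    if has_email:
        return "email"
    return "ivr"


class ReminderChannelTest(unittest.TestCase):
    # Each scenario is (name, inputs, expected). Adding a case means
    # adding one tuple here -- no new test code.
    scenarios = [
        ("mobile_wins", dict(has_mobile=True, has_email=True), "sms"),
        ("email_fallback", dict(has_mobile=False, has_email=True), "email"),
        ("ivr_last_resort", dict(has_mobile=False, has_email=False), "ivr"),
    ]

    def test_scenarios(self):
        for name, inputs, expected in self.scenarios:
            with self.subTest(name):
                self.assertEqual(reminder_channel(**inputs), expected)


result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(ReminderChannelTest))
```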

Project home: https://pypi.org/project/testscenarios/

python-hl7

At one point, we started getting customers who wanted to send us appointment data in HL7 format. The python-hl7 library got us up to speed fairly quickly: we didn’t have to write our own code to parse data out of HL7 segments as per the HL7 protocol.

The availability of this library, and Python’s batteries-included philosophy in general, made it possible to onboard a customer quickly even when the system lacked a feature the customer asked for. That ability sometimes proves invaluable for resource-constrained startups. In our case, for potential customers interested in sending us HL7 data, we got a working feature up and running in a week or so using this library.
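To show what the library spares you, here is a deliberately naive stdlib sketch of HL7 v2 segment parsing. The message fragment is made up, and this toy version ignores escaping, repeating fields, and subcomponents, all of which python-hl7's `hl7.parse()` handles properly.

```python
# A tiny, hypothetical HL7 v2 fragment; segments end with carriage returns.
message = "MSH|^~\\&|SENDER|FACILITY|...\rPID|1||12345||DOE^JANE\r"


def parse_segments(msg):
    """Naively split an HL7 message into {segment_type: fields} (toy version).

    python-hl7 does this for real, including escape sequences and
    repetition, which this sketch ignores.
    """
    segments = {}
    for line in msg.strip("\r").split("\r"):
        fields = line.split("|")
        segments[fields[0]] = fields
    return segments


pid = parse_segments(message)["PID"]
# pid[5] == "DOE^JANE"  (the patient-name field)
```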

Project home: https://python-hl7.readthedocs.io/en/latest/

phonenumbers

Since our application sent appointment reminders to patients via SMS and IVR (in addition to email), we had to parse and validate patients’ phone numbers before sending anything. The numbers arrived in many different formats in the appointment data, and we wanted to be sure we used the correct ones. The phonenumbers library proved immensely helpful for that.

Project home: https://github.com/daviddrysdale/python-phonenumbers



Gevent

We had background scripts that eventually sent the appointment reminders to patients. These scripts had to do many things at once, so to improve their performance we used Gevent.
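The pattern looks roughly like this. The reminder function and timings are invented; real scripts would do network I/O (and typically call `gevent.monkey.patch_all()` early so blocking stdlib calls become cooperative).

```python
import gevent


def send_reminder(appointment_id):
    """Stand-in for I/O-bound work (e.g. calling an SMS or email API)."""
    gevent.sleep(0.01)  # simulates network latency
    return appointment_id


# Fire off many "reminders" concurrently instead of one after another.
jobs = [gevent.spawn(send_reminder, i) for i in range(10)]
gevent.joinall(jobs)
results = [job.value for job in jobs]
```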

This worked out pretty well and stood the test of time. However, we did come across a few issues. For caveats, please read this good blog post by Mixpanel Engineering.

Project home: http://www.gevent.org/

dateutil

The backend code had to deal extensively with date-times and time zones, since the healthcare facilities were situated in different time zones. We used the built-in datetime module for the first couple of years, but then a team member suggested we give dateutil a try (for computing relative deltas, parsing dates from strings, handling time zones, etc.). We did, and never looked back.
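A quick tour of the three features just mentioned, with made-up appointment data:

```python
from datetime import datetime

from dateutil import parser, tz
from dateutil.relativedelta import relativedelta

# Parse a date out of a loosely formatted string.
appointment = parser.parse("May 1, 2019 9:00 AM")

# Attach a facility's time zone and convert to a patient's zone.
eastern = appointment.replace(tzinfo=tz.gettz("America/New_York"))
pacific = eastern.astimezone(tz.gettz("America/Los_Angeles"))

# Relative arithmetic: the same time exactly one month later.
next_month = appointment + relativedelta(months=1)
```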

The backend code that dealt with date-times and time zone information was sensitive. Any small miscalculation could result in chaos (no one wants to receive a 9 a.m. appointment reminder at 3 a.m.). Therefore, we covered this code extensively with unit tests as well. Overall, we were happy with the choice.

Project home: https://dateutil.readthedocs.io/en/stable/

Matplotlib

We had to generate visualizations for reporting purposes across various backend projects. The obvious choice was Matplotlib, and it proved to be a great one: we were able to generate different kinds of visualizations quickly and without difficulty.
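A minimal server-side chart, with hypothetical reporting numbers; the Agg backend renders straight to a file, which suits scripts with no display attached.

```python
import matplotlib
matplotlib.use("Agg")  # render to files; no display needed on servers
import matplotlib.pyplot as plt

# Hypothetical reporting data: reminders sent per weekday.
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
counts = [120, 135, 128, 140, 110]

fig, ax = plt.subplots()
ax.bar(days, counts)
ax.set_xlabel("Day")
ax.set_ylabel("Reminders sent")
ax.set_title("Reminders per day (sample data)")
fig.savefig("reminders.png")
plt.close(fig)
```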

Project home: https://matplotlib.org/

python-magic

The backend had to deal with many kinds of files. At one point, we wanted to be able to identify file types before processing them. The python-magic library made that task a breeze.
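python-magic wraps libmagic, which recognizes files by their leading "magic bytes". This stdlib toy version checks a few signatures by hand just to illustrate the idea; the library itself knows vastly more formats.

```python
# A few well-known file signatures ("magic bytes") and their MIME types.
SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"PK\x03\x04": "application/zip",
}


def sniff_type(data, default="application/octet-stream"):
    """Guess a file's type from its leading bytes (toy version of libmagic)."""
    for magic_bytes, mime in SIGNATURES.items():
        if data.startswith(magic_bytes):
            return mime
    return default


mime = sniff_type(b"%PDF-1.7 ...")
# mime == "application/pdf"
```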

Project home: https://github.com/ahupp/python-magic

Django

As mentioned earlier, the web application needed to be converted to Python. Although multiple web frameworks were considered, the business stakeholders, CTO, and engineers were all inclined toward Django. Over time, Django proved to be a useful choice.

A variety of Django features proved to be great assets and time-savers. Since we had chosen SQLAlchemy as our ORM, we kept using it with Django without any problems. Initially we used a lot of jQuery, but the front-end was eventually rewritten in AngularJS. We also moved to a RESTful architecture; one benefit of that move was that we could expose various endpoints for internal and external use, widening the scope of applications that could benefit from common code.

The Django code was also covered with a lot of unit tests. Using Selenium, we created tests to validate functionality in various browsers, and a decent set of end-to-end tests exercised the major system-wide use cases. This comprehensive coverage (unit tests, cross-browser tests, and end-to-end tests) helped engineers ship with confidence and maintain their cadence.

A vast number of features in Django and the plethora of available Django plugins made the job really easy in terms of maintenance and releasing often.

Project home: https://www.djangoproject.com/

Boto

The backend code used various Amazon Web Services (AWS). Life would not have been easy without Boto.

Project home: https://github.com/boto/boto

Mailgun and Twilio’s Python bindings

Apart from all the libraries above, the Python bindings for Mailgun (for email) and Twilio (for SMS and IVR) deserve an honorable mention, because they helped us send appointment reminders seamlessly.
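Mailgun's API is plain HTTP, so sending a reminder email is a short Requests call. The sketch below follows Mailgun's documented messages endpoint; the API key, domain, and addresses are placeholders.

```python
import requests


def send_reminder_email(api_key, domain, to_address, subject, text):
    """Send an email through Mailgun's HTTP API (sketch; values are placeholders)."""
    return requests.post(
        f"https://api.mailgun.net/v3/{domain}/messages",
        auth=("api", api_key),  # Mailgun uses HTTP basic auth with user "api"
        data={
            "from": f"Reminders <reminders@{domain}>",
            "to": [to_address],
            "subject": subject,
            "text": text,
        },
        timeout=10,
    )


if __name__ == "__main__":
    send_reminder_email("key-xxxx", "example.com", "patient@example.com",
                        "Appointment reminder",
                        "You have an appointment tomorrow at 9:00 AM.")
```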

