Speed tests for writing to MSSQL from pandas (ODBC)¶

In this notebook, we'll be playing with various methods of writing data to a Microsoft SQL Server database. Compared to PostgreSQL, which I was using previously, the connection is roughly an order of magnitude slower. This is apparently a known issue. I may not have dug deep enough to understand the underlying reasons, but it is not something I can work around.

Transmitting larger volumes of data can also fail outright: I run into memory and connection timeout errors that I did not see previously.

Some sources attribute this to the ODBC driver; whether that is the real cause, I do not know. The naive call to the `.to_sql()` method, per the signature in the pandas docs, is:

```python
df.to_sql(name, con, schema=None, if_exists='fail', index=True,
          index_label=None, chunksize=None, method=None)
```
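As a minimal runnable illustration of that naive call, here is a sketch using an in-memory SQLite engine as a stand-in (so it runs without a server); with SQL Server the connection URL would instead be of the `mssql+pyodbc://...` form, and the table/column names below are made up for the example:

```python
import pandas as pd
from sqlalchemy import create_engine

# Stand-in engine: in-memory SQLite instead of MSSQL, so the example runs anywhere.
engine = create_engine("sqlite://")

df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

# Naive insert with default parameters: one INSERT per row.
df.to_sql("test_table", engine, if_exists="fail", index=False)

print(pd.read_sql("SELECT COUNT(*) AS n FROM test_table", engine)["n"][0])  # 3
```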

There are various ways known to speed up the process:

1. Use the `method='multi'` parameter in the `to_sql()` method call
2. Add the `fast_executemany=True` parameter to the SQLAlchemy engine
3. Use both methods 1 and 2
4. Use `sqlalchemy_turbodbc` instead of vanilla SQLAlchemy
5. Use `pymssql` instead of SQLAlchemy
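Methods 1 and 2 can be sketched as follows. This again uses an in-memory SQLite engine so the snippet runs without a database server; the commented-out `mssql+pyodbc` URL and credentials are placeholders, not a real connection string:

```python
import pandas as pd
from sqlalchemy import create_engine

# Method 2: pass fast_executemany=True when creating an mssql+pyodbc engine
# (pyodbc-specific; commented out here because it needs a real SQL Server):
# engine = create_engine("mssql+pyodbc://user:password@my_dsn",
#                        fast_executemany=True)

# Stand-in engine so the sketch runs anywhere:
engine = create_engine("sqlite://")

df = pd.DataFrame({"id": range(1000), "value": [str(i) for i in range(1000)]})

# Method 1: method='multi' packs many rows into each INSERT statement.
# chunksize caps rows per statement to stay under the backend's parameter
# limit (SQL Server allows at most 2100 parameters per statement).
df.to_sql("bench_table", engine, if_exists="replace", index=False,
          method="multi", chunksize=100)

print(pd.read_sql("SELECT COUNT(*) AS n FROM bench_table", engine)["n"][0])  # 1000
```

Combining both (method 3) is just the `method='multi'` call executed against an engine created with `fast_executemany=True`.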

But so far I have not seen them benchmarked side by side. In this article, we will compare these methods to find the fastest way to write data from a pandas DataFrame to Microsoft SQL Server.

Failed implementations¶

I also tried the following methods, but each had an issue that kept it out of the comparison. This includes: