开发者

Optimizing Python Code for Database Access

开发者 https://www.devze.com 2023-01-17 08:03 出处:网络
I am building an application with objects which have their data stored in mysql tables (across multiple tables). When I need to work with the object (retrieve object attributes / change the attributes

I am building an application with objects which have their data stored in mysql tables (across multiple tables). When I need to work with the object (retrieve object attributes / change the attributes) I am querying the sql database using mysqldb (select / update). However, since the application is quite computation intensive, the execution time is killing me.

Wanted to understand if there are approaches where all of the data is loaded into python, the computations / modifications are done on those objects and then subsequently a full data update is done to the mysql database? Will loading the data initially into lists of those objects in one go from the database improve the performance? Also since the db size is close to around 25 mb, will it cause any memory problems.

Thanks i开发者_如何学Cn advance.


25Mb is tiny. Microscopic. SQL is slow. Glacial.

Do not waste time on SQL unless you have transactions (with locking and multiple users).

If you're doing "analysis", especially computationally-intensive analysis, load all the data into memory.

In the unlikely event that data doesn't fit into memory, then do this.

  1. Query data into flat files. This can be fast. It's fastest if you don't use Python, but use the database native tools to extract data into CSV or something small.

  2. Read flat files and do computations, writing flat files. This is really fast.

  3. Do bulk updates from the flat files. Again, this is fastest if you use database native toolset for insert or update.


If you didn't need SQL in the first place, consider the data as you originally received it and what you're going to do with it.

  1. Read the original file once, parse it, create your Python objects and pickle the entire list or dictionary. This means that each subsequent program can simply load the pickled file and start doing analysis. However. You can't easily update the pickled file. You have to create a new one. This is not a bad thing. It gives you complete processing history.

  2. Read the original file once, parse it, create your Python objects using shelve. This means you can update the file.

  3. Read the original file once, parse it, create your Python objects and save the entire list or dictionary as a JSON or YAML file. This means that each subsequent program can simply load the JSON (or YAML) file and start doing analysis. However. You can't easily update the file. You have to create a new one. This is not a bad thing. It gives you complete processing history.

    This will probably be slightly slower than pickling. And it will require that you write some helpers so that the JSON objects are dumped and loaded properly. However, you can read JSON (and YAML) giving you some advantages in working with the file.


Please check sqlalchemy, an Object Relational Mapper for Python.

sqlalchemy allows you to map database tables to Python objects. When you do this, all the operations can be done on the Python objects (once the data is loaded), and when you are done processing, you can update the database.

Assuming you have a baseline of-the-shelf computer, 25 MB is absolutely no big deal, you can cache the entire database into memory.

0

精彩评论

暂无评论...
验证码 换一张
取 消