Efficient Memory/Disk Data Containers With Python

By Francesc Alted

Many of the programming paradigms reaching us nowadays promise to be faster by leveraging more cores and more machines (and more system administration headaches, though this is rarely stated). In reality, these paradigms often fail to take into account the growing mismatch between memory speed and CPU speed (see http://www.blosc.org/docs/StarvingCPUs-CISE-2010.pdf), and addressing this mismatch is becoming critical for getting maximum performance out of your data handling applications.

During my tutorial, I will introduce different data containers for handling different kinds of data, and will propose experimenting with them while explaining why some adapt better than others to the task at hand. I will start with a quick introduction to Python data containers (lists, dicts, arrays…), continue with the well-known in-memory NumPy and pandas containers as well as on-disk HDF5/PyTables, and end with bcolz (https://github.com/Blosc/bcolz), a novel way to store and quickly retrieve data that uses chunking and compression techniques to leverage the memory hierarchy of modern computer architectures.
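To make the idea of chunking plus compression concrete, here is a minimal sketch that uses only the Python standard library. It is not bcolz's API or implementation: the class name ChunkedCompressedArray, the chunk length, and the use of zlib as a stand-in compressor are all assumptions made for illustration. It compares the approximate memory footprint of a plain list, a typed array, and a toy container that keeps its data as compressed chunks.

```python
import sys
import zlib
from array import array

CHUNK_LEN = 4096  # items per chunk; an arbitrary choice for this sketch


class ChunkedCompressedArray:
    """Hypothetical toy container: 64-bit ints kept as zlib-compressed chunks."""

    def __init__(self, values=(), chunk_len=CHUNK_LEN):
        self.chunk_len = chunk_len
        self.chunks = []        # full chunks, each stored compressed
        self.tail = array("q")  # leftover items, stored uncompressed
        self.extend(values)

    def extend(self, values):
        for v in values:
            self.tail.append(v)
            if len(self.tail) == self.chunk_len:
                # A chunk is full: compress it and start a new tail.
                self.chunks.append(zlib.compress(self.tail.tobytes()))
                self.tail = array("q")

    def __len__(self):
        return len(self.chunks) * self.chunk_len + len(self.tail)

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        chunk_idx, offset = divmod(i, self.chunk_len)
        if chunk_idx == len(self.chunks):
            return self.tail[offset]
        # Only the chunk holding the requested item is decompressed.
        chunk = array("q")
        chunk.frombytes(zlib.decompress(self.chunks[chunk_idx]))
        return chunk[offset]

    def nbytes(self):
        """Bytes used by the stored data (container overhead ignored)."""
        compressed = sum(len(c) for c in self.chunks)
        return compressed + len(self.tail) * self.tail.itemsize


n = 100_000
as_list = list(range(n))
as_array = array("q", range(n))
as_chunked = ChunkedCompressedArray(range(n))

# A list stores pointers to separate int objects; count both.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
array_bytes = len(as_array) * as_array.itemsize
chunked_bytes = as_chunked.nbytes()

print("list:   ", list_bytes, "bytes")
print("array:  ", array_bytes, "bytes")
print("chunked:", chunked_bytes, "bytes")
```

The trade-off in this sketch is that reading one item costs a chunk decompression, while the stored data takes less space. How much is saved depends heavily on how compressible the data is; a regular sequence like the one used here is a favorable case.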

Attendees will need a working Python setup with IPython notebook, NumPy, pandas, PyTables and bcolz installed. The Anaconda or Enthought Canopy distributions are recommended, but any other means of installation (e.g. pip) will do.

On Wednesday 22 July at 11:00

Comments

It's listed as talk but the text says "tutorial" ????
— Oliver Thieke, 12 May 2015
Yes, that should be a tutorial, not sure why this is labeled as 'talk'. I opened a ticket about this already, but will ping again.
— Francesc Alted, 18 May 2015
