Generating zip files in memory with Python

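I recently needed Django to pack a few files into a zip archive on the fly and offer it for download. After some searching, I found a nice way to generate zip files in memory with Python, using the zipfile and StringIO modules.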

Here is the main code (with a few small changes of my own):

# -*- coding: utf-8 -*-

u"""
Generate zip files in memory with Python
@see http://www.kompato.com/in-memory-zip-in-python
"""

import os
import zipfile
import StringIO

class InMemoryZip(object):

	def __init__(self):
		# Create the in-memory file-like object
		self.in_memory_zip = StringIO.StringIO()


	def appendFile(self, file_path, file_name=None):
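		u"Read a file from local disk and add it to the zip archive"

		if file_name is None:
			# Default to the file's own name, without its directory
			fn = os.path.basename(file_path)
		else:
			fn = file_name

		# Read in binary mode and close the file handle promptly
		with open(file_path, "rb") as f:
			c = f.read()
		self.append(fn, c)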

		return self


	def append(self, filename_in_zip, file_contents):
		"""Appends a file with name filename_in_zip and contents of
				  file_contents to the in-memory zip."""

		# Get a handle to the in-memory zip in append mode
		zf = zipfile.ZipFile(self.in_memory_zip, "a", zipfile.ZIP_DEFLATED, False)

		# Write the file to the in-memory zip
		zf.writestr(filename_in_zip, file_contents)

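		# Mark the files as having been created on Windows so that
		# Unix permissions are not inferred as 0000
		for zfile in zf.filelist:
			zfile.create_system = 0

		# Close explicitly so the central directory is written to the buffer
		# now, instead of relying on ZipFile.__del__ to do it
		zf.close()

		return self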


	def read(self):
		"""Returns a string with the contents of the in-memory zip."""

		self.in_memory_zip.seek(0)

		return self.in_memory_zip.read()


	def writetofile(self, filename):
		"""Writes the in-memory zip to a file."""

		with open(filename, "wb") as f:
			f.write(self.read())


if __name__ == "__main__":
	# Run a test
	imz = InMemoryZip()
	imz.append("test.txt", "Another test").append("test2.txt", "Still another")
	imz.writetofile("test.zip")
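
The class above targets Python 2: the StringIO module and the file builtin no longer exist in Python 3. What follows is a minimal Python 3 port, not the code from the linked article. It assumes io.BytesIO as the in-memory buffer in place of StringIO.StringIO. The method names are kept, so it can stand in for the original in the Django example below. Each append still reopens the archive in "a" mode, and leaving the with-block closes the ZipFile, which writes the central directory back into the buffer.

```python
import io
import os
import zipfile


class InMemoryZip(object):
    """Python 3 sketch: io.BytesIO stands in for StringIO.StringIO."""

    def __init__(self):
        # In-memory bytes buffer that holds the whole archive
        self.in_memory_zip = io.BytesIO()

    def appendFile(self, file_path, file_name=None):
        """Read a file from local disk and add it to the archive."""
        fn = file_name if file_name is not None else os.path.basename(file_path)
        with open(file_path, "rb") as f:
            contents = f.read()
        return self.append(fn, contents)

    def append(self, filename_in_zip, file_contents):
        """Add one member; file_contents may be str (stored as UTF-8) or bytes."""
        # Reopen the buffer in append mode; leaving the with-block closes the
        # ZipFile, which writes the central directory into the buffer
        with zipfile.ZipFile(self.in_memory_zip, "a", zipfile.ZIP_DEFLATED) as zf:
            zf.writestr(filename_in_zip, file_contents)
            # As in the original: mark entries as created on Windows so that
            # Unix permissions are not inferred as 0000
            for zinfo in zf.filelist:
                zinfo.create_system = 0
        return self

    def read(self):
        """Return the archive contents as bytes."""
        return self.in_memory_zip.getvalue()

    def writetofile(self, filename):
        """Write the archive to disk in binary mode."""
        with open(filename, "wb") as f:
            f.write(self.read())
```

Usage is the same as before. For example, `InMemoryZip().append("test.txt", "Another test").writetofile("test.zip")` writes a one-member archive. In Python 3 the value returned by read() is bytes rather than str.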

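I changed only two things:
1. Added the appendFile method;
2. Changed the file mode in writetofile from "w" to "wb", because in my tests on Windows, "w" could leave the saved file incomplete. (In text mode on Windows, every "\n" byte written is translated to "\r\n", which corrupts binary data such as a zip archive.)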
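
One way to confirm that a saved archive really is complete is to compare its size on disk with the in-memory bytes, then let zipfile check every member's CRC with testzip(). The helper below is a small sketch, and the name save_and_verify is hypothetical:

```python
import os
import zipfile


def save_and_verify(zip_bytes, path):
    """Hypothetical helper: write archive bytes to path, then check the saved copy."""
    # "wb" writes the bytes unchanged; text mode on Windows would turn
    # b"\n" into b"\r\n" and corrupt the archive
    with open(path, "wb") as f:
        f.write(zip_bytes)
    # A complete copy must be exactly as long as the in-memory archive
    if os.path.getsize(path) != len(zip_bytes):
        return False
    # testzip() reads every member and returns the first bad name, or None
    with zipfile.ZipFile(path) as zf:
        return zf.testzip() is None
```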

Using it in Django is simple. Here is a basic example:

# ...
from django.http import HttpResponse

def downloadFiles(request):

	fns = ("/tmp/1.log", "/tmp/2.log", "/tmp/3.log",)
	imz = InMemoryZip()
	for fn in fns:
		imz.appendFile(fn)

	data = imz.read()

	# content_type replaces the mimetype argument, which Django 1.7 removed
	response = HttpResponse(content_type="application/octet-stream")
	response["Content-Disposition"] = 'attachment; filename="log.zip"'
	response["Content-Length"] = str(len(data))
	response.write(data)

	return response
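
The Django view can also be written in a framework-neutral way that is easy to test on its own. The sketch below builds the response body together with the headers the view sets. Unlike InMemoryZip, it opens a single ZipFile in "w" mode for all files instead of reopening the archive in "a" mode for each one. The function name build_zip_download and its archive_name parameter are assumptions of this sketch:

```python
import io
import os
import zipfile


def build_zip_download(paths, archive_name="log.zip"):
    """Hypothetical helper: return (body, headers) for a zip attachment."""
    buf = io.BytesIO()
    # One ZipFile for all members, closed once at the end of the with-block
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            with open(path, "rb") as f:
                # Store each file under its base name, as appendFile does
                zf.writestr(os.path.basename(path), f.read())
    body = buf.getvalue()
    headers = {
        "Content-Type": "application/octet-stream",
        # The download name belongs in the filename parameter
        "Content-Disposition": 'attachment; filename="%s"' % archive_name,
        "Content-Length": str(len(body)),
    }
    return body, headers
```

In a Django view, body could be passed to HttpResponse and each header copied onto the response.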

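This approach is convenient, but it is resource-hungry. When I used it in Django to compress a 1.4 GB text file, it took about 8 minutes, with CPU usage at 100% the whole time. Each file is also read fully into memory before it is compressed, and the finished archive is held in memory as well. So if you need to compress large files, or compress frequently, you will probably need to take performance seriously.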
