AES encryption of files in Python with PyCrypto
2013-07-04 17:34
211 查看
转自:http://eli.thegreenplace.net/2010/06/25/aes-encryption-of-files-in-python-with-pycrypto/
The PyCrypto module seems to provide all one needs for employing strong cryptography in a program. It wraps a highly optimized C implementation of many popular encryption algorithms with a
Python interface. PyCrypto can be built from source on Linux, and Windows binaries for various versions of Python 2.x were kindly made available by Michael Foord on
this page.
My only gripe with PyCrypto is its documentation. The auto-generated
API doc is next to useless, and
this overview is somewhat dated and didn’t address the questions I had about the module. It isn’t surprising that a few modules were created just to provide simpler and better documented wrappers around PyCrypto.
In this article I want to present how to use PyCrypto for simple symmetric encryption and decryption of files using the AES algorithm.
Since the PyCrypto block-level encryption API is very low-level, it expects your key to be either 16, 24 or 32 bytes long (for AES-128, AES-196 and AES-256, respectively). The longer the key, the stronger the encryption.
Having keys of exact length isn’t very convenient, as you sometimes want to use some mnemonic password for the key. In this case I recommend picking a password and then using the SHA-256 digest algorithm from
hashlib to generate a 32-byte key from it. Just replace the assignment to
key in the code above with:
Keep in mind that this 32-byte key only has as much entropy as your original password. So be wary of brute-force password guessing, and pick a relatively strong password (kitty probably won’t do). What’s useful about this technique is that you don’t
have to worry about manually padding your password – SHA-256 will scramble a 32-byte block out of any password for you.
The next thing the code does is set the
block mode of AES. I won’t get into all the details, but unless you have some special requirements, CBC should be good enough for you.
We create a new AES encryptor object with Crypto.Cipher.AES.new, and give it the encryption key and the mode. Next comes the encryption itself. Again, since the API is low-level, the
encrypt method expects your input to consist of an integral number of 16-byte blocks (16 is the size of the basic AES block).
The encryptor object has an internal state when used in the CBC mode, so if you try to encrypt the same text with the same encryptor once again – you will get different results. So be careful
to create a fresh AES encryptor object for any encryption/decryption job.
And you get your plaintext back again.
initialization vector (IV) is an important part of block encryption algorithms that work in chained modes like CBC. For the simple example above I’ve ignored the IV, but for a more serious application this is a grave mistake. I don’t want to get too deep
into cryptographic theory here, but it suffices to say that the IV is as important as the salt in hashed passwords, and the lack of correct IV usage led to the cracking of the
WEP encryption for wireless LAN.
PyCrypto allows one to pass an IV into the
AES.new creator function. For maximal security, the IV should be randomly generated for every new encryption and can be stored together with the ciphertext. Knowledge of the IV won’t help the attacker
crack your encryption. What can help him, however, is your reusing the same IV with the same encryption key for multiple encryptions.
Since it might have to pad the file to fit into a multiple of 16, the function saves the original file size in the first 8 bytes of the output file (more precisely, the first
sizeof(long
long) bytes). It randomly generates a 16-byte IV and stores it in the file as well. Then, it reads the input file chunk by chunk (with chunk size configurable), encrypts the chunk and writes it to the output. The last chunk is padded with spaces,
if required.
Working in chunks makes sure that large files can be efficiently processed without reading them wholly into memory. For example, with the default chunk size it takes about 1.2 seconds on my computer to encrypt a 50MB file. PyCrypto is fast!
Decrypting the file can be done with:
First the original size of the file is read from the first 8 bytes of the encrypted file. The IV is read next to correctly initialize the AES object. Then the file is decrypted in chunks, and finally it’s truncated to the original size, so the padding is
thrown out.
The PyCrypto module seems to provide all one needs for employing strong cryptography in a program. It wraps a highly optimized C implementation of many popular encryption algorithms with a
Python interface. PyCrypto can be built from source on Linux, and Windows binaries for various versions of Python 2.x were kindly made available by Michael Foord on
this page.
My only gripe with PyCrypto is its documentation. The auto-generated
API doc is next to useless, and
this overview is somewhat dated and didn’t address the questions I had about the module. It isn’t surprising that a few modules were created just to provide simpler and better documented wrappers around PyCrypto.
In this article I want to present how to use PyCrypto for simple symmetric encryption and decryption of files using the AES algorithm.
Simple AES encryption
Here’s how one can encrypt a string with AES:from Crypto.Cipher import AES key = '0123456789abcdef' mode = AES.MODE_CBC encryptor = AES.new(key, mode) text = 'j' * 64 + 'i' * 128 ciphertext = encryptor.encrypt(text)
Since the PyCrypto block-level encryption API is very low-level, it expects your key to be either 16, 24 or 32 bytes long (for AES-128, AES-196 and AES-256, respectively). The longer the key, the stronger the encryption.
Having keys of exact length isn’t very convenient, as you sometimes want to use some mnemonic password for the key. In this case I recommend picking a password and then using the SHA-256 digest algorithm from
hashlib to generate a 32-byte key from it. Just replace the assignment to
key in the code above with:
import hashlib password = 'kitty' key = hashlib.sha256(password).digest()
Keep in mind that this 32-byte key only has as much entropy as your original password. So be wary of brute-force password guessing, and pick a relatively strong password (kitty probably won’t do). What’s useful about this technique is that you don’t
have to worry about manually padding your password – SHA-256 will scramble a 32-byte block out of any password for you.
The next thing the code does is set the
block mode of AES. I won’t get into all the details, but unless you have some special requirements, CBC should be good enough for you.
We create a new AES encryptor object with Crypto.Cipher.AES.new, and give it the encryption key and the mode. Next comes the encryption itself. Again, since the API is low-level, the
encrypt method expects your input to consist of an integral number of 16-byte blocks (16 is the size of the basic AES block).
The encryptor object has an internal state when used in the CBC mode, so if you try to encrypt the same text with the same encryptor once again – you will get different results. So be careful
to create a fresh AES encryptor object for any encryption/decryption job.
Decryption
To decrypt the ciphertext, simply add:decryptor = AES.new(key, mode) plain = decryptor.decrypt(ciphertext)
And you get your plaintext back again.
A word about the initialization vector
Theinitialization vector (IV) is an important part of block encryption algorithms that work in chained modes like CBC. For the simple example above I’ve ignored the IV, but for a more serious application this is a grave mistake. I don’t want to get too deep
into cryptographic theory here, but it suffices to say that the IV is as important as the salt in hashed passwords, and the lack of correct IV usage led to the cracking of the
WEP encryption for wireless LAN.
PyCrypto allows one to pass an IV into the
AES.new creator function. For maximal security, the IV should be randomly generated for every new encryption and can be stored together with the ciphertext. Knowledge of the IV won’t help the attacker
crack your encryption. What can help him, however, is your reusing the same IV with the same encryption key for multiple encryptions.
Encrypting and decrypting files
The following function encrypts a file of any size. It makes sure to pad the file to a multiple of the AES block length , and also handles the random generation of IV.import os, random, struct from Crypto.Cipher import AES def encrypt_file(key, in_filename, out_filename=None, chunksize=64*1024): """ Encrypts a file using AES (CBC mode) with the given key. key: The encryption key - a string that must be either 16, 24 or 32 bytes long. Longer keys are more secure. in_filename: Name of the input file out_filename: If None, '<in_filename>.enc' will be used. chunksize: Sets the size of the chunk which the function uses to read and encrypt the file. Larger chunk sizes can be faster for some files and machines. chunksize must be divisible by 16. """ if not out_filename: out_filename = in_filename + '.enc' iv = ''.join(chr(random.randint(0, 0xFF)) for i in range(16)) encryptor = AES.new(key, AES.MODE_CBC, iv) filesize = os.path.getsize(in_filename) with open(in_filename, 'rb') as infile: with open(out_filename, 'wb') as outfile: outfile.write(struct.pack('<Q', filesize)) outfile.write(iv) while True: chunk = infile.read(chunksize) if len(chunk) == 0: break elif len(chunk) % 16 != 0: chunk += ' ' * (16 - len(chunk) % 16) outfile.write(encryptor.encrypt(chunk))
Since it might have to pad the file to fit into a multiple of 16, the function saves the original file size in the first 8 bytes of the output file (more precisely, the first
sizeof(long
long) bytes). It randomly generates a 16-byte IV and stores it in the file as well. Then, it reads the input file chunk by chunk (with chunk size configurable), encrypts the chunk and writes it to the output. The last chunk is padded with spaces,
if required.
Working in chunks makes sure that large files can be efficiently processed without reading them wholly into memory. For example, with the default chunk size it takes about 1.2 seconds on my computer to encrypt a 50MB file. PyCrypto is fast!
Decrypting the file can be done with:
def decrypt_file(key, in_filename, out_filename=None, chunksize=24*1024): """ Decrypts a file using AES (CBC mode) with the given key. Parameters are similar to encrypt_file, with one difference: out_filename, if not supplied will be in_filename without its last extension (i.e. if in_filename is 'aaa.zip.enc' then out_filename will be 'aaa.zip') """ if not out_filename: out_filename = os.path.splitext(in_filename)[0] with open(in_filename, 'rb') as infile: origsize = struct.unpack('<Q', infile.read(struct.calcsize('Q')))[0] iv = infile.read(16) decryptor = AES.new(key, AES.MODE_CBC, iv) with open(out_filename, 'wb') as outfile: while True: chunk = infile.read(chunksize) if len(chunk) == 0: break outfile.write(decryptor.decrypt(chunk)) outfile.truncate(origsize)
First the original size of the file is read from the first 8 bytes of the encrypted file. The IV is read next to correctly initialize the AES object. Then the file is decrypted in chunks, and finally it’s truncated to the original size, so the padding is
thrown out.
相关文章推荐
- AES encryption of files (and strings) in java with randomization of IV (initialization vector)
- An example usage of an encryption algorithm (AES, in this case) is:
- Working with Excel Files in Python
- How to craw the Info of BiliBIli with python in Parallel
- HousePrice of big city with python crawl in lianjia
- The best way to deal with large files in Python style
- compare of the CBC and CTR mode in PyCrypto AES
- detect the encoding of files in Python
- Working with Files(Chapter 16 of Programming in Objective-C 2.0)
- How to create a project with existing folder of files in Visual Studio?
- Face Recognition with Python, in Under 25 Lines of Code
- 基于POSIX在Linux下C语言实现读取整个目录文件[Get a List of Files in a Directory with C]
- Three ways of dealing with Alert in automation(Python+Webdriver)
- HTTP Status 500 - The absolute uri: http://java.sun.com/jsp/jstl/core cannot be resolved in either web.xml or the jar files deployed with this application
- Using #region Directive With JavaScript Files in Visual Studio 【转载】
- 警告:Initalizing 'AppDelegate * __strong' with an expression of incompatible type ''id<UIApplication..
- Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-wttdx0g1/secp256k1/
- [emerg] the size 10485760 of shared memory zone “tmpcache” conflicts with already declared size 0 in
- Comparison of a float with a value in C
- Q & A category in Prism forums, with some answers and samples of mine.