如何将一维数据或其他非图像数据转换成lmdb
发布网友
发布时间:2022-04-25 12:57
我来回答
共1个回答
热心网友
时间:2022-04-08 20:37
caffe事儿真多,数据必须得lmdb或者leveldb什么的才行,如果数据是图片的话,那用caffe自带的
convert_image.cpp就行,但如果不是图片,就得自己写程序了。我也不是计算机专业的,我哪看得懂源码,遂奋发而百度之,然无甚结果,遂
google之,尝闻“内事不决问百度,外事不决问google”,古人诚不我欺。在caffe的google
group里我找到了这个网址:http://deepdish.io/2015/04/28/creating-lmdb-in-python/
代码如下:
import numpy as np
import lmdb
import caffe
N = 1000
# Let's pretend this is interesting data
X = np.zeros((N, 3, 32, 32), dtype=np.uint8)
y = np.zeros(N, dtype=np.int64)
# We need to prepare the database for the size. We'll set it 10 times
# greater than what we theoretically need. There is little drawback to
# setting this too big. If you still run into problem after raising
# this, you might want to try saving fewer entries in a single
# transaction.
map_size = X.nbytes * 10
env = lmdb.open('mylmdb', map_size=map_size)
with env.begin(write=True) as txn:
# txn is a Transaction object
for i in range(N):
datum = caffe.proto.caffe_pb2.Datum()
datum.channels = X.shape[1]
datum.height = X.shape[2]
datum.width = X.shape[3]
datum.data = X[i].tobytes() # or .tostring() if numpy < 1.9
datum.label = int(y[i])
str_id = '{:08}'.format(i)
# The encode is only essential in Python 3
txn.put(str_id.encode('ascii'), datum.SerializeToString())
这是用python将数据转为lmdb的代码,但是我用这个处理完数据再使用caffe会出现std::bad_alloc错误,后来经过艰苦地奋斗,查阅了大量资料,我发现了问题所在:
1.caffe的数据格式默认为四维(n_samples, n_channels, height, width) .所以必须把我的数据处理成这种格式
2.最后一行txn.put(str_id.encode('ascii'), datum.SerializeToString())一定要加上,我一开始一维python2不用写这个,结果老是出错,后来才发现这行必须写!
3.如果出现mdb_put: MDB_MAP_FULL: Environment mapsize limit reached的
错误,是因为lmdb默认的map_size比较小,我把lmdb/cffi.py里面的map_size默认值改了一下,改成了
1099511627776(也就是1Tb),我也不知道是不是这么改,然后我又把上面python程序里map_size = X.nbytes
这句改成了map_size = X.nbytes * 10,然后就成功了!