How to convert one-dimensional or other non-image data to LMDB

Asked by a user on 2022-04-25 12:57

1 answer

Answered by a helpful user on 2022-04-08 20:37

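Caffe is really fussy: the data has to be in LMDB or LevelDB format. If the data are images, the convert_imageset.cpp tool that ships with Caffe does the job, but for anything else you have to write the conversion yourself. I am not a computer science major and could not make sense of the source code, so I searched Baidu with great determination, found nothing useful, and then turned to Google. As the saying goes, "for domestic matters ask Baidu, for foreign matters ask Google", and the ancients did not deceive me. In the Caffe Google group I found this page: http://deepdish.io/2015/04/28/creating-lmdb-in-python/
The code is as follows: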

import numpy as np
import lmdb
import caffe

N = 1000

# Let's pretend this is interesting data
X = np.zeros((N, 3, 32, 32), dtype=np.uint8)
y = np.zeros(N, dtype=np.int64)

# We need to prepare the database for the size. We'll set it 10 times
# greater than what we theoretically need. There is little drawback to
# setting this too big. If you still run into problem after raising
# this, you might want to try saving fewer entries in a single
# transaction.
map_size = X.nbytes * 10

env = lmdb.open('mylmdb', map_size=map_size)

with env.begin(write=True) as txn:
    # txn is a Transaction object
    for i in range(N):
        datum = caffe.proto.caffe_pb2.Datum()
        datum.channels = X.shape[1]
        datum.height = X.shape[2]
        datum.width = X.shape[3]
        datum.data = X[i].tobytes()  # or .tostring() if numpy < 1.9
        datum.label = int(y[i])
        str_id = '{:08}'.format(i)

        # The encode is only essential in Python 3
        txn.put(str_id.encode('ascii'), datum.SerializeToString())

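This is Python code for converting data to LMDB. However, when I fed the resulting database to Caffe I got a std::bad_alloc error. After a hard struggle and a lot of reading, I found the causes:

1. Caffe expects data as a 4-D array by default, (n_samples, n_channels, height, width), so my data had to be reshaped into that layout.

2. The last line, txn.put(str_id.encode('ascii'), datum.SerializeToString()), must be included. At first I assumed it was unnecessary under Python 2, and I kept getting errors until I realized this line is required.

3. If you hit the error mdb_put: MDB_MAP_FULL: Environment mapsize limit reached, it is because LMDB's default map_size is fairly small. I changed the default map_size in lmdb/cffi.py to 1099511627776 (that is, 1 TiB), though I am not sure that is the right way to do it. I then also changed the line map_size = X.nbytes in the Python program above to map_size = X.nbytes * 10, and after that it worked!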
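
Points 1 and 3 above can be sketched with a few helper functions. This is a minimal sketch, not the code from the linked page: to_caffe_4d, estimate_map_size and write_lmdb are hypothetical names, the (n_samples, n_features, 1, 1) layout is only one possible choice, and storing non-uint8 values in the Datum float_data field is an assumption modeled on how caffe.io.array_to_datum treats such arrays. Since map_size is passed to lmdb.open for each environment, the library default in lmdb/cffi.py should not need to be edited.

```python
import numpy as np


def to_caffe_4d(features):
    """Reshape (n_samples, n_features) data to (n_samples, n_features, 1, 1)."""
    features = np.asarray(features)
    if features.ndim == 1:
        # A single 1-D vector is treated as one sample.
        features = features[np.newaxis, :]
    if features.ndim != 2:
        raise ValueError("expected 1-D or 2-D input, got %d-D" % features.ndim)
    n_samples, n_features = features.shape
    return features.reshape(n_samples, n_features, 1, 1)


def estimate_map_size(array, headroom=10):
    """Bytes to reserve for the LMDB map: raw data size times a safety factor."""
    return int(array.nbytes) * headroom


def write_lmdb(path, X, y):
    """Write 4-D array X and labels y to an LMDB at path (needs lmdb and caffe)."""
    import lmdb  # imported lazily so the helpers above work without it
    from caffe.proto import caffe_pb2

    # map_size is set here, per environment, instead of in the library source.
    env = lmdb.open(path, map_size=estimate_map_size(X))
    with env.begin(write=True) as txn:
        for i in range(X.shape[0]):
            datum = caffe_pb2.Datum()
            datum.channels, datum.height, datum.width = X.shape[1:]
            if X.dtype == np.uint8:
                datum.data = X[i].tobytes()
            else:
                # Assumption: non-uint8 values go into float_data instead.
                datum.float_data.extend(X[i].astype(float).flat)
            datum.label = int(y[i])
            txn.put('{:08}'.format(i).encode('ascii'),
                    datum.SerializeToString())
    env.close()
```

For example, to_caffe_4d(np.random.rand(1000, 20).astype(np.float32)) gives an array of shape (1000, 20, 1, 1), which can then be passed to write_lmdb together with the labels.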