Skip to content

Commit

Permalink
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
Browse files Browse the repository at this point in the history
… develop
  • Loading branch information
zchen0211 committed Nov 1, 2017
2 parents a0acfc6 + 8401039 commit 4e22802
Show file tree
Hide file tree
Showing 39 changed files with 1,812 additions and 327 deletions.
99 changes: 99 additions & 0 deletions doc/howto/cross_compiling/cross_compiling_for_ios_cn.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,99 @@
# 构建iOS平台上的PaddlePaddle库
交叉编译iOS平台上适用的PaddlePaddle库,需要在MacOS系统上进行。本文的将介绍在MacOS上,从源码交叉编译iOS平台上适用的PaddlePaddle库。

## 准备交叉编译环境
Apple官方为iOS开发提供了完整的交叉编译工具和集成开发环境,用户从App Store下载安装Xcode即可。也可自行前往官网下载,[Xcode](https://developer.apple.com/cn/xcode/)。安装完成之后,可在命令行执行`xcodebuild -version`,判断是否安装成功。

```bash
$ xcodebuild -version
Xcode 9.0
Build version 9A235
```

## 配置交叉编译参数

PaddlePaddle为交叉编译提供了工具链配置文档[cmake/cross_compiling/ios.cmake](https://github.com/PaddlePaddle/Paddle/blob/develop/cmake/cross_compiling/ios.cmake),以提供一些默认的编译器和编译参数配置。

交叉编译iOS版本的PaddlePaddle库时,有一些必须配置的参数:

- `CMAKE_SYSTEM_NAME`,CMake编译的目标平台,必须设置为`iOS`。在设置`CMAKE_SYSTEM_NAME=iOS`后,PaddlePaddle的CMake系统会自动编译所有的第三方依赖库,并且强制设置一些PaddlePaddle参数的值(`WITH_C_API=ON``WITH_GPU=OFF``WITH_AVX=OFF``WITH_PYTHON=OFF``WITH_RDMA=OFF`)。
- `WITH_C_API`,是否编译C-API预测库,必须设置为ON。在iOS平台上只支持使用C-API来预测。
- `WITH_SWIG_PY`,必须设置为ON。在iOS平台上不支持通过swig调用来训练或者预测。

iOS平台可选配置参数:

- `IOS_PLATFORM`,可设置为`OS/SIMULATOR`,默认值为`OS`
- `OS`,构建目标为`arm`架构的iPhone或者iPad等物理设备。
- `SIMULATOR`,构建目标为`x86`架构的模拟器平台。
- `IOS_ARCH`,目标架构。针对不同的`IOS_PLATFORM`,可设置的目标架构如下表所示:

| IOS_PLATFORM | IOS_ARCH |
|--------------|----------------------|
| OS | armv7, armv7s, arm64 (默认) |
| SIMULATOR | i386, x86_64 (默认) |

- `IOS_DEPLOYMENT_TARGET`,最小的iOS部署版本,默认值为`7.0`
- `IOS_ENABLE_BITCODE`,是否使能[Bitcode](https://developer.apple.com/library/content/documentation/IDEs/Conceptual/AppDistributionGuide/AppThinning/AppThinning.html#//apple_ref/doc/uid/TP40012582-CH35-SW3),可设置`ON/OFF`,默认值为`ON`
- `IOS_USE_VECLIB_FOR_BLAS`,是否使用[vecLib](https://developer.apple.com/documentation/accelerate/veclib)框架进行BLAS矩阵计算,可设置`ON/OFF`,默认值为`OFF`
- `IOS_DEVELOPMENT_ROOT``Developer`目录,可显式指定为`/path/to/platform/Developer`。若未显式指定,PaddlePaddle将会根据`IOS_PLATFORM`自动选择`Xcode`对应`platform``Developer`目录。
- `IOS_SDK_ROOT`,所使用`SDK`的根目录,可显式指定为`/path/to/platform/Developer/SDKs/SDK`。若未显式指定,PaddlePaddle将会自动选择`IOS_DEVELOPMENT_ROOT`目录下最新的`SDK`版本。

其他配置参数:

- `USE_EIGEN_FOR_BLAS`,是否使用Eigen库进行矩阵计算,在`IOS_USE_VECLIB_FOR_BLAS=OFF`时有效。可设置`ON/OFF`,默认值为`OFF`
- `HOST_C/CXX_COMPILER`,宿主机的C/C++编译器。默认值为环境变量`CC/CXX`的值;若环境变量`CC/CXX`未设置,则使用`cc/c++`编译器。

常用的cmake配置如下:

```bash
cmake -DCMAKE_SYSTEM_NAME=iOS \
-DIOS_PLATFORM=OS \
-DIOS_ARCH="arm64" \
-DIOS_ENABLE_BITCODE=ON \
-DIOS_USE_VECLIB_FOR_BLAS=ON \
-DCMAKE_INSTALL_PREFIX=your/path/to/install \
-DWITH_C_API=ON \
-DWITH_TESTING=OFF \
-DWITH_SWIG_PY=OFF \
..
```

```bash
cmake -DCMAKE_SYSTEM_NAME=iOS \
-DIOS_PLATFORM=SIMULATOR \
-DIOS_ARCH="x86_64" \
-DIOS_USE_VECLIB_FOR_BLAS=ON \
-DCMAKE_INSTALL_PREFIX=your/path/to/install \
-DWITH_C_API=ON \
-DWITH_TESTING=OFF \
-DWITH_SWIG_PY=OFF \
..
```

用户还可根据自己的需求设置其他编译参数。比如希望最小化生成库的大小,可以设置`CMAKE_BUILD_TYPE``MinSizeRel`;若希望得到最快的执行速度,则可设置`CMAKE_BUILD_TYPE``Release`。亦可以通过手动设置`CMAKE_C/CXX_FLAGS`来影响PaddlePaddle的编译过程。

**性能TIPS**,为了达到最快的计算速度,在CMake参数配置上,有以下建议:

- 设置`CMAKE_BUILD_TYPE``Release`
- 设置`IOS_USE_VECLIB_FOR_BLAS=ON`,调用`vecLib`框架提供的BLAS函数进行矩阵计算。

## 编译和安装

CMake配置完成后,执行以下命令,PaddlePaddle将自动下载和编译所有第三方依赖库、编译和安装PaddlePaddle预测库。

```
$ make
$ make install
```

注意:如果你曾在源码目录下编译过其他平台的PaddlePaddle库,请先使用`rm -rf`命令删除`third_party`目录和`build`目录,以确保所有的第三方依赖库和PaddlePaddle代码都是针对新的CMake配置重新编译的。

执行完安装命令后,`your/path/to/install`目录中会包含以下内容:

- `include`目录,其中包含所有C-API的头文件
- `lib`目录,其中包含PaddlePaddle的C-API静态库
- `third_party`目录,其中包含所依赖的所有第三方库

注意,不同架构的PaddlePaddle库建议安装到不同的目录下,然后使用`lipo`工具将多个静态库合并成一个支持多个架构的fat库。

自此,PaddlePaddle库已经安装完成,用户可将合成的fat库用于深度学习相关的iOS App中,调用方法见C-API文档。
Original file line number Diff line number Diff line change
Expand Up @@ -59,4 +59,4 @@ make install

注意:如果你曾经在源码目录下编译过其他平台的PaddlePaddle库,请先使用`rm -rf`命令删除`third_party`目录和`build`目录,以确保所有的第三方依赖库和PaddlePaddle代码都是针对新的CMake配置重新编译的。

执行完安装命令后,`your/path/to/install`目录中会包含`include``lib`目录,其中`include`中包含C-API的头文件,`lib`中包含一个Raspberry Pi版本的库。
执行完安装命令后,`your/path/to/install`目录中会包含`include``lib`目录,其中`include`中包含C-API的头文件,`lib`中包含一个Raspberry Pi版本的库。
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,7 @@ cmake -DCMAKE_SYSTEM_NAME=RPi \
..
```

To build the inference library, please set the argument WITH_API to ON: `WITH_C_API=ON`.
To build the inference library, please set the argument WITH\_C\_API to ON: `WITH_C_API=ON`.

You can add more arguments. For example, to minimize the size of the generated inference library, you may use `CMAKE_BUILD_TYPE=MinSizeRel`. For performance optimization, you may use `CMAKE_BUILD_TYPE=Release`.

Expand Down
8 changes: 4 additions & 4 deletions paddle/framework/lod_tensor_test.cu
Original file line number Diff line number Diff line change
Expand Up @@ -36,15 +36,15 @@ TEST(LoDTensor, LoDInGPU) {
lod_tensor.mutable_data<float>(place);

lod_tensor.set_lod(src_lod);
CHECK_EQ(lod_tensor.lod_element(0, 2).first, 4UL);
CHECK_EQ(lod_tensor.lod_element(0, 4).first, 8UL);
EXPECT_EQ(lod_tensor.lod_element(0, 2).first, 4UL);
EXPECT_EQ(lod_tensor.lod_element(0, 4).first, 8UL);

auto lod = lod_tensor.lod();

test<<<1, 8>>>(lod[0].data(), lod[0].size());
cudaDeviceSynchronize();

for (size_t i = 0; i < src_lod[0].size(); ++i) {
CHECK_EQ(lod[0].data()[i], src_lod[0].data()[i] * 2);
EXPECT_EQ(lod[0].data()[i], src_lod[0].data()[i] * 2);
}
}
}
16 changes: 8 additions & 8 deletions paddle/framework/operator.cc
Original file line number Diff line number Diff line change
Expand Up @@ -37,32 +37,32 @@ ExecutionContext::GetEigenDevice<platform::GPUPlace, Eigen::GpuDevice>() const {
std::string OperatorBase::Input(const std::string& name) const {
auto& ins = Inputs(name);
PADDLE_ENFORCE_LE(ins.size(), 1UL,
"Op %s input %s should contain only one variable", type_,
name);
"Operator %s's input %s should contain only one variable.",
type_, name);
return ins.empty() ? kEmptyVarName : ins[0];
}

const std::vector<std::string>& OperatorBase::Inputs(
const std::string& name) const {
auto it = inputs_.find(name);
PADDLE_ENFORCE(it != inputs_.end(), "Op %s do not have input %s", type_,
name);
PADDLE_ENFORCE(it != inputs_.end(), "Operator %s does not have the input %s.",
type_, name);
return it->second;
}

std::string OperatorBase::Output(const std::string& name) const {
auto& outs = Outputs(name);
PADDLE_ENFORCE_LE(outs.size(), 1UL,
"Op %s output %s should contain only one variable", type_,
name);
"Operator %s's output %s should contain only one variable.",
type_, name);
return outs.empty() ? kEmptyVarName : outs[0];
}

const std::vector<std::string>& OperatorBase::Outputs(
const std::string& name) const {
auto it = outputs_.find(name);
PADDLE_ENFORCE(it != outputs_.end(), "Op %s does not have output called %s",
type_, name);
PADDLE_ENFORCE(it != outputs_.end(),
"Operator %s does not have an output called %s.", type_, name);
return it->second;
}

Expand Down
3 changes: 2 additions & 1 deletion paddle/framework/operator.h
Original file line number Diff line number Diff line change
Expand Up @@ -427,7 +427,8 @@ class OperatorWithKernel : public OperatorBase {
int tmp = static_cast<int>(ToDataType(t->type()));
VLOG(3) << "Input " << ipt_name << " with data_type " << tmp;
PADDLE_ENFORCE(tmp == data_type || data_type == -1,
"DataType of Paddle Op %s must be same.", Type());
"DataType of Paddle Op %s must be the same.",
Type());
data_type = tmp;
}
}
Expand Down
8 changes: 5 additions & 3 deletions paddle/framework/tensor.h
Original file line number Diff line number Diff line change
Expand Up @@ -118,10 +118,12 @@ class Tensor {
const platform::DeviceContext& ctx);

/**
* @brief Return the slice of the tensor.
* @brief Return a sub-tensor of the given tensor.
*
* @param[in] begin_idx The begin index of the slice.
* @param[in] end_idx The end index of the slice.
* @param[in] begin_idx The index of the start row(inclusive) to slice.
* The index number begins from 0.
* @param[in] end_idx The index of the end row(exclusive) to slice.
* The index number begins from 0.
*/
inline Tensor Slice(const int& begin_idx, const int& end_idx) const;

Expand Down
17 changes: 10 additions & 7 deletions paddle/framework/tensor_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -112,9 +112,10 @@ inline void* Tensor::mutable_data(platform::Place place, std::type_index type) {
if (holder_ != nullptr) {
holder_->set_type(type);
}
PADDLE_ENFORCE_GT(numel(), 0,
"Tensor's numel must be larger than zero to call "
"Tensor::mutable_data. Call Tensor::set_dim first.");
PADDLE_ENFORCE_GT(
numel(), 0,
"When calling this method, the Tensor's numel must be larger than zero. "
"Please check Tensor::Resize has been called first.");
int64_t size = numel() * SizeOfType(type);
/* some versions of boost::variant don't have operator!= */
if (holder_ == nullptr || !(holder_->place() == place) ||
Expand Down Expand Up @@ -229,10 +230,12 @@ inline void Tensor::CopyFromVector(const std::vector<T>& src,

inline Tensor Tensor::Slice(const int& begin_idx, const int& end_idx) const {
check_memory_size();
PADDLE_ENFORCE_GE(begin_idx, 0, "Slice begin index is less than zero.");
PADDLE_ENFORCE_LE(end_idx, dims_[0], "Slice end index is out of bound.");
PADDLE_ENFORCE_LT(begin_idx, end_idx,
"Begin index must be less than end index.");
PADDLE_ENFORCE_GE(begin_idx, 0,
"The start row index must be greater than 0.");
PADDLE_ENFORCE_LE(end_idx, dims_[0], "The end row index is out of bound.");
PADDLE_ENFORCE_LT(
begin_idx, end_idx,
"The start row index must be lesser than the end row index.");

if (dims_[0] == 1) {
return *this;
Expand Down
6 changes: 4 additions & 2 deletions paddle/gserver/layers/CRFLayer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -101,8 +101,10 @@ void CRFLayer::backward(const UpdateCallback& callback) {
: real(1.0f);
instanceWeight *= coeff_;

MatrixPtr grad = output.grad->subRowMatrix(starts[i], starts[i + 1]);
grad->add(*crfs_[i].getXGrad(), real(1.0f), instanceWeight);
if (output.grad) {
MatrixPtr grad = output.grad->subRowMatrix(starts[i], starts[i + 1]);
grad->add(*crfs_[i].getXGrad(), real(1.0f), instanceWeight);
}
if (needWGrad) {
weight_->getWGrad()->add(
*crfs_[i].getWGrad(), real(1.0f), instanceWeight);
Expand Down
1 change: 0 additions & 1 deletion paddle/gserver/layers/LinearChainCRF.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,6 @@ real LinearChainCRF::forward(real* x, int* s, int length) {
}

void LinearChainCRF::backward(real* x, int* s, int length, bool needWGrad) {
MatrixPtr matX = Matrix::create(x, length, numClasses_);
Matrix::resizeOrCreate(matGrad_, length, numClasses_);
Matrix::resizeOrCreate(beta_, length, numClasses_);
real* b = b_->getData();
Expand Down
20 changes: 16 additions & 4 deletions paddle/gserver/layers/SequenceReshapeLayer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -70,11 +70,23 @@ void SequenceReshapeLayer::forward(PassType passType) {
size_t outDim = getSize();

size_t numSequences = input.getNumSequences();
auto startPositions = input.sequenceStartPositions->getVector(false);
const int* starts = startPositions->getData();

CHECK_EQ(starts[numSequences], input.getBatchSize());
CHECK_EQ(numSequences, startPositions->getSize() - 1);
// by default, we assume each instance as a sequence
IVectorPtr seqStarts;
IVector::resizeOrCreate(seqStarts, input.getBatchSize() + 1, false);
int* startsData = seqStarts->getData();
for (int i = 0; i < input.getBatchSize() + 1; i++) {
startsData[i] = i;
}
const int* starts = startsData;

// if there is sequence, then use start positions
if (input.sequenceStartPositions) {
auto startPositions = input.sequenceStartPositions->getVector(false);
starts = startPositions->getData();
CHECK_EQ(starts[numSequences], input.getBatchSize());
CHECK_EQ(numSequences, startPositions->getSize() - 1);
}

for (size_t seqID = 0; seqID < numSequences; seqID++) {
size_t inNumIns = starts[seqID + 1] - starts[seqID];
Expand Down
11 changes: 10 additions & 1 deletion paddle/memory/detail/system_allocator.cc
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,16 @@ void* CPUAllocator::Alloc(size_t& index, size_t size) {

index = 0; // unlock memory

void* p = malloc(size);
void* p;

#ifdef PADDLE_USE_MKLDNN
// refer to https://github.com/01org/mkl-dnn/blob/master/include/mkldnn.hpp
// memory alignment
PADDLE_ENFORCE_EQ(posix_memalign(&p, 4096ul, size), 0);
#else
PADDLE_ENFORCE_EQ(posix_memalign(&p, 32ul, size), 0);
#endif
PADDLE_ENFORCE(p, "Fail to allocate CPU memory: size = %d .", size);

if (p != nullptr) {
if (FLAGS_use_pinned_memory) {
Expand Down
12 changes: 7 additions & 5 deletions paddle/operators/cross_entropy_op.cc
Original file line number Diff line number Diff line change
Expand Up @@ -28,8 +28,9 @@ class CrossEntropyOp : public framework::OperatorWithKernel {

auto x_dims = ctx->GetInputDim("X");
auto label_dims = ctx->GetInputDim("Label");
PADDLE_ENFORCE_EQ(x_dims.size(), 2, "Input(X)'s rank should be 2.");
PADDLE_ENFORCE_EQ(label_dims.size(), 2, "Input(Label)'s rank should be 2.");
PADDLE_ENFORCE_EQ(x_dims.size(), 2UL, "Input(X)'s rank should be 2.");
PADDLE_ENFORCE_EQ(label_dims.size(), 2UL,
"Input(Label)'s rank should be 2.");
PADDLE_ENFORCE_EQ(x_dims[0], label_dims[0],
"The 1st dimension of Input(X) and Input(Label) should "
"be equal.");
Expand All @@ -38,8 +39,8 @@ class CrossEntropyOp : public framework::OperatorWithKernel {
"If Attr(soft_label) == true, the 2nd dimension of "
"Input(X) and Input(Label) should be equal.");
} else {
PADDLE_ENFORCE_EQ(label_dims[1], 1,
"If Attr(soft_label) == false, the 2nd dimension of "
PADDLE_ENFORCE_EQ(label_dims[1], 1UL,
"If Attr(softLabel) == false, the 2nd dimension of "
"Input(Label) should be 1.");
}

Expand All @@ -48,7 +49,8 @@ class CrossEntropyOp : public framework::OperatorWithKernel {
}

protected:
// CrossEntropy's data type just determined by "X"
// Explicitly set that data type of the output of the cross_entropy operator
// is determined by its input "X".
framework::DataType IndicateDataType(
const framework::ExecutionContext& ctx) const override {
return framework::ToDataType(ctx.Input<Tensor>("X")->type());
Expand Down
Loading

0 comments on commit 4e22802

Please sign in to comment.