[go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Deprecating C codegen facility of Treelite + pivoting the project scope #438

Closed
hcho3 opened this issue Feb 8, 2023 · 11 comments

Comments

@hcho3
Copy link
Collaborator
hcho3 commented Feb 8, 2023

Since its first inception in November 2017, Treelite project has undergone many changes. So has the machine learning ecosystem.
Over the last few years, many prediction runtimes have emerged that can run prediction efficiently with decision tree ensembles, such as the FIL backend for Triton and the ONNX runtime. In many use cases, the runtimes exhibit superior performance compared to the C code generated by Treelite. The runtimes also offer better user experience, as users no longer need to invoke the C compiler to convert the generated C code into binaries, an often time-consuming step.

In addition, I no longer have much bandwidth to maintain or improve the C codegen facility.

Consequently, I propose paring down the scope of the Treelite project to what it currently does its very best: a universal, lightweight exchange format for all tree models.

Proposal:

  • The C codegen facility will be deprecated and eventually be removed. Removing the C codegen will reduce the size of Treelite code base significantly and make it easier for future contributors to maintain it.
  • The Treelite runtime will be phased out, since Treelite would no longer offer a dedicated runtime for compiled binaries.
  • Treelite will offer the model parsers (loaders), as well as GTIL (General Tree Inference Library). GTIL will continue to serve as a reference implementation for tree prediction.
  • Downstream libraries will continue to use the Treelite exchange format to store and transmit tree ensemble models.

Alternative Solution. Move the C codegen facility to a separate project. For this scenario, I'm keen to pass the ownership of the new project to someone else, as I don't have time to work on the codegen.

Comments will be highly appreciated.

cc @wphicks

@hcho3 hcho3 changed the title [Proposal] Deprecating C codegen facility of Treelite [Proposal] Deprecating C codegen facility of Treelite + pivoting the project scope Feb 8, 2023
@hcho3 hcho3 pinned this issue Feb 8, 2023
@hcho3
Copy link
Collaborator Author
hcho3 commented Feb 8, 2023

@edwardwliu Your example notebook currently uses export_lib function, which this proposal proposes to deprecate. The notebook can be updated straightforwardly to use the treelite.gtil module instead. Besides, the features you requested (predict_leaf and predict_bytree) will be added to treelite.gtil, not export_lib. (I'm working on these features this month; stay tuned) I am more than happy to assist you further in migrating your use case.

@hcho3
Copy link
Collaborator Author
hcho3 commented Feb 17, 2023

@getumen @dovahcrow @Earthson Pinging you here, since you create Go and Rust bindings for Treelite. If you'd like to take ownership of the C codegen part of Treelite, let me know. (See Alternative Solution.)

@hcho3
Copy link
Collaborator Author
hcho3 commented Mar 28, 2023

Given the lack of explicit objection, I will proceed with the proposal. See #465

@hcho3 hcho3 closed this as completed Mar 28, 2023
@hcho3 hcho3 unpinned this issue Mar 28, 2023
@dovahcrow
Copy link
Contributor

Given the lack of explicit objection, I will proceed with the proposal. See #465

Sure, go ahead. Unfortunately, I don't have the bandwidth right now to review changes and upgrade the Rust lib. However, I would happily migrate to the new lib / API later once I have time.

@getumen
Copy link
getumen commented May 10, 2023

@hcho3 Thank you for your notification.
I don't have enough C language experiments.
However, I would happily migrate to the new API later.

@getumen
Copy link
getumen commented Aug 25, 2023

I try to use FIL in Go.
FIL on GPU is two times slower than treelite in my workload.
https://github.com/getumen/cuml

@hcho3
Copy link
Collaborator Author
hcho3 commented Aug 25, 2023

@getumen My colleague alerted me of your Go binding for RAPIDS cuML. Nicely done!

Some remarks:

  • There have been performance improvements in the latest version of FIL. The latest version is 23.08. Is there anything we can help you to adopt the latest cuML?
  • RAPIDS team is working on a next generation of FIL which is faster than the current FIL. It is provided under the experimental namespace. The next-generation FIL includes an CPU algorithm. For smaller workloads, it is often faster to run prediction on CPU to avoid the latency overhead of data copy to and from GPU memory.

@getumen
Copy link
getumen commented Aug 25, 2023

@hcho3 Thank you for your information!
I try to update cuml version.
I also try to increase batch size in my workload.

@getumen
Copy link
getumen commented Aug 27, 2023

@hcho3 I have a trouble in building cuML 23.08.00.
When I build cuML 23.08.00, segmentation fault occurs in building RAFT.
I'm building cuML in Windows Docker Desktop.
I use nvidia/cuda:11.8.0-devel-ubuntu22.04 as base image and the Dockerfile is the following.
Do I need any additional setup to build cuML?

FROM nvidia/cuda:11.8.0-devel-ubuntu22.04

ENV DEBIAN_FRONTEND=noninteractive

RUN sed -i -r 's@http://(jp\.)?archive\.ubuntu\.com/ubuntu/?@http://ftp.jaist.ac.jp/pub/Linux/ubuntu/@g' /etc/apt/sources.list

ARG TREELITE_VERSION=3.2.0
ARG CUML_VERSION=v23.08.00

RUN apt-get update \
    && apt-get install -y \
    sudo \
    vim \
    less \
    git \
    wget \
    libssl-dev \
    autoconf \
    libtool \
    devscripts \
    debhelper \
    libblas-dev \
    liblapack-dev \
    zlib1g \
    cython3 \
    cuda-toolkit-11-8 \
    clang \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

RUN wget https://github.com/Kitware/CMake/releases/download/v3.27.3/cmake-3.27.3.tar.gz \
    && tar -zxf cmake-3.27.3.tar.gz \
    && cd cmake-3.27.3 \
    && ./bootstrap \
    && make -j$(nproc) \
    && make install \
    && cd .. \
    && rm -rf cmake-3.27.3 cmake-3.27.3.tar.gz

# install ucx
RUN git clone --recursive https://github.com/openucx/ucx.git \
    && cd ucx \
    && ./autogen.sh \
    && ./contrib/configure-release \
    && make -j$(nproc) install \
    && cd .. \
    && rm -rf ucx

# install nccl for gtx 1080
RUN git clone https://github.com/NVIDIA/nccl.git \
    && cd nccl \
    && make -j$(nproc) pkg.debian.build NVCC_GENCODE="-gencode=arch=compute_61,code=sm_61" \
    && dpkg -i build/pkg/deb/* \
    && cd .. \
    && rm -rf nccl

# ref. https://github.com/rapidsai/cuml/issues/2528#issuecomment-656847070
RUN git clone https://github.com/rapidsai/cuml.git -b ${CUML_VERSION} \
    && cd cuml/cpp \
    && mkdir build \
    && cd build \
    && cmake .. \
    -DBUILD_CUML_TESTS=OFF \
    -DBUILD_PRIMS_TESTS=OFF \
    -DENABLE_CUMLPRIMS_MG=OFF \
    -DBUILD_CUML_EXAMPLES=OFF \
    -DBUILD_CUML_BENCH=OFF \
    -DDETECT_CONDA_ENV=OFF \
    && make install -j$(nproc) \
    && cd ../../.. \
    && rm -r cuml 
Segmentation fault
make[2]: *** [CMakeFiles/raft_lib.dir/build.make:781: CMakeFiles/raft_lib.dir/src/neighbors/detail/cagra/search_multi_cta_float_uint32_dim1024_t32.cu.o] Error 139
make[2]: *** Waiting for unfinished jobs....

Segmentation fault
make[2]: *** [_deps/raft-build/CMakeFiles/raft_lib.dir/build.make:1156: _deps/raft-build/CMakeFiles/raft_lib.dir/src/neighbors/detail/ivf_pq_compute_similarity_float_float.cu.o] Error 139
make[2]: *** Waiting for unfinished jobs....

@hcho3
Copy link
Collaborator Author
hcho3 commented Aug 28, 2023

I was able to run the docker build with the Dockerfile you posted. It seems like you might be running out of memory. Can you reduce parallelism -j to a reasonable number?

If the issue persists, let's post a new issue in the cuML repository. Our team will investigate.

@getumen
Copy link
getumen commented Aug 30, 2023

@hcho3 Thank you for your advice!
Reducing parallelism solves segmentation fault.
However, another error was detected in the compilation of src/arima/batched_arima.cu.
I posted a issue about that.
rapidsai/cuml#5566

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants