Installation#
Xorbits can be installed via pip from PyPI.
pip install xorbits
This installs the latest version of Xorbits along with its dependencies, such as pandas and NumPy.
We recommend using an environment management tool such as conda or venv to create
a new environment. conda installs pre-compiled packages, while pip installs
a wheel (which is pre-compiled) or compiles the package from source code if no wheel
is available.
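As a quick sanity check before installing, the sketch below reports whether the current interpreter appears to run inside an isolated environment. The helper name in_isolated_environment is hypothetical. The conda check relies on the CONDA_PREFIX variable that conda sets on activation, so it also reports True for an activated base environment.

```python
import os
import sys


def in_isolated_environment(prefix=None, base_prefix=None, environ=None):
    """Best-effort check for an active venv or conda environment."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    environ = os.environ if environ is None else environ

    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    in_venv = prefix != base_prefix
    # Assumption: conda exports CONDA_PREFIX when an environment is activated.
    in_conda = bool(environ.get("CONDA_PREFIX"))
    return in_venv or in_conda


print("isolated environment detected:", in_isolated_environment())
```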
Python version support#
Xorbits officially supports Python 3.9, 3.10, 3.11, and 3.12.
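To confirm that the running interpreter is one of the versions listed above, a minimal sketch along these lines can be used. The names SUPPORTED and is_supported are illustrative and not part of Xorbits.

```python
import sys

# Minor versions listed as officially supported in this section.
SUPPORTED = {(3, 9), (3, 10), (3, 11), (3, 12)}


def is_supported(version_info=None):
    """Return True when the (major, minor) pair is in the supported set."""
    major, minor = (version_info or sys.version_info)[:2]
    return (major, minor) in SUPPORTED


print("Python", sys.version.split()[0], "supported:", is_supported())
```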
Packages support#
Xorbits partitions large datasets into chunks and processes each chunk using single-node packages (such as pandas). The latest version of Xorbits aims to be compatible with the latest releases of these single-node packages. The table below lists the highest versions of the single-node packages that Xorbits is compatible with. If you are using an older version of pandas, you should either upgrade pandas or downgrade Xorbits.
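One way to compare an installed single-node package against a compatibility ceiling is sketched below. The helper names are hypothetical, and the ceiling is a value you supply from the compatibility table. The comparison looks only at the major and minor components, which is a simplification.

```python
import re
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version, parts=2):
    """Extract the leading numeric components, e.g. '2.1.4' -> (2, 1)."""
    numbers = []
    for piece in version.split(".")[:parts]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


def within_ceiling(version, ceiling):
    """True when version is not newer than ceiling (major.minor only)."""
    return release_tuple(version) <= release_tuple(ceiling)


# Report what is installed; compare against the table's ceiling yourself,
# e.g. within_ceiling(found, "<highest version from the table>").
found = installed_version("pandas")
print("pandas installed:", found)
```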
GPU support#
Xorbits can also scale GPU-accelerated data science tools like CuPy and cuDF. To enable GPU support, you need to install
GPU-accelerated packages. As GPU software stacks (e.g., GPU driver, CUDA)
are more complicated than CPU ones, you need to make sure the NVIDIA driver and CUDA toolkit are properly installed.
We recommend using conda to install cuDF first, which installs both cudf and cupy,
and then installing xorbits with pip.
conda helps resolve the dependencies of cuDF and provides supporting software like CUDA.
Refer to RAPIDS_INSTALL_DOCS for more details about how to install cuDF.
When using Xorbits with GPU, you need to add the gpu=True parameter to the data loading method.
For example:
import xorbits.pandas as pd
df = pd.read_parquet(path, gpu=True)
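If the same script has to run on machines with and without the GPU packages, one option is to pass gpu=True only when cupy and cudf can be imported. The sketch below assumes that importability is a good enough signal; it does not verify the NVIDIA driver or CUDA toolkit. The helper names are hypothetical.

```python
import importlib.util


def gpu_packages_available(modules=("cupy", "cudf")):
    """Return True when every listed module can be located for import."""
    return all(importlib.util.find_spec(name) is not None for name in modules)


def loader_kwargs(modules=("cupy", "cudf")):
    """Build keyword arguments for a data loading call."""
    # Only request the GPU when the GPU-accelerated packages are present.
    return {"gpu": True} if gpu_packages_available(modules) else {}


print("extra loader arguments:", loader_kwargs())
# Usage with Xorbits (path is a placeholder for your dataset location):
#   import xorbits.pandas as pd
#   df = pd.read_parquet(path, **loader_kwargs())
```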
If you find installing GPU-accelerated packages too complicated, you can use our Docker images with pre-installed GPU drivers and CUDA toolkit. Please refer to the Docker image section for more details.
Dependencies#
Required packages#
Xorbits depends on the following libraries, which are mandatory. When you run
pip install xorbits, pip will download the latest versions of these packages from PyPI.
cloudpickle
pyyaml
psutil
tornado
sqlalchemy
defusedxml
tqdm
uvloop (for systems other than win32)
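To see which of the required packages listed above are already present in an environment, a small check like the following may help. It is only a sketch: the function names are illustrative, and it looks packages up by their distribution names as written in the list.

```python
import sys
from importlib import metadata

# Required packages as listed in this section (uvloop handled separately).
REQUIRED = [
    "cloudpickle", "pyyaml", "psutil", "tornado",
    "sqlalchemy", "defusedxml", "tqdm",
]


def required_packages(platform=None):
    """Return the required package names for the given platform."""
    platform = sys.platform if platform is None else platform
    names = list(REQUIRED)
    if platform != "win32":
        names.append("uvloop")  # only needed on systems other than win32
    return names


def missing_packages(names):
    """Return the subset of names with no installed distribution."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


print("missing required packages:", missing_packages(required_packages()))
```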
Recommended dependencies#
Note
You are highly encouraged to install these libraries, as they provide speed improvements, especially when working with large datasets.
Recommended dependencies can be installed using pip.
pip install 'xorbits[extra]'
The following extra dependencies will be installed.
numexpr: for accelerating certain numerical operations.
numexpr uses multiple cores as well as smart chunking and caching to achieve large speedups.
pillow: the Python Imaging Library.
pyarrow: Python API for Arrow C++ libraries.
lz4: Python bindings for the LZ4 compression library.
fsspec: for accessing cloud data.
Docker image#
To simplify the installation of Xorbits, we provide Docker images with Xorbits and its dependencies pre-installed.
CPU image:
xprobe/xorbits:v{version}-py{python_version}, e.g., xprobe/xorbits:v0.8.0-py3.12
GPU image:
xprobe/xorbits:v{version}-cuda{cuda_version}-py{python_version}, e.g., xprobe/xorbits:v0.8.0-cuda12.0-py3.12
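The tag patterns above can be assembled programmatically, for example when generating deployment scripts. The helper image_tag below is a hypothetical illustration of the naming scheme; it does not check that a given tag has actually been published.

```python
def image_tag(version, python_version, cuda_version=None):
    """Build an image name following the CPU or GPU tag pattern."""
    parts = [f"v{version}"]
    if cuda_version is not None:
        # GPU images carry the CUDA version between version and Python tag.
        parts.append(f"cuda{cuda_version}")
    parts.append(f"py{python_version}")
    return "xprobe/xorbits:" + "-".join(parts)


print(image_tag("0.8.0", "3.12"))
print(image_tag("0.8.0", "3.12", cuda_version="12.0"))
```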