Python and Mirroring PyPI (Beta)
RSPM supports creating a Python repository with a source that mirrors the Python Package Index (PyPI).
Adding a PyPI repository to an RSPM installation will:
- Provide a full mirror of all packages available on PyPI
- Enable fully reproducible dependency management through historic PyPI snapshots
- Locally cache all downloaded Python packages for quicker installs
Beta notice
PyPI repositories are currently a beta feature intended for testing purposes. They are not covered by our support agreement and are not recommended for production use. However, if you run into an issue while using the feature, we'd like to hear about it here.
System Requirements
In addition to the system requirements recommended in the Installation section, supporting Python packages will require additional disk storage depending on the number of packages being used.
Info
The entirety of PyPI currently requires about 10 TB of storage. Your actual storage needs will depend on your usage. Deep learning packages, such as TensorFlow and PyTorch, are notoriously large, with hundreds of gigabytes needed for each project's collection of files. If you do not anticipate using deep learning packages, a starting storage size of 50 GB is likely adequate. If you do intend to use deep learning packages, you should plan for 500 GB or more.
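As a rough aid for the sizing guidance above, the following sketch compares the free space on a volume with the two suggested starting sizes. The function names and the path passed in are illustrative assumptions; this is not an RSPM tool, and the thresholds are simply the 50 GB and 500 GB figures quoted in the note.

```python
import shutil

# Starting sizes quoted in the note above, in gigabytes.
BASELINE_GB = 50         # no deep learning packages expected
DEEP_LEARNING_GB = 500   # deep learning packages expected


def recommended_gb(deep_learning: bool) -> int:
    """Return the suggested starting storage size in GB."""
    return DEEP_LEARNING_GB if deep_learning else BASELINE_GB


def has_enough_space(path: str, deep_learning: bool) -> bool:
    """Compare free space on the volume holding `path` with the suggestion."""
    free_gb = shutil.disk_usage(path).free / 10**9
    return free_gb >= recommended_gb(deep_learning)
```

Call has_enough_space with the directory where your RSPM installation stores its data; the correct path depends on your configuration.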
Quickstart
The quickest way to make PyPI packages available in your RSPM installation is to run these commands:
Terminal
$ rspm create repo --name=pypi --type=python --description='Access PyPI packages'
$ rspm subscribe --repo=pypi --source=pypi
$ rspm sync --type=pypi
For more information about these commands, see the Creating a Python PyPI Repository section below.
User Configuration
Once a Python repository has been successfully created and synced with the RStudio PyPI service, users need to configure their local system and pip to install from RSPM.
To find instructions specific to your RSPM installation:
- Follow the Quickstart or Creating a Python PyPI Repository instructions.
- Navigate to the RSPM homepage.
- Select the relevant Python repository from the sidebar.
- Click the Setup button at the top of the page.
In general, users can either install from RSPM on a one-off basis:
Terminal
$ pip install --index-url http(s)://[HOST:PORT]/latest/simple PACKAGE-TO-INSTALL
or configure pip to use RSPM in a persistent manner:
Terminal
$ pip config set global.index-url http(s)://[HOST:PORT]/latest/simple
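For reference, pip config set stores the value as an index-url key under a [global] section of pip's configuration file. The sketch below renders that section so you can see what the persistent setting looks like. The host name rspm.example.com is a placeholder, render_pip_conf is a hypothetical helper, and the location of the file pip writes to varies by platform and is not shown here.

```python
import configparser
import io


def render_pip_conf(index_url: str) -> str:
    """Render the [global] section that a persistent index-url setting produces."""
    parser = configparser.RawConfigParser()
    parser.add_section("global")
    parser.set("global", "index-url", index_url)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# rspm.example.com is a placeholder for your own [HOST:PORT].
print(render_pip_conf("https://rspm.example.com/latest/simple"))
```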
Note
If you use HTTP, pip will ignore your repository by default. Using only the configuration above, pip will show a warning message like this:
WARNING: The repository located at [HOST] is not a trusted or secure host and is being ignored. If this repository is available via HTTPS we recommend you use HTTPS instead, otherwise you may silence this warning and allow it anyway with '--trusted-host [HOST]'.
To configure pip to use the unencrypted HTTP RSPM server, you must use the --trusted-host flag or configuration option.
Terminal
$ pip install --trusted-host [HOST] --index-url http://[HOST:PORT]/latest/simple PACKAGE-TO-INSTALL
or configure pip to use RSPM in a persistent manner:
Terminal
$ pip config set global.index-url http://[HOST:PORT]/latest/simple
$ pip config set global.trusted-host [HOST]
Note
If you use HTTPS but do not provide your RSPM installation with a valid SSL certificate, pip will raise SSL: CERTIFICATE_VERIFY_FAILED errors when installing packages, because it verifies the HTTPS configuration by default. To configure pip to ignore these errors, you need to use the --trusted-host flag or configuration option.
Terminal
$ pip install --trusted-host [HOST] --index-url https://[HOST:PORT]/latest/simple PACKAGE-TO-INSTALL
or configure pip to use RSPM in a persistent manner:
Terminal
$ pip config set global.index-url https://[HOST:PORT]/latest/simple
$ pip config set global.trusted-host [HOST]
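The one-off install commands in this section differ only in the URL scheme and in whether --trusted-host is passed. The sketch below captures that rule in one place, assuming a hypothetical helper pip_install_args and placeholder host and port values; it only builds the argument list and does not run pip.

```python
from typing import List, Optional


def index_url(host: str, scheme: str = "https", port: Optional[int] = None) -> str:
    """Build the index URL in the form used throughout this section."""
    netloc = f"{host}:{port}" if port is not None else host
    return f"{scheme}://{netloc}/latest/simple"


def pip_install_args(package: str, host: str, scheme: str = "https",
                     port: Optional[int] = None,
                     untrusted_cert: bool = False) -> List[str]:
    """Return the pip arguments for a one-off install from RSPM.

    --trusted-host is added for plain HTTP (pip ignores the repository
    otherwise) and for HTTPS servers without a valid certificate.
    """
    args = ["pip", "install"]
    if scheme == "http" or untrusted_cert:
        args += ["--trusted-host", host]
    args += ["--index-url", index_url(host, scheme, port), package]
    return args
```

For example, pip_install_args("requests", "rspm.example.com", scheme="http") includes --trusted-host, while the default HTTPS call omits it unless untrusted_cert=True is passed.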
Creating a Python PyPI Repository
The commands in the Quickstart section above perform the following operations:
- Create a Python repository with a description:
Terminal
$ rspm create repo --name=pypi --type=python --description='Access PyPI packages'
<< Repository: pypi - Python
- Subscribe the repository to the preconfigured PyPI source:
Terminal
$ rspm subscribe --repo=pypi --source=pypi
<< Repository: pypi
<< Sources:
<< --pypi (Python)
- Ensure that RSPM has the appropriate metadata using the sync command. RSPM pulls packages and metadata from the RStudio PyPI service.
Terminal
$ rspm sync --type=pypi
<< Initiated PyPI synchronization for pypi. Depending on how much data has been previously synchronized, this could take a while. Actions will appear in the Package Manager UI as they are completed.
<< Snapshots for pypi: 0 / 34 [----------------------------------------------------------------------------------------------------------------------------------]
<< Packages in pypi snapshot: 14127 / 231734 [======>-------------------------------------------------------------------------------------------------------] 5m9
Note
If you try subscribing a non-Python type repository to a Python source, you'll get the error "source type must be compatible with repository type".
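If you want to script the three steps above, one possible approach is to run them in order and stop at the first failure. The sketch below assumes the rspm executable is on the PATH and that it exits with a non-zero status when a step fails; both are assumptions, and the helper names are hypothetical. The commands themselves are the ones shown in this section.

```python
import subprocess
from typing import List


def quickstart_commands(repo: str = "pypi",
                        description: str = "Access PyPI packages") -> List[List[str]]:
    """Return the three rspm invocations from this section, in order."""
    return [
        ["rspm", "create", "repo", f"--name={repo}", "--type=python",
         f"--description={description}"],
        ["rspm", "subscribe", f"--repo={repo}", "--source=pypi"],
        ["rspm", "sync", "--type=pypi"],
    ]


def run_quickstart(repo: str = "pypi") -> None:
    """Run each step, stopping at the first one that exits non-zero."""
    for cmd in quickstart_commands(repo):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True)
```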
The PyPI Source
After syncing with the RStudio PyPI service, the local RSPM installation will have the metadata for all the packages on PyPI. A package itself is retrieved from the RStudio PyPI service only when it is requested, for example by pip.
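The behavior described above, where metadata is available up front and package files are fetched and cached on first request, can be pictured with a toy model. This is a conceptual illustration only, not RSPM's implementation; the class LazyMirror and the fetch_upstream callable are hypothetical.

```python
from typing import Callable, Dict, Set


class LazyMirror:
    """Toy model: metadata is known up front, package files are fetched on demand."""

    def __init__(self, known_packages: Set[str],
                 fetch_upstream: Callable[[str], bytes]) -> None:
        self.known_packages = known_packages   # names learned during a sync
        self.fetch_upstream = fetch_upstream   # hypothetical upstream download
        self.cache: Dict[str, bytes] = {}      # files already retrieved

    def get(self, name: str) -> bytes:
        """Return a package file, downloading it only on the first request."""
        if name not in self.known_packages:
            raise KeyError(f"{name} is not in the synced metadata")
        if name not in self.cache:
            self.cache[name] = self.fetch_upstream(name)
        return self.cache[name]
```

In this model, a second request for the same package is served from the local cache without contacting the upstream service.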
Scheduled Synchronization
By default, RSPM will sync with the RStudio PyPI service once daily. This schedule can be configured using the SyncSchedule option in the [PyPI] section of the configuration file, for example:
; /etc/rstudio-pm/rstudio-pm.gcfg
...
[PyPI]
SyncSchedule = 0 1 * * *
...
Note
Although RSPM automatically syncs daily, the RStudio PyPI service may not update packages every day.
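To help read the SyncSchedule value in the example above, the sketch below assumes it follows the standard five-field crontab layout (minute, hour, day of month, month, day of week). Under that assumption, 0 1 * * * means 01:00 every day. The helper names are hypothetical, and next_daily_run deliberately handles only simple daily schedules with a numeric minute and hour.

```python
from datetime import datetime, timedelta
from typing import Dict

FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_schedule(expr: str) -> Dict[str, str]:
    """Split a five-field crontab-style expression into named fields."""
    parts = expr.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


def next_daily_run(expr: str, now: datetime) -> datetime:
    """Next run time for a schedule with numeric minute/hour and '*' elsewhere."""
    fields = parse_schedule(expr)
    if any(fields[name] != "*" for name in FIELDS[2:]):
        raise ValueError("only simple daily schedules are handled in this sketch")
    candidate = now.replace(hour=int(fields["hour"]), minute=int(fields["minute"]),
                            second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate
```

With this reading, a sync scheduled as 0 1 * * * that is checked at noon would next run at 01:00 the following day.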