Frequently Asked Questions
Login
Forgotten password
I forgot my password. How can I reset it?
Your HELIOS account uses the same authentication process as your OneView account, so your password is always the same as your OneView password. The login name is your p-number. If you reset your OneView password, your server password should be synced within about an hour.
Login fails
My Linux account is enabled and confirmed by the admins, but login fails…
Make sure to enter your p-number and credentials in lower case. Also, do not use the domain name before or after your account name. Be aware that some special characters (like ~) can cause problems when used in a password. Depending on the keyboard and language settings, you might have to type an additional space directly after such a symbol to enter it (or first type it in Notepad and then copy it). Note that an AUMC account will be locked for at least 5 minutes when a wrong password has been entered three times in a row.
Slurm
Slow session
My slurm session is slow. What's happening?
When computations take more time than expected, double check the following known causes:
- Network storage (i.e. R-disk or NFS mounts) is known to behave unpredictably and to be relatively slow, especially when using a large number of small files. So try to avoid using network storage in your slurm jobs. Prepare data upfront on your scratch when possible. Also, put (matlab and python) scripts, toolboxes and environments on your scratch or home folder.
- Applications will perform badly when you assign insufficient working memory to your job. The operating system will start swapping working memory between fast RAM and slow disk when working memory is a limiting resource. Try increasing the amount of working memory for the job by 25% or so. You can check information about your job with the commands in this section.
- When using a GPU, make sure to have (or stage) your data on a fast storage medium, like scratch or home. Optimizing GPU performance can be a complex procedure, but you can always try a GPU device with more memory. However, try to avoid oversized slurm configurations, because unused resources will also be accounted for.
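As a rough sketch of the 25% rule of thumb above: the helper below (a hypothetical name, bump_mem_mb) takes a memory request in megabytes and prints a value 25% larger, rounded up. The sacct call in the comments uses standard Slurm accounting fields; whether accounting data is retained on this cluster is an assumption, and myjob.sh is a placeholder.

```shell
# Hypothetical helper: print a memory request (in MB) increased by 25%, rounded up.
bump_mem_mb() {
    echo $(( ($1 * 125 + 99) / 100 ))
}

# A job that ran with --mem=8000M could be resubmitted with the larger value:
#   sbatch --mem="$(bump_mem_mb 8000)M" myjob.sh
# To compare requested and peak memory of a finished job (replace <jobid>):
#   sacct -j <jobid> --format=JobID,ReqMem,MaxRSS,Elapsed,State
```

If MaxRSS stays well below ReqMem, memory is probably not the bottleneck and the request does not need to grow.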
Slurm job on hold
Why is my slurm job on hold (Reserved for maintenance)?
If you submit a job that is longer than the time available until the next scheduled maintenance, your job will remain on hold until after the maintenance. Such a job will be labeled with ‘(ReqNodeNotAvail, Reserved for maintenance)’ in the squeue listing. The best way to prevent this is to keep your jobs as short as possible and, where possible, to split long jobs into smaller ones.
NB. You can check the scheduled reservations with:
scontrol show reservation
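To judge whether a job still fits before the maintenance, you can compare the current time with the StartTime field printed by scontrol show reservation. The sketch below is only an illustration: the function name max_walltime_minutes, the default 10-minute safety margin and the example date are assumptions, and the date -d call in the comments requires GNU date.

```shell
# Hypothetical helper: minutes available before a reservation starts.
# Arguments: now (epoch seconds), reservation start (epoch seconds),
# optional safety margin in minutes (default 10). Never prints less than 0.
max_walltime_minutes() {
    now_epoch=$1
    start_epoch=$2
    margin_min=${3:-10}
    left=$(( (start_epoch - now_epoch) / 60 - margin_min ))
    if [ "$left" -lt 0 ]; then
        left=0
    fi
    echo "$left"
}

# Possible use, with the StartTime copied from 'scontrol show reservation'
# (the date below is only an example value):
#   max_walltime_minutes "$(date +%s)" "$(date -d 2030-01-15T08:00:00 +%s)"
```

A result of 0 means the job no longer fits before the maintenance. Do not pass 0 to sbatch --time, because Slurm interprets a time limit of zero as a request for no time limit.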
Why is my slurm job on hold (UnavailableNodes)?
If you submit a job to a node that is not available, your job will remain on hold until the node becomes available again. Such a job will be labeled with ‘(ReqNodeNotAvail, UnavailableNodes:xxx)’ in the squeue listing.
In some cases, a node will enter the so-called draining mode, in which no new jobs will be started. This can happen when a running job cannot be terminated properly. Contact the admins to double check if the node can be resumed.
NB. You can check the state of the nodes with:
sinfo -l
Why is my slurm job on hold (priority)?
This indicates that another job has a higher priority and is waiting for sufficient resources to become available. Use these commands to get more info on queued jobs:
sprio
spriox
Also be aware that a job with a longer duration might stay in a pending or waiting state longer than a job with a shorter duration, so don't request a longer duration than your job needs.
Why is my slurm job on hold (resources)?
This means that your job is the next one to start but has to wait because there are not enough resources available yet. Use these commands to get more info on queued jobs:
sprio
spriox
Use show-my-slurm-limits to see resource limits.
Why is my slurm job on hold (Max…PerAccount)?
This means that the (slurm banking) account that you use for slurm has reached its allowed limit. This can relate to any resource, such as working memory, CPUs and GPUs. Note that jobs of other users that use the same slurm account are also included in the usage statistics. Use sprio and spriox to get more info on queued jobs. Use show-my-slurm-limits to see resource limits.
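The hold reasons discussed above can be summarized in a small lookup. This is a sketch only: the function name explain_reason is hypothetical, the hints are shortened from this FAQ, and the exact reason text printed by squeue may differ between Slurm versions.

```shell
# Hypothetical helper: map a squeue pending reason to the matching hint from this FAQ.
explain_reason() {
    case "$1" in
        *"Reserved for maintenance"*)
            echo "Job does not fit before the next maintenance; shorten or split it." ;;
        *UnavailableNodes*)
            echo "Requested node is unavailable; check node states with: sinfo -l" ;;
        *Priority*)
            echo "Another job has higher priority; inspect with: sprio" ;;
        *Resources*)
            echo "Next in line, waiting for free resources." ;;
        *Max*PerAccount*)
            echo "Account limit reached; inspect with: show-my-slurm-limits" ;;
        *)
            echo "No hint available for reason: $1" ;;
    esac
}

# Possible use on the cluster (prints job id and reason, separated by '|'):
#   squeue -u "$USER" -t PENDING -h -o "%i|%r" |
#       while IFS='|' read -r id reason; do
#           echo "$id: $(explain_reason "$reason")"
#       done
```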
Applications
Required application is missing
I need a specific application but cannot find it. How to install it?
Most pre-installed applications that are available on the system can be found and activated using the module environment.
When an application is not available as a module or menu shortcut, you can contact the system admins and ask if it is possible (and useful) to install it. In some cases, the admins may point you to another application that offers the same functionality, or explain how the application can be installed in your personal working space if it is less suitable for other users.
Note that python and R packages can often be installed by users themselves in a so-called virtual environment.
Note that you cannot install software system wide using the sudo command, so please don't blindly follow installation instructions found on the web if you're less experienced with installing software on (Red Hat) Linux systems. Use of the sudo command will be reported to the admins to prevent stability and security issues.
Storage
Disk quota
Disk quota reached. How can I solve and prevent this?
A typical application that could fill up your home folder is apptainer. If you download containers, they will be cached in your home folder under ~/.apptainer/cache. You can inspect and manage the cache using:
apptainer cache --help
Another application that may fill up your home directory is conda. Conda typically uses a hidden ~/.conda directory to cache data it uses for environments and packages. See the conda manuals for more info.
In some cases you will have to double check the disk usage of regular and hidden folders in your home directory, excluding network shares:
find ~ -mindepth 1 -maxdepth 1 -fstype xfs ! -type l -exec du -sh '{}' \;
In words: search your home folder (~), match only items that are direct children, match only items on a local xfs filesystem (i.e. skip network shares), skip softlinks to other folders, and finally run the disk usage summary on each item.
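To see the largest items first, the output can be sorted by size. The variant below is a sketch under assumptions: largest_items is a hypothetical function name, it reports sizes in KiB so that sort can order them numerically, and it leaves out the -fstype xfs filter, so add that filter back if network shares are mounted directly inside the folder you inspect.

```shell
# Hypothetical helper: list the N largest direct children of a folder (sizes in KiB).
# Arguments: folder (default: $HOME), number of items to show (default: 10).
largest_items() {
    dir=${1:-$HOME}
    count=${2:-10}
    # Direct children only, skip softlinks; du -s summarizes each item,
    # -k reports KiB, -x keeps du from descending into other filesystems.
    find "$dir" -mindepth 1 -maxdepth 1 ! -type l -exec du -skx '{}' + \
        | sort -rn | head -n "$count"
}

# Example: show the five largest items in your home folder.
#   largest_items ~ 5
```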
To check your actual personal quota, type the following in a terminal:
show-my-quota