Some common problems encountered when using the HEC:
- Job fails with errors such as: "/etc/profile: No such file or directory" or "module: command not found"
- Job fails with the error: "TERM_MEMLIMIT: job killed after reaching LSF memory usage limit."
- Compilation fails with the error message: "relocation truncated to fit: R_X86_64_PC32"
- Secure file transfer fails with the message: "File transfer server could not be started or exited unexpectedly. Exit value 0 was returned. Most likely the sftp-server is not in the path of the user on the server side"
Job fails with errors such as: "/etc/profile: No such file or directory" or "module: command not found"
The most common cause of this error is writing a job submission script on a Windows system and then transferring the file to the HPC in the wrong mode. Windows and Unix systems use different line-ending conventions for plain text files (Windows ends each line with a carriage return plus a line feed, Unix with a line feed only), and the difference is not normally visible to the user.
To fix the problem, ensure that the job submission script is transferred to the HPC in text mode, not binary or automatic mode. A problem file can be converted in situ on the HPC by running the dos2unix command followed by the name of the file. Alternatively, write job submission scripts directly on the HPC using one of the many text editors available (emacs, ue, vi, nedit, etc.).
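The check and conversion described above can be sketched as follows. This is a minimal illustration, not a site-specific procedure: the file name job.sh and its contents are hypothetical, and the tr fallback is an assumption for systems where dos2unix is not installed.

```shell
# Create a sample script with Windows (CRLF) line endings for demonstration.
# The file name job.sh and its contents are hypothetical.
printf '#!/bin/bash\r\necho "hello"\r\n' > job.sh

# A literal carriage-return character, used to build the grep pattern.
cr=$(printf '\r')

# Count the lines that end in a carriage return; non-zero means CRLF endings.
crlf_before=$(grep -c "${cr}\$" job.sh || true)

# Convert in place with dos2unix when it is available.
if command -v dos2unix >/dev/null 2>&1; then
    dos2unix job.sh
else
    # Fallback (assumption): delete every carriage return with tr.
    tr -d '\r' < job.sh > job.sh.tmp && mv job.sh.tmp job.sh
fi

# Count again; after conversion no line should end in a carriage return.
crlf_after=$(grep -c "${cr}\$" job.sh || true)
echo "CRLF lines before: $crlf_before, after: $crlf_after"
```

A non-zero count from the first grep is a quick way to confirm that line endings are the cause before resubmitting the job.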
Job fails with the error: "TERM_MEMLIMIT: job killed after reaching LSF memory usage limit."
The job was terminated automatically because it exceeded its memory resource limit. Resubmit the job with a valid memory resource request, as described on the Advanced Jobs page.
Compilation fails with the error message: "relocation truncated to fit: R_X86_64_PC32"
This error normally occurs when compiling code whose static data structures exceed 2 GB in size. If your code requires static data structures larger than 2 GB, it must be recompiled and relinked using the medium memory model: add the flag -mcmodel=medium to all compiler and linker commands.
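As an illustration of where the flag goes, the sketch below builds a small test program with more than 2 GB of static data. The source file big_static.c, the choice of gcc, and the variable names are assumptions made for the example; substitute your own compiler and build commands. The build step is guarded so that it only runs where gcc and an x86_64 system are present.

```shell
# Hypothetical source file with a static array larger than 2 GB.
cat > big_static.c <<'EOF'
#include <stdio.h>

/* 3 GiB of zero-initialised static data (hypothetical example). */
static char table[3UL * 1024 * 1024 * 1024];

int main(void) {
    table[sizeof table - 1] = 1;   /* touch the last byte so the array is used */
    printf("%d\n", table[sizeof table - 1]);
    return 0;
}
EOF

CC=${CC:-gcc}                # assumed compiler; substitute the one you use
CFLAGS="-mcmodel=medium"     # the flag goes on every compile command...
LDFLAGS="-mcmodel=medium"    # ...and on the link command as well

# Only attempt the build where the compiler and architecture match.
if command -v "$CC" >/dev/null 2>&1 && [ "$(uname -m)" = "x86_64" ]; then
    "$CC" $CFLAGS -c big_static.c -o big_static.o
    "$CC" $LDFLAGS big_static.o -o big_static
fi
```

In a larger project, the same flag would normally be added once to the compiler and linker flag variables in the makefile so that every object file is built consistently.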
Secure file transfer fails with the message: "File transfer server could not be started or exited unexpectedly. Exit value 0 was returned. Most likely the sftp-server is not in the path of the user on the server side"
This is typically caused by unexpected output from one of the user's shell startup scripts, generally .tcshrc or .cshrc. Temporarily renaming the file and attempting to use Secure FTP again will confirm whether this is the cause. The problem can be fixed by moving the offending lines to a shell login script (.login for tcsh, .profile for bash).
Note: The offending lines are typically module directives.
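Where moving the lines is inconvenient, a commonly used alternative is to guard any output so that it is produced only in interactive shells. The fragment below is a minimal sketch for a bash startup file; the function name and message are hypothetical. In tcsh, the usual equivalent is to wrap the lines in a test of `$?prompt`.

```shell
# Fragment for a shell startup file such as ~/.bashrc (hypothetical contents).
# "$-" holds the current shell's option flags; it contains "i" only when the
# shell is interactive, so non-interactive sessions such as sftp stay silent.
startup_message() {
    case "$-" in
        *i*) echo "Modules loaded for interactive session" ;;
        *)   : ;;   # non-interactive shell: print nothing
    esac
}

startup_message
```

With a guard like this, file transfer sessions see no output from the startup file, while interactive logins still show the message.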