General

The keywords “MUST”, “MUST NOT”, “SHOULD”, etc. are to be interpreted as described in RFC 2119.

1 Permissions to host on nf-core/configs

Configs hosted on nf-core/configs SHOULD have permission from the administrators of the given infrastructure to host the config publicly within the nf-core GitHub organisation, with the exceptions described below.

  • Configs for sensitive data clusters MUST have permission from the system administrators.
Tip: A config MAY be declared unofficial if the system administrators agree to public hosting but do not maintain it.

2 Alignment of config with local policies

Configurations SHOULD comply with, and document, the administrative policies of the infrastructure where possible. For example, if multiple partitions exist but local policy dictates that specific partitions be used for specific cases, this SHOULD be represented in the config (as sketched below). Similarly, if executing the main Nextflow run command on login/submit nodes is not allowed, include a sample job submission script (for example, a SLURM sbatch script) in the documentation.
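
For instance, a partition policy could be encoded directly in the process scope. The following is a minimal sketch only; the partition names, time threshold, and executor are hypothetical and should be adapted to local policy.

process {
    executor = 'slurm'
    // Route jobs according to local policy: short jobs go to the 'short' partition,
    // everything else to 'standard' (partition names and threshold are hypothetical)
    queue = { task.time <= 2.h ? 'short' : 'standard' }
}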

3 Sensitive data and offline environments

3.1 Sensitive data clusters

Configs for infrastructure handling sensitive or restricted data SHOULD document any relevant data governance policies that affect pipeline execution. A config for sensitive data infrastructure SHOULD use a local container registry rather than pulling from public registries (e.g., Docker Hub, Quay.io).
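
As a minimal sketch, a local registry could be configured via the container engine's registry option (the registry address below is hypothetical; docker.registry and singularity.registry require a recent Nextflow release):

// Pull containers from an internal registry instead of public registries
// ('registry.example.internal' is a hypothetical address)
docker.registry      = 'registry.example.internal'
singularity.registry = 'registry.example.internal'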

3.2 Offline/air-gapped clusters

A config for infrastructure without external internet access SHOULD describe how to set singularity.cacheDir or apptainer.cacheDir to a directory pre-populated using nf-core download rather than configuring a container registry. It SHOULD set params.igenomes_ignore = true and provide paths to locally available reference genomes instead.
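
A minimal sketch of such a config, assuming a shared cache directory pre-populated with nf-core download and a hypothetical local reference path:

singularity.enabled  = true
singularity.cacheDir = '/shared/containers/nf-core'   // pre-populated offline via `nf-core download`

params {
    igenomes_ignore = true
    // Hypothetical path to a locally available reference genome
    fasta = '/shared/references/GRCh38/genome.fa'
}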

4 Size of configs

4.1 Number of infrastructures in a config

A single configuration file SHOULD only be used to represent a single cluster or type of infrastructure.

For HPC infrastructure, a single config MAY represent multiple similar or linked HPCs that are dynamically selected within the config.

4.2 Structure when multiple configs

If multiple HPCs are supported in a single config, any sub-configs that are selected based on a condition in the main config MUST be placed in a subdirectory. The subdirectory MUST have the same base name as the config file (for example, a directory named myinstitute for a config called myinstitute.config), and the sub-configs MUST be loaded in the main config with includeConfig().

Example:

conf/
├── <myinstitute>/
│   ├── <hpc1>.config
│   └── <hpc2>.config
└── <myinstitute>.config
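
The selection logic in the main config could, as a minimal sketch, dispatch on the hostname (the hostname patterns below are hypothetical):

// conf/myinstitute.config
def hostname = System.getenv('HOSTNAME') ?: ''

if (hostname.startsWith('hpc1')) {
    includeConfig 'myinstitute/hpc1.config'
} else if (hostname.startsWith('hpc2')) {
    includeConfig 'myinstitute/hpc2.config'
}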

5 Scope of configs

Two config types are possible:

5.1 Institutional config

An institutional config MUST be compatible with any nf-core pipeline or user. It defines how the pipeline interacts with the infrastructure, such as scheduling options, software environment settings, and resource limits.

An institutional config SHOULD NOT define any resource defaults with withName or withLabel. It SHOULD provide reasonable default settings for operating on the infrastructure (e.g. resourceLimits, beforeScript, clusterOptions, runOptions).
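
A minimal sketch of such infrastructure-level defaults (module name, account, and bind path are hypothetical):

process {
    executor       = 'slurm'
    beforeScript   = 'module load apptainer'
    clusterOptions = '--account=my_project'
}

apptainer {
    enabled    = true
    runOptions = '--bind /scratch'
}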

5.2 Pipeline-specific institutional config

A pipeline-specific config MAY modify the default resource values (memory, CPUs, and time).

Where possible, it is RECOMMENDED to provide defaults for using locally available references and similar resources.
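
As a minimal sketch, a pipeline-specific institutional config could combine a locally available reference with a resource override (the process selector, resource values, and reference path are hypothetical):

params {
    // Locally available reference instead of downloading from iGenomes
    fasta = '/shared/references/GRCh38/genome.fa'
}

process {
    // Override the default resources of a specific process for this infrastructure
    withName: 'BWAMEM2_MEM' {
        cpus   = 16
        memory = 128.GB
        time   = 12.h
    }
}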

6 Naming

6.1 Size of name

Institutional configs SHOULD use a short name or acronym as the config name.

6.2 Formatting

Config names MUST be written in all lowercase letters and/or numbers, both in file names and when referred to in documentation.

Config names MAY use underscores. Config names MUST NOT use any other symbols.

6.3 Names when multiple infrastructure in a single institution

When multiple computational infrastructures exist for a single institution, an institutional prefix SHOULD be used.

For example, the Max Planck Computing and Data Facility (MPCDF) has two HPCs named raven and viper:

mpcdf_raven.config
mpcdf_viper.config

7 Required files

An institutional config MUST consist of two files:

  • conf/<config name>.config for the config itself.
  • docs/<config name>.md for documentation about the config.

Files for a pipeline-specific institutional config MUST be located at:

  • conf/pipeline/<pipeline name>/<config name>.config for the config itself.
  • docs/pipeline/<pipeline name>/<config name>.md for documentation about the config.

Furthermore, the config MUST be referred to in two additional places:

8 Required information

A config MUST have a current contact person responsible for maintaining the config.

9 Parameters

9.1 Required parameters

A config MUST include three descriptive parameters:

  • config_profile_description: A short description of which infrastructure the config is used for.
  • config_profile_contact: The name and GitHub handle of the person currently maintaining the config.
  • config_profile_url: A URL to details about the infrastructure or the institution.

params {
    config_profile_description = 'The <name of infrastructure> cluster profile'
    config_profile_contact     = '<maintainer name> (@<github handle>)'
    config_profile_url         = 'https://<url>.com'
}

9.2 Optional parameters

9.2.1 Backwards-compatible max parameters

A config MAY also define the max_* parameters with the same values as the resourceLimits directive. This provides backwards compatibility for older pipeline releases and older Nextflow versions that still rely on the max_* parameters rather than the resourceLimits directive.

params {
    igenomes_ignore = true
    max_memory      = 750.GB
    max_cpus        = 200
    max_time        = 30.d
}

9.2.2 Custom parameters

Custom config- or infrastructure-specific parameters MAY be used, such as for cluster scheduler ‘account’ or ‘project’ parameters.

Custom config- or infrastructure-specific parameters MUST be documented in the config's .md file and MUST be added to the ignoreParams option of the nf-schema validation scope.

For example:

validation {
    ignoreParams = ['cluster_account']
}

10 Resource limits

10.1 Directive

A config MUST define the maximum resource limits of a computing infrastructure using the resourceLimits process directive.

process {
    resourceLimits = [
        memory: 750.GB,
        cpus: 200,
        time: 30.d
    ]
    executor = 'slurm'
    queue    = { task.memory <= 250.GB ? (task.time <= 24.h ? 'fast' : 'long') : 'bigmem' }
}