How-to: Pass a PPC code check
The code part of the pre-publication checks mainly verifies that the research complies with the LCSB code policy and that others can reuse it, which is enabled mainly by licensing.
What is checked within the code check?
To successfully pass the code check, you need to fulfill 3 main requirements on the code:
- Code deposition: Any code associated with the publication is archived so that it stays accessible to the LCSB.
- Minimal documentation: The code has a minimal README file attached, which documents the code purpose and its basic use required to reproduce the results.
- Licensing requirements: The license of the code is compatible with the licenses of the dependencies (such as various libraries). If the code is public, the license file must be present to clarify the code use conditions.
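The two file-related requirements above can be checked locally before asking for a code check. The following is a minimal sketch, not an official PPC tool: the function name `find_missing_files` and the accepted file-name variants are assumptions made for illustration, and license compatibility with dependencies still has to be reviewed by a person.

```python
from pathlib import Path

# Assumption: these file-name variants are illustrative, not an official list.
README_NAMES = {"readme", "readme.md", "readme.txt", "readme.rst"}
LICENSE_NAMES = {"license", "license.md", "license.txt"}


def find_missing_files(repo_root, is_public):
    """Return which of "README" and "LICENSE" are missing from the repository root.

    A LICENSE file is only reported as missing for public code, because the
    license file is mandatory only in that case.
    """
    root = Path(repo_root)
    # Compare names case-insensitively, so that Readme.md is accepted too.
    present = {p.name.lower() for p in root.iterdir() if p.is_file()}
    missing = []
    if not present & README_NAMES:
        missing.append("README")
    if is_public and not present & LICENSE_NAMES:
        missing.append("LICENSE")
    return missing
```

For example, `find_missing_files(".", is_public=True)` run in the root of a checked-out repository returns an empty list when both files are present.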
What is irrelevant to the code check?
The code check does not evaluate the following properties of the code, as they are not implied by the code policy:
- Open-source code publication. It is not required to make your code public in order to pass the code check. PPC coaches may detect a situation when open-sourcing of the code might be viable, in which case they will recommend (but never mandate) the code publication.
- Code quality. The code check does not evaluate advanced aspects of the code beyond the basic reproducibility and readability implied by the presence of the README, and code quality has no impact on the result of the code check. In case of serious code reusability flaws, PPC coaches may recommend the best way to make the code more useful for its users, but will not require fixing the code.
Code check in 3 steps
Step 1: Move your code to the correct place
Your code repository must be stored in a common GitLab group that belongs to your research group (and is typically managed by your PI). Examples of such groups include the ones by MFN and TNG. Your project should not be deposited in a "personal" namespace, such as gitlab.lcsb.uni.lu/firstname.lastname/some-hidden-project, as this restricts the availability of your code for others.
To make sure the project is in the right group,
- when creating a new project, find the matching group in a dropdown below Project URL, in the field marked with Pick a group or namespace.
- if your project already exists, you can transfer it to the desired group by clicking Settings (in the menu on the left) → General → Advanced (at the very bottom) → Transfer project; there you select the target group in the box labeled Select a new namespace.
For all projects, make sure that the project visibility matches the expectations:
- Closed-source projects should display a small padlock 🔒 next to the project title on the main project page, meaning the access is restricted.
- Open-source projects should display a small globe icon 🌐, meaning the project is accessible globally.
The project visibility can be changed from the menu on the left, via Settings → General → Visibility, project features, permissions → Project visibility.
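The same two properties (group namespace, matching visibility) can also be checked programmatically. The sketch below assumes a project description shaped like the JSON returned by the GitLab REST API for a single project, with a `namespace.kind` field (`"user"` for personal namespaces, `"group"` otherwise) and a `visibility` field (`"private"`, `"internal"` or `"public"`). These field names reflect the GitLab API as commonly documented and should be verified against your GitLab version; the function name `check_project` is hypothetical.

```python
def check_project(project, open_source):
    """Return a list of human-readable problems found in a project description.

    `project` is assumed to be a dict parsed from the GitLab project JSON;
    `open_source` states whether the code is meant to be public.
    """
    problems = []
    # Personal namespaces restrict the availability of the code for others.
    if project["namespace"]["kind"] == "user":
        problems.append("project sits in a personal namespace; transfer it to a group")
    is_public = project["visibility"] == "public"
    if open_source and not is_public:
        problems.append("open-source project is not publicly visible")
    if not open_source and is_public:
        problems.append("closed-source project is publicly visible")
    return problems
```

An empty result means that the project location and visibility match the expectations described in this step.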
Troubleshooting:
- If you do not know what group to choose and there is no information about your group in the list, ask your PI.
- If your research group has no namespace in GitLab yet, ask your PI or the R3 team to create one for you.
- If you want to host your code somewhere else (e.g., on GitHub), consult the code policy for detailed conditions. Most notably, you will need approval from your PI.
The R3 team may be contacted at lcsb-r3@uni.lu.
Step 2: Write a simple README
A README is a plain text or markdown file (README.md in that case) which sits in the root of the repository and answers the following questions for potential code users:
- what does the code do?
- what are the expected inputs and outputs?
- what algorithms does it run, and for what purpose?
- what is required to run the code?
- what commands does the user have to type?
- what other software (dependencies) does the user have to install before running the code, and how?
- if applicable, what research (or publication) does the code support?
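As an illustration of how little is needed, the questions above map directly onto a short README skeleton. The following sketch renders such a skeleton as markdown; the function name `render_readme`, its parameters and the section titles are hypothetical choices, not a prescribed LCSB template.

```python
def render_readme(title, purpose, inputs_outputs, algorithms,
                  dependencies, commands, publication=None):
    """Render a minimal README.md text from short answers to the questions above."""
    lines = [
        f"# {title}",
        "",
        purpose,
        "",
        "## Inputs and outputs",
        "",
        inputs_outputs,
        "",
        "## Algorithms",
        "",
        algorithms,
        "",
        "## Requirements",
        "",
    ]
    # One bullet per dependency that the user has to install.
    lines += [f"- {dep}" for dep in dependencies]
    lines += ["", "## Usage", ""]
    # Indenting by four spaces makes markdown render the commands as code.
    lines += [f"    {cmd}" for cmd in commands]
    if publication:
        lines += ["", "## Publication", "", publication]
    return "\n".join(lines) + "\n"
```

Writing the returned text into a file named README.md in the repository root gives a starting point that can then be extended by hand.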
A sufficient README file may be as short as 2 sentences that briefly clarify the above, but it is recommended to be slightly more verbose. We recommend following the example of existing repositories with good README files:
- Supporting code for a computational publication:
- Internal tooling:
- Open-source software:
Especially for open-source projects, remember that a nice README file is one of the main decisive factors for users to try your project (eventually generating citations of your research). Consider adding the following to your READMEs to improve both their appearance and practical value:
- badges for CI, documentation, and many other things
- institute, university, lab or project acknowledgements and logos
- simple code example to run the library
- links to dockerized software versions
- short copyable scripts that install the software
- links to use-cases
- a nice, self-explanatory picture of the result
- link to a comprehensive documentation, if available
- link to the publication, possibly accompanied by a preprint link and a CITATION file
Step 3: Add a license
The choice of code sharing policy and the license is the responsibility of your PI. If you are not sure what your license should be, ask your PI. You can talk to the R3 team (lcsb-r3@uni.lu), the TTO (lcsb-innovation@uni.lu), and possibly the legal support (lcsb-legal@uni.lu) for recommendations.
The main considerations on open- vs. closed-sourcing the code (which have licensing implications) are as follows:
- Open-source code: Open-sourcing the code related to your publication is usually recommended (it is also very welcome by journals, and we have seen several cases of reviewers rejecting a paper based on code unavailability). You may pick any OSS-compliant license that is compatible with your code's dependencies. Remember that the code publication and the license choice must be approved by your PI. Preferably, aim for licenses like Apache 2.0 (recommended as a default for open-source projects), MIT, or GPL.
- Closed-source code: Closed-source code is possible, and is usually a good choice for internal projects and projects that involve a lot of manipulation of sensitive data (where code may already leak sensitive information). Initially the code does not require any license treatment. When you later need to share the code, it is recommended to attach short copyright notices. (See the FAQ section below.)
- Potentially patentable code: For all projects, it is recommended to delay open-sourcing the code until the TTO has examined its potential commercial value, even if the PI has already approved the publication. Algorithms and pipelines may be patented, and publishing the code prematurely typically makes any patenting impossible. Ask the TTO before publishing the code if you have even the slightest suspicion that the code might be commercially useful.
Once the license is decided, simply add a file named LICENSE or LICENSE.md in the root of your repository and copy-paste the desired license into it. The license text may be obtained from OSS licensing websites (choosealicense.com, GitHub) or copied from other projects. Examples of correct LICENSE applications may be found among the open-source projects listed above.
Hint: If you are adding the license file in GitLab, after clicking the big "plus" button, then the New file button, and filling in the file name LICENSE, GitLab will offer you a drop-down with several prepared open-source licenses. This way you do not need to copy-paste the whole license text manually. (Sometimes, GitLab will even directly offer a big Add LICENSE button, which does the same.)
Commonly asked and answered questions (FAQ)
Do I need to upload my code to GitLab even if it is not supposed to be published?
Yes. Unless treated otherwise with a signed contract with the University, all code you produce within the LCSB is a research output that belongs to the University and must be deposited for internal archival, as required by the code policy. As explained above, uploading to GitLab does not imply that the code would have to be published (i.e., open-sourced) -- the repository with the code can stay private (and closed-source) forever.
Notably, this also concerns code that is not associated with any publication output.
How do I handle non-public code that is supposed to be disclosed upon request?
There are no standard guidelines, because the specific situation and requirements vary a lot between cases. It is advisable to contact the R3 team and the TTO to discuss the current applicable policies and recommendations. At the same time, you should understand that code availability "upon request" creates a significant additional long-term administrative and legal load on LCSB staff, and should be avoided unless absolutely necessary.
The general advice for such cases includes the following:
- If the code is not supposed to be circulated publicly, add a clear copyright statement to the header of each source code file (as a code comment) and to the separate LICENSE file in the root of the repository. The copyright notice for the code header comments and a closed-source LICENSE file may look simply as follows:
Copyright (c) 2022-2023 University of Luxembourg; all rights reserved.
- If you are sharing the code in response to a request, you may require the third party to sign a non-disclosure agreement (NDA) and possibly other contracts (depending on the specific case) before receiving the code.
- To ensure on-demand code availability in the future, archive the code into a single "bundle" (such as a zip file), and treat this archive as a non-public dataset that is shared upon request: Deposit it to a suitable archival medium (such as GitLab and Atlas), make a record in DAISY, and add an "availability upon request" statement that is compliant with the data check. Most notably, the code contact must not point to a single person, but to a sustainable LCSB service that manages the requests (such as TTO or LCSB data stewards).
- The code disclosure may be limited because the code implicitly contains sensitive data, for example because the programs include very detailed handling of sensitive datasets. In that case, the code itself must be additionally treated as sensitive data, as regulated by the existing policies.
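Two of the mechanical steps above, adding the copyright notice to source file headers and bundling the code into a single zip file, can be scripted. The following is a minimal sketch under several assumptions: the helper names `add_copyright_header` and `bundle_repository` are hypothetical, the default comment prefix suits languages that use `#` comments, the notice years are taken from the example above and should be adjusted, and the `.git` directory is left out of the bundle by choice.

```python
import zipfile
from pathlib import Path

# Notice text taken from the example above; adjust the years as needed.
NOTICE = "Copyright (c) 2022-2023 University of Luxembourg; all rights reserved."


def add_copyright_header(source_file, notice=NOTICE, comment_prefix="# "):
    """Prepend the notice as a comment line; return False if it is already present."""
    path = Path(source_file)
    text = path.read_text(encoding="utf-8")
    if notice in text:
        return False
    lines = text.splitlines(keepends=True)
    insert_at = 0
    # Keep a shebang line (e.g. "#!/usr/bin/env python3") as the first line.
    if lines and lines[0].startswith("#!"):
        if not lines[0].endswith("\n"):
            lines[0] += "\n"
        insert_at = 1
    lines.insert(insert_at, f"{comment_prefix}{notice}\n")
    path.write_text("".join(lines), encoding="utf-8")
    return True


def bundle_repository(repo_root, archive_path):
    """Pack all files under repo_root (except .git) into one zip archive."""
    root = Path(repo_root)
    archive = Path(archive_path)
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as bundle:
        for path in sorted(root.rglob("*")):
            relative = path.relative_to(root)
            if ".git" in relative.parts:
                continue  # assumption: version-control metadata is not bundled
            # Skip directories and the archive itself if it lies inside repo_root.
            if path.is_file() and path.resolve() != archive.resolve():
                bundle.write(path, relative.as_posix())
    return archive
```

The resulting archive still has to be deposited and recorded as described above; the script only produces the file.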
Can I pass the code check if I want to publish the code only after the paper is accepted?
Yes, but you will have to provide another way for the PPC coaches to have a look at your code.
Typically, the coaches may ask for temporary access to the GitLab repository. You may add the code coach to your GitLab project using Project information → Members → Invite members. You must select at least the Reporter role, because the guest role does not offer access to the actual code. To simplify things, you may add an access expiration date (but please allow a week or two for possible later checks).
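If you prefer to script the invitation instead of using the web interface, the settings described above translate into a small request payload. The sketch below only builds that payload; it assumes the numeric access levels used by the GitLab API (10 for Guest, 20 for Reporter) and the `user_id`, `access_level` and `expires_at` fields of the project members endpoint, all of which should be verified against the documentation of your GitLab version. The function name `coach_invitation` and the default of 14 days are illustrative choices.

```python
from datetime import date, timedelta

# Assumption: numeric access levels as used by the GitLab API.
GUEST = 10
REPORTER = 20


def coach_invitation(user_id, today, grace_days=14, access_level=REPORTER):
    """Build the member-invitation payload for a PPC coach.

    The access expires `grace_days` after `today`, which leaves room for
    possible later checks.
    """
    if access_level < REPORTER:
        raise ValueError("the Guest role cannot read the code; use at least Reporter")
    expires = today + timedelta(days=grace_days)
    return {
        "user_id": user_id,
        "access_level": access_level,
        "expires_at": expires.isoformat(),  # formatted as YYYY-MM-DD
    }
```

For instance, `coach_invitation(42, date(2024, 1, 1))` yields a Reporter invitation that expires on 2024-01-15; the user ID 42 is a placeholder.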
In cases when adding members to your repository is impractical or impossible, it is also viable to arrange a short meeting with the PPC coaches to check the basic requirements together.
If your code is subject to non-disclosure, strict closed-source copyright or contains sensitive or secret information, please notify the PPC coaches in advance, so that they use appropriate security measures.