exa/reploy

Mirek Kratochvil 4a7dcc2dbe split tags and ehtags, separate listings from hierarchical views

2023-06-18 17:02:51 +02:00

How-to: Pass a PPC code check

The code part of the pre-publication check (PPC) mainly verifies that the research complies with the LCSB code policy and that others can reuse it, which is mostly enabled by appropriate licensing.

What is checked within the code check?

To successfully pass the code check, you need to fulfill 3 main requirements on the code:

Code deposition: Any code associated with the publication is archived in a way that keeps it accessible to the LCSB.
Minimal documentation: The code has a minimal README file attached, which documents the purpose of the code and the basic usage required to reproduce the results.
Licensing requirements: The license of the code is compatible with the licenses of its dependencies (such as various libraries). If the code is public, a license file must be present to clarify the conditions of use.
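
As a rough self-check before contacting the PPC coaches, the two requirements that are visible in the repository itself (README and license file) can be tested with a few lines of Python. This is only a sketch, not an official PPC tool: the function name `precheck` and the lists of accepted file names are assumptions, and the script cannot verify code deposition or license compatibility.

```python
from pathlib import Path

# File names accepted by this sketch (an assumption; adjust as needed).
README_NAMES = ("README", "README.md")
LICENSE_NAMES = ("LICENSE", "LICENSE.md")


def precheck(repo_root, is_public):
    """Return a list of human-readable problems found in the repository root."""
    root = Path(repo_root)

    def has_non_empty(names):
        # True if at least one of the candidate files exists and has content.
        return any(
            (root / name).is_file() and (root / name).stat().st_size > 0
            for name in names
        )

    problems = []
    if not has_non_empty(README_NAMES):
        problems.append("missing or empty README in the repository root")
    # The license file is only mandatory when the code is public.
    if is_public and not has_non_empty(LICENSE_NAMES):
        problems.append("public repository without a LICENSE file")
    return problems
```

Calling `precheck(".", is_public=True)` in the repository root returns an empty list when both files are present and non-empty.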

What is irrelevant to the code check?

The code check does not evaluate the following properties of the code, as they are not implied by the code policy:

Open-source code publication. You are not required to make your code public in order to pass the code check. PPC coaches may notice that open-sourcing the code would be viable, in which case they will recommend (but never mandate) publishing it.
Code quality. The code check does not evaluate advanced aspects of the code beyond the basic reproducibility and readability implied by the presence of the README, and code quality has no impact on the result of the code check. In case of serious reusability flaws, PPC coaches may recommend the best way to make the code more useful for its users, but they will not require you to fix the code.

Code check in 3 steps

Step 1: Move your code to the correct place

Your code repository must be stored in a common GitLab group that belongs to your research group (and is typically managed by your PI). Examples of such groups include the ones by MFN and TNG. Your project should not be deposited in a "personal" namespace, such as gitlab.lcsb.uni.lu/firstname.lastname/some-hidden-project, as this restricts the availability of your code to others.
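
A quick way to spot a project that still lives in a personal namespace is to look at its URL. The following sketch assumes that personal namespaces consist of a single path segment of the form firstname.lastname, as in the example above. The function names are hypothetical, and a group whose name happens to contain a dot would be misclassified, so treat the result as a hint only.

```python
from urllib.parse import urlparse


def namespace_of(project_url):
    """Return the namespace part of a project URL (everything before the project name)."""
    if "://" not in project_url:
        # urlparse needs a scheme to separate the host from the path.
        project_url = "https://" + project_url
    segments = [s for s in urlparse(project_url).path.split("/") if s]
    return "/".join(segments[:-1])


def looks_personal(project_url):
    """Heuristic: a single-segment namespace containing a dot looks like firstname.lastname."""
    namespace = namespace_of(project_url)
    return bool(namespace) and "/" not in namespace and "." in namespace
```

For instance, `looks_personal("gitlab.lcsb.uni.lu/firstname.lastname/some-hidden-project")` flags the project, while a project nested in a group such as `R3/outreach` does not.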

To make sure the project is in the right group,

when creating a new project, find the matching group in the dropdown below Project URL, in the field marked Pick a group or namespace.
if your project already exists, you can transfer it to the desired group by clicking Settings (in the menu on the left) → General → Advanced (at the very bottom) → Transfer project; there, select the target group in the box labeled Select a new namespace.

For all projects, make sure that the project visibility matches the expectations:

Closed-source projects should display a small padlock 🔒 next to the project title on the main project page, meaning that access is restricted.
Open-source projects should display a small globe icon 🌐, meaning that the project is accessible globally.

The project visibility can be changed from the menu on the left, via Settings → General → Visibility, project features, permissions → Project visibility.

Troubleshooting:

If you do not know what group to choose and there is no information about your group in the list, ask your PI.
If your research group has no namespace in GitLab yet, ask your PI or the R3 team to create one for you.
If you want to host your code somewhere else (e.g., on GitHub), consult the code policy for detailed conditions. Most notably, you will need approval from your PI.

The R3 team may be contacted at lcsb-r3@uni.lu.

Step 2: Write a simple `README`

A README is a plain-text or Markdown file (README.md in the latter case) that sits in the root of the repository and answers the following questions for potential users of the code:

what does the code do?
- what are the expected inputs and outputs?
- what algorithms does it run, and for what purpose?
what is required to run the code?
- what commands does the user have to type?
- what other software (dependencies) does the user have to install before running the code, and how?
if applicable, what research (or publication) does the code support?
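
The questions above map naturally onto a small README skeleton. The sketch below fills such a skeleton from short answers; the section titles, the template layout, and the name `render_readme` are illustrative assumptions rather than a required format.

```python
README_TEMPLATE = """\
# {title}

## What does the code do?

{purpose}

- Inputs and outputs: {inputs_outputs}
- Algorithms: {algorithms}

## What is required to run the code?

- Dependencies: {dependencies}
- Command to run: `{command}`

## Related research

{publication}
"""


def render_readme(**answers):
    """Fill the template; a missing answer raises KeyError instead of leaving a gap."""
    return README_TEMPLATE.format(**answers)
```

Writing the returned text to `README.md` in the repository root gives a starting point that can then be extended by hand.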

A sufficient README file may be as short as 2 sentences that briefly clarify the above, but we recommend being slightly more verbose. A good starting point is to follow the example of existing repositories with good README files:

Supporting code for a computational publication:
Internal tooling:
- gitlab.lcsb.uni.lu/R3/outreach/templates/presentations/markdown
Open-source software:

Especially for open-source projects, remember that a nice README file is one of the main decisive factors for users to try your project (eventually generating citations of your research). Consider adding the following to your READMEs to improve both their appearance and practical value:

badges for CI, documentation, and many other things
institute, university, lab or project acknowledgements and logos
simple code example to run the library
links to dockerized software versions
short copyable scripts that install the software
links to use-cases
a nice, self-explanatory picture of the result
link to a comprehensive documentation, if available
link to the publication, possibly accompanied by a preprint link and a CITATION file

Step 3: Add a license

The choice of the code sharing policy and the license is the responsibility of your PI. If you are not sure what your license should be, ask your PI. You can talk to the R3 team (lcsb-r3@uni.lu), TTO (lcsb-innovation@uni.lu), and possibly the legal support (lcsb-legal@uni.lu) for recommendations.

The main considerations on open- vs. closed-sourcing the code (which have licensing implications) are as follows:

Open-source code: Open-sourcing the code related to your publication is usually recommended (it is also very welcome by journals, and we have seen several cases of reviewers rejecting a paper because the code was unavailable). You may pick any OSS-compliant license that is compatible with your code's dependencies. Remember that the code publication and the license choice must be approved by your PI. Preferably, use licenses such as Apache 2.0 (recommended as a default for open-source projects), MIT, or GPL.
Closed-source code: Closed-source code is possible, and is usually a good choice for internal projects and projects that involve a lot of manipulation of sensitive data (where the code itself may leak sensitive information). Initially the code does not require any license treatment. When you later need to share the code, it is recommended to attach short copyright notices. (See the FAQ section below.)
Potentially patentable code: For all projects, it is recommended to delay open-sourcing the code until the TTO has examined its potential commercial value, even if the PI has already approved the publication. Algorithms and pipelines may be patented, and publishing the code prematurely typically makes any patenting impossible. Ask the TTO before publishing the code if you have even the slightest suspicion that the code might be commercially useful.

When the license is decided, simply add a file named LICENSE or LICENSE.md to the root of your repository and copy-paste the desired license into it. The license text may be obtained from OSS licensing websites (choosealicense.com, GitHub) or copied from other projects. Examples of correctly applied LICENSE files may be found among the open-source projects listed above.

Hint: If you are adding the license file in GitLab, click the big "plus" button, then the New file button, and fill in the file name LICENSE; GitLab will then offer you a drop-down with several prepared open-source licenses. This way you do not need to copy-paste the whole license text manually. (Sometimes, GitLab will even directly offer a big Add LICENSE button, which does the same.)

Commonly asked & answered questions (FAQ)

Do I need to upload my code to GitLab even if it is not supposed to be published?

Yes. Unless agreed otherwise in a signed contract with the University, all code you produce within the LCSB is a research output that belongs to the University and must be deposited for internal archival, as required by the code policy. As explained above, uploading to GitLab does not imply that the code has to be published (i.e., open-sourced); the repository with the code can stay private (and closed-source) forever.

Notably, this also concerns code that is not associated with any publication output.

How do I handle non-public code that is supposed to be disclosed upon request?

There are no standard guidelines, because the specific situation and requirements vary a lot between cases. It is advisable to contact the R3 team and the TTO to discuss the currently applicable policies and recommendations. At the same time, you should understand that code availability "upon request" creates a significant additional long-term administrative and legal load on LCSB staff, and should be avoided unless absolutely necessary.

The general advice for such cases includes the following:

If the code is not supposed to be circulated publicly, add a clear copyright statement to the header of each source code file (as a code comment) and to the separate LICENSE file in the root of the repository. The copyright notice for the code header comments and a closed-source LICENSE file may look simply as follows:

Copyright (c) 2022-2023 University of Luxembourg; all rights reserved.

If you are sharing the code in response to a request, you may require the third party to sign a non-disclosure agreement (NDA) and possibly other contracts (depending on the specific case) before receiving the code.
To ensure on-demand code availability in the future, archive the code into a single "bundle" (such as a zip file), and treat this archive as a non-public dataset that is shared upon request: deposit it to a suitable archival medium (such as GitLab and Atlas), make a record in DAISY, and add an "availability upon request" statement that is compliant with the data check. Most notably, the code contact must not point to a single person, but to a sustainable LCSB service that manages the requests (such as TTO or LCSB data stewards).
The code disclosure may be limited because the code implicitly contains sensitive data, for example because the programs include very detailed handling of sensitive datasets. In that case, the code itself must be additionally treated as sensitive data, as regulated by the existing policies.
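
Two items of the advice above, the copyright header in each source file and the single-file bundle, are easy to automate. The following is a minimal sketch under stated assumptions: the suffix-to-comment mapping `COMMENT_PREFIX` and the function names `add_notice` and `bundle` are hypothetical, only line comments are handled, and the `.git` directory is left out of the archive. Depositing the bundle and recording it in DAISY remain manual steps.

```python
import zipfile
from pathlib import Path

NOTICE = "Copyright (c) 2022-2023 University of Luxembourg; all rights reserved."

# Hypothetical mapping from file suffix to line-comment prefix; extend as needed.
COMMENT_PREFIX = {".py": "#", ".R": "#", ".sh": "#", ".jl": "#", ".c": "//", ".cpp": "//"}


def add_notice(path, notice=NOTICE):
    """Prepend the notice as a comment; return True if the file was modified."""
    path = Path(path)
    prefix = COMMENT_PREFIX.get(path.suffix)
    if prefix is None:
        return False  # unknown file type, leave untouched
    text = path.read_text(encoding="utf-8")
    if notice in text:
        return False  # already stamped
    header = f"{prefix} {notice}\n"
    if text.startswith("#!"):
        # Keep the shebang as the very first line.
        first, _, rest = text.partition("\n")
        text = first + "\n" + header + rest
    else:
        text = header + text
    path.write_text(text, encoding="utf-8")
    return True


def bundle(repo_root, archive_path):
    """Zip all files below repo_root, skipping git metadata and the archive itself."""
    root, archive_path = Path(repo_root), Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file in sorted(root.rglob("*")):
            relative = file.relative_to(root)
            if not file.is_file() or ".git" in relative.parts:
                continue
            if file.resolve() == archive_path.resolve():
                continue
            archive.write(file, relative.as_posix())
    return archive_path
```

Running `add_notice` over the source files first and `bundle` afterwards yields an archive whose files already carry the copyright statement.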

Can I pass the code check if I want to publish the code only after the paper is accepted?

Yes, but you will have to provide another way for the PPC coaches to have a look at your code.

Typically, the coaches may ask for temporary access to the GitLab repository. You may add the code coach to your GitLab project using Project information → Members → Invite members. You must select at least the Reporter role, because the Guest role does not offer access to the actual code. To simplify things, you may add an access expiration date (but please allow a week or two for possible later checks).
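
The same invitation can in principle be made through the GitLab REST API (the project members endpoint), which is convenient when several repositories need to be shared. The sketch below only builds the request and does not send it. It assumes that you have a personal access token with API scope and know the numeric user ID of the coach; the function names are hypothetical, and whether API access is enabled on your GitLab instance is also an assumption.

```python
import json
import urllib.request
from datetime import date, timedelta
from urllib.parse import quote

REPORTER = 20  # GitLab access level of the Reporter role (Guest is 10)


def expiry_date(today, weeks=2):
    """Expiration date as YYYY-MM-DD, leaving some room for later checks."""
    return (today + timedelta(weeks=weeks)).isoformat()


def invite_request(base_url, project_path, user_id, expires_at, token):
    """Build (but do not send) the request that adds a member to a project."""
    project = quote(project_path, safe="")  # "group/project" -> "group%2Fproject"
    payload = {"user_id": user_id, "access_level": REPORTER, "expires_at": expires_at}
    return urllib.request.Request(
        f"{base_url}/api/v4/projects/{project}/members",
        data=json.dumps(payload).encode("utf-8"),
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the invitation; keeping the two steps separate makes it easy to inspect the request first.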

In cases when adding members to your repository is impractical or impossible, it is also viable to arrange a short meeting with the PPC coaches to check the basic requirements together.

If your code is subject to non-disclosure or strict closed-source copyright, or contains sensitive or secret information, please notify the PPC coaches in advance so that they can apply appropriate security measures.
