
Add user guides for As1 #132

Merged
tristanpwdennis merged 8 commits into malariagen:master from tristanpwdennis:As1-docs
Apr 13, 2026

@tristanpwdennis
Collaborator

See #131

@tristanpwdennis
Collaborator Author

Think this is ready for review, thanks guys!

@jonbrenas
Collaborator

Hi @tristanpwdennis, I don't think the date of the end of the embargo is correct. I don't know what the ToU are for stephensi, but I assume a 2-year embargo starting from the date of the public release (probably coinciding with the publication date of the paper).

@tristanpwdennis
Collaborator Author

I have put 2 years from today. I can update it pending discussion with David W and Martin D, but I think this is fine.

@jonbrenas
Collaborator

Sounds good. We just need to remember to use the same one as the value in the metadata and on the comms website.

@tristanpwdennis
Collaborator Author

tristanpwdennis commented Apr 5, 2026 via email

@jonbrenas
Collaborator

Yes, the terms_of_use_expiry_date should be the same as the date here for all sample sets, except for the already published data from Thakare et al. (which should be out of embargo).

@tristanpwdennis
Collaborator Author

Staged and released ToU; this is now reflected in the user guides. Let me know how they are looking and I can merge when you feel it is ready.

Collaborator

@jonbrenas jonbrenas left a comment

In download.ipynb, there are still some references to funestus.

Comment thread docs/as1/api.md Outdated
@@ -0,0 +1,3 @@
# Afs API
Collaborator

I think this is a typo ;)

Collaborator Author

fixed

Collaborator Author

fixed!

Collaborator

It doesn't look fixed.

Collaborator

@ahernank ahernank left a comment

On download.ipynb:

  • Would be good to change "SNP calls in VCF and Zarr formats are hosted on S3-compatible object storage at the Sanger Institute" to "SNP calls in VCF and Zarr formats are hosted on S3-compatible object storage", to future-proof.
  • The site filters text comes from Af.
  • "see the Af1 cloud data access guide" needs updating to As.
  • There is a "vo_agam_release_master_us_central1" reference in the last cell.

On as1.ipynb:

  • Suggest changing "..sequenced individually to high coverage using Illumina technology by Novogene Ltd" to "...sequenced individually to high coverage using Illumina technology by a commercial provider".

On cloud.ipynb:

  • Although not directly part of the user guide, it would be good to update the values of terms_of_use_expiry_date to NaN for thakare-2022. We don't put a date on these because we don't want to talk about third-party terms of use; we only want to say that, for literature sets, our terms of use don't apply.
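As a rough illustration of the metadata change suggested above, here is a minimal pandas sketch. It assumes the sample-set manifest is a DataFrame with `sample_set` and `terms_of_use_expiry_date` columns; `GUIDE_EXPIRY_DATE`, the `"YYYY-MM-DD"` value, and `"example-partner-set"` are hypothetical placeholders, not real As1 values. It blanks the expiry date for the literature set and then checks that every remaining sample set shares the single date stated in the user guide.

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder for the embargo end date stated in the user guide.
GUIDE_EXPIRY_DATE = "YYYY-MM-DD"

# Hypothetical manifest; "example-partner-set" is an invented name.
manifest = pd.DataFrame(
    {
        "sample_set": [
            "1363-VO-ET-GADISA-VMF00316",
            "example-partner-set",
            "thakare-2022",
        ],
        "terms_of_use_expiry_date": [GUIDE_EXPIRY_DATE] * 3,
    }
)

# Literature sample sets are not covered by our terms of use,
# so they get no expiry date at all (NaN) rather than a third-party date.
LITERATURE_SETS = {"thakare-2022"}
is_literature = manifest["sample_set"].isin(LITERATURE_SETS)
manifest.loc[is_literature, "terms_of_use_expiry_date"] = np.nan

# All remaining (partner) sample sets should share the date in the user guide.
partner_dates = manifest.loc[~is_literature, "terms_of_use_expiry_date"].unique()
assert list(partner_dates) == [GUIDE_EXPIRY_DATE], f"inconsistent dates: {partner_dates}"
```

Keeping the literature-set names in one explicit set makes it easy to extend if further published studies are added later.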

@jonbrenas
Collaborator

Would be good to change "SNP calls in VCF and Zarr formats are hosted on S3-compatible object storage at the Sanger Institute" to "SNP calls in VCF and Zarr formats are hosted on S3-compatible object storage", to future-proof.

@ahernank, should I go through the other VUGs and future-proof them, as well?

@tristanpwdennis
Collaborator Author

Thank you @jonbrenas and @ahernank
You would think I would have Ctrl-F'd Af, but apparently that eludes me!
I've updated accordingly.

@jonbrenas
Collaborator

  • as1.ipynb:
  1. The major release for every other resource introduces the resource in more detail. For example, for funestus:

Af1.0 (Anopheles funestus Project Phase 1 Data Release)

The MalariaGEN Vector Observatory Anopheles funestus Genomic Surveillance Project is a collaborative project using whole-genome sequencing to enhance the monitoring and surveillance natural populations of mosquitoes in the major African malaria vector Anopheles funestus

The Af1.0 release provides a first baseline understanding of Anopheles funestus genetic diversity and population structure across Africa using 656 whole genome sequenced individuals. Over the coming years, the MalariaGEN Vector Observatory Anopheles funestus Genomic Surveillance Project will continue to carry out further spatiotemporal sampling of Anopheles funestus that builds upon Phase 1.

This page provides an introduction to open data resources released as part of the first phase of the Anopheles funestus Genomic Surveillance Project project, known as Af1.0 for short. We hope the data from Af1.0 will be a valuable resource for research and surveillance of malaria vectors.

The title, at the very least, should be "As1.0 (Anopheles stephensi Phase 1 Data Release)". I am conscious that there isn't a page for a MalariaGEN Vector Observatory Anopheles stephensi Genomic Surveillance Project (and that needs to be addressed) but the doc for the major release should introduce the project.
2. The text at the beginning of the Partner studies section would be best at the beginning to introduce the project.
3. I would put the part about enquiries in the Terms of Use section to mirror the other resources.
4. Literature sample sets are not introduced in the same way as partner studies (e.g., "This release also includes data from one study openly available in the literature: small-2023" from Af1.2).
5. Everywhere, it should be 'As1.0', not 'As1'.
6. A paragraph about downloads and raw data being available on ENA is missing in 'Data hosting', which makes the first sentence sound incorrect.
7. In 'Sample sets', there are way more than 3 sample sets (contrary to what the first sentence says)

  • cloud.ipynb
  1. 'Data are organised into different releases.' Not right now, maybe in the future.
  2. There is still a reference to funestus here: "E.g., access SNP calls for chromosome 2RL for all samples in Af1.0."
  3. There is another here: "Not all of these alternate alleles will actually have been observed in the Af1 samples."
  4. The same is true in the version for Af but taxon == 'stephensi' isn't "further subsetting"
  5. Also a more general problem: at the beginning of 'Running larger computations', MyBinder is mentioned, but it is no longer supported.
  6. At the end of 'SNP sites and alleles', the sentence "See the example below." refers to segregating sites, but the associated code from the Af version is missing here.
  • download.ipynb
  1. Why does the sample set used for the example change at some point?
  2. In 'SNP calls (VCF format) - SNP genotypes', "1229-VO-GH-DADZIE-VMF00095" should be "1363-VO-ET-GADISA-VMF00316"
  3. In 'SNP calls (VCF format) - site filters', it says 'These data are available as Zarr datastores, one per chromosome.' If that is true, it shouldn't be in this section.

@jonbrenas
Collaborator

Thanks, @tristanpwdennis.

In cloud.ipynb, because it is at the resource level and not the release level, it should be As1 and not As1.0. Sorry for the confusion!

Comment thread docs/as1/as1.ipynb Outdated
"If you have any questions about this guide or how to use the data, please [start a new discussion](https://github.com/malariagen/vector-public-data/discussions/new) on the malariagen/vector-open-data repo on GitHub. If you find any bugs, please [raise an issue](https://github.com/malariagen/vector-public-data/issues/new/choose)."
"As part 1 of this project, partners from various countries across the native and invasive range of *An. stephensi* contributed mosquito samples to a genomic surveillance study, where we identified the invasion source (South Asia), route (into Djibouti, seeding separate invasion fronts in Sudan, Ethiopia-Kenya, and Yemen), and architecture of insecticide resistance (mainly metabolic). You can learn more about our findings in our preprint [here](https://www.biorxiv.org/content/10.1101/2025.03.24.644828v1.full). This will be published shortly.\n",
"\n",
"The mosquitoi samples sequenced as part of the CEASE project form the basis of of the MalariaGEN Vector Observatory *Anopheles stephensi* Phase 1 Data Release, known as As1.0 for short. This will form the basis of future genomic surveillance work in this species. We hope that these data will prove a valuable source for the community for investigations into the biology, evolution and control of *An. stephensi* in the native and invasive range.\n",
Collaborator

Typo in "Mosquitoi"

Comment thread docs/as1/as1.ipynb
"\n",
"The SNP data have also been uploaded to Google Cloud, and can be analysed directly within the cloud without having to download or copy any data, including via free interactive computing services such as [Google Colab](https://colab.research.google.com/). Further information about analysing these data in the cloud is provided in the [cloud data access guide](cloud).\n",
"\n",
"More information on accessing and downloading these data are available under `download` and `cloud`."
Collaborator

download and cloud could be hyperlinks here.

Comment thread docs/as1/as1.ipynb
"\n",
"If you have any questions about the data and how to use them, please do get in touch by [starting a new discussion](https://github.com/malariagen/vector-data/discussions/new) on the malariagen/vector-data repository on GitHub."
"We hope this page has provided a useful introduction to the `As1.0` data resource. If you would like to start working with these data, please visit the [cloud data access guide](cloud) or the [data download guide](download) or continue browsing the other documentation on this site.\n",
"\n"
Collaborator

I think it was fine to have the part about starting a discussion here.

Collaborator Author

just moved it

@tristanpwdennis
Collaborator Author

Similarly, is there anything outstanding that you can see here before this gets merged?
TY both

Collaborator

@jonbrenas jonbrenas left a comment

LGTM.

@tristanpwdennis tristanpwdennis merged commit 4c49447 into malariagen:master Apr 13, 2026
1 check passed

3 participants