Finding and Downloading Single-Cell Data: A Beginner's Guide

June 2, 2026

After spending two days quickly going through introductory videos on single-cell analysis, I was excited to come up with two project plans and jump right in. But I hit a wall at the very first step: finding the data.

Yes, you guessed it: I got stuck at the data search stage. Compared with a mature resource like TCGA, single-cell data are still scarce. When I searched GEO for single-cell data using "lung cancer" as the keyword, I got only about 20 results. That is far too few for serious data mining, which needs a large pool of datasets, and after filtering those 20 studies only a handful were suitable. Then came the next problem.

Whenever I finally found a decent dataset that met my requirements and tried to download it, one of two things went wrong: either the study had used inDrop instead of 10x Genomics, or the data weren't provided in the standard 10x three-file format (barcodes.tsv, features.tsv, matrix.mtx). After a full day of searching, I could barely find one appropriate dataset that was actually downloadable.
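To save time on the "is this actually in 10x format?" check, here is a minimal sketch that inspects a downloaded folder for the three files. The function names are my own, and two things are assumptions rather than anything from a specific dataset: that the files may be gzipped or carry a sample prefix (for example `GSM..._barcodes.tsv.gz`), and that older outputs may name the feature file `genes.tsv`.

```python
from pathlib import Path

# Accepted base names for each of the three files.
# "genes.tsv" is included on the assumption that older outputs use that name.
EXPECTED = {
    "barcodes": ("barcodes.tsv",),
    "features": ("features.tsv", "genes.tsv"),
    "matrix": ("matrix.mtx",),
}


def find_10x_files(directory):
    """Map each role to the first matching file in `directory`, or None.

    A file matches if its name ends with an accepted base name,
    optionally followed by ".gz", so prefixed names also match.
    """
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    found = {}
    for role, bases in EXPECTED.items():
        suffixes = tuple(s for b in bases for s in (b, b + ".gz"))
        found[role] = next((p for p in files if p.name.endswith(suffixes)), None)
    return found


def is_complete_10x(directory):
    """True only if barcodes, features, and matrix files are all present."""
    return all(path is not None for path in find_10x_files(directory).values())
```

Calling `find_10x_files("path/to/download")` on a folder shows which of the three files is missing, which is usually the first thing I want to know before trying to load anything.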

On the bright side, this shows that single-cell technology is still cutting-edge with huge potential for growth.

Regardless, the most critical part of any single-cell analysis is finding and processing the data. This often means dealing with a variety of data formats, and my office computer, with only 8 GB of RAM, is often simply not up to the job. A server is essential.
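A rough back-of-the-envelope sketch shows why 8 GB runs out so quickly. The dataset size (50,000 cells by 30,000 genes) and the 5% share of non-zero entries are made-up illustrative numbers, not taken from any dataset I found, and the sparse estimate assumes a CSR-style layout with 8-byte values and 4-byte indices.

```python
def dense_bytes(n_cells, n_genes, itemsize=8):
    """Memory for a dense cells-by-genes matrix (8 bytes per value by default)."""
    return n_cells * n_genes * itemsize


def csr_bytes(n_cells, n_genes, density, itemsize=8, index_size=4):
    """Approximate memory for the same matrix stored in CSR sparse form.

    Each non-zero entry costs one value plus one column index;
    the row-pointer array adds (n_cells + 1) indices.
    """
    nnz = round(n_cells * n_genes * density)
    return nnz * (itemsize + index_size) + (n_cells + 1) * index_size


# Hypothetical dataset: 50,000 cells x 30,000 genes, 5% non-zero entries.
dense = dense_bytes(50_000, 30_000)
sparse = csr_bytes(50_000, 30_000, density=0.05)
print(f"dense:  {dense / 1e9:.1f} GB")
print(f"sparse: {sparse / 1e9:.1f} GB")
```

Under these assumptions the dense matrix alone needs about 12 GB, more than my whole machine has, while the sparse form needs about 0.9 GB. Any step that converts the matrix to dense form along the way would still blow past 8 GB.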

For the rest of this week, I'll continue learning how to download and process data in different formats.

After all, data is king!