After spending two days quickly going through introductory videos on single-cell analysis, I was excited to come up with two project plans and jump right in. But I hit a wall at the very first step: finding the data.
The search itself was the problem. Compared with the mature TCGA database, single-cell data resources are still quite limited. When I searched GEO for single-cell data with the keyword "lung cancer", I got only about 20 results. Data mining needs a large pool of datasets to draw from, and after filtering those 20 studies only a handful were suitable. Then came the next problem.
Even when I found a dataset that met my requirements, trying to download it turned up one of two obstacles: either the study had used inDrop instead of 10X Genomics, or the data wasn't provided in the standard 10X three-file format (barcodes.tsv, features.tsv, matrix.mtx). After a full day of searching, I could barely find one appropriate dataset that was actually downloadable.
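For anyone facing the same check, here is a minimal sketch of how a downloaded folder could be tested for the three-file layout before loading it into an analysis tool. It uses only the Python standard library. The function names (`find_10x_files`, `read_mtx_shape`) and the example folder path are hypothetical, and it assumes the files may be gzip-compressed with a `.gz` suffix, which is my assumption rather than something every dataset guarantees.

```python
import gzip
from pathlib import Path

# File names of the 10X three-file layout mentioned above.
REQUIRED = ("barcodes.tsv", "features.tsv", "matrix.mtx")


def find_10x_files(directory):
    """Return (found, missing): found maps each base name to its path."""
    directory = Path(directory)
    found = {}
    for name in REQUIRED:
        # Assumption: accept either the plain file or a gzipped copy.
        for candidate in (directory / name, directory / (name + ".gz")):
            if candidate.is_file():
                found[name] = candidate
                break
    missing = [name for name in REQUIRED if name not in found]
    return found, missing


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def read_mtx_shape(path):
    """Read (rows, cols, nonzero entries) from a Matrix Market file header."""
    with open_text(path) as handle:
        for line in handle:
            # Lines starting with '%' are the banner or comments.
            if line.startswith("%") or not line.strip():
                continue
            rows, cols, nnz = (int(field) for field in line.split()[:3])
            return rows, cols, nnz
    raise ValueError(f"no size line found in {path}")


if __name__ == "__main__":
    # Hypothetical folder name, for illustration only.
    found, missing = find_10x_files("downloads/my_dataset")
    if missing:
        print("Not a complete three-file dataset, missing:", missing)
    else:
        print("Matrix shape and nonzeros:", read_mtx_shape(found["matrix.mtx"]))
```

Reading only the header of matrix.mtx is cheap, so this kind of check can run on a modest machine before committing to a full load.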
On the bright side, this shows that single-cell technology is still cutting-edge with huge potential for growth.
Regardless, the most critical part of any single-cell analysis is finding and processing the data. This means dealing with a variety of data formats, and my office computer, with only 8 GB of RAM, is frequently not up to the job. A server is essential.
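To put a rough number on the RAM problem, here is a back-of-the-envelope sketch. The matrix size and the 5% nonzero fraction are hypothetical values chosen for illustration, not measurements from any dataset I downloaded, and the helper names are my own. It compares a fully dense matrix of 8-byte values with a simple coordinate-style sparse layout (one value plus a row and a column index per nonzero entry).

```python
def dense_bytes(n_genes, n_cells, itemsize=8):
    """Bytes needed to hold every entry of a genes x cells matrix."""
    return n_genes * n_cells * itemsize


def coo_bytes(nnz, value_size=8, index_size=4):
    """Bytes for a coordinate-format sparse matrix: value + row + column per entry."""
    return nnz * (value_size + 2 * index_size)


if __name__ == "__main__":
    # Hypothetical dataset size, for illustration only.
    n_genes, n_cells = 30_000, 100_000
    # Assumed fraction of nonzero counts; real datasets vary.
    nnz = int(0.05 * n_genes * n_cells)

    print(f"dense:  {dense_bytes(n_genes, n_cells) / 1e9:.1f} GB")
    print(f"sparse: {coo_bytes(nnz) / 1e9:.1f} GB")
```

Under these assumptions the dense form alone is several times larger than 8 GB, which is why keeping the matrix sparse, or moving to a server, matters so much.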
For the rest of this week, I'll continue learning how to download and process data in different formats.
After all, data is king!