Building Meta Cell Groups¶
For very large datasets where meta cell groupings are helpful, AnnSQL includes functionality to build groups of cells by aggregating over specific observation (obs) columns. In this example, we compute the mean expression of every gene within each cell type group.
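To make the idea concrete, here is a minimal sketch of the kind of aggregation a meta cell grouping represents: one output row per group, holding the group size and the mean of each gene. It uses Python's built-in `sqlite3` module as a stand-in and a made-up `cells` table with two genes. Neither the table layout nor the query is taken from AnnSQL's internals; both are assumptions for illustration.

```python
import sqlite3

# Toy stand-in for an expression table: one row per cell, one column per gene.
# The table name and all values are hypothetical, for illustration only.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE cells (cell_id TEXT, cell_type INTEGER, gene_0 REAL, gene_1 REAL)"
)
con.executemany(
    "INSERT INTO cells VALUES (?, ?, ?, ?)",
    [
        ("cell_0", 1, 0.0, 2.0),
        ("cell_1", 1, 1.0, 4.0),
        ("cell_2", 2, 3.0, 0.0),
        ("cell_3", 2, 5.0, 0.0),
        ("cell_4", 2, 4.0, 3.0),
    ],
)

# One output row per cell_type: the group size plus the mean of each gene.
meta_cells = con.execute(
    """
    SELECT cell_type,
           COUNT(*)    AS cell_count,
           AVG(gene_0) AS gene_0,
           AVG(gene_1) AS gene_1
    FROM cells
    GROUP BY cell_type
    ORDER BY cell_type
    """
).fetchall()

for row in meta_cells:
    print(row)
```

Each printed tuple is one meta cell: the cell type, the number of cells it summarizes, and the per-gene averages.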
Import Libraries¶
In [2]:
import scanpy as sc
import pandas as pd
from AnnSQL import AnnSQL
Open a database¶
In [11]:
asql = AnnSQL(db="db/splatter/data_10000.asql")
asql.query("DESCRIBE obs")
Out[11]:
| | column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|---|
| 0 | cell_id | VARCHAR | YES | None | None | None |
| 1 | cell_type | BIGINT | YES | None | None | None |
Build the meta groupings based on the obs.cell_type grouping¶
In [13]:
asql.build_meta_cells(primary_cluster="cell_type", aggregate_type="AVG", chunk_size=750, print_progress=True)
Processing chunk 1 of 750
Processing chunk 751 of 1500
Processing chunk 1501 of 2250
Processing chunk 2251 of 3000
Processing chunk 3001 of 3750
Processing chunk 3751 of 4500
Processing chunk 4501 of 5250
Processing chunk 5251 of 6000
Processing chunk 6001 of 6750
Processing chunk 6751 of 7500
Processing chunk 7501 of 8250
Processing chunk 8251 of 9000
Processing chunk 9001 of 9750
Processing chunk 9751 of 10500
meta_cells table created. You may now query the table for results.
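The progress log suggests that the 10,000 gene columns are processed in blocks of `chunk_size=750`, which gives 14 blocks. The sketch below shows one way such block boundaries could be generated. The helper `chunk_bounds` is hypothetical and is not AnnSQL's implementation; unlike the printed log, it clamps the final block to the number of items.

```python
def chunk_bounds(n_items, chunk_size):
    """Yield (start, stop) index pairs that cover range(n_items) in blocks."""
    for start in range(0, n_items, chunk_size):
        # Clamp the last block so it never runs past the end.
        yield start, min(start + chunk_size, n_items)


# Assumed sizes: 10,000 genes, matching the gene_0..gene_9999 columns.
bounds = list(chunk_bounds(10_000, 750))

print(len(bounds))            # number of blocks
print(bounds[0], bounds[-1])  # first and last (start, stop) pairs
```

A smaller `chunk_size` means more, smaller blocks. Which value works best likely depends on available memory and the number of genes.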
Take a look at the meta_cells table¶
In this example, there are 5 cell types (1-5) and no secondary grouping, so the secondary grouping column appears as None. The cell_count column gives the number of cells in each group, and the gene columns hold the average expression of each gene within that cell type.
In [14]:
asql.query("SELECT * FROM meta_cells")
Out[14]:
| | cell_type | None | cell_count | cell_id | gene_0 | gene_1 | gene_2 | gene_3 | gene_4 | gene_5 | ... | gene_9990 | gene_9991 | gene_9992 | gene_9993 | gene_9994 | gene_9995 | gene_9996 | gene_9997 | gene_9998 | gene_9999 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | None | 1961.0 | None | 0.037226 | 0.058134 | 0.095869 | 0.013259 | 0.001530 | 0.004080 | ... | 0.192249 | 0.111168 | 0.065783 | 0.088220 | 4.595614 | 0.142784 | 0.036206 | 0.876594 | 0.180010 | 0.018358 |
| 1 | 2 | None | 1980.0 | None | 0.048990 | 0.050000 | 0.089899 | 0.013131 | 0.000000 | 0.004545 | ... | 0.160101 | 0.068182 | 0.070707 | 0.104545 | 4.339394 | 0.122727 | 0.048485 | 0.816667 | 0.176263 | 0.028788 |
| 2 | 3 | None | 2046.0 | None | 0.040078 | 0.122678 | 5.152493 | 0.010753 | 0.000000 | 0.007820 | ... | 0.143206 | 0.082600 | 0.071848 | 0.102639 | 4.347996 | 0.160802 | 0.042522 | 0.844086 | 0.187195 | 0.021505 |
| 3 | 4 | None | 1973.0 | None | 0.047136 | 0.047643 | 0.059807 | 0.008109 | 0.003548 | 0.006082 | ... | 0.168272 | 0.075013 | 0.040547 | 0.082615 | 2.714141 | 0.045109 | 0.038013 | 0.573746 | 0.129752 | 0.017739 |
| 4 | 5 | None | 2040.0 | None | 0.038725 | 0.052941 | 0.077941 | 0.011765 | 0.000980 | 0.005882 | ... | 0.173039 | 0.085294 | 0.044608 | 0.076471 | 2.862255 | 0.018137 | 0.048039 | 0.582353 | 0.154902 | 0.014216 |
5 rows × 10004 columns
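One way to read a meta cell table is to ask, for a given gene, which group has the highest mean expression. The sketch below does this in plain Python for two genes, with the values copied by hand from the output above. The dictionary layout and variable names are illustrative choices, not part of AnnSQL.

```python
# Mean expression per cell type, copied from the meta_cells output above.
meta = {
    "gene_2":    {1: 0.095869, 2: 0.089899, 3: 5.152493, 4: 0.059807, 5: 0.077941},
    "gene_9994": {1: 4.595614, 2: 4.339394, 3: 4.347996, 4: 2.714141, 5: 2.862255},
}

# For each gene, pick the cell type whose mean expression is largest.
top_type = {gene: max(by_type, key=by_type.get) for gene, by_type in meta.items()}

print(top_type)
```

In this output, gene_2 stands out in cell type 3, where its mean is far above the other groups, while gene_9994 is comparatively high in cell types 1 to 3 and lower in 4 and 5.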