Basic Usage¶
Below, we give the the most streamlined example of how to use AnnSQL for a smaller datasets. This notebook shows how to use the in-memory querying capabilities of AnnSql. Depending on your system memory and the AnnData filesize, you may want to consider building local database using AnnSQL; however, if your dataset contains fewer than around 30k cells, then this approach should work well.
Install the AnnSQL package¶
pip install annsql
Import libraries¶
In [1]:
Copied!
from AnnSQL import AnnSQL
import scanpy as sc
from AnnSQL import AnnSQL
import scanpy as sc
Load the dataset¶
Here, we load the sample pbmc68k reduced dataset provided by Scanpy.
In [2]:
Copied!
adata = sc.datasets.pbmc68k_reduced()
print(adata)
adata = sc.datasets.pbmc68k_reduced()
print(adata)
AnnData object with n_obs × n_vars = 700 × 765 obs: 'bulk_labels', 'n_genes', 'percent_mito', 'n_counts', 'S_score', 'G2M_score', 'phase', 'louvain' var: 'n_counts', 'means', 'dispersions', 'dispersions_norm', 'highly_variable' uns: 'bulk_labels_colors', 'louvain', 'louvain_colors', 'neighbors', 'pca', 'rank_genes_groups' obsm: 'X_pca', 'X_umap' varm: 'PCs' obsp: 'distances', 'connectivities'
Pass the adata object to the AnnSQL class¶
Below, we simply instantiate AnnSQL class with the adata object from above. Alternatively, you may pass a filepath to a h5ad file.
In [3]:
Copied!
asql = AnnSQL(adata=adata)
asql = AnnSQL(adata=adata)
Time to make var_names unique: 0.03985190391540527 Time to create X table structure: 0.004041910171508789 Time to insert X data: 0.07247447967529297
In [4]:
Copied!
asql.query("SELECT * FROM X LIMIT 5")
asql.query("SELECT * FROM X LIMIT 5")
Out[4]:
cell_id | HES4 | TNFRSF4 | SSU72 | PARK7 | RBP7 | SRM | MAD2L2 | AGTRAP | TNFRSF1B | ... | ATP5O | MRPS6 | TTC3 | U2AF1 | CSTB | SUMO3 | ITGB2 | S100B | PRMT2 | MT_ND3 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | AAAGCCTGGCTAAC-1 | -0.326 | -0.191 | -0.728 | -0.301 | 3.386 | -0.531 | 2.016 | 3.377 | 4.841 | ... | -0.146 | -0.532 | -0.341 | 0.303 | 1.404 | 4.294 | 0.519 | -0.21 | -0.636 | 4.011 |
1 | AAATTCGATGCACA-1 | 1.171 | -0.191 | 0.795 | -1.200 | -0.174 | -0.531 | 1.889 | -0.486 | -0.459 | ... | -1.136 | -0.532 | -0.341 | -0.905 | 2.849 | -0.585 | 1.172 | -0.21 | 2.630 | -0.490 |
2 | AACACGTGGTCTTT-1 | -0.326 | -0.191 | 0.483 | -1.200 | -0.174 | -0.531 | -0.451 | 0.971 | -0.459 | ... | -1.136 | 2.606 | -0.341 | -0.905 | -0.455 | -0.585 | 0.722 | -0.21 | 0.663 | -0.490 |
3 | AAGTGCACGTGCTA-1 | -0.326 | -0.191 | 1.134 | -0.157 | -0.174 | -0.531 | -0.451 | -0.486 | -0.459 | ... | 1.161 | -0.532 | -0.341 | -0.905 | -0.119 | -0.585 | 0.766 | -0.21 | -0.636 | -0.490 |
4 | ACACGAACGGAGTG-1 | -0.326 | -0.191 | -0.728 | -0.607 | -0.174 | -0.531 | -0.451 | 0.787 | -0.459 | ... | -1.136 | 0.839 | 1.679 | -0.108 | -0.534 | -0.585 | -0.007 | -0.21 | -0.636 | -0.490 |
5 rows × 766 columns
Filter Example¶
In [5]:
Copied!
asql.query("SELECT * FROM X WHERE HES4 > 4")
asql.query("SELECT * FROM X WHERE HES4 > 4")
Out[5]:
cell_id | HES4 | TNFRSF4 | SSU72 | PARK7 | RBP7 | SRM | MAD2L2 | AGTRAP | TNFRSF1B | ... | ATP5O | MRPS6 | TTC3 | U2AF1 | CSTB | SUMO3 | ITGB2 | S100B | PRMT2 | MT_ND3 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | GATCTACTTTATCC-3 | 4.430 | -0.191 | 0.885 | -0.297 | -0.174 | 0.995 | -0.451 | -0.486 | 6.641 | ... | -1.136 | -0.532 | -0.341 | -0.905 | 1.416 | 1.049 | 2.078 | -0.21 | -0.636 | -0.490 |
1 | GTAGCTGATTCCAT-4 | 6.504 | -0.191 | -0.728 | 0.356 | -0.174 | -0.531 | -0.451 | -0.486 | -0.459 | ... | -0.279 | -0.532 | -0.341 | 0.141 | 1.071 | -0.585 | -0.357 | -0.21 | 0.854 | -0.490 |
2 | ATGTTCACGTACGT-6 | 5.209 | -0.191 | 0.398 | 0.061 | 4.820 | -0.531 | 1.279 | -0.486 | -0.459 | ... | 0.253 | 0.926 | -0.341 | 1.638 | 1.244 | 0.556 | -0.484 | -0.21 | 0.571 | -0.490 |
3 | GCATTGGAATGCCA-6 | 4.047 | -0.191 | 2.238 | -1.200 | -0.174 | -0.531 | 1.828 | -0.486 | -0.459 | ... | -0.221 | -0.532 | -0.341 | 0.211 | 2.745 | 0.917 | 1.828 | -0.21 | -0.636 | -0.490 |
4 | ACGGAACTAGTCGT-7 | 4.703 | -0.191 | -0.728 | -0.484 | -0.174 | -0.531 | 1.514 | 1.053 | 2.357 | ... | -0.347 | -0.532 | -0.341 | -0.905 | 0.900 | -0.585 | 2.051 | -0.21 | -0.636 | -0.490 |
5 | ATGTTGCTGGGCAA-7 | 5.649 | -0.191 | 0.792 | -0.349 | -0.174 | -0.531 | -0.451 | -0.486 | 2.886 | ... | -1.136 | 1.436 | -0.341 | 2.526 | -0.296 | 0.954 | 1.167 | -0.21 | 2.623 | 1.641 |
6 | CAGAAGCTTAGTCG-7 | 6.444 | -0.191 | -0.728 | -1.200 | -0.174 | -0.531 | -0.451 | 1.171 | 1.057 | ... | 0.563 | -0.532 | -0.341 | 1.168 | -1.080 | 0.810 | 2.287 | -0.21 | -0.636 | 1.441 |
7 | TTCCTAGATCGATG-7 | 4.419 | -0.191 | 0.479 | -0.524 | -0.174 | -0.531 | -0.451 | -0.486 | 2.198 | ... | -1.136 | -0.532 | -0.341 | 0.003 | 0.165 | -0.585 | -0.445 | -0.21 | 0.658 | -0.490 |
8 rows × 766 columns
Filter by cell type¶
In [6]:
Copied!
asql.query("SELECT * FROM adata WHERE bulk_labels = 'Dendritic' AND percent_mito > 0.025")
asql.query("SELECT * FROM adata WHERE bulk_labels = 'Dendritic' AND percent_mito > 0.025")
Out[6]:
cell_id | bulk_labels | n_genes | percent_mito | n_counts | S_score | G2M_score | phase | louvain | cell_id_1 | ... | ATP5O | MRPS6 | TTC3 | U2AF1 | CSTB | SUMO3 | ITGB2 | S100B | PRMT2 | MT_ND3 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | AAATTCGATGCACA-1 | Dendritic | 1080 | 0.027458 | 2695.0 | 0.067026 | -0.889498 | S | 1 | AAATTCGATGCACA-1 | ... | -1.136 | -0.532 | -0.341 | -0.905 | 2.849 | -0.585 | 1.172 | -0.21 | 2.630 | -0.490 |
1 | ATGGACACTCGTTT-1 | Dendritic | 1198 | 0.028549 | 3783.0 | -0.155325 | -1.028952 | G1 | 2 | ATGGACACTCGTTT-1 | ... | -1.136 | -0.532 | 1.730 | 1.545 | 0.040 | -0.585 | 1.062 | -0.21 | -0.636 | -0.490 |
2 | CATCCCGATCTCAT-1 | Dendritic | 1105 | 0.025213 | 3054.0 | -0.091713 | -0.867561 | G1 | 5 | CATCCCGATCTCAT-1 | ... | -0.307 | -0.532 | -0.341 | -0.905 | -0.387 | -0.585 | 2.207 | -0.21 | -0.636 | -0.490 |
3 | AACCACGACGTTGA-3 | Dendritic | 1190 | 0.029216 | 3012.0 | -0.232771 | -0.777265 | G1 | 5 | AACCACGACGTTGA-3 | ... | -1.136 | 1.233 | -0.341 | -0.905 | 1.029 | -0.585 | 0.285 | -0.21 | 0.825 | -0.490 |
4 | AGGCAACTTGGTAC-3 | Dendritic | 1233 | 0.029189 | 3563.0 | -0.151107 | -0.800019 | G1 | 5 | AGGCAACTTGGTAC-3 | ... | 0.995 | 0.960 | 1.858 | -0.038 | 0.109 | -0.585 | -0.472 | -0.21 | -0.636 | -0.490 |
5 | AACTCACTGGGTGA-5 | Dendritic | 1194 | 0.027857 | 3841.0 | 0.159842 | -0.919129 | S | 2 | AACTCACTGGGTGA-5 | ... | 0.841 | -0.532 | 1.699 | -0.905 | 0.023 | -0.585 | -0.512 | -0.21 | -0.636 | -0.490 |
6 | CGCTACTGGACAGG-6 | Dendritic | 1241 | 0.029662 | 3405.0 | -0.187843 | -0.945558 | G1 | 2 | CGCTACTGGACAGG-6 | ... | 0.351 | 1.029 | -0.341 | 0.002 | 1.408 | 1.857 | 1.294 | -0.21 | -0.636 | -0.490 |
7 | TATACAGAGGTATC-6 | Dendritic | 1138 | 0.028225 | 3543.0 | -0.103357 | -1.097992 | G1 | 6 | TATACAGAGGTATC-6 | ... | 1.007 | 0.968 | -0.341 | 0.839 | -0.482 | -0.585 | -1.026 | -0.21 | -0.636 | -0.490 |
8 | TGTGAGACTTTACC-6 | Dendritic | 1165 | 0.027603 | 3659.0 | -0.194973 | -0.756680 | G1 | 6 | TGTGAGACTTTACC-6 | ... | -0.444 | 0.921 | -0.341 | -0.905 | -1.080 | -0.585 | -1.026 | -0.21 | -0.636 | 2.656 |
9 | CGGACTCTTGGATC-8 | Dendritic | 1029 | 0.029797 | 3054.0 | -0.181240 | -0.989204 | G1 | 2 | CGGACTCTTGGATC-8 | ... | 0.522 | 1.209 | -0.341 | 0.107 | -0.387 | 0.776 | -1.026 | -0.21 | -0.636 | -0.490 |
10 rows × 775 columns
Count total cells by cell type¶
In [7]:
Copied!
asql.query("SELECT bulk_labels, COUNT(*) as total FROM obs GROUP BY bulk_labels ORDER BY total DESC")
asql.query("SELECT bulk_labels, COUNT(*) as total FROM obs GROUP BY bulk_labels ORDER BY total DESC")
Out[7]:
bulk_labels | total | |
---|---|---|
0 | Dendritic | 240 |
1 | CD14+ Monocyte | 129 |
2 | CD19+ B | 95 |
3 | CD4+/CD25 T Reg | 68 |
4 | CD8+ Cytotoxic T | 54 |
5 | CD8+/CD45RA+ Naive Cytotoxic | 43 |
6 | CD56+ NK | 31 |
7 | CD4+/CD45RO+ Memory | 19 |
8 | CD34+ | 13 |
9 | CD4+/CD45RA+/CD25- Naive T | 8 |
Show all tables in the database¶
In [8]:
Copied!
asql.query("SHOW tables;")
asql.query("SHOW tables;")
Out[8]:
name | |
---|---|
0 | X |
1 | adata |
2 | obs |
3 | obsm_X_pca |
4 | obsm_X_umap |
5 | obsp_connectivities |
6 | obsp_distances |
7 | uns_raw |
8 | var |
9 | var_names |
10 | varm_PCs |