Build an In-memory Database
AnnSQL lets you build two types of databases. The first, demonstrated in this notebook, is a simple in-memory database suited to smaller datasets. An in-memory AnnSQL database is quick to build and query, but it is NOT persistent: it is cleared from memory once the Python process finishes executing.
Install the AnnSQL package
pip install annsql
Import Libraries
from AnnSQL import AnnSQL
from AnnSQL.MakeDb import MakeDb
import scanpy as sc
import os
Load the dataset
Here, we load the processed pbmc3k sample dataset provided by Scanpy. Note: very large datasets must be opened in AnnData backed mode, which is fully supported. When a dataset is opened in backed mode, the database is built in chunks. Depending on the size of your dataset and your compute resources, this process may take some time.
adata = sc.datasets.pbmc3k_processed()
print(adata)
AnnData object with n_obs × n_vars = 2638 × 1838
    obs: 'n_genes', 'percent_mito', 'n_counts', 'louvain'
    var: 'n_cells'
    uns: 'draw_graph', 'louvain', 'louvain_colors', 'neighbors', 'pca', 'rank_genes_groups'
    obsm: 'X_pca', 'X_tsne', 'X_umap', 'X_draw_graph_fr'
    varm: 'PCs'
    obsp: 'distances', 'connectivities'
Build and open the Database
Below we instantiate the AnnSQL class by passing it the AnnData object directly, which builds the database in memory. No database file is written in this mode; the db parameter and the default .asql file extension apply only to databases stored on disk.
asql = AnnSQL(adata)
asql.show_tables()
 | table_name |
---|---|
0 | obs |
1 | obsm_X_draw_graph_fr |
2 | obsm_X_pca |
3 | obsm_X_tsne |
4 | obsm_X_umap |
5 | obsp_connectivities |
6 | obsp_distances |
7 | uns_raw |
8 | var |
9 | varm_PCs |
10 | var_names |
11 | X |
12 | adata |
Query the Database
asql.query("SELECT * FROM X LIMIT 5")
 | cell_id | TNFRSF4 | CPSF3L | ATAD3C | C1orf86 | RER1 | TNFRSF25 | TNFRSF9 | CTNNBIP1 | SRM | ... | DSCR3 | BRWD1 | BACE2 | SIK1 | C21orf33 | ICOSLG | SUMO3 | SLC19A1 | S100B | PRMT2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | AAACATACAACCAC-1 | -0.171470 | -0.280812 | -0.046677 | -0.475169 | -0.544024 | 4.928495 | -0.038028 | -0.280573 | -0.341788 | ... | -0.226570 | -0.236269 | -0.102943 | -0.222116 | -0.312401 | -0.121678 | -0.521229 | -0.098269 | -0.209095 | -0.531203 |
1 | AAACATTGAGCTAC-1 | -0.214582 | -0.372653 | -0.054804 | -0.683391 | 0.633951 | -0.334837 | -0.045589 | -0.498264 | -0.541914 | ... | -0.317530 | 2.568866 | 0.007155 | -0.445372 | 1.629285 | -0.058662 | -0.857164 | -0.266844 | -0.313146 | -0.596654 |
2 | AAACATTGATCAGC-1 | -0.376887 | -0.295084 | -0.057528 | -0.520972 | 1.332647 | -0.309362 | -0.103108 | -0.272526 | -0.500798 | ... | -0.302938 | -0.239801 | -0.071774 | -0.297857 | -0.410920 | -0.070431 | -0.590721 | -0.158656 | -0.170876 | 1.379000 |
3 | AAACCGTGCTTCCG-1 | -0.285241 | -0.281735 | -0.052227 | -0.484929 | 1.572679 | -0.271825 | -0.074552 | -0.258876 | -0.416752 | ... | -0.262978 | -0.231807 | -0.093818 | -0.247770 | 2.552078 | -0.097402 | 1.631685 | -0.119462 | -0.179120 | -0.505670 |
4 | AAACCGTGTATGCG-1 | -0.256483 | -0.220394 | -0.046800 | -0.345859 | -0.333409 | -0.208122 | -0.069514 | 5.806442 | -0.283112 | ... | -0.202237 | -0.176765 | -0.167350 | -0.098665 | -0.275836 | -0.139482 | -0.310096 | -0.006877 | -0.109614 | -0.461946 |
5 rows × 1839 columns