First case: Using counts matrix as input for creating co-expression network

The input in this case should be a tab-separated matrix with observations as columns and variables as rows. In other words, in the rows should be the entities that will be the nodes of our network.

[1]:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

Let’s import FAVA

Make sure you have loaded the latest version. If you are not sure visit here: https://pypi.org/project/favapy/

[2]:

# !pip install favapy==0.3.9.4
from favapy import fava

Our data is a single-cell RNAseq dataset.

[ ]:

data = pd.read_csv("data/Example_dataset_GSE75748_sc_cell_type_ec.tsv", sep="\t")

Our matrix looks like this:

[ ]:

data

Run FAVA

[ ]:

my_FAVAourite_network = fava.cook(
    data=data,
    log2_normalization=True,  # If your data are normalized set this to False
    hidden_layer=None,  # If None, it will be adjusted base on the input size.
    latent_dim=None,  # If None, it will be adjusted base on the size of the hidden layer.
    epochs=50,
    batch_size=32,
    PCC_cutoff=0.95,
)  # This is arbitraty. We need to plot the score distribution.
my_FAVAourite_network

Second case: Using anndata as input for creating co-expression network

If input is an AnnData object, it contains an expression matrix X, which stores n_obs observations (cells) of n_vars variables (genes).

Here we follow the steps as described in the scanpy tutorial here: https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html 

[ ]:

# !mkdir data
# !wget http://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz -O data/pbmc3k_filtered_gene_bc_matrices.tar.gz
# !cd data; tar -xzf pbmc3k_filtered_gene_bc_matrices.tar.gz

[3]:

import scanpy as sc

adata = sc.read_10x_mtx(
    "data/filtered_gene_bc_matrices/hg19/",  # the directory with the `.mtx` file
    var_names="gene_symbols",  # use gene symbols for the variable names (variables-axis index)
    cache=True,
)  # write a cache file for faster subsequent reading

[4]:

adata.var_names_make_unique()  # this is unnecessary if using `var_names='gene_ids'` in `sc.read_10x_mtx`

# Basic filtering
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)

adata.var["mt"] = adata.var_names.str.startswith(
    "MT-"
)  # annotate the group of mitochondrial genes as 'mt'
sc.pp.calculate_qc_metrics(
    adata, qc_vars=["mt"], percent_top=None, log1p=False, inplace=True
)

# Filtering
adata = adata[adata.obs.n_genes_by_counts < 2500, :]
adata = adata[adata.obs.pct_counts_mt < 5, :]
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

sc.pp.regress_out(adata, ["total_counts", "pct_counts_mt"])


sc.tl.pca(adata, svd_solver="arpack")
sc.pl.pca(adata, color="CST3")

../_images/tutorials_How_to_use_favapy_in_a_notebook_15_0.png

If the data is in Compressed Sparse Row format so we have to “uncompress” –> adata.X = adata.X.A

[5]:

print(adata)
print(adata.X)

AnnData object with n_obs × n_vars = 2638 × 13714
    obs: 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt'
    var: 'gene_ids', 'n_cells', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts'
    uns: 'log1p', 'pca'
    obsm: 'X_pca'
    varm: 'PCs'
[[-6.12200936e-03 -1.48024922e-03 -2.13638437e-03 ... -4.51365160e-03
  -6.40674978e-02 -3.70223857e-02]
 [-9.40247439e-03 -1.55478332e-03 -1.05254687e-02 ... -1.18392045e-02
  -1.09831728e-01 -6.34919181e-02]
 [-5.42712444e-03 -2.48085614e-03 -5.94359590e-03 ... -7.02461973e-03
  -8.56298953e-02 -5.57584986e-02]
 ...
 [-4.73570684e-03 -1.70380529e-03  7.48216098e-06 ... -2.43741483e-03
  -5.25713079e-02 -3.19451652e-02]
 [-3.90673382e-03 -1.65717863e-03  2.28009513e-03 ...  2.37765670e+00
  -4.01520319e-02 -2.45906953e-02]
 [-4.08808235e-03 -2.32962915e-03 -1.85557234e-03 ... -3.55155533e-03
  -6.32345751e-02 -4.20607105e-02]]

Run FAVA

[6]:

my_FAVAourite_network = fava.cook(
    data=adata,
    log2_normalization=True,  # If your data are normalized set this to False
    hidden_layer=None,  # If None, it will be adjusted base on the input size.
    latent_dim=None,  # If None, it will be adjusted base on the size of the hidden layer.
    epochs=10,
    batch_size=32,
    interaction_count=5000,  # How many interactions (A-B & B-A) you will have in the output file.
    PCC_cutoff=None,  # If this is specified, it overwrites the `interaction_count`.
)
my_FAVAourite_network

WARNING:root: Negative values are detected or log2_normalization was set to False, so log2 normalization is not applied.

Epoch 1/10
429/429 [==============================] - 18s 41ms/step - loss: 49.5736 - accuracy: 3.6459e-04 - val_loss: 32.5569 - val_accuracy: 1.4584e-04
Epoch 2/10
429/429 [==============================] - 17s 40ms/step - loss: 32.4826 - accuracy: 5.1043e-04 - val_loss: 32.3956 - val_accuracy: 0.0000e+00
Epoch 3/10
429/429 [==============================] - 17s 39ms/step - loss: 32.2335 - accuracy: 4.3751e-04 - val_loss: 32.1900 - val_accuracy: 5.8335e-04
Epoch 4/10
429/429 [==============================] - 17s 40ms/step - loss: 32.1407 - accuracy: 7.2918e-04 - val_loss: 32.1556 - val_accuracy: 6.5626e-04
Epoch 5/10
429/429 [==============================] - 17s 39ms/step - loss: 32.1056 - accuracy: 4.3751e-04 - val_loss: 32.0937 - val_accuracy: 2.9167e-04
Epoch 6/10
429/429 [==============================] - 17s 40ms/step - loss: 32.0705 - accuracy: 5.1043e-04 - val_loss: 32.0439 - val_accuracy: 0.0010
Epoch 7/10
429/429 [==============================] - 17s 40ms/step - loss: 32.0442 - accuracy: 8.7502e-04 - val_loss: 32.0099 - val_accuracy: 0.0022
Epoch 8/10
429/429 [==============================] - 17s 40ms/step - loss: 31.9951 - accuracy: 0.0015 - val_loss: 31.9918 - val_accuracy: 0.0012
Epoch 9/10
429/429 [==============================] - 17s 40ms/step - loss: 31.9585 - accuracy: 0.0017 - val_loss: 31.9184 - val_accuracy: 0.0020
Epoch 10/10
429/429 [==============================] - 17s 40ms/step - loss: 31.9184 - accuracy: 0.0027 - val_loss: 31.8498 - val_accuracy: 0.0045

WARNING:root: The number of interactions in the output file is 5000 in which both directions are included: proteinA - proteinB and proteinB - proteinA.

[6]:

	Protein_1	Protein_2	Score
12466933	S100A8	S100A9	0.975751
12439507	S100A9	S100A8	0.975751
178307593	NKG7	CST7	0.967761
162921607	CST7	NKG7	0.967761
178299409	NKG7	GZMA	0.964404
...	...	...	...
12476869	S100A8	GABARAP	0.724809
161700733	FKBP1A	TYROBP	0.724802
173809312	TYROBP	FKBP1A	0.724802
12470836	S100A8	RAB32	0.724787
65965249	RAB32	S100A8	0.724787

5000 rows × 3 columns

Now we have our network! Here we remove the AB-BA interactions (keeping only AB)

[7]:

my_FAVAourite_network_single = my_FAVAourite_network.iloc[::2, :]
my_FAVAourite_network_single

[7]:

	Protein_1	Protein_2	Score
12466933	S100A8	S100A9	0.975751
178307593	NKG7	CST7	0.967761
178299409	NKG7	GZMA	0.964404
59907122	LST1	AIF1	0.959885
177508056	FTL	FTH1	0.958008
...	...	...	...
14453904	FCGR2A	LILRB3	0.724868
61438016	CDKN1A	FPR1	0.724859
148701811	GABARAP	S100A8	0.724809
161700733	FKBP1A	TYROBP	0.724802
12470836	S100A8	RAB32	0.724787

2500 rows × 3 columns

Finding modules on the netwrok

[8]:

import igraph as ig
import leidenalg as la
import networkx as nx

# Create a graph from the pandas DataFrame using networkx
G_nx = nx.from_pandas_edgelist(
    my_FAVAourite_network_single,
    source="Protein_1",
    target="Protein_2",
    edge_attr="Score",
)

# Convert the networkx graph to igraph graph
G = ig.Graph(directed=False)
G.add_vertices(list(G_nx.nodes))
G.add_edges(list(G_nx.edges))

# Perform Leiden clustering
partition = la.find_partition(G, la.ModularityVertexPartition)

# Get the modules
modules = [frozenset(G.vs[v]["name"]) for v in partition]

# Print the modules
for module in modules:
    print(module)

print(f"----")
print(f"The number of modules is: {len(modules)}")

frozenset({'C19orf59', 'GCA', 'GSTP1', 'HAUS4', 'TNFSF13B', 'RBP7', 'SAT2', 'RILPL2', 'IL1B', 'GRN', 'RNASE2', 'RNF181', 'CDKN1A', 'LGALS2', 'NFKBIA', 'MARCO', 'CD4', 'IER3', 'S100A6', 'TKT', 'PSMB3', 'CATSPER1', 'RASSF4', 'TMEM176A', 'KLF4', 'CD36', 'LYZ', 'NDUFA11', 'CDA', 'AMICA1', 'VCAN', 'TNFAIP2', 'PFDN5', 'CSF2RA', 'S100A9', 'FIBP', 'CD14', 'NUP214', 'LGALS3', 'ID1', 'LRP1', 'LINC00936', 'CEBPD', 'CSF3R', 'DHRS4L2', 'AGTRAP', 'NRG1', 'EIF3K', 'BST1', 'CTSB', 'GABARAPL1', 'ASGR1', 'ARPC1B', 'VSTM1', 'MTMR11', 'MGST1', 'CD33', 'RAB3D', 'BLVRB', 'VIM', 'MNDA', 'IL8', 'CLEC12A', 'AP1S2', 'FCGR1A', 'MT-ND1', 'MS4A6A', 'NEAT1', 'TMEM205', 'RAB32', 'SLC25A6', 'PLBD1', 'CNPY3', 'CD302', 'EIF4A1', 'TSPO', 'S100A10', 'NDUFS7', 'FOLR3', 'S100A8', 'GPX1', 'TGFBI', 'AP2S1', 'GAPDH', 'GRINA', 'QPCT', 'NUDT16', 'FCN1', 'FPR1', 'TCEB2', 'LGALS1', 'RNF130', 'ANXA2', 'S100A12', 'TALDO1', 'TMEM176B', 'CD1D', 'CSTA', 'NCF2', 'CLEC4E', 'WDR83OS', 'ALDH2'})
frozenset({'PPIA', 'KLRF1', 'CAPZB', 'CLIC3', 'GZMH', 'RARRES3', 'RAC2', 'BATF', 'PLEKHF1', 'C5orf56', 'AOAH', 'CD99', 'DHRS7', 'HLA-B', 'C1orf21', 'RHOC', 'HSPA5', 'LITAF', 'FASLG', 'FCGR3A', 'HLA-A', 'CD300A', 'KLRD1', 'HCST', 'FGFBP2', 'GZMM', 'MATK', 'GPR56', 'C12orf75', 'CCL5', 'GZMB', 'APMAP', 'HOPX', 'TTC38', 'PTGDR', 'CCL4', 'PSMB9', 'ITGB2', 'CTSW', 'KLRC1', 'PLEK', 'S1PR5', 'FCRL6', 'ID2', 'KIR3DL2', 'PSMA2-1', 'IGFBP7', 'B2M', 'SPON2', 'TBX21', 'CTSC', 'IFITM2', 'CD7', 'PRSS23', 'CD244', 'XCL1', 'XCL2', 'TXNIP', 'PDCD6', 'GZMA', 'IFNG', 'GNLY', 'PLAC8', 'IFITM1', 'PPP1R18', 'HAVCR2', 'HLA-C', 'NKG7', 'ABI3', 'UBB', 'CD247', 'SH2D1B', 'AKR1C3', 'C9orf142', 'ZAP70', 'IL2RB', 'MYL12B', 'APOBEC3G', 'ACTG1', 'RP11-81H14.2', 'ARPC4', 'CST7', 'HLA-E', 'PRF1'})
frozenset({'ARPC3', 'FGR', 'RHOG', 'TBXAS1', 'TYROBP', 'ARRB2', 'CD68', 'NINJ1', 'FTL', 'ANXA5', 'OAZ1', 'LILRB3', 'APOBEC3A', 'CARD16', 'LRRC25', 'PYCARD', 'C1orf162', 'CSTB', 'S100A11', 'SH3BGRL3', 'NPC2', 'CFP', 'GABARAP', 'IFI30', 'PSAP', 'CTSZ', 'CYBA', 'APOBEC3B', 'PILRA', 'FTH1', 'CD300LF', 'OSCAR', 'LFNG', 'H3F3A', 'SERPINA1', 'VAMP5', 'LILRA5', 'COTL1', 'FCER1G', 'BLVRA', 'FKBP1A', 'RGS2', 'HCK', 'GLRX', 'AIF1', 'CTSS', 'FCGR2A', 'RAC1', 'CD300E', 'OAZ2', 'STX11', 'SLC7A7', 'PRELID1', 'TIMP1', 'LY6E', 'CEBPB', 'IFITM3', 'SAT1', 'C5AR1', 'SPI1', 'CST3', 'NOP10', 'MAFB', 'FCGRT', 'IFNGR2', 'FGL2', 'CFD', 'TYMP', 'LST1'})
frozenset({'MS4A7', 'CKB', 'ZNF703', 'LILRB2', 'SLC2A6', 'AP2A1', 'CDH23', 'HMOX1', 'HES4', 'LILRB1', 'TNFRSF1B', 'EMR2', 'FAM49A', 'CTSL', 'HOTAIRM1', 'LILRA3', 'BATF3', 'RNF144B', 'STXBP2', 'TCF7L2', 'CTD-2006K23.1', 'CD86', 'CDKN1C', 'RP11-362F19.1', 'TESC', 'TPPP3', 'RP11-290F20.3', 'SLC31A2', 'MS4A4A'})
frozenset({'HLA-DRB1', 'CD40', 'HLA-DPA1', 'VPREB3', 'CD74', 'FCER2', 'HLA-DQA1', 'CD79B', 'TCL1A', 'LINC00926', 'HLA-DRA', 'MS4A1', 'HLA-DRB5', 'HLA-DMA', 'BANK1', 'HLA-DQA2', 'HLA-DPB1', 'HLA-DQB1', 'CD79A'})
frozenset({'S100A4', 'BST2', 'CLIC1', 'RHOA', 'ARPC2', 'GLIPR2', 'MT-CO1', 'EFHD2', 'MYL6', 'ENO1', 'PFN1', 'CD63', 'SRGN', 'ACTB', 'EMP3', 'CFL1', 'PGAM1'})
frozenset({'RRM2', 'GINS2', 'EFCAB11', 'KIAA1524', 'MCM7', 'RMI2', 'KIAA0101', 'APOBEC3H', 'PXMP2', 'TK1', 'MCM2', 'TYMS'})
frozenset({'TUBB1', 'PPBP', 'CLU', 'ITGA2B', 'GNG11', 'SDPR', 'GP9', 'AP001189.4', 'SPARC', 'PF4'})
frozenset({'CD3D', 'CD3E', 'GZMK', 'IL32', 'CD2', 'DUSP2'})
frozenset({'CD8B', 'MALAT1', 'KLRG1', 'CD8A'})
frozenset({'ZFP36L2', 'BTG1', 'JUN'})
frozenset({'CXCR2', 'STRN', 'FAM161B'})
frozenset({'SF3B4', 'FAM71D'})
frozenset({'GMFG', 'HN1'})
frozenset({'FRG1B', 'AL592183.1'})
frozenset({'IFI6', 'ISG15'})
frozenset({'SH3BGRL', 'LYN'})
frozenset({'CAMK1', 'CSF1R'})
----
The number of modules is: 18

[9]:

# Draw the graph
plt.figure(figsize=(10, 8))
pos = nx.spring_layout(G_nx)
nx.draw_networkx_nodes(G_nx, pos, node_size=200, node_color="lightgray")
nx.draw_networkx_edges(G_nx, pos, alpha=0.5)
for i, module in enumerate(modules):
    nx.draw_networkx_nodes(
        G_nx, pos, nodelist=list(module), node_size=200, node_color=f"C{i}"
    )
plt.axis("off")
plt.show()

../_images/tutorials_How_to_use_favapy_in_a_notebook_24_0.png

Diffrently from the network API method, which retrieves only the interactions between the set of input proteins and between their closest interaction neighborhood (if add_nodes parameters is specified), interaction_partners API method provides the interactions between your set of proteins and all the other STRING proteins.

[10]:

import json
import textwrap

import matplotlib as mpl
import requests
import seaborn as sns


class ModulePlotter:
    def __init__(self, modules):
        self.modules = modules
        self.num_plots_per_row = 2
        self.request_url = "/".join(
            ["https://version-11-5.string-db.org/api", "json", "enrichment"]
        )

    def fetch_GO_list(self, module, GO_category):
        GO_list = []

        my_genes = list(module)
        params = {
            "identifiers": "%0d".join(my_genes),  # your protein
            "species": 9606,  # species NCBI identifier
            "caller_identity": "www.awesome_app.org",  # your app name
        }

        response = requests.post(self.request_url, data=params)

        data = json.loads(response.text)

        for row in data:
            term = row["term"]
            preferred_names = ",".join(row["preferredNames"])
            fdr = float(row["fdr"])
            description = row["description"]
            category = row["category"]
            if category == GO_category and fdr < 0.01:
                GO_list.append(row)

        return GO_list

    def create_plots(self, GO_category):
        num_modules = len(self.modules)
        num_cols = self.num_plots_per_row
        num_rows = (num_modules + num_cols - 1) // num_cols

        fig, axes = plt.subplots(num_rows, num_cols, figsize=(12, 12 * num_rows))

        plot_idx = 0

        for module_idx, module in enumerate(self.modules):
            GO_list = self.fetch_GO_list(module, GO_category)
            df = pd.DataFrame(GO_list)

            if not df.empty and "p_value" in df.columns:
                df = df.sort_values(by="fdr")
                df = df.head(20)

            if not df.empty and "p_value" in df.columns:
                row_idx = plot_idx // num_cols
                col_idx = plot_idx % num_cols

                if num_rows == 1:  # Handle 1-dimensional axes array when num_rows is 1
                    ax = axes[col_idx]
                else:
                    ax = axes[row_idx, col_idx]

                cmap = mpl.cm.bwr_r
                norm = mpl.colors.Normalize(
                    vmin=df["p_value"].min(), vmax=df["p_value"].max()
                )
                mapper = mpl.cm.ScalarMappable(norm=norm, cmap=cmap)

                sns.barplot(
                    data=df,
                    x="number_of_genes",
                    y="description",
                    palette=mapper.to_rgba(df["p_value"]),
                    ax=ax,
                )
                ax.set_yticklabels([textwrap.fill(e, 22) for e in df["description"]])
                ax.set_title(f"Module {module_idx + 1}")
                ax.tick_params(axis="both", labelsize=8)

                plot_idx += 1

        for i in range(plot_idx, num_rows * num_cols):
            row_idx = i // num_cols
            col_idx = i % num_cols

            if num_rows == 1:  # Handle 1-dimensional axes array when num_rows is 1
                ax = axes[col_idx]
            else:
                ax = axes[row_idx, col_idx]

            ax.axis("off")

        plt.tight_layout()
        fig.subplots_adjust(
            right=1, hspace=0.2, wspace=0.5
        )  # Adjust the spacing between subplots

        # Find the last non-empty subplot
        last_plot_idx = plot_idx - 1

        if last_plot_idx >= 0:
            last_row_idx = last_plot_idx // num_cols
            last_col_idx = last_plot_idx % num_cols

            if num_rows == 1:  # Handle 1-dimensional axes array when num_rows is 1
                last_ax = axes[last_col_idx]
            else:
                last_ax = axes[last_row_idx, last_col_idx]

            # Remove empty subplots
            for i in range(last_plot_idx + 1, num_rows * num_cols):
                row_idx = i // num_cols
                col_idx = i % num_cols

                if num_rows == 1:  # Handle 1-dimensional axes array when num_rows is 1
                    ax = axes[col_idx]
                else:
                    ax = axes[row_idx, col_idx]

                fig.delaxes(ax)

            # Adjust the figure size based on the number of non-empty subplots
            fig.set_size_inches(12, 12 * (last_row_idx + 3))

        cax = fig.add_axes([1.1, 0.9, 0.05, 0.08])
        cbl = mpl.colorbar.ColorbarBase(
            cax, cmap=cmap, norm=norm, orientation="vertical"
        )

        plt.show()

Category options:

Process, COMPARTMENTS, TISSUES, DISEASES, Keyword, KEGG, SMART, InterPro, Pfam, PMID, Function, Component, RCTM, WikiPathways, HPO, NetworkNeighborAL

[11]:

# Usage example:
selected_module = list(
    modules
)  # [0:2]  # Change the index [0] to the desired module's index

plotter = ModulePlotter(selected_module)

print("\033[1m\033[4mGO Process:\033[0m")
plotter.create_plots("Process")
print("\033[1m\033[4mGO Function:\033[0m")
plotter.create_plots("Function")
print("\033[1m\033[4mDiseases:\033[0m")
plotter.create_plots("DISEASES")
print("\033[1m\033[4mKEGG pathways:\033[0m")
plotter.create_plots("KEGG")

GO Process:

../_images/tutorials_How_to_use_favapy_in_a_notebook_29_1.png

GO Function:

../_images/tutorials_How_to_use_favapy_in_a_notebook_29_3.png

Diseases:

../_images/tutorials_How_to_use_favapy_in_a_notebook_29_5.png

KEGG pathways:

../_images/tutorials_How_to_use_favapy_in_a_notebook_29_7.png

[12]:

import io
from time import sleep

string_api_url = "https://version-11-5.string-db.org/api"
output_format = "image"
method = "network"

selected_modules = [
    modules[1],
    modules[4],
    modules[5],
]  # Modify the indices as needed, e.g. list(modules) for all.

for module_index, module_genes in enumerate(selected_modules):
    my_genes = module_genes
    gene_identifiers = "%0d".join(my_genes)

    ##
    ## Construct URL
    ##

    request_url = "/".join([string_api_url, output_format, method])

    ## For each gene, call STRING
    num_genes = len(my_genes)

    params = {
        "identifiers": gene_identifiers,
        "species": 9606,  # species NCBI identifier
        "add_white_nodes": 15
        - num_genes,  # comment out for no proteins outside your query
        "network_flavor": "evidence",  # "confidence" to show confidence links
        "show_query_node_labels": 1,  # 0 to hide labels
        "caller_identity": "www.awesome_app.org",
    }

    ##
    ## Call STRING
    ##

    response = requests.post(request_url, data=params)

    if response.ok:
        ##
        ## Plot the network
        ##

        plt.figure(figsize=(12, 8), dpi=300)  # Adjust figsize and dpi values as needed
        image_stream = io.BytesIO(response.content)
        plt.imshow(plt.imread(image_stream), aspect="auto")
        module_name = f"Module_{module_index}"
        plt.title("Interaction network for %s" % module_name)
        plt.axis("off")
        plt.show()

    else:
        print("Error retrieving network for module: %s" % module_index)
        print(response.content)

    # Generate the filename based on the module index and name
    module_name = f"Module_{module_index}"
    file_name = f"{module_name}_network.png"
    print("Saving interaction network to %s" % file_name)

    with open(file_name, "wb") as fh:
        fh.write(response.content)

    sleep(1)

../_images/tutorials_How_to_use_favapy_in_a_notebook_31_0.png

Saving interaction network to Module_0_network.png

../_images/tutorials_How_to_use_favapy_in_a_notebook_31_2.png

Saving interaction network to Module_1_network.png

../_images/tutorials_How_to_use_favapy_in_a_notebook_31_4.png

Saving interaction network to Module_2_network.png

First case: Using counts matrix as input for creating co-expression network

The input in this case should be a tab-separated matrix with observations as columns and variables as rows. In other words, in the rows should be the entities that will be the nodes of our network.

Let’s import FAVA

Make sure you have loaded the latest version. If you are not sure visit here: https://pypi.org/project/favapy/

Our data is a single-cell RNAseq dataset.

Our matrix looks like this:

Run FAVA

Second case: Using anndata as input for creating co-expression network

If input is an AnnData object, it contains an expression matrix X, which stores n_obs observations (cells) of n_vars variables (genes).

Here we follow the steps as described in the scanpy tutorial here: https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html

If the data is in Compressed Sparse Row format so we have to “uncompress” –> adata.X = adata.X.A

Run FAVA

Now we have our network! Here we remove the AB-BA interactions (keeping only AB)

Finding modules on the netwrok

Category options:

Here we follow the steps as described in the scanpy tutorial here: https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html 