Single-nucleotide polymorphisms (SNPs) are among the most common mutations and account for a wide range of phenotypic variation, particularly through structural alterations.
A large amount of data from which the structural impact of SNPs may be inferred is available online, but it remains scattered and flawed.
Aims
aggregate available data;
generate missing data;
assign predictive structural impact values;
provide datasets as input to an associated visualization project;
automate most of the pipeline.
Method
collect genomic and protein data;
check quality and consistency, and standardize the data;
generate indicators for structural study (GALT-DB protocol).
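As an illustration of the consistency step, here is a minimal Python sketch that checks whether the reference residue reported for a variant matches the protein sequence at that position. The function names, the status labels and the toy sequence are assumptions made for this example, not the project's actual code.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict {identifier: sequence}."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first token of the header as identifier.
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def check_variant(sequences, protein_id, position, ref_residue):
    """Return a status label for one variant (position is 1-based).

    The labels are hypothetical and only serve this sketch.
    """
    seq = sequences.get(protein_id)
    if seq is None:
        return "unknown_protein"
    if not 1 <= position <= len(seq):
        return "position_out_of_range"
    if seq[position - 1] != ref_residue.upper():
        return "reference_mismatch"
    return "ok"


# Toy data, not a real protein.
sequences = parse_fasta(">P1 demo\nMKTA\nYIAK\n")
print(check_variant(sequences, "P1", 3, "T"))
```

Variants flagged with anything other than "ok" could then be set aside for manual review rather than silently dropped.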
Technically speaking
Scripting: Perl / Python / Bash
Data formats: FASTA / PDB / CSV / HTML / JSON
Data gathering: SQL / API / Parsing / MySQL Workbench
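To show how two of these formats can fit together, the following Python sketch standardizes CSV variant rows and serializes them as JSON, for instance as input to a visualization tool. The column names (protein_id, position, ref, alt) are hypothetical and chosen only for this example.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV variant rows into a normalized JSON string.

    Assumes a header row with the hypothetical columns
    protein_id, position, ref and alt.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        records.append({
            "protein_id": row["protein_id"].strip(),
            "position": int(row["position"]),    # int() tolerates surrounding spaces
            "ref": row["ref"].strip().upper(),   # residues normalized to upper case
            "alt": row["alt"].strip().upper(),
        })
    return json.dumps(records, indent=2, sort_keys=True)


# Toy input, not real data.
print(csv_to_json("protein_id,position,ref,alt\nP1, 3 ,t,a\n"))
```

Normalizing types and letter case at this stage keeps downstream consumers from having to handle several spellings of the same value.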