Biostatistics: Difference between revisions

==== Scatter Plot ====
 
A [[scatter plot]] is a mathematical diagram that uses Cartesian coordinates to display the values of two variables for a set of data. The data are shown as a set of points, where the value of one variable determines each point's position on the horizontal axis and the value of the other variable determines its position on the vertical axis.<ref>{{Cite book|title=Seeing through statistics|last=Utts|first=Jessica M.|date=2005|publisher=Thomson, Brooks/Cole|isbn=978-0534394028|edition= 3rd|location=Belmont, CA|oclc=56568530}}</ref> A scatter plot is also called a '''scatter graph''', '''scatter chart''', '''scattergram''', or '''scatter diagram'''.<ref>{{Cite book|title=Basic statistics|last=Jarrell|first=Stephen B.|date=1994|publisher=Wm. C. Brown Pub|isbn=978-0697215956|location=Dubuque, Iowa|oclc=30301196}}</ref>
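
The mapping from paired observations to points can be sketched in a few lines of Python. This is a minimal, text-only illustration rather than a plotting tool: the function name <code>ascii_scatter</code>, the grid size, and the height/weight values are hypothetical. Each (x, y) pair is scaled linearly so that x picks a column (horizontal axis) and y picks a row (vertical axis) of a character grid.

```python
def ascii_scatter(xs, ys, width=20, height=10, mark="*"):
    """Render paired observations as a text scatter plot.

    Each (x, y) pair becomes one point: x is scaled to a column
    (horizontal axis) and y is scaled to a row (vertical axis).
    Points that fall on the same cell are drawn only once.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid division by zero when all values of a variable are equal.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        # Row 0 is printed first (top), so larger y maps to a smaller row index.
        row = (height - 1) - round((y - y_min) / y_span * (height - 1))
        grid[row][col] = mark
    # Left border stands in for the vertical axis, bottom line for the horizontal axis.
    return "\n".join("|" + "".join(r) for r in grid) + "\n+" + "-" * width


if __name__ == "__main__":
    # Hypothetical data: heights (cm) and weights (kg) of six individuals.
    heights = [150, 160, 165, 170, 180, 185]
    weights = [50, 58, 63, 68, 77, 82]
    print(ascii_scatter(heights, weights))
```

A roughly diagonal band of points, as in the hypothetical example, is the visual cue a scatter plot gives for a positive association between the two variables.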
 
==== Mean ====
=== Bioinformatics advances in databases, data mining, and biological interpretation ===
 
The development of [[biological database]]s enables the storage and management of biological data while making them accessible to users around the world. These databases allow researchers to deposit data, to retrieve information and files (raw or processed) originating from other experiments, and to search indexed scientific articles, as in [[PubMed]]. Users can also search for a term of interest (a gene, a protein, a disease, an organism, and so on) and review all the results related to it. There are databases dedicated to [[Single-nucleotide polymorphism|SNPs]] ([[dbSNP]]), to gene characterization and pathways ([[KEGG]]), and to the description of gene function, classified by cellular component, molecular function, and biological process ([[Gene ontology|Gene Ontology]]).<ref name=":4">{{cite journal|doi=10.1002/jcp.21218|pmid=17654500|title=Bioinformatics|journal=Journal of Cellular Physiology|volume=213|issue=2|pages=365–9|year=2007|last1=Moore|first1=Jason H|s2cid=221831488}}</ref> In addition to databases that contain specific molecular information, there are broader ones that store information about an organism or a group of organisms. An example of a database dedicated to a single organism, but containing extensive data about it, is TAIR, the ''[[Arabidopsis thaliana]]'' genetic and molecular database.<ref>{{cite web|url=https://www.arabidopsis.org/|title=TAIR - Home Page|website=www.arabidopsis.org}}</ref> Phytozome,<ref>{{cite web|url=https://phytozome.jgi.doe.gov/pz/portal.html|title=Phytozome|website=phytozome.jgi.doe.gov}}</ref> in turn, stores the assemblies and annotation files of dozens of plant genomes and also provides visualization and analysis tools.
Moreover, some databases are interconnected to exchange and share information. A major initiative in this respect is the [[International Nucleotide Sequence Database Collaboration]] (INSDC),<ref>{{cite web|url=http://www.insdc.org/|title=International Nucleotide Sequence Database Collaboration - INSDC|website=www.insdc.org}}</ref> which links data from DDBJ,<ref>{{cite web|url=https://www.ddbj.nig.ac.jp/index-e.html|title=Top|website=www.ddbj.nig.ac.jp}}</ref> EMBL-EBI,<ref>{{cite web|url=https://www.ebi.ac.uk/|title=The European Bioinformatics Institute < EMBL-EBI|website=www.ebi.ac.uk}}</ref> and NCBI.<ref>{{cite web|url=https://www.ncbi.nlm.nih.gov/|title=National Center for Biotechnology Information|publisher=U.S. National Library of Medicine|website=www.ncbi.nlm.nih.gov}}</ref>
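
Term-based searching of this kind can also be done programmatically. NCBI exposes its Entrez databases (including PubMed, Gene, and dbSNP) through the E-utilities web API. The Python sketch below only builds an ESearch request URL from the documented <code>db</code>, <code>term</code>, <code>retmax</code>, and <code>retmode</code> parameters and does not contact the server; the function name <code>build_esearch_url</code> and the example query are hypothetical.

```python
from urllib.parse import urlencode

# NCBI Entrez E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="pubmed", retmax=20):
    """Build an ESearch request URL for a search term.

    db     -- Entrez database to search, e.g. "pubmed", "gene" or "snp"
    term   -- the query string (a gene, a disease, an organism, ...)
    retmax -- maximum number of record IDs to return
    retmode="json" requests a JSON response instead of the default XML.
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode percent-encodes the values and turns spaces into '+'.
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical query: PubMed articles mentioning BRCA1 and breast cancer.
    print(build_esearch_url("BRCA1 AND breast cancer", retmax=5))
```

Sending the request (for example with <code>urllib.request.urlopen</code>) requires network access, and NCBI's usage guidelines ask clients to limit their request rate, so a real script would add those details around this URL builder.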
 
As molecular datasets grow in size and complexity, their analysis increasingly relies on powerful statistical methods implemented as computer-science algorithms developed in the field of [[machine learning]]. Data mining and machine learning allow the detection of patterns in data with a complex structure, such as biological data, through [[Supervised learning|supervised]] and [[unsupervised learning]], regression, detection of [[Cluster analysis|clusters]], and [[Association rule learning|association rule mining]], among other methods.<ref name=":4"/> For example, [[self-organizing map]]s and [[k-means clustering|''k''-means]] are clustering algorithms, while [[Artificial neural network|neural networks]] and [[support vector machine]]s are common machine learning models.
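
The ''k''-means algorithm mentioned above alternates two steps until the cluster assignments stop changing: each observation is assigned to its nearest centroid, and each centroid is then moved to the mean of the observations assigned to it (Lloyd's algorithm). The Python sketch below is illustrative only. To stay deterministic it uses the first ''k'' points as initial centroids, whereas practical implementations such as scikit-learn's <code>KMeans</code> default to k-means++ initialization; the function name <code>kmeans</code> and the two-dimensional "profiles" are hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Cluster equal-length numeric tuples into k groups with Lloyd's algorithm.

    Returns (labels, centroids): labels[i] is the cluster index of points[i].
    Initial centroids are simply the first k points (a simplifying choice).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the centroid at the smallest
        # Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical 2-D profiles, e.g. expression levels of two genes in six samples.
    profiles = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.9, 1.1), (8.1, 7.9), (7.8, 8.2)]
    labels, centroids = kmeans(profiles, k=2)
    print(labels)
    print(centroids)
```

With these hypothetical profiles, the first, third, and fourth samples fall into one cluster and the remaining three into the other. Because the result of ''k''-means depends on the starting centroids, real analyses usually run it several times from different initializations.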
* Biostatistics<ref>{{cite web|url=https://academic.oup.com/biostatistics|title=Biostatistics - Oxford Academic|website=OUP Academic}}</ref>
* International Journal of Biostatistics<ref>{{Cite web|url=https://www.degruyter.com/view/j/ijb|title=The International Journal of Biostatistics}}</ref>
* Journal of Epidemiology and Biostatistics<ref>{{cite web|url=https://ncbiinsights.ncbi.nlm.nih.gov/2018/06/15/pubmed-journals-shut-down/|title=PubMed Journals will be shut down|first=NCBI|last=Staff|date=15 June 2018}}</ref>
* Epidemiology, Biostatistics and Public Health<ref>{{cite web|url=https://ebph.it/|title=Epidemiology, Biostatistics and Public Health}}</ref>
* Biometrics<ref>{{cite web|url=https://onlinelibrary.wiley.com/journal/15410420|title=Biometrics|website=onlinelibrary.wiley.com|doi=10.1111/(ISSN)1541-0420}}</ref>