Various useful scripts

This document gathers various minor scripts/regexps we used to clean refs and typos

Fixing refs

The correct citation style is:

déclaration: {#clef}
citation: \@ref(clef)

Furthermorht, it seems that several characters are forbidden in the clef (#, _, :, …). Our first attempts with refs were quite unsuccessful so we went for a rather free style and fixed them afterward.

Invalid reference syntax detection

Here is simple regexp trying to detect wrongly formated expressions:

grep -e '[^a-zA-Z ]ref' *.md | grep -v -e '\@ref([^)]*)'

18_zbis_biblio.md:<div id="refs"></div>
Conventions_ecriture.md: - relecteur, rapporteur, *referee* ou *reviewer* : essaie-t-on de s'en tenir au même terme tout du long ou n'est-ce pas utile ?
README.md:- [Les références d'un chapitre à l'autre](https://bookdown.org/yihui/bookdown/cross-references.html)
index.md:bibliography: [refs.bib]

Semi-automatic fix

The following sequence of sed commands was used to automatically transform most refs with an incorrect syntax into good ones. We keep it here only for the record as there should not bee too much need for it now.

git stash
# grep 'ref(' *.md
sed -i 's|ref{#\([^}]*\)}|ref(\1)|g' *.md  
sed -i 's|ref{@\([^}]*\)}|ref(\1)|g' *.md  
sed -i 's|ref(#\([^}]*\))|ref(\1)|g' *.md  
sed -i 's|@\\ref(|\\@ref(|g' *.md  
sed -i 's|\\ref(|\\@ref(|g' *.md  
sed -i 's|ref{\([^}]*\)}|ref(\1)|g' *.md  
grep 'ref' *.md | grep -v '@ref('

Saved working directory and index state WIP on master: 9252acc Update 12_format_des_donnees.md
04_questions_cauchemars_recurrents.md:l'informaticienne du labo l'a déjà reformaté pour le
12_format_des_donnees.md:Bref, même si une table, une hiérarchie ou un texte libre peut
13.5_partage_et_archivage.md:adaptée (chapitre @\ref(C:licences)), un format adapté (chapitre
13.5_partage_et_archivage.md:@\ref(C:data:format)), d'élaborer une documentation extensive levant les
13.5_partage_et_archivage.md:ambiguïtés d'interprétation (chapitre @\ref(C:code:good)), et enfin de choisir
13.5_partage_et_archivage.md:@\ref(C:versioning) des solutions pour le suivi de version (GitLab, GitHub,
14_apprendre_a_programmer.md:Notez que nous parlons de " langages " au pluriel pour refléter leur diversité.
15_rendre_son_code_comprehensible.md:> FIXME diagramme à refaire en plus joli et à améliorer *ECG*
15_rendre_son_code_comprehensible.md:œuvre très simplement en refusant par exemple l'usage des " arguments par
20_Conventions_ecriture.md: - relecteur, rapporteur, *referee* ou *reviewer* : essaie-t-on de s'en tenir au même terme tout du long ou n'est-ce pas utile ?
README.md:- [Les références d'un chapitre à l'autre](https://bookdown.org/yihui/bookdown/cross-references.html)
index.md:bibliography: [refs.bib]

Key syntax

Now, this script changes the key convention to replace : by - as this bookdown does not seem to like keys with :.

LABELS="A:preambule 
     A:introduction 
     A:personas 
     A:cauchermars 
     B:data:acquisition 
     B:data:input 
     B:code:aspect 
     B:data:output 
     C:intro 
     C:data:acquisition 
     C:data:format 
     C:data:share 
     C:versioning 
     C:code:learn 
     C:code:good 
     C:code:bugs 
     C:code:env 
     C:licences 
     D:appendix 
     D:convention";

#grep "C:code:learn" *.md;
for l in $LABELS ; do 
    nl=`echo $l | sed 's|:|-|g'`; 
    sed -i "s|$l|$nl|g" *.md;
done
#grep "C:code:learn" *.md;

Checking for invalid keys

The goal of this script is to list keys whose syntax seems spurious. The first script can safely be ignored and was just a shorthand used to create the second one.

LABELS="A:preambule 
     A:introduction 
     A:personas 
     A:cauchermars 
     B:data:acquisition 
     B:data:input 
     B:code:aspect 
     B:data:output 
     C:intro 
     C:data:acquisition 
     C:data:format 
     C:data:share 
     C:versioning 
     C:code:learn 
     C:code:good 
     C:code:bugs 
     C:code:env 
     C:licences 
     D:appendix 
     D:convention";

echo $LABELS | sed -e 's| | -e |g'  -e 's|:|-|g'

A-preambule -e A-introduction -e A-personas -e A-cauchermars -e B-data-acquisition -e B-data-input -e B-code-aspect -e B-data-output -e C-intro -e C-data-acquisition -e C-data-format -e C-data-share -e C-versioning -e C-code-learn -e C-code-good -e C-code-bugs -e C-code-env -e C-licences -e D-appendix -e D-convention

I assume here that references are correctly formated.xs

grep -e @ref *.md | grep -v -e A-preambule -e A-introduction -e A-personas -e A-cauchermars -e B-data-acquisition -e B-data-input -e B-code-aspect -e B-data-output -e C-intro -e C-data-acquisition -e C-data-format -e C-data-share -e C-versioning -e C-code-learn -e C-code-good -e C-code-bugs -e C-code-env -e C-licences -e D-appendix -e D-convention

README.md:* citation:    `\@ref(clef)`

Looks good.

Citations

Fixing keys

My bad. Most references to keys have been entered with the [key] syntax instead of the [@key] syntax. Use these regexps carefully…

grep -e '\[[^@][^]]*\][^(]' *.md
echo "    "
grep -e '\[[^@][^]]*\]$' *.md
echo "    "
  grep -e '\[[^@][^]]*\] (' *.md
echo "    "

06_aspects_computationnels.md:<!-- fonctionner correctement. [Oui] [Non] [Annuler]`, `segmentation fault -->
10_format_des_donnees.md:que la plus petite valeur propre de la matrice garde deux chiffres significatifs). [*SG*] 
13_apprendre_a_programmer.md:table_tri <- table[order(table$age), ] # Réordonne toutes les lignes du tableau
13_apprendre_a_programmer.md:mean(table_tri$IMC[1:(nrow(table_tri)/2)])
14_rendre_son_code_comprehensible.md:le chapitre 8 "Des problèmes de calculs" [bien mettre le renvoi vers ce chap.8]) 
TODO.md:- [ ] Export vers bookdown
TODO.md:- [ ] Petit crayon pour éditer vers github  
TODO.md:- [ ] Makefile avec commande qui va bien
TODO.md:- [ ] Espaces insécables devant les ?, les :, et les !
TODO.md:- [X] Sauts de lignes avant les listes
TODO.md:- [ ] Améliorer un minimum le style
TODO.md:- [ ] Déplacer la bibliographie avant les annexes
TODO.md:- [ ] Revoir la numérotation des annexes
TODO.md:- [ ] Déplacer les fichiers dans un répertoire
TODO.md:- [ ] Renuméroter les fichiers
TODO.md:- [ ] Effacer les images inutiles
    
14_rendre_son_code_comprehensible.md:turtle = 3.2   # This corresponds to the speed at which the ninja is moving [m.s-1]

git checkout *.md
# sed -i 's|\[\([^@][^]]*\)\]\([^(]\)|[@\1]\2|g' *.md
sed -i 's|\[\([^@][^]]*\)\]$|[@\1]|g' *.md

Fixing bibtex (Recent bibtex export)

First, let’s fix the fun-mooc URL plus various cosmetics…

sed -i 's|url = {//www.fun-mooc.fr/|url = {https://www.fun-mooc.fr/|g' refs.bib
sed -i 's|« |«~|g' refs.bib
sed -i 's| » |~»|g' refs.bib
mv refs.bib orefs.bib
perl bib-fix.pl < orefs.bib > refs.bib
rm orefs.bib

use strict;

# open(INPUT,"refs.bib");

my($bib_head, $bib_body, $bib_isbn, $bib_url, $bib_year, $bib_urldate);

sub reset_bib {
    $bib_body = $bib_head = $bib_url = $bib_isbn = $bib_year = $bib_urldate = "";
}

reset_bib();

while(defined(my $line=<>)) {
    chomp($line);
    if($line =~ /^@/){
	$line =~ s/,\s*//g;
	$bib_head = $line;
	$bib_body = $bib_head;
	next;
    } 
    if($bib_head eq "") { next; }

    if($line =~ /^\s*year = \s*{(.*)}/) {
	$bib_year = $1;
	next;
    }
    if($line =~ /^\s*urldate = \s*{(.*)}/) {
	$bib_urldate = $1;
	$bib_urldate =~ s/-.*//g;
	next;
    }

    if($line =~/^}$/) {
	print $bib_body;

	my $bib_suffix = "";
	
	#  note (URLS + ISBN)
	my $bib_note = $bib_url;
	if($bib_isbn ne "") {
	    if($bib_note ne "") {
		$bib_note .= ". ".$bib_isbn;
	    } else {
		$bib_note = $bib_isbn;
	    }
	}
	if($bib_note ne "") {$bib_suffix = "    note = {$bib_note},\n"; }
	#  year
	if($bib_year eq "") { $bib_year = $bib_urldate; }
	if($bib_year ne "") { $bib_suffix .= "    year = {$bib_year}\n"; }

	if($bib_suffix ne "") {
	    print ",\n".$bib_suffix;
	}
	
	print "}\n";
	reset_bib();
	next;
    }
    
    if($line =~ /^\s*note = /) {
	if(!(($line =~ /URL~/) or ($line =~ /ISBN~/))) {
	    next;
	}
    }    	
    if($line =~ /^\s*url = \s*{(.*)}/) {
	$bib_url = $1;
	$bib_url = "URL~: \\url{$bib_url}";
#	next;
    }
    if($line =~ /^\s*isbn = \s*{(.*)}/) {
	$bib_isbn = $1;
	$bib_isbn = "ISBN~: $bib_isbn";
#	next;
    }

    $line =~ s/\s*,\s*$//g;
    $bib_body .= ",\n".$line;
}

Fixing bibtex only though sed (deprecated)

Then make sure urls are visible.

sed -i 's|url\s*=\s*{\(.*\)}|note = {URL:~\\url{\1}}|g' refs.bib

Make sure dates are visible.

sed -i 's|urldate = {\([0-9]*\)}| year = {\1}|g' refs.bib
sed -i 's|urldate = {\([0-9]*\)-.*}|year = {\1}|g' refs.bib

Make sure ISBN is visible.

sed -i 's|isbn\s*=\s*{\(.*\)}|note = {ISBN:~\\textsf{\1}}|g' refs.bib

Fixing bibtex OLD (betterbibtex)

grep ' date = ' refs.bib

grep howpublish refs.bib

sed -i 's|howpublished = {//www.fun-mooc.fr/|howpublished = {https://www.fun-mooc.fr/|g' refs.bib
sed -i 's|howpublished = {\(.*\)}|howpublished = {\\url{\1}}|g' refs.bib

for i in `bibtex booksprintrr | grep "empty year" | sed 's/.* in //'`; do
    sed -i "s/$i,/$i,\n  year = {2018},/g" refs.bib;
done

Spell checking

Plateforme

grep -e plateforme *.md

Space before commas, question marks, etc.

Here is the regexp to detext possibly invalid syntax:

grep -e '[^ ][:?!]' *.md | grep -v -e 'http:' -e 'https:'

02_RR_kezako.md:du monde de la recherche *SG* [Baker M. 1,500 scientists lift the lid on reproducibility. Nature 2016,533:452-454.]. Le sujet est ancien, mais la situation semble avoir atteint un point critique. Des études ont par exemple démontré qu'il n'était pas possible d'obtenir de nouveau les résultats d'études pré-cliniques ou cliniques *SG*[Begley CG, Ellis LM. Drug development: Raise standards for preclinical cancer research. Nature 2012,483:531-533.] [Perrin S. Preclinical research: Make mouse studies work. Nature 2014,507:423-425.] Si la reproductibilité des résultats ne peut être considérée comme seul critère de la scientificité d'une recherche, cette crise suscite des interrogations au sein même de la communauté scientifique.
02_RR_kezako.md:- aller à la "pêche" aux résultats significatifs parmi tous les tests statistiques réalisés ("p-hacking") [Nuzzo R. Scientific method: statistical errors. Nature 2014,506:150-152.], 
02_RR_kezako.md:- générer une hypothèse de recherche *a posteriori*, c’est-à-dire après avoir obtenu un résultat significatif (« harking ») [Kerr NL. HARKing: hypothesizing after the results are known. Pers Soc Psychol Rev 1998,2:196-217.], 
02_RR_kezako.md:- sur-interpréter le résultat statistique qui est significatif (« Probability That a Positive Report is False ») [Wacholder S, Chanock S, Garcia-Closas M, El Ghormli L, Rothman N. Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J Natl Cancer Inst 2004,96:434-442.] 
02_RR_kezako.md:Pour tous ces sujets cités *supra*, nous invitons le lecteur à se documenter : [Munafo MR, Nosek BA, Bishop DVM, Button KS, Chambers CD, Sert NPd, Simonsohn U, Wagenmakers E-J, Ware JJ, Ioannidis JPA. A manifesto for reproducible science. Nature Human Behav 2017,1:0021.]
08_aspects_computationnels.md:<!-- Dans le pire des cas, le programme plante avec un message d'erreur -->
08_aspects_computationnels.md:<!-- cryptique du genre `Un problème a fait que le programme a cessé de -->
08_aspects_computationnels.md:<!-- fonctionner correctement. [Oui] [Non] [Annuler]`, `segmentation fault -->
08_aspects_computationnels.md:<!-- Core dumped`, ou encore `java.lang.ExceptionInInitializerError: null`. -->
08_aspects_computationnels.md:<!-- Plus subtil voire fourbe, il se peut que le programme s'exécute à première vue -->
08_aspects_computationnels.md:<!-- normalement mais qu'en y regardant de plus près, on s'aperçoive que -->
08_aspects_computationnels.md:<!-- le résultat (valeur numérique, caractères illisibles, mise en page -->
08_aspects_computationnels.md:<!-- d'une figure, ...) ait changé. -->
08_aspects_computationnels.md:Package: python3-matplotlib
08_aspects_computationnels.md:Version: 2.1.1-2
08_aspects_computationnels.md:Depends: python3-dateutil, python-matplotlib-data (>= 2.1.1-2),
08_aspects_computationnels.md:libjs-jquery, libjs-jquery-ui, python3-numpy (>= 1:1.13.1),
08_aspects_computationnels.md:python3-cycler (>= 0.10.0), python3:any (>= 3.3.2-2~), libc6 (>=
08_aspects_computationnels.md:2.14), libfreetype6 (>= 2.2.1), libgcc1 (>= 1:3.0), libpng16-16 (>=
08_aspects_computationnels.md:1.6.2-1), libstdc++6 (>= 5.2), zlib1g (>= 1:1.1.4)
08_aspects_computationnels.md:Reproduciblity: An Astrophysical Exemple of Computationnal Uncertainty in the
09_donnees_de_sortie.md:(2004) The Inconstancy of the Fundamental Physical Constants: Computational
12_format_des_donnees.md:data management and stewardship. Sci. Data 3:160018 doi:
12_format_des_donnees.md:résultats d'analyse (par exemple des estimations) *SG* [@GUM: Évaluation des données de mesure – 
12_format_des_donnees.md:Guide pour l'expression de l'incertitude de mesure, JCGM 100:2008 
12_format_des_donnees.md:*SG* [@GUM: Évaluation des données de mesure – 
12_format_des_donnees.md:Guide pour l'expression de l'incertitude de mesure, JCGM 100:2008 
12_format_des_donnees.md:uncertainty in measurement" – Extension to any number of output quantities JCGM 102:2011, Definition 3.21  
12_format_des_donnees.md:[*SG*] tidy data [Ref Wickham, Hadley. "Tidy data". *Journal of Statistical Software* 59(10) (2014): 1-23]
13.5_partage_et_archivage.md:American Statistician, 72:1, 80-88, DOI: 10.1080/00031305.2017.1375986]()
13_outils_de_gestion_de_version.md:La chronologie du nommage de fichiers successifs de scripts R pourrait être:
13_outils_de_gestion_de_version.md:d'obtenir aisément une recherche reproductible:
14_apprendre_a_programmer.md:qu'il peut être nécessaire d'apprendre à programmer en conseillant deux langages devenus incontournables en traitement et analyse des données: Python et R.
14_apprendre_a_programmer.md:mean(table_tri$IMC[1:(nrow(table_tri)/2)])
15_rendre_son_code_comprehensible.md:<!--
15_rendre_son_code_comprehensible.md:> " *There are only two hard things in Computer Science: cache invalidation and naming things*. "  
15_rendre_son_code_comprehensible.md:<!-- Cette approche de *Don't Repeat Yourself* est un principe qui s'oppose à *Write Everything Twice*. -->
15_rendre_son_code_comprehensible.md:Il existe de nombreux concepts pour vous permettre d'y arriver: 
17_environnement_logiciel.md:`devtools::session_info()`]. Mais cette méthode relativement
18_licence_et_privacy.md:# Sortez couverts! {#C-licences}
18_zbis_biblio.md:`r if (knitr::is_html_output()){ '
18_zbis_biblio.md:`r if (knitr::is_latex_output()){ '
19_annexes.md:## Que peut apporter un book sprint à des chercheurs.euses?
22_quatrieme_de_couv.md:`r if(knitr::is_latex_output()){ '
README.md:## Est-il possible de contribuer, de proposer de nouveaux contenus? 
README.md:## Clef de références pour les chapitres:
README.md:**Syntaxe**:
README.md:* déclaration: `{#clef}`
README.md:* citation:    `\@ref(clef)`
README.md:  Sortez couverts!
README.md:partir de CRAN ou de  Github:
README.md:# devtools::install_github("rstudio/bookdown")
README.md:Pour compiler ce livre au format html, il vous suffit de faire:
README.md:ou bien en R:
README.md:rmarkdown::render_site(output_format = 'bookdown::gitbook', encoding = 'UTF-8')
README.md:Pour compiler ce livre au format pdf, il vous suffit de faire:
README.md:ou bien en R (ou presque...):
README.md:rmarkdown::render_site(output_format = 'bookdown::pdf_book', encoding = 'UTF-8')
brainstorm.md:## Partie 1: La recherche en pratique
brainstorm.md:* Objectif/contexte traité: Permettre à un chercheur de vérifier/réobtenir les résultats d'un autre
brainstorm.md:### Recherche reproductible: de quoi est-il question ?
brainstorm.md:## Partie 2: Origine des problèmes
brainstorm.md:## Partie 3: Les solutions de la recherche reproductible
brainstorm.md:* Le cahier de labo ???
brainstorm.md:*XMind: ZEN - Trial Version*
index.md:title: "Vers une recherche reproductible"
index.md:subtitle: "Faire évoluer ses pratiques"
index.md:author: "Loïc Desquilbet, Sabrina Granger, Boris Hejblum, Arnaud Legrand, Pascal Pernot, Nicolas Rougier <br> Facilitatrice : Elisa de Castro Guerra"
index.md:date: "`r Sys.Date()`"
index.md:site: bookdown::bookdown_site
index.md:documentclass: book
index.md:bibliography: [refs.bib]
index.md:biblio-style: apalike
index.md:link-citations: yes
index.md:colorlinks: yes
index.md:fontsize: 12pt
index.md:description: "Livre d'introduction à la recherche reproductible rédigé lors booksprint."
index.md:url: 'https\://bookdown.org/alegrand/bookdown/'
index.md:github-repo: alegrand/bookrr

Here is the (conservative) sed command to use unsplitable space almost everywhere:

# grep -e ' [:?!]' *.md
sed -i 's/ \([:?!]\)/ \1/g' *.md

Compiling

These scripts have now moved into the main Makefile

HTML

library(bookdown)
rmarkdown::render_site(output_format = 'bookdown::gitbook', encoding = 'UTF-8')

processing file: booksprintrr.Rmd

  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |.................................................................| 100%
   inline R code fragments


output file: booksprintrr.knit.md

/usr/bin/X11/pandoc +RTS -K512m -RTS booksprintrr.utf8.md --to html4 --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash+smart --output booksprintrr.html --email-obfuscation none --wrap preserve --standalone --section-divs --table-of-contents --toc-depth 3 --template /home/alegrand/R/x86_64-pc-linux-gnu-library/3.5/bookdown/templates/gitbook.html --highlight-style pygments --number-sections --css style.css --include-in-header /tmp/RtmpOzSigs/rmarkdown-str113f7b65abc9.html --mathjax --filter /usr/bin/X11/pandoc-citeproc 
pandoc-citeproc: reference REF not found
pandoc-citeproc: reference GUM not found
pandoc-citeproc: reference GUM not found
pandoc-citeproc: reference REF not found
pandoc-citeproc: reference REF not found

Output created: _book/A-preambule.html
Warning message:
In split_chapters(output, gitbook_page, number_sections, split_by,  :
  You have 23 Rmd input file(s) but only 22 first-level heading(s). Did you forget first-level headings in certain Rmd files?

LaTeX

library(bookdown)
rmarkdown::render_site(output_format = 'bookdown::pdf_book', encoding = 'UTF-8')

processing file: booksprintrr.Rmd

  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |.................................................................| 100%
   inline R code fragments


output file: booksprintrr.knit.md

/usr/bin/X11/pandoc +RTS -K512m -RTS booksprintrr.utf8.md --to latex --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash --output booksprintrr.tex --table-of-contents --toc-depth 2 --template /home/alegrand/R/x86_64-pc-linux-gnu-library/3.5/rmarkdown/rmd/latex/default-1.17.0.2.tex --number-sections --highlight-style tango --pdf-engine xelatex --natbib --include-in-header preamble.tex --variable graphics=yes --wrap preserve --variable 'geometry:margin=1in' --variable 'compact-title:yes' --variable tables=yes --standalone 

Output created: _book/booksprintrr.pdf

Code cruft

Ninja

In the text, we have a simple ninja obfuscated code example. This is just to check for its correctness.

mass = 100
speed = 3.2
E = 1/2 * mass * speed^2
print(E)

[1] 512

ninja = 100
XX = 2.0000
a = 0.5
turtle = 3.2
bluE_Pizza = a * ninja * turtle ** XX
print(bluE_Pizza)

[1] 512

Has my computer gone mad ?

1  +  1 == 2          # Who needs a computer anyway ?
.1 + .1 == .2         # So far so good.
.1 + .1 + .1          # Well, that's obvious.
.1 + .1 + .1 == .3    # What ?
.2 + .1 == .3         # Uuh ?
.1*3    == 0.3        # WTF!
3-2.9   == 0.1        # OK, I've just lost faith. Who broke my computer ?

[1] TRUE
[1] TRUE
[1] 0.3
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

scripts.org

scripts.org

Various useful scripts

Fixing refs

Invalid reference syntax detection

Semi-automatic fix

Key syntax

Checking for invalid keys

Citations

Fixing keys

Fixing bibtex (Recent bibtex export)

Fixing bibtex only though sed (deprecated)

Fixing bibtex OLD (betterbibtex)

Spell checking

Plateforme

Space before commas, question marks, etc.

Compiling

HTML

LaTeX

Code cruft

Ninja

Has my computer gone mad ?

Files

scripts.org

Latest commit

History

scripts.org

File metadata and controls

Various useful scripts

Fixing refs

Invalid reference syntax detection

Semi-automatic fix

Key syntax

Checking for invalid keys

Citations

Fixing keys

Fixing bibtex (Recent bibtex export)

Fixing bibtex only though sed (deprecated)

Fixing bibtex OLD (betterbibtex)

Spell checking

Plateforme

Space before commas, question marks, etc.

Compiling

HTML

LaTeX

Code cruft

Ninja

Has my computer gone mad ?