This online supplement presents additional figures and tables that were not included in the main manuscript but are nevertheless relevant to the results.

1 Additional figures

Figure S1.1 shows the distribution of the positive word count per proposal, by institution type and funding scheme. Figure S1.2 then shows the share of proposals with at least one occurrence of a given positive word in the text, by the applicant’s gender and funding scheme.

Figure S1.1: Distribution of the number of positive words, by institution type and funding scheme.

Figure S1.2: Gender differences in positive language usage in the life sciences, per funding scheme.

2 Linear probability model results

TAB S1 below shows the results of the multivariate linear probability models, adjusted for age (in years), institution type and the length of the texts. We find no evidence for a gender effect on the presence of a positive word in any of the models for the different funding schemes. Unsurprisingly, longer texts are more likely to contain at least one positive word. Additionally, we find evidence for an effect of institution type in proposals submitted to Projects: compared to researchers from cantonal universities, researchers affiliated with the ETH Domain have a higher probability of using at least one positive word when describing their research projects.
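
As a minimal illustration, the model for the Project funding scheme could be fit in R along the following lines. The data frame pf_lpm_data is hypothetical and the use of heteroskedasticity-robust confidence intervals is an assumption (one common choice for a linear probability model); the variable names follow the model formulas shown in Section 3.

library(sandwich) # vcovHC(): robust variance estimator
library(lmtest)   # coefci(): confidence intervals with a custom vcov

# Hypothetical data frame 'pf_lpm_data' with one row per Project proposal.
# Outcome: does the text contain at least one positive word?
pf_lpm <- lm(
  as.integer(sum_pos > 0) ~ ResponsibleApplicantGender +
    ResponsibleApplicantAgeAtSubmission +
    ResearchInstitutionType + text_length100,
  data = pf_lpm_data
)
# Heteroskedasticity-robust 95% CIs for the linear probability model.
coefci(pf_lpm, vcov. = vcovHC(pf_lpm, type = "HC3"))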

TAB S1: Multivariate linear probability models of applicants using a positive term at least once in their texts, by funding scheme. The reference category is ‘female’ for the gender variable and ‘Cantonal University’ for the institution (inst.) type variable.

Coefficient                  | Estimate (95% CI)
---------------------------- | ------------------------
Projects: 5'736 observations |
Intercept                    | 0.397 (0.305 to 0.488)
gender - male                | 0.019 (-0.011 to 0.049)
age (years)                  | -0.002 (-0.003 to 0)
inst. type - ETH Domain      | 0.057 (0.019 to 0.095)
inst. type - UAS and Other   | 0.002 (-0.048 to 0.051)
text length (per 100 words)  | 0.051 (0.043 to 0.059)
Careers: 1'802 observations  |
Intercept                    | 0.46 (0.203 to 0.717)
gender - male                | 0.021 (-0.024 to 0.067)
age (years)                  | -0.003 (-0.009 to 0.004)
inst. type - ETH Domain      | 0.03 (-0.032 to 0.093)
inst. type - UAS and Other   | -0.111 (-0.233 to 0.01)
text length (per 100 words)  | 0.057 (0.045 to 0.068)
Pilot: 612 observations      |
Intercept                    | 0.489 (0.256 to 0.723)
gender - male                | 0.056 (-0.02 to 0.131)
age (years)                  | -0.005 (-0.01 to 0)
inst. type - ETH Domain      | 0.061 (-0.038 to 0.16)
inst. type - UAS and Other   | -0.088 (-0.207 to 0.031)
text length (per 100 words)  | 0.075 (0.048 to 0.102)

3 Model comparisons and tests

Poisson and negative binomial (NB) models are nested, since the Poisson model is a special case of the NB model. Hence, we can use a likelihood ratio test to compare the two model specifications. Additionally, we checked for overdispersion using the {performance} R package.
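
As a sketch of how the two nested count models compared below could be set up (the data frame pf_data is hypothetical; the list element names count and nbcount and the model formula follow the calls and output shown below):

library(MASS)        # glm.nb(): negative binomial regression
library(lmtest)      # lrtest(): likelihood ratio test
library(performance) # check_overdispersion()

# Hypothetical data frame 'pf_data' with one row per Project proposal.
pf_reg_ls <- list()
pf_reg_ls$count <- glm(
  sum_pos ~ ResponsibleApplicantGender + ResponsibleApplicantAgeAtSubmission +
    ResearchInstitutionType + text_length100,
  family = poisson, data = pf_data
)
pf_reg_ls$nbcount <- glm.nb(
  sum_pos ~ ResponsibleApplicantGender + ResponsibleApplicantAgeAtSubmission +
    ResearchInstitutionType + text_length100,
  data = pf_data
)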

# For Project funding
lrtest(pf_reg_ls$count, pf_reg_ls$nbcount) 
## Likelihood ratio test
## 
## Model 1: sum_pos ~ ResponsibleApplicantGender + ResponsibleApplicantAgeAtSubmission + 
##     ResearchInstitutionType + text_length100
## Model 2: sum_pos ~ ResponsibleApplicantGender + ResponsibleApplicantAgeAtSubmission + 
##     ResearchInstitutionType + text_length100
##   #Df  LogLik Df  Chisq Pr(>Chisq)    
## 1   6 -9025.4                         
## 2   7 -8545.2  1 960.32  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
check_overdispersion(pf_reg_ls$count)
## # Overdispersion test
## 
##        dispersion ratio =     1.769
##   Pearson's Chi-Squared = 10135.170
##                 p-value =   < 0.001
## Overdispersion detected.
# For Career funding
lrtest(career_reg_ls$count, career_reg_ls$nbcount) 
## Likelihood ratio test
## 
## Model 1: sum_pos ~ ResponsibleApplicantGender + ResponsibleApplicantAgeAtSubmission + 
##     ResearchInstitutionType + text_length100
## Model 2: sum_pos ~ ResponsibleApplicantGender + ResponsibleApplicantAgeAtSubmission + 
##     ResearchInstitutionType + text_length100
##   #Df  LogLik Df  Chisq Pr(>Chisq)    
## 1   6 -2974.4                         
## 2   7 -2804.5  1 339.91  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
check_overdispersion(career_reg_ls$count)
## # Overdispersion test
## 
##        dispersion ratio =    1.814
##   Pearson's Chi-Squared = 3258.285
##                 p-value =  < 0.001
## Overdispersion detected.
# For the pilot
lrtest(sparkabs_reg_ls$count, sparkabs_reg_ls$nbcount) 
## Likelihood ratio test
## 
## Model 1: sum_pos ~ ResponsibleApplicantGender + ResponsibleApplicantAgeAtSubmission + 
##     ResearchInstitutionType + text_length100
## Model 2: sum_pos ~ ResponsibleApplicantGender + ResponsibleApplicantAgeAtSubmission + 
##     ResearchInstitutionType + text_length100
##   #Df   LogLik Df  Chisq Pr(>Chisq)    
## 1   6 -1018.76                         
## 2   7  -975.61  1 86.294  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
check_overdispersion(sparkabs_reg_ls$count)
## # Overdispersion test
## 
##        dispersion ratio =    1.710
##   Pearson's Chi-Squared = 1036.237
##                 p-value =  < 0.001
## Overdispersion detected.

The performance of the models with logit and probit specifications was compared using a likelihood ratio test via the function test_performance() from the {performance} package. No evidence for a difference between the models’ performance was found.
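
The corresponding logit and probit specifications could be set up as follows; again pf_data is a hypothetical data frame, the binary outcome (at least one positive word) is an assumption based on TAB S1, and the list element names log and prob follow the calls shown below.

# Logit and probit models for the same binary outcome.
pf_reg_ls$log <- glm(
  as.integer(sum_pos > 0) ~ ResponsibleApplicantGender +
    ResponsibleApplicantAgeAtSubmission +
    ResearchInstitutionType + text_length100,
  family = binomial(link = "logit"), data = pf_data
)
# Same formula and data, with a probit link instead of a logit link.
pf_reg_ls$prob <- update(pf_reg_ls$log, family = binomial(link = "probit"))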

# For Project funding
test_performance(pf_reg_ls$log, pf_reg_ls$prob)
## Some of the nested models seem to be identical
## Name | Model |   Omega2 | p (Omega2) |    LR | p (LR)
## -----------------------------------------------------
## ..1  |   glm |          |            |       |       
## ..2  |   glm | 1.69e-05 |     < .001 | -1.36 | > .999
## Models were detected as nested (in terms of fixed parameters) and are compared in sequential order.
# For Career funding
test_performance(career_reg_ls$log, career_reg_ls$prob)
## Some of the nested models seem to be identical
## Name | Model |   Omega2 | p (Omega2) |    LR | p (LR)
## -----------------------------------------------------
## ..1  |   glm |          |            |       |       
## ..2  |   glm | 2.75e-05 |     < .001 | -0.68 | > .999
## Models were detected as nested (in terms of fixed parameters) and are compared in sequential order.
# For the pilot
test_performance(sparkabs_reg_ls$log, sparkabs_reg_ls$prob)
## Some of the nested models seem to be identical
## Name | Model |   Omega2 | p (Omega2) |    LR | p (LR)
## -----------------------------------------------------
## ..1  |   glm |          |            |       |       
## ..2  |   glm | 4.72e-05 |      0.035 | -0.13 |  0.802
## Models were detected as nested (in terms of fixed parameters) and are compared in sequential order.

4 Poisson regression model results
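
The rate ratios and confidence intervals in TAB S2 below can be obtained by exponentiating the Poisson coefficients and their profile-likelihood confidence intervals. A minimal sketch for the Project funding model, reusing the hypothetical pf_reg_ls$count object from the sketch in Section 3:

# Rate ratios (RR) with 95% profile-likelihood CIs for the Poisson model;
# confint() on a glm object profiles the likelihood.
rr_projects <- exp(cbind(
  RR = coef(pf_reg_ls$count),
  confint(pf_reg_ls$count, level = 0.95)
))
round(rr_projects, 3)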

TAB S2: Summary of the Poisson models using rate ratios (RR) with 95% confidence intervals (CI), by funding scheme. The reference category is ‘female’ for the gender variable and ‘Cantonal University’ for the institution (inst.) type variable.


Coefficient                  | RR Estimate (95% CI)
---------------------------- | ------------------------
Projects: 5'736 observations |
Intercept                    | 0.61 (0.514 to 0.724)
gender - male                | 1.044 (0.987 to 1.105)
age (years)                  | 0.996 (0.993 to 0.999)
inst. type - ETH Domain      | 1.132 (1.055 to 1.213)
inst. type - UAS and Other   | 1.05 (0.956 to 1.15)
text length (per 100 words)  | 1.176 (1.159 to 1.192)
Careers: 1'802 observations  |
Intercept                    | 0.628 (0.398 to 0.995)
gender - male                | 1.077 (0.992 to 1.17)
age (years)                  | 0.998 (0.986 to 1.01)
inst. type - ETH Domain      | 1 (0.893 to 1.117)
inst. type - UAS and Other   | 0.763 (0.596 to 0.961)
text length (per 100 words)  | 1.192 (1.169 to 1.215)
Spark: 612 observations      |
Intercept                    | 1.059 (0.68 to 1.649)
gender - male                | 1.093 (0.953 to 1.255)
age (years)                  | 0.985 (0.975 to 0.994)
inst. type - ETH Domain      | 1.152 (0.97 to 1.362)
inst. type - UAS and Other   | 0.906 (0.714 to 1.135)
text length (per 100 words)  | 1.216 (1.157 to 1.279)