Alexandru Marcoci

University of Cambridge


I am a Senior Research Associate in AI Risk and Foresight at the Centre for the Study of Existential Risk (CSER), a Research Fellow at Clare Hall, University of Cambridge, and a UKRI Policy Fellow in the Department for Science, Innovation and Technology. I am also a Steering Committee member of the Cambridge Centre for Data-Driven Discovery. Most of my work focuses on collective decision-making and argumentation about long-run risks and AI policy.

Before coming to CSER, I was a Research Associate in Linguistics and Argumentation Theory at the Centre for Argument Technology, University of Dundee; a Teaching Assistant Professor and core faculty member in the Philosophy, Politics and Economics Program at the University of North Carolina, Chapel Hill; and a Fellow in Political Theory in the Department of Government at the London School of Economics and Political Science.

I have a PhD in Philosophy from the London School of Economics and Political Science (2018).


Institute for Replication

I am a Co-Director of the Institute for Replication (I4R). I4R works to improve the credibility of science by systematically reproducing and replicating research findings published in leading academic journals. Our team collaborates with researchers to promote and generate reproductions and replications through one-day hackathons, establish an open access website, prepare standardized file structures, code and documentation, and develop educational materials on replication. From 2024, we have exciting new collaborations with Nature Human Behaviour and Psychological Science.

Next Generation Event Horizon Telescope Collaboration

I am a member of the History, Philosophy, and Culture Working Group of the next-generation Event Horizon Telescope Collaboration. We contribute social science and humanities perspectives on responsible telescope siting, outreach, education, foundations, algorithms, inferences, visualizations, governance structures and knowledge formation in scientific collaborations. I co-lead the Collaborations Focus Group and the task force on managing dissent in large scientific collaborations, and I lead the expert forecasting task force.

Collaborative Assessments for Trustworthy Science

I collaborated with the DARPA-funded repliCATS project on using structured expert elicitation techniques to predict the reliability of research in the social and behavioural sciences. The project received the University of Melbourne Research Excellence Award for Interdisciplinary Research in 2022.

Research Funding

British Academy

I am Principal Investigator of the project "Measuring the quality of collective reasoning", funded by the British Academy/Leverhulme Small Research Grants scheme (2023-2024). The Co-Investigator is Ans Vercammen (University of Queensland).

UK Research and Innovation

I am a UKRI Policy Fellow working on the Future of Online Regulation on part-time secondment to the Department for Science, Innovation and Technology (2023-2025).

Open Philanthropy

I am a Co-Investigator on the project "Benchmarking LLM agents on real-world tasks: Reproducibility", funded by Open Philanthropy (2024-2025). The other investigators are Abel Brodeur (University of Ottawa/Institute for Replication) and Rohan Alexander (University of Toronto).


Alexandru Marcoci, David P. Wilkinson, Ans Vercammen, Bonnie C. Wintle, Anna Lou Abatayo, Ernest Baskin, Henk Berkman, Erin M. Buchanan, Sara Capitán, Tabaré Capitán, Ginny Chan, Kent Jason G. Cheng, Tom Coupé, Sarah Dryhurst, Jianhua Duan, John E. Edlund, Timothy M. Errington, Anna Fedor, Fiona Fidler, James G. Field, Nicholas Fox, Hannah Fraser, Alexandra LJ Freeman, Anca Hanea, Felix Holzmeister, Sanghyun Hong, Raquel Huggins, Nick Huntington-Klein, Magnus Johannesson, Angela M. Jones, Hansika Kapoor, John Kerr, Melissa Kline Struhl, Marta Kołczyńska, Yang Liu, Zachary Loomas, Brianna Luis, Esteban Méndez, Olivia Miske, Fallon Mody, Carolin Nast, Brian A. Nosek, E. Simon Parsons, Thomas Pfeiffer, W. Robert Reed, Jon Roozenbeek, Alexa R. Schlyfestone, Claudia R. Schneider, Andrew Soh, Anirudh Tagat, Melba Tutor, Andrew Tyner, Karolina Urbanska, Sander van der Linden. (2024). Predicting the replicability of social and behavioural science claims in a crisis: The COVID-19 Preprint Replication Project. Conditionally accepted in Nature Human Behaviour

Replications are important for assessing the reliability of published findings. However, they are costly, and it is infeasible to replicate everything. Accurate, fast, lower-cost alternatives such as eliciting predictions could accelerate assessment for rapid policy implementation in a crisis. We elicited judgments from participants on 100 claims from preprints about an emerging area of research (COVID-19 pandemic) using an interactive structured elicitation protocol, and we conducted 29 new high-powered replications. After interacting with their peers, participant groups with lower task expertise (‘beginners’) updated their estimates and confidence in their judgements significantly more than groups with greater task expertise (‘experienced’). For experienced individuals, the average accuracy was 0.57 (95% CI: [0.53, 0.61]) after interaction, and they correctly classified 61% of claims; beginners’ average accuracy was 0.58 (95% CI: [0.54, 0.62]), correctly classifying 69% of claims. The difference in accuracy between groups was not statistically significant, and their judgments on the full set of claims were correlated (r=.48). These results suggest that both beginners and more experienced participants using a structured process have some ability to make better-than-chance predictions about the reliability of ‘fast science’ under conditions of high uncertainty. However, given the importance of such assessments for making evidence-based critical decisions in a crisis, more research is required to understand who the right experts in forecasting replicability are and how their judgements ought to be elicited.

Alexandru Marcoci, Margaret E. Webb, Luke Rowe, Ashley Barnett, Tamar Primoratz, Ariel Kruger, Christopher W. Karvetski, Benjamin Stone, Michael L. Diamond, Morgan Saletta, Tim van Gelder, Philip E. Tetlock, Simon Dennis. (2023). Validating a forced choice method for eliciting quality of reasoning judgments. Behavior Research Methods

In this paper we investigate the criterion validity of forced choice comparisons of the quality of written arguments with normative solutions. Across two studies, assessing quality of reasoning through a forced choice design enabled both novices and experts to choose arguments supporting more accurate solutions – 62.2% (SE=1%) of the time for novices and 74.4% (SE=1%) for experts – and arguments produced by larger teams – up to 82% of the time for novices and 85% for experts. Inter-rater reliability was high: 70.58% agreement (95% CI = 1.18) for novices and 80.98% (95% CI = 2.26) for experts. We also explored two methods for increasing efficiency. We found that the number of comparative judgments needed can be substantially reduced with little accuracy loss by leveraging transitivity and producing quality of reasoning assessments using an AVL tree method. Moreover, a regression model trained to predict scores based on automatically derived linguistic features of participants’ judgments achieved a high correlation with the objective accuracy scores of the arguments in our dataset. Despite the inherent subjectivity involved in evaluating differing quality of reasoning, the forced choice paradigm allows even novice raters to perform beyond chance and can provide a valid, reliable and efficient method for producing quality of reasoning assessments at scale.
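The efficiency idea above, that transitivity lets a balanced binary search tree build a full ranking from far fewer forced-choice comparisons than judging every pair, can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: `rank` and `judge` are hypothetical names, and `judge(a, b)` stands in for a human rater deciding whether argument `a` is better justified than `b` (here it simply compares numeric scores). Because each comparison with a tree node places the new item relative to that node's whole subtree, one insertion costs only about as many judgements as the tree is tall.

```python
from typing import Callable, List, Optional, Tuple


class Node:
    """One argument stored in an AVL tree, ordered by judged quality."""

    def __init__(self, item):
        self.item = item
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.height = 1


def _h(node):
    return node.height if node else 0


def _fix_height(node):
    node.height = 1 + max(_h(node.left), _h(node.right))


def _rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    _fix_height(y)
    _fix_height(x)
    return x


def _rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    _fix_height(x)
    _fix_height(y)
    return y


def _rebalance(node):
    _fix_height(node)
    balance = _h(node.left) - _h(node.right)
    if balance > 1:  # left subtree too tall
        if _h(node.left.left) < _h(node.left.right):
            node.left = _rotate_left(node.left)  # left-right case
        return _rotate_right(node)
    if balance < -1:  # right subtree too tall
        if _h(node.right.right) < _h(node.right.left):
            node.right = _rotate_right(node.right)  # right-left case
        return _rotate_left(node)
    return node


def _insert(node, item, better):
    if node is None:
        return Node(item)
    # One forced-choice judgement per level of the tree visited.
    if better(item, node.item):
        node.right = _insert(node.right, item, better)
    else:
        node.left = _insert(node.left, item, better)
    return _rebalance(node)


def rank(items: List, judge: Callable) -> Tuple[List, int]:
    """Order items worst-to-best; return the ranking and the judgements used."""
    calls = 0

    def better(a, b):
        nonlocal calls
        calls += 1
        return judge(a, b)

    root = None
    for item in items:
        root = _insert(root, item, better)

    ordered = []

    def walk(node):  # in-order traversal yields the ranking
        if node:
            walk(node.left)
            ordered.append(node.item)
            walk(node.right)

    walk(root)
    return ordered, calls


# Hypothetical "quality" scores stand in for human forced-choice judgements.
scores = [7, 3, 9, 1, 5, 8, 2, 6, 4, 0, 15, 11, 13, 10, 12, 14]
ordered, used = rank(scores, lambda a, b: a > b)
print(ordered == sorted(scores), used, len(scores) * (len(scores) - 1) // 2)
```

The last line compares the judgements actually used with the 120 needed to compare all pairs of 16 items; with inconsistent human raters the tree still yields a ranking, but its quality then depends on how transitive the judgements are.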

Lexin Zhou, Pablo A. Moreno-Casares, Fernando Martínez-Plumed, John Burden, Ryan Burnell, Lucy Cheke, Cèsar Ferri, Alexandru Marcoci, Behzad Mehrbakhsh, Yael Moros-Daval, Seán Ó hÉigeartaigh, Danaja Rutar, Wout Schellaert, Konstantinos Voudouris, José Hernández-Orallo. (2023). Predictable Artificial Intelligence. arXiv

We introduce the fundamental ideas and challenges of "Predictable AI", a nascent research area that explores the ways in which we can anticipate key indicators of present and future AI ecosystems. We argue that achieving predictability is crucial for fostering trust, liability, control, alignment and safety of AI ecosystems, and thus should be prioritised over performance. While distinct from other areas of technical and non-technical AI research, the questions, hypotheses and challenges relevant to "Predictable AI" were yet to be clearly described. This paper aims to elucidate them, calls for identifying paths towards AI predictability and outlines the potential impact of this emergent field.

Alexandra Oprea and Alexandru Marcoci. (2023). How Should Colleges Select Students? Justice, Toleration, and University Admissions. Forthcoming in the Georgetown Journal of Law & Public Policy

As undergraduate education becomes a key formative experience for a larger percentage of the population, it is imperative that political philosophers consider the role of universities in bringing about a more just society. In this paper, we contribute to this task by assessing which university admissions policies are compatible with justice and conducive to the epistemic and civic missions of the university. Scholars agree that universities require a tolerant campus culture, but concrete proposals have focused on interventions at the level of faculty and administrators. The empirical literature, however, shows that students are more influenced by reputational consequences among their peers. We therefore argue that universities should also attend to the selection of the student body. We consider and reject a popular proposal that colleges should select students with underrepresented moral and political beliefs to increase viewpoint diversity. Instead, we propose directly weighing students’ tolerance and open-mindedness in the admission process.

Alexandru Marcoci, Ann C. Thresher, Niels C. M. Martens, Peter Galison, Sheperd S. Doeleman, Michael D. Johnson. (2023). Big STEM collaborations should include humanities and social science. Nature Human Behaviour 7: 1229-1230

Seán Ó hÉigeartaigh, Yolanda Lannquist, Alexandru Marcoci, Jaime Sevilla, Mónica Alejandra Ulloa Ruiz, Yaqub Chaudhary, Tim Schreier, Zach Stein-Perlman and Jeffrey Ladish. (2023). Do companies’ AI Safety Policies meet government best practice? Leverhulme Centre for the Future of Intelligence

Rapid review finds leading AI companies are not meeting UK Government best practice for frontier AI safety.

Bonnie C. Wintle, Eden T. Smith, Martin Bush, Fallon Mody, David P. Wilkinson, Anca M. Hanea, Alexandru Marcoci, Hannah Fraser, Victoria Hemming, Felix Singleton Thorn, Marissa F. McBride, Elliot Gould, Andrew Head, Daniel G. Hamilton, Steven Kambouris, Libby Rumpff, Rink Hoekstra, Mark A. Burgman, Fiona Fidler. (2023). Predicting and reasoning about replicability using structured groups. Royal Society Open Science 10(6): 221553

This paper explores judgements about the replicability of social and behavioural sciences research and what drives those judgements. Using a mixed methods approach, it draws on qualitative and quantitative data elicited from groups using a structured approach called the IDEA protocol (‘Investigate’, ‘Discuss’, ‘Estimate’ and ‘Aggregate’). Five groups of five people with relevant domain expertise evaluated 25 research claims that were subject to at least one replication study. Participants assessed the probability that each of the 25 research claims would replicate (i.e., that a replication study would find a statistically significant result in the same direction as the original study) and described the reasoning behind those judgements. We quantitatively analysed possible correlates of predictive accuracy, including self-rated expertise and updating of judgements after feedback and discussion. We qualitatively analysed the reasoning data to explore the cues, heuristics and patterns of reasoning used by participants. Participants achieved 84% classification accuracy in predicting replicability. Those who engaged in a greater breadth of reasoning provided more accurate replicability judgements. Some reasons were more commonly invoked by more accurate participants, such as ‘effect size’ and ‘reputation’ (e.g. of the field of research). There was also some evidence of a relationship between statistical literacy and accuracy.

Michael D. Johnson, Kazunori Akiyama, Lindy Blackburn, Katherine L. Bouman, Avery E. Broderick, Vitor Cardoso, Rob Fender, Christian Fromm, Peter Galison, Jose L. Gómez, Daryl Haggard, Matthew L. Lister, Andrei Lobanov, Sera Markoff, Ramesh Narayan, Priyamvada Natarajan, Tiffany Nichols, Dominic W. Pesce, Ziri Younsi, Andrew Chael, Koushik Chatterjee, Ryan Chaves, Juliusz Doboszewski, Richard Dodson, Sheperd S. Doeleman, Jamee Elder, Garret Fitzpatrick, Kari Haworth, Janice Houston, Sara Issaoun, Yuri Kovalev, Aviad Levis, Rocco Lico, Alexandru Marcoci, Niels C.M. Martens, Neil Nagar, Aaron Oppenheimer, Daniel C. M. Palumbo, Angelo Ricarte, María J. Rioja, Freek Roelofs, Ann C. Thresher, Paul Tiede, Jonathan Weintroub, Maciek Wielgus. (2023). Key Science Goals for the Next-Generation Event Horizon Telescope. Galaxies 11(3): 61 (SI: From Vision to Instrument - Creating a Next-Generation Event Horizon Telescope for a New Era of Black Hole Science)

The Event Horizon Telescope (EHT) has led to the first images of a supermassive black hole, revealing the central compact objects in the elliptical galaxy M87 and the Milky Way. Proposed upgrades to this array through the next-generation EHT (ngEHT) program would sharply improve the angular resolution, dynamic range, and temporal coverage of the existing EHT observations. These improvements will uniquely enable a wealth of transformative new discoveries related to black hole science, extending from event-horizon-scale studies of strong gravity to studies of explosive transients to the cosmological growth and influence of supermassive black holes. Here, we present the key science goals for the ngEHT and their associated instrument requirements, both of which have been formulated through a multi-year international effort involving hundreds of scientists worldwide.

Mark Burgman, Rafael Chiaravalloti, Fiona Fidler, Yizhong Huan, Marissa McBride, Alexandru Marcoci, Juliet Norman, Ans Vercammen, Bonnie C. Wintle and Yurong Yu. (2023). A toolkit for open and pluralistic conservation science. Conservation Letters 16(1): e12919

Conservation science practitioners seek to pre-empt irreversible impacts on species, ecosystems, and social-ecological systems, requiring efficient and timely action even when data and understanding are unavailable, incomplete, dated, or biased. These challenges are exacerbated by the scientific community's limited capacity to consistently distinguish between reliable and unreliable evidence, including the recognition of questionable research practices (QRPs, or ‘questionable practices’), which may threaten the credibility of research, including harming trust in well-designed and reliable scientific research. In this paper, we propose a ‘toolkit’ for open and pluralistic conservation science, highlighting common questionable practices and sources of bias and indicating where remedies for these problems may be found. The toolkit provides an accessible resource for anyone conducting, reviewing, or using conservation research, to identify sources of false claims or misleading evidence that arise unintentionally, or through misunderstandings or carelessness in the application of scientific methods and analyses. We aim to influence editorial and review practices and hopefully to remedy problems before they are published or deployed in policy or conservation practice.

Peter Galison, Juliusz Doboszewski, Jamee Elder, Niels C.M. Martens, Abhay Ashtekar, Jonas Enander, Marie Gueguen, Elizabeth A. Kessler, Roberto Lalli, Martin Lesourd, Alexandru Marcoci, Sebastián Murgueitio Ramírez, Priyamvada Natarajan, James Nguyen, Luis Reyes-Galindo, Sophie Ritson, Mike D. Schneider, Emilie Skulberg, Helene Sorgner, Matthew Stanley, Ann C. Thresher, Jeroen van Dongen, James Owen Weatherall, Jingyi Wu, Adrian Wüthrich. (2023). The Next Generation Event Horizon Telescope Collaboration: History, Philosophy, and Culture. Galaxies 11(1): 32 (SI: From Vision to Instrument - Creating a Next-Generation Event Horizon Telescope for a New Era of Black Hole Science)

This white paper outlines the plans of the History, Philosophy, and Culture Working Group of the Next Generation Event Horizon Telescope Collaboration.

Hannah Fraser, Martin Bush, Bonnie Wintle, Fallon Mody, Eden Smith, Anca Hanea, Elliot Gould, Victoria Hemming, Dan Hamilton, Libby Rumpff, David Peter Wilkinson, Ross Pearson, Felix Singleton Thorn, Raquel Ashton, Aaron Willcox, Charles T Gray, Andrew Head, Melissa Ross, Rebecca Groenewegen, Alexandru Marcoci, Ans Vercammen, Timothy H Parker, Rink Hoekstra, Shinichi Nakagawa, David R Mandel, Don van Ravenzwaaij, Marissa McBride, Richard O Sinnott, Peter Vesk, Mark Burgman and Fiona Fidler. (2023). Predicting reliability through structured expert elicitation with the repliCATS (Collaborative Assessments for Trustworthy Science) process. PLoS ONE 18(1): e0274429

As replications of individual studies are resource intensive, techniques for predicting replicability are required. We introduce the repliCATS (Collaborative Assessments for Trustworthy Science) process, a new method for eliciting expert predictions about the replicability of research. This process is a structured expert elicitation approach based on a modified Delphi technique applied to the evaluation of research claims in social and behavioural sciences. The utility of processes to predict replicability is their capacity to test scientific claims without the costs of full replication. Experimental data supports the validity of this process, with accuracy that meets or exceeds that of other techniques used to predict replicability while providing additional benefits. The repliCATS process is highly scalable, able to be deployed for both rapid assessment of small numbers of claims, and assessment of high volumes of claims over an extended period through an online elicitation platform. It is available to be implemented in a range of ways and we describe one such implementation. An important advantage of the repliCATS process is that it collects qualitative data that has the potential to assist with problems like understanding the limits of generalizability of scientific claims. The primary limitation of the repliCATS process is its reliance on human-derived predictions with consequent costs in terms of participant fatigue, although careful design can minimise these costs. The repliCATS process has potential applications in alternative peer review and in the allocation of effort for replication studies.

Luc Bovens and Alexandru Marcoci. (2023). The Gender-Neutral Bathroom: A New Frame and Some Nudges. Behavioural Public Policy 7(1), 1-24

Gender-neutral bathrooms are usually framed as an accommodation for trans and other gender non-conforming individuals. In this paper we show that the benefits of gender-neutral bathrooms are much broader. First, our simulations show that gender-neutral bathrooms reduce average waiting times: while waiting times for women go down invariably, waiting times for men either go down or slightly increase depending on usage intensity, occupancy time differentials, and the presence of urinals. Second, our result can be turned on its head: firms have an opportunity to reduce the number of facilities and cut costs by making them all gender-neutral without increasing waiting times. These observations can be used to reframe the gender-neutral bathrooms debate so that they appeal to a larger constituency, cutting across the usual dividing lines in the “bathroom wars”. Finally, there are improved designs and behavioural strategies that can help overcome resistance. We explore what strategies can be invoked to mitigate the objections that gender-neutral bathrooms (1) are unsafe; (2) elicit discomfort; and (3) are unhygienic.

Alexandru Marcoci, Margaret E. Webb, Luke Rowe, Ashley Barnett, Tamar Primoratz, Ariel Kruger, Benjamin Stone, Morgan Saletta, Tim van Gelder, Simon Dennis. (2022). Measuring Quality of General Reasoning. In J. Culbertson, A. Perfors, H. Rabagliati & V. Ramenzoni (Eds.), Proceedings of the 44th Annual Conference of the Cognitive Science Society (CogSci 2022), 3229-3235

Machine learning models that automatically assess reasoning quality are trained on human-annotated written products. These “gold-standard” corpora are typically created by prompting annotators to choose, using a forced choice design, which of two products presented side by side is more convincing, contains stronger evidence or would be adopted by more people. Despite the increase in popularity of using a forced choice design for assessing quality of reasoning (QoR), no study to date has established the validity and reliability of such a method. In two studies, we simultaneously presented two products of reasoning to participants and asked them to identify which product was ‘better justified’ through a forced choice design. We investigated the criterion validity and inter-rater reliability of the forced choice protocol by assessing the relationship between QoR, measured using the forced choice protocol, and accuracy in objectively answerable problems using naive raters sampled from MTurk (Study 1) and experts (Study 2), respectively. In both studies products that were closer to the correct answer and products generated by larger teams were consistently preferred. Experts were substantially better at picking the reasoning products that corresponded to accurate answers. Perhaps the most surprising finding was just how rapidly raters made judgements regarding reasoning: On average, both novices and experts made reliable decisions in under 15 seconds. We conclude that forced choice is a valid and reliable method of assessing QoR.

Alexandru Marcoci, Ans Vercammen, Martin Bush, Daniel Hamilton, Anca Hanea, Victoria Hemming, Bonnie C. Wintle, Mark Burgman and Fiona Fidler. (2022). Reimagining peer review as an expert elicitation process. BMC Research Notes 15, 127 (SI: Reproducibility and Research Integrity)

Journal peer review regulates the flow of ideas through an academic discipline and thus has the power to shape what a research community knows, actively investigates, and recommends to policymakers and the wider public. We might assume that editors can identify the ‘best’ experts and rely on them for peer review. But decades of research on both expert decision-making and peer review suggest they cannot. In the absence of a clear criterion for demarcating reliable, insightful, and accurate expert assessors of research quality, the best safeguard against unwanted biases, uneven power distributions and general inefficiencies is to introduce greater transparency and structure into the process. This paper argues that peer review would therefore benefit from applying a series of evidence-based recommendations from the empirical literature on structured expert elicitation. We highlight individual and group characteristics that contribute to higher quality judgements, and elements of elicitation protocols that reduce bias, promote constructive discussion, and enable opinions to be objectively and transparently aggregated.

Ans Vercammen, Alexandru Marcoci and Mark Burgman. (2021). Pre-screening workers to overcome bias amplification in online labour markets. PLoS ONE 16(3), e0249051

Groups have access to more diverse information and typically outperform individuals on problem solving tasks. Crowdsolving utilises this principle to generate novel and/or superior solutions to intellective tasks by pooling the inputs from a distributed online crowd. However, it is unclear whether this particular instance of “wisdom of the crowd” can overcome the influence of potent cognitive biases that habitually lead individuals to commit reasoning errors. We empirically test the prevalence of cognitive bias on a popular crowdsourcing platform, examining susceptibility to bias of online panels at the individual and aggregate levels. We then investigate the use of the Cognitive Reflection Test, notable for its predictive validity for real-life reasoning, as a screening tool to improve collective performance. We find that systematic biases in crowdsourced answers are not as prevalent as anticipated, but when they occur, biases are amplified with increasing group size, as predicted by the Condorcet Jury Theorem. The results further suggest that pre-screening individuals with the Cognitive Reflection Test can substantially enhance collective judgement and improve crowdsolving performance.
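The bias-amplification finding rests on the Condorcet Jury Theorem: if voters are independent and each is correct with probability p, the accuracy of a simple majority rises toward 1 as the group grows when p > 0.5, but falls toward 0 when p < 0.5, as happens on questions where a cognitive bias misleads most individuals. The sketch below is a minimal illustration assuming independent voters, an odd group size and simple-majority aggregation; `majority_correct` is a hypothetical helper, and the study's own aggregation may differ.

```python
from math import comb


def majority_correct(p: float, n: int) -> float:
    """Probability that a simple majority of n independent voters is correct,
    when each voter is independently correct with probability p (n odd)."""
    if n % 2 == 0:
        raise ValueError("use an odd group size to avoid ties")
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


# p = 0.6: individuals usually right; p = 0.4: a bias misleads most individuals.
for p in (0.6, 0.4):
    row = [round(majority_correct(p, n), 3) for n in (1, 11, 51, 101)]
    print(p, row)
```

With p = 0.6 the printed accuracies increase with group size; with p = 0.4 they decrease, which is the amplification pattern described in the abstract.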

Diana Popescu and Alexandru Marcoci. (2020). Coronavirus: allocating ICU beds and ventilators based on age is discriminatory. The Conversation, April 22

Being a member of a certain age group shouldn't be a liability.

Alexandru Marcoci and James Nguyen. (2020). Judgement aggregation in scientific collaborations: The case for waiving expertise. Studies in History and Philosophy of Science Part A 84, 66-74

The fragmentation of academic disciplines forces individuals to specialise. In doing so, they become experts over their narrow area of research. However, ambitious scientific projects, such as the search for gravitational waves, require them to come together and collaborate across disciplinary borders. How should scientists with expertise in different disciplines treat each others' expert claims? An intuitive answer is that the collaboration should defer to the opinions of experts. In this paper we show that under certain seemingly innocuous assumptions, this intuitive answer gives rise to an impossibility result when it comes to aggregating the beliefs of experts to deliver the beliefs of a collaboration as a whole. We then argue that when experts' beliefs come into conflict, they should waive their expert status.

Alexandru Marcoci. (2020). Monty Hall saves Dr. Evil: On Elga's restricted principle of indifference. Erkenntnis 85(1), 65-76

In this paper I show that Elga's argument for a restricted principle of indifference for self-locating belief relies on the kind of mistaken reasoning that recommends the 'staying' strategy in the Monty Hall problem.
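For readers unfamiliar with the Monty Hall problem, an exact enumeration shows why 'staying' is the mistaken strategy under the standard rules, which are assumed here: the host always opens an unchosen door hiding a goat and picks at random when two such doors are available. This sketch illustrates only the classic puzzle, not the paper's argument about Elga's principle; `win_probabilities` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product


def win_probabilities():
    """Exact win probabilities for 'stay' and 'switch' in the standard Monty Hall game."""
    doors = (0, 1, 2)
    stay = switch = Fraction(0)
    for car, pick in product(doors, doors):
        weight = Fraction(1, 9)  # car position and initial pick uniform, independent
        # The host opens a goat door other than the contestant's pick,
        # choosing uniformly when two such doors are available.
        options = [d for d in doors if d != pick and d != car]
        for opened in options:
            w = weight / len(options)
            other = next(d for d in doors if d not in (pick, opened))
            stay += w * (pick == car)
            switch += w * (other == car)
    return stay, switch


print(win_probabilities())  # (Fraction(1, 3), Fraction(2, 3))
```

Staying wins only when the first pick was right (probability 1/3); the host's forced choice of door transfers the remaining 2/3 to the unopened alternative.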

Gregg Willcox, Louis Rosenberg, Mark Burgman and Alexandru Marcoci. (2020). Prioritizing Policy Objectives in Polarized Societies using Artificial Swarm Intelligence. In the Proceedings of the IEEE Conference on Cognitive and Computational Aspects of Situation Management (CogSIMA 2020), 1-9

Groups often struggle to reach decisions, especially when populations are strongly divided by conflicting views. Traditional methods for collective decision-making involve polling individuals and aggregating results. In recent years, a new method called Artificial Swarm Intelligence (ASI) has been developed that enables networked human groups to deliberate in real time through systems moderated by artificial intelligence algorithms. While traditional voting methods aggregate input provided by isolated participants, Swarm-based methods enable participants to influence each other and converge on solutions together. In this study we compare the output of traditional methods such as Majority vote and Borda count to the Swarm method on a set of divisive policy issues. We find that the rankings generated using ASI and the Borda Count methods are often rated as significantly more satisfactory than those generated by the Majority vote system (p<0.05). This result held for both the population that generated the rankings (the “in-group”) and the population that did not (the “out-group”): the in-group ranked the Swarm prioritizations as 9.6% more satisfactory than the Majority prioritizations, while the out-group ranked the Swarm prioritizations as 6.5% more satisfactory than the Majority prioritizations. This effect also held even when the out-group was subject to a demographic sampling bias of 10% (i.e. the out-group was composed of 10% more Labour voters than the in-group). The Swarm method was the only method to be perceived as more satisfactory to the “out-group” than the voting group.
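The contrast between 'Majority vote' and 'Borda count' rankings can be illustrated with a small profile of ranked ballots. This minimal sketch assumes that 'Majority vote' means ranking options by first-preference counts (plurality); the study's exact operationalization, and the Swarm method itself, are not reproduced here. The ballots and the helper names are hypothetical. In this profile A wins on first preferences, while B, which every voter ranks first or second, wins under Borda.

```python
from collections import Counter


def plurality_scores(ballots):
    """Count first preferences only."""
    return Counter(ballot[0] for ballot in ballots)


def borda_scores(ballots):
    """Give m-1 points for a first place, m-2 for second, ..., 0 for last."""
    scores = Counter()
    for ballot in ballots:
        m = len(ballot)
        for position, option in enumerate(ballot):
            scores[option] += m - 1 - position
    return scores


# Hypothetical profile: 7 voters ranking three policy options A, B, C (best first).
ballots = 3 * [("A", "B", "C")] + 2 * [("B", "C", "A")] + 2 * [("C", "B", "A")]
print(plurality_scores(ballots).most_common())
print(borda_scores(ballots).most_common())
```

Borda rewards broadly acceptable options rather than only first choices, which is one reason it can produce rankings that a divided population finds more satisfactory.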

Alexandru Marcoci and James Nguyen. (2019). Objectivity, ambiguity and theory choice. Erkenntnis 84(2), 343–357

Kuhn argued that scientific theory choice is, in some sense, a rational matter, but one that is not fully determined by shared objective scientific virtues like accuracy, simplicity, and scope. Okasha imports Arrow's impossibility theorem into the context of theory choice to show that rather than not fully determining theory choice, these virtues cannot determine it at all. If Okasha is right, then there is no function (satisfying certain desirable conditions) from 'preference' rankings supplied by scientific virtues over competing theories (or models, or hypotheses) to a single all-things-considered ranking. This threatens the rationality of science. In this paper we show that if Kuhn's claims about the role that subjective elements play in theory choice are taken seriously, then the threat dissolves.
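The aggregation problem Okasha raises can be glimpsed in its simplest form, the Condorcet paradox: if each virtue ranks the competing theories and the rankings are combined by pairwise majority, the collective ranking can cycle, leaving no all-things-considered best theory. The virtue rankings below are hypothetical, and pairwise majority is only one aggregation rule; Arrow's theorem concerns every rule satisfying its conditions, so this is an illustration rather than a proof.

```python
from itertools import combinations

# Hypothetical rankings of three theories by three virtues (best first).
virtue_rankings = {
    "accuracy": ["T1", "T2", "T3"],
    "simplicity": ["T2", "T3", "T1"],
    "scope": ["T3", "T1", "T2"],
}


def pairwise_majority(rankings):
    """Return the set of (winner, loser) pairs under pairwise majority."""
    theories = next(iter(rankings.values()))
    wins = set()
    for a, b in combinations(theories, 2):
        a_votes = sum(r.index(a) < r.index(b) for r in rankings.values())
        b_votes = len(rankings) - a_votes
        wins.add((a, b) if a_votes > b_votes else (b, a))
    return wins


print(sorted(pairwise_majority(virtue_rankings)))
```

Here T1 beats T2, T2 beats T3 and T3 beats T1, so the majority relation is intransitive.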

Alexandru Marcoci, Ans Vercammen and Mark Burgman. (2019). ODNI as an analytic ombudsman: Is Intelligence Community Directive 203 up to the task? Intelligence and National Security 34(2), 205-224

In the wake of 9/11 and the war in Iraq, the Office of the Director of National Intelligence adopted Intelligence Community Directive (ICD) 203 – a list of analytic tradecraft standards – and appointed an ombudsman charged with monitoring their implementation. In this paper, we identify three assumptions behind ICD203: (1) tradecraft standards can be employed consistently; (2) tradecraft standards sufficiently capture the key elements of good reasoning; and (3) good reasoning leads to more accurate judgments. We then report on two controlled experiments that uncover operational constraints in the reliable application of the ICD203 criteria for the assessment of intelligence products.

Alexandru Marcoci, Mark Burgman, Ariel Kruger, Elizabeth Silver, Marissa McBride, Felix Singleton Thorn, Hannah Fraser, Bonnie Wintle, Fiona Fidler and Ans Vercammen. (2019). Better together: Reliable application of the post-9/11 and post-Iraq US intelligence tradecraft standards requires collective analysis. Frontiers in Psychology 9, 2634 (SI: Judgment and Decision Making Under Uncertainty)

Background. The events of 9/11 and the October 2002 National Intelligence Estimate on Iraq's Continuing Programs for Weapons of Mass Destruction precipitated fundamental changes within the US Intelligence Community. As part of the reform, analytic tradecraft standards were revised and codified into a policy document – Intelligence Community Directive (ICD) 203 – and an analytic ombudsman was appointed in the newly created Office of the Director of National Intelligence to ensure compliance across the intelligence community. In this paper we investigate the untested assumption that the ICD203 criteria can facilitate reliable evaluations of analytic products.
Method. Fifteen independent raters used a rubric based on the ICD203 criteria to assess the quality of reasoning of 64 analytical reports generated in response to hypothetical intelligence problems. We calculated the intra-class correlation coefficients for single and group-aggregated assessments.
Results. Despite general training and rater calibration, the reliability of individual assessments was poor. However, aggregate ratings showed good to excellent reliability.
Conclusions. Given that real problems will be more difficult and complex than our hypothetical case studies, we advise that groups of at least three raters are required to obtain reliable quality control procedures for intelligence products. Our study sets limits on assessment reliability and provides a basis for further evaluation of the predictive validity of intelligence reports generated in compliance with the tradecraft standards.
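The contrast between poor single-rater reliability and good aggregate reliability follows from how intra-class correlations behave when ratings are averaged. The sketch below assumes a one-way random-effects ICC (the study may have used a different ICC form) and uses hypothetical ratings; `icc_oneway` and `spearman_brown` are illustrative helpers. For this ICC form, the reliability of the mean of k raters equals the Spearman-Brown projection of single-rater reliability, which is the logic behind recommending groups of raters.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC from an n-targets x k-raters matrix.

    Returns (ICC(1,1), ICC(1,k)): reliability of a single rater and of the
    mean of k raters.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    average = (ms_between - ms_within) / ms_between
    return single, average


def spearman_brown(r, k):
    """Predicted reliability of the mean of k raters, given single-rater reliability r."""
    return k * r / (1 + (k - 1) * r)


# Hypothetical scores: 5 reports, each rated by the same 3 raters.
ratings = [[3, 4, 2], [5, 4, 5], [2, 1, 3], [4, 5, 3], [1, 2, 2]]
single, average = icc_oneway(ratings)
print(round(single, 2), round(average, 2))
print(round(spearman_brown(single, 3), 2))  # equals ICC(1,k) for the one-way ICC
```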

Alexandru Marcoci. (2018). On a dilemma of redistribution. Dialectica 72(3), 453-460

McKenzie Alexander presents a dilemma for a social planner who wants to correct the unfair distribution of an indivisible good between two equally worthy individuals or groups: either she guarantees a fair outcome, or she follows a fair procedure (but not both). In this paper I show that this dilemma only holds if the social planner can redistribute the good in question at most once. To wit, the bias of the initial distribution always washes out when we allow for sufficiently many redistributions.

Luc Bovens and Alexandru Marcoci. (2018). Gender-neutral restrooms require new (choice) architecture. Behavioural Public Policy Blog, April 17

"What’s not to love about gender-neutral restrooms?" ask Bovens and Marcoci. Their spread could only come about through a sensitive mix of good design and nudges that work on social norms and behaviours. Some discomforts may, however, prove to be beyond nudging, and an incremental, learning approach is probably required.

Luc Bovens and Alexandru Marcoci. (2017). To those who oppose gender-neutral toilets: they’re better for everybody. The Guardian, December 1

Bovens and Marcoci's research into the economics of these facilities shows they cut waiting for women, and address the concerns of trans and disabled people.

Alexandru Marcoci and James Nguyen. (2017). Scientific rationality by degrees. In M. Massimi, J.W. Romeijn, and G. Schurz (Eds.), EPSA15 Selected Papers. European Studies in Philosophy of Science, Vol. 5 (Cham: Springer), 321-333

In a recent paper, Samir Okasha imports Arrow's impossibility theorem into the context of theory choice. He shows that there is no function (satisfying certain desirable conditions) from profiles of preference rankings over competing theories, models or hypotheses provided by scientific virtues to a single all-things-considered ranking. This is a prima facie threat to the rationality of theory choice. In this paper we show this threat relies on an all-or-nothing understanding of scientific rationality and articulate instead a notion of rationality by degrees. The move from all-or-nothing rationality to rationality by degrees will allow us to argue that theory choice can be rational enough.

Alexandru Marcoci. (2015). Review of Quitting Certainties: A Bayesian Framework Modeling Degrees of Belief, by Michael G. Titelbaum. Economics and Philosophy 31(1), 194–200

Zoé Christoff, Paolo Galeazzi, Nina Gierasimczuk, Alexandru Marcoci and Sonja Smets (Eds.). Logic and Interactive RAtionality Yearbook 2012: Volumes 1 & 2. Institute for Logic, Language and Computation, University of Amsterdam

Alexandru Baltag, Davide Grossi, Alexandru Marcoci, Ben Rodenhäuser and Sonja Smets (Eds.). Logic and Interactive RAtionality Yearbook 2011. Institute for Logic, Language and Computation, University of Amsterdam

Centre for the Study of Existential Risk, 16 Mill Lane, Cambridge, CB2 1SB, United Kingdom