Thank you for sharing this exciting evolution in your publishing model. Given the shift toward empowering scientists to have more agency over when and how they share their work, how do you envision balancing open science principles with the protection of potentially commercializable discoveries? What strategies or tools are you considering to ensure that open publishing enhances Arcadia’s ability to pursue commercialization opportunities without inadvertently disclosing valuable intellectual property prematurely?
Thanks for this question! The strategy we're employing here is to ask our scientists to think more carefully about commercialization rather than act out of fear or conservatism; a vague notion of future value isn't sufficient to justify protecting information. This has a dual benefit: scientists educate themselves on a path to commercialization for their ideas, and we get to leverage all the benefits of open science to improve and speed up our science. Further, we think (and have heard feedback to this effect) that sharing evidence of our discoveries increases investors' faith, because they can vet whether the advancements we're making are solid and useful rather than banking on unsupported promises. We believe the vast majority of early-stage discovery doesn't make sense to protect and is better leveraged by sharing. Only trying this will tell us whether that's true, and we have the unique opportunity and privilege to run this experiment.
I’m curious about the relationship between rigor and timeliness, especially given how quickly a scientific story can change with more data.
No scientific work is ever done, but I think some of it is more done than others. From a structural standpoint, it seems like pubs are replacing traditional peer-reviewed journal articles, and if I were a scientist operating under that assumption at Arcadia, I'd probably be pretty slow to share in-progress results too!
But then I was thinking about how sharing research at all stages of “doneness” is important and still a pretty standard part of academic research (at least for trainees): you get an interesting result or run into a weird technical problem and you bring it up at lab meeting, where you get feedback from your peers and colleagues.
You plug that feedback in, get more robust results, then present your research-in-progress to your department. Their comments improve how you design your experiments, run your statistical analyses, or frame the conclusions you draw.
Rinse and repeat for poster presentations/conference talks until you get to a peer-reviewed publication.
Is the idea that at least some of the pubs might be representing a story at these earlier stages, making it more of a global lab meeting presentation than a publication? (Sorry this is so long!)
Thanks for your thoughts! Yes, the idea is to share pubs across all stages of the research cycle, including early ideas we haven't even tested yet. Some may evolve over time as we add new data and release new versions, but some may stay as-is and only ever reflect the early stages of a project. When we share early or less-developed work, we try to be abundantly clear about what stage it's in, pointing out any caveats so as not to mislead readers. For example, in this pub, we couldn't replicate some initial findings, but since we weren't going to pursue the project further, we decided to share what we'd found in case it's useful to others, with multiple flags about reproducibility dropped in throughout.
I like the publishing process. I was just wondering how other groups and individuals working on open science or re-envisioning publishing contribute to or learn from Arcadia's experiment, and what lessons from the first phase of the experiment can help other organizations or individuals reimagine their own scientific publishing models.
Just as we have learned from many open science experiments run by others, we hope that others innovating on their publishing models will note the aspects of our model that suit their needs and that we have derisked. Aspects of our model I think were informative:
People can and do comment on our publications even though the pubs live outside the traditional journal ecosystem (something many were skeptical of).
The CRediT taxonomy really helped solidify a culture of collaboration, which I think is difficult to build with hierarchical, winner-take-all author lists.
Author agency is essential to maintain; otherwise, we risk gatekeeping in new ways.
Scientists are rigorous and have perfectionist tendencies that make them reluctant to share early, even when all the incentives and motivations are aligned for them to do so.
Measuring impact and reuse in the absence of proxies is difficult but essential work.
There are many others, and we hope to release another pub soon that tries to capture learnings from our latest iteration. As far as contributing goes, please reach out if you or anyone else has run experiments that we might find informative! Thanks for the question!
Buried the lede a little bit here, I think! How hands-on did you find you had to be with this workflow? The content itself is awesome, but I am extra amazed if these methods generalize; the implications are cool to think about!
Hi Maxine, thank you for commenting! Agreed that the implications are very cool to think about, and we want to explore trying this with scientific publications to see if it can capture what we'd need it to. With this pub in particular, I'd say the workflow was fairly hands-on, but it significantly sped up the release and provided a solid base to build on. The main elements that required human intervention were:
editing the video down to fit into the 1,048,576-token context window (the limit at the time of writing) alongside detailed style instructions, a pub template, and the full text of ‘The experiment begins.’ We put the style guide and pub template in the system instructions, so we've been able to test this a few times without needing to transfer that information over each time (there's a minimal sketch of this setup after the list).
expanding on points the model had no way of knowing or connecting. Naturally, the model couldn't intuit details or connections that we didn't include in the talk. This was an internal presentation where we omitted some information because most Arcadians are already well-versed in our publishing process. That created gaps for the model, which was a helpful indication that over-explaining is an effective strategy when asking an LLM to create a publication from a talk. We also expanded on some points that the model summarized too briefly. Something I've found useful is to explicitly encourage the model to generate long outputs, as these tools tend toward brevity (and even their long outputs are still fairly summarized). Being succinct is something we strive for, but these tools aren't always the best at deciding which pieces to cut, and it tends to be faster for us to remove text than to expand it.
editing the voice and style slightly. Of the LLMs we tried, Gemini provided text that was closest in style to the voice of our pubs, but there were still areas where it came across as slightly robotic. This wasn't a huge lift, though; it came surprisingly close to our style.
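In case it helps to see the shape of the workflow, here's a minimal sketch of what the setup could look like, assuming the google-generativeai Python SDK. This is not our actual pipeline, and the model name, file names, and prompt text are illustrative placeholders.

```python
# A rough sketch, not our actual pipeline. Assumes the google-generativeai
# Python SDK; the model name, file names, and prompt below are placeholders.
import time

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder

# Reusable instructions go in the system prompt, so each run only needs the talk itself.
model = genai.GenerativeModel(
    model_name="gemini-1.5-pro",  # 1,048,576-token input limit at the time of writing
    system_instruction=(
        open("style_guide.md").read() + "\n\n" + open("pub_template.md").read()
    ),
)

# Upload the edited-down recording; video files need server-side processing before use.
talk = genai.upload_file("internal_talk_edited.mp4")
while talk.state.name == "PROCESSING":
    time.sleep(5)
    talk = genai.get_file(talk.name)

response = model.generate_content([
    talk,
    open("the_experiment_begins.md").read(),  # full text of the pub the talk builds on
    # Explicitly ask for length: models tend toward brevity, and cutting text
    # is faster than expanding it.
    "Draft a full-length pub from this talk. Over-explain context and err on "
    "the side of long, detailed prose.",
])
print(response.text)
```

The main design choice this reflects is keeping the reusable material (style guide and pub template) in the system instructions, so only the talk and the prompt change between runs.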
Hopefully that gives you some more context! Happy to answer any questions about it too.
As a template for publishing at a for-profit company, how does this interplay with Arcadia's IP? For example, public-disclosure rules around filing patents, or trade secrets for spinoff companies? Do scientists still go through legal prior to publication?
Hi Bobby, thanks for your question. Our scientists are explicitly no longer required to go through a mandatory legal review for their pubs. Like all other aspects of publishing, we aim to make educational resources and expertise available to them so they can learn more about what might have commercial value. We're still figuring out all the mechanisms for this, but one thing is certain: there is no mandatory legal gate. I personally believe that thinking more carefully about commercialization will make our scientists better equipped to pursue the most promising revenue streams AND to share things that will accelerate their science upstream. We expect to learn a lot more by trying out this shift in a way that allows more experimentation across the team than a single mandatory model would.