KEYNOTE: “Show me the Data”
Ruth Ahnert (Queen Mary University of London)
This paper seeks to provide an account of the complex and fragmented ecosystem of digital collections and humanities data, and the challenges – as well as opportunities – it presents for humanities scholars, their practices, and the future of the discipline. I will draw on my own experiences as both a user and a creator of historical datasets, as well as some institutional histories, to explore the complex arrangements humanities scholars must navigate in relation to discoverability (can we find the data?) and accessibility (can we get hold of it?). The issue is urgent because it concerns nations’ cultural assets which, in the words of the Vancouver Statement on Collections as Data, should be ‘widely accessible, within the bounds of ethical, legal, and community expectations’. Crucially, however, I contend that the blanket notion of ‘digital collections’ needs to be unpicked, as it currently elides a wide variety of material and stages in the life-cycle of digitization and data creation, with different overlapping forms of ownership. My second purpose is to show where we have opportunities and responsibilities as humanities scholars to change the way that data is created and used, and some practical things we can begin to do right now.
“Composite Epistemic Elements and Composite Epistemic Stances”
Spenser Borrie (University of Toronto) & Hakob Barseghyan (Victoria College, University of Toronto)
Recent developments in theoretical scientonomy suggest the existence of composite epistemic elements and composite epistemic stances. Consider, for instance, the recently accepted notion of disciplines (Patton & Al-Zayadi 2021; Friesen & Patton 2023). According to Patton and Al-Zayadi, a discipline is characterized by an essential set of core questions and a second-order delineating theory that identifies the core questions of the discipline. Thus, it is safe to say that disciplines are composite elements, consisting of other elements. Similarly, the stance of discipline acceptance involves the acceptance of all the core questions of the discipline as well as the acceptance of the delineating theory. This highlights that discipline acceptance is a composite stance, in the sense that its obtainment necessitates the obtainment of respective stances towards its component elements.
A number of questions arise in this context. What are composite epistemic elements? How can they be conceptualized? What are composite epistemic stances? Are composite elements and stances fully reducible to more basic epistemic elements and stances? Is the membership of more basic elements in the composite element static or dynamic, i.e. can the same composite element have different component elements at different times, or must it have a fixed list of component elements? If membership is dynamic, how should we trace the content of composite elements? How should we trace the stances taken towards composite elements? This paper is an attempt to shed light on these questions.
We suggest that there exist composite elements and stances of various types (including many local types), as agents customarily group more basic elements into more complex ones. Historians and philosophers customarily speak of, say, Newtonian theory and stances taken towards it. Clearly, they are referring to what is essentially a composite element consisting of various more basic elements, such as Newton’s three laws of motion, the law of universal gravitation, and numerous theorems deduced from these laws. Taking Patton and Al-Zayadi’s approach to disciplines as our starting point, we argue that a composite epistemic element is characterized by an essential set of component elements plus a second-order theory which identifies this set as being essential to the composite element. As Patton and Al-Zayadi point out, the mere acceptance of each of the component elements is necessary but not sufficient for accepting the composite element; in order to accept the composite element, an agent must also accept the second-order delineating theory that identifies these components as part of the composite element. To generalize, the obtainment of a composite stance amounts to the obtainment of respective epistemic stances towards component elements plus acceptance of the delineating theory.
The introduction of the notions of composite elements and composite stances also allows us to introduce the notion of partial obtainment of a stance, such as partial acceptance. Thus, a composite element can be said to be ‘partially’ accepted by an agent, if the agent accepts its delineating theory as well as some but not all of the component elements. This squares well with how historians and philosophers customarily speak of partial acceptance of composite theories (e.g., partial acceptance of Newton’s theory in the Netherlands in the first half of the 18th century). More generally, a stance S towards composite element E can be said to obtain only partially if the agent accepts the delineating theory of E and takes the respective stance towards some but not all of the component elements of E.
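To make these definitions concrete, here is a minimal sketch, in Python of our own devising (not part of the scientonomic formalism), of how full and partial acceptance of a composite element could be computed from an agent’s stances towards its components and delineating theory; the Newtonian example values are illustrative only:

```python
from dataclasses import dataclass

@dataclass
class CompositeElement:
    """A composite epistemic element: an essential set of component
    elements plus a delineating theory identifying that set."""
    name: str
    components: frozenset   # names of component elements
    delineating_theory: str

def acceptance_status(composite: CompositeElement, accepted: set) -> str:
    """Classify an agent's stance towards a composite element.
    `accepted` is the set of element names the agent accepts.
    Full acceptance requires the delineating theory plus every component;
    partial acceptance requires the theory plus some, but not all, components."""
    if composite.delineating_theory not in accepted:
        return "not accepted"
    held = composite.components & accepted
    if held == composite.components:
        return "fully accepted"
    return "partially accepted" if held else "not accepted"

# Toy example: Newtonian theory as a composite element.
newtonian = CompositeElement(
    name="Newtonian theory",
    components=frozenset({"law 1", "law 2", "law 3", "universal gravitation"}),
    delineating_theory="these four laws constitute Newtonian theory",
)
# An agent accepting the delineating theory and three of the four laws:
print(acceptance_status(
    newtonian,
    {"these four laws constitute Newtonian theory", "law 1", "law 2", "law 3"},
))  # -> "partially accepted"
```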
“edhiphy.org: Mention Relations in 20th Century Philosophy”
Gregor Bös (Tilburg University), Eugenio Petrovich (University of Turin), Sander Verhaegh (Tilburg University), Claudia Cristalli (Tilburg University), Fons Dewulf (Hong Kong University of Science and Technology), Ties van Gemert (Tilburg University), & Nina Ijdens (Tilburg University)
I present a database with a companion web-application for the study of 20th century philosophy (Petrovich et al. 2024). After summarizing the motivation and an example from the history of logical empiricism, I discuss the technical realization of the database and the accompanying website. The audience will be able to create graphs and network visualizations and to test the feedback mechanisms.
Because philosophers have adopted the scholarly norms of the empirical sciences relatively recently, citation-based methods are of limited use for the history of philosophy, particularly for the first half of the 20th century. But philosophers have always referred to the work of colleagues by their names. As a group of historians of philosophy, we have constructed a database of such mention relations and connected it with further metadata, to yield the Enhanced Database for the History of Philosophy (edhiphy). Relying on full texts from JSTOR, the first version of edhiphy extracts and disambiguates 1,095,765 mention links from 22,977 articles published in 12 leading English-language philosophy journals between 1890 and 1979. It was used to study the reception of logical empiricism: comparing the prominence of different philosophers and movements, comparing institutions, and creating network graphs. Among the results are a quantitative demonstration of the dominance of Rudolf Carnap, who comes to rival the influence of Kant, and a study of institutional affiliations, which confirms that Columbia University was still dominated by Dewey when logical empiricism already dominated other leading US departments.
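As a simplified illustration of the mention-extraction step (the real pipeline also disambiguates names, e.g. separating different philosophers who share a surname, which this sketch omits; the name list is hypothetical):

```python
import re
from collections import Counter

# Hypothetical name list for illustration; edhiphy's actual list is far larger.
PHILOSOPHERS = {"Carnap": "Rudolf Carnap", "Kant": "Immanuel Kant",
                "Dewey": "John Dewey", "Russell": "Bertrand Russell"}

def extract_mentions(full_text: str) -> Counter:
    """Count surname mentions in one article's full text."""
    mentions = Counter()
    for surname, canonical in PHILOSOPHERS.items():
        hits = re.findall(rf"\b{re.escape(surname)}\b", full_text)
        mentions[canonical] += len(hits)
    return mentions

print(extract_mentions("Carnap replies to Kant; Carnap disagrees."))
# Counter({'Rudolf Carnap': 2, 'Immanuel Kant': 1})
```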
The database design and applications were only possible for a team combining historical and technical expertise, and required substantial preparation. Yet our team has asked only a small number of the historical questions for which these mention relations will be of interest. The web-application edhiphy.org allows anyone to explore the mention relations between philosophers for themselves. Our initial release supports:
- Exploration and search of the database
- Calculation of mention statistics for individuals and groups
- Creation of mentions/year graphs for individual philosophers
- Creation of co-mention networks via VOSviewer
- For technical users: a SQL interface for custom queries
This enables users with and without a technical background to use mention data to study the history of philosophy. The database and website remain in development, with plans for expansion to philosophical literature in more journals and languages.
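As an illustration of the kind of custom query the SQL interface mentioned above supports, here is a minimal sketch; the table and column names are assumptions for illustration, not the actual edhiphy schema:

```python
import sqlite3

# Assumed schema: mentions(source_article_id, mentioned_philosopher, year).
conn = sqlite3.connect("edhiphy_sample.db")

query = """
SELECT year, COUNT(*) AS n_mentions
FROM mentions
WHERE mentioned_philosopher = ?
GROUP BY year
ORDER BY year;
"""
# Mentions-per-year series for one philosopher, e.g. for a prominence graph.
for year, n in conn.execute(query, ("Rudolf Carnap",)):
    print(year, n)
```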
“Disciplines in the Scientonomic Ontology: the Case of the Rejection of Alchemy”
Izzy Friesen (University of Toronto) & Paul Patton (University of Toronto)
Our paper begins by summarizing the findings of Patton and Al-Zayadi in their 2021 publication ‘Disciplines in the Scientonomic Ontology’, as this paper ultimately draws on their framework. Patton and Al-Zayadi introduce a scientonomic framework for understanding “the role of categories of knowledge, or disciplines,” and identify two components that are essential to a discipline (Patton & Al-Zayadi, 2021). These elements are the discipline’s core questions and the discipline’s delineating theory, a second-order theory identifying these questions as essential to the discipline (Patton & Al-Zayadi, 2021). This framework already allows claims about categories of knowledge throughout the history of science, complex entities though they are, to be made legible in a database. For example, ‘agent A accepted discipline D at time t’ is a claim that can be supported by specific data extracted from a historical case study indicating that agent A accepted specific core questions and the associated delineating theory at that time.
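As an illustration of how such claims might be recorded, here is a minimal sketch of a possible table layout; both the schema and the sample row are invented for illustration and are not the project’s actual database design:

```python
import sqlite3

# Hypothetical layout: one row per (agent, discipline, core question) claim,
# with the time span over which the acceptance is documented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE discipline_acceptance (
    agent              TEXT,     -- epistemic agent, e.g. a community
    discipline         TEXT,
    core_question      TEXT,     -- one accepted core question
    delineating_theory TEXT,
    accepted_from      INTEGER,  -- year
    accepted_until     INTEGER   -- NULL if still accepted
);
""")
conn.execute(
    "INSERT INTO discipline_acceptance VALUES (?, ?, ?, ?, ?, ?)",
    ("European chymists", "alchemy",
     "How can base metals be transmuted into gold?",   # illustrative question
     "Transmutation questions delineate alchemy",       # illustrative theory
     1660, 1720),
)
```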
Accordingly, in our work, we present a case study of the development of the chemical discipline (contemporaneously known as “chymistry”) from the 17th century through the early 18th century in Western Europe. We use evidence from the tradition of textbook publication that emerged in seventeenth-century chymistry to reconstruct the top level of the question hierarchy of chymistry. Then, we analyze how these questions and their associated theories were received. We determine that, in the 1660s, ‘alchemy’ transitioned from a synonym of ‘chymistry’, referring to the discipline as a whole, to a subdiscipline of chymistry with a more limited scope. We then identify that a rejection of alchemy’s core questions occurred in the 1720s, based on the reception of such questions in scientific publications and by academic institutions. Hence, we conclude that the subdiscipline of alchemy was rejected in the 1720s.
Beyond our historical conclusion, this work of identifying core questions from primary sources also provides a clear example of how such a database of categories of knowledge could be populated from observational research, demonstrating the reciprocal relationship between the case study and the database. The stated goal of current work in observational scientonomy is ultimately to produce a database of intellectual history. Our research closely follows Newman and Principe’s body of work on early modern alchemy and chymistry in order to reconstruct this episode. Yet explicitly using the scientonomic framework to conduct a case study reveals the underlying discipline dynamics, in this instance the rejection of a subdiscipline. Our work identifies these dynamics in a way that is easier to document precisely in a database than the typical presentation of data in historiographic sources is. It is admittedly difficult to extract information about the acceptance of questions from secondary sources. Our work illustrates how information from the introductions and organization of textbooks (especially thanks to the thriving ‘textbook tradition’ in the chemical discipline) can reflect question acceptance at specific points in time. This approach, thanks to the ubiquity of textbooks over several centuries of academia, allows for the reconstruction and documentation of discipline dynamics even in case studies which must still broadly draw on reconstructions in secondary scholarship.
“A Database for Sustainable Dietary Recommendations for Generation Z in Switzerland”
Talayeh Dehghani Ghotbabadi (UCSC Silicon Valley Extension)
This paper presents a case study on the design and development of a domain-specific, user-centered database intended to support sustainable dietary recommendations for Generation Z in Switzerland. The project integrates interdisciplinary methodologies from behavioral science, nutrition, and digital infrastructure design to create a database that incorporates not only nutritional and environmental metrics but also cultural and motivational drivers specific to the target demographic.
The database is built using a curated set of food items representing a broad spectrum of dietary models, including the Mediterranean, EAT-Lancet, and Planetary Health diets. These foods are sourced from national datasets such as the Swiss Food Composition Database, global life cycle assessment (LCA) datasets (Poore & Nemecek, 2018), and Swiss economic data relevant to food purchasing behavior. The dataset is evaluated using multi-criteria decision analysis (MCDA) informed by empirical user data. Motivational constructs derived from Self-Determination Theory (SDT) – specifically autonomy, competence, and relatedness – are incorporated, with weights assigned across five sustainability pillars: Health & Nutrition, Environmental Impact, Affordability, Cultural Acceptability, and Governance Potential.
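As a simplified illustration of the weighted-sum form of MCDA described above, here is a minimal sketch; the weights and item scores are invented stand-ins for the empirically derived ones:

```python
# Illustrative weights and scores only; the study derives its weights
# from empirical user data, which this sketch does not reproduce.
PILLARS = ["health", "environment", "affordability",
           "cultural_acceptability", "governance"]
weights = {"health": 0.30, "environment": 0.25, "affordability": 0.20,
           "cultural_acceptability": 0.15, "governance": 0.10}

def mcda_score(item_scores: dict) -> float:
    """Weighted-sum MCDA score for one food item.
    `item_scores` maps each pillar to a normalized 0-1 score."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9  # weights must sum to 1
    return sum(weights[p] * item_scores[p] for p in PILLARS)

lentils = {"health": 0.9, "environment": 0.95, "affordability": 0.8,
           "cultural_acceptability": 0.6, "governance": 0.7}
print(round(mcda_score(lentils), 2))  # ~0.83
```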
This study explores the integration of motivational, affective, and cultural dimensions in structured food data. It also addresses the challenge of ensuring long-term sustainability for databases in nutrition, sustainability, and behavioral modeling. The project highlights how intrinsic motivation fuelled by concerns for environmental impact, health benefits, and economic factors can drive food choices. Additionally, it reflects on the role of participatory validation, expert review, and continuous refinement of databases as part of a sustainable, user-centered approach.
The paper concludes by discussing how this intersectional database can support personalised dietary recommendations that meet nutritional needs, minimise environmental impacts, and respect cultural preferences, thus contributing to the broader conversation on sustainable food systems and digital infrastructure in the humanities.
“Narratives in the Cloud: Ethical and Scalable Approaches to Public Discourse Databases in the Humanities”
Augustine Farinola (University of Alberta)
The project employs a hybrid natural language processing (NLP) methodology that integrates traditional and large language model (LLM) based techniques. Comments and replies are cleaned using classical preprocessing techniques, followed by thematic extraction using Google Cloud’s Vertex AI Gemini model, which is prompted to surface recurring metaphors, discursive frames, and figurative patterns. Text embeddings are generated via text-embedding-005 and visualized through unsupervised machine learning algorithms such as KMeans to facilitate distant reading and content clustering. This multi-method approach allows the project to navigate the linguistic complexity, ambiguity, and affective range of discourse on Artificial Intelligence with greater sensitivity than keyword-based techniques alone.
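A minimal sketch of the embedding-and-clustering step described above; the Vertex AI calls follow the google-cloud-aiplatform SDK as commonly documented, and the project placeholders, sample comments, and cluster count are assumptions for illustration:

```python
import numpy as np
import vertexai
from sklearn.cluster import KMeans
from vertexai.language_models import TextEmbeddingModel

vertexai.init(project="your-gcp-project", location="us-central1")  # placeholders

comments = [
    "AI is going to replace every writer I know.",
    "These tools free me up to focus on the creative parts.",
    "Nobody asked for chatbots in customer service.",
    "Generative models are just remixing our own words back at us.",
]

# Embed each comment with the text-embedding-005 model named above.
model = TextEmbeddingModel.from_pretrained("text-embedding-005")
vectors = np.array([e.values for e in model.get_embeddings(comments)])

# Cluster the embeddings for distant reading; k is a placeholder choice.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(vectors)
for label, comment in sorted(zip(labels, comments)):
    print(label, comment)
```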
This research constructs a relational database integrating YouTube metadata – videos, channels, comments, replies – with engagement metrics and machine-generated annotations. By combining structured SQL querying with natural language processing and large language models, the database supports both close and distant readings of immigration narratives. Users can examine channel-specific engagement trends, detect recurrent metaphorical framings, cluster comments thematically, or explore sentiment shifts across time and video categories. The database enables granular analysis through this hybrid architecture while preserving its source material’s discursive complexity.
Sustainability is approached as both a technical and an epistemological imperative. The database architecture is hosted on Google Cloud SQL and accessed through Jupyter notebooks using SQLAlchemy. Docker containers preserve the environment, while data pipelines (from data scraping to schema normalization) are documented and versioned using Git. Open standards such as CSV and SQL dumps ensure compatibility with other scholarly infrastructures, and plans include a simplified user-facing interface to make the resource accessible to non-technical researchers. Rather than treating sustainability as a purely infrastructural problem, the project understands it as a design ethos – ensuring that the database remains usable, interpretable, and meaningful across time, disciplines, and audiences.
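A minimal sketch of the notebook access pattern (SQLAlchemy against Cloud SQL); the connection string and table names are placeholders, not the project’s actual schema:

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Placeholder credentials and host; a real deployment would use the
# Cloud SQL connector or a secrets manager rather than a literal URL.
engine = create_engine(
    "postgresql+psycopg2://user:password@CLOUD_SQL_HOST:5432/youtube_discourse"
)

# Example query: the ten most-commented videos (assumed `comments` table).
with engine.connect() as conn:
    df = pd.read_sql(
        text("""
            SELECT video_id, COUNT(*) AS n_comments
            FROM comments
            GROUP BY video_id
            ORDER BY n_comments DESC
            LIMIT 10
        """),
        conn,
    )
print(df)
```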
At the heart of the project is a commitment to critical data humanities. The database does not treat humanistic data as mere computational objects but as culturally embedded utterances – traces of authentic voices and emotional labor. The design resists reductive binaries, preserves discursive nuance, and foregrounds user agency by anonymizing data and contextualizing AI outputs as interpretive rather than authoritative. By treating classification and querying as acts of reading, the project repositions the database not just as a technical tool but as a hermeneutic interface.
Ultimately, this paper contributes to ongoing conversations about the role of databases in transforming humanities research. It argues that sustainability must include ethical design, epistemic transparency, and community-oriented access. Through the lens of YouTube discourse on Artificial Intelligence, the project demonstrates that humanities databases can do more than store information – they can sustain dialogue, memory, and cultural complexity in the digital age.
“From Simulation to Understanding: The Epistemic Sustainability of Large Language Models for Digital Humanities”
Giovanni Galli (University of Teramo)
The increasing integration of large language models (LLMs) into digital humanities (DH) infrastructures is reshaping how scholars interact with texts, archives, and structured databases (Rossi, Harrison, and Shklovski, 2024; Sullutrone, 2024). Marketed as tools for intelligent search, summarization, and even interpretation, LLMs promise to enhance access to cultural data and streamline research processes. Yet their deployment raises significant epistemological concerns about the nature of understanding, representation, and long-term scholarly accountability. This paper critically examines the epistemic implications of using LLMs as tools for querying and interfacing with DH databases, focusing on the form of scientific understanding they afford and the conditions under which their use can be considered epistemically sustainable. Drawing on the hermeneutic concept of Verstehen (Khalifa, 2019), I argue that LLMs risk reducing understanding to a form of plausible simulation, whereby meaning is generated through statistical association rather than critical reconstruction. This threatens to displace key humanistic values (contextuality, reflexivity, and interpretive plurality) in favor of computational fluency.
To address this, I introduce the concept of epistemic sustainability, defined as the capacity of a knowledge system to support the long-term production, transmission, and critique of inclusive, context-rich knowledge without undermining the interpretive and ethical foundations that make such knowledge viable. From this concept I derive an articulated set of normative constraints under which LLMs might be integrated into DH workflows without compromising epistemic integrity: (1) traceability and transparency of sources, (2) methodological compatibility with interpretive research, (3) reflexive positioning within scholarly labor, and (4) alignment with archival standards and metadata ecosystems.
The argument is situated within ongoing debates in Critical AI Studies and digital humanities infrastructure research. Drawing on the work of Crawford (2021), Noble (2018), Benjamin (2019), and McPherson (2012), I critique the limits of current explainability (XAI) frameworks, showing that they fall short of the conceptual and historical legibility required for Verstehen. Rather than treating LLMs as epistemic authorities, I propose their use as heuristic agents, tools for provoking new questions and associations within explicitly situated scholarly contexts. Ultimately, this paper argues for a more reflexive and ethically grounded integration of AI in the humanities. Without such a framework, the risk is not only epistemic distortion but a deeper erosion of the values and practices that sustain humanistic knowledge and understanding.
“Databases for Humanities Internships: Lessons Learned from Community Service”
Christine James (Valdosta State University)
One of the most important areas of growth in undergraduate education is experiential learning, often in the form of internships, which give students the chance to develop skills that can translate into jobs and opportunities after graduation. As a professor at a four-year undergraduate university in the southeastern United States, I contribute to internships in two programs that serve diverse populations: a Teagle Grant funded program spearheaded by our Political Science department for rising high school seniors doing a year-long civic engagement project, and a Mellon Grant program giving students in the humanities funding to engage with the local community (for example, Religious Studies majors interview and document religious diversity, and Spanish majors work at local high schools translating for children of migrant farmworkers). In the current political context, documenting the students’ work in databases and maintaining a searchable archive of their achievements has become a challenge, with words chosen very carefully. We normally use the services of our campus library, which maintains Vtext, a repository of scholarship and archives of local community service activities.

Throughout my time as a professor, I have also engaged in community service with a variety of local groups whose members come from a range of religious and political backgrounds but who often engage with the community on similar projects, projects that likewise offer internship opportunities. For example, the local junior service league funded early welfare and USO programs that resulted in procuring the first full-time welfare worker in the community and in the beginning of what is now a Speech and Hearing Clinic on the university campus. Many communities in the southern United States have women’s groups with a history of service projects, specifically projects that were initiated by the women’s group, expanded to other community members, and then turned over to others. This is the model used by the Teagle and Mellon Grants, which are meant to seed future collaborations between universities and entities in their communities so that students can benefit from a wide choice of experiential learning and research opportunities.

This presentation will share the lessons learned from the history of community service groups and their archival record keeping as we embarked on databases for our internship and community service students at Valdosta State University. This research is in dialogue with current work on library-based humanities databases (Sharkey 2023), humanities databases that rely on qualitative rather than quantitative data inputs (Moulaison-Sandy and Wenzel 2023), internship programs in political science (Jordan and Matzke 2024), and internship programs that ask students to engage with populations in traumatic or economically difficult political contexts (Dal Santo et al. 2024).
“Sustainable Access and Metadata-Driven Design: A Digital Humanities Case Study on ‘Golden Age’ Nile Travel and Archaeology”
Sarah Ketchley (University of Washington) & Matthew Strupp (University of Washington)
This case study details the creation of a sustainable digital humanities database focused on late 19th- and early 20th-century Nile travel and archaeology, developed through a collaboration between a disciplinary expert in Egyptology and DH, and a University of Washington MLIS capstone student. Using the open-source CollectionBuilder platform, the project restructured a complex SQL database into a system of interconnected CSVs while building a flexible, metadata-rich infrastructure that supports long-term accessibility and reuse.
A key objective was to develop a backend metadata template to accommodate comprehensive descriptive fields – including summaries and IIIF-compatible image references – and reflect all original SQL-exported data on the public-facing site. This effort created a pathway to transition from relational database structures to a transparent, flat-file system optimized for browsing, filtering, and long-term preservation. A new date field was also introduced, enabling chronological sorting across the collection, a vital feature for users analyzing historical travel patterns and publication timelines.
Another major milestone involved developing programmatic strategies to retrieve IIIF manifests from the Internet Archive to embed readable, high-resolution scans into item pages – providing users with an intuitive, immersive reading experience. The digital collection will include books rendered in full via an IIIF viewer, with image thumbnails presented on a browse page and interactive filters that allow sorting by author, location, publisher, and other key metadata fields. These filtering capabilities mirror the ‘search’ functionality in the original SQL database, and offer refinements to support targeted exploration of the collection.
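A minimal sketch of the manifest-retrieval step; the Internet Archive IIIF endpoint pattern and the item identifier shown are assumptions to check against current IA documentation:

```python
import requests

def fetch_manifest(ia_identifier: str) -> dict:
    """Fetch a IIIF manifest for an Internet Archive item (assumed endpoint)."""
    url = f"https://iiif.archive.org/iiif/{ia_identifier}/manifest.json"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.json()

# Placeholder identifier; real items use their Internet Archive IDs.
manifest = fetch_manifest("someNileTravelogue1890")
print(manifest.get("label"))  # the manifest's display label
```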
The platform’s lightweight, static-site architecture reinforces the project’s sustainability goals by minimizing server dependencies and encouraging longevity. It does not require SQL expertise to maintain, which was an important consideration for ongoing maintenance and expansion of the corpus of material in the CSV. Designed with interoperability in mind, the system supports open standards and can be repurposed or extended for future digital humanities project work.
The requirements of the capstone project include weekly check-ins and a Capstone Project Charter, which provides the skeleton framework for the development of project documentation. This documentation describes both the development process and the design and technical challenges addressed over the course of the project. The CollectionBuilder team at the University of Idaho has provided advice on aesthetic refinements and display limitations, such as Google Books notices appearing in place of cover thumbnails and issues with IIIF sidebar page previews.
At the conclusion of Capstone work, outcomes will include a completed public-facing website with backend standardized metadata, and high-quality documentation offering clear guidance to maintain and expand the site by future collaborators. Documenting the CSV structure, metadata schema, IIIF integration steps, and customizations to the CollectionBuilder framework will be essential to ensuring the platform’s longevity and usefulness.
This case study contributes to ongoing discussions in the digital humanities about sustainability, transparency, and data reuse. It demonstrates how collaborative, interdisciplinary work can yield accessible and adaptable infrastructure that balances technological rigor with humanistic inquiry. Aligned with SDH 2025 themes, the project offers a replicable model for building metadata-first, minimal-computing archives that prioritize usability, accessibility, and preservation across communities of practice.
KEYNOTE: “Basic Formal Ontology (BFO) as a Sustainable Method for Building Databases: Promises and Challenges for Application in the Humanities”
Rasmus Larsen (University of Toronto)
Since the early 2000s, the use of formal and applied ontologies has become central to the development of sustainable databases in the life sciences. High-impact initiatives such as the Gene Ontology and the Ontology for Biomedical Investigations have demonstrated how ontologies can improve the long-term interoperability of databases. A key factor in the success of these projects is their use of the Basic Formal Ontology (BFO) method, a so-called “top-level” ontology designed to provide a domain-neutral, realist framework for organizing information in a coherent and computationally tractable way.
In this keynote, I introduce BFO as a method for building semantically rigorous, maintainable, and future-proof databases. I outline how BFO contributes to sustainability through a set of principles and design patterns that enforce clarity, reduce redundancy, and promote integration across research domains. Drawing from concrete examples in the life sciences, I illustrate how the BFO method supports the development of successful, large-scale collaborative data infrastructures.
I then turn to the question of whether these benefits can translate to the Humanities, a complex amalgam of diverse research domains, with obvious dissimilarities from the sciences. Can BFO serve as a sustainable foundation for structuring cultural, historical, or interpretive knowledge? I explore this question by highlighting the most obvious promises and challenges of using BFO in the Humanities. On the one hand, BFO offers a clear framework for representing temporal entities and fiat objects—data types that are common in Humanities databases. On the other hand, the ambiguity, contextuality, and interpretive flexibility characteristic of humanistic knowledge pose significant hurdles for adopting realist ontology frameworks like BFO.
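As a toy illustration of what BFO-conformant modeling of humanities data can look like in RDF, here is a sketch using rdflib; the example entities and the linking property are invented, and the BFO class IRIs should be checked against the current release:

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

OBO = Namespace("http://purl.obolibrary.org/obo/")
EX = Namespace("https://example.org/humanities/")  # invented namespace

g = Graph()
g.bind("obo", OBO)

# A historical event modeled as a BFO process (an occurrent).
event = EX["congress_of_vienna"]
g.add((event, RDF.type, OBO.BFO_0000015))        # BFO: process
g.add((event, RDFS.label, Literal("Congress of Vienna")))

# The time span it occupies, modeled as a BFO temporal region.
span = EX["span_1814_1815"]
g.add((span, RDF.type, OBO.BFO_0000008))         # BFO: temporal region
g.add((span, RDFS.label, Literal("1814-1815")))
g.add((event, EX.occupiesTemporalRegion, span))  # placeholder relation

print(g.serialize(format="turtle"))
```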
By mapping out potential opportunities and constraints, I aim to open a critical conversation about the potential role of BFO in the Humanities: not only as a technical solution for database design, but also as a methodological proposal that invites reflection on what it means to represent complex human-centered knowledge in a structured format.
KEYNOTE: “Sustaining Linked Historical Spatial Data: Institutional, Ethical, Semantic, and Environmental Considerations”
Ruth Mostern (University of Pittsburgh)
This talk uses the example of the World Historical Gazetteer to explore the concept of sustainability from many different directions. Drawing on our recently published Digital Initiative Sustainability Report, it makes the case that sustainability is not primarily a technical problem. Rather, what must be fostered and sustained in order for an open and linked digital initiative like the WHG to flourish is a community of committed and enthusiastic co-creators who consider the initiative impactful, meaningful, and, indeed, essential to what they wish to accomplish. This, in turn, raises a host of questions that are less tractable than questions about server space or data standards. How do we find funding to maintain the work during times of austerity and uncertainty? How do we communicate with people who we believe would benefit from knowing about the initiative, and learn how they hope to engage with it? How do we learn from the work of scholars in reparative description and critical archive studies in order to make the work as inclusive and liberatory as possible? Sustainability is a keyword in both environmental politics and information science. Therefore, this talk also draws inspiration from the United Nations Sustainable Development Goals.
“Towards Universal Digital Bibliographies for the History of Philosophy: A Complete Workflow and Two Applications”
Thijs Ossenkoppele (University of Amsterdam) & Arianna Betti (University of Amsterdam)
With the mass availability of digitized books and book metadata on the internet, present-day history of philosophy is no longer constrained by the limits of source selection that shaped earlier scholarship, such as restricted physical accessibility and heavy reliance on canonical works. In addition, recent progress in technologies that enable accurate information retrieval from large bodies of digital text has made it possible to involve numbers of works in historical research that were previously unmanageable.
Given these developments, it is surprising that to date no sound methods for building universal bibliographies that can form the basis for wide-scope, long-data research in the field have been proposed. In this talk, we propose such a method. We present a workflow for selecting, collecting, curating, and enriching bibliographic metadata proper to large-scale investigations in the history of philosophy, and we show how to cast these metadata into searchable and shareable electronic bibliographies, all informed by best practices in both library science and the Linked Open Data community. Our bibliographies are designed to be fully adaptable and interoperable, allowing others to modify them for different research aims or integrate them in broader bibliographies.
We illustrate our workflow by showcasing two of its applications. The first application, PHIL-DEDELA-18th, concerns a manually curated metadataset of logic and philosophy books written in German and Latin and published between 1720 and 1789 in (what was then) Germany. PHIL-DEDELA-18th originated as the union of two datasets: one containing 668 manually curated records derived from expert scholarship in the historiography of philosophy and logic, supplemented with metadata from HathiTrust, and another containing 1,090 manually curated records originally harvested from WorldCat. PHIL-DEDELA-18th counts 1,549 unique editions (1,758 non-unique editions, 209 duplicates), clustered into 796 unique works written by 518 unique authors, for which authority records are provided.
Our second application of the workflow, BOOKSHELPhS (Books in the History of English Logic, Philosophy, and Science), contains bibliographical entries for logic, philosophy, and science books in English or Latin, published in Britain between 1605 and 1776 and written by authors alive during that period. BOOKSHELPhS is an ongoing project; it currently consists of rich metadata for 2,123 editions of 1,272 works written by approximately 792 authors, and is available in a Linked Open Data format.
In presenting the applications of our workflow, we describe the methods applied and the challenges and limitations encountered in obtaining and curating our data. Curation included identification of authors, periods, languages, and locations, as well as correction, deduplication, and so-called FRBRization, namely the hierarchical clustering of records according to a book’s publishing history, depending on whether or not they represent the same edition of the same work. In addition, we describe the tools and strategies used to link our metadata to full-text items, and to convert our initial string-based metadata into Uniform Resource Identifiers (URIs), enabling their use as nodes within the graph structure of our linked datasets.
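As a simplified illustration of two of these curation steps, minting URIs from string-based metadata and clustering records into editions (a drastically reduced stand-in for full FRBRization); the namespace and the matching rule are illustrative only:

```python
from collections import defaultdict
from urllib.parse import quote

BASE = "https://example.org/bookshelphs/"  # invented namespace

def mint_uri(kind: str, value: str) -> str:
    """Deterministically mint a URI from a normalized string key."""
    return BASE + kind + "/" + quote(value.strip().lower().replace(" ", "-"))

records = [
    {"author": "John Locke", "title": "An Essay Concerning Humane Understanding", "year": 1690},
    {"author": "John Locke", "title": "An Essay Concerning Humane Understanding", "year": 1690},
    {"author": "John Locke", "title": "An Essay Concerning Humane Understanding", "year": 1694},
]

# Records sharing author URI, work URI, and year are treated as one edition;
# real FRBRization uses far richer evidence than this toy key.
editions = defaultdict(list)
for rec in records:
    key = (mint_uri("author", rec["author"]),
           mint_uri("work", rec["title"]), rec["year"])
    editions[key].append(rec)

print(len(editions), "unique editions from", len(records), "records")  # 2 from 3
```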
“SOPHIS: A New Database for Investigating the Social Structure of Philosophy of Science”
Eugenio Petrovich (University of Turin)
The analysis of the scholarly field of philosophy through advanced quantitative methods has gained increasing traction over the past decade. Existing studies leverage major databases (e.g., Scopus) and digital text archives (e.g., JSTOR) to study philosophy using methods such as topic modeling, citation analysis, text mining, and network analysis (Petrovich, 2024).
However, the expansion of these Quantitative Studies of Philosophy has also revealed significant limitations in current databases and digital resources, particularly in terms of geographical and linguistic coverage, temporal scope, and data quality – not to mention the proprietary nature of most datasets, which imposes severe restrictions on open science approaches (Petrovich et al., 2024).
In this talk, I will present SOPHIS (Social Observatory of PHIlosophy of Science), a new open-access database designed to offer a unique perspective on philosophy of science, a subfield of philosophy that has attracted considerable attention in Quantitative Studies of Philosophy (see e.g., Khelfaoui et al., 2021; McLevey et al., 2018; Petrovich & Viola, 2024). SOPHIS enables, for the first time, large-scale investigation of the social dimension of this field, shedding light on the complex web of actors involved in its development. The database includes metadata for 6,826 research articles published in eight leading philosophy of science journals between 2005 and 2019. These articles are linked to two key groups: (i) the authors of the articles (n = 4,395) and (ii) all the scholars mentioned in the acknowledgement sections of the articles (n = 9,029). Incorporating this second group of actors is crucial for uncovering informal collaborations and social relationships that remain invisible in standard social analysis based on formal authorship (Petrovich, 2022).
Furthermore, all actors in SOPHIS have been systematically associated with their full academic affiliations (university, city, and country), citation and publication metrics, gender, philosophy of science awards, and roles within major philosophy of science associations. In this way, SOPHIS allows multidimensional analysis of the social structure of philosophy of science, making it possible to investigate topics such as prestige flow, community sub-structure, geographical distribution, and gender composition of the field at an unprecedented level of scale and precision.
In the first part of my talk, I will detail the methodology used to construct SOPHIS, focusing in particular on how technical challenges such as reconciling different name variants and affiliation data retrieval were addressed. I will then demonstrate the kinds of analyses that SOPHIS makes possible, including the creation of a co-acknowledgment network map (available at https://tinyurl.com/288wkbxt), which provides novel insights into the social communities within philosophy of science.
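As an illustration of how a co-acknowledgment network of this kind can be derived from acknowledgement data, here is a sketch using networkx; the data shown are invented placeholders, and SOPHIS’s actual construction may differ:

```python
import itertools
import networkx as nx

# Placeholder data: scholars thanked in each article's acknowledgements.
acknowledgements = {
    "article_1": ["Scholar A", "Scholar B", "Scholar C"],
    "article_2": ["Scholar B", "Scholar C"],
}

# Link every pair thanked together; edge weights count co-occurrences.
G = nx.Graph()
for article, thanked in acknowledgements.items():
    for u, v in itertools.combinations(sorted(set(thanked)), 2):
        if G.has_edge(u, v):
            G[u][v]["weight"] += 1
        else:
            G.add_edge(u, v, weight=1)

print(G.edges(data=True))
# [('Scholar A', 'Scholar B', {'weight': 1}),
#  ('Scholar A', 'Scholar C', {'weight': 1}),
#  ('Scholar B', 'Scholar C', {'weight': 2})]
```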
“Building and Sustaining a Global Resource: The Islamic Scientific Manuscripts Initiative After Thirty Years”
Jamil Ragep (McGill University) & Sally Ragep (McGill University)
Thirty years ago, when we began development of the Islamic Scientific Manuscripts Initiative, the last thing on our minds was sustainability. Like most developers, we were focused on the myriad problems associated with database development. In addition to the normal problems associated with databases of the 1990s, we also had to confront things like non-western fonts, right-to-left writing, and access to thousands of manuscripts held by libraries worldwide. Our rather audacious goal was ‘to make accessible information on all Islamic manuscripts in the exact sciences (astronomy, mathematics, optics, mathematical geography, music, mechanics, and related disciplines), whether in Arabic, Persian, Turkish, or other languages.’ Fortunately, we were aided by a long tradition of modern bio-bibliographical literature regarding the Islamic scientific tradition. Unfortunately, this tradition was rife with errors and tended to be heavily Eurocentric, both in outlook and in its heavy reliance on manuscripts in European libraries.
Over those thirty years, we have been able, somewhat to our surprise, to overcome many of those early obstacles. In many ways, we were beneficiaries of globalization and the heady optimism of ‘the end of history’ that led rather quickly to Unicode fonts, remarkable advances in the computerization of Asian scripts (in our case, Arabic script), and the integration of word processing and database technology. No longer limited to typewriter-style fonts, we could by the early 2000s develop fields that could accommodate both Arabic script and Latin script with diacritical markings.
All this was great in terms of character entry, but the real question was how to populate the database with meaningful data. This required both human and financial resources that we were fortunate enough to obtain. Starting at the University of Oklahoma and then at McGill University, a series of grants allowed us to continue developing the database. In particular, several major Canadian awards gave us the resources to hire expert researchers and to purchase a state-of-the-art server for storing the database and manuscript images. A collaborative agreement with the Max Planck Institute for the History of Science (MPIWG) gave us access to a strong IT team and additional financial aid.
Thanks to the MPIWG collaboration, the database evolved into a powerful resource that allowed not only for standard cataloguing information but also for ways to trace sociological information – dissemination of texts, commentary traditions, teacher-student relationships, and so on. A Drupal online version allowed us to share our work internationally.
What we hadn’t sufficiently planned for was sustainability. As retirements loomed and the collaboration with the MPIWG came to an end, several major challenges came into focus: 1) the particularities of a bespoke database meant that it would not be easy to transfer to another institution; 2) the limited lifetime of our servers required a new partner with the resources to accept the transfer of large amounts of data; and 3) a library or university willing to take over such a database had to be found.
We will discuss how we overcame these challenges and found an institution willing to sustain and strengthen the project.
“From Footwork to Frameworks: Curating a Multimodal Dance Sequence Dataset through AI”
Rohini Srihari (University at Buffalo), Ankitha Sudarshan (University at Buffalo), Tanvi Ranga (University at Buffalo), & Lata Pada (Sampradaya Dance Company)
This project proposes a sustainable, culturally aware, and community-driven database for South Asian dance – one that captures its intricate movement vocabulary and rhythmic structures in a way that supports both scholarly inquiry and creative practice. South Asian classical dance forms (e.g. Bharatanatyam) are deeply rooted in tradition, storytelling, and embodied knowledge passed down through generations. As performance practices evolve and younger dancers engage with digital media, there is a growing need to thoughtfully preserve, study, and share these forms using modern tools.
Our work centres on building a multimodal dataset in collaboration with a premier South Asian dance company in Canada, known for blending classical Indian dance with contemporary storytelling. In its current phase, the project focuses on recording movement sequences performed by trained dancers, captured using a multi-camera setup and processed with pose-estimation tools to generate structured 3D motion data. We are also creating detailed textual descriptions for each sequence, which serve as metadata and a foundation for future modeling. In parallel, we are compiling an audio dataset of classical rhythmic patterns and spoken syllables used in performance, drawn from archival and studio recordings. While the emphasis is on movement, this framework lays the groundwork for integrating rhythm and expression in future iterations. Later phases aim to develop a unified foundation model for South Asian dance, combining movement, rhythm, and musical elements. This will facilitate the automatic generation of movement sequences and assist dancers in exploring their own creativity.
We are also experimenting with generative AI models such as GANs (Generative Adversarial Networks) to help create new sequences of movement and rhythm, not to replace artists but to provide tools that enhance creativity and encourage exploration.
Importantly, this is not a project about replacing dancers or musicians with AI. Rather, we see AI and digital tools as enablers – helping dancers, teachers, and researchers better understand, share, and explore this rich tradition. AI becomes a companion, not a substitute. Through structured annotation and metadata (including rhythmic cycle, tempo, movement type, expressive intent, and school-specific style), we aim to build a resource that honors the diversity within Bharatanatyam instead of flattening it.
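As a sketch of what one annotated sequence record could look like under such a schema; the field names follow the metadata categories listed above, while the values and the keypoint layout are illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class DanceSequence:
    """One annotated movement sequence in the multimodal dataset (sketch)."""
    sequence_id: str
    style: str                # school-specific style, e.g. a particular bani
    movement_type: str        # e.g. a basic adavu vs. an expressive passage
    rhythmic_cycle: str       # tala, e.g. 'adi tala'
    tempo_bpm: float
    expressive_intent: str
    description: str          # textual description used as metadata
    keypoints_3d: list = field(default_factory=list)  # frames x joints x (x, y, z)

seq = DanceSequence(
    sequence_id="seq_0042",
    style="Kalakshetra",
    movement_type="tatta adavu",
    rhythmic_cycle="adi tala",
    tempo_bpm=72.0,
    expressive_intent="invocatory",
    description="Opening striking steps performed in first speed.",
)
```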
This approach speaks directly to the themes of SDH 2025, particularly the challenge of representing complex cultural knowledge in sustainable ways. Classical dance is not a standardized form; it varies by region, teacher, and context. We reflect this richness using flexible data schemas that support diverse interpretations and evolving performance practices, ensuring the database grows with the artform rather than freezing it in time.
Sustainability is at the heart of this work. Dancers and educators are collaborators and co-curators from the start. The platform is designed to serve both education and research, with open documentation, standardized formats, and thoughtful versioning to support long-term reuse and accessibility.
By building this database, we hope to show how embodied and oral traditions can be preserved and shared digitally – not by simplifying them, but by carefully capturing their rich, underlying structure. With care and respect, we believe technology can support the very things it’s often feared to erase.