The Holocaust History Podcast

Ep 50. Digital Humanities and the Holocaust with Todd Presner

Waitman Wade Beorn Episode 50


How can the digital humanities address and explore the Holocaust? In these days of ChatGPT, we may be rightly wary about the use of computers to analyze the past. However, today's episode shows how an ethical approach to using computational methods can expand our understanding of the past, often by showing us new questions that we hadn't considered before.

 

In this episode, I talk with Todd Presner about his fascinating and impressive work with the “big data” of recorded Holocaust testimony.

 

Todd Presner is Chair of UCLA's Department of European Languages and Transcultural Studies.

 

Presner, Todd. Ethics of the Algorithm: Digital Humanities and Holocaust Memory (2024)

Follow on Twitter @holocaustpod.
Email the podcast at holocausthistorypod@gmail.com

The Holocaust History Podcast homepage is here

You can find a complete reading list with books by our guests and also their suggestions here.

Waitman Beorn (00:00)
Hello, everybody. Welcome to the Holocaust History Podcast. I'm your host, Waitman Beorn. And today we're talking about two subjects that are really intertwined. One is digital humanities, the use of computing power and various computer-assisted, AI-assisted, large-language-model ways of analyzing Holocaust sources. And then the other thing is testimony, which is the sources

that are being analyzed by our guest today. And I really can't think of a better person to talk to us about the merging of these two ideas, these two topics, than Todd Presner, who is an expert in this and has just written a really fascinating book on the topic. And so Todd, thanks so much for coming on.

Can you start, because I'm fascinated with this too, with how you got involved in this, in this particular way? Because one of the things that comes through really nicely in the book, I think, is that you demonstrate your mastery of a lot of really complex computer topics, software topics, programming issues, but also an understanding of and sensitivity to the history as well. So, I mean, that's just a really interesting combination.

Todd Presner (01:22)
Thank you, I really appreciate it. So, you know, I'm a scholar of German Jewish history and Holocaust studies, and it's something that I've been working on and studying for really the whole scope of my career, about 23 years at UCLA. It's also a time when I became very interested and involved in what we now call the digital humanities, which is basically the bringing together of computational tools and methods with humanistic inquiry.

I started working with the USC Shoah Foundation and was very interested in the indexing systems that they were developing to make the testimonies searchable and useful for listeners and educators and researchers. One of the problems that this testimonial archive has, and many others as well, or maybe "challenges" is a better word, is the sheer size. We're talking now about 55,000 testimonies in 40 languages,

over 120,000 to 130,000 hours of testimony. So it's a massive archive, and that's just one of hundreds, if not thousands, of different Holocaust archives around the world. And so a single listener can't possibly take in all these testimonies, can't possibly analyze them one by one, as much as they demand to be listened to one by one. And so the question really becomes: are there ethical, responsible, and, I'd say, historically

grounded ways of using computational tools to help us not only search within the testimonies, but maybe listen and discover things that we wouldn't otherwise be able to without those tools? And so that's really the basis of how I started to develop this idea of what I've been calling an ethics of the algorithm: really responsible ways of using computational tools to delve into massive digital archives like the USC Shoah Foundation's.

Waitman Beorn (03:51)
Yeah, I mean, that's a really great introduction to both your project and digital humanities, I think, at a certain level. And we'll talk a lot about scale, probably, in this conversation, because one of the things that the digital world allows you to do is handle, as you suggest, scales that are unwieldy or impossible for one human brain to handle at a time.

Before we get into the meat of the project, for those of our listeners who may not be tech savvy or even digital humanities savvy, can you talk about what an algorithm is? What are the parts of it that we're dealing with here? How does that function?

Todd Presner (04:41)
Right, so an algorithm, I guess the most shorthand definition would really just be a set of instructions for a computer to do something, usually a set of calculations. The word algorithm now has, I'd say, widespread use in society. We talk about the Facebook algorithm or the Google algorithm. And often there are algorithms, of course, behind what we see on Facebook,

recommended to us, for example, based on a whole series of parameters, you know, things that they've determined are relevant for our profile and things we may want to see. Same thing with how PageRank works in Google: when you search on something, there are algorithms that determine what you're going to see. And that's, again, a set of calculations being made. And those calculations can change over time; the algorithms can be fine-tuned, they can be adjusted. But at the end of the day, we're talking about really

computational processes that give some kind of result. Algorithms can simply do basic things like count: you can count the number of words or co-occurrences of things. You have search algorithms that allow us to more or less understand a set of sources, and those algorithms may rank them. So algorithms can rank, they can count, they can optimize traffic directions.

To figure out the best way to get from point A to point B, I put it into Google Maps and I get a route. Again, a series of calculations had to take place, many of which I'm not privy to, because those algorithms can be guarded by corporate interests and things like that. I'm interested in using, I would say, algorithmic thought and algorithmic approaches more broadly.

But also, again, it goes back to that question of responsibility: trying to use algorithms not to predict things or, obviously, to do really bad things like policing or profiling, but to see if they can change the way we attend to certain things in the cultural record. So, again, a kind of enhanced thinking that could come through the use of algorithmic methods: calculations,

quantification, the use of these kinds of tools to help us understand something about culture, society, history.
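As an editorial illustration of the "algorithms can count and rank" point above: a few lines of Python that count word co-occurrences in text fragments. The snippets, window size, and function name here are invented for illustration; this is a sketch of the general technique, not code from Presner's project.

```python
from collections import Counter

def cooccurrences(texts, window=5):
    """Count how often pairs of words appear near each other (within `window` words)."""
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        for i, w in enumerate(words):
            for other in words[i + 1 : i + window]:
                if other != w:
                    counts[tuple(sorted((w, other)))] += 1
    return counts

# Invented stand-in fragments, purely for illustration:
snippets = [
    "we hid in the cellar for three days",
    "the cellar was dark and we hid there",
]
counts = cooccurrences(snippets)
print(counts.most_common(3))
```

Counting and ranking like this is the humble core of the "algorithm" being discussed: a defined series of calculations, nothing more.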

Waitman Beorn (07:22)
And I'm guessing, maybe I'm wrong, but algorithmic processes are not the same as AI. And of course, AI is a buzzword in all kinds of ways now, and of course there's generative AI as well. Is it fair to say that they're not the same thing, that an AI might use an algorithm, but that an algorithm itself is not a learning process?

Todd Presner (07:50)
Right, right. So algorithmic thinking, even the term algorithm, has been around for centuries and really just refers to algebraic calculations. So, you know, they don't even need computers per se. I mean, at the end of the day, we're talking about counting. And so on some level, AI is based on, you know, many algorithmic calculations. Certainly when you're talking about a large language model, in order for it to function effectively

and produce accurate probabilities, which is basically what it's doing: it's the probability that a particular word or token will come next. A whole sequence of calculations is happening, at the level of billions of calculations. And so it's certainly an algorithmic process. And the other thing that happens with these kinds of algorithms, at least machine learning algorithms, is they become better at modeling the content that they have. They essentially are learning algorithms, right?

They get better at understanding whatever their task is. And so in many ways we see this with, you know, ChatGPT, right? The first versions didn't work so well; they didn't make a lot of sense grammatically and so forth. But now we have language models that produce pretty coherent, I mean, very coherent sentences and outputs. And so they really, again, get better at their function. So yeah, I think the basic sense is yes, AI relies on algorithms.

Waitman Beorn (09:26)
Yeah. So moving slightly on from that, because, as you say, the title of your book is The Ethics of the Algorithm. And the first chapter is really a discussion of the ethics, or lack thereof, or the fact that algorithms can be neutral or not. I mean, can you talk a little bit about that? Because, as you point out in the book and as listeners might surmise, algorithms are created by people, ultimately. And so

you know, we are always at risk of importing our own biases, et cetera, into them. But can you talk about what you mean by an ethical algorithm, or even the search for an ethical algorithm, even if an ethical algorithm is possible?

Todd Presner (10:13)
Right, right. That's a really important point. And so I think many of us have experienced, or we've maybe read about, what we might call unethical algorithms, algorithms that my colleague Safiya Noble talks about as algorithms of oppression. So this is largely algorithms that are making non-consensual profiles of people, to maybe optimize, you know, roundups or policing or surveillance or what have you. And so, you know, this is kind of the

dialectical underbelly of any kind of computational approach using algorithms. And so it raises ethical questions, right? Even in the history of the Holocaust, and this is something I deal with in the second chapter, I want to talk about the fact that computational approaches, in this case using punch card systems, but also counting, data collection, data visualization, were essentially algorithmic approaches to people, right? The history of eugenics and

biostatistics goes right up to registration forms, censuses, and roundups; these were all algorithmic approaches. So the potential of algorithms to do unethical things is even rooted in the history of the Holocaust in very fundamental ways. We know this from the very famous book IBM and the Holocaust: again, that complicity between technology, a corporation involved in computation, and really the carrying

out of the dehumanization, the workforce roundups, and the violence of the Holocaust. So I would just say that this question of ethics has always been with us when we're thinking about algorithms. The question for me is: are there ways of thinking about algorithmic approaches that are not unethical, right? Are there ways of thinking about how computation could extend what we recognize or what we do or what we attend to and how we listen?

That's, I think, a really interesting challenge for us. Are algorithms by definition always oppressive? I don't think so. And in fact, the whole book argues that there are ways of thinking about computational, algorithmic processes that are not only responsive and responsible but can help us listen and understand in new ways. And that's really what I'm trying to get at with an ethics of the algorithm. And I'm also referencing ethical traditions here; I mean, it goes really back to Aristotle, who first talked about ethics in terms of

character and behavior and decision making. You might even say the book title, you know, Ethics of the Algorithm, could also be Character of the Algorithm, could be Behavior of the Algorithm, could be Decisions of the Algorithm, all of which then raise these really interesting questions: okay, how do we program a character into an algorithm? How do we look at its behavior? How do we assess the decisions that an algorithm is making? Which is really interesting, right? Because humans are not

the ones making decisions anymore, especially with regard to, again, a search query I put into Google; there's no human telling me this is the most important thing. The algorithm is deciding certain things. And so we really need to have, I think, an ethics of algorithms precisely because of the potential for violence, biases, misinformation. And when we really think about what it would be like to develop an ethics of algorithms, I think we can think about values and principles

Waitman Beorn (13:28)
Thank you.

Todd Presner (13:42)
that we think are important for the decision making, the character, and the behavior of these technologies.

Waitman Beorn (13:50)
Yeah, and I think that's really interesting, because at one point in that discussion you address the issue of whether you could program a morality variable into an algorithm, you know, one that would make moral, ethical decisions. And it's really interesting because, ultimately, if I'm reading you correctly, the answer is no, because that's something that only a human, with human consciousness and experience,

is able to answer: the should-we-or-shouldn't-we, you know, the Jurassic Park question, right? Sure, we can do this thing, but should we? Is that a right reading of that?

Todd Presner (14:30)
That's a correct

reading. In fact, I argue in the book that algorithms make decisions, but they don't have ethical decision-making agency. And when we outsource ethical decision-making agency to algorithms, about what we should and shouldn't do, we're in very dangerous territory. But again, there's a little bit of nuance here, because it's not so much an either-or, either we outsource or we don't; rather, there are decisions that algorithms are making about how we arrange data, how we perceive

the world, how we interact with data, and it's in that regard that we have to really think about the ethical questions. It's not that algorithms can be, you know, moral, immoral, or amoral, although certainly there are, I think, algorithmic approaches that are fundamentally immoral. I think many people would agree, especially if we're, say, outsourcing decisions to kill to an algorithm, like a killer drone, for example. I think

that this is very problematic territory. But if we take a couple of steps back from that, I think, really, again, I want to look at what algorithmic thinking or algorithms can do to help us attend to things that are there in the cultural record that human beings have not adequately attended to. And so this question of attention is, for me, really important. And it goes very much to how language models and AI work,

which is around attention mechanisms: what information do they treat as important and what do they treat as less important? But again, those are also really interesting value questions: who gets to make these decisions? How are these algorithms optimized or programmed to do certain things? Which is precisely the question I want to ask when we have a very large body of digitized archival material and we begin to approach it using computational tools. What does it help us do?
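To make "attention mechanisms" slightly more concrete: real transformers learn billions of such weights, but the basic move, turning relevance scores into a weighting over tokens, can be sketched in a few lines. The tokens and scores below are made up for illustration; this is not Presner's code.

```python
import math

def softmax(scores):
    """Turn raw relevance scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical relevance scores a model might assign to each token:
tokens = ["the", "survivor", "hid", "in", "a", "cellar"]
scores = [0.1, 2.0, 1.5, 0.1, 0.1, 1.8]
weights = softmax(scores)

# Sort tokens by how much "attention" they receive:
ranked = sorted(zip(tokens, weights), key=lambda pair: -pair[1])
print(ranked)
```

The value question in the interview is exactly who decides, and how, that "survivor" should score 2.0 and "the" should score 0.1.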

Waitman Beorn (16:49)
Yeah, and before we get into that, because I want to get to that really quickly, again, just maybe some basic scene-setting for the audience. I promise we're going to get into the Holocaust applicability shortly. But maybe you can really quickly explain to us what a large language model is, because algorithms are one really key component of your work and research, along with your partners and

your students, et cetera, but so are large language models as a tool. So maybe you could talk a little bit about an idiot's guide to a large language model.

Todd Presner (17:24)

Well, I'll certainly try. So one of the interesting things about this book is that it came out right on the edge of when ChatGPT became popularized, in 2023; the book came out in 2024. And to be honest, most of the research happened in the previous five or ten years, when we didn't really have large language models in quite the way, well, in terms of the size, that we do today. We had language models, but they were largely what we

called machine learning models, and these are the kind of models that were used to do more or less natural language search at Google. For example, you put a sentence into Google, "where's the closest restaurant to my house," and it more or less can understand the semantic aspects of the question, the meaning of the question, what's intended by the user. And it's largely because it can process the language into what are called tokens, which are individual components, and it understands the relationships between

those tokens, not just grammatically, but also the context in which the different words function. So, you know, it's about a restaurant, and its relationship to my house, and where's the nearest one; the user wants to know, it's maybe a location-based question. So, with regard to large language models, these are largely using the same technologies that were there even five or eight years ago, but the size is different. So you have really billions and

billions of what are called attention mechanisms and billions of parameters that move through what's essentially a neural network, a model of more or less how the brain works. And the idea is that if you have a large amount of training data, and the training data is pretty much scraped from the web, Wikipedia, books, but even social media and other things,

you begin to create what are called embeddings, which basically capture words that tend to occur around other words. Take the word pineapple: there are certain words that will probably be nearby, like blueberries and fruit and so forth, because these words more or less are connected to it; other words are much further away. This is important because the language model is predicting the likelihood of a next word. There's a kind of probabilistic set of calculations that are being made.


And it's based on these embeddings and, essentially, the attention to certain words that are likely to inflect other words later in the sentence. That's basically the way these early transformers, like BERT, worked, which are what we use in the book. And when I say early, I mean 2018 or 2019; it's not that early, ultimately. But the only difference is that there are simply

Waitman Beorn (20:15)
Thank you.

Todd Presner (20:24)
more parameters in the large language models like, you know, GPT-4. In any case, the basic idea again is next-token prediction: what's the probability of the next word occurring, and does it make sense semantically? And it's based on really billions of calculations that happen more or less in real time to predict logical sequences of words. That's really what a large language model is. And it raises all kinds of interesting questions.

How can these models be used for historical research? How do they create the possibility of misinformation and hallucinations? We can talk about this later, but we're working on a project right now about how AI models answer Holocaust-related questions. Do they answer accurately? Do they have nuance? Are the answers complete? Do they hallucinate? Do they sanitize? And these are critical questions, because people are relying more and more on Gemini and similar models.


So that's what we're currently doing, and it builds directly out of the work of the book.
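The "next token prediction" described above can be caricatured with a bigram counter: count which word follows which, then read off probabilities. Real LLMs use attention over billions of parameters rather than raw counts; the toy corpus below is invented purely for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count, for each word, which words follow it and how often."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for a, b in zip(words, words[1:]):
            follows[a][b] += 1
    return follows

def next_word_probs(follows, word):
    """Probability distribution over the next word, given the current one."""
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

corpus = [
    "the model predicts the next word",
    "the next word is chosen by probability",
]
model = train_bigrams(corpus)
print(next_word_probs(model, "the"))
```

Even at this cartoonish scale, the point from the interview holds: the model has no "thought process," only frequencies turned into probabilities.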

Waitman Beorn (21:59)
And, just out of curiosity, is there a difference in using a model that's trained on, you know, everything versus a model that's trained only on the text of Holocaust testimonies? Because is there a different language, and do the probabilities change about what word might follow what? And is that helpful or unhelpful?

Todd Presner (22:18)
Yeah, absolutely, absolutely.

Right. I think that, well, one, there is a difference, and two, it is helpful. So that's often what's now called RAG, or retrieval-augmented generation. That's a very strange phrase, but it basically means that the model has been fine-tuned to an authoritative set of sources. Rather than everything being equal, because if you just have a generic language model, stuff that's on Reddit is the same as stuff on the Holocaust Museum website, which is

the same as what's in a book, the same as what's on CNN or a social media site. It's a kind of leveling effect, right? So this is not really very good, because it also means that we don't ultimately know what training data was fed into these language models, and it can include some very unsavory, downright racist or white supremacist material. Because that stuff is on the web, and if these companies are scraping it, it's just, everything: we need more data, more data, more data. You're going to

have that really problematic content also contributing to the way these calculations and probabilities are generated. Depending on the weights assigned, it could very much change the outputs. And I think you also see this across languages. So your answers to one query in Ukrainian may be very different than in Russian, may be very different in, I don't know, Mandarin, and different in English. That's actually something else that we're looking into,

because, again, the training data available in a language also inflects the way these models work, even though there is some translation that happens across the models. So the point here is that you're exactly right: if you have a more fine-tuned set of material, like an authoritative archive, maybe of testimonial sources or from a museum or archive, it's likely that the answers you get will be more constrained. And I mean that in a good way, because they're more

precise and, I think, also more authoritative. And you could potentially address this other issue, and I mention this at the end of the book: the idea that you can know the provenance network of the model. Like, you know what it's drawing on. You could even go back and say, it's this testimony, it's this source. Right now we can't do that, right? If you're using a language model, we have no idea what it's ultimately, quote unquote, consulting. There are very few ways

that let you actually trace back how the model arrived at its output. Occasionally you get sources cited now, using Claude or Gemini, but again, it doesn't tell you how the language model came up with the sentences that it did.
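A rough sketch of the retrieval half of RAG, under simple assumptions: rank an authoritative document set against a query, here with bag-of-words cosine similarity (real systems use learned embeddings), after which a generator would be prompted with the retrieved passages. The archive snippets are hypothetical, not drawn from any real collection.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query (the 'R' in RAG)."""
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(d.lower().split())), d) for d in documents]
    scored.sort(key=lambda pair: -pair[0])
    return [d for _, d in scored[:k]]

# Hypothetical authoritative snippets standing in for an archive:
archive = [
    "testimony describing deportation from amsterdam in 1942",
    "museum entry on the liberation of the camps in 1945",
]
print(retrieve("when did deportations from amsterdam begin", archive))
```

Because every answer is grounded in a known document set, this kind of pipeline is what makes the "provenance network" Presner describes at least possible in principle.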

Waitman Beorn (25:16)
But again, I guess this is part of the challenge. And not to go off on an AI tangent, or particularly a generative AI tangent, but ultimately one of the issues is that the AI, in answering your question, is not exactly thinking through the answer. It's just trying, again, as you suggest, to find where that question has been asked before, what the answers were, maybe what sources a book had cited that was somewhat

relevant. It isn't actually doing the work of thinking through how it can answer this question. It's trying to replicate something that already exists.

Todd Presner (25:54)
That's right. That's right. It's basically, again, a massive set of calculations being made to produce an output that may or may not answer the question asked. And ultimately we don't know, if we call it thinking, maybe in quotes, we don't know the thought process or logic that generated that answer. So in the book, part of what I'm calling for as well, with an ethics of the algorithm, is actually the ability to trace back those provenance networks,

for AI systems to actually come forward and tell us what they were trained on, and for us to be able to get under the hood and understand those calculations. I think this is one of the huge challenges for AI, generative AI, and misinformation or disinformation, the deliberate spreading of, say, hatred. And it's all the more risky when these models get integrated into applications you're using, right? So if you're

using X, and Grok, the AI model, is generating responses to things, it's very easy to propagate anything, potentially, without having the ability to understand, one, how the model came up with this, and two, the decisions that may have been made to, say, enhance certain behaviors, to guardrail certain things, to prevent certain things. There's a whole debate going on right now in the tech world about how open

these models should be in terms of their behaviors. Many refuse to do certain things, but it's because humans have created these guardrails, and those guardrails are in place to stop misogynistic, racist, antisemitic language from propagating, although plenty of workarounds exist to create those exact things. But clearly, again, this relationship between ethics, algorithms, AI, this is so timely and so important, and it just keeps

accelerating as these technologies continue to develop and engage with these issues.

Waitman Beorn (28:01)
Well, it's

interesting too, because, you know, I just gave a presentation today to a number of graduate students at my university about digital humanities, in particular the visualization products that I'm working on. But one of the things that I've brought up repeatedly, that I think is actually really good and that a lot of digital humanists do, is, you know, our comfort with expressing our ambiguity, the things that we don't understand a hundred percent.

You know, one of the things that digital humanities, I think, has helped me do as a scholar is to be very present and conscious about recognizing ambiguity in a way that I may not have been, you know, way back when. Historians can often be kind of defensive, not in a mean way, but in a very self-protective way, about not wanting to be too open about what I don't know or what I don't think I know.

And that ambiguity is something that we all live with in the analog historical research world as well as in the digital world. But when you're doing it in the digital world, you're almost, in some sense, forced to be more transparent about it. At least as a scholar, right? Not necessarily as a Grok, but as a scholar. Because you have to put something into the field, and it can't be nothing; it has to be a zero, a one, or something. And then you have to

be able to justify how and why you chose to put that thing in. So I think it's interesting that, of course, you come to the conclusion that this is a model of behavior that would be great if everybody adhered to, or was able to say, look, here's what our model does, or here's what our AI does, and here's what it can't do, and here's what we think it's based on, that kind of stuff. And again, it's something that I think is really important

in doing the work that digital humanists do, which is to be explicit and say, look, I'm 80% sure about this thing, but I'm only 20% sure about this other thing. And then, what does that mean for my work?

Todd Presner (30:11)
This is so important, and thanks for putting your finger on such an important issue. The book I wrote is ultimately a humanities book. It's not a computer science book; it's not an information studies book. It's a humanities book. And so questions of ambiguity, accountability, authorship, decision-making, particularity, nuance, I would say positionality, perspective, contingency, these are all really critical

elements that often are seen somehow to be at loggerheads with computation, and for me it's all about bringing those together. Computational tools are not being used in this book in declarative ways, like, we finally found the answers now that we use computation and we've figured everything out. No, it's an interpretive tool; it's a way of perhaps even locating ambiguity. And then the way you proceed is you have to document the decision-making that was made along the way: the way the data was structured,

the way the visualization was created, the way the code operated, all these things. And so side by side with the book is a set of online resources, including spreadsheets, code modules, visualizations, all kinds of things. Much of the data can also be examined and downloaded. And I think part of that is because of this question, like what you said, of the accountability that comes from being a humanities scholar. And it also means that

we're

going to interpret things in different ways. I'm going to use these tools, and I think, again, I can trace back the decisions, but I can trace them back to sources too. There's no source that's made up in this book. At the end of the day, every testimonial fragment, down to the word or the sound, comes from a primary source, comes from someone who gave their testimony and intended that testimony to be part of the historical record. This is really

different from generative AI, which may not only make something up, but where we can't trace back where it's getting that material. I'm not interested in making things up. I'm interested in using these tools to do real historical interpretive work that's accountable and that will also focus on those things that we don't entirely know: the nuance, the ambiguity, the debates, the problems, the contingencies. That's the interpretive work we're trying to do.

Waitman Beorn (32:42)
And I want to get into that, but I want to take at least a couple of minutes, because you do something that's really brilliant in the book, I think. And it's interesting because just today, like I said, I was talking to students about this digital project, and I showed them the Charles Minard visualization of Napoleon's invasion of Russia, right? For the listeners, it's one of the greatest visualizations in the history of the world, where he shows, on one image,

the size of Napoleon's army going out, overlaid on a map, to Moscow, and then on the way back it gets smaller and smaller. But then he also has the temperature at the bottom. It's a classic, right? And it was done in 1869. There's nothing digital about it, but it's a very digital humanities way of thinking about visualizing a historical event. And I think one of the things that you do in the book, and I just have to have you talk about it a little bit, is, you alluded to it in the beginning,

you talk about the Dutch statistician, the city planner kind of guy, who essentially is a massive collaborator in the Holocaust by not only rounding up the data about the Jews of the Netherlands, but also, you know, manipulating it and providing lots of different ways to look at it. I mean, all things that we do as digital humanists, that you do looking at the testimonies, but he's doing it,

A, obviously for horrible reasons, but also in a completely analog, pre-digital, pre-computer world, calculating by hand for the most part. Can you talk a little bit about that for our listeners? Because I thought that was a really, really fascinating way of starting a book that you sort of expect to jump into, you know, gee whiz computer stuff, but then it starts with the paper.

Todd Presner (34:34)
Right, well it starts with paper and it also starts with a very strong critique about these data collection practices. So this is Jacobus Lentz, who is the Inspector General in charge of the Dutch Census. And so when the registration forms are mandated for Jews to fill out in 1941, those fields and the processing of all that data happens in his office. This is also where they optimize workforces to figure out how many men, how many women,

of a certain age there are, where they live. And they also visualize this data. It wasn't directly his office, but his office worked with the different municipalities and got all the data from them as maps. And so one of the maps that I reproduce, which is fairly, you know, somewhat well known, is essentially a dot density map of where the Jews live in Amsterdam in 1941. But what I'm saying here is that this is an example of data collection, data structure,

data organization, the use of kind of proto-computational tools. They're still using punch cards to process this data. So there is computation happening, just not with the systems that we have today. And there's also computation in terms of what it literally means: counting. So this is what's happening with the census. You're counting the number of Jews, but you're also accounting for them as a workforce and eventually as people to be deported as well.

The reason I deal with this, again, is that dialectical underbelly of computation, especially with regard to the Holocaust. We obviously have this in the history of biostatistics and eugenics, but essentially that's what this is. This is really a biostatistical accounting that reduces people to the barest set of data to describe them. I call this bare data, referencing Agamben's work on bare life, because the reduction of people just to a set of data points that you can do calculations on is something

that really was a critical condition of possibility of the Holocaust. I mean, Zygmunt Bauman also gets at some of this in a slightly different way in his famous book on the Holocaust and modernity, connecting it to the processes of bureaucracy, of counting, and of systems that, again, are just radically dehumanizing. The reason that this is so important, again, is, one, these kinds of systems of dehumanization

really demand being, I think, a dialectical thinker with regard to these technologies. For me, it's again trying to see where that inversion point is. Is there a possibility where we can think about or imagine practices of, say, data collection and data analysis, of listening and reading, of understanding patterns and relationships, of making visualizations and mappings, that will help to humanize, that will help to enhance

Waitman Beorn (38:16)
Yeah, I mean, and that's what I want to get to now, because you provide a number of different ways of sort of slicing the cake and looking at testimony and analyzing it. It does what great digital humanities does, which is sometimes it answers a question, but more often than not it raises questions that we didn't know were questions, so that we can sort of see them.

Can you talk about maybe some of the different specific things that you're doing with the testimonies? Because as you point out, it would take something like 30 years, watching 12 hours a day, for one person to watch them all. So it's an impossibility that one person could ever sort of wrap their brain around it. So can you talk a little about maybe some examples of some of the stuff that's in the book that you do with the data and how you visualize it?

Todd Presner (39:44)
oral histories in 1946 with survivors. And of course, Alan Rosen wrote, you know, really the definitive book on these testimonies and that work. But what I do in my book is actually show that Boder was also doing something else after he recorded these testimonies. That is to say, he was a scientific psychologist and a linguist who was not only creating an archive, but was figuring out, like, how do I structure an archive like this? How do I mark it up? How do I make it accessible? Right? And then how do I analyze it?

which is really interesting. So Boder in many ways is a kind of digital humanist avant la lettre for me. He didn't have computers in the sense, again, of the large machines that we have today, but what he was doing with his team, and they were a collaborative team, was essentially digital humanities: very much interdisciplinary, working across the social sciences, the humanities, what I would call archive studies, certainly history, trauma studies. He was trying to figure out the

specificity of particular words used at moments in the narrative that may indicate, say, moments of trauma or moments of relief or moments of expressivity that he thought were very interesting or important for understanding how people put trauma into words. So Boder is kind of like the first one that I talk about in there. I talk about his analytical work, the team's work, and I do some analytical work also on Boder's archive. And, you know, I guess one of the projects I mentioned

because of Boder here is that I was really interested in the questions that Boder himself asked the people he interviewed. And it turns out he asked more than 16,000 questions of 120 people. And I thought, that's so interesting. I mean, he was really asking and interrogating, and I mean that in a light way, interrogating, asking questions about these experiences at a time when we really didn't have testimony archives,

and were not quite understanding them yet. And the fact is he's also interviewing in nine different languages. And so there's a really interesting code switching that happens in many of the testimonies, differences across languages and even expressivity. So that's one example of what we do. This question around actually asking what people asked Holocaust survivors, like what questions did they ask, becomes like a huge and important project in the book.

We look at 90,000 questions that have been asked of Holocaust survivors, from 1946 through the Fortunoff Archive to the USC Shoah Foundation, and finally the Dimensions in Testimony project, which is not traditional interviewers, of course; it's more the public asking questions. But that too is really interesting, because the priorities and the way people ask questions, what we privilege, what kind of questions we ask of survivors, is also going to impact what kind of answers we hear, obviously,

and

testimony. And it turns out those questions and priorities have actually changed over time. And that's one of the things that we do in the book, by looking at 90,000 questions and grouping them together and showing certain trends and disparities and things like that. That's just one example of how you could use digital humanities, again, to kind of begin to ask comparative questions, both within a particular archive as well as across archives.
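The grouping-and-trends analysis Presner describes can be sketched in a few lines. The following is a minimal illustration with invented toy questions and crude keyword buckets, not the book's actual pipeline; the category names, question lists, and keywords are all hypothetical stand-ins for a real topic model run over tens of thousands of interviewer questions.

```python
from collections import Counter

# Hypothetical toy data standing in for interviewer questions from two
# eras of testimony collection (e.g. Boder 1946 vs. later archives).
QUESTIONS_1946 = [
    "Where were you when the war broke out?",
    "What did you eat in the camp?",
    "How did you travel to the camp?",
    "What work were you forced to do?",
]
QUESTIONS_1990s = [
    "What message do you have for future generations?",
    "How did your faith change after liberation?",
    "What would you like young people to learn?",
    "Where were you when the war broke out?",
]

# Crude keyword buckets standing in for a real topic model.
CATEGORIES = {
    "experience": {"eat", "work", "travel", "camp"},
    "reflection": {"message", "faith", "learn", "generations"},
}

def categorize(question: str) -> str:
    """Assign a question to the first category whose keywords it mentions."""
    words = {w.strip("?,.").lower() for w in question.split()}
    for name, keywords in CATEGORIES.items():
        if words & keywords:
            return name
    return "other"

def category_counts(questions):
    """Tally how many questions fall into each category."""
    return Counter(categorize(q) for q in questions)

early = category_counts(QUESTIONS_1946)
late = category_counts(QUESTIONS_1990s)
```

Comparing `early` and `late` then surfaces the kind of shift in interviewing priorities discussed here: the toy 1946 set skews toward concrete experience, the later set toward reflective questions.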

Waitman Beorn (44:10)
Well, it raises a good point, you know. I had Hank Greenspan on here earlier, and I talked to Noah Shenker, and it's interesting that you're coming at, you know, an important, I think a really important takeaway for the public, for general folks, which is, again, that archives are not neutral and interviews are not neutral, in the sense that what the interviewer asks

helps to create the document, right? It helps to create the source, in a way that, I mean, all sources are created for someone, by someone, but while a Holocaust survivor speaking has a certain truthiness to it, it's also been elicited or solicited by the interviewer. And the archive in a certain sense is also directing that as well. And I wonder if you could talk maybe a little bit about

how you can visualize that. Because I think some of the work that you've done shows, and again, this isn't a critique of any particular archive because they all have different goals. I'm not suggesting one is better than the other, but one of the things I think you can do with your visualizations is show the ways in which archives sort of have their fingers on the scales in certain senses about what gets talked about.

Todd Presner (45:34)
That's right. That's right. Yeah. And thank you for referencing Noah Shenker, you know, really phenomenal work, and of course the foundational work of Hank. You know, this is an opportunity, I think, to really think about the ways in which archives play a critical role in both preserving testimony, but also creating that testimony and producing, I guess, the conditions by which we can access and make sense of it. For me, one of the things we do in the book is look at indexing systems. And that's really such a critical part, because,

again, the only way really to search within testimonies, especially because most of them have not been transcribed as of yet, is through essentially the metadata associated with the testimony. So the topics, generally those are topical things like places mentioned, geographic locations tend to be the most frequent, names also, events, things like that. And Boder was actually one of the first to develop a series of indexing systems to make sense of these early

1946 testimonies, phenomenally interesting and important work, which is like, how do I create an ontology to allow people to understand what themes and topics come up in this nearly a million words, which is what he has at the end of the day, of recorded testimony that he transcribed. So take that to the next level with the Fortunoff Archive or the Shoah Foundation or the Holocaust Museum or Yad Vashem's records, and you have really millions and millions and millions, tens of millions of words. And I wanted to

understand really those critical verbs, which often are not in indexing systems at all. One, because they could surface what I call in the book micro-expressions of resistance, but also because they can be used to understand, you know, coercive language. They can be used to understand what people did, what they experienced, what they remember was done to them. And I thought it would be really interesting to produce, I wouldn't call it quite a counter-index, but an additional index

on top of the topical indexing, one that would focus on expressions of agency. And so that's an example where, again, it's not a critique, per se, of the archive, but it's a way of turning these testimonies over in some new ways, using computational tools to extract these expressions of agency or action that people felt they wanted to talk about in their testimonies. That's, I think, a very concrete way that we use some natural language processing tools

at scale to produce some new information and insights about expressions and experiences that haven't been previously indexed. They'd only be findable if you happened to land on that part of a testimony and happened to be listening to it; if you were searching, they wouldn't be findable otherwise.
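The agency-index idea can be illustrated with a toy sketch. A real pipeline would run a part-of-speech tagger (e.g. spaCy) over full transcripts at scale; here a small hand-picked verb lexicon, invented testimony snippets, and made-up testimony IDs stand in for all of that, purely to show the shape of the resulting index.

```python
import re
from collections import defaultdict

# Hypothetical lexicon of "agency" verbs; a real system would derive
# candidate verbs from POS tagging rather than a fixed list.
AGENCY_VERBS = {"hid", "smuggled", "refused", "traded", "escaped", "shared"}

# Invented snippets keyed by hypothetical testimony IDs.
testimonies = {
    "T001": "I hid my sister under the floorboards. We traded bread for news.",
    "T002": "He refused to hand over the list. Later we escaped through the fence.",
    "T003": "The guards counted us every morning.",
}

def agency_index(corpus):
    """Map each agency verb to the set of testimony IDs where it appears."""
    index = defaultdict(set)
    for tid, text in corpus.items():
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in AGENCY_VERBS:
                index[token].add(tid)
    return index

idx = agency_index(testimonies)
```

The point of the sketch is the inversion: instead of topical metadata ("camp", "liberation"), the index keys are verbs of action, so a researcher can search directly on what survivors say they did.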

Waitman Beorn (48:53)
Yeah, I mean, this is a great point. And again, it bears repeating for those listeners who may not be super familiar: at the Shoah Foundation, the indexing was done by people, you know, it was ultimately done by hand. So you have a human being who has to listen to a testimony and then decide, in this block of time, what did they talk about? And then maybe match it up to a set of keywords that they've been given, or perhaps suggest their own. But

this is all mediated by a person who could be tired, or could know a lot about the Holocaust or nothing about the Holocaust. And so, while I've used the Shoah Foundation frequently, I think the index, particularly when you first start using it, gives a false sense of security: you search for the keyword and assume you're going to come up with exactly all of the testimonies that handle that particular area. But you don't.

Because how could you? Because again, this is done by people.

Todd Presner (49:56)
Right, exactly. And so, I mean, this is, again, it's really interesting to see what was heard or privileged over time and what was maybe not heard or not privileged. And it's also a function of many things, right? The training of the person doing the indexing, a function of maybe certain priorities that we have about what we expect or want to hear. But it turns out there's a lot more in these testimonies than just what's indexed. One of the interesting things we do in the book is also look at the history of the indexing system

and at what was not heard adequately and may actually be there, again, depending on what the indexer is attuned to.

Waitman Beorn (52:03)
Yeah, I mean, and this is also really one of the things that you do that I think is really neat, which is visualizing, again, this speaks to what we talked about earlier, kind of visualizing ambiguity, but visualizing silences. You talk about several of these things; you also look at sound. So you're literally visualizing and analyzing how loud and soft people are

when they're speaking, and then suggesting, you know, and I think one of the things that's great again about this sort of distant looking is that, as you suggest, you're not in the book and in the project saying, we analyzed 16,000 testimonies and we have concluded that. It's much more: we've analyzed 16,000 testimonies, we've identified these interesting patterns, and if you want to figure out why these patterns are the way they are,

you probably have to go back and actually listen to some of the individuals to try to figure out, you know, what's going on there.

Todd Presner (53:38)
that then can identify, say, patterns, variations, something that demands that we go back and say, what's going on here? Like, let me actually listen really carefully. Let me go back to the testimony. The computer can't tell me what it means, right? So if there's a change in pitch, a long period of silence, for example, a code switch, you know, moving from one language to another, certain paralinguistic cues like maybe stuttering or what have you, sure, this may mean that that's a particularly difficult
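The silence detection behind this kind of audio analysis can be sketched with windowed RMS energy. The sample rate, amplitude values, and threshold below are invented toy numbers; a real pipeline would read actual waveform data with an audio library rather than a hand-built list, and this is not the project's actual code.

```python
import math

# Synthetic amplitude samples standing in for a testimony audio track:
# 3 seconds of "speech", 4 seconds of near-silence, 3 seconds of "speech".
SAMPLE_RATE = 10  # samples per second (toy value)
samples = [0.5] * 30 + [0.01] * 40 + [0.4] * 30

def rms(window):
    """Root-mean-square energy of a window of samples."""
    return math.sqrt(sum(x * x for x in window) / len(window))

def find_silences(samples, rate, window_s=1.0, threshold=0.05):
    """Return (start_sec, end_sec) spans whose RMS falls below threshold."""
    win = int(rate * window_s)
    spans, start = [], None
    for i in range(0, len(samples) - win + 1, win):
        quiet = rms(samples[i:i + win]) < threshold
        t = i / rate
        if quiet and start is None:
            start = t                    # silence begins
        elif not quiet and start is not None:
            spans.append((start, t))     # silence ends
            start = None
    if start is not None:
        spans.append((start, len(samples) / rate))
    return spans

silences = find_silences(samples, SAMPLE_RATE)
```

The detector only locates the spans; as the conversation stresses, deciding what a four-second silence means still requires going back and listening.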

Waitman Beorn (54:58)
Well, I think one of the things that comes across really clearly, and part of this is bleeding over from the last Lessons and Legacies, where I went to a panel discussion, which clearly had some of your students and folks in it as well, talking about doing this kind of work, is that when you do the sort of large-scale pattern search, when you're looking for patterns, one that I remember seeing a visualization of was essentially:

when do people talk about certain topics in the chronology of a Shoah Foundation testimony? And because the testimony is structured as an interview, with a beginning and middle and end and sort of questions that guide that, there is a general arc to most testimonies that you would expect to see, because the interviewer is asking certain questions in the beginning, more questions in the middle, then certain questions at the end. And so what was actually interesting in that visualization was not the general trend.

It was the ones that are outliers from the trend. Like, why is this person starting out talking about Auschwitz, for example? And again, you don't know the answer. The visualization and the calculations haven't shown you the answer. What they've said is, hey, scholar, you might be interested in looking at this testimony and figuring out why this person is so different from what we might expect in the rest of them. Which I think is, you know, a perfect example of sort of the value

of digital humanities and sort of working at scale.
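The outlier-flagging step described here is simple to sketch: record where in each testimony a topic first appears (normalized from 0 at the start to 1 at the end) and flag testimonies far from the corpus norm. The testimony IDs, positions, and z-score cutoff below are invented for illustration, not drawn from the actual archives.

```python
import statistics

# Hypothetical normalized positions (0 = start, 1 = end of testimony) at
# which each interview first mentions a given topic, e.g. "Auschwitz".
first_mention = {
    "T01": 0.55, "T02": 0.60, "T03": 0.52, "T04": 0.58,
    "T05": 0.61, "T06": 0.57, "T07": 0.05,  # T07 opens with the topic
}

def outliers(positions, z_cutoff=2.0):
    """Flag testimonies whose first mention sits far from the corpus norm."""
    mean = statistics.mean(positions.values())
    sd = statistics.pstdev(positions.values())
    return sorted(
        tid for tid, p in positions.items()
        if abs(p - mean) > z_cutoff * sd
    )

flagged = outliers(first_mention)
```

As in the conversation, the computation stops at "hey, scholar, look at T07"; explaining why that survivor begins with Auschwitz still means returning to the testimony itself.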

Todd Presner (58:00)
interpret it, how to make sense of it, and that to me is where the interesting humanities questions come in. And that's what I'm trying to model in the book, is like examples where we do that work.

Waitman Beorn (58:09)
Yeah, absolutely. I mean, one of the things that comes through, you know, is that you're doing large-scale computations on large sets of data, but ultimately sort of those visualizations often are creating a background for looking at individuals, and returning us back to the fact that this is an individual story that's unique to that individual in some way, shape, or form. And that if we really want to answer the questions,

you know, the visualization can suggest questions or suggest areas of research. But ultimately, at some level, we're also still required to do the same kind of research that we would do in the analog world, you know, looking at the testimony, reading the memoir, understanding the historical context of the event. It's not the sort of ChatGPT, write-my-research-paper-for-me

kind of research, right? Which I think is important, because, you know, there is the danger, I suppose, that you talk about, and other people as well, of flattening, you know, flattening a historical event or historical testimony into sort of a, here's a standard Holocaust, here's a standard Shoah Foundation testimony. But I don't think you do that, and certainly the visualizations and the work in the book don't do it.

But I wonder, as we sort of move towards the end, if you can talk a little about what the challenges are, or what the dangers perhaps are. Because clearly this is in responsible hands, in your hands, you know, and in other people's. There was an interesting talk, again at Lessons and Legacies, the Holocaust conference, and one of the groups that was presenting about their work said, look, we don't provide our corpora of testimonies to, like, the internet.

You know, it's firewalled off so that our large language model only works on them, but it doesn't share it with the rest of the world. Right? So it's not put out there in the public world, where another ChatGPT or another large language model could, for example, create a false testimony based off of, you know, real testimony. So what do you think some of the challenges and opportunities are, I mean, let's go positive and negative, of sort of future work moving forward in this area?

Todd Presner (1:01:35)
orient our listening or attend to certain things in the corpus that we haven't either fully appreciated or found or heard or known about. And even ways of analysis that could be really interesting. I didn't mention the example of what we did with Mala Zimetbaum, which is like a testimonial ensemble, I call it, where we pull many different snippets from different testimonies, actually several dozen testimonies, all of whom, these different women and men, provide information about who this aid giver in Auschwitz, Mala Zimetbaum,

particular things, about why Jews were rounded up, on the basis of what, to the meaning of the Holocaust, and so forth. The point here is that if those questions are answered by the Holocaust Encyclopedia, the students are going to learn from an authoritative source. If those questions are answered by a language model, by AI, they may or may not get the same thing. There may be a lack of nuance. There may even be questions of the problems of factuality

of very authentic important material out there, but sometimes that material is not being consulted or at least not being used by the general public in the way that it could be. And more and more, as people rely on AI, we do need to assess what's working. And it's not to say that everything's bad, but when it's not working, it is bad. And we need to know about that.

Waitman Beorn (1:05:28)
Yeah, I mean, it's funny, because a while back there was somebody who was trotting out a sort of talk-to-an-historical-person AI. You know, of course, I'm a Holocaust historian, so I said, I want to talk to Himmler. And the question I asked him was, like, you know, do you feel bad at all about the Holocaust? And of course, the chat response was something, you know, like, well, you know,

actually it was kind of regrettable. And it's like, he would never say that. You know, it's just a horrible, a horrible use of that sort of thing. And then I think back, and this gets to another danger that you sort of talked about, which is the danger of sort of flattening. And I remember, this is many years ago, when I first started using the Shoah Foundation,

as a grad student, I was a bit frustrated because they didn't have transcripts. And, you know, I just wanted to look at the transcript. I just wanted to read it, because I wanted to search a PDF and find what I was looking for. And of course, I was still very much a learner at that point. Looking back on it, I asked them at the Shoah Foundation way back when about it, and they said, well, actually, we don't want to have transcripts for the most part, because we want people to watch the testimony.

You know, it didn't really make as much sense when I was first getting into the field, but of course it does, because there's a completely different interaction with the source when you're watching it individually, watching the human being, than when you're just doing a text search on a document. And so I could see that one of the dangers, perhaps, of even an improved index, where, you know, presumably you can use

voice-to-text, and you could eventually, at some point, create very good transcripts of every single testimony that could then be searchable by word or phrase. And while on the one hand that might be great, because you could really perhaps identify testimonies, it might also lead to the problem of people just using the text and not even situating the phrases or the paragraph in the moment in which the survivor is talking.

Todd Presner (1:07:52)
That's exactly right. I mean, in our kind of hyper-computerized world, there's a sense of quickness, right? You just want instant answers. You just want to search and get the answers. But what testimonies demand is actually slowness, right? They demand attentiveness. They demand taking the time to listen to the voice and to watch. And so this book is not about that quickness; it's very much about the need to still do that.

Waitman Beorn (1:09:19)
And of course, also, you know, history marches on as well in terms of the discipline. And so there are questions that historians and scholars are interested in now that they weren't interested in, or weren't actively interested in, 20 years ago, for the most part. And so, again, alongside that, you're always going to have new approaches and new perspectives on the testimonies, just based off of, you know, the good kind of historical revisionism, which is us just revising the way we think about the past

and analyze it. Before we get to our final question, I just want to ask another one, which is much more of a process question, because clearly one person cannot encompass all of the skills involved in doing the projects that you're doing. You're very clear in the book about attributing work to other folks.

I'm just curious how that works in general. Because, you know, I've worked on my own projects with people that are experts in things like digital modeling, or an architect working in CAD, and we all have different expertise, but we have to figure out a way to bring it together, and we end up explaining ourselves to each other in the process. So can you talk a little about the team that you have and sort of how that works in practice, with programming and coding and all that other stuff?

Todd Presner (1:11:59)
and they were working with transcripts, they were working with data sets, they were working with spreadsheets, they were working with code, and I thought, we actually have to work with the human beings that are still with us as well. We have to listen to them. And it was very formative, I think, for my students to be able to have that opportunity to talk to Holocaust survivors in 2018 and '19 and to work in this lab, which was much more computationally inflected. It was, again, the fact that

different kinds of approaches bring, you know, particular kinds of expertise. And my hope is that as the field of Holocaust studies continues to mature in a way that uses digital or computational tools, whether it's through modeling, through mapping, through language processing, potentially through AI, we're going to really see, I think, a flourishing of these new mixed methods and potentially, you know, very new questions that help us understand parts of the historical record, perhaps

Waitman Beorn (1:14:55)
But one of the things that I think is really great, as we sort of close, before I ask the final question, is that I often find digital humanities is a very generous discipline, in that, you know, we create or come up with ways of looking at stuff and then say, hey, you might want to look at different sources using our methods, or might want to look at our sources using different methods,

or might want to use our method to look at something completely different. And that's okay. It's not a secret, it's not a copyright, it's not a trademark; that's not what we're here for. It's just, here is a tool, here's how we used it. You might use it in a different way or on a different topic or whatever. And that's okay, because it's all about the sort of thesis, antithesis, synthesis of, we look at things differently and then

we come to sort of a conclusion about how we might use them. But I want to close, because we've taken up a lot of your time already, and it's been an amazing conversation. I just want to close with our question, which is: what is one book on the Holocaust that you would recommend? And again, you know, it can change tomorrow, but in this moment in time, what would you recommend to our listeners?

Todd Presner (1:16:01)
Thank you, Waitman, for your generosity and thoughtfulness.

Waitman Beorn (1:17:58)
Well, thanks so much. And for everybody else, as always, if you're finding these conversations as fascinating as I do, please leave us a comment, a like, a subscribe, whatever; it all helps. And again, Todd, thank you so much for coming on and talking about this topic.

Todd Presner (1:18:15)
Thank you so much for having me. It's been really, really wonderful.

