The Systems Linguist: How Mapping Data, AI, and Language Builds Smarter ResearchOps
by Shivanjali Mishra

The ResearchOps Review is brought to you by Rally—scale research operations with Rally’s robust user research CRM, automated recruitment, and deep integrations into your existing research tech stack.
About six years ago, I transitioned from academia to UX research. While in academia, I spent seven years studying linguistics, including a master’s in clinical linguistics and psycholinguistics. During that time, I learned how languages are wired in our brains and how to assess language disorders and design targeted interventions to treat them.
Given my background, I think a lot about research systems and knowledge management—and working on a research repository in some capacity has been a consistent item on my to-do list for more than four years (almost as long as I’ve been involved in user research). In that time, I’ve set up research repositories in four very different tools (in terms of features and popularity), then reevaluated and offboarded those tools, and finally settled on one that seems to be sticking. And it’s sticking, not because of its features, but because of one key insight that changed everything.
That insight? Easy access to relevant and accurate user insights isn’t about finding the perfect software. It’s about embracing something more fundamental: A research repository isn’t just a tool; it’s a linguistic system.
If those two words—linguistic system—just muddled your mind, don’t worry. By the end of this article, you’ll know exactly what they mean and why they’re important. I posit that the modern ResearchOps professional must become a systems linguist: someone who can map an organization’s hidden data architecture, translate between different functional “languages,” and build truly integrated research systems. Given my formal training as a linguist, this article isn’t just a theoretical musing. It presents a practical framework that has transformed the way I approach every aspect of research operations, from tool selection and stakeholder engagement to preparing for an LLM-enabled future. And I believe this approach can transform how you manage research knowledge, too.
The Source of Meaning
First, let’s look at what makes a research repository unsticky. Even though a repository might have promising features, getting people to regularly use it—and the information stored in it—as a functional tool can be challenging. On the surface, the challenge may seem like an onboarding problem, but it usually turns out to be deeper than that. I’ve learned that research knowledge can’t be treated as distinct from organizational knowledge and context; it must be seen as one and the same. Additionally, most companies aren’t leveraging the many sources of user voice they have available—social media mentions, support tickets, survey answers, interview transcriptions, and more—to support decision making. (More on this in a moment.) These realizations fundamentally changed what I was building and, therefore, the questions I asked. Instead of asking, “Which repository tool has the best features?” I asked:
Who needs to find answers, and how do they naturally look for them?
What questions are they asking, and in what language?
Where does relevant data already live in the organization?
How can we make these disparate sources speak to each other?
These aren’t tool-selection questions; they’re systems questions. And more than that, they’re linguistic questions. They’re about understanding how meaning flows through an organization, how different groups encode and decode information, and how we can build translation layers between them.
This mirrors a fundamental principle in linguistics: meaning emerges from systems, not isolated units. In other words, meaning is relational, not inherent. For example, in a medical context, the word “sick” means unwell (I am sick), but “sick” also could be used to communicate awesomeness (That painting is sick!), or to express frustration (I’m sick of this traffic!). A word doesn’t mean anything by itself; it means something in relation to other words, in a particular context, used by particular people for particular purposes. The same is true for organizational knowledge.
Diagnosing the Language Problem
The role of a clinical linguist involves observing how language breaks down. You use the tools of linguistics to dissect the unique pattern of communication, move from surface-level symptoms to a theoretical understanding of the underlying breakdown, and, in turn, drive precise and effective interventions.1
When I first became a ResearchOps professional, I couldn’t help but see similar patterns. I saw communication break down between teams due to incoherent processes and disconnected goals. Often, different teams were simultaneously and separately trying to solve the same problems, and research insights weren’t reaching the teams that needed them, not just because of workflow issues but because of siloed communication.
The real breakthrough came when I realized that teams weren’t just using different terms to describe similar meanings; they were operating with entirely different grammars to communicate about data, processes, and even value.2 And by grammars, I mean unwritten, foundational rules, structures, and logic. For instance:
Product teams speak in the grammar of metrics, outcomes, and user flows. They need research insights that are tied to key performance indicators (KPIs) and roadmap decisions.
Design teams speak in the grammar of experience, pain points, and journey maps. They need insights rich in context and emotional resonance.
Engineering teams speak in the grammar of constraints, requirements, and edge cases. They need insights that are specific, actionable, and technically grounded.
Leadership teams speak in the grammar of strategy, market positioning, and business value. They need insights that connect customers’ needs to a competitive advantage.
These grammars aren’t wrong per se; they just vary depending on the role of the insights consumer. But when a research finding is documented in only one grammar—say, the grammar of a product manager—it becomes invisible to everyone else. This isn’t a training problem or a process problem, or even a tooling problem; it’s a translation problem.
This is where an understanding of linguistics comes in handy. Linguistics gives you frameworks for understanding how language works across three dimensions:
Syntax (Structure). How elements are organized and related to each other.
Semantics (Meaning). What those elements signify.
Pragmatics (Use). How meaning changes based on context and who’s communicating.
You needn’t have a master’s degree in linguistics to leverage this understanding. You can start applying a linguist-informed approach by exploring the following questions across dimensions:
Syntax. Consider all the sources of customer feedback your organization collects, such as customer support tickets, feedback channels, and social media, and identify the patterns among them. Questions to ask:
How is data currently organized in your systems?
How are insights tagged, categorized, and linked to each other?
Semantics. “Engagement is down” can mean different things for different teams. To a product manager, it could mean a drop in activation, average session time, or retention rate. For designers, it could signal a usability issue. For customer relationship management (CRM) and marketing teams, it could mean lower email open rates or fewer social media interactions. All of which leads to the next point. Questions to ask:
What does one common word or term mean to one group versus another?
What does “customer satisfaction” or “usability issue,” for instance, mean to your colleagues in product versus design versus support?
Pragmatics. If you’re presenting findings from discovery interviews to a product team, you might include concrete recommendations for features to add and pain points to address. If you were to present the same findings to leadership, you might want to focus on business value, market share, and long-term product strategy. Questions to ask:
How does the same finding need to be presented differently depending on who’s receiving it and the decision they’re making?
By exploring these questions, you’ll stop treating the “language problem” as a “tooling problem,” or as something that only happens between internal teams, and start seeing knowledge exchange (and research impact) as a systemic property of how knowledge moves through an organization. And once you have a linguist’s eye for syntax, semantics, and pragmatics, you can apply it not just to how insights are communicated, but to how they are collected, stored, and surfaced in the first place. In other words, you’ll be able to map the hidden architecture of your organization’s language—the often invisible structure that shapes how customer realities either get amplified or lost in translation.
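To make the three dimensions concrete, here’s a minimal sketch of what an insight record might look like if it carried all three explicitly. It’s illustrative only: the field names, tags, and values are invented, not drawn from any particular repository tool.

```python
from dataclasses import dataclass, field

# A hypothetical insight record; every name and value here is invented.
@dataclass
class Insight:
    title: str
    finding: str
    # Syntax: how the insight is structured and linked to other records
    tags: list[str] = field(default_factory=list)
    linked_insights: list[str] = field(default_factory=list)
    # Semantics: pin down what a contested term means in this record
    term_definitions: dict[str, str] = field(default_factory=dict)
    # Pragmatics: how the same finding is framed for different audiences
    framings: dict[str, str] = field(default_factory=dict)

insight = Insight(
    title="Engagement drop after onboarding redesign",
    finding="New users abandon setup at the integrations step.",
    tags=["onboarding", "activation", "drop-off"],
    linked_insights=["INS-041", "INS-077"],
    term_definitions={"engagement": "7-day activation rate, per product analytics"},
    framings={
        "product": "Activation stalls at the integrations step; prioritize fixes there.",
        "design": "Users feel lost when asked to connect tools they don't recognize.",
        "leadership": "Onboarding friction is suppressing trial-to-paid conversion.",
    },
)
```

The point isn’t the data structure itself; it’s that structure (tags and links), meaning (term definitions), and use (audience framings) each get a first-class home instead of living only in the researcher’s head.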
Mapping the Hidden Architecture: What Linguists See That Others Miss
As research professionals, we treat customer language as our data. It flows in from every channel: support tickets, in-product feedback, survey verbatims, app store reviews, and social media mentions, to name a few. The goal of every ResearchOps professional should be to architect a knowledge system that doesn’t just count keywords but understands the underlying messages, concerns, and emotions of customers and end users within the constant stream of words—written and spoken.
Research (and the operations that make it tick) is about transforming fragmented, raw text and dialogue into structured, contextual insights that teams can act on, ensuring the customer’s true voice is never lost. But here’s what most organizations miss: organizational data is inherently variable. Just as sociolinguists understand that language varies systematically (not randomly) across demographics, regions, and contexts, a ResearchOps professional needs to understand that customer data varies systematically across sources, channels, and collection methods. For example:
Customer support tickets typically capture problem-focused language. Support requestors are often stressed or frustrated, so the register is often transactional, the urgency is high, and the context is reactive.
Survey responses capture reflective language and are collected in a structured environment. So the register is more formal, the urgency is lower, and the context is evaluative.
Social media mentions use conversational language, often performative or community-oriented. The register is informal, the authenticity varies, and the context is public.
User interviews use narrative language that’s cocreated with the researcher. The register adapts to the interviewer’s style, the depth is greater, and the context is exploratory.
Each of these sources provides a different “dialect” of customer truth. And just as a linguist would never claim that one dialect is “correct” while another is “incorrect,” a systems linguist (that’s you!) must recognize that each source provides legitimate but only partial insight. This variety of dialects needn’t be read as a hindrance, but as an opportunity to build knowledge systems that enable the pairing or triangulating of data sources to provide a more complete picture of the customer experience. More data isn’t always better—that’s not the lesson here—but linguistic diversity in your data ensures that you’re capturing the full range of how customers express their needs, frustrations, and desires.
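Here’s a rough Python sketch of what preserving those dialects might look like inside a knowledge system. The source profiles simply encode the list above; the record shape and names are assumptions for illustration, not a prescription for any specific tool.

```python
from dataclasses import dataclass

@dataclass
class VoiceRecord:
    text: str
    source: str   # e.g., "support_ticket", "survey", "social", "interview"
    register: str
    urgency: str
    context: str

# Each source's "dialect" profile: (register, urgency, context)
SOURCE_PROFILES = {
    "support_ticket": ("transactional", "high", "reactive"),
    "survey": ("formal", "low", "evaluative"),
    "social": ("informal", "varies", "public"),
    "interview": ("narrative", "varies", "exploratory"),
}

def normalize(text: str, source: str) -> VoiceRecord:
    """Keep the dialect metadata attached instead of flattening to bare text."""
    register, urgency, context = SOURCE_PROFILES[source]
    return VoiceRecord(text, source, register, urgency, context)

records = [
    normalize("Export keeps failing and I'm losing work!!", "support_ticket"),
    normalize("Overall, the export feature meets my needs.", "survey"),
]
# Two dialects, one topic: together they say more than either alone.
```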
Data diversity is especially important if you’re integrating AI into your knowledge stack. And who isn’t? Studies on language models3 make one thing clear: the variety baked into the training data—its words, grammar, and topics—sets a ceiling on how varied the model’s own language can be. Any model essentially mirrors the specific dialects and registers present in its corpus (a large, structured dataset), so voices that never make it into the data remain invisible in the model’s responses.
The AI Imperative: Why This Perspective Matters Now
It’s nearly a cliché to say it, but the industry is consumed by talk of AI. To prepare your knowledge systems for the future—the not-so-far-away future—it’s important to see through the AI hype by understanding the mechanisms underlying these tools. In the simplest terms, current AI offerings are built on large language models (LLMs), and the answers they provide are shaped by the data on which those models were trained.
So, what happens when we apply this understanding to AI-enabled research knowledge systems that span the entire organization? LLMs are fundamentally linguistic technologies: they learn patterns in language use and generate text based on those patterns. When you deploy an AI tool to help stakeholders find insights in your research repository, in a way, that tool is learning the language of your organization or, more accurately, the language of the context you’ve provided it with for that search. But here’s the challenge: if your organizational data is siloed, inconsistently structured, or dominated by only one type of source (say, qualitative interview transcripts), the AI will learn a partial, biased language. It will excel at answering certain types of questions while remaining oblivious to others. It will perpetuate existing silos rather than break them down.
Research on LLMs makes this clear. A sociolinguistically informed approach to curating training data can improve the social impact of language models. Linguistic insight can inform the broader development and application of modern LLMs, including reinforcement learning—where models learn from feedback through trial and error—and prompt engineering, all of which are ultimately grounded in patterns of language use, or data.
Translated to ResearchOps: your research repository is training data. Every time someone searches for insights, tags a finding, uploads a study, or links studies together, they’re teaching the system (and any AI tools built on top of it) what matters and how things connect. Without understanding the sociolinguistics of your organization—who speaks what language, what gets prioritized in which contexts, what voices are systematically excluded—AI tools will:
Miss nuances that matter
Perpetuate existing silos
Reflect biases in dominant data sources
Fail to serve teams whose dialect wasn’t well represented in the training data
This is why your role as a systems linguist isn’t just about organizing knowledge; it’s about curating the linguistic diversity of your organization’s learning system to ensure that when AI tools are implemented, they serve not just the loudest voices, but everyone who needs to understand user needs and accurately represent the user.
Building the System: Practical Applications
So what does all of this actually look like in practice? There are three key steps to taking a systems linguistics approach to ResearchOps: first, partner with the product organization; then, work with data teams; and finally, close the feedback loop—a common callout when it comes to research operations.
1. Partner with the Product Organization
First, you’ll need to work closely with the product organization—even better if you partner up with ProductOps. Partner with these teams to create a shared vocabulary for what insights are needed, when they’re needed, and in what format. Remember that the goal of research isn’t only to produce research; it’s to build a common grammar for how research integrates into product decisions. ResearchOps is a team sport, and identifying knowledge gaps doesn’t have to (and shouldn’t) be a one-person job.
ProductOps professionals explicitly understand the operational cadence of product development—the rhythms of planning cycles, launch timelines, and metrics reviews—and they’re fluent in the language of the product organization. Do everything you can to leverage this by:
Cocreating taxonomies that make sense to both researchers and product teams (see the sketch after this list)
Aligning research tagging systems with how product teams actually organize their work
Building bridges between research findings and product briefs, OKRs, and roadmaps
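Here’s a hypothetical sketch of what that cocreated taxonomy might look like at its simplest: a shared mapping between research tags and the product org’s own artifacts. The tag names, OKRs, and roadmap items are all invented for illustration.

```python
# A shared, cocreated taxonomy: research tags on the left, the product
# org's vocabulary on the right. All names here are invented examples.
TAG_TO_PRODUCT = {
    "onboarding-friction": {"okr": "Increase activation rate", "roadmap": "Onboarding revamp"},
    "export-reliability": {"okr": "Reduce churn", "roadmap": "Data export hardening"},
}

def product_context(tag: str) -> dict[str, str]:
    """Translate a research tag into the product organization's grammar."""
    return TAG_TO_PRODUCT.get(tag, {"okr": "unmapped", "roadmap": "unmapped"})

print(product_context("onboarding-friction"))
# {'okr': 'Increase activation rate', 'roadmap': 'Onboarding revamp'}
```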
You’ll want to work with data teams, too, to help you understand the existing data-related ecosystem driving your organization, and identify the right dots to connect.
2. Work with Data Teams
I don’t often hear ResearchOps folks talking about data teams as allies. Data (platform, analysis, and engineering) teams have a wealth of knowledge about what data is being collected, how it is collected, and how to access it. Data teams understand user behavior patterns, engagement signals, and the technical structure of how information flows through systems. There are so many opportunities for ResearchOps professionals to partner with data teams that I can’t list them all! So, I’ll stick to highlighting those most relevant to this article. Data teams can help you understand:
Where data quality issues might introduce linguistic noise: elements in open-text data that hinder analysis, such as typos, misspellings, filler words, or cynical and sarcastic remarks.
How different data sources can be integrated without losing context.
What metadata needs to be preserved for insights to remain meaningful.
How to ensure that your data sources are diverse and representative of your customer base.
As mentioned earlier, the data you collect from different sources must be paired with metadata to triangulate it. When research and data analysis teams have access to enriched data, they can form more meaningful insights and models and support continuous learning.
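To illustrate why that metadata matters, here’s a simplified sketch of triangulation: feedback from three sources grouped by a shared key. The records and field names are invented; in practice, the enriched schema is exactly what your data team helps you define.

```python
from collections import defaultdict

# Invented sample records; real ones would use the enriched schema
# your data team helps you define.
records = [
    {"source": "support_ticket", "feature": "export", "text": "Export fails on large files."},
    {"source": "survey", "feature": "export", "text": "Exporting is slow but reliable."},
    {"source": "interview", "feature": "export", "text": "I export weekly to share with my team."},
]

# Group by a shared metadata key so one theme is visible across dialects.
by_feature = defaultdict(list)
for record in records:
    by_feature[record["feature"]].append(record)

for feature, group in by_feature.items():
    sources = {r["source"] for r in group}
    print(f"{feature}: {len(group)} data points across {len(sources)} sources")
# export: 3 data points across 3 sources
```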
Once you’ve defined a common language and understand the flow of information, it’s essential to close the loop to maintain both data quality and focus.
3. Create Feedback Loops
As AI systems become more integrated into research workflows, it will become essential for you to build validation mechanisms. A validation mechanism is a feedback loop that encourages researchers (or people who do research)—your linguistic experts—to check AI-generated summaries and suggested connections, and to correct automated categorizations. The system needs to learn your organization’s language over time, which only happens if there’s a continuous cycle of:
AI suggesting patterns or connections
Humans validating or correcting those patterns and suggestions
The system learning from validation and correction
And so, accuracy improving over time
This is exactly how computational linguistics, the interdisciplinary field concerned with the computational modeling of natural language, approaches model refinement; the same human-in-the-loop cycle can make your research repository continuously (and almost automatically) smarter and more effective.
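Here’s a deliberately tiny sketch of such a loop. The AI is stood in for by a word-overlap heuristic so the example stays self-contained and runnable; in a real system, that call would go to your model or repository tool, and every name here is hypothetical.

```python
# (text, confirmed_tag) pairs accumulated through human validation
validated_examples: list[tuple[str, str]] = []

def suggest_tag(text: str) -> str:
    """Stand-in for an AI suggestion: nearest validated example by word overlap."""
    if not validated_examples:
        return "untagged"
    words = set(text.lower().split())
    def overlap(example: tuple[str, str]) -> int:
        return len(words & set(example[0].lower().split()))
    return max(validated_examples, key=overlap)[1]

def validate(text: str, suggested: str, human_tag: str | None) -> str:
    """A human confirms or corrects; either way, the system learns from it."""
    final = human_tag or suggested
    validated_examples.append((text, final))
    return final

# Round one: the human corrects an unhelpful suggestion.
text = "Checkout button unresponsive on mobile"
validate(text, suggest_tag(text), "usability-issue")

# Round two: a similar report now gets a sensible suggestion automatically.
print(suggest_tag("Checkout page unresponsive on tablet"))  # usability-issue
```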
The Interoperable Future: From Fragmented to Fluent
There’s something deeply compelling to me about the idea that research—in some form—can be done by anyone with a serious commitment to intellectual inquiry.
For research to be impactful, it must be understandable and relatable, which means it must be contextual, complete, diverse, and accessible. It also needs to feel approachable, like something any team member with genuine curiosity and a serious commitment to intellectual inquiry can engage in. This is the promise of a systems linguistics approach: to build research systems that don’t just store information but actively translate it across the different languages, or grammars, practiced in your organization. When it works, the payoff is transformative across disciplines:
Researchers find the information they need without having to know exactly where it came from or what it was originally called. The system understands synonyms, related concepts, and contextual meaning.
Product teams understand insights in their context, connected to metrics they care about, framed in terms of product decisions, and linked to relevant roadmap items.
Engineers can trace decisions back to the data that informed them, understanding not just what users want but why, with enough specificity to inform technical implementation.
Leadership sees the through-line from customer voice to strategy, understanding how insights connect to business outcomes and competitive positioning.
This represents a fundamental shift from reactive support to strategic architecture, from tool management to systems design, and from research gatekeeping to researchers functioning as organizational translators. The research repository stops being a place where research goes to live (or die), and, instead, becomes a living system—one requiring continuous care, maintenance, and translation across the linguistic communities in your organization.
Zooming out and viewing my ResearchOps role through a systems linguist’s lens helped me translate research across researchers, data engineers, and product teams to build a more integrated, meaningful, and impactful research practice. You don’t need a degree in linguistics to adopt this perspective; you only need to develop linguistic awareness and uncover opportunities to integrate it.
The challenge facing ResearchOps isn’t whether we’ll evolve; it’s whether we’ll learn to speak the language of the systems we’re building. As AI becomes more and more embedded in organizations’ customer learning practices, the quality of that learning will depend entirely on the intentionality and sophistication of the underlying linguistic systems. Systems we’ll build.
Sponsor and Credits
The ResearchOps Review is made possible thanks to Rally UXR—scale research operations with Rally’s robust user research CRM, automated recruitment, and deep integrations into your existing research tech stack. Join the future of Research Operations. Your peers are already there.
Edited by Kate Towsey and Katel LeDu.
1. If diagnosing language problems piqued your interest, read “Clinical Linguistics” in Speech Sciences Entries by Gloria Gagliardi. It shows how clinical linguists sit at the intersection of linguistic theory and the medical profession, and how their methods are used to diagnose language disorders and design interventions.
2. Thanks to the many wonderful exchanges in the ResearchOps Community, I learned that teams using different terms to describe similar meanings is a common problem in many organizations.
3. Read “Benchmarking Linguistic Diversity of Large Language Models,” published on MIT Press Direct, and “The Sociolinguistic Foundations of Language Modeling,” published in Frontiers in Artificial Intelligence.




