CMSC 435 PROBLEM STATEMENTS
Last edited 2025-09-14. The projects below are under consideration for use in the Fall 2025 semester. We will select from the list below.
Timelines for the semester will be as called out in a separate document. How a team establishes its intermediate goals in order to meet requirements and hit hard deadlines is up to the team. I suggest that you conduct a risk assessment right away, but how long you put off discovering thorny issues is also up to the team. You can even treat the project as a big hackathon at the end if you like; it's your career, after all, and it isn't like campus doesn't promote such things. But, I bet you'll also find that hackathons are better for campus than for you, and in any event this approach has yet to end well in a 435 project. Your call.
All students on a project are equal stakeholders in the effort. No one person must work at the direction of another; we cooperate in order to win. The incentive to hold others accountable is clear: all of these projects are scoped with the expectation we'll have full effort from everyone, and there is no partial credit for partial success. We tolerate others' lack of engagement at a cost paid in our own time and grades.
The incentive to fully participate should be clear. First, the final exam is constructed to reward those who have done the work all along. It chiefly addresses deep technical issues involving this project. Those who didn't do the work won't know how to answer the questions and pass the class. Similarly, if we have done the work but not established a paper trail to that effect along the way, then we won't be able to support answers to questions which require reflection. There is only one snapshot in time from which to work - the end of the project. Without the paper trail to show what the perspectives were historically, there is not much credibility in the answers. Said another way, nobody has yet been able to recreate a project record that is credible enough for exam answers, and of course there is no ability later to go back in time to insert material into the record. (Translation: Waiting until near the final exam in order to manufacture history for the convenience of answering questions is deemed "not credible".)
Second, the cover sheet submitted (as our academic integrity and intellectual property statement) lists who gets credit; no name, no credit. The decision of who signs the sheet is ultimately a team consensus. Basically the rest of the team can vote someone off the island, though this is not a common occurrence, and we'd like to have exhausted our inventory of practices to promote positive engagement before it reaches that point.
My advice: do the work and document it to pass the class. It might just be that these practices actually work too. Bonus!
I offer these projects as an opportunity for us to practice substantive application of software engineering principles. We will learn by trying them out, making design decisions and then studying the nuanced consequences. We can't close that loop if we don't have a detailed record of the decisions we made, however, and that is the most important reason for our serious obligations to log activity and articulate our reasoning as we go. Working code alone won't tell us we reached our learning objectives. Please take this process seriously from the start and we will win best value from 435.
-- Jim Purtilo
1. LEXICOGRAPHY
As part of its regular operations, Echtralex monitors all major PRC press conferences and other events featuring questions and answers or official statements by ministry spokespersons (as well as some additional PAI). Notable among these events are the daily Ministry of Foreign Affairs PC, the weekly Ministry of Commerce PC, and the bi-weekly Ministry of National Defense PC.
The three press events specifically mentioned above are of particular interest to Echtralex because, in addition to a Chinese transcript, official English translations are provided - sometimes very quickly after the event itself (in the case of MoFA), and other times within a few days or weeks of the event (in the case of MOFCOM and MoND).
Echtralex provides end users with parallel versions of these events featuring the Chinese and English set against each other, aligned at the sentence level. Staff members construct these documents from both sides of each event, making everything available on the archive portion of its website.
While the process of preparing these documents is relatively straightforward, the task itself is somewhat onerous. Staff must wait for both sides of the event to be posted; the text must then be manually aligned at the sentence level. (Though the transcript and translation generally appear roughly aligned at the sentence level when they are posted, this does not necessarily make the task simpler or less time-consuming.)
The opportunity is to provide Echtralex with a software solution that will smooth the work flow and make the process more efficient. We would like to automate this to the maximum extent possible. However, because of a number of well-understood factors related to the translation of Chinese into English (and perhaps because of other issues as yet unknown), we anticipate challenges.
The core functionality will be reliable production of high quality documents with the text alignments. However, the task of identifying relevant documents and moving them to and through the tools could itself be a costly manual activity, so we anticipate that the team will need to model this work flow carefully and ensure their solution automates the capture of text in the first place and integrates it with the rest of the Echtralex processing tools which handle the publication of documents. To be clear, this project is not about building a web site for public access; it is about streamlining the preparation of documents for Echtralex staff to then push to their web site.
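As a deliberately naive illustration of the alignment step, the sketch below splits each side into sentences and pairs them by position. The function names, the one-to-one assumption, and the punctuation rules are ours, not Echtralex's; a production tool would need much more robust alignment (and is exactly where the well-understood translation challenges will bite).

```python
import re

def split_zh(text):
    """Split Chinese text on terminal punctuation, keeping the delimiter."""
    parts = re.split(r'(?<=[。！？])', text)
    return [p.strip() for p in parts if p.strip()]

def split_en(text):
    """Naive English sentence split on ., !, or ? followed by whitespace."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p.strip() for p in parts if p.strip()]

def align(zh_text, en_text):
    """Pair sentences by position; return any leftover sentences for review."""
    zh, en = split_zh(zh_text), split_en(en_text)
    pairs = list(zip(zh, en))
    leftover = zh[len(pairs):] + en[len(pairs):]
    return pairs, leftover
```

Any nonempty `leftover` is a signal that the two sides diverged and a human must intervene - which is precisely the work flow moment the tool has to handle gracefully.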
Our client in this project is Echtralex, for which the point of contact is Michael Horlick.
2. ANT POSING
Long form: Automated Identification and Annotation of Movable Joints in 3D CAD Models using Vision-Language Models (VLMs)
Current era digital technologies have been a boon to scientists, who are now able to create sophisticated models of their subjects from nature for detailed study. This is especially true for entomologists, who can learn much from these studies and, importantly, also apply much of it. Whole new avenues of investigation into, say, locomotion and robotics can be opened by the study of ants, for example. Engineers are eager to learn more about how nature managed to do it.
The problem, however, is that creating these detailed models is labor intensive, especially for high acuity systems where (as an example of interest to us) we must capture the joints in insect limbs accurately for the model to be of any use. The axis of movement, range of motion and more are critical.
The overall research opportunity is to develop a software pipeline that leverages Vision-Language Models (VLMs) to analyze 3D CAD models of these subjects (ants), identify movable joints, and generate an annotated 3D model with labeled articulations. We'd like the system to facilitate automated recognition of mechanical components' degrees of freedom, enhancing efficiency in robotics, simulation and digital twin applications. We'd like that to be true, and for it to reduce the cost of creating such models ... but we don't know. This project is intended to let us find out.
Last semester we made a spectacular start on this problem. The prior team came up with a way to process a static 3D scan of some specimen and construct a dynamic model. This still needs a lot of shaking out but so far it looks like it works. Cool!
The next technical step is thus to figure out how the captured specimen should be posed. After all, it may have movable joints, but that doesn't mean we can move them any old way for purposes of scholarly study. This is where the AI comes in: we want to pose them in a realistic way as predicted by study of prior, validated models.
The overall goal remains: we want to engineer an effective tool that enables scientists to conduct research. This semester we should be able to reach this goal by producing a software tool capable of analyzing and annotating mechanical joints in CAD models, and posing them in 'realistic' ways that can be tuned. This means also that we must be on top of the work flow of users - enabling them to check and correct models smoothly. Our initial domain will be of modeling ants, though the design should not limit us in application to other domains.
We will recognize success when we are able to observe scholars prepare (and improve) accurate and dynamic models by mostly automatic means, at substantially less cost than by manual techniques; and export models compatible with robotic simulation and kinematic analysis. So yes, in the end, our AI system should allow scientists to start with images of ants and create animations that teach them how to walk correctly.
As with any project supporting research needs, many ambiguities are involved and the team should be prepared to handle change smoothly. There is no one-and-done solution to be had in this project. This project offers spectacular publication potential, however, and there is plenty of follow-on research opportunity if successful. Our client is Prof. Evan Economo in the Department of Entomology.
3. DIGITAL SPECIMEN SCHEMA
The world's top museums have spectacular collections of insect specimens that have been harvested and curated for more than a century. Many of these collections have been digitally scanned in hopes of making them more broadly available to scholars, but here is the problem. The labels on these specimens are usually hand written and lacking any coherent system for organizing the location, times or other associated meta-data. In other words, you can find the insect if you know the museum's index number to cite, but searching based on properties is not possible.
The opportunity is to leverage AI to extract the image text (that's the easy part), then figure out how to register ("make standard") that text for use in a database schema to enable searching on properties. (That's the hard part.)
The mission does not invite us to simply make guesses on what data go where in some database; there is no one-and-done program to be crafted and just thrown over the wall. Instead, a fair amount of experimentation (and potentially some model training) may be involved. How we measure accuracy must be sorted out. Also, we will need not just a tool to classify text, but also much apparatus to facilitate the checking of these labels by scholars. We can anticipate the presence of errors, but a system that offers no smooth work flow for scientists to check and correct issues will be an unambiguous failure. A tool that only works on ones and twos of records is a failure - we need this at scale, meaning hundreds of thousands of records per data set.
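To make the "register the text" step concrete, here is a toy normalizer for one convention that does appear on many entomology labels: dates written as day.month-in-roman-numerals.year (e.g. "12.vi.1987"). The field names, the target schema, and the example label are our assumptions for illustration; the real schema must be negotiated with the client, and a real system would handle many more conventions plus curator review of every extraction.

```python
import re

ROMAN_MONTHS = {'i': 1, 'ii': 2, 'iii': 3, 'iv': 4, 'v': 5, 'vi': 6,
                'vii': 7, 'viii': 8, 'ix': 9, 'x': 10, 'xi': 11, 'xii': 12}

# Entomology labels often write dates as day.month(roman).year, e.g. "12.vi.1987".
DATE_PAT = re.compile(r'\b(\d{1,2})[.\-/ ]([ivx]+)[.\-/ ](\d{4})\b', re.IGNORECASE)

def normalize_label(text):
    """Pull a structured record out of raw OCR'd label text.

    Returns a dict with an ISO-formatted date plus the leftover text, which a
    curator would still need to review and correct by hand.
    """
    record = {'date': None, 'remainder': text}
    m = DATE_PAT.search(text)
    if m:
        day, month_roman, year = m.groups()
        month = ROMAN_MONTHS.get(month_roman.lower())
        if month:
            record['date'] = f"{year}-{month:02d}-{int(day):02d}"
            record['remainder'] = (text[:m.start()] + text[m.end():]).strip()
    return record
```

Note the design choice: anything the normalizer cannot place stays in `remainder` rather than being guessed at, precisely so the scholar-facing checking work flow has something honest to present.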
As with projects involving research, we need to process ambiguity smoothly and be prepared to pivot smartly as active discovery techniques show more about the problem. Our client in this project is the Smithsonian Institution, with local point of contact Prof. Evan Economo.
4. FARM NUTRIENT REPORT
All farms in Maryland making a minimum of $2,500 or having more than 8 Animal Units (1 Animal Unit = 1,000 pounds) must file and maintain up-to-date Nutrient Management Plans, which will result in inspections by the Maryland Department of Agriculture. It can be a data collection nightmare that is fraught with pitfalls for the faint of heart. Our client's state-wide responsibilities are to train Nutrient Management Advisors, farmers and service providers to help farmers with data collection and nutrient management plan writing, though even under the best of circumstances it remains an extensive paperwork exercise for farmers.
This is a job for AI. We will create a tool to interact with certified plan writers, guide them through the data they must collect/submit, and then (the cool part) do an independent vetting of the report. This critique is intended to help certified plan writers by flagging potential gaps or issues before they commit to the official plan writing process.
This project will involve understanding the report needs, modeling the user workflow and of course some prompt engineering in crafting the desired system. As with all of our projects, there is no one-and-done solution here; potentially a fair amount of experimentation will be needed to ensure quality of both the report checking and the user experience. After all, we have no win if the cumbersome state process is replaced with an even more cumbersome software alternative. Our win will be based on working out clear U/X goals; strong success rates for farm data submitted to the state and to private certified plan writers after our vetting process; and of course time savings to those users in the overall process.
Our client in this project is Dr. Bill Phillips with AGNR.
5. COMAR CHATBOT
The same farm folks working with students on nutrient reports also identify a key pain point in operations, which is the lengthy and complex process of finding specific information from COMAR (Code of Maryland Regulations) as relates to agricultural needs. While regulation is intended to be specific, it is also fair to say that COMAR is not written for the faint of heart. It takes time and discipline to find content. For an official answer, clients must still connect with a legal assistant in Annapolis, but much preliminary work could be accomplished if those involved could quickly find insight on their own.
This is another job for AI. Our opportunity is to create a chatbot that is specifically informed by COMAR. Our MVP will be a service that is specific to agricultural info, but ideally we will enable this service for the entire state code.
There are several ways to accomplish this, meaning a team tackling this problem will need to experiment with potential ways forward in order to make an informed decision (not guess) on the design. We will also need to model the work flow of our likely users in order to ensure a successful user experience. The scope of our work must include sustaining and maintaining the system, since of course COMAR is not a static picture of the world - it changes. Our service will quickly become useless if we only offer a one-and-done solution to finding data from yesteryear.
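One of the candidate designs the team would need to evaluate is retrieval-augmented generation: index the regulation text, retrieve the sections most relevant to a question, and hand only those to the language model. The sketch below shows the retrieval half with a crude bag-of-words overlap score; the section citations and text are invented for illustration, and a serious design would compare this baseline against embedding-based retrieval.

```python
import re
from collections import Counter

def make_index(sections):
    """sections: dict of citation -> regulation text. Build a bag-of-words index."""
    return {cite: Counter(re.findall(r'[a-z]+', text.lower()))
            for cite, text in sections.items()}

def retrieve(index, query, k=3):
    """Rank sections by word overlap with the query; the top-k would seed the
    chatbot's prompt along with their citations."""
    q = Counter(re.findall(r'[a-z]+', query.lower()))
    scored = sorted(index.items(),
                    key=lambda item: -sum(min(c, item[1][w]) for w, c in q.items()))
    return [cite for cite, _ in scored[:k]]
```

Because the answer is grounded in retrieved sections rather than the model's memory, keeping the service current reduces to re-indexing COMAR as it changes - which is exactly the sustainment concern above.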
We will need to work out specific success criteria along the way. U/X is critical, as after all, a crummy presentation will only invite users to just take the pain of digging into raw COMAR text instead. And accuracy is of course important, so a clear and credible test procedure will be needed.
6. NEURAL MODELING ENVIRONMENT
Neural network-based models of the human cerebral cortex consist of many key components, including (but not limited to) the network structure, connectivity patterns (local and long-range) between network units, the activation mechanism, and the learning mechanism. When constructed carefully, such a model can offer new insights into the emergent properties of neural nets and help to inform our understanding of the neural mechanisms underlying human cognitive abilities. However, implementing any of these neuro components in an ordinary programming environment is challenging, and the complexity goes up with the number of components to be modeled. Worse, since this application of neural networks is outside the mainstream (e.g., LLMs, CNNs), there are few tools for visualizing the network activity during a simulation.
The opportunity is to create software which can automate the process of writing code for these neural models. This will tremendously speed up the process of testing a variety of different architectures, provide non-programmers with the ability to conduct neural modeling experiments, and leverage the power of available visualization tools.
A successful project will enable scientists to use a graphical user interface to quickly create a unique, detailed network structure, along with a full network configuration, and have the software automatically generate the code necessary to implement the model and run the desired simulation. The editing process should scale from crafting small networks of individually-tailored neurons up through expression of patterns of neurons to facilitate study of networks at large scale. The software should also display a meaningful visualization of the network simulation to show commonly studied emergent properties (e.g., Mexican hat pattern connectivity). The system should allow easy exporting and importing of configured models.
What makes a meaningful visualization is slightly subjective, which will require testing on the part of the team. However, the underlying components leading to such visualization should be implemented without ambiguity. Ideally the system will enable interactive control over the network in real-time, i.e. deleting, modifying, or inserting new cortical units in a simulation while observing the effects. This project is done in support of a research project, which means no single one-and-done solution is lurking for us to discover. Neither is this project performative. The team will need to embrace the ambiguity, work through it and reach accord with our client on substantive ways to test the tools and measure success.
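To suggest what "automatically generate the code" might look like at its simplest, here is a sketch that renders runnable simulation source from a configuration dict such as a GUI might produce. The configuration keys, template contents, and update rule are placeholders of our own invention; the real system would emit whatever model components (connectivity, activation, learning) the scientist configures.

```python
TEMPLATE = """\
import math, random

# Auto-generated from the GUI's network configuration (names are placeholders).
N_UNITS = {n_units}
random.seed({seed})
weights = [[random.gauss(0.0, {weight_scale}) for _ in range(N_UNITS)]
           for _ in range(N_UNITS)]

def step(activity):
    return [math.tanh(sum(w * a for w, a in zip(row, activity)))
            for row in weights]
"""

def generate_model(config):
    """Render runnable simulation source from a configuration dict."""
    return TEMPLATE.format(**config)
```

Generating source (rather than interpreting the configuration directly) is one plausible design choice: it gives scientists an artifact they can inspect, archive alongside results, and rerun - but it is only one option the team should weigh.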
Our client in this project is Arya Teymourlouei.
7. ARBORETUM DATA INTEGRATION AND ETL PIPELINE
The University of Maryland Arboretum maintains a central database for managing information on the thousands of plants across campus. This database supports essential operations associated with the university's living plants collection, such as enabling plant identification and plant location. The cost of managing this system is high, with significant inefficiencies in the work flows around curating these data. Adding, editing, or removing entries often requires cumbersome manual steps. This creates frustration for stakeholders and slows the pace of data-driven research and management.
Our opportunity is to create a scalable, automated ETL (Extract, Transform, Load) pipeline that can streamline Arboretum data workflows. This will enhance usability, reduce costs, improve quality and extend the value of the campus plant data. Our system will integrate with existing Arboretum resources. It should also dovetail with our Routes and Routes projects within the Department of Plant Science (developed in CMSC435), which have demonstrated the potential for a modernized, student-driven plant information platform.
Key elements of our solution should include:
This project will require a fair amount of modeling of user work flows, and experimentation, in order to ensure we are capturing salient operations and bringing our stakeholders to the best engineering solution possible. There is no "one and done" answer here. This means we will need to immediately reach accord with stakeholders on our KPIs. What does workflow improvement look like? We need to be able to measure it.
Our deliverables include: A functional ETL pipeline for Arboretum data; a prototype intake system with role-based submission workflows; documentation on architecture, usage, and maintenance; a viable sustainability plan; and a final demonstration (beyond just acceptance testing, which is part of the assignment timeline) to stakeholders, highlighting how the system improves efficiency and usability.
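For a flavor of the Transform stage, the sketch below validates plant records from CSV text and splits them into loadable records and rejects with reasons. The column names and validation rules are hypothetical stand-ins; the real rules must come from modeling the curators' work flow, and the rejects-with-reasons output is what a role-based intake system would route back to submitters.

```python
import csv
import io

# Hypothetical required columns; the real schema comes from the Arboretum.
REQUIRED = ("accession_id", "scientific_name", "latitude", "longitude")

def transform(row):
    """Validate and normalize one plant record; return (record, errors)."""
    errors = [f"missing {f}" for f in REQUIRED if not row.get(f, "").strip()]
    record = {k: v.strip() for k, v in row.items()}
    try:
        lat = float(record.get("latitude", ""))
        lon = float(record.get("longitude", ""))
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            errors.append("coordinates out of range")
    except ValueError:
        errors.append("non-numeric coordinates")
    return record, errors

def run_etl(csv_text):
    """Extract rows from CSV text; split into loadable records and rejects."""
    good, rejects = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record, errors = transform(row)
        (good if not errors else rejects).append((record, errors))
    return good, rejects
```

Note that nothing is silently dropped or silently "fixed" - every rejected record carries its reasons, which is one way to make the workflow improvement measurable.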
Our clients in this project are Prof. Maile Neel in the Department of Plant Science and Landscape Architecture as well as the Department of Entomology at UMD, and the UMD Arboretum & Botanical Garden (Data Team). We will be backstopped by an expert tech consultant, Parker Homann, who is a veteran of 435 and our previous Plants projects.
8. VTEAM PLANNER
The Neuromotor Control and Learning (NMCL) lab is the home of a research group at UMD which examines the human cognitive-motor mechanisms at work when individuals collaborate with AI-driven humanoid robots having adaptive planning capabilities. One experimental research platform we use is a VR system called VTEAM, which enables experimental studies where humans complete a task by taking turns with a simulated humanoid robot (Baxter, Rethink Robotics) controlled via a basic adaptive planner; the measurements taken during this activity are what offer insights on the cognitive loads placed on the users, and ultimately would let us improve designs of our human-robot work flows.
VTEAM has an interface (previously developed in capstone projects) which allows us to flexibly parametrize characteristics of the task and of the robot. But really, up to this point the module used to guide interaction has been something of a placeholder while we fleshed out the rest of the framework. Now we can return to the study of more complex planning methods.
We need a more expressive planner module to enable the robot to use various AI-based planning systems to complete a task with its human teammate. This new module should enable the experimenter to select which planning system to use (e.g., BFS, A*), along with its possible parameters. In addition, this new module should enable visualization of the tree/graph found by the robot to complete the task for each problem instance, either offline (before or after the experiment) or online. When used offline, this visualization should allow viewing the plan tree for a given problem instance as well as defining a new problem instance and sending it to a specific planner. The online visualization should provide real-time feedback to the participants as they perform with the humanoid robot. This new planning module should also have the capability to store and export all the tree/graph data for subsequent analysis.
(Don't panic. There is a lot in that paragraph to unpack, but it will make a lot more sense once you see the current framework in action. And we talk about this as a "module" but let's be clear: the software we drop into this part of the VTEAM framework may itself be pretty elaborate. The goal is to find one way to accomplish it that meets the several engineering needs along the way.)
This project is to deliver a software solution that fully implements the parametrization of the planning system based on the specifications offered by our client. The goal is not to develop new planning algorithms but rather to augment the current interface of VTEAM to enable flexible parametrization and visualization of existing, established planners (e.g., A*), allowing diversified experimental manipulations to address a broad range of problems and questions in human-robot teaming. We might come up with new planning algorithms along the way -- that would be cool! -- but the mission is to enable the NMCL Lab to realize the full potential of its VTEAM framework.
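To ground the "select a planner and its parameters, and export the tree" idea, here is a minimal sketch of a parametrized search that also records every explored edge for visualization or export. The function name, the grid world in the usage example, and the unit step cost are our assumptions, not VTEAM's API; BFS falls out as the zero-heuristic case (uniform-cost search, equivalent to BFS on unit-cost graphs).

```python
import heapq

def plan(start, goal, neighbors, algorithm="astar", heuristic=None):
    """Run a selectable best-first search and record the explored tree.

    Returns (path, tree) where tree is a list of (parent, child) edges --
    the data a visualization layer could render online or export offline.
    """
    h = heuristic if (algorithm == "astar" and heuristic) else (lambda n: 0)
    frontier = [(h(start), 0, start)]   # (f, g, node); unit step costs assumed
    parents = {start: None}
    cost = {start: 0}
    tree = []
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if node == goal:
            path = []
            while node is not None:     # walk parents back to the start
                path.append(node)
                node = parents[node]
            return path[::-1], tree
        for nxt in neighbors(node):
            if nxt not in cost or g + 1 < cost[nxt]:
                cost[nxt] = g + 1
                parents[nxt] = node
                tree.append((node, nxt))
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt))
    return None, tree
```

For example, on a 4-connected grid with a Manhattan-distance heuristic, `plan((0, 0), (2, 0), ...)` returns the straight-line path along with the edges the search actually explored - the latter being the raw material for the tree/graph visualization the lab wants.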
Our client in this project is the NMCL lab, for which the points of contact are Jayesh Jayashankar and Hunter Frisk.
9. VIRTUAL VITRUVIAN
Waiting on text from Jim Hagberg for cardiovascular educational/training applications in KNES.