CMSC 435 PROBLEM STATEMENTS

Last edited 2024-02-11.

Timelines for the semester will be as called out in a separate document. How a team establishes its intermediate goals in order to meet requirements and hit hard deadlines is up to the team. I suggest that you conduct a risk assessment right away, but how long you put off discovering thorny issues is also up to the team. You can even treat the project as a big hackathon at the end if you like; it's your career, after all, and it isn't like campus doesn't promote such things. But, I bet you'll also find that hackathons are better for campus than for you, and in any event this approach has yet to end well in a 435 project. Your call.

All students on a project are equal stakeholders in the effort. No one person must work at the direction of another; we cooperate in order to win. The incentive to hold others accountable is clear: all of these projects are scoped with the expectation we'll have full effort from everyone, and there is no partial credit for partial success. We tolerate others' lack of engagement at a cost paid in our own time and reduced grades.

The incentive to fully participate is two-fold: First, the final exam is constructed to reward those who have done the work all along. When we haven't done the work then we won't know how to answer questions and pass the class. Similarly if we have done the work, but not established a paper trail to this effect along the way, then we won't be able to support answers and pass. Historically nobody has ever been able to recreate a project record that is credible enough for exam answers, and of course there is no ability later to go back in time to insert material into the record. (Translation: Waiting until near the final exam in order to manufacture history for convenience of answering questions is deemed "not credible".)

Second, the cover sheet submitted (as our academic integrity and intellectual property statement) lists who gets credit; no name, no credit. The decision of who signs the sheet is ultimately a team consensus. Basically the rest of the team can vote someone off the island, though this is not the common occurrence, and we'd like to have exhausted our inventory of practices to promote positive engagement before it reaches that point.

My advice: do the work and document it to pass the class. It might just be that these practices actually work too. Bonus!

I offer these projects as an opportunity for us to practice substantive application of software engineering principles. We will learn by trying them out, making design decisions and then studying the nuanced consequences. We can't close that loop if we don't have a detailed record of the decisions we made, however, and that is the most important reason for our serious obligations to log activity and articulate our reasoning as we go. Working code alone won't tell us we reached our learning objectives. Please take this process seriously from the start and we will win best value from 435.

-- Jim Purtilo

 


1. RADIOLOGY

The purpose of this project is to streamline the integration of artificial intelligence in radiology. Despite significant advances in medical image analysis in recent years, many of the latest models are never applied in clinical settings because standalone research-grade models (e.g. state-of-the-art cancer classification and organ segmentation) do not easily interface with existing medical image viewers. The goal of this project is to build a unified platform for medical image viewing and analysis to simplify the process of using AI in clinical settings.

This is where 435 students get involved. The first task is to build a front-end web interface that can load a folder of DICOM images from local storage and visualize a medical scan using the OHIF open-source viewer. DICOM images are a standardized representation of 3D data using a collection of 2D image slices along the z-axis. The OHIF open-source viewer contains significantly more features than are required for this project. Specifically, this task will involve understanding how to select and integrate the minimum required parts of the open-sourced OHIF viewer into a standalone web-app. All medical images will be provided by the client.

Next, the team will build a web interface to upload and store ML models (which will be provided by the client) in a Docker Registry (or similar container storage). Note that training models is strictly out of scope for this project. Each model in the registry should be automatically connected to a Google Cloud instance to facilitate cloud-hosted model inference via API. Importantly, the team must design a standardized format for each docker container to ensure that the instance creation and API creation is automated.

Lastly, the team will integrate the cloud-hosted model inference with the web-viewer such that a radiologist can select any model from the docker registry, run inference by calling the API, and overlay the model predictions (with a toggle option) on top of the DICOM image. Users should have the option to save model predictions to disk (in the same folder as the input). Re-opening this folder should automatically load previous model predictions.
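
To make the save-and-reload requirement concrete, here is a minimal sketch in Python. It assumes predictions can be serialized as JSON and kept in a sidecar file inside the scan folder. The file name, the function names and the shape of the prediction data are all hypothetical; the team and client will settle the real format.

```python
import json
from pathlib import Path

# Hypothetical sidecar file name; the project statement does not specify one.
PREDICTIONS_FILE = "predictions.json"


def save_predictions(scan_folder, model_name, predictions):
    """Store one model's predictions in the same folder as the DICOM slices."""
    path = Path(scan_folder) / PREDICTIONS_FILE
    # Keep predictions from other models that were saved earlier.
    existing = json.loads(path.read_text()) if path.exists() else {}
    existing[model_name] = predictions
    path.write_text(json.dumps(existing, indent=2))
    return path


def load_predictions(scan_folder):
    """Return every saved prediction for a folder, or {} if none exist yet."""
    path = Path(scan_folder) / PREDICTIONS_FILE
    return json.loads(path.read_text()) if path.exists() else {}
```

On re-opening a folder, the viewer would call something like load_predictions(folder) and offer each saved model's output as a toggled overlay.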

This is a research project. As much as any software this semester, what we build in this project chases a moving target. This is not a one-and-done exercise. Our success is tied with how easily a radiologist can use the tool to load a DICOM image and run inference with an off-the-shelf model. Ideally, teams should be comfortable with python, web development and the basics of machine learning (with PyTorch). Teams should be able to deal with ambiguity.

A baseline project will enable a user to load a DICOM image from disk, upload an ML model to the Docker Registry, select this model in the image-viewer and run cloud-based inference. The output of the model should be overlaid on top of the original DICOM image for analysis. Users should be able to save the predictions locally and view model predictions at a later time. A successful project will enable users to conduct those operations with measurable improvement in performance and satisfaction as compared with existing tools offered to radiologists. The reach goal will be recognized as all of that plus papers published in the archival literature.

Our client in this project is Rahul Pemmaraju (a student in the Rutgers RWJMS-Princeton MD/PhD Program with extensive experience in AI for Radiology) with additional mentoring offered by (Terp alumnus) Neehar Peri (a Robotics PhD student at CMU).


2. CONNECTIONS

Last year one of the 435 projects was to create a tool to support study of how science propagates via Connections. The tool pulled public data about co-authors and co-patent holders, created a graph of these relationships and (via a slider showing the time scale of the publications) allowed the user to visualize how some early publication might spread over time and potentially influence or inspire other research.

It works! And now that we have great early evidence of the value of this study, our mission is to enrich the model to see if we can create an even sharper picture of what is enabled by collaboration over time.

To get an informal sense of what is going on before diving into the existing tool, look up Six Degrees of Kevin Bacon, which was a popular game. The goal was to come up with the connections between Bacon and some other actor based on appearing together in some film. The assertion was that no actor you could name was more than six hops away from Bacon. That is what the science considers here. In order to understand the potential influence of some 'idea' we need to know something objective about where it has gone. That inspired the first study via tracking co-authorships. Author A published a paper with B, who maybe then co-authored a paper with C and so on. Put on a timeline and seen with a slider, we saw early ideas spread in a graph form. It was pretty revealing, even if not nearly the whole story. Now that we know the idea works, we can track richer features to see what they reveal.
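
For those who think better in code, the hop-counting idea can be sketched as a breadth-first search over a co-authorship graph, with a year cutoff standing in for the slider. This is only an illustration and not how the existing Connections tool is built; the function name and the sample data are made up.

```python
from collections import deque


def hops_between(papers, start, target, up_to_year):
    """Fewest co-authorship hops from start to target, using only papers
    published in or before up_to_year. Returns None if not connected."""
    # Build an undirected co-author graph from the papers visible at this
    # slider position.
    graph = {}
    for year, authors in papers:
        if year > up_to_year:
            continue
        for a in authors:
            graph.setdefault(a, set()).update(x for x in authors if x != a)

    if start == target:
        return 0
    # Breadth-first search visits authors in order of increasing hop count,
    # so the first time we reach the target is along a shortest path.
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        author, dist = queue.popleft()
        for coauthor in graph.get(author, ()):
            if coauthor == target:
                return dist + 1
            if coauthor not in seen:
                seen.add(coauthor)
                queue.append((coauthor, dist + 1))
    return None


# Made-up example data: (publication year, list of authors).
PAPERS = [
    (2001, ["A", "B"]),
    (2005, ["B", "C"]),
    (2012, ["C", "D"]),
]
```

With the slider at 2005, A reaches C in two hops but cannot reach D at all; move it to 2012 and D is three hops away.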

As you'll see, the original Connections tool integrated search and visualization functions. Each author search resulted in creation of a graph for display, and similarly the display was updated with the results of each of the narrow kinds of edits someone might make to the graph. The scale of data we're dealing with makes this approach an expensive activity. We knew this going in; the lean process was to build what we did and validate the approach by winning preliminary results. The opportunity now is to unbundle some of the basic operations, which should reduce much of the overhead; experiment with alternate packages for the display and visualization to improve performance; and enrich the graph model in order to enable more and richer kinds of queries which we have confirmed arise in this study. Instead of looking at only whether some paper was co-authored, we could consider filtering on papers which have content similarities. (We can talk about ways to do that.) We might like to create other kinds of links in the graph based on mentor-student relationships, affiliations in common or other general attributes which might help prune what is otherwise a large search space.

Some features we'll mention, though they will make more sense once you play with the original Connections tool: First, while the pilot did all analysis on data pulled from the external sources on each launch, we are confident that much of those data can be retained and curated locally, saving much time; there will thus be some data management features to add. Second, while the pilot grew a connections graph as directed by the user, this was not very useful in helping the user find connections to specific other labs or scholars. We'd thus like to enable queries and path searches that focus on specific targets, not just grow unbounded until the user recognizes the target of potential interest. And so third, instead of just replacing the data vis package with something potentially more efficient, we would like to evaluate this approach with graph database tools (like neo4j in particular) and see whether this gives better computational properties at scale.

We will recognize success in this project when the tool enables our users to discover new and "interesting" connections between diverse parts of the scholarly community. Just burning a lot of resources to search and discover that which we might have already known is not success.

As with the original Connections project Dr. Christopher Nissen, a visiting Professor of the Practice on campus, will serve as client and mentor. He will assist by identifying data for our tests and will judge when connections we find are “interesting.” Koushik Thiyagarajan (a 435 veteran working with Dr. Purtilo on this) will serve as an additional resource.


3. PLANTS

Commitment to environmental sustainability is enhanced when people have strong connections with other species. Strong connections with nature, in turn, enhance human health. The campus landscape supports a diversity of plant species that provide rich opportunities for developing those connections. But for most people those plants are little more than the blurry green backdrop against which we rush through our daily lives. Few students know they are in the midst of an Arboretum and Botanic Garden. Fewer know that the plants that surround us have fascinating stories to tell. Plants create the oxygen that allows our very existence. The food and fiber they provide have driven the development of society and culture as we know it. Plants also enrich our intellectual, aesthetic, and emotional lives, having played fundamental roles in arts and literature throughout human history. Plants are the source of pigments in paintings, dyes in textiles, subjects and symbols in literature, and form the very pages on which books are printed. We seek to increase connections to the UMD landscape and foster appreciation for the biodiversity of Maryland by sharing the stories plants have to tell through location aware technology to facilitate informal learning where the plants actually live. These connections will encourage people to get out and explore campus and to enrich their understanding of what they are passing while they go from one place to another.

The goal of this project is to develop a location-aware digital system through which people can engage with and learn about trees and shrubs they pass by as they move around campus. Taking advantage of the campus Arboretum’s GIS database of trees and shrubs (https://maps.umd.edu/abg/), the system would cue people that they were near a plant of interest, or would allow people to query for plants in their vicinity.
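
As a rough illustration of the "plants in their vicinity" query, the sketch below uses the haversine formula to measure distance from the user to each plant and keeps those within a radius. It assumes each plant record carries a latitude and longitude; the field names, the 30 meter default and the sample coordinates are invented for the example and are not taken from the Arboretum's GIS database.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def plants_nearby(plants, lat, lon, radius_m=30):
    """Return plants within radius_m of the user, nearest first."""
    hits = [(distance_m(lat, lon, p["lat"], p["lon"]), p) for p in plants]
    return [p for d, p in sorted(hits, key=lambda h: h[0]) if d <= radius_m]


# Invented sample records; the real data would come from the GIS database.
PLANTS = [
    {"name": "Willow Oak", "lat": 38.9870, "lon": -76.9426},
    {"name": "Ginkgo", "lat": 38.9900, "lon": -76.9400},
]
```

A linear scan like this is fine for a pilot. Whether a spatial index is needed at campus scale is one of the things the team would measure.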

In addition to the names of the plants, people would have access to narratives from a range of interconnected perspectives including ecology, literature, art, chemistry, and history. Information would be conveyed in multiple formats including straight narrative, quizzes or games. The modes of accessing information would reflect and highlight the non-linear interconnectedness of the different disciplines.

Users would also have a way to access the same information from their computer when they have more time to read in depth. Ideas to encourage repeat engagement include a way to “bookmark” or “collect” plants that one wanted to learn more about, a way to find and navigate to other examples of the same plant on campus, a treasure hunt/find-that-plant feature, and a reward mechanism for regular engagement.

We are crafting an experience, not just making a location-aware application that dispenses plant information. We will recognize success when we see measurable improvement in the levels of engagement with the plant scene on campus, consistent with the abstract goals called out above. A substantial part of our project will thus be to derive and validate effective metrics, which will allow us to experiment with potential presentations and work flows.

Finally, we must recognize that no such system is useful without the data, so let's be clear from the start that our success criteria will also consider the quality of experiences of users who are charged with capturing and curating the plant data. Whatever we create must be sustainable.

Our client in this project is Prof. Maile Neel in the Department of Plant Science and Landscape Architecture as well as the Department of Entomology at UMD.


4. DECIDIO UX

Decidio is the name of a SEAM project which conducts research in group decision and negotiation. Can machine support enable better collaboration within larger groups of people? We'd like for the world to research different ways forward using Decidio.

One of the claims we make is that we reduce the cost of experimenting with "meeting types". Anyone with an idea about some better way they want users to interact should be able to create that meeting type as a plug-in, and then get the rest of the services of our framework for free. Is that claim true? We'd like to show the way forward and find out.

The purpose of this semester's capstone project will be to create a suite of graphics-based meeting types which invite richer collaboration and interaction among participants of an on-line meeting. We're moving past the simplistic level of just sharing text boxes and would like to get to fine-grain techniques to involve more people. We have specific examples to share at our kick off meeting, but will invite the team to come up with their own ideas to pilot. (Our prime directive is that we want participants to think about the quality of ideas being shared and not just 'how to use the tool'.) But to offer one solid example here, we have a specific interest in a graphic meeting type to assist a group in deciding how to allocate some resource; the use case would be a conference committee that is trying to decide how to task members to conduct reviews. Papers would be drawn from a separate research service called Colloquium; the committee would interact graphically by moving representative tokens from one bin to another (signifying potential allocations), and those results (once agreed to) would be passed to the next meeting according to the meeting rules.
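
To give a feel for the state behind the allocation example, here is a minimal sketch in which bins are named lists of paper tokens and a move produces a new allocation. It says nothing about the actual Decidio plug-in interface or the Colloquium service; every name in it is hypothetical.

```python
def move_token(allocation, token, dest_bin):
    """Return a new allocation with token moved into dest_bin.

    allocation maps a bin name (for example a reviewer) to a list of tokens
    (for example paper ids). The input is left unchanged, so each proposed
    move can be shown to the group before anyone agrees to it.
    """
    if dest_bin not in allocation:
        raise KeyError(f"unknown bin: {dest_bin}")
    if not any(token in tokens for tokens in allocation.values()):
        raise ValueError(f"unknown token: {token}")
    # Drop the token from wherever it currently sits, then add it to dest_bin.
    updated = {b: [t for t in tokens if t != token]
               for b, tokens in allocation.items()}
    updated[dest_bin].append(token)
    return updated


# Made-up starting state: two papers waiting, two reviewers with empty bins.
START = {"unassigned": ["paper-1", "paper-2"], "reviewer-a": [], "reviewer-b": []}
```

Returning a fresh allocation rather than editing in place is one possible design choice; it keeps a simple history of proposals that could be replayed in a user study.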

The efficacy of meetings in this project will be evaluated based on user studies which will be on-going through the semester. We need to know more than that the tool can be started and simply not crash; we need to know it qualitatively improves the results of meetings in which it is used. To be clear, this is very much a research project, which entails a fair amount of ambiguity. There is no one-and-done solution to be crafted and tossed over some wall; rather, success will come from willingness to experiment, evaluate and improve as the semester progresses. A liberal dose of imagination and creativity can't hurt either! The potential payoff is in participating in user studies with potential publication value later.

Prof. Purtilo is our customer in this project, but we also have with us two veterans of 435 who today work as research assistants with the Decidio project. Thomas Purtscher and Samantha Taskale are on point to facilitate the meeting definition and user studies, and we will pull in other members of the Decidio team as questions about the framework come up.


5. DOCKET

We would like to have our own replacement for the Quuly system, which you may have used as officehours.cs.umd.edu.

The department has worked on such things before. Unfortunately, the conditions under which we would like to use replacements are a moving target. Instructors have a richer blend of in-person and on-line demands, and the scale at which we operate only keeps getting bigger. This is our chance to do it right and help out some long-suffering instructors in CS.

When the CS department was much smaller there was no online system for managing office hours. Courses were small enough that office hours were usually not busy enough to need any type of management; at most only a few students would be waiting to be helped at any time. If office hours got busier at any specific times (such as when a project was due), it was sufficient for the TA to keep a simple list of waiting students on paper. As our courses got much larger this did not scale up. Quuly was developed by a couple of now former students who were undergraduate TAs while they were here, to automate the queue of students waiting for help in office hours.

Since there is already a working system, you may ask why we are interested in a replacement for it. Good question. Quuly came and went as a commercial system, and honestly at this point is in a questionable state. Even if it worked well, it would not fully address the hybrid needs of instructors handling a combination of in-person and virtual hours. And it isn't clear that it does this at scale or offers the metrics we'd like - info about how long which students are taking on what sorts of problems. We'd like to leverage data to improve student outcomes.

If we had better information and analytics about office hours (for example, the times that are busiest, which TAs help the least and most students, and which students come to office hours the most) perhaps we could optimize office hours, identify struggling students early, or improve office hours or the office hours schedule in various ways. Notice please the connection to the other instructor-supportive project this semester.
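
As a small illustration of the kind of analytics meant here, the sketch below summarizes a visit log into mean wait time, busiest hour, and visit counts per TA and per student. The log format, the field names and the sample entries are all made up; what the real system records is for the team and product owner to decide.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

# Made-up visit log: when a student joined the queue and when help began.
VISITS = [
    {"student": "s1", "ta": "ta1", "joined": "2024-02-12T14:05", "helped": "2024-02-12T14:15"},
    {"student": "s2", "ta": "ta1", "joined": "2024-02-12T14:10", "helped": "2024-02-12T14:40"},
    {"student": "s1", "ta": "ta2", "joined": "2024-02-13T10:00", "helped": "2024-02-13T10:05"},
]


def summarize(visits):
    """Summarize a non-empty list of visits into a few basic metrics."""
    waits = []
    by_hour, by_ta, by_student = Counter(), Counter(), Counter()
    for v in visits:
        joined = datetime.fromisoformat(v["joined"])
        helped = datetime.fromisoformat(v["helped"])
        waits.append((helped - joined).total_seconds() / 60)  # minutes
        by_hour[joined.hour] += 1
        by_ta[v["ta"]] += 1
        by_student[v["student"]] += 1
    return {
        "mean_wait_min": mean(waits),
        "busiest_hour": by_hour.most_common(1)[0][0],
        "visits_per_ta": dict(by_ta),
        "visits_per_student": dict(by_student),
    }
```

Counts like these are only a starting point; spotting a struggling student early would likely need trends over time rather than a single total.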

This project is an opportunity to create anew, not simply to mimic other tools already out there. Our replacement must be sustainable over the long run, meaning it is based on robust frameworks and tools which are familiar to our product owner, Larry Herman, the long-suffering instructor who also serves as voice of our customers and users. One of our recent veterans of 435, Franklin Ngongang, will also be available to consult with the team on the compressed build process and some relevant technologies.

Heads up! If you have studied recent projects then you know this one is a do-over from last semester, albeit under a different project name. We got close but didn't deploy for reasons which can be discussed. Not least among our interests in another shot at it is the importance of linking the scheduler with engagement. Reliance on office hours and quality of questions asked seem to be important features which we can harvest in the mission to create spectacular outcomes for our students. The systems must work together.

Copyright © 2024 James M. Purtilo