Krish Rana – Data Scientist and ML Researcher
Krish Rana, a Data Scientist working for Intel out of Portland, Oregon, interned with PSC’s Biomedical Applications group and is a former CMU Teaching Assistant. I recently met with Krish over Zoom to chat about his experiences here and to catch up on what he has been up to.
BIOMEDICAL APPLICATIONS GROUP INTERN
CC
Hi, Krish. It’s nice to meet you. When you interned at PSC, you worked with Mariah, Ivan, and Luke (in PSC’s Biomedical Applications group). What kind of projects did you work on?
KR
I mostly worked on the Brain Image Library (BIL) project. They were trying to build out a website and wanted to automate the process of uploading the documents and tagging them. Previously they were using Excel, but they wanted to automate it and use a web-based interface to do the kind of tagging needed. This was more than a year ago, but I was basically working on a data processing pipeline for BIL.
CC
Interesting. From your LinkedIn profile I see that you also interned with Intel in Arizona that same year. What can you share about that?
KR
For about 3 1/2 months during the summer, I worked on three projects. I was dealing with industrial or production-grade systems and production-grade databases, and that data is kind of messy. The main part of the project was to deal with this messy data and convert it into data sets to be fed into ML models. Maybe 70% of the entire time was spent preparing the data and getting to know the systems of Intel’s data infrastructure. It’s massive and it takes a while… even now, I don’t know everything about it. I’m still learning even while holding a full-time position with them. In terms of PSC, because I was using mostly open frameworks and open technology, working with Intel definitely helped.
CMU TEACHING ASSISTANT AND INTEL INTERN
CC
You were also a teaching assistant at CMU. How did that come about?
KR
I was splitting my time between PSC and working as a Teaching Assistant for a VR course. It was a new course; I think it was the first time the course was offered. I just happened to be in touch with the professor because I had done a course previously with her, so that’s how I got to know about it. There was one other thing about this, about the statistics, and I’m not sure whether PSC helped with that, but the concepts were pretty similar.
CC
Often things like that work out that way. You make certain contacts through one experience and then it kind of spills over.
KR
One thing PSC helped with during the [internship]: I took a course on deep learning and needed access to GPUs to actually train the deep learning models for the course. That tends to really help because you can spin up instances as needed, especially with GPUs. That helped me set up one training over there, which was really helpful.
CC
That doesn’t surprise me; PSC is well known for its GPU expertise. Can you tell us a little bit more about this VR course?
KR
The course was about exploring VR environments and using them to enhance the communication skills of students. At CMU, almost every department or course of study requires one communication course as part of your degree. The idea was to give students an immersive feel, because in a classroom environment you speak in front of 10-15 people, and that’s easy to do. But to speak in front of a larger audience in a proper auditorium setting, that’s a very different experience to have.
It was an interesting idea to explore: whether we could leverage immersive experiences to provide real-world experience. I was mostly helping the professor set up the VR devices and the environments, so more on the technical side of things.
CC
You did an internship with Intel, and now you work for Intel. Are you working in the same areas that you did as an intern, or do you have a slightly different focus?
KR
It’s interesting. My internship was initially focused on traditional modeling and dealing with tabular data, but then towards the end of the internship, my interest turned to deep learning. I knew that there was one team that was working on building models and using LLMs for knowledge discovery use cases.
That was one of my final projects during the internship and over time I continued working on the project with the same team, basically continuing where I left off. We expanded the scope of our project, making a proper great platform out of it.
DATA SCIENCE AND AREAS OF INTEREST
CC
As a data scientist, what do you think are some of the most interesting areas of research and development in that field?
KR
Recently, Generative AI has been dominating the deep learning landscape. It’s really interesting, to be honest. But the thing is, within the next 10 years there will be so many areas to work on. You can work at the foundational layer, basically creating your own foundational models from scratch, but that requires some amount of resources and multimillion-dollar funding. So that’s not possible with every company. But if you are working at, let’s say, Meta or OpenAI, and if you’re interested in that, that’s an interesting area of research. Applying search algorithms on top of that is also a really interesting area of research right now. So that’s the foundational layer.
But if you want to work in the data or infrastructure layer, or maybe a more general area, then there’s model inferencing, which is pretty interesting to me personally. You can build a trillion-parameter model, but how do you actually put it into production, serve a user base of a million or support concurrent access to it, and do everything in a fast way? Also, keeping the user experience in mind, you don’t want to wait for 5 minutes to get the answer from that. So, the AI inferencing part is really interesting and there are a bunch of startups working on this, on the hardware as well as the software layer. That is a good idea for a feature, I would say. And then comes the data itself.
When you’re working with enterprises, they are pretty strict about where the data goes, who has access to it, and those kinds of things. There are good opportunities to build startups, as well as internal teams within the companies themselves, that handle the data for the enterprises as well as how you put that data into the LLMs. For example, if you’re building access, there are many levers that you can pull. So how do you parse out the data? Companies mostly use PowerPoint, SharePoint, and the like, so that means building out good parsers to extract the relevant data and also the images. How do you encode the images and then pass them into the model, and how do you combine the multimodality aspect of it? Since it’s a new field, there are no boundaries that confine you; it’s pretty much open. It’s a good area to work in right now. There are a lot of opportunities at every level, I would say.
CC
Are there any areas that you’re not currently working in, in data science or related fields, that you would be interested in exploring in the future?
KR
Yes, definitely. I would say there has not been much focus in data science, or even general applications, on the UX, the user experience, side of things. For example, take an ML model or maybe an LLM talking to a user. At what point does the user know that the LLM is hallucinating, and when should you believe the answers that the LLM or ML model is giving you? So, areas around reliability or the UX of using ML models are interesting to me.
Also, how do you display the LLM’s output to the users? Building a chat interface works for the most part, but then once you expand the field into, let’s say, protein modeling or something else, then how do you interact with the LLMs? Broadly speaking, that is interesting, and I would want to research more in that area moving forward.
CC
Sounds fascinating! Well, it was nice to sit down and talk with you and I appreciate you spending the time. Have a nice afternoon.
KR
Thank you so much.