Research Engineer in the HPC AI and Big Data group, supporting the user community for PSC's Bridges-2 and Neocortex systems while also leading the development of COSMO, a REST API for exploring the BlueTides simulation data from the McWilliams Center for Cosmology.
Julian joined the Pittsburgh Supercomputing Center in 2019 after working as a Senior Technical Support and Customer Operations Engineer at Hortonworks and Cloudera Inc. Before that, he contributed to the University of Delaware Global Computing Lab, the University of Los Andes COMIT Research Group, and the Technological University of Pereira Sirius HPC Research Group, and he worked as a Software Developer for top-performing companies in Latin America and the Caribbean (VeriTran, BeMovil) after earning his Master's degree in Systems and Computer Engineering from Universidad de los Andes (University of the Andes) in 2012.
Interests: Virtualization, Web Development, Entrepreneurship, User Experience Optimization.
- Primary author of the Practice and Experience in Advanced Research Computing 2022 (PEARC22) conference paper "COSMO: a Research Data Service Platform and Experiences from the BlueTides Project", winner of the best short-paper award in the Applications and Software track.
- Won the 2022 Outstanding Achievement Award for the Pittsburgh Supercomputing Center from the Mellon College of Science for significant contributions to the NSF-funded Neocortex project, having helped deploy this innovative supercomputer and keep it running despite notable technical challenges.
- Deployed a multi-GPU test system for the Open-Compass project with AMD Instinct MI100 GPUs.
- Deployed a Graphcore testbed using a BOW-2000 IPU Machine.
- Supported the Neocortex migration from Cerebras CS-1 to CS-2 machines.
- Participated as a presenter in the annual Neocortex NSF Acquisition Review Panel, covering the Software Stack, Operations, Security, and the Virtual Neocortex Display.
- As a member of the Bridges-2 Continuous Improvement Committee (CIC), proposed ideas and helped implement improvements for the Bridges-2 OnDemand platform.
- Joined the Bridges-2 Software Task Force, working as a team to produce the Bridges-2 Services Definitions, Requirements, and Deployment Task Force Standard Operating Procedures.
- Aided the Bridges to Bridges-2 migration by thoroughly documenting and running performance evaluation benchmarks in preparation for acceptance testing, which was required for the Bridges-2 supercomputer to be formally accepted and for performance reviews and evaluators' questions to proceed without problems.
- Implemented the first version of the Data Sharing Portal for the McWilliams Center for Cosmology, which made a limited set of BlueTides Simulation files publicly available through a web interface. Previously, this had been done only to a limited degree, in a decentralized way, using a web server.
- Migrated and revamped the AIBD website on the CMU Content Management System, including a Neocortex System section. This centralized the content on a single manageable platform that is kept up to date by the CMU Technology team. The old website, which had been the subject of an ongoing design effort, needed to be replaced because of the development work required to adapt it to new requirements.
- Implemented the Neocortex Portal webpage to give Neocortex users access to news and documentation, providing a fluid, structured way to share information with them while keeping track of their project statuses. This included integrating the MkDocs documentation system into the Portal so that multiple Neocortex team members could edit the content while keeping it under version control.
- Implemented the Neocortex Documentation page using MkDocs, as an independent component of the Neocortex Portal.
- Coordinated the deployment and configuration of the Neocortex system, a $5M project sponsored by the NSF, constantly communicating with multiple teams across three organizations (PSC, Cerebras Inc, HPE) to successfully install on site the technical equipment the specialized hardware required to start operating.
- Attended the "Neocortex Admin Training" by Cerebras on February 10, 2021, covering system administration and operation.
- Generated software modules for the most popular AI frameworks and libraries to be used on the Bridges-2 cluster.
- Worked with the Bridges-2 Acceptance Testing team to make sure the cluster performed as expected when using the MXNet and TensorRT frameworks.
- Led the configuration of the Neocortex Superdome Flex system, a multi-chassis system (8 chassis total) configured as logical partitions, each with 16 CPUs, 100 TB of multi-drive NVMe flash RAID storage, 12 TB of RAM, 100GbE network interfaces, and 8 InfiniBand network interfaces, all expected to deliver top aggregate performance when running jobs on the Neocortex system.
- Began serving as a member of the Bridges-2 Continuous Improvement Committee (CIC), focused on improving the experience researchers have when using the Bridges-2 cluster.
- Co-authored the ICDAR 2021 conference paper "MiikeMineStamps: A Long-Tailed Dataset of Japanese Stamps via Active Learning" in the Document Analysis and Recognition category.
- Co-authored the Practice and Experience in Advanced Research Computing 2021 (PEARC21) conference paper "System Integration of Neocortex, a Unique, Scalable AI Platform".
- Part of the team that won the 2021 Mellon College of Science Outstanding Team Recognition Award for the Pittsburgh Supercomputing Center, for deploying two world-class supercomputing resources, Bridges-2 and Neocortex, in the midst of the ongoing pandemic. Despite several challenges, delays, and hardships, the Bridges-2/Neocortex team persevered and, through extreme dedication and heroic efforts, was able to field these two machines for the scientific research community.
- Participated as a presenter in the annual Neocortex NSF Operations Review Panel: Operations Performance for Neocortex Testbed Operations Year 1.
- Implemented a navigation tool for the image labeling software "Label Me", a critical piece of software for an XSEDE ECSS research project that has already produced a paper for the International Conference on Document Analysis and Recognition 2021 (ICDAR 2021).
- Helped the HuBMAP Consortium launch containers on demand via SLURM on the Bridges supercomputer, paving the way for the project to run research workflows on multiple clusters.
- Refreshed the first implementation of the CALIMA software for mining the Bridges supercomputer's SLURM job history and started a whitepaper on the subject. The goal is to predict when jobs will start executing and what modifications could help them start with shorter queue times.
- Supported PSC's critical role in the COVID-19 High-Performance Computing Consortium by helping multiple groups with their COVID-19 research, going above and beyond to provide daily technical support and suggest optimizations for their job executions, and regularly spending late-night hours with the teams to resolve issues in these research projects of worldwide interest, critical for everyone's wellbeing.
- Applied to the McWilliams Center/PSC Seed Funding 2020 Program with COSMO, a REST API for Cosmology Data.
- Earned the McWilliams/PSC Seed Grant 2020, using ~$20K in funds to hire and lead two graduate interns in implementing a platform for sharing petabyte-scale simulations and datasets through multiple endpoints (a web portal, a RESTful API, and a Globus endpoint) so that users can explore the BlueTides Simulation data in the way that best suits their needs.
- Won the Editor's Choice Award from HPCWire, a leading publication in the high-performance computing field, for "Best Use of High-Performance Data Analytics & Artificial Intelligence" during the virtual 2020 International Conference for High-Performance Computing, Networking, Storage and Analysis (SC20), as part of the team led by Dr. Olexandr Isayev from CMU.
- Nominated for an Andy Award in the Teamwork and Collaboration category as part of the PSC COVID-19 Rapid Response Team.
- Attended the tutorial sessions of the Hot Chips 2020 conference, which covered how to scale out machine learning using NVIDIA, Google, and Cerebras Inc systems.
- Attended NVIDIA's GPU Technology Conference (GTC 2020) to learn about the latest developments in NVIDIA technology.
- Attended the ICML 2020 conference virtually (July 2020).
- Attended the XSEDE HPC Workshop: MPI, on September 1-2, 2020, a basic MPI-focused training for developing code for HPC environments.
- Attended the MIT Professional Education Short Programs course "Designing Efficient Deep Learning Systems", a two-day course on how deep learning works and which systems are well suited to running DL workflows.
- For outstanding contributions to the center, and specifically to the AI and Big Data group's mission, received the Staff Recognition Rookie Award from the Mellon College of Science, given to staff members who have contributed greatly to the Pittsburgh Supercomputing Center.
- Part of the team that won the Staff Recognition PSC COVID-19 Outstanding Team Achievement Award from the Mellon College of Science, for its efforts and results in collaborating with research groups on crucial COVID-19 projects.
- Assisted with a section of the "Hands-on Virtual Training - Getting Ready to Use the Neocortex System" training on December 8 and 9, 2020, which gave an overview of how to run compilation and training workflows on the Neocortex system's Cerebras CS-1 boxes.