STEM Student Interns at TEAM
This summer, four San Diego high school students interned for TEAM as part of the Research Experience for High School (REHS) program at the San Diego Supercomputer Center (SDSC). The students primarily analyzed TEAM's portal visitors and data downloads to discover patterns in portal traffic and core protocol downloads. A summary of their work was presented in a poster session at SDSC (view a PDF of the two student posters). Some of the students were also featured on the local news in a story about the internship program.
Two of TEAM's interns wrote the blog posts below about their experience working on the TEAM project. Allen Cao:
"We interns were given a collection of tables that belonged to the TEAM portal that contained limited information on users, downloads, and protocols. Our task was to analyze this data and present it in formats that revealed meaningful trends in the usage of the portal. Over the course of this internship, the data was analyzed primarily according to users and downloads, which in turn were analyzed according to protocols, roles, and domain.
In order to even begin extracting information from tables, I refreshed myself on SQL. SQLZoo and code from the mentors provided an excellent crash course that allowed me to begin importing and querying the tables. I had to do some research to overcome some import hurdles, but eventually I was on my way to joining and extracting necessary information. There were initially five tables, with two more added later to link some of the tables together. After exporting the output into a CSV file and importing that into Excel, I was able to create a large quantity of related graphs concerning users and downloads by site role, email domain, and last login time. By uploading the info and recreating the graphs in Google Spreadsheet, I shared my information with the team as we pooled together our work. I then created a map showing relative downloads by site including protocol information and started analyzing the information.
Over the course of this internship, I learned a lot about how to manipulate information. By using Excel, PostgreSQL, Google Docs, and Fusion Tables, I was able to create high-quality charts. The charts revealed some interesting trends regarding portal usage. Spikes in user registrations and visitors occurred during studies and other events that promoted usage. Users tended to visit for a single download, with a small minority being repeat downloaders. These repeat downloaders also downloaded large amounts. Finally, the majority of traffic came from North America and Europe, and the majority of TEAM downloads were about South American sites."
"Well, it's been eight weeks here at the Supercomputer Center and my internship with the Tropical Ecology Assessment and Monitoring (TEAM) Network is drawing to a close. I certainly had a good time being a data scientist (as I like to think of myself) this summer. It's weird - I only had a few big tasks assigned to me - but I feel like I learned more from my eight weeks here than I would ever learn in eight (or even 80) weeks at school. For one thing, the atmosphere was a lot more hands-off than it would ever be at school; I actually had to figure out what I had to do with my time and how to be productive rather than have my time segmented into 90-minute intervals – I actually had to be self-motivated.
Our main task this summer was to search through several large database tables containing data downloads, user registrations, user roles, and other portal data and use it to make infographics using several different programs such as Microsoft Excel, Google Docs, and Google Fusion Tables. We also had to learn the Structured Query Language (SQL) and use it in a program called pgAdmin so that we wouldn’t have to look through all the data manually (that would have been the worst internship ever!). At the end of the internship, we had to make a poster summing up our findings, which was actually really convenient since all our work was visual and thus poster-ready. Basically, what we found was that:
- Most users of the TEAM website came from North America and Europe, with some also visiting from TEAM tropical sites
- Most users did not download heavily and those who did fell into two categories: long-term downloaders who downloaded a few things per visit and those who came once or twice and downloaded huge amounts of data
- Downloads of animal pictures and vegetation data from South America were most popular
- Spikes in visits to the TEAM site came when the media covered some of TEAM's camera trap work
What I really liked about this internship was getting to learn something and then apply it in actual work that would be presented in a workplace environment. Sharing this experience with the other interns was a blast – between learning how to be of value to an organization, simply walking to lunch, and getting to look through awesome animal pictures, I had a really good time this summer and I would definitely do it again."