Episode 1 features Achintya Pillai, a data scientist who worked for Panasonic at the Tesla Gigafactory in Buffalo. Achintya talks about his experience working as a data scientist in manufacturing, including how automation can reduce human error. We also learn about his journey to becoming a data scientist and his advice for those who want to follow the same path.
(upbeat music) - This is Buffalo State Data Talk, the podcast where we introduce
you to how data is used and explore careers that involve data. Tune into this episode of
Buffalo State Data Talk,
where Brian Barry will be
interviewing Achintya Pillai, a data scientist who worked for Panasonic at the Tesla Gigafactory in Buffalo. Achintya will talk about what he does, and what it's like
working in manufacturing
as a data scientist. Make sure to keep listening, to find out what skills
he thinks are critical, and his advice for starting
your career in data science. - Thanks for joining us.
Welcome to episode one. I'm your host, Brian Barry. And today we'll be
talking to Achintya Pillai about his experiences in data science. The first question is
what does your typical schedule look like? And does it vary from day to day, or is it fairly consistent? - So, my typical schedule
when I was in New York, it was very,
it was not very mundane. It didn't look the same every day. Depending on projects, you
have to talk with many teams. That's something that's very important because you might have a particular focus
working in a company. For me, it was, my focus was
on manufacturing processes. So, anything related to
speeding up manufacturing, or meeting manufacturing 4.0 standards, or trying to achieve those standards.
Those were my major goals. But sometimes requests do
come in from different teams, such as the human resources team. They have analytics to do that you might have to get involved in.
So those are the kind of things that causes a little bit of
variation in day to day tasks. As far as just a regular day, how it usually looks like is you usually end up getting
a project, obviously,
but once you get the project, the first step is, in the day,
is to really ask yourself, "Do I have all the data that I need? What are the sources that my
information is coming from?" You need to check all your
sources on a regular basis,
just to make sure data collection is good. All your pipelines are set up. There's no issues with
the data flowing in. And also checking, sometimes if you have a data engineer
or DBA, database administrator, on site, you don't have to worry about the systems, and if it's running and all that stuff, because he would tell you, but if you don't, which happens sometimes,
you might also have to check
up on your space, utilities, like you have enough
space in the SQL Server to add another application,
to collect data, stuff like that. But once you've figured out
the data collection processes
and everything is correct, usually you're good for
your daily checkups. Now you're to go ahead, and I usually ended up
making a lot of graphs just to analyze certain
different components
that the company's looking for. So, the company's looking for why is the reason that
certain product values are coming out so bad? You have to go ahead and
start, just on a daily basis,
just looking at how, what the process information
is in the database which is affecting these values and why is it actually causing
these values to go bad? That's an analytical part of the day.
Now, after that, you go into
more of an automation side. Once you kind of figure out the analytics, you have to start. Now, this is where the coding comes in, where you are disseminating
scripts in Python or R,
and start scheduling it regularly to do the same task so that
different teams get graphs on a regular basis, rather than just one
analytical task per day. That's not very realistic
because you have other job tasks as well. Now, for these regular tasks,
you can either write scripts, or you can also use programs
like Tableau or Power BI. These are kind of visualization softwares, which regularly give them updates,
which is connected to the database. So that's how my day to
day schedule looks like. - What did you plan to
be when you grew up? - I actually originally planned
to be a software engineer. So that's what my initial goal was.
That was what I planned to be, yeah. - Do you see ties to what you
wanted to be along the way and where you've ended up? - Yeah, of course. So, like I said,
a software engineer is what
I wanted to be initially. So I went to school to do computer science as an undergrad. And that kind of, I took a couple of math courses
while I was doing my
computer science program. And I kind of got interested in statistics and computer science at the same time. So, I found out about
something called data science, which involves both to an extent.
So, I ended up getting into
data science in that sense. So as I went along the journey
of software engineering, I kind of found that math and data science to be a little bit more interesting. That's how I kind of veered
my way towards data science.
- Yeah, they're both kind of intertwined into data science, right? What are the skills that help
you move into this career? What hard skills or any soft skills? - Of course.
As far as hard skills go, just for real, being realistic, you do need a level of coding ability to be able to get into the field. Because in reality, there
is a lot of systems,
and a lot of databases, a lot of programs and scripts that you will have to deal with that you have to know how
to program to an extent. Specifically SQL, whether
that be Microsoft SQL,
or PL/SQL, which is related to Oracle. And a couple of programming
languages like Python would be very, very helpful
as you'll make scripts multiple times. R is also something that's very useful
because it basically has coding packages and math packages combined in it. So, that's a very good statistical and analytical coding tool. So as far as hard skills,
those are some things that
you would have to know. One thing that I've learned
after getting into the industry is that soft skills like communication, being able to write
documentation very well, those things are extremely important
because most of your time
would go into communicating with multiple teams, which is difficult for a person
who hasn't done it before, believe it or not. Like, you think you
would be able to do it,
but when you actually get put on the spot, asking the right kind of questions relating to where they're coming from is actually a very difficult task. Also being able to document
whatever you're writing
is something that I found
difficult initially, because it might seem good to
you because you're doing it. But when someone completely new comes in and reads what you're trying to do, especially how you're trying
to impact the business,
you need to be able to
give the documentation. So those are the kinds of skills. - Yeah, we definitely are
looking at those kinds of skills. Right? - Sure.
- So, you know, you were
kind of talking about, you know, working with teams. So, do you work alone, or
do you work with a team, and how often? - We were split into a smaller team.
So we had one data scientist, which dealt with the
manufacturing process. So he looked at the data,
tried to analyze it. We had software engineers
and integration engineers, who kind of are connected,
the end points from
basically from the machine in the manufacturing process, how do I get the information? So there has to be a
pipeline in that scenario. So he dealt with that, basically
trying to get me the data.
And then there's also a data engineer, or a database administrator
who has to be there to kind of deal with the
servers, the databases. So that was kind of how
our team was structured. - So, when you have meetings,
what are the types of
things that you talk about? - So, meetings, obviously, are regular. You can have meetings
with your internal team. I'll start with that. They, just basically talking
about the application
that's already running. What are the kind of
resources it's gonna take? And it's important for me to mention this because as a data scientist, whoever's trying to be a data scientist,
they should know about, even though they don't know
everything about databases, and all those things, they, and I don't know if in
college I specifically took, I don't remember myself
taking any course related to,
specifically related to
databases, how they function, how query should work. Speeding of query, some
students might take, but it's not very common. So you should spend the
time outside of class
to learn about databases in general, because you would be talking
with other data engineers or database administrators who
will ask you these questions about how much space do
you think it's gonna take? How many times is your script gonna run?
Because we only have
certain available space that we can provide to you. So these are the kinds of things
that is important to know. If you don't, you can still
learn on the job for sure. But I think it's better
knowing before you go on.
It will give you a good advantage. As far as internal meetings
go, that's how it usually goes. Also, you'd have to know
about programming concepts, just so that you can give a generic idea to any software developer if
you have to make any program
to collect the data. And people forget that the data actually has to be somewhere. Without that data, your
job is kind of redundant. So, you need to be able to talk
with a software developer as well, to actually, for him, helping
him make the application to collect information. Other meetings that go through. Now, these are the important meetings
where you had to talk
with the actual teams. Whether it be the engineering team, or whether it be the HR team, or whether it be the
quality department team, customer service team, it doesn't matter.
You kind of, you might have to talk with many teams depending on the situation. But the questions, how
those meetings usually goes, me asking a lot of questions,
and them really trying to answer them. I ask, I, in the beginning,
I ask a lot of questions that they might not think about, but it's important for me to know. Number one question I ask
is how many sources of data
do you actually have? Do you just have Excel sheets that you've been filling by hand? Can I get any other sources
of information from, even if it's from Twitter.
If you can get some
information from Twitter, go ahead and get that
information using APIs. Do you have other database
information stored somewhere else from the past that you can give me? And it's better to get just all the data.
It doesn't matter what kind of data it is. Doesn't matter what the quality is. It's better that you have all of it. And then you can work on fixing it, figuring out any flaws, working
with bad data, et cetera.
Because a lot of the times you will notice when you go to work, these questions are asked
because most of the time you'll notice that a lot of people don't actually have the data.
If they do, it's great. If they don't, that's a problem. So that's why you just ask them how many sources of information they have. Now, after that, I usually ask them
what their goal is with all of this. What are you really looking for? Because I can make a billion fancy graphs, which will not help them at all. - Right (laughs).
- You just need to make them a graph so that they can on a day to day basis, look at it and state what
is the decision I can make. This part is important because your value as a data scientist or a data analyst
is added on based on the business impact that you give the company. So we, unlike software developers, we don't have a product
that we're shooting out, or we're not working on slight features,
but we're actually trying
to make a business impact. So these questions are important to know because it's your job to
help them make that impact. You know? - So we need to ask a lot of questions--
- A lot of right questions, yeah. And ask them what they're
really looking for so that you can try to, you
can really try to help them. Yeah. - Awesome.
What is the challenge that you've overcome in your education or career? - Domain knowledge is kind
of vastly underestimated, or grossly underestimated. And while you're studying,
because they speak about
this domain knowledge stuff, but you don't really understand
how important that is. If you're working in a company like me, we're working on cars, and batteries, and all these chemical technologies,
and mechanical technologies
that you have no... I did my computer science engineering, and then I did my Master's
in engineering data science. So I don't know much about chemistry, or like mechanical engineering
and stuff like that, right?
But there are a whole slew of people who have been working
here for years and years, and who've been building up
this understanding of the domain and how the product functions, and every single little aspect of it.
And you have to go in there and try to really
understand these aspects. And that is a big challenge
when you go to work, because depending on what
industry you work for, you really have to be able to understand
what your company is trying
to do in the industry, where their business values
are in the value chain. Like where does their
business value stand? Because if their business values come in research and development,
that's where your, kind of, your expertise is gonna really go in to try to help research and
development make new designs, make new ideas, and stuff like that. But if your value is,
if your company's value is
going more into manufacturing, you would have to,
like, really think about how you can use manufacturing data to maybe speed up the
process, reduce the costs, stuff like that.
So you see the difference between where research and development versus manufacturing development. Like, these are kind of domains that you really have to find out
and seeing where your
company is focusing on, and really trying to understand
what they're trying to do. That is a big challenge. A lot of people kind
of don't think about it before they join.
They're very excited to get the job. But once they get the job, you really have to dig into that. So that's a challenge
that I faced when I work, that kind of overcame (indistinct).
Took me a while. Took me about three months. - Yeah, I think that makes sense. You know, really understand the company that you're working for and--
- Yeah, exactly. And you don't know the, and also even, you're working
with a lot of engineers and stuff. They know their thing.
They know their stuff inside out. Like, they know what the product does. They've basically helped build it. So, if you're going to
come in there and say, I'm a data scientist,
and say that I'm gonna give
you some analytical work so that you can make changes
to your business process or your engineering process, that's gonna be, it's a bold statement to make, you know,
because these guys have,
these guys are educated, they are definitely very well equipped, like knowledge wise,
they're very well equipped with the product. So if you're gonna go
there and tell them that,
you need to definitely know what you're talking about, you know? Otherwise they won't take you seriously. - Yeah, definitely. - Yeah.
- So, the next session
that we're gonna do is, I titled it "Interesting Problems." And I'm just gonna ask you
three questions from here. So, I guess, without giving away too much, what do you think are
the interesting problems
of your field? - Interesting problems in my field. So, just in my field in particular, manufacturing is something
that a lot of people don't look into.
But bigger companies like Ford, GM, Tesla, Panasonic, GE, especially, these kind of companies focus a lot on, they do have, they own
a plethora of businesses in different industries, right?
Manufacturing is something
that they do in-house. So, as far as we go, one big problem that, one big movement that
everyone's trying to make is they're trying to move from the current manufacturing industry to 4.0,
manufacturing 4.0, where there is less
human interaction. Cause errors usually end up in that area. I'm not saying all human
contact has gone or anything, but very less.
So-- - I was gonna ask you, could you go in a little
bit into manufacturing 4.0, because I had heard
you mention it earlier. - Yes.
So, manufacturing 4.0 is the
idea where people want to kind of move into more of,
like manufacturing process where there is more automation,
less human interaction. Basically that's the gist of it, right? There's obviously a lot more to it,
but just to give a general idea. Right now there's a lot of situations where errors can come up because a lot of values might
not be calculated correctly. People are doing a lot of things by hand.
They're trying to move
away from that concept. And they're trying to move into a different kind
of industry standard. As far as data scientists go, we have a big part to play in that
because most of all of
these automation works will require a lot of machine learning and analytical tools to do it. How much of this particular... So, for an example, how much temperature
should we put this machine
at at this particular time based on your previous data, right? Based on what you know if
you have a hundred products in that machine, what should the temperature be?
This is just an example. It's not, like, it's not
something that you would do. But, yeah, those are the kinds of things that you would have to do. And you would know from the past
that all 100 products are there, if you keep the temperature
at a certain degree, the products are gonna go bad. So, rather than someone being
there and making a mistake, your programs should automatically
kind of understand that, you know. And that's where, that's something that everyone's
trying to move towards. The problem a lot of the times is, that people would face
when they go into this is
there is just, sometimes
there just might not be enough data collection in your company. It's a new concept. It's only been around
for two years, you know, like, the whole data science,
or data collection process,
it's really coming up now. So it might take a little bit of time for people to actually get
all the information they need, good data, bad data, so that your algorithm can
really learn from it, right?
Sometimes people might
not have good data at all. By good, I mean they might just have, they might not even have data. By like, when you have data, there might be a lot of missing
values and stuff like that.
Bad data. Again, you might just not
have the years of data that you need to basically
get seasonality trends. So time series analysis
is kind of short there if you don't have the years
of data that you need.
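The point about needing years of data before seasonality shows up can be made concrete with a small Python sketch. It is an illustration only, not code from the episode: the function name, the 12-month period, and the two-cycle minimum are all assumptions chosen for the example.

```python
from statistics import mean

def monthly_seasonal_profile(values, period=12, min_cycles=2):
    """Return each month's average deviation from the overall mean,
    or None when the history is too short to show a repeating pattern.

    `period` and `min_cycles` are illustrative assumptions.
    """
    if len(values) < period * min_cycles:
        return None  # fewer than two full years: no seasonal estimate
    overall = mean(values)
    # values[i::period] collects every observation for the same month
    return [mean(values[i::period]) - overall for i in range(period)]

# Hypothetical monthly readings: 8 months versus 3 full years
short_history = [100 + m for m in range(8)]
long_history = [100 + (m % 12) for m in range(36)]

print(monthly_seasonal_profile(short_history))  # None
profile = monthly_seasonal_profile(long_history)
print([round(p, 1) for p in profile])
```

With only eight months of readings the function refuses to estimate anything, which mirrors the situation described above; with three years it can average the same month across several cycles.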
Yeah. So those are the kinds of
issues that you face a lot, in manufacturing, at least. - So if you do get bad data,
what do you usually do with it? Do you just toss out the data set,
or do you have to kind
of massage it a little? - Yeah. I try to preserve as much as I can. Like I said, we're still so young that data collection isn't as plentiful
as you'd want it to be. So you can't just chuck out
like a thousand data points, or even 10 data points. It's not very easy to just throw it away because it's just not a smart idea.
You'll just end up losing more data, and which will affect the
algorithm quite a bit, cause usually what happens is if you have a hundred data points, and you're trying to look
for bad quality products
in a hundred data points, more likely than not, the majority of them will be good products. So you only have a small minority that's actually bad quality products,
but what caused those bad quality products are very important. But since you only have a
few of these data points, it's very difficult for your
algorithm to understand, okay, these kinds of situations
lead to a bad product rate.
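A rough Python sketch of that imbalance, using made-up numbers (95 good and 5 bad products out of 100) and a hypothetical helper name, shows why a handful of bad examples is so little to learn from:

```python
from collections import Counter

def class_share(labels):
    """Return the fraction of the data set that each label makes up."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

# Hypothetical quality labels for 100 products
labels = ["good"] * 95 + ["bad"] * 5
share = class_share(labels)
print(share)  # {'good': 0.95, 'bad': 0.05}

# A naive rule that always answers "good" looks accurate on this data
# while never flagging a single bad product.
predictions = ["good"] * len(labels)
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
bad_caught = sum(p == "bad" and y == "bad" for p, y in zip(predictions, labels))
print(accuracy, bad_caught)  # 0.95 0
```

Checking the label shares before modeling is one simple way to see how many bad-product examples there actually are to learn from.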
So you need to have years of data so that you have these bad
product data accumulated, so you can teach your
algorithm to understand this. I don't just chuck away data points. I try to massage it.
I try to preserve it as much as I can, whether it be, can I
fill in the information practically from somewhere else? Can I, instead of just removing it, can I average it out with
the other data points
in that column? Can it be zero? Will that be okay? Will that affect my process by any means? Will that show up normal
results, you know?
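The options listed here (fill from another source, average the column, or use zero) could look something like the following Python sketch. It is a minimal illustration under assumptions: the function name, the strategy names, and the temperature readings are invented for the example and are not from the episode.

```python
from statistics import mean

def fill_missing(column, strategy="mean", reference=None):
    """Fill None entries in a column instead of dropping the rows.

    strategy="mean":      use the average of the known values
    strategy="zero":      use 0
    strategy="reference": take the value from a second source
    """
    if strategy == "mean":
        known = [v for v in column if v is not None]
        if not known:
            raise ValueError("no known values to average")
        fill = mean(known)
        return [fill if v is None else v for v in column]
    if strategy == "zero":
        return [0 if v is None else v for v in column]
    if strategy == "reference":
        if reference is None or len(reference) != len(column):
            raise ValueError("reference must match the column length")
        return [r if v is None else v for v, r in zip(column, reference)]
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical temperature readings with one gap
temps = [200.0, None, 210.0, 190.0]
backup_log = [201.0, 205.0, 209.0, 191.0]  # hypothetical second source

print(fill_missing(temps))                           # [200.0, 200.0, 210.0, 190.0]
print(fill_missing(temps, "zero"))                   # [200.0, 0, 210.0, 190.0]
print(fill_missing(temps, "reference", backup_log))  # [200.0, 205.0, 210.0, 190.0]
```

Which fill is acceptable depends on the questions raised above: a zero may be a sensible default for a count but would distort a temperature column, so each choice needs checking against the process it describes.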
So those are the kind of things that you have to think about. - Cool. So, what sort of collaborations are you most interested in seeing?
And what do you see as the
benefits of those collaborations? - So, what I would like to see, like if you look at the financial sector, and my managers talk
about this all the time. His goal (indistinct) is actually,
they have a vision of making data science kind of like a day to day use. Like, you have to be
able to make a consumer use data science products every day, rather than just making them
request it and stuff like that.
The product has to be out there where they're using it consistently. And they're making
significant business decisions based on your algorithms. That is where real values lie.
So, in financial sectors,
you have tech sector, like you have a bank, you'll have a technology
division of the bank. If they're able to make higher ups, or if they're able to make management
make more important
decisions based on these, based on these analytical
tools that you've made, that will be the most
interesting thing to see because you will really
see whether the field is really helpful for
business or not, right?
So, as far as collaborations
go, I would like, I personally would like to see more, the technology division of a company collaborate more with the
management division of the company and make it a more day to
day kind of thing, you know?
- Yeah, yeah. - Yeah. - Is there anything in particular that you would like to talk about? Or an issue that you'd like to raise
that you haven't touched on yet? - I would like to make it very, more aware that in the
field of data science, it's not just statistics
and machine learning and analytics that you do.
It also involves a lot
of other disciplines like database administration
or data engineering. You have to be able to
understand those concepts because you need to be able to know what other people are doing,
cause it's closely related to your work. So much so that you really have to be able to guide them also in the
right direction to move so that you can finish your job. Data science is not as,
it's not as established as
software development is, like their work culture. And a lot of data scientists are younger. So we have to understand that
since it's not as established, even our hierarchical
system is not as established
in many companies. Cause sometimes if you're
in a smaller company, you kind of do multiple roles. Like, you wear many hats. You kind of do data engineering,
data science work as well.
But if you're in a bigger
company, it could be split apart. Like, you will have
specific data engineers, you will have specific data scientists. But it's, I think it's very
important for you to know that you should probably know what a lot,
like, what data engineers, or
database administrators do. You should know how databases function. So those are the kinds
of things that I think, I just wanted to mention
so that everyone knows. - Yeah, yeah.
And my last question, what advice would you give to
others that are interested in going into your field? Whether that be back in high
school when making decisions, or, you know, about an
additional education,
or people who are thinking
about a career change. - So, advice for people, if they wanna become a data
scientist or a data engineer, I think it's important for them to know that you need some sort
of coding background.
So, if you have the
opportunity to go to college, try to pick something
related to math, coding, maybe dual majoring or something, where you have the, or maybe MIS as well, where you get some coding experience.
Cause that is important,
whichever way you look at it. If you don't have the coding experience already, you can also get it from Udemy. You can get it from online
sources, which is very valid. But if you have the
coding experience already,
then I would strongly recommend you start looking into
mathematical, like, courses because stats is important. You do learn about stats. Stats in particular, not
mathematics as a whole bunch.
Statistics in particular
is very important, probability stats. So, if you don't have that experience, then you should probably
try to get into that. But if you're at a starting
point where you have nothing,
if you're in high school where you're going to get
your high school diploma, I would first suggest
you do the coding part. If you can do both at the same time, like doing a dual degree, or minoring,
or something like that, that's good. But if you can only pick
one, start with the coding, and then kind of work your way into math by doing online courses on the side. And then that'll be a good
base to get into data science.
- Well, great. Well, thanks for doing
the interview, Achintya. I really had a great
time listening to you. - Thank you. - It's, you know, it's
been a really good time.
- Sure, same. - Thanks for listening to
Buffalo State Data Talk. For more information
about starting your career as a data scientist, go to dataanalytics.buffalostate.edu.
Don't forget to subscribe so
that you get a notification each time we release a new episode. And join us September
1st for the next episode of Buffalo State Data Talk. (upbeat music fades)