In this episode, Susan Walsh, the Managing Director of The Classification Guru, talks about how her company specializes in cleaning and classifying data for its clients. Susan shares how she works to make data fun and relatable through her company and her LinkedIn presence.
(upbeat music)
- This is "Buffalo State Data Talk," the podcast where we introduce you to how data is used and explore careers that involve data.
- Hello, and welcome back to another episode of "Buffalo State Data Talk." I am your host, Heather Campbell, and thank you for joining us for episode 15. Today, we'll be talking to Susan Walsh, the Founder and Managing Director of the Classification Guru and the fixer of dirty data. Welcome to the show, Susan.
- Hello, thank you so much for having me.
- We're so happy to have you. Could you start us off by telling us about your company, the Classification Guru, and what it is that you do?
- Yeah. I have had my business for 4 1/2 years, so it'll be its fifth birthday in June. Before that, I'd been working for a spend analytics company, classifying spend data for procurement. What that means is I go through the whole dataset and classify it so that clients know exactly what they're spending their money on. The spend analytics company focused on the dashboards and analytics and all the shiny stuff, whereas I could see that the real problem behind all of that was the quality of the data.
So I set up my company and put myself out there to the world to say, "Look, I'm here to clean your data and classify it." A lot of people thought it was a great idea, but unfortunately, nobody was looking for me, because they didn't know I existed.
I've managed to create a niche for myself, which is quite spectacular when you think about it, especially nowadays, when there seems to be a solution for everything. But sometimes the simplest things really are the best option, and this is what it is. I manually clean and classify data, reformat it, remove any duplicates, and make it nice and tidy and readable. I have a team, and we do most of that manually. There's no fancy trickery or software or automation doing it. It's just good, old-fashioned hard work.
- That's so important. You can't do any of that data analysis, and you can't make those dashboards and those beautiful graphs, until you have clean, usable data.
- Yeah, and I think there's been a bit of a myth that you buy some software and it will fix it all. People are starting to realize that it's not gonna fix everything. You really do have to put some effort in, and there will always be some level of human involvement in data cleaning and data classification.
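Susan's point that some human involvement always remains can be illustrated with a small sketch. This is not how her team works (they clean by hand and in Omniscope); it only shows one common way to surface candidate duplicates for a person to review instead of deleting anything automatically. The `normalise` and `candidate_duplicates` helpers and the supplier names are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict


def normalise(text: str) -> str:
    """Lower-case, strip punctuation and collapse whitespace so that
    trivially different spellings compare equal (hypothetical helper)."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def candidate_duplicates(rows: list[dict], key: str) -> list[list[dict]]:
    """Group rows whose `key` field is identical after normalisation.

    Nothing is deleted: every group with more than one row is returned
    so that a person can decide which records are genuine duplicates.
    """
    groups: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        groups[normalise(row[key])].append(row)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    # Made-up example data.
    suppliers = [
        {"id": 1, "supplier": "Acme Ltd."},
        {"id": 2, "supplier": "ACME  Ltd"},
        {"id": 3, "supplier": "Bolt & Co"},
    ]
    for group in candidate_duplicates(suppliers, "supplier"):
        print([row["id"] for row in group])  # prints [1, 2]
```

The output is a review list, not a cleaned file: a person still decides whether "Acme Ltd." and "ACME Ltd" really are the same supplier.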
- Now that we have a bit of an idea of what you do, could you tell us what a typical day or a typical week looks like for you?
- I think I live a week in a day. For example, in one day I could be having client meetings for projects that I'm running at the moment, and new business calls with people who are looking to use my services. I could be doing podcasts, getting requests for blogs, or writing blogs. I'm on LinkedIn an awful lot, engaging with the community and posting. This week I'm also gonna create some new video content for LinkedIn. I have regular meetings with people I work with; I have a social media lady who helps me, so we meet regularly. I have a team meeting this morning. Then there's managing the projects we're running, checking all the work to make sure it's good to go back to the client, and planning for the future as well. So it can be quite hectic.
- Yeah, I see what you mean by a week in a day. You sound like you're keeping very busy.
- Yeah.
- Could you tell us a little bit more about what kind of data you collect and clean?
- Yeah. Generally, 80% of it is financial data. It's a bit like a more detailed version of your bank statement. There'll be the supplier name, or the retailer, whoever you've bought your goods from, and then there'll normally be an invoice description. If you go to Walmart, you'll only see "Walmart" on your bank statement. What you wouldn't see is the copy of the receipt with all the individual items. The financial data I look at is like that receipt: it has a line for each individual thing you've bought, and then a value. Sometimes it's in different currencies, if the business operates all around the world, and it can be in different languages. We take that and start to classify it into groups such as facilities, professional services, travel, marketing, or maintenance, repair and operations. It depends on the industry, but I have classified most industries, I would say, and certainly most languages as well, so...
- That's really interesting.
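To make the idea of classifying spend lines concrete, here is a minimal keyword-matching sketch. The category names come from the conversation, but the keyword lists, the `classify` helper, and the example lines are assumptions for illustration only; Susan describes this work as largely manual, and real taxonomies vary by industry. Anything the rules cannot place is labelled "Unclassified" so a person can look at it.

```python
from __future__ import annotations

# Hypothetical keyword rules; the categories echo the episode, the keywords do not.
RULES = {
    "Travel": ("flight", "hotel", "taxi", "rail"),
    "Marketing": ("advert", "campaign", "print"),
    "Professional Services": ("consult", "legal", "audit"),
    "Facilities": ("cleaning", "security", "rent"),
    "MRO": ("repair", "maintenance", "spare part"),
}


def classify(description: str) -> str:
    """Return the first category whose keyword appears in the invoice
    description, or "Unclassified" so a person can review the line."""
    text = description.lower()
    for category, keywords in RULES.items():
        if any(word in text for word in keywords):
            return category
    return "Unclassified"


if __name__ == "__main__":
    # Made-up spend lines: (supplier, invoice description, value).
    lines = [
        ("Bright Co", "Office cleaning contract March", 1200.00),
        ("Acme Air", "Flight LHR-JFK", 640.50),
        ("Widget Co", "Misc items", 35.99),
    ]
    for supplier, description, value in lines:
        print(f"{supplier:10} {value:>9.2f}  {classify(description)}")
```

Plain substring matching is crude (for example, "rail" would also match "trailer"), which is one reason a human check still matters.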
- It's been interesting. Yeah.
- So the data you have, how are you storing it? Is there specific software that you use to clean the data?
- As I said, we do it manually, but there is an off-the-shelf tool called Omniscope, which is our data modeling and classification tool. It means I can take multiple files, merge them all together into one big file to get the true picture of what's happening, and then classify or even clean the data. That could be reformatting addresses: you might have the whole address in one cell, but you need to split it out over maybe four columns. So we'll start to work in Omniscope to do that.
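The address example can be sketched in plain Python as well. Susan's team does this in Omniscope; the sketch below only illustrates the two steps she mentions, merging several files into one and splitting a one-cell address into four columns. The column names (street, town, region, postcode), the comma-separated address format, the helper names, and the sample data are all assumptions.

```python
from __future__ import annotations

import csv
import io
from typing import Iterable

# Assumed target columns; real address layouts differ by country and client.
ADDRESS_COLUMNS = ["street", "town", "region", "postcode"]


def merge_files(files: Iterable[Iterable[str]]) -> list[dict]:
    """Stack several CSV exports that share the same headers into one list of rows."""
    rows: list[dict] = []
    for handle in files:
        rows.extend(csv.DictReader(handle))
    return rows


def split_address(full_address: str) -> dict:
    """Split a comma-separated address held in one cell into four columns.

    Assumes the parts appear in street, town, region, postcode order.
    Extra leading parts are folded into "street"; if there are fewer
    than four parts, the trailing columns are left blank.
    """
    parts = [p.strip() for p in full_address.split(",")]
    extra = len(parts) - len(ADDRESS_COLUMNS)
    if extra > 0:
        parts = [", ".join(parts[: extra + 1])] + parts[extra + 1 :]
    parts += [""] * (len(ADDRESS_COLUMNS) - len(parts))
    return dict(zip(ADDRESS_COLUMNS, parts))


if __name__ == "__main__":
    # In-memory stand-ins for two exported files (made-up data).
    file_a = io.StringIO('supplier,address\nAcme Ltd,"10 High St, Glasgow, Lanarkshire, G1 1AA"\n')
    file_b = io.StringIO('supplier,address\nBolt & Co,"Flat 2, 5 Low Rd, Leeds, West Yorkshire, LS1 4AB"\n')
    for row in merge_files([file_a, file_b]):
        row.update(split_address(row.pop("address")))
        print(row)
```

Folding extra parts into the street column keeps every piece of the original address rather than dropping it, so nothing is lost before a person checks the result.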
The data itself is saved in the cloud in the UK, so it never leaves UK servers. That's certainly something I'm getting asked about more, especially by UK businesses: Where's your data being stored? Who's looking after it? Who's working on it? All my team are based in the UK as well.
- Mm-hmm. So once you've collected the data, cleaned it, and categorized it, what do you do then? What do you actually give back to the clients?
- I check, check, and check again, so it gets triple-checked before it goes out the door, and they get a single CSV or Excel file with all the data. Then they can put it into their systems, split it up, and do whatever they want with it. We really do stop at the point of cleaning it. We don't do any analysis or dashboards, anything like that. I really focused on the part that I knew inside out and knew I was good at, and that's really paid off for me. I think sometimes there's an expectation that you have to do everything, but when you just focus on the one thing you're really good at, that pays off.
- Yeah, I mean, it sounds like your company's growing.
(Heather laughs)
- Yeah, it's unreal. I went home to Scotland at Christmas. I hadn't seen some of my friends for two years, and when I last saw them, I had no clients and no money. I thought I might have to get a job. Two years on, my business is growing, I've written a book, and I did a TEDx talk last year. Everything has changed in the last two years. It's been a long slog over the last 4 1/2 years, but things really started to improve over the last couple of years.
- That's excellent. And speaking of your book, I actually have it here.
- Ah, you do. I still can't quite believe I wrote this.
- It's so, so cool. I've started reading it, and so far it's so useful. I'm planning on using the information in here, and...
- Ah, that's amazing.
- I highly recommend that all of our listeners check out "Between the Spreadsheets."
- Amazon's been a bit tricky with delivery, but it is available from the American Library Association and Bookshop.org, so there are other ways to get it. That really means a lot, because I just wrote it to share my knowledge, but I didn't know if anyone would benefit from it. And I've had some really amazing feedback, so...
- Yeah, yeah. And there are some great recommendations at the front of the book from some really great data scientists too, so...
- Yes, thank you.
(Heather chuckles)
- And the one thing that I really, really love about this is that your voice comes through. It's a fun read. It's not just "this is how you classify data," which is what you might expect.
- Yes. Well, I guess we didn't talk about that. To get attention from my potential clients, I started making data fun and more relatable. One of the things I said to the publisher was, "I don't want to lose my voice. I want it to come through." They did tone it down a bit, but it's good that it still comes through. I'm really pleased.
- It definitely does. And we can put the link to where you can get the book in the description of the episode. The other question I have is about the data and the product that you give to your clients. Do you hear back from them about what improvements they were able to make after they got this clean data?
- I have heard that it's saved clients... I mean, I know I did some work for one client and saved two of their teams six hours a week...
- Wow.
- ...just by improving some spreadsheets for them. I know that another client has saved tens of millions by getting (indistinct). No, I'm not saying I did that, but by giving them the visibility to see their data, I certainly contributed to that. And I know that another client found out they had way too many consultants on their books and needed to do a rationalization exercise.
- What would you say is your favorite part of your job?
- Giving it back to the client and seeing the difference it can make to them is so rewarding, because people underestimate the power of clean data. It's always undervalued and seen as menial, the job that nobody wants to do, but there's so much power behind it. When you can show that to them, it's quite special.
- Yeah, it is a really important part of the process. And the Buffalo State Data Science program is actually planning on producing a data cleaning course in the future.
So maybe coming in 2023, we'll see. One thing I really wanted to talk about is that you are quite active on LinkedIn.
- Yeah.
- And I'm sure that as the founder of a business, it's especially important for you to grow your personal brand. Can you talk a little bit about how you've used LinkedIn to do this?
- Yeah. As I was explaining at the start, nobody knew that I or my services existed, and although people needed it, they weren't looking for it. So I needed a way to get information about what I did to the people I needed to speak to. This started about 3 1/2 years ago. I started just connecting with procurement people, because when I first started the business, that's who I was targeting: purely procurement people. I'd send them some information, and they'd maybe keep it on file. Then I realized that I was spending a lot of time sending individual messages to people, but if I could write a post that caught more people's attention, that would be a better way to spread my message. So I started experimenting with different posts and different formats, looking at which ones people were responding to. It started off really slow, maybe one or two posts a week. Then it was a couple more, then a couple a day, and suddenly I'm doing two most days now, religiously. I also wanted, again, to find a tone that wasn't boring. It wasn't dry, it wasn't formal...
- You're definitely not boring.
(Heather and Susan laugh)
- Yeah.
But because of everything I did, people from the data world suddenly saw my posts and said, "Yes, we know this too, we have the same problems." So then I had a wider audience. That's where the book came from, because I thought, people need a reference guide. People were asking me, "What resources can I read or go to to clean data?" and there was nothing out there. Now there is...
- Now there is.
- Yay. (Heather laughs) Yeah, I just saw opportunities and grabbed them.
- So if our listeners are not already following you on LinkedIn, I highly recommend that you follow Susan. I will put the link to her LinkedIn in the description of the episode. One of my favorite things that you post is the little parodies that you do. Could you tell us briefly about those?
- Is that Lip Sync Sunday?
- Yeah.
- Yeah.
So basically, again, this started in lockdown. It's probably the last thing I thought I would ever be doing on a professional website, but I do Lip Sync Sundays. I mime to a song every week, and normally I'll try to relate it to a data topic, a procurement topic, or a business topic. This week it was "Run the World (Girls)," the Beyoncé song, so it's about women who are gonna kick ass this year.
(Heather laughs)
But I also do Friday songs, where my copywriter reworks certain songs. For example, "Single Ladies" has got a "Data Ladies" version.
- Yes.
- So I've got that as well. And actually, I just want to add something. Whether you have a business, you're looking for a job, or whatever reason you're on LinkedIn, it often feels like nobody sees you or notices you. Sometimes it can take months, or even years, for people to get in touch with me after they've been seeing my content. So it's really important to be consistent and show up all the time. Whether that's once a week or once a day, keep it up, because you never know who's watching.
Don't expect instant results. It's like planting little seeds all over the place and then watching them grow at different times.
- Excellent advice. Do you have any additional advice for somebody who's interested in starting to work in data?
- Yeah. Don't be afraid that you don't know how, 'cause I don't know any of it and I'm still working in data. Don't focus on the bits you don't know; focus on the bits you're really good at and develop those, because you will find someone to partner with who will do the other bits. You don't have to be good at everything. And find what you enjoy. I mean, if you don't enjoy it, what's the point? You spend most of your life working. So try some things out and try different areas. Before you get a job, try some projects on freelancer websites like Upwork or Fiverr. Just take on a project. You're not gonna make any money from it, but you'll pick up so much experience.
- That's a really great idea. And even if you know that you're interested in data, that you wanna be a data analyst or a data scientist, there are still so many areas within data science that you could go into. You could do just the cleaning part. You could do analysis or databases. You could work in higher education, in marketing, or in healthcare, so...
- Even people analytics, HR. It's everywhere now. But the other thing is, I can't do math. I really can't do math. I failed it so badly it didn't even show up on my certificate. That doesn't mean you can't do data. You can still do data without knowing too much math or statistics. Data is so broad now.
- I'm sure that's comforting for some of our listeners to know.
- Yeah, I avoid numbers at all costs.
(Heather chuckles)
For me, it's all about the text and the patterns that you see in the data: words, word searches, things like that.
- Mm-hmm. Yeah, and you may discover new areas that you like that you didn't know about.
- Yeah, exactly.
- So before we let you go, is there anything else you'd like our listeners to know that we didn't get a chance to cover today?
- Actually, I guess it's a data-related point, but I created not an analogy but a kind of methodology called COAT. Make sure your data has its COAT on, like a jacket, so that it's consistent, organized, accurate, and trustworthy. And actually, you could apply that to job hunting: be consistent and check every day, be organized about who you're targeting and who you've applied to, be accurate in your CV or resume and the information you're giving, and then trust the process. Keep going, and something will happen.
- Definitely. Great advice, I love it so much. Susan, thank you so much for joining us today.
- Thank you for having me.
- And to all of our listeners, if you haven't already, check out our previous episodes. They're available wherever you listen to podcasts. For more information about starting your career as a data scientist, go to dataanalytics.buffalostate.edu, and don't forget to subscribe so that you get a notification each time we release a new episode of "Buffalo State Data Talk."
(upbeat music)