Posts

(PyCon 2014 Video) How To Get Started with Machine Learning – Melanie Warrick’s PyCon 2014 Talk

Save $500 on the machine learning course for developers (May 10-11, 2014 at Hackbright Academy). Use promo code “HB-BLOG” to save $500 when you register for the Hackbright machine learning course here.

Melanie Warrick, data scientist and software engineer for hire, Zipfian and Hackbright graduate, gives a talk at the annual PyCon 2014 conference.

Watch her full PyCon 2014 talk on machine learning:

Arthur Samuel defined machine learning as a “field of study that gives computers the ability to learn without being explicitly programmed”. Machine learning is about applying algorithm(s) in a program to solve the problem you are faced with and address the type of data that you have. You create a model that will help conduct pattern matching and/or predict results. Then you evaluate the model and iterate on it as needed to create the right type of solution for the problem.

Examples of machine learning (ML) in the real world include handwriting analysis, which uses neural nets to read millions of pieces of mail regularly and to sort and classify all the different variations in written addresses. Weather prediction, fraud detection, search, facial recognition, and so forth are all examples of machine learning in the wild.

Algorithms

There are several types of ML algorithms to choose from and apply to a problem – some are listed below and are broken into categories to give an approach on how to think about applying them. When choosing an algorithm, it’s important to think about the goal/problem, the type of data available and the time and effort that you have to work on the solution.

Machine Learning Algorithms

A good starting point is whether the learning problem is supervised or unsupervised. Supervised means you have actual data that represent the results you are targeting, which you use to train the model. Spam filters, for example, are built on actual messages that have been labeled as spam. Unsupervised data doesn’t come with a clear picture of the result. For unsupervised learning, you will have questions about the data, and you can run algorithms on it to see if patterns emerge that help tell a story. Unsupervised learning is a challenging approach, and typically there isn’t a single “right” answer for the solution.

In addition, whether the data is continuous (e.g. height, weight) or categorical/discrete (e.g. male/female, Canadian/American) helps determine the type of algorithm to apply. Basically, it’s about whether the data has a set number of values that can be defined or whether the variations in the data are nearly infinite. These are some ways to evaluate what you have to help identify an approach to solve the problem.

Note, the categorization of algorithms has been simplified a bit to help provide context, but some of the algorithms do cross the above boundaries (e.g. linear regression).

Models

Once you have the data and an algorithmic approach, you can work on building a model. A model can be something as simple as an equation for a line (y=mx+b) or as complex as a neural net with many layers and nodes.

Linear Regression is a machine learning algorithm and a simple one to start with, where you find the best fit line to represent observed data. In the talk, I showed two different examples of having observed data that exhibited some type of linear trend. There was a lot of noise (data was scattered around the graph), but there was enough of a trend to demo linear regression.

When building a model with linear regression, you want to find the optimal slope (m) and intercept (b) based on the actual data. See, algebra is actually applicable in the real world. The algorithm is simple enough that you can calculate the model yourself, but it’s better to leverage tools like scikit-learn’s library to calculate the best fit line more efficiently. What you are calculating is a line that minimizes the distance between the line and all the observed data points.
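As a minimal sketch of calculating the model yourself, the function below uses the ordinary least squares formulas for a single input variable, which minimize the sum of squared vertical distances between the line and the points. The function name fit_line is hypothetical, and the sample points are made up for illustration; they are not the data from the talk.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    m = sxy / sxx
    # Intercept: the best-fit line passes through the point of means.
    b = mean_y - m * mean_x
    return m, b


# Made-up points scattered around y = 2x + 1 (not the talk's data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
m, b = fit_line(xs, ys)
print("slope=%.3f intercept=%.3f" % (m, b))
```

For real work, scikit-learn’s LinearRegression class fits the same kind of line and handles more than one input variable.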

After generating a model, evaluate the performance and iterate to improve the model as needed if it is not performing as expected. I have explained linear regression here.
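One way to evaluate a fitted line is to compare its predictions with the observed values. The sketch below computes two common measures, root mean squared error (RMSE) and R squared. Choosing these two metrics is an assumption made for illustration, since the post does not say which measure the talk used, and the name evaluate_line and the sample values are hypothetical.

```python
import math


def evaluate_line(m, b, xs, ys):
    """Return (rmse, r_squared) for the line y = m*x + b on observed data."""
    n = len(xs)
    # Residual: observed value minus the value the line predicts.
    residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
    ss_res = sum(r ** 2 for r in residuals)
    mean_y = sum(ys) / float(n)
    # Total variation of the observations around their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("all y values are identical; R squared is undefined")
    rmse = math.sqrt(ss_res / float(n))
    r_squared = 1.0 - ss_res / ss_tot
    return rmse, r_squared


# Made-up observations and a candidate line y = 2x + 1.
rmse, r_squared = evaluate_line(2.0, 1.0, [0.0, 1.0, 2.0], [1.2, 2.9, 5.1])
print("rmse=%.3f r_squared=%.3f" % (rmse, r_squared))
```

A lower RMSE and an R squared closer to 1 both suggest the line follows the observed data more closely.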

Prediction

Once you have a good model, you can take in new data and output predictions. Those predictions can feed into some type of data product or generate results for a report or visualization.

In my presentation, I used actual head size and brain weight data to build a model that predicts brain weight based on head size. Since the data set was fairly small, the model has less predictive power and more potential for error. I went with this data since it was a demo, and I wanted to keep it simple. When graphed, the observed data was spread out, which also indicated error and a lot of variance in the data. So the model predicts brain weight with a good amount of variance.

With the linear model I built, I could feed it a head size (x) and it would calculate the predicted brain weight (y). Other models are more complex regarding the underlying math and application. To see the full code solution, check out the GitHub repository as noted above. The script is written a little differently from the slides because I created functions for each of the major steps. Also, there is an IPython notebook that shows some of the drafts I worked through to build out the code for the presentation.

Tools

The Python stack is becoming popular for scientific computing because of the well-supported toolsets. Below is a list of key tools to start learning if you want to work with ML. There are many other Python libraries out there for more nuanced needs in the space as well as other stack packages to explore (R, Java, Julia).

If you are trying to figure out where to start, here are my recommendations:

• Scikit-Learn = machine learning algorithms
• Pandas = dataframe tool
• NumPy = matrix manipulation tool
• SciPy = stats models
• Matplotlib = visualization

Skills

In order to work with ML algorithms and problems, it’s important to build out your skill set regarding the following:

• Algorithms
• Statistics (probability, inferential, descriptive)
• Linear Algebra (vectors & matrices)
• Data Analysis (intuition)
• SQL, Python, R, Java, Scala (programming)
• Databases & APIs (get data)

Resources

Below is a beginning list of resources to get you started.

I highly recommend Andrew Ng’s class and a couple of links are to sites with more recommendations on what to check out next:

• Andrew Ng’s Machine Learning on Coursera
• Khan Academy courses in linear algebra and stats
• “Think Stats” by Allen Downey
• Zipfian’s “A Practical Intro to Data Science”
• Metacademy
• Open Source Data Science Masters
• Stack Overflow, Data Tau, Kaggle
• Mentors

One point to note from this list and I stressed this in the talk – seek out mentors! They are out there and willing to help. You have to put it out there what you want to learn and then be aware when someone offers to help. Also, follow up. Don’t stalk the person but reach out to see if they will make a plan to meet you. They may only have an hour or they may give you more time than you expect. Just ask and if you don’t get a good response or have a hard time understanding what they share, don’t stop there. Keep seeking out mentors. They are an invaluable resource to get you much farther faster.

Last Point to Note

ML is not the solution for everything and many times can be overkill. You have to look at the problem you are working on to determine what makes the most sense for your solution, given how much data you have available.

I highly recommend looking for the simple solution first before reaching for something more complex and time-consuming. Sometimes regex is the right answer and there is nothing wrong with that. As mentioned, to figure out an approach, it helps to understand the problem, the data, the amount of data you have and the time available to turn the solution around.

Good luck in your ML pursuit.

References

These are the main references I used in putting together my talk and post:

• Zipfian
• Framed.io
• “Analyzing the Analyzers” – Harlan Harris, Sean Murphy, Marck Vaisman
• “Doing Data Science” – Rachel Schutt & Cathy O’Neil
• “Collective Intelligence” – Toby Segaran
• “Some Useful Machine Learning Libraries” (blog)
• University GPA Linear Regression Example
• Scikit-Learn (esp. linear regression)
• Mozy Blog
• StackOverflow
• Wiki

This post was originally posted at Melanie Warrick’s blog.


Leaprobo Project – AKA Gizgig Goes For A Walk – From Silicon Chef: Hardware Hackathon For Women

80% of the 150 attendees at Silicon Chef 2013 were WOMEN – and for many of the attendees, it was their first time coding, hardware hacking, or both. The hackathon goals were to break down the mental and physical barriers to entry for hardware-curious beginners. Simply put, we provided the silicon; they provided the enthusiasm and laptops.

By Melanie Warrick (Hackbright Academy – spring 2013 class)

Last weekend at Silicon Chef, the women-centric hardware hackathon, there were over 100 participants and 20 projects demoed. My team built an Arduino on Parallax wheels controlled by a Leap Motion (3D programmable motion sensor).

Check out the video to see our robot car:

Leaprobo Project Idea

I bought the Parallax kit several months ago with the intent of attaching it to my Arduino because I think everything is better with wheels (almost everything). However, I was so busy with other projects and things to study that I didn’t get around to assembling it until now. When the hackathon came up, it seemed like the perfect motivator to break out the wheel kit. Plus, my team had heard about Leap Motion, and controlling things with a sensor seemed like fun. Thus, the idea for the Leaprobo project came together.

It seemed simple enough and yet not so much considering how new most of us were to hardware. Lucky for us, one of our team members had several years of experience with robotics, and we had some great mentorship on system design and building.

Leaprobo Hardware Parts:

• Arduino Uno
• Leap Motion
• Parallax Robotics Shield Kit (for Arduino)
• USB cord
• HC-05 Bluetooth Chip (not finished)
• Accelerometer (not finished)
• Mac / Unix Terminal


Leap Motion Script

We created a Python script using the Leap’s library to read hand motions and then print out letters to the command line.

• Forward = Flat palm with all fingers kept together or one finger = print “F”
• Backward = 5 fingers spread open = print “B”
• Stop = Closed fist = print “S”
• Left = hand tilted left = print “L”
• Right = hand tilted right = print “R”
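The gesture-to-letter mapping above can be sketched as a plain Python function. This is not the team’s car.py and it does not use the Leap library. The inputs (extended_fingers, fingers_spread, roll_degrees), the 30 degree tilt threshold, the sign convention for tilt and the order in which gestures are checked are all hypothetical choices made for illustration.

```python
def gesture_to_command(extended_fingers, fingers_spread, roll_degrees,
                       tilt_threshold=30.0):
    """Map a simplified hand description to a command letter, or None.

    extended_fingers: number of extended fingers, 0 to 5 (hypothetical input)
    fingers_spread:   True if the extended fingers are spread apart
    roll_degrees:     hand tilt; negative = left, positive = right (assumed)
    """
    if extended_fingers == 0:
        return "S"  # closed fist = stop
    if roll_degrees <= -tilt_threshold:
        return "L"  # hand tilted left
    if roll_degrees >= tilt_threshold:
        return "R"  # hand tilted right
    if extended_fingers == 5 and fingers_spread:
        return "B"  # five fingers spread open = backward
    if extended_fingers == 1 or (extended_fingers == 5 and not fingers_spread):
        return "F"  # one finger, or flat palm with fingers together = forward
    return None     # anything else is not a recognized gesture


for hand in [(5, False, 0.0), (5, True, 0.0), (0, False, 0.0),
             (5, False, -45.0), (5, False, 45.0)]:
    print(gesture_to_command(*hand))
```

Checking the closed fist first means a stop gesture always wins, which seemed the safest ordering for a moving car; the original script may have ordered its checks differently.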


You can see the full script for the project at the following link:

Leap script = car.py

One tricky bit to note is that the Leap Motion script required the standard Python installation that comes with the Mac (which can sometimes be an older version). It threw errors if we tried to run it with the Homebrew-installed version. So we created a virtual environment that explicitly pointed to the Mac Python version. Below is the command that made it work on our computers, but other computers may be different:

$ virtualenv -p /usr/bin/python [envname]

Arduino

We built a script in the standard Arduino version of C to loop/continually read the serial port for input. A switch statement defined each case that corresponded to the outputted letters from the Python code. So if “F” is printed to the serial port, the corresponding case would call a forward function that told the servos to rotate (left counterclockwise at 1400 microseconds and right clockwise at 1600 microseconds). We had a case and corresponding function that gave servo direction based on each movement defined in the Leap code.

Note, we had to program opposite rotation for the servos for forward and backward because they are mounted in the Arduino in opposite directions. To see more about the script for the Arduino, check out the link below.

Arduino script = car_test.ino
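The real dispatch logic lives in the Arduino C script linked above. As a rough illustration of the same idea in Python, the sketch below maps each command letter to a pair of servo pulse widths. Only the forward pair (left 1400, right 1600 microseconds) comes from this post. The backward, stop and turn values are assumptions derived from that pair, including 1500 microseconds as the neutral pulse, and the names COMMANDS, handle_letter and drive are hypothetical.

```python
# Pulse widths in microseconds as (left_servo, right_servo).
STOP_PULSE = 1500  # assumed neutral pulse for a continuous-rotation servo

COMMANDS = {
    "F": (1400, 1600),              # forward: values given in the post
    "B": (1600, 1400),              # backward: assumed, forward pair swapped
    "S": (STOP_PULSE, STOP_PULSE),  # stop: assumed neutral on both servos
    "L": (1600, 1600),              # left: assumed, left wheel reversed
    "R": (1400, 1400),              # right: assumed, right wheel reversed
}


def handle_letter(letter, current):
    """Return the pulses for a command letter; unknown letters change nothing."""
    return COMMANDS.get(letter, current)


def drive(stream):
    """Consume characters one at a time, like a loop polling the serial port."""
    pulses = COMMANDS["S"]
    history = []
    for letter in stream:
        if letter.isspace():
            continue  # printed letters arrive with newlines; skip them
        pulses = handle_letter(letter, pulses)
        history.append(pulses)
    return history


print(drive("F\nL\nS\n"))
```

A dictionary lookup plays the role of the switch statement here: each key is a case, and the default keeps the servos doing whatever they were already doing.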

Parallax Wheel Kit

Parallax provides comprehensive online directions on how to put the wheel kit together. The first chapter gives a great overview of Arduino’s script language, and the chapters covering how to build the shield are really good for someone new to hardware.

We actually assembled the wheel kit the weekend before so we had practice with it. We were ready to disassemble and reassemble if necessary, but the hackathon was pretty low-key and focused on us having fun and learning. Thankfully that was the case because we had enough roadblocks to keep us busy the whole weekend.

Serial Port – Pulling it Together

To pull the pieces together, we connected the Leap and the Arduino to one computer with USB cables. We ran the Leap program and redirected its output to the USB serial port device file connected to the Arduino. The command that we used in the Mac terminal to make this happen is the following:

$ python car.py > /dev/tty.usbmodem1411

Note, your serial port name is probably different.

Here is how it worked: we ran the car.py program. While Leap Motion read the hand gestures, the program printed letters that were then written to the USB serial port file that was connected to the Arduino. The Arduino’s code continually read the serial port for new information. When a letter showed up, the Arduino script would match it to a case and, based on the called function, send directions to the servos on the Parallax shield, making the car move.
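One detail that matters when output is redirected this way: Python normally buffers standard output in blocks when it is not attached to a terminal, so printed letters can sit in the buffer before they reach the serial port. The sketch below shows one way to write each letter and flush it right away. The post does not say whether car.py flushes its output, so this is an assumption about what may be needed, and send_command is a hypothetical helper. An in-memory buffer stands in for the serial device so the example runs without hardware.

```python
import io
import sys


def send_command(letter, stream=None):
    """Write one command letter followed by a newline, then flush it."""
    if stream is None:
        stream = sys.stdout  # the redirect target when run from the shell
    stream.write(letter + "\n")
    stream.flush()  # push the letter out now instead of waiting for a full buffer


# Stand-in for the serial device file, so no hardware is needed here.
fake_port = io.StringIO()
for letter in "FLS":
    send_command(letter, fake_port)
print(repr(fake_port.getvalue()))
```

Running the interpreter with the -u option is another way to turn off output buffering without changing the script.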

Biggest Challenge – HC-05 Bluetooth

We tried to set up a Bluetooth connection with the Arduino to replace the USB cable that kept the car tethered to the Mac. Unfortunately the HC-05 chip we were using refused to read input data. It would pair with the computer and send output, but would not take input. We checked and rechecked the wiring many times. We tried a couple of different HC-05 chips and different computers, but it just was not working for us. We fell back on the USB connection, and thankfully we had a cable that was long enough. If anyone out there has heard of this issue and has ideas on how to fix it, please share.

Team

Our team did such an amazing job figuring things out and making the product come together. Several of us were new to programming and some had never met before. There was a lot of collaboration as well as capability to work independently that helped get the project done.

Meggie worked hard on figuring out how to implement the accelerometer, and it was ready to add to the car. Unfortunately we ran out of time because we were down to the wire making the main Leap to Arduino connection work.

Kara and Chris dived deep into understanding Arduino C, and with Rita’s help Kara was able to build out the code that ran the car. Kara did such a fantastic coding job with only a month of programming experience, and she rocked the demo, driving the car like a pro after only 30 minutes of practice.

Kelley pulled together the Python code for the Leap and had that ready for us to run once we started working on the serial port connection. I had already put together the wheel kit, was helping to give team direction and spent a good deal of time working on Bluetooth as well as troubleshooting pulling the pieces together.

Still, we wouldn’t have been successful without Rita. She helped us define our system design and led it, coached different members on how to build the code in C and figured out the serial port connection to finally make the car work. We also had some great mentorship help from Mark, Magee and Jeremy.

It was a great team. We had a lot of fun and learned a ton. The best part was having it actually work by demo time.

Next up, I need to figure out how to make it fly, talk and do facial recognition. Simple, right?

This post was originally posted at Melanie Warrick’s blog.

Hackbrights Present Tech Talks at SheCodes Conference 2013

Hackbright instructor and Director of Operations Liz Howard gave a developer identity talk at SheCodes Conference last Friday at the Computer History Museum in Mountain View, California.

In addition to talking about identity as a woman software engineer, she appealed to the audience of women developers to give more tech talks – on topics like k-means clustering, busting CAP theorem at scale, machine learning algorithms, natural language processing, language design, how to build an operating system…

Three Hackbright graduates then demoed their projects.

First, Megan Speir came onstage to talk about machine learning and her app Morgan. She talked about natural language processing for Twitter sentiment analysis, and she showed code onscreen. Megan described how the data was manipulated and parsed into an infographic, which she showed the audience.

Megan has also built a guest book photo booth as her Hackbright project, pairing a Raspberry Pi built into a physical “mini booth” with a Flask webapp.

Next, Hackbright grad Melanie Warrick talked about her full stack Hackbright project Sun Finder and the different languages and APIs that she used to deploy her app.

Onstage, Melanie presented a slide that displayed project structure and resources, such as the MVC (model, view, controller), various APIs and programming languages used. Then she ran her app for a live demo – and it worked! Sun Finder predicted that the Mission District in San Francisco would be partly cloudy and 58 degrees Fahrenheit.

Last but not least, Hackbright grad Michelle Glauser talked about her project BookFairy. She discussed her unique problem as a book nerd, and built her own solution at Hackbright – she learned Django, processed uploaded files, created URLs to request, scraped library sites with PyQuery, pulled in book ratings, checked book info and search results correlation using fuzzywuzzy and designed the frontend using Python, Django and PyQuery.

The Hackbright grads stood onstage and answered questions from the audience. Thanks to everyone who attended SheCodes and shared encouragement about the tech talks – we hope to do more of them!