I was one of the first employees in Google’s developer relations team, joining in the fall of 2006. Did you know that when Google started doing developer relations, the group was called API Support and sat under TechOps, which was in the Online Sales and Operations (OSO) org with @sherylsandberg as the org leader? What led to its prominence within Google?
DevOps. No, not that DevOps
The name changed, of course, to Developer Operations (DevOps, which means a very different thing today!) and we had very capable leadership under Mike Winton. But I think a singular decision had the greatest effect: the move into the Engineering org. Why?
Google has an engineering-driven culture
Many of you have heard me argue that DevRel is marketing, so why was DevRel being in the engineering org at Google so important? It’s contextual. Google has an engineering-led culture. Influencing eng teams to effect change on behalf of developer communities requires their respect.
Respect comes with understanding and helping with their challenges, but also knowing their language, systems, code, tooling and operations. Access to these at Google required being in the engineering org. Also, it greatly helped that an Eng VP had to sign off on the quality of all hires.
I’d argue that DevRel can be super successful in a marketing group, but it requires the right culture and leadership that truly gets the role it plays in product and community success. Without that, a home in engineering is much better for all.
How about having DevRel in a sales org?
No, DevRel never belongs in sales. DevRel is a long game, and sales is driven by short-term lead/opp/closing targets — all sales leaders will be tempted to reach these targets by involving some of their most talented technologists in DevRel, and that comes at the expense of our real long-term goals.
Recently a tweet of mine was revealed to have been included in a Twilio board of directors presentation from 2011. The tweet was about the simplicity of the developer experience for both Twilio and Google App Engine. What’s this have to do with Lakehouses? Everything.
My entire career has been about enabling simplified experiences for technologists so they can focus on what matters — the key differentiators of their business or application.
Google App Engine, though released before its time, made it a lot easier to launch and maintain production-ready web applications. Google Apps Marketplace made it easier to market B2B apps to a large audience. Neo4j and the property graph data model make it easier to understand and query the relationships between data.
The same is true with Databricks and the Lakehouse architecture.
Nearly all large enterprises today have two-tier data architectures — combining a data lake and a data warehouse to power their data science, machine learning, data analytics, and business intelligence (BI).
Data Lake, storing all the fresh enterprise data, populated directly from key business applications and sources. By using popular open data formats, the data lake is great for compatibility with popular distributed computing frameworks and data science tools. However, a traditional data lake is typically slow to query plus lacks schema validation, transactions, and other features needed to ensure data integrity.
Data Warehouse, with a subset of the enterprise data ETLd from the data lake, stores mission critical data “cleanly” in proprietary data formats. It’s fast to query this subset, but the underlying data is often stale due to the complex ETL processes used to move the data from the business apps -> lake -> warehouse. The proprietary data formats also make it difficult to work with the data in other systems and keep users locked in to their warehouse engine.
Simplicity is king
Why have a two-tier data architecture when a single tier will satisfy the performance and data integrity requirements, improve data freshness and reduce cost?
It simply wasn’t possible before the advent of data technologies like Delta Lake, which enable highly performant access to data stored in open data formats (like Parquet) with the data integrity constraints and ACID transactions previously possible only in data warehouses.
I joined Databricks in October 2019. My first day was also the first day of the Spark + AI Summit in Amsterdam – a heck of an exciting introduction to a new team, new company and new community.
Why Databricks? I was very happy at Neo4j and certainly loved working with the Neo4j community. Databricks brought with it an exciting opportunity though – build a talented team at one of the fastest growing cloud data startups in history. I also have the opportunity to work directly with the founders and senior leadership at the company who understand the value of developer relations and the importance of building a great community of data scientists, data engineers and data analysts.
The team is now six people, located in San Francisco CA, Seattle WA, Boulder CO, Santa Fe NM, and Blacksburg VA. We have Developer Advocates and Program Managers all working together to grow awareness and adoption of Databricks and the open source projects which we support.
First-Year Team Accomplishments
Here are some of our accomplishments from the first year, working along with amazing collaborators across the company and community.
Launched the Databricks University Alliance, a community of professors at some of the world’s top universities sharing best practices and using Databricks to help teach data science, data engineering, data analytics and more.
Built and executed (along with the broader company) two of the largest virtual events for the Data + AI Community, with the June Spark + AI Summit and the fall Data + AI Summit Europe.
We have a really exciting 2021 planned as the team continues many of the initiatives above and takes on new challenges. We’ll be focused on making it easier to learn data science, data engineering and data analytics, as well as making it simple to apply these learnings using Databricks. An important part of this mission will be growing and strengthening the community so we can all learn from each other.
Are you a data geek and want to join the adventure? We have data engineering/analytics advocate roles and developer (online) experience advocate roles open in the US as well as a regional advocate role in Europe. Reach out to me at (firstname).(lastname)@databricks.com if you want to learn more!
One of the most common questions we get at Neo4j is how to move from a SQL database to a Graph Database like Neo4j. The previous solution for accomplishing this was to export the SQL tables into CSV files and then import the CSV files with neo4j-import or LOAD CSV. There’s a much better way: JDBC!
Neo4j JDBC Support
There are two distinct ways you can use JDBC within Neo4j:
Access Neo4j Data via JDBC. Do you have existing code that accesses your SQL database using JDBC, and you want to move that code to access Neo4j instead? Neo4j has a JDBC Driver. Just update your code to use the awesome power of the Cypher query language instead of SQL, and switch over the JDBC driver you’re using, and you’re off to the races!
Import SQL Databases into Neo4j. Do you have data in your SQL database that you want to move into a Graph? The APOC library for Neo4j has a set of procedures in apoc.load.jdbc to make this simple. This blog post will cover this use case.
Loading Sample Northwind SQL tables into MySQL
In order to run the code snippets in the following sections, you’ll need to have the Northwind SQL tables in a MySQL database accessible from your Neo4j server. I’ve published a GitHub Gist of the SQL script which you can execute in MySQL Workbench or using the command-line client.
In order to run this, I created a blank MySQL database in Docker:
docker run -P -e MYSQL_ROOT_PASSWORD=my-secret-pw -e MYSQL_DATABASE=northwind -e MYSQL_USER=northwind -e MYSQL_PASSWORD=my-secret-pw mysql
Loading data from RDBMS into Neo4j using JDBC
With the APOC JDBC support, you can load data from any type of database which supports JDBC. In this post, we’ll talk about moving data from a MySQL database to Neo4j, but you can apply this concept to any other type of database: PostgreSQL, Oracle, Hive, etc. You can use it for NoSQL databases too, though APOC also has direct support for MongoDB, Couchbase and more.
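To make the idea concrete, here is a minimal Python sketch of the two pieces an apoc.load.jdbc call needs: a JDBC URL and a table name (or SQL query). It only builds strings, so it runs without a database. The host, port and table name are assumptions for illustration; with the Docker command above, `-P` maps MySQL to a random host port, so check `docker ps` for the real one. The `YIELD row` form and the `$param` syntax are what I'd expect from current APOC and Neo4j versions, so verify them against the release you're running. The helper names are hypothetical.

```python
from urllib.parse import quote


def mysql_jdbc_url(host, port, database, user, password):
    """Build a MySQL JDBC URL with credentials passed as query parameters."""
    return (
        f"jdbc:mysql://{host}:{port}/{database}"
        f"?user={quote(user, safe='')}&password={quote(password, safe='')}"
    )


def preview_statement(jdbc_url, table_or_query, limit=5):
    """Return (cypher, params) that previews rows pulled over JDBC.

    Assumes the APOC procedure signature
    apoc.load.jdbc(urlOrKey, tableOrQuery) YIELD row.
    """
    cypher = (
        "CALL apoc.load.jdbc($url, $table) YIELD row\n"
        "RETURN row LIMIT $limit"
    )
    params = {"url": jdbc_url, "table": table_or_query, "limit": int(limit)}
    return cypher, params


# Credentials match the docker command above; host, port and the
# "products" table name are placeholders for illustration.
url = mysql_jdbc_url("localhost", 3306, "northwind", "northwind", "my-secret-pw")
cypher, params = preview_statement(url, "products")
print(cypher)
print(params)
```

Passing the URL and table as parameters, rather than pasting them into the Cypher text, keeps the password out of the query string. You can hand the resulting pair to whichever Neo4j client you prefer.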
1. Install APOC and JDBC Driver into Neo4j plugins directory
Note: This step is not necessary if you’re using the Neo4j Sandbox and MySQL or PostgreSQL. Each Sandbox comes with APOC and the JDBC drivers for these database systems.
All JAR files placed in the Neo4j plugins directory are made available for use by Neo4j. We need to copy the APOC library and JDBC drivers into this directory.
First, download APOC. Be sure to grab the download that is for your version of Neo4j.
RETURN p2.ProductName, count(*) AS weight ORDER BY weight DESC LIMIT 10;
If this was your first experience with Neo4j, you probably want to learn more about Neo4j’s Cypher query language. Neo4j has some great (free) online training you can take to learn more. You can also use the Cypher Refcard to power your journey to becoming a Graphista.
Bernie is sick and tired of hearing about Hillary’s e-mails and so am I. So, why am I writing about them? Well, they can possibly provide an interesting insight into how our government works (or doesn’t work) — if only they were in a better format than PDFs!! They represent a perfect graph!
Knowing the e-mails and senders+receivers is interesting, but I wanted to see what the e-mails are about! While the subject lines are included with the e-mails, they’re often opaque, like the meaningful subject “HEY” used in an e-mail from Jake Sullivan to Hillary Clinton. Natural language processing to the rescue!
I built a small Python script that uses Py2neo to query all e-mails without attached topics. It goes through each e-mail and sends the raw body text and subject to the Prismatic Topics API. The API returns a set of topics, which the script then uses to create REFERENCES relationships between the e-mails and topics. This code is based on the excellent post on the topic by Mark Needham.
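The shape of that loop can be sketched roughly as follows. This is not the original script: the topic lookup is a hypothetical `fetch_topics` callable standing in for the Prismatic Topics API call, and the `Email` and `Topic` labels and the `id` and `name` properties are assumptions for illustration. Only the REFERENCES relationship type comes from the description above. The sketch yields Cypher statements with parameters instead of talking to Neo4j directly, so it runs on its own.

```python
def build_topic_links(emails, fetch_topics):
    """Yield (cypher, params) pairs linking each e-mail to its topics.

    emails: iterable of dicts with "id", "subject" and "body" keys (assumed shape).
    fetch_topics: callable taking text and returning topic names; a
        hypothetical stand-in for the topic-extraction API call.
    """
    cypher = (
        "MATCH (e:Email {id: $email_id}) "
        "MERGE (t:Topic {name: $topic}) "
        "MERGE (e)-[:REFERENCES]->(t)"
    )
    for email in emails:
        # Send subject and body together, since subjects alone are often opaque.
        text = f"{email['subject']}\n{email['body']}"
        for topic in fetch_topics(text):
            yield cypher, {"email_id": email["id"], "topic": topic}


def stub_topics(text):
    """Toy replacement for the real API: keyword match only."""
    known = ["Ireland", "Peace"]
    return [t for t in known if t.lower() in text.lower()]


sample = [
    {"id": 1, "subject": "GUARDIAN", "body": "Peace talks in Ireland resume."},
    {"id": 2, "subject": "HEY", "body": ""},
]
for statement, params in build_topic_links(sample, stub_topics):
    print(params)
```

Using MERGE for both the topic node and the relationship means the script can be re-run without creating duplicates.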
Now I can explore e-mails by topic, like the graph below where I see e-mails related to David Cameron. When I double-click on the e-mail with subject ‘GUARDIAN’ in the Neo4j Browser, I can see all the other topics that e-mail references, including Sinn Féin, Northern Ireland, Ireland, and Peace.
With this additional topic information, I can start to understand more context around Hillary’s e-mails.